[26.04_linux-nvidia-bos] Backport Vera PMU support #384

Open
nvmochs wants to merge 17 commits into NVIDIA:26.04_linux-nvidia-bos from nvmochs:skip_pmccntr_el0_70_bos

nvmochs (Collaborator) commented Apr 21, 2026

The Vera PMU patches are upstream as of v7.1. Revert the existing SAUCE patches and pick their upstream counterparts. Also backport one additional patch that is still being reviewed on LKML and addresses behavior of PMCCNTR_EL0 on Vera when in single-threaded mode.

Upstream patches:
d332424 perf/arm_cspmu: nvidia: Rename doc to Tegra241
f5caf26 perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
bc86281 perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
bf585ba perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
3dd7302 perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
429b763 perf: add NVIDIA Tegra410 CPU Memory Latency PMU
2f89b7f perf: add NVIDIA Tegra410 C2C PMU
86ff690 perf vendor events arm64: Add Tegra410 Olympus PMU events

LKML patch: https://lore.kernel.org/all/20260406232034.2566133-1-bwicaksono@nvidia.com/

LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2149756
NVB (for PMCCNTR issue): 5736369


Test results with SMT disabled (for PMCCNTR case):

$ sudo PERF=/home/nvidia/mochs/NV-Kernels/tools/perf/perf ./test_tegra410_pmu/test_pmccntr_skip.sh
========================================
Olympus PMCCNTR_EL0 Skip Test
========================================

[INFO] Using perf: perf version 7.0.gad0ae17b290d

----------------------------------------
Test 1: Platform Check
----------------------------------------
[INFO] Olympus CPU found: cpu0 (MIDR=0x000000004e0f0100)
[PASS] Running on NVIDIA Olympus CPU (cpu0)

----------------------------------------
Test 2: PMCCNTR_EL0 WFI Inflation
----------------------------------------
[INFO] Measuring '{cpu_cycles,cpu_cycles}' group on cpu0 during sleep 1...
[INFO] (PMCCNTR_EL0 inflates during WFI; programmable counters do not)
[INFO] Raw perf output:
CPU0              6089534      cpu_cycles                                                            
CPU0              6089534      cpu_cycles                                                            
[INFO]   cpu_cycles[0] (may be PMCCNTR_EL0): 6089534
[INFO]   cpu_cycles[1] (programmable counter): 6089534
[INFO]   ratio cc[0]/cc[1]: 1.0x
[PASS] cc[0]/cc[1]=1.0x — both counters equal, PMCCNTR_EL0 not used (patch working)

========================================
Test Summary
========================================
Passed:  2
Failed:  0
Skipped: 0

[INFO] All tests passed.
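
The ratio check in Test 2 can be reproduced by hand from the raw perf output. The following is a minimal sketch, not the test script itself (its source is not part of this PR description). It assumes the `CPU0 <count> cpu_cycles` row layout shown in the log, and the function name `cycle_ratio` is hypothetical.

```python
import re

# Raw rows copied from the "Raw perf output" section of the log above.
RAW = """\
CPU0              6089534      cpu_cycles
CPU0              6089534      cpu_cycles
"""

def cycle_ratio(raw: str) -> float:
    """Return cc[0]/cc[1] for a two-event cpu_cycles group (hypothetical helper)."""
    counts = [
        int(m.group(1))
        for m in re.finditer(r"^\S+\s+(\d+)\s+cpu_cycles\b", raw, re.MULTILINE)
    ]
    if len(counts) != 2:
        raise ValueError(f"expected 2 cpu_cycles rows, found {len(counts)}")
    if counts[1] == 0:
        raise ValueError("second counter is zero; ratio is undefined")
    return counts[0] / counts[1]

ratio = cycle_ratio(RAW)
print(f"ratio cc[0]/cc[1]: {ratio:.1f}x")
```

Per the log's own note, PMCCNTR_EL0 inflates during WFI while programmable counters do not, so a ratio well above 1.0 would suggest the first event was still placed on PMCCNTR_EL0.
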
$ sudo PERF=/home/nvidia/mochs/NV-Kernels/tools/perf/perf ./test_tegra410_pmu/test_tegra410_pmu.sh 
========================================
Tegra410 PMU Functional Test
========================================

[INFO] Checking T410 (Vera) platform...
[INFO] T410 (Vera) detected (SoC: jep106:036b:0410)

[INFO] Using perf: /home/nvidia/mochs/NV-Kernels/tools/perf/perf (perf version 7.0.gad0ae17b290d)

----------------------------------------
Test 1: UCF (Unified Coherence Fabric) PMU
----------------------------------------
[INFO] Testing UCF PMU...
[PASS] UCF PMU - Found 2 device(s)
  Device: nvidia_ucf_pmu_0
    Type: 28
    CPU mask: 0
    Events: 20 available
    Format attributes: 8
  Device: nvidia_ucf_pmu_1
    Type: 58
    CPU mask: 88
    Events: 20 available
    Format attributes: 8
  Available events for nvidia_ucf_pmu_0:
    bus_cycles                     : event=0x1d
    cycles                         : event=0x100000000
    ext_snp_access                 : event=0x181
    ext_snp_evict                  : event=0x182
    local_snoop                    : event=0x180
    mem_access_rd                  : event=0x121
    mem_access_wr                  : event=0x122
    mem_bytes_rd                   : event=0x123
    mem_bytes_wr                   : event=0x124
    slc_access_atomic              : event=0x184
    slc_access_dataless            : event=0x183
    slc_access_rd                  : event=0x111
    slc_access_wr                  : event=0x112
    slc_allocate                   : event=0xf0
    slc_bytes_rd                   : event=0x113
    slc_bytes_wr                   : event=0x114
    slc_hit_rd                     : event=0x119
    slc_refill_rd                  : event=0x109
    slc_refill_wr                  : event=0x10a
    slc_wb                         : event=0xf3
[INFO] Testing nvidia_ucf_pmu_0 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_ucf_pmu_0/cycles/ sleep 0.1
[PASS] nvidia_ucf_pmu_0 basic event - Event counted successfully (cycles)
           250965925      nvidia_ucf_pmu_0/cycles/                                              
  Available events for nvidia_ucf_pmu_1:
    bus_cycles                     : event=0x1d
    cycles                         : event=0x100000000
    ext_snp_access                 : event=0x181
    ext_snp_evict                  : event=0x182
    local_snoop                    : event=0x180
    mem_access_rd                  : event=0x121
    mem_access_wr                  : event=0x122
    mem_bytes_rd                   : event=0x123
    mem_bytes_wr                   : event=0x124
    slc_access_atomic              : event=0x184
    slc_access_dataless            : event=0x183
    slc_access_rd                  : event=0x111
    slc_access_wr                  : event=0x112
    slc_allocate                   : event=0xf0
    slc_bytes_rd                   : event=0x113
    slc_bytes_wr                   : event=0x114
    slc_hit_rd                     : event=0x119
    slc_refill_rd                  : event=0x109
    slc_refill_wr                  : event=0x10a
    slc_wb                         : event=0xf3
[INFO] Testing nvidia_ucf_pmu_1 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_ucf_pmu_1/cycles/ sleep 0.1
[PASS] nvidia_ucf_pmu_1 basic event - Event counted successfully (cycles)
           251130096      nvidia_ucf_pmu_1/cycles/                                              

----------------------------------------
Test 2: PCIE PMU
----------------------------------------
[INFO] Testing PCIE PMU...
[PASS] PCIE PMU - Found 12 device(s)
  Device: nvidia_pcie_pmu_0_rc_0
    Type: 31
    CPU mask: 0
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_0_rc_1
    Type: 34
    CPU mask: 0
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_0_rc_2
    Type: 37
    CPU mask: 0
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_0_rc_3
    Type: 40
    CPU mask: 0
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_0_rc_4
    Type: 43
    CPU mask: 0
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_0_rc_5
    Type: 46
    CPU mask: 0
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_1_rc_0
    Type: 61
    CPU mask: 88
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_1_rc_1
    Type: 63
    CPU mask: 88
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_1_rc_2
    Type: 65
    CPU mask: 88
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_1_rc_3
    Type: 30
    CPU mask: 88
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_1_rc_4
    Type: 33
    CPU mask: 88
    Events: 6 available
    Format attributes: 9
  Device: nvidia_pcie_pmu_1_rc_5
    Type: 39
    CPU mask: 88
    Events: 6 available
    Format attributes: 9
  Available events for nvidia_pcie_pmu_0_rc_0:
    cycles                         : event=0x100000000
    rd_bytes                       : event=0x0
    rd_cum_outs                    : event=0x4
    rd_req                         : event=0x2
    wr_bytes                       : event=0x1
    wr_req                         : event=0x3
[INFO] Testing nvidia_pcie_pmu_0_rc_0 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_pcie_pmu_0_rc_0/cycles/ sleep 0.1
[PASS] nvidia_pcie_pmu_0_rc_0 basic event - Event counted successfully (cycles)
           187400513      nvidia_pcie_pmu_0_rc_0/cycles/                                        
  Available events for nvidia_pcie_pmu_0_rc_1:
    cycles                         : event=0x100000000
    rd_bytes                       : event=0x0
    rd_cum_outs                    : event=0x4
    rd_req                         : event=0x2
    wr_bytes                       : event=0x1
    wr_req                         : event=0x3
[INFO] Testing nvidia_pcie_pmu_0_rc_1 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_pcie_pmu_0_rc_1/cycles/ sleep 0.1
[PASS] nvidia_pcie_pmu_0_rc_1 basic event - Event counted successfully (cycles)
           188429846      nvidia_pcie_pmu_0_rc_1/cycles/                                        
  Available events for nvidia_pcie_pmu_0_rc_2:
    cycles                         : event=0x100000000
    rd_bytes                       : event=0x0
    rd_cum_outs                    : event=0x4
    rd_req                         : event=0x2
    wr_bytes                       : event=0x1
    wr_req                         : event=0x3
[INFO] Testing nvidia_pcie_pmu_0_rc_2 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_pcie_pmu_0_rc_2/cycles/ sleep 0.1
[PASS] nvidia_pcie_pmu_0_rc_2 basic event - Event counted successfully (cycles)
           188243410      nvidia_pcie_pmu_0_rc_2/cycles/                                        

----------------------------------------
Test 3: PCIE-TGT PMU
----------------------------------------
[INFO] Testing PCIE-TGT PMU...
[PASS] PCIE-TGT PMU - Found 12 device(s)
  Device: nvidia_pcie_tgt_pmu_0_rc_0
    Type: 27
    CPU mask: 0
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_0_rc_1
    Type: 36
    CPU mask: 0
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_0_rc_2
    Type: 42
    CPU mask: 0
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_0_rc_3
    Type: 49
    CPU mask: 0
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_0_rc_4
    Type: 52
    CPU mask: 0
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_0_rc_5
    Type: 55
    CPU mask: 0
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_1_rc_0
    Type: 45
    CPU mask: 88
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_1_rc_1
    Type: 48
    CPU mask: 88
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_1_rc_2
    Type: 51
    CPU mask: 88
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_1_rc_3
    Type: 54
    CPU mask: 88
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_1_rc_4
    Type: 57
    CPU mask: 88
    Events: 5 available
    Format attributes: 5
  Device: nvidia_pcie_tgt_pmu_1_rc_5
    Type: 60
    CPU mask: 88
    Events: 5 available
    Format attributes: 5
  Available events for nvidia_pcie_tgt_pmu_0_rc_0:
    cycles                         : event=0x4
    rd_bytes                       : event=0x0
    rd_req                         : event=0x2
    wr_bytes                       : event=0x1
    wr_req                         : event=0x3
[INFO] Testing nvidia_pcie_tgt_pmu_0_rc_0 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_0/cycles/ sleep 0.1
[PASS] nvidia_pcie_tgt_pmu_0_rc_0 basic event - Event counted successfully (cycles)
           104654795      nvidia_pcie_tgt_pmu_0_rc_0/cycles/                                      
  Available events for nvidia_pcie_tgt_pmu_0_rc_1:
    cycles                         : event=0x4
    rd_bytes                       : event=0x0
    rd_req                         : event=0x2
    wr_bytes                       : event=0x1
    wr_req                         : event=0x3
[INFO] Testing nvidia_pcie_tgt_pmu_0_rc_1 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/cycles/ sleep 0.1
[PASS] nvidia_pcie_tgt_pmu_0_rc_1 basic event - Event counted successfully (cycles)
           104571150      nvidia_pcie_tgt_pmu_0_rc_1/cycles/                                      
  Available events for nvidia_pcie_tgt_pmu_0_rc_2:
    cycles                         : event=0x4
    rd_bytes                       : event=0x0
    rd_req                         : event=0x2
    wr_bytes                       : event=0x1
    wr_req                         : event=0x3
[INFO] Testing nvidia_pcie_tgt_pmu_0_rc_2 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_2/cycles/ sleep 0.1
[PASS] nvidia_pcie_tgt_pmu_0_rc_2 basic event - Event counted successfully (cycles)
           104524980      nvidia_pcie_tgt_pmu_0_rc_2/cycles/                                      

----------------------------------------
Test 4: CPU Memory (CMEM) Latency PMU
----------------------------------------
[INFO] Testing CMEM Latency PMU...
[PASS] CMEM Latency PMU - Found 2 device(s)
  Device: nvidia_cmem_latency_pmu_0
    Type: 11
    CPU mask: 0
    Events: 3 available
    Format attributes: 1
  Device: nvidia_cmem_latency_pmu_1
    Type: 13
    CPU mask: 88
    Events: 3 available
    Format attributes: 1
  Available events for nvidia_cmem_latency_pmu_0:
    cycles                         : event=0x0
    rd_cum_outs                    : event=0x2
    rd_req                         : event=0x1
[INFO] Testing nvidia_cmem_latency_pmu_0 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_cmem_latency_pmu_0/cycles/ sleep 0.1
[PASS] nvidia_cmem_latency_pmu_0 basic event - Event counted successfully (cycles)
           250184186      nvidia_cmem_latency_pmu_0/cycles/                                      
  Available events for nvidia_cmem_latency_pmu_1:
    cycles                         : event=0x0
    rd_cum_outs                    : event=0x2
    rd_req                         : event=0x1
[INFO] Testing nvidia_cmem_latency_pmu_1 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_cmem_latency_pmu_1/cycles/ sleep 0.1
[PASS] nvidia_cmem_latency_pmu_1 basic event - Event counted successfully (cycles)
           250989505      nvidia_cmem_latency_pmu_1/cycles/                                      

----------------------------------------
Test 5: NVLink-C2C PMU
----------------------------------------
[INFO] Testing NVLink-C2C PMU...
[PASS] NVLink-C2C PMU - Found 2 device(s)
  Device: nvidia_nvlink_c2c_pmu_0
    Type: 15
    CPU mask: 0
    Events: 9 available
    Format attributes: 2
  Device: nvidia_nvlink_c2c_pmu_1
    Type: 16
    CPU mask: 88
    Events: 9 available
    Format attributes: 2
  Peer info: nr_gpu=2
  Available events for nvidia_nvlink_c2c_pmu_0:
    cycles                         : event=0x0
    in_rd_cum_outs                 : event=0x1
    in_rd_req                      : event=0x2
    in_wr_cum_outs                 : event=0x3
    in_wr_req                      : event=0x4
    out_rd_cum_outs                : event=0x5
    out_rd_req                     : event=0x6
    out_wr_cum_outs                : event=0x7
    out_wr_req                     : event=0x8
[INFO] Testing nvidia_nvlink_c2c_pmu_0 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_nvlink_c2c_pmu_0/cycles/ sleep 0.1
[PASS] nvidia_nvlink_c2c_pmu_0 basic event - Event counted successfully (cycles)
           205777273      nvidia_nvlink_c2c_pmu_0/cycles/                                       
  Peer info: nr_gpu=2
  Available events for nvidia_nvlink_c2c_pmu_1:
    cycles                         : event=0x0
    in_rd_cum_outs                 : event=0x1
    in_rd_req                      : event=0x2
    in_wr_cum_outs                 : event=0x3
    in_wr_req                      : event=0x4
    out_rd_cum_outs                : event=0x5
    out_rd_req                     : event=0x6
    out_wr_cum_outs                : event=0x7
    out_wr_req                     : event=0x8
[INFO] Testing nvidia_nvlink_c2c_pmu_1 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_nvlink_c2c_pmu_1/cycles/ sleep 0.1
[PASS] nvidia_nvlink_c2c_pmu_1 basic event - Event counted successfully (cycles)
           205871709      nvidia_nvlink_c2c_pmu_1/cycles/                                       

----------------------------------------
Test 6: NV-CLink PMU
----------------------------------------
[INFO] Testing NV-CLink PMU...
[PASS] NV-CLink PMU - Found 2 device(s)
  Device: nvidia_nvclink_pmu_0
    Type: 17
    CPU mask: 0
    Events: 5 available
    Format attributes: 1
  Device: nvidia_nvclink_pmu_1
    Type: 18
    CPU mask: 88
    Events: 5 available
    Format attributes: 1
  Available events for nvidia_nvclink_pmu_0:
    cycles                         : event=0x0
    in_rd_cum_outs                 : event=0x1
    in_rd_req                      : event=0x2
    out_rd_cum_outs                : event=0x5
    out_rd_req                     : event=0x6
[INFO] Testing nvidia_nvclink_pmu_0 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_nvclink_pmu_0/cycles/ sleep 0.1
[PASS] nvidia_nvclink_pmu_0 basic event - Event counted successfully (cycles)
           187493957      nvidia_nvclink_pmu_0/cycles/                                          
  Available events for nvidia_nvclink_pmu_1:
    cycles                         : event=0x0
    in_rd_cum_outs                 : event=0x1
    in_rd_req                      : event=0x2
    out_rd_cum_outs                : event=0x5
    out_rd_req                     : event=0x6
[INFO] Testing nvidia_nvclink_pmu_1 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_nvclink_pmu_1/cycles/ sleep 0.1
[PASS] nvidia_nvclink_pmu_1 basic event - Event counted successfully (cycles)
           186687358      nvidia_nvclink_pmu_1/cycles/                                          

----------------------------------------
Test 7: NV-DLink PMU
----------------------------------------
[INFO] Testing NV-DLink PMU...
[PASS] NV-DLink PMU - Found 2 device(s)
  Device: nvidia_nvdlink_pmu_0
    Type: 12
    CPU mask: 0
    Events: 3 available
    Format attributes: 1
  Device: nvidia_nvdlink_pmu_1
    Type: 14
    CPU mask: 88
    Events: 3 available
    Format attributes: 1
  Available events for nvidia_nvdlink_pmu_0:
    cycles                         : event=0x0
    in_rd_cum_outs                 : event=0x1
    in_rd_req                      : event=0x2
[INFO] Testing nvidia_nvdlink_pmu_0 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_nvdlink_pmu_0/cycles/ sleep 0.1
[PASS] nvidia_nvdlink_pmu_0 basic event - Event counted successfully (cycles)
           187086040      nvidia_nvdlink_pmu_0/cycles/                                          
  Available events for nvidia_nvdlink_pmu_1:
    cycles                         : event=0x0
    in_rd_cum_outs                 : event=0x1
    in_rd_req                      : event=0x2
[INFO] Testing nvidia_nvdlink_pmu_1 basic event: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -a -e nvidia_nvdlink_pmu_1/cycles/ sleep 0.1
[PASS] nvidia_nvdlink_pmu_1 basic event - Event counted successfully (cycles)
           186941908      nvidia_nvdlink_pmu_1/cycles/                                          

----------------------------------------
Test 8: ARM CoreSight PMU (used by Tegra410)
----------------------------------------
[INFO] Testing ARM CoreSight PMU...
[SKIP] ARM CoreSight PMU - No devices found matching pattern: arm_cspmu_*

----------------------------------------
Test 9: CPU Core Vendor Events (Tegra410 Olympus)
----------------------------------------
[INFO] Testing CPU core vendor events...
[INFO] Detected NVIDIA Olympus CPU (implementer=0x4e, part=0x010)
  Checking for vendor events:
    ✓ CPU_CYCLES - available
    ✓ INST_RETIRED - available
    ✓ L1D_CACHE_REFILL - available
    ✓ BR_MIS_PRED - available
[INFO] Testing CPU core event counting: /home/nvidia/mochs/NV-Kernels/tools/perf/perf stat -e cycles -a sleep 0.1
[PASS] CPU vendor events - Core PMU operational
           103562317      cycles                                                                

  Checking for vendor-specific metrics (from JSON files):
    ✓ backend_bound metric found
    ✓ IPC metric found
    ✓ frontend_bound metric found
  ✓ Vendor events/metrics loaded from JSON files

========================================
Test Summary
========================================
Passed:  24
Failed:  0
Skipped: 1

[INFO] All tests passed!
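
The two platform checks in the logs above report the same CPU in different forms: `MIDR=0x000000004e0f0100` in the PMCCNTR test and `implementer=0x4e, part=0x010` in Test 9. A minimal sketch of how the MIDR_EL1 value splits into those fields, using the architectural bit layout; the helper name `decode_midr` is hypothetical and is not taken from the test scripts.

```python
def decode_midr(midr: int) -> dict:
    """Split a MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits [31:24]
        "variant": (midr >> 20) & 0xF,        # bits [23:20]
        "architecture": (midr >> 16) & 0xF,   # bits [19:16]
        "partnum": (midr >> 4) & 0xFFF,       # bits [15:4]
        "revision": midr & 0xF,               # bits [3:0]
    }

# Value reported for cpu0 in the PMCCNTR test log.
fields = decode_midr(0x000000004E0F0100)
print({name: hex(value) for name, value in fields.items()})
```
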

nvmochs and others added 16 commits April 16, 2026 17:24
…mpus PMU events"

This reverts commit cf682dc.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

This reverts commit 4defdae.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

…cy PMU"

This reverts commit eff2e93.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

…TGT PMU"

This reverts commit ba06e25.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

…PMU"

This reverts commit 6984fc5.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

This reverts commit a2ab08d.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

This reverts commit e12d030.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

…a241"

This reverts commit 575f7ef.

This will be replaced by the equivalent patch from v7.1.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

The documentation in nvidia-pmu.rst contains PMUs specific
to NVIDIA Tegra241 SoC. Rename the file for this specific
SoC to have better distinction with other NVIDIA SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit d332424)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

The Unified Coherence Fabric (UCF) contains last level cache
and cache coherent interconnect in Tegra410 SOC. The PMU in
this device can be used to capture events related to access
to the last level cache and memory from different sources.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit f5caf26)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Add interface to get ACPI device associated with the
PMU. This ACPI device may contain additional properties
not covered by the standard properties.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit bc86281)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Adds PCIE PMU support in Tegra410 SOC. This PMU is instanced
in each root complex in the SOC and can capture traffic from
PCIE device to various memory types. This PMU can filter traffic
based on the originating root port or BDF and the target memory
types (CPU DRAM, GPU Memory, CXL Memory, or remote Memory).

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit bf585ba)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Adds PCIE-TGT PMU support in Tegra410 SOC. This PMU is
instanced in each root complex in the SOC and it captures
traffic originating from any source towards PCIE BAR and CXL
HDM range. The traffic can be filtered based on the
destination root port or target address range.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 3dd7302)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC.
The PMU is used to measure latency between the edge of the
Unified Coherence Fabric to the local system DRAM.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 429b763)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Adds NVIDIA C2C PMU support in Tegra410 SOC. This PMU is
used to measure memory latency between the SOC and device
memory, e.g GPU Memory (GMEM), CXL Memory, or memory on
remote Tegra410 SOC.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 2f89b7f)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Add JSON files for NVIDIA Tegra410 Olympus core PMU events.
Also updated the common-and-microarch.json.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(cherry picked from commit 86ff690)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

github-actions Bot (Contributor) commented Apr 21, 2026

✅ Patchscan: No Missing Fixes

All cherry-picked commits have been checked — no missing upstream fixes found.

jamieNguyenNVIDIA (Collaborator) commented

Structure

Three-stage layout is correct:

  1. 8 reverts of existing NVIDIA: VR: SAUCE: preliminary patches (332b9f9 through 5178a81)
  2. 8 upstream cherry-picks of the v7.1-merged counterparts (4729da4 through 5e154c9)
  3. 1 backport of an in-review LKML patch (f09d5a0)

Cherry-pick verification (8/8 accurate)

For every commit I compared git patch-id --stable between the local commit and the SHA named in its (cherry picked
from commit …) trailer:

┌──────────────┬─────────────────────┬──────────┬─────────┬─────────────────────────┐                                 
│    Local     │ Referenced upstream │ Patch-ID │ Subject │        SoB chain        │                                 
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤                                 
│ 4729da47268d │ d332424d1d06        │ match    │ match   │ preserved + mochs added │
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤
│ 2eadbf648c10 │ f5caf26fd6c7        │ match    │ match   │ preserved + mochs added │                                 
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤                                 
│ 1f7f66984674 │ bc86281fe4bd        │ match    │ match   │ preserved + mochs added │                                 
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤                                 
│ f0aab13d4398 │ bf585ba14726        │ match    │ match   │ preserved + mochs added │
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤                                 
│ 1ba0cbbe4128 │ 3dd73022306b        │ match    │ match   │ preserved + mochs added │
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤                                 
│ ae82af010ba6 │ 429b7638b2df        │ match    │ match   │ preserved + mochs added │
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤                                 
│ 526e58071c41 │ 2f89b7f78c50        │ match    │ match   │ preserved + mochs added │
├──────────────┼─────────────────────┼──────────┼─────────┼─────────────────────────┤                                 
│ 5e154c93f76c │ 86ff690f45cc        │ match    │ match   │ preserved + mochs added │
└──────────────┴─────────────────────┴──────────┴─────────┴─────────────────────────┘

All 8 are byte-for-byte identical to upstream — no backport edits were needed. Each retains the upstream subject,
body, Reviewed-by: / upstream Signed-off-by: chain, adds the (cherry picked from commit <40-hex>) trailer pointing at
the correct SHA, and closes with Signed-off-by: Matthew R. Ochs. Referenced upstream commits all resolve in the repo.
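
The trailer conventions described above lend themselves to a mechanical check. The sketch below is an assumption about how such a check could look, not the tooling used for this review: the helper names are hypothetical and the 40-hex SHA in the sample message is a placeholder, not a real upstream commit.

```python
import re

# Matches the full-length trailer form described above.
TRAILER_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40})\)$", re.MULTILINE
)

def cherry_pick_source(message: str):
    """Return the 40-hex SHA named in the trailer, or None if absent."""
    match = TRAILER_RE.search(message)
    return match.group(1) if match else None

def ends_with_signoff(message: str, name: str) -> bool:
    """True if the last non-empty line is a Signed-off-by from `name`."""
    lines = [ln for ln in message.strip().splitlines() if ln.strip()]
    return bool(lines) and lines[-1].startswith(f"Signed-off-by: {name}")

# Placeholder message; the SHA is synthetic and for illustration only.
SAMPLE = """\
perf/arm_cspmu: nvidia: Rename doc to Tegra241

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 0123456789abcdef0123456789abcdef01234567)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
"""

print(cherry_pick_source(SAMPLE))
print(ends_with_signoff(SAMPLE, "Matthew R. Ochs"))
```

This only checks the form of the trailers. Confirming that the content matches upstream still needs a patch-ID or diff comparison as described above.
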

Revert verification (8/8 accurate)

Patch-ID of each revert did not match the reverse of its target (because of intervening line-number shifts), so I
validated semantically: a git diff of each revert against the parent (^) of its target SAUCE commit, limited to the
touched paths (--), returned empty for every pair. Each
revert exactly restores the pre-SAUCE file state. All eight reverted SAUCE SHAs exist and are ancestors of the PR base
(b2b7ddf = Ubuntu-nvidia-bos-7.0.0-2005.5), so the reverts have valid targets.

LKML backport verification (f09d5a0)

  • Trailer uses (backported from https://lore.kernel.org/all/20260406232034.2566133-1-bwicaksono@nvidia.com/) — correct
    form for an in-review (non-merged) patch; using cherry picked from would be wrong here.
  • Subject gets NVIDIA: VR: SAUCE: prefix per downstream convention; the upstream-facing portion matches the LKML v1
    subject exactly.
  • Body text matches the LKML v1 message verbatim.
  • Signed-off-by: Besar Wicaksono from the author is retained; Signed-off-by: Matthew R. Ochs appended as the
    backporter — correct.
  • Diffstat: drivers/perf/arm_pmuv3.c | 31 +++ matches the v1 patch (patchwork rendering confirmed via
    patchwork.kernel.org). The two hunks (the armv8pmu_avoid_pmccntr_cpus[] MIDR table and the is_midr_in_range_list check
    inside armv8pmu_can_use_pmccntr) match byte-for-byte aside from cosmetic comment line-wrap differences introduced by
    patchwork's HTML rendering.
  • Dependency MIDR_NVIDIA_OLYMPUS is already defined in arch/arm64/include/asm/cputype.h:226 in the PR base (commit
    e185c8a arm64: cputype: Add NVIDIA Olympus definitions), so the backport builds standalone.
  • Note: I could not fetch the raw mbox from lore.kernel.org directly (Anubis 403'd me), so the LKML comparison is
    based on patchwork's rendered view, which preserves the diff verbatim but reformats prose. Worth a human spot-check
    against the raw mbox if you want belt-and-braces.
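
To illustrate the shape of the check described in the bullets above, here is a userspace Python model of a MIDR avoid-list lookup. It is a sketch under stated assumptions, not the kernel code: the names are hypothetical stand-ins for `armv8pmu_avoid_pmccntr_cpus[]` and `armv8pmu_can_use_pmccntr`, it assumes every variant and revision of the listed model matches, and it leaves out any other conditions the real function applies (the PR description mentions single-threaded mode, which is not modeled here). The implementer and part values come from the test log earlier in this PR.

```python
# Model mask covers implementer [31:24], architecture [19:16] and
# part number [15:4]; variant and revision are ignored (assumption).
MIDR_MODEL_MASK = (0xFF << 24) | (0xF << 16) | (0xFFF << 4)

def midr_model(implementer: int, partnum: int) -> int:
    """Build a model value with the architecture field set to 0xF."""
    return (implementer << 24) | (0xF << 16) | (partnum << 4)

# Hypothetical stand-in for the avoid-list: NVIDIA Olympus only
# (implementer=0x4e, part=0x010, as reported by the test log).
AVOID_PMCCNTR_MODELS = [midr_model(0x4E, 0x010)]

def can_use_pmccntr(midr: int) -> bool:
    """Model only the MIDR check: False when the CPU model is listed."""
    return (midr & MIDR_MODEL_MASK) not in AVOID_PMCCNTR_MODELS

print(can_use_pmccntr(0x4E0F0100))  # MIDR reported for cpu0 in the log
```
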

Net result

Relative to the PR base: 27 files changed, +893/−885. That net +8 is exactly what you'd expect — the v7.1 upstream
versions picked up small cleanups during review. Series applies cleanly (no conflict markers, no drift).

Findings

  • All "cherry picked from" trailers accurate.
  • All revert targets accurate and cleanly restore base state.
  • "backported from" URL for the LKML patch is correctly formed and points at the right message-id; content aligns with
    patchwork's rendering of v1.
  • SoB chains are well-formed throughout.

No backport/cherry-pick issues found. The series is a clean swap of downstream SAUCE preliminaries for the
upstream-merged v7.1 commits, plus one properly-tagged LKML-in-review backport.

jamieNguyenNVIDIA (Collaborator) commented Apr 21, 2026

Oops, looks like the kernel test robot reported a build failure on arm32? https://lore.kernel.org/all/202604180247.SBxRBqqS-lkp@intel.com/#t

clsotog (Collaborator) commented Apr 21, 2026

Will this move to draft because of that issue with arm32?
The PR looks OK to me.

nvmochs (Collaborator, Author) commented Apr 21, 2026

> Oops, looks like the kernel test robot reported a build failure on arm32? https://lore.kernel.org/all/202604180247.SBxRBqqS-lkp@intel.com/#t

Yep, I saw that. The likely fix is just to wrap the new code in a #ifdef CONFIG_ARM64.

I decided to proceed with submitting this PR without any modifications because we don't build for arm32.

We have a few options here...

  • Take the PR as-is (shouldn't be an issue since we don't build for arm32)
  • Wait for Besar to post a v2
  • Add the #ifdef in this PR along with a backport note in the commit message

@clsotog
Collaborator

clsotog commented Apr 21, 2026

> > Oops, looks like the kernel test robot reported a build failure on arm32? https://lore.kernel.org/all/202604180247.SBxRBqqS-lkp@intel.com/#t
>
> Yep, I saw that. The likely fix is just to wrap the new code in a #ifdef CONFIG_ARM64.
>
> I decided to proceed with submitting this PR without any modifications because we don't build for arm32.
>
> We have a few options here...
>
>   • Take the PR as-is (shouldn't be an issue since we don't build for arm32)
>   • Wait for Besar to post a v2
>   • Add the #ifdef in this PR along with a backport note in the commit message

I'm OK with adding the #ifdef and backport note.

@jamieNguyenNVIDIA
Collaborator

> > > Oops, looks like the kernel test robot reported a build failure on arm32? https://lore.kernel.org/all/202604180247.SBxRBqqS-lkp@intel.com/#t
> >
> > Yep, I saw that. The likely fix is just to wrap the new code in a #ifdef CONFIG_ARM64.
> > I decided to proceed with submitting this PR without any modifications because we don't build for arm32.
> > We have a few options here...
> >
> >   • Take the PR as-is (shouldn't be an issue since we don't build for arm32)
> >   • Wait for Besar to post a v2
> >   • Add the #ifdef in this PR along with a backport note in the commit message
>
> I'm OK with adding the #ifdef and backport note.

I agree with this.

@nvmochs nvmochs force-pushed the skip_pmccntr_el0_70_bos branch from f09d5a0 to 33b0894 Compare April 22, 2026 00:46
@nvmochs
Collaborator Author

nvmochs commented Apr 22, 2026

@clsotog @jamieNguyenNVIDIA Besar posted v2 which resolves the compile issue on arm32. I have verified this builds on arm32 and arm64. Ready for re-review.

Collaborator

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

@jamieNguyenNVIDIA
Collaborator

Acked-by: Jamie Nguyen <jamien@nvidia.com>

Collaborator

@nirmoy nirmoy left a comment

Acked-by: Nirmoy Das <nirmoyd@nvidia.com>

@nvmochs
Collaborator Author

nvmochs commented Apr 22, 2026

Unfortunately it looks like Besar's latest patch will need a v3. I'm going to move this to draft while this gets sorted out.

@github-actions
Contributor

github-actions Bot commented Apr 30, 2026

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ✅ All checks passed

Details
Checking 17 commits...

Cherry-pick digest:
┌──────────────┬───────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject           │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ cd5faf674b6b │ perf/arm_pmu: skip pmccntr_el0 on nvidia olym │ match      │ found   │ ok, backporter: mochs     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5e154c93f76c │ 86ff690f45cc                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 526e58071c41 │ 2f89b7f78c50                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ae82af010ba6 │ 429b7638b2df                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1ba0cbbe4128 │ 3dd73022306b                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ f0aab13d4398 │ bf585ba14726                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1f7f66984674 │ bc86281fe4bd                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2eadbf648c10 │ f5caf26fd6c7                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 4729da47268d │ d332424d1d06                                  │ match      │ match   │ preserved + mochs added   │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5178a81d9b4d │ [Revert] perf/arm_cspmu: nvidia: rename doc t │ N/A        │ N/A     │ mochs                     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 02a9cdcca68f │ [Revert] perf/arm_cspmu: nvidia: add tegra410 │ N/A        │ N/A     │ mochs                     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 08c61448ccf5 │ [Revert] perf/arm_cspmu: add arm_cspmu_acpi_d │ N/A        │ N/A     │ mochs                     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 07a7992713f3 │ [Revert] perf/arm_cspmu: nvidia: add tegra410 │ N/A        │ N/A     │ mochs                     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d007bc52c295 │ [Revert] perf/arm_cspmu: nvidia: add tegra410 │ N/A        │ N/A     │ mochs                     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 54c6101c74df │ [Revert] perf: add nvidia tegra410 cpu memory │ N/A        │ N/A     │ mochs                     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2dcaf0ba3149 │ [Revert] perf: add nvidia tegra410 c2c pmu    │ N/A        │ N/A     │ mochs                     │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 332b9f99ddbe │ [Revert] perf vendor events arm64: add tegra4 │ N/A        │ N/A     │ mochs                     │
└──────────────┴───────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

@nvmochs
Collaborator Author

nvmochs commented Apr 30, 2026

Backported v3 of the PMCCNTR_EL0 patch and tested again:

========================================
Olympus PMCCNTR_EL0 Skip Test
========================================

[INFO] Using perf: perf version 7.0.gcd34908fed95

----------------------------------------
Test 1: Platform Check
----------------------------------------
[INFO] Olympus CPU found: cpu0 (MIDR=0x000000004e0f0100)
[PASS] Running on NVIDIA Olympus CPU (cpu0)

----------------------------------------
Test 2: PMCCNTR_EL0 WFI Inflation
----------------------------------------
[INFO] Measuring '{cpu_cycles,cpu_cycles}' group on cpu0 during sleep 1...
[INFO] (PMCCNTR_EL0 inflates during WFI; programmable counters do not)
[INFO] Raw perf output:
CPU0              9003875      cpu_cycles                                                            
CPU0              9003875      cpu_cycles                                                            
[INFO]   cpu_cycles[0] (may be PMCCNTR_EL0): 9003875
[INFO]   cpu_cycles[1] (programmable counter): 9003875
[INFO]   ratio cc[0]/cc[1]: 1.0x
[PASS] cc[0]/cc[1]=1.0x — both counters equal, PMCCNTR_EL0 not used (patch working)

========================================
Test Summary
========================================
Passed:  2
Failed:  0
Skipped: 0

[INFO] All tests passed.
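
The ratio check in Test 2 can be reproduced offline from the raw perf lines. The following is a minimal sketch, not the actual test_pmccntr_skip.sh logic: the helper names `parse_cycle_counts` and `counter_ratio` are hypothetical, and the parser assumes the three-column `<cpu> <count> <event>` layout shown in the raw output above.

```python
def parse_cycle_counts(perf_output: str, event: str = "cpu_cycles") -> list[int]:
    """Collect the counts for `event` from per-CPU perf stat output lines."""
    counts = []
    for line in perf_output.splitlines():
        parts = line.split()
        # Assumed shape: <cpu> <count> <event>, as in the raw output above.
        if len(parts) >= 3 and parts[2] == event:
            # Tolerate thousands separators in case perf prints them.
            counts.append(int(parts[1].replace(",", "")))
    return counts


def counter_ratio(counts: list[int]) -> float:
    """Return cc[0]/cc[1] for a two-event group."""
    first, second = counts
    if second == 0:
        raise ValueError("second counter is zero; ratio is undefined")
    return first / second


# Raw lines copied from the test log above.
raw = """\
CPU0              9003875      cpu_cycles
CPU0              9003875      cpu_cycles
"""
counts = parse_cycle_counts(raw)
ratio = counter_ratio(counts)
print(f"ratio cc[0]/cc[1]: {ratio:.1f}x")
```

A ratio close to 1.0 means both events in the group counted the same number of cycles, which is what the log reports when PMCCNTR_EL0 is not used.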

I will reply to the series with a Tested-by tag.

Let's wait a bit longer to see if this gets accepted soon so we can pick from -next and avoid a SAUCE tag.

PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while the
PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event
counted by a programmable counter, so using PMCCNTR_EL0 for cycles can
give results that differ from the programmable counter path.

Extend the existing PMCCNTR avoidance decision from the SMT case to
also cover Olympus. Store the result in the common arm_pmu state at
registration time, so arm_pmuv3 can keep using a single flag when
deciding whether CPU_CYCLES may use PMCCNTR_EL0.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260504175204.3122979-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
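
The decision described in this commit message can be modeled roughly as follows. This is an illustrative Python model only, not the kernel C code: the names `PmuModel`, `has_smt`, `is_olympus`, `avoid_pmccntr`, `register_pmu` and `can_use_pmccntr` are hypothetical and are not taken from the patch, and the real check in armv8pmu_can_use_pmccntr has further conditions that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class PmuModel:
    """Toy stand-in for the common PMU state (hypothetical field names)."""
    has_smt: bool
    is_olympus: bool
    avoid_pmccntr: bool = False


def register_pmu(pmu: PmuModel) -> PmuModel:
    # Decide once, at registration time, whether the dedicated cycle
    # counter should be avoided: the existing SMT case plus Olympus.
    pmu.avoid_pmccntr = pmu.has_smt or pmu.is_olympus
    return pmu


def can_use_pmccntr(pmu: PmuModel, event: str) -> bool:
    # Later consumers only look at the single stored flag.
    return event == "cpu_cycles" and not pmu.avoid_pmccntr


olympus = register_pmu(PmuModel(has_smt=False, is_olympus=True))
generic = register_pmu(PmuModel(has_smt=False, is_olympus=False))
print(can_use_pmccntr(olympus, "cpu_cycles"))
print(can_use_pmccntr(generic, "cpu_cycles"))
```

In this model an Olympus PMU with SMT disabled still routes cpu_cycles to a programmable counter, which is the single-threaded case the PR description calls out.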
@nvmochs nvmochs force-pushed the skip_pmccntr_el0_70_bos branch from cd34908 to cd5faf6 Compare May 4, 2026 20:19
@nvmochs
Collaborator Author

nvmochs commented May 5, 2026

Backported v4 and tested on Strata.

@nvmochs nvmochs marked this pull request as ready for review May 13, 2026 01:58
@nvmochs
Collaborator Author

nvmochs commented May 13, 2026

@clsotog @nirmoy @jamieNguyenNVIDIA I've moved this to ready state. We'll take v4 of the Skip PMCCNTR_EL0 patch as SAUCE since it has not made -next.

Collaborator

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

5 participants