[26.04_linux-nvidia-bos] Backport Vera PMU support #384
…mpus PMU events" This reverts commit cf682dc. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
This reverts commit 4defdae. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…cy PMU" This reverts commit eff2e93. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…TGT PMU" This reverts commit ba06e25. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…PMU" This reverts commit 6984fc5. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
This reverts commit a2ab08d. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
This reverts commit e12d030. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…a241" This reverts commit 575f7ef. This will be replaced by the equivalent patch from v7.1. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
The documentation in nvidia-pmu.rst contains PMUs specific to NVIDIA Tegra241 SoC. Rename the file for this specific SoC to have better distinction with other NVIDIA SoC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit d332424) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
The Unified Coherence Fabric (UCF) contains last level cache and cache coherent interconnect in Tegra410 SOC. The PMU in this device can be used to capture events related to access to the last level cache and memory from different sources. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit f5caf26) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Add interface to get ACPI device associated with the PMU. This ACPI device may contain additional properties not covered by the standard properties. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit bc86281) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds PCIE PMU support in Tegra410 SOC. This PMU is instanced in each root complex in the SOC and can capture traffic from PCIE device to various memory types. This PMU can filter traffic based on the originating root port or BDF and the target memory types (CPU DRAM, GPU Memory, CXL Memory, or remote Memory). Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit bf585ba) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds PCIE-TGT PMU support in Tegra410 SOC. This PMU is instanced in each root complex in the SOC and it captures traffic originating from any source towards PCIE BAR and CXL HDM range. The traffic can be filtered based on the destination root port or target address range. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit 3dd7302) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC. The PMU is used to measure latency between the edge of the Unified Coherence Fabric to the local system DRAM. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit 429b763) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds NVIDIA C2C PMU support in Tegra410 SOC. This PMU is used to measure memory latency between the SOC and device memory, e.g GPU Memory (GMEM), CXL Memory, or memory on remote Tegra410 SOC. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit 2f89b7f) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Add JSON files for NVIDIA Tegra410 Olympus core PMU events. Also updated the common-and-microarch.json. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> (cherry picked from commit 86ff690) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
✅ Patchscan: No Missing Fixes
All cherry-picked commits have been checked; no missing upstream fixes found.
Structure
Three-stage layout is correct:
Cherry-pick verification (8/8 accurate)
For every commit I compared git patch-id --stable between the local commit and the SHA named in its (cherry picked …
All 8 are byte-for-byte identical to upstream; no backport edits were needed. Each retains the upstream subject, …
Revert verification (8/8 accurate)
Patch-ID of each revert did not match the reverse of its target (because of intervening line-number shifts), so I …
LKML backport verification (f09d5a0)
Net result
Relative to the PR base: 27 files changed, +893/−885. That net +8 is exactly what you'd expect: the v7.1 upstream …
Findings
No backport/cherry-pick issues found. The series is a clean swap of downstream SAUCE preliminaries for the …
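The cherry-pick check described in the review above can be approximated as follows. This is a minimal sketch, not the reviewer's actual tooling: the helper names (`cherry_pick_source`, `stable_patch_id`, `is_clean_cherry_pick`) are hypothetical, and it assumes `git` is on the PATH and that both the local and the upstream commit are present in the repository at `repo`.

```python
import re
import subprocess
from typing import Optional

# Matches the trailer that `git cherry-pick -x` appends to a commit message.
CHERRY_PICK_RE = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")


def cherry_pick_source(message: str) -> Optional[str]:
    """Return the upstream SHA named in a cherry-pick trailer, or None."""
    match = CHERRY_PICK_RE.search(message)
    return match.group(1) if match else None


def stable_patch_id(sha: str, repo: str = ".") -> str:
    """Return the stable patch-id of a single commit."""
    diff = subprocess.run(
        ["git", "-C", repo, "show", sha],
        check=True, capture_output=True, text=True,
    ).stdout
    out = subprocess.run(
        ["git", "-C", repo, "patch-id", "--stable"],
        input=diff, check=True, capture_output=True, text=True,
    ).stdout
    # Output has the form "<patch-id> <commit-id>"; it is empty if the
    # commit carries no diff, so guard before taking the first field.
    fields = out.split()
    if not fields:
        raise ValueError(f"no patch-id produced for {sha}")
    return fields[0]


def is_clean_cherry_pick(local_sha: str, message: str, repo: str = ".") -> bool:
    """True if the local commit's patch-id equals that of its named source."""
    upstream_sha = cherry_pick_source(message)
    if upstream_sha is None:
        return False
    return stable_patch_id(local_sha, repo) == stable_patch_id(upstream_sha, repo)
```

Matching patch-ids mean the two diffs are the same after git's own normalization of line numbers and whitespace, which is why a match is taken as evidence that no backport edits were made.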
Oops, looks like the kernel test robot reported a build failure on arm32? https://lore.kernel.org/all/202604180247.SBxRBqqS-lkp@intel.com/#t
Will this move to draft because of that issue with arm32?
Yep, I saw that. The likely fix is just to wrap the new code in a #ifdef CONFIG_ARM64. I decided to proceed with submitting this PR without any modifications because we don't build for arm32. We have a few options here...
I'm OK with adding the #ifdef and a backport note.
I agree with this.
Force-pushed from f09d5a0 to 33b0894
@clsotog @jamieNguyenNVIDIA Besar posted v2 which resolves the compile issue on arm32. I have verified this builds on arm32 and arm64. Ready for re-review.
clsotog left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
nirmoy left a comment
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Unfortunately it looks like Besar's latest patch will need a v3. I'm going to move this to draft while this gets sorted out.
Force-pushed from 33b0894 to cd34908
PR Validation Report
Patchscan ✅ No Missing Fixes
All cherry-picked commits checked; no missing upstream fixes found.
PR Lint ✅ All checks passed
Details
Checking 17 commits...
Cherry-pick digest:
┌──────────────┬───────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject           │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ cd5faf674b6b │ perf/arm_pmu: skip pmccntr_el0 on nvidia olym │ match      │ found   │ ok, backporter: mochs     │
│ 5e154c93f76c │ 86ff690f45cc                                  │ match      │ match   │ preserved + mochs added   │
│ 526e58071c41 │ 2f89b7f78c50                                  │ match      │ match   │ preserved + mochs added   │
│ ae82af010ba6 │ 429b7638b2df                                  │ match      │ match   │ preserved + mochs added   │
│ 1ba0cbbe4128 │ 3dd73022306b                                  │ match      │ match   │ preserved + mochs added   │
│ f0aab13d4398 │ bf585ba14726                                  │ match      │ match   │ preserved + mochs added   │
│ 1f7f66984674 │ bc86281fe4bd                                  │ match      │ match   │ preserved + mochs added   │
│ 2eadbf648c10 │ f5caf26fd6c7                                  │ match      │ match   │ preserved + mochs added   │
│ 4729da47268d │ d332424d1d06                                  │ match      │ match   │ preserved + mochs added   │
│ 5178a81d9b4d │ [Revert] perf/arm_cspmu: nvidia: rename doc t │ N/A        │ N/A     │ mochs                     │
│ 02a9cdcca68f │ [Revert] perf/arm_cspmu: nvidia: add tegra410 │ N/A        │ N/A     │ mochs                     │
│ 08c61448ccf5 │ [Revert] perf/arm_cspmu: add arm_cspmu_acpi_d │ N/A        │ N/A     │ mochs                     │
│ 07a7992713f3 │ [Revert] perf/arm_cspmu: nvidia: add tegra410 │ N/A        │ N/A     │ mochs                     │
│ d007bc52c295 │ [Revert] perf/arm_cspmu: nvidia: add tegra410 │ N/A        │ N/A     │ mochs                     │
│ 54c6101c74df │ [Revert] perf: add nvidia tegra410 cpu memory │ N/A        │ N/A     │ mochs                     │
│ 2dcaf0ba3149 │ [Revert] perf: add nvidia tegra410 c2c pmu    │ N/A        │ N/A     │ mochs                     │
│ 332b9f99ddbe │ [Revert] perf vendor events arm64: add tegra4 │ N/A        │ N/A     │ mochs                     │
└──────────────┴───────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘
Lint: all checks passed.
Backported v3 of the PMCCNTR_EL0 patch and tested again: I will reply to the series with a TB tag. Let's wait a bit longer to see if this gets accepted soon so we can pick from -next and avoid a SAUCE tag.
PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while the PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event counted by a programmable counter, so using PMCCNTR_EL0 for cycles can give results that differ from the programmable counter path. Extend the existing PMCCNTR avoidance decision from the SMT case to also cover Olympus. Store the result in the common arm_pmu state at registration time, so arm_pmuv3 can keep using a single flag when deciding whether CPU_CYCLES may use PMCCNTR_EL0. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260504175204.3122979-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
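The decision described in the commit message above can be modelled in a few lines. This is a minimal sketch in Python rather than kernel C, and every name in it (`ArmPmu`, `avoid_pmccntr`, `register_pmu`, `pick_cycles_counter`) is hypothetical. It only illustrates the shape of the change: compute the flag once at registration, then consult that single flag whenever CPU_CYCLES needs a counter.

```python
from dataclasses import dataclass


@dataclass
class ArmPmu:
    """Hypothetical stand-in for the common arm_pmu state."""
    avoid_pmccntr: bool = False


def register_pmu(has_smt: bool, is_olympus: bool) -> ArmPmu:
    # Decided once, at registration time. The existing SMT case and the
    # new Olympus case both mean PMCCNTR_EL0 may not track CPU_CYCLES.
    return ArmPmu(avoid_pmccntr=has_smt or is_olympus)


def pick_cycles_counter(pmu: ArmPmu) -> str:
    # The event path looks only at the stored flag, not at the reasons
    # behind it.
    return "programmable" if pmu.avoid_pmccntr else "PMCCNTR_EL0"
```

Folding both conditions into one stored flag keeps the counter-selection path unchanged when a new reason to avoid PMCCNTR_EL0 is added.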
Force-pushed from cd34908 to cd5faf6
Backported v4 and tested on Strata.
@clsotog @nirmoy @jamieNguyenNVIDIA I've moved this to ready state. We'll take v4 of the Skip PMCCNTR_EL0 patch as SAUCE since it has not made -next.
clsotog left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
The Vera PMU patches are upstream as of v7.1. Revert the existing SAUCE patches and pick their upstream counterparts. Also backport one additional patch that is still being reviewed on LKML and addresses behavior of PMCCNTR_EL0 on Vera when in single-threaded mode.
Upstream patches:
d332424 perf/arm_cspmu: nvidia: Rename doc to Tegra241
f5caf26 perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
bc86281 perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
bf585ba perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
3dd7302 perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
429b763 perf: add NVIDIA Tegra410 CPU Memory Latency PMU
2f89b7f perf: add NVIDIA Tegra410 C2C PMU
86ff690 perf vendor events arm64: Add Tegra410 Olympus PMU events
LKML patch: https://lore.kernel.org/all/20260406232034.2566133-1-bwicaksono@nvidia.com/
LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2149756
NVB (for PMCCNTR issue): 5736369
Test results with SMT disabled (for PMCCNTR case):