
[linux-nvidia-6.18] CPPC updates and bug fixes #415

Closed
sforshee wants to merge 25 commits into NVIDIA:linux-nvidia-6.18 from sforshee:linux-nvidia-6.18


@sforshee

This pull request replaces the CPPC SAUCE patches enhancing autonomous selection with the upstream commits, which include a number of changes compared to the mailing list patches applied previously. It also adds some bug fixes, as well as patches that add a per-policy boost_freq_req for PM QoS.

sforshee added 7 commits May 11, 2026 14:23
… sysfs write"

This reverts commit c560a13 for
replacement with upstream commits.

Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…and perf_limited"

This reverts commit dac410c for
replacement with upstream commits.

Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…or perf_limited"

This reverts commit 44125e2 for
replacement with upstream commits.

Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…or min/max_perf"

This reverts commit 2c47458 for
replacement with upstream commits.

Signed-off-by: Seth Forshee <sforshee@nvidia.com>
… FFH/SystemMemory"

This reverts commit 70b1c3e for
replacement with upstream commits.

Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…RED_PERF register"

This reverts commit db42171 for
replacement with upstream commits.

Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…d performance controls"

This reverts commit c45d5a9 for
replacement with upstream commits.

Signed-off-by: Seth Forshee <sforshee@nvidia.com>
@github-actions
Contributor

PR Validation Report

Patchscan ⚠️ Missing Fixes Detected

The following upstream fix commits appear to be missing:

Checking 17 commits...
checking a172876e29bf279aeb8cc01d39e3ac77817c272a cpufreq: Add boost_freq_req QoS request..... found upstream
W: Found fix commit 9266b4da051a410d9e6c5c0b0ef0c877855aa1b8 for 6e39ba4e5a82aa5469b2ac517b74a71accb0540f
checking 770d732eb20bbf24fa13131d822bca11dc6dffa2 cpufreq: Remove max_freq_req update for pre-existing policy..... found upstream
checking ee2da6f10821f607fad3b55c108c9a00de8e540e ACPI: CPPC: Check cpc_read() return values consistently..... found upstream
checking 5d48779df489a1521ad3427f9332080dd232945a cpufreq: CPPC: Add sysfs documentation for perf_limited..... found upstream
checking c351d931e6060a629356503894db82da13314aa1 ACPI: CPPC: add APIs and sysfs interface for perf_limited..... found upstream
checking 84b8f6d207666dc2ef0328d3719449c6392ad6ae cpufreq: cppc: Update MIN_PERF/MAX_PERF in target callbacks..... found upstream
checking 79f7e7f6306a2a826e4f0d3a7d579632b5bda2cc cpufreq: CPPC: Update cached perf_ctrls on sysfs write..... found upstream
checking 282a4bc9fba2de4246dd9abf0d6424e07a48c33d ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/SystemMemory..... found upstream
checking 831a2e8510096435cbf790728ed14b48e8530f31 ACPI: CPPC: Warn on missing mandatory DESIRED_PERF register..... found upstream
checking 9431dd4baf34118646e54cfb2d63230b8642e030 ACPI: CPPC: Add cppc_get_perf() API to read performance controls..... found upstream
checking 97b7337f77f5aa4d57c9ee2b9f255591d6315f52 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: Add cppc_get_perf() API to read performance controls"..... no upstream reference
checking 392214df67e0e9a258958d208010ab5904f084c2 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: Warn on missing mandatory DESIRED_PERF register"..... no upstream reference
checking 56a493170801b51561e494b87166ee96c1e57a38 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/SystemMemory"..... no upstream reference
checking 6fc1cd9cce31d306af5faa9c55963526a07eb2e1 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: add APIs and sysfs interface for min/max_perf"..... no upstream reference
checking 1bd944cbf201788ec345e4e6069ed09f21cadef3 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: add APIs and sysfs interface for perf_limited"..... no upstream reference
checking 5e298f57867c7e55edb0368fe75da763e13378ef Revert "NVIDIA: VR: SAUCE: cpufreq: CPPC: Add sysfs for min/max_perf and perf_limited"..... no upstream reference
checking 4405a46b119e8276684717ab2f86d41a054d765a Revert "NVIDIA: VR: SAUCE: cpufreq: CPPC: Update cached perf_ctrls on sysfs write"..... no upstream reference
All fixes:
Fixes for a172876e29bf ("cpufreq: Add boost_freq_req QoS request")
          9266b4da051a ("cpufreq: Allocate QoS freq_req objects with policy")

PR Lint ❌ Errors found

Checking 17 commits...

Cherry-pick digest:
E: ee2da6f10821 ("ACPI: CPPC: Check cpc_read() return valu"): patch-ID mismatch with upstream 0cc24977224a
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ a172876e29bf │ 6e39ba4e5a82 cpufreq: Add boost_freq_req QoS request             │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 770d732eb20b │ 04aa9d0726cc cpufreq: Remove max_freq_req update for pre-existin │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ee2da6f10821 │ 0cc24977224a ACPI: CPPC: Check cpc_read() return values consiste │ MISMATCH   │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5d48779df489 │ 856250ba2e81 cpufreq: CPPC: Add sysfs documentation for perf_lim │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c351d931e606 │ 13c45a26635f ACPI: CPPC: add APIs and sysfs interface for perf_l │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 84b8f6d20766 │ ea3db45ae476 cpufreq: cppc: Update MIN_PERF/MAX_PERF in target c │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 79f7e7f6306a │ 24ad4c6c136b cpufreq: CPPC: Update cached perf_ctrls on sysfs wr │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 282a4bc9fba2 │ 38428a680026 ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/Syst │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 831a2e851009 │ b3e45fb2db9d ACPI: CPPC: Warn on missing mandatory DESIRED_PERF  │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9431dd4baf34 │ 658fa7b1c47a ACPI: CPPC: Add cppc_get_perf() API to read perform │ match      │ match   │ preserved + sforshee adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 97b7337f77f5 │ [Revert] acpi: cppc: add cppc_get_perf() api to read performance │ N/A        │ N/A     │ sforshee                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 392214df67e0 │ [Revert] acpi: cppc: warn on missing mandatory desired_perf regi │ N/A        │ N/A     │ sforshee                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 56a493170801 │ [Revert] acpi: cppc: extend cppc_set_epp_perf() for ffh/systemme │ N/A        │ N/A     │ sforshee                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 6fc1cd9cce31 │ [Revert] acpi: cppc: add apis and sysfs interface for min/max_pe │ N/A        │ N/A     │ sforshee                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1bd944cbf201 │ [Revert] acpi: cppc: add apis and sysfs interface for perf_limit │ N/A        │ N/A     │ sforshee                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5e298f57867c │ [Revert] cpufreq: cppc: add sysfs for min/max_perf and perf_limi │ N/A        │ N/A     │ sforshee                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 4405a46b119e │ [Revert] cpufreq: cppc: update cached perf_ctrls on sysfs write  │ N/A        │ N/A     │ sforshee                  │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

@nvmochs
Collaborator

nvmochs commented May 11, 2026

@sforshee Couple of comments...

It looks like there is an upstream fix for "6e39ba4e5a82 cpufreq: Add boost_freq_req QoS request" that should also be included in this backport: 9266b4d cpufreq: Allocate QoS freq_req objects with policy


These 2 patches from LKML (still being upstreamed) are also needed:
https://lore.kernel.org/lkml/20260423084731.1090384-2-pierre.gondois@arm.com/
https://lore.kernel.org/all/20260424201814.230071-1-sumitg@nvidia.com/


What kind of testing was performed?

Maybe you can leverage the same script Jamie used for #366.

@sforshee
Author

@nvmochs

These 2 patches from LKML (still being upstreamed) are also needed

Ok, so you'd prefer we take these now as SAUCE rather than giving them some time to hit linux-next?

I'll grab the additional commits and update the PR with more details about testing.

@nvmochs
Collaborator

nvmochs commented May 11, 2026

@nvmochs

These 2 patches from LKML (still being upstreamed) are also needed

Ok, so you'd prefer we take these now as SAUCE rather than giving them some time to hit linux-next?

For our other kernels we set a deadline of this Wed. to pull in those non-upstream CPPC patches as SAUCE. For consistency, I think it's best for us to do the same thing here and then we can fix up again when they do finally land upstream.

@clsotog
Collaborator

clsotog commented May 11, 2026

From PR #366, Jamie has these commits:
ac17906 28aac1f
Not sure if those commits will avoid the changes you have to make in ee2da6f

@sforshee
Author

From the PR 366 jamie has these commits: ac17906 28aac1f Not sure if those commits will avoid the changes you have to make in ee2da6f

Yes, and some minor context adjustments in another patch. Also 8cdc494 could help avoid amd-pstate conflicts in f61effa, not that this driver is of particular interest here. The conflict resolutions were pretty trivial, but the other two do look like good changes to include, so I can add them.

Sumit Gupta and others added 17 commits May 12, 2026 13:43
Add cppc_get_perf() function to read values of performance control
registers including desired_perf, min_perf, max_perf, energy_perf,
and auto_sel.

This provides a read interface to complement the existing
cppc_set_perf() write interface for performance control registers.

Note that auto_sel is read by cppc_get_perf() but not written by
cppc_set_perf() to avoid unintended mode changes during performance
updates. It can be updated with existing dedicated cppc_set_auto_sel()
API.

Use cppc_get_perf() in cppc_cpufreq_get_cpu_data() to initialize
perf_ctrls with current hardware register values during cpufreq
policy initialization.

Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-2-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 658fa7b)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
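To make the read/write asymmetry described above concrete, here is a minimal user-space sketch. The `sim_*` structs and functions are hypothetical stand-ins, not the kernel's `cppc_get_perf()`/`cppc_set_perf()`; the only behavior taken from the commit message is that the read side reports `auto_sel` while the write side leaves it alone. Which other registers the real write path touches is an assumption here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's CPC control registers (not the kernel layout). */
struct sim_cpc_regs {
	uint64_t desired_perf;
	uint64_t min_perf;
	uint64_t max_perf;
	uint64_t energy_perf;
	bool auto_sel;
};

/* Cached copy of the controls named in the commit message. */
struct sim_perf_ctrls {
	uint64_t desired_perf;
	uint64_t min_perf;
	uint64_t max_perf;
	uint64_t energy_perf;
	bool auto_sel;
};

/* Read side: every control, including auto_sel, is reported. */
static int sim_get_perf(const struct sim_cpc_regs *regs, struct sim_perf_ctrls *out)
{
	if (!regs || !out)
		return -EINVAL;
	out->desired_perf = regs->desired_perf;
	out->min_perf = regs->min_perf;
	out->max_perf = regs->max_perf;
	out->energy_perf = regs->energy_perf;
	out->auto_sel = regs->auto_sel;
	return 0;
}

/*
 * Write side: auto_sel is deliberately left alone, so a routine performance
 * update cannot switch the CPU in or out of autonomous mode.
 */
static int sim_set_perf(struct sim_cpc_regs *regs, const struct sim_perf_ctrls *in)
{
	if (!regs || !in)
		return -EINVAL;
	regs->desired_perf = in->desired_perf;
	regs->min_perf = in->min_perf;
	regs->max_perf = in->max_perf;
	regs->energy_perf = in->energy_perf;
	return 0;
}
```

In this model, a caller that wants to change the mode would go through a separate path, analogous to the dedicated `cppc_set_auto_sel()` API mentioned above.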
Add a warning during CPPC processor probe if the Desired Performance
register is not supported when it should be.

As per section 8.4.6.1.2.3 of the ACPI 6.6 specification,
"The Desired Performance Register is optional only when OSPM indicates
support for CPPC2 in the platform-wide _OSC capabilities and the
Autonomous Selection Enable field is encoded as an Integer with a
value of 1."

In other words:
- In CPPC v1, DESIRED_PERF is mandatory
- In CPPC v2, it becomes optional only when AUTO_SEL_ENABLE is supported

This helps detect firmware configuration issues early during boot.

Link: https://lore.kernel.org/lkml/9fa21599-004a-4af8-acc2-190fd0404e35@nvidia.com/
Suggested-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-3-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit b3e45fb)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
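The spec rule quoted above reduces to a small predicate. The sketch below encodes only that quoted wording; the `sim_cpc_probe` fields are hypothetical summaries of what probe code might know and are not the kernel's `_CPC` parsing structures.

```c
#include <stdbool.h>

/*
 * Hypothetical summary of what probe code knows about one CPU's _CPC
 * package; field names are illustrative only.
 */
struct sim_cpc_probe {
	bool osc_cppc2;                    /* OSPM advertised CPPC2 in platform-wide _OSC */
	bool auto_sel_is_integer;          /* AUTO_SEL_ENABLE encoded as an Integer */
	unsigned long long auto_sel_value; /* its value when it is an Integer */
	bool has_desired_perf;             /* DESIRED_PERF register is present */
};

/* Quoted rule: optional only with CPPC2 and an Integer AUTO_SEL_ENABLE of 1. */
static bool sim_desired_perf_optional(const struct sim_cpc_probe *p)
{
	return p->osc_cppc2 && p->auto_sel_is_integer && p->auto_sel_value == 1;
}

/* Warn when the register is missing and the exception does not apply. */
static bool sim_should_warn(const struct sim_cpc_probe *p)
{
	return !p->has_desired_perf && !sim_desired_perf_optional(p);
}
```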
Extend cppc_set_epp_perf() to write both auto_sel and energy_perf
registers when they are in FFH or SystemMemory address space.

This keeps the behavior consistent with the PCC case, where both registers
are already updated together; that handling was missing for FFH/SystemMemory.

Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-4-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 38428a6)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Update the cached perf_ctrls values when writing via sysfs to keep
them in sync with hardware registers:
- store_auto_select(): update perf_ctrls.auto_sel
- store_energy_performance_preference_val(): update perf_ctrls.energy_perf

This ensures consistent cached values after sysfs writes, which
complements the cppc_get_perf() initialization during policy setup.

Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-5-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 24ad4c6)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
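A minimal sketch of keeping a cached control in sync with the hardware on a sysfs-style write. The `sim_*` names are hypothetical. The 0..255 EPP range matches the "EPP=256 rejected" check in the testing notes further down this PR; refreshing the cache only after a successful register write is an assumption of this sketch, not something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-CPU state: the "hardware" register plus the driver's cache. */
struct sim_cpu {
	unsigned int hw_energy_perf;
	unsigned int cached_energy_perf;
	bool fail_hw_writes;   /* test knob to simulate a register write error */
};

static int sim_hw_write_epp(struct sim_cpu *cpu, unsigned int val)
{
	if (cpu->fail_hw_writes)
		return -EIO;
	cpu->hw_energy_perf = val;
	return 0;
}

/* sysfs-store analogue: validate, write hardware, then refresh the cache. */
static int sim_store_epp(struct sim_cpu *cpu, unsigned int val)
{
	int ret;

	if (val > 255)         /* EPP is an 8-bit preference value */
		return -EINVAL;
	ret = sim_hw_write_epp(cpu, val);
	if (ret)
		return ret;    /* leave the cache untouched on failure */
	cpu->cached_energy_perf = val;
	return 0;
}
```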
Update MIN_PERF and MAX_PERF registers from policy->min and policy->max
in the .target() and .fast_switch() callbacks. This allows controlling
performance bounds via standard scaling_min_freq and scaling_max_freq
sysfs interfaces.

Similar to intel_cpufreq which updates HWP min/max limits in .target(),
cppc_cpufreq now programs MIN_PERF/MAX_PERF along with DESIRED_PERF.
Since MIN_PERF/MAX_PERF can be updated even when auto_sel is disabled,
they are updated unconditionally.

Also program MIN_PERF/MAX_PERF in store_auto_select() when enabling
autonomous selection so the platform uses correct bounds immediately.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Link: https://patch.msgid.link/20260206142658.72583-6-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ea3db45)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
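The idea of programming MIN_PERF/MAX_PERF from policy->min/max alongside DESIRED_PERF can be sketched as below. The kHz-to-perf conversion here is an assumed linear map through (lowest_freq, lowest_perf) and (nominal_freq, nominal_perf). The driver's real conversion handles more cases, and every `sim_*` name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical capability subset used for the kHz -> perf conversion. */
struct sim_caps {
	uint32_t lowest_perf, nominal_perf;
	uint32_t lowest_freq_khz, nominal_freq_khz;
};

/*
 * Assumed linear map through (lowest_freq, lowest_perf) and
 * (nominal_freq, nominal_perf); not the driver's exact formula.
 */
static uint32_t sim_khz_to_perf(const struct sim_caps *c, uint32_t khz)
{
	int64_t dfreq = (int64_t)c->nominal_freq_khz - c->lowest_freq_khz;
	int64_t dperf = (int64_t)c->nominal_perf - c->lowest_perf;
	int64_t perf;

	if (dfreq <= 0)
		return c->nominal_perf;
	perf = c->lowest_perf + ((int64_t)khz - c->lowest_freq_khz) * dperf / dfreq;
	return perf < 0 ? 0 : (uint32_t)perf;
}

struct sim_perf_req {
	uint32_t min_perf, max_perf, desired_perf;
};

/* .target()-style update: bounds from policy->min/max plus DESIRED_PERF. */
static struct sim_perf_req sim_target(const struct sim_caps *c, uint32_t policy_min_khz,
				      uint32_t policy_max_khz, uint32_t target_khz)
{
	struct sim_perf_req r = {
		.min_perf = sim_khz_to_perf(c, policy_min_khz),
		.max_perf = sim_khz_to_perf(c, policy_max_khz),
		.desired_perf = sim_khz_to_perf(c, target_khz),
	};
	return r;
}
```

Because the bounds are recomputed on every call, a write to scaling_max_freq reaches the MAX_PERF register on the next target update, regardless of the auto_sel state.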
Add sysfs interface to read/write the Performance Limited register.

The Performance Limited register indicates to the OS that an
unpredictable event (like thermal throttling) has limited processor
performance. It contains two sticky bits set by the platform:
  - Bit 0 (Desired_Excursion): Set when delivered performance is
    constrained below desired performance. Not used when Autonomous
    Selection is enabled.
  - Bit 1 (Minimum_Excursion): Set when delivered performance is
    constrained below minimum performance.

These bits remain set until OSPM explicitly clears them. The write
operation accepts a bitmask of bits to clear:
  - Write 0x1 to clear bit 0
  - Write 0x2 to clear bit 1
  - Write 0x3 to clear both bits

This enables users to detect if platform throttling impacted a workload.
Users clear the register before execution, run the workload, then check
afterward - if set, hardware throttling occurred during that time window.

The interface is exposed as:
  /sys/devices/system/cpu/cpuX/cpufreq/perf_limited

Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-7-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 13c45a2)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
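The clear-by-bitmask write semantics and the "clear, run, check" workflow can be modeled in a few lines. This is a hypothetical user-space simulation (`sim_*` names); only the bit meanings and the accepted/rejected values come from the commit message and the test results later in this PR.

```c
#include <errno.h>
#include <stdbool.h>

#define SIM_PL_DESIRED_EXCURSION 0x1u  /* bit 0 */
#define SIM_PL_MINIMUM_EXCURSION 0x2u  /* bit 1 */
#define SIM_PL_VALID_MASK (SIM_PL_DESIRED_EXCURSION | SIM_PL_MINIMUM_EXCURSION)

/* Simulated platform side: sticky bits are only ever set here. */
static void sim_platform_throttle(unsigned int *reg, unsigned int bits)
{
	*reg |= bits & SIM_PL_VALID_MASK;
}

/* Write handler analogue: the written value is a mask of bits to clear. */
static int sim_perf_limited_write(unsigned int *reg, unsigned int clear_mask)
{
	if (clear_mask & ~SIM_PL_VALID_MASK)
		return -EINVAL;        /* e.g. 0x4 or 0xff */
	*reg &= ~clear_mask;           /* 0 is a valid no-op */
	return 0;
}

static bool sim_was_limited(unsigned int reg)
{
	return (reg & SIM_PL_VALID_MASK) != 0;
}
```

Usage follows the workflow above: write 0x3, run the workload, then call `sim_was_limited()` on the value read back.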
Add ABI documentation for the Performance Limited Register sysfs
interface in the cppc_cpufreq driver.

Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-8-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 856250b)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Currently, the `Reference Performance` register is read every time
the CPU frequency is sampled in `cppc_get_perf_ctrs()`. This function
is on the hot path of the cppc_cpufreq driver.

Reference Performance indicates the performance level that corresponds
to the Reference Counter incrementing and is not expected to change
dynamically during runtime (unlike the Delivered and Reference counters).

Reading this register in the hot path incurs unnecessary overhead,
particularly on platforms where CPC registers are located in the PCC
(Platform Communication Channel) subspace. This patch moves
`reference_perf` from the dynamic feedback counters structure
(`cppc_perf_fb_ctrs`) to the static capabilities structure
(`cppc_perf_caps`).

Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com>
[ rjw: Changelog adjustment ]
Link: https://patch.msgid.link/20260213100935.19111-1-zhangpengjie2@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(backported from commit 8505bfb)
[sforshee: fix up for not having cppc_perf_ctrs_in_pcc_cpu() split out
 from cppc_perf_ctrs_in_pcc()]
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
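Why a static `reference_perf` is enough on the hot path: delivered performance over a sampling window is the reference performance scaled by the ratio of the two counter deltas. Only the counters need sampling each time. The sketch below uses hypothetical `sim_*` types and a caller-supplied fallback for windows where a counter did not move. It ignores overflow of the product for very large deltas.

```c
#include <stdint.h>

/* Static capability: read once during init in this model. */
struct sim_perf_caps {
	uint64_t reference_perf;
};

/* Dynamic feedback counters: sampled on every frequency read. */
struct sim_fb_ctrs {
	uint64_t delivered;
	uint64_t reference;
};

/*
 * Delivered perf over a window:
 *   reference_perf * delta(delivered) / delta(reference)
 * Returns 'fallback' if either counter did not move.
 */
static uint64_t sim_delivered_perf(const struct sim_perf_caps *caps,
				   const struct sim_fb_ctrs *t0,
				   const struct sim_fb_ctrs *t1,
				   uint64_t fallback)
{
	uint64_t d_del = t1->delivered - t0->delivered;   /* unsigned: wrap-safe */
	uint64_t d_ref = t1->reference - t0->reference;

	if (d_del == 0 || d_ref == 0)
		return fallback;
	return caps->reference_perf * d_del / d_ref;
}
```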
Commit 8505bfb ("ACPI: CPPC: Move reference performance
to capabilities") introduced a logical error when retrieving
the reference performance.

On platforms lacking the reference performance register, the fallback
logic leaves the local 'ref' variable uninitialized (0). This causes
the subsequent sanity check to incorrectly return -EFAULT, breaking
amd_pstate initialization.

Fix this by assigning 'ref = nom' in the fallback path.

Fixes: 8505bfb ("ACPI: CPPC: Move reference performance to capabilities")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20260310003026.GA2639793@ax162/
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20260311071334.1494960-1-zhangpengjie2@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit be473f0)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
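A sketch of the corrected fallback: when the Reference Performance register is absent, reference performance takes the nominal value before the sanity check runs. The `sim_*` names are hypothetical; the `-EFAULT` sanity check mirrors the behavior described in the commit message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs: whether the Reference Performance register exists. */
struct sim_ref_inputs {
	bool has_reference_perf;
	uint64_t reference_perf;
	uint64_t nominal_perf;
};

static int sim_get_reference_perf(const struct sim_ref_inputs *in, uint64_t *out)
{
	uint64_t nom = in->nominal_perf;
	uint64_t ref;

	if (in->has_reference_perf)
		ref = in->reference_perf;
	else
		ref = nom;       /* the fix: without this, ref stays 0 */

	if (!ref || !nom)
		return -EFAULT;  /* the sanity check that was tripping before the fix */
	*out = ref;
	return 0;
}
```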
Callers of cpc_read() ignore its return value, which can lead
to using uninitialized or stale values when the read fails.

Fix this by consistently checking cpc_read() return values in
cppc_get_perf_caps(), cppc_get_perf_ctrs(), and cppc_get_perf().

Link: https://lore.kernel.org/lkml/48bdf87e-39f1-402f-a7dc-1a0e1e7a819d@nvidia.com/
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Link: https://patch.msgid.link/20260318095005.2437960-1-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 0cc2497)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
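The pattern of consistently checking register reads can be sketched with a pluggable reader. Here, `sim_reader_t` and the sample readers are hypothetical stand-ins for `cpc_read()`. Publishing results only after all reads succeed is a design choice of this sketch, so a failed read leaves the caller's previous values intact instead of mixing fresh and stale data.

```c
#include <errno.h>
#include <stdint.h>

enum sim_reg { SIM_HIGHEST_PERF, SIM_LOWEST_PERF };

/* Hypothetical register reader: 0 on success, negative errno on failure. */
typedef int (*sim_reader_t)(enum sim_reg reg, uint64_t *val);

static int sim_read_ok(enum sim_reg reg, uint64_t *val)
{
	*val = (reg == SIM_HIGHEST_PERF) ? 200 : 10;
	return 0;
}

static int sim_read_fail_lowest(enum sim_reg reg, uint64_t *val)
{
	if (reg == SIM_LOWEST_PERF)
		return -EIO;
	*val = 200;
	return 0;
}

struct sim_caps {
	uint64_t highest_perf, lowest_perf;
};

/* Check every read; write to *caps only once all reads have succeeded. */
static int sim_get_caps(sim_reader_t rd, struct sim_caps *caps)
{
	uint64_t high, low;
	int ret;

	ret = rd(SIM_HIGHEST_PERF, &high);
	if (ret)
		return ret;
	ret = rd(SIM_LOWEST_PERF, &low);
	if (ret)
		return ret;

	caps->highest_perf = high;
	caps->lowest_perf = low;
	return 0;
}
```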
The policy->max_freq_req QoS constraint represents the maximum
frequency that can be requested. It is set by:
 - writing to the policyX/scaling_max_freq sysfs file
 - toggling the cpufreq/boost sysfs file

Upon calling freq_qos_update_request(), a successful update
of the max_freq_req value triggers cpufreq_notifier_max(),
followed by cpufreq_set_policy(), which updates the requested
frequency for the policy.
If the new max_freq_req value is no different from the
original value, no frequency update is triggered.

In a specific sequence of toggling:
 - cpufreq/boost sysfs file
 - CPU hot-plugging
a CPU could end up with boost enabled but running at the
maximum non-boost frequency, because cpufreq_notifier_max() was
not triggered. The following commit fixed that:
commit 1608f02 ("cpufreq: Fix re-boost issue after hotplugging
a CPU")

The following:
commit dd016f3 ("cpufreq: Introduce a more generic way to
set default per-policy boost flag")
also fixed the issue by correctly setting the max_freq_req
constraint of a policy that is re-activated. This makes the
first fix unnecessary.

As the original issue is fixed by another method,
this patch reverts:
commit 1608f02 ("cpufreq: Fix re-boost issue after hotplugging
a CPU")

Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20260326204404.1401849-2-pierre.gondois@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 04aa9d0)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
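The "no change, no notification" behavior described above can be modeled as a max-frequency aggregate (minimum of all requests) that notifies only when the effective value moves. The return convention follows what `freq_qos_update_request()` documents (1 if the effective constraint changed, 0 if not, negative on error). Everything else here, including the `sim_*` names and the `notify_count` stand-in for `cpufreq_notifier_max()`, is hypothetical.

```c
#include <errno.h>

#define SIM_NREQ 2

/* Hypothetical max-frequency QoS object: effective value = min of requests. */
struct sim_max_qos {
	unsigned int req[SIM_NREQ];
	unsigned int effective;
	int notify_count;   /* stands in for cpufreq_notifier_max() invocations */
};

static unsigned int sim_aggregate(const struct sim_max_qos *q)
{
	unsigned int m = q->req[0];

	for (int i = 1; i < SIM_NREQ; i++)
		if (q->req[i] < m)
			m = q->req[i];
	return m;
}

/* Returns 1 if the effective value changed (and "notifies"), 0 if not. */
static int sim_update_request(struct sim_max_qos *q, int idx, unsigned int val)
{
	unsigned int eff;

	if (idx < 0 || idx >= SIM_NREQ)
		return -EINVAL;
	q->req[idx] = val;
	eff = sim_aggregate(q);
	if (eff == q->effective)
		return 0;
	q->effective = eff;
	q->notify_count++;
	return 1;
}
```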
The Power Management Quality of Service (PM QoS) framework allows
constraints from multiple entities to be aggregated. It is currently
used to manage the min/max frequency of a given policy.

Frequency constraints can come for instance from:
 - Thermal framework: acpi_thermal_cpufreq_init()
 - Firmware: _PPC objects: acpi_processor_ppc_init()
 - User: by setting policyX/scaling_[min|max]_freq
The minimum of the max frequency constraints is used to compute
the resulting maximum allowed frequency.

When boost frequencies are enabled, the same frequency request object
(policy->max_freq_req) that handles user requests is used.
As a result, when setting:
 - scaling_max_freq
 - boost
the last sysfs file written overwrites the request from the other
sysfs file.

To avoid this, create a per-policy boost_freq_req to save the boost
constraints instead of overwriting the last scaling_max_freq
constraint.

policy_set_boost() calls the cpufreq set_boost callback.
Update the newly added boost_freq_req request from there:
 - whenever boost is toggled
 - to cover all possible paths

In the existing .set_boost() callbacks:
 - Don't update policy->max as this is done through the qos notifier
   cpufreq_notifier_max() which calls cpufreq_set_policy().
 - Remove freq_qos_update_request() calls as the qos request is now
   done in policy_set_boost() and updates the new boost_freq_req

$ ## Init state
scaling_max_freq:1000000
cpuinfo_max_freq:1000000

$ echo 700000 > scaling_max_freq
scaling_max_freq:700000
cpuinfo_max_freq:1000000

$ echo 1 > ../boost
scaling_max_freq:1200000
cpuinfo_max_freq:1200000

$ echo 800000 > scaling_max_freq
scaling_max_freq:800000
cpuinfo_max_freq:1200000

$ ## Final step:
$ ## Without the patches:
$ echo 0 > ../boost
scaling_max_freq:1000000
cpuinfo_max_freq:1000000

$ ## With the patches:
$ echo 0 > ../boost
scaling_max_freq:800000
cpuinfo_max_freq:1000000

Note:
cpufreq_frequency_table_cpuinfo() updates policy->min
and max from:
A.
cpufreq_boost_set_sw()
\-cpufreq_frequency_table_cpuinfo()
B.
cpufreq_policy_online()
\-cpufreq_table_validate_and_sort()
  \-cpufreq_frequency_table_cpuinfo()
Keep these updates as some drivers expect policy->min and
max to be set through B.

Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20260326204404.1401849-3-pierre.gondois@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 6e39ba4)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
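A minimal model of the final step of the transcript in the commit message above, contrasting a shared request (last writer wins) with a separate boost request aggregated by minimum. The `sim_*`, `shared_*` and `split_*` names are hypothetical. This sketch reproduces only the boost-off step, not the whole sequence.

```c
/* Hypothetical models of the two designs; values in kHz. */
static unsigned int sim_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Before: user and boost writes share policy->max_freq_req. */
struct sim_policy_shared {
	unsigned int max_freq_req;
};

/* After: boost gets its own request; the effective max is the minimum. */
struct sim_policy_split {
	unsigned int max_freq_req;
	unsigned int boost_freq_req;
};

static void shared_set_boost(struct sim_policy_shared *p, int on,
			     unsigned int nonboost_max, unsigned int boost_max)
{
	p->max_freq_req = on ? boost_max : nonboost_max;  /* clobbers the user value */
}

static unsigned int shared_effective(const struct sim_policy_shared *p)
{
	return p->max_freq_req;
}

static void split_set_boost(struct sim_policy_split *p, int on,
			    unsigned int nonboost_max, unsigned int boost_max)
{
	p->boost_freq_req = on ? boost_max : nonboost_max;  /* user value untouched */
}

static unsigned int split_effective(const struct sim_policy_split *p)
{
	return sim_min(p->max_freq_req, p->boost_freq_req);
}
```

With the user's 800000 kHz request in place, turning boost off yields 1000000 in the shared model and 800000 in the split model, which matches the "without" and "with" results shown above.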
A recent change exposed a bug in the error path: if
freq_qos_add_request(boost_freq_req) fails, min_freq_req may remain a
valid pointer even though it was never successfully added. During policy
teardown, this leads to an unconditional call to
freq_qos_remove_request(), triggering a WARN.

The current design allocates all three freq_req objects together, making
the lifetime rules unclear and error handling fragile.

Simplify this by allocating the QoS freq_req objects at policy
allocation time. The policy itself is dynamically allocated, and two of
the three requests are always needed anyway. This ensures consistent
lifetime management and eliminates the inconsistent state in failure
paths.

Reported-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Fixes: 6e39ba4 ("cpufreq: Add boost_freq_req QoS request")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Link: https://patch.msgid.link/a293f29d841b86c51f34699c6e717e01858d8ada.1774933424.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 9266b4d)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
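A sketch of the lifetime idea: embedding the three request objects in the policy so they are allocated and freed together, with teardown removing only requests that were actually added. The `active` flag and the teardown guard are assumptions of this model, not a description of the upstream code. All `sim_*` names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical request object: 'active' mirrors "successfully added". */
struct sim_freq_req {
	bool active;
	unsigned int value;
};

/* All three requests live inside the policy, so they share its lifetime. */
struct sim_policy {
	struct sim_freq_req min_freq_req, max_freq_req, boost_freq_req;
	int warn_count;     /* counts removals of never-added requests */
};

static int sim_add_request(struct sim_freq_req *req, unsigned int val, bool fail)
{
	if (fail)
		return -ENOMEM;   /* simulated add failure */
	req->value = val;
	req->active = true;
	return 0;
}

static void sim_remove_request(struct sim_policy *p, struct sim_freq_req *req)
{
	if (!req->active) {
		p->warn_count++;  /* the commit message describes a WARN here */
		return;
	}
	req->active = false;
}

/* Teardown removes only the requests that were actually added. */
static void sim_policy_teardown(struct sim_policy *p)
{
	struct sim_freq_req *reqs[] = {
		&p->min_freq_req, &p->max_freq_req, &p->boost_freq_req,
	};

	for (size_t i = 0; i < sizeof(reqs) / sizeof(reqs[0]); i++)
		if (reqs[i]->active)
			sim_remove_request(p, reqs[i]);
}
```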
…support

Add a kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable
CPPC autonomous performance selection on all CPUs at system startup.
When autonomous mode is enabled, the hardware automatically adjusts
CPU performance based on workload demands using Energy Performance
Preference (EPP) hints.

When auto_sel_mode=1:
- Configure all CPUs for autonomous operation on first init
- Set EPP to performance preference (0x0)
- Use HW min/max_perf when available; otherwise initialize from caps
- Clamp desired_perf to bounds before enabling autonomous mode
- Hardware controls frequency instead of the OS governor

The boot parameter is applied only during first policy initialization.
Skip applying it on CPU hotplug to preserve runtime sysfs configuration.

This patch depends on patch [2] ("cpufreq: Set policy->min and max
as real QoS constraints") so that the policy->min/max set in
cppc_cpufreq_cpu_init() are not overridden by cpufreq_set_policy()
during init.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
(backported from https://lore.kernel.org/all/20260424201814.230071-1-sumitg@nvidia.com/)
[sforshee: adjust context]
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
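The first-init steps listed for `auto_sel_mode=1` can be sketched as follows. In this model, "HW min/max_perf not available" is represented as 0, and the fallback uses lowest/highest perf from caps. Both choices are assumptions, since the commit message does not say which caps fields are used. The `sim_*` names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical subsets of caps and controls; field names are illustrative. */
struct sim_caps {
	unsigned int lowest_perf, highest_perf;
};

struct sim_ctrls {
	unsigned int min_perf, max_perf, desired_perf, energy_perf;
	bool auto_sel;
};

/*
 * First init only: fill missing bounds from caps, clamp desired_perf,
 * set EPP to 0 (performance), then enable autonomous selection last.
 */
static void sim_apply_auto_sel_mode(const struct sim_caps *caps, struct sim_ctrls *c,
				    bool first_init)
{
	if (!first_init)
		return;   /* hotplug: keep whatever sysfs configured at runtime */

	if (!c->min_perf)
		c->min_perf = caps->lowest_perf;
	if (!c->max_perf)
		c->max_perf = caps->highest_perf;

	if (c->desired_perf < c->min_perf)
		c->desired_perf = c->min_perf;
	else if (c->desired_perf > c->max_perf)
		c->desired_perf = c->max_perf;

	c->energy_perf = 0;
	c->auto_sel = true;
}
```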
Extract the QoS related logic from cpufreq_policy_online()
to make the function shorter/simpler.

The logic is placed in cpufreq_policy_init_qos() and is
now executed right after the following calls:
- cpufreq_driver->init()
- cpufreq_table_validate_and_sort()

This prepares for the following patches, which will,
in cpufreq_policy_init_qos():
- treat the policy->min/max values set by drivers as QoS requests.
- set a default policy->min/max value to all policies.

No functional change.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
(cherry picked from https://lore.kernel.org/lkml/20260511135538.522653-2-pierre.gondois@arm.com/)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…l drivers

Some drivers set policy->min/max in their .init() callback.
cpufreq_set_policy() will ultimately override them through:
cpufreq_policy_online()
\-cpufreq_init_policy()
  \-cpufreq_set_policy()
    \-/* Set policy->min/max */
Thus the policy min/max values provided are only temporary.

There is an exception if CPUFREQ_NEED_INITIAL_FREQ_CHECK is set and:
cpufreq_policy_online()
\-__cpufreq_driver_target()
  \-cpufreq_driver->target()

To prepare for a following patch that will remove all
policy->min/max initialization in the driver .init() callback
if the min/max value is equal to the cpuinfo.min/max_freq,
set a default policy->min/max value for all drivers.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
(cherry picked from https://lore.kernel.org/lkml/20260511135538.522653-3-pierre.gondois@arm.com/)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Prior to [1], drivers were setting policy->min/max and
the value was used as a QoS constraint. After that change,
the values were only used temporarily, as cpufreq_set_policy()
ultimately overrides them through:
cpufreq_policy_online()
\-cpufreq_init_policy()
  \-cpufreq_set_policy()
    \-/* Set policy->min/max */

This patch reinstates the initial behaviour. This will allow
drivers to request min/max QoS frequencies if desired.
For instance, the cppc driver advertises a lowest non-linear
frequency, which should be used as a min QoS value.

To avoid drivers setting policy->min/max to default
values that are then treated as QoS values (i.e. the reason
why [1] was introduced), remove the initialization of
policy->min/max in .init() callbacks wherever the
policy->min/max values are identical to
policy->cpuinfo.min/max_freq.

Indeed, the previous patch ("cpufreq: Set default
policy->min/max values for all drivers") makes this initialization
redundant.

The only drivers where these values are different are:
- gx-suspmod.c (min)
- cppc-cpufreq.c (min)
- longrun.c

[1]
commit 521223d ("cpufreq: Fix initialization of min and
max frequency QoS requests")

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
(backported from https://lore.kernel.org/lkml/20260511135538.522653-4-pierre.gondois@arm.com/)
[sforshee: adjustments to amd-pstate for missing max_freq caching patch]
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Consider policy->min/max being set in the driver .init()
callback as a QoS request. The impacted drivers are:
- gx-suspmod.c (min)
- cppc-cpufreq.c (min)
- longrun.c (min/max)

Update the documentation accordingly.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
(cherry picked from https://lore.kernel.org/lkml/20260511135538.522653-5-pierre.gondois@arm.com/)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
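Taken together, the last few commits amount to this: default any policy->min/max that the driver's .init() left unset to the cpuinfo limits, then treat the resulting values as the initial QoS requests. The sketch below assumes "unset" means 0. `sim_policy_init_qos()` is a hypothetical model named after, but not copied from, `cpufreq_policy_init_qos()`. The cppc-like case uses a min above cpuinfo_min, in the spirit of the lowest non-linear frequency mentioned above.

```c
/* Hypothetical policy subset; 0 means "not set by the driver's .init()". */
struct sim_policy {
	unsigned int min, max;                  /* kHz */
	unsigned int cpuinfo_min, cpuinfo_max;  /* kHz */
};

struct sim_qos_init {
	unsigned int min_req, max_req;
};

/*
 * Default anything .init() left unset to the cpuinfo limits, then use the
 * resulting policy->min/max as the initial QoS requests.
 */
static struct sim_qos_init sim_policy_init_qos(struct sim_policy *p)
{
	struct sim_qos_init q;

	if (!p->min)
		p->min = p->cpuinfo_min;
	if (!p->max)
		p->max = p->cpuinfo_max;

	q.min_req = p->min;
	q.max_req = p->max;
	return q;
}
```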
@sforshee sforshee force-pushed the linux-nvidia-6.18 branch from 3ce775a to a1ffa95 May 12, 2026 21:33
@sforshee
Author

v2 updates:

Testing:

  • Build on x86_64 (allmodconfig cpufreq drivers)
  • Build on arm64 (defconfig + nvidia)
  • checkpatch.pl on all SAUCE / new commits
  • Autonomous mode engagement on AUTO_SEL-supported HW
  • -EOPNOTSUPP fallthrough on AUTO_SEL-unsupported HW (Grace)
  • QoS durability across hotplug on both systems
  • Sysfs writes (auto_select, EPP, auto_act_window, perf_limited, scaling_*_freq)
  • Sysfs validation (EPP=256 rejected, perf_limited=4 rejected)
  • Stress: 100+ iterations of toggle/EPP/freq writes; 5-iteration hotplug
  • rmmod + modprobe cycle

Test results with Grace (no AUTO_SEL support):

local-sforshee@lego-cg1-qs-120:~$ sudo ./test_cppc_cpufreq.sh
cppc_cpufreq test suite
========================================
Tue May 12 21:10:23 UTC 2026
6.18.25-ga1ffa955c73b


Test 1: Basic driver load and sysfs layout
  PASS: scaling_driver is cppc_cpufreq (got 'cppc_cpufreq')
  PASS: governor is set (got 'performance', not '')
  PASS: auto_select sysfs exists
  PASS: auto_act_window sysfs exists
  PASS: energy_performance_preference_val sysfs exists
  PASS: perf_limited sysfs exists
  PASS: min_perf sysfs removed
  PASS: max_perf sysfs removed

Test 2: auto_sel_mode boot parameter
  auto_sel_mode=N
  PASS: boot param is read-only (0444) (got '444')
./test_cppc_cpufreq.sh: line 80: /sys/module/cppc_cpufreq/parameters/auto_sel_mode: Permission denied
  PASS: boot param rejects writes
  SKIP: auto_sel_mode not enabled (boot without param to test this)

Test 3: Runtime auto_select toggle via sysfs
  SKIP: auto_select not supported

Test 4: MIN_PERF/MAX_PERF via scaling_min/max_freq
  cpuinfo range: 81000-3348000 kHz
  PASS: scaling_max_freq accepts midpoint (got '1714500')
  PASS: scaling_min_freq accepts min (got '81000')
  PASS: frequency clamped by scaling_max_freq (1714500 <= ~1714500)

Test 5: Energy Performance Preference
  SKIP: EPP not supported on this platform

Test 6: perf_limited register
  PASS: perf_limited readable (value=0)
  PASS: clear bit 0 (desired excursion)
  PASS: clear bit 1 (minimum excursion)
  PASS: clear both bits
  PASS: zero is valid no-op
  PASS: reject 0x4 (invalid bit)
  PASS: reject 0xff (invalid bits)

Test 7: CPU hotplug
  PASS: cpu1 offlined
  PASS: cpu1 re-onlined
  PASS: auto_select readable after hotplug (value=<unsupported>)
  PASS: driver restored after hotplug (got 'cppc_cpufreq')

Test 8: auto_act_window sysfs
  SKIP: auto_act_window not supported on this platform

Test 9: Diagnostics in dmesg
  PASS: no DESIRED_PERF warning (firmware is compliant)
  PASS: no kernel errors related to CPPC (got '0')

Test 10: Stress / regression
  Rapid governor switching (performance <-> schedutil, 100 iterations)...
  PASS: 100 governor switches completed
  SKIP: auto_select not supported for toggle stress
  Rapid scaling_max_freq changes (50 iterations)...
  PASS: 50 scaling_max_freq changes completed
  PASS: no kernel warnings after stress (got '0')

========================================
Results: 34 tests: 29 passed, 0 failed, 5 skipped
ALL TESTS PASSED

Test results on DGX Spark:

nvidia@spark-3a5d:~/sforshee$ sudo ./test_cppc_cpufreq.sh
cppc_cpufreq test suite
========================================
Tue May 12 06:59:19 PM UTC 2026
6.18.25-ga1ffa955c73b


Test 1: Basic driver load and sysfs layout
  PASS: scaling_driver is cppc_cpufreq (got 'cppc_cpufreq')
  PASS: governor is set (got 'performance', not '')
  PASS: auto_select sysfs exists
  PASS: auto_act_window sysfs exists
  PASS: energy_performance_preference_val sysfs exists
  PASS: perf_limited sysfs exists
  PASS: min_perf sysfs removed
  PASS: max_perf sysfs removed

Test 2: auto_sel_mode boot parameter
  auto_sel_mode=Y
  PASS: boot param is read-only (0444) (got '444')
./test_cppc_cpufreq.sh: line 80: /sys/module/cppc_cpufreq/parameters/auto_sel_mode: Permission denied
  PASS: boot param rejects writes
  PASS: auto_select enabled when boot param=Y (got '1')
  PASS: EPP set to performance (0) by boot param (got '0')
  PASS: auto_select=1 on all CPUs

Test 3: Runtime auto_select toggle via sysfs
  PASS: enable auto_select (got '1')
  PASS: disable auto_select (got '0')
  PASS: governor changes accepted after auto_select disable (perf=2808000, other=338000)

Test 4: MIN_PERF/MAX_PERF via scaling_min/max_freq
  cpuinfo range: 338000-2808000 kHz
  PASS: scaling_max_freq accepts midpoint (got '1573000')
  PASS: scaling_min_freq accepts min (got '338000')
  PASS: frequency clamped by scaling_max_freq (1573000 <= ~1573000)

Test 5: Energy Performance Preference
  PASS: write EPP=0 (performance)
  PASS: read back EPP=0 (got '0')
  PASS: write EPP=255 (energy-efficiency)
  PASS: read back EPP=255 (got '255')
  PASS: write EPP=128 (balanced)
  PASS: read back EPP=128 (got '128')
  PASS: reject EPP=256 (out of range)

Test 6: perf_limited register
  PASS: perf_limited readable (value=0)
  PASS: clear bit 0 (desired excursion)
  PASS: clear bit 1 (minimum excursion)
  PASS: clear both bits
  PASS: zero is valid no-op
  PASS: reject 0x4 (invalid bit)
  PASS: reject 0xff (invalid bits)

Test 7: CPU hotplug
  PASS: cpu1 offlined
  PASS: cpu1 re-onlined
  PASS: auto_select readable after hotplug (value=1)
  PASS: driver restored after hotplug (got 'cppc_cpufreq')

Test 8: auto_act_window sysfs
  PASS: auto_act_window readable (value=1)
  PASS: write auto_act_window=0

Test 9: Diagnostics in dmesg
  PASS: no DESIRED_PERF warning (firmware is compliant)
  PASS: no kernel errors related to CPPC (got '0')

Test 10: Stress / regression
  Rapid governor switching (performance <-> schedutil, 100 iterations)...
  PASS: 100 governor switches completed
  Rapid auto_select toggling (100 iterations)...
  PASS: 100 auto_select toggles completed
  Rapid scaling_max_freq changes (50 iterations)...
  PASS: 50 scaling_max_freq changes completed
  PASS: no kernel warnings after stress (got '0')

========================================
Results: 45 tests: 45 passed, 0 failed, 0 skipped
ALL TESTS PASSED

@jamieNguyenNVIDIA (Collaborator)

Acked-by: Jamie Nguyen <jamien@nvidia.com>

@clsotog (Collaborator) commented May 12, 2026

Acked-by: Carol L Soto <csoto@nvidia.com>

@nvmochs (Collaborator) commented May 13, 2026

Thanks Seth!

I reviewed manually and with codex and did not find any issues of significance.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>


I'll take care of merging this.

@nvmochs (Collaborator) commented May 13, 2026

Merged, closing MR.

68ae0a518ea3 (HEAD -> linux-nvidia-6.18, origin/linux-nvidia-6.18) NVIDIA: VR: SAUCE: cpufreq: Use policy->min/max init as QoS request
f66a7f2e5fab NVIDIA: VR: SAUCE: cpufreq: Remove driver default policy->min/max init
5ad7e70187b3 NVIDIA: VR: SAUCE: cpufreq: Set default policy->min/max values for all drivers
e61c21ccb18c NVIDIA: VR: SAUCE: cpufreq: Extract cpufreq_policy_init_qos() function
8841b3c3805f NVIDIA: VR: SAUCE: cpufreq: CPPC: add autonomous mode boot parameter support
eb0079306fbd cpufreq: Allocate QoS freq_req objects with policy
a706759733c6 cpufreq: Add boost_freq_req QoS request
718cda0879b0 cpufreq: Remove max_freq_req update for pre-existing policy
64dcfba14c74 ACPI: CPPC: Check cpc_read() return values consistently
dcab05e4228f ACPI: CPPC: Fix uninitialized ref variable in cppc_get_perf_caps()
ffe992b4fc71 ACPI: CPPC: Move reference performance to capabilities
90b4bb5bc865 cpufreq: CPPC: Add sysfs documentation for perf_limited
e87b14552eea ACPI: CPPC: add APIs and sysfs interface for perf_limited
2f23db804782 cpufreq: cppc: Update MIN_PERF/MAX_PERF in target callbacks
8be38a1f1f07 cpufreq: CPPC: Update cached perf_ctrls on sysfs write
7e21eb955a2c ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/SystemMemory
4a3b28db93de ACPI: CPPC: Warn on missing mandatory DESIRED_PERF register
76fd43a7f392 ACPI: CPPC: Add cppc_get_perf() API to read performance controls
d711c0bd163b Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: Add cppc_get_perf() API to read performance controls"
577d3120d5fb Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: Warn on missing mandatory DESIRED_PERF register"
64ccfaafa2c8 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/SystemMemory"
d2e86eca91f6 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: add APIs and sysfs interface for min/max_perf"
016610e519e7 Revert "NVIDIA: VR: SAUCE: ACPI: CPPC: add APIs and sysfs interface for perf_limited"
659bad6f9e8b Revert "NVIDIA: VR: SAUCE: cpufreq: CPPC: Add sysfs for min/max_perf and perf_limited"
00dd1df06f74 Revert "NVIDIA: VR: SAUCE: cpufreq: CPPC: Update cached perf_ctrls on sysfs write"

@nvmochs nvmochs closed this May 13, 2026

6 participants