[linux-nvidia-6.18] CPPC updates and bug fixes #415
… sysfs write" This reverts commit c560a13 for replacement with upstream commits. Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…and perf_limited" This reverts commit dac410c for replacement with upstream commits. Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…or perf_limited" This reverts commit 44125e2 for replacement with upstream commits. Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…or min/max_perf" This reverts commit 2c47458 for replacement with upstream commits. Signed-off-by: Seth Forshee <sforshee@nvidia.com>
… FFH/SystemMemory" This reverts commit 70b1c3e for replacement with upstream commits. Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…RED_PERF register" This reverts commit db42171 for replacement with upstream commits. Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…d performance controls" This reverts commit c45d5a9 for replacement with upstream commits. Signed-off-by: Seth Forshee <sforshee@nvidia.com>
@sforshee Couple of comments...
It looks like there is an upstream fix for "6e39ba4e5a82 cpufreq: Add boost_freq_req QoS request" that should also be included in this backport:
9266b4d cpufreq: Allocate QoS freq_req objects with policy
These 2 patches from LKML (still being upstreamed) are also needed:
What kind of testing was performed? Maybe you can leverage the same script Jamie used for #366.
Ok, so you'd prefer we take these now as SAUCE rather than giving them some time to hit linux-next? I'll grab the additional commits and add an update with additional details about testing.
For our other kernels we set a deadline of this Wed. to pull in those non-upstream CPPC patches as SAUCE. For consistency, I think it's best for us to do the same thing here and then we can fix up again when they do finally land upstream.
Yes, and some minor context adjustments in another patch. Also 8cdc494 could help avoid amd-pstate conflicts in f61effa, not that this driver is of particular interest here. The conflict resolutions were pretty trivial, but the other two do look like good changes to include, so I can add them.
Add cppc_get_perf() function to read values of performance control registers including desired_perf, min_perf, max_perf, energy_perf, and auto_sel. This provides a read interface to complement the existing cppc_set_perf() write interface for performance control registers. Note that auto_sel is read by cppc_get_perf() but not written by cppc_set_perf() to avoid unintended mode changes during performance updates. It can be updated with existing dedicated cppc_set_auto_sel() API. Use cppc_get_perf() in cppc_cpufreq_get_cpu_data() to initialize perf_ctrls with current hardware register values during cpufreq policy initialization. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-2-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 658fa7b) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Add a warning during CPPC processor probe if the Desired Performance
register is not supported when it should be.
As per section 8.4.6.1.2.3 of the ACPI 6.6 specification,
"The Desired Performance Register is optional only when OSPM indicates
support for CPPC2 in the platform-wide _OSC capabilities and the
Autonomous Selection Enable field is encoded as an Integer with a
value of 1."
In other words:
- In CPPC v1, DESIRED_PERF is mandatory
- In CPPC v2, it becomes optional only when AUTO_SEL_ENABLE is supported
This helps detect firmware configuration issues early during boot.
Link: https://lore.kernel.org/lkml/9fa21599-004a-4af8-acc2-190fd0404e35@nvidia.com/
Suggested-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-3-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit b3e45fb)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
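The rule quoted from the specification reduces to a two-input predicate. The following is a minimal Python sketch of that decision, offered only as a model of the logic and not as the kernel's C implementation; the function and parameter names are hypothetical.

```python
def desired_perf_is_optional(osc_cppc2_supported: bool,
                             auto_sel_enable_is_one: bool) -> bool:
    """Model of the quoted ACPI rule: DESIRED_PERF may be absent only when
    OSPM indicates CPPC2 support in _OSC AND the Autonomous Selection Enable
    field is an Integer with a value of 1 (both inputs are assumptions of
    this sketch, not kernel variables)."""
    return osc_cppc2_supported and auto_sel_enable_is_one


def should_warn(desired_perf_supported: bool,
                osc_cppc2_supported: bool,
                auto_sel_enable_is_one: bool) -> bool:
    """Warn when the register is missing although the rule makes it mandatory."""
    if desired_perf_supported:
        return False
    return not desired_perf_is_optional(osc_cppc2_supported,
                                        auto_sel_enable_is_one)
```

In this model a CPPC v1 platform without DESIRED_PERF always triggers the warning, while a CPPC v2 platform is exempt only when both conditions hold.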
Extend cppc_set_epp_perf() to write both auto_sel and energy_perf registers when they are in FFH or SystemMemory address space. This keeps the behavior consistent with PCC case where both registers are already updated together, but was missing for FFH/SystemMemory. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-4-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 38428a6) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Update the cached perf_ctrls values when writing via sysfs to keep
them in sync with hardware registers:
- store_auto_select(): update perf_ctrls.auto_sel
- store_energy_performance_preference_val(): update perf_ctrls.energy_perf
This ensures consistent cached values after sysfs writes, which
complements the cppc_get_perf() initialization during policy setup.
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-5-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 24ad4c6)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Update MIN_PERF and MAX_PERF registers from policy->min and policy->max in the .target() and .fast_switch() callbacks. This allows controlling performance bounds via standard scaling_min_freq and scaling_max_freq sysfs interfaces. Similar to intel_cpufreq which updates HWP min/max limits in .target(), cppc_cpufreq now programs MIN_PERF/MAX_PERF along with DESIRED_PERF. Since MIN_PERF/MAX_PERF can be updated even when auto_sel is disabled, they are updated unconditionally. Also program MIN_PERF/MAX_PERF in store_auto_select() when enabling autonomous selection so the platform uses correct bounds immediately. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Link: https://patch.msgid.link/20260206142658.72583-6-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit ea3db45) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Add sysfs interface to read/write the Performance Limited register.
The Performance Limited register indicates to the OS that an
unpredictable event (like thermal throttling) has limited processor
performance. It contains two sticky bits set by the platform:
- Bit 0 (Desired_Excursion): Set when delivered performance is
constrained below desired performance. Not used when Autonomous
Selection is enabled.
- Bit 1 (Minimum_Excursion): Set when delivered performance is
constrained below minimum performance.
These bits remain set until OSPM explicitly clears them. The write
operation accepts a bitmask of bits to clear:
- Write 0x1 to clear bit 0
- Write 0x2 to clear bit 1
- Write 0x3 to clear both bits
This enables users to detect if platform throttling impacted a workload.
Users clear the register before execution, run the workload, then check
afterward - if set, hardware throttling occurred during that time window.
The interface is exposed as:
/sys/devices/system/cpu/cpuX/cpufreq/perf_limited
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20260206142658.72583-7-sumitg@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 13c45a2)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
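The sticky-bit, write-to-clear behaviour described in the commit above can be modelled in a few lines. This Python sketch is an illustration only: the class and helper names are hypothetical, and ignoring bits outside 0x3 is an assumption of the model, not a statement about how the driver validates input.

```python
DESIRED_EXCURSION = 0x1  # bit 0
MINIMUM_EXCURSION = 0x2  # bit 1
VALID_BITS = DESIRED_EXCURSION | MINIMUM_EXCURSION


class PerfLimitedModel:
    """Toy model of the Performance Limited register."""

    def __init__(self):
        self.value = 0

    def platform_set(self, bits):
        # The platform sets bits; they stay set until explicitly cleared.
        self.value |= bits & VALID_BITS

    def write(self, clear_mask):
        # A write names the bits to clear (0x1, 0x2 or 0x3).
        self.value &= ~(clear_mask & VALID_BITS)

    def read(self):
        return self.value


def throttled_during(reg, workload):
    """Clear both bits, run the workload, report whether any bit was set."""
    reg.write(VALID_BITS)
    workload()
    return reg.read() != 0
```

This follows the workflow from the commit message: clear the register, run the workload, then read it back. A non-zero value means the platform limited performance during that window.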
Add ABI documentation for the Performance Limited Register sysfs interface in the cppc_cpufreq driver. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-8-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 856250b) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Currently, the `Reference Performance` register is read every time the CPU frequency is sampled in `cppc_get_perf_ctrs()`. This function is on the hot path of the cppc_cpufreq driver. Reference Performance indicates the performance level that corresponds to the Reference Counter incrementing and is not expected to change dynamically during runtime (unlike the Delivered and Reference counters). Reading this register in the hot path incurs unnecessary overhead, particularly on platforms where CPC registers are located in the PCC (Platform Communication Channel) subspace. This patch moves `reference_perf` from the dynamic feedback counters structure (`cppc_perf_fb_ctrs`) to the static capabilities structure (`cppc_perf_caps`). Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com> [ rjw: Changelog adjustment ] Link: https://patch.msgid.link/20260213100935.19111-1-zhangpengjie2@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (backported from commit 8505bfb) [sforshee: fix up for not having cppc_perf_ctrs_in_pcc_cpu() split out from cppc_perf_ctrs_in_pcc()] Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Commit 8505bfb ("ACPI: CPPC: Move reference performance to capabilities") introduced a logical error when retrieving the reference performance. On platforms lacking the reference performance register, the fallback logic leaves the local 'ref' variable uninitialized (0). This causes the subsequent sanity check to incorrectly return -EFAULT, breaking amd_pstate initialization. Fix this by assigning 'ref = nom' in the fallback path. Fixes: 8505bfb ("ACPI: CPPC: Move reference performance to capabilities") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20260310003026.GA2639793@ax162/ Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com> [ rjw: Subject tweak ] Link: https://patch.msgid.link/20260311071334.1494960-1-zhangpengjie2@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit be473f0) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
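The failure mode and the fix can be sketched with a small Python model. It is not the kernel code: the function name, the callable used to stand in for the register read, and the `apply_fix` switch are hypothetical, and the sanity check is modelled simply as rejecting a zero value.

```python
from errno import EFAULT


def get_reference_perf(ref_reg_supported, read_ref_reg, nominal_perf,
                       apply_fix=True):
    """Return (error, reference_perf) in the style of a kernel helper."""
    ref = 0
    if ref_reg_supported:
        ref = read_ref_reg()
    elif apply_fix:
        # The fix: fall back to nominal performance ('ref = nom').
        ref = nominal_perf
    if ref == 0:
        # Modelled sanity check: a zero reference value is rejected.
        return -EFAULT, None
    return 0, ref
```

With `apply_fix=False` and no reference performance register, the model returns `-EFAULT`, which corresponds to the broken amd_pstate initialization described above. With the fix, the nominal value is returned instead.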
Callers of cpc_read() ignore its return value, which can lead to using uninitialized or stale values when the read fails. Fix this by consistently checking cpc_read() return values in cppc_get_perf_caps(), cppc_get_perf_ctrs(), and cppc_get_perf(). Link: https://lore.kernel.org/lkml/48bdf87e-39f1-402f-a7dc-1a0e1e7a819d@nvidia.com/ Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Link: https://patch.msgid.link/20260318095005.2437960-1-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 0cc2497) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
policy->max_freq_req QoS constraint represents the maximal allowed
frequency that can be requested. It is set by:
- writing to policyX/scaling_max_freq sysfs file
- toggling the cpufreq/boost sysfs file
Upon calling freq_qos_update_request(), a successful update
of the max_freq_req value triggers cpufreq_notifier_max(),
followed by cpufreq_set_policy(), which updates the requested
frequency for the policy.
If the new max_freq_req value is not different from the original
value, no frequency update is triggered.
In a specific sequence of toggling:
- cpufreq/boost sysfs file
- CPU hot-plugging
a CPU could end up with boost enabled but running at the maximal
non-boost frequency, cpufreq_notifier_max() not being triggered.
The following fixed that:
commit 1608f02 ("cpufreq: Fix re-boost issue after hotplugging a CPU")
The following:
commit dd016f3 ("cpufreq: Introduce a more generic way to set default per-policy boost flag")
also fixed the issue by correctly setting the max_freq_req
constraint of a policy that is re-activated. This makes the
first fix unnecessary.
As the original issue is fixed by another method, this patch reverts:
commit 1608f02 ("cpufreq: Fix re-boost issue after hotplugging a CPU")
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20260326204404.1401849-2-pierre.gondois@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 04aa9d0)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
The Power Management Quality of Service (PM QoS) allows aggregating
constraints from multiple entities. It is currently used to manage
the min/max frequency of a given policy.
Frequency constraints can come for instance from:
- Thermal framework: acpi_thermal_cpufreq_init()
- Firmware: _PPC objects: acpi_processor_ppc_init()
- User: by setting policyX/scaling_[min|max]_freq
The minimum of the max frequency constraints is used to compute
the resulting maximum allowed frequency.
When enabling boost frequencies, the same frequency request object
(policy->max_freq_req) that handles requests from users is used.
As a result, when setting:
- scaling_max_freq
- boost
The last sysfs file used overwrites the request from the other
sysfs file.
To avoid this, create a per-policy boost_freq_req to save the boost
constraints instead of overwriting the last scaling_max_freq
constraint.
policy_set_boost() calls the cpufreq set_boost callback.
Update the newly added boost_freq_req request from there:
- whenever boost is toggled
- to cover all possible paths
In the existing .set_boost() callbacks:
- Don't update policy->max as this is done through the qos notifier
  cpufreq_notifier_max() which calls cpufreq_set_policy().
- Remove freq_qos_update_request() calls as the qos request is now
  done in policy_set_boost() and updates the new boost_freq_req.
$ ## Init state
scaling_max_freq:1000000
cpuinfo_max_freq:1000000
$ echo 700000 > scaling_max_freq
scaling_max_freq:700000
cpuinfo_max_freq:1000000
$ echo 1 > ../boost
scaling_max_freq:1200000
cpuinfo_max_freq:1200000
$ echo 800000 > scaling_max_freq
scaling_max_freq:800000
cpuinfo_max_freq:1200000
$ ## Final step:
$ ## Without the patches:
$ echo 0 > ../boost
scaling_max_freq:1000000
cpuinfo_max_freq:1000000
$ ## With the patches:
$ echo 0 > ../boost
scaling_max_freq:800000
cpuinfo_max_freq:1000000
Note:
cpufreq_frequency_table_cpuinfo() updates policy->min
and max from:
A.
cpufreq_boost_set_sw()
\-cpufreq_frequency_table_cpuinfo()
B.
cpufreq_policy_online()
\-cpufreq_table_validate_and_sort()
\-cpufreq_frequency_table_cpuinfo()
Keep these updates as some drivers expect policy->min and
max to be set through B.
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20260326204404.1401849-3-pierre.gondois@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 6e39ba4)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
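The difference between sharing one request object and keeping a separate boost request can be shown with a toy aggregator. This Python sketch is a simplified model, not the kernel's freq_qos implementation: the class and function names are hypothetical, and only the request names and the frequency values are taken from the commit message above.

```python
class MaxFreqQoS:
    """Toy aggregator: the effective maximum is the minimum of all requests."""

    def __init__(self):
        self._requests = {}

    def update(self, request_name, khz):
        self._requests[request_name] = khz

    def effective_max(self):
        return min(self._requests.values())


BASE_MAX_KHZ = 1_000_000   # non-boost maximum from the transcript
BOOST_MAX_KHZ = 1_200_000  # boost maximum from the transcript


def disable_boost_shared_request(user_limit_khz):
    """One request object: the last writer overwrites the other's value."""
    qos = MaxFreqQoS()
    qos.update("max_freq_req", BOOST_MAX_KHZ)   # boost enabled
    qos.update("max_freq_req", user_limit_khz)  # user writes scaling_max_freq
    qos.update("max_freq_req", BASE_MAX_KHZ)    # boost disabled, user limit lost
    return qos.effective_max()


def disable_boost_separate_request(user_limit_khz):
    """Separate boost request: the user limit survives boost toggling."""
    qos = MaxFreqQoS()
    qos.update("boost_freq_req", BOOST_MAX_KHZ)  # boost enabled
    qos.update("max_freq_req", user_limit_khz)   # user writes scaling_max_freq
    qos.update("boost_freq_req", BASE_MAX_KHZ)   # boost disabled
    return qos.effective_max()
```

For a user limit of 800000 kHz, the shared-request variant ends at 1000000 and the separate-request variant ends at 800000, which matches the two outcomes shown in the final step of the transcript.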
A recent change exposed a bug in the error path: if freq_qos_add_request(boost_freq_req) fails, min_freq_req may remain a valid pointer even though it was never successfully added. During policy teardown, this leads to an unconditional call to freq_qos_remove_request(), triggering a WARN. The current design allocates all three freq_req objects together, making the lifetime rules unclear and error handling fragile. Simplify this by allocating the QoS freq_req objects at policy allocation time. The policy itself is dynamically allocated, and two of the three requests are always needed anyway. This ensures consistent lifetime management and eliminates the inconsistent state in failure paths. Reported-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Fixes: 6e39ba4 ("cpufreq: Add boost_freq_req QoS request") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Tested-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Link: https://patch.msgid.link/a293f29d841b86c51f34699c6e717e01858d8ada.1774933424.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 9266b4d) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…support
Add a kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable
CPPC autonomous performance selection on all CPUs at system startup.
When autonomous mode is enabled, the hardware automatically adjusts
CPU performance based on workload demands using Energy Performance
Preference (EPP) hints.
When auto_sel_mode=1:
- Configure all CPUs for autonomous operation on first init
- Set EPP to performance preference (0x0)
- Use HW min/max_perf when available; otherwise initialize from caps
- Clamp desired_perf to bounds before enabling autonomous mode
- Hardware controls frequency instead of the OS governor
The boot parameter is applied only during first policy initialization.
Skip applying it on CPU hotplug to preserve runtime sysfs configuration.
This patch depends on patch [2] ("cpufreq: Set policy->min and max
as real QoS constraints") so that the policy->min/max set in
cppc_cpufreq_cpu_init() are not overridden by cpufreq_set_policy()
during init.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
(backported from https://lore.kernel.org/all/20260424201814.230071-1-sumitg@nvidia.com/)
[sforshee: adjust context]
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
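The bound selection and clamping steps listed in the commit above can be sketched as follows. This Python model is illustrative only: the function and parameter names are hypothetical, "available" is modelled as a non-zero register value, and `caps_min`/`caps_max` stand in for whichever capability values the driver actually uses.

```python
EPP_PERFORMANCE = 0x0  # EPP value named in the commit message


def init_auto_sel_bounds(hw_min, hw_max, caps_min, caps_max, desired):
    """Pick min/max bounds and clamp desired_perf into them.

    Assumption of this sketch: a hardware value of 0 means "not available",
    in which case the bound is initialized from the capability value.
    """
    min_perf = hw_min if hw_min else caps_min
    max_perf = hw_max if hw_max else caps_max
    # Clamp desired_perf to [min_perf, max_perf] before enabling auto_sel.
    clamped = max(min_perf, min(desired, max_perf))
    return min_perf, max_perf, clamped
```

For example, with no hardware bounds and capabilities of 10 and 100, a desired value of 5 is raised to 10. With hardware bounds of 20 and 80, a desired value of 90 is lowered to 80.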
Extract the QoS-related logic from cpufreq_policy_online() to make the function shorter and simpler. The logic is placed in cpufreq_policy_init_qos() and is now executed right after the following calls: - cpufreq_driver->init() - cpufreq_table_validate_and_sort() This prepares for the following patches, which will, in cpufreq_policy_init_qos(): - treat the policy->min/max values set by drivers as QoS requests. - set a default policy->min/max value for all policies. No functional change. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> (cherry picked from https://lore.kernel.org/lkml/20260511135538.522653-2-pierre.gondois@arm.com/) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
…l drivers
Some drivers set policy->min/max in their .init() callback.
cpufreq_set_policy() will ultimately override them through:
cpufreq_policy_online()
\-cpufreq_init_policy()
\-cpufreq_set_policy()
\-/* Set policy->min/max */
Thus the policy min/max values provided are only temporary.
There is an exception if CPUFREQ_NEED_INITIAL_FREQ_CHECK is set and:
cpufreq_policy_online()
\-__cpufreq_driver_target()
\-cpufreq_driver->target()
To prepare for a following patch that will remove all
policy->min/max initialization in the driver .init() callback
if the min/max value is equal to the cpuinfo.min/max_freq,
set a default policy->min/max value for all drivers.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
(cherry picked from https://lore.kernel.org/lkml/20260511135538.522653-3-pierre.gondois@arm.com/)
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Prior to [1], drivers were setting policy->min/max and
the value was used as a QoS constraint. After that change,
the values were only temporarily used: cpufreq_set_policy()
ultimately overriding them through:
cpufreq_policy_online()
\-cpufreq_init_policy()
\-cpufreq_set_policy()
\-/* Set policy->min/max */
This patch reinstates the initial behaviour. This will allow
drivers to request min/max QoS frequencies if desired.
For instance, the cppc driver advertises a lowest non-linear
frequency, which should be used as a min QoS value.
To avoid having drivers setting policy->min/max to default
values which are considered as QoS values (i.e. the reason
why [1] was introduced), remove the initialization of
policy->min/max in .init() callbacks wherever the
policy->min/max values are identical to the
policy->cpuinfo.min/max_freq.
Indeed, the previous patch ("cpufreq: Set default
policy->min/max values for all drivers") makes this initialization
redundant.
The only drivers where these values are different are:
- gx-suspmod.c (min)
- cppc-cpufreq.c (min)
- longrun.c
[1]
commit 521223d ("cpufreq: Fix initialization of min and
max frequency QoS requests")
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
(backported from https://lore.kernel.org/lkml/20260511135538.522653-4-pierre.gondois@arm.com/)
[sforshee: adjustments to amd-pstate for missing max_freq caching patch]
Signed-off-by: Seth Forshee <sforshee@nvidia.com>
Consider policy->min/max being set in the driver .init() callback as a QoS request. Impacted drivers are: - gx-suspmod.c (min) - cppc-cpufreq.c (min) - longrun.c (min/max) Update the documentation accordingly. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> (cherry picked from https://lore.kernel.org/lkml/20260511135538.522653-5-pierre.gondois@arm.com/) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
3ce775a to a1ffa95
v2 updates:
Testing:
Test results with Grace (no AUTO_SEL support):
Test results on DGX Spark:
Thanks Seth! I reviewed manually and with codex and did not find any issues of significance.
I'll take care of merging this.
Merged, closing MR.
This pull updates CPPC sauce patches enhancing autonomous selection with the upstream commits, which includes a number of changes as compared to the mailing list patches applied previously. It also adds some bug fixes as well as patches to add a per-policy boost_freq_req for PM QoS.