fix: prevent cgroup isolation churn during WLT polling #116
Open
Karabiner07 wants to merge 1 commit into intel:main
Conversation
Signed-off-by: Joy Philip Pilli <joyphilip.p2001@gmail.com>
@Karabiner07 not related to the PR itself, but can I ask how you managed to make lpmd work under family 6 model 197? For me, when I compile from source and run it, it says "Platform not supported yet!"
Author
I'll commit the required changes for the ARL 255H this weekend in my fork repo and I'll create a new PR.

Fix: Prevent cgroup isolation churn during WLT state flapping
Platform
Intel Core Ultra 7 255H (Arrow Lake-H, Family 6 Model 197)
Kernel: 6.17.0-22 — Ubuntu 24.04
CPU layout: 0–5 = P-cores, 6–13 = E-cores, 14–15 = L-cores
Problem
In Balanced (AUTO) mode with Mode=1 (cgroup v2 isolate), P-cores were not being parked despite the cgroup isolation appearing to apply correctly at startup. CPUs 0–5 remained active and scheduled work even though the config specified ActiveCPUs=6-15 for all WLT states.
Power Saver mode (PowersaverDef=1, LPMD_ON) worked correctly: P-cores parked at 400 MHz as expected. Only Balanced AUTO mode was affected.
Root Cause
process_cpu_isolate() in src/lpmd_cgroup.c unconditionally re-runs its full cgroup write sequence every time it is called.
The partition=member write, even for milliseconds, returns CPUs 0–5 to the root cgroup's effective CPU set. The kernel scheduler immediately places waking tasks on them.
In AUTO mode, WLT (Workload Type) hints from the hardware flap rapidly, with multiple transitions per second between UTIL_IDLE, UTIL_IDLE_BURSTY, and UTIL_IDLE_GFX_BUSY. Each transition causes need_enter() to return true (it compares state indices, not cpumask content), which calls enter_state(), which calls process_cgroup(), which calls process_cpu_isolate(), re-running the destructive write sequence every time.
In the intel_lpmd_config_F6_M197.xml config (implementing Arrow Lake support right now) and the Panther Lake M204 config, all four WLT states declare identical ActiveCPUs masks. So every WLT flip is pure churn: the cgroup is torn down and rebuilt with the same content, many times per second.
The P-cores are never stably parked: every partition=member write opens a scheduling window before the next partition=isolated closes it.
Power Saver was unaffected because LPMD_ON short-circuits to DEFAULT_ON in choose_next_state(), bypassing WLT iteration entirely. One stable cgroup entry, no flapping.
Fix
Added a file-static last_applied_cpumask cache to process_cgroup(). Before executing the cgroup write sequence, the function now compares the requested cpumask against the last successfully applied one using the existing cpumask_equal(). If the content is identical, the write sequence is skipped entirely.
The cache is reset in cgroup_cleanup() so a subsequent daemon start is never confused by stale state from a previous run.
Verification
After the fix, in Balanced mode on the 255H:
cat /sys/fs/cgroup/lpm/cpuset.cpus.partition → isolated (stable)
The log shows Skip cgroup: cpumask unchanged on WLT flips, and Process Cgroup fires only on genuine mask transitions, applying the correct isolation at each one.