thermald --adaptive throttles a Core Ultra 7 270K via cpufreq_cooling cur_state=3 + PL1=0W, triggered by BIOS DSDT passive trips at -274 °C #550

@mpcusack

Summary

On an Intel Core Ultra 7 270K (Arrow Lake, Gigabyte Z890 AERO G, Ubuntu 25.10, kernel 6.17.0-20-generic), thermald --adaptive, as shipped and enabled by default, starts clamping CPU performance via cpufreq_cooling, powercap, and intel_pstate sysfs within ~60 seconds of startup, despite the CPU sitting at 34–40 °C with zero thermal events. Specifically, it writes intel_pstate/max_perf_pct=68 and intel_pstate/no_turbo=1, sets cur_state=3 on every Processor cpufreq_cooling device, and zeroes the RAPL package PL1. The PL1=0W write is particularly destructive: with the MSR enable bit set, it clamps the CPU to LFM (~400 MHz) in hardware regardless of any HWP_REQUEST value. End-to-end impact on a 24-core stress-ng matrixprod benchmark: 4,640 bogo-ops/s with thermald running versus 82,805 bogo-ops/s with thermald absent, a 17.8× performance regression, with package power falling from ~320 W to ~18 W at identical ambient temperature.

The proximate cause appears to be two ACPI thermal zones (thermal_zone3 "TCPU_PCI" and thermal_zone4 "x86_pkg_temp") shipping passive trip points at -274000 mC — which is the kernel's encoding for an uninitialized 0 K ACPI value. thermald interprets the current 40 °C temperature as "above passive trip" (since 40 > -274) and escalates to maximum cooling. No valid thermal condition exists.

The kernel side (intel_pstate, HWP notification handler, thermal framework plumbing) is working correctly — confirmed with ftrace, bpftrace+BTF, and a full reboot test with thermald + power-profiles-daemon purged. The bug is entirely in thermald's decision-making on this hardware.

TL;DR asks

  1. thermald should detect and reject/ignore ACPI thermal zone trip points at physically impossible temperatures (e.g. < -50 °C). A trip temperature of -274 °C is the kernel's encoding for an unset Kelvin ACPI value and should never be treated as actionable.
  2. thermald should never write PL1 = 0 W with the enable bit set under any circumstance. There is no valid thermal scenario where the correct action is "clamp sustained package power to zero watts." That's a hardware wedge, not a cooling strategy.
  3. thermald should log its decisions at default verbosity — which trip fired on which zone, the current temperature vs the trip threshold, which cooling device it is engaging, and at what level. The current default output leaves no way to figure out what thermald --adaptive is doing without isolating it under bpftrace or similar.
  4. (Optional, related) The Linux kernel thermal core (x86_pkg_temp_thermal, int340x_thermal, or the shared trip-registration code) should validate ACPI-sourced trip temperatures against a sanity range before exposing them to user-space governors. Noted here for routing consideration since both intel_pstate and thermald live in the same maintainer's tree.

Reproduction

Cold boot on the hardware listed in "System details" below. Disable any local workarounds first; then:

# Confirm starting state — all healthy numbers (see "Good state" section below).
sudo /home/mpcusack/cpubench.sh baseline  # or any equivalent 24-core load benchmark

# Start thermald (if not already running).
sudo systemctl start thermald

# Wait 60 seconds.
sleep 60

# Observe the damage.
cat /sys/devices/system/cpu/intel_pstate/max_perf_pct            # → 68 (was 100)
cat /sys/devices/system/cpu/intel_pstate/no_turbo                # → 1 (was 0)
cat /sys/class/thermal/cooling_device13/cur_state                # → 3 (was 0)
cat /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw # → 0 (was 4095000000)
sudo rdmsr -p 0 0x774                                            # → 1717 (was 4000580d)

# Re-run benchmark.
sudo /home/mpcusack/cpubench.sh with-thermald

Benchmarking before thermald runs: 82,805 bogo/s on 24-core stress-ng matrixprod, ~320 W package, 5,388 MHz P-core / 4,701 MHz E-core. After thermald runs for 60 s: ~4,640 bogo/s, ~18 W package, 400 MHz every core.

Recovery without a reboot requires systemctl stop thermald, manually resetting cur_state=0 on all 24 Processor cooling devices, and writing the PL1 back to its default via /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw. Simply stopping thermald is not sufficient because it leaves the state it wrote in place; stopping it before it runs (e.g. systemctl mask thermald followed by a reboot) is the clean workaround.

Isolation test

Performed on 2026-04-09 with a clean repaired baseline, both thermald and power-profiles-daemon stopped, cur_state=0 on all cdevs, no_turbo=0, max_perf_pct=100, PL1=250 W, HWP_REQUEST=0x5824 on every CPU. Then only thermald was started and state was snapshotted at fixed intervals:

| Snapshot | HWP_REQ cpu0 | HWP_REQ cpu12 | no_turbo | max_perf_pct | cpu0 scaling_max | cdev13 |
| --- | --- | --- | --- | --- | --- | --- |
| baseline (both stopped) | 0x5824 | 0x5824 | 0 | 100 | 5,500,000 | 0 |
| thermald + 2 s | 0x5824 | 0x5824 | 0 | 100 | 5,500,000 | 0 |
| thermald + 12 s | 0x5824 | 0x5824 | 0 | 100 | 5,500,000 | 0 |
| thermald + 42 s | 0x3c3c | 0x2d2d | 1 | 68 | 3,700,000 | 0 |
| thermald + 62 s | 0x1717 | 0x1111 | 1 | 68 | 1,480,000 | 3 |

Between T+12s and T+42s, thermald:

  • wrote max_perf_pct=68 (= HWP_CAPABILITIES.Guaranteed / Highest × 100 = 60/88 × 100 ≈ 68.18, written as 68)
  • wrote no_turbo=1
  • caused scaling_max_freq to drop to base clock (3,700,000 kHz on P-cores, 3,200,000 on E-cores)
  • HWP_REQUEST.Max consequently recomputed via intel_pstate_set_policy to HWP_CAPABILITIES.Guaranteed per core type: 0x3c = 60 on P-cores, 0x2d = 45 on E-cores
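
The max_perf_pct figure in the first bullet can be reproduced with a short shell sketch. The function name perf_pct is hypothetical, and integer truncation is assumed only because the observed sysfs value was 68; the inputs are the HWP_CAPABILITIES values reported in this issue.

```shell
# Hypothetical helper: Guaranteed / Highest as an integer percentage.
# Integer division truncates, which matches the observed max_perf_pct=68.
perf_pct() (
    guaranteed=$1
    highest=$2
    echo $(( guaranteed * 100 / highest ))
)

perf_pct 60 88    # P-core: Guaranteed=0x3c=60, Highest=0x58=88; prints 68
```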

Between T+42s and T+62s, thermald:

  • set cur_state=3 on every Processor cpufreq_cooling device (level 3 of 3 = maximum throttle)
  • this added a freq_qos MAX constraint at freq_table[3] = 1,480,000 kHz on P-cores / 1,280,000 on E-cores
  • HWP_REQUEST recomputed again via the cpufreq policy-refresh path to 0x1717 (Min=Max=23) on P-cores / 0x1111 (Min=Max=17) on E-cores
  • also zeroed PL1 at some point in this window (exact write moment not captured)
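
The HWP_REQUEST values quoted in both lists can be decoded with a small helper. This is a minimal sketch: the function name decode_hwp_request is hypothetical, and the field layout (bits 7:0 Min, 15:8 Max, 23:16 Desired, 31:24 EPP) is the IA32_HWP_REQUEST layout from the Intel SDM.

```shell
# Hypothetical helper: split an IA32_HWP_REQUEST (MSR 0x774) value into fields.
# Layout: bits 7:0 Min, 15:8 Max, 23:16 Desired, 31:24 EPP.
decode_hwp_request() (
    v=$(( $1 ))
    printf 'min=%d max=%d desired=%d epp=0x%02x\n' \
        $(( v & 0xff )) $(( (v >> 8) & 0xff )) \
        $(( (v >> 16) & 0xff )) $(( (v >> 24) & 0xff ))
)

decode_hwp_request 0x4000580d   # healthy P-core value: min=13 max=88
decode_hwp_request 0x1717       # after cur_state=3:    min=23 max=23
```

rdmsr prints the value without a 0x prefix, so the prefix has to be added before passing it in.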

The CPU was at 34–40 °C throughout the entire isolation run. Ambient room temperature. Cooler: Noctua NH-D15, substantial thermal headroom. /proc/interrupts TRM and THR columns: zero. THERM_STATUS bit 2 (PROCHOT): clear. PERF_LIMIT_REASONS (MSR 0x64F): only log bits, no live limits until after thermald's PL1=0 write.

A separate isolation run confirmed that power-profiles-daemon makes a smaller, different change: it writes EPP=balance_performance on every CPU via sysfs within 2 s of startup, which routes through intel_pstate_set_policy and recomputes HWP_REQUEST with Min=Max=Guaranteed, but it does not touch cur_state, no_turbo, max_perf_pct, or PL1. Masking or purging both daemons is the clean fix; this report focuses on the thermald half since it is by far the more destructive one.

Ruled out

The following were exhaustively ruled out as causes during the investigation:

  • intel_pstate HWP notification handler (notify_hwp_interrupt, intel_pstate_notify_work) — four separate ftrace passes across idle, 24-core stress-ng, systemctl restart k3s, and sustained heavy load captured zero calls to either function while IA32_HWP_INTERRUPT = 0x05 (the hardware default).
  • Stale intel_pstate per-CPU cached state. bpftrace with kernel BTF on intel_pstate_update_perf_limits printed cpudata struct fields live: pstate.max_pstate=60, turbo_pstate=88, max_freq=3,700,000, turbo_freq=5,500,000, scaling=62711. All correct. intel_pstate_update_perf_limits faithfully propagates whatever policy->max it receives; the stale value sits one level up in cpufreq's freq_qos constraint list.
  • HWP firmware / PMC misbehavior. HWP_CAPABILITIES.Highest is stable at 0x58 on P-cores and 0x42 on E-cores across every observation. HWP_CAPABILITIES.Guaranteed is stable at 0x3c on P-cores and 0x2d on E-cores. No HWP interrupt ever fires. The PMC is correctly reporting the hardware's actual capability.
  • BIOS settings. All power-related BIOS knobs were exhaustively tested: PerfDrive profiles (Default Performance, Default Extreme, Optimization), Energy Efficient Turbo (on/off), Race to Halt (on/off), Intel Turbo Boost Technology (Auto/Enabled), Turbo Power Limit (Auto/Enabled), CPU Flex Ratio, C-States, Package C State Limit, ErP, Platform Power Management. None of them affect the throttle when thermald is running, and none of them are needed when thermald is absent.
  • RAPL hardware limits. Pre-thermald: MSR_PKG_POWER_LIMIT = 0x43fff8001bfff8 → PL1 = 4,095 W (BIOS default, effectively unlimited), PL2 = 4,095 W. MSR_PP0_POWER_LIMIT = 0. MSR_PLATFORM_POWER_LIMIT bit 15 (enable) = 0. MSR_VR_CURRENT_CONFIG = 0x80000d48 → IccMax = 425 A (above the 270K rated 307 A). MSR_CONFIG_TDP_NOMINAL = 0x25 (nominal ratio 37, correct). No hardware-level limit is asserting.
  • Thermal sensor readings. All hwmon sensors, MSR 0x19C IA32_THERM_STATUS, MSR 0x1B1 IA32_PACKAGE_THERM_STATUS, and /sys/class/thermal/thermal_zone*/temp agree: CPU is at 34–40 °C, no trip point above 85 °C (where valid trip points exist) has ever been crossed.
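
The PL1 and PL2 figures in the RAPL bullet above can be re-derived from the raw MSR_PKG_POWER_LIMIT value. The sketch below assumes a power unit of 1/8 W (MSR_RAPL_POWER_UNIT bits 3:0 = 3), which was not captured in this report and should be confirmed with rdmsr 0x606; the function name is hypothetical.

```shell
# Hypothetical helper: decode PL1/PL2 from an MSR_PKG_POWER_LIMIT (0x610) value.
# ASSUMPTION: power unit is 1/8 W (MSR_RAPL_POWER_UNIT bits 3:0 = 3).
# Layout: bits 14:0 PL1, bit 15 PL1 enable, bits 46:32 PL2, bit 47 PL2 enable.
decode_pkg_power_limit() (
    v=$(( $1 ))
    units_per_watt=8
    pl1=$(( (v & 0x7fff) / units_per_watt ))
    pl1_en=$(( (v >> 15) & 1 ))
    pl2=$(( ((v >> 32) & 0x7fff) / units_per_watt ))
    pl2_en=$(( (v >> 47) & 1 ))
    echo "PL1=${pl1}W enabled=${pl1_en} PL2=${pl2}W enabled=${pl2_en}"
)

decode_pkg_power_limit 0x43fff8001bfff8   # PL1=4095W enabled=1 PL2=4095W enabled=1
```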

Why thermald's --adaptive fires

thermal_zone3 (TCPU_PCI) and thermal_zone4 (x86_pkg_temp) both expose passive trip points at -274000 mC (= -274 °C). This is the kernel's marker for an unset trip rather than a converted reading: 0 K converted to millicelsius would be -273150 mC, whereas -274000 matches the THERMAL_TEMP_INVALID sentinel defined in include/linux/thermal.h, which thermal drivers appear to assign when firmware supplies no usable trip temperature.

$ for tz in /sys/class/thermal/thermal_zone*; do
    type=$(cat $tz/type); temp=$(cat $tz/temp)
    echo "$(basename $tz)  $type  temp=$temp"
    for trip in $tz/trip_point_*_type; do
      n=$(basename $trip | sed 's/trip_point_\([0-9]*\)_type/\1/')
      tt=$(cat $trip); tv=$(cat $tz/trip_point_${n}_temp 2>/dev/null)
      printf "  trip %s: type=%s temp=%s\n" "$n" "$tt" "$tv"
    done
done
# (output excerpt, only zones with passive trips shown)
thermal_zone3  TCPU_PCI    temp=39000
  trip 0: type=passive temp=-274000
thermal_zone4  x86_pkg_temp temp=41000
  trip 0: type=passive temp=-274000
  trip 1: type=passive temp=-274000

A sane thermal governor would refuse to act on a passive trip at -274 °C: no real CPU is or can be "above" that value in any useful sense. thermald --adaptive instead treats the current 40 °C reading as "above passive trip" and escalates cooling to maximum on every available device.

The underlying cause of these bogus trip values is a BIOS DSDT bug on the Gigabyte Z890 AERO G (BIOS F19, release 2026-02-02, the latest stable as of this report). Gigabyte has shown no interest in fixing Linux-specific BIOS issues on this board (the _GPE.AL6B.WAK0 ACPI error flood, which required an acpi_osi="Windows 2022" kernel cmdline workaround, is still present in F19). Asking Gigabyte to fix the DSDT is not a realistic mitigation.

That leaves thermald as the only layer that can realistically defend against this. A simple validation — "reject trip points with temperatures outside -50 °C .. +200 °C" — would fix this bug and provide defensive hardening against any BIOS with similar DPTF issues across the entire Arrow Lake / Lunar Lake family, of which there appear to be several (see "related reports" below).
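
As an illustration of the proposed validation, the shell sketch below flags trip points outside the -50 °C .. +200 °C range. It is not thermald code: both function names are hypothetical, and the bounds are simply the ones suggested above.

```shell
# Hypothetical check: is a trip temperature (millidegrees Celsius) plausible?
# Bounds follow the -50 °C .. +200 °C range proposed in this report.
trip_is_plausible() {
    [ "$1" -ge -50000 ] && [ "$1" -le 200000 ]
}

# Print every implausible trip under a sysfs-style thermal root.
list_bogus_trips() (
    root=${1:-/sys/class/thermal}
    for f in "$root"/thermal_zone*/trip_point_*_temp; do
        [ -r "$f" ] || continue
        t=$(cat "$f" 2>/dev/null) || continue
        trip_is_plausible "$t" || echo "$f $t"
    done
)

list_bogus_trips
```

On the affected board this would be expected to list the three -274000 entries shown in the excerpt above. A daemon applying the same test could skip those trips rather than acting on them.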

System details

| Field | Value |
| --- | --- |
| CPU | Intel Core Ultra 7 270K ("Plus") |
| CPUID family:model:stepping | 0x6:0xC6:0x2 (family 6, model 198, stepping 2) |
| Cores | 8 P-cores + 16 E-cores (no SMT, 24 logical = 24 physical) |
| Motherboard | Gigabyte Z890 AERO G |
| BIOS version | F19, 2026-02-02 (latest stable from Gigabyte as of this report) |
| RAM | 92 GB DDR5 |
| Cooler | Noctua NH-D15 |
| OS | Ubuntu 25.10 (Questing Quokka) |
| Kernel | 6.17.0-20-generic |
| thermald version | 2.5.9-1ubuntu0.1 |
| power-profiles-daemon version | 0.30-1.1 |
| scaling_driver | intel_pstate (active mode, default) |
| /etc/thermald/thermal-conf.xml | not present (stock config) |
| thermald invocation | /usr/sbin/thermald --systemd --dbus-enable --adaptive (Ubuntu's shipped systemd unit) |

Good-state reference (with thermald masked/purged)

Fresh boot, thermald.service and power-profiles-daemon.service both apt purged, no other userspace intervention:

HWP_INTERRUPT cpu0      = 0x5               # kernel default, bits 0 + 2
HWP_REQUEST   cpu0      = 0x4000580d        # Max=0x58=88 (full turbo), Min=0x0d=13 (LFM), EPP=0x40
HWP_REQUEST   cpu12     = 0x4000420c        # E-core Max=0x42=66 (full E-turbo), Min=0x0c=12
HWP_CAPABILITIES cpu0   = 0x11d3c58         # Highest=0x58 Guaranteed=0x3c MostEff=0x1d Lowest=0x01
HWP_CAPABILITIES cpu12  = 0x10c2d42         # Highest=0x42 Guaranteed=0x2d MostEff=0x0c Lowest=0x01
no_turbo=0 max_perf_pct=100 min_perf_pct=14 status=active
PKG_POWER_LIMIT (0x610) = 0x43fff8001bfff8  # PL1 ≈ 4095 W, PL2 ≈ 4095 W, both enabled
PL1 sysfs               = 4095000000 μW     # BIOS default, effectively unlimited
cpu0 cpuinfo_max_freq   = 5500000
cpu12 cpuinfo_max_freq  = 4700000
cpu0 scaling_max_freq   = 5500000
cpu12 scaling_max_freq  = 4700000
cdev13..36 (Processor) cur_state=0
PERF_LIMIT_REASONS 0x64F = 0x18421000       # only log bits set, no live limits

intel_pstate correctly sets HWP_REQUEST.Max = HWP_CAPABILITIES.Highest per core type on its own at boot. No per-CPU-type workaround code is necessary — this is exactly what intel_pstate_hybrid_hwp_adjust is for, and it works correctly when nothing is interfering.

Benchmark impact

Four stress-ng workloads under turbostat, 10 s each, wrapped in a small bash runner that snapshots intel_pstate state plus the full HWP_REQUEST / HWP_CAPABILITIES / PKG_POWER_LIMIT MSRs alongside the bogo-ops result so every run is reproducible against the system state it produced.

| Test | thermald running | thermald purged | Ratio |
| --- | --- | --- | --- |
| 1-core matrixprod | 236.57 bogo/s | 3,174.69 bogo/s | 13.4× |
| 1-core crc16 | 120.54 bogo/s | 1,651.65 bogo/s | 13.7× |
| 24-core matrixprod | 4,640.86 bogo/s | 82,805.02 bogo/s | 17.8× |
| 24-core float | 4,469.28 bogo/s | 78,011.58 bogo/s | 17.5× |
| PkgWatt under 24c load | 17.81 W | 320.51 W | 18.0× |
| Bzy_MHz under 24c load | 400 | 4,931 | 12.3× |

Idle package power drops from ~18 W under load-clamp to ~14 W truly idle after the fix, because the kernel default HWP_REQUEST.Min = 0x0d = 13 (LFM) lets the cores enter deeper C-states than the prior forced-on state allowed.

Approaches that did not fix the throttle

  • intel_pstate=passive kernel cmdline
  • intel_pstate=disable kernel cmdline
  • wrmsr 0x774 0x5858 direct MSR write (reverted within seconds as long as thermald was running)
  • RAPL PL1 override via powercap sysfs (reverted to 0 by thermald)
  • BIOS profile iteration (Intel Default Performance → Extreme → Gigabyte Optimization, with and without EE Turbo + Race-to-Halt off)
  • wrmsr 0x773 0x0 (disable IA32_HWP_INTERRUPT) — works as a workaround because it suppresses something early in boot that the kernel HWP notification handler would have acted on, but is not the actual cause of the steady-state throttle
  • intel_powerclamp blacklist — addressed the idle_inject/N symptom of a related thermald cascade (intel/thermal_daemon#263), not the root cause
  • Assorted cooling device cur_state=0 writes — cleared by thermald on the next poll cycle

The only thing that fixed it durably was not running thermald.

Workaround

sudo apt purge thermald power-profiles-daemon
sudo reboot

thermald is only Recommends: from linux-image-generic (not Depends:), so apt purge removes it without affecting the kernel. power-profiles-daemon's reverse-deps (gnome-control-center, gnome-shell, tuned-ppd) are typically not installed on a headless server, so purging it is also clean.

The reboot after purge is load-bearing: simply stopping thermald at runtime leaves the damage in place. Every cooling device it pinned at cur_state=3, the zeroed PL1, the max_perf_pct=68, the no_turbo=1 — all stay written to their respective sysfs and MSR locations after systemctl stop thermald. Recovering without a reboot requires writing cur_state=0 on every Processor cpufreq_cooling device, restoring PL1 via /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw, writing 100 to max_perf_pct and 0 to no_turbo, and optionally writing HWP_REQUEST directly via wrmsr 0x774. A reboot from a purged state is simpler and produces a clean fresh-boot kernel init.
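
For cases where a reboot is not possible, the manual recovery described above could be scripted roughly as follows. This is a sketch under stated assumptions: thermald is already stopped, the cooling devices report the type string "Processor", and the function name and its parameters are hypothetical. It must run as root on a real system.

```shell
# Sketch of the manual recovery. ASSUMPTIONS: thermald is already stopped and
# the relevant cooling devices report type "Processor".
# $1 = sysfs root (default /sys), $2 = PL1 to restore, in microwatts.
restore_cpu_state() (
    root=${1:-/sys}
    pl1_uw=$2

    # Release every Processor cooling device; leave fans and others alone.
    for cdev in "$root"/class/thermal/cooling_device*; do
        [ -r "$cdev/type" ] || continue
        if [ "$(cat "$cdev/type")" = "Processor" ]; then
            echo 0 > "$cdev/cur_state"
        fi
    done

    # Restore the package power limit and the intel_pstate knobs.
    echo "$pl1_uw" > "$root/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"
    echo 100 > "$root/devices/system/cpu/intel_pstate/max_perf_pct"
    echo 0 > "$root/devices/system/cpu/intel_pstate/no_turbo"
)
```

Calling restore_cpu_state /sys 4095000000 would restore the BIOS-default PL1 reported in this issue; HWP_REQUEST is then recomputed by intel_pstate or can be written directly via wrmsr 0x774 as noted above.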

Masking via systemctl mask thermald power-profiles-daemon is a weaker alternative: it prevents the services from starting but leaves the packages installed, which means a future dist-upgrade pulling them back in as a Recommends: dependency can silently un-mask them on reload. apt purge is the stronger form and is preferred when nothing else on the system requires the packages.

Related reports (same symptom family)

  • Tom's Hardware — Intel 258V (Lunar Lake) stuck at 400 MHz on Linux 6.14.3 (Fedora, openSUSE TW). Same "400 MHz on every core under any load" pattern, does not reproduce on Windows on the same ASUS Zenbook S 14. Thread open, still unresolved as of this report.
  • Tom's Hardware — same user, follow-up on 6.14.4. Tried governor changes, power cycling, checking BD_PROCHOT. Still unresolved.
  • Manjaro — Core Ultra 9 275HX (Arrow Lake-HX) stuck at 1.6–1.8 GHz on kernel 6.17. Closed and marked solved via a BIOS toggle: enabling "Smart Power" on a Lenovo laptop restored normal turbo behavior. "Smart Power" is a Lenovo-specific DPTF configuration switch, so the fix was platform-BIOS-level rather than a Linux-side change. This strongly suggests a DPTF data-shape issue on the Arrow Lake / Lunar Lake family: the upstream Linux thermal stack (or thermald, or both) handles some DPTF configurations correctly and others catastrophically, and which branch you hit depends on the OEM's BIOS DPTF table. The Gigabyte Z890 board in this report is the "bad DPTF" branch; the Lenovo 275HX with Smart Power enabled is the "good DPTF" branch. None of these users captured MSR dumps, so the comparison is structural rather than confirmed — but the common thread is Intel Core Ultra 200-series family + Linux kernel 6.14–6.17 + Linux-only symptom + DPTF-adjacent fix when a fix exists at all.

A thermald-side fix that validates ACPI trip temperatures against a sanity range would resolve the Gigabyte Z890 case directly. If the Lenovo 275HX case shares the same cause, which is unconfirmed without MSR dumps, the same fix might also cover it without requiring the BIOS Smart Power toggle.

Additional data available

Point-in-time snapshots from the investigation:

  • Full ftrace captures (notify_hwp_interrupt / intel_pstate_notify_work / sysfs-update path) — ~5,900 lines
  • bpftrace captures of intel_pstate_update_perf_limits with full cpudata struct dumps via BTF
  • journalctl -u thermald across multiple boots
  • Full cpubench.log with 15+ labeled runs covering every intermediate state
  • Full dmesg from a fresh boot (pre-thermald and post-thermald)
  • sudo turbostat --quiet --debug --interval 1 --num_iterations 5 output in both states
  • Full MSR dumps: 0x19C, 0x1B1, 0x1FC, 0x648, 0x64B, 0x64F, 0x770, 0x771, 0x774, 0x610, 0x611, 0x1A2, 0x1AD, 0x601, 0x638, 0x65C, 0x19A, 0x150

Continuous time-series at 60-second sample interval:

  • per-CPU HWP_REQUEST (Min / Max / Desired / EPP) and HWP_CAPABILITIES (Highest / Guaranteed / Most_Efficient / Lowest)
  • cur_state for all 24 Processor cpufreq_cooling devices
  • intel_pstate sysfs knobs (no_turbo, max_perf_pct, min_perf_pct)
  • RAPL PL1 in watts
  • thermald.service / power-profiles-daemon.service active state

The host is a production homelab server currently running with thermald purged, and can be taken in and out of the investigation on short notice for patch testing, instrumented thermald builds, or newer-kernel tests.
