feat(governor): Lane H PR-3c4 — wire apply_cascade_step_to_policy + base/active split + restore-speculation-one-step-later#1365

Merged: joelteply merged 1 commit into canary from feat/substrate-governor-pr3c4-wire-cascade-policy on May 17, 2026

@joelteply

Summary

Lane H PR-3c4 per GENOME-FOUNDRY-SENTINEL #1327 Part 11. Stacks on #1364 (PR-3c3 apply_cascade_step_to_policy, MERGED).

PR-3c3 shipped the pure function. PR-3c4 wires it into LocalSubstrateGovernor with the base-vs-active policy split + the spec's restore-speculation-one-step-later anti-oscillation rule.

What ships

src/workers/continuum-core/src/governor/local.rs:

  • base_policy: Mutex<GovernorPolicy> — canonical un-throttled policy (cascade_step always 0). Cascade transitions re-derive active from base via apply_cascade_step_to_policy, never from already-throttled current. Addresses PR-3c3's not-reversible-from-transformed documented limitation.
  • SnapshotState.pending_speculation_retreat: bool — tracks whether cascade just retreated; if true, NEXT Hold or Retreat restores speculation to lower-step value. First retreat keeps speculation at higher-step (pre-retreat) value for one cycle.
  • new() initializes base_policy from initial_policy (cascade_step normalized to 0 on base).
  • try_hardware_detected() refreshes base + resets cascade to step 0 + clears pending marker. New hardware = fresh start.
  • on_pressure_signal() rewired: derive active from base + bump policy_version + apply restore-speculation-one-step-later on retreat path + clear marker on advance + deliver restoration on Hold-with-pending-marker.
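The derive-from-base rule in the list above can be illustrated with a minimal sketch. The `Policy` struct, its `max_personas` field, and the `apply_step` transformation are hypothetical stand-ins, not the real `GovernorPolicy` or `apply_cascade_step_to_policy`; the sketch only assumes that a step transformation is a pure function of the policy it is given.

```rust
/// Hypothetical, simplified policy; the real GovernorPolicy has more fields.
#[derive(Clone, Debug, PartialEq)]
struct Policy {
    cascade_step: u8,
    max_personas: u32, // assumed throttled field, for illustration only
}

/// Hypothetical stand-in for apply_cascade_step_to_policy:
/// each cascade step removes one persona, with a floor of 1.
fn apply_step(source: &Policy, step: u8) -> Policy {
    Policy {
        cascade_step: step,
        max_personas: source.max_personas.saturating_sub(u32::from(step)).max(1),
    }
}

/// Derive the active policy from the un-throttled base, never from the
/// previously derived (already throttled) active policy.
fn derive_active(base: &Policy, step: u8) -> Policy {
    apply_step(base, step)
}
```

With a base of 4 personas, deriving step 2 and then step 0 from the base returns exactly the base values. Applying the step-2 transformation to the already-throttled result instead would shrink the value a second time, which is the compounding that the base/active split is meant to avoid.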

Restore-speculation-one-step-later (spec rationale)

Speculation thrash is the most user-visible cascade flapping. By keeping speculation throttled for ONE EXTRA cycle after the cascade retreats, we dampen the most observable form of oscillation while letting the rest of the policy (tier sizes, cadence, concurrency) restore immediately. Cost: one cycle of slightly-throttled speculation. Benefit: no observable flicker between Aggressive ↔ Balanced.

Test pins the behavior: Advance 0→1 drops Aggressive→Balanced; Retreat 1→0 KEEPS Balanced; next Hold RESTORES Aggressive.
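The sequence the test pins can be sketched as a small state machine. This is an illustration under stated assumptions, not the code in local.rs: the `Cascade` struct, the `Decision` enum, and the per-step mapping in `speculation_for_step` are hypothetical; only the `pending_speculation_retreat` marker and the Aggressive/Balanced values come from the description above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Speculation { Aggressive, Balanced }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Decision { Advance, Hold, Retreat }

/// Assumed mapping: step 0 is Aggressive, any throttled step is Balanced.
fn speculation_for_step(step: u8) -> Speculation {
    if step == 0 { Speculation::Aggressive } else { Speculation::Balanced }
}

struct Cascade {
    step: u8,
    speculation: Speculation,
    pending_speculation_retreat: bool,
}

impl Cascade {
    fn new() -> Self {
        Cascade { step: 0, speculation: Speculation::Aggressive, pending_speculation_retreat: false }
    }

    fn apply(&mut self, decision: Decision) {
        match decision {
            Decision::Advance => {
                // New pressure re-throttles speculation immediately and clears the marker.
                self.step = self.step.saturating_add(1);
                self.speculation = speculation_for_step(self.step);
                self.pending_speculation_retreat = false;
            }
            Decision::Retreat if self.step > 0 => {
                // Step drops now; speculation keeps the pre-retreat value for one cycle.
                let prev_step = self.step;
                self.step -= 1;
                self.speculation = speculation_for_step(prev_step);
                self.pending_speculation_retreat = true;
            }
            // Hold, or a Retreat with no lower step: deliver any pending restoration.
            _ => {
                if self.pending_speculation_retreat {
                    self.speculation = speculation_for_step(self.step);
                    self.pending_speculation_retreat = false;
                }
            }
        }
    }
}
```

Feeding this sketch Advance, Retreat, Hold reproduces the pinned sequence: Balanced after the advance, still Balanced after the retreat to step 0, and Aggressive only after the following Hold.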

Test plan

29 passing on cargo test --lib --features metal,accelerate governor::local:: (22 prior + 7 new for PR-3c4):

  • advance_derives_active_from_base_with_step_transformations
  • emergency_advance_applies_full_throttle_transformations (step 5 cumulative: tier_sizes shrunk, federation maxed, consolidation Manual, speculation dropped, personas-1)
  • retreat_holds_speculation_for_one_more_cycle (anti-oscillation rule pinned)
  • advance_during_pending_retreat_clears_marker
  • hardware_detected_refreshes_base_and_resets_cascade
  • advance_then_retreat_returns_to_base_values_modulo_speculation_dampening (proves derive-from-base prevents compounding transformations — was PR-3c3's not-reversible warning)
  • Helpers: policy_with_l1, policy_with_l1_nvidia

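Stack

  • #1345 / #1350 / #1352 / #1354 / #1356 / #1360 / #1364: Lane H PRs MERGED
  • This PR (PR-3c4): wire apply_cascade_step_to_policy + base/active split + restore-speculation-one-step-later
  • Future PR-3d: file watcher (notify crate), hot-reload policy file changes via set_candidates
  • Future PR-4: PressureBroker → governor wiring (subscribe to typed pressure events from broker)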

VDD evidence

N/A — wiring + state machine. Evidence with PR-4 + harness measurements when real pressure flows + downstream consumers read throttled policy fields.

Coordination

Explicit claim posted 00:40Z; codex on demand-aligned-recall PR-1 per their 00:40:22Z broadcast. claude-tab-1 on whatever-next. No collision.

…ase/active split + restore-speculation-one-step-later

Stacks on #1364 (PR-3c3 apply_cascade_step_to_policy, MERGED).

PR-3c3 shipped the pure function. PR-3c4 wires it into
LocalSubstrateGovernor with the base-vs-active policy split + the
spec's restore-speculation-one-step-later anti-oscillation rule.

What changed in local.rs:

- LocalSubstrateGovernor.base_policy: Mutex<GovernorPolicy> field
  added. Holds the canonical un-throttled policy (cascade_step always
  0). Cascade transitions re-derive active from base via
  apply_cascade_step_to_policy, never from the already-throttled
  current. This addresses PR-3c3's not-reversible-from-transformed
  documented limitation.

- SnapshotState.pending_speculation_retreat: bool added. Tracks
  whether the cascade just retreated; if true, the NEXT Hold or
  Retreat restores speculation to the lower-step value. The first
  retreat keeps speculation at the higher-step (pre-retreat) value
  for one more cycle.

- new() initializes base_policy from the supplied initial_policy
  (cascade_step normalized to 0 on the base; active keeps the supplied
  cascade_step).

- try_hardware_detected() refreshes base_policy + resets cascade
  (step 0, last_step_change_ms now, pending_speculation_retreat
  cleared). New hardware = fresh start; existing pressure state
  discarded.

- on_pressure_signal() rewired:
  * Same time-in-step gate as PR-3c2 (Advance from step > 0 within
    MIN_TIME_IN_STEP_MS Hold; emergency bypasses; retreat never gated)
  * On step change: clone base_policy + call apply_cascade_step_to_policy
    + bump policy_version + update committed_at_ms
  * On retreat: also apply prev_step's speculation to next_policy
    (one-step-later semantics) + set pending_speculation_retreat
  * On Advance after pending-retreat: clear marker (new pressure
    re-throttles speculation immediately)
  * On Hold with pending marker: deliver the restoration (publish
    new policy with current_step's speculation; clear marker)
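The time-in-step gate in the first sub-bullet above could be expressed as a pure function along these lines. This is a hedged sketch: the `gate` function, its parameters, the `Decision` enum, and the 5000 ms value are assumptions for illustration; only the name MIN_TIME_IN_STEP_MS and the three rules (gated Advance from step > 0, emergency bypass, ungated retreat) come from the text.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Decision { Advance, Hold, Retreat }

/// Assumed value; the real constant lives in the governor module.
const MIN_TIME_IN_STEP_MS: u64 = 5_000;

/// Downgrade an Advance to Hold when the cascade has not spent long enough
/// in a throttled step. Emergencies bypass the gate; Retreat and Hold pass through.
fn gate(requested: Decision, current_step: u8, ms_in_step: u64, emergency: bool) -> Decision {
    match requested {
        Decision::Advance
            if current_step > 0 && ms_in_step < MIN_TIME_IN_STEP_MS && !emergency =>
        {
            Decision::Hold
        }
        other => other,
    }
}
```

Keeping the gate separate from the policy derivation means the step-change path (clone base, apply step, bump version) only runs when the gated decision actually changes the step.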

Restore-speculation-one-step-later rationale (from spec):

  Speculation thrash is the most user-visible cascade flapping. By
  keeping speculation throttled for ONE EXTRA cycle after the cascade
  retreats, we dampen the most observable form of oscillation while
  letting the rest of the policy (tier sizes, cadence, concurrency)
  restore immediately. The cost is one cycle of slightly-throttled
  speculation; the benefit is no observable flicker between
  Aggressive and Balanced (or whatever pair the cascade is bouncing
  between).

Failure-mode discipline:

- Base policy is the ONLY source of truth for transformations.
  Active is always derived; never mutated in place.
- Restore-one-step-later is typed (bool marker, not a magic time
  comparison or a sentinel value).
- Hardware change wipes pending retreat marker — new hardware = clean
  slate; old cascade state doesn't bleed into new policy.

Tests: 29 passing on cargo test --lib --features metal,accelerate
governor::local:: (22 prior + 7 new for PR-3c4)

NEW (7):
- advance_derives_active_from_base_with_step_transformations
- emergency_advance_applies_full_throttle_transformations (full
  step-5 cumulative: tier_sizes shrunk, federation maxed,
  consolidation Manual, speculation dropped, personas-1)
- retreat_holds_speculation_for_one_more_cycle (anti-oscillation rule
  pinned: Advance 0→1 drops Aggr→Balanced; Retreat 1→0 KEEPS Balanced;
  next Hold RESTORES Aggressive)
- advance_during_pending_retreat_clears_marker
- hardware_detected_refreshes_base_and_resets_cascade
- advance_then_retreat_returns_to_base_values_modulo_speculation_dampening
  (proves derive-from-base prevents compounding transformations —
  was PR-3c3's not-reversible warning)
- (helpers: policy_with_l1, policy_with_l1_nvidia)

Stack:
- #1345 / #1350 / #1352 / #1354 / #1356 / #1360 / #1364 — Lane H PRs MERGED
- This PR (PR-3c4): wire apply_cascade_step_to_policy + base/active
  split + restore-speculation-one-step-later
- Future PR-3d: file watcher (notify crate) — hot-reload policy file
  changes via set_candidates
- Future PR-4: PressureBroker → governor wiring (subscribe to typed
  pressure events from broker)

VDD evidence N/A — wiring + state machine. Evidence with PR-4 +
harness measurements when real pressure flows + downstream consumers
read throttled policy fields.

Coordination: explicit claim posted 00:40Z; codex on demand-aligned-
recall PR-1 per their 00:40:22Z broadcast. claude-tab-1 on whatever-
next. No collision.
joelteply force-pushed the feat/substrate-governor-pr3c4-wire-cascade-policy branch from 4279690 to 272cd7f on May 17, 2026 02:24
joelteply merged commit 7995dcb into canary on May 17, 2026 (3 checks passed)
joelteply deleted the feat/substrate-governor-pr3c4-wire-cascade-policy branch on May 17, 2026 02:25
joelteply pushed a commit that referenced this pull request May 17, 2026
… bridge

Pure-function bridge between PressureBroker's PressureAlert surface
(disk/memory pool eviction events) and the governor's typed
PressureSignal cascade input. Per GENOME-FOUNDRY-SENTINEL.md Part 11
line 1121: "PressureBroker informs the SubstrateGovernor. Pressure
signals from the broker drive the governor's adjustment cascade."

Scope:
- `alert_to_signal(&PressureAlert) -> Option<PressureSignal>` — pure
  mapping. High/Critical tier → SystemMemHigh{used_pct}; Normal/
  Warning/unknown → None.
- `governor_alert_sink(Arc<dyn SubstrateGovernor>) -> AlertSink` —
  factory that wraps a governor as an AlertSink the broker can register
  via `PressureBroker::add_alert_sink`. Sink derives the signal and
  forwards via `governor.on_pressure_signal` when Some; drops when None.
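A minimal sketch of the mapping described in the Scope list, under stated assumptions: the `PressureAlert` fields (`tier_name`, `pressure` as a 0.0 to 1.0 fraction) and the lowercase tier strings are hypothetical, and `PressureSignal` is reduced to the single variant named above. The real types in the broker and governor modules may differ.

```rust
/// Reduced to the one variant this mapping produces.
#[derive(Debug, PartialEq)]
enum PressureSignal {
    SystemMemHigh { used_pct: u8 },
}

/// Hypothetical alert shape; field names are assumptions.
struct PressureAlert {
    tier_name: String,
    pressure: f32, // fraction of budget in use, nominally 0.0..=1.0
}

/// Pure mapping: high/critical tiers produce a signal, everything else is None.
fn alert_to_signal(alert: &PressureAlert) -> Option<PressureSignal> {
    match alert.tier_name.as_str() {
        "high" | "critical" => {
            // Clamp before converting so out-of-range snapshots land on 0 or 100.
            let used_pct = (alert.pressure.clamp(0.0, 1.0) * 100.0).round() as u8;
            Some(PressureSignal::SystemMemHigh { used_pct })
        }
        // normal, warning, and unknown tier names map to None explicitly.
        _ => None,
    }
}
```

Because the function takes a reference and returns an `Option`, it stays free of side effects and can be tested without a broker or governor instance.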

NOT in this PR (deferred to PR-5):
- Wiring the sink into PressureBrokerModule's boot path. The bridge is
  the data-side primitive; the wiring is a separate concern.
- Pool-name-aware mapping (vram → VRAMHigh, etc.). Today's broker pools
  are all memory-adjacent (Docker disk, HF cache, future VRAM via
  GpuMemoryManager); SystemMemHigh is the conservative single-mapping
  the cascade reacts to identically. Refinement when pool tier_name
  conventions stabilize.

Discipline:
- No silent default-on-error. Mapping is total — every alert maps to
  either Some(signal) or None explicitly.
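- Pressure clamped to [0.0, 1.0] before percent conversion so transient
  over-budget snapshots map to 100% and negative artifacts map to 0%.
  A float-to-int `as u8` cast saturates at 0 and 255 rather than
  wrapping, so without the clamp an over-budget snapshot would surface
  as an out-of-range percentage (for example 150%).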
- Sink forwards via `Arc<dyn SubstrateGovernor>` (object-safe trait) so
  the bridge does not depend on LocalSubstrateGovernor concretely.
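The sink factory and the object-safe forwarding described above might look roughly like this. Everything except the names `governor_alert_sink`, `alert_to_signal`, `AlertSink`, `SubstrateGovernor`, and `on_pressure_signal` is an assumption: the trait is cut down to one method with a guessed signature, `AlertSink` is assumed to be a boxed closure, and `Recorder` is a hypothetical test double.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq)]
enum PressureSignal {
    SystemMemHigh { used_pct: u8 },
}

/// Hypothetical alert shape; field names are assumptions.
struct PressureAlert {
    tier_name: String,
    pressure: f32,
}

/// Cut-down, object-safe stand-in for the real trait (signature assumed).
trait SubstrateGovernor: Send + Sync {
    fn on_pressure_signal(&self, signal: PressureSignal);
}

/// Assumed sink shape: a boxed callback the broker invokes per alert.
type AlertSink = Box<dyn Fn(&PressureAlert) + Send + Sync>;

fn alert_to_signal(alert: &PressureAlert) -> Option<PressureSignal> {
    match alert.tier_name.as_str() {
        "high" | "critical" => {
            let used_pct = (alert.pressure.clamp(0.0, 1.0) * 100.0).round() as u8;
            Some(PressureSignal::SystemMemHigh { used_pct })
        }
        _ => None,
    }
}

/// Wrap any governor as a sink: forward when the mapping yields Some, drop on None.
fn governor_alert_sink(governor: Arc<dyn SubstrateGovernor>) -> AlertSink {
    Box::new(move |alert: &PressureAlert| {
        if let Some(signal) = alert_to_signal(alert) {
            governor.on_pressure_signal(signal);
        }
    })
}

/// Hypothetical test double that records every forwarded signal.
struct Recorder {
    seen: Mutex<Vec<PressureSignal>>,
}

impl SubstrateGovernor for Recorder {
    fn on_pressure_signal(&self, signal: PressureSignal) {
        self.seen.lock().unwrap().push(signal);
    }
}
```

The closure owns its `Arc`, so the sink keeps working after the scope that built it has ended, which is the property the construction-scope-drop test checks.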

Tests (14, all passing):
- normal/warning/unknown tiers -> None (4 tests)
- high/critical tiers -> SystemMemHigh with rounded used_pct (3 tests)
- pressure clamping above 1.0 + below 0.0 + rounding (3 tests)
- sink forwarding high/critical + non-forwarding normal/warning (4 tests)
- sink survives construction-scope drop + multi-call ordering (2 tests)

Lane H 8-PR stack progress: PR-1 (#1330/1331) -> PR-2 (#1345) -> PR-3a
(#1352) -> PR-3b (#1354) -> PR-3c1 (#1356) -> PR-3c2 (#1360) -> PR-3c3
(#1364) -> PR-3c4 (#1365) -> **PR-4 (this PR)**. PR-3d governor file
watcher in flight from codex on parallel branch (no overlap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 17, 2026
)

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>