feat(governor): Lane H PR-3a — policy selection from HardwareClass + applies_to fingerprint#1351
Closed
joelteply wants to merge 1 commit into
…applies_to fingerprint

Per GENOME-FOUNDRY-SENTINEL #1327 Part 11. Stacks on #1350 (PR-2 TOML loader).

PR-2 ships file → PolicyFile. This PR-3a ships the SELECTION layer: given a HardwareClass + a list of PolicyFile, pick the right one.

Match algorithm (documented in module docstring):
- Comma-separated constraints in applies_to string
- Constraint kinds: silicon tag (apple-m/nvidia/amd/vulkan/none), thermal tag (thinandlight/workstation/server/mobile), uma tag (redundant with apple-m, for reader clarity), numeric range (vram_mb=lo..hi, ram_mb=lo..hi, both inclusive)
- ALL constraints must hold
- If multiple files match, LONGEST applies_to wins (most specific)
- ZERO matches → typed NoMatchingPolicy error with HardwareClass + candidate_count

Pure function. Same (hardware_class, candidates) always returns same result.

Failure-mode discipline:
- NoMatchingPolicy on zero matches (never silent default to wrong-hardware policy)
- MalformedConstraint with field + reason for range syntax errors
- UnknownConstraintTag for unrecognized tags (no silent wildcard interpretation)

Tests: 23 passing on cargo test --lib --features metal,accelerate governor::policy_selection::
- M-Air policy matches M2 Air hardware (canonical Mac path)
- Blackwell policy matches Blackwell hardware (canonical discrete-GPU path)
- Multiple candidates → only matching returned
- Multiple matches → longest applies_to wins (tiebreaker)
- Empty candidates → NoMatchingPolicy candidate_count=0
- Silicon tag must match (each variant)
- Thermal tag must match
- Range inclusive at lower + upper boundary (off-by-one defense)
- Range misses one-below-lower + one-above-upper
- vram_mb range matches Blackwell
- UMA tag holds for Apple, fails for discrete
- Unknown tag → typed err with tag named
- Range without '..' → MalformedConstraint
- Range non-numeric lo → MalformedConstraint
- Range with hi < lo → MalformedConstraint
- Unknown range field (cpu_ghz) → MalformedConstraint
- Whitespace tolerated in applies_to
- Empty applies_to acts as wildcard (documented)
- PolicySelectionError: Display + Error trait
- Determinism

Stack:
- #1345 governor PR-1 (MERGED)
- #1350 governor PR-2 TOML loader (OPEN)
- This PR (PR-3a): policy selection
- Future PR-3b: LocalSubstrateGovernor reference impl with arc_swap
- Future PR-3c: cascade state machine + hysteresis
- Future PR-3d: file watcher (notify crate)
- Future PR-4: PressureBroker → governor wiring
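The tag-matching and longest-applies_to rules from this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the field layout of HardwareClass and PolicyFile, the NoGpu variant name, the decision to compare raw applies_to string length, and keeping the first candidate on equal length are all assumptions. Numeric ranges are left out to keep the sketch short.

```rust
use std::fmt;

// Hypothetical stand-ins; the real HardwareClass and PolicyFile are defined elsewhere.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Silicon { AppleM, Nvidia, Amd, Vulkan, NoGpu }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Thermal { ThinAndLight, Workstation, Server, Mobile }

#[derive(Debug, Clone, PartialEq, Eq)]
struct HardwareClass { silicon: Silicon, thermal: Thermal, uma: bool }

#[derive(Debug, Clone, PartialEq, Eq)]
struct PolicyFile { name: String, applies_to: String }

#[derive(Debug, PartialEq, Eq)]
enum PolicySelectionError {
    NoMatchingPolicy { hardware: HardwareClass, candidate_count: usize },
    UnknownConstraintTag { tag: String },
}

impl fmt::Display for PolicySelectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoMatchingPolicy { hardware, candidate_count } => {
                write!(f, "no policy matches {hardware:?} ({candidate_count} candidates)")
            }
            Self::UnknownConstraintTag { tag } => write!(f, "unknown constraint tag '{tag}'"),
        }
    }
}

impl std::error::Error for PolicySelectionError {}

// Evaluate one tag against the hardware; unknown tags are an error, never a wildcard.
fn tag_holds(tag: &str, hw: &HardwareClass) -> Result<bool, PolicySelectionError> {
    Ok(match tag {
        "apple-m" => hw.silicon == Silicon::AppleM,
        "nvidia" => hw.silicon == Silicon::Nvidia,
        "amd" => hw.silicon == Silicon::Amd,
        "vulkan" => hw.silicon == Silicon::Vulkan,
        "none" => hw.silicon == Silicon::NoGpu,
        "thinandlight" => hw.thermal == Thermal::ThinAndLight,
        "workstation" => hw.thermal == Thermal::Workstation,
        "server" => hw.thermal == Thermal::Server,
        "mobile" => hw.thermal == Thermal::Mobile,
        "uma" => hw.uma,
        other => {
            return Err(PolicySelectionError::UnknownConstraintTag { tag: other.to_string() })
        }
    })
}

// ALL comma-separated constraints must hold; an empty applies_to matches everything.
fn applies(applies_to: &str, hw: &HardwareClass) -> Result<bool, PolicySelectionError> {
    let mut all_hold = true;
    for raw in applies_to.split(',') {
        let tag = raw.trim(); // whitespace around tags is tolerated
        if tag.is_empty() {
            continue;
        }
        // Every tag is validated, even after an earlier one has already failed.
        all_hold &= tag_holds(tag, hw)?;
    }
    Ok(all_hold)
}

// Pure function: no I/O, no globals, same inputs give the same result.
fn select_policy<'a>(
    hw: &HardwareClass,
    candidates: &'a [PolicyFile],
) -> Result<&'a PolicyFile, PolicySelectionError> {
    let mut best: Option<&'a PolicyFile> = None;
    for candidate in candidates {
        if applies(&candidate.applies_to, hw)? {
            // Strictly longer applies_to wins; on equal length the earlier one is kept.
            if best.map_or(true, |b| candidate.applies_to.len() > b.applies_to.len()) {
                best = Some(candidate);
            }
        }
    }
    best.ok_or_else(|| PolicySelectionError::NoMatchingPolicy {
        hardware: hw.clone(),
        candidate_count: candidates.len(),
    })
}
```

With an M2 Air-like HardwareClass and candidates "apple-m" and "apple-m, thinandlight", this sketch returns the second candidate because its applies_to string is longer.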
Closing in favor of codex's #1352, which addresses the same scope (policy selection from HardwareClass + applies_to). We shipped this in parallel within ~10 min: a coordination miss on my end (I should have checked the queue before claiming PR-3a).
codex's #1352 adds AmbiguousMatch refusal (stricter than my longest-applies_to tiebreaker) and a stable hardware-fingerprint surface for diagnostics/VDD. Both are well-designed; #1352 wins on the ambiguity refusal, which is more aligned with the no-silent-defaults rule.
I will rebase my PR-3b LocalSubstrateGovernor work onto #1352's selector API once it merges, and take a harness lane item in parallel (T1 #1 chat-roundtrip-live-harness is unclaimed).
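To make the difference between the two tie-handling strategies concrete, here is a small sketch. It is not taken from either PR: the function names, the SelectError type, and the choice to operate on already-matched applies_to strings are hypothetical, and #1352's real API may look quite different.

```rust
#[derive(Debug, PartialEq, Eq)]
enum SelectError {
    NoMatchingPolicy,
    AmbiguousMatch { matched: Vec<String> },
}

// Tiebreaker strategy: the longest applies_to string among the matches wins.
// Note that max_by_key silently picks one entry when two have equal length.
fn pick_longest<'a>(matched: &[&'a str]) -> Result<&'a str, SelectError> {
    matched
        .iter()
        .copied()
        .max_by_key(|s| s.len())
        .ok_or(SelectError::NoMatchingPolicy)
}

// Refusal strategy: more than one match is an error the operator must resolve.
fn pick_or_refuse<'a>(matched: &[&'a str]) -> Result<&'a str, SelectError> {
    match matched {
        [] => Err(SelectError::NoMatchingPolicy),
        [only] => Ok(*only),
        many => Err(SelectError::AmbiguousMatch {
            matched: many.iter().map(|s| s.to_string()).collect(),
        }),
    }
}
```

Given the matches "apple-m" and "apple-m,thinandlight", the first function returns the longer string while the second returns an AmbiguousMatch error naming both, so no policy is chosen implicitly.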
This was referenced May 16, 2026
joelteply added a commit that referenced this pull request on May 16, 2026
Stacks on #1352 (codex's PR-3a policy_selector, MERGED). Per GENOME-FOUNDRY-SENTINEL #1327 Part 11.

LocalSubstrateGovernor is the reference impl of the SubstrateGovernor trait (from #1345 PR-1). Holds the live policy behind arc_swap for wait-free reads; mutex-protected snapshot history for telemetry.

What ships in src/workers/continuum-core/src/governor/local.rs:
- LocalSubstrateGovernor struct: Arc<ArcSwap<GovernorPolicy>> for policy + Mutex<SnapshotState> for cascade-transition-count + recent-signals ring
- new(initial_policy) constructor: ready to serve current_policy() immediately
- set_candidates(Vec<PolicyFile>): file watcher (PR-3d) will call this on fs change events; for PR-3b, set manually
- try_hardware_detected(hw) → Result<(), PolicySelectionError>: fallible variant for callers that want the typed error
- on_hardware_detected(hw): trait method, swallows errors per spec (logs/telemetry surface them separately)
- on_pressure_signal(signal): records into ring (PR-3c adds threshold + cascade logic; PR-3b only records)
- snapshot() → GovernorSnapshot: telemetry consumer reads this
- candidate_count(): diagnostic for 'did the file watcher load anything?'

Concurrency model (matches spec's 'never blocks reads'):
- Reads: arc_swap.load_full() → Arc<GovernorPolicy> clone (wait-free)
- Writes: arc_swap.store(Arc::new(new_policy)) + mutex on snapshot state for transition-count bump (~µs hold)
- Tests prove the wait-free guarantee: many_concurrent_reads_dont_block + concurrent_read_during_write_sees_consistent_snapshot

What this PR DOES NOT do:
- Cascade state machine + threshold/hysteresis (PR-3c)
- File watcher / hot reload (PR-3d)
- PressureBroker subscription wiring (PR-4)
- Built-in default policy fallback (caller handles NoMatchingPolicy)

Failure-mode discipline:
- on_hardware_detected with no matching candidate KEEPS previous policy (trait swallows error per spec; operator monitors via snapshot.cascade_transition_count, which stays unchanged on Err)
- on_hardware_detected with empty candidates is a no-op (first boot before file watcher loads anything; governor still serves initial_policy)
- cascade_transition_count increments per PUBLISH, not per call; failed selections don't count
- on_pressure_signal does NOT bump cascade_transition_count in PR-3b (test pins this so PR-3c lands the threshold logic together)

Tests: 16 passing on cargo test --lib --features metal,accelerate governor::local:: (79 total governor:: across PR-1/PR-2/PR-3a/PR-3b)
- new() serves initial policy immediately
- candidate_count reflects set_candidates
- on_hardware_detected publishes matching policy
- try_hardware_detected returns NoMatchingPolicy err
- on_hardware_detected no-match KEEPS previous policy
- on_hardware_detected empty candidates no-op
- Successive hardware_detected publishes multiple times
- on_pressure_signal records signal
- recent_signals ring capped at RECENT_SIGNALS_CAPACITY=32 (FIFO eviction)
- snapshot includes policy + signals
- cascade_transition_count increments per publish
- cascade_transition_count UNCHANGED on no-match
- on_pressure_signal does NOT transition in PR-3b (PR-3c adds it)
- many_concurrent_reads_dont_block (Arc<Self> + 16 threads × 1000 reads each)
- concurrent_read_during_write_sees_consistent_snapshot (writer mutates + reader observes Arc snapshots that are always one of {1, 2, 8}; no torn read)
- current_policy returns same Arc when no writes (Arc::ptr_eq)

Added deps: arc-swap = '1.7' (tiny crate, no transitive deps).

Coordination: ceded my own PR-3a (#1351 closed) in favor of codex's #1352, which has stricter AmbiguousPolicy refusal + hardware_fingerprint diagnostic surface. This PR-3b rebased onto codex's policy_selector API (arg order: select_policy(policies, hw), not (hw, policies)) + imports updated.

Stack:
- #1335 hw_probe (MERGED)
- #1345 PR-1 governor-types (MERGED)
- #1350 PR-2 TOML loader (MERGED)
- #1352 PR-3a policy_selector (codex's, MERGED)
- This PR (PR-3b): LocalSubstrateGovernor + arc_swap publish
- Future PR-3c: cascade state machine + hysteresis (5 steps; restore-speculation-one-step-later anti-oscillation rule per spec)
- Future PR-3d: file watcher (notify crate)
- Future PR-4: PressureBroker → governor wiring

VDD evidence: N/A (pure-state impl). Evidence with PR-3c when the cascade is wired + with PR-4 when actual pressure signals flow.

Co-authored-by: Test <test@test.com>
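The publish and signal-recording rules described in this commit message might be sketched as below. This is an illustration under stated assumptions, not the code in local.rs: GovernorSketch, publish, and the field layouts are hypothetical, and a std RwLock<Arc<_>> stands in for the Arc<ArcSwap<GovernorPolicy>> so the example needs no external crate. The stand-in is not wait-free; only the bookkeeping rules are the point here.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex, RwLock};

const RECENT_SIGNALS_CAPACITY: usize = 32;

// Hypothetical minimal shapes for the policy and signal types.
#[derive(Debug, Clone, PartialEq)]
struct GovernorPolicy { name: String }

#[derive(Debug, Clone, Copy, PartialEq)]
struct PressureSignal(u32);

#[derive(Default)]
struct SnapshotState {
    cascade_transition_count: u64,
    recent_signals: VecDeque<PressureSignal>,
}

struct GovernorSketch {
    // Stand-in for ArcSwap: RwLock keeps the sketch std-only but can block readers.
    policy: RwLock<Arc<GovernorPolicy>>,
    state: Mutex<SnapshotState>,
}

impl GovernorSketch {
    fn new(initial: GovernorPolicy) -> Self {
        Self {
            policy: RwLock::new(Arc::new(initial)),
            state: Mutex::new(SnapshotState::default()),
        }
    }

    // Readers get a cheap Arc clone of whatever policy is currently published.
    fn current_policy(&self) -> Arc<GovernorPolicy> {
        self.policy.read().unwrap().clone()
    }

    // `selected` models the outcome of policy selection: None means no match,
    // in which case the previous policy stays and the counter is untouched.
    fn publish(&self, selected: Option<GovernorPolicy>) -> bool {
        let Some(new_policy) = selected else {
            return false;
        };
        *self.policy.write().unwrap() = Arc::new(new_policy);
        self.state.lock().unwrap().cascade_transition_count += 1;
        true
    }

    // Record only: evict the oldest signal once the ring is full (FIFO).
    fn on_pressure_signal(&self, signal: PressureSignal) {
        let mut state = self.state.lock().unwrap();
        if state.recent_signals.len() == RECENT_SIGNALS_CAPACITY {
            state.recent_signals.pop_front();
        }
        state.recent_signals.push_back(signal);
    }

    fn transition_count(&self) -> u64 {
        self.state.lock().unwrap().cascade_transition_count
    }

    fn recent_signals(&self) -> Vec<PressureSignal> {
        self.state.lock().unwrap().recent_signals.iter().copied().collect()
    }
}
```

In this sketch the counter moves only inside publish, so a failed selection or a pressure signal leaves it unchanged, mirroring the "increments per PUBLISH, not per call" rule above.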
Summary
Lane H PR-3a per GENOME-FOUNDRY-SENTINEL #1327 Part 11. Stacks on #1345 (PR-1 types, MERGED) + #1350 (PR-2 TOML loader, MERGED).
PR-2 shipped file → PolicyFile. This PR-3a ships the SELECTION layer: given a HardwareClass + a list of PolicyFiles, pick the right one.
Splitting PR-3 into atomic sub-slices: this PR-3a (selection), future PR-3b (LocalSubstrateGovernor reference impl with arc_swap), PR-3c (cascade state machine + hysteresis), PR-3d (file watcher).

Match algorithm
Comma-separated constraints in applies_to:
- Silicon tag: apple-m/nvidia/amd/vulkan/none
- Thermal tag: thinandlight/workstation/server/mobile
- UMA tag: uma (redundant with apple-m, for reader clarity)
- Numeric range: vram_mb=lo..hi, ram_mb=lo..hi (inclusive both ends)
ALL constraints must hold. Multiple matches → longest applies_to wins (most specific). Zero matches → typed NoMatchingPolicy error with HardwareClass + candidate_count; never a silent default to a wrong-hardware policy.

Failure-mode discipline
- NoMatchingPolicy on zero matches (named candidate count + named hardware)
- MalformedConstraint for range syntax errors (field + reason named)
- UnknownConstraintTag for unrecognized tags: no silent wildcard interpretation
- Pure function: same (hardware_class, candidates) always returns same result. No I/O, no globals.

Test plan
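23 passing on cargo test --lib --features metal,accelerate governor::policy_selection::
- Multiple matches → longest applies_to wins (tiebreaker)
- Empty candidates → NoMatchingPolicy { candidate_count=0 }
- Range without .. → MalformedConstraint
- Range non-numeric lo → MalformedConstraint
- Range with hi < lo → MalformedConstraint (nonsense rejected)
- Unknown range field (cpu_ghz) → MalformedConstraint
- Whitespace tolerated in applies_to
- Empty applies_to acts as wildcard (documented)
- PolicySelectionError: Display + Error trait

Stack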
- Future PR-3b: LocalSubstrateGovernor reference impl with arc_swap
- Future PR-3d: file watcher (notify crate)
- Future PR-4: PressureBroker → governor wiring

VDD evidence
N/A — pure function. Evidence with PR-3b when LocalSubstrateGovernor publishes via arc_swap in production.
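The numeric range constraints and their failure modes listed in this description (vram_mb=lo..hi and ram_mb=lo..hi, inclusive at both ends, with MalformedConstraint naming the field and reason) can be illustrated with a short parser. This is a sketch under assumptions: the struct layouts, the parse_range name, the u64 bound type, and the reason strings are hypothetical, and the PR's real error is a variant of PolicySelectionError rather than a standalone struct.

```rust
// Hypothetical standalone error; carries the field and reason as the PR describes.
#[derive(Debug, PartialEq, Eq)]
struct MalformedConstraint { field: String, reason: String }

#[derive(Debug, PartialEq, Eq)]
struct RangeConstraint { field: String, lo: u64, hi: u64 }

impl RangeConstraint {
    // Both bounds are inclusive.
    fn contains(&self, value: u64) -> bool {
        self.lo <= value && value <= self.hi
    }
}

fn parse_range(raw: &str) -> Result<RangeConstraint, MalformedConstraint> {
    let err = |field: &str, reason: &str| MalformedConstraint {
        field: field.to_string(),
        reason: reason.to_string(),
    };
    let raw = raw.trim();
    let (field, bounds) = raw
        .split_once('=')
        .ok_or_else(|| err(raw, "expected field=lo..hi"))?;
    let field = field.trim();
    // Only the two range fields named in the description are accepted.
    if field != "vram_mb" && field != "ram_mb" {
        return Err(err(field, "unknown range field"));
    }
    let (lo, hi) = bounds
        .split_once("..")
        .ok_or_else(|| err(field, "missing '..' between bounds"))?;
    let lo: u64 = lo
        .trim()
        .parse()
        .map_err(|_| err(field, "lower bound is not a number"))?;
    let hi: u64 = hi
        .trim()
        .parse()
        .map_err(|_| err(field, "upper bound is not a number"))?;
    if hi < lo {
        return Err(err(field, "upper bound is below lower bound"));
    }
    Ok(RangeConstraint { field: field.to_string(), lo, hi })
}
```

Parsing "ram_mb=8192..16384" yields a constraint that contains both 8192 and 16384 but neither 8191 nor 16385, which is the off-by-one boundary the test plan pins.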