Add Rust model resolver with hardware capability tiers (Lane C) #1066
PR-D from docs/architecture/ALPHA-GAP-RUST-PERSONA-RUNTIME.md: capability-shaped model resolution with a no-fallback contract. Builds on the typed model_registry SSOT (models.toml + providers.toml + Arch/Capability vocab), the TargetSilicon 2-axis from #1062, and the dropped_no_budget loud-fail from #1063.

cognition/model_resolver.rs (pure module: no IPC, no ORM, no inference):
- ModelRequirement: required_capabilities, arch_preference, context_window_min, memory_budget_mb, provider_policy, host
- ResolvedModel: model_id, provider_id, expected_memory_mb, target_silicon, hw_capability_tier, reason
- HwCapabilityTier: finer-grained than TargetSilicon (M1Uma8Gb..M3UmaProMax, Sm70..Sm120, VulkanAmd, Cloud)
- LocalOrCloudPolicy: LocalOnly | CloudOnly | PreferLocal | PreferCloud | Any
- HostCapability: per-machine snapshot (tier + memory + primary silicon)
- ResolutionError: NoModelMatchesRequirement{registry_count, candidates_after_filter, unmet_filters} (typed, no fallback)
- resolve_model(): pure function over IntoIterator<&Model>

target_silicon derivation: local providers (llamacpp-local, docker-model-runner) inherit host.primary_target_silicon; cloud providers are always TargetSilicon::Cloud. The local-provider list is hardcoded for v1; a follow-up moves it to a kind: local|cloud field on Provider in providers.toml.

expected_memory_mb stays None until the Model schema gains an estimated_memory_mb field (separate followup). Today's resolver still rejects cloud models from LocalOnly queries, which prevents the worst class of mis-routing.

model_registry/types.rs: Arch gains #[derive(TS)] + ts(export), parallel to the existing Capability derivation. Backwards-compatible additive change; required because ModelRequirement.arch_preference: Vec<Arch> crosses the TS boundary.
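The no-fallback contract described above can be illustrated with a minimal sketch. It assumes heavily simplified stand-ins for the registry types: the `Capability` variants, the `Model` fields, and the two-filter pipeline are hypothetical, and the real `resolve_model()` also handles arch preference, provider policy, and host capability. The point is only the shape: every filter narrows the candidate set, and an empty result is a typed error naming the unmet filter, never a default model.

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified stand-ins for the registry types in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    Chat,
    Vision,
    Audio,
}

#[derive(Debug, PartialEq)]
pub struct Model {
    pub id: &'static str,
    pub capabilities: BTreeSet<Capability>,
    pub context_window: u32,
}

pub struct ModelRequirement {
    pub required_capabilities: BTreeSet<Capability>,
    pub context_window_min: u32,
}

#[derive(Debug, PartialEq)]
pub enum ResolutionError {
    NoModelMatchesRequirement {
        registry_count: usize,
        candidates_after_filter: usize,
        unmet_filters: Vec<String>,
    },
}

/// Pure function: no IPC, no ORM, no inference. Returns the first model that
/// survives every filter, or a typed error. There is no default model.
pub fn resolve_model<'a>(
    req: &ModelRequirement,
    registry: impl IntoIterator<Item = &'a Model>,
) -> Result<&'a Model, ResolutionError> {
    let all: Vec<&'a Model> = registry.into_iter().collect();
    let mut unmet_filters = Vec::new();

    // Filter 1: the model must offer every required capability.
    let mut candidates = all.clone();
    candidates.retain(|m| req.required_capabilities.is_subset(&m.capabilities));
    if candidates.is_empty() {
        unmet_filters.push("required_capabilities".to_string());
    }

    // Filter 2: the context window must be large enough.
    let before_ctx = candidates.len();
    candidates.retain(|m| m.context_window >= req.context_window_min);
    if before_ctx > 0 && candidates.is_empty() {
        unmet_filters.push("context_window_min".to_string());
    }

    match candidates.first() {
        Some(model) => Ok(*model),
        None => Err(ResolutionError::NoModelMatchesRequirement {
            registry_count: all.len(),
            candidates_after_filter: 0,
            unmet_filters,
        }),
    }
}
```

Because the function takes any `IntoIterator` of `&Model`, a test can pass a plain `Vec<Model>` by reference without touching the registry singleton.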
10 logic tests + 6 ts-rs export-binding tests = 16/16 green:
- local_chat_resolves_to_qwen35_on_m1
- vision_request_resolves_to_qwen2_vl
- cloud_only_skips_local_models
- missing_capability_errors_no_fallback (NO FALLBACK assertion)
- vision_with_local_only_on_cpu_host_still_finds_local_vision_model
- context_window_min_filters_small_models
- arch_preference_filters_to_qwen35_only
- prefer_local_ranks_local_first
- prefer_cloud_ranks_cloud_first
- five_persona_resolution_smoke (Lane C contract test)

Validation:
- cargo test --features metal,accelerate cognition::model_resolver: 16/16
- npx tsx scripts/build-with-loud-failure.ts: TypeScript compilation succeeded

Two SSOTs noted (TOML registry vs shared/models.json): out of Lane C scope, filed for a separate consolidation followup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orced

Address sibling Mac review on PR #1066. Non-blocking doc-clarity flags:

(α) HwCapabilityTier doc: spell out the closed-enum design choice. New hardware classes require an enum edit, a ts-rs regen, and an explicit alias decision. There is no Other(String) / wildcard fallback variant by design: silent routing to a default tier hides exactly the capacity-mismatch bugs the resolver exists to catch. Per Joel's no-fallback rule.

(β) ModelRequirement.memory_budget_mb doc: explicitly state OBSERVED but NOT ENFORCED until the Model schema gains estimated_memory_mb. Without this note, callers may pass it expecting filtering and silently get over-budget models. Loud-fail on memory pressure is a downstream Lane B (FootprintRegistry / PressureBroker) concern, not a resolver filter.

ts-rs regenerated HwCapabilityTier.ts + ModelRequirement.ts with the new docstrings. cargo test --features metal,accelerate cognition::model_resolver: 16/16 still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
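A short sketch of what the closed-enum choice in (α) buys. It assumes a subset of the tier names from this PR; the `UnknownTier` error, `parse_tier`, and `is_local_tier` are hypothetical helpers written for illustration, not functions from the resolver. With no wildcard arm and no `Other(String)` variant, adding a tier breaks every `match` at compile time, and an unrecognized name is an error instead of a silent default.

```rust
// Subset of the tiers named in the PR; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HwCapabilityTier {
    CpuOnly,
    M1Uma8Gb,
    M1Uma16Gb,
    Sm70,
    VulkanAmd,
    Cloud,
}

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct UnknownTier(pub String);

/// Hypothetical parser: an unknown name is a loud error, never a default tier.
pub fn parse_tier(name: &str) -> Result<HwCapabilityTier, UnknownTier> {
    match name {
        "CpuOnly" => Ok(HwCapabilityTier::CpuOnly),
        "M1Uma8Gb" => Ok(HwCapabilityTier::M1Uma8Gb),
        "M1Uma16Gb" => Ok(HwCapabilityTier::M1Uma16Gb),
        "Sm70" => Ok(HwCapabilityTier::Sm70),
        "VulkanAmd" => Ok(HwCapabilityTier::VulkanAmd),
        "Cloud" => Ok(HwCapabilityTier::Cloud),
        other => Err(UnknownTier(other.to_string())),
    }
}

/// Hypothetical helper. The match lists every variant and has no `_` arm,
/// so adding a new tier is a compile error here until someone decides
/// which side it belongs on.
pub fn is_local_tier(tier: HwCapabilityTier) -> bool {
    match tier {
        HwCapabilityTier::Cloud => false,
        HwCapabilityTier::CpuOnly
        | HwCapabilityTier::M1Uma8Gb
        | HwCapabilityTier::M1Uma16Gb
        | HwCapabilityTier::Sm70
        | HwCapabilityTier::VulkanAmd => true,
    }
}
```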
Review notes before this leaves draft:
I like the pure-module shape and no-fallback error contract. I would keep it draft until #1 is resolved, and ideally take #2 in the same PR because it is small and removes an avoidable hardcode at the boundary.
Fixup pushed: Checklist items resolved:
Validation:
Native-arch Docker helper still hit generated-file dirtiness and continued; CI arch verification remains the merge guard for image availability.
Updated
…F PR-2)

Per-pattern ratchet on src/system/user/server/, mirroring PR CambrianTech#1091's LOC ratchet shape. Tracks three anti-patterns under the persona surface:
- fallback_mention (case-insensitive, baseline 83): Joel, 2026-04-22: "fallbacks have ruined this project ... they are ILLEGAL." The WORD count proxies conceptual presence; comments saying "no fallback here" count too.
- direct_adapter_instantiation (baseline 12): matches `new <Name>Adapter(`. The TS surface should request providers via the ModelRequirement → ResolvedModel resolver shipped in CambrianTech#1066/CambrianTech#1074, not instantiate adapters directly.
- direct_api_key_env_read (baseline 0): matches `process.env.*API_KEY`. Cloud key lookup belongs in the Rust provider registry per Codex's CambrianTech#1077 boundary. Locks 0 in.

Per-pattern monotonic decrease: any pattern growing fails CI; shrinkage is allowed and surfaces a hint to run --update-baseline post-merge. Same 3-mode shape as PR CambrianTech#1091: default check / --update-baseline / --verbose.

Validated locally: a clean tree passes (3 patterns hold); an intentional +2 fallback growth fails with the named pattern, the delta, and actionable Rust target paths.

Lane F (PR CambrianTech#1084 alpha workstreams). Companion to CambrianTech#1091; extends docs/architecture/TS-PERSONA-COGNITION-RATCHET.md with the new gate. Independent CI workflow (~5s, shell + python only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
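The monotonic-decrease rule can be sketched in a few lines. The commit says the real gate is shell + Python; this Rust version only illustrates the comparison logic, and `count_mentions`, `Ratchet`, `compare`, and `gate` are hypothetical names invented for the sketch.

```rust
use std::cmp::Ordering;

/// Case-insensitive substring count, as a stand-in for the fallback_mention
/// pattern. Non-overlapping matches only.
pub fn count_mentions(text: &str, word: &str) -> usize {
    let haystack = text.to_lowercase();
    let needle = word.to_lowercase();
    haystack.matches(needle.as_str()).count()
}

#[derive(Debug, PartialEq)]
pub enum Ratchet {
    Held,
    Shrunk(usize), // allowed; the real gate hints at --update-baseline
    Grew(usize),   // fails CI
}

/// Compare a current count against its recorded baseline.
pub fn compare(baseline: usize, current: usize) -> Ratchet {
    match current.cmp(&baseline) {
        Ordering::Equal => Ratchet::Held,
        Ordering::Less => Ratchet::Shrunk(baseline - current),
        Ordering::Greater => Ratchet::Grew(current - baseline),
    }
}

/// Each entry is (pattern name, baseline, current count).
/// Returns one failure message per pattern that grew.
pub fn gate(results: &[(&str, usize, usize)]) -> Result<(), Vec<String>> {
    let failures: Vec<String> = results
        .iter()
        .filter_map(|(name, baseline, current)| match compare(*baseline, *current) {
            Ratchet::Grew(by) => Some(format!(
                "{} grew by {} (baseline {}, now {})",
                name, by, baseline, current
            )),
            Ratchet::Held | Ratchet::Shrunk(_) => None,
        })
        .collect();
    if failures.is_empty() {
        Ok(())
    } else {
        Err(failures)
    }
}
```

Checking each pattern independently matters: a drop in one count cannot offset growth in another, so the gate fails by pattern name rather than on a summed total.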
Lane C: model resolver + hardware capability tier

PR-D from `docs/architecture/ALPHA-GAP-RUST-PERSONA-RUNTIME.md`. Pure module providing capability-shaped model resolution with a no-fallback contract.

Builds on:
- `model_registry/{loader,types,artifacts,singleton}`: typed `Capability`/`Arch` SSOT

Scope

`cognition/model_resolver.rs` (new pure module: no IPC, no ORM, no inference):

Types (all `ts-rs` exported to `shared/generated/cognition/`):
- `ModelRequirement{required_capabilities, arch_preference, context_window_min, memory_budget_mb, provider_policy, host}`
- `ResolvedModel{model_id, provider_id, expected_memory_mb, target_silicon, hw_capability_tier, reason}`
- `HwCapabilityTier{CpuOnly, M1Uma8Gb, M1Uma16Gb, M2UmaProMax, M3UmaProMax, Sm70..Sm120, VulkanAmd, Cloud}`: finer than `TargetSilicon` (selects which model VARIANT a host can run)
- `LocalOrCloudPolicy{LocalOnly, CloudOnly, PreferLocal, PreferCloud, Any}`
- `HostCapability{hw_capability_tier, available_memory_mb, primary_target_silicon}`
- `ResolutionError::NoModelMatchesRequirement{registry_count, candidates_after_filter, unmet_filters}`: typed, no fallback per Joel's "fallbacks are illegal" rule

`target_silicon` derivation: local providers (`llamacpp-local`, `docker-model-runner`) inherit `host.primary_target_silicon`; cloud providers are always `TargetSilicon::Cloud`. The local-provider list is hardcoded for v1; a follow-up moves it to a `kind: local|cloud` field on `Provider`.

`expected_memory_mb` stays `None` until the Model schema gains an `estimated_memory_mb` field (separate followup). Today's resolver still rejects cloud models from `LocalOnly` queries, which prevents the worst class of mis-routing.

`model_registry/types.rs`: `Arch` gains `#[derive(TS)]` + `ts(export)`, parallel to the existing `Capability` derivation. Backwards-compatible additive change; required because `ModelRequirement.arch_preference: Vec<Arch>` crosses the TS boundary.

Validation
- `cargo test --features metal,accelerate cognition::model_resolver`: 16/16 pass (10 logic + 6 ts-rs export-binding)
- `npx tsx scripts/build-with-loud-failure.ts`: TypeScript compilation succeeded

The 10 logic tests cover, among others:
- `missing_capability_errors_no_fallback`: explicit no-fallback assertion (Joel's rule)
- `five_persona_resolution_smoke`: Lane C contract test (5 personas with different needs all resolve correctly + missing-model error path)

Out of scope (followups)
- Two SSOTs: `models.toml` (typed Rust registry, Capability/Arch vocab) + `shared/models.json` (#1042 install/seed manifest). The resolver builds on the TOML; consolidating the two needs a separate PR.
- `expected_memory_mb`: requires the Model schema to gain `estimated_memory_mb: Option<u32>`.
- `kind: local|cloud` field on `Provider` in `providers.toml` (replaces the hardcoded `LOCAL_PROVIDER_IDS` constant).
- `HostCapability` is caller-supplied. The boot-time hardware-detection probe lives in a separate module.
- `resolve_model()` calls.

🤖 Generated with Claude Code
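The `target_silicon` derivation and the `LocalOrCloudPolicy` handling described under Scope can be sketched as follows. This is a minimal illustration under stated assumptions: only `TargetSilicon::Cloud`, the policy variants, the two provider ids, and the `LOCAL_PROVIDER_IDS` name come from the description; the other `TargetSilicon` variants and the `derive_target_silicon`, `policy_admits`, and `policy_rank` functions are hypothetical.

```rust
// Only `Cloud` is confirmed by the PR text; the other variants are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetSilicon {
    Metal,
    Cuda,
    Cpu,
    Cloud,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalOrCloudPolicy {
    LocalOnly,
    CloudOnly,
    PreferLocal,
    PreferCloud,
    Any,
}

// Hardcoded for v1, per the PR; a followup replaces this with a
// `kind: local|cloud` field on Provider.
const LOCAL_PROVIDER_IDS: [&str; 2] = ["llamacpp-local", "docker-model-runner"];

/// Local providers inherit the host's primary silicon; everything else is Cloud.
pub fn derive_target_silicon(provider_id: &str, host_primary: TargetSilicon) -> TargetSilicon {
    if LOCAL_PROVIDER_IDS.iter().any(|id| *id == provider_id) {
        host_primary
    } else {
        TargetSilicon::Cloud
    }
}

/// Hard filter: LocalOnly rejects cloud models, CloudOnly rejects local ones.
pub fn policy_admits(policy: LocalOrCloudPolicy, silicon: TargetSilicon) -> bool {
    let is_cloud = silicon == TargetSilicon::Cloud;
    match policy {
        LocalOrCloudPolicy::LocalOnly => !is_cloud,
        LocalOrCloudPolicy::CloudOnly => is_cloud,
        LocalOrCloudPolicy::PreferLocal
        | LocalOrCloudPolicy::PreferCloud
        | LocalOrCloudPolicy::Any => true,
    }
}

/// Soft preference: a lower rank sorts first. Prefer* policies only reorder
/// candidates; they never remove them.
pub fn policy_rank(policy: LocalOrCloudPolicy, silicon: TargetSilicon) -> u8 {
    let is_cloud = silicon == TargetSilicon::Cloud;
    match policy {
        LocalOrCloudPolicy::PreferLocal => is_cloud as u8,
        LocalOrCloudPolicy::PreferCloud => (!is_cloud) as u8,
        LocalOrCloudPolicy::LocalOnly
        | LocalOrCloudPolicy::CloudOnly
        | LocalOrCloudPolicy::Any => 0,
    }
}
```

Keeping the hard filter and the soft ranking as separate functions mirrors the split in the test names (`cloud_only_skips_local_models` versus `prefer_local_ranks_local_first`).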