Add Rust model resolver with hardware capability tiers (Lane C) by joelteply · Pull Request #1066 · CambrianTech/continuum

joelteply · 2026-05-08T03:06:08Z

Lane C — model resolver + hardware capability tier

PR-D from docs/architecture/ALPHA-GAP-RUST-PERSONA-RUNTIME.md. Pure module providing capability-shaped model resolution with a no-fallback contract.

Builds on:

Define Rust persona runtime alpha contract #1062 — TargetSilicon physical-budget axis
Fail loudly on missing throughput budgets #1063 — dropped_no_budget loud-fail bucket
Add throughput lease registry #1064/Mirror throughput leases into footprint registry #1065 — typed lease registry + footprint accounting (separate Lane B work, not consumed by this PR but architecturally aligned)
existing model_registry/{loader,types,artifacts,singleton} — typed Capability/Arch SSOT

Scope

cognition/model_resolver.rs (new pure module — no IPC, no ORM, no inference):

pub fn resolve_model<'a, I: IntoIterator<Item = &'a Model>>(
    requirement: &ModelRequirement,
    models: I,
) -> Result<ResolvedModel, ResolutionError>

Types (all ts-rs exported to shared/generated/cognition/):

ModelRequirement{required_capabilities, arch_preference, context_window_min, memory_budget_mb, provider_policy, host}
ResolvedModel{model_id, provider_id, expected_memory_mb, target_silicon, hw_capability_tier, reason}
HwCapabilityTier{CpuOnly, M1Uma8Gb, M1Uma16Gb, M2UmaProMax, M3UmaProMax, Sm70..Sm120, VulkanAmd, Cloud} — finer than TargetSilicon (selects which model VARIANT a host can run)
LocalOrCloudPolicy{LocalOnly, CloudOnly, PreferLocal, PreferCloud, Any}
HostCapability{hw_capability_tier, available_memory_mb, primary_target_silicon}
ResolutionError::NoModelMatchesRequirement{registry_count, candidates_after_filter, unmet_filters} — typed, no fallback per Joel's "fallbacks are illegal" rule
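The no-fallback contract described above can be sketched as a filter pipeline that returns a typed error as soon as one filter empties the candidate set. This is a minimal illustration, not the PR's implementation: the `Model`, `Capability`, `ModelRequirement`, and `ResolvedModel` shapes here are simplified stand-ins, the helper functions `has_capabilities` and `fits_context` are hypothetical, and the exact meaning of `candidates_after_filter` is an assumption (here, the number of candidates that entered the failing filter).

```rust
use std::collections::BTreeSet;

// Simplified stand-ins: the real Model, Capability, and ModelRequirement
// types live in model_registry and carry more fields than shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    Chat,
    Vision,
}

#[derive(Debug, Clone)]
pub struct Model {
    pub id: String,
    pub capabilities: BTreeSet<Capability>,
    pub context_window: u32,
}

#[derive(Debug, Clone)]
pub struct ModelRequirement {
    pub required_capabilities: BTreeSet<Capability>,
    pub context_window_min: u32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResolvedModel {
    pub model_id: String,
    pub reason: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResolutionError {
    NoModelMatchesRequirement {
        registry_count: usize,
        candidates_after_filter: usize,
        unmet_filters: Vec<String>,
    },
}

// Hypothetical filter signature used only in this sketch.
type Check = fn(&ModelRequirement, &Model) -> bool;

fn has_capabilities(req: &ModelRequirement, m: &Model) -> bool {
    req.required_capabilities.is_subset(&m.capabilities)
}

fn fits_context(req: &ModelRequirement, m: &Model) -> bool {
    m.context_window >= req.context_window_min
}

pub fn resolve_model<'a, I: IntoIterator<Item = &'a Model>>(
    requirement: &ModelRequirement,
    models: I,
) -> Result<ResolvedModel, ResolutionError> {
    let filters: [(&str, Check); 2] = [
        ("required_capabilities", has_capabilities as Check),
        ("context_window_min", fits_context as Check),
    ];
    let mut candidates: Vec<&Model> = models.into_iter().collect();
    let registry_count = candidates.len();

    for &(name, check) in filters.iter() {
        let before = candidates.len();
        candidates.retain(|m| check(requirement, m));
        if candidates.is_empty() {
            // No fallback: name the filter that emptied the candidate set.
            return Err(ResolutionError::NoModelMatchesRequirement {
                registry_count,
                candidates_after_filter: before, // assumed semantics
                unmet_filters: vec![name.to_string()],
            });
        }
    }

    let winner = candidates[0];
    Ok(ResolvedModel {
        model_id: winner.id.clone(),
        reason: "first model passing all filters".to_string(),
    })
}
```

Returning `Err` instead of a default model keeps a missing capability visible to the caller, which is the behaviour the `missing_capability_errors_no_fallback` test asserts.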

target_silicon derivation: local providers (llamacpp-local, docker-model-runner) inherit host.primary_target_silicon; cloud providers always TargetSilicon::Cloud. Hardcoded local-provider list for v1; follow-up moves it to a kind: local|cloud field on Provider.
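The v1 derivation rule can be sketched as a single function. The `TargetSilicon` variants other than `Cloud` and the function name `derive_target_silicon` are assumptions made for illustration; only the two provider ids and the inherit-or-Cloud rule come from the description above.

```rust
// Variants other than Cloud are hypothetical placeholders for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetSilicon {
    Metal,
    Cuda,
    Cpu,
    Cloud,
}

// The v1 hardcoded list named in the PR description.
const LOCAL_PROVIDER_IDS: [&str; 2] = ["llamacpp-local", "docker-model-runner"];

/// Local providers inherit the host's primary silicon; everything else is Cloud.
pub fn derive_target_silicon(provider_id: &str, host_primary: TargetSilicon) -> TargetSilicon {
    if LOCAL_PROVIDER_IDS.contains(&provider_id) {
        host_primary
    } else {
        TargetSilicon::Cloud
    }
}
```

Under this rule a provider id missing from the list is classified as cloud, so it never inherits the host's silicon by accident.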

expected_memory_mb stays None until Model schema gains an estimated_memory_mb field (separate followup). Today's resolver still rejects cloud models from LocalOnly queries, which prevents the worst class of mis-routing.

model_registry/types.rs: Arch gains #[derive(TS)] + ts(export) parallel to the existing Capability derivation. Backwards-compatible additive change; required because ModelRequirement.arch_preference: Vec<Arch> crosses the TS boundary.

Validation

cargo test --features metal,accelerate cognition::model_resolver: 16/16 pass (10 logic + 6 ts-rs export-binding)
npx tsx scripts/build-with-loud-failure.ts: TypeScript compilation succeeded

10 logic tests cover:

Local chat resolves to qwen3.5 on M1
Vision request resolves to qwen2-vl
CloudOnly skips local models
missing_capability_errors_no_fallback — explicit no-fallback assertion (Joel's rule)
Vision-with-LocalOnly on CPU host still finds the local vision model (admission is the substrate's job, not the resolver's)
Context window minimum filters small models
Arch preference filters
PreferLocal / PreferCloud ranking
five_persona_resolution_smoke — Lane C contract test (5 personas with different needs all resolve correctly + missing-model error path)
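The policy behaviour those tests exercise (hard filtering for `LocalOnly` / `CloudOnly`, ranking for `PreferLocal` / `PreferCloud`) might look roughly like the sketch below. The `Candidate` struct and `apply_policy` function are hypothetical names introduced here; only the `LocalOrCloudPolicy` variants come from the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalOrCloudPolicy {
    LocalOnly,
    CloudOnly,
    PreferLocal,
    PreferCloud,
    Any,
}

// Hypothetical candidate record: a model id plus whether its provider is local.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub model_id: String,
    pub is_local: bool,
}

/// *Only policies drop candidates; Prefer* policies reorder them.
pub fn apply_policy(policy: LocalOrCloudPolicy, mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    match policy {
        LocalOrCloudPolicy::LocalOnly => candidates.retain(|c| c.is_local),
        LocalOrCloudPolicy::CloudOnly => candidates.retain(|c| !c.is_local),
        // sort_by_key is stable and false sorts before true, so the
        // preferred group moves to the front without reordering within groups.
        LocalOrCloudPolicy::PreferLocal => candidates.sort_by_key(|c| !c.is_local),
        LocalOrCloudPolicy::PreferCloud => candidates.sort_by_key(|c| c.is_local),
        LocalOrCloudPolicy::Any => {}
    }
    candidates
}
```

An empty result from `LocalOnly` or `CloudOnly` is where a resolver following the no-fallback rule would return `NoModelMatchesRequirement` rather than relax the policy.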

Out of scope (followups)

Two SSOTs noted: models.toml (typed Rust registry; Capability/Arch vocab) + shared/models.json (the #1042 install/seed manifest). Resolver builds on TOML; consolidating the two needs a separate PR.
expected_memory_mb: requires Model schema to gain estimated_memory_mb: Option<u32>.
Provider kind: local|cloud field on Provider in providers.toml (replace hardcoded LOCAL_PROVIDER_IDS constant).
Hardware probe: today's HostCapability is caller-supplied. Boot-time hardware-detection probe lives in a separate module.
Wire into chat path: PR-A/B/C/D-E sequence — this PR ships the resolver primitive; subsequent PRs replace TS-side hardcoded persona model strings with resolve_model() calls.

🤖 Generated with Claude Code

PR-D from docs/architecture/ALPHA-GAP-RUST-PERSONA-RUNTIME.md: capability-shaped model resolution with no-fallback contract.

Builds on the typed model_registry SSOT (models.toml + providers.toml + Arch/Capability vocab) and the TargetSilicon 2-axis from #1062 (and dropped_no_budget loud-fail from #1063).

cognition/model_resolver.rs (pure module: no IPC, no ORM, no inference):
- ModelRequirement: required_capabilities, arch_preference, context_window_min, memory_budget_mb, provider_policy, host
- ResolvedModel: model_id, provider_id, expected_memory_mb, target_silicon, hw_capability_tier, reason
- HwCapabilityTier: finer-grained than TargetSilicon (M1Uma8Gb..M3UmaProMax, Sm70..Sm120, VulkanAmd, Cloud)
- LocalOrCloudPolicy: LocalOnly | CloudOnly | PreferLocal | PreferCloud | Any
- HostCapability: per-machine snapshot (tier + memory + primary silicon)
- ResolutionError: NoModelMatchesRequirement{registry_count, candidates_after_filter, unmet_filters}; typed, no fallback
- resolve_model(): pure function over IntoIterator<&Model>

target_silicon derivation: local providers (llamacpp-local, docker-model-runner) inherit host.primary_target_silicon; cloud providers always TargetSilicon::Cloud. Hardcoded local-provider list for v1; follow-up moves it to a kind: local|cloud field on Provider in providers.toml.

expected_memory_mb stays None until Model schema gains an estimated_memory_mb field (separate followup). Today's resolver still rejects cloud models from LocalOnly queries, which prevents the worst class of mis-routing.

model_registry/types.rs: Arch gains #[derive(TS)] + ts(export) parallel to the existing Capability derivation. Backwards-compatible additive change; required because ModelRequirement.arch_preference: Vec<Arch> crosses the TS boundary.

10 logic tests + 6 ts-rs export-binding tests = 16/16 green:
- local_chat_resolves_to_qwen35_on_m1
- vision_request_resolves_to_qwen2_vl
- cloud_only_skips_local_models
- missing_capability_errors_no_fallback (NO FALLBACK assertion)
- vision_with_local_only_on_cpu_host_still_finds_local_vision_model
- context_window_min_filters_small_models
- arch_preference_filters_to_qwen35_only
- prefer_local_ranks_local_first
- prefer_cloud_ranks_cloud_first
- five_persona_resolution_smoke (Lane C contract test)

Validation:
- cargo test --features metal,accelerate cognition::model_resolver: 16/16
- npx tsx scripts/build-with-loud-failure.ts: TypeScript compilation succeeded

Two SSOTs noted (TOML registry vs shared/models.json): out of Lane C scope, filed for separate consolidation followup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…orced

Address sibling Mac review on PR #1066: non-blocking doc-clarity flags.

(α) HwCapabilityTier doc: spell out the closed-enum design choice. New hardware classes require enum-edit + ts-rs regen + an explicit alias decision. No Other(String) / wildcard fallback variant by design: silent routing to a default tier hides exactly the capacity-mismatch bugs the resolver exists to catch. Per Joel's no-fallback rule.

(β) ModelRequirement.memory_budget_mb doc: explicitly state OBSERVED but NOT ENFORCED until Model schema gains estimated_memory_mb. Without this note, callers may pass it expecting filtering and silently get over-budget models. Loud-fail on memory pressure is a downstream Lane B (FootprintRegistry / PressureBroker) concern, not a resolver filter.

ts-rs regenerated HwCapabilityTier.ts + ModelRequirement.ts with new docstrings.

cargo test --features metal,accelerate cognition::model_resolver: 16/16 still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joelteply · 2026-05-08T03:09:13Z

Review notes before this leaves draft:

1. memory_budget_mb is exposed on ModelRequirement and the resolver docs say memory budget is filter #5, but resolve_model() never filters on memory because expected_memory_mb is always None. That makes the API look safer than it is. Either remove/defer the field from v1, or add estimated_memory_mb to Model/TOML and a failing test proving a too-small local budget errors with NoModelMatchesRequirement instead of silently returning qwen2-vl/qwen3.5.
2. LOCAL_PROVIDER_IDS hardcodes llamacpp-local and docker-model-runner. For this repo direction, provider locality should be data. Please add a typed ProviderKind / ProviderResidency field in Provider + providers.toml (local vs cloud) and have the resolver take provider metadata or a provider-kind lookup, instead of baking provider IDs into cognition code.
3. The local chat test is loose: it accepts qwen3.5 or qwen2. For the resolver contract, prefer an explicit requirement/ranking expectation so a registry-order change cannot hide a worse local chat pick.

I like the pure-module shape and no-fallback error contract. I would keep it draft until #1 is resolved, and ideally take #2 in the same PR because it is small and removes an avoidable hardcode at the boundary.

joelteply · 2026-05-08T03:18:37Z

Fixup pushed: fbdc357f8

Checklist items resolved:

Removed ModelRequirement.memory_budget_mb from the public resolver contract instead of exposing a non-enforced field. Memory admission stays with Lane B / PressureBroker until Model.estimated_memory_mb exists.
Added typed ProviderKind { local, cloud } to Provider + providers.toml; resolver now takes provider metadata and no longer hardcodes LOCAL_PROVIDER_IDS.
Tightened local chat resolver test to assert the intended qwen3.5 local pick, not "qwen3.5 or qwen2".
Added generated ProviderKind.ts and barrel export.

Validation:

cargo test --features metal,accelerate cognition::model_resolver --lib: 16/16
cargo test --features metal,accelerate model_registry --lib: 13/13
npx tsx scripts/build-with-loud-failure.ts: pass
pre-push: TypeScript clean, Rust clean with metal,accelerate, Rust tests passed

Native-arch Docker helper still hit generated-file dirtiness and continued; CI arch verification remains the merge guard for image availability.

joelteply · 2026-05-08T03:19:03Z

Updated `fbdc357f8` — addresses Codex (3) + continuum-8e97 (1) review flags

The commit subject came through but the pre-push hook stripped the body, so writing it here for review visibility:

Codex flags (all addressed):

memory_budget_mb removed entirely — was exposed as a filter parameter but never enforced (since Model.estimated_memory_mb doesn't exist yet). Field-that-does-nothing is a lie. Will return WITH actual filtering when the Model schema gains estimated_memory_mb (separate followup).
LOCAL_PROVIDER_IDS hardcoded const replaced with ProviderKind::{Local, Cloud} field on Provider:
- model_registry/types.rs: new ProviderKind enum (ts-rs exported), added kind: ProviderKind field to Provider with #[serde(default)] → Cloud (backwards-compatible — existing TOML rows continue parsing)
- providers.toml: llamacpp-local + docker-model-runner gain explicit kind = "local"; cloud providers (anthropic/openai/etc) inherit the Cloud default
- cognition/model_resolver.rs: resolve_model() now takes IntoIterator<Item = &Provider> alongside models, builds HashMap<&str, ProviderKind>, replaces the is_local_provider() const helper
Tightened local_chat_resolves_to_qwen35_on_m1 — was loosely asserting starts_with("continuum-ai/qwen3.5") || starts_with("qwen2") which would ALSO accept qwen2-vl-7b or qwen2-0.5b-gating. Now asserts the specific qwen3.5-4b id so ranking-rule changes get caught.

continuum-8e97 (RTX/Windows) flag — addressed:

HwCapabilityTier gained Sm75 + Sm100:
- Sm75 (Turing — T4 datacenter, RTX 20xx, GTX 16xx) — common on cloud GPU inference instances
- Sm100 (Blackwell datacenter B100/B200 with HBM3e) — distinct from Sm120 (Blackwell-consumer RTX 50xx) because the driver paths differ
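To illustrate the closed-enum choice, here is a small sketch that maps a CUDA compute capability to a tier and refuses anything it does not recognise. Only the four `Sm` variants named in this thread are shown; the function `tier_from_compute_capability` and the error type `UnknownComputeCapability` are hypothetical, since in this PR the host tier is supplied by the caller and the hardware probe lives elsewhere.

```rust
// Subset of the tiers named in this thread; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HwCapabilityTier {
    Sm70,
    Sm75,
    Sm100,
    Sm120,
}

// Hypothetical error type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UnknownComputeCapability {
    pub major: u32,
    pub minor: u32,
}

/// Closed mapping: an unrecognised capability is an error, never a default tier.
pub fn tier_from_compute_capability(
    major: u32,
    minor: u32,
) -> Result<HwCapabilityTier, UnknownComputeCapability> {
    match (major, minor) {
        (7, 0) => Ok(HwCapabilityTier::Sm70),
        (7, 5) => Ok(HwCapabilityTier::Sm75),   // Turing: T4, RTX 20xx, GTX 16xx
        (10, 0) => Ok(HwCapabilityTier::Sm100), // Blackwell datacenter
        (12, 0) => Ok(HwCapabilityTier::Sm120), // Blackwell consumer
        _ => Err(UnknownComputeCapability { major, minor }),
    }
}
```

Because there is no wildcard variant, supporting a new hardware class means editing the enum and this match, which matches the "enum-edit + ts-rs regen" workflow described in the review response.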

Two new tests covering the behavior change:

provider_kind_drives_local_classification_not_id — confirms a custom provider id with kind=Local works (no hardcoded set)
unknown_provider_defaults_to_cloud_for_safety — confirms a model referencing an unknown provider id is treated as Cloud (not silently routed to local hardware)
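The two behaviours above can be sketched with a provider-kind lookup built from provider metadata. This is a simplified illustration: `Provider` is reduced to two fields, serde is left out (the `Default` impl stands in for `#[serde(default)]`), and the helper names `provider_kinds` and `kind_of` are assumptions rather than the PR's actual functions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Local,
    Cloud,
}

// Stands in for #[serde(default)]: an unspecified kind is treated as Cloud.
impl Default for ProviderKind {
    fn default() -> Self {
        ProviderKind::Cloud
    }
}

// Reduced to the two fields this sketch needs.
#[derive(Debug, Clone)]
pub struct Provider {
    pub id: String,
    pub kind: ProviderKind,
}

/// Build the id -> kind lookup from provider metadata.
pub fn provider_kinds<'a, I: IntoIterator<Item = &'a Provider>>(
    providers: I,
) -> HashMap<&'a str, ProviderKind> {
    providers.into_iter().map(|p| (p.id.as_str(), p.kind)).collect()
}

/// Unknown provider ids resolve to Cloud, so they never claim local hardware.
pub fn kind_of(kinds: &HashMap<&str, ProviderKind>, provider_id: &str) -> ProviderKind {
    kinds.get(provider_id).copied().unwrap_or_default()
}
```

Classification depends only on the `kind` field, so a provider with an arbitrary id is local if and only if its metadata says so.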

Validation:

cargo test --features metal,accelerate cognition::model_resolver: 18/18 (was 16/16; net +2 tests, all green)
cargo test --features metal,accelerate model_registry: 13/13 (real providers.toml still parses including new kind = "local" rows)
npx tsx scripts/build-with-loud-failure.ts: TypeScript clean

Out-of-scope flag from continuum-8e97: multi-silicon hosts (Nvidia+iGPU or workstation+enclosure). primary_target_silicon is single-value today; multi-silicon is a v2 concern (would touch HostCapability shape + adaptive_throughput planner).

…F PR-2)

Per-pattern ratchet on src/system/user/server/, mirroring PR CambrianTech#1091's LOC ratchet shape. Tracks three anti-patterns under the persona surface:
- fallback_mention (case-insensitive, baseline 83): Joel 2026-04-22: "fallbacks have ruined this project ... they are ILLEGAL." The WORD count proxies conceptual presence; comments saying "no fallback here" count too.
- direct_adapter_instantiation (baseline 12): matches `new <Name>Adapter(`. TS surface should request providers via the ModelRequirement → ResolvedModel resolver shipped in CambrianTech#1066/CambrianTech#1074, not instantiate adapters directly.
- direct_api_key_env_read (baseline 0): matches `process.env.*API_KEY`. Cloud key lookup belongs in the Rust provider registry per Codex's CambrianTech#1077 boundary. Locks 0 in.

Per-pattern monotonic-decrease (any pattern growing fails CI; shrinkage allowed and surfaces a hint to --update-baseline post-merge). Same 3-mode shape as PR CambrianTech#1091: default check / --update-baseline / --verbose.

Validated locally: clean tree passes (3 patterns hold), intentional +2 fallback growth fails with named pattern + delta + actionable Rust target paths.

Lane F (PR CambrianTech#1084 alpha workstreams). Companion to CambrianTech#1091; extends docs/architecture/TS-PERSONA-COGNITION-RATCHET.md with the new gate. Independent CI workflow (~5s, shell + python only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
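The monotonic-decrease rule in that commit can be sketched as a comparison of per-pattern counts against a baseline. The actual gate is described as shell + python; the Rust below is only an illustration of the check itself, and the names `count_fallback_mentions`, `Regression`, and `check_ratchet` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Case-insensitive count of the word "fallback" in a source string.
pub fn count_fallback_mentions(source: &str) -> usize {
    source.to_lowercase().matches("fallback").count()
}

// Hypothetical report record: which pattern grew, and by how much.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Regression {
    pub pattern: String,
    pub baseline: usize,
    pub current: usize,
}

/// A pattern may shrink or hold; any growth over its baseline is a failure.
pub fn check_ratchet(
    baseline: &BTreeMap<String, usize>,
    current: &BTreeMap<String, usize>,
) -> Vec<Regression> {
    baseline
        .iter()
        .filter_map(|(pattern, &allowed)| {
            // A pattern absent from the current scan counts as zero hits.
            let now = current.get(pattern).copied().unwrap_or(0);
            (now > allowed).then(|| Regression {
                pattern: pattern.clone(),
                baseline: allowed,
                current: now,
            })
        })
        .collect()
}
```

An empty result means every pattern held or shrank; a non-empty one carries the pattern name and both counts, which is enough to print the "named pattern + delta" failure the commit message mentions.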

github-actions Bot added the size: XL label May 8, 2026

Make model resolver provider residency data-driven

fbdc357

joelteply marked this pull request as ready for review May 8, 2026 03:18

joelteply merged commit 57a487e into canary May 8, 2026
3 checks passed

joelteply deleted the feat/model-resolver-hw-capability-tier branch May 8, 2026 03:20

This was referenced May 11, 2026

Add host capability probe so resolver actually runs in production #1075

Merged

ratchet(ts-persona): forbidden-strings monotonic-decrease gate (Lane F PR-2) #1094

Merged

feat(ai-key): add redacted status command #1104

Merged

joelteply merged 3 commits into canary from feat/model-resolver-hw-capability-tier
