
Add Rust model resolver with hardware capability tiers (Lane C) #1066

Merged
joelteply merged 3 commits into canary from feat/model-resolver-hw-capability-tier
May 8, 2026

@joelteply

Lane C — model resolver + hardware capability tier

PR-D from docs/architecture/ALPHA-GAP-RUST-PERSONA-RUNTIME.md. Pure module providing capability-shaped model resolution with a no-fallback contract.

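Builds on: the typed model_registry SSOT (models.toml + providers.toml + Arch/Capability vocab), the TargetSilicon 2-axis from #1062, and the dropped_no_budget loud-fail from #1063.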

Scope

cognition/model_resolver.rs (new pure module — no IPC, no ORM, no inference):

pub fn resolve_model<'a, I: IntoIterator<Item = &'a Model>>(
    requirement: &ModelRequirement,
    models: I,
) -> Result<ResolvedModel, ResolutionError>

Types (all ts-rs exported to shared/generated/cognition/):

  • ModelRequirement{required_capabilities, arch_preference, context_window_min, memory_budget_mb, provider_policy, host}
  • ResolvedModel{model_id, provider_id, expected_memory_mb, target_silicon, hw_capability_tier, reason}
  • HwCapabilityTier{CpuOnly, M1Uma8Gb, M1Uma16Gb, M2UmaProMax, M3UmaProMax, Sm70..Sm120, VulkanAmd, Cloud} — finer than TargetSilicon (selects which model VARIANT a host can run)
  • LocalOrCloudPolicy{LocalOnly, CloudOnly, PreferLocal, PreferCloud, Any}
  • HostCapability{hw_capability_tier, available_memory_mb, primary_target_silicon}
  • ResolutionError::NoModelMatchesRequirement{registry_count, candidates_after_filter, unmet_filters} — typed, no fallback per Joel's "fallbacks are illegal" rule
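
A minimal sketch of how the signature and the typed error above could fit together. It is not the merged code: the field types, the `Capability` variants, the `is_local` flag on `Model`, the filter order, and the `narrow` helper are all assumptions made for illustration, and the requirement is trimmed to three fields.

```rust
// Illustrative only: shapes below are assumed, not copied from model_resolver.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability { Chat, Vision }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalOrCloudPolicy { LocalOnly, CloudOnly, PreferLocal, PreferCloud, Any }

#[derive(Debug, Clone)]
pub struct Model {
    pub id: String,
    pub provider_id: String,
    pub capabilities: Vec<Capability>,
    pub context_window: u32,
    pub is_local: bool, // assumption: stands in for the provider lookup
}

pub struct ModelRequirement {
    pub required_capabilities: Vec<Capability>,
    pub context_window_min: Option<u32>,
    pub provider_policy: LocalOrCloudPolicy,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ResolvedModel { pub model_id: String, pub provider_id: String, pub reason: String }

#[derive(Debug, PartialEq, Eq)]
pub enum ResolutionError {
    NoModelMatchesRequirement {
        registry_count: usize,
        candidates_after_filter: usize,
        unmet_filters: Vec<String>,
    },
}

/// Hypothetical helper: apply one filter and record its name if it empties the set.
fn narrow<'a>(
    name: &str,
    candidates: &mut Vec<&'a Model>,
    unmet: &mut Vec<String>,
    keep: impl Fn(&Model) -> bool,
) {
    let before = candidates.len();
    candidates.retain(|m| keep(*m));
    if before > 0 && candidates.is_empty() {
        unmet.push(name.to_string());
    }
}

pub fn resolve_model<'a, I: IntoIterator<Item = &'a Model>>(
    requirement: &ModelRequirement,
    models: I,
) -> Result<ResolvedModel, ResolutionError> {
    let mut candidates: Vec<&'a Model> = models.into_iter().collect();
    let registry_count = candidates.len();
    let mut unmet_filters = Vec::new();

    narrow("required_capabilities", &mut candidates, &mut unmet_filters, |m| {
        requirement.required_capabilities.iter().all(|c| m.capabilities.contains(c))
    });
    narrow("context_window_min", &mut candidates, &mut unmet_filters, |m| {
        requirement.context_window_min.map_or(true, |min| m.context_window >= min)
    });
    narrow("provider_policy", &mut candidates, &mut unmet_filters, |m| {
        match requirement.provider_policy {
            LocalOrCloudPolicy::LocalOnly => m.is_local,
            LocalOrCloudPolicy::CloudOnly => !m.is_local,
            LocalOrCloudPolicy::PreferLocal
            | LocalOrCloudPolicy::PreferCloud
            | LocalOrCloudPolicy::Any => true,
        }
    });

    // No fallback: an empty candidate set is a typed error, never a default model.
    match candidates.first() {
        Some(m) => Ok(ResolvedModel {
            model_id: m.id.clone(),
            provider_id: m.provider_id.clone(),
            reason: "first candidate surviving all filters".to_string(),
        }),
        None => Err(ResolutionError::NoModelMatchesRequirement {
            registry_count,
            candidates_after_filter: 0,
            unmet_filters,
        }),
    }
}
```

Because the function only reads an iterator of `&Model`, a caller can pass `&registry_vec` directly, and a test can pass a hand-built `Vec<Model>` without touching IPC or the ORM.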

target_silicon derivation: local providers (llamacpp-local, docker-model-runner) inherit host.primary_target_silicon; cloud providers always TargetSilicon::Cloud. Hardcoded local-provider list for v1; follow-up moves it to a kind: local|cloud field on Provider.
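
The v1 derivation rule described above can be sketched roughly as follows. The two provider ids and the `LOCAL_PROVIDER_IDS` name come from this PR; the `TargetSilicon` variants other than `Cloud` and the `derive_target_silicon` function name are placeholders assumed for illustration.

```rust
// Sketch of the v1 rule; only `Cloud` is a variant name confirmed by the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetSilicon { AppleMetal, NvidiaCuda, Cpu, Cloud }

/// v1: local providers are recognised by a hardcoded id list.
const LOCAL_PROVIDER_IDS: &[&str] = &["llamacpp-local", "docker-model-runner"];

/// Hypothetical helper: local providers inherit the host's primary silicon,
/// every other provider is treated as cloud.
pub fn derive_target_silicon(provider_id: &str, host_primary: TargetSilicon) -> TargetSilicon {
    if LOCAL_PROVIDER_IDS.contains(&provider_id) {
        host_primary
    } else {
        TargetSilicon::Cloud
    }
}
```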

expected_memory_mb stays None until Model schema gains an estimated_memory_mb field (separate followup). Today's resolver still rejects cloud models from LocalOnly queries, which prevents the worst class of mis-routing.

model_registry/types.rs: Arch gains #[derive(TS)] + ts(export) parallel to the existing Capability derivation. Backwards-compatible additive change; required because ModelRequirement.arch_preference: Vec<Arch> crosses the TS boundary.

Validation

  • cargo test --features metal,accelerate cognition::model_resolver: 16/16 pass (10 logic + 6 ts-rs export-binding)
  • npx tsx scripts/build-with-loud-failure.ts: TypeScript compilation succeeded

10 logic tests cover:

  • Local chat resolves to qwen3.5 on M1
  • Vision request resolves to qwen2-vl
  • CloudOnly skips local models
  • missing_capability_errors_no_fallback — explicit no-fallback assertion (Joel's rule)
  • Vision-with-LocalOnly on CPU host still finds the local vision model (admission is the substrate's job, not the resolver's)
  • Context window minimum filters small models
  • Arch preference filters
  • PreferLocal / PreferCloud ranking
  • five_persona_resolution_smoke — Lane C contract test (5 personas with different needs all resolve correctly + missing-model error path)
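
One way the policy-driven tests in this list (CloudOnly skipping local models, PreferLocal / PreferCloud ranking) could be satisfied is a filter-or-stable-sort step like the sketch below. The `Candidate` struct and the `rank` function are hypothetical, and the PR text does not state how the real resolver orders candidates.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalOrCloudPolicy { LocalOnly, CloudOnly, PreferLocal, PreferCloud, Any }

/// Hypothetical stand-in for a model that already passed the capability filters.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub model_id: String,
    pub is_local: bool,
}

/// The *Only policies drop candidates; the Prefer* policies only reorder them.
pub fn rank(policy: LocalOrCloudPolicy, mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    match policy {
        LocalOrCloudPolicy::LocalOnly => candidates.retain(|c| c.is_local),
        LocalOrCloudPolicy::CloudOnly => candidates.retain(|c| !c.is_local),
        // sort_by_key is stable, so registry order is preserved inside each group.
        // `false` sorts before `true`, hence the negation for PreferLocal.
        LocalOrCloudPolicy::PreferLocal => candidates.sort_by_key(|c| !c.is_local),
        LocalOrCloudPolicy::PreferCloud => candidates.sort_by_key(|c| c.is_local),
        LocalOrCloudPolicy::Any => {}
    }
    candidates
}
```

With this shape, an empty result under `LocalOnly` or `CloudOnly` is what the caller would turn into `NoModelMatchesRequirement`, while the Prefer* policies can never empty the list on their own.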

Out of scope (followups)

  1. Two SSOTs noted: models.toml (typed Rust registry with the Capability/Arch vocab) + shared/models.json (#1042 install/seed manifest). Resolver builds on TOML; consolidating the two needs a separate PR.
  2. expected_memory_mb: requires Model schema to gain estimated_memory_mb: Option<u32>.
  3. Provider kind: local|cloud field on Provider in providers.toml (replace hardcoded LOCAL_PROVIDER_IDS constant).
  4. Hardware probe: today's HostCapability is caller-supplied. Boot-time hardware-detection probe lives in a separate module.
  5. Wire into chat path: PR-A/B/C/D-E sequence — this PR ships the resolver primitive; subsequent PRs replace TS-side hardcoded persona model strings with resolve_model() calls.

🤖 Generated with Claude Code

PR-D from docs/architecture/ALPHA-GAP-RUST-PERSONA-RUNTIME.md: capability-shaped
model resolution with no-fallback contract. Builds on the typed model_registry
SSOT (models.toml + providers.toml + Arch/Capability vocab) and the TargetSilicon
2-axis from #1062 (and dropped_no_budget loud-fail from #1063).

cognition/model_resolver.rs (pure module — no IPC, no ORM, no inference):
- ModelRequirement: required_capabilities, arch_preference, context_window_min,
  memory_budget_mb, provider_policy, host
- ResolvedModel: model_id, provider_id, expected_memory_mb, target_silicon,
  hw_capability_tier, reason
- HwCapabilityTier: finer-grained than TargetSilicon (M1Uma8Gb..M3UmaProMax,
  Sm70..Sm120, VulkanAmd, Cloud)
- LocalOrCloudPolicy: LocalOnly | CloudOnly | PreferLocal | PreferCloud | Any
- HostCapability: per-machine snapshot (tier + memory + primary silicon)
- ResolutionError: NoModelMatchesRequirement{registry_count,
  candidates_after_filter, unmet_filters} — typed, no fallback
- resolve_model(): pure function over IntoIterator<&Model>

target_silicon derivation: local providers (llamacpp-local, docker-model-runner)
inherit host.primary_target_silicon; cloud providers always TargetSilicon::Cloud.
Hardcoded local-provider list for v1; follow-up moves it to a kind:
local|cloud field on Provider in providers.toml.

expected_memory_mb stays None until Model schema gains an estimated_memory_mb
field — separate followup. Today's resolver still rejects cloud models from
LocalOnly queries, which prevents the worst class of mis-routing.

model_registry/types.rs: Arch gains #[derive(TS)] + ts(export) parallel to the
existing Capability derivation. Backwards-compatible additive change; required
because ModelRequirement.arch_preference: Vec<Arch> crosses the TS boundary.

10 logic tests + 6 ts-rs export-binding tests = 16/16 green:
- local_chat_resolves_to_qwen35_on_m1
- vision_request_resolves_to_qwen2_vl
- cloud_only_skips_local_models
- missing_capability_errors_no_fallback (NO FALLBACK assertion)
- vision_with_local_only_on_cpu_host_still_finds_local_vision_model
- context_window_min_filters_small_models
- arch_preference_filters_to_qwen35_only
- prefer_local_ranks_local_first
- prefer_cloud_ranks_cloud_first
- five_persona_resolution_smoke (Lane C contract test)

Validation:
- cargo test --features metal,accelerate cognition::model_resolver: 16/16
- npx tsx scripts/build-with-loud-failure.ts: TypeScript compilation succeeded

Two SSOTs noted (TOML registry vs shared/models.json) — out of Lane C scope,
filed for separate consolidation followup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orced

Address sibling Mac review on PR #1066 — non-blocking doc-clarity flags:

(α) HwCapabilityTier doc: spell out the closed-enum design choice. New
hardware classes require enum-edit + ts-rs regen + an explicit alias
decision. No Other(String) / wildcard fallback variant by design — silent
routing to a default tier hides exactly the capacity-mismatch bugs the
resolver exists to catch. Per Joel's no-fallback rule.
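
A small sketch of what the closed-enum choice buys, under stated assumptions: the variant list is abbreviated, and both `runs_locally` and `parse_tier` are hypothetical helpers that do not appear in the PR.

```rust
// Abbreviated variant list; the real enum has more tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HwCapabilityTier { CpuOnly, M1Uma8Gb, M1Uma16Gb, Sm70, Sm120, VulkanAmd, Cloud }

/// Hypothetical consumer. There is no `_ =>` arm, so adding a variant fails
/// compilation here until someone decides how the new tier is treated.
pub fn runs_locally(tier: HwCapabilityTier) -> bool {
    match tier {
        HwCapabilityTier::CpuOnly
        | HwCapabilityTier::M1Uma8Gb
        | HwCapabilityTier::M1Uma16Gb
        | HwCapabilityTier::Sm70
        | HwCapabilityTier::Sm120
        | HwCapabilityTier::VulkanAmd => true,
        HwCapabilityTier::Cloud => false,
    }
}

/// Hypothetical parser. An unknown name is an error, never a default tier.
pub fn parse_tier(name: &str) -> Result<HwCapabilityTier, String> {
    match name {
        "CpuOnly" => Ok(HwCapabilityTier::CpuOnly),
        "M1Uma8Gb" => Ok(HwCapabilityTier::M1Uma8Gb),
        "M1Uma16Gb" => Ok(HwCapabilityTier::M1Uma16Gb),
        "Sm70" => Ok(HwCapabilityTier::Sm70),
        "Sm120" => Ok(HwCapabilityTier::Sm120),
        "VulkanAmd" => Ok(HwCapabilityTier::VulkanAmd),
        "Cloud" => Ok(HwCapabilityTier::Cloud),
        other => Err(format!("unknown hardware tier: {other}")),
    }
}
```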

(β) ModelRequirement.memory_budget_mb doc: explicitly state OBSERVED but
NOT ENFORCED until Model schema gains estimated_memory_mb. Without this
note, callers may pass it expecting filtering and silently get over-
budget models. Loud-fail on memory pressure is a downstream Lane B
(FootprintRegistry / PressureBroker) concern, not a resolver filter.

ts-rs regenerated HwCapabilityTier.ts + ModelRequirement.ts with new
docstrings. cargo test --features metal,accelerate cognition::model_resolver:
16/16 still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply

Review notes before this leaves draft:

  1. memory_budget_mb is exposed on ModelRequirement and the resolver docs say memory budget is filter #5, but resolve_model() never filters on memory because expected_memory_mb is always None. That makes the API look safer than it is. Either remove/defer the field from v1, or add estimated_memory_mb to Model/TOML and a failing test proving a too-small local budget errors with NoModelMatchesRequirement instead of silently returning qwen2-vl/qwen3.5.

  2. LOCAL_PROVIDER_IDS hardcodes llamacpp-local and docker-model-runner. For this repo direction, provider locality should be data. Please add a typed ProviderKind / ProviderResidency field in Provider + providers.toml (local vs cloud) and have the resolver take provider metadata or a provider-kind lookup, instead of baking provider IDs into cognition code.

  3. The local chat test is loose: it accepts qwen3.5 or qwen2. For the resolver contract, prefer an explicit requirement/ranking expectation so a registry-order change cannot hide a worse local chat pick.

I like the pure-module shape and no-fallback error contract. I would keep it draft until #1 is resolved, and ideally take #2 in the same PR because it is small and removes an avoidable hardcode at the boundary.
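
For reference, the second option in note 1 (enforcing the budget rather than removing the field) might look something like the sketch below. Everything here is hypothetical: `estimated_memory_mb` does not exist on `Model` in this PR, `filter_by_memory_budget` is an invented name, and treating an unknown footprint (`None`) as "does not fit" is an assumption.

```rust
// Hypothetical shape; the real Model has no memory estimate yet.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Model {
    pub id: String,
    pub estimated_memory_mb: Option<u32>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResolutionError {
    NoModelMatchesRequirement {
        registry_count: usize,
        candidates_after_filter: usize,
        unmet_filters: Vec<String>,
    },
}

/// Keep only models whose estimated footprint is known and within budget.
/// An empty result is a typed error, not a silent over-budget pick.
pub fn filter_by_memory_budget(
    models: &[Model],
    memory_budget_mb: u32,
) -> Result<Vec<&Model>, ResolutionError> {
    let fits: Vec<&Model> = models
        .iter()
        .filter(|m| matches!(m.estimated_memory_mb, Some(mb) if mb <= memory_budget_mb))
        .collect();

    if fits.is_empty() {
        return Err(ResolutionError::NoModelMatchesRequirement {
            registry_count: models.len(),
            candidates_after_filter: 0,
            unmet_filters: vec!["memory_budget_mb".to_string()],
        });
    }
    Ok(fits)
}
```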

@joelteply joelteply marked this pull request as ready for review May 8, 2026 03:18
@joelteply

Fixup pushed: fbdc357f8

Checklist items resolved:

  • Removed ModelRequirement.memory_budget_mb from the public resolver contract instead of exposing a non-enforced field. Memory admission stays with Lane B / PressureBroker until Model.estimated_memory_mb exists.
  • Added typed ProviderKind { local, cloud } to Provider + providers.toml; resolver now takes provider metadata and no longer hardcodes LOCAL_PROVIDER_IDS.
  • Tightened local chat resolver test to assert the intended qwen3.5 local pick, not "qwen3.5 or qwen2".
  • Added generated ProviderKind.ts and barrel export.

Validation:

  • cargo test --features metal,accelerate cognition::model_resolver --lib: 16/16
  • cargo test --features metal,accelerate model_registry --lib: 13/13
  • npx tsx scripts/build-with-loud-failure.ts: pass
  • pre-push: TypeScript clean, Rust clean with metal,accelerate, Rust tests passed

Native-arch Docker helper still hit generated-file dirtiness and continued; CI arch verification remains the merge guard for image availability.

@joelteply

Updated fbdc357f8 — addresses Codex (3) + continuum-8e97 (1) review flags

The commit subject came through but the pre-push hook stripped the body, so writing it here for review visibility:

Codex flags (all addressed):

  1. memory_budget_mb removed entirely — was exposed as a filter parameter but never enforced (since Model.estimated_memory_mb doesn't exist yet). Field-that-does-nothing is a lie. Will return WITH actual filtering when the Model schema gains estimated_memory_mb (separate followup).

  2. LOCAL_PROVIDER_IDS hardcoded const replaced with ProviderKind::{Local, Cloud} field on Provider:

    • model_registry/types.rs: new ProviderKind enum (ts-rs exported), added kind: ProviderKind field to Provider with #[serde(default)] resolving to Cloud (backwards-compatible: existing TOML rows continue parsing)
    • providers.toml: llamacpp-local + docker-model-runner gain explicit kind = "local"; cloud providers (anthropic/openai/etc) inherit the Cloud default
    • cognition/model_resolver.rs: resolve_model() now takes IntoIterator<Item = &Provider> alongside models, builds HashMap<&str, ProviderKind>, replaces the is_local_provider() const helper
  3. Tightened local_chat_resolves_to_qwen35_on_m1 — was loosely asserting starts_with("continuum-ai/qwen3.5") || starts_with("qwen2") which would ALSO accept qwen2-vl-7b or qwen2-0.5b-gating. Now asserts the specific qwen3.5-4b id so ranking-rule changes get caught.

continuum-8e97 (RTX/Windows) flag — addressed:

  1. HwCapabilityTier gained Sm75 + Sm100:
    • Sm75 (Turing — T4 datacenter, RTX 20xx, GTX 16xx) — common on cloud GPU inference instances
    • Sm100 (Blackwell datacenter B100/B200 with HBM3e) — distinct from Sm120 (Blackwell-consumer RTX 50xx) because the driver paths differ

Two new tests covering the behavior change:

  • provider_kind_drives_local_classification_not_id — confirms a custom provider id with kind=Local works (no hardcoded set)
  • unknown_provider_defaults_to_cloud_for_safety — confirms a model referencing an unknown provider id is treated as Cloud (not silently routed to local hardware)
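
The behavior these two tests pin down can be sketched as a lookup keyed by provider id. This is an approximation under assumptions: the `provider_kinds` and `kind_of` helpers are invented names, `Provider` is trimmed to two fields, and the standard `Default` trait stands in for the `#[serde(default)]` attribute so the sketch needs no external crates.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ProviderKind {
    Local,
    #[default]
    Cloud, // default mirrors the "missing kind means Cloud" rule
}

pub struct Provider {
    pub id: String,
    pub kind: ProviderKind,
}

/// Build the id -> kind map once per resolution call.
pub fn provider_kinds<'a, I: IntoIterator<Item = &'a Provider>>(
    providers: I,
) -> HashMap<&'a str, ProviderKind> {
    providers.into_iter().map(|p| (p.id.as_str(), p.kind)).collect()
}

/// A provider id that is absent from the map is treated as Cloud, so a
/// model with a dangling provider reference is never routed to local hardware.
pub fn kind_of(kinds: &HashMap<&str, ProviderKind>, provider_id: &str) -> ProviderKind {
    kinds.get(provider_id).copied().unwrap_or_default()
}
```

Locality becomes data here: a provider with any id is local as long as its row says so, which is the property `provider_kind_drives_local_classification_not_id` checks.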

Validation:

  • cargo test --features metal,accelerate cognition::model_resolver: 18/18 (was 16/16; net +2 tests, all green)
  • cargo test --features metal,accelerate model_registry: 13/13 (real providers.toml still parses including new kind = "local" rows)
  • npx tsx scripts/build-with-loud-failure.ts: TypeScript clean

Out-of-scope flag from continuum-8e97: multi-silicon hosts (Nvidia+iGPU or workstation+enclosure). primary_target_silicon is single-value today; multi-silicon is a v2 concern (would touch HostCapability shape + adaptive_throughput planner).

@joelteply joelteply merged commit 57a487e into canary May 8, 2026
3 checks passed
@joelteply joelteply deleted the feat/model-resolver-hw-capability-tier branch May 8, 2026 03:20
joelteply pushed a commit to RebelTechPro/continuum that referenced this pull request May 13, 2026
…F PR-2)

Per-pattern ratchet on src/system/user/server/, mirroring PR CambrianTech#1091's
LOC ratchet shape. Tracks three anti-patterns under the persona surface:

  - fallback_mention (case-insensitive, baseline 83): Joel 2026-04-22 —
    "fallbacks have ruined this project ... they are ILLEGAL." The WORD
    count proxies conceptual presence; comments saying "no fallback
    here" count too.
  - direct_adapter_instantiation (baseline 12): matches `new <Name>Adapter(`.
    TS surface should request providers via the ModelRequirement →
    ResolvedModel resolver shipped in CambrianTech#1066/CambrianTech#1074, not instantiate
    adapters directly.
  - direct_api_key_env_read (baseline 0): matches `process.env.*API_KEY`.
    Cloud key lookup belongs in the Rust provider registry per Codex's
    CambrianTech#1077 boundary. Locks 0 in.

Per-pattern monotonic-decrease (any pattern growing fails CI; shrinkage
allowed and surfaces a hint to --update-baseline post-merge). Same
3-mode shape as PR CambrianTech#1091: default check / --update-baseline / --verbose.

Validated locally: clean tree passes (3 patterns hold), intentional
+2 fallback growth fails with named pattern + delta + actionable Rust
target paths.

Lane F (PR CambrianTech#1084 alpha workstreams). Companion to CambrianTech#1091 — extends
docs/architecture/TS-PERSONA-COGNITION-RATCHET.md with the new gate.
Independent CI workflow (~5s, shell + python only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
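
The per-pattern monotonic-decrease rule in the commit message above can be illustrated with a short sketch. The actual gate is described as shell + python, so this Rust version is only a restatement of the comparison logic; `check_ratchet` and `Regression` are invented names, and ignoring patterns that have no baseline entry is an assumption.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub struct Regression {
    pub pattern: String,
    pub baseline: usize,
    pub current: usize,
}

/// Ok(shrunk) lists patterns whose count dropped (a hint to update the baseline).
/// Err(regressions) lists every pattern whose count grew.
pub fn check_ratchet(
    baseline: &BTreeMap<String, usize>,
    current: &BTreeMap<String, usize>,
) -> Result<Vec<String>, Vec<Regression>> {
    let mut regressions = Vec::new();
    let mut shrunk = Vec::new();

    for (pattern, &base) in baseline {
        // A pattern missing from the current scan counts as zero occurrences.
        let now = current.get(pattern).copied().unwrap_or(0);
        if now > base {
            regressions.push(Regression { pattern: pattern.clone(), baseline: base, current: now });
        } else if now < base {
            shrunk.push(pattern.clone());
        }
    }

    if regressions.is_empty() { Ok(shrunk) } else { Err(regressions) }
}
```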