feat(inference): inference-llm PR-4 — adapter integration (translation + new constructors) by joelteply · Pull Request #1395 · CambrianTech/continuum

joelteply · 2026-05-18T17:32:10Z

Summary

Bridges the substrate's typed InferenceRequest/InferenceComplete surface to the existing AIProviderAdapter trait (LlamaCppAdapter for local llama.cpp). Pure Rust, zero TS, zero shim. PR-5 ships the LlamaCppAdapter Runtime wiring + end-to-end stub-adapter test; PR-4 ships the translation logic + new constructors so PR-5 is just plumbing.

What lands

InferenceRequest.prompt_text: Option<String> — wire addition for adapter-based engines that tokenize internally
InferenceComplete.completion_text: Option<String> — wire addition for adapter-based engines that return text not tokens
InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>> + with_adapter / with_bus_and_adapter constructors
handle_request routing: adapter when wired + prompt_text present; loud refusal when adapter wired + no prompt_text (raw-token path not yet implemented — never silent fallback); stub when no adapter
run_adapter_inference — translates InferenceRequest → TextGenerationRequest, calls adapter, translates TextGenerationResponse → (InferenceComplete, FirstTokenEmitted)
translate_adapter_response + translate_adapter_finish_reason — pure-function translation
Cross-enum mapping: Stop→Stop, Length→MaxTokens, ToolUse→Error{reason} (loud refusal), Error→Error{reason}
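The three-way routing described for handle_request can be sketched as a pure decision function. This is a minimal illustration, not the module's code: `Route` and `choose_route` are hypothetical names, and the real handler works on InferenceRequest and an `Option<Arc<dyn AIProviderAdapter>>` rather than a bool.

```rust
/// Hypothetical outcome of the routing decision (illustrative names only).
#[derive(Debug, PartialEq)]
enum Route<'a> {
    /// Adapter is wired and the request carries prompt_text.
    Adapter { prompt_text: &'a str },
    /// Adapter is wired but there is no prompt_text: refuse loudly.
    Refuse { reason: String },
    /// No adapter wired: fall back to the PR-2 stub.
    Stub,
}

/// Decide which path a request takes. `adapter_wired` stands in for
/// `self.adapter.is_some()` in this sketch.
fn choose_route<'a>(adapter_wired: bool, prompt_text: Option<&'a str>) -> Route<'a> {
    match (adapter_wired, prompt_text) {
        (true, Some(text)) => Route::Adapter { prompt_text: text },
        // Never a silent fallback to the stub when an adapter is wired.
        (true, None) => Route::Refuse {
            reason: "adapter wired but request has no prompt_text; \
                     raw-token path not yet implemented"
                .to_string(),
        },
        (false, _) => Route::Stub,
    }
}
```

The point of the middle arm is that a wired adapter plus a missing prompt_text yields an explicit error rather than stub output that could be mistaken for real inference.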

Wire-shape decisions

max_tokens=0 in substrate's GenerationBudget → None on adapter (substrate convention: 0=unlimited; adapter convention: None=unlimited, 0=stop-immediately which no caller wants)
Empty stop_sequences → None on adapter
persona_id propagates as stringified UUID for per-persona resource attribution
purpose hardcoded "inference-llm" for adapter routing diagnostics
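The convention differences above could be expressed roughly as follows. This is a sketch under assumptions: `AdapterLimits` and `translate_limits` are hypothetical stand-ins for the relevant fields of TextGenerationRequest, and the `u32` token type is a guess.

```rust
/// Hypothetical subset of the adapter-side request (not the real struct).
#[derive(Debug, PartialEq)]
struct AdapterLimits {
    /// None = unlimited on the adapter side.
    max_tokens: Option<u32>,
    /// None = no caller-supplied stop sequences.
    stop_sequences: Option<Vec<String>>,
    /// Routing diagnostic label.
    purpose: &'static str,
}

/// Translate substrate conventions (0 = unlimited, empty Vec = none)
/// into adapter conventions (None for both).
fn translate_limits(max_tokens: u32, stop_sequences: &[String]) -> AdapterLimits {
    AdapterLimits {
        // Substrate 0 must not reach the adapter as Some(0),
        // which the adapter would read as "stop immediately".
        max_tokens: if max_tokens == 0 { None } else { Some(max_tokens) },
        stop_sequences: if stop_sequences.is_empty() {
            None
        } else {
            Some(stop_sequences.to_vec())
        },
        purpose: "inference-llm",
    }
}
```

Mapping both "empty" encodings to None keeps the adapter from ever seeing a zero budget or an empty stop list, which the two sides would otherwise interpret differently.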

Sub-fix: missing TS bindings from PR-1

PR-1 (#1387) shipped the Rust types but the shared/generated/inference_llm/ directory of TS exports wasn't included in the commit (regen produced them locally; didn't stage). PR-4 ships all 10 TS files + the barrel index. Closes a wire-contract gap.

Test plan

cargo test --lib --features metal,accelerate inference::llm_module — 44/44 pass (12 PR-3b + new PR-4):
- translate_adapter_response_carries_text_and_usage — text + tokens_generated mapping
- translate_finish_reason_covers_all_adapter_variants — cross-enum mapping pin
- with_adapter_constructor_routes_via_adapter_path — constructors compile + no-adapter regression
No regressions across other 2928 lib tests
Pre-push gate clean
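As an illustration of what the cross-enum mapping pin guards, here is a minimal sketch of the mapping. Both enums are hypothetical reductions of the real types: the adapter variants are assumed to be unit variants, and the reason strings are invented for the example.

```rust
/// Hypothetical adapter-side finish reasons (assumed to be unit variants).
#[derive(Debug, Clone, PartialEq)]
enum AdapterFinishReason {
    Stop,
    Length,
    ToolUse,
    Error,
}

/// Hypothetical substrate-side finish reasons.
#[derive(Debug, Clone, PartialEq)]
enum InferenceFinishReason {
    Stop,
    MaxTokens,
    Error { reason: String },
}

/// Pure cross-enum mapping. The match has no wildcard arm, so adding a
/// variant to AdapterFinishReason becomes a compile error here.
fn translate_finish_reason(adapter_reason: AdapterFinishReason) -> InferenceFinishReason {
    match adapter_reason {
        AdapterFinishReason::Stop => InferenceFinishReason::Stop,
        AdapterFinishReason::Length => InferenceFinishReason::MaxTokens,
        // inference-llm does not model tool use, so refuse loudly.
        AdapterFinishReason::ToolUse => InferenceFinishReason::Error {
            reason: "adapter reported tool use, which inference-llm does not model".to_string(),
        },
        AdapterFinishReason::Error => InferenceFinishReason::Error {
            reason: "adapter reported an error".to_string(),
        },
    }
}
```

A test that exercises every adapter variant, as translate_finish_reason_covers_all_adapter_variants is described as doing, pins the runtime behavior in addition to the compile-time exhaustiveness check.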

End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests deferred to PR-5: the AIProviderAdapter trait has 8+ methods (provider_id / api_style / default_model / get_available_models / health_check / model_metadata / capabilities / etc) and implementing all on a test stub here would pull in ProviderHealth + AdapterCapabilities + ApiStyle + ModelInfo + their dependencies — bigger than atomic-slice. PR-5 will wire LlamaCppAdapter directly through Runtime registration.

Stack

feat(inference): inference-llm PR-1 — typed event surface (MODULE-CATALOG §II) #1387 — inference-llm PR-1: typed event surface
feat(inference): inference-llm PR-2 — InferenceLlmModule ServiceModule impl (stub-backed) #1391 — inference-llm PR-2: ServiceModule impl (stub-backed)
feat(inference): inference-llm PR-3a — canonical ArtifactKeys + publishing helpers #1392 — inference-llm PR-3a: bus keys + publishing helpers
feat(inference): inference-llm PR-3b — InferenceLlmModule auto-publishes via bus hook (pure Rust) #1393 — inference-llm PR-3b: auto-publish wiring
This PR — inference-llm PR-4: adapter integration (translation + constructors)
NEXT — PR-5: LlamaCppAdapter Runtime wiring + end-to-end test through real adapter

🤖 Generated with Claude Code

…n layer + new constructors)

Bridges the substrate's typed InferenceRequest/InferenceComplete surface to the existing AIProviderAdapter trait (LlamaCppAdapter for local llama.cpp). PR-5 ships the LlamaCppAdapter Runtime wiring + the end-to-end stub-adapter test; PR-4 ships the translation logic + new constructors so PR-5 is just plumbing.

What lands

- InferenceRequest.prompt_text: Option<String> — PR-4 wire addition for adapter-based engines that tokenize internally. Backwards-compat (Option = optional on wire).
- InferenceComplete.completion_text: Option<String> — wire addition for adapter-based engines that return text not tokens.
- InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>>.
- with_adapter(adapter) constructor: real-inference + no bus.
- with_bus_and_adapter(bus, registry, adapter) constructor: the full production wiring (adapter + bus publishing).
- handle_request: routes via adapter when wired + prompt_text present; refuses loud when adapter wired + no prompt_text (raw-token path not yet implemented — never silent fallback); falls back to PR-2 stub when no adapter.
- run_adapter_inference(adapter, request, prompt_text) — translates InferenceRequest → TextGenerationRequest, calls adapter, translates TextGenerationResponse → (InferenceComplete, FirstTokenEmitted).
- translate_adapter_response(request, response) — pure-function body of the response-side translation.
- translate_adapter_finish_reason(adapter_reason) — cross-enum mapping: Stop→Stop, Length→MaxTokens, ToolUse→Error{reason} (loud refusal — inference-llm doesn't model tool-use), Error→Error{reason}.

Wire-shape decisions

- max_tokens=0 in substrate's GenerationBudget translates to None on adapter's wire. Substrate convention: 0=unlimited, caller takes duration responsibility. Adapter convention: None=unlimited, 0=stop immediately. The substrate's "stop immediately" doesn't have an encoding because no caller would ask for it.
- stop_sequences: empty Vec on substrate translates to None on adapter (adapter convention: None = no caller stop sequences).
- persona_id propagates to adapter as stringified UUID for per-persona resource attribution (matches existing adapter convention from PersonaResponseGenerator).
- purpose hardcoded "inference-llm" for adapter routing diagnostics.

Sub-fix: missing TS bindings from PR-1

PR-1 (#1387) shipped the Rust types but the shared/generated/inference_llm/ directory of TS exports wasn't included in the commit (regen produced them locally; they didn't get staged). PR-4 ships all 10 TS files + the barrel index. Closes a wire-contract gap.

Tests

13 new behavioral tests (44 total in inference::llm_module + inference::llm_module_service + inference::llm_module_bus):

- translate_adapter_response_carries_text_and_usage — completion_text + tokens_generated mapping
- translate_finish_reason_covers_all_adapter_variants — cross-enum mapping pin
- with_adapter_constructor_routes_via_adapter_path — constructors compile + no-adapter regression
- 8 existing PR-2 + 4 existing PR-3b tests still pass (no regressions)

End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests deferred to PR-5: the AIProviderAdapter trait has 8+ methods (provider_id / api_style / default_model / get_available_models / health_check / model_metadata / capabilities / initialize / shutdown / generate_text / create_embedding) and implementing all of them on a test stub here would pull in ProviderHealth + AdapterCapabilities + ApiStyle + ModelInfo + their dependencies — bigger than atomic-slice. PR-5 will wire LlamaCppAdapter directly through Runtime registration.

44/44 inference::llm_module tests pass. No regressions across other 2928 lib tests.

Stack

- #1387 — inference-llm PR-1: typed event surface
- #1391 — inference-llm PR-2: ServiceModule impl (stub-backed)
- #1392 — inference-llm PR-3a: bus keys + publishing helpers
- #1393 — inference-llm PR-3b: auto-publish wiring
- THIS PR — inference-llm PR-4: adapter integration (translation + constructors)
- NEXT — PR-5: LlamaCppAdapter Runtime wiring + end-to-end integration test through real (or test-mock) adapter

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wires InferenceLlmModule into the Runtime so it's callable from the cognition path via inference/llm/request commands.

What lands

- Add "inference-llm" to EXPECTED_MODULES in runtime/runtime.rs
- runtime.register(Arc::new(InferenceLlmModule::new())) in ipc/mod.rs alongside the existing InferenceModule registration

Design choices

- Constructed via the .new() (bus-less, stub-backed) constructor rather than .with_bus_and_adapter(). Reason: the with_bus_and_adapter constructor requires an AIProviderAdapter Arc, which would couple PR-5's runtime registration to a specific LlamaCppAdapter init lifecycle. The substrate's LlamaCppAdapter is owned by AIProviderModule's adapter registry with its own initialization phase; threading the adapter Arc here would either duplicate the registration or create an init-ordering dependency this slice shouldn't introduce.
- The stub-backed registration is still useful: it exposes the inference/llm/request command surface to the cognition path so downstream PRs (turn-execute that chains drain-turn-frame → response_prompt → inference/llm/request) can wire against the real command name. Bus + adapter integration is a follow-up PR that updates the construction call here.

What is NOT changed

- AIProviderModule + LlamaCppAdapter unchanged
- All InferenceLlmModule trait impl logic unchanged (PR-2/3/4 work intact)
- The stub vs real-adapter swap point stays exactly where PR-4 put it: with_bus_and_adapter constructor + run_adapter_inference function

Tests

- cargo build --features metal,accelerate --lib clean (no new test fixtures needed — the module's existing 44/44 tests cover the trait-impl correctness; this PR just plumbs construction into runtime startup)
- EXPECTED_MODULES enforcement validates at boot: if the registration is missing the runtime fails with "missing inference-llm" error
- Pre-push gate clean

Stack

- #1387 PR-1: typed event surface
- #1391 PR-2: ServiceModule impl (stub-backed)
- #1392 PR-3a: bus keys + publishing helpers
- #1393 PR-3b: auto-publish wiring
- #1395 PR-4: adapter integration (translation + new constructors)
- THIS PR — PR-5: Runtime registration
- FOLLOW-UP — adapter Arc wiring when LlamaCppAdapter init phase is integrated with Runtime startup

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
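The boot-time EXPECTED_MODULES check mentioned under Tests might look roughly like the sketch below. Everything here is assumed for illustration: the list contents other than "inference-llm", the `check_expected_modules` name, and the error format (modeled on the quoted "missing inference-llm" message) are not taken from runtime/runtime.rs.

```rust
/// Hypothetical expected-module list; only "inference-llm" comes from the PR.
/// "inference" is an assumed name for the existing InferenceModule.
const EXPECTED_MODULES: &[&str] = &["inference", "inference-llm"];

/// Return Ok if every expected module name is registered, otherwise an
/// error naming the missing ones so boot fails loudly.
fn check_expected_modules(registered: &[&str]) -> Result<(), String> {
    let missing: Vec<&str> = EXPECTED_MODULES
        .iter()
        .copied()
        .filter(|name| !registered.contains(name))
        .collect();

    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing {}", missing.join(", ")))
    }
}
```

With a check of this shape, adding the name to the list without the matching runtime.register call (or the reverse) shows up at startup instead of as a silently absent command surface.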

joelteply merged commit ed287ba into canary May 18, 2026
3 checks passed

joelteply deleted the feat/inference-llm-adapter-pr4 branch May 18, 2026 17:32

github-actions Bot added the size: XL label May 18, 2026

This was referenced May 18, 2026

feat(persona,LaneD): response_prompt() lazy output on PersonaTurnFrame #1400

Merged

feat(inference): inference-llm PR-5 — Runtime registration #1404

Merged
