feat(inference): inference-llm PR-4 — adapter integration (translation + new constructors) #1395
Merged
…n layer + new constructors)
Bridges the substrate's typed InferenceRequest/InferenceComplete surface
to the existing AIProviderAdapter trait (LlamaCppAdapter for local
llama.cpp). PR-5 ships the LlamaCppAdapter Runtime wiring + the
end-to-end stub-adapter test; PR-4 ships the translation logic +
new constructors so PR-5 is just plumbing.
What lands
- InferenceRequest.prompt_text: Option<String> — PR-4 wire
addition for adapter-based engines that tokenize internally.
Backwards-compat (Option = optional on wire).
- InferenceComplete.completion_text: Option<String> — wire
addition for adapter-based engines that return text not tokens.
- InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>>.
- with_adapter(adapter) constructor: real-inference + no bus.
- with_bus_and_adapter(bus, registry, adapter) constructor: the
full production wiring (adapter + bus publishing).
- handle_request: routes via adapter when wired + prompt_text
present; refuses loud when adapter wired + no prompt_text (raw-
token path not yet implemented — never silent fallback); falls
back to PR-2 stub when no adapter.
- run_adapter_inference(adapter, request, prompt_text) — translates
InferenceRequest → TextGenerationRequest, calls adapter, translates
TextGenerationResponse → (InferenceComplete, FirstTokenEmitted).
- translate_adapter_response(request, response) — pure-function
body of the response-side translation.
- translate_adapter_finish_reason(adapter_reason) — cross-enum
mapping: Stop→Stop, Length→MaxTokens, ToolUse→Error{reason}
(loud refusal — inference-llm doesn't model tool-use), Error→
Error{reason}.
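The three-way routing described for handle_request can be sketched as a single match. This is a minimal illustration, not the module's code: Route and choose_route are hypothetical names, and the real handler works on InferenceRequest and an Option<Arc<dyn AIProviderAdapter>> rather than a bool and a string slice.

```rust
// Hypothetical stand-ins that illustrate the routing rule only;
// these are not the real inference-llm types.
#[derive(Debug, PartialEq)]
enum Route<'a> {
    /// Adapter wired and prompt_text present: run real inference.
    Adapter { prompt_text: &'a str },
    /// Adapter wired but no prompt_text: refuse loudly, never fall back.
    Refuse { reason: String },
    /// No adapter wired: use the PR-2 stub.
    Stub,
}

/// Picks a route from the two facts the description names:
/// whether an adapter is wired, and whether prompt_text is present.
fn choose_route<'a>(adapter_wired: bool, prompt_text: Option<&'a str>) -> Route<'a> {
    match (adapter_wired, prompt_text) {
        (true, Some(text)) => Route::Adapter { prompt_text: text },
        (true, None) => Route::Refuse {
            // Assumed wording; the real error message is not shown in the PR.
            reason: "adapter wired but prompt_text missing; raw-token path not implemented"
                .to_string(),
        },
        (false, _) => Route::Stub,
    }
}
```

Matching on the tuple keeps the refusal case explicit, so an adapter-wired request without prompt_text cannot reach the stub arm by accident.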
Wire-shape decisions
- max_tokens=0 in substrate's GenerationBudget translates to None
on adapter's wire. Substrate convention: 0=unlimited, caller takes
duration responsibility. Adapter convention: None=unlimited, 0=stop
immediately. The substrate's "stop immediately" doesn't have an
encoding because no caller would ask for it.
- stop_sequences: empty Vec on substrate translates to None on
adapter (adapter convention: None = no caller stop sequences).
- persona_id propagates to adapter as stringified UUID for
per-persona resource attribution (matches existing adapter
convention from PersonaResponseGenerator).
- purpose hardcoded "inference-llm" for adapter routing diagnostics.
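The max_tokens and stop_sequences conventions above amount to two small conversions. The sketch below assumes simplified shapes: GenerationBudget here has only a u32 max_tokens field, and AdapterParams is a hypothetical stand-in for the relevant fields of TextGenerationRequest, whose real layout is not shown in this PR.

```rust
// Hypothetical, simplified stand-ins for the substrate and adapter types.
struct GenerationBudget {
    max_tokens: u32,
}

#[derive(Debug, PartialEq)]
struct AdapterParams {
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
    purpose: String,
}

/// Substrate 0 means "unlimited"; the adapter spells that as None.
fn translate_max_tokens(budget: &GenerationBudget) -> Option<u32> {
    if budget.max_tokens == 0 {
        None
    } else {
        Some(budget.max_tokens)
    }
}

/// An empty list on the substrate side becomes None on the adapter side.
fn translate_stop_sequences(stop: &[String]) -> Option<Vec<String>> {
    if stop.is_empty() {
        None
    } else {
        Some(stop.to_vec())
    }
}

fn to_adapter_params(budget: &GenerationBudget, stop: &[String]) -> AdapterParams {
    AdapterParams {
        max_tokens: translate_max_tokens(budget),
        stop_sequences: translate_stop_sequences(stop),
        // Fixed label used for adapter routing diagnostics.
        purpose: "inference-llm".to_string(),
    }
}
```

Under these assumptions a substrate 0 never reaches the adapter as Some(0), which the adapter would read as "stop immediately".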
Sub-fix: missing TS bindings from PR-1
PR-1 (#1387) shipped the Rust types but the
shared/generated/inference_llm/ directory of TS exports wasn't
included in the commit (regen produced them locally; they didn't
get staged). PR-4 ships all 10 TS files + the barrel index. Closes
a wire-contract gap.
Tests
13 new behavioral tests (44 total in inference::llm_module +
inference::llm_module_service + inference::llm_module_bus):
- translate_adapter_response_carries_text_and_usage — completion_text
+ tokens_generated mapping
- translate_finish_reason_covers_all_adapter_variants — cross-enum
mapping pin
- with_adapter_constructor_routes_via_adapter_path — constructors
compile + no-adapter regression
- 8 existing PR-2 + 4 existing PR-3b tests still pass (no
regressions)
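A cross-enum mapping pin of the kind named above might look like the following sketch. Both enums are hypothetical mirrors with the payloads simplified, the function name is illustrative rather than the real translate_adapter_finish_reason, and the reason strings are assumed.

```rust
// Hypothetical mirrors of the two enums; the real variants may carry payloads.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AdapterFinishReason {
    Stop,
    Length,
    ToolUse,
    Error,
}

#[derive(Debug, PartialEq)]
enum FinishReason {
    Stop,
    MaxTokens,
    Error { reason: String },
}

/// No wildcard arm: adding an adapter variant later fails to compile
/// until the mapping is extended, which is what "pins" the mapping.
fn translate_finish_reason_sketch(reason: AdapterFinishReason) -> FinishReason {
    match reason {
        AdapterFinishReason::Stop => FinishReason::Stop,
        AdapterFinishReason::Length => FinishReason::MaxTokens,
        // Loud refusal: tool-use is not modeled on the inference-llm side.
        AdapterFinishReason::ToolUse => FinishReason::Error {
            reason: "adapter finished with tool-use, which inference-llm does not model"
                .to_string(),
        },
        AdapterFinishReason::Error => FinishReason::Error {
            reason: "adapter reported an error".to_string(),
        },
    }
}
```

A test that walks every adapter variant through this function then documents the expected substrate value for each one.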
End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests
deferred to PR-5: the AIProviderAdapter trait has 8+ methods
(provider_id / api_style / default_model / get_available_models /
health_check / model_metadata / capabilities / initialize /
shutdown / generate_text / create_embedding) and implementing
all of them on a test stub here would pull in ProviderHealth +
AdapterCapabilities + ApiStyle + ModelInfo + their dependencies
— bigger than atomic-slice. PR-5 will wire LlamaCppAdapter
directly through Runtime registration.
44/44 inference::llm_module tests pass. No regressions across
other 2928 lib tests.
Stack
- #1387 — inference-llm PR-1: typed event surface
- #1391 — inference-llm PR-2: ServiceModule impl (stub-backed)
- #1392 — inference-llm PR-3a: bus keys + publishing helpers
- #1393 — inference-llm PR-3b: auto-publish wiring
- THIS PR — inference-llm PR-4: adapter integration (translation +
constructors)
- NEXT — PR-5: LlamaCppAdapter Runtime wiring + end-to-end
integration test through real (or test-mock) adapter
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 18, 2026
joelteply added a commit that referenced this pull request on May 18, 2026:
Wires InferenceLlmModule into the Runtime so it's callable from the cognition path via inference/llm/request commands.
What lands
- Add "inference-llm" to EXPECTED_MODULES in runtime/runtime.rs
- runtime.register(Arc::new(InferenceLlmModule::new())) in ipc/mod.rs alongside the existing InferenceModule registration
Design choices
- Constructed via the .new() (bus-less, stub-backed) constructor rather than .with_bus_and_adapter(). Reason: the with_bus_and_adapter constructor requires an AIProviderAdapter Arc, which would couple PR-5's runtime registration to a specific LlamaCppAdapter init lifecycle. The substrate's LlamaCppAdapter is owned by AIProviderModule's adapter registry with its own initialization phase; threading the adapter Arc here would either duplicate the registration or create an init-ordering dependency this slice shouldn't introduce.
- The stub-backed registration is still useful: it exposes the inference/llm/request command surface to the cognition path so downstream PRs (turn-execute that chains drain-turn-frame → response_prompt → inference/llm/request) can wire against the real command name. Bus + adapter integration is a follow-up PR that updates the construction call here.
What is NOT changed
- AIProviderModule + LlamaCppAdapter unchanged
- All InferenceLlmModule trait impl logic unchanged (PR-2/3/4 work intact)
- The stub vs real-adapter swap point stays exactly where PR-4 put it: with_bus_and_adapter constructor + run_adapter_inference function
Tests
- cargo build --features metal,accelerate --lib clean (no new test fixtures needed — the module's existing 44/44 tests cover the trait-impl correctness; this PR just plumbs construction into runtime startup)
- EXPECTED_MODULES enforcement validates at boot: if the registration is missing the runtime fails with "missing inference-llm" error
- Pre-push gate clean
Stack
- #1387 PR-1: typed event surface
- #1391 PR-2: ServiceModule impl (stub-backed)
- #1392 PR-3a: bus keys + publishing helpers
- #1393 PR-3b: auto-publish wiring
- #1395 PR-4: adapter integration (translation + new constructors)
- THIS PR — PR-5: Runtime registration
- FOLLOW-UP — adapter Arc wiring when LlamaCppAdapter init phase is integrated with Runtime startup
Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bridges the substrate's typed InferenceRequest/InferenceComplete surface to the existing AIProviderAdapter trait (LlamaCppAdapter for local llama.cpp). Pure Rust, zero TS, zero shim. PR-5 ships the LlamaCppAdapter Runtime wiring + end-to-end stub-adapter test; PR-4 ships the translation logic + new constructors so PR-5 is just plumbing.
What lands
- InferenceRequest.prompt_text: Option<String> — wire addition for adapter-based engines that tokenize internally
- InferenceComplete.completion_text: Option<String> — wire addition for adapter-based engines that return text not tokens
- InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>> + with_adapter / with_bus_and_adapter constructors
- handle_request routing: adapter when wired + prompt_text present; loud refusal when adapter wired + no prompt_text (raw-token path not yet implemented — never silent fallback); stub when no adapter
- run_adapter_inference — translates InferenceRequest → TextGenerationRequest, calls adapter, translates TextGenerationResponse → (InferenceComplete, FirstTokenEmitted)
- translate_adapter_response + translate_adapter_finish_reason — pure-function translation
- Stop→Stop, Length→MaxTokens, ToolUse→Error{reason} (loud refusal), Error→Error{reason}
Wire-shape decisions
- max_tokens=0 in substrate's GenerationBudget → None on adapter (substrate convention: 0=unlimited; adapter convention: None=unlimited, 0=stop-immediately which no caller wants)
- stop_sequences → None on adapter
- persona_id propagates as stringified UUID for per-persona resource attribution
- purpose hardcoded "inference-llm" for adapter routing diagnostics
Sub-fix: missing TS bindings from PR-1
PR-1 (#1387) shipped the Rust types but the shared/generated/inference_llm/ directory of TS exports wasn't included in the commit (regen produced them locally; didn't stage). PR-4 ships all 10 TS files + the barrel index. Closes a wire-contract gap.
Test plan
- cargo test --lib --features metal,accelerate inference::llm_module — 44/44 pass (12 PR-3b + new PR-4):
- translate_adapter_response_carries_text_and_usage — text + tokens_generated mapping
- translate_finish_reason_covers_all_adapter_variants — cross-enum mapping pin
- with_adapter_constructor_routes_via_adapter_path — constructors compile + no-adapter regression
End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests deferred to PR-5: the AIProviderAdapter trait has 8+ methods (provider_id / api_style / default_model / get_available_models / health_check / model_metadata / capabilities / etc) and implementing all on a test stub here would pull in ProviderHealth + AdapterCapabilities + ApiStyle + ModelInfo + their dependencies — bigger than atomic-slice. PR-5 will wire LlamaCppAdapter directly through Runtime registration.
Stack
🤖 Generated with Claude Code