feat(inference): inference-llm PR-4 — adapter integration (translation + new constructors) #1395

Merged: joelteply merged 1 commit into canary from feat/inference-llm-adapter-pr4 on May 18, 2026

@joelteply

Summary

Bridges the substrate's typed InferenceRequest/InferenceComplete surface to the existing AIProviderAdapter trait (LlamaCppAdapter for local llama.cpp). Pure Rust, zero TS, zero shim. PR-5 ships the LlamaCppAdapter Runtime wiring + end-to-end stub-adapter test; PR-4 ships the translation logic + new constructors so PR-5 is just plumbing.

What lands

  • InferenceRequest.prompt_text: Option<String> — wire addition for adapter-based engines that tokenize internally
  • InferenceComplete.completion_text: Option<String> — wire addition for adapter-based engines that return text not tokens
  • InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>> + with_adapter / with_bus_and_adapter constructors
  • handle_request routing: adapter when wired + prompt_text present; loud refusal when adapter wired + no prompt_text (raw-token path not yet implemented — never silent fallback); stub when no adapter
  • run_adapter_inference: translates InferenceRequest → TextGenerationRequest, calls adapter, translates TextGenerationResponse → (InferenceComplete, FirstTokenEmitted)
  • translate_adapter_response + translate_adapter_finish_reason — pure-function translation
  • Cross-enum mapping: Stop→Stop, Length→MaxTokens, ToolUse→Error{reason} (loud refusal), Error→Error{reason}
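
The three-way routing in handle_request can be sketched as a single match. This is a minimal illustration, not the module's actual code: the `Route` enum, the `route` function, and the refusal message are hypothetical names introduced here, and the adapter is reduced to a boolean "wired" flag.

```rust
/// Hypothetical outcome of the routing decision (not a type from the PR).
#[derive(Debug, PartialEq)]
enum Route<'a> {
    /// Adapter is wired and the request carries prompt text.
    Adapter(&'a str),
    /// Adapter is wired but prompt text is absent: refuse loudly.
    Refuse(String),
    /// No adapter wired: fall back to the stub.
    Stub,
}

/// Decide where a request goes. `adapter_wired` stands in for
/// `Option<Arc<dyn AIProviderAdapter>>::is_some()` in this sketch.
fn route<'a>(adapter_wired: bool, prompt_text: Option<&'a str>) -> Route<'a> {
    match (adapter_wired, prompt_text) {
        (true, Some(text)) => Route::Adapter(text),
        // Never fall back silently: the raw-token path is not implemented.
        (true, None) => Route::Refuse(
            "adapter wired but request has no prompt_text".to_string(),
        ),
        (false, _) => Route::Stub,
    }
}
```

Matching on the pair keeps the refusal case explicit, so a request that reaches a wired adapter without prompt_text cannot quietly take the stub path.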

Wire-shape decisions

  • max_tokens=0 in substrate's GenerationBudget → None on adapter (substrate convention: 0=unlimited; adapter convention: None=unlimited, 0=stop-immediately which no caller wants)
  • Empty stop_sequences → None on adapter
  • persona_id propagates as stringified UUID for per-persona resource attribution
  • purpose hardcoded "inference-llm" for adapter routing diagnostics
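
The request-side conventions above can be sketched as follows. The struct and function names (`AdapterRequestSketch`, `to_adapter_request`) and the field types are assumptions made for illustration. The real TextGenerationRequest has fields not shown here, and the persona id is taken as an already stringified UUID.

```rust
/// Hypothetical stand-in for the substrate's GenerationBudget.
struct GenerationBudget {
    /// Substrate convention: 0 means unlimited.
    max_tokens: u32,
}

/// Hypothetical, trimmed-down view of the adapter request fields discussed above.
#[derive(Debug, PartialEq)]
struct AdapterRequestSketch {
    /// Adapter convention: None means unlimited.
    max_tokens: Option<u32>,
    /// Adapter convention: None means no caller stop sequences.
    stop_sequences: Option<Vec<String>>,
    /// Stringified UUID used for per-persona resource attribution.
    persona_id: String,
    /// Hardcoded routing-diagnostics label.
    purpose: &'static str,
}

fn to_adapter_request(
    budget: &GenerationBudget,
    stop_sequences: &[String],
    persona_id: &str,
) -> AdapterRequestSketch {
    AdapterRequestSketch {
        // 0 (substrate "unlimited") becomes None (adapter "unlimited").
        max_tokens: if budget.max_tokens == 0 {
            None
        } else {
            Some(budget.max_tokens)
        },
        // An empty list becomes None rather than Some(vec![]).
        stop_sequences: if stop_sequences.is_empty() {
            None
        } else {
            Some(stop_sequences.to_vec())
        },
        persona_id: persona_id.to_string(),
        purpose: "inference-llm",
    }
}
```

Translating 0 to None matters because passing 0 straight through would ask the adapter to stop immediately, which is the opposite of what the substrate caller meant.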

Sub-fix: missing TS bindings from PR-1

PR-1 (#1387) shipped the Rust types, but the shared/generated/inference_llm/ directory of TS exports wasn't included in the commit (the regen produced the files locally, but they were never staged). PR-4 ships all 10 TS files + the barrel index. Closes a wire-contract gap.

Test plan

  • cargo test --lib --features metal,accelerate inference::llm_module passes 44/44 (12 existing PR-2/PR-3b tests + new PR-4 tests):
    • translate_adapter_response_carries_text_and_usage — text + tokens_generated mapping
    • translate_finish_reason_covers_all_adapter_variants — cross-enum mapping pin
    • with_adapter_constructor_routes_via_adapter_path — constructors compile + no-adapter regression
  • No regressions across other 2928 lib tests
  • Pre-push gate clean

End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests are deferred to PR-5: the AIProviderAdapter trait has 8+ methods (provider_id / api_style / default_model / get_available_models / health_check / model_metadata / capabilities, etc.), and implementing all of them on a test stub here would pull in ProviderHealth + AdapterCapabilities + ApiStyle + ModelInfo + their dependencies, which is bigger than an atomic slice. PR-5 will wire LlamaCppAdapter directly through Runtime registration.

Stack

🤖 Generated with Claude Code

…n layer + new constructors)

Bridges the substrate's typed InferenceRequest/InferenceComplete surface
to the existing AIProviderAdapter trait (LlamaCppAdapter for local
llama.cpp). PR-5 ships the LlamaCppAdapter Runtime wiring + the
end-to-end stub-adapter test; PR-4 ships the translation logic +
new constructors so PR-5 is just plumbing.

What lands

- InferenceRequest.prompt_text: Option<String> — PR-4 wire
  addition for adapter-based engines that tokenize internally.
  Backwards-compat (Option = optional on wire).
- InferenceComplete.completion_text: Option<String> — wire
  addition for adapter-based engines that return text not tokens.
- InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>>.
- with_adapter(adapter) constructor: real-inference + no bus.
- with_bus_and_adapter(bus, registry, adapter) constructor: the
  full production wiring (adapter + bus publishing).
- handle_request: routes via adapter when wired + prompt_text
  present; refuses loud when adapter wired + no prompt_text (raw-
  token path not yet implemented — never silent fallback); falls
  back to PR-2 stub when no adapter.
- run_adapter_inference(adapter, request, prompt_text) — translates
  InferenceRequest → TextGenerationRequest, calls adapter, translates
  TextGenerationResponse → (InferenceComplete, FirstTokenEmitted).
- translate_adapter_response(request, response) — pure-function
  body of the response-side translation.
- translate_adapter_finish_reason(adapter_reason) — cross-enum
  mapping: Stop→Stop, Length→MaxTokens, ToolUse→Error{reason}
  (loud refusal — inference-llm doesn't model tool-use), Error→
  Error{reason}.
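
A minimal sketch of the cross-enum mapping described for
translate_adapter_finish_reason. Both enum definitions are
assumptions reconstructed from the variant names in this message
(the real types may carry more data), and the reason strings are
placeholders.

```rust
/// Hypothetical mirror of the adapter-side finish reason.
#[derive(Debug, Clone, PartialEq)]
enum AdapterFinishReason {
    Stop,
    Length,
    ToolUse,
    Error,
}

/// Hypothetical mirror of the substrate-side finish reason.
#[derive(Debug, Clone, PartialEq)]
enum SubstrateFinishReason {
    Stop,
    MaxTokens,
    Error { reason: String },
}

fn translate_adapter_finish_reason(reason: AdapterFinishReason) -> SubstrateFinishReason {
    // No wildcard arm: a new adapter variant becomes a compile error
    // here instead of being mapped silently.
    match reason {
        AdapterFinishReason::Stop => SubstrateFinishReason::Stop,
        AdapterFinishReason::Length => SubstrateFinishReason::MaxTokens,
        // inference-llm does not model tool-use, so refuse loudly.
        AdapterFinishReason::ToolUse => SubstrateFinishReason::Error {
            reason: "adapter finished with tool-use, which inference-llm does not model".to_string(),
        },
        AdapterFinishReason::Error => SubstrateFinishReason::Error {
            reason: "adapter reported an error".to_string(),
        },
    }
}
```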

Wire-shape decisions

- max_tokens=0 in substrate's GenerationBudget translates to None
  on adapter's wire. Substrate convention: 0=unlimited, caller takes
  duration responsibility. Adapter convention: None=unlimited, 0=stop
  immediately. The substrate has no encoding for "stop immediately"
  because no caller would ask for it.
- stop_sequences: empty Vec on substrate translates to None on
  adapter (adapter convention: None = no caller stop sequences).
- persona_id propagates to adapter as stringified UUID for
  per-persona resource attribution (matches existing adapter
  convention from PersonaResponseGenerator).
- purpose hardcoded "inference-llm" for adapter routing diagnostics.

Sub-fix: missing TS bindings from PR-1

PR-1 (#1387) shipped the Rust types but the
shared/generated/inference_llm/ directory of TS exports wasn't
included in the commit (regen produced them locally; they didn't
get staged). PR-4 ships all 10 TS files + the barrel index. Closes
a wire-contract gap.

Tests

13 new behavioral tests (44 total in inference::llm_module +
inference::llm_module_service + inference::llm_module_bus):

- translate_adapter_response_carries_text_and_usage — completion_text
  + tokens_generated mapping
- translate_finish_reason_covers_all_adapter_variants — cross-enum
  mapping pin
- with_adapter_constructor_routes_via_adapter_path — constructors
  compile + no-adapter regression
- 8 existing PR-2 + 4 existing PR-3b tests still pass (no
  regressions)

End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests
deferred to PR-5: the AIProviderAdapter trait has 8+ methods
(provider_id / api_style / default_model / get_available_models /
health_check / model_metadata / capabilities / initialize /
shutdown / generate_text / create_embedding) and implementing
all of them on a test stub here would pull in ProviderHealth +
AdapterCapabilities + ApiStyle + ModelInfo + their dependencies,
which is bigger than an atomic slice. PR-5 will wire
LlamaCppAdapter directly through Runtime registration.

44/44 inference::llm_module tests pass. No regressions across
other 2928 lib tests.

Stack

- #1387 — inference-llm PR-1: typed event surface
- #1391 — inference-llm PR-2: ServiceModule impl (stub-backed)
- #1392 — inference-llm PR-3a: bus keys + publishing helpers
- #1393 — inference-llm PR-3b: auto-publish wiring
- THIS PR — inference-llm PR-4: adapter integration (translation +
  constructors)
- NEXT — PR-5: LlamaCppAdapter Runtime wiring + end-to-end
  integration test through real (or test-mock) adapter

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit ed287ba into canary May 18, 2026
3 checks passed
@joelteply joelteply deleted the feat/inference-llm-adapter-pr4 branch May 18, 2026 17:32
joelteply added a commit that referenced this pull request May 18, 2026
Wires InferenceLlmModule into the Runtime so it's callable from
the cognition path via inference/llm/request commands.

What lands

- Add "inference-llm" to EXPECTED_MODULES in runtime/runtime.rs
- runtime.register(Arc::new(InferenceLlmModule::new())) in
  ipc/mod.rs alongside the existing InferenceModule registration

Design choices

- Constructed via the .new() (bus-less, stub-backed) constructor
  rather than .with_bus_and_adapter(). Reason: the
  with_bus_and_adapter constructor requires an AIProviderAdapter
  Arc, which would couple PR-5's runtime registration to a
  specific LlamaCppAdapter init lifecycle. The substrate's
  LlamaCppAdapter is owned by AIProviderModule's adapter registry
  with its own initialization phase; threading the adapter Arc
  here would either duplicate the registration or create an
  init-ordering dependency this slice shouldn't introduce.
- The stub-backed registration is still useful: it exposes the
  inference/llm/request command surface to the cognition path so
  downstream PRs (turn-execute that chains drain-turn-frame →
  response_prompt → inference/llm/request) can wire against the
  real command name. Bus + adapter integration is a follow-up
  PR that updates the construction call here.

What is NOT changed

- AIProviderModule + LlamaCppAdapter unchanged
- All InferenceLlmModule trait impl logic unchanged (PR-2/3/4
  work intact)
- The stub vs real-adapter swap point stays exactly where PR-4
  put it: with_bus_and_adapter constructor + run_adapter_inference
  function

Tests

- cargo build --features metal,accelerate --lib clean (no new
  test fixtures needed — the module's existing 44/44 tests cover
  the trait-impl correctness; this PR just plumbs construction
  into runtime startup)
- EXPECTED_MODULES enforcement validates at boot: if the registration
  is missing, the runtime fails with a "missing inference-llm" error
- Pre-push gate clean
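
The boot-time check described above could look roughly like the
sketch below. It is an illustration under assumptions: the contents
of the list other than "inference-llm", the `verify_expected`
function, and the exact error format are hypothetical, not taken
from runtime/runtime.rs.

```rust
use std::collections::HashSet;

/// Hypothetical expected-module list; only "inference-llm" is named in this PR.
const EXPECTED_MODULES: &[&str] = &["inference", "inference-llm"];

/// Return Err naming every expected module that was not registered.
fn verify_expected(registered: &HashSet<&str>) -> Result<(), String> {
    let missing: Vec<&str> = EXPECTED_MODULES
        .iter()
        .copied()
        .filter(|name| !registered.contains(name))
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing {}", missing.join(", ")))
    }
}
```

Failing at startup surfaces a dropped registration immediately,
rather than on the first inference/llm/request command that finds
no handler.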

Stack

- #1387 PR-1: typed event surface
- #1391 PR-2: ServiceModule impl (stub-backed)
- #1392 PR-3a: bus keys + publishing helpers
- #1393 PR-3b: auto-publish wiring
- #1395 PR-4: adapter integration (translation + new constructors)
- THIS PR — PR-5: Runtime registration
- FOLLOW-UP — adapter Arc wiring when LlamaCppAdapter init phase
  is integrated with Runtime startup

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>