feat(inference): inference-llm PR-3a — canonical ArtifactKeys + publishing helpers by joelteply · Pull Request #1392 · CambrianTech/continuum

joelteply · 2026-05-18T16:48:00Z

Summary

PR-3a of inference-llm. Same pattern as my genome::bus PR-4 (#1358): name the canonical ArtifactKey constants + ship the async publishing helpers + subscriber convenience. The real-engine integration lands in PR-3b/PR-4; PR-3a ships the bus surface so downstream observers (sentinel-observer, VDD harness, audit-recorder) can wire to it today.

What lands

Four canonical ArtifactKeys under inference/:

INFERENCE_REQUEST_KEY = "inference/llm.request"
INFERENCE_COMPLETE_KEY = "inference/llm.complete"
FIRST_TOKEN_EMITTED_KEY = "inference/llm.first_token"
RESIDENCY_FAULT_KEY = "inference/llm.residency_fault"
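
The four wire strings above can be pinned as constants. What follows is a minimal sketch, assuming the keys can be modelled as plain `&'static str` values (the real `ArtifactKey` type is not shown in this description). `split_key` is a hypothetical helper, not part of the PR, added only to illustrate the `module/surface.event` naming convention mentioned under Design choices.

```rust
// Wire strings copied from the PR description. Representing them as
// `&'static str` is an assumption; the real ArtifactKey type may differ.
pub const INFERENCE_REQUEST_KEY: &str = "inference/llm.request";
pub const INFERENCE_COMPLETE_KEY: &str = "inference/llm.complete";
pub const FIRST_TOKEN_EMITTED_KEY: &str = "inference/llm.first_token";
pub const RESIDENCY_FAULT_KEY: &str = "inference/llm.residency_fault";

/// Hypothetical helper (not part of the PR): splits a key of the form
/// `module/surface.event` into (module, surface, event).
/// Returns None when either separator is missing.
pub fn split_key(key: &str) -> Option<(&str, &str, &str)> {
    let (module, rest) = key.split_once('/')?;
    let (surface, event) = rest.split_once('.')?;
    Some((module, surface, event))
}
```

A test in the spirit of keys_have_canonical_string_values would compare each constant against its literal, so that an accidental rename shows up as a test failure rather than a silent wire break.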

Four async publishing helpers — serialize the typed event + publish through the artifact dispatch path (#1339 + #1343):

publish_inference_request / publish_inference_complete / publish_first_token_emitted / publish_residency_fault
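
As a rough illustration of the "serialize the typed event + publish" shape, here is a dependency-free sketch. It is synchronous, whereas the helpers in this PR are async, and everything in it is an assumption: `RecordingBus` stands in for the real bus, the event fields and the `tokensGenerated` wire name are illustrative, and the JSON is hand-rolled instead of going through the project's serializer.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the message bus: records (key, payload) pairs.
#[derive(Default)]
pub struct RecordingBus {
    published: Mutex<Vec<(String, String)>>,
}

impl RecordingBus {
    pub fn publish(&self, key: &str, payload: String) {
        self.published.lock().unwrap().push((key.to_string(), payload));
    }

    /// Snapshot of everything published so far, in publish order.
    pub fn published(&self) -> Vec<(String, String)> {
        self.published.lock().unwrap().clone()
    }
}

/// Hypothetical typed event; the field set is illustrative only.
pub struct InferenceComplete {
    pub request_id: String,
    pub tokens_generated: u32,
}

impl InferenceComplete {
    /// Hand-rolled JSON to keep the sketch dependency-free. Assumes
    /// `request_id` contains no characters that need JSON escaping.
    fn to_json(&self) -> String {
        format!(
            "{{\"requestId\":\"{}\",\"tokensGenerated\":{}}}",
            self.request_id, self.tokens_generated
        )
    }
}

/// Serialize the typed event, then publish it under its own key.
pub fn publish_inference_complete(bus: &RecordingBus, event: &InferenceComplete) {
    bus.publish("inference/llm.complete", event.to_json());
}
```

The other three helpers would follow the same pattern with their own event type and key, which is what a test like each_publish_helper_routes_to_its_own_key checks.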

Three subscriber-convenience surfaces:

subscribe_to_inference_responses(bus, name) — most observers want outcomes (complete + first_token + fault), not requests
inference_response_selectors() — three Exact selectors
all_inference_selectors() — four selectors for full-firehose consumers (audit-recorder when it covers inference)
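
The two selector sets can be sketched as below. The `Selector` enum here is a hypothetical stand-in (the description only says the selectors are `Exact`), and the string literals repeat the wire values listed under What lands.

```rust
/// Hypothetical selector type; only the `Exact` variant is mentioned
/// in the PR description, so it is the only one modelled here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Selector {
    Exact(&'static str),
}

/// Outcome events only: complete, first_token and residency_fault.
pub fn inference_response_selectors() -> Vec<Selector> {
    vec![
        Selector::Exact("inference/llm.complete"),
        Selector::Exact("inference/llm.first_token"),
        Selector::Exact("inference/llm.residency_fault"),
    ]
}

/// Full firehose: the request key plus the three response keys.
pub fn all_inference_selectors() -> Vec<Selector> {
    let mut selectors = vec![Selector::Exact("inference/llm.request")];
    selectors.extend(inference_response_selectors());
    selectors
}
```

Building the firehose set from the response set keeps the two in step if another response key is added later; whether the real implementation does this is not stated.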

Design choices

Two subscriber surfaces (response-only vs full firehose) because most observers don't want every request — they want outcomes. Audit-recorder + VDD harness may want the firehose for the prod-replay chain (oxidizer: migrate AIDecisionService.generateResponse to Rust cognition/generate-response #1385).
Same naming convention as genome::bus (module/surface.event) for cross-module consistency.
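
To make the response-only vs full-firehose distinction concrete, here is a toy in-memory bus. `MiniBus` and its methods are hypothetical and much simpler than the real artifact dispatch path; the sketch only shows that an exact-key subscription to the three response keys never receives request events.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory bus: each subscriber has a list of exact keys
/// it listens to and an inbox of the keys it has received.
#[derive(Default)]
pub struct MiniBus {
    subscriptions: HashMap<String, Vec<&'static str>>,
    inboxes: HashMap<String, Vec<String>>,
}

impl MiniBus {
    pub fn subscribe(&mut self, name: &str, keys: Vec<&'static str>) {
        self.subscriptions.insert(name.to_string(), keys);
        self.inboxes.entry(name.to_string()).or_default();
    }

    /// Deliver `key` to every subscriber whose key list contains it exactly.
    pub fn publish(&mut self, key: &str) {
        for (name, keys) in &self.subscriptions {
            if keys.iter().any(|k| *k == key) {
                self.inboxes.get_mut(name).unwrap().push(key.to_string());
            }
        }
    }

    /// Keys received by `name`, in publish order (empty if unknown).
    pub fn inbox(&self, name: &str) -> Vec<String> {
        self.inboxes.get(name).cloned().unwrap_or_default()
    }
}
```

Subscribing one observer with the three response keys and another with all four, then publishing a request followed by a complete event, leaves the first inbox with only the complete event and the second with both.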

What is deliberately deferred (PR-3b / PR-4)

Wiring helpers INTO InferenceLlmModule::handle_command so it auto-publishes after each call (PR-3b)
Real LLM engine (LlamaCppAdapter integration) — PR-4
InferenceRequest artifact subscription (module subscribes to requests via bus instead of going through command bus) — needs persona-cognition to publish via bus first

Test plan

cargo test --lib --features metal,accelerate inference::llm_module_bus — 7/7 pass:
- keys_have_canonical_string_values (wire string pin)
- response_selectors_cover_three_keys_as_exact
- all_selectors_cover_four_keys
- publish_inference_complete_routes_to_subscribed_module (end-to-end through artifact dispatch)
- each_publish_helper_routes_to_its_own_key
- response_only_subscriber_does_not_see_requests
- full_firehose_subscriber_sees_requests_too
No regressions across other 2958 lib tests

Stack

feat(inference): inference-llm PR-1 — typed event surface (MODULE-CATALOG §II) #1387 — inference-llm PR-1: typed event surface
feat(inference): inference-llm PR-2 — InferenceLlmModule ServiceModule impl (stub-backed) #1391 — inference-llm PR-2: ServiceModule impl (stub-backed)
This PR — inference-llm PR-3a: bus keys + publishing helpers
NEXT — PR-3b: InferenceLlmModule auto-publishes via these helpers
THEN — PR-4: real LlamaCppAdapter invoke + tokenizer + streaming

🤖 Generated with Claude Code

…shing helpers

PR-3a of inference-llm. Same pattern as my genome::bus PR-4 (#1358): name the canonical ArtifactKey constants + ship the async publishing helpers + subscriber convenience. The actual real-engine integration lands in PR-3b/PR-4; PR-3a ships the bus surface so downstream observers (sentinel-observer, VDD harness, audit-recorder) can wire to it today before the engine swap.

What lands

Four canonical ArtifactKeys under inference/:
- INFERENCE_REQUEST_KEY = "inference/llm.request"
- INFERENCE_COMPLETE_KEY = "inference/llm.complete"
- FIRST_TOKEN_EMITTED_KEY = "inference/llm.first_token"
- RESIDENCY_FAULT_KEY = "inference/llm.residency_fault"

Four async publishing helpers that serialize the typed event + publish through the artifact dispatch path (#1339 + #1343):
- publish_inference_request
- publish_inference_complete
- publish_first_token_emitted
- publish_residency_fault

Three subscriber-convenience surfaces:
- subscribe_to_inference_responses(bus, name): most observers want outcomes (complete + first_token + fault), not requests
- inference_response_selectors(): three Exact selectors
- all_inference_selectors(): four selectors including request for full-firehose consumers (audit-recorder when it covers inference)

Design choices
- Two subscriber surfaces (response-only vs full firehose) because most observers don't want every request; they want outcomes. Audit-recorder + VDD harness may want the firehose for the prod-replay chain Joel pushed at #1385.
- Request key INFERENCE_REQUEST_KEY in the publish helpers but NOT in the default observer set. Producers (persona-cognition) emit requests; observers see responses. Wiring symmetry without the noise.
- Same naming convention as genome::bus (module/surface.event) for cross-module consistency.

What is deliberately deferred (PR-3b / PR-4)
- Wiring helpers INTO InferenceLlmModule::handle_command so it auto-publishes after each call. PR-3b plumbs Arc<MessageBus> + Arc<ModuleRegistry> through the module's constructor.
- Real LLM engine (LlamaCppAdapter integration): PR-4
- InferenceRequest artifact subscription (module subscribes to requests via bus instead of going through command bus): needs persona-cognition to publish via bus first

Tests

7 new tests on inference::llm_module_bus:
- keys_have_canonical_string_values (pin wire strings)
- response_selectors_cover_three_keys_as_exact
- all_selectors_cover_four_keys
- publish_inference_complete_routes_to_subscribed_module (end-to-end through artifact dispatch)
- each_publish_helper_routes_to_its_own_key
- response_only_subscriber_does_not_see_requests
- full_firehose_subscriber_sees_requests_too
7/7 pass. No regressions across other 2958 lib tests.

Stack
- #1387 - inference-llm PR-1: typed event surface
- #1391 - inference-llm PR-2: ServiceModule impl (stub-backed)
- THIS PR - inference-llm PR-3a: bus keys + publishing helpers
- NEXT - PR-3b: InferenceLlmModule auto-publishes via these helpers after each handle_command call
- THEN - PR-4: real LlamaCppAdapter invoke + tokenizer + streaming

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…hes via bus hook (#1393)

PR-3b of inference-llm. Wires the bus helpers from PR-3a (#1392) INTO InferenceLlmModule's handle_command so every successful inference response auto-publishes InferenceComplete + FirstTokenEmitted to the trace bus. Closes the inference-llm bus loop: producer (command) → engine (stub for now) → response (CommandResult) → bus dispatch (complete + first_token) → subscriber (sentinel/VDD/audit).

What lands
- BusHook private struct: { bus: Arc<MessageBus>, registry: Arc<ModuleRegistry> }. Same shape as genome::local_manager BusHook (#1362).
- InferenceLlmModule.bus_hook: Option<BusHook>. None = bus-less PR-2 behavior; Some = auto-publish on every successful handle_command.
- with_bus(bus, registry) constructor: wires both Arcs at module construction; no in-flight switching (prevents the "bus added mid-service" race).
- handle_request body: on success, spawns publish_inference_complete and publish_first_token_emitted into the current tokio runtime via Handle::try_current. Spawn pattern (not await) avoids the DashMap borrow-across-await lifetime issue inside Send-bounded async_trait; same workaround as my genome LocalWorkingSetManager (#1362).
- spawn_publish_inference_complete + spawn_publish_first_token_emitted module-private helpers: Arcs cloned out before spawn so the &BusHook borrow doesn't outlive the spawn.

Design choices
- Publishing is best-effort observability. The authoritative response goes back through the CommandResult arm regardless of publish success; callers who need to know if a generation happened look at the Result, not the bus.
- Error paths (unknown command + invalid payload) do NOT publish. Tests pin this: bus events represent successful generations; errors are loud in the Result and silent on the bus.
- Two separate spawns (one per event) rather than one bundled publish. Lets subscribers see first_token even if the complete event hasn't dispatched yet (race-tolerant TTFT observability).

Tests

4 new bus tests (12 total):
- handle_command_with_bus_auto_publishes_complete_and_first_token: end-to-end (register subscriber, run handle_command, yield for spawn, verify both events landed with matching requestId)
- handle_command_without_bus_does_not_publish: backwards-compat with PR-2 new() constructor
- handle_command_unknown_with_bus_does_not_publish: error paths silent on bus
- handle_command_invalid_payload_with_bus_does_not_publish: same invariant
12/12 pass on inference::llm_module_service. No regressions across other 2957 lib tests.

Stack
- #1387 - inference-llm PR-1: typed event surface
- #1391 - inference-llm PR-2: ServiceModule impl (stub-backed)
- #1392 - inference-llm PR-3a: bus keys + publishing helpers
- THIS PR - inference-llm PR-3b: auto-publish wiring
- NEXT - PR-4: real LlamaCppAdapter invoke + tokenizer + streaming (the stub stays in place until then; PR-4 swaps under the same external contract)

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
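
The optional bus hook and the "errors are silent on the bus" invariant from the PR-3b message above can be sketched as follows. This is a simplified, assumption-laden model: `SharedBus` is a hypothetical stand-in for `Arc<MessageBus>` plus the registry, the publish happens inline rather than through a tokio spawn, and the stub response text is invented. The command name is the one the PR-5 message gives for the cognition path.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the bus: a shared list of published keys.
pub type SharedBus = Arc<Mutex<Vec<String>>>;

pub struct InferenceLlmModuleSketch {
    /// `None` keeps the bus-less behaviour; `Some` enables publishing.
    bus_hook: Option<SharedBus>,
}

impl InferenceLlmModuleSketch {
    /// Bus-less constructor (mirrors the PR-2 behaviour).
    pub fn new() -> Self {
        Self { bus_hook: None }
    }

    /// The bus is fixed at construction; there is no setter to add it later.
    pub fn with_bus(bus: SharedBus) -> Self {
        Self { bus_hook: Some(bus) }
    }

    pub fn handle_command(&self, command: &str) -> Result<String, String> {
        if command != "inference/llm/request" {
            // Error path: loud in the Result, silent on the bus.
            return Err(format!("unknown command: {command}"));
        }
        let response = "stub completion".to_string();
        if let Some(bus) = &self.bus_hook {
            // The module described above spawns these onto the tokio
            // runtime; this sketch publishes inline to stay std-only.
            let mut published = bus.lock().unwrap();
            published.push("inference/llm.first_token".to_string());
            published.push("inference/llm.complete".to_string());
        }
        Ok(response)
    }
}
```

Because the response is returned whether or not a hook is present, callers never depend on the bus to learn the outcome, which matches the best-effort observability stance in the design notes.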

…n layer + new constructors) (#1395)

Bridges the substrate's typed InferenceRequest/InferenceComplete surface to the existing AIProviderAdapter trait (LlamaCppAdapter for local llama.cpp). PR-5 ships the LlamaCppAdapter Runtime wiring + the end-to-end stub-adapter test; PR-4 ships the translation logic + new constructors so PR-5 is just plumbing.

What lands
- InferenceRequest.prompt_text: Option<String>. PR-4 wire addition for adapter-based engines that tokenize internally. Backwards-compat (Option = optional on wire).
- InferenceComplete.completion_text: Option<String>. Wire addition for adapter-based engines that return text not tokens.
- InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>>.
- with_adapter(adapter) constructor: real-inference + no bus.
- with_bus_and_adapter(bus, registry, adapter) constructor: the full production wiring (adapter + bus publishing).
- handle_request: routes via adapter when wired + prompt_text present; refuses loud when adapter wired + no prompt_text (raw-token path not yet implemented; never silent fallback); falls back to PR-2 stub when no adapter.
- run_adapter_inference(adapter, request, prompt_text): translates InferenceRequest → TextGenerationRequest, calls adapter, translates TextGenerationResponse → (InferenceComplete, FirstTokenEmitted).
- translate_adapter_response(request, response): pure-function body of the response-side translation.
- translate_adapter_finish_reason(adapter_reason): cross-enum mapping: Stop→Stop, Length→MaxTokens, ToolUse→Error{reason} (loud refusal; inference-llm doesn't model tool-use), Error→Error{reason}.

Wire-shape decisions
- max_tokens=0 in substrate's GenerationBudget translates to None on adapter's wire. Substrate convention: 0=unlimited, caller takes duration responsibility. Adapter convention: None=unlimited, 0=stop immediately. The substrate's "stop immediately" doesn't have an encoding because no caller would ask for it.
- stop_sequences: empty Vec on substrate translates to None on adapter (adapter convention: None = no caller stop sequences).
- persona_id propagates to adapter as stringified UUID for per-persona resource attribution (matches existing adapter convention from PersonaResponseGenerator).
- purpose hardcoded "inference-llm" for adapter routing diagnostics.

Sub-fix: missing TS bindings from PR-1

PR-1 (#1387) shipped the Rust types but the shared/generated/inference_llm/ directory of TS exports wasn't included in the commit (regen produced them locally; they didn't get staged). PR-4 ships all 10 TS files + the barrel index. Closes a wire-contract gap.

Tests

13 new behavioral tests (44 total in inference::llm_module + inference::llm_module_service + inference::llm_module_bus):
- translate_adapter_response_carries_text_and_usage: completion_text + tokens_generated mapping
- translate_finish_reason_covers_all_adapter_variants: cross-enum mapping pin
- with_adapter_constructor_routes_via_adapter_path: constructors compile + no-adapter regression
- 8 existing PR-2 + 4 existing PR-3b tests still pass (no regressions)

End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests deferred to PR-5: the AIProviderAdapter trait has 8+ methods (provider_id / api_style / default_model / get_available_models / health_check / model_metadata / capabilities / initialize / shutdown / generate_text / create_embedding) and implementing all of them on a test stub here would pull in ProviderHealth + AdapterCapabilities + ApiStyle + ModelInfo + their dependencies, which is bigger than atomic-slice. PR-5 will wire LlamaCppAdapter directly through Runtime registration.

44/44 inference::llm_module tests pass. No regressions across other 2928 lib tests.

Stack
- #1387 - inference-llm PR-1: typed event surface
- #1391 - inference-llm PR-2: ServiceModule impl (stub-backed)
- #1392 - inference-llm PR-3a: bus keys + publishing helpers
- #1393 - inference-llm PR-3b: auto-publish wiring
- THIS PR - inference-llm PR-4: adapter integration (translation + constructors)
- NEXT - PR-5: LlamaCppAdapter Runtime wiring + end-to-end integration test through real (or test-mock) adapter

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
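
The cross-enum mapping and the two wire-shape translations described in the PR-4 message above can be sketched like this. Both enums are hypothetical mirrors of the real types, the `reason` strings are invented for illustration, and the two small helpers are stand-ins for logic that the description places inside the request translation.

```rust
/// Hypothetical mirror of the adapter-side finish reason.
#[derive(Debug, Clone, PartialEq)]
pub enum AdapterFinishReason {
    Stop,
    Length,
    ToolUse,
    Error,
}

/// Hypothetical mirror of the substrate-side finish reason.
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,
    MaxTokens,
    Error { reason: String },
}

/// Stop→Stop, Length→MaxTokens, ToolUse→Error, Error→Error.
/// The reason strings below are illustrative, not the real messages.
pub fn translate_adapter_finish_reason(adapter_reason: AdapterFinishReason) -> FinishReason {
    match adapter_reason {
        AdapterFinishReason::Stop => FinishReason::Stop,
        AdapterFinishReason::Length => FinishReason::MaxTokens,
        AdapterFinishReason::ToolUse => FinishReason::Error {
            reason: "tool-use is not modelled by inference-llm".to_string(),
        },
        AdapterFinishReason::Error => FinishReason::Error {
            reason: "adapter reported an error".to_string(),
        },
    }
}

/// Substrate: 0 = unlimited. Adapter: None = unlimited, Some(0) = stop now.
/// So a substrate 0 must become None, never Some(0).
pub fn translate_max_tokens(substrate_max_tokens: u32) -> Option<u32> {
    if substrate_max_tokens == 0 {
        None
    } else {
        Some(substrate_max_tokens)
    }
}

/// An empty Vec on the substrate side means "no caller stop sequences",
/// which the adapter encodes as None.
pub fn translate_stop_sequences(stop_sequences: Vec<String>) -> Option<Vec<String>> {
    if stop_sequences.is_empty() {
        None
    } else {
        Some(stop_sequences)
    }
}
```

Passing a substrate 0 straight through as `Some(0)` would ask the adapter to stop immediately, so the explicit branch is what keeps the two "unlimited" conventions aligned.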

Wires InferenceLlmModule into the Runtime so it's callable from the cognition path via inference/llm/request commands.

What lands
- Add "inference-llm" to EXPECTED_MODULES in runtime/runtime.rs
- runtime.register(Arc::new(InferenceLlmModule::new())) in ipc/mod.rs alongside the existing InferenceModule registration

Design choices
- Constructed via the .new() (bus-less, stub-backed) constructor rather than .with_bus_and_adapter(). Reason: the with_bus_and_adapter constructor requires an AIProviderAdapter Arc, which would couple PR-5's runtime registration to a specific LlamaCppAdapter init lifecycle. The substrate's LlamaCppAdapter is owned by AIProviderModule's adapter registry with its own initialization phase; threading the adapter Arc here would either duplicate the registration or create an init-ordering dependency this slice shouldn't introduce.
- The stub-backed registration is still useful: it exposes the inference/llm/request command surface to the cognition path so downstream PRs (turn-execute that chains drain-turn-frame → response_prompt → inference/llm/request) can wire against the real command name. Bus + adapter integration is a follow-up PR that updates the construction call here.

What is NOT changed
- AIProviderModule + LlamaCppAdapter unchanged
- All InferenceLlmModule trait impl logic unchanged (PR-2/3/4 work intact)
- The stub vs real-adapter swap point stays exactly where PR-4 put it: with_bus_and_adapter constructor + run_adapter_inference function

Tests
- cargo build --features metal,accelerate --lib clean (no new test fixtures needed; the module's existing 44/44 tests cover the trait-impl correctness, and this PR just plumbs construction into runtime startup)
- EXPECTED_MODULES enforcement validates at boot: if the registration is missing the runtime fails with "missing inference-llm" error
- Pre-push gate clean

Stack
- #1387 PR-1: typed event surface
- #1391 PR-2: ServiceModule impl (stub-backed)
- #1392 PR-3a: bus keys + publishing helpers
- #1393 PR-3b: auto-publish wiring
- #1395 PR-4: adapter integration (translation + new constructors)
- THIS PR - PR-5: Runtime registration
- FOLLOW-UP - adapter Arc wiring when LlamaCppAdapter init phase is integrated with Runtime startup

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
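
The boot-time EXPECTED_MODULES check mentioned in the PR-5 message above might look roughly like this. The list contents, the "inference" name for the existing module, the function name, and the exact error text are all assumptions; the real list in runtime/runtime.rs is not shown in this description.

```rust
/// Hypothetical subset of the expected-module list. The real list is
/// longer; "inference" as the existing module's name is an assumption.
pub const EXPECTED_MODULES: &[&str] = &["inference", "inference-llm"];

/// Returns an error naming the first expected module that was not
/// registered, or Ok(()) when every expected module is present.
pub fn validate_registered(registered: &[&str]) -> Result<(), String> {
    for expected in EXPECTED_MODULES {
        if !registered.contains(expected) {
            return Err(format!("missing {expected}"));
        }
    }
    Ok(())
}
```

With a check of this shape, forgetting the `runtime.register(...)` call surfaces as a startup failure instead of a command that silently has no handler.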

joelteply merged commit 0767ddf into canary May 18, 2026
1 check passed

joelteply deleted the feat/inference-llm-bus-keys-pr3a branch May 18, 2026 16:48

github-actions Bot added the size: L label May 18, 2026

joelteply mentioned this pull request May 18, 2026

feat(inference): inference-llm PR-3b — InferenceLlmModule auto-publishes via bus hook (pure Rust) #1393

Merged

2 tasks

joelteply mentioned this pull request May 18, 2026

feat(inference): inference-llm PR-4 — adapter integration (translation + new constructors) #1395

Merged

3 tasks

joelteply mentioned this pull request May 18, 2026

feat(inference): inference-llm PR-5 — Runtime registration #1404

Merged

4 tasks

joelteply merged 1 commit into canary from feat/inference-llm-bus-keys-pr3a
