Skip to content

T041 + T042: LLM selector, StubLLMProvider, and generate-schemas#40

Open
likith1908 wants to merge 4 commits into
T040-llm-provider-protocolfrom
T041-T042-llm-selector-stub
Open

T041 + T042: LLM selector, StubLLMProvider, and generate-schemas#40
likith1908 wants to merge 4 commits into
T040-llm-provider-protocolfrom
T041-T042-llm-selector-stub

Conversation

@likith1908

@likith1908 likith1908 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Implements T041 (LLM provider selector with fallback) and T042 (StubLLMProvider for tests) as a combined PR on top of T040. Also ships the make generate-schemas target and the CLI improvement that surfaced during review.

  • iris_engine/llm/selector.py: select_llm_provider() returns the configured provider, optionally wrapped in _FallbackLLMProvider. Fallback fires only on LLMUnavailable; LLMRateLimited surfaces to the caller since rate limiting on the primary does not mean the fallback can absorb the load.
  • iris_engine/llm/in_memory.py: StubLLMProvider returns canned LLMResponse values, supports a structured_instances registry for schema-based responses, raises LLMSchemaViolation when a schema is requested but no instance is registered, and accepts raise_on_complete for error injection. Full OTEL instrumentation via instrument_complete.
  • iris_engine/llm/tracing.py: shared instrument_complete() async context manager and log_complete_success() structured log emitter. Mirrors ocr/tracing.py pattern exactly.
  • iris_engine/llm/__init__.py: public re-exports for the llm subsystem.
  • iris_config/schema/adapters.py: llm_fallback: LlmAdapter | None = None added to AdaptersSchema. Module docstring now documents the iris config schema regeneration step.
  • docs/schemas/product.schema.json: regenerated to include llm_fallback.
  • Makefile: make generate-schemas target regenerates all three docs/schemas/*.json files from the Pydantic models in one command.
  • iris config schema: only writes the file when content actually changed; output now prefixed with [model] and distinguishes written vs up-to-date.

Task reference

  • Task IDs: T041, T042
  • Workstream: 004-llm-adapter-set

Acceptance criteria

T041

  • select_llm_provider() returns primary when no fallback configured
  • Primary LLMUnavailable falls back to secondary
  • Both fail surfaces primary error
  • Unknown adapter or fallback raises KeyError

T042

  • Returns LLMResponse with correct adapter_id, usage math, and text
  • Schema provided + registered instance returns structured field populated
  • Schema provided + no registered instance raises LLMSchemaViolation
  • No schema returns structured = None
  • raise_on_complete injects typed errors for test scenarios

PR checklist

  • Every acceptance criterion in the task's tasks.md entry is satisfied.
  • docs-ci workflow passes (markdown lint + tasks structural check).
  • No em-dashes introduced in new prose.
  • All internal links resolve.
  • No secrets, credentials, or personal names introduced.

Notes for the reviewer

OTEL span attribute tests deferred to T048: The unit test file previously set a module-level set_tracer_provider() which overwrote the OCR contract conftest's provider, causing 5 false failures in test_c011_otel_span_carries_tenant_id. Span attribute verification belongs in the contract suite (T048), consistent with how C-OCR-011 is handled. The tracing code paths (success, LLMError, generic exception) are all covered structurally through the stub's complete() call - 100% coverage on tracing.py.

JSON schema regeneration had no enforced workflow: When AdaptersSchema gained llm_fallback, docs/schemas/product.schema.json was silently out of date - caught only by manual inspection. Fixed in this PR: the adapters.py docstring documents the command, make generate-schemas runs all three regenerations in one step, and the CLI now only writes and reports when content actually changed.

_FallbackLLMProvider exposes primary .id: The fallback wrapper's .id is the primary adapter's ID so callers and telemetry always see the configured adapter, not the fallback.

Rebase note

This PR is stacked on #39 (T040). Once #39 merges, rebase onto main before merging this one.

…041+T042)

- iris_engine/llm/selector.py: select_llm_provider() with _FallbackLLMProvider
  that fires only on LLMUnavailable, re-raising the primary error when both fail
- iris_engine/llm/in_memory.py: StubLLMProvider with structured-output registry,
  error injection, and full OTEL span instrumentation
- iris_engine/llm/tracing.py: instrument_complete() async context manager that
  sets llm.* span attributes, and log_complete_success() structured log emitter
- iris_engine/llm/__init__.py: public re-exports for the llm subsystem
- iris_config/schema/adapters.py: add optional llm_fallback field to AdaptersSchema
- docs/schemas/product.schema.json: regenerate to include llm_fallback
- 20 unit tests, 394 total passing, 96.46% coverage, mypy --strict clean
When AdaptersSchema gains a new field the JSON schema in docs/schemas/ must
be regenerated manually. The docstring now points to the exact command so
the step is visible at the point of change.
- Makefile: generate-schemas target regenerates all three docs/schemas/*.json
  files from the Pydantic models in one command
- iris config schema: only writes the file when content changed; output now
  prefixed with [model] and distinguishes written vs up-to-date
@likith1908 likith1908 changed the title T041 + T042: LLM selector and StubLLMProvider T041 + T042: LLM selector, StubLLMProvider, and generate-schemas Jun 17, 2026
@likith1908 likith1908 marked this pull request as ready for review June 17, 2026 07:34
@likith1908 likith1908 requested a review from anmolg1997 as a code owner June 17, 2026 07:34

@anmolg1997 anmolg1997 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. The selector and stub are textbook — every pattern lesson from the OCR sprint is baked in from the first commit, and you closed two of my earlier non-blocking notes along the way. Same one tracking fix as #39 before merge.

What I verified

Selector (iris_engine/llm/selector.py, T041)

  • Mirrors ocr/selector.py exactly: plain-string registry lookup (engine never imports iris_config — layering docstring states it, lint confirms 3 contracts kept), _FallbackLLMProvider wrapper preserving primary id/version.
  • Fallback fires only on LLMUnavailable, surfaces the primary error if both fail, and the docstring documents why LLMRateLimited deliberately does NOT fall back — the exact reasoning from the OCR review, applied here without being asked. That's the pattern sticking.

Stub (iris_engine/llm/in_memory.py, T042)

  • structured_instances: dict[type[BaseModel], BaseModel] registry keyed by schema type → returns the registered instance when a schema is requested, raises LLMSchemaViolation with a clear message when none is registered. This is exactly the seam workstream 005's agent tests will lean on.
  • raise_on_complete: LLMError | None error-injection seam — lets 005's tests exercise every typed-error branch without standing up a real provider.
  • Full OTEL instrumentation via instrument_complete, structured_output_used attribute computed from the real outcome. usage math defaults satisfy the contract's total == input + output invariant (10+5=15).

Tracing (iris_engine/llm/tracing.py)

  • Faithful mirror of ocr/tracing.py: shared async CM, typed-error categorisation, structured success log. The two tracing modules are now consistent siblings — when the real LLM adapters land in T044-T047 they wrap instrument_complete the same way the OCR adapters wrap instrument_extract.

Two earlier notes closed (appreciated)

  • make test-cov is now pytest --cov --cov-report=... --cov-fail-under=95 — reads source from pyproject config instead of repeating --cov= per package. That's the dual-source-of-truth cleanup I flagged on #36. One place now.
  • make generate-schemas + idempotent iris config schema (writes only on change, [model] written vs up-to-date) addresses the schema-artifact drift note from #37. llm_fallback added to AdaptersSchema and the regenerated product.schema.json is in the diff — schema and artifact in sync in the same PR.

Gates: 21 selector/stub tests pass, full suite 394, mypy clean on 24 source files, lint clean.

Same fix before merge as #39

tasks/004-llm-adapter-set/tasks.md still shows - [ ] **T041** and - [ ] **T042**. Please flip both to [x].

This is now two PRs in a row (#39, #40) landing with the checkbox unflipped — you were consistent on this through all of WS001-003, so it's a small slip, not a habit, but worth catching before it becomes one. The tasks.md checkboxes are how the team (and Akhilesh, when he asks "what's done") reads progress at a glance; a merged-but-unchecked task makes main lie about its own state. Quick habit to re-anchor: flip the box in the same commit that implements the task, before opening the PR.

Both #39 and #40 are rebasing as the stack lands anyway, so the three flips (T040, T041, T042) cost one extra line each on a push you're already making.

Status

With this, WS004's foundation (Protocol + selector + stub + tracing) is in — same shape that made the OCR adapters fast to write. Remaining in WS004: the four real adapters (T044 Azure OpenAI, T045 OpenAI, T046 Anthropic, T047 local) on the shared OpenAI-compatible base (T043), the contract suite (T048), OTEL (T049 — though you've effectively pre-wired it), and READMEs (T050). The llm-shared base class (T043) is the natural next PR since three of the four adapters inherit from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants