T041 + T042: LLM selector, StubLLMProvider, and generate-schemas#40
T041 + T042: LLM selector, StubLLMProvider, and generate-schemas#40likith1908 wants to merge 4 commits into
Conversation
…041+T042) - iris_engine/llm/selector.py: select_llm_provider() with _FallbackLLMProvider that fires only on LLMUnavailable, re-raising the primary error when both fail - iris_engine/llm/in_memory.py: StubLLMProvider with structured-output registry, error injection, and full OTEL span instrumentation - iris_engine/llm/tracing.py: instrument_complete() async context manager that sets llm.* span attributes, and log_complete_success() structured log emitter - iris_engine/llm/__init__.py: public re-exports for the llm subsystem - iris_config/schema/adapters.py: add optional llm_fallback field to AdaptersSchema - docs/schemas/product.schema.json: regenerate to include llm_fallback - 20 unit tests, 394 total passing, 96.46% coverage, mypy --strict clean
When AdaptersSchema gains a new field the JSON schema in docs/schemas/ must be regenerated manually. The docstring now points to the exact command so the step is visible at the point of change.
- Makefile: generate-schemas target regenerates all three docs/schemas/*.json files from the Pydantic models in one command - iris config schema: only writes the file when content changed; output now prefixed with [model] and distinguishes written vs up-to-date
anmolg1997
left a comment
There was a problem hiding this comment.
Approved. The selector and stub are textbook — every pattern lesson from the OCR sprint is baked in from the first commit, and you closed two of my earlier non-blocking notes along the way. Same one tracking fix as #39 before merge.
What I verified
Selector (iris_engine/llm/selector.py, T041)
- Mirrors
ocr/selector.pyexactly: plain-string registry lookup (engine never imports iris_config — layering docstring states it, lint confirms 3 contracts kept),_FallbackLLMProviderwrapper preserving primaryid/version. - Fallback fires only on
LLMUnavailable, surfaces the primary error if both fail, and the docstring documents whyLLMRateLimiteddeliberately does NOT fall back — the exact reasoning from the OCR review, applied here without being asked. That's the pattern sticking.
Stub (iris_engine/llm/in_memory.py, T042)
structured_instances: dict[type[BaseModel], BaseModel]registry keyed by schema type → returns the registered instance when a schema is requested, raisesLLMSchemaViolationwith a clear message when none is registered. This is exactly the seam workstream 005's agent tests will lean on.raise_on_complete: LLMError | Noneerror-injection seam — lets 005's tests exercise every typed-error branch without standing up a real provider.- Full OTEL instrumentation via
instrument_complete,structured_output_usedattribute computed from the real outcome.usagemath defaults satisfy the contract'stotal == input + outputinvariant (10+5=15).
Tracing (iris_engine/llm/tracing.py)
- Faithful mirror of
ocr/tracing.py: shared async CM, typed-error categorisation, structured success log. The two tracing modules are now consistent siblings — when the real LLM adapters land in T044-T047 they wrapinstrument_completethe same way the OCR adapters wrapinstrument_extract.
Two earlier notes closed (appreciated)
make test-covis nowpytest --cov --cov-report=... --cov-fail-under=95— readssourcefrom pyproject config instead of repeating--cov=per package. That's the dual-source-of-truth cleanup I flagged on #36. One place now.make generate-schemas+ idempotentiris config schema(writes only on change,[model] writtenvsup-to-date) addresses the schema-artifact drift note from #37.llm_fallbackadded toAdaptersSchemaand the regeneratedproduct.schema.jsonis in the diff — schema and artifact in sync in the same PR.
Gates: 21 selector/stub tests pass, full suite 394, mypy clean on 24 source files, lint clean.
Same fix before merge as #39
tasks/004-llm-adapter-set/tasks.md still shows - [ ] **T041** and - [ ] **T042**. Please flip both to [x].
This is now two PRs in a row (#39, #40) landing with the checkbox unflipped — you were consistent on this through all of WS001-003, so it's a small slip, not a habit, but worth catching before it becomes one. The tasks.md checkboxes are how the team (and Akhilesh, when he asks "what's done") reads progress at a glance; a merged-but-unchecked task makes main lie about its own state. Quick habit to re-anchor: flip the box in the same commit that implements the task, before opening the PR.
Both #39 and #40 are rebasing as the stack lands anyway, so the three flips (T040, T041, T042) cost one extra line each on a push you're already making.
Status
With this, WS004's foundation (Protocol + selector + stub + tracing) is in — same shape that made the OCR adapters fast to write. Remaining in WS004: the four real adapters (T044 Azure OpenAI, T045 OpenAI, T046 Anthropic, T047 local) on the shared OpenAI-compatible base (T043), the contract suite (T048), OTEL (T049 — though you've effectively pre-wired it), and READMEs (T050). The llm-shared base class (T043) is the natural next PR since three of the four adapters inherit from it.
Summary
Implements T041 (LLM provider selector with fallback) and T042 (StubLLMProvider for tests) as a combined PR on top of T040. Also ships the
make generate-schemastarget and the CLI improvement that surfaced during review.iris_engine/llm/selector.py:select_llm_provider()returns the configured provider, optionally wrapped in_FallbackLLMProvider. Fallback fires only onLLMUnavailable;LLMRateLimitedsurfaces to the caller since rate limiting on the primary does not mean the fallback can absorb the load.iris_engine/llm/in_memory.py:StubLLMProviderreturns cannedLLMResponsevalues, supports astructured_instancesregistry for schema-based responses, raisesLLMSchemaViolationwhen a schema is requested but no instance is registered, and acceptsraise_on_completefor error injection. Full OTEL instrumentation viainstrument_complete.iris_engine/llm/tracing.py: sharedinstrument_complete()async context manager andlog_complete_success()structured log emitter. Mirrorsocr/tracing.pypattern exactly.iris_engine/llm/__init__.py: public re-exports for the llm subsystem.iris_config/schema/adapters.py:llm_fallback: LlmAdapter | None = Noneadded toAdaptersSchema. Module docstring now documents theiris config schemaregeneration step.docs/schemas/product.schema.json: regenerated to includellm_fallback.Makefile:make generate-schemastarget regenerates all threedocs/schemas/*.jsonfiles from the Pydantic models in one command.iris config schema: only writes the file when content actually changed; output now prefixed with[model]and distinguisheswrittenvsup-to-date.Task reference
T041,T042004-llm-adapter-setAcceptance criteria
T041
select_llm_provider()returns primary when no fallback configuredLLMUnavailablefalls back to secondaryKeyErrorT042
LLMResponsewith correctadapter_id,usagemath, andtextstructuredfield populatedLLMSchemaViolationstructured = Noneraise_on_completeinjects typed errors for test scenariosPR checklist
tasks.mdentry is satisfied.docs-ciworkflow passes (markdown lint + tasks structural check).Notes for the reviewer
OTEL span attribute tests deferred to T048: The unit test file previously set a module-level
set_tracer_provider()which overwrote the OCR contract conftest's provider, causing 5 false failures intest_c011_otel_span_carries_tenant_id. Span attribute verification belongs in the contract suite (T048), consistent with howC-OCR-011is handled. The tracing code paths (success,LLMError, generic exception) are all covered structurally through the stub'scomplete()call - 100% coverage ontracing.py.JSON schema regeneration had no enforced workflow: When
AdaptersSchemagainedllm_fallback,docs/schemas/product.schema.jsonwas silently out of date - caught only by manual inspection. Fixed in this PR: theadapters.pydocstring documents the command,make generate-schemasruns all three regenerations in one step, and the CLI now only writes and reports when content actually changed._FallbackLLMProviderexposes primary.id: The fallback wrapper's.idis the primary adapter's ID so callers and telemetry always see the configured adapter, not the fallback.Rebase note
This PR is stacked on #39 (T040). Once #39 merges, rebase onto
mainbefore merging this one.