Add opt-in scope router to /chat/message (load the heavy background only when needed)#109
anth-volk wants to merge 4 commits into
Salvages the off-topic "topic gate" from #95 (Sakshi) as a standalone change against main, dropping that PR's second, unrelated half (the /chat/backends warmup, which depended on the closed model-backend selector in #51).
The gate is a cheap pre-check: one fast-model (Haiku) classification on the latest user message, short-circuiting clearly off-topic requests with a canned refusal before the heavy chat loop (system prompt, reference doc, tools) ever runs. It is off by default and opt-in via POLICYENGINE_CHAT_TOPIC_GATE_ENABLED; the classifier fails open (any error or ambiguity → treat as on-topic), since rejecting an on-topic question is worse than letting an off-topic one through.
Adapted from #95's version, which was built on #51's multi-backend `backend` object: the refusal SSE event now matches main's single-model `done` shape (no `model_backend`), and the wire-up reads `chat_request.messages` directly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
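A minimal sketch of the fail-open behaviour described in this commit, not the code in the PR. The `classify` callable stands in for the fast-model call, and the label string `"off_topic"` and the accepted truthy values for the flag are assumptions made for illustration; only the environment variable name comes from the commit message.

```python
import os
from typing import Callable, Mapping


def gate_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Gate is off unless the flag is explicitly set (truthy values are assumed)."""
    value = env.get("POLICYENGINE_CHAT_TOPIC_GATE_ENABLED", "")
    return value.strip().lower() in {"1", "true", "yes"}


def is_on_topic(message: str, classify: Callable[[str], str]) -> bool:
    """Fail open: only an unambiguous 'off_topic' label blocks the request."""
    try:
        label = classify(message).strip().lower()
    except Exception:
        # Any classifier error is treated as on-topic.
        return True
    return label != "off_topic"
```

Anything other than a clean off-topic label, including an exception, lets the request through to the heavy loop, which matches the stated preference for false negatives over false refusals.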
…en needed)
Generalises the binary off-topic gate into a scope router that decides which
*background* a turn needs, never what to say — the model still answers the
user's real message in every case:
- "compute": full background. SYSTEM_PROMPT + REFERENCE_DOC + all tools.
  Anything needing a calculation or a grounded figure.
- "lightweight": lean background. LIGHTWEIGHT_SYSTEM only, no reference doc,
  no tools. Off-topic, scope/capability, and explicitly unmodelled questions
  get a tailored, model-generated reply instead of loading the ~20k-token
  reference doc and tools, and instead of the previous canned refusal string.
Implementation reuses the existing stream loop: routing only selects the
model / system blocks / whether tools are sent (the same lever plan mode
already pulls to drop tools), so billing, usage accounting, the `done` event,
and the iteration cap are unchanged. With no tools, the lightweight branch
returns end_turn in one iteration and emits `done` normally.
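To make the "routing only selects the inputs" point concrete, here is a rough sketch under stated assumptions. The `Background` dataclass, `select_background`, the placeholder constant values, and the idea that each route receives its model as a parameter are all hypothetical; the commit only says that the route picks the model, the system blocks, and whether tools are sent.

```python
from dataclasses import dataclass, field

# Placeholder values; the real constants live in the backend.
SYSTEM_PROMPT = "<full system prompt>"
REFERENCE_DOC = "<~20k-token reference doc>"
LIGHTWEIGHT_SYSTEM = "<lean prompt>"
ALL_TOOLS = [{"name": "example_tool"}]


@dataclass
class Background:
    model: str
    system_blocks: list
    tools: list = field(default_factory=list)


def select_background(route: str, compute_model: str, lightweight_model: str) -> Background:
    """Pick the inputs for the existing stream loop; the loop itself is unchanged."""
    if route == "lightweight":
        # Lean prompt only: no reference doc, no tools.
        return Background(lightweight_model, [LIGHTWEIGHT_SYSTEM])
    # Every other value, including unknown ones, gets the full background.
    return Background(compute_model, [SYSTEM_PROMPT, REFERENCE_DOC], list(ALL_TOOLS))
```

Because the result is just a different set of arguments to the same loop, billing, usage accounting, and the `done` event need no second code path.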
Fail-safe: empty input, any router error, or an unrecognised reply routes to
"compute" — a wrong "compute" only wastes the background we'd have loaded
anyway, while a wrong "lightweight" risks answering without the data. The lean
prompt forbids quoting figures/parameters from memory, preserving the
"no numbers without computing" invariant.
The scope descriptor is a small hand-written summary for now, with a TODO to
derive it from capabilities() at build time so it cannot drift from the engine.
Prompt text lives in prompts.py; chatbot.py owns the routing/orchestration.
Off by default; opt in via POLICYENGINE_CHAT_SCOPE_ROUTER_ENABLED.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
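The fail-safe rule in this commit message (empty input, any router error, or an unrecognised reply routes to "compute") could look roughly like the following. This is an illustrative sketch, not the PR's `_route_scope`; `parse_route`, `route_scope`, and the injected `classify` callable are hypothetical names.

```python
from typing import Callable, Optional

VALID_ROUTES = ("compute", "lightweight")


def parse_route(reply: Optional[str]) -> str:
    """Map the router model's raw reply to a route, defaulting to 'compute'."""
    if not reply:
        return "compute"
    label = reply.strip().lower()
    return label if label in VALID_ROUTES else "compute"


def route_scope(message: str, classify: Callable[[str], str]) -> str:
    """Route one user message; every failure mode falls back to 'compute'."""
    if not message or not message.strip():
        return "compute"
    try:
        return parse_route(classify(message))
    except Exception:
        return "compute"
```

The asymmetry is deliberate: the only way to reach "lightweight" is an exact, recognised reply from the router.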
Two follow-ups to the scope router.
1. Engine-derived scope descriptor (no drift). build_reference.py now also writes scope_descriptor.md, deriving the "modelled" half from the Parameters schema (the authoritative reform-key list) and capabilities() (datasets), combined with a curated "not modelled" boundary. The route layer loads scope_descriptor.md at startup (alongside reference.md) and falls back to DEFAULT_SCOPE_DESCRIPTOR when it's absent (local dev). prompts.py exposes the static instruction halves plus scope_router_system()/lightweight_system() builders that take the descriptor, so the router and the lightweight branch always agree on what is modelled. scope_descriptor.md is gitignored like reference.md (built at deploy).
2. Routing eval suite. New "routing" suite in the AI eval harness: a RoutingCase asserts that a prompt routes to "compute" or "lightweight" via _route_scope. Cases are live-only (requirements: [live_model]) since the router makes a real fast-model call; they skip offline. The canonical case is the personal-allowance -> inflation flow (must route "compute": the modelled part needs the engine, the macro caveat is the main loop's job), plus contrasting pure-macro / off-topic / capability (lightweight) and reform-cost / household-calc (compute) cases.
Offline eval: 96 passed, 0 failed, 34 skipped across all five suites (routing cases skip offline). Backend unit + eval-harness tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
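As a sketch of the load-with-fallback step in item 1, assuming the descriptor is a plain UTF-8 file: the names `DEFAULT_SCOPE_DESCRIPTOR`, `scope_router_system()` and `lightweight_system()` come from the commit message, but their bodies, the placeholder strings, and the `load_scope_descriptor` helper are invented here for illustration.

```python
from pathlib import Path

# Placeholder text; the real fallback and instructions live in prompts.py.
DEFAULT_SCOPE_DESCRIPTOR = "Modelled: <hand-written summary>"
ROUTER_INSTRUCTIONS = "<static router instructions>"
LIGHTWEIGHT_INSTRUCTIONS = "<static lightweight instructions>"


def load_scope_descriptor(path: Path) -> str:
    """Read the generated descriptor; fall back when it is missing or empty."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except OSError:
        # Typical in local dev, where the file is built at deploy and gitignored.
        return DEFAULT_SCOPE_DESCRIPTOR
    return text or DEFAULT_SCOPE_DESCRIPTOR


def scope_router_system(descriptor: str) -> str:
    return f"{ROUTER_INSTRUCTIONS}\n\n{descriptor}"


def lightweight_system(descriptor: str) -> str:
    return f"{LIGHTWEIGHT_INSTRUCTIONS}\n\n{descriptor}"
```

Passing the same descriptor string to both builders is what keeps the router and the lightweight branch in agreement about what is modelled.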
…lemetry
Five fixes from re-review of the scope router:
1. Run the router off the event loop. _route_scope makes a blocking sync API call; it was awaited directly inside the async stream generator, stalling other in-flight requests. Now wrapped in run_in_executor, matching how the loop already offloads execute_tool.
2. Skip routing on follow-ups. The single-message classifier can't see prior context, so a continuation (e.g. "what if they were married?") could route to lightweight and be answered without the engine. The router now runs only on the opening user turn (_is_followup gates it); follow-ups take the full background.
3. Drop the chart directive from the lightweight branch: it told the model to call generate_chart, but lightweight passes no tools.
4. Filter config-knob keys (fiscal_year, labour_supply, uc_migration) out of the generated scope descriptor so it lists user-facing programmes only; labour_supply is a behavioural-response control we explicitly mark NOT modelled.
5. Add 'route' to the done SSE event for client/telemetry visibility.
Tests: added _is_followup coverage, removed the obsolete charts test. 75 passed, 10 skipped; offline eval 96 passed, 0 failed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
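Fixes 1 and 2 can be sketched together. This is a simplified illustration that assumes messages are dicts with a `role` key and string `content`; `is_followup` and `route_turn` are hypothetical stand-ins for the PR's `_is_followup` and its wiring, and "more than one user message" is only one plausible definition of a follow-up.

```python
import asyncio
from typing import Callable


def is_followup(messages: list) -> bool:
    """Assumed rule: anything after the opening user turn is a follow-up."""
    return sum(1 for m in messages if m.get("role") == "user") > 1


async def route_turn(messages: list, router: Callable[[str], str]) -> str:
    """Run a blocking router in the default thread pool, off the event loop."""
    if is_followup(messages):
        # The single-message classifier lacks context, so take the full background.
        return "compute"
    text = messages[-1]["content"] if messages else ""
    loop = asyncio.get_running_loop()
    try:
        return await loop.run_in_executor(None, router, text)
    except Exception:
        return "compute"
```

While the router call blocks its worker thread, the event loop stays free to serve other in-flight streams.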
@vahid-ahmadi: requesting your review, since this is closely related to your #102 (scope/refusal contract) and I want to be sure they compose rather than collide.
How they differ: they operate at different layers:
Why they reinforce each other: the
Two things I'd value your eye on:
Still draft while I run the
@vahid-ahmadi: #106 and #109 are circling the same two questions from different angles, and I'd rather we agree a target shape than rebase around each other piecemeal. Can we sync? I think we need an affirmative, agreed pathway on two things before either merges:
1. How we handle irrelevant / unanswerable questions.
2. How we handle systemic metadata.
I'm putting #109 back into draft until we've talked, so we settle the architecture before either lands.
A cheap forced-tool pre-pass builds a structured execution plan for the opening user turn and routes it to one of: irrelevant, out_of_scope, partial, needs_plan, ready. The model grounds each plan slot (source flag); the server gates via per-slot criticality. Any gateway error fails safe to compute.
Supersedes the binary scope router (#109) and the reference.md drift stamp (#106; replaced by the engine-derived scope descriptor).
- backend/gateway_config.py: criticality table, inferable set, promotions, gate()
- backend/gateway.py: run_gateway (forced emit_plan), verdict, writer/plan helpers
- backend/prompts.py: gateway + lightweight prompts, scope descriptor
- backend/routes/chatbot.py: gateway routing in generate_stream
- backend/scripts/build_reference.py: build_scope_descriptor -> scope_descriptor.md
- evaluation: GatewayCase + _run_gateway + 14 live cases (evals/cases/gateway)
- backend/tests/test_gateway.py: 33 offline tests (gate, parser, fail-safe)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
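One way to picture "the model grounds each slot, the server gates via per-slot criticality" is the sketch below. Everything in it is an assumption made for illustration: the slot names, the source flags (`"user"`, `"inferred"`, `"missing"`), the contents of `CRITICALITY`, and the rule that a critical slot must come from the user. Only the five verdict names and the fail-safe to compute come from the commit message; the real `gate()` in `gateway_config.py` may work quite differently.

```python
from typing import Callable, Dict, Tuple

VERDICTS = ("irrelevant", "out_of_scope", "partial", "needs_plan", "ready")

# Hypothetical criticality table: slot name -> how strictly it must be grounded.
CRITICALITY = {"reform": "critical", "population": "critical", "year": "inferable"}


def gate(verdict: str, slots: Dict[str, str]) -> str:
    """Server-side check of the model's proposed verdict against slot sources."""
    if verdict not in VERDICTS:
        return "compute"  # unrecognised verdict: fail safe
    if verdict != "ready":
        return verdict
    for name, level in CRITICALITY.items():
        source = slots.get(name, "missing")
        if level == "critical" and source != "user":
            # A critical slot the user did not state: ask before computing.
            return "needs_plan"
    return "ready"


def safe_gate(run_gateway: Callable[[str], Tuple[str, Dict[str, str]]], message: str) -> str:
    """Any gateway error falls back to the full compute path."""
    try:
        verdict, slots = run_gateway(message)
        return gate(verdict, slots)
    except Exception:
        return "compute"
```

The point of the split is that the model proposes and the server decides: a "ready" verdict is only honoured when the table agrees the plan is sufficiently grounded.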
Summary
Adds an opt-in scope router to `/chat/message`: a cheap pre-check that decides which background a turn needs, while the model still answers the user's real message in every case.

It started as the off-topic "topic gate" salvaged from #95 (Sakshi) and has been generalised into a router. Two routes:
- `compute`: full background, `SYSTEM_PROMPT` + `REFERENCE_DOC` + all six tools. Anything needing a calculation or a grounded figure.
- `lightweight`: lean background, a small `LIGHTWEIGHT_SYSTEM` prompt, no reference doc, no tools. Off-topic, scope/capability, and explicitly-unmodelled questions get a tailored, model-generated reply (not a canned string) without loading the ~20k-token reference doc and the tool schemas.

Why
#102's scope reasoning currently runs inside the expensive call, entangled with the full background; the original gate proved a cheap pre-pass works but returned a fixed string. This unifies them: the router keeps the model generating from the user's prompt, and only the `compute` branch pays for the heavy background.

How (reuses the existing loop)
Routing only selects `model` / `system_blocks` / whether `tools` are sent, the same lever plan mode already pulls to drop tools. So billing, usage accounting, the `done` event, streaming, and the iteration cap are all unchanged. With no tools, the lightweight branch returns `end_turn` in one iteration and emits `done` normally. No parallel code path.

Safety properties
- Fails safe to `compute`: empty input, any router error, or an unrecognised reply routes to the full background. A wrong `compute` only wastes the background we'd have loaded anyway; a wrong `lightweight` risks answering without the data, so we bias hard against it.
- Off by default: opt in via `POLICYENGINE_CHAT_SCOPE_ROUTER_ENABLED=true` (model via `POLICYENGINE_CHAT_SCOPE_ROUTER_MODEL`). Disabled, behaviour is byte-identical to current `main`.

Engine-derived scope descriptor (no drift)
The router reasons over a compact `SCOPE_DESCRIPTOR` (a few hundred tokens) instead of the full reference doc. It is now generated from the engine at deploy time: `scripts/build_reference.py` also writes `scope_descriptor.md`, deriving the modelled half from the `Parameters` schema (the authoritative reform-key list) and `capabilities()` (datasets), combined with a curated not-modelled boundary. The route layer loads it alongside `reference.md` and falls back to `DEFAULT_SCOPE_DESCRIPTOR` when absent (local dev). `prompts.py` owns the instruction text + `scope_router_system()` / `lightweight_system()` builders; `chatbot.py` loads the descriptor and assembles. `scope_descriptor.md` is gitignored like `reference.md`.

Sample generated descriptor (against the installed engine): modelled reform keys `income_tax`, `national_insurance`, `universal_credit`, `child_benefit`, `pension_credit`, …; datasets `frs`, `efrs`, `lcfs`, `spi`, `was`.

Eval coverage
New `routing` suite in the AI eval harness pins the routing deterministically. A `RoutingCase` asserts a prompt routes to `compute` or `lightweight` via `_route_scope`. Cases are live-only (`requirements: [live_model]`) since the router makes a real fast-model call. The canonical case is personal-allowance → inflation (must route `compute`: the modelled part needs the engine; the macro caveat is the main loop's job), plus contrasting pure-macro / off-topic / capability (→ `lightweight`) and reform-cost / household-calc (→ `compute`) cases.

Not included (Part 2 of #95)
Intentionally excluded: the `/chat/backends` cold-start work (`model_backends.py`, `modal_app.py` pre-import, the `ChatPage.tsx` dropdown, the workflow tweak), coupled to the closed backend-selector stack (#51).

Tests
- `backend/tests/test_scope_router.py`: router parser calibration (light/compute, fail-safe on empty/error/unrecognised), `_last_user_text` flattening, lightweight blocks exclude the reference doc, router off by default.
- 73 passed, 10 skipped across `test_scope_router` + `test_prompts` + `test_api` + `test_evaluation`.

Follow-ups
- Run the `routing` suite live (`make eval-ai-live`) to confirm the classifier hits the expected routes, and tune the router prompt if needed.
- Split `lightweight` into explicit off-topic / unmodelled / confirm-first handling, tying into the confirm-first proposal on #102 (Add scope/refusal contract to chat system prompt, closes #101).

Credit: original gate by @SakshiKekre in #95.
🤖 Generated with Claude Code