Add opt-in scope router to /chat/message (load the heavy background only when needed) by anth-volk · Pull Request #109 · PolicyEngine/policyengine-uk-chat

anth-volk · 2026-06-15T18:06:06Z

Summary

Adds an opt-in scope router to /chat/message: a cheap pre-check that decides which background a turn needs, while the model still answers the user's real message in every case.

It started as the off-topic "topic gate" salvaged from #95 (Sakshi) and has been generalised into a router. Two routes:

compute — full background: SYSTEM_PROMPT + REFERENCE_DOC + all six tools. Anything needing a calculation or a grounded figure.
lightweight — lean background: a small LIGHTWEIGHT_SYSTEM prompt, no reference doc, no tools. Off-topic, scope/capability, and explicitly-unmodelled questions get a tailored, model-generated reply — not a canned string — without loading the ~20k-token reference doc and the tool schemas.

Why

#102's scope reasoning currently runs inside the expensive call, entangled with the full background; the original gate proved a cheap pre-pass works but returned a fixed string. This unifies them: the router keeps the model generating from the user's prompt, and only the compute branch pays for the heavy background.

How (reuses the existing loop)

Routing only selects model / system_blocks / whether tools are sent — the same lever plan mode already pulls to drop tools. So billing, usage accounting, the done event, streaming, and the iteration cap are all unchanged. With no tools, the lightweight branch returns end_turn in one iteration and emits done normally. No parallel code path.
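A minimal sketch of that lever, for illustration only. `select_background`, the placeholder prompt strings, and the stand-in tool list are assumptions, not the code in `chatbot.py`; model selection is omitted for brevity.

```python
from typing import Any

# Placeholder values; the real constants live in the backend and are not shown here.
SYSTEM_PROMPT = "full system prompt"
REFERENCE_DOC = "reference doc (~20k tokens)"
LIGHTWEIGHT_SYSTEM = "lean system prompt"
TOOLS = [{"name": f"tool_{i}"} for i in range(6)]  # stand-ins for the six tool schemas


def select_background(route: str) -> dict[str, Any]:
    """Return only the request kwargs that differ between routes.

    Streaming, usage accounting, the done event, and the iteration cap
    stay in the shared loop and are untouched by this choice.
    """
    if route == "lightweight":
        # Lean background: no reference doc, and no "tools" key at all.
        return {"system": [{"type": "text", "text": LIGHTWEIGHT_SYSTEM}]}
    # Any other value is treated as compute (full background).
    return {
        "system": [
            {"type": "text", "text": SYSTEM_PROMPT},
            {"type": "text", "text": REFERENCE_DOC},
        ],
        "tools": TOOLS,
    }
```

Because the lightweight kwargs carry no tools, the model has nothing to call, which is why a single iteration is enough on that branch.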

Safety properties

Fail-safe to compute: empty input, any router error, or an unrecognised reply routes to the full background. A wrong compute only wastes the background we'd have loaded anyway; a wrong lightweight risks answering without the data, so we bias hard against it.
No numbers from memory: the lean prompt forbids quoting figures/parameter values from memory, preserving the app's "every number comes from a computation" invariant.
Off by default: opt in via POLICYENGINE_CHAT_SCOPE_ROUTER_ENABLED=true (model via POLICYENGINE_CHAT_SCOPE_ROUTER_MODEL). Disabled, behaviour is byte-identical to current main.
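The three properties above can be sketched as follows. The function names, the exact reply labels, and the rule that only the literal string `true` enables the flag are assumptions made for this example; the real parser and flag handling may differ.

```python
import os
from typing import Callable


def router_enabled() -> bool:
    """Opt-in flag: unset or any value other than "true" leaves the router off (assumed rule)."""
    value = os.environ.get("POLICYENGINE_CHAT_SCOPE_ROUTER_ENABLED", "")
    return value.strip().lower() == "true"


def parse_route(reply: str) -> str:
    """Map the classifier's raw reply to a route, defaulting to compute."""
    label = reply.strip().lower()
    # Only an exact "lightweight" reply selects the lean background.
    return "lightweight" if label == "lightweight" else "compute"


def route_scope(user_text: str, classify: Callable[[str], str]) -> str:
    """Fail-safe wrapper: empty input or any classifier error routes to compute."""
    if not user_text.strip():
        return "compute"
    try:
        return parse_route(classify(user_text))
    except Exception:
        # A wrong compute only costs tokens; a wrong lightweight risks an ungrounded answer.
        return "compute"
```

Here `classify` stands in for the fast-model call, so the fail-safe logic can be unit-tested without a network request.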

Engine-derived scope descriptor (no drift)

The router reasons over a compact SCOPE_DESCRIPTOR (~a few hundred tokens) instead of the full reference doc. It is now generated from the engine at deploy time: scripts/build_reference.py also writes scope_descriptor.md, deriving the modelled half from the Parameters schema (the authoritative reform-key list) and capabilities() (datasets), combined with a curated not-modelled boundary. The route layer loads it alongside reference.md and falls back to DEFAULT_SCOPE_DESCRIPTOR when absent (local dev). prompts.py owns the instruction text + scope_router_system() / lightweight_system() builders; chatbot.py loads the descriptor and assembles. scope_descriptor.md is gitignored like reference.md.

Sample generated descriptor (against the installed engine): modelled reform keys income_tax, national_insurance, universal_credit, child_benefit, pension_credit, …; datasets frs, efrs, lcfs, spi, was.
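A rough sketch of the build-then-load-with-fallback flow described in this section. The function names, the descriptor layout, and the fallback text are hypothetical; the real `build_reference.py` derives its lists from the Parameters schema and `capabilities()`, which are replaced here by plain arguments.

```python
from pathlib import Path

# Hypothetical fallback text; the real DEFAULT_SCOPE_DESCRIPTOR is not shown in this PR.
DEFAULT_SCOPE_DESCRIPTOR = "Modelled: UK tax and benefit reforms. Not modelled: macro effects."


def build_scope_descriptor(
    reform_keys: list[str], datasets: list[str], not_modelled: list[str]
) -> str:
    """Assemble the descriptor from engine-derived lists plus a curated boundary."""
    lines = [
        "Modelled reform keys: " + ", ".join(sorted(reform_keys)),
        "Datasets: " + ", ".join(sorted(datasets)),
        "Not modelled: " + "; ".join(not_modelled),
    ]
    return "\n".join(lines) + "\n"


def load_scope_descriptor(path: Path) -> str:
    """Read the generated file, falling back when it is absent or empty (e.g. local dev)."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return DEFAULT_SCOPE_DESCRIPTOR
    return text or DEFAULT_SCOPE_DESCRIPTOR
```

Keeping the fallback in the loader means a missing build artefact degrades to a less precise router rather than a startup failure.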

Eval coverage

New routing suite in the AI eval harness pins the routing deterministically. A RoutingCase asserts a prompt routes to compute or lightweight via _route_scope. Cases are live-only (requirements: [live_model]) since the router makes a real fast-model call. The canonical case is personal-allowance → inflation (must route compute — the modelled part needs the engine; the macro caveat is the main loop's job), plus contrasting pure-macro / off-topic / capability (→ lightweight) and reform-cost / household-calc (→ compute) cases.
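For readers unfamiliar with the harness, a routing case could look roughly like this. The field names, the `run_case` helper, and the skip logic are assumptions for illustration; the real `RoutingCase` and its runner are not shown in this description.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RoutingCase:
    """One routing expectation: a prompt and the route it must take."""

    name: str
    prompt: str
    expected_route: str  # "compute" or "lightweight"
    requirements: tuple[str, ...] = ("live_model",)


def run_case(
    case: RoutingCase,
    route_fn: Callable[[str], str],
    available: frozenset[str] = frozenset(),
) -> str:
    """Return "skipped", "passed", or "failed" for one case."""
    # Cases whose requirements are not met (e.g. no live model offline) are skipped.
    if not set(case.requirements) <= set(available):
        return "skipped"
    return "passed" if route_fn(case.prompt) == case.expected_route else "failed"
```

In a live run, `route_fn` would be the real router; offline, the unmet `live_model` requirement makes every routing case skip, consistent with the skipped counts reported under Tests.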

Not included (Part 2 of #95)

Intentionally excluded: the /chat/backends cold-start work (model_backends.py, the modal_app.py pre-import, the ChatPage.tsx dropdown, the workflow tweak), which is coupled to the closed backend-selector stack (#51).

Tests

backend/tests/test_scope_router.py — router parser calibration (light/compute, fail-safe on empty/error/unrecognised), _last_user_text flattening, lightweight blocks exclude the reference doc, router off by default.
Offline eval: 96 passed, 0 failed, 34 skipped across all five suites (routing cases skip offline). Backend unit + eval-harness tests green; 73 passed, 10 skipped across test_scope_router + test_prompts + test_api + test_evaluation.

Follow-ups

Run the routing suite live (make eval-ai-live) to confirm the classifier hits the expected routes, and tune the router prompt if needed.
Optional: split lightweight into explicit off-topic / unmodelled / confirm-first handling, tying into the confirm-first proposal on #102 (Add scope/refusal contract to chat system prompt, closes #101).

Credit: original gate by @SakshiKekre in #95.

🤖 Generated with Claude Code

Salvages the off-topic "topic gate" from #95 (Sakshi) as a standalone change against main, dropping that PR's second, unrelated half (the /chat/backends warmup, which depended on the closed model-backend selector in #51).

The gate is a cheap pre-check: one fast-model (Haiku) classification on the latest user message, short-circuiting clearly off-topic requests with a canned refusal before the heavy chat loop (system prompt, reference doc, tools) ever runs. It is off by default and opt-in via POLICYENGINE_CHAT_TOPIC_GATE_ENABLED; the classifier fails open (any error or ambiguity → treat as on-topic), since rejecting an on-topic question is worse than letting an off-topic one through.

Adapted from #95's version, which was built on #51's multi-backend `backend` object: the refusal SSE event now matches main's single-model `done` shape (no `model_backend`), and the wire-up reads `chat_request.messages` directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-15T18:06:13Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| policyengine-uk-chat | Ready | Preview, Comment | Jun 15, 2026 7:57pm |

github-actions · 2026-06-15T18:07:02Z

Beta preview is ready.

Frontend: open preview
Backend: open backend

…en needed)

Generalises the binary off-topic gate into a scope router that decides which *background* a turn needs, never what to say; the model still answers the user's real message in every case:

- "compute" full background: SYSTEM_PROMPT + REFERENCE_DOC + all tools. Anything needing a calculation or a grounded figure.
- "lightweight" lean background: LIGHTWEIGHT_SYSTEM only, no reference doc, no tools. Off-topic, scope/capability, and explicitly unmodelled questions get a tailored, model-generated reply instead of loading the ~20k-token reference doc and tools, and instead of the previous canned refusal string.

Implementation reuses the existing stream loop: routing only selects the model / system blocks / whether tools are sent (the same lever plan mode already pulls to drop tools), so billing, usage accounting, the `done` event, and the iteration cap are unchanged. With no tools, the lightweight branch returns end_turn in one iteration and emits `done` normally.

Fail-safe: empty input, any router error, or an unrecognised reply routes to "compute". A wrong "compute" only wastes the background we'd have loaded anyway, while a wrong "lightweight" risks answering without the data. The lean prompt forbids quoting figures/parameters from memory, preserving the "no numbers without computing" invariant.

The scope descriptor is a small hand-written summary for now, with a TODO to derive it from capabilities() at build time so it cannot drift from the engine. Prompt text lives in prompts.py; chatbot.py owns the routing/orchestration.

Off by default; opt in via POLICYENGINE_CHAT_SCOPE_ROUTER_ENABLED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two follow-ups to the scope router.

1. Engine-derived scope descriptor (no drift). build_reference.py now also writes scope_descriptor.md, deriving the "modelled" half from the Parameters schema (the authoritative reform-key list) and capabilities() (datasets), combined with a curated "not modelled" boundary. The route layer loads scope_descriptor.md at startup (alongside reference.md) and falls back to DEFAULT_SCOPE_DESCRIPTOR when it's absent (local dev). prompts.py exposes the static instruction halves plus scope_router_system()/lightweight_system() builders that take the descriptor, so the router and the lightweight branch always agree on what is modelled. scope_descriptor.md is gitignored like reference.md (built at deploy).
2. Routing eval suite. New "routing" suite in the AI eval harness: a RoutingCase asserts that a prompt routes to "compute" or "lightweight" via _route_scope. Cases are live-only (requirements: [live_model]) since the router makes a real fast-model call; they skip offline. The canonical case is the personal-allowance -> inflation flow (must route "compute": the modelled part needs the engine, the macro caveat is the main loop's job), plus contrasting pure-macro / off-topic / capability (lightweight) and reform-cost / household-calc (compute) cases.

Offline eval: 96 passed, 0 failed, 34 skipped across all five suites (routing cases skip offline). Backend unit + eval-harness tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lemetry

Five fixes from re-review of the scope router:

1. Run the router off the event loop. _route_scope makes a blocking sync API call; it was awaited directly inside the async stream generator, stalling other in-flight requests. Now wrapped in run_in_executor, matching how the loop already offloads execute_tool.
2. Skip routing on follow-ups. The single-message classifier can't see prior context, so a continuation (e.g. "what if they were married?") could route to lightweight and be answered without the engine. The router now runs only on the opening user turn (_is_followup gates it); follow-ups take the full background.
3. Drop the chart directive from the lightweight branch: it told the model to call generate_chart, but lightweight passes no tools.
4. Filter config-knob keys (fiscal_year, labour_supply, uc_migration) out of the generated scope descriptor so it lists user-facing programmes only; labour_supply is a behavioural-response control we explicitly mark NOT modelled.
5. Add 'route' to the done SSE event for client/telemetry visibility.

Tests: added _is_followup coverage, removed the obsolete charts test. 75 passed, 10 skipped; offline eval 96 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
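Fixes 1 and 2 can be illustrated together. This is a sketch under stated assumptions: `is_followup` here treats any prior assistant message as a follow-up, `blocking_route` is a stand-in for the synchronous classifier call, and message content is assumed to be a plain string. The real `_is_followup` and `_route_scope` may use different rules.

```python
import asyncio
import time


def is_followup(messages: list[dict]) -> bool:
    """Assumed rule: any earlier assistant message means this is not the opening turn."""
    return any(m.get("role") == "assistant" for m in messages)


def blocking_route(user_text: str) -> str:
    """Stand-in for the synchronous fast-model call (sleeps instead of hitting an API)."""
    time.sleep(0.01)
    return "lightweight" if "weather" in user_text.lower() else "compute"


async def choose_route(messages: list[dict]) -> str:
    """Route the opening turn off the event loop; follow-ups always take compute."""
    if not messages or is_followup(messages):
        return "compute"
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default ThreadPoolExecutor, so the blocking
    # call no longer stalls other coroutines on the event loop.
    return await loop.run_in_executor(None, blocking_route, messages[-1]["content"])
```

Skipping the classifier on follow-ups also saves the pre-pass call on every turn after the first, at the cost of always loading the full background there.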

anth-volk · 2026-06-16T11:57:51Z

@vahid-ahmadi — requesting your review, since this is closely related to your #102 (scope/refusal contract) and I want to be sure they compose rather than collide.

How they differ — they operate at different layers:

#102 (Add scope/refusal contract to chat system prompt, closes #101) reasons about scope inside the model call. The system prompt teaches the model to recognise and decline/scope out-of-bounds requests, with the full background (reference doc + tools) already loaded. It shapes what the model says when a request is partly or wholly out of scope.
#109 (this PR, the opt-in scope router) reasons about scope before the model call. A cheap Haiku pre-pass classifies the opening message and picks which background to load: compute (full system prompt + reference doc + all six tools) vs lightweight (lean prompt, no reference doc, no tools). It never changes the wording: the model still answers the user's real message either way. It's a cost/latency lever, not a refusal policy.

| | #102 | #109 |
| --- | --- | --- |
| Where | in-call (system prompt) | pre-call (classifier) |
| Goal | correct scoping/refusal wording | don't pay for the heavy background on off-topic turns |
| Changes default behaviour? | yes | no (opt-in flag, off by default) |
| Touches refusal text? | yes | no |

Why they reinforce each other: the lightweight branch in #109 is exactly where #102's scope contract — and the confirm-first idea I raised on #102 — would live, because that's the path where we've already decided not to compute. If #102 lands first, #109's lightweight prompt should adopt its language; if #109 lands first, #102 slots cleanly into the lightweight branch. No conflict, just sequencing.

Two things I'd value your eye on:

The fail-safe-to-compute bias: anything ambiguous, any router error, and all follow-up turns load the full background. A wrong compute only wastes background we'd have loaded anyway; a wrong lightweight risks answering without data. Conservative enough for you?
The engine-derived scope descriptor (build_reference.py → scope_descriptor.md) as the source of truth, and how that sits alongside your version-stamp approach in #106 (Version-stamp reference.md and warn on engine drift, closes #104).

Still draft while I run the routing eval suite live to confirm the classifier hits the expected routes.

anth-volk · 2026-06-16T12:10:01Z

@vahid-ahmadi — #106 and #109 are circling the same two questions from different angles, and I'd rather we agree a target shape than rebase around each other piecemeal. Can we sync? I think we need an affirmative, agreed pathway on two things before either merges:

1. How we handle irrelevant / unanswerable questions.
Three overlapping mechanisms are in flight: your scope/refusal contract (#102, in-call), my scope router (#109, pre-call — picks a light vs. full background), and the confirm-first idea I raised on #102. They can compose (different layers), but we should decide the canonical layering: what actually declines, what just loads a lighter background, and what the user sees in each case.

2. How we handle systemic metadata.
#106 (stamp-and-warn on engine drift) and #109 (derive the scope descriptor live from the Parameters schema + capabilities()) are two different answers to the same "stale-index" risk. They also edit the same two spots — build_reference.py and the REFERENCE_DOC load in chatbot.py — so they'll conflict on merge regardless. We should pick one philosophy (stamp-and-warn vs. derive-live) and apply it consistently across reference.md and the scope descriptor.

I'm putting #109 back into draft until we've talked, so we settle the architecture before either lands.

A cheap forced-tool pre-pass builds a structured execution plan for the opening user turn and routes it to one of: irrelevant, out_of_scope, partial, needs_plan, ready. The model grounds each plan slot (source flag); the server gates via per-slot criticality. Any gateway error fails safe to compute.

Supersedes the binary scope router (#109) and the reference.md drift stamp (#106; replaced by the engine-derived scope descriptor).

- backend/gateway_config.py: criticality table, inferable set, promotions, gate()
- backend/gateway.py: run_gateway (forced emit_plan), verdict, writer/plan helpers
- backend/prompts.py: gateway + lightweight prompts, scope descriptor
- backend/routes/chatbot.py: gateway routing in generate_stream
- backend/scripts/build_reference.py: build_scope_descriptor -> scope_descriptor.md
- evaluation: GatewayCase + _run_gateway + 14 live cases (evals/cases/gateway)
- backend/tests/test_gateway.py: 33 offline tests (gate, parser, fail-safe)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview June 15, 2026 18:06 View deployment

anth-volk mentioned this pull request Jun 15, 2026

Design note: scope-aware routing (load the background only when needed) #110

Closed

anth-volk changed the title ~~Add opt-in topic gate as a pre-check on /chat/message~~ Add opt-in scope router to /chat/message (load the heavy background only when needed) Jun 15, 2026

vercel Bot deployed to Preview June 15, 2026 19:10 View deployment

vercel Bot deployed to Preview June 15, 2026 19:26 View deployment

vercel Bot deployed to Preview June 15, 2026 19:57 View deployment

anth-volk requested a review from vahid-ahmadi June 16, 2026 11:57

anth-volk marked this pull request as ready for review June 16, 2026 12:03

anth-volk mentioned this pull request Jun 16, 2026

Cut wasted tokens (off-topic gate) and cold-start latency (/chat/backends warmup) #95

Closed

4 tasks

anth-volk mentioned this pull request Jun 16, 2026

Version-stamp reference.md and warn on engine drift (closes #104) #106

Open

anth-volk marked this pull request as draft June 16, 2026 12:10

anth-volk removed the request for review from vahid-ahmadi June 16, 2026 12:10

This was referenced Jun 16, 2026

Add a lightweight AI gateway to /chat/message (5-outcome routing); supersede #109 and #106 #111

Open

Add AI gateway to /chat/message (5-outcome routing) #112

Draft

anth-volk wants to merge 4 commits into main from feat/topic-gate-standalone
