Add AI gateway to /chat/message (5-outcome routing) by anth-volk · Pull Request #112 · PolicyEngine/policyengine-uk-chat

anth-volk · 2026-06-16T19:03:03Z

Fixes #111

Summary

Adds an AI gateway to /chat/message: a cheap pre-pass (fast model, forced tool-use) that runs once on the opening user turn, builds a structured execution plan, and routes the turn into one of five outcomes instead of always paying for the full background.

irrelevant → decline, invite a reframe
out_of_scope → explain the modelled angle we can compute instead
partial → state what's modellable vs not, ask whether to run it (confirm-first)
needs_plan → ask 1–3 targeted clarifying questions (auto plan-mode)
ready → run the normal compute loop, seeded with the resolved plan

The first four reply on the lean lightweight path (no reference doc, no tools, one turn). Only ready pays for the heavy background. Any gateway error fails safe to compute. The gateway is uncalibrated until the live eval runs, so treat that live run as a pre-merge gate.

Design — the model grounds, the server gates

The gateway model emits, per plan slot, only a value and a source (prompt/default/assumed). It does not judge importance. The server looks up a static per-slot criticality and applies a deterministic gate:

a slot forces a clarifying question iff source == assumed AND criticality ∈ {high, medium} AND it is not model-inferable.

This split (in gateway_config.py) keeps the gate auditable, unit-testable offline, and independent of poorly-calibrated model self-confidence. Over-asking is bounded by the inferable set (schema-required-but-derivable slots like benunit/household are inferred, never asked) and by context promotions (e.g. a wealth-tax question promotes dataset to high so an assumed FRS default gets flagged).

Safety

Fail-safe to ready/compute on any parse/timeout/API error, an empty/garbage plan, or a plan missing the routing decision — a wrong refusal is the worst outcome, so admissibility biases toward not refusing.
Turn boundary needs no new state: a user's answer to a partial/needs_plan prompt has a prior assistant turn, so _is_followup skips the gateway and the turn flows straight to compute with full context.
Reuses the existing stream loop end to end (the lightweight branch terminates via the normal no-tool convergence path); done events carry route + outcome.

Supersedes #109 and #106

Add opt-in scope router to /chat/message (load the heavy background only when needed) #109 (binary compute/lightweight scope router) → generalised into the 5-outcome gateway.
Version-stamp reference.md and warn on engine drift (closes #104) #106 (stamp-and-warn drift on reference.md) → replaced by derive-live: build_reference.py now generates scope_descriptor.md from the engine, so it can't drift.

Recommend closing both once this lands.

Removes manually-selected plan mode

The manual plan-mode toggle (the /plan command + button and the plan_mode request field) is removed end to end — backend, frontend, tests, eval cases, and skill docs. The gateway's needs_plan outcome is now the sole source of clarifying questions, so a manual override is redundant. (charts_mode is unaffected.)

Tests

backend/tests/test_gateway.py — 34 offline tests (gate logic, criticality, inferable clause, promotions, run_gateway parser, fail-safe paths, writer directives, flag-off). Full backend suite: 162 passed, 10 skipped.
New gateway eval suite — 14 live cases across all 5 outcomes, with minimal-pair discriminators (wealth-tax specified vs not; FRS-correct vs WAS-required dataset; partial vs out_of_scope; inferable household slots must-not-gate). Offline eval: 96 passed, 0 failed, 42 skipped (gateway cases are live-only).

Follow-ups

Live calibration (the outstanding step): make eval-ai-live --suites gateway to confirm the classifier hits the expected outcomes; tune the gateway prompt + promotions against any misroutes. Criticality is deterministic, so most tuning is the grounding prompt.
Optionally re-add a light reference.md drift warning on top of derive-live, if wanted.

🤖 Generated with Claude Code

vercel · 2026-06-16T19:03:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
policyengine-uk-chat	Ready	Preview, Comment	Jun 17, 2026 2:13pm

github-actions · 2026-06-16T19:03:55Z

Beta preview is ready.

Frontend: open preview
Backend: open backend

A cheap forced-tool pre-pass builds a structured execution plan for the opening user turn and routes it to one of: irrelevant, out_of_scope, partial, needs_plan, ready. The model grounds each plan slot (source flag); the server gates via per-slot criticality. Any gateway error fails safe to compute. Supersedes the binary scope router (#109) and the reference.md drift stamp (#106; replaced by the engine-derived scope descriptor). - backend/gateway_config.py: criticality table, inferable set, promotions, gate() - backend/gateway.py: run_gateway (forced emit_plan), verdict, writer/plan helpers - backend/prompts.py: gateway + lightweight prompts, scope descriptor - backend/routes/chatbot.py: gateway routing in generate_stream - backend/scripts/build_reference.py: build_scope_descriptor -> scope_descriptor.md - evaluation: GatewayCase + _run_gateway + 14 live cases (evals/cases/gateway) - backend/tests/test_gateway.py: 33 offline tests (gate, parser, fail-safe) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The gateway's needs_plan outcome is now the only source of clarifying questions, so the manual plan-mode toggle is removed end to end: - backend: plan_mode request field, PLAN_MODE_DIRECTIVE, the not-plan_mode guards, and plan-mode handling in the eval runner/schemas - frontend: the /plan slash command + handler, the Plan toggle button, plan_mode in the request bodies, and the now-unused IconBulb import - tests + eval cases for plan mode, and stale plan-mode references in the skill docs and the cap-hit fallback message Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add backend/model_config.py with DEFAULT_TEMPERATURE (0) and SUGGESTION_TEMPERATURE (1), and route every model call through them. Previously only the compute loop and eval harness pinned temperature; the gateway, titling, and follow-up suggestions silently inherited the SDK default of 1.0. The gateway is a classifier, so it now runs deterministically at 0. Removes the old CHAT_TEMPERATURE constant (env var ANTHROPIC_CHAT_TEMPERATURE -> ANTHROPIC_TEMPERATURE). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview June 16, 2026 19:03 View deployment

vercel Bot deployed to Preview June 16, 2026 19:08 View deployment

vercel Bot deployed to Preview June 16, 2026 19:11 View deployment

anth-volk changed the title ~~Add opt-in AI gateway to /chat/message (5-outcome routing)~~ Add AI gateway to /chat/message (5-outcome routing) Jun 16, 2026

anth-volk force-pushed the feat/chat-gateway branch from ea06514 to a153910 Compare June 16, 2026 19:16

vercel Bot deployed to Preview June 16, 2026 19:17 View deployment

vercel Bot deployed to Preview June 17, 2026 13:26 View deployment

vercel Bot deployed to Preview June 17, 2026 14:13 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add AI gateway to /chat/message (5-outcome routing)#112

Add AI gateway to /chat/message (5-outcome routing)#112
anth-volk wants to merge 3 commits into
mainfrom
feat/chat-gateway

anth-volk commented Jun 16, 2026 •

edited

Loading

Uh oh!

vercel Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anth-volk commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design — the model grounds, the server gates

Safety

Supersedes #109 and #106

Removes manually-selected plan mode

Tests

Follow-ups

Uh oh!

vercel Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

anth-volk commented Jun 16, 2026 •

edited

Loading

vercel Bot commented Jun 16, 2026 •

edited

Loading