Add scope/refusal contract to chat system prompt (closes #101) #102
vahid-ahmadi wants to merge 2 commits into
Add a SCOPE & REFUSAL section near the top of SYSTEM_PROMPT defining what is in scope (UK tax/benefit microsimulation over the datasets and years capabilities() reports) and out of scope (non-UK policy, macro forecasting, unannounced Budgets, legal/tax-filing advice, anything capabilities() reports as not modelled). Off-topic questions are declined in one sentence with no tool calls; on-topic-but-unmodelled questions stop after a single capabilities() check instead of looping or guessing API shapes. A partial-answer rule plus a personal-allowance/inflation example guard against false refusals.

Prompt-only change: no new tools, no change to _build_system_blocks, no run_python sandbox change.

Closes #101

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ba97ba0 to 98c30d1
- Partial-answer rule: a question that touches a non-modelled dimension but can
  still be partially answered should be answered with the limitation explained,
  NOT refused.
- For example, "how will raising the personal allowance affect inflation?"
  should be answered by computing the modelled fiscal and distributional impact
  and clearly noting that second-round macro effects (inflation, behaviour) lie
  outside the microsimulation — not declined outright.
The decline list flatly listed "inflation" as out-of-scope, but the
flagship partial-answer example was an inflation question it said NOT to
decline — contradictory guidance for the same query type. Scope the macro
decline to pure-forecast asks ("what will inflation/GDP/employment be?")
with no modelled lever, make the partial-answer rule explicitly take
precedence when a modelled policy is in the question, and reword the example
so the answer addresses the modelled part rather than implying it answered
the inflation question.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
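
The precedence described in this commit message can be pictured as a small decision function. This is only an illustrative sketch: the PR is prompt-only and adds no such code, and the function name, its three boolean inputs, and the returned labels are all hypothetical. It assumes that pure-forecast asks ("what will inflation be?") count as off-topic, and that a question naming a modelled policy is never declined just because it also mentions a macro dimension.

```python
def route(on_topic: bool, modelled_lever: bool, unmodelled_dimension: bool) -> str:
    """Hypothetical routing that mirrors the revised prompt wording.

    on_topic             -- UK tax/benefit question (pure macro forecasts are not)
    modelled_lever       -- the question names a policy the microsimulation can run
    unmodelled_dimension -- it also asks about something outside the model
    """
    if not on_topic:
        # Off-topic or pure-forecast ask: one-sentence decline, no tool calls.
        return "decline"
    if not modelled_lever:
        # On-topic but unmodelled: stop after a single capabilities() check.
        return "not_modelled"
    if unmodelled_dimension:
        # Partial-answer rule takes precedence over the macro decline.
        return "partial_answer"
    return "answer"
```

Under these assumptions the personal-allowance/inflation example maps to route(True, True, True), which yields "partial_answer", while a bare inflation forecast maps to route(False, False, True), which yields "decline".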
@anth-volk heads-up on a fix I just pushed ( Contradiction in the original text: the decline list named
— but the flagship partial-answer example was an inflation question it said not to decline. So for the same query type the prompt gave opposite instructions; the model could swing between a curt refusal and a full simulation. Fix:
Note this PR also got rebased onto current main earlier (main had refactored

Still flagging, as before, that prompt-behaviour like this really wants an eval case (topic-gate / #52 harness) to lock it in rather than trusting wording alone. Happy to follow up.
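
As a rough idea of what such an eval case could check, here is a minimal sketch. It does not describe the #52 harness, whose interface is not shown in this thread. The Transcript record and both check functions are hypothetical; only the tool names capabilities and run_python come from the PR text. The one-sentence check is deliberately crude and would miscount replies containing abbreviations such as "e.g.".

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical record of one chat turn: tool names called, in order, plus the reply."""
    tool_calls: list[str] = field(default_factory=list)
    reply: str = ""


def check_off_topic(t: Transcript) -> bool:
    """Off-topic ask: no tool calls and a single-sentence decline."""
    normalised = t.reply.replace("?", ".").replace("!", ".")
    sentences = [s for s in normalised.split(".") if s.strip()]
    return not t.tool_calls and len(sentences) == 1


def check_unmodelled(t: Transcript) -> bool:
    """On-topic but unmodelled: at most one capabilities() check and no run_python."""
    return t.tool_calls.count("capabilities") <= 1 and "run_python" not in t.tool_calls
```

A harness could run a fixed set of off-topic and unmodelled questions through the chat endpoint, build one Transcript per reply, and fail the run if either check returns False.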
Following up on my review with a design suggestion on the partial-answer rule specifically (now that

As written, the rule is compute-first: when a question centres on a modelled reform but also touches a non-modelled dimension, the model runs the simulation immediately and caveats the unmodelled part inline. I'd like us to consider flipping it to confirm-first:

So

Why I think this is worth it:

The tradeoff is a round-trip of latency/friction on the common case where the user did want the modelled answer and would just say "yes", and it cuts against the app's general eager-compute stance ("every number must come from a tool result you just computed").

One framing question for you: this confirm-first shape is conceptually a scoped Plan-mode turn

Not blocking: the current compute-first version is internally consistent now. Flagging it as a behavioural choice I'd like us to make deliberately, ideally pinned by an eval case (the
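
The round-trip cost mentioned above can be made concrete with a toy model. This is a sketch under assumptions: the details of the confirm-first proposal are not fully visible in this thread, so it assumes confirm-first means the assistant first asks whether the user wants the modelled part and only runs the simulation after a "yes". The function name and mode strings are hypothetical.

```python
from typing import Optional


def user_turns_until_numbers(mode: str, wants_modelled_part: bool) -> Optional[int]:
    """User messages needed before simulation numbers appear (None if none are computed)."""
    if mode == "compute_first":
        # The simulation runs on the first message, whether or not it was wanted.
        return 1
    if mode == "confirm_first":
        # Assumed behaviour: ask first, then run only after the user confirms.
        return 2 if wants_modelled_part else None
    raise ValueError(f"unknown mode: {mode!r}")
```

In this toy model confirm-first costs one extra user turn whenever the modelled answer was wanted, and avoids the computation entirely when it was not.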
Summary
Adds a SCOPE & REFUSAL: section near the top of SYSTEM_PROMPT in backend/routes/chatbot.py that makes "out of scope" an explicit contract the model follows. It defines what is in scope (UK tax/benefit microsimulation over the datasets and years capabilities() reports) and out of scope (non-UK policy, macro forecasting, unannounced/future Budgets, legal/tax-filing advice, anything capabilities() reports as not modelled), with clear off-topic, unmodelled, and partial-answer rules.
Rationale
Today the chat has no first-class handling for off-topic or unmodelled questions:
reference.md) for something that should be declined in one sentence.
run_python and re-guessing API shapes instead of stopping after one capabilities() check and saying "not modelled."
The only prior guardrail was a single buried line. This section replaces that with an explicit in/out-of-scope list plus a stop-after-one-check rule.
A partial-answer rule and a personal-allowance/inflation example guard against false refusals: questions that touch a non-modelled dimension but can still be partially answered are answered with the limitation explained, not declined.
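
For orientation, a section of this kind could be laid out as follows. This is a sketch only: the scope lists and rules are taken from the description above, but the exact wording, the bullet layout, and both helper functions are hypothetical. The PR itself edits the SYSTEM_PROMPT string directly and does not build it in code.

```python
# Hypothetical layout; the real text lives in SYSTEM_PROMPT in backend/routes/chatbot.py.
IN_SCOPE = "UK tax/benefit microsimulation over the datasets and years capabilities() reports"
OUT_OF_SCOPE = [
    "non-UK policy",
    "macro forecasting",
    "unannounced/future Budgets",
    "legal/tax-filing advice",
    "anything capabilities() reports as not modelled",
]


def build_scope_section() -> str:
    """Assemble an illustrative SCOPE & REFUSAL section as plain text."""
    lines = ["SCOPE & REFUSAL:", f"- In scope: {IN_SCOPE}.", "- Out of scope:"]
    lines += [f"  - {item}" for item in OUT_OF_SCOPE]
    lines += [
        "- Off-topic: decline in one sentence with no tool calls.",
        "- Unmodelled: stop after a single capabilities() check.",
        "- Partial answer: answer the modelled part and explain the limitation; do NOT refuse.",
    ]
    return "\n".join(lines)


def with_scope_near_top(base_prompt: str) -> str:
    """Insert the section right after the first paragraph of an existing prompt."""
    head, sep, tail = base_prompt.partition("\n\n")
    return f"{head}\n\n{build_scope_section()}{sep}{tail}"
```

Placing the section directly after the opening paragraph is one reading of "near the top"; the actual position in the prompt may differ.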
Notes
_build_system_blocks, no run_python sandbox change. No added LLM call or latency.
Closes #101
🤖 Generated with Claude Code