Add scope/refusal contract to chat system prompt (closes #101) #102

Open
vahid-ahmadi wants to merge 2 commits into main from feat/system-prompt-scope-refusal

Conversation

@vahid-ahmadi

Collaborator

Summary

Adds a SCOPE & REFUSAL: section near the top of SYSTEM_PROMPT in backend/routes/chatbot.py that makes "out of scope" an explicit contract the model follows. It defines what is in scope (UK tax/benefit microsimulation over the datasets and years capabilities() reports) and out of scope (non-UK policy, macro forecasting, unannounced/future Budgets, legal/tax-filing advice, anything capabilities() reports as not modelled), with clear off-topic, unmodelled, and partial-answer rules.

Rationale

Today the chat has no first-class handling for off-topic or unmodelled questions:

  • Off-topic questions are answered anyway, paying full input/output cost (system prompt + cached reference.md) for something that should be declined in one sentence.
  • On-topic-but-unmodelled questions (macro/GDP/inflation, non-UK policy) degrade into re-running run_python and re-guessing API shapes instead of stopping after one capabilities() check and saying "not modelled."

The only prior guardrail was a single buried line. This section replaces that with an explicit in/out-of-scope list plus a stop-after-one-check rule.

A partial-answer rule and a personal-allowance/inflation example guard against false refusals: questions that touch a non-modelled dimension but can still be partially answered are answered with the limitation explained, not declined.

Notes

Closes #101

🤖 Generated with Claude Code

@vercel

vercel Bot commented Jun 8, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| policyengine-uk-chat | Ready | Preview, Comment | Jun 15, 2026 2:56pm |


@github-actions

github-actions Bot commented Jun 8, 2026


Beta preview is ready.

Add a SCOPE & REFUSAL section near the top of SYSTEM_PROMPT defining
what is in scope (UK tax/benefit microsimulation over the datasets and
years capabilities() reports) and out of scope (non-UK policy, macro
forecasting, unannounced Budgets, legal/tax-filing advice, anything
capabilities() reports as not modelled).

Off-topic questions are declined in one sentence with no tool calls;
on-topic-but-unmodelled questions stop after a single capabilities()
check instead of looping or guessing API shapes. A partial-answer rule
plus a personal-allowance/inflation example guard against false refusals.

Prompt-only change: no new tools, no change to _build_system_blocks,
no run_python sandbox change.

Closes #101

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread backend/prompts.py Outdated
Comment on lines +31 to +37
- Partial-answer rule: a question that touches a non-modelled dimension but can
still be partially answered should be answered with the limitation explained,
NOT refused.
- For example, "how will raising the personal allowance affect inflation?"
should be answered by computing the modelled fiscal and distributional impact
and clearly noting that second-round macro effects (inflation, behaviour) lie
outside the microsimulation — not declined outright.

Contributor


@vahid-ahmadi You sure about this part?

The decline list flatly listed "inflation" as out-of-scope, but the
flagship partial-answer example was an inflation question it said NOT to
decline — contradictory guidance for the same query type. Scope the macro
decline to pure-forecast asks ("what will inflation/GDP/employment be?")
with no modelled lever, make the partial-answer rule explicitly take
precedence when a modelled policy is in the question, and reword the example
so the answer addresses the modelled part rather than implying it answered
the inflation question.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vahid-ahmadi

Collaborator Author

@anth-volk heads-up on a fix I just pushed (1108bc2) to the SCOPE & REFUSAL block, in case it affects your review.

Contradiction in the original text: the decline list named inflation as flatly out-of-scope —

Out of scope (decline): … macroeconomic forecasting (GDP, inflation, employment, market reactions)

— but the flagship partial-answer example was an inflation question it said not to decline. So for the same query type the prompt gave opposite instructions; the model could swing between a curt refusal and a full simulation.

Fix:

  1. Scoped the macro decline to the pure-forecast case — "what will inflation/GDP/employment be?" with no modelled tax-benefit lever in the question — so it no longer collides with the example.
  2. Made the partial-answer rule explicitly take precedence when a modelled policy is in the question, so the tie-break is unambiguous.
  3. Reworded the example so the answer makes clear it's addressing the modelled part (fiscal/distributional) and that the macro part is out of scope — rather than implying it answered the inflation question.
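
The tie-break described in the three points above can be sketched as a small routing function. This is only an illustration of the precedence order, not the prompt text itself: `Action`, `Question`, and `route` are hypothetical names, and in practice the model infers these features from the question rather than receiving them as flags.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DECLINE = "decline"            # one sentence, no tool calls
    NOT_MODELLED = "not_modelled"  # stop after a single capabilities() check
    PARTIAL = "partial"            # answer the modelled part, caveat the rest
    ANSWER = "answer"              # fully in scope


@dataclass
class Question:
    # Hypothetical features, assumed for illustration only.
    on_topic: bool            # about UK tax/benefit policy at all?
    has_modelled_lever: bool  # names a reform the microsimulation covers
    touches_unmodelled: bool  # e.g. inflation, GDP, non-UK policy


def route(q: Question) -> Action:
    if not q.on_topic:
        return Action.DECLINE
    # The partial-answer rule takes precedence whenever a modelled policy
    # is in the question, even if an unmodelled dimension is also present.
    if q.has_modelled_lever and q.touches_unmodelled:
        return Action.PARTIAL
    if q.touches_unmodelled:
        # Pure-forecast ask with no modelled lever.
        return Action.NOT_MODELLED
    return Action.ANSWER
```

Under these assumptions, "how will raising the personal allowance affect inflation?" maps to `Action.PARTIAL`, while "what will inflation be?" maps to `Action.NOT_MODELLED`.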

Note this PR also got rebased onto current main earlier (main had refactored SYSTEM_PROMPT out of chatbot.py into section constants in prompts.py), so the contract now lives as a SCOPE_AND_REFUSAL section there. Verified: prompt assembles correctly and test_prompts.py passes.
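
For readers unfamiliar with the section-constant layout, here is a rough sketch of how a prompt assembled from constants might be checked. Everything except the `SCOPE_AND_REFUSAL` name is an assumption: `ROLE`, `TOOLS`, `build_system_prompt`, and the placeholder wording are illustrative and do not reproduce the contents of prompts.py or test_prompts.py.

```python
# Hypothetical section constants with placeholder wording.
ROLE = "You are a UK tax-benefit microsimulation assistant."

SCOPE_AND_REFUSAL = (
    "SCOPE & REFUSAL:\n"
    "- In scope: UK tax/benefit microsimulation over the datasets and years "
    "capabilities() reports.\n"
    "- Pure macro forecasts with no modelled lever: decline.\n"
    "- The partial-answer rule takes precedence when a modelled policy is in "
    "the question.\n"
)

TOOLS = "TOOLS: (placeholder for the tool-usage section)"


def build_system_prompt(sections: list[str]) -> str:
    # Join sections with a blank line between them, in the order given.
    return "\n\n".join(s.strip() for s in sections)


# Placing the contract second keeps it near the top of the prompt.
SYSTEM_PROMPT = build_system_prompt([ROLE, SCOPE_AND_REFUSAL, TOOLS])
```

A check in this style only confirms that the section is present and ordered as intended; it says nothing about how the model behaves, which is why an eval case is still needed.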

Still flagging, as before, that prompt-behaviour like this really wants an eval case (topic-gate / #52 harness) to lock it in rather than trusting wording alone — happy to follow up.

@anth-volk

Contributor

Following up on my review — a design suggestion on the partial-answer rule specifically (now that 1108bc2 has resolved the inflation contradiction).

As written, the rule is compute-first: when a question centres on a modelled reform but also touches a non-modelled dimension, the model runs the simulation immediately and caveats the unmodelled part inline. I'd like us to consider flipping it to confirm-first:

  1. State the boundary up front — what it can and can't answer for this question.
  2. Offer the modelled analysis.
  3. Run the simulation only once the user agrees.

So "how will raising the personal allowance affect inflation?" would first return something like: "I can't model the inflation (second-round macro) effect — that's outside the microsimulation. I can show the fiscal and distributional impact of raising the personal allowance. Want that?" — and compute on confirmation, rather than computing immediately.

Why I think this is worth it:

  • It avoids spending the expensive path on an unwanted answer. The partial-answer route runs run_economy_simulation, the heaviest tool we have. If the user actually wanted the inflation answer (which we can't give), compute-first burns a full distributional simulation and then tells them the thing they cared about is out of scope. Confirm-first spends nothing on the engine until the user says the modelled slice is useful.
  • It sets expectations before presenting numbers, which reduces the risk of a partial answer reading as if it were complete — the exact failure mode the caveat is trying to prevent, but handled before the work rather than after.

The tradeoff is a round-trip of latency/friction on the common case where the user did want the modelled answer and would just say "yes" — and it cuts against the app's general eager-compute stance ("every number must come from a tool result you just computed").
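
To make the confirm-first shape concrete, here is a minimal sketch of the two turns. It is a toy model of the conversation flow, not the app's code: `Turn`, `partial_answer_turn`, and `run_economy_simulation_stub` are hypothetical, and the stub returns a placeholder string rather than real results.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    reply: str
    tool_calls: list[str] = field(default_factory=list)


def run_economy_simulation_stub(reform: str) -> str:
    # Stand-in for the real tool; returns a placeholder, not real output.
    return f"<simulation results for {reform}>"


def partial_answer_turn(reform: str, unmodelled: str, confirmed: bool) -> Turn:
    if not confirmed:
        # Steps 1-2: state the boundary and offer the modelled slice.
        # No tools are called on this turn.
        return Turn(
            reply=(
                f"I can't model the {unmodelled} effect; that's outside the "
                f"microsimulation. I can show the fiscal and distributional "
                f"impact of {reform}. Want that?"
            )
        )
    # Step 3: the user agreed, so the expensive tool runs now.
    result = run_economy_simulation_stub(reform)
    return Turn(reply=result, tool_calls=["run_economy_simulation"])
```

Compute-first would correspond to calling this with `confirmed=True` on the first turn; the sketch shows that the only difference is when the heavy tool call happens.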

One framing question for you: this confirm-first shape is conceptually a scoped Plan-mode turn, and PLAN_MODE_DIRECTIVE already encodes "don't call tools, ask first." So rather than adding a fourth bespoke rule to SCOPE_AND_REFUSAL, it might be cleaner to express partial-answer cases as "enter a Plan-mode-style turn" and reuse that machinery. Worth deciding whether this is a new rule or a mode interaction.

Not blocking — the current compute-first version is internally consistent now. Flagging it as a behavioural choice I'd like us to make deliberately, ideally pinned by an eval case (the personal-allowance → inflation flow) once we settle where the eval harness lives post-#52.



Development

Successfully merging this pull request may close these issues.

Add an explicit scope/refusal contract to the chat system prompt

2 participants