Skip to content

Design doc: conversation memory layer (#86)#98

Open
vahid-ahmadi wants to merge 1 commit into
mainfrom
docs/conversation-memory-design
Open

Design doc: conversation memory layer (#86)#98
vahid-ahmadi wants to merge 1 commit into
mainfrom
docs/conversation-memory-design

Conversation

@vahid-ahmadi

Copy link
Copy Markdown
Collaborator

Summary

Design doc (no implementation) for issue #86: persist a structured active scenario (household + reform + comparison baseline) across turns within a conversation, so follow-ups like "what if they were married?" mutate one field instead of forcing the agent to re-derive the whole household from scrollback. The doc is grounded in the current code — the streaming loop and prompt assembly in backend/routes/chatbot.py, the tool dispatch/definitions in backend/agent_tools.py, the chat_conversations persistence in backend/routes/conversations.py, and the input chrome in frontend/src/app/ChatPage.tsx — with file:line references throughout. Tagged needs-design; opening in PR form for discussion before any code.

TL;DR of the recommended approach

  • One active_scenario per conversation, owned by the client and resent each turn — mirrors how messages already round-trip (client-owns-state), so no second/racing DB writer.
  • Two tools, get_scenario / update_scenario(patch), added to TOOL_DEFINITIONS, but handled in the chat loop (not in execute_tool) because they touch per-conversation state and execute_tool is intentionally stateless. update_scenario shallow-merges (household merged one level deeper).
  • Live scenario snapshot injected as a per-turn system block after the cache breakpoints in _build_system_blocks (same pattern as Plan/Charts directives) — the agent sees prior state for free and is robust to a missed write; static behavioural rules go in the cached SYSTEM_PROMPT.
  • Persistence: one nullable active_scenario TEXT column on chat_conversations, added via the existing idempotent ensure_table ALTER pattern; round-trips through save/load/share and the done SSE event + chat request.
  • UI: a one-line "Active scenario" pill in the existing toggle row beside Plan/Charts, click-through to a modal with the full JSON + reset.
  • Schema is advisory (not Pydantic-validated) in v1, keyed to match _build_compiled_policy / typed-tool args so it slots into Register the three dormant typed tools (UK) #55/Add uk_python typed tools (run_economy_simulation) to fix B1 reform failures #97 once those land.
  • Shippable in 3 small PRs (backend plumbing → persistence round-trip → pill UI), plus optional polish.

Key open questions flagged for the team

  1. Granularity — recommend per-conversation (not per-message); explicit "fork" later.
  2. Display — pill one-liner vs full-structure modal.
  3. Plan-mode interaction — Plan mode structurally omits tools, so the agent can read but not patch the scenario in Plan mode; argued this is correct, needs only a directive wording tweak.
  4. Vs typed tools (Register the three dormant typed tools (UK) #55/Land typed reform tools to short-circuit reform-API guessing (track PR #55) #81/Add uk_python typed tools (run_economy_simulation) to fix B1 reform failures #97) — keep the reform/household shape aligned with the typed-tool args; sequence this after/alongside them so the schema doesn't get redone.

Refs #86

🤖 Generated with Claude Code

Design-only proposal for persisting a structured active_scenario
(household + reform + comparison baseline) across turns within a
conversation, with get_scenario/update_scenario tools, server-side
round-trip through the chat_conversations table, and an active-scenario
pill in the chat input chrome. No implementation — for PR discussion.

Refs #86

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 8, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
policyengine-uk-chat Ready Ready Preview, Comment Jun 8, 2026 8:17am

Request Review

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

Beta preview is ready.

@vahid-ahmadi vahid-ahmadi requested a review from anth-volk June 8, 2026 09:27

@anth-volk anth-volk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up as a doc first — the problem is real (one-shot turns forcing the agent to re-derive the household from scrollback), and the strongest instinct in here is the snapshot-on-read safety net: injecting the live scenario as a per-turn block so the agent always sees prior state even if it forgets to write back. That's the right backbone. A few things should be reconciled before this guides code, and I'd like us to settle some of the structure more broadly so the implementation lands consistently.

1. Reconcile against the now-merged typed tools (#94)

The doc's central sequencing assumption is no longer true. Throughout §5-Q4 it states the model "today only sees run_python + generate_chart… the typed tools exist as functions but are not registered… update_scenario phrasing should not assume typed tools are live until #55/#97 merge."

tool_definitions.py now registers six tools — validate_reform, calculate_household, run_economy_simulation, analyse_microdata, run_python, generate_chart — since #94 landed. So Q4 flips from "defer until the typed-tool schema settles" to "the schema is settled, proceed." The reform/household shape the scenario is meant to mirror should be aligned directly to the live run_economy_simulation / calculate_household signatures rather than treated as forthcoming.

2. Refresh the file:line references and adapt to concurrent dispatch

#92 moved prompt text into prompts.py and reorganised the loop, so most anchors have drifted: SYSTEM_PROMPT is imported from prompts (no longer inline at chatbot.py:30-87), _build_system_blocks is at :119, ChatRequest at :154, and the dispatch is now a TOOL_HANDLERS[name](**input) map rather than the tools = {...} dict the doc cites.

The core "Option A — handle scenario tools in the loop, keep execute_tool stateless" still holds (execute_tool is still stateless (name, input)). But tool dispatch is now concurrent (execute_tool_async + asyncio.ensure_future, gathered). That makes handling the scenario tools synchronously in the loop — rather than dispatching them into the gather — not just cleaner but necessary, to avoid a shared mutable scenario dict being patched from racing tasks. Worth making that explicit.

3. Reconsider the two-tools-plus-snapshot shape (the part I'd most like to tighten)

The doc proposes two tools — get_scenario (read) and update_scenario (write) — and an always-injected per-turn snapshot of the scenario. Those last two overlap: §3.3 itself notes the snapshot lets the agent "see the state for free without spending a tool round-trip." If the canonical state is already in the agent's context every turn, get_scenario has almost nothing to do — the only case it covers is the agent wanting a re-fetch of something it can already see, which shouldn't happen. So we're paying for an extra tool in the cached TOOL_DEFINITIONS array (and an extra branch in the loop, and an extra thing the model can call needlessly) for no real capability.

I'd collapse this to one write tool + the passive snapshot: inject the scenario each turn (the safety net), and give the agent a single update_scenario(patch) to mutate it. That's fewer tokens in the cached tool block, one less code path in the loop, and one less way for the agent to burn a turn on a redundant read. If there's a concrete case where the injected snapshot is insufficient and a live get_scenario is genuinely needed (e.g. the snapshot is deliberately truncated for size and the agent needs the full object), let's state that explicitly as the justification — otherwise drop it. This also interacts with the size-cap discussion in §7: if we ever truncate the injected snapshot, that's exactly when a read tool earns its place, so the two decisions should be made together rather than separately.

4. Justify client-owned vs server-authoritative state explicitly

The doc picks "client owns active_scenario, resends each turn, server stores/relays" for consistency with how messages already flow and to avoid a second writer racing saveConversation. That's a reasonable, low-risk v1, and the tamper surface is genuinely small (the scenario re-enters only as advisory prompt text, never executed — sandboxed run_python is the sole execution path).

But it's worth being explicit that the more robust pattern — and where agent frameworks generally point — is server-side state keyed by session_id, which we already have: chat_conversations is keyed by session_id with an idempotent ensure_table migration. A server-authoritative scenario (single writer, last-write-wins on one nullable column, not touching messages) is natural here, not exotic, and the "two racing writers" objection is solvable. I'm fine shipping client-owned for v1, but let's name server-authoritative as the explicit v2 target rather than dismissing it, so we don't bake the client-owned assumption in permanently.

Broader: let's settle the structure for consistency before coding

Beyond the point fixes above, I'd like us to align on a few structural questions up front so the three implementation PRs don't drift from the rest of the backend:

  • Tool surface — one write tool vs two (per §3 above), and confirm scenario tools are handled in the loop, never in execute_tool, so the "execute_tool is stateless" invariant stays intact.
  • State ownership — client-owned vs server-authoritative, decided deliberately (per §4) and documented as the reason, since it dictates the persistence and request-shape changes.
  • Schema home — where the canonical active_scenario shape lives and who owns it. Given #92/#94 split responsibilities (prompts.py owns prompt text, tool_definitions.py owns schemas, tooling/ owns shared helpers), the scenario schema + _merge_scenario should land in the matching places rather than inline in the loop, and the doc should say which.
  • Eval coverage — the single → married follow-up is the canonical regression for this feature; let's commit it to the eval harness as part of PR 1 rather than "optional," so the memory behaviour is enforced deterministically and not just prompted.

None of this changes the core idea — it's a good direction. I'd just like the doc refreshed against current main (typed tools live, refs corrected), the tool surface tightened, and the state-ownership choice made explicitly, before it drives the phased PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants