Version-stamp reference.md and warn on engine drift (closes #104) by vahid-ahmadi · Pull Request #106 · PolicyEngine/policyengine-uk-chat

vahid-ahmadi · 2026-06-08T09:26:14Z

Summary

Stamp the engine version into reference.md when it is generated, and warn at load time if the deployed engine has drifted from the version the doc was built against.

Why

reference.md is generated from the installed policyengine_uk_compiled engine and loaded whole into the cached system prompt, but it recorded no engine version. If the deployed engine drifts from the doc, the agent is grounded on stale signatures / parameter schemas and can write code that doesn't match the live engine — producing wrong or inconsistent generated code with no signal that anything is out of sync. This is the app's "stale-index" consistency risk.

What was added

build_reference.py — a robust _engine_version() helper (tries importlib.metadata for both hyphen/underscore distribution spellings, falls back to pe.__version__, then "unknown"). render() now emits, right after the intro paragraph:
- a human-readable line: **Engine version:** \{ver}` (generated at build time).`
- a parseable marker on its own line: 
chatbot.py — after REFERENCE_DOC is read, a best-effort drift check parses the stamped version out via regex, resolves the currently installed engine version the same way, and:
- warns if the stamp is missing,
- warns if stamp != installed (suggesting a rebuild),
- logs an info confirmation if they match.
  The entire check is wrapped in try/except and the engine import is lazy, so a missing engine, missing distribution, or any parse failure simply skips the check — it can never break reference loading or request handling.

Closes #104

🤖 Generated with Claude Code

vercel · 2026-06-08T09:26:21Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
policyengine-uk-chat	Ready	Preview, Comment	Jun 15, 2026 2:42pm

github-actions · 2026-06-08T09:26:57Z

Beta preview is ready.

Frontend: open preview
Backend: open backend

anth-volk

Reviewed with attention to how this interacts with the deploy pipeline. The implementation is careful — every failure mode degrades to "check silently does nothing" rather than breaking anything. Two findings worth addressing before merge, two smaller ones inline.

1. The warning is structurally unreachable in production. modal_app.py bakes the backend into the image (add_local_dir(..., copy=True)) and runs build_reference.py in that same image build, immediately after pip_install of the engine — so the stamp and the installed version are regenerated together on every deploy and are equal by construction (same for the Dockerfile path it mirrors). The check's real value is local dev (where a stale reference.md genuinely lingers) and future-proofing against build-process changes. That's still worth having, but the PR framing overstates the production risk — and it makes the dev-checkout bug flagged inline the one that matters, since dev is where this check earns its keep.

2. No tests. The drift check is trivially unit-testable: feed a doc with a matching / mismatched / missing stamp, monkeypatch the version resolver, assert on caplog. Given the prompt-contract tests added in #92/#94, two or three focused tests here would match the local standard.

Things I checked that are fine: the regex is correctly anchored to its own marker line; the two stamp lines add negligible tokens to the cached prompt and the HTML comment is harmless to the model; the module-level call runs once at import and early-returns on the empty-doc path, so CI (no reference.md) gets no new noise; and the branch auto-merges cleanly with current main despite being authored against pre-#92 code.

Verdict: sound idea, safe implementation — I'd like the dev-checkout version resolution fixed and tests added before merge; the helper dedup and debug-log are nice-to-haves to bundle in.

anth-volk · 2026-06-11T17:54:44Z

    logger.warning("[CHAT] reference.md not found — run scripts/build_reference.py")


+def _installed_engine_version():


This deliberately mirrors _engine_version() in build_reference.py, but two copies of the same resolution logic is the pattern that just got cleaned up for get_capabilities in #94 — they will drift. Suggest a single get_engine_version() in tooling/simulations.py used by both; the script runs standalone but can reach it with a one-line sys.path insert (it already computes the repo-relative path for OUT).

anth-volk · 2026-06-11T17:54:44Z

+        import policyengine_uk_compiled as pe
+
+        return getattr(pe, "__version__", None)


Both halves of this fallback are dead in practice, and that disables the check exactly where drift is most likely:

policyengine_uk_compiled has no __version__ attribute (verified against 0.35.0), so this getattr always returns None.

In local-checkout dev setups the engine is only importable after tooling.simulations.ensure_compiled_package_importable() inserts the rust checkout into sys.path — a bare import policyengine_uk_compiled here raises ImportError, so _installed_engine_version() returns None and the check silently skips.

Production always resolves via distribution metadata (and always matches by construction — see review body), so local dev is the one environment where this check adds value, and it's the one where this returns None. Suggest calling ensure_compiled_package_importable() before the fallback import, or routing through a shared tooling helper.

anth-volk · 2026-06-11T17:54:44Z

+    except Exception:
+        # Drift check is purely advisory; never let it affect loading.
+        pass


Best-effort is right, but fully silent swallowing makes "check skipped due to a bug" indistinguishable from "check never ran". A logger.debug(f"[CHAT] reference drift check failed: {exc}") costs nothing and preserves the never-breaks-loading guarantee.

anth-volk · 2026-06-11T17:54:44Z

+            return importlib.metadata.version(dist)
+        except importlib.metadata.PackageNotFoundError:
+            continue
+    return getattr(pe, "__version__", "unknown")


pe.__version__ doesn't exist (verified against 0.35.0), so in any environment without distribution metadata — e.g. the local rust-checkout setup — this stamps unknown. A metadata-resolving runtime then logs the confusing "reference.md was built against engine unknown" warning. Worth either special-casing unknown at the check side (treat as unverifiable, like a missing stamp) or accepting the noisy-but-safe behaviour knowingly.

vahid-ahmadi

Review: Version-stamp reference.md and warn on engine drift (#106)

The code is clean, defensive, and zero-risk to merge. My concern is that the mechanism doesn't actually catch the risk #104 describes — and given how reference.md is deployed, the drift warning will almost never fire.

✅ Code quality — good

_engine_version() / _installed_engine_version() handle both dist spellings + __version__ fallback. The load-time check is fully wrapped in try/except with a lazy import, so it can never break reference loading or request handling. Correct call.
Stamping the version into the doc is genuinely useful for observability/debugging ("which engine was this prompt built against") regardless of the drift logic.

🟡 The drift check will essentially never fire in the real deploy model

reference.md is not committed (.gitignore:14) — it's generated in the same image build as the engine:

backend/Dockerfile:15 → RUN python scripts/build_reference.py, after deps are installed.
modal_app.py:49 → python scripts/build_reference.py against the Modal-installed engine.

So the stamped version is read from the same installed engine the server then runs against → stamp == installed on every normal deploy, and _check_reference_engine_drift logs the info "matches" branch every time. The mismatch warning only triggers if the doc and the engine are produced at different times (e.g. a prebuilt base image with a stale doc + a later engine bump, or a manual runtime reinstall) — which the current Dockerfile/Modal flow doesn't do. As wired today it's close to a guaranteed no-op.

🟡 It doesn't check the thing that's actually at risk

#104's concern is the agent being grounded on stale signatures / parameter schemas / reform-recipe field names. This PR compares a version string, not content. Two gaps:

A schema/field change within a version (or a hand-maintained recipe field name that was never right — see the #88 fix where recipes 3–5 had non-existent fields that silently no-op'd) produces no version mismatch, so this check stays silent.
It's log-only — a buried startup logger.warning, no CI gate, no signal to the user or agent. Nothing fails.

The real protection is build-time validation: in build_reference.py, assert each documented reform-recipe / parameter field name exists in Parameters.model_json_schema() and fail the build if not. That catches the actual stale-index failure mode at the moment the doc is generated, version-bump or not.

🟡 Related: the engine isn't pinned

backend/requirements.txt:7 pins policyengine-uk-compiled>=0.20.0 — a floor, not a pin. A fresh image can resolve to any newer engine. That's the drift this PR targets, but because the doc is rebuilt in the same image, it self-heals silently rather than warning. Tightening to ==0.38.0 (or a tight range) is the more direct mitigation.

Merge-readiness

Currently CONFLICTING / DIRTY — needs a rebase. The conflict is in build_reference.py::render(), the exact intro region #88 just modified (the recipes landed right where this inserts the stamp), plus chatbot.py has moved. Straightforward to rebase.

Verdict

Harmless and adds useful version observability, so no objection to landing it after rebase. But it should not be treated as closing #104 — as deployed it rarely fires and never inspects the at-risk content. I'd pair it with (1) build-time schema validation of recipe/param field names that fails the build, and (2) tightening the engine pin. Happy to take the build-time-validation follow-up if useful.

reference.md is generated from the installed policyengine_uk_compiled engine and loaded whole into the cached system prompt, but recorded no engine version. If the deployed engine drifts from the doc, the agent is grounded on stale signatures/schemas with no signal anything is wrong. - build_reference.py: resolve the engine version robustly and stamp it into the header (human-readable line) plus a parseable `` marker. - chatbot.py: at load, parse the stamped version, resolve the installed engine version the same way, and log a warning on mismatch or missing stamp. Wrapped in try/except so it can never break reference loading or request handling. Closes #104 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vahid-ahmadi · 2026-06-15T14:42:16Z

@anth-volk thanks — all four addressed in the rebase (now on current main, 92592fb).

1. Helper dedup → shared resolver. Pulled the version resolution into a single get_engine_version() in tooling/simulations.py; both build_reference.py (build-time stamp) and chatbot.py (load-time check) now call it. No more two-copies-that-drift. The script reaches it with the one-line sys.path insert you suggested (it already computed the repo-relative path).

2. Dev-checkout dead fallback — fixed. This was the important one. get_engine_version() now calls ensure_compiled_package_importable() before resolving, so the rust-checkout is on sys.path and the local-dev path actually works. I also dropped the pe.__version__ fallback entirely — you're right that it doesn't exist (confirmed on 0.38.0), so distribution metadata is the only real source; the helper returns None when it can't resolve, and callers treat that as "unverifiable."

3. unknown stamp — special-cased at the check side. _check_reference_engine_drift now treats a missing or "unknown" stamp as unverifiable (one warning, no comparison), so a metadata-resolving runtime no longer logs the confusing "built against engine unknown" mismatch.

4. Silent swallow → debug log. The catch-all now does logger.debug("[CHAT] reference drift check skipped: {exc}"), so "skipped due to a bug" is distinguishable from "ran clean" without touching the never-breaks-loading guarantee.

Tests added — backend/tests/test_reference_drift.py, 7 cases on the drift logic (matching / mismatched / missing / unknown stamp, unresolvable installed version, empty-doc no-op, and the resolver returning a string when the engine is installed), monkeypatching the resolver and asserting on caplog per the local standard. All pass in python313; verified end-to-end too: build_reference.py stamps 0.38.0 and load logs "matches installed engine 0.38.0".

On your review-body point: agreed the production warning is unreachable by construction (doc + engine regenerate together) — its value is local dev and future build-process changes, which is exactly where fix #2 makes it land. Separately I'd still flag that this checks a version string, not the recipe/parameter field names that actually drift (cf. #88), and requirements.txt pins >=0.20.0 not ==; both are follow-ups rather than part of this PR.

anth-volk · 2026-06-16T12:10:02Z

@vahid-ahmadi — #106 and #109 are circling the same two questions from different angles, and I'd rather we agree a target shape than rebase around each other piecemeal. Can we sync? I think we need an affirmative, agreed pathway on two things before either merges:

1. How we handle irrelevant / unanswerable questions.
Three overlapping mechanisms are in flight: your scope/refusal contract (#102, in-call), my scope router (#109, pre-call — picks a light vs. full background), and the confirm-first idea I raised on #102. They can compose (different layers), but we should decide the canonical layering: what actually declines, what just loads a lighter background, and what the user sees in each case.

2. How we handle systemic metadata.
#106 (stamp-and-warn on engine drift) and #109 (derive the scope descriptor live from the Parameters schema + capabilities()) are two different answers to the same "stale-index" risk. They also edit the same two spots — build_reference.py and the REFERENCE_DOC load in chatbot.py — so they'll conflict on merge regardless. We should pick one philosophy (stamp-and-warn vs. derive-live) and apply it consistently across reference.md and the scope descriptor.

I'm putting #109 back into draft until we've talked, so we settle the architecture before either lands.

A cheap forced-tool pre-pass builds a structured execution plan for the opening user turn and routes it to one of: irrelevant, out_of_scope, partial, needs_plan, ready. The model grounds each plan slot (source flag); the server gates via per-slot criticality. Any gateway error fails safe to compute. Supersedes the binary scope router (#109) and the reference.md drift stamp (#106; replaced by the engine-derived scope descriptor). - backend/gateway_config.py: criticality table, inferable set, promotions, gate() - backend/gateway.py: run_gateway (forced emit_plan), verdict, writer/plan helpers - backend/prompts.py: gateway + lightweight prompts, scope descriptor - backend/routes/chatbot.py: gateway routing in generate_stream - backend/scripts/build_reference.py: build_scope_descriptor -> scope_descriptor.md - evaluation: GatewayCase + _run_gateway + 14 live cases (evals/cases/gateway) - backend/tests/test_gateway.py: 33 offline tests (gate, parser, fail-safe) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview June 8, 2026 09:27 View deployment

vahid-ahmadi requested a review from anth-volk June 8, 2026 09:28

anth-volk reviewed Jun 11, 2026

View reviewed changes

vahid-ahmadi commented Jun 15, 2026

View reviewed changes

vahid-ahmadi force-pushed the feat/reference-version-stamp branch from 0f4fef7 to 92592fb Compare June 15, 2026 14:41

vercel Bot deployed to Preview June 15, 2026 14:42 View deployment

anth-volk mentioned this pull request Jun 16, 2026

Add opt-in scope router to /chat/message (load the heavy background only when needed) #109

Draft

This was referenced Jun 16, 2026

Add a lightweight AI gateway to /chat/message (5-outcome routing); supersede #109 and #106 #111

Open

Add AI gateway to /chat/message (5-outcome routing) #112

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Version-stamp reference.md and warn on engine drift (closes #104)#106

Version-stamp reference.md and warn on engine drift (closes #104)#106
vahid-ahmadi wants to merge 1 commit into
mainfrom
feat/reference-version-stamp

vahid-ahmadi commented Jun 8, 2026

Uh oh!

vercel Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

anth-volk left a comment

Uh oh!

anth-volk Jun 11, 2026

Uh oh!

anth-volk Jun 11, 2026

Uh oh!

anth-volk Jun 11, 2026

Uh oh!

anth-volk Jun 11, 2026

Uh oh!

vahid-ahmadi left a comment

Uh oh!

vahid-ahmadi commented Jun 15, 2026

Uh oh!

anth-volk commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		logger.warning("[CHAT] reference.md not found — run scripts/build_reference.py")


		def _installed_engine_version():

		import policyengine_uk_compiled as pe

		return getattr(pe, "__version__", None)

Conversation

vahid-ahmadi commented Jun 8, 2026

Summary

Why

What was added

Uh oh!

vercel Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

anth-volk left a comment

Choose a reason for hiding this comment

Uh oh!

anth-volk Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

anth-volk Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

anth-volk Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

anth-volk Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

vahid-ahmadi left a comment

Choose a reason for hiding this comment

Review: Version-stamp reference.md and warn on engine drift (#106)

✅ Code quality — good

🟡 The drift check will essentially never fire in the real deploy model

🟡 It doesn't check the thing that's actually at risk

🟡 Related: the engine isn't pinned

Merge-readiness

Verdict

Uh oh!

vahid-ahmadi commented Jun 15, 2026

Uh oh!

anth-volk commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented Jun 8, 2026 •

edited

Loading