human-centric-engineering · JohnD-EE · Jun 9, 2026 · Jun 9, 2026
diff --git a/.context/app/planning/development-plan.md b/.context/app/planning/development-plan.md
@@ -688,7 +688,7 @@ _Indicative tasks:_
 
 ### F9.1 — Production hardening
 
-_Status:_ not started · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ everything (final pass)
+_Status:_ in flight ([tracker](./features/f9.1.md)) · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ everything (final pass)
 
 The pre-ship technical hardening: concurrent-session sanity, master + sub-flag inventory, verification that every flag and sub-flag controls the right surfaces independently.
 

diff --git a/.context/app/planning/features/f9.1.md b/.context/app/planning/features/f9.1.md
@@ -0,0 +1,106 @@
+---
+feature: F9.1
+title: Production hardening
+phase: P9 — Hardening + forking docs
+status: in flight
+owner: TBD
+deps: everything (final pass — P0–P8 complete)
+opened: 2026-06-09
+plan: .context/app/planning/development-plan.md#f91--production-hardening
+docs: .context/app/questionnaire/feature-flags.md
+---
+
+# F9.1 — Production hardening
+
+> Committable tracker for **F9.1**. The pre-ship technical pass: prove the questionnaire
+> surface holds up under concurrency, document the master + sub-flag matrix, and verify each
+> flag gates its own surface independently. A **verification + documentation** feature — it
+> adds no new runtime capability. Gated by `APP_QUESTIONNAIRES_ENABLED` like everything else.
+
+## Intent
+
+F9.1 is the first feature of P9 and the final hardening pass before ConQuest is demo-grade
+and fork-ready. Its job is not to build — it is to **prove** what P0–P8 built: that 20+
+concurrent respondent sessions don't deadlock, orphan turns, or drop audit writes; that the
+eleven feature flags gate exactly the surfaces they claim and nothing else; and that the
+full respondent happy path still composes end-to-end. The deliverables are a smoke harness, a
+flag-inventory doc, a flag-verification test suite, and a green integration pass.
+
+## Decisions (confirmed with the user)
+
+- **Plan-spec-only scope.** Stay strictly within the plan's four indicative tasks
+  (concurrency smoke, flag inventory, per-flag verification, happy-path pass). Gaps that
+  exploration surfaced — no IP-keyed rate limit on anonymous session creation, no data
+  retention, no per-version spend cap — are **out of scope**, documented as follow-ups (see
+  below), not built. Keeps F9.1 a clean gate, not a grab-bag.
+- **Concurrency smoke against the real DB.** The "no deadlocks / orphan turns / missed audit
+  writes" invariants are a real-Postgres concern; the house integration tests mock Prisma and
+  can't catch a transaction race. So the concurrency check is a **smoke script** (`scripts/smoke/`)
+  against the dev DB, not a vitest test.
+- **LLM stubbed by construction, not via a fake provider.** The orchestrator's paid compute
+  only _produces_ the `AnswerSlotIntent`s that `persistTurn` writes; the smoke feeds those
+  intents directly, so there is no LLM in the loop to stub. Cleaner and more deterministic
+  than wiring a multi-schema fake provider through `registerProviderInstance` (the mechanism
+  the other smoke scripts use). The smoke drives the **real** persistence seams
+  (`createAnonymousSession` → `persistTurn` → `markSessionCompleted`), which is where the
+  concurrency risk actually lives.
+
+## Build shape (branch `feat/F9.1-production-hardening`)
+
+- **Concurrency + happy-path smoke** — `scripts/smoke/concurrent-sessions.ts`,
+  `npm run smoke:concurrent-sessions`. Seeds one launched, anonymous-mode version (4 free-text
+  slots), then creates **24 sessions concurrently**, runs 4 turns each through the live
+  persistence seams, and completes them. Asserts: no rejected promise / deadlock (40P01); each
+  session has exactly its 4 turns with contiguous ordinals (no orphan/dropped turns); every
+  answered slot is back-stamped with a real turn id; each session has exactly one `created` and
+  one `completed` `AppQuestionnaireSessionEvent` (no missed audit writes). `--single` runs one
+  verbose happy-path session plus an F8.2 results-export read (the plan's "final happy-path
+  integration pass" stitched journey). Everything hangs off one `smoke-test-f91` questionnaire;
+  cleanup cascades the graph. Idempotent. Registered in `scripts/smoke/README.md`.
+- **Flag inventory doc** — `.context/app/questionnaire/feature-flags.md`. The authoritative
+  matrix: master + 10 sub-flags, each flag's `feature_flag` name, resolver, required parents,
+  gated surface, and off-behaviour. States the two design rules (disabled surface **404s** not
+  401s; a sub-flag requires its parents) and the **three off-behaviour shapes** (route-404 /
+  degrade / behaviour-inside-route). Calls out that the flags are **DB rows, not env vars**.
+  Linked from the namespace `README.md` index.
+- **Per-sub-flag verification suite** — `tests/unit/lib/app/questionnaire/feature-flag.test.ts`
+  (extended from master-only to all eleven resolvers). Data-driven truth tables: each resolver
+  is true only when all required flags are on, false when any one (master / live-sessions /
+  its own sub-flag) is off. Plus **independence** (one sub-flag off suppresses only its own
+  resolver, every sibling stays true), the **live-sessions cascade** (parent off closes the
+  voice/attachment/cost-cap trio), the **master transitive close**, and the `ensure*` route-gate
+  404-envelope contract. 54 tests. Per-route gating stays covered by each route's own
+  `route.test.ts`; this suite is the consolidated matrix check.
+
+## Feature-flag matrix
+
+The canonical inventory — every flag, its dependency chain, and its off-behaviour — lives in
+[`../../questionnaire/feature-flags.md`](../../questionnaire/feature-flags.md).
+
+## Verification
+
+- `npm run smoke:concurrent-sessions` — 24 sessions · 96 turns · invariants reconciled; runs
+  twice in a row clean (idempotent); leaves no `smoke-test-f91` rows.
+- `npm run smoke:concurrent-sessions -- --single` — happy path reaches a completed session +
+  a non-empty results export.
+- `npx vitest run tests/unit/lib/app/questionnaire/feature-flag.test.ts` — 54 pass.
+- `npm run validate` clean; the app suites (`tests/unit|integration/.../app/**`) green (1780
+  tests) — the final integration pass.
+
+## Out of scope (documented follow-ups, not built)
+
+Surfaced during F9.1 exploration; deliberately deferred to keep F9.1 verification-only. Each
+is a real feature in its own right:
+
+- **IP-keyed rate limit on anonymous session creation** — the no-login
+  `questionnaire-sessions/anonymous` path is rate-limited per session only; a global `anon:IP`
+  cap on session _creation_ would harden it against session-minting abuse.
+- **Data retention / purge** — completed sessions, turns (respondent PII), and answer slots are
+  kept indefinitely; there is no time-based purge.
+- **Per-version / per-admin spend cap** — only a per-session `costBudgetUsd` exists; an
+  expensive launched version can accrue unbounded spend across many sessions.
+
+## No CHANGELOG entry
+
+F9.1 touches only app-owned smoke/test/docs — no Sunrise platform surface. Per the repo's
+platform-scoped CHANGELOG policy, it adds no `CHANGELOG.md` bullet.
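
The per-session invariants listed under "Build shape" (exact turn count, contiguous ordinals, one `created` and one `completed` event) can be sketched as a pure check. This is a minimal illustration, not the contents of `scripts/smoke/concurrent-sessions.ts`: the `TurnRow` and `EventRow` shapes and the `checkSessionInvariants` name are hypothetical, and the real script reads these rows from Postgres.

```typescript
// Hypothetical row shapes; the real Prisma models may carry different fields.
interface TurnRow {
  id: string;
  ordinal: number;
}
interface EventRow {
  type: string;
}

// Returns human-readable violations; an empty array means the session is consistent.
function checkSessionInvariants(
  turns: TurnRow[],
  events: EventRow[],
  expectedTurns: number,
): string[] {
  const violations: string[] = [];

  // No orphan or dropped turns: the session has exactly the turns it ran.
  if (turns.length !== expectedTurns) {
    violations.push(`expected ${expectedTurns} turns, found ${turns.length}`);
  }

  // Contiguous ordinals: once sorted, each ordinal is exactly one more than the previous.
  // The starting ordinal is not assumed, only the step between neighbours.
  const ordinals = turns.map((t) => t.ordinal).sort((a, b) => a - b);
  for (let i = 1; i < ordinals.length; i++) {
    if (ordinals[i] !== ordinals[i - 1] + 1) {
      violations.push(`ordinal gap or duplicate between ${ordinals[i - 1]} and ${ordinals[i]}`);
    }
  }

  // No missed audit writes: exactly one event of each lifecycle type.
  for (const type of ["created", "completed"]) {
    const count = events.filter((e) => e.type === type).length;
    if (count !== 1) {
      violations.push(`expected 1 "${type}" event, found ${count}`);
    }
  }

  return violations;
}
```

Returning a list of violations instead of throwing on the first one lets a run over 24 sessions report every broken invariant in a single pass.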
diff --git a/.context/app/questionnaire/README.md b/.context/app/questionnaire/README.md
@@ -28,6 +28,7 @@ plan and feature trackers, see [`../planning/`](../planning/); for the platform
 | [`cost-cap-enforcement.md`](./cost-cap-enforcement.md)       | Per-session USD budget at the turn boundary — soft wrap-up nudge at 90%, hard 402 + auto-pause at 100%, summed turn cost, dark-launch flag (F6.3)                                  |
 | [`answer-slot-panel.md`](./answer-slot-panel.md)             | The live respondent answer panel beside the chat — `GET …/answers` read endpoint, scope config, confidence language, Revisit wiring (F7.2)                                         |
 | [`anonymous-mode.md`](./anonymous-mode.md)                   | The cross-surface PII contract — per-surface gates, the profile snapshot rule, k-anonymity suppression, erasure cascade (F8.3)                                                     |
+| [`feature-flags.md`](./feature-flags.md)                     | The master + 10 sub-flag gate matrix — what each flag gates, its dependency chain, and the three off-behaviour shapes (404 / degrade / behaviour-inside-route) (F9.1)              |
 
 ## Where the code lives
 

diff --git a/.context/app/questionnaire/feature-flags.md b/.context/app/questionnaire/feature-flags.md
@@ -0,0 +1,80 @@
+# Feature-flag inventory — the questionnaire gate matrix (F9.1)
+
+The questionnaire product dark-launches behind **one master flag and ten sub-flags**. This
+is the authoritative inventory: every flag, what it gates, what it depends on, and exactly
+what a respondent or admin sees when it is **off**. It is the reference the F9.1 hardening
+pass verifies (`tests/unit/lib/app/questionnaire/feature-flag.test.ts`) and the runbook
+(F9.2) toggles against.
+
+The flag resolvers live in [`lib/app/questionnaire/feature-flag.ts`](../../../lib/app/questionnaire/feature-flag.ts);
+the canonical flag-name constants live in the dependency-light
+[`constants.ts`](../../../lib/app/questionnaire/constants.ts) (so the seed can import a name
+without the resolver's HTTP/DB deps).
+
+## They are DB rows, not env vars
+
+> ⚠️ **`APP_QUESTIONNAIRES_*_ENABLED` are `feature_flag` table rows, not environment
+> variables.** The name _looks_ like an env var; it is not. Every resolver is a thin
+> wrapper over Sunrise's `isFeatureEnabled(name)`, which reads the `feature_flag` table.
+
+Toggle a flag by writing its row (admin feature-flag surface / seed / a direct DB update),
+**not** by setting a shell variable. A flag with no row resolves to its seeded default. This
+matters for the runbook and for any "turn X off and confirm the surface disappears" check —
+you are flipping a row, and the change is live without a redeploy.
+
+## The two design rules
+
+1. **A disabled surface 404s — it does not 401.** Every route-level gate runs **before**
+   auth (`withQuestionnairesEnabled` / `withLiveSessionsEnabled` / `withVoiceInputEnabled`
+   wrap the handler so the gate fires first). A switched-off feature is therefore
+   indistinguishable from a route that was never built — no information leaks about a
+   feature that exists but is dark. Never place a gate after `withAdminAuth`/`withAuth`.
+
+2. **A sub-flag requires its parents.** Every sub-flag resolver `AND`s the master flag (and,
+   for the live-dependent trio, the live-sessions flag) — so turning a parent off
+   transitively closes every child, and no child can run with its parent dark.
+
+## The matrix
+
+`is*Enabled()` returns `true` only when its own flag **and** every flag in its "Requires" column are on.
+
+| #   | Flag (`feature_flag` name)                           | Resolver                                                  | Requires                   | Gates                                                                                                | Off-behaviour                                                                                                                                                           |
+| --- | ---------------------------------------------------- | --------------------------------------------------------- | -------------------------- | ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 0   | `APP_QUESTIONNAIRES_ENABLED`                         | `isQuestionnairesEnabled` / `ensureQuestionnairesEnabled` | — (master)                 | **the entire app** — every `/api/v1/app/**` route, every admin + respondent surface                  | every questionnaire route **404s**; the whole product is invisible                                                                                                      |
+| 1   | `APP_QUESTIONNAIRES_ADAPTIVE_STRATEGY_ENABLED`       | `isAdaptiveSelectionEnabled`                              | master                     | F4.1 adaptive (embedding + LLM) next-question selection                                              | a version set to `adaptive` **degrades to `weighted`** (no 404 — selection still runs, just cheaper)                                                                    |
+| 2   | `APP_QUESTIONNAIRES_ANSWER_EXTRACTION_ENABLED`       | `isAnswerExtractionEnabled`                               | master                     | F4.2 answer-extraction preview route                                                                 | route **404s**                                                                                                                                                          |
+| 3   | `APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_ENABLED` | `isContradictionDetectionEnabled`                         | master                     | F4.3 contradiction-detection preview route                                                           | route **404s**                                                                                                                                                          |
+| 4   | `APP_QUESTIONNAIRES_ANSWER_REFINEMENT_ENABLED`       | `isAnswerRefinementEnabled`                               | master                     | F4.4 answer-refinement preview route                                                                 | route **404s**                                                                                                                                                          |
+| 5   | `APP_QUESTIONNAIRES_COMPLETION_ENABLED`              | `isCompletionEnabled`                                     | master                     | F4.5 completion-offer **phrasing** (the LLM prose)                                                   | the completion-status route returns the deterministic **assessment with no composed offer** (no 404 — the free assessment is always available under the master flag)    |
+| 6   | `APP_QUESTIONNAIRES_DESIGN_EVALUATION_ENABLED`       | `isDesignEvaluationEnabled`                               | master                     | F5.1 seven-judge design-evaluation preview route                                                     | route **404s** (the whole route is paid LLM work — no free fallback)                                                                                                    |
+| 7   | `APP_QUESTIONNAIRES_LIVE_SESSIONS_ENABLED`           | `isLiveSessionsEnabled` / `ensureLiveSessionsEnabled`     | master                     | F6.1 respondent surface — session-create + `/messages` turn loop (incl. the no-login anonymous path) | session-create and messages routes **404**; the respondent surface disappears                                                                                           |
+| 8   | `APP_QUESTIONNAIRES_VOICE_INPUT_ENABLED`             | `isVoiceInputEnabled` / `ensureVoiceInputEnabled`         | master **+ live-sessions** | F6.2 voice transcribe route                                                                          | route **404s** (a transcript is useless without the live turn loop, so voice is gated behind live-sessions, not merely beside it)                                       |
+| 9   | `APP_QUESTIONNAIRES_ATTACHMENT_INPUT_ENABLED`        | `isAttachmentInputEnabled`                                | master **+ live-sessions** | respondent image/document attachments on a `/messages` turn                                          | the chat hides the attach affordance and the `/messages` route **ignores any attachments** a client sends (no 404 — it gates a behaviour inside an already-gated route) |
+| 10  | `APP_QUESTIONNAIRES_COST_CAP_ENABLED`                | `isCostCapEnforcementEnabled`                             | master **+ live-sessions** | F6.3 per-session USD budget check at the turn boundary                                               | turns run with **no budget check** even when a version sets `costBudgetUsd` (no 404 — it gates a behaviour inside the messages route)                                   |
+
+## The three off-behaviour shapes
+
+Reading the table, every sub-flag falls into one of three shapes — know which one you are
+verifying:
+
+- **Route 404** (flags 2, 3, 4, 6, 7, 8) — the gated route is paid LLM work or a whole
+  surface; off ⇒ the route returns 404 via its `ensure*`/`with*` wrapper.
+- **Degrade** (flags 1, 5) — a cheaper deterministic result stands in: adaptive → weighted;
+  composed offer → bare assessment. The route still responds.
+- **Behaviour-inside-route** (flags 9, 10) — there is no route to 404; the flag toggles a
+  branch inside an already-gated route (attachments ignored; budget check skipped).
+
+When verifying "with each off, the gated surface is suppressed and the rest is unaffected",
+assert against the shape: a 404 for the first group, the fallback result for the second, the
+absent side-effect for the third.
+
+## Verification
+
+- **Resolver truth tables** — `tests/unit/lib/app/questionnaire/feature-flag.test.ts` pins,
+  for every resolver, that it is `true` only when all required flags are on and `false` when
+  the master, the sub-flag, or (for the live trio) live-sessions is off.
+- **Independence** — the same suite asserts that one sub-flag off flips only its own resolver
+  to `false` while every sibling stays `true`; route-level suppression is covered by each
+  route's own `route.test.ts`.
+- **Concurrency / happy path** — `npm run smoke:concurrent-sessions` exercises the live
+  respondent surface (flag 7) end-to-end against the real DB.
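
The "a sub-flag requires its parents" rule and the independence check can be sketched over an in-memory flag map. This is a minimal sketch under stated assumptions, not the code in `feature-flag.ts`: the flag names come from the matrix, but the `REQUIRES` map, the `isEnabled` helper, and the `FlagRows` type are hypothetical, and a missing row is treated as off here for simplicity, whereas the real resolvers go through `isFeatureEnabled(name)` and fall back to the seeded default.

```typescript
// Flag names are taken from the matrix; only four are modelled to keep the sketch short.
const MASTER = "APP_QUESTIONNAIRES_ENABLED";
const LIVE = "APP_QUESTIONNAIRES_LIVE_SESSIONS_ENABLED";
const VOICE = "APP_QUESTIONNAIRES_VOICE_INPUT_ENABLED";
const EXTRACTION = "APP_QUESTIONNAIRES_ANSWER_EXTRACTION_ENABLED";

// Hypothetical encoding of the "Requires" column: each flag lists its parents.
const REQUIRES: Record<string, string[]> = {
  [MASTER]: [],
  [LIVE]: [MASTER],
  [VOICE]: [MASTER, LIVE],
  [EXTRACTION]: [MASTER],
};

// Stand-in for the feature_flag table: flag name -> enabled.
type FlagRows = Record<string, boolean>;

// A flag resolves to true only when its own row and every parent row are on.
// Assumption: a missing row counts as off in this sketch.
function isEnabled(flag: string, rows: FlagRows): boolean {
  const chain = [...(REQUIRES[flag] ?? []), flag];
  return chain.every((name) => rows[name] === true);
}

// Helpers for a data-driven truth table: everything on, then one row switched off.
const allOn: FlagRows = { [MASTER]: true, [LIVE]: true, [VOICE]: true, [EXTRACTION]: true };
const without = (name: string): FlagRows => ({ ...allOn, [name]: false });

// Live-sessions cascade: parent off closes the child.
console.log(isEnabled(VOICE, without(LIVE)));
// Independence: a sibling behind a different flag is unaffected.
console.log(isEnabled(EXTRACTION, without(LIVE)));
```

Encoding the dependency chain as data keeps the truth table and the resolvers derived from one source, which is the property the consolidated matrix check is meant to pin.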
diff --git a/package.json b/package.json
@@ -35,6 +35,7 @@
     "db:reset": "prisma migrate reset --force",
     "db:drift-check": "tsx --env-file=.env.local scripts/db/check-drift.ts",
     "smoke:chat": "tsx --env-file=.env.local scripts/smoke/chat.ts",
+    "smoke:concurrent-sessions": "tsx --env-file=.env.local scripts/smoke/concurrent-sessions.ts",
     "smoke:orchestration": "tsx --env-file=.env.local scripts/smoke/orchestration.ts",
     "smoke:hybrid-search": "tsx --env-file=.env.local scripts/smoke/knowledge-hybrid-search.ts",
     "smoke:transcribe": "tsx --env-file=.env.local scripts/smoke/transcribe.ts",