2 changes: 1 addition & 1 deletion .context/app/planning/development-plan.md
@@ -688,7 +688,7 @@ _Indicative tasks:_

### F9.1 — Production hardening

-_Status:_ not started · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ everything (final pass)
+_Status:_ in flight ([tracker](./features/f9.1.md)) · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ everything (final pass)

The pre-ship technical hardening: concurrent-session sanity, master + sub-flag inventory, verification that every flag and sub-flag controls the right surfaces independently.

106 changes: 106 additions & 0 deletions .context/app/planning/features/f9.1.md
@@ -0,0 +1,106 @@
---
feature: F9.1
title: Production hardening
phase: P9 — Hardening + forking docs
status: in flight
owner: TBD
deps: everything (final pass — P0–P8 complete)
opened: 2026-06-09
plan: .context/app/planning/development-plan.md#f91--production-hardening
docs: .context/app/questionnaire/feature-flags.md
---

# F9.1 — Production hardening

> Committable tracker for **F9.1**. The pre-ship technical pass: prove the questionnaire
> surface holds up under concurrency, document the master + sub-flag matrix, and verify each
> flag gates its own surface independently. A **verification + documentation** feature — it
> adds no new runtime capability. Gated by `APP_QUESTIONNAIRES_ENABLED` like everything else.

## Intent

F9.1 is the first feature of P9 and the final hardening pass before ConQuest is demo-grade
and fork-ready. Its job is not to build — it is to **prove** what P0–P8 built: that 20+
concurrent respondent sessions don't deadlock, orphan turns, or drop audit writes; that the
eleven feature flags gate exactly the surfaces they claim and nothing else; and that the
full respondent happy path still composes end-to-end. The deliverables are a smoke harness, a
flag-inventory doc, a flag-verification test suite, and a green integration pass.

## Decisions (confirmed with the user)

- **Plan-spec-only scope.** Stay strictly within the plan's four indicative tasks
(concurrency smoke, flag inventory, per-flag verification, happy-path pass). Gaps that
exploration surfaced — no IP-keyed rate limit on anonymous session creation, no data
retention, no per-version spend cap — are **out of scope**, documented as follow-ups (see
below), not built. Keeps F9.1 a clean gate, not a grab-bag.
- **Concurrency smoke against the real DB.** The "no deadlocks / orphan turns / missed audit
writes" invariants are a real-Postgres concern; the house integration tests mock Prisma and
can't catch a transaction race. So the concurrency check is a **smoke script** (`scripts/smoke/`)
against the dev DB, not a vitest test.
- **LLM stubbed by construction, not via a fake provider.** The orchestrator's paid compute
only _produces_ the `AnswerSlotIntent`s that `persistTurn` writes; the smoke feeds those
intents directly, so there is no LLM in the loop to stub. Cleaner and more deterministic
than wiring a multi-schema fake provider through `registerProviderInstance` (the mechanism
the other smoke scripts use). The smoke drives the **real** persistence seams
(`createAnonymousSession` → `persistTurn` → `markSessionCompleted`), which is where the
concurrency risk actually lives.
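
A minimal sketch of how such a smoke might fan sessions out over the three seams. The `Seams` interface and its argument lists are hypothetical stand-ins (the real `createAnonymousSession`, `persistTurn`, and `markSessionCompleted` signatures are not shown in this tracker), and it assumes a deadlock surfaces as an error whose `code` property carries the PostgreSQL SQLSTATE `40P01`; an ORM may wrap that differently.

```typescript
// Hypothetical seam signatures; the real functions may take different arguments.
interface Seams {
  createAnonymousSession(): Promise<{ id: string }>;
  persistTurn(sessionId: string, ordinal: number, answer: string): Promise<void>;
  markSessionCompleted(sessionId: string): Promise<void>;
}

interface SmokeResult {
  completed: string[]; // ids of sessions that ran to completion
  failures: unknown[]; // rejection reasons, one per failed session
  deadlocks: number; // failures that look like SQLSTATE 40P01
}

// PostgreSQL reports deadlock_detected as SQLSTATE 40P01.
const DEADLOCK_SQLSTATE = "40P01";

// Assumption: the driver error exposes the SQLSTATE on a `code` property.
function isDeadlock(reason: unknown): boolean {
  return (
    typeof reason === "object" &&
    reason !== null &&
    (reason as { code?: unknown }).code === DEADLOCK_SQLSTATE
  );
}

// One session: create, persist its turns in order, then complete.
async function runSession(seams: Seams, turnsPerSession: number): Promise<string> {
  const session = await seams.createAnonymousSession();
  for (let ordinal = 1; ordinal <= turnsPerSession; ordinal++) {
    await seams.persistTurn(session.id, ordinal, `answer ${ordinal}`);
  }
  await seams.markSessionCompleted(session.id);
  return session.id;
}

// Start every session at once and wait for all of them to settle, so one
// failure does not hide the others (Promise.all rejects on the first one).
async function runConcurrentSmoke(
  seams: Seams,
  sessionCount: number,
  turnsPerSession: number,
): Promise<SmokeResult> {
  const settled = await Promise.allSettled(
    Array.from({ length: sessionCount }, () => runSession(seams, turnsPerSession)),
  );
  const result: SmokeResult = { completed: [], failures: [], deadlocks: 0 };
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") {
      result.completed.push(outcome.value);
    } else {
      result.failures.push(outcome.reason);
      if (isDeadlock(outcome.reason)) result.deadlocks++;
    }
  }
  return result;
}
```

Passing the seams in as a parameter keeps the driver independent of the database; the smoke described above would hand it the real persistence functions rather than fakes.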

## Build shape (branch `feat/F9.1-production-hardening`)

- **Concurrency + happy-path smoke** — `scripts/smoke/concurrent-sessions.ts`,
`npm run smoke:concurrent-sessions`. Seeds one launched, anonymous-mode version (4 free-text
slots), then creates **24 sessions concurrently**, runs 4 turns each through the live
persistence seams, and completes them. Asserts: no rejected promise / deadlock (40P01); each
session has exactly its 4 turns with contiguous ordinals (no orphan/dropped turns); every
answered slot is back-stamped with a real turn id; each session has exactly one `created` and
one `completed` `AppQuestionnaireSessionEvent` (no missed audit writes). `--single` runs one
verbose happy-path session plus an F8.2 results-export read — the plan's "final happy-path
integration pass" stitched journey. Everything hangs off one `smoke-test-f91` questionnaire;
cleanup cascades the graph. Idempotent. Registered in `scripts/smoke/README.md`.
- **Flag inventory doc** — `.context/app/questionnaire/feature-flags.md`. The authoritative
matrix: master + 10 sub-flags, each flag's `feature_flag` name, resolver, required parents,
gated surface, and off-behaviour. States the two design rules (disabled surface **404s** not
401s; a sub-flag requires its parents) and the **three off-behaviour shapes** (route-404 /
degrade / behaviour-inside-route). Calls out that the flags are **DB rows, not env vars**.
Linked from the namespace `README.md` index.
- **Per-sub-flag verification suite** — `tests/unit/lib/app/questionnaire/feature-flag.test.ts`
(extended from master-only to all eleven resolvers). Data-driven truth tables: each resolver
is true only when all required flags are on, false when any one (master / live-sessions /
its own sub-flag) is off. Plus **independence** (one sub-flag off suppresses only its own
resolver, every sibling stays true), the **live-sessions cascade** (parent off closes the
voice/attachment/cost-cap trio), the **master transitive close**, and the `ensure*` route-gate
404-envelope contract. 54 tests. Per-route gating stays covered by each route's own
`route.test.ts`; this suite is the consolidated matrix check.
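
The per-session invariants the smoke asserts can be sketched as one pure check. The row shapes and field names below (`ordinal`, `answeredInTurnId`, `type`) are hypothetical simplifications, not the real Prisma models, and the function assumes it is handed only the rows belonging to a single session.

```typescript
// Hypothetical, trimmed-down row shapes for one session's data.
interface TurnRow {
  id: string;
  ordinal: number;
}
interface SlotRow {
  key: string;
  answeredInTurnId: string | null; // back-stamp: the turn that answered this slot
}
interface EventRow {
  type: string; // e.g. "created" or "completed"
}

// Returns a list of human-readable violations; an empty list means the session is clean.
function checkSessionInvariants(
  turns: TurnRow[],
  slots: SlotRow[],
  events: EventRow[],
  expectedTurns: number,
): string[] {
  const violations: string[] = [];

  // No orphan or dropped turns: the count matches and ordinals form an unbroken run.
  if (turns.length !== expectedTurns) {
    violations.push(`expected ${expectedTurns} turns, found ${turns.length}`);
  }
  const ordinals = turns.map((t) => t.ordinal).sort((a, b) => a - b);
  if (!ordinals.every((o, i) => o === ordinals[0] + i)) {
    violations.push(`ordinals are not contiguous: ${ordinals.join(",")}`);
  }

  // Every answered slot points at a turn that actually exists in this session.
  const turnIds = new Set(turns.map((t) => t.id));
  for (const slot of slots) {
    if (slot.answeredInTurnId === null || !turnIds.has(slot.answeredInTurnId)) {
      violations.push(`slot ${slot.key} is not stamped with a real turn id`);
    }
  }

  // No missed (or doubled) audit writes.
  for (const type of ["created", "completed"]) {
    const count = events.filter((e) => e.type === type).length;
    if (count !== 1) {
      violations.push(`expected exactly one "${type}" event, found ${count}`);
    }
  }
  return violations;
}
```

Collecting violations instead of throwing on the first one lets a failing run report every broken session in a single pass.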

## Feature-flag matrix

The canonical inventory — every flag, its dependency chain, and its off-behaviour — lives in
[`../../questionnaire/feature-flags.md`](../../questionnaire/feature-flags.md).

## Verification

- `npm run smoke:concurrent-sessions` — 24 sessions · 96 turns · invariants reconciled; runs
twice in a row clean (idempotent); leaves no `smoke-test-f91` rows.
- `npm run smoke:concurrent-sessions -- --single` — happy path reaches a completed session +
a non-empty results export.
- `npx vitest run tests/unit/lib/app/questionnaire/feature-flag.test.ts` — 54 pass.
- `npm run validate` clean; the app suites (`tests/unit|integration/.../app/**`) green (1780
tests) — the final integration pass.

## Out of scope (documented follow-ups, not built)

Surfaced during F9.1 exploration; deliberately deferred to keep F9.1 verification-only. Each
is a real feature in its own right:

- **IP-keyed rate limit on anonymous session creation** — the no-login
`questionnaire-sessions/anonymous` path is rate-limited per session only; a global `anon:IP` cap on
session _creation_ would harden it against session-minting abuse.
- **Data retention / purge** — completed sessions, turns (respondent PII), and answer slots are
kept indefinitely; there is no time-based purge.
- **Per-version / per-admin spend cap** — only a per-session `costBudgetUsd` exists; an
expensive launched version can accrue unbounded spend across many sessions.

## No CHANGELOG entry

F9.1 touches only app-owned smoke/test/docs — no Sunrise platform surface. Per the repo's
platform-scoped CHANGELOG policy, it adds no `CHANGELOG.md` bullet.
1 change: 1 addition & 0 deletions .context/app/questionnaire/README.md
@@ -28,6 +28,7 @@ plan and feature trackers, see [`../planning/`](../planning/); for the platform
| [`cost-cap-enforcement.md`](./cost-cap-enforcement.md) | Per-session USD budget at the turn boundary — soft wrap-up nudge at 90%, hard 402 + auto-pause at 100%, summed turn cost, dark-launch flag (F6.3) |
| [`answer-slot-panel.md`](./answer-slot-panel.md) | The live respondent answer panel beside the chat — `GET …/answers` read endpoint, scope config, confidence language, Revisit wiring (F7.2) |
| [`anonymous-mode.md`](./anonymous-mode.md) | The cross-surface PII contract — per-surface gates, the profile snapshot rule, k-anonymity suppression, erasure cascade (F8.3) |
| [`feature-flags.md`](./feature-flags.md) | The master + 10 sub-flag gate matrix — what each flag gates, its dependency chain, and the three off-behaviour shapes (404 / degrade / behaviour-inside-route) (F9.1) |

## Where the code lives

80 changes: 80 additions & 0 deletions .context/app/questionnaire/feature-flags.md
@@ -0,0 +1,80 @@
# Feature-flag inventory — the questionnaire gate matrix (F9.1)

The questionnaire product dark-launches behind **one master flag and ten sub-flags**. This
is the authoritative inventory: every flag, what it gates, what it depends on, and exactly
what a respondent or admin sees when it is **off**. It is the reference the F9.1 hardening
pass verifies (`tests/unit/lib/app/questionnaire/feature-flag.test.ts`) and the runbook
(F9.2) toggles against.

The flag resolvers live in [`lib/app/questionnaire/feature-flag.ts`](../../../lib/app/questionnaire/feature-flag.ts);
the canonical flag-name constants live in the dependency-light
[`constants.ts`](../../../lib/app/questionnaire/constants.ts) (so the seed can import a name
without the resolver's HTTP/DB deps).

## They are DB rows, not env vars

> ⚠️ **`APP_QUESTIONNAIRES_*_ENABLED` are `feature_flag` table rows, not environment
> variables.** The name _looks_ like an env var; it is not. Every resolver is a thin
> wrapper over Sunrise's `isFeatureEnabled(name)`, which reads the `feature_flag` table.

Toggle a flag by writing its row (admin feature-flag surface / seed / a direct DB update),
**not** by setting a shell variable. A flag with no row resolves to its seeded default. This
matters for the runbook and for any "turn X off and confirm the surface disappears" check —
you are flipping a row, and the change is live without a redeploy.

## The two design rules

1. **A disabled surface 404s — it does not 401.** Every route-level gate runs **before**
auth (`withQuestionnairesEnabled` / `withLiveSessionsEnabled` / `withVoiceInputEnabled`
wrap the handler so the gate fires first). A switched-off feature is therefore
indistinguishable from a route that was never built — no information leaks about a
feature that exists but is dark. Never place a gate after `withAdminAuth`/`withAuth`.

2. **A sub-flag requires its parents.** Every sub-flag resolver `AND`s the master flag (and,
for the live-dependent trio, the live-sessions flag) — so turning a parent off
transitively closes every child, and no child can run with its parent dark.
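
The two rules can be illustrated with a small, self-contained sketch. Everything here is a stand-in: `FlagRows` is an in-memory substitute for the `feature_flag` table, `isOn` replaces Sunrise's `isFeatureEnabled(name)`, and `withGate` / `withFakeAuth` are hypothetical wrappers, not the real `with*Enabled` or `withAuth` helpers. For simplicity a missing row counts as off here, and the 404 has no body, whereas the real gate returns an error envelope.

```typescript
// In-memory stand-in for the feature_flag table: flag name -> enabled.
type FlagRows = ReadonlyMap<string, boolean>;

const MASTER = "APP_QUESTIONNAIRES_ENABLED";
const LIVE_SESSIONS = "APP_QUESTIONNAIRES_LIVE_SESSIONS_ENABLED";
const VOICE_INPUT = "APP_QUESTIONNAIRES_VOICE_INPUT_ENABLED";

// Stand-in for isFeatureEnabled(name). Simplification: a missing row counts as off.
function isOn(rows: FlagRows, name: string): boolean {
  return rows.get(name) === true;
}

// Rule 2: a sub-flag resolver ANDs its own row with every parent.
function isVoiceInputEnabled(rows: FlagRows): boolean {
  return isOn(rows, MASTER) && isOn(rows, LIVE_SESSIONS) && isOn(rows, VOICE_INPUT);
}

type Handler = (req: Request) => Promise<Response>;

// Rule 1: the gate is the outermost wrapper, so it answers 404 before auth runs.
function withGate(isEnabled: () => boolean, handler: Handler): Handler {
  return async (req) => (isEnabled() ? handler(req) : new Response(null, { status: 404 }));
}

// Hypothetical auth wrapper: 401 unless an Authorization header is present.
function withFakeAuth(handler: Handler): Handler {
  return async (req) =>
    req.headers.has("authorization") ? handler(req) : new Response(null, { status: 401 });
}

// Gate outside, auth inside: a dark feature never reveals that auth exists.
function buildVoiceRoute(rows: FlagRows): Handler {
  const handler: Handler = async () => new Response("ok", { status: 200 });
  return withGate(() => isVoiceInputEnabled(rows), withFakeAuth(handler));
}
```

With this ordering an unauthenticated request to a dark route sees 404, and only once every required flag is on does the same request see 401. Swapping the two wrappers would leak the feature's existence through the 401.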

## The matrix

`is*Enabled()` returns `true` only when **all** the flags in its "Requires" column are on.

| # | Flag (`feature_flag` name) | Resolver | Requires | Gates | Off-behaviour |
| --- | ---------------------------------------------------- | --------------------------------------------------------- | -------------------------- | ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 0 | `APP_QUESTIONNAIRES_ENABLED` | `isQuestionnairesEnabled` / `ensureQuestionnairesEnabled` | — (master) | **the entire app** — every `/api/v1/app/**` route, every admin + respondent surface | every questionnaire route **404s**; the whole product is invisible |
| 1 | `APP_QUESTIONNAIRES_ADAPTIVE_STRATEGY_ENABLED` | `isAdaptiveSelectionEnabled` | master | F4.1 adaptive (embedding + LLM) next-question selection | a version set to `adaptive` **degrades to `weighted`** (no 404 — selection still runs, just cheaper) |
| 2 | `APP_QUESTIONNAIRES_ANSWER_EXTRACTION_ENABLED` | `isAnswerExtractionEnabled` | master | F4.2 answer-extraction preview route | route **404s** |
| 3 | `APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_ENABLED` | `isContradictionDetectionEnabled` | master | F4.3 contradiction-detection preview route | route **404s** |
| 4 | `APP_QUESTIONNAIRES_ANSWER_REFINEMENT_ENABLED` | `isAnswerRefinementEnabled` | master | F4.4 answer-refinement preview route | route **404s** |
| 5 | `APP_QUESTIONNAIRES_COMPLETION_ENABLED` | `isCompletionEnabled` | master | F4.5 completion-offer **phrasing** (the LLM prose) | the completion-status route returns the deterministic **assessment with no composed offer** (no 404 — the free assessment is always available under the master flag) |
| 6 | `APP_QUESTIONNAIRES_DESIGN_EVALUATION_ENABLED` | `isDesignEvaluationEnabled` | master | F5.1 seven-judge design-evaluation preview route | route **404s** (the whole route is paid LLM work — no free fallback) |
| 7 | `APP_QUESTIONNAIRES_LIVE_SESSIONS_ENABLED` | `isLiveSessionsEnabled` / `ensureLiveSessionsEnabled` | master | F6.1 respondent surface — session-create + `/messages` turn loop (incl. the no-login anonymous path) | session-create and messages routes **404**; the respondent surface disappears |
| 8 | `APP_QUESTIONNAIRES_VOICE_INPUT_ENABLED` | `isVoiceInputEnabled` / `ensureVoiceInputEnabled` | master **+ live-sessions** | F6.2 voice transcribe route | route **404s** (a transcript is useless without the live turn loop, so voice is gated behind live-sessions, not merely beside it) |
| 9 | `APP_QUESTIONNAIRES_ATTACHMENT_INPUT_ENABLED` | `isAttachmentInputEnabled` | master **+ live-sessions** | respondent image/document attachments on a `/messages` turn | the chat hides the attach affordance and the `/messages` route **ignores any attachments** a client sends (no 404 — it gates a behaviour inside an already-gated route) |
| 10 | `APP_QUESTIONNAIRES_COST_CAP_ENABLED` | `isCostCapEnforcementEnabled` | master **+ live-sessions** | F6.3 per-session USD budget check at the turn boundary | turns run with **no budget check** even when a version sets `costBudgetUsd` (no 404 — it gates a behaviour inside the messages route) |

## The three off-behaviour shapes

Reading the table, every sub-flag falls into one of three shapes — know which one you are
verifying:

- **Route 404** (flags 2, 3, 4, 6, 7, 8) — the gated route is paid LLM work or a whole
surface; off ⇒ the route returns 404 via its `ensure*`/`with*` wrapper.
- **Degrade** (flags 1, 5) — a cheaper deterministic result stands in: adaptive → weighted;
composed offer → bare assessment. The route still responds.
- **Behaviour-inside-route** (flags 9, 10) — there is no route to 404; the flag toggles a
branch inside an already-gated route (attachments ignored; budget check skipped).

When verifying "with each off, the gated surface is suppressed and the rest is unaffected",
assert against the shape: a 404 for the first group, the fallback result for the second, the
absent side-effect for the third.
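
As a sketch of "assert against the shape", the grouping above can be written as a lookup plus one predicate per shape. The flag numbers follow the matrix's `#` column; the `Observation` type and its fields are hypothetical, standing in for whatever a real probe of the route would record.

```typescript
type OffShape = "route-404" | "degrade" | "behaviour-inside-route";

// Shape per sub-flag, keyed by the "#" column of the matrix.
const OFF_SHAPE: Record<number, OffShape> = {
  1: "degrade",
  2: "route-404",
  3: "route-404",
  4: "route-404",
  5: "degrade",
  6: "route-404",
  7: "route-404",
  8: "route-404",
  9: "behaviour-inside-route",
  10: "behaviour-inside-route",
};

// Hypothetical record of what a probe saw with the flag switched off.
interface Observation {
  status: number; // HTTP status of the gated route
  usedFallback: boolean; // the cheaper deterministic result stood in
  sideEffectRan: boolean; // the gated behaviour (attachments, budget check) still ran
}

// True when the observation matches the off-behaviour the flag's shape promises.
function suppressedAsExpected(flag: number, seen: Observation): boolean {
  switch (OFF_SHAPE[flag]) {
    case "route-404":
      return seen.status === 404;
    case "degrade":
      return seen.status !== 404 && seen.usedFallback;
    case "behaviour-inside-route":
      return seen.status !== 404 && !seen.sideEffectRan;
    default:
      throw new Error(`no off-behaviour shape recorded for flag #${flag}`);
  }
}
```

For example, flag 1 switched off passes only if the route still responds and the `weighted` fallback was used; a 404 there would be a failure, not a success.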

## Verification

- **Resolver truth tables** — `tests/unit/lib/app/questionnaire/feature-flag.test.ts` pins,
for every resolver, that it is `true` only when all required flags are on and `false` when
the master, the sub-flag, or (for the live trio) live-sessions is off.
- **Independence** — the same suite asserts that turning one sub-flag off flips only its own
resolver to `false` while every sibling resolver stays `true`, so each flag closes its own
gate and nothing else. Route-level suppression is covered by each route's own `route.test.ts`.
- **Concurrency / happy path** — `npm run smoke:concurrent-sessions` exercises the live
respondent surface (flag 7) end-to-end against the real DB.
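
The resolver checks above lend themselves to a data-driven table. The sketch below is not the real suite: the short labels are hypothetical shorthand for the eleven `APP_QUESTIONNAIRES_*` names, and `resolve` is a model of the matrix's "Requires" column rather than the real resolvers in `feature-flag.ts`.

```typescript
// "Requires" column of the matrix, using hypothetical short labels for the flag names.
const REQUIRES: Record<string, string[]> = {
  "master": [],
  "adaptive-strategy": ["master"],
  "answer-extraction": ["master"],
  "contradiction-detection": ["master"],
  "answer-refinement": ["master"],
  "completion": ["master"],
  "design-evaluation": ["master"],
  "live-sessions": ["master"],
  "voice-input": ["master", "live-sessions"],
  "attachment-input": ["master", "live-sessions"],
  "cost-cap": ["master", "live-sessions"],
};

const ALL_FLAGS = Object.keys(REQUIRES);

// Model of an is*Enabled() resolver: its own flag AND every required parent.
function resolve(flag: string, on: ReadonlySet<string>): boolean {
  return on.has(flag) && REQUIRES[flag].every((parent) => on.has(parent));
}

// With exactly one flag off and all others on, list the resolvers that go false.
function closedBy(off: string): string[] {
  const on = new Set(ALL_FLAGS);
  on.delete(off);
  return ALL_FLAGS.filter((flag) => !resolve(flag, on));
}
```

Under this model `closedBy("answer-extraction")` lists only that flag (independence), `closedBy("live-sessions")` lists it plus the voice, attachment, and cost-cap trio (the cascade), and `closedBy("master")` lists all eleven (the transitive close).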
1 change: 1 addition & 0 deletions package.json
@@ -35,6 +35,7 @@
"db:reset": "prisma migrate reset --force",
"db:drift-check": "tsx --env-file=.env.local scripts/db/check-drift.ts",
"smoke:chat": "tsx --env-file=.env.local scripts/smoke/chat.ts",
"smoke:concurrent-sessions": "tsx --env-file=.env.local scripts/smoke/concurrent-sessions.ts",
"smoke:orchestration": "tsx --env-file=.env.local scripts/smoke/orchestration.ts",
"smoke:hybrid-search": "tsx --env-file=.env.local scripts/smoke/knowledge-hybrid-search.ts",
"smoke:transcribe": "tsx --env-file=.env.local scripts/smoke/transcribe.ts",