diff --git a/.context/app/planning/development-plan.md b/.context/app/planning/development-plan.md index dab2980fa..0979c7dd1 100644 --- a/.context/app/planning/development-plan.md +++ b/.context/app/planning/development-plan.md @@ -688,7 +688,7 @@ _Indicative tasks:_ ### F9.1 — Production hardening -_Status:_ not started · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ everything (final pass) +_Status:_ in flight ([tracker](./features/f9.1.md)) · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ everything (final pass) The pre-ship technical hardening: concurrent-session sanity, master + sub-flag inventory, verification that every flag and sub-flag controls the right surfaces independently. diff --git a/.context/app/planning/features/f9.1.md b/.context/app/planning/features/f9.1.md new file mode 100644 index 000000000..c64063e6f --- /dev/null +++ b/.context/app/planning/features/f9.1.md @@ -0,0 +1,106 @@ +--- +feature: F9.1 +title: Production hardening +phase: P9 — Hardening + forking docs +status: in flight +owner: TBD +deps: everything (final pass — P0–P8 complete) +opened: 2026-06-09 +plan: .context/app/planning/development-plan.md#f91--production-hardening +docs: .context/app/questionnaire/feature-flags.md +--- + +# F9.1 — Production hardening + +> Committable tracker for **F9.1**. The pre-ship technical pass: prove the questionnaire +> surface holds up under concurrency, document the master + sub-flag matrix, and verify each +> flag gates its own surface independently. A **verification + documentation** feature — it +> adds no new runtime capability. Gated by `APP_QUESTIONNAIRES_ENABLED` like everything else. + +## Intent + +F9.1 is the first feature of P9 and the final hardening pass before ConQuest is demo-grade +and fork-ready. 
Its job is not to build — it is to **prove** what P0–P8 built: that 20+ +concurrent respondent sessions don't deadlock, orphan turns, or drop audit writes; that the +eleven feature flags gate exactly the surfaces they claim and nothing else; and that the +full respondent happy path still composes end-to-end. The deliverables are a smoke harness, a +flag-inventory doc, a flag-verification test suite, and a green integration pass. + +## Decisions (confirmed with the user) + +- **Plan-spec-only scope.** Stay strictly within the plan's four indicative tasks + (concurrency smoke, flag inventory, per-flag verification, happy-path pass). Gaps that + exploration surfaced — no IP-keyed rate limit on anonymous session creation, no data + retention, no per-version spend cap — are **out of scope**, documented as follow-ups (see + below), not built. Keeps F9.1 a clean gate, not a grab-bag. +- **Concurrency smoke against the real DB.** The "no deadlocks / orphan turns / missed audit + writes" invariants are a real-Postgres concern; the house integration tests mock Prisma and + can't catch a transaction race. So the concurrency check is a **smoke script** (`scripts/smoke/`) + against the dev DB, not a vitest test. +- **LLM stubbed by construction, not via a fake provider.** The orchestrator's paid compute + only _produces_ the `AnswerSlotIntent`s that `persistTurn` writes; the smoke feeds those + intents directly, so there is no LLM in the loop to stub. Cleaner and more deterministic + than wiring a multi-schema fake provider through `registerProviderInstance` (the mechanism + the other smoke scripts use). The smoke drives the **real** persistence seams + (`createAnonymousSession` → `persistTurn` → `markSessionCompleted`), which is where the + concurrency risk actually lives. + +## Build shape (branch `feat/F9.1-production-hardening`) + +- **Concurrency + happy-path smoke** — `scripts/smoke/concurrent-sessions.ts`, + `npm run smoke:concurrent-sessions`. 
Seeds one launched, anonymous-mode version (4 free-text + slots), then creates **24 sessions concurrently**, runs 4 turns each through the live + persistence seams, and completes them. Asserts: no rejected promise / deadlock (40P01); each + session has exactly its 4 turns with contiguous ordinals (no orphan/dropped turns); every + answered slot is back-stamped with a real turn id; each session has exactly one `created` and + one `completed` `AppQuestionnaireSessionEvent` (no missed audit writes). `--single` runs one + verbose happy-path session plus a F8.2 results-export read — the plan's "final happy-path + integration pass" stitched journey. Everything hangs off one `smoke-test-f91` questionnaire; + cleanup cascades the graph. Idempotent. Registered in `scripts/smoke/README.md`. +- **Flag inventory doc** — `.context/app/questionnaire/feature-flags.md`. The authoritative + matrix: master + 10 sub-flags, each flag's `feature_flag` name, resolver, required parents, + gated surface, and off-behaviour. States the two design rules (disabled surface **404s** not + 401s; a sub-flag requires its parents) and the **three off-behaviour shapes** (route-404 / + degrade / behaviour-inside-route). Calls out that the flags are **DB rows, not env vars**. + Linked from the namespace `README.md` index. +- **Per-sub-flag verification suite** — `tests/unit/lib/app/questionnaire/feature-flag.test.ts` + (extended from master-only to all eleven resolvers). Data-driven truth tables: each resolver + is true only when all required flags are on, false when any one (master / live-sessions / + its own sub-flag) is off. Plus **independence** (one sub-flag off suppresses only its own + resolver, every sibling stays true), the **live-sessions cascade** (parent off closes the + voice/attachment/cost-cap trio), the **master transitive close**, and the `ensure*` route-gate + 404-envelope contract. 54 tests. 
+ Per-route gating stays covered by each route's own + `route.test.ts`; this suite is the consolidated matrix check. + +## Feature-flag matrix + +The canonical inventory — every flag, its dependency chain, and its off-behaviour — lives in +[`../../questionnaire/feature-flags.md`](../../questionnaire/feature-flags.md). + +## Verification + +- `npm run smoke:concurrent-sessions` — 24 sessions · 96 turns · invariants reconciled; runs + twice in a row clean (idempotent); leaves no `smoke-test-f91` rows. +- `npm run smoke:concurrent-sessions -- --single` — happy path reaches a completed session + + a non-empty results export. +- `npx vitest run tests/unit/lib/app/questionnaire/feature-flag.test.ts` — 54 pass. +- `npm run validate` clean; the app suites (`tests/unit|integration/.../app/**`) green (1780 + tests) — the final integration pass. + +## Out of scope (documented follow-ups, not built) + +Surfaced during F9.1 exploration; deliberately deferred to keep F9.1 verification-only. Each +is a real feature in its own right: + +- **IP-keyed rate limit on anonymous session creation** — the no-login + `questionnaire-sessions/anonymous` path is keyed per-session; a global `anon:IP` cap on + session _creation_ would harden it against session-minting abuse. +- **Data retention / purge** — completed sessions, turns (respondent PII), and answer slots are + kept indefinitely; there is no time-based purge. +- **Per-version / per-admin spend cap** — only a per-session `costBudgetUsd` exists; an + expensive launched version can accrue unbounded spend across many sessions. + +## No CHANGELOG entry + +F9.1 touches only app-owned smoke/test/docs — no Sunrise platform surface. Per the repo's +platform-scoped CHANGELOG policy, it adds no `CHANGELOG.md` bullet.
diff --git a/.context/app/questionnaire/README.md b/.context/app/questionnaire/README.md index 7cfaa2106..8a27cbf67 100644 --- a/.context/app/questionnaire/README.md +++ b/.context/app/questionnaire/README.md @@ -28,6 +28,7 @@ plan and feature trackers, see [`../planning/`](../planning/); for the platform | [`cost-cap-enforcement.md`](./cost-cap-enforcement.md) | Per-session USD budget at the turn boundary — soft wrap-up nudge at 90%, hard 402 + auto-pause at 100%, summed turn cost, dark-launch flag (F6.3) | | [`answer-slot-panel.md`](./answer-slot-panel.md) | The live respondent answer panel beside the chat — `GET …/answers` read endpoint, scope config, confidence language, Revisit wiring (F7.2) | | [`anonymous-mode.md`](./anonymous-mode.md) | The cross-surface PII contract — per-surface gates, the profile snapshot rule, k-anonymity suppression, erasure cascade (F8.3) | +| [`feature-flags.md`](./feature-flags.md) | The master + 10 sub-flag gate matrix — what each flag gates, its dependency chain, and the three off-behaviour shapes (404 / degrade / behaviour-inside-route) (F9.1) | ## Where the code lives diff --git a/.context/app/questionnaire/feature-flags.md b/.context/app/questionnaire/feature-flags.md new file mode 100644 index 000000000..88cf03921 --- /dev/null +++ b/.context/app/questionnaire/feature-flags.md @@ -0,0 +1,80 @@ +# Feature-flag inventory — the questionnaire gate matrix (F9.1) + +The questionnaire product dark-launches behind **one master flag and ten sub-flags**. This +is the authoritative inventory: every flag, what it gates, what it depends on, and exactly +what a respondent or admin sees when it is **off**. It is the reference the F9.1 hardening +pass verifies (`tests/unit/lib/app/questionnaire/feature-flag.test.ts`) and the runbook +(F9.2) toggles against. 
+ +The flag resolvers live in [`lib/app/questionnaire/feature-flag.ts`](../../../lib/app/questionnaire/feature-flag.ts); +the canonical flag-name constants live in the dependency-light +[`constants.ts`](../../../lib/app/questionnaire/constants.ts) (so the seed can import a name +without the resolver's HTTP/DB deps). + +## They are DB rows, not env vars + +> ⚠️ **`APP_QUESTIONNAIRES_*_ENABLED` are `feature_flag` table rows, not environment +> variables.** The name _looks_ like an env var; it is not. Every resolver is a thin +> wrapper over Sunrise's `isFeatureEnabled(name)`, which reads the `feature_flag` table. + +Toggle a flag by writing its row (admin feature-flag surface / seed / a direct DB update), +**not** by setting a shell variable. A flag with no row resolves to its seeded default. This +matters for the runbook and for any "turn X off and confirm the surface disappears" check — +you are flipping a row, and the change is live without a redeploy. + +## The two design rules + +1. **A disabled surface 404s — it does not 401.** Every route-level gate runs **before** + auth (`withQuestionnairesEnabled` / `withLiveSessionsEnabled` / `withVoiceInputEnabled` + wrap the handler so the gate fires first). A switched-off feature is therefore + indistinguishable from a route that was never built — no information leaks about a + feature that exists but is dark. Never place a gate after `withAdminAuth`/`withAuth`. + +2. **A sub-flag requires its parents.** Every sub-flag resolver `AND`s the master flag (and, + for the live-dependent trio, the live-sessions flag) — so turning a parent off + transitively closes every child, and no child can run with its parent dark. + +## The matrix + +`is*Enabled()` returns `true` only when **all** the flags in its "Requires" column are on. 
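Rule 2 can be sketched as a pure function over flag rows. This is an illustrative sketch only, not the contents of `feature-flag.ts`: the `FlagRows` record stands in for the `feature_flag` table, and `REQUIRES` and `resolve` are hypothetical names introduced here. It also assumes every row is present, so it does not model the seeded-default fallback.

```typescript
// Flag names as listed in the inventory; everything else here is hypothetical.
const MASTER = 'APP_QUESTIONNAIRES_ENABLED';
const LIVE = 'APP_QUESTIONNAIRES_LIVE_SESSIONS_ENABLED';
const VOICE = 'APP_QUESTIONNAIRES_VOICE_INPUT_ENABLED';
const EXTRACTION = 'APP_QUESTIONNAIRES_ANSWER_EXTRACTION_ENABLED';

// In-memory stand-in for the feature_flag table (assumption: all rows present).
type FlagRows = Record<string, boolean>;

// Resolver name -> every flag that must be on: its parents plus its own flag.
const REQUIRES: Record<string, string[]> = {
  isQuestionnairesEnabled: [MASTER],
  isLiveSessionsEnabled: [MASTER, LIVE],
  isVoiceInputEnabled: [MASTER, LIVE, VOICE],
  isAnswerExtractionEnabled: [MASTER, EXTRACTION],
};

// A resolver is true only when all of its required flags are on.
function resolve(resolver: string, rows: FlagRows): boolean {
  const required = REQUIRES[resolver];
  if (!required) throw new Error(`unknown resolver: ${resolver}`);
  return required.every((name) => rows[name] === true);
}

const allOn: FlagRows = { [MASTER]: true, [LIVE]: true, [VOICE]: true, [EXTRACTION]: true };
const liveOff: FlagRows = { ...allOn, [LIVE]: false };

console.log(resolve('isVoiceInputEnabled', allOn)); // true
console.log(resolve('isVoiceInputEnabled', liveOff)); // false: parent is dark
console.log(resolve('isAnswerExtractionEnabled', liveOff)); // true: not live-dependent
```

Keeping the dependency chain as data rather than as nested conditionals is one way to make the truth table directly testable: a verification suite can iterate the same table it documents.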
+ +| # | Flag (`feature_flag` name) | Resolver | Requires | Gates | Off-behaviour | +| --- | ---------------------------------------------------- | --------------------------------------------------------- | -------------------------- | ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 0 | `APP_QUESTIONNAIRES_ENABLED` | `isQuestionnairesEnabled` / `ensureQuestionnairesEnabled` | — (master) | **the entire app** — every `/api/v1/app/**` route, every admin + respondent surface | every questionnaire route **404s**; the whole product is invisible | +| 1 | `APP_QUESTIONNAIRES_ADAPTIVE_STRATEGY_ENABLED` | `isAdaptiveSelectionEnabled` | master | F4.1 adaptive (embedding + LLM) next-question selection | a version set to `adaptive` **degrades to `weighted`** (no 404 — selection still runs, just cheaper) | +| 2 | `APP_QUESTIONNAIRES_ANSWER_EXTRACTION_ENABLED` | `isAnswerExtractionEnabled` | master | F4.2 answer-extraction preview route | route **404s** | +| 3 | `APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_ENABLED` | `isContradictionDetectionEnabled` | master | F4.3 contradiction-detection preview route | route **404s** | +| 4 | `APP_QUESTIONNAIRES_ANSWER_REFINEMENT_ENABLED` | `isAnswerRefinementEnabled` | master | F4.4 answer-refinement preview route | route **404s** | +| 5 | `APP_QUESTIONNAIRES_COMPLETION_ENABLED` | `isCompletionEnabled` | master | F4.5 completion-offer **phrasing** (the LLM prose) | the completion-status route returns the deterministic **assessment with no composed offer** (no 404 — the free assessment is always available under the master flag) | +| 6 | `APP_QUESTIONNAIRES_DESIGN_EVALUATION_ENABLED` | `isDesignEvaluationEnabled` | master | F5.1 seven-judge design-evaluation preview route | route **404s** (the whole route is paid LLM 
work — no free fallback) | +| 7 | `APP_QUESTIONNAIRES_LIVE_SESSIONS_ENABLED` | `isLiveSessionsEnabled` / `ensureLiveSessionsEnabled` | master | F6.1 respondent surface — session-create + `/messages` turn loop (incl. the no-login anonymous path) | session-create and messages routes **404**; the respondent surface disappears | +| 8 | `APP_QUESTIONNAIRES_VOICE_INPUT_ENABLED` | `isVoiceInputEnabled` / `ensureVoiceInputEnabled` | master **+ live-sessions** | F6.2 voice transcribe route | route **404s** (a transcript is useless without the live turn loop, so voice is gated behind live-sessions, not merely beside it) | +| 9 | `APP_QUESTIONNAIRES_ATTACHMENT_INPUT_ENABLED` | `isAttachmentInputEnabled` | master **+ live-sessions** | respondent image/document attachments on a `/messages` turn | the chat hides the attach affordance and the `/messages` route **ignores any attachments** a client sends (no 404 — it gates a behaviour inside an already-gated route) | +| 10 | `APP_QUESTIONNAIRES_COST_CAP_ENABLED` | `isCostCapEnforcementEnabled` | master **+ live-sessions** | F6.3 per-session USD budget check at the turn boundary | turns run with **no budget check** even when a version sets `costBudgetUsd` (no 404 — it gates a behaviour inside the messages route) | + +## The three off-behaviour shapes + +Reading the table, every sub-flag falls into one of three shapes — know which one you are +verifying: + +- **Route 404** (flags 2, 3, 4, 6, 7, 8) — the gated route is paid LLM work or a whole + surface; off ⇒ the route returns 404 via its `ensure*`/`with*` wrapper. +- **Degrade** (flags 1, 5) — a cheaper deterministic result stands in: adaptive → weighted; + composed offer → bare assessment. The route still responds. +- **Behaviour-inside-route** (flags 9, 10) — there is no route to 404; the flag toggles a + branch inside an already-gated route (attachments ignored; budget check skipped). 
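The grouping above can be written down as a small lookup that a verification helper might consult. This is a sketch under assumptions: `OffShape`, `SHAPE_BY_FLAG`, `flagsWithShape`, and `expectationWhenOff` are hypothetical names, and the flag numbers are the `#` column of the matrix, not identifiers that exist in the codebase.

```typescript
type OffShape = 'route-404' | 'degrade' | 'behaviour-inside-route';

// Sub-flag number (the matrix "#" column) -> off-behaviour shape.
const SHAPE_BY_FLAG: Record<number, OffShape> = {
  1: 'degrade',
  2: 'route-404',
  3: 'route-404',
  4: 'route-404',
  5: 'degrade',
  6: 'route-404',
  7: 'route-404',
  8: 'route-404',
  9: 'behaviour-inside-route',
  10: 'behaviour-inside-route',
};

// All sub-flag numbers that share a shape, in ascending order.
function flagsWithShape(shape: OffShape): number[] {
  return Object.keys(SHAPE_BY_FLAG)
    .map(Number)
    .filter((flag) => SHAPE_BY_FLAG[flag] === shape)
    .sort((a, b) => a - b);
}

// What a check should look for when the given sub-flag is off.
function expectationWhenOff(flag: number): string {
  const shape = SHAPE_BY_FLAG[flag];
  if (shape === undefined) throw new Error(`not a sub-flag: ${flag}`);
  switch (shape) {
    case 'route-404':
      return 'the gated route returns 404';
    case 'degrade':
      return 'the route responds with the cheaper fallback result';
    case 'behaviour-inside-route':
      return 'the route responds and the gated side-effect is absent';
  }
}

console.log(flagsWithShape('route-404')); // [ 2, 3, 4, 6, 7, 8 ]
console.log(expectationWhenOff(9));
```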
+ +When verifying "with each off, the gated surface is suppressed and the rest is unaffected", +assert against the shape: a 404 for the first group, the fallback result for the second, the +absent side-effect for the third. + +## Verification + +- **Resolver truth tables** — `tests/unit/lib/app/questionnaire/feature-flag.test.ts` pins, + for every resolver, that it is `true` only when all required flags are on and `false` when + the master, the sub-flag, or (for the live trio) live-sessions is off. +- **Independence** — the same suite asserts a representative route behind each gate is + suppressed when its flag is off while a sibling behind a different (still-on) flag keeps + responding, so flags gate their own surface and nothing else. +- **Concurrency / happy path** — `npm run smoke:concurrent-sessions` exercises the live + respondent surface (flag 7) end-to-end against the real DB. diff --git a/package.json b/package.json index f4fb4d885..539406d79 100644 --- a/package.json +++ b/package.json @@ -35,6 +35,7 @@ "db:reset": "prisma migrate reset --force", "db:drift-check": "tsx --env-file=.env.local scripts/db/check-drift.ts", "smoke:chat": "tsx --env-file=.env.local scripts/smoke/chat.ts", + "smoke:concurrent-sessions": "tsx --env-file=.env.local scripts/smoke/concurrent-sessions.ts", "smoke:orchestration": "tsx --env-file=.env.local scripts/smoke/orchestration.ts", "smoke:hybrid-search": "tsx --env-file=.env.local scripts/smoke/knowledge-hybrid-search.ts", "smoke:transcribe": "tsx --env-file=.env.local scripts/smoke/transcribe.ts", diff --git a/scripts/smoke/README.md b/scripts/smoke/README.md index 0fc18bc89..e724df09c 100644 --- a/scripts/smoke/README.md +++ b/scripts/smoke/README.md @@ -90,11 +90,12 @@ Prefer numbered `[n] description` stdout markers over ad-hoc logging — it make ## Current scripts -| Script | Exercises | Stubs | Notes | -| ------------------ | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `chat.ts` | `streamChat` → tool loop → persistence | `LlmProvider` via `registerProviderInstance` | Verifies event sequence, `AiMessage` + `AiCostLog` writes, and budget check. Doesn't exercise the tool loop live (needs seeded capability rows). | -| `orchestration.ts` | Phase 3 admin HTTP surface: providers/agents/capabilities/workflows CRUD + validate + execute stub, chat SSE, knowledge upload + search, evaluations complete, conversations clear, costs + budget | In-process Node HTTP server stubs OpenAI-compatible `/v1/chat/completions` (JSON + SSE) and `/v1/embeddings` | Requires the dev server running (`npm run dev`, default `PORT=3001`). Hits real Postgres. Successive runs within 60s may hit admin rate limit — wait out the window. | -| `transcribe.ts` | `getAudioProvider()` resolution + `provider.transcribe()` round-trip with a silent WAV | Fake audio `LlmProvider` via `registerProviderInstance` (returns a scripted transcript) | Seeds a scoped `smoke-test-audio` `AiProviderModel` row with `capabilities: ['audio']`. Proves the audio plumbing wires up end-to-end without a real Whisper API key. 
| +| Script | Exercises | Stubs | Notes | +| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `chat.ts` | `streamChat` → tool loop → persistence | `LlmProvider` via `registerProviderInstance` | Verifies event sequence, `AiMessage` + `AiCostLog` writes, and budget check. Doesn't exercise the tool loop live (needs seeded capability rows). | +| `orchestration.ts` | Phase 3 admin HTTP surface: providers/agents/capabilities/workflows CRUD + validate + execute stub, chat SSE, knowledge upload + search, evaluations complete, conversations clear, costs + budget | In-process Node HTTP server stubs OpenAI-compatible `/v1/chat/completions` (JSON + SSE) and `/v1/embeddings` | Requires the dev server running (`npm run dev`, default `PORT=3001`). Hits real Postgres. Successive runs within 60s may hit admin rate limit — wait out the window. | +| `transcribe.ts` | `getAudioProvider()` resolution + `provider.transcribe()` round-trip with a silent WAV | Fake audio `LlmProvider` via `registerProviderInstance` (returns a scripted transcript) | Seeds a scoped `smoke-test-audio` `AiProviderModel` row with `capabilities: ['audio']`. Proves the audio plumbing wires up end-to-end without a real Whisper API key. | +| `concurrent-sessions.ts` | F9.1 hardening: 24 concurrent respondent sessions × 4 turns through the live persistence seams (`createAnonymousSession` → `persistTurn` → `markSessionCompleted`). 
Asserts no deadlocks / orphan turns / missed audit writes. `--single` runs one verbose happy-path session + a results-export read. | LLM stubbed **by construction** — feeds `persistTurn` the extraction intents the orchestrator would emit, so no provider instance is needed | Everything hangs off one `smoke-test-f91` `AppQuestionnaire`; cleanup cascades the whole graph. Idempotent. | ## Adding a new smoke script diff --git a/scripts/smoke/concurrent-sessions.ts b/scripts/smoke/concurrent-sessions.ts new file mode 100644 index 000000000..2e1b9854e --- /dev/null +++ b/scripts/smoke/concurrent-sessions.ts @@ -0,0 +1,369 @@ +/** + * Concurrent-session sanity smoke script (F9.1 production hardening) + * + * Drives the live respondent **persistence seams** under concurrency against the real + * Postgres dev DB, to prove the three invariants F9.1 names: + * + * - **no deadlocks** — N sessions create + turn + complete concurrently with no rejected + * promise / Postgres deadlock (40P01) / serialization failure; + * - **no orphan turns** — every persisted turn maps to a live session, every session has + * exactly the turns it drove, and ordinals are contiguous 1..K per session; + * - **no missed audit writes** — every session has its `created` + `completed` + * `AppQuestionnaireSessionEvent`, every answered slot is back-stamped with a real turn + * id, and the answer-slot count reconciles per session. 
+ * + * Seams exercised (the concurrency-sensitive write paths, all `$transaction`-based): + * - `createAnonymousSession` → session row + `created` event (one tx) + * - `persistTurn` (turn-run.ts) → `AppQuestionnaireTurn` + `AppAnswerSlot` upsert + + * `lastUpdatedTurnId` back-stamp (one tx), the real live-turn write path + * - `markSessionCompleted` → status update + `completed` event (one tx) + * + * **LLM is stubbed by construction, not by a fake provider.** The orchestrator's paid + * compute (extraction/refinement/contradiction LLM calls) only *produces* the + * `AnswerSlotIntent`s that `persistTurn` writes; this script feeds `persistTurn` those + * intents directly — deterministic, free, no network — so the test isolates the DB + * concurrency surface the invariants are actually about. (The other smoke scripts stub via + * `registerProviderInstance`; here there is no LLM in the loop to stub.) + * + * Modes: + * - default → 24 concurrent sessions × 4 turns each (concurrency sanity). + * - `--single` / `-1` → one session, verbose per-stage logging + a results-export read + * (F8.2) — the F9.1 "final happy-path integration pass" stitched journey. + * + * Safety: + * - Everything hangs off ONE `smoke-test-f91` AppQuestionnaire; deleting it cascades to + * versions, config, sections, slots, sessions, answers, events, and turns. Stale rows + * from a prior run are removed before seeding and after the run. Never touches any + * other data; no destructive global commands. Read `scripts/smoke/README.md` first. 
+ * + * Run with: + * npm run smoke:concurrent-sessions + * npm run smoke:concurrent-sessions -- --single + * # or: + * npx tsx --env-file=.env.local scripts/smoke/concurrent-sessions.ts + */ + +import { prisma } from '@/lib/db/client'; +import { createAnonymousSession } from '@/app/api/v1/app/questionnaire-sessions/_lib/create'; +import { persistTurn } from '@/app/api/v1/app/questionnaire-sessions/_lib/turn-run'; +import { + markSessionCompleted, + loadSessionResumeState, +} from '@/app/api/v1/app/questionnaires/_lib/sessions'; +import { loadResultsExport } from '@/lib/app/questionnaire/export/results-loader'; +import { toResultsCsv } from '@/lib/app/questionnaire/export/results-serialize'; +import type { AnalyticsScope } from '@/lib/app/questionnaire/analytics'; +import type { AnswerSlotIntent } from '@/lib/app/questionnaire/extraction/types'; +import type { ToolCallRecord } from '@/lib/app/questionnaire/orchestrator'; + +const MARKER = 'smoke-test-f91'; +const QUESTIONNAIRE_TITLE = `${MARKER} concurrent sessions`; + +const SESSIONS = 24; // "20+ concurrent-session sanity test" +const SLOT_COUNT = 4; // questions per version +const TURNS_PER_SESSION = SLOT_COUNT; // one answered slot per turn + +/** A failure detail collected during the run; a non-empty list fails the smoke. */ +const failures: string[] = []; +function fail(msg: string): void { + failures.push(msg); + console.error(` ✗ ${msg}`); +} + +/** The seeded version graph the run drives against. */ +interface SeededVersion { + questionnaireId: string; + versionId: string; + /** Seeded slots in ordinal order: each slot's key and id, used to build persistTurn's key → slot-id map. */ + slots: { key: string; id: string }[]; +} + +/** Delete any AppQuestionnaire(s) left by a previous run — cascade clears the whole graph.
*/ +async function cleanupStale(): Promise<void> { + const stale = await prisma.appQuestionnaire.findMany({ + where: { title: QUESTIONNAIRE_TITLE }, + select: { id: true }, + }); + if (stale.length === 0) return; + await prisma.appQuestionnaire.deleteMany({ where: { title: QUESTIONNAIRE_TITLE } }); + console.log(` cleaned up ${stale.length} stale ${MARKER} questionnaire(s)`); +} + +/** Seed a launched, anonymous-mode version with SLOT_COUNT free-text questions. */ +async function seed(): Promise<SeededVersion> { + const questionnaire = await prisma.appQuestionnaire.create({ + data: { + title: QUESTIONNAIRE_TITLE, + status: 'launched', + versions: { + create: { + versionNumber: 1, + status: 'launched', + // anonymousMode = true so createAnonymousSession is permitted (no-login surface). + config: { create: { anonymousMode: true, selectionStrategy: 'sequential' } }, + sections: { + create: { + ordinal: 1, + title: `${MARKER} section`, + questions: { + create: Array.from({ length: SLOT_COUNT }, (_, i) => ({ + versionId: '', // set below — denormalised FK needs the version id + ordinal: i + 1, + key: `${MARKER}-q${i + 1}`, + prompt: `Smoke question ${i + 1}?`, + type: 'free_text', + required: true, + })), + }, + }, + }, + }, + }, + }, + select: { + id: true, + versions: { select: { id: true } }, + }, + }); + + const versionId = questionnaire.versions[0].id; + + // AppQuestionSlot.versionId is a denormalised FK Prisma's nested create can't backfill in + // one shot — stamp it now so slot lookups by version work. + await prisma.appQuestionSlot.updateMany({ + where: { section: { versionId } }, + data: { versionId }, + }); + + const slots = await prisma.appQuestionSlot.findMany({ + where: { versionId }, + orderBy: { ordinal: 'asc' }, + select: { id: true, key: true }, + }); + + return { questionnaireId: questionnaire.id, versionId, slots }; +} + +/** The deterministic extraction intent the orchestrator would have produced for one slot.
*/ +function intentForSlot(slotKey: string, turnIndex: number): AnswerSlotIntent { + return { + slotKey, + questionType: 'free_text', + value: `answer-${turnIndex}`, + confidence: 0.9, + provenance: 'direct', + rationale: 'smoke deterministic answer', + isActiveQuestion: true, + }; +} + +/** Run TURNS_PER_SESSION sequential turns over one session (ordinal depends on count). */ +async function runSession(sessionId: string, seeded: SeededVersion): Promise<void> { + const keyToSlotId = new Map(seeded.slots.map((s) => [s.key, s.id])); + for (let t = 0; t < TURNS_PER_SESSION; t++) { + const slot = seeded.slots[t]; + const toolCalls: ToolCallRecord[] = [{ slug: 'extract_answer_slots', success: true }]; + await persistTurn({ + sessionId, + userMessage: `respondent message ${t + 1}`, + agentResponse: `agent reply ${t + 1}`, + targetedQuestionId: slot.id, + toolCalls, + costUsd: 0.001, + upserts: [intentForSlot(slot.key, t + 1)], + refinements: [], + keyToSlotId, + }); + } + await markSessionCompleted(sessionId); +} + +/** Assert the three invariants over the persisted graph for the given session ids. */ +async function verify(seeded: SeededVersion, sessionIds: string[]): Promise<void> { + const sessions = await prisma.appQuestionnaireSession.findMany({ + where: { versionId: seeded.versionId, isPreview: false }, + select: { + id: true, + status: true, + turns: { select: { id: true, ordinal: true }, orderBy: { ordinal: 'asc' } }, + answers: { select: { id: true, lastUpdatedTurnId: true } }, + events: { select: { eventType: true } }, + }, + }); + + // No orphan/extra sessions. + if (sessions.length !== sessionIds.length) { + fail(`expected ${sessionIds.length} sessions, found ${sessions.length}`); + } + + const allTurnIds = new Set(); + for (const s of sessions) { + // Completed status (markSessionCompleted ran).
+ if (s.status !== 'completed') + fail(`session ${s.id} status is "${s.status}", expected completed`); + + // No orphan turns: exactly TURNS_PER_SESSION, ordinals contiguous 1..K. + if (s.turns.length !== TURNS_PER_SESSION) { + fail(`session ${s.id} has ${s.turns.length} turns, expected ${TURNS_PER_SESSION}`); + } + s.turns.forEach((turn, i) => { + if (turn.ordinal !== i + 1) + fail(`session ${s.id} turn #${i} ordinal=${turn.ordinal}, expected ${i + 1}`); + allTurnIds.add(turn.id); + }); + + // Answer-slot reconciliation: one answer per slot, each back-stamped with a real turn. + if (s.answers.length !== SLOT_COUNT) { + fail(`session ${s.id} has ${s.answers.length} answers, expected ${SLOT_COUNT}`); + } + const sessionTurnIds = new Set(s.turns.map((t) => t.id)); + for (const a of s.answers) { + if (!a.lastUpdatedTurnId) { + fail(`session ${s.id} answer ${a.id} has no lastUpdatedTurnId (missed turn back-stamp)`); + } else if (!sessionTurnIds.has(a.lastUpdatedTurnId)) { + fail( + `session ${s.id} answer ${a.id} back-stamped with foreign turn ${a.lastUpdatedTurnId}` + ); + } + } + + // No missed audit writes: exactly one `created` and one `completed` event. + const created = s.events.filter((e) => e.eventType === 'created').length; + const completed = s.events.filter((e) => e.eventType === 'completed').length; + if (created !== 1) fail(`session ${s.id} has ${created} created events, expected 1`); + if (completed !== 1) fail(`session ${s.id} has ${completed} completed events, expected 1`); + } + + // No orphan turns globally: every turn belongs to one of our sessions (no extras/dupes). 
+ const totalTurns = await prisma.appQuestionnaireTurn.count({ + where: { session: { versionId: seeded.versionId } }, + }); + const expectedTurns = sessionIds.length * TURNS_PER_SESSION; + if (totalTurns !== expectedTurns) { + fail(`total turns=${totalTurns}, expected ${expectedTurns} (orphan or dropped turns)`); + } + if (allTurnIds.size !== expectedTurns) { + fail(`distinct turn ids=${allTurnIds.size}, expected ${expectedTurns}`); + } + + console.log( + ` ✓ ${sessions.length} sessions · ${totalTurns} turns · audit events + answer back-stamps reconciled` + ); +} + +/** The F9.1 happy-path stitched journey: one session, verbose, plus a results-export read. */ +async function runHappyPath(seeded: SeededVersion): Promise<void> { + console.log('\n[journey] single happy-path session'); + const create = await createAnonymousSession(seeded.versionId); + if (!create.ok) { + fail(`createAnonymousSession failed: ${create.code} ${create.message}`); + return; + } + const sessionId = create.session.id; + console.log(` • created session ${sessionId} (status=${create.session.status})`); + + await runSession(sessionId, seeded); + console.log(` • ran ${TURNS_PER_SESSION} turns + completed`); + + const resume = await loadSessionResumeState(sessionId); + console.log( + ` • resume state: status=${resume.status}, ${resume.answeredSlots.length} answers captured` + ); + if (resume.status !== 'completed') + fail(`journey session status=${resume.status}, expected completed`); + if (resume.answeredSlots.length !== SLOT_COUNT) { + fail(`journey captured ${resume.answeredSlots.length} answers, expected ${SLOT_COUNT}`); + } + + // F8.2 results export — the journey's final stage. Wide window to capture the just-completed session.
+ const scope: AnalyticsScope = { + versionId: seeded.versionId, + from: new Date('2000-01-01T00:00:00.000Z'), + to: new Date('2999-01-01T00:00:00.000Z'), + tagIds: [], + }; + const exportModel = await loadResultsExport(scope); + if (!exportModel) { + fail('loadResultsExport returned null for the seeded version'); + return; + } + const csv = toResultsCsv(exportModel); + const csvRows = csv.trim().split('\n').length - 1; // minus header + console.log( + ` • export: ${exportModel.sessions.length} session(s), ${exportModel.questions.length} questions, ${csvRows} CSV row(s)` + ); + if (exportModel.sessions.length < 1) fail('export has no completed sessions'); +} + +async function main(): Promise<void> { + const single = process.argv.includes('--single') || process.argv.includes('-1'); + + console.log(`\n[1] cleanup stale ${MARKER} rows`); + await cleanupStale(); + + console.log('[2] seed launched anonymous-mode version'); + const seeded = await seed(); + console.log( + ` questionnaire ${seeded.questionnaireId} · version ${seeded.versionId} · ${seeded.slots.length} slots` + ); + + if (single) { + await runHappyPath(seeded); + } else { + console.log(`\n[3] create ${SESSIONS} sessions concurrently`); + const creates = await Promise.allSettled( + Array.from({ length: SESSIONS }, () => createAnonymousSession(seeded.versionId)) + ); + const sessionIds: string[] = []; + creates.forEach((r, i) => { + if (r.status === 'rejected') { + fail(`session create #${i} rejected: ${String(r.reason)}`); + } else if (!r.value.ok) { + fail(`session create #${i} failed: ${r.value.code} ${r.value.message}`); + } else { + sessionIds.push(r.value.session.id); + } + }); + console.log(` ✓ ${sessionIds.length}/${SESSIONS} sessions created`); + + console.log(`[4] run ${TURNS_PER_SESSION} turns × ${sessionIds.length} sessions concurrently`); + const runs = await Promise.allSettled(sessionIds.map((id) => runSession(id, seeded))); + runs.forEach((r, i) => { + if (r.status === 'rejected') { + // A Postgres
deadlock (40P01) or serialization failure surfaces here. + fail(`session run #${i} (${sessionIds[i]}) rejected: ${String(r.reason)}`); + } + }); + console.log( + ` ✓ ${runs.filter((r) => r.status === 'fulfilled').length}/${runs.length} session runs settled` + ); + + console.log('[5] verify invariants (no deadlocks / orphan turns / missed audit writes)'); + await verify(seeded, sessionIds); + } + + console.log('\n[6] cleanup (scoped — cascade from the seeded questionnaire)'); + const deleted = await prisma.appQuestionnaire.deleteMany({ + where: { id: seeded.questionnaireId }, + }); + console.log(` deleted ${deleted.count} questionnaire (cascade cleared the graph)`); + + await prisma.$disconnect(); + + if (failures.length > 0) { + console.error(`\n✗ smoke FAILED with ${failures.length} invariant violation(s)`); + process.exit(1); + } + console.log('\n✓ concurrent-session smoke passed'); +} + +main().catch(async (err) => { + console.error('\n✗ smoke script failed:', err); + try { + await prisma.appQuestionnaire.deleteMany({ where: { title: QUESTIONNAIRE_TITLE } }); + await prisma.$disconnect(); + } catch { + /* ignore */ + } + process.exit(1); +}); diff --git a/tests/unit/lib/app/questionnaire/feature-flag.test.ts b/tests/unit/lib/app/questionnaire/feature-flag.test.ts index f30c89211..bac2d2cee 100644 --- a/tests/unit/lib/app/questionnaire/feature-flag.test.ts +++ b/tests/unit/lib/app/questionnaire/feature-flag.test.ts @@ -1,9 +1,35 @@ +import { NextRequest } from 'next/server'; import { describe, it, expect, vi, beforeEach } from 'vitest'; import { APP_QUESTIONNAIRES_FLAG, + APP_QUESTIONNAIRES_ADAPTIVE_FLAG, + APP_QUESTIONNAIRES_ANSWER_EXTRACTION_FLAG, + APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_FLAG, + APP_QUESTIONNAIRES_ANSWER_REFINEMENT_FLAG, + APP_QUESTIONNAIRES_COMPLETION_FLAG, + APP_QUESTIONNAIRES_DESIGN_EVALUATION_FLAG, + APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG, + APP_QUESTIONNAIRES_VOICE_INPUT_FLAG, + APP_QUESTIONNAIRES_COST_CAP_FLAG, + 
+ APP_QUESTIONNAIRES_ATTACHMENT_INPUT_FLAG, ensureQuestionnairesEnabled, + ensureLiveSessionsEnabled, + ensureVoiceInputEnabled, + withQuestionnairesEnabled, + withLiveSessionsEnabled, + withVoiceInputEnabled, isQuestionnairesEnabled, + isAdaptiveSelectionEnabled, + isAnswerExtractionEnabled, + isContradictionDetectionEnabled, + isAnswerRefinementEnabled, + isCompletionEnabled, + isDesignEvaluationEnabled, + isLiveSessionsEnabled, + isVoiceInputEnabled, + isAttachmentInputEnabled, + isCostCapEnforcementEnabled, } from '@/lib/app/questionnaire/feature-flag'; import { isFeatureEnabled } from '@/lib/feature-flags'; @@ -13,51 +39,411 @@ vi.mock('@/lib/feature-flags', () => ({ const mockedIsFeatureEnabled = vi.mocked(isFeatureEnabled); -describe('questionnaire feature flag', () => { - beforeEach(() => { - vi.clearAllMocks(); +/** + * Drive {@link isFeatureEnabled} from a per-flag map: a flag is enabled iff its name maps + * to `true`. The resolvers call `isFeatureEnabled(name)` (often in a `Promise.all`), so this + * lets each test set exactly which flags are on and assert the resolver's AND logic. + */ +function setFlags(enabled: Record<string, boolean>): void { + mockedIsFeatureEnabled.mockImplementation((name: string) => + Promise.resolve(enabled[name] === true) + ); +} + +/** All eleven flag names, used to build "everything on" baselines. */ +const ALL_FLAGS = [ + APP_QUESTIONNAIRES_FLAG, + APP_QUESTIONNAIRES_ADAPTIVE_FLAG, + APP_QUESTIONNAIRES_ANSWER_EXTRACTION_FLAG, + APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_FLAG, + APP_QUESTIONNAIRES_ANSWER_REFINEMENT_FLAG, + APP_QUESTIONNAIRES_COMPLETION_FLAG, + APP_QUESTIONNAIRES_DESIGN_EVALUATION_FLAG, + APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG, + APP_QUESTIONNAIRES_VOICE_INPUT_FLAG, + APP_QUESTIONNAIRES_COST_CAP_FLAG, + APP_QUESTIONNAIRES_ATTACHMENT_INPUT_FLAG, +] as const; + +/** A map with every flag on (the baseline each truth-table test perturbs from).
*/ +function allOn(): Record<string, boolean> { + return Object.fromEntries(ALL_FLAGS.map((f) => [f, true])); +} + +beforeEach(() => { + vi.clearAllMocks(); +}); + +describe('questionnaire feature flag — flag names are stable', () => { + // The seed and any external toggling rely on the exact `feature_flag` row names; guard + // them so a rename can't silently dark-launch (or un-gate) a surface. + it('master + sub-flag names match their published constants', () => { + expect(APP_QUESTIONNAIRES_FLAG).toBe('APP_QUESTIONNAIRES_ENABLED'); + expect(APP_QUESTIONNAIRES_ADAPTIVE_FLAG).toBe('APP_QUESTIONNAIRES_ADAPTIVE_STRATEGY_ENABLED'); + expect(APP_QUESTIONNAIRES_ANSWER_EXTRACTION_FLAG).toBe( + 'APP_QUESTIONNAIRES_ANSWER_EXTRACTION_ENABLED' + ); + expect(APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_FLAG).toBe( + 'APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_ENABLED' + ); + expect(APP_QUESTIONNAIRES_ANSWER_REFINEMENT_FLAG).toBe( + 'APP_QUESTIONNAIRES_ANSWER_REFINEMENT_ENABLED' + ); + expect(APP_QUESTIONNAIRES_COMPLETION_FLAG).toBe('APP_QUESTIONNAIRES_COMPLETION_ENABLED'); + expect(APP_QUESTIONNAIRES_DESIGN_EVALUATION_FLAG).toBe( + 'APP_QUESTIONNAIRES_DESIGN_EVALUATION_ENABLED' + ); + expect(APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG).toBe('APP_QUESTIONNAIRES_LIVE_SESSIONS_ENABLED'); + expect(APP_QUESTIONNAIRES_VOICE_INPUT_FLAG).toBe('APP_QUESTIONNAIRES_VOICE_INPUT_ENABLED'); + expect(APP_QUESTIONNAIRES_COST_CAP_FLAG).toBe('APP_QUESTIONNAIRES_COST_CAP_ENABLED'); + expect(APP_QUESTIONNAIRES_ATTACHMENT_INPUT_FLAG).toBe( + 'APP_QUESTIONNAIRES_ATTACHMENT_INPUT_ENABLED' + ); }); +}); - describe('isQuestionnairesEnabled', () => { - it('delegates to isFeatureEnabled with the APP_QUESTIONNAIRES_ENABLED flag', async () => { - mockedIsFeatureEnabled.mockResolvedValue(true); +describe('isQuestionnairesEnabled (master)', () => { + it('delegates to isFeatureEnabled with the master flag', async () => { + setFlags({ [APP_QUESTIONNAIRES_FLAG]: true }); + await expect(isQuestionnairesEnabled()).resolves.toBe(true);
+ expect(mockedIsFeatureEnabled).toHaveBeenCalledWith(APP_QUESTIONNAIRES_FLAG); + }); + + it('returns false when the master flag is disabled', async () => { + setFlags({ [APP_QUESTIONNAIRES_FLAG]: false }); + await expect(isQuestionnairesEnabled()).resolves.toBe(false); + }); +}); + +/** + * The data-driven truth table for every sub-flag resolver: each is `true` only when ALL its + * required flags are on, and `false` when ANY one of them is off. `requires` lists the flags + * the resolver AND's together (master first, then any parents, then its own sub-flag). + */ +const SUB_FLAG_RESOLVERS: ReadonlyArray<{ + name: string; + fn: () => Promise<boolean>; + requires: readonly string[]; +}> = [ + { + name: 'isAdaptiveSelectionEnabled', + fn: isAdaptiveSelectionEnabled, + requires: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_ADAPTIVE_FLAG], + }, + { + name: 'isAnswerExtractionEnabled', + fn: isAnswerExtractionEnabled, + requires: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_ANSWER_EXTRACTION_FLAG], + }, + { + name: 'isContradictionDetectionEnabled', + fn: isContradictionDetectionEnabled, + requires: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_FLAG], + }, + { + name: 'isAnswerRefinementEnabled', + fn: isAnswerRefinementEnabled, + requires: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_ANSWER_REFINEMENT_FLAG], + }, + { + name: 'isCompletionEnabled', + fn: isCompletionEnabled, + requires: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_COMPLETION_FLAG], + }, + { + name: 'isDesignEvaluationEnabled', + fn: isDesignEvaluationEnabled, + requires: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_DESIGN_EVALUATION_FLAG], + }, + { + name: 'isLiveSessionsEnabled', + fn: isLiveSessionsEnabled, + requires: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG], + }, + { + // Live-dependent: master + live-sessions + its own sub-flag.
+ name: 'isVoiceInputEnabled', + fn: isVoiceInputEnabled, + requires: [ + APP_QUESTIONNAIRES_FLAG, + APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG, + APP_QUESTIONNAIRES_VOICE_INPUT_FLAG, + ], + }, + { + name: 'isAttachmentInputEnabled', + fn: isAttachmentInputEnabled, + requires: [ + APP_QUESTIONNAIRES_FLAG, + APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG, + APP_QUESTIONNAIRES_ATTACHMENT_INPUT_FLAG, + ], + }, + { + name: 'isCostCapEnforcementEnabled', + fn: isCostCapEnforcementEnabled, + requires: [ + APP_QUESTIONNAIRES_FLAG, + APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG, + APP_QUESTIONNAIRES_COST_CAP_FLAG, + ], + }, +]; - const result = await isQuestionnairesEnabled(); +describe('sub-flag resolvers — truth tables', () => { + for (const { name, fn, requires } of SUB_FLAG_RESOLVERS) { + describe(name, () => { + it('is true when all required flags are on', async () => { + setFlags(Object.fromEntries(requires.map((f) => [f, true]))); + await expect(fn()).resolves.toBe(true); + }); - expect(result).toBe(true); - expect(mockedIsFeatureEnabled).toHaveBeenCalledWith(APP_QUESTIONNAIRES_FLAG); - // Guard the exact flag name — the seed and any external toggling rely on it. - expect(APP_QUESTIONNAIRES_FLAG).toBe('APP_QUESTIONNAIRES_ENABLED'); + // One test per required flag: that flag off, every other required flag on → false. + for (const off of requires) { + const label = + off === APP_QUESTIONNAIRES_FLAG + ? 'master' + : off === APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG + ? 'live-sessions' + : 'its own sub-flag'; + it(`is false when ${label} (${off}) is off`, async () => { + const flags = Object.fromEntries(requires.map((f) => [f, true])); + flags[off] = false; + setFlags(flags); + await expect(fn()).resolves.toBe(false); + }); + } }); + } +}); + +describe('sub-flag independence — one flag off suppresses only its own surface', () => { + // Turning a single sub-flag off must NOT affect any sibling resolver. 
We flip each + // sub-flag off (master + everything else on) and assert exactly that resolver goes false + // while the others stay true — the "rest of the platform unaffected" guarantee. + const INDEPENDENT_PAIRS: ReadonlyArray<{ + flag: string; + resolver: () => Promise<boolean>; + }> = [ + { flag: APP_QUESTIONNAIRES_ADAPTIVE_FLAG, resolver: isAdaptiveSelectionEnabled }, + { flag: APP_QUESTIONNAIRES_ANSWER_EXTRACTION_FLAG, resolver: isAnswerExtractionEnabled }, + { + flag: APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_FLAG, + resolver: isContradictionDetectionEnabled, + }, + { flag: APP_QUESTIONNAIRES_ANSWER_REFINEMENT_FLAG, resolver: isAnswerRefinementEnabled }, + { flag: APP_QUESTIONNAIRES_COMPLETION_FLAG, resolver: isCompletionEnabled }, + { flag: APP_QUESTIONNAIRES_DESIGN_EVALUATION_FLAG, resolver: isDesignEvaluationEnabled }, + { flag: APP_QUESTIONNAIRES_VOICE_INPUT_FLAG, resolver: isVoiceInputEnabled }, + { flag: APP_QUESTIONNAIRES_ATTACHMENT_INPUT_FLAG, resolver: isAttachmentInputEnabled }, + { flag: APP_QUESTIONNAIRES_COST_CAP_FLAG, resolver: isCostCapEnforcementEnabled }, + ]; - it('returns false when the flag is disabled', async () => { - mockedIsFeatureEnabled.mockResolvedValue(false); + for (const { flag, resolver } of INDEPENDENT_PAIRS) { + it(`${flag} off → that resolver false, every sibling still true`, async () => { + const flags = allOn(); + flags[flag] = false; + setFlags(flags); - await expect(isQuestionnairesEnabled()).resolves.toBe(false); + await expect(resolver()).resolves.toBe(false); + + // Every OTHER sub-flag resolver whose required flags are all still on stays true. + for (const sibling of SUB_FLAG_RESOLVERS) { + if (sibling.requires.includes(flag)) continue; + await expect(sibling.fn(), `${sibling.name} should be unaffected`).resolves.toBe(true); + } }); + } + + it('adaptive degrades independently of extraction (both are master-only children)', async () => { + // Concrete independence example: adaptive off, extraction on.
+ setFlags({ + [APP_QUESTIONNAIRES_FLAG]: true, + [APP_QUESTIONNAIRES_ADAPTIVE_FLAG]: false, + [APP_QUESTIONNAIRES_ANSWER_EXTRACTION_FLAG]: true, + }); + await expect(isAdaptiveSelectionEnabled()).resolves.toBe(false); + await expect(isAnswerExtractionEnabled()).resolves.toBe(true); }); +}); - describe('ensureQuestionnairesEnabled', () => { - it('returns null (no gate) when the app is enabled', async () => { - mockedIsFeatureEnabled.mockResolvedValue(true); +describe('live-sessions cascade — turning the parent off closes the live-dependent trio', () => { + it('live-sessions off ⇒ voice, attachment, and cost-cap all false even with their sub-flags on', async () => { + const flags = allOn(); + flags[APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG] = false; + setFlags(flags); + + await expect(isLiveSessionsEnabled()).resolves.toBe(false); + await expect(isVoiceInputEnabled()).resolves.toBe(false); + await expect(isAttachmentInputEnabled()).resolves.toBe(false); + await expect(isCostCapEnforcementEnabled()).resolves.toBe(false); + }); + + it('master off ⇒ every resolver false (transitive close)', async () => { + const flags = allOn(); + flags[APP_QUESTIONNAIRES_FLAG] = false; + setFlags(flags); + + await expect(isQuestionnairesEnabled()).resolves.toBe(false); + for (const { name, fn } of SUB_FLAG_RESOLVERS) { + // Label the assertion so a single regressing resolver is named rather than + // hidden behind whichever one the sequential loop reaches first. + await expect(fn(), `${name} should be false when master is off`).resolves.toBe(false); + } + }); +}); +/** + * Route-level gates: the `ensure*` wrappers a route calls first (before auth) so a disabled + * surface 404s rather than 401s. Per-route gating is additionally covered by each route's own + * `route.test.ts`; these pin the shared gate helpers' contract. 
+ */ +describe('route gates — ensure* return a 404 envelope when off, null when on', () => { + async function expect404(res: Response | null): Promise<void> { + expect(res).not.toBeNull(); + expect(res).toBeInstanceOf(Response); + expect(res?.status).toBe(404); + const body = await res?.json(); + expect(body).toEqual({ success: false, error: { message: 'Not found', code: 'NOT_FOUND' } }); + } + + describe('ensureQuestionnairesEnabled', () => { + it('returns null (no gate) when the master flag is on', async () => { + setFlags({ [APP_QUESTIONNAIRES_FLAG]: true }); await expect(ensureQuestionnairesEnabled()).resolves.toBeNull(); }); + it('returns a 404 NOT_FOUND envelope when the master flag is off', async () => { + setFlags({ [APP_QUESTIONNAIRES_FLAG]: false }); + await expect404(await ensureQuestionnairesEnabled()); + }); + }); + + describe('ensureLiveSessionsEnabled', () => { + it('returns null when master + live-sessions are on', async () => { + setFlags({ + [APP_QUESTIONNAIRES_FLAG]: true, + [APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG]: true, + }); + await expect(ensureLiveSessionsEnabled()).resolves.toBeNull(); + }); + it('404s when live-sessions is off even though master is on', async () => { + setFlags({ + [APP_QUESTIONNAIRES_FLAG]: true, + [APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG]: false, + }); + await expect404(await ensureLiveSessionsEnabled()); + }); + }); + + describe('ensureVoiceInputEnabled', () => { + it('returns null when master + live-sessions + voice are on', async () => { + setFlags({ + [APP_QUESTIONNAIRES_FLAG]: true, + [APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG]: true, + [APP_QUESTIONNAIRES_VOICE_INPUT_FLAG]: true, + }); + await expect(ensureVoiceInputEnabled()).resolves.toBeNull(); + }); + it('404s when the voice sub-flag is off even though master + live-sessions are on', async () => { + setFlags({ + [APP_QUESTIONNAIRES_FLAG]: true, + [APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG]: true, + [APP_QUESTIONNAIRES_VOICE_INPUT_FLAG]: false, + }); + await expect404(await
ensureVoiceInputEnabled()); + }); + it('404s when live-sessions is off even though master + voice are on', async () => { + // Voice is a three-way AND (master + live-sessions + voice); turning the live-sessions + // parent off must close the gate too, not just the voice sub-flag. + setFlags({ + [APP_QUESTIONNAIRES_FLAG]: true, + [APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG]: false, + [APP_QUESTIONNAIRES_VOICE_INPUT_FLAG]: true, + }); + await expect404(await ensureVoiceInputEnabled()); + }); + }); +}); + +/** + * The `with*Enabled` HOC wrappers compose the flag gate with a route handler so the gate runs + * **before** anything else (auth, handler work) — the ordering that makes a disabled surface + * look like a missing route (404) rather than a 401. Each wrapper must (a) short-circuit to the + * gate's 404 Response without ever calling the handler when the flag is off, and (b) call the + * handler with the original `(request, context)` and forward its Response when the flag is on. + * These pin both arms; per-route wiring is additionally covered by each route's own test. + */ +describe('with* gate wrappers — run the flag gate before the handler', () => { + const request = new NextRequest('http://localhost:3000/api/v1/app/test'); + const context = { params: Promise.resolve({}) }; - it('returns a 404 NOT_FOUND envelope when the app is disabled', async () => { - mockedIsFeatureEnabled.mockResolvedValue(false); + type GateWrapper = <C>( + handler: (request: NextRequest, context: C) => Promise<Response> + ) => (request: NextRequest, context: C) => Promise<Response>; - const res = await ensureQuestionnairesEnabled(); + const WRAPPERS: ReadonlyArray<{ + name: string; + wrap: GateWrapper; + // Flags that must ALL be on for the gate to allow the handler through. + enableFlags: readonly string[]; + // The flag to turn off (others on) to prove the gate blocks before the handler.
+ blockFlag: string; + }> = [ + { + name: 'withQuestionnairesEnabled', + wrap: withQuestionnairesEnabled, + enableFlags: [APP_QUESTIONNAIRES_FLAG], + blockFlag: APP_QUESTIONNAIRES_FLAG, + }, + { + name: 'withLiveSessionsEnabled', + wrap: withLiveSessionsEnabled, + enableFlags: [APP_QUESTIONNAIRES_FLAG, APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG], + blockFlag: APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG, + }, + { + name: 'withVoiceInputEnabled', + wrap: withVoiceInputEnabled, + enableFlags: [ + APP_QUESTIONNAIRES_FLAG, + APP_QUESTIONNAIRES_LIVE_SESSIONS_FLAG, + APP_QUESTIONNAIRES_VOICE_INPUT_FLAG, + ], + blockFlag: APP_QUESTIONNAIRES_VOICE_INPUT_FLAG, + }, + ]; - expect(res).not.toBeNull(); - expect(res).toBeInstanceOf(Response); - expect(res?.status).toBe(404); + for (const { name, wrap, enableFlags, blockFlag } of WRAPPERS) { + describe(name, () => { + it('calls the handler with the original request + context and forwards its Response when enabled', async () => { + setFlags(Object.fromEntries(enableFlags.map((f) => [f, true]))); + const handlerResponse = new Response('ok'); + const handler = vi.fn( + async (_request: NextRequest, _context: { params: Promise> }) => + handlerResponse + ); - const body = await res?.json(); - expect(body).toEqual({ - success: false, - error: { message: 'Not found', code: 'NOT_FOUND' }, + const result = await wrap(handler)(request, context); + + expect(handler).toHaveBeenCalledTimes(1); + expect(handler).toHaveBeenCalledWith(request, context); + expect(result).toBe(handlerResponse); + }); + + it('short-circuits to a 404 and never calls the handler when the gate flag is off', async () => { + const flags = Object.fromEntries(enableFlags.map((f) => [f, true])); + flags[blockFlag] = false; + setFlags(flags); + const handler = vi.fn( + async (_request: NextRequest, _context: { params: Promise> }) => + new Response('ok') + ); + + const result = await wrap(handler)(request, context); + + expect(handler).not.toHaveBeenCalled(); + 
expect(result.status).toBe(404); }); }); - }); + } });