Merged
4 changes: 2 additions & 2 deletions .context/app/planning/development-plan.md
Original file line number Diff line number Diff line change
@@ -667,13 +667,13 @@ _Indicative tasks:_

### F8.3 — Anonymous-mode hardening

-_Status:_ not started · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ F8.1, F8.2 + any surface that touches session data
+_Status:_ in flight ([tracker](./features/f8.3.md)) · _Size:_ ~1–2 PRs · _Owner:_ TBD · _Deps:_ F8.1, F8.2 + any surface that touches session data

Verification pass across every surface that touches session data, ensuring no PII leak when `anonymousMode = true`. Flag-gating tightened where needed.

_Indicative tasks:_

-- Audit every read path that touches `AppQuestionnaireUserProfile` for anonymous-mode gating.
+- Audit every read path that touches `AppRespondentProfileSnapshot` for anonymous-mode gating.
- Audit exports + analytics + admin UI.
- Integration tests that flip the flag and assert PII absence on every surface.
- Documentation of the anonymous-mode contract.
104 changes: 104 additions & 0 deletions .context/app/planning/features/f8.3.md
@@ -0,0 +1,104 @@
---
feature: F8.3
title: Anonymous-mode hardening (+ respondent profile collection)
phase: P8 — Admin analytics, exports, anonymous mode
status: in flight
owner: TBD
deps: F8.1 (analytics surfaces), F8.2 (result exports), F7.4 (PDF export), F6.1 (sessions)
opened: 2026-06-09
plan: .context/app/planning/development-plan.md#f83--anonymous-mode-hardening
docs: .context/app/questionnaire/anonymous-mode.md
---

# F8.3 — Anonymous-mode hardening (+ respondent profile collection)

> Committable tracker for **F8.3**. The cross-surface PII pass: guarantee no respondent
> identity leaks when a version's `anonymousMode = true`, across exports, analytics, and
> the admin UI. Pulls in the previously-unbuilt **respondent profile collection** (model +
> form + capture) so the snapshot is the thing anonymous mode must gate. Gated by
> `APP_QUESTIONNAIRES_ENABLED`.

## Intent

`anonymousMode` had, until now, meant "open / no-invitation" — and the authenticated
anonymous-direct path still bound `respondentUserId` ("the admin-side identity redaction is
a later phase", per the F6.1 seam comment). F8.3 is that phase. It makes the flag a real
PII contract honoured at every data boundary, and adds the profile-collection capability
(net-new — no form, capture, or storage existed) so there's a respondent profile for the
contract to gate.

## Decisions (confirmed with the user)

- **Full scope, including the form.** Build the `AppRespondentProfileSnapshot` model, the
capture seam, the read-path redaction, AND the respondent-facing profile form — not just
the server contract.
- **D1 — no row, not an empty row.** An anonymous session writes **no** snapshot at all.
Absence is the strongest, most testable invariant (a test asserts `create` was never
called). Read paths additionally null the profile when anonymous (defence in depth).
- **Form only on the non-anonymous surface.** The profile form renders only for a
non-anonymous version with profile fields (`loadStartContext` gate); the anonymous /
no-login flow never shows it. So `anonymousMode = true` ⇒ no PII collected at all.
- **Analytics: guard + k-anonymity.** `K_ANONYMITY_THRESHOLD = 5`. Below that many
non-preview sessions, granular detail is withheld (distributions detail, funnel counts,
cost top-sessions). Aggregate spend is always returned (no identity). Cost additionally
drops the per-session table whenever anonymous (session ids re-identify).
- **Modelled `User` FK with `onDelete: Cascade`.** The snapshot is PII, so it breaks the
deferred-UG-1 plain-scalar posture and cascades natively — `eraseUser()` needs no hook.

## Build shape (branch `feat/F8.3-anonymous-mode-hardening`)

- **Privacy constants** — `lib/app/questionnaire/analytics/privacy.ts`:
`K_ANONYMITY_THRESHOLD = 5`, `isCohortSuppressed(n)`. Pure / client-safe so admin panels
import the threshold for labels. Re-exported from `analytics/index.ts`.
- **Model + migration** — `AppRespondentProfileSnapshot` in `app-questionnaire.prisma`
(migration `20260609062611_app_respondent_profile_snapshot`, `--create-only` + phantom
pgvector DROPs stripped). Reverse relation on `AppQuestionnaireSession` and on `User`.
- **Profile validation** — `lib/app/questionnaire/profile/profile-values.ts`:
`validateProfileValues` (strict — rejects unknown keys, enforces required, coerces
types), `parseProfileFields`, `asProfileValues`. Reused by the form and the seam.
- **Capture seam** — `questionnaire-sessions/_lib/create.ts`: `createSessionFromInvitation`
threads `profileValues`, validates, and writes the snapshot inside the create transaction
— only when non-anonymous and values present. The version-direct and no-login paths never
capture. Route (`questionnaire-sessions/route.ts`) accepts `profileValues` on the
invitation body.
- **Respondent form** — `components/app/questionnaire/profile/profile-start-form.tsx`
(react-hook-form + Zod + `FieldHelp`), gated onto the start page by
`lib/app/questionnaire/chat/start-context.ts` (`loadStartContext`).
- **Read-path redaction** — `profile` carried + gated in `export/results-loader.ts` (+
`results-types.ts`, `results-serialize.ts` `respondent_profile` column),
`questionnaire-sessions/_lib/session-export.ts`, `export/build-session-export-model.ts` (+
`types.ts`), rendered in `components/app/questionnaire/export/session-pdf-document.tsx`.
- **Analytics hardening** — `analytics/{cost,funnel,distributions}.ts` gain cohort
suppression + (cost) the anonymous guard; new `suppressed` / `topSessionsSuppressed`
result fields + `{ kind: 'suppressed' }` distribution variant in `views.ts`; admin panels
render the suppressed states.
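
The strict validation described under **Profile validation** can be sketched as below. This is a minimal illustration, not the code in `profile-values.ts`: the `ProfileField` shape, the `'text' | 'number'` field types, the result type, and the error strings are all assumptions (only `key`, `options`, and the `select` type are named in the docs).

```typescript
// Hypothetical field shape; only `key`, `options` and the 'select' type come from the docs.
type FieldType = 'text' | 'number' | 'select';

interface ProfileField {
  key: string;
  type: FieldType;
  required?: boolean;
  options?: string[];
}

type ValidationResult =
  | { ok: true; values: Record<string, string | number> }
  | { ok: false; errors: string[] };

// Strict: rejects unknown keys, enforces required fields, coerces types.
function validateProfileValues(
  fields: ProfileField[],
  input: Record<string, unknown>
): ValidationResult {
  const errors: string[] = [];
  const values: Record<string, string | number> = {};
  const knownKeys = new Set(fields.map((f) => f.key));

  // Any key not declared as a profile field is an error, never silently dropped.
  for (const key of Object.keys(input)) {
    if (!knownKeys.has(key)) errors.push(`unknown field: ${key}`);
  }

  for (const field of fields) {
    const raw = input[field.key];
    if (raw === undefined || raw === null || raw === '') {
      if (field.required) errors.push(`missing required field: ${field.key}`);
      continue;
    }
    if (field.type === 'number') {
      // Coerce numeric strings; blank strings and non-numeric input become NaN.
      const n =
        typeof raw === 'number'
          ? raw
          : typeof raw === 'string' && raw.trim() !== ''
            ? Number(raw)
            : NaN;
      if (Number.isFinite(n)) values[field.key] = n;
      else errors.push(`not a number: ${field.key}`);
    } else if (field.type === 'select') {
      const choice = String(raw);
      if ((field.options ?? []).includes(choice)) values[field.key] = choice;
      else errors.push(`invalid option: ${field.key}`);
    } else {
      values[field.key] = String(raw).trim();
    }
  }

  return errors.length > 0 ? { ok: false, errors } : { ok: true, values };
}
```

Returning a discriminated result rather than throwing lets the same function back both the form (which shows every error) and the capture seam (which rejects the request), which is the reuse the bullet above describes.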

## Anonymous-mode contract

The canonical statements of the invariant, the per-surface gate table, the snapshot rule,
k-anonymity, and the erasure cascade live in
[`../../questionnaire/anonymous-mode.md`](../../questionnaire/anonymous-mode.md).

## Tests

- `tests/integration/.../questionnaire-sessions/profile-snapshot.test.ts` — capture
invariants (the core: anonymous never writes a snapshot; invalid values are rejected and
empty values are skipped; no capture on resume).
- `tests/unit/lib/app/questionnaire/profile/profile-values.test.ts` — the validator.
- `tests/unit/lib/app/questionnaire/analytics/{cost,funnel,distributions}.test.ts` —
suppression + anonymous guard (existing cohorts bumped to ≥5; suppression tests added).
- `tests/unit/lib/app/questionnaire/export/*` + `session-export.test.ts` — profile
surfaced when not anonymous, dropped when anonymous.
- `tests/unit/prisma/app-questionnaire-schema.test.ts` — model shape + **both FKs
`ON DELETE CASCADE`** (the GDPR erasure contract).

## Erasure

`AppRespondentProfileSnapshot.user` is `onDelete: Cascade`, so `eraseUser()` removes the
snapshot via the native cascade — no cleanup hook. Noted in
[`../../../privacy/data-erasure.md`](../../../privacy/data-erasure.md).

## No CHANGELOG entry

App-owned models/routes are not part of the Sunrise platform surface — per the repo's
platform-scoped CHANGELOG policy, F8.3 adds no `CHANGELOG.md` bullet.
1 change: 1 addition & 0 deletions .context/app/questionnaire/README.md
@@ -27,6 +27,7 @@ plan and feature trackers, see [`../planning/`](../planning/); for the platform
| [`per-turn-orchestrator.md`](./per-turn-orchestrator.md) | The live streaming turn loop — pure orchestrator, SSE route, 3 access scenarios (incl. no-login anonymous), streamed offers (F6.1) |
| [`cost-cap-enforcement.md`](./cost-cap-enforcement.md) | Per-session USD budget at the turn boundary — soft wrap-up nudge at 90%, hard 402 + auto-pause at 100%, summed turn cost, dark-launch flag (F6.3) |
| [`answer-slot-panel.md`](./answer-slot-panel.md) | The live respondent answer panel beside the chat — `GET …/answers` read endpoint, scope config, confidence language, Revisit wiring (F7.2) |
| [`anonymous-mode.md`](./anonymous-mode.md) | The cross-surface PII contract — per-surface gates, the profile snapshot rule, k-anonymity suppression, erasure cascade (F8.3) |

## Where the code lives

73 changes: 73 additions & 0 deletions .context/app/questionnaire/anonymous-mode.md
@@ -0,0 +1,73 @@
# Anonymous mode — the PII contract (F8.3)

A version-level boolean, `AppQuestionnaireConfig.anonymousMode` (default `false`), governs
whether respondent identity may be **collected, persisted, or surfaced** for that version.
F8.3 hardens the guarantee across every surface that touches session data and adds the
respondent **profile snapshot** (collected only on the non-anonymous surface).

## The invariant

When `anonymousMode = true`, for that version:

- **No identity is persisted that links a session to a person.** Authenticated
anonymous-direct and no-login sessions bind `respondentUserId = null` / mint a signed
token; profile fields are never collected.
- **No identity reaches any admin read surface.** Respondent name, the profile snapshot,
and raw conversational turns are all dropped at the **data boundary** (the loader /
aggregator), not merely hidden in the UI.
- **Granular analytics that could re-identify a small cohort are withheld** (k-anonymity).

Anonymity is about not linking data to a person — it is **not** about redacting the survey
data itself. Structured answer _values_ are always exported (they're the point of the
export); what's withheld is identity, free-text prose, and small-cohort detail.

## Per-surface gates

| Surface | File | Behaviour when `anonymousMode = true` |
| ---------------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| Authed-direct session create | `questionnaire-sessions/_lib/create.ts` | `respondentUserId` bound, but **no profile snapshot** ever written |
| No-login session create | `questionnaire-sessions/_lib/create.ts` (`createAnonymousSession`) | `respondentUserId = null`; no profile |
| Profile capture | `create.ts` (`resolveProfileCapture`) | Skipped entirely — short-circuits on the anonymous flag |
| Single-session PDF | `questionnaire-sessions/_lib/session-export.ts` + `export/build-session-export-model.ts` | Identity query skipped; `respondent` and `profile` null |
| Bulk CSV/JSON export | `export/results-loader.ts` | Names skipped; `turns = []`; `profile = null` per session |
| Distributions analytics | `analytics/distributions.ts` | Identity-free by construction; small-cohort detail suppressed (below) |
| Funnel analytics | `analytics/funnel.ts` | Counts-only; small-cohort counts suppressed |
| Cost analytics | `analytics/cost.ts` | Per-session spend table dropped (session ids are a re-identification handle) |
| Invitations | `questionnaires/[id]/invitations/_lib/read.ts` | Orthogonal — invitations are the _invited_ (non-anonymous) surface; an anonymous version has none |
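
To illustrate what "dropped at the data boundary" means for the export rows in the table above, here is a minimal sketch. It is not the code in `results-loader.ts`: `RawSession`, `ExportRow`, `buildExportRow`, and the synchronous `fetchIdentity` callback are hypothetical names chosen for the example.

```typescript
// All types and names here are hypothetical stand-ins for the real loader.
interface Identity {
  name: string;
}

interface RawSession {
  respondentUserId: string | null;
  profile: Record<string, unknown> | null;
  turns: string[];
  answers: Record<string, unknown>;
}

interface ExportRow {
  respondent: Identity | null;
  profile: Record<string, unknown> | null;
  turns: string[];
  answers: Record<string, unknown>;
}

function buildExportRow(
  session: RawSession,
  anonymousMode: boolean,
  fetchIdentity: (userId: string) => Identity
): ExportRow {
  if (anonymousMode) {
    // Identity lookup is skipped entirely, not fetched and then hidden.
    // Structured answer values are still exported.
    return { respondent: null, profile: null, turns: [], answers: session.answers };
  }
  const respondent =
    session.respondentUserId !== null ? fetchIdentity(session.respondentUserId) : null;
  return {
    respondent,
    profile: session.profile,
    turns: session.turns,
    answers: session.answers,
  };
}
```

Because the gate sits in the loader, every consumer of the row (CSV, JSON, PDF) inherits the redaction without needing its own check.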

## The profile snapshot rule

`AppRespondentProfileSnapshot` (1:1 with a session) holds the `profileFields` values a
respondent supplied at session start. **Decision D1 — no row, not an empty row:** an
anonymous session writes **no** snapshot at all. Absence is the strongest, most testable
invariant — a test asserts `appRespondentProfileSnapshot.create` was never called, and
there is structurally no PII at rest. Read paths additionally null the profile when
anonymous, as defence in depth.

Capture happens only on the **invitation** surface (always non-anonymous), at session
create, inside the same transaction. The respondent form lives at
`components/app/questionnaire/profile/profile-start-form.tsx`, gated by
`loadStartContext` so it renders only for a non-anonymous version with profile fields.
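
A minimal sketch of decision D1, assuming a simplified signature for `resolveProfileCapture` (the real one in `create.ts` is not shown in this PR view). `SnapshotWriter` and `createSessionWithProfile` are hypothetical stand-ins for the transactional client and the create path.

```typescript
type ProfileValues = Record<string, string | number>;

interface CaptureInput {
  anonymousMode: boolean;
  profileValues?: ProfileValues;
}

// Assumed signature: returns the values to persist, or null for "write no row".
function resolveProfileCapture(input: CaptureInput): ProfileValues | null {
  if (input.anonymousMode) return null; // short-circuit on the anonymous flag
  const values = input.profileValues;
  if (values === undefined || Object.keys(values).length === 0) return null;
  return values;
}

// Hypothetical stand-in for the snapshot write inside the create transaction.
interface SnapshotWriter {
  create(args: { sessionId: string; values: ProfileValues }): void;
}

function createSessionWithProfile(
  sessionId: string,
  input: CaptureInput,
  writer: SnapshotWriter
): void {
  const capture = resolveProfileCapture(input);
  // No row, not an empty row: the writer is never touched when capture is null.
  if (capture !== null) writer.create({ sessionId, values: capture });
}
```

Modelling the outcome as "values or null" makes the invariant checkable with a plain call-count assertion on the writer, which is the style of test the section describes.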

## k-anonymity suppression

`K_ANONYMITY_THRESHOLD = 5` (`analytics/privacy.ts`, client-safe so admin panels can import the threshold for their labels).
Below this many non-preview sessions, granular analytics detail is withheld — a tiny
sample can re-identify an individual answer. Applied at the aggregator:

- **distributions** — per-question `detail` becomes `{ kind: 'suppressed' }`, counts zeroed,
result `suppressed: true`.
- **funnel** — all stage + anonymous counts zeroed, `suppressed: true`.
- **cost** — the top-spend-session table emptied (`topSessionsSuppressed: true`); aggregate
spend (total / by-capability / trend) carries no identity and is always returned.

An empty cohort (`0` sessions) is **not** "suppressed" — it genuinely has no data.
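
The rules above can be sketched as follows. `K_ANONYMITY_THRESHOLD` and `isCohortSuppressed` are the names from `analytics/privacy.ts`, but their bodies here are assumptions, including the choice to treat an empty cohort as unsuppressed inside the helper itself. `CostResult` and `guardCost` are hypothetical, and setting `topSessionsSuppressed` in the anonymous case is also an assumption.

```typescript
const K_ANONYMITY_THRESHOLD = 5;

// Assumed behaviour: a non-empty cohort below the threshold is suppressed;
// an empty cohort simply has no data and is not flagged.
function isCohortSuppressed(sessionCount: number): boolean {
  return sessionCount > 0 && sessionCount < K_ANONYMITY_THRESHOLD;
}

// Hypothetical result shape for the cost aggregator.
interface TopSession {
  sessionId: string;
  usd: number;
}

interface CostResult {
  totalUsd: number;
  topSessions: TopSession[];
  topSessionsSuppressed: boolean;
}

function guardCost(
  raw: { totalUsd: number; topSessions: TopSession[] },
  sessionCount: number,
  anonymousMode: boolean
): CostResult {
  // The per-session table is dropped for small cohorts and whenever anonymous,
  // since session ids are a re-identification handle.
  const suppress = anonymousMode || isCohortSuppressed(sessionCount);
  return {
    totalUsd: raw.totalUsd, // aggregate spend carries no identity
    topSessions: suppress ? [] : raw.topSessions,
    topSessionsSuppressed: suppress,
  };
}
```

Applying the guard in the aggregator rather than the panel means a suppressed cohort never sends per-session rows to the client at all.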

## Erasure

`AppRespondentProfileSnapshot` is the **first** questionnaire model with a modelled `User`
FK (the deferred-UG-1 "plain String, no relation" posture is deliberately broken because
this row IS personal data). Both FKs declare `onDelete: Cascade`: the session FK (owned
data) and the user FK (personal data). The user cascade means `eraseUser()`'s
`prisma.user.delete()` removes the snapshot natively — **no erasure hook needed**. See
[`../../privacy/data-erasure.md`](../../privacy/data-erasure.md).
5 changes: 5 additions & 0 deletions .context/app/questionnaire/configuration.md
@@ -59,6 +59,11 @@ needs no separate model (the same shape precedent as `audience` / `typeConfig`).
`key` is a unique lowercase slug; `options` is required (non-empty, distinct) for
`select` and forbidden for every other type.

The values a respondent supplies for these fields are collected at session start and
persisted to `AppRespondentProfileSnapshot` — **only on the non-anonymous surface**. When
`anonymousMode = true` no profile is collected, stored, or surfaced. See
[`anonymous-mode.md`](./anonymous-mode.md) for the full PII contract (F8.3).

## Lazy materialization

No config row exists until the admin first saves — this keeps the F1.1 ingest path
14 changes: 14 additions & 0 deletions .context/app/questionnaire/schema.md
@@ -271,4 +271,18 @@ questionSlotId)` (the upsert unique), with `value Json`, `provenanceLabel`,
Migration hand-stripped of the phantom pgvector DROPs (schema-shape test guards the
strip). See [`per-turn-orchestrator.md`](./per-turn-orchestrator.md).

### Respondent profile snapshot (F8.3 — P8)

- **`AppRespondentProfileSnapshot`** (F8.3, migration
`20260609062611_app_respondent_profile_snapshot`) — the `profileFields` values a
respondent supplied at session start, 1:1 with a session (`sessionId @unique`). `values
Json` (keyed by field `key`), `respondentUserId String?` denormalised from the session.
**The first questionnaire model with a modelled `User` FK** — the deferred-UG-1
"plain String, no `@relation`" posture is deliberately broken because this row IS personal
data and must cascade on erasure. Both FKs `onDelete: Cascade`: the session FK (owned
data) and the user FK (so `eraseUser()` removes it natively, no hook). **Never written for
an anonymous session** (no row, not an empty row). Migration hand-stripped of the phantom
pgvector DROPs (schema-shape test guards the strip + asserts both cascades). See
[`anonymous-mode.md`](./anonymous-mode.md).

_Later phases extend this file. Each documents its models here as it lands._
7 changes: 7 additions & 0 deletions .context/privacy/data-erasure.md
@@ -137,6 +137,13 @@ Two failure modes if the migration FK is wrong:

So the migration FK with an explicit `ON DELETE` is **mandatory**, not optional.

**ConQuest exception — a modelled `User` FK.** `AppRespondentProfileSnapshot` (F8.3,
respondent profile values, PII) deliberately breaks the plain-scalar pattern: it adds a
real `@relation` with a reverse field on `User`, FK `onDelete: Cascade`. Because it
cascades natively, `eraseUser()` removes it with no cleanup hook. It is the one app table
that _is_ caught by the schema-level `@relation onDelete` review. See
[`../app/questionnaire/anonymous-mode.md`](../app/questionnaire/anonymous-mode.md).

### What the FK cascade can't do — register a cleanup hook

A `CASCADE` FK is erased automatically by `prisma.user.delete()`. But, exactly as
16 changes: 16 additions & 0 deletions app/(protected)/questionnaires/start/page.tsx
@@ -10,6 +10,8 @@ import {
  createOrResumeAuthedSession,
  type AuthedSessionRequest,
} from '@/lib/app/questionnaire/chat/session-bootstrap';
import { loadStartContext } from '@/lib/app/questionnaire/chat/start-context';
import { ProfileStartForm } from '@/components/app/questionnaire/profile/profile-start-form';

export const metadata: Metadata = {
  title: 'Start questionnaire',
@@ -53,6 +55,20 @@ export default async function StartQuestionnairePage({
      ? `?invitationToken=${encodeURIComponent(sp.invitationToken)}`
      : `?versionId=${encodeURIComponent(sp.versionId ?? '')}`;
    clearInvalidSession(`/questionnaires/start${query}`);
    return null; // unreachable — clearInvalidSession redirects
  }

  // F8.3: a non-anonymous questionnaire with profile fields collects them BEFORE the
  // session is created. The form posts the values back into the create route (which
  // writes the snapshot atomically); a resumable session skips straight to the chat.
  const context = await loadStartContext(request, session.user.id);
  if (context.kind === 'resume') {
    redirect(`/questionnaires/${context.sessionId}`);
  }
  if (context.kind === 'needs-profile' && 'invitationToken' in request) {
    return (
      <ProfileStartForm invitationToken={request.invitationToken} fields={context.profileFields} />
    );
  }

  const result = await createOrResumeAuthedSession(request);