Merged
2 changes: 2 additions & 0 deletions .env.example
@@ -33,4 +33,6 @@ CORS_ALLOWED_ORIGINS=http://localhost:5173,https://who-goes-to-try.hackathon.sev

# Frontend origin used to build the verification URI returned by POST /api/v1/auth/pairings.
# The extension opens FRONTEND_URL/extension/pair?code=<user_code> in the browser.
# Defaults in code to https://who-goes-to-try.hackathon.sev-2.com — this variable is optional.
# Set it only to override the default (e.g., point the extension at a local frontend during dev).
FRONTEND_URL=http://localhost:5173
16 changes: 16 additions & 0 deletions docs/ai-insights.md
@@ -79,3 +79,19 @@ All require JWT or `dvf_` API token auth. The action endpoint also verifies owne
- **Extension polls every 60 s.** Worst case the popup is delayed by one poll interval. Acceptable for "consider a break" UX.

Cross-reference: this pipeline reads from `metrics_daily` and `metrics_session` populated by the [metrics ETL](metrics.md).

## Manual trigger (demo / debugging escape hatch)

`POST /api/v1/recommendations/trigger` lets a signed-in user force the insight pipeline to run immediately, instead of waiting on the 10-minute scheduler tick. **This is an escape hatch for demos and live debugging — not a production feature.** A logged-in user can produce a popup on demand by spamming the `demo` mode; this is acceptable per-user but not something to advertise to end users.

Body: `{ "mode": "real" | "force" | "demo" }`. Mode defaults to `"real"`.

| Mode | Behaviour |
| --- | --- |
| `real` | Invokes `evaluateUser` with all gates intact (cooldown, rules, Gemini). Returns `{ skipped, reason }` or `{ skipped: false, rule, state_type, recommendation_id }`. |
| `force` | Marks the user's latest pending recommendation as `expired`, then runs `evaluateUser`. Cooldown is bypassed; the rule + LLM gates still apply. |
| `demo` | Fabricates a canned `WorkflowState` (`state_type = 'demo'`) + `Recommendation`. No Gemini call. Returns HTTP 409 if the user has no active session. |
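
The three rows above map onto a small dispatcher. This sketch injects the service helpers so it runs in isolation. `evaluateUser`, `forceEvaluateUser`, and `createDemoRecommendation` are the names the proposal uses, but their signatures here (including `createDemoRecommendation` resolving to `null` when the user has no active session) are assumptions, as is `dispatchTrigger` itself.

```typescript
type TriggerMode = "real" | "force" | "demo";

// Assumed result shape, following the Behaviour column above.
type EvaluateResult =
  | { skipped: true; reason: string }
  | { skipped: false; rule: string; state_type: string; recommendation_id: number };

interface TriggerDeps {
  evaluateUser(userId: number): Promise<EvaluateResult>;
  forceEvaluateUser(userId: number): Promise<EvaluateResult>;
  // Assumed to resolve to null when the user has no active session.
  createDemoRecommendation(userId: number): Promise<{ recommendation_id: number } | null>;
}

interface HttpReply {
  status: number;
  body: Record<string, unknown>;
}

// Hypothetical dispatcher: one endpoint, one switch on the validated mode.
async function dispatchTrigger(userId: number, mode: TriggerMode, deps: TriggerDeps): Promise<HttpReply> {
  switch (mode) {
    case "real":
      // All gates intact: cooldown, rules, Gemini.
      return { status: 200, body: { mode, ...(await deps.evaluateUser(userId)) } };
    case "force":
      // Latest pending row is expired first; rules and LLM still apply.
      return { status: 200, body: { mode, ...(await deps.forceEvaluateUser(userId)) } };
    case "demo": {
      const created = await deps.createDemoRecommendation(userId);
      if (created === null) {
        return { status: 409, body: { error: "No active session" } };
      }
      return {
        status: 200,
        body: { mode, skipped: false, state_type: "demo", recommendation_id: created.recommendation_id },
      };
    }
    default: {
      // Compile-time exhaustiveness check over TriggerMode.
      const unreachable: never = mode;
      throw new Error(`Unknown mode: ${String(unreachable)}`);
    }
  }
}
```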

Every successful call emits a single `logger.info('recommendation-trigger', { user_id, mode, outcome })` line.

In the VS Code extension, **DevVital AI: Trigger Insight** (command palette) opens a quick-pick of the three modes. After POSTing, the extension immediately polls `/recommendations/pending` so the resulting popup surfaces within a second or two.
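
The command's flow (pick a mode, POST, then poll at once) can be sketched with its VS Code dependencies injected, which keeps the ordering visible and testable outside the editor. In the extension, `pick` would wrap `vscode.window.showQuickPick` and `poll` would call `recommendationService.pollAndNotify()`; `runTriggerInsight` and the `TriggerInsightDeps` shape are hypothetical.

```typescript
type TriggerMode = "real" | "force" | "demo";

interface TriggerInsightDeps {
  // Resolves to undefined when the user dismisses the quick-pick.
  pick(items: TriggerMode[]): Promise<TriggerMode | undefined>;
  // POSTs { mode } to /recommendations/trigger and resolves to the HTTP status.
  post(mode: TriggerMode): Promise<number>;
  // Fetches /recommendations/pending and shows any new popup.
  poll(): Promise<void>;
  // Writes a line to the output channel.
  warn(message: string): void;
}

async function runTriggerInsight(deps: TriggerInsightDeps): Promise<void> {
  const mode = await deps.pick(["real", "force", "demo"]);
  if (mode === undefined) return; // quick-pick dismissed: do nothing

  const status = await deps.post(mode);
  if (status < 200 || status >= 300) {
    deps.warn(`Trigger Insight (${mode}) failed with HTTP ${status}`);
    return;
  }
  // Poll right away instead of waiting up to 60 s for the next timer tick.
  await deps.poll();
}
```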
20 changes: 19 additions & 1 deletion docs/extension-pairing.md
@@ -57,6 +57,24 @@ No authentication. Rate-limited to one request per second per `pairing_id`.

## Configuration

The backend builds `verification_uri` from `FRONTEND_URL`. In application code this defaults to `https://who-goes-to-try.hackathon.sev-2.com`, so production pods do not need to inject the variable to get the right pairing URL. Set `FRONTEND_URL` only to override the default, typically with `http://localhost:5173` for local development.

```env
# Local development override
FRONTEND_URL=http://localhost:5173
```
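
A sketch of how the pairing URL could be derived from the optional variable. `resolveFrontendUrl` and `buildVerificationUri` are hypothetical helpers; treating a blank value as unset and trimming trailing slashes are assumptions, while the default origin and the `/extension/pair?code=<user_code>` shape come from this change.

```typescript
// Default origin from this change; FRONTEND_URL overrides it.
const DEFAULT_FRONTEND_URL = "https://who-goes-to-try.hackathon.sev-2.com";

// `env` stands in for process.env so the sketch runs without Node typings.
function resolveFrontendUrl(env: Record<string, string | undefined>): string {
  const raw = env.FRONTEND_URL?.trim();
  const origin = raw ? raw : DEFAULT_FRONTEND_URL; // blank counts as unset
  return origin.replace(/\/+$/, ""); // drop trailing slashes before appending a path
}

// Without a code: the bare pairing page. With a code: the URL the extension opens.
function buildVerificationUri(env: Record<string, string | undefined>, userCode?: string): string {
  const base = `${resolveFrontendUrl(env)}/extension/pair`;
  return userCode === undefined ? base : `${base}?code=${encodeURIComponent(userCode)}`;
}
```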

After rollout, smoke-test the public pairing URL:

```bash
curl -s -X POST https://who-goes-to-try.hackathon.sev-2.com/api/v1/auth/pairings \
| jq -r '.verification_uri'
```

Expected output:

```text
https://who-goes-to-try.hackathon.sev-2.com/extension/pair
```

Expired rows are pruned every 5 minutes (`pairing.service.js#cleanupExpired`); the cleanup removes rows whose `expires_at` is more than one hour in the past so a slow extension still finds its row.
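
The retention rule reduces to one comparison: a row is prunable once `expires_at` is more than an hour before the current time. `isPrunable` and `PRUNE_GRACE_MS` are hypothetical names; whatever form the real `cleanupExpired` query takes, it should implement the same cutoff.

```typescript
// Grace period after expiry, so a slow extension still finds its row.
const PRUNE_GRACE_MS = 60 * 60 * 1000;

// True when the pairing expired more than one hour before `now`.
function isPrunable(expiresAt: Date, now: Date): boolean {
  return expiresAt.getTime() < now.getTime() - PRUNE_GRACE_MS;
}
```

Because the job runs every 5 minutes, an expired pairing can linger for up to about 65 minutes (the one-hour grace period plus one cleanup interval) before it disappears.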
3 changes: 3 additions & 0 deletions k8s/deployment.yaml
@@ -23,6 +23,9 @@ spec:
          imagePullPolicy: Always
          ports:
            - containerPort: 3000
          env:
            - name: FRONTEND_URL
              value: "https://who-goes-to-try.hackathon.sev-2.com"
          envFrom:
            - secretRef:
                name: who-goes-to-try-backend-secret
5 changes: 4 additions & 1 deletion k8s/ingress.yaml
@@ -5,9 +5,12 @@ metadata:
  namespace: who-goes-to-try
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    traefik.ingress.kubernetes.io/router.middlewares: who-goes-to-try-strip-api@kubernetescrd
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - who-goes-to-try.hackathon.sev-2.com
      secretName: who-goes-to-try-tls-cert
  rules:
    - host: who-goes-to-try.hackathon.sev-2.com
      http:
43 changes: 43 additions & 0 deletions openspec.yaml
@@ -958,3 +958,46 @@ paths:
          description: Recommendation not found
        '409':
          description: Recommendation already acted on

  /recommendations/trigger:
    post:
      summary: Manually trigger an insight evaluation (demo / debugging escape hatch)
      operationId: triggerRecommendation
      security:
        - bearerAuth: []
        - cookieAuth: []
      requestBody:
        required: false
        content:
          application/json:
            schema:
              type: object
              properties:
                mode:
                  type: string
                  enum: [real, force, demo]
                  description: |
                    real: invoke evaluateUser with full gates (cooldown, rules, LLM).
                    force: expire latest pending recommendation, then evaluateUser.
                    demo: fabricate a canned recommendation; no LLM call.
                    Defaults to "real" when omitted.
      responses:
        '200':
          description: Trigger processed. Body shape varies by mode and outcome.
          content:
            application/json:
              schema:
                type: object
                properties:
                  skipped: { type: boolean }
                  reason: { type: string }
                  mode: { type: string }
                  rule: { type: string }
                  state_type: { type: string }
                  recommendation_id: { type: integer }
        '400':
          description: Validation failed (invalid mode)
        '401':
          description: Unauthorized
        '409':
          description: No active session (demo mode only)
2 changes: 2 additions & 0 deletions openspec/changes/add-demo-insight-trigger/.openspec.yaml
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-13
94 changes: 94 additions & 0 deletions openspec/changes/add-demo-insight-trigger/design.md
@@ -0,0 +1,94 @@
## Context

The recommendation system in [insight-scheduler.js](src/services/insight-scheduler.js) runs `evaluateUser` on a configurable interval (default 10 min) for every active user. `evaluateUser` only produces a recommendation when Gemini is configured, the cooldown has elapsed, a current session exists, at least one of four rules fires on `metrics_daily`, and the LLM returns a non-`normal` `state_type`. If any one of these fails, no recommendation is created and no popup appears. For a hackathon demo this means the audience may see nothing for the whole 10-minute window.
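
The gates read naturally as a short-circuiting chain where the first failure becomes the `reason` in a `{ skipped, reason }` result, and the LLM is only consulted once every cheaper gate has passed. This is a sketch under assumptions: the `GateContext` fields, the reason strings, and the `runGates` name are illustrative rather than taken from `insight-trigger.service.js`.

```typescript
interface GateContext {
  geminiConfigured: boolean;
  lastRecommendationAt: Date | null; // most recent recommendation for this user
  cooldownMs: number;
  now: Date;
  hasCurrentSession: boolean;
  firedRule: string | null; // first of the four rules that matched, if any
  classify(rule: string): Promise<string>; // LLM call returning a state_type
}

type GateResult =
  | { skipped: true; reason: string }
  | { skipped: false; rule: string; state_type: string };

async function runGates(ctx: GateContext): Promise<GateResult> {
  if (!ctx.geminiConfigured) return { skipped: true, reason: "gemini_not_configured" };

  const last = ctx.lastRecommendationAt;
  if (last !== null && ctx.now.getTime() - last.getTime() < ctx.cooldownMs) {
    return { skipped: true, reason: "cooldown" };
  }
  if (!ctx.hasCurrentSession) return { skipped: true, reason: "no_session" };

  const rule = ctx.firedRule;
  if (rule === null) return { skipped: true, reason: "no_rule_fired" };

  // Only now is the (billed) LLM call made.
  const stateType = await ctx.classify(rule);
  if (stateType === "normal") return { skipped: true, reason: "llm_normal" };
  return { skipped: false, rule, state_type: stateType };
}
```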

Today the extension [`RecommendationService`](../devFlowExtension/src/services/recommendationService.ts) polls `GET /recommendations/pending` every 60s and surfaces any row whose `user_action IS NULL` as a `vscode.window.showInformationMessage` toast. The plumbing is healthy; the input — `Recommendation` rows being created — is what's intermittent.

The proposed `POST /recommendations/trigger` endpoint plus the **DevVital AI: Trigger Insight** command give a presenter a button to force the popup to appear on demand, with graceful degradation: `real` is honest, `force` is "I know you should fire," `demo` is "the wifi is bad, but the slide must go on."

Constraints:
- No database schema migration. Reuse `workflow_states` and `recommendations`.
- No production hardening. Authentication (JWT or `dvf_` token) is the only gate.
- Bundle entirely inside this one change — no follow-up PRs needed for the demo to work.
- Do not interfere with the scheduler. The scheduler keeps running unchanged.

## Goals / Non-Goals

**Goals:**
- A presenter can produce a popup within a few seconds at any point during the demo.
- The "honest" path is preferred: `real` shows what users actually see; `force` shows "imagine the cooldown isn't blocking us"; `demo` is the last-resort fallback.
- Implementation is small enough to ship the same day.

**Non-Goals:**
- A general-purpose admin tool for manipulating recommendations.
- A way to *suppress* popups (we have cooldown for that).
- Rate limiting / abuse prevention. A logged-in user spamming `demo` mode only floods their own history.
- Telemetry on how often the demo trigger is used.
- Frontend UI to display "this recommendation was triggered manually." Doesn't matter for the demo, and adds scope.
- Replacing the four-rule engine or the LLM prompt. That's `llm-driven-insight-trigger`, a separate change.

## Decisions

### Decision 1: Single endpoint with `mode` field, not three separate endpoints

Why: All three modes share the same auth, the same response shape, and the same logging. Splitting them would turn one POST route into three URLs that every client has to know about. One endpoint with a discriminated body is the standard REST shape for "do one of three related things."

Alternative considered: `POST /recommendations/trigger`, `POST /recommendations/trigger/force`, `POST /recommendations/trigger/demo`. Rejected — more route surface, more docs, no upside.
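
With a single endpoint, the controller's first job is to reduce the body to one validated `mode`. A minimal sketch, assuming the body arrives already JSON-parsed; `parseTriggerMode` and `TriggerValidationError` are hypothetical names, and in the real backend the Ajv validation registered from `openspec.yaml` may reject a bad `mode` before any such code runs.

```typescript
// Mirrors the enum in openspec.yaml.
const TRIGGER_MODES = ["real", "force", "demo"] as const;
type TriggerMode = (typeof TRIGGER_MODES)[number];

// Anything the endpoint should answer with HTTP 400.
class TriggerValidationError extends Error {
  readonly status = 400;
}

function isTriggerMode(value: unknown): value is TriggerMode {
  return typeof value === "string" && (TRIGGER_MODES as readonly string[]).includes(value);
}

// The body is optional and so is `mode`; both fall back to "real".
function parseTriggerMode(body: unknown): TriggerMode {
  if (body === undefined || body === null) return "real";
  if (typeof body !== "object" || Array.isArray(body)) {
    throw new TriggerValidationError("Request body must be a JSON object");
  }
  const mode = (body as { mode?: unknown }).mode;
  if (mode === undefined) return "real";
  if (!isTriggerMode(mode)) {
    throw new TriggerValidationError(`Invalid mode: ${String(mode)}`);
  }
  return mode;
}
```

Treating a missing body and a missing `mode` the same way matches `requestBody.required: false` in the OpenAPI entry.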

### Decision 2: `force` does NOT skip the no-rule-fired gate

Why: The point of `force` is to bypass *cooldown*, not to bypass the rules. If we made force always recommend, it would duplicate `demo` and confuse the demo story. Force is "imagine we weren't in cooldown — would a real recommendation fire right now?" If the answer is "no, no rule fires," that's an honest demo signal: the system is working, just doesn't see a reason to interrupt.

Alternative considered: `force` also bypasses the rules and always invokes the LLM. Rejected — see above.

### Decision 3: `demo` mode creates a real `WorkflowState` row

Why: The recommendations controller's `getPending` JOINs `recommendations → workflow_states → sessions`, so a `Recommendation` without a `WorkflowState` that points at a `Session` won't surface. Demo mode therefore creates a fresh `WorkflowState` on the user's current session (which is why it returns 409 when there is none) and attaches the `Recommendation` to it. Using `state_type: 'demo'` makes these rows instantly recognisable as fake when grepping the DB later.

Alternative considered: SQL-insert the recommendation directly with `workflow_state_id` pointing to an existing row. Rejected — coupling to whatever happens to be the latest workflow state is fragile; just create a fresh one.

### Decision 4: Canned demo text is hardcoded, not configurable

Why: Configuration cost vs demo value is wildly imbalanced. A presenter can do one demo with one message. If they need a different message later, edit the constant.

The canned message is `"You've been heads-down for a while. Consider stepping away for 5 minutes — your next bug is probably hiding behind a clear head."` It stays within the existing 240-character Recommendation text limit, is written in second person, offers one concrete action, and matches the tone of LLM-generated messages.
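
Decisions 3 and 4, together with the reasoning string from the Risks section, suggest a demo helper that writes one `WorkflowState` and one `Recommendation` attached to the user's current session. This sketch only builds the two row payloads; `buildDemoRows` is hypothetical (the proposal calls the real helper `createDemoRecommendation`), and the `session_id` and `text` column names are assumptions.

```typescript
// Canned text from Decision 4, copied verbatim.
const DEMO_MESSAGE =
  "You've been heads-down for a while. Consider stepping away for 5 minutes — your next bug is probably hiding behind a clear head.";
const MAX_RECOMMENDATION_CHARS = 240;
// Hardcoded reasoning from the Risks section, so demo rows explain themselves in the DB.
const DEMO_REASONING = "Manually triggered demo recommendation; no Gemini call was made.";

interface DemoRows {
  workflowState: { session_id: number; state_type: "demo" };
  recommendation: { text: string; code_context: { reasoning: string } };
}

// Returns null when there is no current session; the caller answers HTTP 409.
function buildDemoRows(currentSessionId: number | null): DemoRows | null {
  if (currentSessionId === null) return null;
  if (DEMO_MESSAGE.length > MAX_RECOMMENDATION_CHARS) {
    throw new Error("Canned demo message exceeds the Recommendation text limit");
  }
  return {
    workflowState: { session_id: currentSessionId, state_type: "demo" },
    recommendation: { text: DEMO_MESSAGE, code_context: { reasoning: DEMO_REASONING } },
  };
}
```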

### Decision 5: Force mode "expires" the latest pending recommendation, not all of them

Why: The cooldown logic in `getLatestRecommendationForUser` looks at the user's single most-recent recommendation, regardless of `user_action`. So we only need to flip the one most-recent row to `expired` to make the cooldown check pass. Going further (expiring all pending rows) is destructive and could mask bugs in the cooldown logic itself.

Alternative considered: temporarily ignore cooldown via a flag passed into `evaluateUser`. Rejected — `evaluateUser` would grow a `{ skipCooldown }` parameter that exists *only* for demo purposes. Worse separation than mutating the one row.
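
An in-memory sketch of the expiry step, assuming each row has a `created_at` timestamp and a nullable `user_action`, and that "pending" means `user_action` is null (the same condition the extension polls on). `expireLatestPending` is hypothetical; in the backend this would be a single UPDATE, after which `forceEvaluateUser` calls `evaluateUser` as usual.

```typescript
interface RecommendationRow {
  id: number;
  created_at: Date;
  user_action: string | null; // null while pending
}

// Marks the newest pending row as expired; returns its id, or null if none is pending.
function expireLatestPending(rows: RecommendationRow[]): number | null {
  let latest: RecommendationRow | null = null;
  for (const row of rows) {
    if (row.user_action !== null) continue; // already acted on
    if (latest === null || row.created_at.getTime() > latest.created_at.getTime()) {
      latest = row;
    }
  }
  if (latest === null) return null;
  latest.user_action = "expired"; // older pending rows are deliberately left alone
  return latest.id;
}
```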

### Decision 6: The extension reuses the existing `RecommendationService` instance

Why: That service already owns the `apiBaseUrl` derivation and the `pollAndNotify` method. Adding a `triggerInsight(mode)` method to it keeps everything related in one file.

Alternative considered: Spawn a new `InsightTriggerService` in the extension. Rejected — strictly more code, no benefit.
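
A sketch of `triggerInsight(mode)`, reduced to a standalone class so it can run outside VS Code. The bearer-token header follows the `bearerAuth` scheme in the OpenAPI entry; the `TriggerClient` name, the `getToken` callback, the injected `fetchFn`, and the assumption that `apiBaseUrl` already ends in `/api/v1` are all illustrative.

```typescript
type TriggerMode = "real" | "force" | "demo";

// Minimal subset of the fetch API this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

class TriggerClient {
  private readonly apiBaseUrl: string;
  private readonly getToken: () => string;
  private readonly fetchFn: FetchLike;

  constructor(apiBaseUrl: string, getToken: () => string, fetchFn: FetchLike) {
    this.apiBaseUrl = apiBaseUrl.replace(/\/+$/, ""); // avoid "//recommendations"
    this.getToken = getToken;
    this.fetchFn = fetchFn;
  }

  // POSTs { mode } to /recommendations/trigger; returns status plus parsed body (null if not JSON).
  async triggerInsight(mode: TriggerMode): Promise<{ status: number; body: unknown }> {
    const res = await this.fetchFn(`${this.apiBaseUrl}/recommendations/trigger`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${this.getToken()}`,
      },
      body: JSON.stringify({ mode }),
    });
    const body = await res.json().catch(() => null);
    return { status: res.status, body };
  }
}
```

Injecting `fetchFn` keeps the sketch free of DOM or Node typings; in the extension it could be the global `fetch` where the host provides one.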

### Decision 7: The quick-pick UI, not three palette commands

Why: One palette entry (**DevVital AI: Trigger Insight**) keeps the command list clean. A quick-pick is one extra click during the demo — trivial. Three separate commands would be `triggerInsightReal`, `triggerInsightForce`, `triggerInsightDemo`, which clutters the palette for occasional use.

Alternative considered: status bar item. Rejected for scope — the demo only needs the popup to fire, not a permanent UI surface.

## Risks / Trade-offs

- **Risk:** A user spams `demo` mode and floods their own recommendation history. → **Mitigation:** Document as known limitation. JWT auth means it stays scoped per user. Not exploitable cross-tenant.
- **Risk:** `force` mode races with the scheduler if a scheduler tick calls `evaluateUser` for the same user at the same time. → **Mitigation:** The `inFlight` Set in [insight-scheduler.js:21](src/services/insight-scheduler.js#L21) is per-process, and the trigger endpoint runs in the same process as the scheduler, so the controller can reuse that guard. Alternatively, accept the possibility of a duplicate call and rely on the cooldown query, which after the `force` expiry sees no recent recommendation and proceeds. For a hackathon demo the race is acceptable; document it and move on.
- **Risk:** `demo` mode creates a recommendation that bypasses the LLM, so any future audit ("show me Gemini's reasoning") finds `code_context.reasoning` is `null` or a string like `"Demo trigger — no LLM invocation."`. → **Mitigation:** Hardcode `code_context.reasoning = "Manually triggered demo recommendation; no Gemini call was made."`. Self-explanatory in the DB.
- **Risk:** The endpoint exists in production. → **Mitigation:** It's authenticated, no destructive side effects beyond the user's own row. If we later want to disable it in prod, add a single `if (process.env.INSIGHTS_TRIGGER_DEMO_ENABLED === 'false') return 404` guard. Not building that now.
- **Trade-off:** Bundling all three modes in one endpoint means a slightly larger PR surface than just shipping `demo`. Worth it because `real` + `force` exercise the real code path and give us a debugging tool beyond the demo.
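
If the controller adopts the scheduler's `inFlight` guard, both paths check and update one per-process `Set` of user ids around each evaluation. A sketch, assuming the Set lives in a module imported by both the scheduler and the controller; `withInFlightGuard` and the `in_flight` reason are hypothetical.

```typescript
// One Set per process, shared by the scheduler tick and the trigger endpoint.
const inFlight = new Set<number>();

type Guarded<T> = { ran: true; value: T } | { ran: false; reason: "in_flight" };

async function withInFlightGuard<T>(userId: number, work: () => Promise<T>): Promise<Guarded<T>> {
  if (inFlight.has(userId)) return { ran: false, reason: "in_flight" };
  inFlight.add(userId); // no await between has() and add()
  try {
    return { ran: true, value: await work() };
  } finally {
    inFlight.delete(userId); // release even if the evaluation throws
  }
}
```

The check and the `add` happen with no `await` between them, so Node's single-threaded event loop cannot interleave a second evaluation for the same user. The guard does nothing across multiple replicas, which matches the per-process caveat above.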

## Migration Plan

1. Land the backend changes (route + controller + service helpers + openspec.yaml).
2. Deploy the backend image — no env var changes required.
3. Land the extension changes (package.json + extension.ts + recommendationService.ts).
4. Recompile + reload the extension.
5. Smoke-test the demo flow from the command palette: invoke the command three times, once per mode, verify the popup appears after `demo` and after `force` (assuming a recent recommendation exists for the cooldown bypass to matter), and that `real` either fires or returns a clear `skipped` reason.
6. **Rollback:** revert the backend commit + redeploy. The extension command becomes a no-op (404 → output channel warning). No data cleanup needed; any `state_type = 'demo'` rows are harmless artifacts.
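
The rollback path depends on the extension turning a non-2xx response into an output-channel warning rather than an error. A sketch of the status-to-message mapping for the codes declared in the OpenAPI entry plus the 404 a rolled-back backend returns; `describeTriggerStatus` and the wording are hypothetical.

```typescript
// Returns null on success, otherwise a line for the output channel.
function describeTriggerStatus(status: number): string | null {
  if (status >= 200 && status < 300) return null;
  switch (status) {
    case 400:
      return "Trigger rejected: invalid mode.";
    case 401:
      return "Trigger rejected: not signed in (token missing or expired).";
    case 404:
      return "Trigger endpoint not found; the backend may not include this change.";
    case 409:
      return "Demo trigger needs an active session.";
    default:
      return `Trigger failed with HTTP ${status}.`;
  }
}
```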

## Open Questions

- Should the canned demo recommendation be visually marked (e.g., prefixed `[Demo]`) in the popup? Trade-off: more honest, less impressive on stage. Decision: leave unmarked for the demo punch; revisit if we keep the trigger long-term.
- Future: should there be an admin-only variant of `force` that lets you target another user's recommendation? Out of scope here, and probably never a good idea.
45 changes: 45 additions & 0 deletions openspec/changes/add-demo-insight-trigger/proposal.md
@@ -0,0 +1,45 @@
## Why

The insight/recommendation popup driven by [`insight-scheduler`](src/services/insight-scheduler.js) and [`insight-trigger.service`](src/services/insight-trigger.service.js) fires only when (a) the 10-minute scheduler tick runs, (b) one of four deterministic rules matches (very long session, long+high churn, rapid switching, delete-heavy rewriting), (c) the cooldown has elapsed, and (d) the LLM produces a non-normal `state_type`. In normal use this means a demo audience may sit for many minutes without ever seeing the popup — even though the underlying system is healthy. We need a way to **prove the end-to-end recommendation flow works**, on demand, during a hackathon demo or a live debugging session, without waiting on the scheduler or hoping rules trip.

## What Changes

- Add `POST /api/v1/recommendations/trigger` to the backend, authenticated with the same JWT/`dvf_` token middleware used by the rest of `/recommendations`. Body `{ mode: 'real' | 'force' | 'demo' }`, default `'real'`.
- `real`: invoke `evaluateUser(req.user.id)` once and return its result. Respects cooldown, Gemini config, and "no rule fired" gates exactly like the scheduler. Proves the *real* path.
- `force`: expire the user's latest pending recommendation first (so cooldown is moot), then `evaluateUser`. Still uses real rules + Gemini; useful when you know an insight *should* fire but cooldown is in the way.
- `demo`: fabricate a `WorkflowState` + `Recommendation` row with hardcoded canned text, no LLM call, no rule check. Bulletproof fallback when Gemini is unreachable or there's no real activity.
- Add `devvitalAI.triggerInsight` command to the extension, surfaced in the command palette as **DevVital AI: Trigger Insight**. It opens a `showQuickPick` letting the user choose `real / force / demo`, POSTs to the new endpoint, then calls `recommendationService.pollAndNotify()` right away, so the popup appears within seconds instead of waiting up to 60 s for the next scheduled poll.
- Document the endpoint and command in [docs/](docs/) so the demo runbook is reproducible.
- All three modes emit a `logger.info` line tagged `recommendation-trigger` for auditability.

Explicitly **out of scope**:
- Keyboard shortcuts (can be added later via user `keybindings.json`).
- Configurable demo text — the canned message is hardcoded.
- Rate limiting (the existing JWT auth is sufficient gate).
- Dashboard / UI in the frontend.
- Production hardening — this is a demo-tool escape hatch and the proposal should not be misread as a general-purpose feature.

## Capabilities

### New Capabilities
- `insight-triggering`: the device-driven entry point for invoking the LLM/rule-based insight pipeline on demand, distinct from the scheduler-driven path. Owns the `POST /recommendations/trigger` endpoint and the `mode` semantics.

### Modified Capabilities
<!-- none -->

## Impact

- Affected code (backend):
- [src/routes/recommendations.routes.js](src/routes/recommendations.routes.js) — new route.
- [src/controllers/recommendations.controller.js](src/controllers/recommendations.controller.js) — new `triggerRecommendation` handler.
- [src/services/insight-trigger.service.js](src/services/insight-trigger.service.js) — exports `evaluateUser` (already exported); add a `forceEvaluateUser` helper that expires latest then evaluates, and a `createDemoRecommendation` helper for the canned path.
- [openspec.yaml](openspec.yaml) — register the new endpoint + request/response schema for Ajv validation.
- Affected code (extension):
- [package.json](../devFlowExtension/package.json) — register `devvitalAI.triggerInsight` in `contributes.commands`.
- [src/extension.ts](../devFlowExtension/src/extension.ts) — register command handler; reuses the existing `RecommendationService` instance.
- [src/services/recommendationService.ts](../devFlowExtension/src/services/recommendationService.ts) — add a `triggerInsight(mode)` method.
- Docs: brief addition to recommendation docs explaining the demo command (no separate runbook file).
- No database schema migration — uses existing `workflow_states` and `recommendations` tables.
- No frontend dashboard change.
- Cost: `real` and `force` each consume one Gemini API call (same as a scheduler tick). `demo` is free.
- Risk: a signed-in user can spam `demo` mode to flood their own recommendation history. Acceptable for a hackathon; would not be acceptable for a production rollout (call out in design as a known limitation).