327 changes: 327 additions & 0 deletions .kiro/specs/ai-budgeting-chat/design.md

Large diffs are not rendered by default.

141 changes: 141 additions & 0 deletions .kiro/specs/ai-budgeting-chat/requirements.md
@@ -0,0 +1,141 @@
# Requirements — AI Budgeting Chat (Human-in-the-Loop)

Traces to **GitHub Issue #8**. Acceptance criteria use EARS form
(WHEN … THE SYSTEM SHALL …). Each requirement is testable; the test mapping lives
in `tasks.md`.

---

## R1 — Conversational budget Q&A

**User story:** As a budgeter, I want to ask questions about my budget in plain
language so I can understand my envelopes without reading tables.

- R1.1 — WHEN the user sends a message, THE SYSTEM SHALL call `claude-opus-4-8`
with the budgeting system prompt and the conversation history, and return the
assistant's reply text.
- R1.2 — WHEN the assistant needs current budget state, THE SYSTEM SHALL expose a
`get_budget` tool that returns Ready-to-Assign and per-envelope
Assigned/Activity/Available, computed by the existing budget logic.
- R1.3 — WHEN the assistant needs transaction history, THE SYSTEM SHALL expose a
`get_transactions` tool returning a compact list (date, payee, amount, account).
- R1.4 — WHEN a read tool is called, THE SYSTEM SHALL execute it server-side,
feed the result back to the model, and continue until the model produces a final
reply — bounded by `MAX_TOOL_ITERATIONS` (8).
- R1.5 — THE SYSTEM SHALL NOT recompute any balance, Ready-to-Assign, or delta in
chat code; read tools delegate to `server/utils` (hledger remains the source of
truth).
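
A minimal sketch of the bounded read-tool loop in R1.4. It assumes simplified stand-in types for the Anthropic message blocks (not the SDK's own types) and a hypothetical `callModel` function, kept synchronous only for brevity (the real SDK call is asynchronous). Every `tool_use` in a turn is answered in a single `tool_result` user turn. The loop stops at the first stop reason other than `tool_use`, or once `MAX_TOOL_ITERATIONS` is reached.

```typescript
// Simplified stand-ins for Anthropic Messages API content blocks (assumption, not SDK types).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Message =
  | { role: "user"; content: string | ToolResultBlock[] }
  | { role: "assistant"; content: Array<TextBlock | ToolUseBlock> };
type ModelReply = { stop_reason: string; content: Array<TextBlock | ToolUseBlock> };

// Hypothetical model call; synchronous only to keep the sketch short.
type CallModel = (history: Message[]) => ModelReply;
type ReadHandler = (input: Record<string, unknown>) => unknown;

const MAX_TOOL_ITERATIONS = 8;

function runReadLoop(
  history: Message[],
  callModel: CallModel,
  readHandlers: Record<string, ReadHandler>,
): { reply: string; history: Message[]; capped: boolean } {
  const messages: Message[] = [...history];
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const res = callModel(messages);
    messages.push({ role: "assistant", content: res.content });
    if (res.stop_reason !== "tool_use") {
      // end_turn, refusal, max_tokens, ...: return the model's text as the reply.
      const reply = res.content
        .filter((b): b is TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
      return { reply, history: messages, capped: false };
    }
    // Run each read tool server-side; every tool_use gets a matching tool_result.
    const results = res.content
      .filter((b): b is ToolUseBlock => b.type === "tool_use")
      .map((b): ToolResultBlock => {
        const handler: ReadHandler | undefined = readHandlers[b.name];
        const output = handler ? handler(b.input) : { error: `Unknown tool: ${b.name}` };
        return { type: "tool_result", tool_use_id: b.id, content: JSON.stringify(output) };
      });
    messages.push({ role: "user", content: results });
  }
  // Iteration cap reached (R4.4): stop with a note instead of looping forever.
  return { reply: "Stopped after reaching the tool-iteration limit.", history: messages, capped: true };
}
```

This sketch handles read tools only. A proposed-action `tool_use` must end the loop and be surfaced (R2.2) rather than fall through to the `Unknown tool` branch.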

## R2 — Human-in-the-loop proposed actions (safety-critical)

**User story:** As a budgeter, I want the assistant to suggest assignments and
transfers that I approve before anything changes, so the AI never moves my money
on its own.

- R2.1 — WHEN the assistant decides to assign or transfer, THE SYSTEM SHALL
surface it as a **proposed action** (`assign_to_envelope` /
`transfer_between_envelopes`) with a human-readable summary and the payload.
- R2.2 — WHEN a proposed-action tool is emitted by the model, THE SYSTEM SHALL NOT
execute it and SHALL NOT write to the journal. *(Load-bearing — see R6.)*
- R2.3 — WHEN the user approves a proposed action, THE SYSTEM SHALL commit it by
calling the existing `POST /api/budget/assign` or `POST /api/budget/transfer`
endpoint with the proposal payload.
- R2.4 — WHEN the user rejects a proposed action, THE SYSTEM SHALL NOT write
anything and SHALL inform the model the action was rejected.
- R2.5 — WHEN a commit endpoint rejects the request (e.g. the assign
availability gate, non-positive amount), THE SYSTEM SHALL surface the error in
the chat and SHALL NOT mark the action committed.
- R2.6 — WHEN an action is committed, THE SYSTEM SHALL reflect the result back to
the model so the conversation stays consistent, and the budget view SHALL
refresh to show the change.
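
A sketch of the approve/reject path in R2.3 to R2.6. It assumes injected `commit`, `resume`, and `refreshBudget` functions (hypothetical names; in the app they would wrap `$fetch` calls) and a hypothetical `ProposedAction` shape. Commits go only through the existing endpoints. When an endpoint rejects a commit, the error is reported and the action is not marked committed.

```typescript
type ProposalTool = "assign_to_envelope" | "transfer_between_envelopes";
// Hypothetical shapes; the real wire types live in types/ai.ts.
type ProposedAction = { toolUseId: string; tool: ProposalTool; payload: Record<string, unknown> };
type Resolution = { toolUseId: string; status: "approved" | "rejected" };

interface HitlDeps {
  commit: (path: string, body: Record<string, unknown>) => Promise<void>; // POST to a budget endpoint
  resume: (resolutions: Resolution[]) => Promise<void>; // POST /api/ai/chat with resolutions
  refreshBudget: () => Promise<void>;
}

// The only write paths: the existing budget endpoints.
const COMMIT_PATH: Record<ProposalTool, string> = {
  assign_to_envelope: "/api/budget/assign",
  transfer_between_envelopes: "/api/budget/transfer",
};

async function approve(action: ProposedAction, deps: HitlDeps): Promise<{ committed: boolean; error?: string }> {
  try {
    await deps.commit(COMMIT_PATH[action.tool], action.payload);
  } catch (err) {
    // R2.5: surface the endpoint's error; the action stays uncommitted and pending.
    return { committed: false, error: err instanceof Error ? err.message : String(err) };
  }
  // R2.6: tell the model the result, then refresh the budget view.
  await deps.resume([{ toolUseId: action.toolUseId, status: "approved" }]);
  await deps.refreshBudget();
  return { committed: true };
}

async function reject(action: ProposedAction, deps: Pick<HitlDeps, "resume">): Promise<void> {
  // R2.4: nothing is written; the model is told the action was rejected.
  await deps.resume([{ toolUseId: action.toolUseId, status: "rejected" }]);
}
```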

## R3 — Conversation protocol integrity

**User story:** As a developer, I want the chat to stay stateless and protocol-valid
so it never wedges the API.

- R3.1 — THE SYSTEM SHALL be stateless: the full conversation history is passed
from the client each request and echoed back unchanged.
- R3.2 — WHEN a turn contains tool calls, THE SYSTEM SHALL ensure every `tool_use`
block receives a matching `tool_result` before the next assistant turn.
- R3.3 — WHEN read tools and a proposed action occur in the same turn, THE SYSTEM
SHALL compute the read results, return them with the proposal, and resume only
once all `tool_use` blocks in that turn are resolved.
- R3.4 — WHEN the user sends a new message while a proposal is still unresolved, THE
SYSTEM SHALL auto-resolve the pending proposal as rejected before processing the
new message (no dangling `tool_use`).
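
A sketch of how a resume request could rebuild a protocol-valid user turn for R3.2 to R3.4, using simplified stand-in block types and a hypothetical `Resolution` shape. Every `tool_use` from the previous assistant turn gets exactly one `tool_result`, and a proposal with no resolution is treated as rejected. Any new user text goes after the results, because the Messages API expects `tool_result` blocks to come first in the content array.

```typescript
type ToolUse = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };
type Text = { type: "text"; text: string };
type Resolution = { toolUseId: string; status: "approved" | "rejected" };

function buildResumeTurn(
  previousToolUses: ToolUse[], // every tool_use in the last assistant turn
  readResults: Map<string, string>, // tool_use id -> serialized read result
  resolutions: Resolution[],
  newUserText?: string,
): { role: "user"; content: Array<ToolResult | Text> } {
  const statusById = new Map(
    resolutions.map((r): [string, Resolution["status"]] => [r.toolUseId, r.status]),
  );
  const content: Array<ToolResult | Text> = previousToolUses.map((use): ToolResult => {
    const read = readResults.get(use.id);
    if (read !== undefined) return { type: "tool_result", tool_use_id: use.id, content: read };
    // R3.4: an unresolved proposal is auto-resolved as rejected, so no tool_use dangles.
    const approved = statusById.get(use.id) === "approved";
    return {
      type: "tool_result",
      tool_use_id: use.id,
      content: approved
        ? "The user approved this action and it was committed."
        : "The user rejected this action; nothing was written.",
    };
  });
  // Tool results first, then any new text from the user.
  if (newUserText) content.push({ type: "text", text: newUserText });
  return { role: "user", content };
}
```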

## R4 — Configuration & failure handling

- R4.1 — THE SYSTEM SHALL resolve the Anthropic API key from
`process.env.ANTHROPIC_API_KEY` when it is set, otherwise from the in-app stored
key (see R7); no `nuxt.config.ts` change is required.
- R4.2 — WHEN no key is configured (neither env nor stored), THE SYSTEM SHALL
return HTTP 503 with a clear message, and the chat panel SHALL show a "configure
your API key" empty state (linking to Settings) rather than erroring.
- R4.3 — WHEN the model returns `stop_reason: "refusal"`, THE SYSTEM SHALL return
the refusal as assistant text with no proposed actions.
- R4.4 — WHEN `MAX_TOOL_ITERATIONS` is reached, THE SYSTEM SHALL return the
partial result with a note and SHALL NOT loop indefinitely.
- R4.5 — WHEN the Anthropic call fails (network/5xx after SDK retries), THE SYSTEM
SHALL surface a friendly error in the chat and leave the conversation resumable.
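
A sketch of the key-resolution order in R4.1 and the 503 in R4.2. It assumes a hypothetical `readStored` callback standing in for the stored-key reader (`readStoredApiKey` in T12), along with hypothetical `"env"`/`"config"` source labels. Treating a blank env var as unset is an assumption; the spec does not state it.

```typescript
type KeySource = "env" | "config";

// Carries the HTTP status the chat route would return when no key is configured.
class MissingApiKeyError extends Error {
  readonly statusCode = 503;
  constructor() {
    super("No Anthropic API key is configured. Add one in Settings or set ANTHROPIC_API_KEY.");
    this.name = "MissingApiKeyError";
  }
}

function resolveApiKey(
  env: Record<string, string | undefined>,
  readStored: () => string | null, // must never throw
): { key: string; source: KeySource } {
  const fromEnv = env.ANTHROPIC_API_KEY?.trim();
  if (fromEnv) return { key: fromEnv, source: "env" }; // env var wins (R7.3)
  const stored = readStored()?.trim();
  if (stored) return { key: stored, source: "config" };
  throw new MissingApiKeyError();
}
```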

## R5 — Data egress transparency

- R5.1 — THE SYSTEM SHALL display a persistent, visible notice in the chat panel
that messages and budget data are sent to the Anthropic API to generate replies.
- R5.2 — THE SYSTEM SHALL NOT persist the API key anywhere except the environment
or the gitignored `config/ai-config.json` (R7.1), and SHALL NOT log message
content or the key.

## R6 — Verifiable HITL guarantee (test requirement)

- R6.1 — A test SHALL drive the tool loop with a mocked Anthropic client emitting
an `assign_to_envelope` `tool_use` and assert the journal writer / assign
endpoint is called **zero** times and a `proposedAction` is returned instead.
- R6.2 — A test SHALL assert the resume path commits only via the existing endpoint
after an `approved` resolution.
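
A framework-free sketch of the R6.1 guarantee. It assumes a hypothetical `partitionTurn` step and hypothetical payload fields (`envelope`, `amountCents`, `from`, `to`). Proposed-action tools are classified before any handler lookup, so even a write handler registered by mistake is never reached. The test's spy counts writes and expects zero.

```typescript
type ProposalTool = "assign_to_envelope" | "transfer_between_envelopes";
type ToolUse = { id: string; name: string; input: Record<string, unknown> };
type ProposedAction = { toolUseId: string; tool: ProposalTool; summary: string; payload: Record<string, unknown> };
type ReadResult = { tool_use_id: string; content: string };
type Handler = (input: Record<string, unknown>) => unknown;

function isProposedActionTool(name: string): name is ProposalTool {
  return name === "assign_to_envelope" || name === "transfer_between_envelopes";
}

// Human-readable summary for the approval card (hypothetical payload fields).
function summarize(tool: ProposalTool, input: Record<string, unknown>): string {
  const raw = input.amountCents;
  const cents = typeof raw === "number" ? raw : NaN;
  const amount = `$${(cents / 100).toFixed(2)}`;
  return tool === "assign_to_envelope"
    ? `Assign ${amount} to ${String(input.envelope)}`
    : `Move ${amount} from ${String(input.from)} to ${String(input.to)}`;
}

function partitionTurn(toolUses: ToolUse[], handlers: Record<string, Handler>) {
  const readToolResults: ReadResult[] = [];
  const proposedActions: ProposedAction[] = [];
  for (const use of toolUses) {
    const name = use.name;
    if (isProposedActionTool(name)) {
      // Surfaced to the user only: no handler is looked up, nothing is executed or written.
      proposedActions.push({ toolUseId: use.id, tool: name, summary: summarize(name, use.input), payload: use.input });
      continue;
    }
    const handler: Handler | undefined = handlers[name];
    const output = handler ? handler(use.input) : { error: `Unknown tool: ${name}` };
    readToolResults.push({ tool_use_id: use.id, content: JSON.stringify(output) });
  }
  return { readToolResults, proposedActions };
}
```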

---

## R7 — In-app API-key configuration (amendment)

**User story:** As a user, I want to set my Anthropic API key in the app so I can
enable the chat without editing environment variables or restarting the server.

- R7.1 — THE SYSTEM SHALL let the user save an API key from the Settings page,
persisted to gitignored `config/ai-config.json`.
- R7.2 — WHEN a key is saved, THE SYSTEM SHALL use it on the next request without a
server restart (the client is rebuilt when the resolved key changes).
- R7.3 — `process.env.ANTHROPIC_API_KEY` SHALL take precedence over the stored key.
- R7.4 — THE SYSTEM SHALL NEVER return the API key in full from any endpoint — only
a masked form (last 4 chars) — and SHALL NEVER log it (extends R5.2).
- R7.5 — WHEN saving, THE SYSTEM SHALL reject an empty, whitespace-only,
whitespace-containing, or too-short key with HTTP 400 and SHALL NOT persist it.
- R7.6 — WHEN the user clears the stored key, THE SYSTEM SHALL remove it but leave
any `ANTHROPIC_API_KEY` env var intact, and report the resulting state.
- R7.7 — THE SETTINGS UI SHALL show whether a key is configured and its source,
and SHALL indicate when an env var overrides a stored key.
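
A sketch of the save-time validation in R7.5 and the masking in R7.4. `MIN_KEY_LENGTH` is a hypothetical threshold (the spec only says "too-short"), and the `****` prefix is an assumed display format.

```typescript
const MIN_KEY_LENGTH = 20; // hypothetical threshold

type KeyCheck = { ok: true; key: string } | { ok: false; status: 400; message: string };

function validateApiKey(raw: unknown): KeyCheck {
  if (typeof raw !== "string" || raw.trim() === "") {
    // Missing, empty, or whitespace-only.
    return { ok: false, status: 400, message: "An API key is required." };
  }
  if (/\s/.test(raw)) {
    return { ok: false, status: 400, message: "The API key must not contain whitespace." };
  }
  if (raw.length < MIN_KEY_LENGTH) {
    return { ok: false, status: 400, message: "The API key is too short." };
  }
  return { ok: true, key: raw };
}

// Reveals at most the last 4 characters; very short inputs are fully hidden.
function maskApiKey(key: string): string {
  return key.length <= 4 ? "****" : `****${key.slice(-4)}`;
}
```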

## Non-functional requirements

- **NFR1 — Separation of concerns:** chat route = HTTP glue + tool loop; read
tools delegate to `server/utils`; no accounting math in chat/AI code; the panel
fetches only through `composables/useAiChat`.
- **NFR2 — Prompt caching:** system prompt + tool definitions form a stable,
deterministically-ordered, `cache_control: ephemeral` prefix; volatile budget
state is fetched via `get_budget`, never embedded in the prefix.
- **NFR3 — Type safety:** no `any`/unnecessary `as` in source; the opaque
Anthropic history is cast to `MessageParam[]` only at the validated SDK
boundary. (`any` allowed in tests for mocking.)
- **NFR4 — Windows/CRLF & money:** read tools reuse existing utils, inheriting
CRLF-safe parsing and integer-cent handling; no new parsing paths.
- **NFR5 — Verification:** `npx vitest run` and `npx nuxi typecheck` both clean.
- **NFR6 — Map upkeep:** `AI-MAP.md` updated by the main agent after implementation.
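
A sketch of the stable tool prefix in NFR2. The tool list is put in a fixed order (here by sorting on `name`, which is an assumption; any fixed declaration order works), and only the last definition carries `cache_control: { type: "ephemeral" }`, which is how the Messages API marks a cache breakpoint. The `input_schema` fields shown are hypothetical placeholders, not the real request bodies.

```typescript
type ToolDefinition = {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
  cache_control?: { type: "ephemeral" };
};

// Hypothetical schemas; the real ones mirror the assign/transfer request bodies.
const BASE_TOOLS: ToolDefinition[] = [
  {
    name: "get_budget",
    description: "Read Ready-to-Assign and per-envelope balances.",
    input_schema: { type: "object", properties: { period: { type: "string" } } },
  },
  {
    name: "transfer_between_envelopes",
    description: "Propose moving money between envelopes.",
    input_schema: {
      type: "object",
      properties: { from: { type: "string" }, to: { type: "string" }, amount: { type: "number" } },
      required: ["from", "to", "amount"],
    },
  },
  {
    name: "assign_to_envelope",
    description: "Propose assigning money to an envelope.",
    input_schema: {
      type: "object",
      properties: { envelope: { type: "string" }, amount: { type: "number" } },
      required: ["envelope", "amount"],
    },
  },
  {
    name: "get_transactions",
    description: "Read a compact transaction list.",
    input_schema: {
      type: "object",
      properties: { startDate: { type: "string" }, endDate: { type: "string" }, account: { type: "string" } },
    },
  },
];

// Same input set gives identical output, so the cached prefix stays valid across requests.
function buildTools(tools: ToolDefinition[]): ToolDefinition[] {
  const ordered = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return ordered.map((tool, i): ToolDefinition =>
    i === ordered.length - 1 ? { ...tool, cache_control: { type: "ephemeral" } } : tool,
  );
}
```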

## Out of scope

- Streaming responses (deferred; a later switch to SSE should not change the wire contract).
- CSV import (#9).
- Multi-user auth / per-user keys.
- Persisting chat history across reloads.
- Creating new envelopes/categories via chat (assign + transfer only for v1).
163 changes: 163 additions & 0 deletions .kiro/specs/ai-budgeting-chat/tasks.md
@@ -0,0 +1,163 @@
# Tasks — AI Budgeting Chat (Human-in-the-Loop)

Ordered, independently verifiable. Each task notes files, tests, and the
requirement(s) it covers. Implement one at a time; run the listed tests + mark
`- [x]` before moving on. Do not commit until I say so.

Convention: `*.test.ts` beside source; API tests under `server/**/__tests__/`;
mock Nitro globals with `vi.stubGlobal()`; `any` allowed in tests for SDK mocks.

---

- [x] **T1 — Dependency + shared Anthropic client**
- Add `@anthropic-ai/sdk` to `package.json` (`npm install @anthropic-ai/sdk`).
- New `server/utils/anthropic.ts`: `getAnthropic()` returns a singleton client
reading `process.env.ANTHROPIC_API_KEY`; export `MissingApiKeyError`; export
`MODEL = 'claude-opus-4-8'` and shared request defaults (adaptive thinking,
`effort: 'medium'`, `max_tokens: 4096`).
- **Tests** (`anthropic.test.ts`): `getAnthropic()` throws `MissingApiKeyError`
when the env var is unset; returns a client when set (stub `process.env`).
- **Covers:** R4.1, R4.2, NFR2 (defaults live here). _Verify:_ `vitest run server/utils/anthropic.test.ts`, typecheck.

- [x] **T2 — Extract `getBudgetReport` (refactor, no behavior change)**
- New `server/utils/budgetReport.ts`: `getBudgetReport(period: string)` = the
report-building body of `budget.get.ts`.
- Edit `budget.get.ts` to validate the period then delegate to it (thin wrapper).
- **Tests:** existing `budget.get` route tests must still pass; add a direct unit
test for `getBudgetReport` (default + a period).
- **Covers:** R1.2, R1.5, NFR1. _Verify:_ `vitest run` on the budget route + new test; typecheck.

- [x] **T3 — `getTransactionList` read helper**
- New `server/utils/transactionList.ts`: `getTransactionList({startDate?,endDate?,account?})`
→ `hledgerExec(['print', …])` + `transformTransactions`, shaped to
`{date, payee, amount, account}[]`. Reuse `isValidDate`/`isValidAccount`
guards; pass account after `--`.
- **Tests** (`transactionList.test.ts`): shaping + that invalid date/account are
rejected; CRLF-safe (mock `hledgerExec`).
- **Covers:** R1.3, R1.5, NFR4. _Verify:_ `vitest run server/utils/transactionList.test.ts`, typecheck.

- [x] **T4 — Wire types**
- New `types/ai.ts`: `AssignProposalPayload`, `TransferProposalPayload`,
`ProposedAction`, `ChatResolution`, `AiChatRequest`, `AiChatResponse` (per
design.md).
- **Covers:** R3.1, NFR3. _Verify:_ typecheck only.

- [x] **T5 — System prompt**
- New `server/ai/budgetInstructions.ts`: `BUDGET_SYSTEM_PROMPT` (markdown string)
— YNAB Rule 1, envelope conventions (strip prefixes, "Envelope" label),
"propose, never execute; one action per turn", tone, and that it must call
`get_budget` for live numbers rather than guessing.
- **Tests** (`budgetInstructions.test.ts`): non-empty; asserts a couple of
load-bearing phrases (propose-don't-execute; YNAB Rule 1) so the safety framing
can't silently regress.
- **Covers:** R2.1, NFR2. _Verify:_ `vitest run server/ai`, typecheck.

- [x] **T6 — Tool definitions + read handlers**
- New `server/utils/aiTools.ts`: deterministically-ordered `TOOLS` with
`cache_control` on the last definition; `input_schema`s mirroring the
assign/transfer request bodies and the read queries; `READ_TOOL_HANDLERS`
(`get_budget`→`getBudgetReport`, `get_transactions`→`getTransactionList`);
`isProposedActionTool(name)` classifier; a `toProposedAction(toolUse)` mapper
building `ProposedAction` + summary.
- **Tests** (`aiTools.test.ts`): read handler delegates to the right util;
classifier flags assign/transfer as proposed actions and reads as reads;
`toProposedAction` builds correct payload + summary.
- **Covers:** R1.2, R1.3, R2.1, NFR1, NFR2. _Verify:_ `vitest run server/utils/aiTools.test.ts`, typecheck.

- [x] **T7 — Chat route: the HITL tool loop (safety-critical)**
- New `server/api/ai/chat.post.ts`: read `AiChatRequest`; cast `messages` to
`MessageParam[]` at the boundary; on resume, append a `tool_result` user turn
from `readToolResults` + `resolutions`. Run the manual loop: read tools execute
& feed back; **proposed-action tools are surfaced, never executed**; bound by
`MAX_TOOL_ITERATIONS`. Handle `refusal`, missing key (503), and API errors.
Return `AiChatResponse`.
- **Tests** (`server/api/ai/__tests__/chat.post.test.ts`), mock the SDK:
- **R6.1 (load-bearing):** model emits `assign_to_envelope` → assert
`appendTransaction`/assign endpoint called **0×**, `proposedActions`
non-empty.
- read-tool dispatch → feeds `tool_result`, loops to `end_turn`.
- **R6.2:** resume with `approved` resolution → `tool_result` turn appended,
loop resumes.
- pending-proposal supersede (R3.4); refusal (R4.3); iteration cap (R4.4);
missing key → 503 (R4.2).
- **Covers:** R1.1, R1.4, R2.2–R2.6, R3.2–R3.4, R4.2–R4.5, R6. _Verify:_
`vitest run server/api/ai`, typecheck.

- [x] **T8 — `useAiChat` composable**
- New `composables/useAiChat.ts`: reactive `messages`/`reply`/`proposedActions`/
`pending`/`error`; `send(text)`; `approve(action)` (calls the existing
assign/transfer composable/endpoint, then resumes `/api/ai/chat` with an
`approved` resolution + triggers budget refresh); `reject(action)`. Auto-reject
unresolved proposals when `send` is called (R3.4). No business logic.
- **Tests** (`useAiChat.test.ts`): send round-trip (mock `$fetch`); approve
commits via the endpoint then resumes; reject resumes without committing;
supersede behavior.
- **Covers:** R2.3–R2.6, R3.1, R3.4, NFR1. _Verify:_ `vitest run composables/useAiChat.test.ts`, typecheck.

- [x] **T9 — Chat panel UI**
- New `components/AiChatPanel.vue`: Nuxt UI chat suite (`UChatMessages`,
`UChatMessage`, `UChatPrompt`, `UChatPromptSubmit`); a `UCard` proposed-action
card with Approve/Reject + the action summary; persistent egress notice (R5.1);
no-API-key empty state (R4.2); error display (R4.5). Renders only `reply` +
`proposedActions` (never raw history).
- Edit `pages/budget.vue` to mount the panel (slideover or side panel).
- **Tests:** light component test if practical (render notice + card states);
otherwise covered by manual run + the composable tests. State plainly which.
- **Covers:** R2.1, R2.3, R2.4, R4.2, R4.5, R5.1. _Verify:_ typecheck; `npm run dev` smoke check.

- [x] **T10 — Egress/logging hygiene pass**
- Confirm no `console.log` of message content or the key anywhere in the new
code; the notice is present and persistent.
- **Covers:** R5.1, R5.2. _Verify:_ grep the new files; typecheck.

- [x] **T11 — Full verification + map update**
- `npx vitest run` (all green) and `npx nuxi typecheck` (0 errors).
- Manual smoke: ask a question (reads), get a proposal, approve (commits + budget
refreshes), reject (no write); unset key → empty state.
- Main agent updates `AI-MAP.md`: `/api/ai/chat` route row; `anthropic.ts`,
`aiTools.ts`, `budgetReport.ts`, `transactionList.ts` util rows;
`useAiChat` composable; `AiChatPanel` component; budget-page panel; AI quirks
(HITL invariant, `ANTHROPIC_API_KEY` env, data egress).
- **Covers:** NFR5, NFR6. _Verify:_ both commands clean; map diff reviewed.
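
For T3, a sketch of the argument building and row shaping in `getTransactionList`, under several assumptions. `isValidDate` and `isValidAccount` below are stand-ins for the existing guards. `-b`/`-e` are hledger's begin/end date flags, and any other flags the real helper passes (elided above) are omitted. `ParsedTxn` is a hypothetical stand-in for what `transformTransactions` returns. Amounts stay in integer cents (NFR4).

```typescript
type ParsedTxn = { date: string; description: string; postings: Array<{ account: string; amountCents: number }> };
type TxnRow = { date: string; payee: string; amount: number; account: string };
type TxnQuery = { startDate?: string; endDate?: string; account?: string };

// Stand-in guard: strict YYYY-MM-DD that is also a real calendar date.
function isValidDate(s: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(s)) return false;
  const d = new Date(`${s}T00:00:00Z`);
  return !Number.isNaN(d.getTime()) && d.toISOString().slice(0, 10) === s;
}

// Stand-in guard: non-empty, no leading dash, no control characters.
function isValidAccount(s: string): boolean {
  return s.length > 0 && !s.startsWith("-") && !/[\u0000-\u001f]/.test(s);
}

function buildPrintArgs(q: TxnQuery): string[] {
  const args = ["print"];
  if (q.startDate !== undefined) {
    if (!isValidDate(q.startDate)) throw new Error("Invalid startDate");
    args.push("-b", q.startDate);
  }
  if (q.endDate !== undefined) {
    if (!isValidDate(q.endDate)) throw new Error("Invalid endDate");
    args.push("-e", q.endDate);
  }
  if (q.account !== undefined) {
    if (!isValidAccount(q.account)) throw new Error("Invalid account");
    args.push("--", q.account); // passed after "--" (per T3) so it is not treated as an option
  }
  return args;
}

// One compact row per posting (an assumption about the intended granularity).
function toRows(txns: ParsedTxn[]): TxnRow[] {
  const rows: TxnRow[] = [];
  for (const t of txns) {
    for (const p of t.postings) {
      rows.push({ date: t.date, payee: t.description, amount: p.amountCents, account: p.account });
    }
  }
  return rows;
}
```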

---

## Amendment — in-app API-key configuration (Issue #8, user-approved deviation)

- [x] **T12 — `aiConfig` util + key resolution**
- New `server/utils/aiConfig.ts`: `readStoredApiKey` (sync, guarded, never throws),
`writeStoredApiKey`/`clearStoredApiKey` (async), `maskApiKey` (last-4). Path
`config/ai-config.json` (gitignored).
- `server/utils/anthropic.ts`: `resolveApiKey` (env → stored), `getApiKeySource`,
`getAnthropic` rebuilds the client when the resolved key changes (no restart).
- **Tests:** `aiConfig.test.ts` (read/write/clear/mask, mocked fs); updated
`anthropic.test.ts` (env-overrides-stored precedence, none → throws).
- **Covers:** R4.1, R7.1–R7.4. _Verified:_ 15 tests green; typecheck.

- [x] **T13 — `GET/POST/DELETE /api/ai/config`**
- `config.get.ts` (`{configured, source, maskedKey}` — never full key);
`config.post.ts` (validate then `writeStoredApiKey`); `config.delete.ts`
(`clearStoredApiKey`, env left intact).
- **Tests** (`config.test.ts`): masked-only responses, validation 400s, env-override
source, clear behavior.
- **Covers:** R4.2, R7.4–R7.6. _Verified:_ 10 tests green; typecheck.

- [x] **T14 — Settings card + panel link**
- `pages/settings.vue`: "AI Assistant" card (status, source badge, masked key,
password input, Save, and a Clear button shown only when the source is the stored
config). `AiChatPanel.vue` empty state links to Settings.
- **Covers:** R4.2, R7.7. _Verified:_ typecheck; runtime curl flow (save → chat
no longer 503 → clear).

- [x] **T15 — Verify + adversarial review + spec/map**
- Full `vitest run` (384 tests green) + `nuxi typecheck` (0 errors); runtime probe of
all `/api/ai/config` verbs + the no-restart effect. Adversarial multi-agent
review of the secret handling. `design.md`/`requirements.md`/`AI-MAP.md` updated.
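
For T12, a sketch of the rebuild-on-key-change behavior. It assumes a hypothetical `createClientCache` helper with injected `resolveKey` and `create` functions; in the app, `create` would be something like `(apiKey) => new Anthropic({ apiKey })`. The client is reused while the resolved key stays the same and rebuilt as soon as it changes, which lets a newly saved key take effect without a restart (R7.2).

```typescript
// Returns a getter that memoizes one client per resolved key.
function createClientCache<C>(resolveKey: () => string, create: (apiKey: string) => C): () => C {
  let cache: { key: string; client: C } | undefined;
  return () => {
    const key = resolveKey(); // may throw when no key is configured (R4.2)
    const current = cache !== undefined && cache.key === key ? cache : { key, client: create(key) };
    cache = current;
    return current.client;
  };
}
```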

---

## Checkpoint

All tasks `- [x]`, `npx vitest run` and `npx nuxi typecheck` both clean, manual
HITL flow verified (propose → approve → commit; reject → no write; missing key →
empty state), `AI-MAP.md` updated. Then ready for commit/PR (PR body: `Fixes #8`).