toolpathguy · Jun 17, 2026
diff --git a/.kiro/specs/ai-budgeting-chat/design.md b/.kiro/specs/ai-budgeting-chat/design.md
diff --git a/.kiro/specs/ai-budgeting-chat/requirements.md b/.kiro/specs/ai-budgeting-chat/requirements.md
@@ -0,0 +1,141 @@
+# Requirements — AI Budgeting Chat (Human-in-the-Loop)
+
+Traces to **GitHub Issue #8**. Acceptance criteria use EARS form
+(WHEN … THE SYSTEM SHALL …). Each requirement is testable; the test mapping lives
+in `tasks.md`.
+
+---
+
+## R1 — Conversational budget Q&A
+
+**User story:** As a budgeter, I want to ask questions about my budget in plain
+language so I can understand my envelopes without reading tables.
+
+- R1.1 — WHEN the user sends a message, THE SYSTEM SHALL call `claude-opus-4-8`
+  with the budgeting system prompt and the conversation history, and return the
+  assistant's reply text.
+- R1.2 — WHEN the assistant needs current budget state, THE SYSTEM SHALL expose a
+  `get_budget` tool that returns Ready-to-Assign and per-envelope
+  Assigned/Activity/Available, computed by the existing budget logic.
+- R1.3 — WHEN the assistant needs transaction history, THE SYSTEM SHALL expose a
+  `get_transactions` tool returning a compact list (date, payee, amount, account).
+- R1.4 — WHEN a read tool is called, THE SYSTEM SHALL execute it server-side,
+  feed the result back to the model, and continue until the model produces a final
+  reply — bounded by `MAX_TOOL_ITERATIONS` (8).
+- R1.5 — THE SYSTEM SHALL NOT recompute any balance, Ready-to-Assign, or delta in
+  chat code; read tools delegate to `server/utils` (hledger remains the source of
+  truth).
+
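Reviewer sketch (not part of the commit): the bounded read-tool loop in R1.4 could look roughly like the following. The types `ToolCall`, `ModelTurn`, `ToolOutcome`, `CallModel`, and `ReadHandlers` and the function `runReadLoop` are hypothetical stand-ins, and the model call is injected as a plain function so the sketch runs without the Anthropic SDK.

```typescript
// Hypothetical shapes for illustration; the real route would use the SDK's types.
type ToolCall = { id: string; name: string; input: unknown };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type ToolOutcome = { toolUseId: string; content: string };
type CallModel = (previousResults: ToolOutcome[]) => Promise<ModelTurn>;
type ReadHandlers = Record<string, (input: unknown) => Promise<unknown>>;

const MAX_TOOL_ITERATIONS = 8;

async function runReadLoop(callModel: CallModel, handlers: ReadHandlers) {
  let results: ToolOutcome[] = [];
  for (let i = 1; i <= MAX_TOOL_ITERATIONS; i++) {
    const turn = await callModel(results);
    // No tool calls means the model produced its final reply.
    if (turn.toolCalls.length === 0) {
      return { reply: turn.text, modelCalls: i, capped: false };
    }
    // Execute every requested read tool server-side and collect the results.
    results = [];
    for (const call of turn.toolCalls) {
      const handler = handlers[call.name];
      const content = handler
        ? JSON.stringify(await handler(call.input))
        : `Unknown tool: ${call.name}`;
      results.push({ toolUseId: call.id, content });
    }
  }
  // Cap reached: stop with a note instead of looping forever (R4.4).
  return {
    reply: 'Stopped after reaching the tool-iteration limit.',
    modelCalls: MAX_TOOL_ITERATIONS,
    capped: true,
  };
}
```

In this sketch the cap counts model calls, so a model that keeps requesting tools is cut off after the eighth call. Whether the real route counts calls or tool rounds is not stated in the spec.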
+## R2 — Human-in-the-loop proposed actions (safety-critical)
+
+**User story:** As a budgeter, I want the assistant to suggest assignments and
+transfers that I approve before anything changes, so the AI never moves my money
+on its own.
+
+- R2.1 — WHEN the assistant decides to assign or transfer, THE SYSTEM SHALL
+  surface it as a **proposed action** (`assign_to_envelope` /
+  `transfer_between_envelopes`) with a human-readable summary and the payload.
+- R2.2 — WHEN a proposed-action tool is emitted by the model, THE SYSTEM SHALL NOT
+  execute it and SHALL NOT write to the journal. *(Load-bearing — see R6.)*
+- R2.3 — WHEN the user approves a proposed action, THE SYSTEM SHALL commit it by
+  calling the existing `POST /api/budget/assign` or `POST /api/budget/transfer`
+  endpoint with the proposal payload.
+- R2.4 — WHEN the user rejects a proposed action, THE SYSTEM SHALL NOT write
+  anything and SHALL inform the model the action was rejected.
+- R2.5 — WHEN a commit endpoint rejects the request (e.g. the assign
+  availability gate, non-positive amount), THE SYSTEM SHALL surface the error in
+  the chat and SHALL NOT mark the action committed.
+- R2.6 — WHEN an action is committed, THE SYSTEM SHALL reflect the result back to
+  the model so the conversation stays consistent, and the budget view SHALL
+  refresh to show the change.
+
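Reviewer sketch (not part of the commit): one way to read R2.3 to R2.5 is that a write can only happen inside the approval branch, and only through the existing endpoints. The payload field names (`envelope`, `amountCents`, `from`, `to`), the `Proposal`, `Commit`, and `Outcome` types, and `resolveProposal` are assumptions for illustration. The commit function is injected so the sketch makes no real HTTP call.

```typescript
// Hypothetical payload shapes; the real ones mirror the assign/transfer request bodies.
type AssignPayload = { envelope: string; amountCents: number };
type TransferPayload = { from: string; to: string; amountCents: number };
type Proposal =
  | { id: string; tool: 'assign_to_envelope'; payload: AssignPayload }
  | { id: string; tool: 'transfer_between_envelopes'; payload: TransferPayload };
type Decision = 'approved' | 'rejected';
type Commit = (path: string, body: unknown) => Promise<{ ok: boolean; error?: string }>;
type Outcome = { toolUseId: string; committed: boolean; note: string };

// Endpoint paths are taken from R2.3.
const COMMIT_PATHS = {
  assign_to_envelope: '/api/budget/assign',
  transfer_between_envelopes: '/api/budget/transfer',
} as const;

async function resolveProposal(p: Proposal, decision: Decision, commit: Commit): Promise<Outcome> {
  // Rejection never touches the commit function (R2.4).
  if (decision === 'rejected') {
    return { toolUseId: p.id, committed: false, note: 'User rejected the action.' };
  }
  const result = await commit(COMMIT_PATHS[p.tool], p.payload);
  // A failed commit is reported but never marked committed (R2.5).
  return result.ok
    ? { toolUseId: p.id, committed: true, note: 'Committed.' }
    : { toolUseId: p.id, committed: false, note: `Commit failed: ${result.error ?? 'unknown error'}` };
}
```

The returned `note` is what would be fed back to the model as the `tool_result` content, which keeps the conversation consistent with what actually happened (R2.6).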
+## R3 — Conversation protocol integrity
+
+**User story:** As a developer, I want the chat to stay stateless and protocol-valid
+so it never wedges the API.
+
+- R3.1 — THE SYSTEM SHALL be stateless: the client passes the full conversation
+  history with each request, and the server echoes it back unchanged.
+- R3.2 — WHEN a turn contains tool calls, THE SYSTEM SHALL ensure every `tool_use`
+  block receives a matching `tool_result` before the next assistant turn.
+- R3.3 — WHEN read tools and a proposed action occur in the same turn, THE SYSTEM
+  SHALL compute the read results, return them with the proposal, and resume only
+  once all `tool_use` blocks in that turn are resolved.
+- R3.4 — WHEN the user sends a new message while a proposal is still un-acted, THE
+  SYSTEM SHALL auto-resolve the pending proposal as rejected before processing the
+  new message (no dangling `tool_use`).
+
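Reviewer sketch (not part of the commit): R3.2 and R3.4 amount to building one user turn that answers every outstanding `tool_use` id, filling in a rejection for any proposal the user never acted on. The block shapes follow the Anthropic Messages API (`tool_result` with `tool_use_id`, placed before any text in the same user turn). The function name `buildResumeTurn` and the rejection wording are hypothetical.

```typescript
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type TextBlock = { type: 'text'; text: string };
type ResumeTurn = { role: 'user'; content: Array<ToolResultBlock | TextBlock> };

function buildResumeTurn(
  toolUseIds: string[],          // every tool_use id from the last assistant turn
  provided: Map<string, string>, // read results and explicit approve/reject outcomes
  newUserText?: string,          // present when the user typed instead of acting
): ResumeTurn {
  // Every id gets exactly one result; missing ones are auto-rejected (R3.4).
  const results = toolUseIds.map((id): ToolResultBlock => ({
    type: 'tool_result',
    tool_use_id: id,
    content: provided.get(id) ?? 'Rejected: superseded by a new user message.',
  }));
  const content: Array<ToolResultBlock | TextBlock> = [...results];
  // tool_result blocks come first; the new user text follows them.
  if (newUserText !== undefined) {
    content.push({ type: 'text', text: newUserText });
  }
  return { role: 'user', content };
}
```

Because the result list is derived from the full id list rather than from what the client happened to send, a dangling `tool_use` cannot occur in this sketch.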
+## R4 — Configuration & failure handling
+
+- R4.1 — THE SYSTEM SHALL resolve the Anthropic API key as
+  `process.env.ANTHROPIC_API_KEY` (override) else the in-app stored key (see R7);
+  no `nuxt.config.ts` change.
+- R4.2 — WHEN no key is configured (neither env nor stored), THE SYSTEM SHALL
+  return HTTP 503 with a clear message, and the chat panel SHALL show a "configure
+  your API key" empty state (linking to Settings) rather than erroring.
+- R4.3 — WHEN the model returns `stop_reason: "refusal"`, THE SYSTEM SHALL return
+  the refusal as assistant text with no proposed actions.
+- R4.4 — WHEN `MAX_TOOL_ITERATIONS` is reached, THE SYSTEM SHALL return the
+  partial result with a note and SHALL NOT loop indefinitely.
+- R4.5 — WHEN the Anthropic call fails (network/5xx after SDK retries), THE SYSTEM
+  SHALL surface a friendly error in the chat and leave the conversation resumable.
+
+## R5 — Data egress transparency
+
+- R5.1 — THE SYSTEM SHALL display a persistent, visible notice in the chat panel
+  that messages and budget data are sent to the Anthropic API to generate replies.
+- R5.2 — THE SYSTEM SHALL NOT persist the API key outside the environment or the
+  stored config file (R7.1), and SHALL NOT log message content or the key.
+
+## R6 — Verifiable HITL guarantee (test requirement)
+
+- R6.1 — A test SHALL drive the tool loop with a mocked Anthropic client emitting
+  an `assign_to_envelope` `tool_use` and assert the journal writer / assign
+  endpoint is called **zero** times and a `proposedAction` is returned instead.
+- R6.2 — A test SHALL assert the resume path commits only via the existing endpoint
+  after an `approved` resolution.
+
+---
+
+## R7 — In-app API-key configuration (amendment)
+
+**User story:** As a user, I want to set my Anthropic API key in the app so I can
+enable the chat without editing environment variables or restarting the server.
+
+- R7.1 — THE SYSTEM SHALL let the user save an API key from the Settings page,
+  persisted to gitignored `config/ai-config.json`.
+- R7.2 — WHEN a key is saved, THE SYSTEM SHALL use it on the next request without a
+  server restart (the client is rebuilt when the resolved key changes).
+- R7.3 — `process.env.ANTHROPIC_API_KEY` SHALL take precedence over the stored key.
+- R7.4 — THE SYSTEM SHALL NEVER return the API key in full from any endpoint — only
+  a masked form (last 4 chars) — and SHALL NEVER log it (extends R5.2).
+- R7.5 — WHEN saving, THE SYSTEM SHALL reject an empty, whitespace-only,
+  whitespace-containing, or too-short key with HTTP 400 and SHALL NOT persist it.
+- R7.6 — WHEN the user clears the stored key, THE SYSTEM SHALL remove it but leave
+  any `ANTHROPIC_API_KEY` env var intact, and report the resulting state.
+- R7.7 — THE SETTINGS UI SHALL show whether a key is configured and its source,
+  and SHALL indicate when an env var overrides a stored key.
+
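Reviewer sketch (not part of the commit): the validation, masking, and precedence rules in R7.3 to R7.5 fit in three small pure functions. The names echo those in `tasks.md` (`maskApiKey`, `resolveApiKey`), but the signatures, the `KeySource` labels, the mask format, and the `MIN_KEY_LENGTH` threshold are assumptions. The env value is passed in as an argument so the sketch does not depend on `process`.

```typescript
type KeySource = 'env' | 'config' | 'none';

// Hypothetical threshold; the spec only says "too-short".
const MIN_KEY_LENGTH = 20;

// Returns an error message for an invalid key, or null when the key is acceptable.
function validateApiKey(raw: string): string | null {
  if (raw.length === 0) return 'API key is required';
  // Covers both whitespace-only and whitespace-containing keys.
  if (/\s/.test(raw)) return 'API key must not contain whitespace';
  if (raw.length < MIN_KEY_LENGTH) return 'API key is too short';
  return null;
}

// Only the last 4 characters are ever exposed (R7.4).
function maskApiKey(key: string): string {
  return key.length > 4 ? `****${key.slice(-4)}` : '****';
}

// The env var wins over the stored key (R7.3).
function resolveApiKey(
  envKey: string | undefined,
  storedKey: string | undefined,
): { key?: string; source: KeySource } {
  if (envKey) return { key: envKey, source: 'env' };
  if (storedKey) return { key: storedKey, source: 'config' };
  return { source: 'none' };
}
```

A route handler would map a non-null `validateApiKey` result to HTTP 400 and a `source` of `'none'` to HTTP 503.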
+## Non-functional requirements
+
+- **NFR1 — Separation of concerns:** chat route = HTTP glue + tool loop; read
+  tools delegate to `server/utils`; no accounting math in chat/AI code; the panel
+  fetches only through `composables/useAiChat`.
+- **NFR2 — Prompt caching:** system prompt + tool definitions form a stable,
+  deterministically-ordered, `cache_control: ephemeral` prefix; volatile budget
+  state is fetched via `get_budget`, never embedded in the prefix.
+- **NFR3 — Type safety:** no `any`/unnecessary `as` in source; the opaque
+  Anthropic history is cast to `MessageParam[]` only at the validated SDK
+  boundary. (`any` allowed in tests for mocking.)
+- **NFR4 — Windows/CRLF & money:** read tools reuse existing utils, inheriting
+  CRLF-safe parsing and integer-cent handling; no new parsing paths.
+- **NFR5 — Verification:** `npx vitest run` and `npx nuxi typecheck` both clean.
+- **NFR6 — Map upkeep:** `AI-MAP.md` updated by the main agent after implementation.
+
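Reviewer sketch (not part of the commit): NFR2 describes a stable request prefix with a `cache_control: ephemeral` breakpoint. A minimal illustration, assuming the Messages API shape where `system` is a list of text blocks and the breakpoint sits on the last tool definition. The descriptions and empty `input_schema` bodies are placeholders, and `buildCachedPrefix` is a hypothetical helper.

```typescript
type CacheControl = { type: 'ephemeral' };
type ToolDef = {
  name: string;
  description: string;
  input_schema: { type: 'object'; properties: Record<string, unknown> };
  cache_control?: CacheControl;
};
type SystemBlock = { type: 'text'; text: string; cache_control?: CacheControl };

const EPHEMERAL: CacheControl = { type: 'ephemeral' };

// Fixed order: reordering tools would change the prefix and miss the cache.
const TOOL_NAMES = [
  'get_budget',
  'get_transactions',
  'assign_to_envelope',
  'transfer_between_envelopes',
];

function buildCachedPrefix(systemPrompt: string): { system: SystemBlock[]; tools: ToolDef[] } {
  const tools = TOOL_NAMES.map((toolName, i): ToolDef => {
    const def: ToolDef = {
      name: toolName,
      description: `Placeholder description for ${toolName}`,
      input_schema: { type: 'object', properties: {} },
    };
    // Only the last definition carries the cache breakpoint.
    if (i === TOOL_NAMES.length - 1) def.cache_control = EPHEMERAL;
    return def;
  });
  // No budget numbers here: volatile state arrives via get_budget, not the prefix.
  const system: SystemBlock[] = [{ type: 'text', text: systemPrompt, cache_control: EPHEMERAL }];
  return { system, tools };
}
```

The point of the sketch is that two requests with the same prompt serialize to the same bytes, which is what allows a cache hit.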
+## Out of scope
+
+- Streaming responses (deferred; wire contract unchanged by a later SSE switch).
+- CSV import (#9).
+- Multi-user auth / per-user keys.
+- Persisting chat history across reloads.
+- Creating new envelopes/categories via chat (assign + transfer only for v1).
diff --git a/.kiro/specs/ai-budgeting-chat/tasks.md b/.kiro/specs/ai-budgeting-chat/tasks.md
@@ -0,0 +1,163 @@
+# Tasks — AI Budgeting Chat (Human-in-the-Loop)
+
+Ordered, independently verifiable. Each task notes files, tests, and the
+requirement(s) it covers. Implement one at a time; run the listed tests + mark
+`- [x]` before moving on. Do not commit until I say so.
+
+Convention: `*.test.ts` beside source; API tests under `server/**/__tests__/`;
+mock Nitro globals with `vi.stubGlobal()`; `any` allowed in tests for SDK mocks.
+
+---
+
+- [x] **T1 — Dependency + shared Anthropic client**
+  - Add `@anthropic-ai/sdk` to `package.json` (`npm install @anthropic-ai/sdk`).
+  - New `server/utils/anthropic.ts`: `getAnthropic()` returns a singleton client
+    reading `process.env.ANTHROPIC_API_KEY`; export `MissingApiKeyError`; export
+    `MODEL = 'claude-opus-4-8'` and shared request defaults (adaptive thinking,
+    `effort: 'medium'`, `max_tokens: 4096`).
+  - **Tests** (`anthropic.test.ts`): `getAnthropic()` throws `MissingApiKeyError`
+    when the env var is unset; returns a client when set (stub `process.env`).
+  - **Covers:** R4.1, R4.2, NFR2 (defaults live here). _Verify:_ `vitest run server/utils/anthropic.test.ts`, typecheck.
+
+- [x] **T2 — Extract `getBudgetReport` (refactor, no behavior change)**
+  - New `server/utils/budgetReport.ts`: `getBudgetReport(period: string)` = the
+    report-building body of `budget.get.ts`.
+  - Edit `budget.get.ts` to validate the period then delegate to it (thin wrapper).
+  - **Tests:** existing `budget.get` route tests must still pass; add a direct unit
+    test for `getBudgetReport` (default + a period).
+  - **Covers:** R1.2, R1.5, NFR1. _Verify:_ `vitest run` on the budget route + new test; typecheck.
+
+- [x] **T3 — `getTransactionList` read helper**
+  - New `server/utils/transactionList.ts`: `getTransactionList({startDate?,endDate?,account?})`
+    → `hledgerExec(['print', …])` + `transformTransactions`, shaped to
+    `{date, payee, amount, account}[]`. Reuse `isValidDate`/`isValidAccount`
+    guards; pass account after `--`.
+  - **Tests** (`transactionList.test.ts`): shaping + that invalid date/account are
+    rejected; CRLF-safe (mock `hledgerExec`).
+  - **Covers:** R1.3, R1.5, NFR4. _Verify:_ `vitest run server/utils/transactionList.test.ts`, typecheck.
+
+- [x] **T4 — Wire types**
+  - New `types/ai.ts`: `AssignProposalPayload`, `TransferProposalPayload`,
+    `ProposedAction`, `ChatResolution`, `AiChatRequest`, `AiChatResponse` (per
+    design.md).
+  - **Covers:** R3.1, NFR3. _Verify:_ typecheck only.
+
+- [x] **T5 — System prompt**
+  - New `server/ai/budgetInstructions.ts`: `BUDGET_SYSTEM_PROMPT` (markdown string)
+    — YNAB Rule 1, envelope conventions (strip prefixes, "Envelope" label),
+    "propose, never execute; one action per turn", tone, and that it must call
+    `get_budget` for live numbers rather than guessing.
+  - **Tests** (`budgetInstructions.test.ts`): non-empty; asserts a couple of
+    load-bearing phrases (propose-don't-execute; YNAB Rule 1) so the safety framing
+    can't silently regress.
+  - **Covers:** R2.1, NFR2. _Verify:_ `vitest run server/ai`, typecheck.
+
+- [x] **T6 — Tool definitions + read handlers**
+  - New `server/utils/aiTools.ts`: deterministically-ordered `TOOLS` with
+    `cache_control` on the last definition; `input_schema`s mirroring the
+    assign/transfer request bodies and the read queries; `READ_TOOL_HANDLERS`
+    (`get_budget`→`getBudgetReport`, `get_transactions`→`getTransactionList`);
+    `isProposedActionTool(name)` classifier; a `toProposedAction(toolUse)` mapper
+    building `ProposedAction` + summary.
+  - **Tests** (`aiTools.test.ts`): read handler delegates to the right util;
+    classifier flags assign/transfer as proposed actions and reads as reads;
+    `toProposedAction` builds correct payload + summary.
+  - **Covers:** R1.2, R1.3, R2.1, NFR1, NFR2. _Verify:_ `vitest run server/utils/aiTools.test.ts`, typecheck.
+
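Reviewer sketch (not part of the commit): the T6 classifier and mapper could be as small as the following. `isProposedActionTool` and `toProposedAction` are named in the task, but their signatures here are guesses. The input fields (`envelope`, `amountCents`, `from`, `to`), the `$` currency symbol, and the `formatCents` helper are assumptions for illustration.

```typescript
const PROPOSED_ACTION_TOOLS = ['assign_to_envelope', 'transfer_between_envelopes'];

// Anything not in this list is treated as a read tool and may be executed.
function isProposedActionTool(toolName: string): boolean {
  return PROPOSED_ACTION_TOOLS.indexOf(toolName) !== -1;
}

// Non-negative integer cents in, display string out; no floating-point money math.
function formatCents(cents: number): string {
  const whole = Math.floor(cents / 100);
  const rem = cents % 100;
  return `$${whole}.${rem < 10 ? '0' : ''}${rem}`;
}

type ToolUse = { id: string; name: string; input: Record<string, unknown> };
type ProposedActionSketch = {
  id: string;
  tool: string;
  summary: string;
  payload: Record<string, unknown>;
};

function toProposedAction(toolUse: ToolUse): ProposedActionSketch {
  const cents = toolUse.input['amountCents'];
  const amount = typeof cents === 'number' ? formatCents(cents) : 'an unspecified amount';
  const summary =
    toolUse.name === 'assign_to_envelope'
      ? `Assign ${amount} to ${String(toolUse.input['envelope'])}`
      : `Transfer ${amount} from ${String(toolUse.input['from'])} to ${String(toolUse.input['to'])}`;
  // The payload is passed through untouched; nothing is executed here.
  return { id: toolUse.id, tool: toolUse.name, summary, payload: toolUse.input };
}
```

For example, an `assign_to_envelope` call with `{ envelope: 'Groceries', amountCents: 1250 }` yields the summary `Assign $12.50 to Groceries`.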
+- [x] **T7 — Chat route: the HITL tool loop (safety-critical)**
+  - New `server/api/ai/chat.post.ts`: read `AiChatRequest`; cast `messages` to
+    `MessageParam[]` at the boundary; on resume, append a `tool_result` user turn
+    from `readToolResults` + `resolutions`. Run the manual loop: read tools execute
+    & feed back; **proposed-action tools are surfaced, never executed**; bound by
+    `MAX_TOOL_ITERATIONS`. Handle `refusal`, missing key (503), and API errors.
+    Return `AiChatResponse`.
+  - **Tests** (`server/api/ai/__tests__/chat.post.test.ts`), mock the SDK:
+    - **R6.1 (load-bearing):** model emits `assign_to_envelope` → assert
+      `appendTransaction`/assign endpoint called **0×**, `proposedActions`
+      non-empty.
+    - read-tool dispatch → feeds `tool_result`, loops to `end_turn`.
+    - **R6.2:** resume with `approved` resolution → `tool_result` turn appended,
+      loop resumes.
+    - pending-proposal supersede (R3.4); refusal (R4.3); iteration cap (R4.4);
+      missing key → 503 (R4.2).
+  - **Covers:** R1.1, R1.4, R2.2–R2.6, R3.2–R3.4, R4.2–R4.5, R6. _Verify:_
+    `vitest run server/api/ai`, typecheck.
+
+- [x] **T8 — `useAiChat` composable**
+  - New `composables/useAiChat.ts`: reactive `messages`/`reply`/`proposedActions`/
+    `pending`/`error`; `send(text)`; `approve(action)` (calls the existing
+    assign/transfer composable/endpoint, then resumes `/api/ai/chat` with an
+    `approved` resolution + triggers budget refresh); `reject(action)`. Auto-reject
+    un-acted proposals when `send` is called (R3.4). No business logic.
+  - **Tests** (`useAiChat.test.ts`): send round-trip (mock `$fetch`); approve
+    commits via the endpoint then resumes; reject resumes without committing;
+    supersede behavior.
+  - **Covers:** R2.3–R2.6, R3.1, R3.4, NFR1. _Verify:_ `vitest run composables/useAiChat.test.ts`, typecheck.
+
+- [x] **T9 — Chat panel UI**
+  - New `components/AiChatPanel.vue`: Nuxt UI chat suite (`UChatMessages`,
+    `UChatMessage`, `UChatPrompt`, `UChatPromptSubmit`); a `UCard` proposed-action
+    card with Approve/Reject + the action summary; persistent egress notice (R5.1);
+    no-API-key empty state (R4.2); error display (R4.5). Renders only `reply` +
+    `proposedActions` (never raw history).
+  - Edit `pages/budget.vue` to mount the panel (slideover or side panel).
+  - **Tests:** light component test if practical (render notice + card states);
+    otherwise covered by manual run + the composable tests. State plainly which.
+  - **Covers:** R2.1, R2.3, R2.4, R4.2, R4.5, R5.1. _Verify:_ typecheck; `npm run dev` smoke check.
+
+- [x] **T10 — Egress/logging hygiene pass**
+  - Confirm no `console.log` of message content or the key anywhere in the new
+    code; the notice is present and persistent.
+  - **Covers:** R5.1, R5.2. _Verify:_ grep the new files; typecheck.
+
+- [x] **T11 — Full verification + map update**
+  - `npx vitest run` (all green) and `npx nuxi typecheck` (0 errors).
+  - Manual smoke: ask a question (reads), get a proposal, approve (commits + budget
+    refreshes), reject (no write); unset key → empty state.
+  - Main agent updates `AI-MAP.md`: `/api/ai/chat` route row; `anthropic.ts`,
+    `aiTools.ts`, `budgetReport.ts`, `transactionList.ts` util rows;
+    `useAiChat` composable; `AiChatPanel` component; budget-page panel; AI quirks
+    (HITL invariant, `ANTHROPIC_API_KEY` env, data egress).
+  - **Covers:** NFR5, NFR6. _Verify:_ both commands clean; map diff reviewed.
+
+---
+
+## Amendment — in-app API-key configuration (Issue #8, user-approved deviation)
+
+- [x] **T12 — `aiConfig` util + key resolution**
+  - New `server/utils/aiConfig.ts`: `readStoredApiKey` (sync, guarded, never throws),
+    `writeStoredApiKey`/`clearStoredApiKey` (async), `maskApiKey` (last-4). Path
+    `config/ai-config.json` (gitignored).
+  - `server/utils/anthropic.ts`: `resolveApiKey` (env → stored), `getApiKeySource`,
+    `getAnthropic` rebuilds the client when the resolved key changes (no restart).
+  - **Tests:** `aiConfig.test.ts` (read/write/clear/mask, mocked fs); updated
+    `anthropic.test.ts` (env-overrides-stored precedence, none → throws).
+  - **Covers:** R4.1, R7.1–R7.4. _Verified:_ 15 tests green; typecheck.
+
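Reviewer sketch (not part of the commit): the "rebuild when the resolved key changes" behavior in T12 can be expressed as a getter that caches the client alongside the key it was built with. `makeClientGetter` is a hypothetical helper. The client factory is injected so the sketch does not import the SDK, and the error class here only mirrors the `MissingApiKeyError` name from T1.

```typescript
class MissingApiKeyError extends Error {}

function makeClientGetter<C>(
  resolveKey: () => string | undefined, // e.g. env first, then the stored key
  create: (apiKey: string) => C,        // e.g. a function that constructs the SDK client
): () => C {
  let cachedKey: string | undefined;
  let cachedClient: C | undefined;
  return () => {
    const key = resolveKey();
    if (!key) throw new MissingApiKeyError('No Anthropic API key configured');
    // Rebuild only when the resolved key differs from the one the cache was built with.
    if (cachedClient === undefined || key !== cachedKey) {
      cachedClient = create(key);
      cachedKey = key;
    }
    return cachedClient;
  };
}
```

Resolving the key on every call is what makes a newly saved key take effect on the next request without a restart (R7.2). The comparison keeps the common case down to a cached lookup.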
+- [x] **T13 — `GET/POST/DELETE /api/ai/config`**
+  - `config.get.ts` (`{configured, source, maskedKey}` — never full key);
+    `config.post.ts` (validate then `writeStoredApiKey`); `config.delete.ts`
+    (`clearStoredApiKey`, env left intact).
+  - **Tests** (`config.test.ts`): masked-only responses, validation 400s, env-override
+    source, clear behavior.
+  - **Covers:** R4.2, R7.4–R7.6. _Verified:_ 10 tests green; typecheck.
+
+- [x] **T14 — Settings card + panel link**
+  - `pages/settings.vue`: "AI Assistant" card (status, source badge, masked key,
+    password input, Save, and Clear shown only when the source is the stored
+    config). `AiChatPanel.vue` empty state links to Settings.
+  - **Covers:** R4.2, R7.7. _Verified:_ typecheck; runtime curl flow (save → chat
+    no longer 503 → clear).
+
+- [x] **T15 — Verify + adversarial review + spec/map**
+  - Full `vitest run` (384 green) + `nuxi typecheck` (0 errors); runtime probe of
+    all `/api/ai/config` verbs + the no-restart effect. Adversarial multi-agent
+    review of the secret handling. `design.md`/`requirements.md`/`AI-MAP.md` updated.
+
+---
+
+## Checkpoint
+
+All tasks `- [x]`, `npx vitest run` and `npx nuxi typecheck` both clean, manual
+HITL flow verified (propose → approve → commit; reject → no write; missing key →
+empty state), `AI-MAP.md` updated. Then ready for commit/PR (PR body: `Fixes #8`).