diff --git a/.kiro/specs/ai-budgeting-chat/design.md b/.kiro/specs/ai-budgeting-chat/design.md new file mode 100644 index 0000000..0662850 --- /dev/null +++ b/.kiro/specs/ai-budgeting-chat/design.md @@ -0,0 +1,327 @@ +# Design — AI Budgeting Chat (Human-in-the-Loop) + +## Introduction + +Implements **GitHub Issue #8** (`toolpathguy/FinanceApp`): a chat assistant that +helps the user budget. The assistant reads live budget state through tools and +**proposes** envelope assignments / transfers; the user approves each proposed +action before anything is written. The model never moves money on its own — the +only code path that touches the journal is the existing +`POST /api/budget/assign` / `POST /api/budget/transfer` endpoints, invoked by the +client **after explicit user approval**. + +This is the foundation feature: it introduces the shared Anthropic client and the +**propose → approve → commit** spine that the CSV-import feature (#9) reuses. + +It respects `separation-of-concerns.md`: **no new accounting logic.** The model +reads structured state via tools that delegate to existing `server/utils`, and +the assistant only ever proposes calls to endpoints that already exist. + +### Non-negotiable safety invariant + +> **The tool loop never writes to the journal.** Proposed-action tools +> (`assign_to_envelope`, `transfer_between_envelopes`) are *surfaced*, never +> *executed*, by the server. A real write happens only when the client calls the +> existing assign/transfer endpoint after the user clicks Approve. This is +> covered by a dedicated test (R from requirements.md). + +--- + +## Stack / dependencies + +- **New dep:** `@anthropic-ai/sdk` (the official SDK; the one supported path for + Node/Nitro). +- **Model:** `claude-opus-4-8`, **adaptive thinking** (`thinking: {type: + "adaptive"}`), `output_config: { effort: "medium" }` (tunable — chat favors + latency; bump to `high` if reasoning quality needs it). 
No `temperature` / + `top_p` / `budget_tokens` (all removed on Opus 4.8 — sending them 400s). +- **Non-streaming** for v1 (`max_tokens: 4096`). Budget-chat replies are short and + well under the SDK HTTP-timeout threshold; non-streaming keeps the HITL tool + loop simple and fully testable. Streaming is a deferred enhancement (see + Alternatives). +- **API key:** resolved in `server/utils/anthropic.ts` with **env override → + in-app stored key**: `process.env.ANTHROPIC_API_KEY` first, else a key the user + saves on the Settings page (persisted to gitignored `config/ai-config.json`, + same pattern as `config/active-journal.json`). No `nuxt.config.ts` change + (config stays hands-off per coding-standards). See **Decision: in-app key + config** below. +- **UI:** Nuxt UI v4 chat suite — `UChatMessages`, `UChatMessage`, `UChatPrompt`, + `UChatPromptSubmit` — plus a `UCard`-based proposed-action card with + Approve / Reject buttons. + +--- + +## Architecture & data flow + +``` +components/AiChatPanel.vue ─┐ + (UChat* + proposal card) │ approve → composables/useBudget assign/transfer + │ │ (existing POST /api/budget/{assign,transfer}) + ▼ │ │ +composables/useAiChat.ts ───┘ ▼ + $fetch('/api/ai/chat') ◄──────────── reflect committed result back into chat + │ + ▼ +server/api/ai/chat.post.ts ← holds the tool loop; NEVER writes the journal + ├─ server/utils/anthropic.ts ← shared SDK client + key (reused by #9) + ├─ server/utils/aiTools.ts ← tool defs + read-tool handlers (delegate down) + ├─ server/ai/budgetInstructions.ts ← cached system prompt (YNAB rules, tone) + └─ delegates reads to: + server/utils/budgetReport.ts (extracted from budget.get.ts) + server/utils/transactionList.ts (thin print wrapper) +``` + +The system prompt + tool definitions are the **stable, cacheable prefix** +(`cache_control: { type: "ephemeral" }`). Volatile budget state is **never** in +the prefix — it arrives through the `get_budget` tool, so the cache stays warm +across turns. 
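
A minimal sketch of how the stable prefix described above might be assembled, assuming one cache breakpoint on the system text block and one on the last tool definition. `ToolDef`, `SystemBlock`, `markLastCacheable`, and `buildPrefix` are hypothetical names used for illustration, and the tool list is abbreviated; the real code would use the SDK's own types.

```typescript
// Hypothetical local types standing in for the SDK's request types.
interface CacheControl { type: "ephemeral" }
interface ToolDef {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown> };
  cache_control?: CacheControl;
}
interface SystemBlock { type: "text"; text: string; cache_control?: CacheControl }

// Returns a copy of the tool list with the cache breakpoint on the last entry.
// Order is preserved exactly, since reordering would change the cached prefix.
function markLastCacheable(tools: readonly ToolDef[]): ToolDef[] {
  return tools.map((tool, i): ToolDef =>
    i === tools.length - 1 ? { ...tool, cache_control: { type: "ephemeral" } } : { ...tool },
  );
}

// Builds the stable request prefix. No budget numbers are passed in here;
// live state only ever arrives later through the get_budget tool.
function buildPrefix(systemPrompt: string, tools: readonly ToolDef[]) {
  const system: SystemBlock[] = [
    { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
  ];
  return { system, tools: markLastCacheable(tools) };
}

const prefix = buildPrefix("You are a budgeting assistant.", [
  { name: "get_budget", description: "Read budget state", input_schema: { type: "object", properties: {} } },
  { name: "get_transactions", description: "Read transactions", input_schema: { type: "object", properties: {} } },
]);
```

Because `buildPrefix` takes only the prompt string and the tool list, nothing that changes between turns can leak into the cached portion of the request.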
+ +--- + +## The HITL tool loop (the crux) + +`POST /api/ai/chat` runs a **manual** agentic loop (not the SDK tool runner — we +need to intercept proposed-action tools before they'd execute): + +1. Client sends the full conversation `messages` (opaque Anthropic + `MessageParam[]`, echoed verbatim each turn) plus, on a resume, the user's + approve/reject `resolutions`. +2. Server calls `client.messages.create({ model, system, tools, messages, … })`. +3. On `stop_reason: "tool_use"`, inspect every `tool_use` block in the turn: + - **Read tool** (`get_budget`, `get_transactions`): execute server-side via the + `server/utils` delegate, collect a `tool_result`. + - **Proposed-action tool** (`assign_to_envelope`, `transfer_between_envelopes`): + **do not execute.** Record it as a pending proposal. +4. **If the turn contains no pending proposal:** append the assistant turn + the + read `tool_result`s and loop (back to step 2), bounded by + `MAX_TOOL_ITERATIONS = 8` (guards against runaway read loops). +5. **If the turn contains ≥1 pending proposal:** stop the loop and return to the + client: + ```ts + { + messages, // updated history incl. this assistant turn (opaque) + reply, // assistant visible text for this turn + proposedActions: ProposedAction[], // awaiting approval + readToolResults: ToolResultBlock[], // results already computed for any + // read tools in the SAME turn (echoed + // back on resume so the protocol stays valid) + } + ``` +6. Client renders the proposal card(s). On **Approve**, it calls the existing + `POST /api/budget/assign` | `/transfer` with the proposal payload, then re-POSTs + `/api/ai/chat` with `{ messages, resolutions }`. On **Reject**, it skips the + write and re-POSTs with a reject resolution. +7. On resume, the server builds the user turn's `tool_result` blocks from + `readToolResults` **+** the resolutions (`"Committed: "` / + `"User rejected this action"`), appends it, and resumes the loop at step 2. 
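
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the route's actual code: the block types are simplified stand-ins for the SDK's, `createMessage` is a hypothetical wrapper around `client.messages.create`, and `runToolLoop`, `LoopDeps`, and `LoopResult` are invented names. The handling of an unknown tool name is also an assumption.

```typescript
// Simplified stand-ins for the SDK's content-block types (assumptions).
type TextBlock = { type: "text"; text: string };
type ToolUse = { type: "tool_use"; id: string; name: string; input: unknown };
type Block = TextBlock | ToolUse;
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };
type Message = { role: "user" | "assistant"; content: string | Block[] | ToolResult[] };
type ModelTurn = { stop_reason: string; content: Block[] };

interface LoopDeps {
  createMessage: (messages: Message[]) => Promise<ModelTurn>; // hypothetical SDK wrapper
  readHandlers: Record<string, (input: unknown) => Promise<unknown>>;
}
interface LoopResult {
  messages: Message[];
  reply: string;
  proposals: ToolUse[];
  readToolResults: ToolResult[];
}

const MAX_TOOL_ITERATIONS = 8;
const PROPOSED_ACTION_TOOLS = new Set(["assign_to_envelope", "transfer_between_envelopes"]);

async function runToolLoop(history: Message[], deps: LoopDeps): Promise<LoopResult> {
  const messages = [...history];
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const turn = await deps.createMessage(messages);
    messages.push({ role: "assistant", content: turn.content });
    const reply = turn.content
      .filter((b): b is TextBlock => b.type === "text")
      .map((b) => b.text)
      .join("\n");
    if (turn.stop_reason !== "tool_use") {
      return { messages, reply, proposals: [], readToolResults: [] };
    }

    const proposals: ToolUse[] = [];
    const readToolResults: ToolResult[] = [];
    for (const block of turn.content) {
      if (block.type !== "tool_use") continue;
      if (PROPOSED_ACTION_TOOLS.has(block.name)) {
        proposals.push(block); // surfaced for approval, never executed here
        continue;
      }
      const handler = deps.readHandlers[block.name];
      // Assumption: an unknown tool is answered with an error string, not executed.
      const content = handler ? JSON.stringify(await handler(block.input)) : `Unknown tool: ${block.name}`;
      readToolResults.push({ type: "tool_result", tool_use_id: block.id, content });
    }
    // Any proposal pauses the loop; the client resumes after Approve/Reject.
    if (proposals.length > 0) return { messages, reply, proposals, readToolResults };
    messages.push({ role: "user", content: readToolResults });
  }
  return { messages, reply: "Stopped: tool iteration limit reached.", proposals: [], readToolResults: [] };
}
```

Note that the sketch has no code path from a proposed-action block to any handler: the only thing it can do with such a block is return it to the caller.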
+ +This keeps the server **stateless** and the Anthropic protocol valid: every +`tool_use` block eventually receives a matching `tool_result` on the next request. + +### Edge cases handled + +- **New message while a proposal is pending:** the model is blocked waiting on + `tool_result`s. The composable auto-resolves any un-acted proposal as rejected + (`"Superseded by a new message"`) before sending the new user turn — otherwise + the dangling `tool_use` 400s the API. +- **Mixed read + action in one turn:** the read results are computed and returned + in `readToolResults`; the client echoes them back alongside the action verdict, + so all `tool_use` blocks in the turn resolve together. (The system prompt also + asks the model to propose actions in their own turn, which makes this rare.) +- **Multiple proposals in one turn:** all surfaced; each gets its own card and its + own resolution. The loop resumes only once every proposal is resolved. +- **`stop_reason: "refusal"`:** return the refusal as assistant text; no actions. +- **No API key configured (neither env nor stored):** route returns 503 with a clear message; the + panel shows a "configure your API key" empty state instead of a broken chat. +- **Iteration cap hit:** return what we have with a note; never loop forever. 
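
The supersede rule in the first edge case could look roughly like this on the client. It is a sketch only: `PendingProposal` and `supersedePending` are hypothetical names, and it assumes a proposal's `id` is the `tool_use` id that the resolution must reference.

```typescript
interface PendingProposal { id: string; kind: "assign" | "transfer"; summary: string }
interface ChatResolution {
  toolUseId: string;
  status: "approved" | "rejected";
  resultText: string;
}

// Resolves every proposal the user has not acted on as rejected, so each
// outstanding tool_use block gets a matching tool_result on the next request.
// Assumption: proposal.id is the tool_use id.
function supersedePending(
  pending: readonly PendingProposal[],
  alreadyResolved: readonly ChatResolution[],
): ChatResolution[] {
  const resolvedIds = new Set(alreadyResolved.map((r) => r.toolUseId));
  const superseded = pending
    .filter((p) => !resolvedIds.has(p.id))
    .map((p): ChatResolution => ({
      toolUseId: p.id,
      status: "rejected",
      resultText: "Superseded by a new message",
    }));
  return [...alreadyResolved, ...superseded];
}
```

Existing verdicts are kept as they are, so a proposal the user already approved is never silently flipped to rejected.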
+ +--- + +## Tool surface (`server/utils/aiTools.ts`) + +| Tool | Kind | Wraps | Returns / Effect | +|---|---|---|---| +| `get_budget` | read | `getBudgetReport(period?)` | Ready-to-Assign + envelope balances (Assigned/Activity/Available) | +| `get_transactions` | read | `getTransactionList({startDate?,endDate?,account?})` | recent transactions (date, payee, amount, account) | +| `assign_to_envelope` | **proposed action** | `POST /api/budget/assign` | proposal `{ date, physicalAccount, envelopes }` — HITL | +| `transfer_between_envelopes` | **proposed action** | `POST /api/budget/transfer` | proposal `{ date, sourceEnvelope, destinationEnvelope, amount }` — HITL | + +- Tool `input_schema`s mirror the request bodies the endpoints already validate, so + the existing server-side validation remains the real gate (amounts > 0, the + Ready-to-Assign availability gate on assign, etc.). The model's proposal is just + a suggestion; the endpoint still rejects an over-assignment. +- Read handlers are **pure delegations** — no accounting math in `aiTools.ts`. +- Tool definitions are frozen and ordered deterministically (cache-stable). + +### Refactors to enable clean delegation (no behavior change) + +- **Extract `server/utils/budgetReport.ts`** — move the report-building body of + `budget.get.ts` into `getBudgetReport(period: string)`; the route becomes a thin + validate-and-call wrapper. Both the route and `get_budget` call it. (Mirrors how + `getReadyToAssign` was extracted to `budgetData.ts`.) +- **Add `server/utils/transactionList.ts`** — `getTransactionList(query)` runs + `hledgerExec(['print', …])` + `transformTransactions` (existing utils) and shapes + a compact list for the model. `transactions.get.ts` is left as-is (its + register-row path is UI-specific); this is a small read helper, not a refactor of + that route. 
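
A small sketch of the kind classification behind the table above, assuming a single ordered lookup is the source of truth for both tool order and tool kind. `ToolKind`, `TOOL_KINDS`, `TOOL_ORDER`, and `isReadTool` are hypothetical names; only `isProposedActionTool` appears in the task list.

```typescript
type ToolKind = "read" | "proposed_action";

// A Map keeps insertion order, so this doubles as the deterministic tool order.
const TOOL_KINDS: ReadonlyMap<string, ToolKind> = new Map<string, ToolKind>([
  ["get_budget", "read"],
  ["get_transactions", "read"],
  ["assign_to_envelope", "proposed_action"],
  ["transfer_between_envelopes", "proposed_action"],
]);

const TOOL_ORDER: readonly string[] = Array.from(TOOL_KINDS.keys());

// True only for tools that must be surfaced to the user and never executed.
function isProposedActionTool(name: string): boolean {
  return TOOL_KINDS.get(name) === "proposed_action";
}

// True only for tools the server may execute itself.
function isReadTool(name: string): boolean {
  return TOOL_KINDS.get(name) === "read";
}
```

An unknown tool name is neither a read nor a proposed action here, so the safe default is that the server executes nothing for it.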
+ +--- + +## Key interfaces / types (`types/ai.ts`) + +```ts +export interface AssignProposalPayload { + date: string + physicalAccount: string + envelopes: Record<string, number> // envelope name -> amount (assumed shape) +} +export interface TransferProposalPayload { + date: string + sourceEnvelope: string + destinationEnvelope: string + amount: number +} + +export type ProposedAction = + | { id: string; kind: 'assign'; summary: string; payload: AssignProposalPayload } + | { id: string; kind: 'transfer'; summary: string; payload: TransferProposalPayload } + +export interface ChatResolution { + toolUseId: string + status: 'approved' | 'rejected' + resultText: string // "Committed: …" or "User rejected this action" +} + +// Wire types. `messages` is the opaque Anthropic MessageParam[] history; the +// client never renders it raw — it reads `reply` + `proposedActions` each turn. +export interface AiChatRequest { + messages: unknown[] // Anthropic MessageParam[]; typed at the SDK boundary + resolutions?: ChatResolution[] +} +export interface AiChatResponse { + messages: unknown[] + reply: string + proposedActions: ProposedAction[] + readToolResults: unknown[] // Anthropic ToolResultBlockParam[]; echoed back on resume +} +``` + +`unknown[]` at the wire boundary is the validated trust boundary (cast to the +SDK's `MessageParam[]` inside `chat.post.ts`); no `any`. 
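
One way the "validated trust boundary" could be enforced before the cast is a shallow shape check. This is a sketch: `MessageLike`, `isMessageLike`, and `parseMessages` are hypothetical names, and the check is deliberately looser than the SDK's `MessageParam` type. It relies on `in` narrowing, so it assumes TypeScript 4.9 or later.

```typescript
// Simplified stand-in for the SDK's MessageParam (assumption).
interface MessageLike {
  role: "user" | "assistant";
  content: string | unknown[];
}

// Shallow shape check: role is a known value, content is a string or an array.
function isMessageLike(value: unknown): value is MessageLike {
  if (typeof value !== "object" || value === null) return false;
  if (!("role" in value) || !("content" in value)) return false;
  const { role, content } = value;
  return (
    (role === "user" || role === "assistant") &&
    (typeof content === "string" || Array.isArray(content))
  );
}

// Rejects anything that is not an array of message-shaped objects.
function parseMessages(raw: unknown): MessageLike[] {
  if (!Array.isArray(raw) || !raw.every(isMessageLike)) {
    throw new TypeError("messages must be an array of { role, content } objects");
  }
  return raw;
}
```

The guard avoids `any` and `as`, in keeping with the type-safety rule stated for source files.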
+ +--- + +## Files added / changed + +| File | Change | +|---|---| +| `server/utils/anthropic.ts` (new) | shared SDK client; resolves the API key (`ANTHROPIC_API_KEY` env, else the stored key); `getAnthropic()` throws a typed `MissingApiKeyError` if none is configured | +| `server/utils/aiTools.ts` (new) | tool definitions (read + proposed-action), `READ_TOOL_HANDLERS`, kind classification | +| `server/ai/budgetInstructions.ts` (new) | `BUDGET_SYSTEM_PROMPT` (markdown string: YNAB Rule 1, envelope conventions, "propose, never execute", tone) | +| `server/utils/budgetReport.ts` (new) | `getBudgetReport(period)` extracted from `budget.get.ts` | +| `server/utils/transactionList.ts` (new) | `getTransactionList(query)` compact read helper | +| `server/api/ai/chat.post.ts` (new) | the HITL tool loop; returns `AiChatResponse`; 503 when key missing | +| `server/api/budget.get.ts` (edit) | delegate to `getBudgetReport` (thin wrapper) | +| `composables/useAiChat.ts` (new) | reactive chat state, send/approve/reject, pending-proposal handling | +| `components/AiChatPanel.vue` (new) | Nuxt UI chat panel + proposed-action card + egress notice + no-key empty state | +| `pages/budget.vue` (edit) | mount the chat panel (slideover/side panel on the budget page) | +| `types/ai.ts` (new) | wire + proposal types above | +| `package.json` | add `@anthropic-ai/sdk` | +| `AI-MAP.md` | new route/util/composable/page rows + AI quirks (main agent, after impl) | + +--- + +## Data egress (must document prominently — Issue #8 risk note) + +Budget/envelope data and the user's chat messages are sent to the Anthropic API — +the one external data flow in this app. No bank credentials are stored; the only +secret is the Anthropic API key, held in env or, per the in-app key decision below, +in gitignored `config/ai-config.json`. The chat panel shows a +**persistent, visible notice** ("Messages and budget data are sent to Anthropic to +generate replies") and this is captured as a requirement, not just a code comment. 
+ +--- + +## Decision: in-app key config (amendment — deviates from Issue #8) + +Issue #8 framed the key as living **only** in env ("No secrets stored … only +`ANTHROPIC_API_KEY` in env"). In practice that left no in-app way to enable the +chat — the user had to set an env var and restart the server. **Decision (user- +approved):** add a Settings-page field that persists the key to gitignored +`config/ai-config.json`, read by `getAnthropic()`. + +- **Why this is acceptable here:** single-user local app; `config/` is already + gitignored; mirrors the existing `config/active-journal.json` precedent; an + Anthropic key in a local config file is standard practice (cf. `~/.aws/credentials`, + `gh` CLI). It is a much lower-sensitivity secret than the bank/OAuth tokens #8 + was contrasting against. +- **Resolution precedence:** env **overrides** stored, so a Docker/CI deployment + can still pin the key via `ANTHROPIC_API_KEY`; a local user configures it in the + UI. The client is rebuilt when the resolved key changes, so saving takes effect + on the next request — **no restart**. +- **Secret hygiene:** the key is never logged and never returned in full — GET + `/api/ai/config` returns only `{configured, source, maskedKey}` (last-4 mask). +- **Endpoints:** `GET/POST/DELETE /api/ai/config` (status / save / clear); + `server/utils/aiConfig.ts` owns the file (sync guarded read like `activeJournal.ts`; + async write/clear). Settings page gains an "AI Assistant" card; the chat panel's + no-key empty state links to Settings. + +## Alternatives considered + +- **SDK tool runner (auto loop)** — rejected: it executes tool handlers + automatically, which is exactly what must *not* happen for proposed-action tools. + The manual loop is required to intercept them. (The docs explicitly recommend the + manual loop for human-in-the-loop approval.) +- **Server-side conversation state / session store** — rejected: adds persistent + state for no benefit. 
Echoing the opaque `messages` history round-trip keeps the + server stateless (consistent with the rest of the app) and is the standard + stateless-Messages-API pattern. +- **Streaming in v1** — deferred. Streaming + a paused HITL tool loop is more + complex to get right and to test, and budget-chat replies are short. The wire + contract (`AiChatResponse`) is unchanged by a later switch to SSE for the text + delta; tool-pause semantics stay the same. +- **`server/ai/budget-instructions.md` read via `fs`** — rejected in favor of a + `.ts` string export: a relative `.md` read is fragile under the Nitro production + bundle (path resolution), and a `.ts` constant typechecks and bundles cleanly. + The content stays markdown-formatted inside the string, so it remains + human-editable. +- **Tools recompute balances themselves** — rejected: violates + separation-of-concerns. Read tools delegate to `server/utils`; the engine + (hledger) stays the single source of truth. +- **Config via `nuxt.config.ts` `runtimeConfig`** — avoided: reading + `process.env.ANTHROPIC_API_KEY` directly in the server util matches the existing + `LEDGER_FILE` precedent and keeps framework config untouched. + +--- + +## Testing strategy (detail in tasks.md) + +- **Load-bearing safety test:** drive the tool loop with a mocked Anthropic client + that emits an `assign_to_envelope` `tool_use`; assert `appendTransaction` / + the assign endpoint is **never** called and the response surfaces a + `proposedAction` instead. The journal-writer mock must record zero calls. +- **Read-tool dispatch:** mock the SDK to call `get_budget`; assert the handler + delegates to `getBudgetReport` and the result is fed back as a `tool_result`, + loop continues to `end_turn`. +- **Resume path:** given `resolutions: [approved]`, assert a `tool_result` user + turn is appended with the committed text and the loop resumes. 
+- **Pending-proposal supersede:** a new message with an un-acted proposal + auto-rejects it (valid protocol, no dangling `tool_use`). +- **Missing key:** `GET`/`POST` with no `ANTHROPIC_API_KEY` → 503, clear message. +- **Iteration cap:** a mock that always calls a read tool stops at + `MAX_TOOL_ITERATIONS`. +- **Refactor parity:** existing `budget.get.ts` tests still pass against the + `getBudgetReport` delegate (no behavior change). +- Mock Nitro globals with `vi.stubGlobal()` per project convention; mock the + Anthropic SDK (`any` allowed in tests). Full `npx vitest run` + `npx nuxi + typecheck` clean at the end. + +--- + +## Out of scope + +- Streaming responses (deferred enhancement). +- CSV import (#9) — this feature only builds the shared Anthropic plumbing it + reuses. +- Multi-user auth / per-user API keys (single-user local app). +- Persisting chat history across reloads. +- New envelope/category creation via chat (assign + transfer only for v1). diff --git a/.kiro/specs/ai-budgeting-chat/requirements.md b/.kiro/specs/ai-budgeting-chat/requirements.md new file mode 100644 index 0000000..a4e9803 --- /dev/null +++ b/.kiro/specs/ai-budgeting-chat/requirements.md @@ -0,0 +1,141 @@ +# Requirements — AI Budgeting Chat (Human-in-the-Loop) + +Traces to **GitHub Issue #8**. Acceptance criteria use EARS form +(WHEN … THE SYSTEM SHALL …). Each requirement is testable; the test mapping lives +in `tasks.md`. + +--- + +## R1 — Conversational budget Q&A + +**User story:** As a budgeter, I want to ask questions about my budget in plain +language so I can understand my envelopes without reading tables. + +- R1.1 — WHEN the user sends a message, THE SYSTEM SHALL call `claude-opus-4-8` + with the budgeting system prompt and the conversation history, and return the + assistant's reply text. 
+- R1.2 — WHEN the assistant needs current budget state, THE SYSTEM SHALL expose a + `get_budget` tool that returns Ready-to-Assign and per-envelope + Assigned/Activity/Available, computed by the existing budget logic. +- R1.3 — WHEN the assistant needs transaction history, THE SYSTEM SHALL expose a + `get_transactions` tool returning a compact list (date, payee, amount, account). +- R1.4 — WHEN a read tool is called, THE SYSTEM SHALL execute it server-side, + feed the result back to the model, and continue until the model produces a final + reply — bounded by `MAX_TOOL_ITERATIONS` (8). +- R1.5 — THE SYSTEM SHALL NOT recompute any balance, Ready-to-Assign, or delta in + chat code; read tools delegate to `server/utils` (hledger remains the source of + truth). + +## R2 — Human-in-the-loop proposed actions (safety-critical) + +**User story:** As a budgeter, I want the assistant to suggest assignments and +transfers that I approve before anything changes, so the AI never moves my money +on its own. + +- R2.1 — WHEN the assistant decides to assign or transfer, THE SYSTEM SHALL + surface it as a **proposed action** (`assign_to_envelope` / + `transfer_between_envelopes`) with a human-readable summary and the payload. +- R2.2 — WHEN a proposed-action tool is emitted by the model, THE SYSTEM SHALL NOT + execute it and SHALL NOT write to the journal. *(Load-bearing — see R6.)* +- R2.3 — WHEN the user approves a proposed action, THE SYSTEM SHALL commit it by + calling the existing `POST /api/budget/assign` or `POST /api/budget/transfer` + endpoint with the proposal payload. +- R2.4 — WHEN the user rejects a proposed action, THE SYSTEM SHALL NOT write + anything and SHALL inform the model the action was rejected. +- R2.5 — WHEN a commit endpoint rejects the request (e.g. the assign + availability gate, non-positive amount), THE SYSTEM SHALL surface the error in + the chat and SHALL NOT mark the action committed. 
+- R2.6 — WHEN an action is committed, THE SYSTEM SHALL reflect the result back to + the model so the conversation stays consistent, and the budget view SHALL + refresh to show the change. + +## R3 — Conversation protocol integrity + +**User story:** As a developer, I want the chat to stay stateless and protocol-valid +so it never wedges the API. + +- R3.1 — THE SYSTEM SHALL be stateless: the full conversation history is passed + from the client each request and echoed back unchanged. +- R3.2 — WHEN a turn contains tool calls, THE SYSTEM SHALL ensure every `tool_use` + block receives a matching `tool_result` before the next assistant turn. +- R3.3 — WHEN read tools and a proposed action occur in the same turn, THE SYSTEM + SHALL compute the read results, return them with the proposal, and resume only + once all `tool_use` blocks in that turn are resolved. +- R3.4 — WHEN the user sends a new message while a proposal is still un-acted, THE + SYSTEM SHALL auto-resolve the pending proposal as rejected before processing the + new message (no dangling `tool_use`). + +## R4 — Configuration & failure handling + +- R4.1 — THE SYSTEM SHALL resolve the Anthropic API key as + `process.env.ANTHROPIC_API_KEY` (override) else the in-app stored key (see R7); + no `nuxt.config.ts` change. +- R4.2 — WHEN no key is configured (neither env nor stored), THE SYSTEM SHALL + return HTTP 503 with a clear message, and the chat panel SHALL show a "configure + your API key" empty state (linking to Settings) rather than erroring. +- R4.3 — WHEN the model returns `stop_reason: "refusal"`, THE SYSTEM SHALL return + the refusal as assistant text with no proposed actions. +- R4.4 — WHEN `MAX_TOOL_ITERATIONS` is reached, THE SYSTEM SHALL return the + partial result with a note and SHALL NOT loop indefinitely. +- R4.5 — WHEN the Anthropic call fails (network/5xx after SDK retries), THE SYSTEM + SHALL surface a friendly error in the chat and leave the conversation resumable. 
+ +## R5 — Data egress transparency + +- R5.1 — THE SYSTEM SHALL display a persistent, visible notice in the chat panel + that messages and budget data are sent to the Anthropic API to generate replies. +- R5.2 — THE SYSTEM SHALL NOT persist the API key anywhere except the environment + or the gitignored `config/ai-config.json` (see R7), + and SHALL NOT log message content or the key. + +## R6 — Verifiable HITL guarantee (test requirement) + +- R6.1 — A test SHALL drive the tool loop with a mocked Anthropic client emitting + an `assign_to_envelope` `tool_use` and assert the journal writer / assign + endpoint is called **zero** times and a `proposedAction` is returned instead. +- R6.2 — A test SHALL assert the resume path commits only via the existing endpoint + after an `approved` resolution. + +--- + +## R7 — In-app API-key configuration (amendment) + +**User story:** As a user, I want to set my Anthropic API key in the app so I can +enable the chat without editing environment variables or restarting the server. + +- R7.1 — THE SYSTEM SHALL let the user save an API key from the Settings page, + persisted to gitignored `config/ai-config.json`. +- R7.2 — WHEN a key is saved, THE SYSTEM SHALL use it on the next request without a + server restart (the client is rebuilt when the resolved key changes). +- R7.3 — `process.env.ANTHROPIC_API_KEY` SHALL take precedence over the stored key. +- R7.4 — THE SYSTEM SHALL NEVER return the API key in full from any endpoint — only + a masked form (last 4 chars) — and SHALL NEVER log it (extends R5.2). +- R7.5 — WHEN saving, THE SYSTEM SHALL reject an empty, whitespace-only, + whitespace-containing, or too-short key with HTTP 400 and SHALL NOT persist it. +- R7.6 — WHEN the user clears the stored key, THE SYSTEM SHALL remove it but leave + any `ANTHROPIC_API_KEY` env var intact, and report the resulting state. +- R7.7 — THE SETTINGS UI SHALL show whether a key is configured and its source, + and SHALL indicate when an env var overrides a stored key. 
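
R7.3 to R7.5 can be illustrated with three small pure functions. This is a sketch under assumptions: the function signatures, the `KeySource` values, the mask format, and the `MIN_KEY_LENGTH` threshold are all hypothetical, since the requirements only say "masked form (last 4 chars)" and "too-short".

```typescript
type KeySource = "env" | "config" | "none"; // assumed source labels

// Env overrides stored (R7.3). An empty string counts as unset.
function resolveApiKey(
  envKey: string | undefined,
  storedKey: string | undefined,
): { key?: string; source: KeySource } {
  if (envKey) return { key: envKey, source: "env" };
  if (storedKey) return { key: storedKey, source: "config" };
  return { source: "none" };
}

// Shows at most the last four characters (R7.4). Mask format is an assumption.
function maskApiKey(key: string): string {
  return key.length <= 4 ? "****" : `****${key.slice(-4)}`;
}

const MIN_KEY_LENGTH = 20; // assumed threshold; the spec does not give a number

// Rejects empty, whitespace-containing, and too-short candidates (R7.5).
function isAcceptableKey(candidate: string): boolean {
  return candidate.length >= MIN_KEY_LENGTH && !/\s/.test(candidate);
}
```

Keeping these free of file and network access means the precedence and masking rules can be unit-tested without mocking `fs` or the SDK.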
+ +## Non-functional requirements + +- **NFR1 — Separation of concerns:** chat route = HTTP glue + tool loop; read + tools delegate to `server/utils`; no accounting math in chat/AI code; the panel + fetches only through `composables/useAiChat`. +- **NFR2 — Prompt caching:** system prompt + tool definitions form a stable, + deterministically-ordered, `cache_control: ephemeral` prefix; volatile budget + state is fetched via `get_budget`, never embedded in the prefix. +- **NFR3 — Type safety:** no `any`/unnecessary `as` in source; the opaque + Anthropic history is cast to `MessageParam[]` only at the validated SDK + boundary. (`any` allowed in tests for mocking.) +- **NFR4 — Windows/CRLF & money:** read tools reuse existing utils, inheriting + CRLF-safe parsing and integer-cent handling; no new parsing paths. +- **NFR5 — Verification:** `npx vitest run` and `npx nuxi typecheck` both clean. +- **NFR6 — Map upkeep:** `AI-MAP.md` updated by the main agent after implementation. + +## Out of scope + +- Streaming responses (deferred; wire contract unchanged by a later SSE switch). +- CSV import (#9). +- Multi-user auth / per-user keys. +- Persisting chat history across reloads. +- Creating new envelopes/categories via chat (assign + transfer only for v1). diff --git a/.kiro/specs/ai-budgeting-chat/tasks.md b/.kiro/specs/ai-budgeting-chat/tasks.md new file mode 100644 index 0000000..bd737c2 --- /dev/null +++ b/.kiro/specs/ai-budgeting-chat/tasks.md @@ -0,0 +1,163 @@ +# Tasks — AI Budgeting Chat (Human-in-the-Loop) + +Ordered, independently verifiable. Each task notes files, tests, and the +requirement(s) it covers. Implement one at a time; run the listed tests + mark +`- [x]` before moving on. Do not commit until I say so. + +Convention: `*.test.ts` beside source; API tests under `server/**/__tests__/`; +mock Nitro globals with `vi.stubGlobal()`; `any` allowed in tests for SDK mocks. 
+ +--- + +- [x] **T1 — Dependency + shared Anthropic client** + - Add `@anthropic-ai/sdk` to `package.json` (`npm install @anthropic-ai/sdk`). + - New `server/utils/anthropic.ts`: `getAnthropic()` returns a singleton client + reading `process.env.ANTHROPIC_API_KEY`; export `MissingApiKeyError`; export + `MODEL = 'claude-opus-4-8'` and shared request defaults (adaptive thinking, + `effort: 'medium'`, `max_tokens: 4096`). + - **Tests** (`anthropic.test.ts`): `getAnthropic()` throws `MissingApiKeyError` + when the env var is unset; returns a client when set (stub `process.env`). + - **Covers:** R4.1, R4.2, NFR2 (defaults live here). _Verify:_ `vitest run server/utils/anthropic.test.ts`, typecheck. + +- [x] **T2 — Extract `getBudgetReport` (refactor, no behavior change)** + - New `server/utils/budgetReport.ts`: `getBudgetReport(period: string)` = the + report-building body of `budget.get.ts`. + - Edit `budget.get.ts` to validate the period then delegate to it (thin wrapper). + - **Tests:** existing `budget.get` route tests must still pass; add a direct unit + test for `getBudgetReport` (default + a period). + - **Covers:** R1.2, R1.5, NFR1. _Verify:_ `vitest run` on the budget route + new test; typecheck. + +- [x] **T3 — `getTransactionList` read helper** + - New `server/utils/transactionList.ts`: `getTransactionList({startDate?,endDate?,account?})` + → `hledgerExec(['print', …])` + `transformTransactions`, shaped to + `{date, payee, amount, account}[]`. Reuse `isValidDate`/`isValidAccount` + guards; pass account after `--`. + - **Tests** (`transactionList.test.ts`): shaping + that invalid date/account are + rejected; CRLF-safe (mock `hledgerExec`). + - **Covers:** R1.3, R1.5, NFR4. _Verify:_ `vitest run server/utils/transactionList.test.ts`, typecheck. + +- [x] **T4 — Wire types** + - New `types/ai.ts`: `AssignProposalPayload`, `TransferProposalPayload`, + `ProposedAction`, `ChatResolution`, `AiChatRequest`, `AiChatResponse` (per + design.md). 
+ - **Covers:** R3.1, NFR3. _Verify:_ typecheck only. + +- [x] **T5 — System prompt** + - New `server/ai/budgetInstructions.ts`: `BUDGET_SYSTEM_PROMPT` (markdown string) + — YNAB Rule 1, envelope conventions (strip prefixes, "Envelope" label), + "propose, never execute; one action per turn", tone, and that it must call + `get_budget` for live numbers rather than guessing. + - **Tests** (`budgetInstructions.test.ts`): non-empty; asserts a couple of + load-bearing phrases (propose-don't-execute; YNAB Rule 1) so the safety framing + can't silently regress. + - **Covers:** R2.1, NFR2. _Verify:_ `vitest run server/ai`, typecheck. + +- [x] **T6 — Tool definitions + read handlers** + - New `server/utils/aiTools.ts`: deterministically-ordered `TOOLS` with + `cache_control` on the last definition; `input_schema`s mirroring the + assign/transfer request bodies and the read queries; `READ_TOOL_HANDLERS` + (`get_budget`→`getBudgetReport`, `get_transactions`→`getTransactionList`); + `isProposedActionTool(name)` classifier; a `toProposedAction(toolUse)` mapper + building `ProposedAction` + summary. + - **Tests** (`aiTools.test.ts`): read handler delegates to the right util; + classifier flags assign/transfer as proposed actions and reads as reads; + `toProposedAction` builds correct payload + summary. + - **Covers:** R1.2, R1.3, R2.1, NFR1, NFR2. _Verify:_ `vitest run server/utils/aiTools.test.ts`, typecheck. + +- [x] **T7 — Chat route: the HITL tool loop (safety-critical)** + - New `server/api/ai/chat.post.ts`: read `AiChatRequest`; cast `messages` to + `MessageParam[]` at the boundary; on resume, append a `tool_result` user turn + from `readToolResults` + `resolutions`. Run the manual loop: read tools execute + & feed back; **proposed-action tools are surfaced, never executed**; bound by + `MAX_TOOL_ITERATIONS`. Handle `refusal`, missing key (503), and API errors. + Return `AiChatResponse`. 
+ - **Tests** (`server/api/ai/__tests__/chat.post.test.ts`), mock the SDK: + - **R6.1 (load-bearing):** model emits `assign_to_envelope` → assert + `appendTransaction`/assign endpoint called **0×**, `proposedActions` + non-empty. + - read-tool dispatch → feeds `tool_result`, loops to `end_turn`. + - **R6.2:** resume with `approved` resolution → `tool_result` turn appended, + loop resumes. + - pending-proposal supersede (R3.4); refusal (R4.3); iteration cap (R4.4); + missing key → 503 (R4.2). + - **Covers:** R1.1, R1.4, R2.2–R2.6, R3.2–R3.4, R4.2–R4.5, R6. _Verify:_ + `vitest run server/api/ai`, typecheck. + +- [x] **T8 — `useAiChat` composable** + - New `composables/useAiChat.ts`: reactive `messages`/`reply`/`proposedActions`/ + `pending`/`error`; `send(text)`; `approve(action)` (calls the existing + assign/transfer composable/endpoint, then resumes `/api/ai/chat` with an + `approved` resolution + triggers budget refresh); `reject(action)`. Auto-reject + un-acted proposals when `send` is called (R3.4). No business logic. + - **Tests** (`useAiChat.test.ts`): send round-trip (mock `$fetch`); approve + commits via the endpoint then resumes; reject resumes without committing; + supersede behavior. + - **Covers:** R2.3–R2.6, R3.1, R3.4, NFR1. _Verify:_ `vitest run composables/useAiChat.test.ts`, typecheck. + +- [x] **T9 — Chat panel UI** + - New `components/AiChatPanel.vue`: Nuxt UI chat suite (`UChatMessages`, + `UChatMessage`, `UChatPrompt`, `UChatPromptSubmit`); a `UCard` proposed-action + card with Approve/Reject + the action summary; persistent egress notice (R5.1); + no-API-key empty state (R4.2); error display (R4.5). Renders only `reply` + + `proposedActions` (never raw history). + - Edit `pages/budget.vue` to mount the panel (slideover or side panel). + - **Tests:** light component test if practical (render notice + card states); + otherwise covered by manual run + the composable tests. State plainly which. + - **Covers:** R2.1, R2.3, R2.4, R4.2, R4.5, R5.1. 
_Verify:_ typecheck; `npm run dev` smoke check. + +- [x] **T10 — Egress/logging hygiene pass** + - Confirm no `console.log` of message content or the key anywhere in the new + code; the notice is present and persistent. + - **Covers:** R5.1, R5.2. _Verify:_ grep the new files; typecheck. + +- [x] **T11 — Full verification + map update** + - `npx vitest run` (all green) and `npx nuxi typecheck` (0 errors). + - Manual smoke: ask a question (reads), get a proposal, approve (commits + budget + refreshes), reject (no write); unset key → empty state. + - Main agent updates `AI-MAP.md`: `/api/ai/chat` route row; `anthropic.ts`, + `aiTools.ts`, `budgetReport.ts`, `transactionList.ts` util rows; + `useAiChat` composable; `AiChatPanel` component; budget-page panel; AI quirks + (HITL invariant, `ANTHROPIC_API_KEY` env, data egress). + - **Covers:** NFR5, NFR6. _Verify:_ both commands clean; map diff reviewed. + +--- + +## Amendment — in-app API-key configuration (Issue #8, user-approved deviation) + +- [x] **T12 — `aiConfig` util + key resolution** + - New `server/utils/aiConfig.ts`: `readStoredApiKey` (sync, guarded, never throws), + `writeStoredApiKey`/`clearStoredApiKey` (async), `maskApiKey` (last-4). Path + `config/ai-config.json` (gitignored). + - `server/utils/anthropic.ts`: `resolveApiKey` (env → stored), `getApiKeySource`, + `getAnthropic` rebuilds the client when the resolved key changes (no restart). + - **Tests:** `aiConfig.test.ts` (read/write/clear/mask, mocked fs); updated + `anthropic.test.ts` (env-overrides-stored precedence, none → throws). + - **Covers:** R4.1, R7.1–R7.4. _Verified:_ 15 tests green; typecheck. + +- [x] **T13 — `GET/POST/DELETE /api/ai/config`** + - `config.get.ts` (`{configured, source, maskedKey}` — never full key); + `config.post.ts` (validate then `writeStoredApiKey`); `config.delete.ts` + (`clearStoredApiKey`, env left intact). 
+ - **Tests** (`config.test.ts`): masked-only responses, validation 400s, env-override + source, clear behavior. + - **Covers:** R4.2, R7.4–R7.6. _Verified:_ 10 tests green; typecheck. + +- [x] **T14 — Settings card + panel link** + - `pages/settings.vue`: "AI Assistant" card (status, source badge, masked key, + password input, Save, Clear-when-config-source). `AiChatPanel.vue` empty state + links to Settings. + - **Covers:** R4.2, R7.7. _Verified:_ typecheck; runtime curl flow (save → chat + no longer 503 → clear). + +- [x] **T15 — Verify + adversarial review + spec/map** + - Full `vitest run` (384 green) + `nuxi typecheck` (0 errors); runtime probe of + all `/api/ai/config` verbs + the no-restart effect. Adversarial multi-agent + review of the secret handling. `design.md`/`requirements.md`/`AI-MAP.md` updated. + +--- + +## Checkpoint + +All tasks `- [x]`, `npx vitest run` and `npx nuxi typecheck` both clean, manual +HITL flow verified (propose → approve → commit; reject → no write; missing key → +empty state), `AI-MAP.md` updated. Then ready for commit/PR (PR body: `Fixes #8`). diff --git a/AI-MAP.md b/AI-MAP.md index 1f33e51..000766e 100644 --- a/AI-MAP.md +++ b/AI-MAP.md @@ -32,21 +32,26 @@ rejected on delete and upload**, since they break the date-line ↔ tindex mappi | Route | File | Purpose | |---|---|---| | `/` | `index.vue` | Dashboard placeholder (hidden from nav) | -| `/budget` | `budget.vue` | Envelope budget — Ready to Assign, groups, Assigned/Activity/Available, inline assign | +| `/budget` | `budget.vue` | Envelope budget — Ready to Assign, groups, Assigned/Activity/Available, inline assign. 
**AI assistant** in a slideover (Issue #8) | | `/reports` | `reports.vue` | Placeholder (hidden) | -| `/settings` | `settings.vue` | Journal mgmt (create/upload/export/list/activate) | +| `/settings` | `settings.vue` | Journal mgmt (create/upload/export/list/activate) + **AI Assistant** API-key config (Issue #8) | | `/accounts` | `accounts/index.vue` | Add/delete real accounts | | `/accounts/:path` | `accounts/[...path].vue` | Account register + transaction form | ## Components / layout - `components/AccountRegister.vue` — YNAB register table (Date, Payee, Envelope, Inflow, Outflow, Balance). For a real account the register is **family-aggregated**: rows net the account + its `:budget:*` envelopes, so Balance = the real bank balance and internal moves (assignments, envelope transfers) drop out. - `components/SimplifiedTransactionForm.vue` — Add-transaction modal (Account, Payee, Envelope, Inflow/Outflow). +- `components/AiChatPanel.vue` — AI budgeting chat (Issue #8). Nuxt UI chat input + message bubbles + **proposed-action cards** (Approve/Reject) + persistent data-egress notice + no-API-key empty state. Emits `committed` (budget page refreshes). All logic via `useAiChat`. - `layouts/default.vue` — UDashboardGroup + sidebar + real-accounts UTree. ## Composables (`composables/`) — data fetch `useAccounts(type?)`, `useBalances(query?)`, `useBudget(period?)`, `useTransactions(query?)` + `useRegister({account})`, `useReports` → `useIncomeStatement` / `useBalanceSheet` (placeholder). +`useAiChat({onCommitted?})` (Issue #8) — client for `/api/ai/chat` + the existing +assign/transfer endpoints. Holds the opaque Anthropic history; `send`/`approve`/ +`reject`. **Money is committed only here on user approval** (chat route never +writes); auto-rejects un-acted proposals on a new message. 
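A minimal sketch of how the verdicts collected by `useAiChat` could become `tool_result` blocks when the chat resumes. The `ChatResolution` fields mirror what the tests in this PR assert on (`toolUseId`, `status`, `resultText`); the helper names `toToolResults` and `buildResumeTurn` are hypothetical and are not taken from `chat.post.ts`.

```typescript
// Hypothetical helpers for illustration; the real mapping lives in chat.post.ts
// and may differ.
type ChatResolution = {
  toolUseId: string
  status: 'approved' | 'rejected'
  resultText: string
}

// Shape of an Anthropic Messages API tool_result content block.
type ToolResultBlock = {
  type: 'tool_result'
  tool_use_id: string
  content: string
}

// Each verdict answers exactly one pending tool_use by id.
function toToolResults(resolutions: ChatResolution[]): ToolResultBlock[] {
  return resolutions.map((r): ToolResultBlock => ({
    type: 'tool_result',
    tool_use_id: r.toolUseId,
    content: r.resultText,
  }))
}

// The verdicts go back to the model as a single user turn.
function buildResumeTurn(resolutions: ChatResolution[]) {
  return { role: 'user' as const, content: toToolResults(resolutions) }
}
```

Keeping the mapping pure means it can be unit-tested without mocking the SDK.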
## API surface (`server/api/`) | Method | Path | Purpose | @@ -59,6 +64,8 @@ rejected on delete and upload**, since they break the date-line ↔ tindex mappi | GET | `/api/budget?period=` | BudgetEnvelopeReport — Ready to Assign, Assigned/Activity/Available | | POST | `/api/budget/assign` | Assignment txn (**unallocated pool → envelope**; inverse of reduce) | | POST | `/api/budget/transfer` | Move between envelopes | +| POST | `/api/ai/chat` | AI budgeting chat tool loop (Issue #8). **Never writes** — read tools run server-side; assign/transfer are *proposed* for HITL approval. Stateless (opaque history round-trips). 503 if no key configured | +| GET·POST·DELETE | `/api/ai/config` | AI key status / save / clear (Issue #8). Returns `{configured, source, maskedKey}` — **never the full key**. Save takes effect with no restart | | POST | `/api/categories` | Create expense groups/envelopes | | GET·POST | `/api/hidden-envelopes` | List / hide-unhide (zero balance to hide) | | * | `/api/journal/{create,upload,export,activate,list}` | Journal file management | @@ -86,6 +93,27 @@ rejected on delete and upload**, since they break the date-line ↔ tindex mappi create/upload/activate; throws 400 on separators/`..`/bad extension). - `hledgerArgs.ts` — pure `isValidDate`/`isValidPeriod`/`isValidAccount` (arg-injection guards for read-route query params). +- `budgetReport.ts` — `getBudgetReport(period)` (Issue #8): the envelope-report + computation extracted from `budget.get.ts` so the route AND the AI `get_budget` + tool share one source (no duplicated accounting). +- `transactionList.ts` — `getTransactionList(query)`: compact `{date,payee,amount,account}` + list for the AI `get_transactions` tool; reuses `hledgerExec`+`transformTransactions`. +- `anthropic.ts` — shared Anthropic SDK client (`getAnthropic`, `MissingApiKeyError`, + `MODEL='claude-opus-4-8'`, `REQUEST_DEFAULTS`: adaptive thinking, effort medium). 
+ Key via `resolveApiKey()` = **env override → stored** (`ANTHROPIC_API_KEY` else + `config/ai-config.json`); `getApiKeySource()` reports `env`/`config`/`none`. + Client rebuilds when the resolved key changes (saving a key needs no restart). + Reused by future CSV import (#9). +- `aiConfig.ts` — owns the gitignored `config/ai-config.json` (Issue #8): + `readStoredApiKey` (sync, guarded, never throws — like `activeJournal.ts`), + `writeStoredApiKey`/`clearStoredApiKey` (async), `maskApiKey` (last-4). The key + is never logged and never returned in full. +- `aiTools.ts` — AI tool defs + dispatch (Issue #8). `TOOLS` (cache-controlled + prefix), `READ_TOOL_HANDLERS` (delegate to budgetReport/transactionList), + `isProposedActionTool`, `toProposedAction` (resolves the budget host, builds the + assign/transfer payload — **builds a proposal, never writes**). +- `server/ai/budgetInstructions.ts` — `BUDGET_SYSTEM_PROMPT` (cached system prefix: + YNAB Rule 1, envelope conventions, propose-never-execute, tone). ## Pure utils (`utils/`) — property-tested `formatAmount`, `stripAccountPrefix`, `buildAccountTree`, `filterAccounts` @@ -100,7 +128,10 @@ silently dropping commodities), `validateTransactionForm` (legacy). ## Types (`types/`) `hledger.ts` (HledgerAmount/Posting/Transaction), `api.ts` (TransactionInput, PostingInput, BalanceQuery, TransactionQuery), `ui.ts` (SimplifiedTransactionInput, -RegisterRow, BudgetCategory/Group, BudgetEnvelopeReport, RealAccount, AccountTreeItem). +RegisterRow, BudgetCategory/Group, BudgetEnvelopeReport, RealAccount, AccountTreeItem), +`ai.ts` (Issue #8: AssignProposalPayload, TransferProposalPayload, ProposedAction, +ChatResolution, AiChatRequest/Response, ChatDisplayMessage — `messages` is opaque +Anthropic `MessageParam[]`, cast at the SDK boundary in `chat.post.ts`). ## Known quirks / gotchas - **Windows CRLF:** hledger text output → `split(/\r?\n/)` + trim, else `\r` leaks (`%0D` in URLs). 
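The `aiTools.ts` entry above separates read tools from proposed-action tools. A rough sketch of that split, assuming a simple name-based check: the tool names come from this PR, but the body of `isProposedActionTool` and the `partitionToolUses` helper are assumptions for illustration only.

```typescript
// Sketch only: the real isProposedActionTool lives in server/utils/aiTools.ts
// and may be implemented differently.
const PROPOSED_ACTION_TOOLS = new Set(['assign_to_envelope', 'transfer_between_envelopes'])

type ToolUse = { type: 'tool_use'; id: string; name: string; input: unknown }
type ContentBlock = ToolUse | { type: 'text'; text: string }

function isProposedActionTool(name: string): boolean {
  return PROPOSED_ACTION_TOOLS.has(name)
}

// Split one assistant turn into read tools (safe to run server-side) and
// proposals (only surfaced to the client, never executed by the loop).
function partitionToolUses(content: ContentBlock[]): { reads: ToolUse[]; proposals: ToolUse[] } {
  const reads: ToolUse[] = []
  const proposals: ToolUse[] = []
  for (const block of content) {
    if (block.type !== 'tool_use') continue
    if (isProposedActionTool(block.name)) proposals.push(block)
    else reads.push(block)
  }
  return { reads, proposals }
}
```

With a split like this, the loop can pause as soon as `proposals` is non-empty, which is the behavior the HITL test asserts.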
@@ -122,6 +153,17 @@ RegisterRow, BudgetCategory/Group, BudgetEnvelopeReport, RealAccount, AccountTre via `hledgerArgs` and account queries are passed after a `--` separator. - **Active journal** is persisted to `config/active-journal.json` (gitignored), not `process.env` — set by `journal/activate.post`, read by `resolveJournalPath`. +- **AI chat (Issue #8) — HITL invariant:** `/api/ai/chat` NEVER writes the journal. + Read tools (`get_budget`/`get_transactions`) run server-side; assign/transfer are + *proposed* and surfaced for approval — only the existing `budget/assign|transfer` + endpoints (called by `useAiChat` after the user clicks Approve) write. Guarded by + `chat.post.test.ts` (asserts the journal writer is called 0×). API key via + `resolveApiKey()` = `ANTHROPIC_API_KEY` env **override** → in-app key in + gitignored `config/ai-config.json` (set on the Settings page, no restart); + server-only, never logged, never returned in full (masked last-4). Model + `claude-opus-4-8`, non-streaming, manual tool loop (capped at 8 iterations). + **Data egress:** chat + budget data go to the Anthropic API (the one external + flow); the panel shows a persistent notice. 
- **Robustness (Issue #4):** hledger spawns time out / reject (never hang) via `runHledger`; simplified `POST /api/transactions` rejects non-positive/non-finite amounts; the budget base is **derived** (`resolveBudgetBase`), not hardcoded diff --git a/components/AiChatPanel.vue b/components/AiChatPanel.vue new file mode 100644 index 0000000..a7f5c96 --- /dev/null +++ b/components/AiChatPanel.vue @@ -0,0 +1,126 @@ + + + diff --git a/composables/__tests__/useAiChat.test.ts b/composables/__tests__/useAiChat.test.ts new file mode 100644 index 0000000..63e6291 --- /dev/null +++ b/composables/__tests__/useAiChat.test.ts @@ -0,0 +1,152 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest' +import { ref } from 'vue' +import type { AiChatResponse, ProposedAction } from '~/types/ai' + +// The composable uses auto-imported `ref` and `$fetch`. +vi.stubGlobal('ref', ref) + +const fetchMock = vi.fn() +vi.stubGlobal('$fetch', (...args: any[]) => fetchMock(...args)) + +const { useAiChat } = await import('../useAiChat') + +const assignProposal: ProposedAction = { + id: 't1', + kind: 'assign', + summary: 'Assign $100.00 to Rent', + payload: { date: '', physicalAccount: 'assets:checking', envelopes: { rent: 100 } }, +} + +const chatRes = (over: Partial<AiChatResponse>): AiChatResponse => ({ + messages: ['h'], + reply: '', + proposedActions: [], + readToolResults: [], + ...over, +}) + +beforeEach(() => { + fetchMock.mockReset() +}) + +describe('useAiChat.send', () => { + it('posts the message and records the assistant reply + proposals', async () => { + fetchMock.mockResolvedValueOnce(chatRes({ reply: 'I can assign that.', proposedActions: [assignProposal] })) + const chat = useAiChat() + + await chat.send('assign 100 to rent') + + expect(fetchMock).toHaveBeenCalledWith('/api/ai/chat', expect.objectContaining({ + method: 'POST', + body: expect.objectContaining({ message: 'assign 100 to rent' }), + })) + expect(chat.transcript.value).toEqual([ + { role: 'user', text: 'assign 100 to rent' },
+ { role: 'assistant', text: 'I can assign that.' }, + ]) + expect(chat.proposedActions.value).toHaveLength(1) + }) +}) + +describe('useAiChat.approve', () => { + it('commits via the existing assign endpoint (with today\'s date), then resumes the chat', async () => { + const onCommitted = vi.fn() + fetchMock + .mockResolvedValueOnce(chatRes({ reply: 'Proposing', proposedActions: [assignProposal] })) // send + .mockResolvedValueOnce({ success: true }) // commit + .mockResolvedValueOnce(chatRes({ reply: 'Done — Rent is funded.' })) // resume + const chat = useAiChat({ onCommitted }) + + await chat.send('assign 100 to rent') + await chat.approve(chat.proposedActions.value[0]!) + + // committed via the existing endpoint, with the model's payload + today's date + const commitCall = fetchMock.mock.calls.find(c => c[0] === '/api/budget/assign') + expect(commitCall).toBeDefined() + expect(commitCall![1].body.physicalAccount).toBe('assets:checking') + expect(commitCall![1].body.date).toMatch(/^\d{4}-\d{2}-\d{2}$/) + expect(onCommitted).toHaveBeenCalledOnce() + + // resumed the chat with an 'approved' resolution + const resume = fetchMock.mock.calls.find(c => c[0] === '/api/ai/chat' && c[1].body.resolutions) + expect(resume![1].body.resolutions[0]).toMatchObject({ toolUseId: 't1', status: 'approved' }) + expect(chat.proposedActions.value).toHaveLength(0) + expect(chat.transcript.value.at(-1)).toEqual({ role: 'assistant', text: 'Done — Rent is funded.' }) + }) + + it('does NOT mark committed when the endpoint rejects the write (R2.5)', async () => { + fetchMock + .mockResolvedValueOnce(chatRes({ proposedActions: [assignProposal] })) // send + .mockRejectedValueOnce({ data: { message: "Can't assign $100 — only $40 left." } }) // commit fails + .mockResolvedValueOnce(chatRes({ reply: 'Okay, that didn\'t go through.' })) // resume + const chat = useAiChat() + + await chat.send('assign 100 to rent') + await chat.approve(chat.proposedActions.value[0]!) 
+ + // Not committed: the failure is fed to the model via the resolution, and the + // assistant explains it in the resumed reply (no separate error banner). + const resume = fetchMock.mock.calls.find(c => c[0] === '/api/ai/chat' && c[1].body.resolutions) + expect(resume![1].body.resolutions[0].status).toBe('rejected') + expect(resume![1].body.resolutions[0].resultText).toContain('only $40 left') + expect(chat.transcript.value.at(-1)).toEqual({ role: 'assistant', text: 'Okay, that didn\'t go through.' }) + }) +}) + +describe('useAiChat.reject', () => { + it('resumes without committing anything', async () => { + fetchMock + .mockResolvedValueOnce(chatRes({ proposedActions: [assignProposal] })) + .mockResolvedValueOnce(chatRes({ reply: 'No problem.' })) + const chat = useAiChat() + + await chat.send('assign 100 to rent') + await chat.reject(chat.proposedActions.value[0]!) + + expect(fetchMock.mock.calls.some(c => c[0] === '/api/budget/assign')).toBe(false) + const resume = fetchMock.mock.calls.find(c => c[0] === '/api/ai/chat' && c[1].body.resolutions) + expect(resume![1].body.resolutions[0].status).toBe('rejected') + expect(chat.proposedActions.value).toHaveLength(0) + }) +}) + +describe('useAiChat — supersede & config', () => { + it('auto-rejects an un-acted proposal when a new message is sent (R3.4)', async () => { + fetchMock + .mockResolvedValueOnce(chatRes({ proposedActions: [assignProposal] })) // first send → proposal + .mockResolvedValueOnce(chatRes({ reply: 'Sure.' 
})) // second send + const chat = useAiChat() + + await chat.send('assign 100 to rent') + await chat.send('actually, never mind — how much is in groceries?') + + const second = fetchMock.mock.calls[1]![1].body + expect(second.message).toContain('groceries') + expect(second.resolutions[0]).toMatchObject({ toolUseId: 't1', status: 'rejected' }) + expect(second.resolutions[0].resultText).toMatch(/superseded/i) + }) + + it('surfaces a not-configured state on a 503', async () => { + fetchMock.mockRejectedValueOnce({ statusCode: 503 }) + const chat = useAiChat() + await chat.send('hi') + expect(chat.error.value).toBe('not-configured') + }) +}) + +describe('useAiChat.checkConfigured (proactive empty state, R4.2)', () => { + it('sets not-configured when GET /api/ai/config reports no key', async () => { + fetchMock.mockResolvedValueOnce({ configured: false }) + const chat = useAiChat() + await chat.checkConfigured() + expect(fetchMock).toHaveBeenCalledWith('/api/ai/config') + expect(chat.error.value).toBe('not-configured') + }) + + it('leaves the chat usable when a key is configured', async () => { + fetchMock.mockResolvedValueOnce({ configured: true }) + const chat = useAiChat() + await chat.checkConfigured() + expect(chat.error.value).toBeNull() + }) +}) diff --git a/composables/useAiChat.ts b/composables/useAiChat.ts new file mode 100644 index 0000000..a9f0b71 --- /dev/null +++ b/composables/useAiChat.ts @@ -0,0 +1,153 @@ +import type { + AiChatRequest, + AiChatResponse, + ProposedAction, + ChatResolution, + ChatDisplayMessage, +} from '~/types/ai' + +function is503(e: unknown): boolean { + const err = e as any + return err?.statusCode === 503 || err?.status === 503 || err?.response?.status === 503 +} + +function errMessage(e: unknown): string { + const err = e as any + return err?.data?.message || err?.data?.statusMessage || err?.statusMessage || err?.message || 'Unknown error' +} + +/** + * Client for the AI budgeting chat (Issue #8). 
Thin data-fetch layer over + * `/api/ai/chat` plus the existing assign/transfer endpoints — no business logic. + * + * Holds the opaque Anthropic history (`messages`) and any held read-tool results, + * and round-trips them so the server stays stateless. Money is committed ONLY + * here, on explicit user approval, via the existing endpoints; the chat route + * never writes. + * + * @param options.onCommitted called after a successful assign/transfer commit so + * the page can refresh the budget view. + */ +export function useAiChat(options?: { onCommitted?: () => void }) { + const transcript = ref<ChatDisplayMessage[]>([]) + const proposedActions = ref<ProposedAction[]>([]) // still-undecided proposals + const pending = ref(false) + /** 'not-configured' (no API key) | a message | null. */ + const error = ref<string | null>(null) + + // Opaque round-trip state (never rendered raw). + const history = ref([]) + const heldReadResults = ref([]) + // Verdicts accumulated while a multi-proposal turn is being decided. + const decided = ref<ChatResolution[]>([]) + + /** + * Probe key configuration so the panel shows the not-configured empty state + * proactively (R4.2) — before the user types and hits a 503. Call on mount. + */ + async function checkConfigured(): Promise<void> { + try { + const cfg = await $fetch<{ configured: boolean }>('/api/ai/config') + error.value = cfg.configured ? null : 'not-configured' + } catch { + // A transient failure here shouldn't block the UI; the first send will + // surface any real problem. + } + } + + async function run(req: AiChatRequest): Promise<void> { + error.value = null + try { + const res = await $fetch<AiChatResponse>('/api/ai/chat', { method: 'POST', body: req }) + history.value = res.messages + heldReadResults.value = res.readToolResults + proposedActions.value = res.proposedActions + if (res.reply) transcript.value.push({ role: 'assistant', text: res.reply }) + } catch (e) { + error.value = is503(e) + ? 'not-configured' + : 'Sorry — I had trouble reaching the assistant. Please try again.'
+ } + } + + /** Resume the loop once every proposal in the turn has a verdict. */ + async function maybeResume(): Promise<void> { + if (proposedActions.value.length > 0) return // still awaiting other verdicts + const resolutions = decided.value + const held = heldReadResults.value + decided.value = [] + heldReadResults.value = [] + await run({ messages: history.value, resolutions, readToolResults: held }) + } + + function record(action: ProposedAction, status: ChatResolution['status'], resultText: string): void { + decided.value.push({ toolUseId: action.id, status, resultText }) + proposedActions.value = proposedActions.value.filter(a => a.id !== action.id) + } + + async function send(text: string): Promise<void> { + if (pending.value || !text.trim()) return + pending.value = true + try { + // R3.4: a new message auto-rejects any un-acted proposals so there's no + // dangling tool_use; combine with any already-decided verdicts. + const supersede: ChatResolution[] = proposedActions.value.map(a => ({ + toolUseId: a.id, + status: 'rejected', + resultText: 'Superseded by a new message', + })) + const resolutions = [...decided.value, ...supersede] + const held = heldReadResults.value + decided.value = [] + proposedActions.value = [] + heldReadResults.value = [] + transcript.value.push({ role: 'user', text }) + await run({ + messages: history.value, + message: text, + resolutions: resolutions.length ? resolutions : undefined, + readToolResults: held.length ?
held : undefined, + }) + } finally { + pending.value = false + } + } + + async function approve(action: ProposedAction): Promise<void> { + if (pending.value) return + pending.value = true + try { + const today = new Date().toISOString().slice(0, 10) + try { + if (action.kind === 'assign') { + await $fetch('/api/budget/assign', { method: 'POST', body: { ...action.payload, date: today } }) + } else { + await $fetch('/api/budget/transfer', { method: 'POST', body: { ...action.payload, date: today } }) + } + options?.onCommitted?.() + record(action, 'approved', `Committed: ${action.summary}`) + } catch (e) { + // R2.5: the endpoint rejected the write (e.g. availability gate). Don't + // mark it committed; feed the failure to the model via the resolution so + // the assistant explains it in its reply (rather than a redundant banner). + record(action, 'rejected', `Could not apply that: ${errMessage(e)}`) + } + await maybeResume() + } finally { + pending.value = false + } + } + + async function reject(action: ProposedAction): Promise<void> { + if (pending.value) return + pending.value = true + try { + record(action, 'rejected', 'User rejected this action') + await maybeResume() + } finally { + pending.value = false + } + } + + return { transcript, proposedActions, pending, error, send, approve, reject, checkConfigured } +} diff --git a/package-lock.json b/package-lock.json index 0d14a3f..5a06636 100644 --- a/package-lock.json +++ b/package-lock.json @@ -6,6 +6,7 @@ "": { "name": "hledger-budget-app", "dependencies": { + "@anthropic-ai/sdk": "^0.104.2", "@nuxt/ui": "4.5.1", "nuxt": "^4.3.1" }, @@ -42,6 +43,27 @@ "url": "https://github.com/sponsors/antfu" } }, + "node_modules/@anthropic-ai/sdk": { + "version": "0.104.2", + "resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.104.2.tgz", + "integrity": "sha512-s1wEVDAtEwkS7Ajgep6PZKJLFqybRkmD3Byz+iVVsSpbDY0gjROXE9aOft6V3PMqynn3NTcycV5whga9tCzmKA==", + "license": "MIT", + "dependencies": { + "json-schema-to-ts": "^3.1.1", +
"standardwebhooks": "^1.0.0" + }, + "bin": { + "anthropic-ai-sdk": "bin/cli" + }, + "peerDependencies": { + "zod": "^3.25.0 || ^4.0.0" + }, + "peerDependenciesMeta": { + "zod": { + "optional": true + } + } + }, "node_modules/@babel/code-frame": { "version": "7.29.0", "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.29.0.tgz", @@ -394,6 +416,15 @@ "@babel/core": "^7.0.0-0" } }, + "node_modules/@babel/runtime": { + "version": "7.29.7", + "resolved": "https://registry.npmjs.org/@babel/runtime/-/runtime-7.29.7.tgz", + "integrity": "sha512-Nq8OhGWiZIZGV6hLHoyAKLLcJihP/xFeBMGJoUrxTX2psI8dCifzLhZISFb+VWS3wFMRDmCGw5R+dOySCqPLhw==", + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, "node_modules/@babel/template": { "version": "7.28.6", "resolved": "https://registry.npmjs.org/@babel/template/-/template-7.28.6.tgz", @@ -3878,6 +3909,12 @@ "integrity": "sha512-G4ewlBNhUtlLvrJTb88d2mdy2KRijzs4UhnlrOSRT4bmjh/IqNElZa3zkrZ+TC47TwtlDWzVLFADljF1Ijp5hA==", "license": "CC0-1.0" }, + "node_modules/@stablelib/base64": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/@stablelib/base64/-/base64-1.0.1.tgz", + "integrity": "sha512-1bnPQqSxSuc3Ii6MhBysoWCg58j97aUjuCSZrGSmDxNqtytIi0k8utUenAwTZN4V5mXXYGsVUI9zeBqy+jBOSQ==", + "license": "MIT" + }, "node_modules/@standard-schema/spec": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/@standard-schema/spec/-/spec-1.1.0.tgz", @@ -7287,6 +7324,12 @@ "url": "https://github.com/sponsors/antfu" } }, + "node_modules/fast-sha256": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/fast-sha256/-/fast-sha256-1.3.0.tgz", + "integrity": "sha512-n11RGP/lrWEFI/bWdygLxhI+pVeo1ZYIVwvvPkW7azl/rOy+F3HYRZ2K5zeE9mmkhQppyv9sQFx0JM9UabnpPQ==", + "license": "Unlicense" + }, "node_modules/fastq": { "version": "1.20.1", "resolved": "https://registry.npmjs.org/fastq/-/fastq-1.20.1.tgz", @@ -8090,6 +8133,19 @@ "node": ">=6" } }, + "node_modules/json-schema-to-ts": { + "version": "3.1.1", + 
"resolved": "https://registry.npmjs.org/json-schema-to-ts/-/json-schema-to-ts-3.1.1.tgz", + "integrity": "sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g==", + "license": "MIT", + "dependencies": { + "@babel/runtime": "^7.18.3", + "ts-algebra": "^2.0.0" + }, + "engines": { + "node": ">=16" + } + }, "node_modules/json5": { "version": "2.2.3", "resolved": "https://registry.npmjs.org/json5/-/json5-2.2.3.tgz", @@ -10992,6 +11048,16 @@ "integrity": "sha512-qoRRSyROncaz1z0mvYqIE4lCd9p2R90i6GxW3uZv5ucSu8tU7B5HXUP1gG8pVZsYNVaXjk8ClXHPttLyxAL48A==", "license": "MIT" }, + "node_modules/standardwebhooks": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/standardwebhooks/-/standardwebhooks-1.0.0.tgz", + "integrity": "sha512-BbHGOQK9olHPMvQNHWul6MYlrRTAOKn03rOe4A8O3CLWhNf4YHBqq2HJKKC+sfqpxiBY52pNeesD6jIiLDz8jg==", + "license": "MIT", + "dependencies": { + "@stablelib/base64": "^1.0.0", + "fast-sha256": "^1.3.0" + } + }, "node_modules/statuses": { "version": "2.0.2", "resolved": "https://registry.npmjs.org/statuses/-/statuses-2.0.2.tgz", @@ -11461,6 +11527,12 @@ "integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==", "license": "MIT" }, + "node_modules/ts-algebra": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/ts-algebra/-/ts-algebra-2.0.0.tgz", + "integrity": "sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw==", + "license": "MIT" + }, "node_modules/tslib": { "version": "2.8.1", "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.8.1.tgz", diff --git a/package.json b/package.json index 475437d..0cd0679 100644 --- a/package.json +++ b/package.json @@ -10,8 +10,9 @@ "test": "vitest run" }, "dependencies": { - "nuxt": "^4.3.1", - "@nuxt/ui": "4.5.1" + "@anthropic-ai/sdk": "^0.104.2", + "@nuxt/ui": "4.5.1", + "nuxt": "^4.3.1" }, "devDependencies": { "@types/node": "^22.0.0", diff --git 
a/pages/budget.vue b/pages/budget.vue index 95fd902..b494099 100644 --- a/pages/budget.vue +++ b/pages/budget.vue @@ -6,6 +6,9 @@ const toast = useToast() const currentPeriod = ref(new Date().toISOString().slice(0, 7)) const { data: budget, status, refresh } = useBudget(currentPeriod) +// AI budgeting chat (Issue #8) — opens in a slideover; refreshes on commit. +const showChat = ref(false) + // Group management state const showAddGroup = ref(false) const newGroupName = ref('') @@ -258,6 +261,7 @@ async function saveAssignment(cat: BudgetCategory) { @@ -422,4 +426,11 @@ async function saveAssignment(cat: BudgetCategory) { + + + + + diff --git a/pages/settings.vue b/pages/settings.vue index c7b36d9..6101629 100644 --- a/pages/settings.vue +++ b/pages/settings.vue @@ -9,6 +9,38 @@ const exporting = ref(false) const { data: journalData, refresh: refreshJournals } = useFetch('/api/journal/list') +// AI Assistant API-key config (Issue #8). +interface AiConfigStatus { configured: boolean, source: 'env' | 'config' | 'none', maskedKey: string | null, hasStoredKey: boolean } +const { data: aiConfig, refresh: refreshAiConfig } = useFetch('/api/ai/config') +const aiKeyInput = ref('') +const aiKeySaving = ref(false) + +async function handleSaveKey() { + const key = aiKeyInput.value.trim() + if (!key) return + aiKeySaving.value = true + try { + await $fetch('/api/ai/config', { method: 'POST', body: { apiKey: key } }) + toast.add({ title: 'API key saved', description: 'The budgeting assistant is ready to use.', color: 'success' }) + aiKeyInput.value = '' + await refreshAiConfig() + } catch (err: any) { + toast.add({ title: 'Error', description: err?.data?.statusMessage || err?.message || 'Failed to save key', color: 'error' }) + } finally { + aiKeySaving.value = false + } +} + +async function handleClearKey() { + try { + await $fetch('/api/ai/config', { method: 'DELETE' }) + toast.add({ title: 'API key cleared', color: 'success' }) + await refreshAiConfig() + } catch (err: any) { 
+ toast.add({ title: 'Error', description: err?.data?.statusMessage || err?.message || 'Failed to clear key', color: 'error' }) + } +} + async function handleCreate() { let name = newJournalName.value.trim() if (!name) return @@ -164,6 +196,51 @@ function readFileContent(file: File): Promise {

No journal files found.

+ + +

+ Set your Anthropic API key to enable the budgeting assistant. It's stored locally in + config/ai-config.json (gitignored) and sent only to the Anthropic API. +

+ +
+ + +
+ +

+ The ANTHROPIC_API_KEY environment variable is set and takes precedence over a saved key. + +

+ +
+ + + + + +
+
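The Settings card above relies on two behaviors described in this PR: the environment variable overrides a saved key, and only a masked key is ever shown. A small sketch of both, written as pure functions for clarity. The function name `resolveKey`, its signature, and the `****` mask format are assumptions; the PR's `resolveApiKey()` and `getApiKeySource()` read `process.env` and `config/ai-config.json` themselves.

```typescript
type KeySource = 'env' | 'config' | 'none'

// Hypothetical pure variant of the env -> stored precedence.
function resolveKey(
  envKey: string | undefined,
  storedKey: string | null,
): { key: string | null; source: KeySource } {
  const env = envKey?.trim()
  if (env) return { key: env, source: 'env' }
  const stored = storedKey?.trim()
  if (stored) return { key: stored, source: 'config' }
  return { key: null, source: 'none' }
}

// Show only the last four characters; a very short key is fully masked so it
// is never echoed back in full. The exact mask format is an assumption.
function maskApiKey(key: string): string {
  if (key.length <= 4) return '****'
  return `****${key.slice(-4)}`
}
```

A caller would pass `process.env.ANTHROPIC_API_KEY` and the stored value, then return only `source` and `maskApiKey(key)` to the client.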
+ diff --git a/server/ai/__tests__/budgetInstructions.test.ts b/server/ai/__tests__/budgetInstructions.test.ts new file mode 100644 index 0000000..5285140 --- /dev/null +++ b/server/ai/__tests__/budgetInstructions.test.ts @@ -0,0 +1,21 @@ +import { describe, it, expect } from 'vitest' +import { BUDGET_SYSTEM_PROMPT } from '../budgetInstructions' + +describe('BUDGET_SYSTEM_PROMPT', () => { + it('is a non-empty string', () => { + expect(typeof BUDGET_SYSTEM_PROMPT).toBe('string') + expect(BUDGET_SYSTEM_PROMPT.length).toBeGreaterThan(200) + }) + + it('carries the load-bearing propose-never-execute framing', () => { + // These phrases encode the HITL safety contract — guard against silent drift. + expect(BUDGET_SYSTEM_PROMPT).toMatch(/human-in-the-loop/i) + expect(BUDGET_SYSTEM_PROMPT).toMatch(/propose/i) + expect(BUDGET_SYSTEM_PROMPT).toMatch(/nothing is written until the user explicitly approves/i) + }) + + it('states YNAB Rule 1 and use-live-data guidance', () => { + expect(BUDGET_SYSTEM_PROMPT).toMatch(/Ready to Assign/i) + expect(BUDGET_SYSTEM_PROMPT).toMatch(/get_budget/) + }) +}) diff --git a/server/ai/budgetInstructions.ts b/server/ai/budgetInstructions.ts new file mode 100644 index 0000000..cfe9f6c --- /dev/null +++ b/server/ai/budgetInstructions.ts @@ -0,0 +1,38 @@ +/** + * System prompt for the AI budgeting chat (Issue #8). + * + * Loaded as the cacheable system prefix (stable across turns). Volatile budget + * numbers are NEVER embedded here — the model fetches them via `get_budget`, so + * the cached prefix stays warm. Kept as a `.ts` string export (not a read `.md`) + * so it typechecks and bundles cleanly under Nitro; the content is still plain + * markdown for easy editing. + * + * The propose-never-execute framing here is load-bearing and is guarded by a + * test (budgetInstructions.test.ts) so it can't silently regress. 
+ */ +export const BUDGET_SYSTEM_PROMPT = `You are the budgeting assistant inside a friendly YNAB-style budgeting app built on the hledger accounting engine. You help the user understand and manage their envelope budget. You speak in plain budgeting terms — never in accounting jargon. Never mention postings, debits, credits, or double-entry. Say "envelope" rather than "category", and refer to accounts and envelopes by their friendly names (e.g. "Checking", "Groceries"), not their raw colon-separated paths. + +## What you can do +- **Answer questions** about the budget: how much is in an envelope, what's left to assign, recent spending, where money went. +- **Propose** moving money: assigning Ready-to-Assign money into envelopes, or transferring between envelopes. + +## How money moves — human-in-the-loop (READ THIS) +You do NOT move money. When the user wants to assign or transfer, you call the corresponding tool to **propose** the action. The app shows the user a confirmation card; nothing is written until the user explicitly approves. So: +- **Propose, never assume.** Calling \`assign_to_envelope\` or \`transfer_between_envelopes\` creates a *proposal* for the user to approve — it does not commit anything. Describe what you're proposing in plain language. +- Propose **one action per turn**, in its own turn — don't bundle a proposal together with data lookups in the same response. +- After a proposal is approved, the app tells you the result. After it's rejected, respect that — don't re-propose the same thing unless asked. +- If the user only asks a question, just answer it. Don't propose an action they didn't ask for. + +## Use live data, never guess numbers +Always call \`get_budget\` to read current Ready-to-Assign and envelope balances before stating any figure or proposing an assignment — do not rely on numbers from earlier in the conversation, which may be stale. Use \`get_transactions\` for spending history. 
+ +## YNAB Rule 1 — every dollar has a job +"Ready to Assign" is money that exists but isn't yet assigned to an envelope. You can only assign money that's in Ready to Assign; the app will reject an assignment that exceeds it. Overspending an envelope is handled by transferring from another envelope, not by assigning more than is available. When the user has unassigned money, it's reasonable to help them give it a job — but only assign what they ask for or agree to. + +## Building proposals +- To assign, you need the envelope name(s) and amount(s). Envelope and account identifiers come from \`get_budget\` — use the identifiers it returns, not guesses. +- Amounts are always positive dollar figures. +- Dates default to today; you don't need to specify one. + +## Tone +Be concise, warm, and practical. Lead with the answer. When proposing, state plainly what will happen if the user approves (e.g. "I'll move $50 from Dining to Groceries").` diff --git a/server/api/ai/__tests__/chat.post.test.ts b/server/api/ai/__tests__/chat.post.test.ts new file mode 100644 index 0000000..8f7e3ff --- /dev/null +++ b/server/api/ai/__tests__/chat.post.test.ts @@ -0,0 +1,183 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest' + +// --- Hoisted mock state (referenced inside vi.mock factories) --- +const h = vi.hoisted(() => ({ + createMock: vi.fn(), + appendMock: vi.fn(), + budgetMock: vi.fn(), + txMock: vi.fn(), + state: { keyConfigured: true }, +})) + +class FakeMissingApiKeyError extends Error {} + +vi.mock('../../../utils/anthropic', () => ({ + getAnthropic: () => { + if (!h.state.keyConfigured) throw new FakeMissingApiKeyError() + return { messages: { create: h.createMock } } + }, + MissingApiKeyError: FakeMissingApiKeyError, + REQUEST_DEFAULTS: { model: 'claude-opus-4-8', max_tokens: 4096 }, +})) + +// Leaf data utils used by the read-tool handlers. 
+vi.mock('../../../utils/budgetReport', () => ({ getBudgetReport: (...a: any[]) => h.budgetMock(...a) })) +vi.mock('../../../utils/transactionList', () => ({ getTransactionList: (...a: any[]) => h.txMock(...a) })) + +// The journal writer — proving the loop NEVER writes (R6.1). The chat route graph +// must never reach this. +vi.mock('../../../utils/journalWriter', () => ({ appendTransaction: (...a: any[]) => h.appendMock(...a) })) + +// Nitro globals + the auto-imported budget-base resolver used by toProposedAction. +vi.stubGlobal('defineEventHandler', (fn: Function) => fn) +vi.stubGlobal('readBody', async (event: any) => event.body) +vi.stubGlobal('createError', (opts: any) => Object.assign(new Error(opts.statusMessage || opts.message), opts)) +vi.stubGlobal('resolveBudgetBase', async () => 'assets:checking') + +const { default: chat } = await import('../chat.post') + +const ev = (body: any) => ({ body } as any) +const text = (t: string) => ({ type: 'text', text: t }) +const toolUse = (id: string, name: string, input: any) => ({ type: 'tool_use', id, name, input }) + +beforeEach(() => { + vi.clearAllMocks() + h.state.keyConfigured = true +}) + +describe('POST /api/ai/chat — HITL safety (R6.1)', () => { + it('surfaces an assign proposal WITHOUT writing to the journal', async () => { + h.createMock.mockResolvedValueOnce({ + stop_reason: 'tool_use', + content: [text('I can assign that.'), toolUse('toolu_a', 'assign_to_envelope', { envelopes: { rent: 100 } })], + }) + + const res = await chat(ev({ message: 'assign 100 to rent' })) + + // The load-bearing guarantee: nothing was written. + expect(h.appendMock).not.toHaveBeenCalled() + // The loop paused on the proposal — no second model call. 
+ expect(h.createMock).toHaveBeenCalledTimes(1) + expect(res.proposedActions).toHaveLength(1) + expect(res.proposedActions[0]!.kind).toBe('assign') + expect(res.proposedActions[0]!.id).toBe('toolu_a') + expect(res.reply).toContain('assign') + }) +}) + +describe('read-tool dispatch', () => { + it('executes get_budget, feeds the result back, and continues to a final reply', async () => { + h.budgetMock.mockResolvedValue({ readyToAssign: 250 }) + h.createMock + .mockResolvedValueOnce({ stop_reason: 'tool_use', content: [toolUse('toolu_b', 'get_budget', {})] }) + .mockResolvedValueOnce({ stop_reason: 'end_turn', content: [text('You have $250 to assign.')] }) + + const res = await chat(ev({ message: 'how much can I assign?' })) + + expect(h.budgetMock).toHaveBeenCalledOnce() + expect(h.createMock).toHaveBeenCalledTimes(2) + expect(res.proposedActions).toHaveLength(0) + expect(res.reply).toBe('You have $250 to assign.') + // The history carries a tool_result turn for the read tool (search, not index: + // `messages` is one mutated array, so the final assistant turn is last). 
+ const readResultTurn = res.messages.find((m: any) => + m.role === 'user' && Array.isArray(m.content) && m.content[0]?.tool_use_id === 'toolu_b') + expect(readResultTurn).toBeDefined() + expect((readResultTurn as any).content[0].type).toBe('tool_result') + expect(h.appendMock).not.toHaveBeenCalled() + }) +}) + +describe('resume after approval (R6.2 / R3.2)', () => { + it('appends a tool_result turn for the approved action and resumes — still no write', async () => { + h.createMock.mockResolvedValueOnce({ stop_reason: 'end_turn', content: [text('Done — Rent is funded.')] }) + + const priorMessages = [ + { role: 'user', content: 'assign 100 to rent' }, + { role: 'assistant', content: [toolUse('toolu_c', 'assign_to_envelope', { envelopes: { rent: 100 } })] }, + ] + + const res = await chat(ev({ + messages: priorMessages, + resolutions: [{ toolUseId: 'toolu_c', status: 'approved', resultText: 'Committed: assigned $100 to rent' }], + })) + + // The route itself never writes — committing is the client's job via the + // existing endpoint. Here we only verify the resume protocol. 
+ expect(h.appendMock).not.toHaveBeenCalled() + const sentMessages = h.createMock.mock.calls[0]![0].messages + const resumeTurn = sentMessages.find((m: any) => + m.role === 'user' && Array.isArray(m.content) && m.content[0]?.tool_use_id === 'toolu_c') + expect(resumeTurn).toBeDefined() + expect(resumeTurn.content[0].content).toContain('Committed') + expect(res.reply).toBe('Done — Rent is funded.') + }) + + it('echoes held read results back alongside the resolution (mixed turn, R3.3)', async () => { + h.createMock.mockResolvedValueOnce({ stop_reason: 'end_turn', content: [text('ok')] }) + const heldRead = { type: 'tool_result', tool_use_id: 'toolu_read', content: '{"readyToAssign":250}' } + + const res = await chat(ev({ + messages: [{ role: 'assistant', content: [toolUse('toolu_act', 'assign_to_envelope', {})] }], + readToolResults: [heldRead], + resolutions: [{ toolUseId: 'toolu_act', status: 'rejected', resultText: 'User rejected this action' }], + })) + + // The resume turn resolves BOTH tool_use ids together (held read + verdict). + const turn = res.messages.find((m: any) => + m.role === 'user' && Array.isArray(m.content) && m.content.some((b: any) => b.tool_use_id === 'toolu_act')) + expect(turn).toBeDefined() + const ids = (turn as any).content.map((b: any) => b.tool_use_id).sort() + expect(ids).toEqual(['toolu_act', 'toolu_read']) + }) +}) + +describe('failure & control handling', () => { + it('returns a refusal as plain text with no actions (R4.3)', async () => { + h.createMock.mockResolvedValueOnce({ stop_reason: 'refusal', content: [] }) + const res = await chat(ev({ message: 'do something off-limits' })) + expect(res.proposedActions).toHaveLength(0) + expect(res.reply.length).toBeGreaterThan(0) + }) + + it('stops at the iteration cap instead of looping forever (R4.4)', async () => { + h.budgetMock.mockResolvedValue({}) + // Always asks for a read tool → would loop forever without the cap. 
+ h.createMock.mockResolvedValue({ stop_reason: 'tool_use', content: [toolUse('toolu_loop', 'get_budget', {})] }) + + const res = await chat(ev({ message: 'loop' })) + + expect(h.createMock).toHaveBeenCalledTimes(8) // MAX_TOOL_ITERATIONS + expect(res.proposedActions).toHaveLength(0) + expect(res.reply).toMatch(/more steps/i) + expect(h.appendMock).not.toHaveBeenCalled() + }) + + it('returns a friendly error (not a 500) when the model call fails (R4.5)', async () => { + h.createMock.mockRejectedValueOnce(new Error('network down')) + const res = await chat(ev({ message: 'hi' })) + expect(res.reply).toMatch(/trouble/i) + expect(res.proposedActions).toHaveLength(0) + }) + + it('reports a billing/credit error accurately, not as a malformed conversation', async () => { + h.createMock.mockRejectedValueOnce( + Object.assign(new Error('Your credit balance is too low to access the Anthropic API.'), { status: 400 }), + ) + const res = await chat(ev({ message: 'hi' })) + expect(res.reply).toMatch(/out of credits/i) + expect(res.reply).not.toMatch(/snag/i) + }) + + it('maps a 401 to a key-rejected message', async () => { + h.createMock.mockRejectedValueOnce(Object.assign(new Error('invalid x-api-key'), { status: 401 })) + const res = await chat(ev({ message: 'hi' })) + expect(res.reply).toMatch(/key was rejected/i) + }) + + it('returns 503 when ANTHROPIC_API_KEY is unset (R4.2)', async () => { + h.state.keyConfigured = false + await expect(chat(ev({ message: 'hi' }))).rejects.toMatchObject({ statusCode: 503 }) + expect(h.createMock).not.toHaveBeenCalled() + }) +}) diff --git a/server/api/ai/__tests__/config.test.ts b/server/api/ai/__tests__/config.test.ts new file mode 100644 index 0000000..42e3e85 --- /dev/null +++ b/server/api/ai/__tests__/config.test.ts @@ -0,0 +1,111 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest' + +const h = vi.hoisted(() => ({ + resolveApiKey: vi.fn(), + getApiKeySource: vi.fn(), + writeStoredApiKey: vi.fn(), + clearStoredApiKey: 
vi.fn(), + readStoredApiKey: vi.fn(), +})) + +vi.mock('../../../utils/anthropic', () => ({ + resolveApiKey: h.resolveApiKey, + getApiKeySource: h.getApiKeySource, +})) +vi.mock('../../../utils/aiConfig', () => ({ + writeStoredApiKey: h.writeStoredApiKey, + clearStoredApiKey: h.clearStoredApiKey, + readStoredApiKey: h.readStoredApiKey, + // Real-ish mask so we can assert the full key is never returned. + maskApiKey: (k: string) => (k.length <= 8 ? '••••••••' : `••••••••${k.slice(-4)}`), +})) + +vi.stubGlobal('defineEventHandler', (fn: Function) => fn) +vi.stubGlobal('readBody', async (event: any) => event.body) +vi.stubGlobal('createError', (opts: any) => Object.assign(new Error(opts.statusMessage || opts.message), opts)) + +const { default: getConfig } = await import('../config.get') +const { default: postConfig } = await import('../config.post') +const { default: deleteConfig } = await import('../config.delete') + +const ev = (body?: any) => ({ body } as any) + +beforeEach(() => { + vi.clearAllMocks() + h.writeStoredApiKey.mockResolvedValue(undefined) + h.clearStoredApiKey.mockResolvedValue(undefined) +}) + +describe('GET /api/ai/config', () => { + it('reports configured + source + masked key (never the full key)', () => { + h.resolveApiKey.mockReturnValue('sk-ant-secret-7777') + h.getApiKeySource.mockReturnValue('config') + h.readStoredApiKey.mockReturnValue('sk-ant-secret-7777') + const res = getConfig(ev()) as any + expect(res).toEqual({ configured: true, source: 'config', maskedKey: '••••••••7777', hasStoredKey: true }) + expect(JSON.stringify(res)).not.toContain('secret') + }) + + it('reports not configured', () => { + h.resolveApiKey.mockReturnValue(undefined) + h.getApiKeySource.mockReturnValue('none') + h.readStoredApiKey.mockReturnValue(undefined) + expect(getConfig(ev())).toEqual({ configured: false, source: 'none', maskedKey: null, hasStoredKey: false }) + }) + + it('reports a dormant stored key even when an env var overrides it', () => { + // env wins 
→ source 'env', resolved key is the env key, but a stored key + // still exists on disk and must remain clearable from the UI. + h.resolveApiKey.mockReturnValue('sk-env-1234') + h.getApiKeySource.mockReturnValue('env') + h.readStoredApiKey.mockReturnValue('sk-stored-9999') + const res = getConfig(ev()) as any + expect(res.source).toBe('env') + expect(res.hasStoredKey).toBe(true) + }) +}) + +describe('POST /api/ai/config', () => { + it('saves a valid key and returns only the masked form', async () => { + h.getApiKeySource.mockReturnValue('config') + const res = await postConfig(ev({ apiKey: ' sk-ant-abcdef1234 ' })) as any + expect(h.writeStoredApiKey).toHaveBeenCalledWith('sk-ant-abcdef1234') // trimmed + expect(res.configured).toBe(true) + expect(res.maskedKey).toBe('••••••••1234') + expect(JSON.stringify(res)).not.toContain('abcdef') + }) + + it('reflects env override in the reported source', async () => { + h.getApiKeySource.mockReturnValue('env') + const res = await postConfig(ev({ apiKey: 'sk-ant-abcdef1234' })) as any + expect(res.source).toBe('env') + }) + + it.each([ + ['empty', { apiKey: ' ' }], + ['missing', {}], + ['whitespace inside', { apiKey: 'sk ant key' }], + ['too short', { apiKey: 'sk-12' }], + ])('rejects an invalid key (%s) with 400 and does not write', async (_label, body) => { + await expect(postConfig(ev(body))).rejects.toMatchObject({ statusCode: 400 }) + expect(h.writeStoredApiKey).not.toHaveBeenCalled() + }) +}) + +describe('DELETE /api/ai/config', () => { + it('clears the stored key and reports the remaining state', async () => { + h.resolveApiKey.mockReturnValue(undefined) + h.getApiKeySource.mockReturnValue('none') + const res = await deleteConfig(ev()) as any + expect(h.clearStoredApiKey).toHaveBeenCalledOnce() + expect(res).toEqual({ configured: false, source: 'none', maskedKey: null }) + }) + + it('still reports configured when an env var remains after clearing', async () => { + h.resolveApiKey.mockReturnValue('sk-env-9999') + 
h.getApiKeySource.mockReturnValue('env') + const res = await deleteConfig(ev()) as any + expect(res.configured).toBe(true) + expect(res.source).toBe('env') + }) +}) diff --git a/server/api/ai/chat.post.ts b/server/api/ai/chat.post.ts new file mode 100644 index 0000000..ba5fb0d --- /dev/null +++ b/server/api/ai/chat.post.ts @@ -0,0 +1,168 @@ +import type Anthropic from '@anthropic-ai/sdk' +import { getAnthropic, MissingApiKeyError, REQUEST_DEFAULTS } from '../../utils/anthropic' +import { BUDGET_SYSTEM_PROMPT } from '../../ai/budgetInstructions' +import { TOOLS, READ_TOOL_HANDLERS, isProposedActionTool, toProposedAction } from '../../utils/aiTools' +import type { AiChatRequest, AiChatResponse, ProposedAction } from '../../../types/ai' + +/** + * POST /api/ai/chat — the human-in-the-loop budgeting chat tool loop (Issue #8). + * + * SAFETY INVARIANT: this route NEVER writes to the journal. Read tools execute + * server-side and feed results back to the model; proposed-action tools + * (assign/transfer) are SURFACED for user approval, never executed. A write + * happens only when the client calls the existing assign/transfer endpoint after + * the user approves. (Guarded by chat.post.test.ts.) + * + * Stateless: the full Anthropic message history is passed in and echoed back each + * request. Every `tool_use` block eventually receives a matching `tool_result` — + * read results immediately, proposed-action results on the resume turn — so the + * conversation stays protocol-valid. 
+ */ + +const MAX_TOOL_ITERATIONS = 8 + +function isToolUse(block: Anthropic.ContentBlock): block is Anthropic.ToolUseBlock { + return block.type === 'tool_use' +} + +function extractText(content: Anthropic.ContentBlock[]): string { + return content + .filter((b): b is Anthropic.TextBlock => b.type === 'text') + .map(b => b.text) + .join('\n') + .trim() +} + +function toolResult(toolUseId: string, value: unknown, isError = false): Anthropic.ToolResultBlockParam { + return { + type: 'tool_result', + tool_use_id: toolUseId, + content: typeof value === 'string' ? value : JSON.stringify(value ?? {}), + is_error: isError, + } +} + +export default defineEventHandler(async (event): Promise<AiChatResponse> => { + const body = await readBody<AiChatRequest>(event) + + let client: Anthropic + try { + client = getAnthropic() + } catch (err) { + if (err instanceof MissingApiKeyError) { + throw createError({ + statusCode: 503, + statusMessage: 'AI chat is not configured. Set ANTHROPIC_API_KEY to enable it.', + }) + } + throw err + } + + // Validated trust boundary: the opaque wire history → Anthropic MessageParam[]. + const messages = (Array.isArray(body.messages) ? body.messages : []) as Anthropic.MessageParam[] + + // Resume turn: build the user turn that resolves the prior assistant turn's + // tool_use blocks — read results computed last turn (echoed back) + the user's + // approve/reject verdicts. Both kinds together so every tool_use is covered. + if (body.resolutions?.length || body.readToolResults?.length) { + const content: Anthropic.ToolResultBlockParam[] = [ + ...((body.readToolResults ?? []) as Anthropic.ToolResultBlockParam[]), + ...(body.resolutions ?? []).map(r => toolResult(r.toolUseId, r.resultText)), + ] + if (content.length) messages.push({ role: 'user', content }) + } + + // Fresh user message. 
+ if (body.message && body.message.trim()) { + messages.push({ role: 'user', content: body.message }) + } + + const system: Anthropic.TextBlockParam[] = [ + { type: 'text', text: BUDGET_SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } }, + ] + + try { + for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) { + const response = await client.messages.create({ + ...REQUEST_DEFAULTS, + system, + tools: TOOLS, + messages, + }) + + // Echo the assistant turn back into history (preserves thinking + tool_use + // blocks, required for protocol-valid continuation on the same model). + messages.push({ role: 'assistant', content: response.content }) + const reply = extractText(response.content) + + if (response.stop_reason === 'refusal') { + return { messages, reply: reply || 'Sorry, I can\'t help with that.', proposedActions: [], readToolResults: [] } + } + + if (response.stop_reason !== 'tool_use') { + // end_turn / max_tokens / stop_sequence — a final reply. + return { messages, reply, proposedActions: [], readToolResults: [] } + } + + const toolUses = response.content.filter(isToolUse) + const proposals = toolUses.filter(b => isProposedActionTool(b.name)) + const reads = toolUses.filter(b => !isProposedActionTool(b.name)) + + if (proposals.length > 0) { + // PAUSE: surface the proposal(s) for approval. Compute (but hold) results + // for any read tools in this same turn so the client can echo them back + // and resolve every tool_use together on resume. NOTHING is written. + const readToolResults: Anthropic.ToolResultBlockParam[] = [] + for (const r of reads) { + const result = await READ_TOOL_HANDLERS[r.name]?.(r.input) + readToolResults.push(toolResult(r.id, result)) + } + const proposedActions: ProposedAction[] = await Promise.all(proposals.map(toProposedAction)) + return { messages, reply, proposedActions, readToolResults } + } + + // Read-only turn: execute, append results, and continue the loop. 
+ const results: Anthropic.ToolResultBlockParam[] = [] + for (const r of reads) { + try { + const result = await READ_TOOL_HANDLERS[r.name]?.(r.input) + results.push(toolResult(r.id, result)) + } catch (e) { + results.push(toolResult(r.id, { error: (e as Error).message }, true)) + } + } + messages.push({ role: 'user', content: results }) + } + + // Iteration cap — return what we have rather than looping forever. + return { + messages, + reply: 'I looked into that but it took more steps than expected — could you narrow the question?', + proposedActions: [], + readToolResults: [], + } + } catch (err) { + // Anthropic/network failure after SDK retries. Leave the conversation + // resumable and surface an actionable message rather than a 500. + // + // Log only the error name/status/message for diagnostics — these describe the + // request *structure* or transport, never the budget data or the key (R5.2). + const status: number | undefined = (err as { status?: number })?.status + const name = (err as { name?: string })?.name ?? 'Error' + const message = (err as { message?: string })?.message ?? 'unknown error' + console.error(`[ai/chat] Anthropic request failed: ${name}${status ? ` (status ${status})` : ''}: ${message}`) + + // A billing/credit problem comes back as a 400 invalid_request_error — detect + // it by message so we don't mislabel it as a malformed conversation. + const isBilling = /credit balance|billing|quota/i.test(message) + + let reply = 'Sorry — I had trouble reaching the assistant just now. Please try again.' + if (isBilling) reply = 'Your Anthropic account is out of credits. Add credits in the Anthropic Console (Plans & Billing), then try again.' + else if (status === 401) reply = 'The Anthropic API key was rejected. Check or re-enter it in Settings.' + else if (status === 403) reply = 'This Anthropic API key is not permitted to use this model. Check your key in Settings.' 
+ else if (status === 429) reply = 'Anthropic is rate-limiting requests — please wait a moment and try again.' + else if (status === 400) reply = 'This conversation hit a snag and can\'t continue. Start a new chat to reset it.' + + return { messages, reply, proposedActions: [], readToolResults: [] } + } +}) diff --git a/server/api/ai/config.delete.ts b/server/api/ai/config.delete.ts new file mode 100644 index 0000000..2be227a --- /dev/null +++ b/server/api/ai/config.delete.ts @@ -0,0 +1,17 @@ +import { clearStoredApiKey, maskApiKey } from '../../utils/aiConfig' +import { resolveApiKey, getApiKeySource } from '../../utils/anthropic' + +/** + * DELETE /api/ai/config — remove the stored Anthropic API key. Any + * `ANTHROPIC_API_KEY` env var is left intact (and still wins), so the response + * reflects whatever key remains in effect afterward. + */ +export default defineEventHandler(async () => { + await clearStoredApiKey() + const key = resolveApiKey() + return { + configured: Boolean(key), + source: getApiKeySource(), + maskedKey: key ? maskApiKey(key) : null, + } +}) diff --git a/server/api/ai/config.get.ts b/server/api/ai/config.get.ts new file mode 100644 index 0000000..dcf7369 --- /dev/null +++ b/server/api/ai/config.get.ts @@ -0,0 +1,24 @@ +import { resolveApiKey, getApiKeySource } from '../../utils/anthropic' +import { maskApiKey, readStoredApiKey } from '../../utils/aiConfig' + +/** + * GET /api/ai/config — report whether the Anthropic API key is configured and + * where it comes from, for the Settings UI status line. + * + * `hasStoredKey` is reported independently of `source` so the UI can still offer + * to clear a stored key that's currently shadowed by an env var (otherwise a + * dormant key on disk would be unclearable from the UI). + * + * NEVER returns the key in full — only a masked form (last 4 chars). The full + * key never crosses the wire. 
+ */ +export default defineEventHandler(() => { + const key = resolveApiKey() + const source = getApiKeySource() + return { + configured: Boolean(key), + source, + maskedKey: key ? maskApiKey(key) : null, + hasStoredKey: Boolean(readStoredApiKey()), + } +}) diff --git a/server/api/ai/config.post.ts b/server/api/ai/config.post.ts new file mode 100644 index 0000000..33985cf --- /dev/null +++ b/server/api/ai/config.post.ts @@ -0,0 +1,39 @@ +import { writeStoredApiKey, maskApiKey } from '../../utils/aiConfig' +import { getApiKeySource } from '../../utils/anthropic' + +interface SaveKeyRequest { + apiKey?: unknown +} + +/** + * POST /api/ai/config — save the Anthropic API key entered in the Settings UI to + * the gitignored config file. Takes effect on the next chat request (no restart). + * + * The key is validated but never logged or echoed back in full (only masked). + */ +export default defineEventHandler(async (event) => { + const body = await readBody<SaveKeyRequest>(event) + const raw = typeof body?.apiKey === 'string' ? body.apiKey.trim() : '' + + if (!raw) { + throw createError({ statusCode: 400, statusMessage: 'apiKey is required' }) + } + // A key has no whitespace/newlines; reject paste artifacts rather than persist + // a value that would fail at the API boundary. + if (/\s/.test(raw)) { + throw createError({ statusCode: 400, statusMessage: 'apiKey must not contain whitespace' }) + } + if (raw.length < 8) { + throw createError({ statusCode: 400, statusMessage: 'apiKey looks too short' }) + } + + await writeStoredApiKey(raw) + + // `source` reflects what's actually in effect: if ANTHROPIC_API_KEY is set, it + // still overrides the stored key — tell the UI so it can say so. 
+ return { + configured: true, + source: getApiKeySource(), + maskedKey: maskApiKey(raw), + } +}) diff --git a/server/api/budget.get.ts b/server/api/budget.get.ts index 7b7bbbf..2174882 100644 --- a/server/api/budget.get.ts +++ b/server/api/budget.get.ts @@ -1,29 +1,5 @@ -import type { BudgetCategory, BudgetCategoryGroup, BudgetEnvelopeReport } from '../../types/ui' -import { stripAccountPrefix } from '../../utils/stripAccountPrefix' -import { singleQuantity, MultiCommodityError } from '../../utils/singleQuantity' import { isValidPeriod } from '../utils/hledgerArgs' -import { readFile } from 'node:fs/promises' -import { pathExists } from '../utils/fsExists' - -async function loadHiddenEnvelopes(): Promise<Set<string>> { - const path = 'config/hidden-envelopes.json' - if (!(await pathExists(path))) return new Set() - try { - const list = JSON.parse(await readFile(path, 'utf-8')) as string[] - return new Set(list) - } catch { - return new Set() - } -} - -/** - * Maps an expense account path to its corresponding budget sub-account name. - * e.g. "expenses:food:groceries" → "food:groceries" - */ -function expenseToBudgetKey(expenseAccount: string): string { - // Strip the "expenses:" prefix to get the category path - return expenseAccount.replace(/^expenses:/, '') -} +import { getBudgetReport } from '../utils/budgetReport' export default defineEventHandler(async (event) => { const { period } = getQuery(event) @@ -35,154 +11,7 @@ export default defineEventHandler(async (event) => { throw createError({ statusCode: 400, statusMessage: 'Invalid period expression' }) } - // 1. 
Fetch ALL expense accounts, filter out hidden ones - const allAccountsRaw = await hledgerExecText(['accounts']) - const allAccounts = allAccountsRaw.trim().split(/\r?\n/).filter(Boolean).map(s => s.trim()) - const hiddenSet = await loadHiddenEnvelopes() - const expenseAccounts = allAccounts.filter(a => a.startsWith('expenses:') && !hiddenSet.has(a)) - - // Derive the budget base from the account list (Issue #4 item 3) — no extra - // hledger call. All budget sub-account queries/keys hang off this prefix - // instead of a hardcoded `assets:checking:budget:`. - const budgetBase = await resolveBudgetBase(allAccounts) - const budgetPrefix = `${budgetBase}:budget:` - const unallocatedAccount = `${budgetPrefix}unallocated` - const pendingPrefix = `${budgetPrefix}pending:` - - // 2. Fetch period-filtered expense activity (Activity column) - const expenseArgs = ['bal', 'expenses:'] - if (pd) expenseArgs.push('-p', pd) - const expenseRaw = await hledgerExec(expenseArgs) - const expenseReport = transformBalanceReport(expenseRaw) - - const activityMap = new Map() - for (const row of expenseReport.rows) { - activityMap.set(row.account, singleQuantity(row.amounts, `expense activity for ${row.account}`)) - } - - // 3. 
Fetch budget sub-account data and real account totals - // a) Cumulative budget balances (no period) → Available column - // b) Period-scoped budget delta (with period) → derive this month's Assigned - // c) Real account totals → compute Ready to Assign via YNAB Rule 1 - const budgetBalanceMap = new Map() // cumulative Available - const budgetPeriodDeltaMap = new Map() // period net change - let readyToAssign = 0 - - try { - // a) Cumulative balances — Available is the all-time running balance - const cumulativeArgs = ['bal', budgetPrefix] - const cumulativeRaw = await hledgerExec(cumulativeArgs) - const cumulativeReport = transformBalanceReport(cumulativeRaw) - - for (const row of cumulativeReport.rows) { - const account = row.account as string - if (account.startsWith(budgetPrefix) - && account !== unallocatedAccount - && !account.startsWith(pendingPrefix)) { - const categoryKey = account.slice(budgetPrefix.length) - budgetBalanceMap.set(categoryKey, singleQuantity(row.amounts, `budget balance for ${account}`)) - } - } - - // Ready to Assign (YNAB Rule 1) = net worth − money in envelopes. The single - // source of truth lives in server/utils/budgetData.ts and is shared with the - // assign availability gate, so the report and the gate can never disagree. - // Pass the data we already fetched so this adds only the real-balance read. 
- readyToAssign = await getReadyToAssign({ budgetBase, cumulativeReport }) - - // b) Period-scoped delta — net change in budget sub-accounts this period - if (pd) { - const periodArgs = ['bal', budgetPrefix, '-p', pd] - const periodRaw = await hledgerExec(periodArgs) - const periodReport = transformBalanceReport(periodRaw) - - for (const row of periodReport.rows) { - const account = row.account as string - const delta = singleQuantity(row.amounts, `budget period delta for ${account}`) - if (account.startsWith(budgetPrefix) - && account !== unallocatedAccount - && !account.startsWith(pendingPrefix)) { - const categoryKey = account.slice(budgetPrefix.length) - budgetPeriodDeltaMap.set(categoryKey, delta) - } - } - } - } catch (err) { - // A multi-commodity account is a real error — surface it, don't mask it as $0. - if (err instanceof MultiCommodityError) throw err - // No budget sub-accounts yet — show $0 for everything (backward compatibility) - } - - // 4. Build categories from expense accounts, overlaying budget data - const groupMap = new Map() - - for (const accountPath of expenseAccounts) { - const isParent = expenseAccounts.some(a => a !== accountPath && a.startsWith(accountPath + ':')) - if (isParent) continue - - const activity = activityMap.get(accountPath) ?? 0 - const budgetKey = expenseToBudgetKey(accountPath) - - // Available = cumulative running balance (includes rollover from all prior periods) - const available = budgetBalanceMap.get(budgetKey) ?? 0 - - // Assigned = assignment amount, reverse-derived from the budget sub-account. - // Identity: budgetDelta = assigned − spent, and spent = activity (signed: - // an outflow is negative, a refund positive). So assigned = delta + activity - // with SIGNED activity. Using |activity| would invent a phantom assignment - // for refunds (a $20 refund would read as +$40 assigned). - let assigned: number - if (pd) { - const periodDelta = budgetPeriodDeltaMap.get(budgetKey) ?? 
0 - assigned = periodDelta + activity - } else { - // No period filter: all-time assigned = cumulative available + all-time activity. - assigned = available + activity - } - - const category: BudgetCategory = { - name: stripAccountPrefix(accountPath), - accountPath, - assigned, - activity, - available, - } - - const segments = accountPath.split(':') - const groupKey = segments[1] ?? '' - - if (!groupMap.has(groupKey)) { - groupMap.set(groupKey, []) - } - groupMap.get(groupKey)!.push(category) - } - - // 5. Build category groups with totals - const categoryGroups: BudgetCategoryGroup[] = [] - for (const [key, categories] of groupMap) { - const groupAssigned = categories.reduce((s, c) => s + c.assigned, 0) - const groupActivity = categories.reduce((s, c) => s + c.activity, 0) - const groupAvailable = categories.reduce((s, c) => s + c.available, 0) - - categoryGroups.push({ - name: key.charAt(0).toUpperCase() + key.slice(1), - categories, - assigned: groupAssigned, - activity: groupActivity, - available: groupAvailable, - }) - } - - const totalAssigned = categoryGroups.reduce((s, g) => s + g.assigned, 0) - const totalActivity = categoryGroups.reduce((s, g) => s + g.activity, 0) - const totalAvailable = categoryGroups.reduce((s, g) => s + g.available, 0) - - return { - period: pd, - readyToAssign, - categoryGroups, - totalAssigned, - totalActivity, - totalAvailable, - } satisfies BudgetEnvelopeReport + // Report-building lives in server/utils/budgetReport.ts so it can be shared + // with the AI `get_budget` tool (Issue #8) without duplicating accounting logic. 
+ return await getBudgetReport(pd) }) diff --git a/server/utils/__tests__/aiConfig.test.ts b/server/utils/__tests__/aiConfig.test.ts new file mode 100644 index 0000000..bb802e6 --- /dev/null +++ b/server/utils/__tests__/aiConfig.test.ts @@ -0,0 +1,70 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest' + +const h = vi.hoisted(() => ({ + readFileSync: vi.fn(), + writeFile: vi.fn(), + mkdir: vi.fn(), +})) + +vi.mock('node:fs', () => ({ readFileSync: h.readFileSync })) +vi.mock('node:fs/promises', () => ({ writeFile: h.writeFile, mkdir: h.mkdir })) + +const { readStoredApiKey, writeStoredApiKey, clearStoredApiKey, maskApiKey } = await import('../aiConfig') + +beforeEach(() => { + vi.clearAllMocks() + h.writeFile.mockResolvedValue(undefined) + h.mkdir.mockResolvedValue(undefined) +}) + +describe('readStoredApiKey', () => { + it('returns the trimmed key when present', () => { + h.readFileSync.mockReturnValue(JSON.stringify({ apiKey: ' sk-ant-abc123 ' })) + expect(readStoredApiKey()).toBe('sk-ant-abc123') + }) + + it('returns undefined when the file is missing (read throws)', () => { + h.readFileSync.mockImplementation(() => { throw new Error('ENOENT') }) + expect(readStoredApiKey()).toBeUndefined() + }) + + it('returns undefined for malformed JSON', () => { + h.readFileSync.mockReturnValue('{not json') + expect(readStoredApiKey()).toBeUndefined() + }) + + it('returns undefined when apiKey is empty/whitespace', () => { + h.readFileSync.mockReturnValue(JSON.stringify({ apiKey: ' ' })) + expect(readStoredApiKey()).toBeUndefined() + }) +}) + +describe('writeStoredApiKey', () => { + it('ensures the config dir exists and writes the key', async () => { + await writeStoredApiKey('sk-ant-xyz') + expect(h.mkdir).toHaveBeenCalledWith('config', { recursive: true }) + expect(h.writeFile).toHaveBeenCalledOnce() + const [path, contents] = h.writeFile.mock.calls[0]! 
+ expect(path).toBe('config/ai-config.json') + expect(JSON.parse(contents as string)).toEqual({ apiKey: 'sk-ant-xyz' }) + }) +}) + +describe('clearStoredApiKey', () => { + it('writes an empty config (no key)', async () => { + await clearStoredApiKey() + const [, contents] = h.writeFile.mock.calls[0]! + expect(JSON.parse(contents as string)).toEqual({}) + }) +}) + +describe('maskApiKey', () => { + it('shows only the last 4 characters', () => { + expect(maskApiKey('sk-ant-abcdef1234')).toBe('••••••••1234') + }) + + it('fully masks short or empty keys (never exposes them)', () => { + expect(maskApiKey('short')).toBe('••••••••') + expect(maskApiKey('')).toBe('••••••••') + }) +}) diff --git a/server/utils/__tests__/aiTools.test.ts b/server/utils/__tests__/aiTools.test.ts new file mode 100644 index 0000000..46ae9e5 --- /dev/null +++ b/server/utils/__tests__/aiTools.test.ts @@ -0,0 +1,91 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest' + +const mockGetBudgetReport = vi.fn() +const mockGetTransactionList = vi.fn() + +vi.mock('../budgetReport', () => ({ getBudgetReport: (...a: any[]) => mockGetBudgetReport(...a) })) +vi.mock('../transactionList', () => ({ getTransactionList: (...a: any[]) => mockGetTransactionList(...a) })) + +// resolveBudgetBase is auto-imported in aiTools.ts. 
+vi.stubGlobal('resolveBudgetBase', async () => 'assets:checking') + +const { + TOOLS, + READ_TOOL_HANDLERS, + isProposedActionTool, + toProposedAction, + READ_TOOL_NAMES, + PROPOSED_ACTION_TOOL_NAMES, +} = await import('../aiTools') + +beforeEach(() => vi.clearAllMocks()) + +describe('tool classification', () => { + it('flags assign/transfer as proposed actions and reads as reads', () => { + expect(isProposedActionTool('assign_to_envelope')).toBe(true) + expect(isProposedActionTool('transfer_between_envelopes')).toBe(true) + expect(isProposedActionTool('get_budget')).toBe(false) + expect(isProposedActionTool('get_transactions')).toBe(false) + }) + + it('exposes all four tools with the cache breakpoint on the last definition', () => { + const names = TOOLS.map(t => t.name) + expect(names).toEqual([...READ_TOOL_NAMES, ...PROPOSED_ACTION_TOOL_NAMES]) + expect(TOOLS[TOOLS.length - 1]!.cache_control).toEqual({ type: 'ephemeral' }) + // Only the last one carries a breakpoint (cache-stable prefix). 
+ expect(TOOLS.slice(0, -1).every(t => t.cache_control == null)).toBe(true) + }) +}) + +describe('read tool handlers delegate to server/utils', () => { + it('get_budget → getBudgetReport with the period', async () => { + mockGetBudgetReport.mockResolvedValue({ readyToAssign: 100 }) + const result = await READ_TOOL_HANDLERS.get_budget!({ period: '2025-03' }) + expect(mockGetBudgetReport).toHaveBeenCalledWith('2025-03') + expect(result).toEqual({ readyToAssign: 100 }) + }) + + it('get_budget → empty period when omitted', async () => { + mockGetBudgetReport.mockResolvedValue({}) + await READ_TOOL_HANDLERS.get_budget!({}) + expect(mockGetBudgetReport).toHaveBeenCalledWith('') + }) + + it('get_transactions → getTransactionList with the query', async () => { + mockGetTransactionList.mockResolvedValue([]) + await READ_TOOL_HANDLERS.get_transactions!({ account: 'expenses:rent', limit: 10 }) + expect(mockGetTransactionList).toHaveBeenCalledWith( + expect.objectContaining({ account: 'expenses:rent', limit: 10 }), + ) + }) +}) + +describe('toProposedAction (never writes — builds a proposal)', () => { + it('builds an assign proposal, resolving the budget host and normalizing keys', async () => { + const action = await toProposedAction({ + id: 'toolu_1', + name: 'assign_to_envelope', + input: { envelopes: { 'expenses:rent': 1200, 'food:groceries': 400 } }, + }) + expect(action.kind).toBe('assign') + expect(action.id).toBe('toolu_1') + if (action.kind !== 'assign') throw new Error('expected assign') + // "expenses:" prefix stripped to the budget sub-account key. 
+ expect(action.payload.envelopes).toEqual({ rent: 1200, 'food:groceries': 400 }) + expect(action.payload.physicalAccount).toBe('assets:checking') + expect(action.summary).toContain('1200') + }) + + it('builds a transfer proposal with full envelope account paths', async () => { + const action = await toProposedAction({ + id: 'toolu_2', + name: 'transfer_between_envelopes', + input: { sourceEnvelope: 'dining', destinationEnvelope: 'food:groceries', amount: 50 }, + }) + expect(action.kind).toBe('transfer') + if (action.kind !== 'transfer') throw new Error('expected transfer') + expect(action.payload.sourceEnvelope).toBe('assets:checking:budget:dining') + expect(action.payload.destinationEnvelope).toBe('assets:checking:budget:food:groceries') + expect(action.payload.amount).toBe(50) + }) +}) diff --git a/server/utils/__tests__/anthropic.test.ts b/server/utils/__tests__/anthropic.test.ts new file mode 100644 index 0000000..4e66a96 --- /dev/null +++ b/server/utils/__tests__/anthropic.test.ts @@ -0,0 +1,79 @@ +import { describe, it, expect, afterEach, vi, beforeEach } from 'vitest' + +const h = vi.hoisted(() => ({ readStored: vi.fn() })) +// The key resolution consults the stored config; mock it so tests don't depend +// on a real config/ai-config.json on disk. 
+vi.mock('../aiConfig', () => ({ readStoredApiKey: h.readStored })) + +const { getAnthropic, MissingApiKeyError, MODEL, REQUEST_DEFAULTS, resolveApiKey, getApiKeySource } = + await import('../anthropic') + +const original = process.env.ANTHROPIC_API_KEY + +beforeEach(() => { + vi.clearAllMocks() + h.readStored.mockReturnValue(undefined) +}) + +afterEach(() => { + if (original !== undefined) process.env.ANTHROPIC_API_KEY = original + else delete process.env.ANTHROPIC_API_KEY +}) + +describe('resolveApiKey precedence (env overrides stored)', () => { + it('uses the env var when set, even if a stored key exists', () => { + process.env.ANTHROPIC_API_KEY = 'sk-env' + h.readStored.mockReturnValue('sk-stored') + expect(resolveApiKey()).toBe('sk-env') + expect(getApiKeySource()).toBe('env') + }) + + it('falls back to the stored key when the env var is unset', () => { + delete process.env.ANTHROPIC_API_KEY + h.readStored.mockReturnValue('sk-stored') + expect(resolveApiKey()).toBe('sk-stored') + expect(getApiKeySource()).toBe('config') + }) + + it('reports none when neither is configured', () => { + delete process.env.ANTHROPIC_API_KEY + h.readStored.mockReturnValue(undefined) + expect(resolveApiKey()).toBeUndefined() + expect(getApiKeySource()).toBe('none') + }) +}) + +describe('getAnthropic', () => { + it('throws MissingApiKeyError when neither env nor stored key is set', () => { + delete process.env.ANTHROPIC_API_KEY + h.readStored.mockReturnValue(undefined) + expect(() => getAnthropic()).toThrow(MissingApiKeyError) + }) + + it('builds a client from the stored key when env is unset', () => { + delete process.env.ANTHROPIC_API_KEY + h.readStored.mockReturnValue('sk-stored') + expect(getAnthropic().apiKey).toBe('sk-stored') + }) + + it('rebuilds the client when the resolved key changes (no restart needed)', () => { + delete process.env.ANTHROPIC_API_KEY + h.readStored.mockReturnValue('sk-a') + const a = getAnthropic() + h.readStored.mockReturnValue('sk-b') + const b = 
getAnthropic() + expect(b.apiKey).toBe('sk-b') + expect(b).not.toBe(a) + }) +}) + +describe('request defaults', () => { + it('targets Opus 4.8 with adaptive thinking and no sampling params', () => { + expect(MODEL).toBe('claude-opus-4-8') + expect(REQUEST_DEFAULTS.model).toBe('claude-opus-4-8') + expect(REQUEST_DEFAULTS.thinking).toEqual({ type: 'adaptive' }) + expect(REQUEST_DEFAULTS).not.toHaveProperty('temperature') + expect(REQUEST_DEFAULTS).not.toHaveProperty('top_p') + expect(REQUEST_DEFAULTS.output_config).toEqual({ effort: 'medium' }) + }) +}) diff --git a/server/utils/__tests__/budgetReport.test.ts b/server/utils/__tests__/budgetReport.test.ts new file mode 100644 index 0000000..73e7535 --- /dev/null +++ b/server/utils/__tests__/budgetReport.test.ts @@ -0,0 +1,57 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest' +import { getReadyToAssign } from '../budgetData' + +// --- Mock Nitro auto-imported globals that budgetReport.ts relies on --- +const mockHledgerExec = vi.fn() +const mockHledgerExecText = vi.fn() + +vi.stubGlobal('hledgerExec', mockHledgerExec) +vi.stubGlobal('hledgerExecText', mockHledgerExecText) +vi.stubGlobal('transformBalanceReport', (raw: any) => raw) +vi.stubGlobal('resolveBudgetBase', async () => 'assets:checking') +// Real RTA util — exercised with the base + cumulative report the caller passes. +vi.stubGlobal('getReadyToAssign', getReadyToAssign) + +const { getBudgetReport } = await import('../budgetReport') + +beforeEach(() => { + vi.clearAllMocks() +}) + +describe('getBudgetReport', () => { + it('builds Ready-to-Assign and per-envelope columns (no period)', async () => { + mockHledgerExecText.mockResolvedValue( + 'assets:checking\nassets:checking:budget:rent\nexpenses:rent\nexpenses:food:groceries\n', + ) + // Call order (no period): expense activity, cumulative budget, real accounts. 
+ mockHledgerExec + .mockResolvedValueOnce({ rows: [ + { account: 'expenses:rent', amounts: [{ quantity: 1200, commodity: '$' }] }, + { account: 'expenses:food:groceries', amounts: [{ quantity: 50, commodity: '$' }] }, + ], totals: [] }) + .mockResolvedValueOnce({ rows: [ + { account: 'assets:checking:budget:rent', amounts: [{ quantity: 0, commodity: '$' }] }, + { account: 'assets:checking:budget:unallocated', amounts: [{ quantity: 300, commodity: '$' }] }, + ], totals: [] }) + .mockResolvedValueOnce({ rows: [], totals: [{ quantity: 1500, commodity: '$' }] }) + + const report = await getBudgetReport('') + + // RTA = net real balance (1500) − envelopes (sum budget − unallocated = 0) + expect(report.readyToAssign).toBe(1500) + expect(report.period).toBe('') + const rent = report.categoryGroups.flatMap(g => g.categories).find(c => c.accountPath === 'expenses:rent') + expect(rent).toBeDefined() + expect(rent!.activity).toBe(1200) + }) + + it('passes the period through to hledger when provided', async () => { + mockHledgerExecText.mockResolvedValue('assets:checking\nexpenses:rent\n') + mockHledgerExec.mockResolvedValue({ rows: [], totals: [{ quantity: 0, commodity: '$' }] }) + + await getBudgetReport('2025-03') + + // Expense activity call carries -p <period>. + expect(mockHledgerExec).toHaveBeenCalledWith(expect.arrayContaining(['-p', '2025-03'])) + }) +}) diff --git a/server/utils/__tests__/transactionList.test.ts b/server/utils/__tests__/transactionList.test.ts new file mode 100644 index 0000000..b34bd07 --- /dev/null +++ b/server/utils/__tests__/transactionList.test.ts @@ -0,0 +1,81 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest' + +const mockHledgerExec = vi.fn() +vi.stubGlobal('hledgerExec', mockHledgerExec) +// transformTransactions is auto-imported; pass raw through (we feed shaped data). 
+vi.stubGlobal('transformTransactions', (raw: any) => raw) + +const { getTransactionList } = await import('../transactionList') + +beforeEach(() => { + vi.clearAllMocks() +}) + +const tx = (date: string, description: string, postings: { account: string; q: number }[]) => ({ + date, + status: '*', + description, + index: 0, + postings: postings.map(p => ({ account: p.account, amounts: [{ commodity: '$', quantity: p.q }] })), +}) + +describe('getTransactionList', () => { + it('surfaces category legs as compact rows, most-recent-first', async () => { + mockHledgerExec.mockResolvedValue([ + tx('2025-03-01', 'Landlord', [ + { account: 'assets:checking', q: -1200 }, + { account: 'expenses:rent', q: 1200 }, + ]), + tx('2025-03-05', 'Grocery Co', [ + { account: 'assets:checking', q: -50 }, + { account: 'expenses:food:groceries', q: 50 }, + ]), + ]) + + const list = await getTransactionList() + + expect(list).toEqual([ + { date: '2025-03-05', payee: 'Grocery Co', amount: 50, account: 'expenses:food:groceries' }, + { date: '2025-03-01', payee: 'Landlord', amount: 1200, account: 'expenses:rent' }, + ]) + }) + + it('falls back to non-budget legs when a transaction has no category leg', async () => { + mockHledgerExec.mockResolvedValue([ + tx('2025-03-02', 'Move to savings', [ + { account: 'assets:checking', q: -500 }, + { account: 'assets:savings', q: 500 }, + ]), + ]) + + const list = await getTransactionList() + expect(list.map(e => e.account).sort()).toEqual(['assets:checking', 'assets:savings']) + }) + + it('caps results to the limit', async () => { + const many = Array.from({ length: 10 }, (_, i) => + tx(`2025-03-${String(i + 1).padStart(2, '0')}`, `p${i}`, [{ account: 'expenses:misc', q: i + 1 }]), + ) + mockHledgerExec.mockResolvedValue(many) + + const list = await getTransactionList({ limit: 3 }) + expect(list).toHaveLength(3) + // most-recent-first + expect(list[0]!.payee).toBe('p9') + }) + + it('passes a validated date filter through to hledger', async () => { + 
mockHledgerExec.mockResolvedValue([]) + await getTransactionList({ startDate: '2025-03-01' }) + expect(mockHledgerExec).toHaveBeenCalledWith(expect.arrayContaining(['-b', '2025-03-01'])) + }) + + it('rejects a malformed account query (arg-injection guard)', async () => { + await expect(getTransactionList({ account: '--bad' })).rejects.toThrow('Invalid account query') + expect(mockHledgerExec).not.toHaveBeenCalled() + }) + + it('rejects a malformed start date', async () => { + await expect(getTransactionList({ startDate: 'not-a-date' })).rejects.toThrow('Invalid startDate') + }) +}) diff --git a/server/utils/aiConfig.ts b/server/utils/aiConfig.ts new file mode 100644 index 0000000..6a1f7bf --- /dev/null +++ b/server/utils/aiConfig.ts @@ -0,0 +1,59 @@ +import { readFileSync } from 'node:fs' +import { writeFile, mkdir } from 'node:fs/promises' + +/** + * Persisted AI configuration (Issue #8): the Anthropic API key entered via the + * Settings page. Stored in a small gitignored JSON file rather than process.env + * so the user can configure the chat in-app, and the choice survives restart. + * Mirrors the active-journal pattern (`activeJournal.ts`): this module owns + * reading the file (sync, guarded, never throws — called from the synchronous + * `getAnthropic`); the config endpoints own writing it (async). + * + * The stored key is a secret: it is NEVER logged and NEVER returned in full from + * an API response (only a masked form via {@link maskApiKey}). `config/` is + * gitignored, so the file is not committed. + */ + +const AI_CONFIG_PATH = 'config/ai-config.json' +const CONFIG_DIR = 'config' + +interface AiConfig { + apiKey?: string +} + +/** + * Read the stored Anthropic API key, or undefined if absent/unreadable/empty. + * Never throws. 
+ */ +export function readStoredApiKey(): string | undefined { + try { + const raw = readFileSync(AI_CONFIG_PATH, 'utf-8') + const key = (JSON.parse(raw) as AiConfig).apiKey + if (typeof key === 'string' && key.trim()) return key.trim() + } catch { + // no/invalid config — caller falls back to env / none + } + return undefined +} + +/** Persist the API key. Async — called from a request handler, not a hot path. */ +export async function writeStoredApiKey(key: string): Promise<void> { + await mkdir(CONFIG_DIR, { recursive: true }) + await writeFile(AI_CONFIG_PATH, JSON.stringify({ apiKey: key } satisfies AiConfig, null, 2), 'utf-8') +} + +/** Remove the stored API key (writes an empty config; leaves any env var intact). */ +export async function clearStoredApiKey(): Promise<void> { + await mkdir(CONFIG_DIR, { recursive: true }) + await writeFile(AI_CONFIG_PATH, JSON.stringify({} satisfies AiConfig, null, 2), 'utf-8') +} + +/** + * Mask a key for display: keep only the last 4 characters. Returns a fixed mask + * for short/empty input so the full key is never exposed. + */ +export function maskApiKey(key: string): string { + const trimmed = key.trim() + if (trimmed.length <= 8) return '••••••••' + return `••••••••${trimmed.slice(-4)}` +} diff --git a/server/utils/aiTools.ts b/server/utils/aiTools.ts new file mode 100644 index 0000000..15c2397 --- /dev/null +++ b/server/utils/aiTools.ts @@ -0,0 +1,182 @@ +import type Anthropic from '@anthropic-ai/sdk' +import type { ProposedAction } from '../../types/ai' +import { getBudgetReport } from './budgetReport' +import { getTransactionList } from './transactionList' + +/** + * Tool surface for the AI budgeting chat (Issue #8). + * + * Two kinds: + * - **Read tools** (`get_budget`, `get_transactions`): executed server-side by + * the chat route, delegating to existing `server/utils` (no accounting math + * here). Their results are fed back to the model. 
+ * - **Proposed-action tools** (`assign_to_envelope`, `transfer_between_envelopes`): + * NEVER executed here. The route surfaces them for human approval; only the + * existing assign/transfer endpoints (called after approval) write the journal. + * + * `TOOLS` is the stable, deterministically-ordered, cache-controlled prefix the + * route passes to `messages.create` — keep ordering and content stable so the + * prompt cache stays warm. + */ + +export const READ_TOOL_NAMES = ['get_budget', 'get_transactions'] as const +export const PROPOSED_ACTION_TOOL_NAMES = ['assign_to_envelope', 'transfer_between_envelopes'] as const + +export function isProposedActionTool(name: string): boolean { + return (PROPOSED_ACTION_TOOL_NAMES as readonly string[]).includes(name) +} + +/** Minimal shape of a model tool call we care about (subset of Anthropic.ToolUseBlock). */ +interface ToolCall { + id: string + name: string + input: unknown +} + +// --- Read tool handlers (delegation only) --- + +export const READ_TOOL_HANDLERS: Record<string, (input: any) => Promise<unknown>> = { + get_budget: async (input) => { + const period = typeof input?.period === 'string' ? input.period.trim() : '' + return await getBudgetReport(period) + }, + get_transactions: async (input) => { + return await getTransactionList({ + startDate: typeof input?.startDate === 'string' ? input.startDate : undefined, + endDate: typeof input?.endDate === 'string' ? input.endDate : undefined, + account: typeof input?.account === 'string' ? input.account : undefined, + limit: typeof input?.limit === 'number' ? input.limit : undefined, + }) + }, +} + +// --- Proposed-action mapping (does NOT execute anything) --- + +/** Strip a stray leading "expenses:" so an envelope key matches the budget sub-account name. 
*/ +function toEnvelopeKey(raw: string): string { + return raw.trim().replace(/^expenses:/, '') +} + +function friendly(key: string): string { + return key.split(':').map(s => s.charAt(0).toUpperCase() + s.slice(1)).join(' / ') +} + +function fmt(amount: number): string { + return `$${amount.toFixed(2)}` +} + +/** + * Build a {@link ProposedAction} from a proposed-action tool call. Resolves the + * budget host (the asset account that owns the `:budget:` tree) server-side so + * the model only has to supply envelope keys + amounts, and builds the exact + * payload the existing assign/transfer endpoints expect. + * + * This creates a *proposal* only — it never writes to the journal. + */ +export async function toProposedAction(call: ToolCall): Promise<ProposedAction> { + const input = (call.input ?? {}) as Record<string, unknown> + const budgetBase = await resolveBudgetBase() + + if (call.name === 'assign_to_envelope') { + const rawEnvelopes = (input.envelopes ?? {}) as Record<string, unknown> + const envelopes: Record<string, number> = {} + for (const [k, v] of Object.entries(rawEnvelopes)) { + if (typeof v === 'number') envelopes[toEnvelopeKey(k)] = v + } + const physicalAccount = typeof input.physicalAccount === 'string' && input.physicalAccount.trim() + ? input.physicalAccount.trim() + : budgetBase + const summary = 'Assign ' + Object.entries(envelopes) + .map(([k, v]) => `${fmt(v)} to ${friendly(k)}`) + .join(', ') + return { + id: call.id, + kind: 'assign', + summary, + // date filled in by the client at commit time (uses the user's local date). + payload: { date: '', physicalAccount, envelopes }, + } + } + + // transfer_between_envelopes + const srcKey = toEnvelopeKey(String(input.sourceEnvelope ?? '')) + const dstKey = toEnvelopeKey(String(input.destinationEnvelope ?? '')) + const amount = typeof input.amount === 'number' ? 
input.amount : 0 + const summary = `Move ${fmt(amount)} from ${friendly(srcKey)} to ${friendly(dstKey)}` + return { + id: call.id, + kind: 'transfer', + summary, + payload: { + date: '', + sourceEnvelope: `${budgetBase}:budget:${srcKey}`, + destinationEnvelope: `${budgetBase}:budget:${dstKey}`, + amount, + }, + } +} + +// --- Tool definitions (stable, cache-controlled prefix) --- + +export const TOOLS: Anthropic.Tool[] = [ + { + name: 'get_budget', + description: + 'Read the current envelope budget: Ready-to-Assign plus each envelope\'s Assigned, Activity, and Available. Call this before stating any figure or proposing an assignment — never rely on numbers from earlier in the conversation.', + input_schema: { + type: 'object', + properties: { + period: { + type: 'string', + description: 'Optional hledger period (e.g. "2025-03", "this month"). Omit for all-time balances.', + }, + }, + }, + }, + { + name: 'get_transactions', + description: + 'List recent transactions (date, payee, amount, account) to answer questions about spending history.', + input_schema: { + type: 'object', + properties: { + startDate: { type: 'string', description: 'Optional start date, YYYY-MM-DD.' }, + endDate: { type: 'string', description: 'Optional end date, YYYY-MM-DD.' }, + account: { type: 'string', description: 'Optional account or envelope filter.' }, + limit: { type: 'integer', description: 'Max rows (default 50, most recent first).' }, + }, + }, + }, + { + name: 'assign_to_envelope', + description: + 'PROPOSE assigning Ready-to-Assign money into one or more envelopes. This creates a proposal the user must approve — it does NOT move money. Provide the envelope identifiers from get_budget (the part after "expenses:") and positive dollar amounts.', + input_schema: { + type: 'object', + properties: { + envelopes: { + type: 'object', + description: 'Map of envelope identifier (e.g. 
"rent", "food:groceries") to a positive dollar amount.', + additionalProperties: { type: 'number' }, + }, + }, + required: ['envelopes'], + }, + }, + { + name: 'transfer_between_envelopes', + description: + 'PROPOSE moving money from one envelope to another. This creates a proposal the user must approve — it does NOT move money. Use envelope identifiers from get_budget.', + input_schema: { + type: 'object', + properties: { + sourceEnvelope: { type: 'string', description: 'Envelope to move money from (e.g. "dining").' }, + destinationEnvelope: { type: 'string', description: 'Envelope to move money into (e.g. "food:groceries").' }, + amount: { type: 'number', description: 'Positive dollar amount to move.' }, + }, + required: ['sourceEnvelope', 'destinationEnvelope', 'amount'], + }, + // Cache breakpoint on the last (stable) tool definition: caches tools + system. + cache_control: { type: 'ephemeral' }, + }, +] diff --git a/server/utils/anthropic.ts b/server/utils/anthropic.ts new file mode 100644 index 0000000..867d340 --- /dev/null +++ b/server/utils/anthropic.ts @@ -0,0 +1,74 @@ +import Anthropic from '@anthropic-ai/sdk' +import { readStoredApiKey } from './aiConfig' + +/** + * Shared Anthropic client + request defaults for the AI features (budgeting chat + * #8, and CSV import #9 which reuses this module). + * + * The API key is resolved with the env var taking precedence over a key + * configured in-app via the Settings page (persisted by `aiConfig.ts`): + * + * process.env.ANTHROPIC_API_KEY → config/ai-config.json → none + * + * Env-first lets a Docker/CI deployment pin the key, while a local user can + * configure it in the UI without touching the environment or restarting. The + * key never leaves the server: only this module and the routes that import it + * touch it, and it is never logged. + */ + +/** Where the resolved key came from — surfaced to the Settings UI. 
*/ +export type ApiKeySource = 'env' | 'config' | 'none' + +/** Resolve the active API key (env override → stored), or undefined if none. */ +export function resolveApiKey(): string | undefined { + return process.env.ANTHROPIC_API_KEY?.trim() || readStoredApiKey() +} + +/** Where the active key comes from (for the Settings UI status line). */ +export function getApiKeySource(): ApiKeySource { + if (process.env.ANTHROPIC_API_KEY?.trim()) return 'env' + if (readStoredApiKey()) return 'config' + return 'none' +} + +/** Thrown when no API key is configured. Routes map this to a 503 + empty state. */ +export class MissingApiKeyError extends Error { + constructor() { + super('ANTHROPIC_API_KEY is not set') + this.name = 'MissingApiKeyError' + } +} + +/** The model for all AI features. Opus 4.8 — adaptive thinking, no sampling params. */ +export const MODEL = 'claude-opus-4-8' + +/** + * Shared request defaults. Spread into `messages.create`. + * - adaptive thinking: Opus 4.8 decides depth per turn (no `budget_tokens`). + * - effort `medium`: chat favors latency; bump to `high` if reasoning needs it. + * - `max_tokens` 4096: budget-chat replies are short; non-streaming stays well + * under the SDK HTTP-timeout threshold. + */ +export const REQUEST_DEFAULTS = { + model: MODEL, + max_tokens: 4096, + thinking: { type: 'adaptive' as const }, + output_config: { effort: 'medium' as const }, +} + +let client: Anthropic | null = null + +/** + * Return the shared Anthropic client, constructing it on first use. + * @throws {MissingApiKeyError} when no key is configured (neither env nor stored). + */ +export function getAnthropic(): Anthropic { + const apiKey = resolveApiKey() + if (!apiKey) throw new MissingApiKeyError() + // Cache, but rebuild if the key changed — so saving a new key in the Settings + // UI takes effect on the next request without a restart. 
+ if (!client || client.apiKey !== apiKey) { + client = new Anthropic({ apiKey }) + } + return client +} diff --git a/server/utils/budgetReport.ts b/server/utils/budgetReport.ts new file mode 100644 index 0000000..36a2715 --- /dev/null +++ b/server/utils/budgetReport.ts @@ -0,0 +1,190 @@ +import type { BudgetCategory, BudgetCategoryGroup, BudgetEnvelopeReport } from '../../types/ui' +import { stripAccountPrefix } from '../../utils/stripAccountPrefix' +import { singleQuantity, MultiCommodityError } from '../../utils/singleQuantity' +import { readFile } from 'node:fs/promises' +import { pathExists } from './fsExists' + +async function loadHiddenEnvelopes(): Promise<Set<string>> { + const path = 'config/hidden-envelopes.json' + if (!(await pathExists(path))) return new Set() + try { + const list = JSON.parse(await readFile(path, 'utf-8')) as string[] + return new Set(list) + } catch { + return new Set() + } +} + +/** + * Maps an expense account path to its corresponding budget sub-account name. + * e.g. "expenses:food:groceries" → "food:groceries" + */ +function expenseToBudgetKey(expenseAccount: string): string { + // Strip the "expenses:" prefix to get the category path + return expenseAccount.replace(/^expenses:/, '') +} + +/** + * Build the envelope budget report: Ready to Assign + per-category + * Assigned/Activity/Available, grouped. The single source of this computation, + * shared by `GET /api/budget` and the AI `get_budget` tool (Issue #8) so the + * report and the assistant can never disagree. + * + * @param period A validated hledger period expression, or '' for all-time. + * Callers (the route, the tool) validate/normalize before calling — this + * function assumes `period` is safe to pass to hledger. + */ +export async function getBudgetReport(period: string): Promise<BudgetEnvelopeReport> { + const pd = period + + // 1. 
Fetch ALL expense accounts, filter out hidden ones + const allAccountsRaw = await hledgerExecText(['accounts']) + const allAccounts = allAccountsRaw.trim().split(/\r?\n/).filter(Boolean).map(s => s.trim()) + const hiddenSet = await loadHiddenEnvelopes() + const expenseAccounts = allAccounts.filter(a => a.startsWith('expenses:') && !hiddenSet.has(a)) + + // Derive the budget base from the account list (Issue #4 item 3) — no extra + // hledger call. All budget sub-account queries/keys hang off this prefix + // instead of a hardcoded `assets:checking:budget:`. + const budgetBase = await resolveBudgetBase(allAccounts) + const budgetPrefix = `${budgetBase}:budget:` + const unallocatedAccount = `${budgetPrefix}unallocated` + const pendingPrefix = `${budgetPrefix}pending:` + + // 2. Fetch period-filtered expense activity (Activity column) + const expenseArgs = ['bal', 'expenses:'] + if (pd) expenseArgs.push('-p', pd) + const expenseRaw = await hledgerExec(expenseArgs) + const expenseReport = transformBalanceReport(expenseRaw) + + const activityMap = new Map() + for (const row of expenseReport.rows) { + activityMap.set(row.account, singleQuantity(row.amounts, `expense activity for ${row.account}`)) + } + + // 3. 
Fetch budget sub-account data and real account totals + // a) Cumulative budget balances (no period) → Available column + // b) Period-scoped budget delta (with period) → derive this month's Assigned + // c) Real account totals → compute Ready to Assign via YNAB Rule 1 + const budgetBalanceMap = new Map() // cumulative Available + const budgetPeriodDeltaMap = new Map() // period net change + let readyToAssign = 0 + + try { + // a) Cumulative balances — Available is the all-time running balance + const cumulativeArgs = ['bal', budgetPrefix] + const cumulativeRaw = await hledgerExec(cumulativeArgs) + const cumulativeReport = transformBalanceReport(cumulativeRaw) + + for (const row of cumulativeReport.rows) { + const account = row.account as string + if (account.startsWith(budgetPrefix) + && account !== unallocatedAccount + && !account.startsWith(pendingPrefix)) { + const categoryKey = account.slice(budgetPrefix.length) + budgetBalanceMap.set(categoryKey, singleQuantity(row.amounts, `budget balance for ${account}`)) + } + } + + // Ready to Assign (YNAB Rule 1) = net worth − money in envelopes. The single + // source of truth lives in server/utils/budgetData.ts and is shared with the + // assign availability gate, so the report and the gate can never disagree. + // Pass the data we already fetched so this adds only the real-balance read. 
+ readyToAssign = await getReadyToAssign({ budgetBase, cumulativeReport }) + + // b) Period-scoped delta — net change in budget sub-accounts this period + if (pd) { + const periodArgs = ['bal', budgetPrefix, '-p', pd] + const periodRaw = await hledgerExec(periodArgs) + const periodReport = transformBalanceReport(periodRaw) + + for (const row of periodReport.rows) { + const account = row.account as string + const delta = singleQuantity(row.amounts, `budget period delta for ${account}`) + if (account.startsWith(budgetPrefix) + && account !== unallocatedAccount + && !account.startsWith(pendingPrefix)) { + const categoryKey = account.slice(budgetPrefix.length) + budgetPeriodDeltaMap.set(categoryKey, delta) + } + } + } + } catch (err) { + // A multi-commodity account is a real error — surface it, don't mask it as $0. + if (err instanceof MultiCommodityError) throw err + // No budget sub-accounts yet — show $0 for everything (backward compatibility) + } + + // 4. Build categories from expense accounts, overlaying budget data + const groupMap = new Map() + + for (const accountPath of expenseAccounts) { + const isParent = expenseAccounts.some(a => a !== accountPath && a.startsWith(accountPath + ':')) + if (isParent) continue + + const activity = activityMap.get(accountPath) ?? 0 + const budgetKey = expenseToBudgetKey(accountPath) + + // Available = cumulative running balance (includes rollover from all prior periods) + const available = budgetBalanceMap.get(budgetKey) ?? 0 + + // Assigned = assignment amount, reverse-derived from the budget sub-account. + // Identity: budgetDelta = assigned − spent, and spent = activity (signed: + // an outflow is negative, a refund positive). So assigned = delta + activity + // with SIGNED activity. Using |activity| would invent a phantom assignment + // for refunds (a $20 refund would read as +$40 assigned). + let assigned: number + if (pd) { + const periodDelta = budgetPeriodDeltaMap.get(budgetKey) ?? 
0 + assigned = periodDelta + activity + } else { + // No period filter: all-time assigned = cumulative available + all-time activity. + assigned = available + activity + } + + const category: BudgetCategory = { + name: stripAccountPrefix(accountPath), + accountPath, + assigned, + activity, + available, + } + + const segments = accountPath.split(':') + const groupKey = segments[1] ?? '' + + if (!groupMap.has(groupKey)) { + groupMap.set(groupKey, []) + } + groupMap.get(groupKey)!.push(category) + } + + // 5. Build category groups with totals + const categoryGroups: BudgetCategoryGroup[] = [] + for (const [key, categories] of groupMap) { + const groupAssigned = categories.reduce((s, c) => s + c.assigned, 0) + const groupActivity = categories.reduce((s, c) => s + c.activity, 0) + const groupAvailable = categories.reduce((s, c) => s + c.available, 0) + + categoryGroups.push({ + name: key.charAt(0).toUpperCase() + key.slice(1), + categories, + assigned: groupAssigned, + activity: groupActivity, + available: groupAvailable, + }) + } + + const totalAssigned = categoryGroups.reduce((s, g) => s + g.assigned, 0) + const totalActivity = categoryGroups.reduce((s, g) => s + g.activity, 0) + const totalAvailable = categoryGroups.reduce((s, g) => s + g.available, 0) + + return { + period: pd, + readyToAssign, + categoryGroups, + totalAssigned, + totalActivity, + totalAvailable, + } satisfies BudgetEnvelopeReport +} diff --git a/server/utils/transactionList.ts b/server/utils/transactionList.ts new file mode 100644 index 0000000..9cbc645 --- /dev/null +++ b/server/utils/transactionList.ts @@ -0,0 +1,79 @@ +import { isValidDate, isValidAccount } from './hledgerArgs' +import type { HledgerTransaction, HledgerPosting } from '../../types/hledger' + +export interface TransactionListEntry { + date: string + payee: string + /** Net signed amount of the posting (outflow negative, inflow positive). 
*/ + amount: number + account: string +} + +export interface TransactionListQuery { + startDate?: string + endDate?: string + account?: string + /** Max entries returned, most-recent-first. Defaults to 50 to bound tokens. */ + limit?: number +} + +/** Sum a posting's amounts to a single number (single-commodity `$` is the norm). */ +function postingAmount(amounts: { quantity: number }[]): number { + return amounts.reduce((s, a) => s + a.quantity, 0) +} + +/** + * Compact transaction list for the AI `get_transactions` tool (Issue #8). + * + * Shapes hledger `print` output into `{date, payee, amount, account}` rows the + * model can reason over. To keep it relevant to budgeting, we surface the + * category legs (`expenses:` / `income:`) of each transaction; a transaction + * with no category leg (e.g. an account-to-account transfer) falls back to its + * non-budget legs so it isn't silently dropped. Most-recent-first, capped. + * + * Read-only and delegation-only: no accounting math here; it reuses + * `hledgerExec` + `transformTransactions` and inherits their CRLF/cents handling. + * Query params are validated (Issue #2) and the account is passed after `--` so + * it can never be read as an hledger flag. + * + * @throws if a date or account query is malformed. + */ +export async function getTransactionList(query: TransactionListQuery = {}): Promise<TransactionListEntry[]> { + const sd = query.startDate?.trim() || '' + const ed = query.endDate?.trim() || '' + const acct = query.account?.trim() || '' + const limit = query.limit && query.limit > 0 ? 
query.limit : 50 + + if (sd && !isValidDate(sd)) throw new Error('Invalid startDate; expected YYYY-MM-DD') + if (ed && !isValidDate(ed)) throw new Error('Invalid endDate; expected YYYY-MM-DD') + if (acct && !isValidAccount(acct)) throw new Error('Invalid account query') + + const args = ['print'] + if (sd) args.push('-b', sd) + if (ed) args.push('-e', ed) + if (acct) args.push('--', acct) + + const raw = await hledgerExec(args) + const transactions: HledgerTransaction[] = transformTransactions(raw as any[]) + + const entries: TransactionListEntry[] = [] + for (const tx of transactions) { + const categoryLegs = tx.postings.filter( + (p: HledgerPosting) => p.account.startsWith('expenses:') || p.account.startsWith('income:'), + ) + const legs = categoryLegs.length > 0 + ? categoryLegs + : tx.postings.filter((p: HledgerPosting) => !p.account.includes(':budget:')) + for (const p of legs) { + entries.push({ + date: tx.date, + payee: tx.description, + amount: postingAmount(p.amounts), + account: p.account, + }) + } + } + + // print is chronological; most-recent-first, then cap to bound token usage. + return entries.reverse().slice(0, limit) +} diff --git a/types/ai.ts b/types/ai.ts new file mode 100644 index 0000000..b91fee9 --- /dev/null +++ b/types/ai.ts @@ -0,0 +1,71 @@ +// Wire + UI types for the AI budgeting chat (Issue #8). +// +// The Anthropic conversation history (`messages`) and the read-tool results +// crossing the wire are typed as `unknown[]` here so the client stays decoupled +// from the Anthropic SDK; `server/api/ai/chat.post.ts` casts them to the SDK's +// `MessageParam[]` / `ToolResultBlockParam[]` at its validated boundary. The +// client never renders the history raw — it reads `reply` + `proposedActions`. 
+ +export interface AssignProposalPayload { + date: string + physicalAccount: string + envelopes: Record<string, number> +} + +export interface TransferProposalPayload { + date: string + sourceEnvelope: string + destinationEnvelope: string + amount: number +} + +/** A money-moving action the assistant proposes; committed only on user approval. */ +export type ProposedAction = + | { id: string; kind: 'assign'; summary: string; payload: AssignProposalPayload } + | { id: string; kind: 'transfer'; summary: string; payload: TransferProposalPayload } + +/** The user's decision on a proposed action, sent back to resume the tool loop. */ +export interface ChatResolution { + /** The `tool_use` id of the proposed action being resolved. */ + toolUseId: string + status: 'approved' | 'rejected' + /** Text fed back to the model, e.g. "Committed: …" or "User rejected this action". */ + resultText: string +} + +export interface AiChatRequest { + /** Opaque Anthropic MessageParam[] history; empty on the first turn. */ + messages: unknown[] + /** Present on a resume turn: the user's verdicts on pending proposed actions. */ + resolutions?: ChatResolution[] + /** + * Read-tool results the server computed in the same turn as the pending + * proposal (from {@link AiChatResponse.readToolResults}); echoed back on resume + * so every `tool_use` in that turn resolves together. Anthropic + * ToolResultBlockParam[]. + */ + readToolResults?: unknown[] + /** The new user message text. Omitted on a pure resume (approve/reject only). */ + message?: string +} + +export interface AiChatResponse { + /** Updated opaque history; echo back verbatim on the next request. */ + messages: unknown[] + /** Assistant's visible reply text for this turn. */ + reply: string + /** Non-empty when the turn is awaiting approval of one or more actions. 
*/ + proposedActions: ProposedAction[] + /** + * Anthropic ToolResultBlockParam[] already computed for read tools in the same + * turn as a pending proposal; echoed back on resume so every `tool_use` block + * resolves together and the protocol stays valid. + */ + readToolResults: unknown[] +} + +/** Local-only transcript entry the chat panel renders (never crosses the wire). */ +export interface ChatDisplayMessage { + role: 'user' | 'assistant' + text: string +}
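
The resume turn described by the `AiChatRequest` / `AiChatResponse` doc comments above can be sketched roughly as follows. This is a minimal illustration, not code from this diff: `buildResumeRequest` and `PendingAction` are hypothetical names, the sketch assumes a proposed action's `id` is the `tool_use` id that `ChatResolution.toolUseId` expects, and treating an action with no verdict as rejected is a choice made here for safety rather than documented behavior. It also assumes that, for an approved action, the client has already called the existing assign/transfer endpoint before building the request.

```typescript
// Trimmed copies of the wire types from types/ai.ts so the sketch is self-contained.
interface ChatResolution {
  toolUseId: string
  status: 'approved' | 'rejected'
  resultText: string
}

interface AiChatRequest {
  messages: unknown[]
  resolutions?: ChatResolution[]
  readToolResults?: unknown[]
  message?: string
}

// Simplified stand-in for ProposedAction: only the fields this sketch reads.
interface PendingAction {
  id: string
  summary: string
}

interface AiChatResponse {
  messages: unknown[]
  reply: string
  proposedActions: PendingAction[]
  readToolResults: unknown[]
}

type Verdict = 'approved' | 'rejected'

// Hypothetical helper (not part of the diff): turn the user's verdicts on a
// pending turn into a pure-resume request.
function buildResumeRequest(
  previous: AiChatResponse,
  verdicts: Record<string, Verdict>,
): AiChatRequest {
  const resolutions = previous.proposedActions.map((action): ChatResolution => {
    // Assumption: an action the user never decided on is treated as rejected.
    const status: Verdict = verdicts[action.id] ?? 'rejected'
    return {
      // Assumption: the proposal id doubles as the tool_use id.
      toolUseId: action.id,
      status,
      resultText: status === 'approved'
        ? `Committed: ${action.summary}`
        : 'User rejected this action',
    }
  })

  return {
    // History and read-tool results are echoed back verbatim, as the doc
    // comments require, so every tool_use in the turn resolves together.
    messages: previous.messages,
    resolutions,
    readToolResults: previous.readToolResults,
    // `message` is omitted: this is a pure resume (approve/reject only).
  }
}

// Usage: one approved proposal from a previous turn.
const previousTurn: AiChatResponse = {
  messages: [{ role: 'user', content: 'Move $20 from dining to groceries' }],
  reply: '',
  proposedActions: [{ id: 'toolu_example', summary: 'Move $20 from dining to groceries' }],
  readToolResults: [],
}
const resumeRequest = buildResumeRequest(previousTurn, { toolu_example: 'approved' })
console.log(JSON.stringify(resumeRequest.resolutions))
```

Keeping the helper free of any network call mirrors the safety invariant in the design doc: the request only reports what the user decided, and the write itself stays on the existing endpoints.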