diff --git a/.fctry/changelog.md b/.fctry/changelog.md
index 092f5a1..d810211 100644
--- a/.fctry/changelog.md
+++ b/.fctry/changelog.md
@@ -1,5 +1,134 @@
 # Setlist — Changelog
 
+## 2026-06-07T12:30:00Z — /fctry:ref app-it — Thirteen confirmed patterns from Christian-Katzmann/app-it adopted (0.33 → 0.34)
+- Summary: Incorporated thirteen confirmed patterns from `Christian-Katzmann/app-it` (https://github.com/Christian-Katzmann/app-it) into the spec. Hot zone was the freshly-specced §2.17 Bundled Skills (zero code on disk) — app-it landed as a second data point that refined the layout into parallel `.claude-plugin/` + `.codex-plugin/` manifests over a single `skills/` tree, with a `validate-skills-manifests` step folded into a new one-command `npm run validate` CI gate. Bootstrap (§2.13) gained named recipe strategy IDs, `.fctry/bootstrap-config.json` projections of registry state with `__APP_NAME__` / `__PORT__` / `__EMAIL_ACCOUNT__` placeholder substitution gated by a `no-unresolved-placeholders` pre-flight check, an `is_builtin: true` + `Duplicate to customize` fork-to-customize affordance on the five seeded built-ins, and an optional per-primitive `self_test` hook that routes pre-flight failures as `env_unavailable: ` vs `primitive_failure: `. Workspace inspection (§2.16) gained an additive `decision_report` view alongside the existing structural `workspace-inspection.v1` payload, plus a named "trust disk over docs" invariant surfaced as a `doc-disk-disagreement` gap. Health (§2.15) gained a per-dimension `needs_human_verification` bucket that does not silently degrade the composite tier (scoped explicitly to health, not bootstrap pre-flight). A new `setlist-doctor` MCP tool (59th tool) ships diagnose-only by default with `--json` for agents and a narrow `--fix-safe` mode confined to five named recoverables (broken managed MCP entries, stale port/pid files, ABI cache mismatches, malformed CLAUDE.md sentinel markers, orphaned bootstrap side-effect ledgers) — forbidden from editing project code, killing processes, or touching anything outside setlist artifacts. Project digests (§2.12 entity entry) gained an optional `decision_history` facet capped at 20 entries per project, preserved across `refresh_project_digest`. The `no runtime npm dependencies in shell-command primitives` invariant was lifted into §4.3 as a named constraint. The `.fctry/changelog.md` Dismissals convention (already in use) was formalized in §4.5 as the canonical anti-re-proposal record (one bullet per dismissed item, no per-decision ADR files). A cross-section note in §2.13 acknowledges that bootstrap's `.fctry/bootstrap-config.json` write, §2.14.3 MCP-clients install/remove, and §2.17 bundled-skills + CLAUDE.md injection share architectural concerns (foreign-file writes, backup/marker/recovery conventions) — flagged as a candidate for future consolidation, not refactored in 0.34. One pattern dismissed (Apple Event Cmd-Q test semantics — test-mechanism choice, not an experience contract). 16 new scenarios assigned S212–S227. Schema v18 adds `bootstrap_primitives.is_builtin` (renamed from `built_in_flag`), `bootstrap_primitives.self_test_json`, `bootstrap_primitives.forked_from`, and `project_digests.decision_history_json` in one atomic migration. Tool count 58 → 59.
+- Frontmatter: [modified] spec-version 0.33 → 0.34, date 2026-06-07. Status stays `active`. Synopsis `short` extended with named-strategy IDs, per-project bootstrap-config projection, shared external-file injection lineage, agent-decision summary view on workspace inspection, dual-manifest bundled skills (Claude Code + Codex from one tree), three-bucket health assessment with needs_human_verification, schema v18, tool count 59 (`setlist-doctor`). Patterns gain 19 new entries; goals gain 12 new entries.
+- Lead paragraph: [modified] Schema v18 promoted to current with the four optional column additions detailed; tool count 58 → 59 (new `setlist-doctor` tool with diagnose-only default + narrow `--fix-safe`); dual-manifest bundled skills with `validate-skills-manifests` step noted. Scenario count 211 → 227.
+- `#what-this-is` (1.2): [modified] @setlist/mcp description: 58 → 59 tools, added `setlist-doctor` mention. Database parenthetical: schema v17 / spec 0.33 → schema v18 / spec 0.34.
+- `#success` (1.4): [modified] 211-scenario → 227-scenario. [added] Two new success criteria: `npm run validate` one-command CI gate, `setlist-doctor --json` / `--fix-safe` for agent-actionable diagnosis.
+- `#project-bootstrap` (2.13): [added] Five new subsections — "Named recipe strategies" (stable strategy IDs propagated to bootstrap response, `bootstrap-config.json`, project digest `decision_history`, and `assess_health` provenance — linear-only in v1, branching deferred as a §4.3 boundary question); "`is_builtin: true` on built-in primitives, fork-to-customize" (read-only built-ins with `Duplicate to customize` affordance creating `forked_from` user copies); "Primitive self-test hooks (env-vs-bug routing)" (optional `self_test` hook routes pre-flight failures with the third `?` marker as `env_unavailable: ` vs `primitive_failure: `); "`.fctry/bootstrap-config.json` projection (NOT source of truth)" (engine writes a projection of registry state to the new project's folder with `__APP_NAME__` / `__PORT__` / `__EMAIL_ACCOUNT__` placeholders resolved at materialization, gated by `no-unresolved-placeholders` pre-flight); "Cross-section note on external-file injection lineage" (acknowledges the three foreign-file write surfaces share concerns, flags as future-consolidation candidate without refactoring 0.34).
+- `#mcp-clients` (2.14.3): [added] "`setlist-doctor` — diagnose-only by default, narrow `--fix-safe`" subsection — five-check report shape (`mcp-managed-entry-broken`, `worker-pid-stale`, `abi-cache-mismatch`, `claudemd-sentinel-malformed`, `bootstrap-ledger-orphan`), `--json` envelope for agents, `--fix-safe` scope ceiling (forbidden from editing project code, killing processes, touching anything outside setlist artifacts), append-only `~/.local/share/setlist/doctor.log` audit trail.
+- `#health-assessment` (2.15): [added] "Per-dimension `needs_human_verification` bucket (three-bucket verification)" — each dimension may be `assessed` / `unknown` / `needs_human_verification`; the third bucket honestly degrades to "needs review" rather than silently averaging into a lower color; explicitly scoped health-only, not bootstrap pre-flight.
+- `#workspace-inspection` (2.16): [added] Two new subsections — "Agent-decision report (decision-shaped summary)" (additive `decision_report` object on `workspace-inspection.v1` reports: `project_type_hint`, `port_literals` with provenance, `multi_app_structure`, `area_hint`, `last_touched_at`, opt-in `missing_mailbox_state`, one-line `summary_paragraph` — computed in the same pass, no second scan, backward-compatible additive); "Trust disk over docs (named invariant)" (manifest authoritative; doc-disk disagreement surfaces as a `{kind: "doc-disk-disagreement"}` gap entry).
+- `#bundled-skills` (2.17): [added] "Dual-manifest layout: `.claude-plugin/` + `.codex-plugin/` over one `skills/` tree" — parallel host manifests over a single skill source tree, with a `validate-skills-manifests` step in `npm run validate` that fails on version drift, listing drift, or orphan skill content. [modified] Lead paragraph reframed from "Two bundled Claude Code skills" to "Two bundled skills with parallel `.claude-plugin/` and `.codex-plugin/` manifests."
+- `#capabilities` (3.1): [modified] @setlist/mcp tool count 57 → 59; [added] `setlist-doctor` row in the tool list.
+- `#entities` (3.2): [modified] Bootstrap primitive entry — added `is_builtin` (renamed from `built_in_flag`), optional `forked_from` provenance pointer, optional `self_test` declaration; Project type recipe entry — added stable `strategy_id`; Project digests entry — added optional `decision_history` facet with shape, cap (20 entries), append-on-event semantics, and preservation across `refresh_project_digest`.
+- `#rules` (3.3): [modified] `built_in_flag` → `is_builtin` rule line updated; [added] Three new rule lines: per-primitive `self_test` declaration semantics + side-effect prohibition, recipe `strategy_id` propagation, `decision_history` preservation across refresh and bounded-cap behavior.
+- `#hard-constraints` (4.3): [modified] Schema v16 → v18; tool count 57 → 59. [added] "No runtime npm dependencies in `shell-command` primitives" invariant, "Trust disk over docs (inspection invariant)" — both named explicitly as forward-looking constraints.
+- `#anti-patterns` (4.4): [modified] Tool count reference 56 → 59.
+- `#testing-discipline` (4.5): [modified] Scenario count 211 → 227, with S212–S227 range named per pattern group. [added] "Rejected directions are logged, not silenced" paragraph formalizing the existing `Dismissals` section convention as the canonical anti-re-proposal record. [added] "`npm run validate` is the one-command CI gate" paragraph documenting the fan-out (`typecheck`, `test`, `build`, `verify:mcp-abi`, `validate-skills-manifests`). [added] `validate-skills-manifests` row in the layer-protection table.
+- `#schema` (5.2): [modified] Lead paragraph: v17 → v18, with v18 description (the four optional column additions on `bootstrap_primitives` and `project_digests`). [added] "Schema v18 (current)" subsection with full DDL deltas and atomic v17 → v18 migration plan (table-rebuild on `bootstrap_primitives` to rename `built_in_flag` → `is_builtin` and add the two nullable columns; one `ALTER TABLE` on `project_digests` to add `decision_history_json`; FK preservation via legacy_alter_table; idempotent re-run; refusing-to-downgrade preserved). Demoted "Schema v17 (current)" to "Schema v17." Tables list entries for `bootstrap_primitives` and `project_digests` updated to reflect the new column shapes.
+- `#inspirations` (6.1): [added] Major new entry for `Christian-Katzmann/app-it` documenting all thirteen adopted patterns with section-target mapping and per-pattern source spans, the one dismissed pattern (Apple Event Cmd-Q test semantics) with rationale, and the four setlist-specific adaptations from app-it's shape (strategy IDs select linear recipes only — branching deferred; `--fix-safe` scope ceiling tighter; three-bucket verification scoped health-only; dismissed-log consolidated into the changelog instead of per-decision ADR files).
+- `#appendix-d`: [modified] Tool count header 57 → 59. [added] "Diagnostics" category with the `setlist-doctor` row.
+- config.json: [modified] versions.spec.current 0.33 → 0.34.
+- Scenarios: [added] S212 (agent-decision report on workspace inspection — additive `decision_report` object alongside structural payload), S213 (`setlist-doctor` diagnose-only-by-default with `--json` envelope across the five check categories), S214 (per-dimension `needs_human_verification` bucket on health, scoped health-only), S215 (named recipe strategy IDs propagate to bootstrap response + `bootstrap-config.json` + digest history + health provenance, with rename-preserves-history semantics), S216 (`is_builtin` marker + `Duplicate to customize` fork creates `forked_from` user copy without mutating the built-in, write-rejection on built-ins, swap-per-recipe-step semantics), S217 (primitive `self_test` discriminates env-vs-bug at pre-flight with the new `?` marker for `env_unavailable`, side-effect prohibition by convention), S218 (`.fctry/bootstrap-config.json` written as projection of registry state not source of truth — `__APP_NAME__` / `__PORT__` / `__EMAIL_ACCOUNT__` placeholders resolved, `_generated_from` provenance, regeneratable from registry), S219 (`no-unresolved-placeholders` pre-flight gate aborts bootstrap before side effects when a `__SOMETHING__` typo survives substitution), S220 (parallel `.claude-plugin/` + `.codex-plugin/` manifests over one `skills/` tree), S221 (`validate-skills-manifests` fails the build on version drift, listing drift, or orphan skill content), S222 (`npm run validate` one-command CI gate fans out to typecheck + test + build + verify:mcp-abi + validate-skills-manifests with one exit code), S223 (workspace inspection surfaces `doc-disk-disagreement` gap; manifest-derived hints authoritative over CLAUDE.md prose), S224 (`decision_history` facet on project digests accumulates bounded — 20-entry cap, oldest dropped — and is preserved across `refresh_project_digest`), S225 (cross-section acknowledgment of external-file-injection lineage in §2.13, §2.14.3, §2.17 without refactoring 0.34), S226 (dismissed directions recorded in `.fctry/changelog.md` as a `Dismissals` section, one bullet per item, no per-decision ADR files), S227 (`setlist-doctor --fix-safe` is bounded to the five named recoverables and forbidden from editing project code, killing processes, or touching anything outside setlist artifacts — paired with S213 as the diagnose-only-default counterpart). Net +16 scenarios; total 211 → 227.
+- Build state: spec is now ahead of code across the entire 0.34 surface — the dual-manifest packaging refactor of `packages/skills/`, the `validate-skills-manifests` script, the new `setlist-doctor` MCP tool (59th tool) and matching `setlist doctor` CLI command, the schema v18 migration (rename `built_in_flag` → `is_builtin` via table rebuild, add `self_test_json`, `forked_from`, `decision_history_json`), the named-strategy IDs across the recipe runner / digest history / health provenance, the `is_builtin` + `Duplicate to customize` UI in Settings → Primitives, the primitive `self_test` pre-flight router with the third `?` marker, the `.fctry/bootstrap-config.json` projection write + `no-unresolved-placeholders` gate, the agent-decision `decision_report` extension on `WorkspaceInspectionReport`, the doc-disk-disagreement gap surfacing, the per-dimension `needs_human_verification` bucket on `assess_health`, the `decision_history` facet accumulator, the `npm run validate` umbrella script wiring up the existing checks + `validate-skills-manifests`. Run `/fctry:execute` to catch up.
+
+Dismissals (recorded for traceability):
+- **Pattern 14 (Apple Event Cmd-Q test semantics):** dismissed — test-mechanism choice, not an experience contract. The hide-on-close vs Cmd-Q user behavior is already specced and shipped (S169–S179); how the test is written is left to the test author. Putting it in the spec is over-specifying — the user-facing behavior is the contract; the test mechanism is not. Source span: `SKILL.md:32-34, 174-178`; `references/troubleshooting.md:69`. Evidence: verified (1.0).
+
+
+
+Reference prominence (per Spec Writer protocol):
+- **Panel**: Pattern 3 (dual-manifest plugin packaging) cited alongside §2.17 section text it shaped; Pattern 4 (setlist-doctor) cited alongside §2.14.3 section text it shaped.
+- **Inline**: Patterns 1 (agent-decision report) in §2.16; Pattern 2 (named strategies) in §2.13; Pattern 5 (three-bucket health) in §2.15; Pattern 6 (is_builtin fork-to-customize) in §2.13 and §3.2; Pattern 7 (primitive self-test) in §2.13; Pattern 8 (bootstrap-config.json projection) in §2.13; Pattern 9 (npm run validate) in §4.5; Pattern 11 (decision_history) in §3.2; Pattern 12 (rejected-log) in §4.5.
+- **Footnote**: Pattern 10 (trust-disk invariant) in §2.16 + §4.3 as background informing wording; Pattern 13 (no-runtime-deps) in §4.3 as background informing wording. Both patterns also referenced in the main `app-it` entry in §6.1.
+
+(13 modified, 7 added [setlist-doctor new subsection in §2.14.3; needs_human_verification subsection in §2.15; agent-decision-report subsection in §2.16; trust-disk subsection in §2.16; dual-manifest layout subsection in §2.17; npm run validate subsection in §4.5; rejected-log subsection in §4.5; schema v18 subsection in §5.2; setlist-doctor row in Appendix D; major new app-it entry in §6.1; +16 scenarios S212-S227], 0 removed across spec sections; 0 structural removals; minor cosmetic drift fixed [§3.1 tool count "currently 57" → 59])
+
+
+- Summary: Incorporated nine confirmed patterns from `ePaint/directory-mcp` (https://github.com/ePaint/directory-mcp) into the spec — proactive-use server `instructions` directive, open-vocabulary normalization with the new `vocab` MCP tool, idempotent writes documented per-collapse-key across every write surface, append-only `interactions` table with derived recency/frequency and an explicit retention policy (schema v17), `{ambiguous: true, alternatives: [...]}` envelope on identity queries, bundled `/setlist-enroll-project` and `/setlist-portfolio-graph` Claude Code skills with an idempotent installer that also injects a sentinel-marker rule into `~/.claude/CLAUDE.md`, verb-shaped tool naming guideline for additions (footnote), and one parametrized fixture per storage interface (footnote, forward-looking). One pattern dismissed (bounded-context `api.py` + `internal/` package shape — conflicts with TypeScript npm-workspace conventions). Cosmetic drift fixed in passing: §2.11 tool count "currently 56" → "currently 58" (was 57 + new `vocab`). 15 new scenarios assigned S197–S211. Schema v17 adds the `interactions` table.
+- `#design-principles` (1.3): [modified] Added five new principles — Fire-and-forget idempotent writes, Open-vocabulary with canonical normalization, Derived recency never stored counters, Ambiguous not silent on identity disambiguation. All cite `directory-mcp` inline as the precedent.
+- `#capability-declarations` (2.11): [modified] Rewrote mechanism #1 (server `instructions`) from descriptive to proactive-use directive in imperative voice (USE PROACTIVELY / LOOK UP FIRST / CAPTURE OPPORTUNISTICALLY / STAY CONSISTENT). [added] Three new subsections: "Open-vocabulary fields with canonical normalization" (slug + alias-reverse-map at write boundary for `tech_stack` / `patterns` / `topics` / `capability_type`, plus the new `vocab` MCP tool surfacing canonical + in-use + aliases), "Idempotent writes and collapse keys" (per-surface table of collapse keys with the `retain`-reinforces-on-repeat exception documented), and "Naming guideline for new MCP tools" (verb-shaped agent-intention names for 0.33+ additions only — existing 57 not renamed). Fixed cosmetic drift "(currently 56)" → "(currently 58)" in MCP tools self-registration block.
+- `#cross-project` (2.9): [added] "Ambiguous response envelope" subsection — `{result, ambiguous, alternatives}` shape with ~15% relative-gap threshold, up to four alternatives with `{name, score, why}`, strictly-additive contract so existing callers see the same `result` field, LLM consumer decides what to do with the alternatives. Cited `directory-mcp` inline.
+- `#portfolio-memory` (2.12): [added] "Append-only interaction log" subsection after the admin tools list — row shape `(id, project_id, surface, query, at, session_id, agent_role)`, three derived signals (`MAX(at)` per project, COUNT-over-window, recall ranking recency boost), retention policy (90 days OR 10k rows per project, hard-delete, configurable via `configure_memory`), no-write-path-cost guarantee. Schema v17 reference.
+- `#bundled-skills` (2.17): [added] new top-level section after §2.16 — two bundled skills (`/setlist-enroll-project` walking the four-step onboarding, `/setlist-portfolio-graph` rendering area → project → capability tree), installer contract (honors `CLAUDE_CONFIG_DIR`, rewrites absolute paths at copy time, idempotent re-install, preserves `.local.md` overrides, `--no-skills` opt-out), CLAUDE.md rule injection with sentinel markers `` / ``, idempotency contract (missing-pair → append; matching → no-op; drifted → replace; one-marker → refuse with warning), `--no-rule` opt-out. Cited `directory-mcp` inline.
+- `#entities` (3.2): [added] Three new entities — Interactions (append-only log rows), Bundled Claude Code skills (editorial defaults shipped in npm packages, copied by installer), CLAUDE.md sentinel block (idempotent rule injection between markers).
+- `#rules` (3.3): [added] Eleven new rules covering: open-vocabulary normalization at write boundary with unknown-terms-pass-through; `vocab` is read-only and does not write to `interactions`; every write surface idempotent on documented collapse key; every read surface (except `vocab`) writes one `interactions` row including failures; retention policy; ambiguous-envelope additive contract; proactive-use directive on `instructions`; verb-shaped naming guideline for new tools; installer idempotency + path-rewriting + `.local.md` preservation; CLAUDE.md sentinel-block idempotency.
+- `#testing-discipline` (4.5): [modified] Scenario count 168 → 211 with S197–S211 ranges named by pattern group. [added] Forward-looking "one parametrized fixture per storage interface" discipline note (footnote-prominence reference to `directory-mcp`).
+- `#schema` (5.2): [modified] Lead paragraph: v16 → v17, with v17 description (interactions table for derived recency/frequency). Tables count 24 → 25. [added] "Schema v17 (current)" subsection with full DDL (`interactions` table + two indexes) and v16 → v17 migration plan (CREATE TABLE + indexes, no data migration, idempotent re-run, schema_version bump to 17). Demoted "Schema v16 (current)" to "Schema v16."
+- `#inspirations` (6.1): [added] Major new entry for `ePaint/directory-mcp` documenting all nine adopted patterns with section-target mapping, the one dismissed pattern with rationale, and the three setlist-specific adaptations (interaction-log retention policy, installer `.local.md` preservation + opt-outs, existing 57 tools not renamed).
+- Lead paragraph: [modified] Schema v17 promoted to current with interactions-table rationale. Tool count 57 → 58 (new `vocab` tool). Scenario count 168 → 211. [added] mention of two bundled Claude Code skills and the idempotent installer.
+- §1.2 #what-this-is: [modified] @setlist/mcp description: 57 → 58 tools, added proactive-use directive note. Database parenthetical: schema v16 / spec 0.29 → schema v17 / spec 0.33.
+- §1.4 #success: [modified] 168-scenario → 211-scenario.
+- Table of Contents: [modified] Added §2.17 entry under §2.
+- Frontmatter: [modified] spec-version 0.32 → 0.33, date 2026-05-29 → 2026-06-07. Status stays `active` (this ref evolve mutates the spec). Synopsis `short` extended with "two bundled Claude Code skills with idempotent installer + CLAUDE.md sentinel-block rule, proactive-use server directive, open-vocabulary normalization with `vocab` tool, idempotent writes with documented collapse keys, append-only `interactions` log with derived recency, ambiguous response envelope on identity queries." Patterns gain sixteen new entries (`proactive-use-directive-instructions`, `open-vocabulary-with-canonical-normalization`, `verb-shaped-tool-names`, `idempotent-writes-with-documented-collapse-keys`, `append-only-interaction-log-derived-recency`, `ambiguous-envelope-on-identity-queries`, `bundled-claude-code-skills`, `idempotent-claudemd-rule-injection-between-markers`, `parametrized-test-fixture-per-storage-interface`, `fire-and-forget-write-semantics`, `slug-plus-alias-reverse-map`, `lookup-first-capture-opportunistically`, `derived-not-stored-counters`, `retention-policy-on-append-only`, `llm-consumer-decides-on-ambiguity`, `sentinel-marker-injection`). Goals gain seven new entries (`proactive-agent-use-of-registry`, `open-vocabulary-with-normalization`, `fire-and-forget-idempotent-writes`, `derived-recency-and-frequency-from-event-log`, `ambiguous-not-silent-disambiguation`, `bundled-agent-onboarding-skills`, `idempotent-claudemd-rule`).
+- config.json: [modified] versions.spec.current 0.32 → 0.33.
+- Scenarios: [added] S197 (proactive-use directive on initialize), S198 (CLAUDE.md rule injection idempotent across three states + opt-out + CLAUDE_CONFIG_DIR), S199 (write-time normalization of casing/alias/hyphenation variants on the four open-vocab fields, unknown terms pass through), S200 (`vocab` tool surfaces canonical + in_use + aliases, no interactions write, 58 tools total), S201 (round-trip normalization through write_fields and get_project), S202 (repeated identical write is no-op across all six write surfaces with documented collapse keys), S203 (retain reinforces on repeat — documented idempotency exception, `is_static=true` suppresses), S204 (every read surface writes one interactions row including failures, vocab excepted), S205 (recency/frequency derived by query, no mutable counter columns), S206 (retention policy: 90 days OR 10k rows per project, hard-delete, configurable), S207 (full project delete cascades to interactions; archive does not), S208 (ambiguous envelope on ~15% relative-gap threshold with `{result, ambiguous, alternatives}` shape), S209 (LLM consumer decides — registry never drops candidates), S210 (installer copies bundled skills with rewritten absolute paths, honors `CLAUDE_CONFIG_DIR`), S211 (re-running installer is idempotent and preserves `.local.md` overrides). Net +15 scenarios; total 196 → 211. 
+- Build state: spec is now ahead of code across the entire 0.33 surface — the proactive-use directive rewrite (one-line `Server` constructor change), the new `vocab` MCP tool (58th tool with the canonical-set + alias-reverse-map normalizer in `@setlist/core`), write-time normalization at the boundaries of `enrich_project` / `write_fields` / `register_capabilities`, idempotency contracts documented in MCP tool descriptions, the schema v17 migration (interactions table + two indexes), the interaction-log write on every read surface, derived recency/frequency in `portfolio_brief` and recall ranking, the retention policy in reflection, the ambiguous envelope on `search_projects` / `recall` / `cross_query`, the bundled skills + `npm run install:skills` installer with `CLAUDE_CONFIG_DIR` + path-rewriting + `.local.md` preservation + `--no-skills` / `--no-rule` flags, and the CLAUDE.md sentinel-block injection. Run `/fctry:execute` to catch up.
+
+Dismissals (recorded for traceability):
+- **Pattern 10 (Bounded-context `api.py` + `internal/` package shape):** dismissed — different language ecosystem (Python vs TypeScript), and setlist already has package boundaries via npm workspaces with an established convention. Adopting this would conflict more than help. Source span: `directory/` (whole repo); CLAUDE.md "Conventions" first bullet. Evidence: unverified (0.3).
+
+
+
+Reference prominence (per Spec Writer protocol):
+- **Panel**: Patterns 1 (proactive-use directive), 2 (open-vocab normalization) — cited alongside the section text they shaped in §2.11.
+- **Inline**: Patterns 4 (idempotent writes), 5 (interactions log), 6 (ambiguous envelope), 7 (bundled skills), 8 (CLAUDE.md rule injection) — cited inline in relevant sections (§2.11, §2.12, §2.9, §2.17).
+- **Footnote**: Patterns 3 (verb-shaped tool names), 9 (parametrized fixture) — background only, footnote-prominence references in §2.11 and §4.5 respectively. Both patterns also referenced in the main `directory-mcp` entry in §6.1.
+
+(11 modified, 5 added [new principles in 1.3; vocab tool + collapse-key table + open-vocab subsection in 2.11; ambiguous envelope subsection in 2.9; interactions log subsection in 2.12; new §2.17 bundled-skills section; schema v17 subsection in 5.2; major new directory-mcp entry in §6.1; +15 scenarios S197-S211; +3 entities in §3.2; +11 rules in §3.3], 0 removed across spec sections; 1 structural addition [§2.17 new section/alias `#bundled-skills`]; cosmetic drift fixed [§2.11 tool count "currently 56" → 58])
+
 ## 2026-05-29T20:30:00Z — /fctry:evolve workspace-inspection — Bounded local workspace inspection with code/workspace/empty/unavailable kinds and sanitized signals (0.31 → 0.32)
 - Summary: Documented the workspace inspection subsystem shipped in `packages/core/src/workspace-inspection.ts` plus the `source-adapters/` directory (~8k + ~20k LoC). Inspection runs at three moments — when the user picks a folder in Register Project (preview panel), automatically at `register_project` time (`sourceInspection` attached to the project record), and on demand for any registered project or arbitrary candidate path via `inspectProject` / `inspectWorkspaceCandidate`. Every inspection resolves to one of four kinds (`code`, `workspace`, `empty`, `unavailable`) and extracts a kind-specific signal payload with explicit bounds and credential sanitization. Behavioral contract: scenarios S189–S196.
- `#workspace-inspection` (2.16): [added] new top-level section documenting the inspection contract end to end — the four kinds and four corresponding statuses, the Register Project dialog preview (headline + language summary + runtime hints + scripts + sanitized git signal + gaps footnote), the three other places inspection surfaces (auto-attach at registration, Overview tab Signals from disk panel, on-demand `inspectProject` / `inspectWorkspaceCandidate`), the code-path signal shape (package manager files, package metadata, scripts, dependency group summaries, likely languages, runtime hints from the documented eight-value set, framework hints, test/build/dev commands, git signals), the workspace-path signal shape (folder role, document categories, material hints), the sanitization rules (credentials stripped from URLs, env-var-with-value redacted from scripts, file contents never extracted in full, ~64 KB per-file metadata byte cap), the bounded-scan rules (top level + one level of recursion, 400-entry cap with `entry_limit_reached` gap, full ignored-directory list `.git` / `node_modules` / `dist` / `.venv` / `build` / `coverage` / `.cache` / `.next` / `.nuxt` / `.turbo` / `__pycache__` / `out` / `target` / `vendor` / `.hg` / `.svn`), and the schema_version field (`workspace-inspection.v1`) as the v1 contract floor. 
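The bounded-scan rules named in the `#workspace-inspection` (2.16) entry above (top level plus one level of recursion, a 400-entry cap that surfaces an `entry_limit_reached` gap, and the ignored-directory list) can be illustrated with a minimal sketch. This is not the code in `packages/core/src/workspace-inspection.ts`: the `Entry` type, the `boundedScan` function, and the in-memory tree are hypothetical stand-ins chosen so the example runs without touching disk, and the choice to apply the ignore list to directories only is an assumption.

```typescript
// Hypothetical in-memory stand-in for a folder listing; `children` present
// means "directory". The real inspector presumably reads from disk.
type Entry = { name: string; children?: Entry[] };

interface ScanResult {
  paths: string[];
  gaps: { kind: string }[];
}

// Ignored-directory list as quoted in the changelog entry.
const IGNORED_DIRS = new Set<string>([
  ".git", "node_modules", "dist", ".venv", "build", "coverage", ".cache",
  ".next", ".nuxt", ".turbo", "__pycache__", "out", "target", "vendor",
  ".hg", ".svn",
]);

// Entry cap as quoted in the changelog entry.
const ENTRY_CAP = 400;

function boundedScan(root: Entry[], cap: number = ENTRY_CAP): ScanResult {
  const paths: string[] = [];
  const gaps: { kind: string }[] = [];

  const visit = (entries: Entry[], prefix: string, depth: number): void => {
    for (const entry of entries) {
      // Stop everywhere once the cap has been hit.
      if (gaps.length > 0) return;

      const children = entry.children;
      // Assumption: only directories are matched against the ignore list.
      if (children !== undefined && IGNORED_DIRS.has(entry.name)) continue;

      if (paths.length >= cap) {
        // Record the truncation once instead of failing the scan.
        gaps.push({ kind: "entry_limit_reached" });
        return;
      }
      paths.push(prefix + entry.name);

      // depth 0 is the top level; descend exactly one level below it.
      if (children !== undefined && depth < 1) {
        visit(children, `${prefix}${entry.name}/`, depth + 1);
      }
    }
  };

  visit(root, "", 0);
  return { paths, gaps };
}
```

Under these assumptions a second-level directory is still listed by name but its contents are never read, and hitting the cap yields a partial result plus a gap entry rather than an error, which matches the "gaps surface as a footnote" treatment the changelog describes.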
diff --git a/.fctry/config.json b/.fctry/config.json
index 889e7f7..ff79429 100644
--- a/.fctry/config.json
+++ b/.fctry/config.json
@@ -2,7 +2,7 @@
 "versions": {
 "external": {
 "type": "external",
- "current": "0.6.1-beta.4",
+ "current": "0.6.1-beta.18",
 "propagationTargets": [
 {
 "file": "package.json",
@@ -33,7 +33,7 @@
 },
 "spec": {
 "type": "internal",
- "current": "0.32",
+ "current": "0.34",
 "propagationTargets": [
 {
 "file": ".fctry/spec.md",
diff --git a/.fctry/scenarios.md b/.fctry/scenarios.md
index 4391ccd..de39305 100644
--- a/.fctry/scenarios.md
+++ b/.fctry/scenarios.md
@@ -3675,3 +3675,612 @@ Validates: `#workspace-inspection` (2.16), `#desktop-app` (2.14)
 - Gaps (truncation, sanitization redactions, missing manifest) surface as the same footnote treatment used in the Register dialog
 
 Difficulty: medium
+
+---
+
+## directory-mcp Pattern Adoption (S197–S211)
+
+These scenarios cover the nine confirmed patterns adopted from `ePaint/directory-mcp` in spec 0.33 (see §6.1 `#inspirations`): proactive-use server instructions and CLAUDE.md rule injection, open-vocabulary normalization with the `vocab` tool, idempotent writes with documented collapse keys, the append-only `interactions` log with derived recency and retention, the ambiguous response envelope on identity queries, and the bundled Claude Code skills with their idempotent installer. Schema v17 introduces the `interactions` table.
+
+---
+
+## S197: Server Instructions Are a Proactive-Use Directive {#s197}
+**Given** a fresh MCP client connecting to setlist's MCP server for the first time in a session
+**When** the client completes the MCP `initialize` handshake
+**Then** the server returns an `instructions` paragraph shaped as a directive in imperative voice — naming "USE PROACTIVELY" and the three rules LOOK UP FIRST / CAPTURE OPPORTUNISTICALLY / STAY CONSISTENT — rather than a descriptive overview, and the model is more likely to consult the registry before asking the user about project identity.
+
+Validates: `#capability-declarations` (2.11) mechanism #1, post-0.33 rewrite
+
+**Satisfaction criteria:**
+- The MCP `initialize` response's `instructions` field contains the literal phrase "USE PROACTIVELY" (or equivalent imperative directive) at or near the top of the paragraph
+- The paragraph names the three rules: LOOK UP FIRST (call `get_project` or `search_projects` before asking the user about project identity), CAPTURE OPPORTUNISTICALLY (write back identity/capability/memory changes as they happen), STAY CONSISTENT (use `vocab` to align with the canonical vocabulary)
+- The four-step workflow (register → enrich → write fields → refresh digest), the capability item shape, and the pointer to `setlist://docs/onboarding` all still appear, but sit underneath the directive rather than at the top
+- The paragraph remains under ~150 words — the directive shape does not bloat it
+- The same paragraph is returned identically to every client — no client-detection branching
+- A test calling `initialize` against a fresh registry (zero projects) receives the same directive — the bootstrap path does not depend on portfolio state
+- An agent that loads the instructions in its session is observably more likely to call `search_projects` before asking the user "which project are we in?" — the directive influences agent behavior (this criterion is satisfied via LLM-judge evaluation of agent transcripts, not by a deterministic assertion)
+
+Difficulty: medium
+
+---
+
+## S198: CLAUDE.md Rule Injection Is Idempotent Between Sentinel Markers {#s198}
+**Given** a user who runs `npm run install:skills` (or accepts the desktop app's first-launch installer prompt) with their existing `~/.claude/CLAUDE.md` in one of three states — file missing, file present without setlist markers, file present with setlist markers from a previous install
+**When** the installer runs against each state, then runs again
+**Then** the file ends up with a single setlist sentinel block containing the current shipped rule, and re-running does not duplicate or corrupt the block.
+
+Validates: `#bundled-skills` (2.17) "Idempotency contract for the rule injection"
+
+**Satisfaction criteria:**
+- Run against a missing CLAUDE.md: the installer creates the file with a one-line header `# Claude Code instructions` and appends the full sentinel block (begin marker + rule content + end marker)
+- Run against a file with no markers: the installer appends the full block at the end of the file with a single leading blank line; existing content is unchanged
+- Run again immediately: the installer detects the matching block and exits without writing (no `updated_at` bump on the file, no duplicate block, no shifted whitespace)
+- Run with an older drifted version of the rule between matching markers: the installer replaces only the content between the markers with the current shipped rule; everything outside the markers is byte-identical
+- Run with exactly one of the two markers present (the user deleted the other): the installer refuses to write and logs a one-line warning naming the file and the missing marker
+- Run with `--no-rule`: the installer skips the CLAUDE.md write entirely and does not check for markers; existing content is byte-identical after the run
+- Honors `CLAUDE_CONFIG_DIR`: when set, the installer writes to `$CLAUDE_CONFIG_DIR/CLAUDE.md`, not `~/.claude/CLAUDE.md`
+- The rule content matches the literal text in `#bundled-skills` (2.17) — "Before asking the user which project this is, call `get_project` or `search_projects`."
+
+Difficulty: medium
+
+---
+
+## S199: Open-Vocabulary Fields Normalize Casing and Alias Variants at Write Time {#s199}
+**Given** an agent calling `enrich_project`, `write_fields`, or `register_capabilities` with open-vocabulary field values that differ only in casing, hyphenation, or known alias
+**When** the agent writes `tech_stack: ["TypeScript", "typescript", "ts"]`, `patterns: ["MCP Server", "mcp-server"]`, `topics: ["React.js", "reactjs"]`, or `capability_type: "MCP tool"`
+**Then** the stored values are normalized to canonical slugs (`typescript`, `mcp-server`, `react`, `tool`), duplicates collapse, and a follow-up read returns the canonical form.
+
+Validates: `#capability-declarations` (2.11) "Open-vocabulary fields with canonical normalization", `#rules` (3.3)
+
+**Satisfaction criteria:**
+- `enrich_project(name=..., tech_stack=["TypeScript", "typescript", "ts"])` stores `["typescript"]` — duplicates collapse, casing normalizes, alias `ts` maps to `typescript`
+- `register_capabilities(...)` with `capability_type: "MCP tool"` stores `tool` (or whatever the spec's canonical slug for the MCP-tool kind is)
+- A follow-up `get_project` returns the canonical slug, not the original variant the agent wrote
+- Novel terms not in the alias map pass through normalized (lowercase + hyphenate, non-alphanumeric stripped) but unchanged in meaning — `enrich_project(... 
topics=["claude-code"])` against an empty alias map stores `claude-code` +- Unknown terms are NEVER rejected — the registry accepts and stores the normalized form of anything the agent writes +- Pre-0.33 rows with non-canonical values (e.g., `TypeScript` stored before the normalizer existed) are not migrated automatically — they continue to read as `TypeScript` until the next write rewrites them through the normalizer +- Normalization happens once on the write path in `@setlist/core`; there is no second pass at read time + +Difficulty: medium + +--- + +## S200: The `vocab` MCP Tool Surfaces Canonical Set and In-Use Values {#s200} +**Given** a registry with several projects whose `tech_stack`, `patterns`, `topics`, and `capability_type` fields have accumulated values over time +**When** an agent calls `vocab(field="tech_stack")` (or for any of the four normalized fields) +**Then** the response returns `{field, canonical: [...], in_use: [{slug, count}, ...], aliases: {: [, ...]}}` — the canonical curated set, the live in-use values with occurrence counts, and the alias reverse-map. 
+ +Validates: `#capability-declarations` (2.11) "The `vocab` MCP tool" + +**Satisfaction criteria:** +- `vocab` is listed in the MCP server's tool list at handshake, bringing the visible total to 58 tools +- `vocab(field="tech_stack")` response has `field: "tech_stack"`, a `canonical` array of editorially-curated slugs, an `in_use` array of objects with `slug` and `count`, and an `aliases` object mapping canonical slugs to arrays of known aliases +- `in_use` counts reflect the live state of the registry — adding a project with `tech_stack: ["python"]` and re-calling `vocab` shows `python` with count incremented by one +- `aliases.typescript` includes `["TypeScript", "ts", "TS"]` (or whatever the seeded alias set is for that slug) +- Calling `vocab` does NOT write a row to `interactions` (per `#rules` 3.3 — `vocab` is read-only, no write-path cost beyond the SELECT) +- Calling `vocab(field="unknown_field")` returns a clear error naming the four supported fields (`tech_stack`, `patterns`, `topics`, `capability_type`) +- The tool description in the MCP server's schema names what `vocab` does and its idempotency contract (read-only, no side effects) + +Difficulty: medium + +--- + +## S201: Round-Trip Normalization Through write_fields and get_project {#s201} +**Given** an agent that writes a `patterns` array via `write_fields` with mixed casing and an alias +**When** the agent immediately reads the same project via `get_project` +**Then** the returned `patterns` array contains canonical slugs in stable order with duplicates collapsed. 
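The write-time normalization exercised by S199 and S201 could be sketched as follows. This is a minimal illustration under stated assumptions, not the `@setlist/core` implementation: `ALIASES`, `slugify`, and `normalizeVocab` are hypothetical names, the alias entries are chosen only to match the examples in these scenarios, and turning runs of non-alphanumeric characters into hyphens is one possible reading of "lowercase + hyphenate".

```typescript
// Hypothetical alias map: variant slug -> canonical slug (assumed entries).
const ALIASES = new Map<string, string>([
  ["ts", "typescript"],
  ["react-js", "react"],
  ["reactjs", "react"],
  ["repo-pattern", "repository-pattern"],
]);

// Lowercase, collapse runs of non-alphanumerics into one hyphen, trim hyphens.
function slugify(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Normalize each value, map known aliases to their canonical slug, and
// collapse duplicates while keeping first-seen order. Unknown terms pass
// through in normalized form; they are never rejected.
function normalizeVocab(values: string[]): string[] {
  const seen = new Set<string>();
  for (const raw of values) {
    const slug = slugify(raw);
    // Assumption of this sketch: input that normalizes to "" is skipped.
    if (slug === "") continue;
    seen.add(ALIASES.get(slug) ?? slug);
  }
  return Array.from(seen);
}
```

Running the normalizer once on the write path keeps reads cheap, which is consistent with the "no second pass at read time" criterion in S199.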
+ +Validates: `#capability-declarations` (2.11) "Write-time only" normalization + +**Satisfaction criteria:** +- `write_fields(project=..., patterns=["Repository Pattern", "repository-pattern", "repo-pattern"])` stores a single canonical slug (e.g., `repository-pattern`) +- `get_project(name=...)` returns `patterns: ["repository-pattern"]` (single entry, no duplicates) +- Writing again with the same input is a no-op on the `(project_id, field_name)` collapse key — no `updated_at` bump, no row UPDATE +- Writing with a different canonical slug (e.g., `["factory-pattern"]`) replaces the previous value per the §2.5 producer-isolation contract +- The canonical slug `repository-pattern` matches what `vocab(field="patterns")` lists in `in_use` after the write +- Reading the project via `list_projects` at standard depth returns the same canonical slug + +Difficulty: easy + +--- + +## S202: Repeated Identical Write Is a No-Op Across All Write Surfaces {#s202} +**Given** an agent making the same write twice in a row to each documented write surface (`register_project`, `register_capabilities`, `retain` with `is_static=true`, `write_fields`, `enrich_project`, `claim_port`) +**When** the second call lands with byte-identical arguments +**Then** the second call returns success without producing a duplicate row, without bumping `updated_at` where the value is unchanged, and without firing any side effect (no reinforcement, no audit-log churn beyond the standard interaction-log row, no version-chain inflation). 
+ +Validates: `#capability-declarations` (2.11) "Idempotent writes and collapse keys", `#design-principles` (1.3) "Fire-and-forget idempotent writes" + +**Satisfaction criteria:** +- `register_project(name="x")` called twice → one row in `projects` with the original `created_at`; the second call returns success but does not bump `updated_at` +- `register_capabilities(project="x", capabilities=[same set])` called twice → replace semantics observed once; the second call detects identity and does not UPDATE or change `updated_at` +- `retain(content="x", type="decision", project="x", is_static=true)` called twice → one row in `memories`; the second call recognizes the `(content_hash, project_id, scope)` collapse and is a no-op (the `is_static=true` suppresses reinforcement that would otherwise fire on plain `retain`) +- `write_fields(project="x", description="x")` called twice with the same description → one field row, no `updated_at` bump on the second call +- `enrich_project(project="x", goals=["g1", "g2"])` called twice → one merged goals array; union semantics collapse string-equal entries +- `claim_port(project="x", port=3000, service_label="dev")` called twice by the same project → one row, the second call returns success with `was_already_held: true` or equivalent — not a port-conflict error +- Each tool description in the MCP server's schema names its collapse key in the `description` field so the agent reading the tool list sees the idempotency contract per-tool + +Difficulty: medium + +--- + +## S203: Retain Reinforces on Repeat (Documented Idempotency Exception) {#s203} +**Given** an agent calling `retain` with the same content twice without `is_static=true` +**When** the second call lands +**Then** no new memory row is created (collapse key holds), but `reinforcement_count` increments and `last_reinforced` updates — this is documented intentional behavior, not a side-effect violation of the fire-and-forget contract. 
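The asymmetry between the static no-op in S202 and the reinforcing repeat in S203 can be illustrated with a small in-memory model. Everything named here (`MemoryStore`, `MemoryRow`, the default scope, the initial count of 1) is a hypothetical stand-in rather than the `@setlist/core` API. The spec commits only to "incremented by one per repeat", and the sketch keys on raw content where the spec keys on a content hash.

```typescript
interface MemoryRow {
  content: string;
  projectId: string;
  scope: string;
  reinforcementCount: number;
  createdAt: number;
  lastReinforced: number;
}

interface RetainOptions {
  isStatic?: boolean;
  scope?: string;
  now?: number; // injectable clock, for deterministic tests
}

class MemoryStore {
  private rows = new Map<string, MemoryRow>();

  // Assumed collapse key: (content, project, scope).
  private key(content: string, projectId: string, scope: string): string {
    return JSON.stringify([content, projectId, scope]);
  }

  retain(content: string, projectId: string, opts: RetainOptions = {}): MemoryRow {
    const scope = opts.scope ?? "project";
    const now = opts.now ?? Date.now();
    const k = this.key(content, projectId, scope);
    const existing = this.rows.get(k);
    if (existing === undefined) {
      // First write: create the row. Seeding the count at 1 is an assumption.
      const row: MemoryRow = {
        content, projectId, scope,
        reinforcementCount: 1, createdAt: now, lastReinforced: now,
      };
      this.rows.set(k, row);
      return row;
    }
    // Repeat write: never a new row. Reinforce unless the caller opted out.
    if (!opts.isStatic) {
      existing.reinforcementCount += 1;
      existing.lastReinforced = now;
    }
    return existing;
  }

  size(): number {
    return this.rows.size;
  }
}
```

In this model the collapse key holds in both modes; only the reinforcement side effect differs, which is the distinction the two scenarios draw.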
+ +Validates: `#capability-declarations` (2.11) "Idempotent writes and collapse keys" — the `retain` row of the collapse-key table + +**Satisfaction criteria:** +- `retain(content="x", type="decision", project="x")` called twice → one row in `memories` +- After the second call, `reinforcement_count` is 2 (or 1 if the first call seeded it at 0; the spec commits to "incremented by one per repeat," not to the initial value) +- `last_reinforced` (or equivalent timestamp) on the second call is later than `created_at` +- The `recall` ranking on subsequent calls reflects the reinforcement — the memory ranks slightly higher than it did after the first call +- The same `retain` with `is_static=true` does NOT reinforce — both rows in this scenario can verify the asymmetry (call with `is_static=true` twice → reinforcement count unchanged across calls) +- The tool description for `retain` names this behavior explicitly: "Repeated identical retains reinforce the existing memory rather than duplicate; pass `is_static=true` to suppress reinforcement on repeat" + +Difficulty: medium + +--- + +## S204: Every Read Surface Writes an Append-Only Interactions Row {#s204} +**Given** an agent calling each read surface in `@setlist/core` (`search_projects`, `recall`, `get_project`, `cross_query`, `query_capabilities`) +**When** the call returns (success or failure) +**Then** exactly one append-only row lands in the `interactions` table with the surface name, the query string (when applicable), the project_id (when known), the timestamp, and the calling context's session_id and agent_role (when available). 
+ +Validates: `#portfolio-memory` (2.12) "Append-only interaction log", `#rules` (3.3) + +**Satisfaction criteria:** +- `search_projects(query="registry")` produces one `interactions` row with `surface="search_projects"`, `query="registry"`, `at` set to the call timestamp +- `get_project(name="setlist")` produces one row with `surface="get_project"`, `query=NULL`, `project_id` set to setlist's row id +- `recall(query="...", project="setlist")` produces one row with `surface="recall"`, `query="..."`, `project_id` set to setlist's row id +- `cross_query` matching N projects produces N rows — one per matched project — not a single row +- A failed call (e.g., `get_project(name="does-not-exist")`) STILL produces an interactions row with `project_id=NULL` — failures are themselves a signal +- The write is a single synchronous SQLite INSERT and adds no measurable latency beyond the existing per-call floor (verified via timing assertion that 100 calls to a fast read surface complete within the existing latency budget) +- `vocab` is the documented read-only exception — calling `vocab` does NOT write a row to `interactions` + +Difficulty: medium + +--- + +## S205: Recency and Frequency Are Derived by Query, Never Stored {#s205} +**Given** a project with many interactions accumulated over time +**When** `portfolio_brief` reports the project's last-touched timestamp and recent-activity count +**Then** the values are computed by `MAX(at)` and `COUNT(*)` queries against the `interactions` table — there is no mutable counter column on `projects` or anywhere else. 
+ +Validates: `#portfolio-memory` (2.12) "Derived recency and frequency", `#design-principles` (1.3) "Derived recency, never stored counters" + +**Satisfaction criteria:** +- A schema inspection shows no `last_touched`, `touch_count`, `recency`, `frequency`, or equivalent mutable counter column on the `projects` table or any other table +- `portfolio_brief` includes per-project `last_touched` (computed as `SELECT MAX(at) FROM interactions WHERE project_id = ?`) and a recent-activity count (computed as `SELECT COUNT(*) FROM interactions WHERE project_id = ? AND at > datetime('now', '-7 days')`) +- Manually inserting a row into `interactions` for a project and re-calling `portfolio_brief` shows the updated `last_touched` and incremented count +- Manually deleting interactions for a project and re-calling `portfolio_brief` shows the recomputed (reduced) values — there is no stale cached counter to invalidate +- The recall scorer adds a small recency boost to memories whose `project_id` has been touched recently — the boost is bounded so it cannot dominate content relevance (verified by comparing recall results for a fresh project vs. a frequently-touched project where the content match is identical) +- The Home view's "Recently active" sort uses `MAX(at)` from `interactions`, joined per project + +Difficulty: medium + +--- + +## S206: Interactions Retention Policy Prunes Rows by Age and Per-Project Cap {#s206} +**Given** an `interactions` table with rows older than 90 days AND rows that exceed 10,000 per project on hot projects +**When** the background reflection cycle runs the interactions-retention prune step +**Then** rows older than 90 days OR exceeding 10,000 per project (whichever permits more) are hard-deleted, and `portfolio_brief` reflects the recomputed (reduced) signal. 
+ +Validates: `#portfolio-memory` (2.12) "Retention policy", `#rules` (3.3) + +**Satisfaction criteria:** +- A row inserted with `at = datetime('now', '-100 days')` is deleted by the retention prune step (older than the 90-day floor) +- A row inserted with `at = datetime('now', '-30 days')` survives (within the 90-day floor) +- A project with 12,000 interactions rows is reduced to 10,000 by the retention prune, with the oldest 2,000 deleted first +- A project with 50 rows over 100 days keeps ALL of them — the 10k-per-project cap is the more-permissive bound for sparse projects, the 90-day floor governs only when the project has fewer than 10k rows +- Pruning is hard DELETE, not soft delete (the rows are the source of truth for derived counters; soft-deleted rows would skew COUNT(*)) +- The retention values are configurable via `configure_memory`: `interactions_retention_days` (default 90), `interactions_max_rows_per_project` (default 10000) +- Setting `interactions_retention_days = 30` via `configure_memory` and re-running the prune deletes rows older than 30 days +- The prune runs as part of the reflection cycle, not on the hot read/write path + +Difficulty: medium + +--- + +## S207: Archiving a Project Cascades to Its Interaction History {#s207} +**Given** a project with hundreds of interaction-log rows +**When** the project is archived (`archive_project`) and subsequently fully deleted (admin-only path, not user-facing) +**Then** the `ON DELETE CASCADE` foreign key clears the project's interaction history along with its row — the recency signal for an archived project is no longer interesting and should not occupy storage. 
+ +Validates: `#portfolio-memory` (2.12) row-shape contract, `#schema` (5.2) interactions table DDL + +**Satisfaction criteria:** +- `archive_project(name="x")` does NOT delete interaction rows — archiving is soft, and the project row still exists with `status="archived"` +- Programmatic full-delete of the project row (the rare admin path, not exposed via MCP) cascades to `interactions` via the foreign key — all rows with `project_id = .id` are removed +- After delete-cascade, `portfolio_brief` does not include the deleted project in `enrichment_gaps`, recent activity, or any other signal +- Rows with `project_id = NULL` (cross-project queries that matched nothing, or surface-wide reads not pinned to a project) are NOT cascaded — they survive the delete and continue to contribute to portfolio-wide statistics +- The cascade behavior is documented in the v16 → v17 migration plan in `#schema` (5.2) + +Difficulty: easy + +--- + +## S208: Ambiguous Identity Query Returns an Explicit Envelope {#s208} +**Given** a registry with two or more projects whose names, descriptions, or topics produce closely-ranked matches for an identity-shaped query (e.g., "registry" matches both `setlist` and a legacy `project-registry-service` archive row, or "knowmarks" matches both `knowmarks` and `knowmarks-ios`) +**When** an agent calls `search_projects(query="registry")` or `cross_query(query="which project is the registry?")` +**Then** the response carries `{result: , ambiguous: true, alternatives: [{name, score, why}, ...]}` rather than silently picking the top hit. 
+ +Validates: `#cross-project` (2.9) "Ambiguous response envelope", `#rules` (3.3) + +**Satisfaction criteria:** +- A query producing two candidates whose scores are within ~15% of each other surfaces `ambiguous: true` in the response +- The top match still ranks first (the existing relevance + freshness + importance scoring is unchanged); the ambiguity signal sits ON TOP of the ranking, not replacing it +- The `alternatives` array contains up to four entries by default, each with `name`, `score`, and a one-line `why` (e.g., "matches description fragment `project registry`," "fuzzy name match distance 2") +- A query producing exactly one strong match (top score >> second by more than ~15%) surfaces `ambiguous: false` (or omits the field) and an empty (or omitted) `alternatives` array — unambiguous responses are not bloated with the envelope +- Callers that ignore `ambiguous` and `alternatives` see the same `result` field they always did — the contract is strictly additive +- The same envelope appears on `search_projects` (name-shaped), `recall` (when the project scope itself resolves ambiguously), and `cross_query` (natural-language identity questions) +- The threshold is a relative gap, not an absolute floor — a query where all scores are low but the top is still clearly ahead is unambiguous; a query where all scores are high but the top two are tied is ambiguous + +Difficulty: medium + +--- + +## S209: LLM Consumer Decides — Registry Never Drops Candidates Silently {#s209} +**Given** an MCP agent processing an ambiguous response from `search_projects` +**When** the agent receives `{result: A, ambiguous: true, alternatives: [B, C]}` and decides (per its own policy) to ask the user "did you mean A, B, or C?" +**Then** the registry has done its part by surfacing the alternatives; the consumer policy (silent pick, ask the user, render a UI picker) is out of scope for the registry. 
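The relative-gap rule from S208 might look like the sketch below. The names `Candidate`, `Envelope`, and `toEnvelope` are hypothetical, scores are assumed to be non-negative, and both the gap formula (`top - other <= 0.15 * top`) and the choice to list only in-gap candidates as alternatives are one plausible reading of "within ~15% of each other", not something the spec states.

```typescript
interface Candidate {
  name: string;
  score: number;
  why: string;
}

interface Envelope {
  result: Candidate | null;
  ambiguous: boolean;
  alternatives: Candidate[];
}

function toEnvelope(
  candidates: Candidate[],
  relativeGap = 0.15,
  maxAlternatives = 4,
): Envelope {
  // Rank a copy so the caller's array is left untouched.
  const ranked = candidates.slice().sort((a, b) => b.score - a.score);
  if (ranked.length === 0) {
    return { result: null, ambiguous: false, alternatives: [] };
  }
  const top = ranked[0];
  // A runner-up is "close" when its distance from the top score is within
  // the relative gap. The test is relative, so low absolute scores alone
  // never trigger ambiguity.
  const close = ranked
    .slice(1)
    .filter((c) => top.score - c.score <= relativeGap * top.score);
  return {
    result: top, // ranking is unchanged; the envelope is additive
    ambiguous: close.length > 0,
    alternatives: close.slice(0, maxAlternatives),
  };
}
```

A consumer that reads only `result` sees the same top hit it always did, so either policy described in S209 (silent pick or asking the user) works against this shape.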
+ +Validates: `#cross-project` (2.9) "Ambiguous response envelope", `#design-principles` (1.3) "Ambiguous, not silent, on identity disambiguation" + +**Satisfaction criteria:** +- The response envelope is the same regardless of which client consumes it (Claude Code, Codex, a custom Node script, the desktop app's command palette) — the registry does not adapt the envelope to the client +- A test consumer that silently picks the top result (ignores `alternatives`) is a valid policy — the registry does not enforce that consumers must surface alternatives to a user +- A test consumer that always asks the user when `ambiguous: true` is also valid +- The desktop app's command palette renders alternatives as picker rows when present +- The orchestrator and other portfolio-reasoning agents document their own consumer policy for handling ambiguity — that policy is per-consumer, not per-registry +- No mechanism in the registry "punishes" a consumer for ignoring `alternatives` (no log entry, no reduced score on subsequent queries) — the registry's contract ends at returning the envelope + +Difficulty: easy + +--- + +## S210: Installer Copies Bundled Skills with Rewritten Absolute Paths {#s210} +**Given** a user who runs `npm run install:skills` from a setlist package (or accepts the desktop app's first-launch installer prompt) +**When** the installer runs against `~/.claude/skills/` (or `$CLAUDE_CONFIG_DIR/skills/` when set) +**Then** the two bundled skills (`setlist-enroll-project.md`, `setlist-portfolio-graph.md`) are copied to the skills directory with all absolute paths rewritten to match the installed setlist package's location on this machine. 
+ +Validates: `#bundled-skills` (2.17) "Installer contract" + +**Satisfaction criteria:** +- After `npm run install:skills`, `~/.claude/skills/setlist-enroll-project.md` and `~/.claude/skills/setlist-portfolio-graph.md` exist +- Both files contain machine-specific absolute paths to setlist's helper scripts (e.g., the path to `node` for invoking helpers, the path to the setlist CLI binary) — verified by grepping for absolute paths and confirming they resolve on this machine +- When `CLAUDE_CONFIG_DIR=/tmp/claude-test` is set in the environment, the installer writes to `/tmp/claude-test/skills/` instead of `~/.claude/skills/` +- Calling Claude Code's `/setlist-enroll-project` command after install successfully invokes the skill (the path rewrites are correct, the skill's helper-script invocations work) +- Calling `/setlist-portfolio-graph` after install successfully renders the area → project → capability graph (`portfolio_brief`, `list_areas`, `list_projects`, `query_capabilities` all called via MCP) +- The installer runs to completion without prompting the user for input — the install is non-interactive + +Difficulty: medium + +--- + +## S211: Re-Running the Installer Is Idempotent and Preserves Local Overrides {#s211} +**Given** a user who has run the installer once, then edited `~/.claude/skills/setlist-enroll-project.local.md` to customize their workflow, then runs the installer again +**When** the installer runs the second time +**Then** `setlist-enroll-project.md` (the bundled installed copy) is overwritten with the current shipped version (with paths re-rewritten), `setlist-enroll-project.local.md` (the user's local override) is left untouched, and the installer logs a one-line notice naming the preserved local file. 
+ +Validates: `#bundled-skills` (2.17) "Installer contract" — idempotent re-install + local override preservation + +**Satisfaction criteria:** +- Running the installer twice in succession produces byte-identical `setlist-enroll-project.md` and `setlist-portfolio-graph.md` files (modulo path-rewrite differences if the machine state has changed) +- Editing `setlist-enroll-project.local.md` between installer runs preserves the local edits — the second run leaves the `.local.md` file byte-identical +- The installer logs a one-line notice: "preserving local override: `setlist-enroll-project.local.md`" (or equivalent phrasing) when a local override is detected +- Claude Code's skill resolution prefers `.local.md` over `.md` for the same skill name — the user's customizations take precedence over the bundled defaults without conflicting +- Running the installer with `--no-skills` after a previous successful install does NOT delete the previously-installed skills — it just skips the copy step. (The contract is "do not overwrite," not "uninstall." A separate `uninstall:skills` flow is a future evolve.) 
+- Running the installer against a partially-installed state (e.g., one bundled skill is present, the other is missing because the user deleted it) re-installs the missing skill and overwrites the present skill with the current shipped content +- The installer respects `CLAUDE_CONFIG_DIR` on every run consistently — running with `CLAUDE_CONFIG_DIR=/path/A` then again with `CLAUDE_CONFIG_DIR=/path/B` installs to two different locations and does not cross-contaminate + +Difficulty: medium + +--- + +## S212: Workspace Inspection Returns Agent-Decision Report Alongside Structural Facts {#s212} +**Given** a registered code project at `~/Code/setlist` with `package.json`, multiple subpackages under `packages/`, a `vite.config.ts` declaring port 5173, and an `email_account` set on the registered project +**When** an agent calls `inspectProject('setlist')` (or `inspectWorkspaceCandidate('~/Code/setlist')`) +**Then** the returned `WorkspaceInspectionReport` carries both the existing structural payload (`paths[].code` / `paths[].folder` / `summary.gaps` / `schema_version: workspace-inspection.v1`) **and** a new `decision_report` object with the agent-decision-shaped summary view. 
+ +Validates: `#workspace-inspection` (2.16) "Agent-decision report" + +**Satisfaction criteria:** +- `decision_report.project_type_hint` is `code` plus a refined subtype string drawn from runtime/framework hints (e.g., `node-typescript`) +- `decision_report.port_literals` includes `{port: 5173, source: "vite.config.ts"}` (and any other 1024–65535 literals discovered in dev-config files during the shallow scan) +- `decision_report.multi_app_structure` is `true` when multiple package manager files sit at sibling top-level subdirectories under `packages/` +- `decision_report.area_hint` is the path-segment heuristic value (`Infrastructure` or `Code`-shaped), explicitly labelled as hint not authoritative +- `decision_report.last_touched_at` is an ISO 8601 timestamp derived from top-level `mtime`, not requiring a second scan +- `decision_report.missing_mailbox_state` is `null` by default (off unless the per-call opt-in flag is set) +- `decision_report.summary_paragraph` is a single human-readable sentence stitching the above (`"Node/TypeScript monorepo with N apps under packages/, listens on port 5173, last touched Xh ago"`) +- The existing structural fields are unchanged — a consumer that reads only `paths[]` and `summary.kind` sees identical output to v0.33 reports +- The new `decision_report` does not trigger a second scan; computed in the same pass as the structural report +- The report's `schema_version` remains `workspace-inspection.v1` (the field is additive, not a breaking change) + +Difficulty: medium + +--- + +## S213: `setlist-doctor` Diagnoses Setlist-Shaped Problems Without Side Effects by Default {#s213} +**Given** a setlist install with (a) a managed MCP entry in `~/Library/Application Support/Claude/claude_desktop_config.json` whose `command` path points to a moved npm global location, (b) a stale `~/.local/share/setlist/worker.pid` file naming a pid that is no longer running, and (c) a `~/.claude/CLAUDE.md` containing exactly one `` marker without the 
matching end marker +**When** the user invokes the `setlist-doctor` MCP tool (or runs `setlist doctor` from the CLI) without any flags +**Then** the tool produces a structured report naming each problem with its `check_id`, `status`, `summary`, `evidence`, and `recoverable` flag, and exits **without modifying any file**. + +Validates: `#mcp-clients` (2.14.3) "setlist-doctor" + +**Satisfaction criteria:** +- The report includes `{check_id: "mcp-managed-entry-broken", status: "broken", recoverable: true, evidence: }` for the Claude Desktop case +- The report includes `{check_id: "worker-pid-stale", status: "stale", recoverable: true, evidence: }` for the worker case +- The report includes `{check_id: "claudemd-sentinel-malformed", status: "malformed", recoverable: false, evidence: }` for the CLAUDE.md case +- The diagnose-only run touches **zero** files — verified by stat'ing the three files before and after the call; mtime is unchanged on each +- The CLI default output is a human-readable table; `setlist doctor --json` and the MCP tool both emit the same structured envelope `{checks: [...], summary: {ok, broken, malformed, stale, orphan, not_applicable}, version, generated_at}` +- A `setlist doctor` invocation on a clean system reports every check as `ok` or `not_applicable` and exits cleanly +- A `setlist-doctor --check_filter=worker-pid-stale` scopes the run to a single check +- The doctor tool count is 59 (the existing 58 plus this addition) + +Difficulty: medium + +--- + +## S227: `setlist-doctor --fix-safe` Is Narrow and Forbidden From Editing Anything Outside Setlist Artifacts {#s227} +**Given** the same broken state as S213, plus a registered project at `~/Code/setlist` whose `src/index.ts` contains a known typo the user has not yet fixed +**When** the user invokes `setlist-doctor --fix-safe` +**Then** the tool fixes (a) the broken managed MCP entry (rewrites `command` and `args` to the current install path, creates `.setlist-backup` before write), (b) the 
stale worker pid file (deletes the pointer), and **refuses** to touch the CLAUDE.md sentinel-marker problem (recoverable: false) and **does not** alter `~/Code/setlist/src/index.ts` (outside setlist's artifact scope). + +Validates: `#mcp-clients` (2.14.3) "setlist-doctor" `--fix-safe` scope ceiling + +**Satisfaction criteria:** +- `~/Library/Application Support/Claude/claude_desktop_config.json.setlist-backup` exists after the run with the pre-fix contents +- The new managed entry's `command` resolves to a real binary; the entry's `__setlist: true` marker is preserved +- `~/.local/share/setlist/worker.pid` no longer exists +- `~/.claude/CLAUDE.md` is byte-identical before and after — the orphan marker is reported but not fixed +- `~/Code/setlist/src/index.ts` is byte-identical before and after — the doctor refuses to edit user project code under any circumstance +- Each fix action is written to `~/.local/share/setlist/doctor.log` as a structured JSONL line `{check_id, action, before, after, backup_path?}` +- No process is killed; no MCP server is restarted; no package manager is invoked +- A subsequent `setlist doctor` (without `--fix-safe`) reports the MCP-entry and pid-file checks as `ok` and the CLAUDE.md check as still `malformed` + +Difficulty: medium + +--- + +## S214: Health Assessment Surfaces `needs_human_verification` Honestly Instead of Silently Degrading {#s214} +**Given** a project whose `outcomes` dimension cannot be assessed headlessly (e.g., the assessment logic flags "recent commit messages claim feature work but no test was added or modified — unclear whether feature actually works") +**When** `assess_health(project_name)` runs (or the user opens the project's Health section on the Overview tab) +**Then** the `outcomes` dimension is returned with `verification_state: "needs_human_verification"`, the composite tier is **not** silently degraded by this dimension, and the user-facing surface explicitly names "needs human review" rather than averaging 
the unknown signal into a lower color. + +Validates: `#health-assessment` (2.15) "Per-dimension `needs_human_verification` bucket" + +**Satisfaction criteria:** +- `assess_health(name)` returns `dimensions[].verification_state ∈ {"assessed", "unknown", "needs_human_verification"}` +- Composite tier composition treats `needs_human_verification` as a fourth pole — it does not contribute to the worst-tier-wins calculation, but appears in the response's `verification_state_summary` alongside the tier +- The Home view dot uses a thin border (or small `?` glyph) when any dimension is `needs_human_verification` +- The project detail Overview tab's Health section displays the bucketed dimension under its own row (`Outcomes: needs human verification — `) +- The user can mark the dimension "checked, tier was X" via a one-click affordance; the override stores as a memory of type `observation` with `is_static: true` +- The bucket is health-only — `bootstrap_project` and `assess_health` pre-flight checks never return `needs_human_verification` (validated by a negative scenario: an attempt to use it elsewhere is a code-review-time rejection) + +Difficulty: medium + +--- + +## S215: Named Recipe Strategy IDs Propagate to Bootstrap Response, Digest History, and Health {#s215} +**Given** a project type "Code project" whose recipe carries `strategy_id: code-with-mail-v2` and a user bootstrapping a new project `my-thing` under that type with `email_account: mike@h3r3.com` +**When** `bootstrap_project` runs the recipe to completion +**Then** the bootstrap response carries `strategy_id: code-with-mail-v2` in `executed_steps`, the new project's `bootstrap-config.json` projection includes `strategy_id`, the project's digest gains one new `decision_history` entry naming the strategy ID and bound parameters, and `assess_health(my-thing)` surfaces the strategy ID in its provenance facet. 
+ +Validates: `#project-bootstrap` (2.13) "Named recipe strategies"; `#entities` (3.2) "Project type recipe" + +**Satisfaction criteria:** +- `bootstrap_project` response contains `{strategy_id: "code-with-mail-v2", executed_steps: [...]}` +- `~/Code/my-thing/.fctry/bootstrap-config.json` contains `{strategy_id: "code-with-mail-v2", project: "my-thing", type: "Code project", email_account: "mike@h3r3.com", ...}` +- `get_project_digest("my-thing")` returns `decision_history` containing one entry `{at: , kind: "bootstrap", summary: "bootstrapped via code-with-mail-v2, email=mike@h3r3.com", source: }` +- `assess_health("my-thing")` returns `provenance.strategy_id: "code-with-mail-v2"` +- Renaming the strategy ID to `code-with-mail-v3` later does **not** rewrite the existing digest's `decision_history` entry — the project retains its historical ID +- A future bootstrap under the renamed strategy uses the new ID in its new entry + +Difficulty: medium + +--- + +## S216: `is_builtin` Marker and `Duplicate to Customize` Fork Built-In Without Mutating It {#s216} +**Given** the five seeded built-in primitives (`create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, `mail-create-mailbox`) marked `is_builtin: true` and a user who wants to customize `mail-create-mailbox` to use a different mailbox path template +**When** the user clicks `Duplicate to customize` in Settings → Primitives next to `mail-create-mailbox` +**Then** a new user-owned primitive is created with `is_builtin: false`, name `mail-create-mailbox-custom` (editable, must be unique), the same shape and parameter declarations, and `forked_from: "mail-create-mailbox"` provenance pointer; the original `mail-create-mailbox` is untouched. 
+ +Validates: `#project-bootstrap` (2.13) "`is_builtin: true` on built-in primitives, fork-to-customize"; `#entities` (3.2) "Bootstrap primitive" + +**Satisfaction criteria:** +- `list_primitives` returns the five built-ins with `is_builtin: true` and the new fork with `is_builtin: false, forked_from: "mail-create-mailbox"` +- The user can edit `mail-create-mailbox-custom`'s parameter shape; `update_primitive` succeeds on the user copy and **fails with `Error [BUILTIN_PRIMITIVE_IMMUTABLE]`** on the original +- The user can swap a recipe step from the original to the fork via the type editor; existing recipes that reference the original continue to reference the original (the swap is per-recipe-step, not a global rename) +- The Settings → Primitives panel renders the built-ins block as read-only with a `Duplicate to customize` button beside each row +- A setlist binary upgrade that updates the `mail-create-mailbox` built-in shape **does not** touch the user's `mail-create-mailbox-custom` fork +- `delete_primitive` succeeds on the user fork (when no recipe references it) and **fails on the original** with `Error [BUILTIN_PRIMITIVE_IMMUTABLE]` +- Schema v18 migration preserves `is_builtin` semantics from the prior `built_in_flag` (existing built-in rows survive with `is_builtin = 1`) + +Difficulty: medium + +--- + +## S217: Primitive `self_test` Discriminates Env-Broken From Primitive-Broken at Pre-Flight {#s217} +**Given** a project type whose recipe includes `mail-create-mailbox` (whose self-test probes Mail.app process state) and a second user-authored `shell-command` primitive `gh-repo-create` (whose self-test runs `gh auth status` to verify the user's GitHub CLI is authenticated) +**When** the user attempts `bootstrap_project` with (a) Mail.app not running and (b) `gh` unauthenticated +**Then** pre-flight fails before any side effect runs, the dry-run trace marks `mail-create-mailbox` with the third marker `?` and routes the failure as `env_unavailable: Mail.app 
not running`, marks `gh-repo-create` with `?` and routes it as `env_unavailable: gh CLI not authenticated`, and **no** step is mis-routed as `primitive_failure`. + +Validates: `#project-bootstrap` (2.13) "Primitive self-test hooks" + +**Satisfaction criteria:** +- Pre-flight emits three marker types — `✓` (pass), `✗` (primitive-broken), `?` (env-unavailable) — each row carrying a one-line plain-language reason +- The `env_unavailable` route names the environment that's missing (Mail.app, `gh` CLI auth) rather than the primitive (`mail-create-mailbox`, `gh-repo-create`) +- A primitive with no `self_test` declaration falls back to the existing pre-flight (target-reachability, binary-on-PATH, mcp-tool-registered) — backward compatible +- `filesystem-op` primitives never carry a `self_test` (their existing pre-flight checks cover the same ground) +- A self-test that side-effects (writes a file, creates an external resource) is a primitive-author bug — surfaced in code review, not enforced at the engine boundary +- The `self_test_json` storage column on `bootstrap_primitives` is NULL for the original built-ins until v0.34's seed migration declares them; user-authored primitives may declare their own self-test or omit it + +Difficulty: medium + +--- + +## S218: `.fctry/bootstrap-config.json` Is Written as a Projection of Registry State, Not the Source of Truth {#s218} +**Given** a successful `bootstrap_project` for `my-thing` under the `code-with-mail-v2` strategy with `email_account: mike@h3r3.com` and a `vite.config.ts` template that references `__APP_NAME__` and `__PORT__` placeholders +**When** the bootstrap engine finishes the recipe +**Then** `~/Code/my-thing/.fctry/bootstrap-config.json` is written containing the resolved placeholders, the strategy ID, the bootstrap timestamp, and a `_generated_from: setlist@` provenance line; the registry row remains canonical (§1.3 "Definition is truth"); the file is regeneratable from the registry on demand. 
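A minimal sketch of the projection idea in S218, assuming a hypothetical registry row shape (`RegistryRowSketch`) and a hypothetical `projectBootstrapConfig` helper; neither name comes from the spec. Only the output field names mirror the scenario. The point it illustrates is that the file is a pure function of the registry row, so it can be regenerated at any time without the file itself being a source of truth.

```typescript
// Hypothetical registry row shape, for illustration only.
interface RegistryRowSketch {
  name: string;
  type: string;
  strategy_id: string;
  email_account: string;
  port: number;
  bootstrapped_at: string; // ISO timestamp recorded by the registry
}

// Output shape follows the field names listed in the scenario.
interface BootstrapConfigProjection {
  project: string;
  type: string;
  strategy_id: string;
  email_account: string;
  placeholders: { APP_NAME: string; PORT: number; EMAIL_ACCOUNT: string };
  bootstrapped_at: string;
  _generated_from: string;
}

// Pure projection: same row and version in, same JSON out.
// `setlistVersion` is an assumed parameter; the spec elides the version value.
function projectBootstrapConfig(
  row: RegistryRowSketch,
  setlistVersion: string
): BootstrapConfigProjection {
  return {
    project: row.name,
    type: row.type,
    strategy_id: row.strategy_id,
    email_account: row.email_account,
    placeholders: {
      APP_NAME: row.name,
      PORT: row.port,
      EMAIL_ACCOUNT: row.email_account,
    },
    bootstrapped_at: row.bootstrapped_at,
    _generated_from: "setlist@" + setlistVersion,
  };
}
```

Because the function reads nothing but its arguments, deleting the written file loses no information: calling it again with the same row reproduces the same projection.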
+ +Validates: `#project-bootstrap` (2.13) "`.fctry/bootstrap-config.json` projection" + +**Satisfaction criteria:** +- `~/Code/my-thing/.fctry/bootstrap-config.json` contains `{project: "my-thing", type: "Code project", strategy_id: "code-with-mail-v2", email_account: "mike@h3r3.com", placeholders: {APP_NAME: "my-thing", PORT: 5173, EMAIL_ACCOUNT: "mike@h3r3.com"}, bootstrapped_at: , _generated_from: "setlist@"}` +- Templates copied by `copy-template` may use `__APP_NAME__` / `__PORT__` / `__EMAIL_ACCOUNT__` style placeholders; the engine resolves them at materialization +- Deleting `~/Code/my-thing/.fctry/bootstrap-config.json` does not break any registry operation against `my-thing` — the registry row is canonical +- A future `setlist regenerate-bootstrap-config ` (or equivalent) rewrites the projection from the registry row +- The file is in `.gitignore`-eligible territory but setlist does not auto-add it to the new project's `.gitignore` — that's a project-author choice + +Difficulty: easy + +--- + +## S219: `no-unresolved-placeholders` Pre-Flight Gate Aborts Bootstrap Before Side Effects {#s219} +**Given** a `copy-template` step whose source template contains a `__APP_NAEM__` typo (the user meant `__APP_NAME__`) and a user attempting `bootstrap_project` +**When** the engine runs pre-flight on the materialized template output +**Then** the `no-unresolved-placeholders` gate detects the unresolved `__APP_NAEM__` pattern, aborts the bootstrap before any side effect (no folder created, no git init, no shell-command run, no `bootstrap-config.json` written), and surfaces the failure with the exact file path and placeholder name. 
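The gate in S219 can be sketched as a substitute-then-scan pass. This is an illustration under stated assumptions, not the engine's code: `resolvePlaceholders` and `findUnresolvedPlaceholders` are hypothetical names, and only the `__NAME__` placeholder shape and the `__[A-Z_]+__` scan pattern are taken from the scenario.

```typescript
type PlaceholderValues = Record<string, string | number>;

interface UnresolvedPlaceholder {
  file: string;        // path of the materialized template output
  placeholder: string; // e.g. "__APP_NAEM__"
}

// Substitute every `__KEY__` that has a bound value. Unknown keys are left
// in place on purpose so the scan below can report them.
function resolvePlaceholders(text: string, values: PlaceholderValues): string {
  return text.replace(/__([A-Z_]+)__/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key)
      ? String(values[key])
      : match
  );
}

// Scan resolved output for any remaining `__[A-Z_]+__` pattern.
// Each distinct leftover is reported once per file.
function findUnresolvedPlaceholders(
  file: string,
  resolved: string
): UnresolvedPlaceholder[] {
  const found = resolved.match(/__[A-Z_]+__/g) || [];
  const unresolved: UnresolvedPlaceholder[] = [];
  for (const placeholder of found) {
    if (!unresolved.some((u) => u.placeholder === placeholder)) {
      unresolved.push({ file, placeholder });
    }
  }
  return unresolved;
}
```

A caller would run both functions over every materialized template before any side effect and abort when the returned list is non-empty. Note that the scan pattern does not match `{{APP_NAME}}`, so escaped literals pass through untouched.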
+ +Validates: `#project-bootstrap` (2.13) "`.fctry/bootstrap-config.json` projection" → `no-unresolved-placeholders` gate + +**Satisfaction criteria:** +- The gate scans the resolved template output for any remaining `__[A-Z_]+__` pattern after substitution +- A typo like `__APP_NAEM__` is flagged because no resolver maps to it +- The pre-flight trace names the file (`~/Code/my-thing/vite.config.ts`) and the unresolved pattern (`__APP_NAEM__`) and aborts with the standard pre-flight ✗ shape +- The bootstrap does not create the project folder when the gate fires +- The user fixes the typo in the source template (or in the primitive's parameter binding), re-runs Dry run, sees ✓, then commits +- The gate is silent on `{{` escapes (`{{APP_NAME}}` is not a placeholder; it's an escaped literal `{APP_NAME}` per the existing token-resolver contract) + +Difficulty: easy + +--- + +## S220: Bundled Skills Ship Parallel `.claude-plugin/` and `.codex-plugin/` Manifests Over One `skills/` Tree {#s220} +**Given** the `packages/skills/` directory with the bundled-skill content in `skills/`, a `.claude-plugin/plugin.json`, a `.claude-plugin/marketplace.json`, and a `.codex-plugin/plugin.json` +**When** an agent inspects the package or `npm run install:skills` runs +**Then** both manifests reference the same skill files in `skills/`; the host-specific fields (entrypoint shape, capability declarations, namespace prefix, version pin) are in the respective `.-plugin/plugin.json`; the skill content lives once. 
+ +Validates: `#bundled-skills` (2.17) "Dual-manifest layout" + +**Satisfaction criteria:** +- `packages/skills/skills/setlist-enroll-project.md` and `setlist-portfolio-graph.md` exist as the single skill source +- `packages/skills/.claude-plugin/plugin.json` lists both skills with Claude Code's required field shape +- `packages/skills/.codex-plugin/plugin.json` lists both skills with Codex's required field shape +- Both manifests carry the same `version` field +- The installer copies skills to the host's expected location for each host (Claude Code → `~/.claude/skills/`, Codex → its equivalent) +- The package's `package.json` `files` array includes both `.claude-plugin/` and `.codex-plugin/` and `skills/` + +Difficulty: easy + +--- + +## S221: `validate-skills-manifests` Fails the Build When Manifests Drift {#s221} +**Given** the dual-manifest packaging from S220 and a developer who bumps `version` in `.claude-plugin/plugin.json` without bumping `.codex-plugin/plugin.json` +**When** the developer runs `npm run validate` (or the CI runs it) +**Then** the `validate-skills-manifests` step fails with a non-zero exit code naming the version mismatch and refusing to mark the build green. 
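A rough sketch of the drift check in S221, assuming each manifest can be reduced to a `version` string and a list of skill file names. The real `plugin.json` field shapes for Claude Code and Codex are not given here, so `ManifestSketch` and `findManifestDrift` are hypothetical stand-ins.

```typescript
// Assumed reduction of a plugin manifest to the two fields the check needs.
interface ManifestSketch {
  version: string;
  skills: string[]; // skill file names, e.g. "setlist-enroll-project.md"
}

// Returns one human-readable error per drift found; an empty array means
// the manifests agree and no orphan skill content exists.
function findManifestDrift(
  claude: ManifestSketch,
  codex: ManifestSketch,
  skillFiles: string[]
): string[] {
  const errors: string[] = [];
  if (claude.version !== codex.version) {
    errors.push(
      "version mismatch: .claude-plugin=" + claude.version +
      " .codex-plugin=" + codex.version
    );
  }
  for (const skill of claude.skills) {
    if (codex.skills.indexOf(skill) === -1) {
      errors.push("skill missing from .codex-plugin: " + skill);
    }
  }
  for (const skill of codex.skills) {
    if (claude.skills.indexOf(skill) === -1) {
      errors.push("skill missing from .claude-plugin: " + skill);
    }
  }
  for (const file of skillFiles) {
    const declared =
      claude.skills.indexOf(file) !== -1 || codex.skills.indexOf(file) !== -1;
    if (!declared) {
      errors.push("orphan skill file: " + file);
    }
  }
  return errors;
}
```

A wrapper script would print the errors and exit with code 1 when the list is non-empty, and print nothing when it is empty, which keeps a passing run silent.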
+ +Validates: `#bundled-skills` (2.17) "Dual-manifest layout" → `validate-skills-manifests`; `#testing-discipline` (4.5) one-command `validate` + +**Satisfaction criteria:** +- The validator script reads both manifests, compares `version` fields, and exits 1 with a clear error when they differ +- The same validator catches a skill listed in one manifest but missing from the other +- The same validator catches a skill file in `skills/` that neither manifest declares (orphan skill content) +- The validator runs as part of `npm run validate` (the meta gate) and as a standalone `npm run validate-skills-manifests` +- A passing validator is silent (zero stdout when everything matches), keeping CI output noise low +- The Claude Code marketplace.json entry's `version` is also checked against the plugin.json version (within `.claude-plugin/`) + +Difficulty: easy + +--- + +## S222: `npm run validate` Is the One-Command CI Gate Fanning Out to All Checks {#s222} +**Given** the setlist monorepo at any working-tree state +**When** the developer runs `npm run validate` +**Then** the command fans out to `npm run typecheck`, `npm test`, `npm run build`, `npm run verify:mcp-abi`, and `npm run validate-skills-manifests` (in an order chosen by the script); returns one exit code; any single child failure fails the whole command. 
+ +Validates: `#testing-discipline` (4.5) `npm run validate` one-command CI gate + +**Satisfaction criteria:** +- `npm run validate` exists in the root `package.json` `scripts` +- A clean tree returns exit code 0; introducing a type error makes it return non-zero; reverting makes it return zero again +- Each child step's output is interleaved (or aggregated) into the parent run with clear section markers so a developer reading the log can tell which step failed +- Individual `npm run typecheck` / `npm test` / `npm run build` / `npm run verify:mcp-abi` / `npm run validate-skills-manifests` remain available for targeted runs +- CI invokes the documented subset that's safe and fast in a hosted environment (typecheck + test + build); the heavier local-only checks (ABI verify) still run via `npm run validate` on a developer's machine +- The script does not run E2E Playwright tests (those are local-only, deliberately outside CI) + +Difficulty: easy + +--- + +## S223: Workspace Inspection Surfaces Doc-Disk Disagreement as a Gap (Trust Disk Over Docs) {#s223} +**Given** a project at `~/Code/something` with a `CLAUDE.md` that says "Next.js app" and a `package.json` listing `react` and `vite` in dependencies with no `next` +**When** `inspectWorkspaceCandidate('~/Code/something')` runs +**Then** the report's `runtime_hints` and `framework_hints` are derived from the manifest (no `Next.js`), and the `summary.gaps` array contains a `{kind: "doc-disk-disagreement", source_doc: "CLAUDE.md", doc_claim: "Next.js app", disk_evidence: "package.json names react+vite, no next"}` entry. 
+ +Validates: `#workspace-inspection` (2.16) "Trust disk over docs"; `#hard-constraints` (4.3) trust-disk invariant + +**Satisfaction criteria:** +- `runtime_hints` includes `Node.js` and (when TypeScript files are present) `TypeScript`; does not include `Next.js` +- `framework_hints` derives from manifest names (`react`, `vite`) and well-known dependency names; no inferred `Next.js` +- The disagreement gap is present with the named shape; the user or agent reading the gap decides which side to fix +- The inspector does not edit `CLAUDE.md`, does not delete the gap silently, and does not try to reconcile +- The gap appears in the Register Project dialog's preview panel as a small footnote line ("Doc says Next.js, manifest says React+Vite") +- The gap appears in the project detail Overview tab's Signals from disk panel + +Difficulty: easy + +--- + +## S224: `decision_history` Facet on Project Digests Accumulates Bounded, Survives Refresh {#s224} +**Given** a project that has been bootstrapped (via `code-with-mail-v2`), then moved between areas twice, then had its parent set, then renamed +**When** the user calls `get_project_digest(name)`, then `refresh_project_digest(name, ...)` with a new digest body, then `get_project_digest(name)` again +**Then** the first call returns `decision_history` with all five entries (in chronological order), the refresh updates `digest_text` / `spec_version` / `producer` / `generated_at` and **leaves `decision_history` untouched**, and the cap holds at 20 entries (older entries drop oldest-first when the cap is hit). 
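The bounded, refresh-surviving behavior in S224 can be sketched with two small pure functions. The type and function names below (`DecisionEntry`, `appendDecision`, `refreshDigestBody`) are hypothetical; the cap of 20, the oldest-first drop, and the rule that a refresh leaves `decision_history` alone come from the scenario.

```typescript
interface DecisionEntry {
  at: string;      // timestamp of the decision
  kind: string;    // e.g. "bootstrap", "rename_project"
  summary: string; // one-line plain-language description
}

interface DigestSketch {
  digest_text: string;
  spec_version: string;
  decision_history: DecisionEntry[];
}

const DECISION_HISTORY_CAP = 20;

// Append one entry, then keep only the newest `cap` entries.
// The input array is not mutated.
function appendDecision(
  history: DecisionEntry[],
  entry: DecisionEntry,
  cap: number = DECISION_HISTORY_CAP
): DecisionEntry[] {
  const next = history.concat([entry]);
  return next.length > cap ? next.slice(next.length - cap) : next;
}

// A refresh replaces the digest body fields and carries the history over
// unchanged.
function refreshDigestBody(
  digest: DigestSketch,
  body: { digest_text: string; spec_version: string }
): DigestSketch {
  return {
    digest_text: body.digest_text,
    spec_version: body.spec_version,
    decision_history: digest.decision_history,
  };
}
```

Keeping the cap as a default parameter matches the scenario's note that it is configurable in code but not exposed through the MCP layer.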
+ +Validates: `#entities` (3.2) "Project digests" `decision_history` facet; `#rules` (3.3) digest rules + +**Satisfaction criteria:** +- Each structural move (`set_project_area`, `set_parent_project`, `archive_project`, `rename_project`) appends one entry to `decision_history` with `kind` matching the action and a one-line plain-language `summary` +- `bootstrap_project` appends one entry with `kind: "bootstrap"`, the recipe `strategy_id`, and key bound parameters in the `summary` +- `refresh_project_digest` does not touch `decision_history` — verified by appending one entry, refreshing, then reading and asserting the entry is still present +- The cap is 20 entries per project; appending the 21st entry drops the oldest +- The cap is configurable in code, not surfaced through the MCP layer in v1 +- `archive_project` cascades to `project_digests` via `ON DELETE CASCADE` (existing v12 behavior); re-registering with the same name produces a fresh empty `decision_history` + +Difficulty: medium + +--- + +## S225: Shared External-File Injection Lineage Is Acknowledged, Not Yet Refactored {#s225} +**Given** the three setlist surfaces that write into foreign config files — bootstrap's `.fctry/bootstrap-config.json` projection, §2.14.3 MCP-clients install/remove, §2.17 bundled-skills installer + CLAUDE.md sentinel injection +**When** a reader (developer, contributor, future spec writer) examines §2.13 looking for the cross-section note +**Then** the spec acknowledges the three surfaces share architectural concerns (foreign-file writes, backup/marker/recovery conventions, dry-run/preview surfaces) and flags them as a candidate for a future shared `external-file-injection` contract, **without refactoring in 0.34**. 
+ +Validates: `#project-bootstrap` (2.13) "Cross-section note on external-file injection lineage" + +**Satisfaction criteria:** +- The cross-section note is present in §2.13 and names all three surfaces by section reference (§2.13, §2.14.3, §2.17) +- The note explicitly identifies the three patterns each surface invented independently (backup convention, marker convention, dry-run surface) and notes the absence of a shared layer in 0.34 +- A future spec writer adding a fourth foreign-file write surface (e.g., writing into a third MCP client's config in a future evolve) is expected to review the existing three first +- The acknowledgment does **not** modify any of the three existing surfaces' implementations; backward compatibility is preserved across the board + +Difficulty: easy + +--- + +## S226: Dismissed Directions Are Recorded in the Changelog, Not Silenced {#s226} +**Given** a `/fctry:ref` evolve that adopted thirteen patterns from `app-it` (spec 0.34) and explicitly dismissed one (Apple Event Cmd-Q test semantics) +**When** a future session greps `.fctry/changelog.md` for the dismissed pattern's name or its source repo +**Then** the dismissal is recorded as a `Dismissals` bullet inside the 0.34 evolve's changelog entry, naming the pattern, the source URL, and a one-sentence rationale — readable as the canonical anti-re-proposal record. 
+ +Validates: `#testing-discipline` (4.5) "Rejected directions are logged, not silenced" + +**Satisfaction criteria:** +- The 0.34 changelog entry contains a `Dismissals` section +- The section names the dismissed pattern (`Apple Event Cmd-Q test semantics`), its source span in the source repo, and a one-sentence rationale (the rationale is approximately "test-mechanism choice, not an experience contract; the user behavior is already specced in S169–S179") +- A grep against `.fctry/changelog.md` for the pattern name or the source URL finds the dismissal +- The shape is one bullet per dismissed item — no separate `.fctry/rejected.md` file, no per-decision ADR document +- A later `/fctry:ref` that re-encounters the same pattern can read the prior rationale and either skip re-debating or surface new evidence to re-evaluate +- Adopted patterns are recorded in the main entry; only dismissals get the dedicated `Dismissals` section diff --git a/.fctry/spec.md b/.fctry/spec.md index a619365..de25ae5 100644 --- a/.fctry/spec.md +++ b/.fctry/spec.md @@ -3,25 +3,25 @@ ```yaml --- title: Setlist -spec-version: "0.32" -date: 2026-05-29 +spec-version: "0.34" +date: 2026-06-07 status: active author: Mike spec-format: nlspec-v2 synopsis: - short: "TypeScript project registry — intelligence hub with desktop control panel, user-managed areas + project types, user-composable bootstrap primitives (including Mail.app mailbox creation), customizable Home view, menu-bar persistence + global project shortcuts, safe install/remove of the setlist MCP entry in Claude Desktop and Codex, bounded workspace inspection that surfaces code/workspace/empty signals from disk, 57 MCP tools, unified memory, per-project essence digests, client-independent MCP onboarding" + short: "TypeScript project registry — intelligence hub with desktop control panel, user-managed areas + project types, user-composable bootstrap primitives with named-strategy IDs and per-project bootstrap-config projection (including 
Mail.app mailbox creation), customizable Home view, menu-bar persistence + global project shortcuts, safe install/remove of the setlist MCP entry in Claude Desktop and Codex with shared external-file injection lineage, bounded workspace inspection that surfaces code/workspace/empty signals from disk plus an agent-decision summary view, 59 MCP tools (including the new diagnose-only setlist-doctor with --json and narrow --fix-safe), dual-manifest bundled skills (Claude Code + Codex from one tree), three-bucket health assessment with needs_human_verification, schema v18 (is_builtin built-in primitive marker, optional self_test, project_digests.decision_history), unified memory, per-project essence digests, client-independent MCP onboarding" medium: "TypeScript monorepo (@setlist/core, @setlist/mcp, @setlist/cli, @setlist/app) implementing the project registry as both invisible infrastructure and a directly operable desktop surface. Local SQLite (better-sqlite3) + MCP server + Electron control panel sharing Chorus's design system (Tailwind 4, Radix UI), distributed as signed and notarized release builds that auto-update over two user-selectable channels (stable, beta). The desktop app exposes a single-page Settings panel (Areas, Project types, Primitives, View, Bootstrap, Updates) where the user manages canonical areas as a CRUD list, manages project types as a first-class CRUD list with per-type default directory, git-init flag, optional template directory, an optional per-type default email account, and an ordered **recipe** of bootstrap **primitives** (filesystem-op, shell-command, mcp-tool — assembled from a peer Primitives panel of read-only built-ins plus user-authored entries). 
Project bootstrap is a recipe runner: each project type carries an ordered list of primitive invocations with bound parameter values, the engine walks the recipe in order with combined dry-run + pre-flight tracing, and on mid-run failure stops with Retry / Skip / Abandon options (Retry resumes from the failed step against a snapshot of the recipe taken at bootstrap start; Abandon undoes filesystem and git work and lists external side-effects honestly as structured `{step, primitive, summary}` entries). Five built-in primitives ship as user-droppable read-only entries (create-folder, copy-template, git-init, update-parent-gitignore, mail-create-mailbox); the new `mail-create-mailbox` is a `shell-command` shape that drives Mail.app via AppleScript/`osascript` to create a mailbox under a specified account at a templated path, seeded but not in any default recipe. Projects gain an optional `email_account` field; a new `{project.email_account}` token joins `{project.name}`, `{project.path}`, `{project.type}`, `{project.parent_path}`, and `{project.type.template_directory}` in the resolver. The Home view supports column visibility toggles, a compact/spacious row-density toggle, sort persistence across sessions, a default landing view, and the standard Cmd-, accelerator to open Settings. Between launches the app lives in the menu bar by default — closing the window hides rather than quits, a small tray icon surfaces Open / Settings / Check for Updates / Quit, and an optional Dock-while-hidden toggle gives the user a pure menu-bar experience when they want it. Six per-action global project shortcuts (three with sensible defaults, three blank for the destructive actions) let the user open the project folder, copy the brief command, refresh health, edit, rename, or archive from anywhere on the desktop, with inline validation that flags duplicates, invalid formats, and OS conflicts. 
Provides project identity, capability declarations, unified portfolio memory (10 types with FTS5 full-text search, belief classification, temporal validity, entity storage, procedural versioning, four-level scoping with area bubble-up, and triple-gate stale-memory archival; vector retrieval, hybrid RRF fusion, hierarchical compaction, gap detection, and distillation are studied aspirations gated on an embedding-tier decision), port allocation, batch operations, cross-project intelligence, and composite project health assessment (activity + completeness + outcomes). Schema v16 (current) adds the `email_account TEXT` column to `projects`; builds on v15's digest `named_terms`, v14's `bootstrap_primitives` + `project_type_recipe_steps` recipe runner tables, and v13's user-managed `project_types` and reclassified `areas`. 57 MCP tools, importable as @setlist/core by Chorus and Ensemble. Auth always lives upstream — setlist never stores, prompts for, or manages credentials of any kind: shell-commands inherit the user's shell environment, mcp-tool primitives delegate to the host MCP client's session, and Mail.app integration trusts the user's already-authenticated Mail.app session as the boundary. Plugin-style code primitives are explicitly out of scope for v1 to preserve the no-arbitrary-code-execution stance." readme: "Setlist is the TypeScript implementation of the Project Registry — both invisible infrastructure at the center of the user's personal ecosystem and a directly operable desktop control panel. 
As infrastructure, it provides structured, queryable identity for every project (organized under a user-managed list of areas seeded at install with seven sensible defaults — Work, Family, Home, Health, Finance, Personal, Infrastructure — that the user can rename, recolor, add to, or remove via Settings, with optional parent-child sub-project relationships and an optional per-project `email_account` field), with programmatic administration, capability declarations, unified portfolio memory (10 types with belief classification, temporal validity, entity storage, procedural versioning, four-level scoping with area bubble-up, and triple-gate stale-memory archival; today recall runs over FTS5 full-text search with a composite score over reinforcement, recency, and outcome history — vector retrieval and RRF fusion are studied aspirations consolidated in §2.12.1 and gated on an unresolved embedding-tier decision), port allocation, user-managed project types governing bootstrap behavior, batch operations, and cross-project intelligence via 57 MCP tools and direct library import. Bootstrap is user-composable: each project type carries an ordered **recipe** of bootstrap **primitives** the user assembles in Settings. Three primitive shapes ship in v1 — `filesystem-op` (folder, copy, .gitignore append), `shell-command` (verbatim multi-line strings with `{project.*}` substitution, runs in the project folder under the user's shell environment), and `mcp-tool` (delegates to a tool registered with the host MCP client) — composed from a peer Primitives panel of read-only built-ins plus user-authored entries. 
Five built-ins ship: `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, and as of spec 0.29 `mail-create-mailbox` (a `shell-command` driving Mail.app via AppleScript/`osascript` to create a nested mailbox under a specified account at a templated path; idempotent on duplicate names, pre-flight checks Mail.app is running but does NOT auto-launch it, account-not-found surfaces mid-run via the standard Retry/Skip/Abandon dialog). The five built-ins are user-droppable; an engine step, `register-in-registry`, is rendered as a non-draggable trailer always last. The bootstrap engine walks a snapshot of the recipe in order with combined dry-run + pre-flight tracing surfacing resolved operations and ✓/✗ markers; on mid-run failure it stops with Retry / Skip / Abandon — Retry resumes from the failed step without re-running succeeded shell or mcp-tool steps, Skip continues blindly, Abandon undoes filesystem and git and lists external side-effects honestly as structured `{step, primitive, summary}` entries (so a created Mail.app mailbox or a created Todoist project is named in plain language, not silently dropped). The token resolver supports `{project.name}`, `{project.path}`, `{project.type}`, `{project.parent_path}`, `{project.type.template_directory}`, and `{project.email_account}`. Auth always lives upstream: setlist never stores, prompts for, or manages credentials. The MCP server itself is the onboarding surface: a brief `instructions` paragraph at handshake names the registration → enrichment → digest workflow and the capability item shape; successful `register_project` and `bootstrap_project` responses carry `next_steps` arrays of `{action, why}` entries pointing at the immediate follow-up calls; a `setlist://docs/onboarding` MCP resource holds the full guide for any agent that wants depth; and `portfolio_brief` returns compact `enrichment_gaps` flagging projects missing fields agents need for cross-project reasoning. 
All four mechanisms are MCP-native, so any conforming client (Claude Code, Codex, Cursor, Continue, Cline, custom) onboards a new project without bespoke client integration, and heavy content lives once in the resource so the on-the-wire surfaces stay token-efficient. As a desktop application, it presents a card-grid dashboard of all projects with multiselect status filtering (archived hidden by default), togglable column visibility, compact/spacious row density, persistent sort across sessions, a configurable default landing view (grouped lanes or flat grid), tabbed project detail views (overview, memory, capabilities, ports), full project CRUD — register, edit, archive, rename, set/clear `email_account` — and a single-page Settings panel for managing areas, project types (including per-type default email account), primitives, view defaults, bootstrap roots, and update channels (Cmd-, opens Settings). All this runs through a native macOS Electron app sharing Chorus's design language (Tailwind CSS 4, Radix UI, terracotta accent, warm charcoal surfaces). The main process imports @setlist/core directly; no API layer sits between the UI and the registry. Distributed as four npm packages (@setlist/core, @setlist/mcp, @setlist/cli, @setlist/app), Setlist is directly consumable by Chorus, Ensemble, and any Node.js tool in the ecosystem, while also standing alone as a full-featured project management surface." 
tech-stack: [typescript, better-sqlite3, "@modelcontextprotocol/sdk", node, npm-monorepo, electron, electron-updater, react, tailwindcss-v4, radix-ui, applescript, osascript] - patterns: [user-managed-canonical-set, user-managed-project-types, user-composable-bootstrap-recipes, closed-shape-primitive-set, built-in-plus-user-authored, recipe-snapshot-on-retry, stop-and-report-resumable, dry-run-with-preflight, template-token-substitution, auth-lives-upstream, no-arbitrary-code-execution, seeded-then-mutable, seeded-built-in-not-in-default-recipe, structured-external-side-effects, lightweight-process-presence-preflight, resolver-precedence-project-then-default, applescript-for-mail-integration, recipe-step-bound-defaults, idempotent-shell-primitive, no-auto-launch-of-external-apps, menu-bar-persistence-with-tray-anchor, hide-dock-while-hidden-toggle, global-shortcut-with-master-toggle, destructive-actions-default-unbound, inline-conflict-feedback-not-toast, quit-prompt-update-handoff, atomic-prefs-write, plan-then-apply-two-phase-write, managed-entry-marker, backup-before-write, preserve-unmanaged-coexisting-entries, take-over-requires-explicit-consent, recovery-guidance-on-every-result, bounded-shallow-inspection, sanitized-credentials-in-extracted-signals, ignore-list-before-recursion, schema-versioned-extraction-report, first-class-empty-and-unavailable, on-demand-refresh, curated-color-palette, label-rename-stable-id, reassign-before-delete, structural-parent-child, area-scoped-memory-inheritance, atomized-fields, progressive-disclosure, producer-consumer, registration-not-discovery, invisible-infrastructure, operable-surface, config-file-scanning, hub-and-spoke, capability-declaration, definition-is-truth, fuzzy-match-suggestions, archive-triggered-cleanup, producer-attribution, summary-compactness, freshness-importance-scoring, invocation-metadata, retain-recall-reflect, outcome-aware-reinforcement, content-hash-dedup, embedding-provider-abstraction, 
budget-controlled-recall, four-level-scoping, hybrid-retrieval, belief-classification, temporal-validity, entity-extraction, procedural-versioning, unified-memory-store, template-driven-bootstrap, configure-then-use, shared-design-system, ipc-bridge, per-machine-view-prefs, native-vector-search, hierarchical-compaction, progressive-retrieval, knowledge-distillation, graph-gap-detection, mcp-startup-validation, progress-notification, worst-tier-wins, on-demand-assessment, qualitative-tiers, signed-notarized-builds, silent-download-prompt-before-install, two-channel-release, scenarios-as-contract, canaries-not-gates, narrow-ci-wide-local, edit-time-security-check, release-blocking-preflight, dual-abi-swap-and-restore, detect-and-recover-over-prevent, derived-essence-digest, spec-version-as-staleness-signal, external-generator-internal-store, hosted-digest-generation-with-local-fallback, document-extraction-for-non-code-digests, project-tagged-llm-cost-attribution, filetree-hash-as-staleness-signal, introspected-capability-declarations, client-independent-onboarding, mcp-instructions-on-initialize, next-steps-in-tool-response, mcp-onboarding-resource, enrichment-gap-annotations, pointer-shaped-protocol-surface, single-source-of-truth-resource] - goals: [user-managed-area-organization, user-managed-project-types, user-composable-project-bootstrap, mail-app-mailbox-on-bootstrap, per-project-email-account-targeting, customizable-home-view, sub-project-hierarchy, menu-bar-persistent-app, configurable-global-shortcuts, safe-mcp-client-install-and-remove, two-phase-plan-apply-flow, backup-and-preserve-coexisting-entries, signals-from-disk-on-registration, agent-and-user-readable-workspace-report, unified-project-identity, capability-discovery, programmatic-project-administration, batch-operations, cross-project-task-dispatch, conflict-free-port-allocation, automatic-port-discovery, async-task-execution, cross-project-intelligence, crash-resilient-worker, 
ranked-cross-project-results, capability-invocation-awareness, portfolio-memory, outcome-reinforcement, hybrid-retrieval, npm-packageable-distribution, canonical-memory-store, chorus-memory-unification, project-bootstrap-and-scaffolding, desktop-control-panel, project-dashboard, project-crud-ui, single-page-settings, implicit-connection-surfacing, fast-first-pass-recall, synthesized-knowledge-from-memory-clusters, memory-graph-blind-spot-detection, project-health-assessment, composite-tier-surfacing, glanceable-portfolio-health, auto-update-with-channels, project-essence-digests, digest-staleness-signal, cross-project-semantic-matching, non-code-project-digests, provider-agnostic-digest-generator, capability-self-registration, client-independent-agent-onboarding, token-efficient-protocol-surfaces] + patterns: [proactive-use-directive-instructions, open-vocabulary-with-canonical-normalization, verb-shaped-tool-names, idempotent-writes-with-documented-collapse-keys, append-only-interaction-log-derived-recency, ambiguous-envelope-on-identity-queries, bundled-claude-code-skills, idempotent-claudemd-rule-injection-between-markers, parametrized-test-fixture-per-storage-interface, user-managed-canonical-set, user-managed-project-types, user-composable-bootstrap-recipes, closed-shape-primitive-set, built-in-plus-user-authored, recipe-snapshot-on-retry, stop-and-report-resumable, dry-run-with-preflight, template-token-substitution, auth-lives-upstream, no-arbitrary-code-execution, seeded-then-mutable, seeded-built-in-not-in-default-recipe, structured-external-side-effects, lightweight-process-presence-preflight, resolver-precedence-project-then-default, applescript-for-mail-integration, recipe-step-bound-defaults, idempotent-shell-primitive, no-auto-launch-of-external-apps, menu-bar-persistence-with-tray-anchor, hide-dock-while-hidden-toggle, global-shortcut-with-master-toggle, destructive-actions-default-unbound, inline-conflict-feedback-not-toast, 
quit-prompt-update-handoff, atomic-prefs-write, plan-then-apply-two-phase-write, managed-entry-marker, backup-before-write, preserve-unmanaged-coexisting-entries, take-over-requires-explicit-consent, recovery-guidance-on-every-result, bounded-shallow-inspection, sanitized-credentials-in-extracted-signals, ignore-list-before-recursion, schema-versioned-extraction-report, first-class-empty-and-unavailable, on-demand-refresh, curated-color-palette, label-rename-stable-id, reassign-before-delete, structural-parent-child, area-scoped-memory-inheritance, atomized-fields, progressive-disclosure, producer-consumer, registration-not-discovery, invisible-infrastructure, operable-surface, config-file-scanning, hub-and-spoke, capability-declaration, definition-is-truth, fuzzy-match-suggestions, archive-triggered-cleanup, producer-attribution, summary-compactness, freshness-importance-scoring, invocation-metadata, retain-recall-reflect, outcome-aware-reinforcement, content-hash-dedup, embedding-provider-abstraction, budget-controlled-recall, four-level-scoping, hybrid-retrieval, belief-classification, temporal-validity, entity-extraction, procedural-versioning, unified-memory-store, template-driven-bootstrap, configure-then-use, shared-design-system, ipc-bridge, per-machine-view-prefs, native-vector-search, hierarchical-compaction, progressive-retrieval, knowledge-distillation, graph-gap-detection, mcp-startup-validation, progress-notification, worst-tier-wins, on-demand-assessment, qualitative-tiers, signed-notarized-builds, silent-download-prompt-before-install, two-channel-release, scenarios-as-contract, canaries-not-gates, narrow-ci-wide-local, edit-time-security-check, release-blocking-preflight, dual-abi-swap-and-restore, detect-and-recover-over-prevent, derived-essence-digest, spec-version-as-staleness-signal, external-generator-internal-store, hosted-digest-generation-with-local-fallback, document-extraction-for-non-code-digests, project-tagged-llm-cost-attribution, 
filetree-hash-as-staleness-signal, introspected-capability-declarations, client-independent-onboarding, mcp-instructions-on-initialize, next-steps-in-tool-response, mcp-onboarding-resource, enrichment-gap-annotations, pointer-shaped-protocol-surface, single-source-of-truth-resource, fire-and-forget-write-semantics, slug-plus-alias-reverse-map, lookup-first-capture-opportunistically, derived-not-stored-counters, retention-policy-on-append-only, llm-consumer-decides-on-ambiguity, sentinel-marker-injection, named-strategy-id-per-recipe, projection-not-source-of-truth, no-unresolved-placeholders-preflight-gate, fork-to-customize-built-ins, primitive-self-test-discriminates-env-from-bug, agent-decision-report-on-inspection, trust-disk-over-docs, three-bucket-verification-on-health, needs-human-verification-honest-degrade, dual-manifest-packaging-claude-and-codex, manifest-version-lock-validator, one-command-validate-gate, narrow-fix-safe-diagnose-only, no-runtime-deps-in-shell-primitives, decision-history-facet-on-digests, rejected-log-anti-reproposal, shared-external-file-injection-lineage] + goals: [proactive-agent-use-of-registry, open-vocabulary-with-normalization, fire-and-forget-idempotent-writes, derived-recency-and-frequency-from-event-log, ambiguous-not-silent-disambiguation, bundled-agent-onboarding-skills, idempotent-claudemd-rule, dual-host-bundled-skills, manifest-drift-validator, agent-readable-decision-report, env-vs-bug-bootstrap-routing, narrow-and-trusted-self-diagnostics, honest-degraded-health-bucket, fork-and-customize-built-ins, decision-history-on-digests, anti-reproposal-rejected-log, one-command-validate, user-managed-area-organization, user-managed-project-types, user-composable-project-bootstrap, mail-app-mailbox-on-bootstrap, per-project-email-account-targeting, customizable-home-view, sub-project-hierarchy, menu-bar-persistent-app, configurable-global-shortcuts, safe-mcp-client-install-and-remove, two-phase-plan-apply-flow, 
backup-and-preserve-coexisting-entries, signals-from-disk-on-registration, agent-and-user-readable-workspace-report, unified-project-identity, capability-discovery, programmatic-project-administration, batch-operations, cross-project-task-dispatch, conflict-free-port-allocation, automatic-port-discovery, async-task-execution, cross-project-intelligence, crash-resilient-worker, ranked-cross-project-results, capability-invocation-awareness, portfolio-memory, outcome-reinforcement, hybrid-retrieval, npm-packageable-distribution, canonical-memory-store, chorus-memory-unification, project-bootstrap-and-scaffolding, desktop-control-panel, project-dashboard, project-crud-ui, single-page-settings, implicit-connection-surfacing, fast-first-pass-recall, synthesized-knowledge-from-memory-clusters, memory-graph-blind-spot-detection, project-health-assessment, composite-tier-surfacing, glanceable-portfolio-health, auto-update-with-channels, project-essence-digests, digest-staleness-signal, cross-project-semantic-matching, non-code-project-digests, provider-agnostic-digest-generator, capability-self-registration, client-independent-agent-onboarding, token-efficient-protocol-surfaces] plugin-version: 0.83.1 --- ``` -Setlist is the TypeScript implementation of the Project Registry and the active intelligence hub for the user's personal ecosystem. 
Schema v16 (current) adds the `email_account TEXT` column to `projects` to support per-project email targeting for the new `mail-create-mailbox` bootstrap primitive; builds on v15's digest `named_terms`, v14's `bootstrap_primitives` and `project_type_recipe_steps` recipe-runner tables, v13's user-managed `project_types` table (replaces the `projects.type` CHECK with a FK into it) and the reclassified `areas` table (system-owned → user-managed), v12's `project_digests`, v11's structural `area_id` and `parent_project_id` columns, retired `area_of_focus` type, and v10's unified memory types, belief classification, temporal validity, entity extraction, and procedural versioning. 57 MCP tools covering project identity, capabilities, portfolio memory, digests, ports, task dispatch, bootstrap, cross-project queries, and composite project health. Native macOS desktop control panel with a single-page Settings panel (Areas, Project types, Primitives, View, Bootstrap, Updates) for direct human operation. Area-scoped memory inheritance. See §1.5 for origin and port history. +Setlist is the TypeScript implementation of the Project Registry and the active intelligence hub for the user's personal ecosystem. 
Schema v18 (current) extends the `bootstrap_primitives` table with `is_builtin INTEGER NOT NULL DEFAULT 0` (the canonical-vs-fork marker for the five seeded built-ins, surfaced as read-only in Settings with a Duplicate-to-customize affordance) and an optional `self_test_json TEXT` column (env-vs-bug pre-flight self-test hook), and extends the `project_digests` table with an optional `decision_history_json TEXT` column (bounded dated entries appended by bootstrap recipes and significant structural moves — recipe strategy ID, key parameters, area assignment, parent linking, archive); builds on v17's `interactions` append-only log, v16's `email_account TEXT` column on `projects`, v15's digest `named_terms`, v14's `bootstrap_primitives` and `project_type_recipe_steps` recipe-runner tables, v13's user-managed `project_types` and reclassified `areas`, v12's `project_digests`, v11's structural `area_id` and `parent_project_id` columns, and v10's unified memory types. 59 MCP tools (the new `setlist-doctor` tool diagnoses setlist-shaped problems — broken managed MCP entries, stale port/pid files, ABI cache mismatches, malformed CLAUDE.md sentinel markers, orphaned bootstrap side-effect ledgers — with `--json` for agents and a deliberately narrow `--fix-safe` confined to setlist-owned artifacts; the `vocab` tool surfaces canonical + in-use values for the open-vocabulary fields normalized at write time — `tech_stack`, `patterns`, `topics`, `capability_type`) covering project identity, capabilities, portfolio memory, digests, ports, task dispatch, bootstrap, cross-project queries, composite project health, and setlist self-diagnosis. Native macOS desktop control panel with a single-page Settings panel (Areas, Project types, Primitives, View, App behavior, Bootstrap, MCP clients, Updates) for direct human operation. Area-scoped memory inheritance. 
Two bundled skills (`/setlist-enroll-project`, `/setlist-portfolio-graph`) ship in the npm packages with **parallel `.claude-plugin/` and `.codex-plugin/` manifests over a single `skills/` tree**, validated by a `validate-skills-manifests` step that version-locks across all manifests and fails the build on drift; the installer copies them with rewritten absolute paths and idempotently injects a proactive-use rule into the user's CLAUDE.md between sentinel markers. See §1.5 for origin and port history. -The rewrite exists because Chorus (Electron + React) and Ensemble need the registry as a direct npm dependency, not a subprocess or MCP-only integration. @setlist/core provides the library API importable from any Node.js process. @setlist/mcp wraps it as an MCP server. @setlist/cli exposes it from the terminal. @setlist/app provides a desktop control panel — an Electron app that imports @setlist/core directly, giving the user a visual surface for project management alongside the programmatic interfaces. The behavioral contract is carried by the 168-scenario holdout set in `.fctry/scenarios.md` and evaluated by LLM-as-judge; vitest is available for targeted unit tests against @setlist/core but is not the truth signal. +The rewrite exists because Chorus (Electron + React) and Ensemble need the registry as a direct npm dependency, not a subprocess or MCP-only integration. @setlist/core provides the library API importable from any Node.js process. @setlist/mcp wraps it as an MCP server. @setlist/cli exposes it from the terminal. @setlist/app provides a desktop control panel — an Electron app that imports @setlist/core directly, giving the user a visual surface for project management alongside the programmatic interfaces. The behavioral contract is carried by the 227-scenario holdout set in `.fctry/scenarios.md` and evaluated by LLM-as-judge; vitest is available for targeted unit tests against @setlist/core but is not the truth signal. 
--- @@ -54,6 +54,7 @@ The rewrite exists because Chorus (Electron + React) and Ensemble need the regis - 2.14.3 [MCP Client Configuration](#2143-mcp-client-configuration) `#mcp-clients` - 2.15 [Project Health Assessment](#215-project-health-assessment) `#health-assessment` - 2.16 [Workspace Inspection](#216-workspace-inspection) `#workspace-inspection` + - 2.17 [Bundled Claude Code Skills and CLAUDE.md Rule Injection](#217-bundled-claude-code-skills-and-claudemd-rule-injection) `#bundled-skills` 3. [System Behavior](#3-system-behavior) - 3.1 [Core Capabilities](#31-core-capabilities) `#capabilities` - 3.2 [Things the System Keeps Track Of](#32-things-the-system-keeps-track-of) `#entities` @@ -103,13 +104,13 @@ Setlist is a TypeScript monorepo providing the Project Registry as four npm pack - **@setlist/core** -- The library. All registry logic: project identity, field model, variable-depth querying, filtering, migration, port management, capability declarations, portfolio memory (retain/recall/reflect), task queue, cross-project queries, batch operations. Importable from any Node.js process. This is what Chorus, Ensemble, and the desktop app consume directly. -- **@setlist/mcp** -- The MCP server. A thin translation layer wrapping @setlist/core as 57 MCP tools via @modelcontextprotocol/sdk, using stdio transport managed by Claude Code's lifecycle. +- **@setlist/mcp** -- The MCP server. A thin translation layer wrapping @setlist/core as 59 MCP tools via @modelcontextprotocol/sdk, using stdio transport managed by Claude Code's lifecycle. As of spec 0.33, the server's `instructions` envelope is shaped as a proactive-use directive (see `#capability-declarations` (2.11)) rather than a descriptive overview. As of spec 0.34, the surface adds `setlist-doctor` — a diagnose-only-by-default self-check tool with `--json` output and a deliberately narrow `--fix-safe` mode (see `#mcp-clients` (2.14.3)). - **@setlist/cli** -- The CLI. 
Terminal commands for project management, migration, worker installation, and diagnostics. Entry point: `setlist`. - **@setlist/app** -- The desktop control panel. An Electron application providing a visual surface for project management: a card-grid dashboard of all projects, tabbed project detail views, and project CRUD operations. The main process imports @setlist/core directly via an IPC bridge to the renderer. Consumes design tokens from the `chorus-ui` package (Tailwind CSS 4, Radix UI). Launchable as a standalone macOS .app bundle or via `setlist ui` from the CLI. -The four packages share the same SQLite database (schema v16 as of spec 0.29) at `~/.local/share/project-registry/registry.db`. Library consumers (Chorus, Ensemble) import `@setlist/core` directly rather than opening the file. +The four packages share the same SQLite database (schema v18 as of spec 0.34) at `~/.local/share/project-registry/registry.db`. Library consumers (Chorus, Ensemble) import `@setlist/core` directly rather than opening the file. ### 1.3 Design Principles {#design-principles} @@ -135,6 +136,14 @@ The four packages share the same SQLite database (schema v16 as of spec 0.29) at **Library-first, server-second.** @setlist/core is the primary interface. The MCP server, CLI, and desktop app are thin wrappers. Any capability available through MCP, CLI, or the UI must first be available as a library function. Chorus and Ensemble import @setlist/core directly; they never spawn subprocesses or connect to servers for registry operations. +**Fire-and-forget idempotent writes.** Every write surface in setlist carries an explicit *collapse key* — the field set that determines whether a repeated call is a no-op or a meaningful update — and that collapse key is documented in the tool/library description. Agents call the same write twice in a row without checking "did I already do this?" 
because the contract guarantees the second call observes the same end state without side effects (no duplicate rows, no reinforcement bumps where they aren't wanted, no audit-log churn). The S112–S117 "structurally drift-free" principle for capability self-registration generalizes here: idempotency is not a property of a single tool but a contract every write surface honors. See `#capability-declarations` (2.11) "Idempotent writes and collapse keys" for the per-tool table. Informed by `directory-mcp` (see §6.1, https://github.com/ePaint/directory-mcp), whose entire write API is shaped around the same fire-and-forget contract. + +**Open-vocabulary with canonical normalization.** Open-string fields (`tech_stack`, `patterns`, `topics`, `capability_type`) accept anything the producer writes — setlist never rejects an unknown term — but they are normalized at the write boundary against a canonical set with an alias reverse-map. `TypeScript`, `typescript`, and `ts` collapse to the same canonical slug; novel terms pass through unchanged and may rise into the canonical set later. A `vocab` MCP tool surfaces both the canonical set and the live in-use values so agents and humans can see what the registry already knows about. The point is not to constrain language but to make cross-project queries match what they should match without the consumer guessing at casing or synonyms. Informed by `directory-mcp` (see §6.1), whose `vocab.py` module ships this pattern as a `_FACETS` + `_slug` + `_reverse` triple. + +**Derived recency, never stored counters.** Recency and frequency signals (last touched, touch count, activity heat) are computed by query against an append-only interaction log, never stored as mutable counters. The log records every registry touch — `search_projects`, `recall`, `get_project`, `cross_query`, capability lookups — as an immutable row; `SELECT COUNT(*)` and `SELECT MAX(at)` derive the metrics on demand. 
This eliminates a class of bugs (counter drift, race conditions on concurrent updates) and gives `recall` ranking, `portfolio_brief` activity signals, and any future analytics surface a single canonical history to read from. A retention policy bounds the log so it doesn't grow without limit. See `#portfolio-memory` (2.12) "Append-only interaction log" for the schema and retention details. Informed by `directory-mcp` (see §6.1), whose interaction model is the same shape — adapted here with an explicit retention policy because setlist's portfolio is multi-project and longer-lived than directory-mcp's single-user assumption. + +**Ambiguous, not silent, on identity disambiguation.** When `search_projects`, `recall`, or any future identity-resolution surface matches multiple candidates above a confidence threshold, the response carries an explicit `{ambiguous: true, alternatives: [...]}` envelope rather than silently picking the top hit. The registry never drops candidates the consumer might want to see. Best-guess ordering stays connection-first (relevance, recency, then alphabetical) — the ranking is unchanged; the change is that the second-place candidate is no longer invisible to the caller. The LLM consumer (or the human user, in a UI surface) decides which candidate is the right one. See `#cross-project` (2.9) "Ambiguous response envelope" for the contract. + **Shared design language.** The desktop app consumes `chorus-ui`, an extracted design token package providing CSS custom properties and TypeScript constants for the shared visual language (terracotta accent, warm charcoal surfaces, Inter typeface). Setlist and Chorus are separate products that coexist in the same ecosystem — sharing design tokens from a single source makes them feel like siblings, not strangers. The tokens are imported as `chorus-ui/tokens.css` (CSS custom properties on `:root`) or `chorus-ui/tokens` (TypeScript constants). 
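The open-vocabulary principle above can be illustrated with a small write-boundary normalizer. This is a minimal sketch under stated assumptions, not setlist's implementation: `slugify`, `normalizeTerm`, `normalizeList`, and both alias maps are hypothetical names, the alias entries are illustrative only, and the exact order of the string transforms is assumed.

```typescript
// Hypothetical alias reverse-maps (normalized slug -> canonical slug).
// The entries are illustrative, not the registry's real canonical set.
const TECH_STACK_ALIASES = new Map<string, string>([
  ["ts", "typescript"],
  ["reactjs", "react"],
]);
const CAPABILITY_TYPE_ALIASES = new Map<string, string>([
  ["mcp-tool", "tool"],
]);

// Assumed transform order: lowercase, hyphenate, strip non-alphanumerics.
function slugify(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(/[\s_]+/g, "-") // whitespace and underscores become hyphens
    .replace(/[^a-z0-9-]/g, "") // drop everything else (dots, slashes, ...)
    .replace(/-+/g, "-") // collapse runs of hyphens
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Known aliases collapse to the canonical slug; novel terms pass through
// as their own slug. Nothing is ever rejected.
function normalizeTerm(raw: string, aliases: Map<string, string>): string {
  const slug = slugify(raw);
  return aliases.get(slug) ?? slug;
}

// Normalize an array field and drop duplicates created by the collapse,
// keeping first-seen order.
function normalizeList(terms: string[], aliases: Map<string, string>): string[] {
  const slugs = terms
    .map((term) => normalizeTerm(term, aliases))
    .filter((slug) => slug.length > 0);
  return Array.from(new Set(slugs));
}
```

Under these assumptions, `normalizeList(["TypeScript", "ts", "React.js"], TECH_STACK_ALIASES)` yields `["typescript", "react"]`, and an unknown term such as `"Better SQLite3"` passes through as `better-sqlite3`.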
### 1.4 What Success Looks Like {#success} @@ -148,7 +157,9 @@ The four packages share the same SQLite database (schema v16 as of spec 0.29) at - Cross-project queries (`cross_query`, `portfolio_brief`) return ranked results grounded in the registry's structured fields. - Composite project health assessment (`assess_health`) returns a qualitative tier (green/yellow/red) combining activity, completeness, and outcome signals. - Per-project essence digests (`get_project_digest`, `get_project_digests`, `refresh_project_digest`) carry free-form summaries suitable for embedding, semantic matching, or drop-in cross-project context. Digests are versioned deterministically — by the source spec version for code projects, or by a file-tree hash for non-code projects whose content lives in heterogeneous documents. The generator defaults to a hosted provider (Gemini 2.5 Flash-Lite via OpenRouter) with a local MLX fallback and extracts markdown from PDFs and Office documents when needed. -- Behavioral correctness is carried by the 168-scenario holdout set in `.fctry/scenarios.md` (see §4.5 `#testing-discipline`), evaluated by LLM-as-judge; vitest is available for targeted unit tests against @setlist/core where a fast, local signal is useful. +- Behavioral correctness is carried by the 227-scenario holdout set in `.fctry/scenarios.md` (see §4.5 `#testing-discipline`), evaluated by LLM-as-judge; vitest is available for targeted unit tests against @setlist/core where a fast, local signal is useful. +- `npm run validate` is the one-command CI gate that fans out to type checking, unit tests, build, the pre-flight ABI check, and the `validate-skills-manifests` step that locks Claude-Code and Codex bundled-skill manifests in sync. Individual commands stay available for targeted runs; `validate` is the meta — green here is the bar (see §4.5 `#testing-discipline`). 
+- `setlist-doctor --json` produces a machine-readable health report agents can act on; `setlist-doctor --fix-safe` is the narrow recovery path for the explicitly-listed setlist-shaped problems and never edits anything outside setlist artifacts (see `#mcp-clients` (2.14.3)). - The desktop app launches as a standalone macOS .app and via `setlist ui`. It displays a card grid of all registered projects, allows navigating to project detail tabs, and supports registering, editing, archiving, and renaming projects through the UI. - Changes made through the desktop app are immediately visible to agents via MCP and library import. Changes made by agents are visible in the desktop app on next render. - The desktop app enforces single-instance: launching a second instance activates the existing window. @@ -430,6 +441,17 @@ Cross-project query results include producer and timestamp attribution -- which Cross-project query results are ranked by a combination of relevance, freshness, and importance -- not returned as a flat, unordered list. Recent matches outrank older ones through time-decay weighting on the existing `updated_at` timestamps. Matches containing high-signal keywords (e.g., "decision", "architecture", "critical", "breaking change") receive an importance boost. Core identity fields (name, type, status) are treated as evergreen and do not decay. The scoring is deterministic and runs entirely within the existing SQLite query. +**Ambiguous response envelope.** When `search_projects`, `recall`, or `cross_query` finds multiple candidates above a confidence threshold for an identity-shaped query — e.g., "the registry project," "knowmarks," "setlist" — the response carries an explicit `{ambiguous: true, alternatives: [...]}` envelope rather than silently picking the top hit. Specifics: + +- The envelope appears alongside the top-ranked result, not in place of it. The shape is `{result: , ambiguous: true, alternatives: [{name, score, why}, ...]}`. 
When the top match is unambiguous, `ambiguous` is `false` (or omitted) and `alternatives` is empty (or omitted). +- The confidence threshold is a relative gap, not an absolute floor: if the second-place candidate's score is within ~15% of the top score, the response is ambiguous. Calibration is a build-time decision the agent owns; the spec commits to the contract (relative gap, not absolute), not the exact percentage. +- The `alternatives` array contains up to four entries by default — enough for the consumer to choose, not so many it becomes ambient noise. Each entry includes `name`, `score`, and a one-line `why` explaining the match (e.g., "matches topic `mcp`," "matches description fragment `project registry`," "name fuzzy match distance 2"). +- Best-guess ordering stays connection-first: the top `result` is still chosen by the existing relevance + freshness + importance scoring. The ambiguity signal sits on top of the ranking — it does not change which candidate ranks first. +- The LLM consumer decides what to do with the alternatives. A coding agent may pick the top match silently when the user's prompt context resolves the ambiguity, or it may ask the user. A UI surface (the desktop app's command palette, for example) renders the alternatives as picker rows. The registry's contract ends at returning the envelope; consumer policy is out of scope. +- The contract is **strictly additive**: callers that ignore `ambiguous` and `alternatives` see the same `result` field they always did. No existing caller breaks. + +The same envelope appears on `search_projects` (name-shaped queries), `recall` (project-scoped memory queries when the project name is itself ambiguous), and `cross_query` (when the natural-language query resolves to multiple candidate projects). Cited inline from `directory-mcp` (see §6.1, https://github.com/ePaint/directory-mcp), whose `_dossier_dict` returns this shape today. 
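The envelope contract above can be sketched as a pure function over an already-ranked candidate list. This is a hedged illustration, not the registry's code: `Candidate`, `Envelope`, and `buildEnvelope` are hypothetical names, the 0.15 gap is the spec's approximate figure used as a placeholder, and limiting `alternatives` to candidates inside the gap is an assumption (the spec only caps the array at four).

```typescript
interface Candidate {
  name: string;
  score: number; // assumed positive, higher is better
  why: string; // one-line explanation of the match
}

interface Envelope {
  result: Candidate | null; // top-ranked match; its real shape is not specified here
  ambiguous: boolean;
  alternatives: Candidate[];
}

const RELATIVE_GAP = 0.15; // placeholder; calibration is a build-time decision
const MAX_ALTERNATIVES = 4;

// `ranked` must already be sorted best-first by the existing
// relevance + freshness + importance scoring; ranking is not changed here.
function buildEnvelope(ranked: Candidate[]): Envelope {
  const [top, ...rest] = ranked;
  if (top === undefined) {
    return { result: null, ambiguous: false, alternatives: [] };
  }
  // Relative gap, not an absolute floor: compare against the top score.
  const floor = top.score * (1 - RELATIVE_GAP);
  const close = rest
    .filter((candidate) => candidate.score >= floor)
    .slice(0, MAX_ALTERNATIVES);
  return { result: top, ambiguous: close.length > 0, alternatives: close };
}
```

Because `result` is always the top-ranked candidate, a caller that ignores `ambiguous` and `alternatives` sees the same answer as before, which is the additive property the contract requires.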
+ ### 2.10 Port Registry {#port-registry} The registry manages port allocation across all project services -- dev servers, databases, MCP servers, debuggers, websockets, anything that binds a port. Ports are a shared, finite resource across the ecosystem; the registry ensures no two projects claim the same port. @@ -534,7 +556,7 @@ Setlist self-registers its own capabilities on every MCP server startup. The ope Three surfaces are registered on startup, each introspected directly from the running code: -- **MCP tools** — every tool the server exposes (currently 56), registered with `type: tool`. The source of truth is the tool-registration array in the MCP server entrypoint; whatever the server is about to expose to clients is what lands in the registry. +- **MCP tools** — every tool the server exposes (currently 58), registered with `type: tool`. The source of truth is the tool-registration array in the MCP server entrypoint; whatever the server is about to expose to clients is what lands in the registry. - **CLI commands** — every top-level subcommand in `@setlist/cli` (e.g., `migrate`, `digest refresh`, `ui`), registered with `type: cli-command`. The source of truth is the command-registration structure in `packages/cli/src/index.ts`. - **Library exports** — every public export from `@setlist/core`, registered with `type: library`. The source of truth is the package's public API entrypoint (`packages/core/src/index.ts`). @@ -548,7 +570,7 @@ Introspection is the source of truth — no hand-maintained `capabilities.json` Setlist's MCP server carries the full registration → enrichment → digest workflow on the protocol surface itself, so any conforming MCP client (Claude Code, Codex, Cursor, Continue, Cline, custom) can onboard a new agent into the registry without bespoke client integration. Four mechanisms layer for token efficiency — heavy content lives once in the resource; the rest are lightweight pointers. -1. 
**Server `instructions` on initialize.** At handshake, setlist returns a brief paragraph (under ~150 words) naming what it is, the four-step workflow (register → enrich → write fields → refresh digest), the capability item shape (`name` / `capability_type` / `description` required, five optional fields), and a pointer to the onboarding resource. Compliant MCP clients pass it to the model as a session-level hint, read once per session. +1. **Server `instructions` on initialize — proactive-use directive.** At handshake, setlist returns a brief paragraph (under ~150 words) shaped as a **directive in imperative voice**, not a descriptive overview. The paragraph instructs the consuming model to **USE PROACTIVELY**, with three short rules: **LOOK UP FIRST** (before asking the user which project they're in, call `get_project` or `search_projects`), **CAPTURE OPPORTUNISTICALLY** (when a project's identity, capabilities, or memory state changes during a session, write it back via `register_project` / `register_capabilities` / `retain` without waiting for permission), and **STAY CONSISTENT** (use the canonical vocabulary surfaced by `vocab` for `tech_stack` / `patterns` / `topics` / `capability_type` rather than coining synonyms). The directive still names what setlist is, the four-step workflow, the capability item shape, and the pointer to the onboarding resource — but those facts sit underneath the directive, not at the top. Compliant MCP clients pass it to the model as a session-level hint, read once per session. The shift from descriptive to imperative is informed by `directory-mcp` (see §6.1, https://github.com/ePaint/directory-mcp), whose `_INSTRUCTIONS` constant in `directory/mcp/api.py:74-95` ships this exact shape; the change is editorial, not protocol-level, so no MCP client needs an update to consume it. 2. 
**`next_steps` in registration responses.** Successful `register_project` and `bootstrap_project` responses include a structured `next_steps` array — each entry shaped as `{action: , why: }` — listing the immediate follow-up calls in order (typically `enrich_project`, `write_fields`, `register_capabilities`, `refresh_project_digest`). The same shape returns from `enrich_project`, `write_fields`, and `register_capabilities` themselves; the array shortens as fields fill in and ends as `[]` when no gaps remain. The recipe travels in the response that confirmed the action — no separate fetch. 3. **`setlist://docs/onboarding` resource.** The full enrichment guide — identity / profile / fields / capabilities / digest, with field semantics and write-good-descriptions guidance — is exposed via `ListResources` / `ReadResource`. This is the single source of truth for agent-facing onboarding documentation; `instructions` and `next_steps` both reference it but never duplicate its content. Fetched on demand only. 4. **`portfolio_brief` enrichment-gap annotations.** The `portfolio_brief` response includes an `enrichment_gaps` array of compact `{project, missing: [field, ...]}` entries flagging registered projects that lack fields agents need for cross-project reasoning (e.g., `description`, `tech_stack`, `digest`). No prose, no scoring. The agent comparing its own working directory against `list_projects` detects "I'm not yet registered"; the gap annotations guide enrichment for projects that are. @@ -557,6 +579,44 @@ Setlist's MCP server carries the full registration → enrichment → digest wor The four mechanisms are MCP-native, not Claude-specific: any client implementing the MCP `instructions` field, structured tool responses, the resources capability, or `portfolio_brief` calls participates in the same workflow without code changes. 
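The shrinking `next_steps` array described in mechanism 2 can be sketched as a function of what a project still lacks. This is only an illustration: `NextStep`, `EnrichmentState`, its four flags, and the `why` strings are hypothetical, and treating `action` and `why` as plain strings is an assumption about the response shape.

```typescript
interface NextStep {
  action: string; // assumed to be the name of the follow-up tool call
  why: string; // assumed to be a short human-readable reason
}

// Hypothetical summary of which parts of a project record are filled in.
interface EnrichmentState {
  enriched: boolean;
  hasFields: boolean;
  hasCapabilities: boolean;
  digestFresh: boolean;
}

// Emit follow-up calls in the typical order named in the spec,
// skipping any step whose gap is already closed.
function nextSteps(state: EnrichmentState): NextStep[] {
  const steps: NextStep[] = [];
  if (!state.enriched) {
    steps.push({ action: "enrich_project", why: "profile is incomplete" });
  }
  if (!state.hasFields) {
    steps.push({ action: "write_fields", why: "no extended fields written yet" });
  }
  if (!state.hasCapabilities) {
    steps.push({ action: "register_capabilities", why: "no capabilities declared" });
  }
  if (!state.digestFresh) {
    steps.push({ action: "refresh_project_digest", why: "digest is missing or stale" });
  }
  return steps;
}
```

A freshly registered project gets all four entries, and a fully enriched one gets `[]`, so the agent can stop as soon as the array is empty.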
+**Open-vocabulary fields with canonical normalization:** + +Four fields in setlist are open strings — agents can write anything — but the registry normalizes the input at the write boundary so that synonyms, casing variants, and aliases collapse to a single canonical slug for cross-project queries. The four fields are: + +- `tech_stack` (on `register_project` / `enrich_project` / `write_fields`) +- `patterns` (same surfaces) +- `topics` (on `enrich_project`) +- `capability_type` (on `register_capabilities`) + +**Normalization mechanics.** Each field has a built-in canonical set with an alias reverse-map. The normalizer applies, in order: lowercase, hyphenate, strip non-alphanumeric, then look up against the alias map. `TypeScript`, `typescript`, `ts`, and `TS` all collapse to canonical slug `typescript`. `React`, `react.js`, `reactjs` all collapse to `react`. `MCP tool`, `tool`, `mcp_tool` all collapse to `tool` for `capability_type`. Novel terms not in the alias map pass through normalized (lowercase + hyphenate) but unchanged in meaning — the registry **never rejects an unknown term**. The canonical set grows over time by editorial decision, not by user action; new aliases land in the next spec evolve. + +**Write-time only.** Normalization happens at the write boundary in `@setlist/core` — once on the way in, never on the way out. Stored values are the normalized canonical slugs. Consumers reading the data see canonical slugs and rely on them for cross-project matching. There is no second normalization pass at read time, and no compatibility shim translating old non-canonical writes — pre-0.33 rows with `TypeScript` continue to read as `TypeScript` until the next write rewrites them through the normalizer. + +**The `vocab` MCP tool.** A new MCP tool, `vocab`, surfaces both the canonical set and the live in-use values for any of the four normalized fields, so agents and humans can see what the registry already knows about. 
The response shape is `{field: , canonical: [, ...], in_use: [{slug, count}, ...], aliases: {: [, ...]}}`. `canonical` lists the editorially-curated slugs for the field; `in_use` lists every slug that currently appears in at least one project row, with its occurrence count; `aliases` reveals which alias variants collapse into each canonical slug, so a curious agent can understand why its `TypeScript` write surfaced as `typescript`. The tool brings setlist to 58 MCP tools. Agents call `vocab` before composing a `tech_stack` array if they want to align with the established vocabulary, and skip it when they want to write freely (the registry handles both). + +This pattern is cited inline with `directory-mcp` (see §6.1, https://github.com/ePaint/directory-mcp), whose `directory/vocab.py` ships the same `_FACETS` + `_slug` + `_reverse` + `normalize` + `suggested` set. The adaptation: directory-mcp uses Python dictionaries; setlist uses TypeScript const records. The contract is the same. + +**Idempotent writes and collapse keys:** + +Every write surface in setlist is idempotent — repeated identical calls produce the same end state with no unintended side effects (no duplicate rows, no spurious reinforcement, no audit-log churn, no version-chain inflation). The mechanism is an explicit **collapse key** per write surface, documented in the tool description so that agents can fire-and-forget without checking "did I already do this?" + +The collapse key per surface: + +| Tool / surface | Collapse key | Idempotent behavior on repeat | +|----------------|--------------|-------------------------------| +| `register_project` | `name` | Repeat with identical fields is a no-op. Repeat with changed fields is the equivalent of `update_project` (existing row updated, no duplicate). | +| `register_capabilities` | `(project_id, capability set)` | Replace semantics (already documented in §2.11). Repeat with identical set is a no-op (no UPDATE issued, no `updated_at` bump).
| +| `retain` | `(content_hash, project_id, scope)` | Repeat reinforces the existing memory (increments `reinforcement_count`, bumps `last_reinforced`) rather than duplicating. **Reinforcement is intentional behavior, not a side effect** — the contract is "no new row," not "no observable change." Callers that want pure no-op behavior (e.g., bulk reimport) pass `is_static=true` to suppress reinforcement. | +| `write_fields` | `(project_id, field_name)` | Repeat with identical value is a no-op (no UPDATE, no `updated_at` bump on the field row). Repeat with new value updates the field; the prior value is not preserved in a history table (writes are last-writer-wins on a per-field basis, which is the §2.5 producer-isolation contract). | +| `enrich_project` | `(project_id, field_name)` per array element | Union semantics: each goal/topic/entity is collapsed by string equality against the existing array; repeats are no-ops. | +| `claim_port` | `(port, project_id)` | Repeat by the same project is a no-op success. Repeat by a different project fails with the existing port-conflict error. Releasing and re-claiming is two operations, not one — the collapse key is per-call, not per-port-lifecycle. | + +Idempotency on `retain` is the subtle case: reinforcement is an outcome that affects scoring, so it is **not** invisible — the recall ranking will reflect the reinforcement on the next call. The contract distinction is that idempotency means "no new row, no broken invariant," not "no observable change anywhere in the system." Agents writing in tight loops should batch retains rather than firing them repeatedly; the contract makes the repetition safe, not free. + +Every tool description in the MCP server's schema names its collapse key in the `description` field, so an agent reading the tool list at handshake sees the idempotency contract per-tool without consulting external documentation. 
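To make the `retain` row of the table concrete, here is a minimal in-memory sketch of a collapse-keyed write. It assumes the raw content string can stand in for `content_hash`, that a new row starts with a reinforcement count of zero, and that the `RetainStoreSketch` class and its method names are hypothetical; the real store is SQLite-backed.

```typescript
// Sketch only: an in-memory stand-in for the memory table.
interface MemoryRow {
  content: string;
  projectId: number;
  scope: string;
  reinforcementCount: number; // assumed to start at 0 for a new row
  lastReinforced: string;     // ISO timestamp
}

class RetainStoreSketch {
  private rows: Record<string, MemoryRow | undefined> = {};

  // Collapse key: (content, projectId, scope). The content string stands in
  // for content_hash here, which is an assumption of this sketch.
  retain(content: string, projectId: number, scope: string, at: string, isStatic: boolean = false): MemoryRow {
    const key = JSON.stringify([content, projectId, scope]);
    const existing = this.rows[key];
    if (existing === undefined) {
      const created: MemoryRow = { content, projectId, scope, reinforcementCount: 0, lastReinforced: at };
      this.rows[key] = created;
      return created;
    }
    // A repeat never adds a row. It reinforces unless the caller opts out.
    if (!isStatic) {
      existing.reinforcementCount += 1;
      existing.lastReinforced = at;
    }
    return existing;
  }

  rowCount(): number {
    return Object.keys(this.rows).length;
  }
}
```

Calling `retain` twice with the same triple leaves one row with a bumped count, while passing `isStatic` as `true` on the repeat leaves the row untouched, which matches the "no new row" contract rather than a "no observable change" contract.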
This pattern generalizes the S112–S117 structural drift-freeness from the capability-registration loop to the entire write surface. Informed by `directory-mcp` (see §6.1, https://github.com/ePaint/directory-mcp), whose entire write API is documented around the same fire-and-forget contract. + +**Naming guideline for new MCP tools.** For MCP tools added in spec 0.33+, favor verb-shaped, agent-intention names (e.g., `vocab`, `whois`, `remember`, `relate`, `link`) over schema-leaking names (e.g., `write_fields`, `batch_update`). The existing 57 tool names are **not** renamed — that would break every consumer — but this convention governs additions going forward. The new `vocab` tool described above follows the guideline. This is a forward-looking editorial constraint, not a scenario; it informs review of future evolves that add tools. Footnote-prominence reference: `directory-mcp` (see §6.1), whose module docstring at `directory/mcp/api.py:7-11` articulates the same principle. + ### 2.12 Portfolio Memory {#portfolio-memory} The registry maintains a structured memory store that transforms it from a static project phone book into a learning system. Agents retain decisions, outcomes, patterns, preferences, dependencies, corrections, learnings, observations, and procedural memories as typed entries. These memories are recalled via budget-controlled FTS5 full-text search today (with vector / graph / RRF-fusion retrieval studied as aspirations in §2.12.1), reinforced by build outcomes, and consolidated through background reflection. @@ -708,6 +768,24 @@ The response is structured data, not synthesized prose. The calling agent (e.g., - `inspect_memory` -- View a specific memory's full details. - `configure_memory` -- Set memory configuration: embedding provider, reflect schedule, reflect threshold. 
+**Append-only interaction log:** + +Setlist maintains an append-only `interactions` log of every registry touch — `search_projects`, `recall`, `get_project`, `cross_query`, `query_capabilities`, capability lookups, and any future read surface. Every call records one row at the start of the call (so failed calls are also logged, which is itself a signal). Rows are immutable: there is no UPDATE, no DELETE, no in-place counter; recency and frequency derive from the log by query. + +**Row shape.** Each row carries `(id INTEGER PK AUTOINCREMENT, project_id INTEGER REFERENCES projects(id) ON DELETE CASCADE, surface TEXT NOT NULL, query TEXT, at TEXT NOT NULL, session_id TEXT, agent_role TEXT)`. `surface` is the tool or library entrypoint that wrote the row (e.g., `search_projects`, `recall`, `get_project`). `query` is the natural-language or structured query the caller passed (nullable for surfaces with no query, e.g., `get_project`). `project_id` is the project the touch applies to when known; cross-project queries write one row per matched project. `at` is ISO timestamp. `session_id` and `agent_role` propagate from the calling context when available, so analytics can distinguish agent-driven touches from desktop-app browsing. + +**Derived recency and frequency.** Three signals are computed by query, never stored: + +- **Last touched per project:** `SELECT MAX(at) FROM interactions WHERE project_id = ?`. Surfaces in `portfolio_brief` activity, the Home view's "Recently active" sort, and recall ranking as a recency boost. +- **Touch count per project per window:** `SELECT COUNT(*) FROM interactions WHERE project_id = ? AND at > datetime('now', '-7 days')`. Surfaces in `portfolio_brief` as a frequency signal and feeds recall's outcome-aware reinforcement as an ambient access-pattern term. 
+- **Recall ranking term:** The recall scorer adds a small recency boost to memories whose `project_id` has been touched recently — frequently-visited projects' memories rank slightly higher in cross-project recall. The boost is bounded so it cannot dominate content relevance. + +**Retention policy.** Unlike `directory-mcp`'s no-cap single-user design, setlist's portfolio is multi-project and longer-lived, so the log carries an explicit retention policy: **each bound applies independently — a row is pruned when EITHER it is older than 90 days OR the project's row count exceeds 10,000, in which case the oldest excess drops.** The two bounds compose restrictively because bounded growth is the actual safety property; whichever bound is tighter for a given row wins. The 90-day window preserves the activity signal across the recall freshness window (memories decay over 90+ days; the log retains the touches that informed that decay). The 10k-per-project cap prevents the log from growing without limit on hot projects with continuous agent activity. Pruning is hard delete (not soft) because the rows are themselves the source of truth for derived counters — soft-deleted rows would skew the COUNT(*). Both pruning phases (the age pass and the per-project cap pass) run in a single transaction so a mid-prune failure rolls back the whole pass. The policy is configurable via `configure_memory` (new keys: `interactions_retention_days` default 90, `interactions_max_rows_per_project` default 10000), with both knobs validated as positive integers at the MCP boundary. + +**Negligible write-path latency cost.** The interaction-log write is a single synchronous SQLite INSERT on `better-sqlite3`'s synchronous API — measured in microseconds, well below the existing per-call latency floor of the surfaces it records. Pruning runs in the background reflection cycle, not on the hot read path. + +**Schema v17.** The `interactions` table is added in schema v17. The migration plan and full DDL appear in `#schema` (5.2).
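The retention policy and the "last touched" derivation can be sketched over an in-memory array. This is an illustration under stated assumptions: the `InteractionRow` shape keeps only the columns the logic needs, `pruneInteractions` and `lastTouched` are hypothetical helper names, and the real implementation would issue hard deletes inside one SQLite transaction rather than return a filtered copy.

```typescript
// Sketch only: models the two retention bounds as a pure function.
interface InteractionRow {
  id: number;
  projectId: number;
  surface: string;
  at: string; // ISO timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

function pruneInteractions(
  rows: InteractionRow[],
  nowIso: string,
  retentionDays: number = 90,
  maxRowsPerProject: number = 10000
): InteractionRow[] {
  const cutoff = Date.parse(nowIso) - retentionDays * DAY_MS;
  const byProject: Record<string, InteractionRow[] | undefined> = {};
  for (const row of rows) {
    // Age bound: rows older than the retention window are dropped.
    if (Date.parse(row.at) < cutoff) continue;
    const key = String(row.projectId);
    const bucket = byProject[key];
    if (bucket === undefined) byProject[key] = [row];
    else bucket.push(row);
  }
  const kept: InteractionRow[] = [];
  for (const key of Object.keys(byProject)) {
    const bucket = byProject[key];
    if (bucket === undefined) continue;
    // Cap bound: keep only the newest rows per project; oldest excess drops.
    const newestFirst = bucket
      .slice()
      .sort((a, b) => Date.parse(b.at) - Date.parse(a.at) || b.id - a.id);
    for (const row of newestFirst.slice(0, maxRowsPerProject)) kept.push(row);
  }
  return kept.sort((a, b) => a.id - b.id);
}

// In-memory equivalent of SELECT MAX(at) ... WHERE project_id = ?
function lastTouched(rows: InteractionRow[], projectId: number): string | null {
  let latest: string | null = null;
  for (const row of rows) {
    if (row.projectId !== projectId) continue;
    if (latest === null || Date.parse(row.at) > Date.parse(latest)) latest = row.at;
  }
  return latest;
}
```

Either bound alone is enough to drop a row, so a quiet project is trimmed by age and a hot project is trimmed by count.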
Cited inline from `directory-mcp` (see §6.1, https://github.com/ePaint/directory-mcp), whose interaction-log shape (CLAUDE.md "Core model" bullet; README "How disambiguation works") is the direct precedent. Setlist's adaptation adds the retention policy and the cascade-on-archive foreign key. + #### 2.12.1 Deferred Aspirations {#deferred-aspirations} The following memory behaviors have accumulated across spec evolutions as studied designs. None of them are built today. Each is gated on the same upstream resolution: **the embedding-tier decision** (see "Embedding provider model" above) — whether setlist's default posture is OpenAI, Ollama, or FTS5-only. That choice governs which retrieval legs exist, which in turn governs which of these behaviors are feasible. Framing follows the EngramMemory / OpenKL pattern-study style established in §6.1 `#inspirations` and earlier evolve cycles: catalog entries that inform design, not commitments to build. @@ -730,7 +808,24 @@ The following memory behaviors have accumulated across spec evolutions as studie ### 2.13 Project Bootstrap {#project-bootstrap} -The registry can create a new project end-to-end: registering its identity, scaffolding its folder, populating it from templates, optionally initializing version control, optionally creating a Mail.app mailbox for the project's correspondence, and optionally running additional steps the user has composed onto the project's type. Bootstrap is a **recipe runner**: each project type carries an ordered list of bootstrap **primitives** with bound parameters, and the engine walks that recipe in order. This subsumes the manual `new-project.sh` shell script entirely — bootstrap is a first-class registry operation available through MCP, CLI, and library import. Behavioral contract: scenarios S139–S168. 
+The registry can create a new project end-to-end: registering its identity, scaffolding its folder, populating it from templates, optionally initializing version control, optionally creating a Mail.app mailbox for the project's correspondence, and optionally running additional steps the user has composed onto the project's type. Bootstrap is a **recipe runner**: each project type carries an ordered list of bootstrap **primitives** with bound parameters, and the engine walks that recipe in order. This subsumes the manual `new-project.sh` shell script entirely — bootstrap is a first-class registry operation available through MCP, CLI, and library import. Behavioral contract: scenarios S139–S168 (recipe runner, mail-create-mailbox); scenarios S212–S217 (spec 0.34: named strategy IDs, `is_builtin` fork-to-customize, primitive self-test routing, `bootstrap-config.json` projection). + +**Named recipe strategies.** Every recipe carries a **stable strategy identifier** — e.g., `code-with-mail-v2`, `non-code-area-v1`, `code-baseline-v1` — that names the *what* of the recipe independently of the *how* (which primitives, in what order). The strategy ID is editable from the project type editor (one line above the Steps list), defaults to `-v1` for any new type, and is appended to: + +- The bootstrap response's `executed_steps` payload (`strategy_id` alongside `executed_steps`) +- The `bootstrap-config.json` projection (see below) +- The project digest's `decision_history` (see `#entities` (3.2)) — every successful bootstrap appends one entry naming the strategy ID and the bound key parameters (e.g., `2026-06-07: bootstrapped via code-with-mail-v2, area=infrastructure, email=mike@h3r3.com`) +- The `assess_health` output (under a small `provenance` field, when present) + +Strategy IDs are user-visible names, not opaque hashes — the user can read them in the digest and the bootstrap report and immediately know which shape the project was bootstrapped under. 
Renaming a strategy is a label-only update: existing projects' digests retain the historical ID they were bootstrapped under, while future bootstraps pick up the new ID. The strategy ID *selects which linear recipe to run*; setlist does not introduce conditional or branching primitives in v1 (treating branching as a candidate for a future §4.3 boundary question if and when real demand for it surfaces, since the closed three-shape primitive set is a hard constraint). Adopted from `app-it` (§6.1, https://github.com/Christian-Katzmann/app-it) — see `references/strategies.md` in app-it. + +**`is_builtin: true` on built-in primitives, fork-to-customize.** The five seeded built-in primitives (`create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, `mail-create-mailbox`) are marked `is_builtin: true` in the registry (schema v18, see `#schema` (5.2)). Built-ins are read-only across every surface: the Settings → Primitives panel renders the built-ins block as read-only entries with a **`Duplicate to customize`** affordance beside each one. Clicking Duplicate creates a user-owned copy with `is_builtin: false`, prefilled name `-custom` (user-editable, must be unique), the same shape and parameter declarations, and a `forked_from: ` provenance pointer (stored on the row, surfaced in the primitive picker). User copies are fully editable. The original built-in is never modified by the fork. Recipes that reference the original built-in continue to reference it; the user explicitly swaps the recipe step to the forked copy in the type editor if they want the customized variant. **Built-ins ship locked to the setlist binary version** — a setlist upgrade can update a built-in's shape and the user's existing recipes pick it up; a setlist upgrade never touches user-owned forks. Adopted from `app-it` (§6.1) — see `AGENTS.md:33` and `references/generated-files.md:30`.
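The fork-to-customize behavior can be sketched as a pure function. The `PrimitiveRow` shape and the `duplicateToCustomize` name are hypothetical, and two details are assumptions of this sketch rather than statements from the spec: the prefilled name is taken to be the built-in's name plus `-custom`, and `forkedFrom` is taken to hold the built-in's name.

```typescript
// Sketch only: field names mirror the spec's is_builtin / forked_from columns,
// but the row shape and helper are illustrative.
type PrimitiveShape = "filesystem-op" | "shell-command" | "mcp-tool";

interface PrimitiveRow {
  name: string;
  shape: PrimitiveShape;
  params: string[];
  isBuiltin: boolean;
  forkedFrom: string | null;
}

function duplicateToCustomize(
  source: PrimitiveRow,
  existingNames: string[],
  desiredName?: string
): PrimitiveRow {
  // Assumption: the affordance is offered on built-ins only.
  if (!source.isBuiltin) {
    throw new Error("only built-in primitives offer Duplicate to customize");
  }
  // Assumption: the prefilled name is "<built-in name>-custom".
  const name = desiredName === undefined ? source.name + "-custom" : desiredName;
  if (existingNames.indexOf(name) !== -1) {
    throw new Error("primitive name must be unique: " + name);
  }
  // The copy gets its own params array so edits never reach the original.
  return {
    name: name,
    shape: source.shape,
    params: source.params.slice(),
    isBuiltin: false,
    forkedFrom: source.name,
  };
}
```

Returning a fresh row instead of mutating the source keeps the "original built-in is never modified" guarantee structural rather than a matter of discipline.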
+ +**Primitive self-test hooks (env-vs-bug routing).** Each primitive carries an optional `self_test` hook — a small declarative or shell-based probe the engine runs during pre-flight against a sandbox/dry-target before the real invocation. The self-test's purpose is **routing failures correctly**: when the real primitive fails, the self-test result discriminates `env_unavailable: ` (the user's environment isn't set up — Mail.app not running, network unreachable, MCP server not registered, missing binary on PATH) from `primitive_failure: ` (the primitive itself is broken or the user's command is wrong). Self-tests run in pre-flight order before any side effect; a failed self-test surfaces in the dry-run trace as a third marker — `?` for env-unavailable alongside the existing `✓` / `✗` — and aborts the bootstrap with the env reason named in plain language. The pre-flight Mail.app process check already in v0.29 is the canonical self-test pattern; v0.34 generalizes it: a `shell-command` primitive may declare a `self_test` string (run with the same `{project.*}` substitution; exit zero = env OK), a `mcp-tool` primitive's self-test is implicit (tool-registered check), and `filesystem-op` primitives have no self-test (their pre-flight target-reachability check covers the same ground). The `self_test_json` storage column on `bootstrap_primitives` (schema v18) carries the per-primitive declaration. **Forbidden side effects in self-tests:** a self-test must not write files, create external resources, or hold state — failure to honor this is a primitive-author bug, surfaced in code review, not enforced at the engine boundary. Adopted from `app-it` (§6.1) — see `references/verification.md:8-19`. + +**`.fctry/bootstrap-config.json` projection (NOT source of truth).** When `bootstrap_project` succeeds, the engine writes a **`.fctry/bootstrap-config.json`** file inside the new project's folder. 
This file is a **projection of registry state, not the source of truth** — the registry row remains canonical (§1.3 "Definition is truth"), and the file is rewritten on demand from the registry. The projection carries: the registered project name, the chosen project type, the strategy ID, the resolved `email_account` (when set), every `{project.*}` token resolved at bootstrap time with their literal values (e.g., `__APP_NAME__: my-thing`, `__PORT__: 5173`, `__EMAIL_ACCOUNT__: mike@h3r3.com`), the bootstrap timestamp, the spec/external version pair at write time, and a `_generated_from: setlist@` provenance line. Templates copied by `copy-template` may reference these placeholder tokens (`__APP_NAME__` / `__PORT__` / `__EMAIL_ACCOUNT__` style) and the engine resolves them against the registry data at materialization. A **`no-unresolved-placeholders` pre-flight gate** scans the resolved template output for any remaining `__SOMETHING__` token pattern and aborts the bootstrap before any side effect runs when one is found — preventing a project from being scaffolded with literal `__APP_NAME__` text in its files. Future agents and humans see one file naming the bootstrap parameters instead of having to reach back into the registry to reconstruct them; the registry remains the canonical store and the projection is regeneratable from it. Adopted from `app-it` (§6.1) — see `references/generated-files.md:50-95` and `AGENTS.md:42`. + +**Cross-section note on external-file injection lineage.** Bootstrap's `bootstrap-config.json` write into the new project folder, the §2.14.3 MCP-clients install/remove writes into Claude Desktop's `claude_desktop_config.json` and Codex's `config.toml`, and the §2.17 bundled-skills installer's writes into `~/.claude/skills/` and `~/.claude/CLAUDE.md` are three instances of the same shape: setlist writes into a file owned by another application or by the project's own non-setlist tooling. 
Each surface invented its own backup convention (`.setlist-backup` for MCP clients, `.local.md` sibling preservation for skills, no backup needed for `bootstrap-config.json` because the file is regeneratable from the registry), its own marker convention (`__setlist: true` JSON field for MCP, `` / `` sentinels for CLAUDE.md, `_generated_from` provenance line for `bootstrap-config.json`), and its own dry-run/preview surface (the MCP-clients Plan→Apply, the dry-run trace for bootstrap, the installer's `--no-rule` / `--no-skills` opt-outs). This is acknowledged in the spec as a **candidate for a future shared `external-file-injection` contract** — the three surfaces could share a common backup/marker/recovery primitive layer — but is **not refactored in 0.34**. The acknowledgment exists so that the third occurrence does not invent a fourth convention; the next surface that needs the same shape should review these three first. **Three primitive shapes.** A primitive is one named, parameterized step the engine knows how to execute. Three shapes are supported in v1; the set is closed (no plugin-style code primitives, no `http-call` shape): @@ -1089,6 +1184,32 @@ The guidance keeps the surface honest. When apply fails partway (config locked, Setlist writes to the user's local config file and nowhere else. It does not log into Claude Desktop or Codex, does not store or fetch tokens, and never inspects the other servers' credentials in the same config (it preserves them byte-for-byte). The trust boundary is the same as everywhere else in Setlist: the user already authenticated their clients, and Setlist edits a local file in their home directory. 
+**`setlist-doctor` — diagnose-only by default, narrow `--fix-safe`.** + +The MCP client write surface, the auto-update channel, the native binding ABI swap, the bootstrap engine's external side-effect ledger, and the CLAUDE.md sentinel-block injection are five places where setlist's runtime state can drift out of agreement with the disk. `setlist-doctor` is the recovery tool — a 59th MCP tool (and matching `setlist doctor` CLI command) that **inspects setlist-shaped problems and reports them, optionally fixing the narrow subset of recoverables**. Adopted with **panel** prominence from `app-it` (§6.1, https://github.com/Christian-Katzmann/app-it) — see `references/troubleshooting.md:40-55` and `references/verification.md:21-56`. + +**What `setlist-doctor` checks (the five categories).** Each check produces a `{check_id, status, summary, evidence, recoverable}` row in the report: + +- **Broken managed MCP entries** — A row in Claude Desktop's `mcpServers` or Codex's `mcp_servers` table that carries the `__setlist: true` marker but whose `command` path, `args`, or `cwd` no longer point to a valid setlist install (e.g., the npm global path moved during a node version swap). Status: `broken` / `ok` / `not_applicable`. Recoverable: yes — the `--fix-safe` path rewrites the entry's `command` and `args` to match the currently-installed setlist binary, with a backup before write (same `.setlist-backup` convention as the regular install flow). +- **Stale port/pid files** — A pid file written by a previous setlist process (the launchd async worker, a packaged Electron run) that points to a pid no longer running. Status: `stale` / `current` / `absent`. Recoverable: yes — the `--fix-safe` path deletes the stale pid file. +- **ABI cache mismatches** — The `packages/app/native-cache/better_sqlite3.abi-.node` cache contains a binary that does not load under the current Electron's ABI, or the active `better_sqlite3.node` symlink points at a stale ABI. 
Status: `mismatch` / `ok` / `not_built`. Recoverable: yes — the `--fix-safe` path runs the launcher's reconciliation pass, copying the correct cached binary into place. Forbidden from rebuilding the binding (that needs `npm run repair:mcp-abi` or `electron-rebuild`, which is outside `--fix-safe`'s scope). +- **Malformed CLAUDE.md sentinel markers** — Exactly one of `` / `` is present in `~/.claude/CLAUDE.md` (or `$CLAUDE_CONFIG_DIR/CLAUDE.md`); see `#bundled-skills` (2.17) "Idempotency contract." Status: `malformed` / `ok` / `absent`. Recoverable: **no** — the installer's own contract refuses to write in this state, and `setlist-doctor` mirrors that refusal. The doctor reports the lone marker's line number and instructs the user to either delete the orphan or add the matching one by hand. +- **Orphaned bootstrap side-effect ledgers** — A bootstrap that ran to Abandon left a `.fctry/abandoned-side-effects.json` ledger in the new project's folder (the structured `{step, primitive, summary}` enumeration from `#project-bootstrap` (2.13) "Failure handling"), and the file is older than 30 days. Status: `orphan` / `current` / `absent`. Recoverable: yes — the `--fix-safe` path moves the ledger to `.fctry/abandoned-side-effects.archived/.json` (rotation, not delete — the file records external resources the user may want to clean up later) and logs the move. + +**`--json` output for agents.** The MCP tool returns the report as a structured `{checks: [...], summary: {ok, broken, malformed, stale, orphan, not_applicable}, version: setlist@, generated_at: }` envelope. The CLI command emits a human-readable formatted table by default and the same JSON envelope when invoked as `setlist doctor --json`. Agents consume the JSON; humans read the table. + +**`--fix-safe` scope ceiling (narrowness is the feature).** The `--fix-safe` flag attempts the recoverable fixes named above and **nothing else**. 
It is forbidden from: + +- Editing project code (any file under a registered project's filesystem path that is not a setlist-owned artifact) +- Killing or restarting processes (the user owns process lifecycle; `--fix-safe` reports a stale pid file, deletes the pointer, and stops) +- Touching anything outside setlist artifacts (no edits to `claude_desktop_config.json` entries that are not setlist-managed, no edits to user-owned files in `~/.claude/skills/`, no edits to areas / project types / primitives the user has authored) +- Re-running migrations, rebuilding native bindings, or invoking package managers +- Writing to the registry database (every fix-safe action is filesystem-only) + +Each `--fix-safe` action is logged as a line of structured output (`{check_id, action, before, after, backup_path?}`) and the same lines are written to a `~/.local/share/setlist/doctor.log` append-only log so the user can audit later. Agents trust `--fix-safe` precisely because the scope is small and explicit; expanding the scope would erode the trust. New recoverables added in future spec versions are surfaced as named additions to the five-category list, not as silent extensions of `--fix-safe`'s reach. + +`setlist-doctor` is a **diagnose-only** tool by default — invoking the MCP tool or the CLI command without `--fix-safe` produces the report and exits without any side effect. The default is the safer path; agents call it freely during health checks without worrying they'll accidentally trigger a fix. + --- ### 2.15 Project Health Assessment {#health-assessment} @@ -1106,6 +1227,8 @@ Every non-archived project is placed into one of four tiers: Tiers are qualitative by design. The user sees "Healthy" or "Stale," not "72/100." Behind the tier is always a list of contributing reasons in plain language: "no activity in 45 days," "description missing," "3 unresolved contradictions in project memories." 
+**Per-dimension `needs_human_verification` bucket (three-bucket verification).** Each dimension's tier may be one of three honest outcomes: a normal tier value (`healthy` / `at_risk` / `stale`), `unknown` (no signal at all), or a new `needs_human_verification` bucket. The third bucket is for dimensions that **cannot be assessed headlessly** — questions like "is this project actually being worked on by humans?" or "did the recent commits meaningfully change behavior, or are they cosmetic?" — that a process can detect needing review but cannot honestly answer without human eyes. A dimension that marks itself `needs_human_verification` does **not** silently degrade the composite tier: the worst-tier-wins composition treats `needs_human_verification` as a fourth pole, surfacing it explicitly on the project's Health section rather than averaging into a lower color. The Home view dot uses a small distinguishing visual (a thin border on the colored dot, or a small `?` glyph beside the dot) when any dimension is in this state. The `assess_health` MCP tool returns the bucket as `dimensions[].verification_state` alongside `tier`. The Settings → Health subsection (when added) lets the user mark a `needs_human_verification` dimension as "checked, tier was actually X" on a per-project basis — that override is stored as a memory of type `observation` with `is_static: true` so it doesn't decay, and the dimension reverts to assessing-from-signal once new signal arrives. **Scope (explicit):** this is health-only. Bootstrap pre-flight is excluded — most setlist primitives are inherently programmatic (filesystem-op, shell-command return exit codes), and the one GUI-only primitive (`mail-create-mailbox`) does not justify generalizing the bucket framework across all of bootstrap. Adopted from `app-it` (§6.1, https://github.com/Christian-Katzmann/app-it) — see `references/verification.md:1-7, 58-73` and `report-template.md:60-63`. 
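One possible reading of the composition rule is sketched below: dimensions flagged `needs_human_verification` are surfaced in their own list and do not take part in the worst-tier ranking. The `"ok"` value for `verification_state`, the handling of `unknown` dimensions (skipped unless nothing else has signal), and the `composeHealth` helper are assumptions of this sketch, not confirmed behavior.

```typescript
// Sketch only: one interpretation of worst-tier-wins with a separate
// needs_human_verification pole.
type HealthTier = "healthy" | "at_risk" | "stale" | "unknown";
type AssessedTier = Exclude<HealthTier, "unknown">;

interface DimensionResult {
  name: string;
  tier: HealthTier;
  // "ok" is a hypothetical value; the spec only names the field and
  // the needs_human_verification bucket.
  verification_state: "ok" | "needs_human_verification";
}

interface CompositeHealth {
  tier: HealthTier;
  needs_human_verification: string[];
}

const SEVERITY: Record<AssessedTier, number> = { healthy: 0, at_risk: 1, stale: 2 };

function composeHealth(dimensions: DimensionResult[]): CompositeHealth {
  let worst: HealthTier = "unknown";
  const flagged: string[] = [];
  for (const d of dimensions) {
    if (d.verification_state === "needs_human_verification") {
      // Surfaced explicitly; it does not drag the composite tier down.
      flagged.push(d.name);
      continue;
    }
    if (d.tier === "unknown") continue;
    if (worst === "unknown" || SEVERITY[d.tier] > SEVERITY[worst]) {
      worst = d.tier;
    }
  }
  return { tier: worst, needs_human_verification: flagged };
}
```

Keeping the flagged names alongside the tier gives the Home view enough information to draw the distinguishing glyph without a second call.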
+ **Three dimensions.** The composite tier is the *worst* of three dimension tiers. Any single dimension at Stale makes the whole project Stale. This is conservative by design — it surfaces red flags instead of letting them average out. @@ -1214,6 +1337,98 @@ These bounds are deliberate. Inspection runs synchronously on the user's click d Every report carries a `schema_version` field of `workspace-inspection.v1`. This is the contract for agents that consume reports — adding new signal fields is non-breaking, removing or renaming fields requires bumping the schema version. The current shape is the v1 floor: agents can rely on `kind`, `summary.kind`, `summary.paths_inspected`, `summary.code_paths`, `summary.workspace_paths`, `summary.unavailable_paths`, `summary.gaps`, and `paths[].{path, status, kind, entries_scanned, ignored_entries, error, code, folder, gaps}` being present in every v1 report. +**Agent-decision report (decision-shaped summary).** Alongside the structural facts above, every v1 report now carries a `decision_report` object — a small, one-page-readable view shaped for the kind of one-shot routing decisions an LLM agent actually has to make about a project. It is **additive**: existing consumers ignoring the new field see the same report they always did, and the structural payload is unchanged. The `decision_report` fields are: + +- `project_type_hint` — `code` / `workspace` / `empty` / `unavailable` plus, for code, a refined subtype string (`node-typescript`, `python-package`, `rust-binary`, etc.) 
drawn from the runtime/framework hints +- `port_literals` — every numeric literal in the 1024–65535 range found in dev-config files during the shallow scan, with provenance (`{port: 5173, source: "vite.config.ts"}`) — surfaces "this project listens on 5173" without the agent having to grep +- `multi_app_structure` — `true` when the scan finds multiple package manager files at sibling top-level subdirectories (e.g., a monorepo with `packages/core/package.json` and `packages/app/package.json`), `false` otherwise; the agent reads this once and knows whether to expect one-app-per-folder routing +- `area_hint` — the path-segment heuristic the desktop app used before user-managed areas (`~/Code/` → `Infrastructure` / `Code`, `~/Projects/` → derived from sibling structure, etc.); a hint only — the registry's `area_id` is authoritative +- `last_touched_at` — the most recent `mtime` across the top-level inspected entries (after the ignore list), ISO 8601 — gives the agent recency without an extra stat call +- `missing_mailbox_state` — when the project has `email_account` set and the agent has Mail.app MCP access, an opt-in deep check (controlled by a per-call flag, off by default to preserve the shallow contract) reports `present` / `absent` / `unknown` for the expected `Projects/{project.name}` mailbox; off-by-default means existing consumers see `missing_mailbox_state: null` and don't pay the deep-check cost +- `summary_paragraph` — a single human-readable sentence stitched from the above, suitable for paste into a chat ("Node/TypeScript monorepo with two apps under `packages/`, listens on ports 5173 and 3000, last touched 4 hours ago"); the LLM can use this as its one-shot orientation without reading the rest of the report + +The decision report is computed at the same moment the structural report is — no second scan, no extra latency. It is a **view over the same signals**, not a separate extraction. 
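As a small illustration of one `decision_report` field, the `port_literals` extraction could look roughly like the sketch below. The regular expression, the de-duplication per file, and the `extractPortLiterals` name are assumptions; the sketch is a plain text heuristic and would also pick up four- or five-digit numbers that are not ports.

```typescript
// Sketch only: a text heuristic, not the inspector's actual extraction.
interface PortLiteral {
  port: number;
  source: string; // file the literal was found in, e.g. "vite.config.ts"
}

function extractPortLiterals(source: string, text: string): PortLiteral[] {
  const found: PortLiteral[] = [];
  const seen: Record<string, boolean | undefined> = {};
  // Standalone runs of 4 or 5 digits; longer digit runs do not match.
  const pattern = /\b\d{4,5}\b/g;
  let match: RegExpExecArray | null = pattern.exec(text);
  while (match !== null) {
    const literal = String(match[0]);
    const port = parseInt(literal, 10);
    // Keep only the 1024 to 65535 range, once per distinct value.
    if (port >= 1024 && port <= 65535 && seen[String(port)] !== true) {
      seen[String(port)] = true;
      found.push({ port: port, source: source });
    }
    match = pattern.exec(text);
  }
  return found;
}
```

Carrying the source file with each literal preserves the provenance shape shown in the field description, `{port: 5173, source: "vite.config.ts"}`.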
Agents that want full structural detail still read `paths[].code` / `paths[].folder`; agents that only need to route off the project read `decision_report.summary_paragraph` and move on. Adopted from `app-it` (§6.1, https://github.com/Christian-Katzmann/app-it) — see `templates/inspect.sh:1-120` and `references/project-inspection.md:1-44`. The setlist adaptation extends the existing TypeScript inspector with the new view; app-it's bash implementation is not copied. + +**Trust disk over docs (named invariant).** When documentation and on-disk state disagree, the inspector believes disk. If `CLAUDE.md` declares the project a Next.js app but `package.json` lists `react` and `vite` without `next`, the inspector reports `runtime_hints: ["Node.js", "TypeScript"]` and `framework_hints` derived from the manifest, not from the documentation. This is already implicit in the v1 contract (the report is extracted from manifests and file extensions, never from prose) — v0.34 names the invariant explicitly so future contributors do not drift toward doc-priority. The principle is: documentation is producer-authored and may be stale; the manifest is the consumer-authored truth at the moment the scan runs. When the two disagree the gap is a signal worth surfacing (a `gaps[]` entry of shape `{kind: "doc-disk-disagreement", source_doc: "CLAUDE.md", doc_claim: "Next.js app", disk_evidence: "package.json names react+vite, no next"}`), not a contradiction to resolve. The user or agent reading the gap decides which side is wrong. Adopted from `app-it` (§6.1) — see `SKILL.md:19-23` and `AGENTS.md:35`. Documentation-only invariant; no code change required beyond surfacing the disagreement gap. + +--- + +### 2.17 Bundled Claude Code Skills and CLAUDE.md Rule Injection {#bundled-skills} + +Setlist's MCP server, library, and tool surface are only as useful as the agent's willingness to reach for them. 
Two bundled skills ship inside the npm packages over a **single `skills/` directory** with **parallel `.claude-plugin/` and `.codex-plugin/` manifests** — the same skill content, two host manifests, one source of truth. An opt-in installer copies the skills into `~/.claude/skills/` (or wherever `CLAUDE_CONFIG_DIR` points) and into Codex's equivalent skills location, along with an idempotent injection into the user's `~/.claude/CLAUDE.md` of a short proactive-use rule between sentinel markers. The skills give agents concrete entry points; the rule keeps the registry on the agent's mind before it asks the user a question setlist could have answered. **The dual-manifest layout mirrors setlist's MCP server, which is already consumed by both Claude Code and Codex (see `#mcp-clients` (2.14.3))** — the skills should be dual-host for the same reason. + +**Two skills in v1.** + +- **`/setlist-enroll-project`** — walks the four-step onboarding (`register_project` → `enrich_project` → `write_fields` → `refresh_project_digest`) for a project the agent has just discovered or wants to bring into the registry. The skill reads the working directory, calls `workspace-inspection`, drafts a registration payload, prompts the user for confirmation on any inferred fields, then executes the four MCP calls in order. After enrollment, it calls `register_capabilities` if the project has identifiable integration surfaces (MCP tools, CLI commands, library exports). The skill is conversational, not silent: the user sees what the agent inferred and can correct it before any write lands. + +- **`/setlist-portfolio-graph`** — visualizes the area → project → capability graph using the desktop app's data shape. The skill calls `portfolio_brief`, `list_areas`, `list_projects`, and `query_capabilities`, then renders a tree (in chat, or as a saved SVG via the desktop app's IPC bridge when available) showing every area, the projects under it, and the capabilities each project exposes. 
The output is read-only: the skill does not modify the registry. It is the agent-facing answer to "show me the shape of my ecosystem." + +Both skills are MCP-native: they use the same tools any other agent would use, with no privileged access to setlist's internals. They are bundled artifacts: editorial defaults that ship with setlist, not the only way to use it. A user who prefers different prompt phrasing or a different workflow can copy the skill to `~/.claude/skills/setlist-enroll-project.local.md`, edit it, and the installer's update path will not overwrite the local copy (see "Installer contract" below). + +**Dual-manifest layout: `.claude-plugin/` + `.codex-plugin/` over one `skills/` tree.** The repository structure for the bundled skills is: + +``` +packages/skills/ +├── skills/ +│ ├── setlist-enroll-project.md # single skill source +│ └── setlist-portfolio-graph.md +├── .claude-plugin/ +│ ├── plugin.json # Claude Code plugin manifest +│ └── marketplace.json # marketplace entry, if listed +├── .codex-plugin/ +│ └── plugin.json # Codex plugin manifest +└── package.json # @setlist/skills, ships both manifests +``` + +Both manifests reference the same skill files in `skills/`. The host-specific manifest fields (entrypoint shape, capability declarations, namespace prefix, version pin) live in the matching `.claude-plugin/plugin.json` or `.codex-plugin/plugin.json`; the skill content lives once. This is the same shape `app-it` ships (see `references/dual-manifest.md` in app-it, and the `.claude-plugin/marketplace.json` / `plugins/app-it/.codex-plugin/plugin.json` files at the app-it repo root). **A `validate-skills-manifests` step in `npm run validate`** (see `#testing-discipline` (4.5)) version-locks the two manifests against each other: it fails the build when `plugin.json` versions diverge, when one manifest references a skill the other doesn't, or when the bundled skill file list does not match what each manifest declares. 
Once the validator passes, the two manifests cannot have drifted: any divergence fails the build first. Adopted with **panel** prominence from `app-it` (§6.1, https://github.com/Christian-Katzmann/app-it). + +**Installer contract.** + +The skills are copied by an installer that runs via `npm run install:skills` from any setlist package, or automatically as a post-install step the user can opt into during first launch of the desktop app. The installer: + +1. **Honors `CLAUDE_CONFIG_DIR`.** When the environment variable is set, skills land under `$CLAUDE_CONFIG_DIR/skills/` rather than `~/.claude/skills/`. When unset, the installer falls back to `~/.claude/skills/`. This matches Claude Code's own configuration discovery, so users with a non-standard Claude Code setup do not have to re-pin their config. + +2. **Rewrites absolute paths.** Setlist's skills reference its CLI and helper scripts by absolute path (e.g., the path to `node` for invoking helper Node scripts, the absolute path to setlist's bundled `extract.py` for digest generation). The installer rewrites these paths at copy time based on the installed setlist package's location: a skill installed from an npm global install ends up with `/usr/local/lib/node_modules/@setlist/cli/...` references; a skill installed from a brew formula ends up with `/opt/homebrew/lib/...`. The skill is portable as a template; the installed copy is machine-specific. + +3. **Idempotent re-install.** Re-running the installer copies the latest skill content and re-applies the path rewrites, but never duplicates: it overwrites the previous machine-specific copy in place. The contract is "the installed skill always matches the shipped version, with paths rewritten for this machine." + +4. **Preserves user local edits.** If a `.local.md` sibling exists alongside the installed skill, the installer logs a one-line notice ("preserving local override: `setlist-enroll-project.local.md`") and leaves the local file untouched. 
Claude Code's skill resolution prefers `.local.md` siblings, so the user's customizations take precedence over the bundled defaults without conflicting. + +5. **`--no-skills` flag opts out.** A user who does not want setlist's bundled skills (perhaps because they have their own equivalents) runs `npm run install:skills -- --no-skills`, or unchecks the option in the desktop app's first-launch prompt. The installer logs "skills not installed" and exits cleanly. The CLAUDE.md rule injection (next subsection) is a separate opt-out. + +**CLAUDE.md rule injection between sentinel markers.** + +The installer also appends a short proactive-use rule to the user's `~/.claude/CLAUDE.md` (or `$CLAUDE_CONFIG_DIR/CLAUDE.md` when set) between two sentinel markers: + +``` + +**Setlist project registry.** Before asking the user which project this is, call `get_project` or `search_projects`. The registry holds the user's canonical project identity, capabilities, and memory across every project in their ecosystem; consulting it first is cheaper than asking and avoids contradicting what the user has already told other agents in earlier sessions. + +``` + +The rule is the textual companion to the proactive-use directive shipped in the MCP server's `instructions` field (see `#capability-declarations` (2.11) mechanism #1). The directive reaches sessions over MCP; the CLAUDE.md rule reaches sessions started without setlist's MCP server connected (e.g., a Claude Code session in a non-registered repo that needs to remember to check the registry). The two reinforce each other; neither replaces the other. + +**Idempotency contract for the rule injection.** + +The injection is idempotent across re-runs: + +- If neither marker is present, the installer appends the full block (markers + rule content) at the end of the file with a single leading blank line for spacing. +- If both markers are present and the content between them matches the current shipped rule, the installer does nothing. 
+- If both markers are present but the content has drifted from the current shipped rule (e.g., a newer setlist version ships an updated rule, or the user edited the rule by hand), the installer replaces the content between the markers with the current shipped rule. Edits between the markers are not preserved — the sentinel block is editorially owned by setlist's installer. +- If exactly one marker is present, the installer treats the file as malformed and refuses to write, logging a one-line warning that names the file and the missing marker. The user can fix the file (delete the lone marker, or add the matching one) and re-run. + +The single block is the entire surface — setlist does not append additional rules elsewhere in the file, does not modify content outside the markers, and does not rewrite the file's existing content (frontmatter, other rules, formatting). The installer reads the file, locates the marker pair (if any), splices in the new block, and writes the result back. If the file does not exist, the installer creates it with just the block (and a one-line header `# Claude Code instructions`). + +**`--no-rule` opts out.** + +The injection is opt-out via the `--no-rule` flag, or the equivalent checkbox in the desktop app's first-launch prompt. When opted out, the installer does not write to CLAUDE.md and does not check for sentinel markers — the user's CLAUDE.md is untouched. The opt-out is per-run, not per-machine: re-running the installer without `--no-rule` will inject the rule. A user who has opted out durably and wants the installer to remember should add a config note (`~/.config/setlist/installer.json` with `{"no_rule": true}`) — that file is read by the installer on every run. 
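
The four idempotency cases and the create-if-missing rule can be sketched as a pure function over the file's text. This is a minimal illustration, not setlist's installer: the marker strings `START` and `END`, the `injectRule` name, and the `Outcome` type are all hypothetical (the real sentinel text is not reproduced in this section), file I/O is left to the caller, and the handling of markers that appear in reverse order is an assumption the spec does not state.

```typescript
// Hypothetical sentinel markers, chosen for illustration only.
const START = "<!-- setlist:start -->";
const END = "<!-- setlist:end -->";

// What the installer would do with the file after inspecting it.
type Outcome =
  | { action: "created" | "appended" | "replaced"; content: string }
  | { action: "unchanged" }
  | { action: "refused"; reason: string };

// `existing` is the current CLAUDE.md text, or null when the file does not exist.
function injectRule(existing: string | null, rule: string): Outcome {
  const block = `${START}\n${rule}\n${END}`;

  // Missing file: create it with a one-line header plus the block.
  if (existing === null) {
    return { action: "created", content: `# Claude Code instructions\n\n${block}\n` };
  }

  const startAt = existing.indexOf(START);
  const endAt = existing.indexOf(END);

  // Neither marker: append the block after a single blank line.
  if (startAt === -1 && endAt === -1) {
    const base = existing.replace(/\n+$/, "");
    const separator = base.length > 0 ? "\n\n" : "";
    return { action: "appended", content: `${base}${separator}${block}\n` };
  }

  // Exactly one marker: treat the file as malformed and refuse to write.
  if (startAt === -1) return { action: "refused", reason: "missing start marker" };
  if (endAt === -1) return { action: "refused", reason: "missing end marker" };

  // Assumption (not in the spec): reversed markers are also treated as malformed.
  if (endAt < startAt) return { action: "refused", reason: "markers out of order" };

  // Both markers, matching content: nothing to do.
  const current = existing.slice(startAt + START.length, endAt);
  if (current === `\n${rule}\n`) return { action: "unchanged" };

  // Both markers, drifted content: replace only the span between the markers.
  const content = existing.slice(0, startAt) + block + existing.slice(endAt + END.length);
  return { action: "replaced", content };
}
```

Keeping the function free of I/O makes the idempotency property easy to check: feeding the output of one call back in with the same rule yields `unchanged`, and text outside the marker pair is carried through byte for byte.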
+ +**Why this lives in a spec section, not in a README.** + +Bundled artifacts are part of setlist's behavioral contract: an agent that loads setlist's MCP server expects the proactive-use directive in `instructions`; an agent that runs a `/setlist-enroll-project` slash command expects the four-step workflow; a user reading their CLAUDE.md and finding the setlist rule expects it to update when they update setlist. The contract is behavioral, not packaging-incidental, so it sits in the spec alongside the other behavioral contracts. + +Informed by `directory-mcp` (see §6.1, https://github.com/ePaint/directory-mcp), whose `install.sh` ships both the bundled-skills copy and the CLAUDE.md rule injection between sentinel markers; setlist's adaptation is the `npm run install:skills` entrypoint, the `.local.md` sibling preservation, and the explicit `--no-rule` / `--no-skills` opt-outs. + --- ## 3. System Behavior @@ -1284,7 +1499,7 @@ The fctry-owned field domain includes: tech_stack, patterns, short_description, **Desktop project CRUD.** Provides UI forms for registering new projects, editing project identity fields (display name, status, description, goals), archiving projects, and renaming projects. Each operation delegates to the corresponding @setlist/core method through the IPC bridge. -**MCP server access.** @setlist/mcp wraps @setlist/core as 57 MCP tools via @modelcontextprotocol/sdk using stdio transport managed by Claude Code's lifecycle. The server provides: +**MCP server access.** @setlist/mcp wraps @setlist/core as 59 MCP tools via @modelcontextprotocol/sdk using stdio transport managed by Claude Code's lifecycle. The server provides: - `list_projects` -- List projects at a given depth with optional filters. - `get_project` -- Get a single project by name at a given depth. 
@@ -1325,6 +1540,7 @@ The fctry-owned field domain includes: tech_stack, patterns, short_description, - `get_project_digest` -- Read one project's essence digest, with staleness flag (Setlist addition, v12). - `get_project_digests` -- Batch read essence digests for one or more projects (Setlist addition, v12). - `refresh_project_digest` -- Write a project's essence digest (invoked by the CLI generator, not by consumer agents) (Setlist addition, v12). +- `setlist-doctor` -- Diagnose setlist-shaped problems (broken managed MCP entries, stale port/pid files, ABI cache mismatches, malformed CLAUDE.md sentinel markers, orphaned bootstrap side-effect ledgers); `--json` for agents, narrow `--fix-safe` for the recoverables (Setlist addition, spec 0.34, scenario S213). See [Appendix D](#appendix-d-mcp-tool-reference) for the complete tool reference with parameters and return types. @@ -1392,11 +1608,17 @@ See [Appendix D](#appendix-d-mcp-tool-reference) for the complete tool reference - **Bootstrap configuration** -- Cross-cutting settings governing project bootstrap that are not bound to a single type. As of v0.28 this is just the **archive path root** (where archived projects are moved during cleanup). Per-type bootstrap behavior (default directory, git-init flag, template directory, recipe) lives on the **project type** entity itself, not in this configuration. The legacy `path_roots` map and `template_dir` field were superseded by per-type fields in 0.26 and are no longer part of this entity. -- **Bootstrap primitive** -- A named, parameterized step the bootstrap engine knows how to execute, available for composition into project type recipes. 
Each primitive has: a **name** (unique within the user's primitive set), a **shape** (one of `filesystem-op`, `shell-command`, `mcp-tool` — the closed set in v1), shape-specific **parameter declarations** (path/content templates for `filesystem-op`, a verbatim command string for `shell-command`, a tool name + parameter map for `mcp-tool`), a one-sentence **description**, and a **built-in flag** distinguishing read-only built-ins from user-authored entries. **Five primitives ship as built-ins** (as of spec 0.29): `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore` (all introduced in spec 0.28), and `mail-create-mailbox` (added in spec 0.29 — a `shell-command` shape that drives Mail.app via AppleScript/`osascript`, seeded but not in any default recipe; see `#project-bootstrap` (2.13)). Users may author additional primitives of any of the three shapes via Settings → Primitives. The structural `register-in-registry` engine step is NOT modeled as a primitive — it is the engine's automatic final step and never appears in the primitive set. +- **Bootstrap primitive** -- A named, parameterized step the bootstrap engine knows how to execute, available for composition into project type recipes. 
Each primitive has: a **name** (unique within the user's primitive set), a **shape** (one of `filesystem-op`, `shell-command`, `mcp-tool` — the closed set in v1), shape-specific **parameter declarations** (path/content templates for `filesystem-op`, a verbatim command string for `shell-command`, a tool name + parameter map for `mcp-tool`), a one-sentence **description**, an **`is_builtin` flag** distinguishing the five seeded read-only built-ins from user-authored entries (added in schema v18 — replaces the prior `built_in_flag` storage with an `is_builtin BOOLEAN NOT NULL DEFAULT 0` column under the same semantics; the rename brings the column name in line with the surface vocabulary used in the Settings panel and the MCP `list_primitives` response shape), an optional **`forked_from`** pointer (when the primitive was created via `Duplicate to customize` against a built-in, the original built-in's name is stored here for provenance), and an optional **`self_test` declaration** (a shape-appropriate probe the engine runs in pre-flight to discriminate env-broken from primitive-broken — `shell-command` self-test is a string with `{project.*}` substitution and exit-zero = OK; `mcp-tool` self-test is implicit via the tool-registered check; `filesystem-op` has no self-test). **Five primitives ship as built-ins** (as of spec 0.29): `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore` (all introduced in spec 0.28), and `mail-create-mailbox` (added in spec 0.29 — a `shell-command` shape that drives Mail.app via AppleScript/`osascript`, seeded but not in any default recipe; see `#project-bootstrap` (2.13)). All five carry `is_builtin = true` and ship locked to the setlist binary version; users opt into customization by forking. Users may author additional primitives of any of the three shapes via Settings → Primitives. 
The structural `register-in-registry` engine step is NOT modeled as a primitive — it is the engine's automatic final step and never appears in the primitive set. + +- **Project type recipe** -- The ordered list of primitive invocations attached to a project type, with each invocation carrying a **position** (zero-based ordinal) and **bound parameter values** (the concrete templates the user has filled in for that primitive's parameter declarations). Each recipe also carries a **`strategy_id`** (added in spec 0.34) — a stable user-visible identifier (e.g., `code-with-mail-v2`, `non-code-area-v1`) that names the shape of the recipe independently of its primitive composition and is propagated to bootstrap responses, `bootstrap-config.json` projections, project-digest decision history, and `assess_health` provenance (see `#project-bootstrap` (2.13)). Recipes are per-type defaults (the only shape supported in v1); the data model leaves room for per-call overrides as a future possibility. The `register-in-registry` engine step always runs last, automatically, and is not stored in the recipe list. Recipes are snapshotted by the engine at bootstrap start; mid-flight recipe edits do not affect in-flight Retry runs (see `#project-bootstrap` (2.13)). + +- **Interactions** -- Append-only log rows recording every registry touch (search, recall, registry-driven read). Each row carries `(id, project_id, surface, query, at, session_id, agent_role)`. Rows are immutable; recency and frequency are derived by query (`COUNT(*)`, `MAX(at)`), never stored as mutable counters. Pruned by reflection per the retention policy in `#portfolio-memory` (2.12). Schema v17 introduces the `interactions` table. + +- **Bundled Claude Code skills** -- Editorial defaults that ship in the npm packages (`/setlist-enroll-project`, `/setlist-portfolio-graph`) and are copied to `~/.claude/skills/` (or `$CLAUDE_CONFIG_DIR/skills/`) by the `npm run install:skills` installer. 
Each skill is a portable markdown template; the installed copy is machine-specific (paths rewritten on install). User `.local.md` siblings take precedence over installed bundled copies and are preserved across reinstalls. See `#bundled-skills` (2.17). -- **Project type recipe** -- The ordered list of primitive invocations attached to a project type, with each invocation carrying a **position** (zero-based ordinal) and **bound parameter values** (the concrete templates the user has filled in for that primitive's parameter declarations). Recipes are per-type defaults (the only shape supported in v1); the data model leaves room for per-call overrides as a future possibility. The `register-in-registry` engine step always runs last, automatically, and is not stored in the recipe list. Recipes are snapshotted by the engine at bootstrap start; mid-flight recipe edits do not affect in-flight Retry runs (see `#project-bootstrap` (2.13)). +- **CLAUDE.md sentinel block** -- A short proactive-use rule injected by the installer into `~/.claude/CLAUDE.md` (or `$CLAUDE_CONFIG_DIR/CLAUDE.md`) between `` and `` markers. Idempotent (re-runs match or replace the block; never duplicate). Single block per file, editorially owned by the installer. Opt-out via `--no-rule`. See `#bundled-skills` (2.17). -- **Project digests** -- Free-form text summaries of what a project is about, suitable for embedding, semantic matching, or drop-in context for cross-project questions. Complementary to structured capability declarations — capabilities describe *per-tool* schema, digests describe *project essence* as prose. Each digest has a kind (currently only `essence`), the digest text itself, a source-version stamp, the producer that generated it (provider + model + optional extractor, or `manual`), a generation timestamp, and an advisory token count. 
The version stamp is the project's source spec version when a spec is available, or a deterministic hash of the project's supported-document tree (path + mtime + size) when the project is non-code. One digest per (project, kind); refresh replaces the prior row. Digests are derived, not canonical — the source content remains authoritative, and digests become stale when the source version advances past the digest's stored version. +- **Project digests** -- Free-form text summaries of what a project is about, suitable for embedding, semantic matching, or drop-in context for cross-project questions. Complementary to structured capability declarations — capabilities describe *per-tool* schema, digests describe *project essence* as prose. Each digest has a kind (currently only `essence`), the digest text itself, a source-version stamp, the producer that generated it (provider + model + optional extractor, or `manual`), a generation timestamp, an advisory token count, and (added in schema v18) an optional **`decision_history`** facet — a bounded array of terse dated entries appended by bootstrap recipes (recipe `strategy_id` + key bound parameters) and by significant structural moves (area assignment, parent_project linking, archive, rename). Each entry is shaped `{at: , kind: 'bootstrap' | 'area-change' | 'parent-change' | 'archive' | 'rename' | 'restore', summary: , source: }` and the array is capped at **20 entries per project** (oldest dropped first when the cap is hit) so digests stay token-efficient. The facet surfaces *how* a project got its current shape, not just *what* it currently is — `2026-06-07: bootstrapped via code-with-mail-v2, area=infrastructure, email=mike@h3r3.com` followed by `2026-06-14: moved to area Work` followed by `2026-07-01: archived` reads as a project's lifecycle history at a glance. 
The version stamp is the project's source spec version when a spec is available, or a deterministic hash of the project's supported-document tree (path + mtime + size) when the project is non-code. One digest per (project, kind); refresh replaces the prior row but **preserves `decision_history`** (the history is registry-canonical state, not regenerated from the source spec; refresh updates `digest_text` and `spec_version` and leaves the history untouched). Digests are derived, not canonical — the source content remains authoritative, and digests become stale when the source version advances past the digest's stored version. Adopted from `app-it` (§6.1) — see `references/report-template.md:72-74`. ### 3.3 Rules and Logic {#rules} @@ -1417,6 +1639,16 @@ See [Appendix D](#appendix-d-mcp-tool-reference) for the complete tool reference - A project's parent is another project, never an area. Areas are not part of the parent-child graph. - Archiving a parent project does not cascade to children. Children remain active; their `parent_project_id` is preserved. In detail views, an archived parent is rendered with an "(archived)" tag alongside the link. Deleting a parent (via `ON DELETE SET NULL` on the foreign key) clears the link on children without affecting them otherwise. - Producers write to their own field domains. A field write from producer A does not alter fields from producer B. The registry tracks which producer last wrote each field. +- The four open-vocabulary fields (`tech_stack`, `patterns`, `topics`, `capability_type`) are normalized at the write boundary against a canonical-set + alias-reverse-map. Casing variants, hyphenation variants, and known aliases collapse to a single canonical slug. Unknown terms pass through normalized (lowercase + hyphenate) but are **never rejected**. Normalization happens once on the write path in `@setlist/core`; there is no second pass at read time. Stored values are the canonical slugs. 
See `#capability-declarations` (2.11) "Open-vocabulary fields with canonical normalization." +- The `vocab` MCP tool surfaces both the canonical set and the live in-use values for any of the four normalized fields. Calling `vocab` is read-only; it neither writes a row to `interactions` nor incurs a write-path cost beyond the SELECT. +- Every write surface in setlist is idempotent on its documented collapse key (see the table in `#capability-declarations` (2.11) "Idempotent writes and collapse keys"). Repeated identical calls produce the same end state — no duplicate rows, no broken invariants. Reinforcement on `retain` is intentional behavior on repeat, not a side-effect violation of idempotency. +- Every read surface (`search_projects`, `recall`, `get_project`, `cross_query`, `query_capabilities`) writes one append-only row to `interactions` at the start of the call. Failed calls are also logged (the failure is itself a signal). The write is a single synchronous SQLite INSERT — microseconds, well below per-call latency floors. +- Interaction-log retention: each bound applies independently — a row is pruned when EITHER older than 90 days OR the project's row count exceeds 10,000 (whichever is more restrictive wins; bounded growth is the safety property). Both phases run inside a single transaction so a mid-prune failure rolls back the whole pass. The defaults are configurable via `configure_memory`: `interactions_retention_days` (default 90), `interactions_max_rows_per_project` (default 10000). Both knobs are validated as positive integers at the MCP boundary — non-numeric, negative, fractional, or NaN values are rejected with `InvalidInputError`. +- Identity-resolution surfaces (`search_projects`, `recall`, `cross_query`) always return the consistent envelope shape `{result, ambiguous, alternatives}` regardless of whether ambiguity was detected. 
`ambiguous` is `true` when the second-place candidate's score is within ~15% of the top score AND the query is at least 3 characters; `alternatives` carries up to 4 entries. Callers iterating `response.result` work on every call — no shape-switching by outcome. The ambiguity computation is centralized in the `detectAmbiguity` helper, the gap threshold (`AMBIGUITY_GAP_THRESHOLD = 0.15`) and alternatives cap (`MAX_ALTERNATIVES = 4`) live as exported constants, and `cross_query` filters its candidate pool to `source === 'registry'` rows so non-project hits (memory/cc_memory rows with project='global') never surface as registered-project alternatives. Single- and two-character queries skip ambiguity detection entirely (too noisy to be useful). `scoreSearchCandidates` differentiates match quality across tiers (exact name = 100, name-prefix = 50, word-boundary = 15, substring = 5, description hit = +2) so common short queries don't tie every result at score 5 and falsely trigger ambiguity. +- The MCP server's `instructions` paragraph is a proactive-use directive in imperative voice (USE PROACTIVELY / LOOK UP FIRST / CAPTURE OPPORTUNISTICALLY / STAY CONSISTENT), not a descriptive overview. Returned identically on every `initialize` — no client-detection branching. +- Naming guideline for new MCP tools (spec 0.33+): favor verb-shaped, agent-intention names (e.g., `vocab`, `whois`, `remember`) over schema-leaking names. The existing 57 tool names are not renamed. This is a forward-looking editorial constraint that governs additions. +- The `install:skills` installer is idempotent: re-running copies the latest skill content with paths rewritten, overwriting machine-specific installed copies in place but never duplicating. `.local.md` siblings are preserved; the installer logs a notice and exits without touching them. 
+- The CLAUDE.md rule injection is idempotent across the marker pair `` / ``: missing pair → append the full block; both present with matching content → no-op; both present with drifted content → replace; exactly one present → refuse with a one-line warning. The block is editorially owned by the installer — edits between markers are not preserved. Opt-out via `--no-rule`. - Fields not in the default catalog are accepted and stored without error. The catalog is advisory. - Queries at summary depth return only: name, display_name, type, status, one-line description. Standard depth adds template-relevant extended fields. Full depth returns everything. - Filtering by type and status is precise -- no cross-contamination. Both filters compose. @@ -1485,6 +1717,7 @@ See [Appendix D](#appendix-d-mcp-tool-reference) for the complete tool reference - `get_project_digest` returns `null` for a project with no digest of the requested kind. `get_project_digests` omits such projects from its result map unless `include_missing: true` is set. - Digest generation lives outside the MCP server — in `@setlist/cli` (`setlist digest refresh`). The MCP surface accepts whatever the CLI (or any other writer) provides; it does not call an LLM. - Archiving a project cascades: `project_digests` rows are removed via `ON DELETE CASCADE` on the `project_id` foreign key. Re-registering a project with the same name yields a fresh digest slot. +- `decision_history` (schema v18 optional facet on `project_digests`) is registry-canonical state, not regenerated from the source spec. `refresh_project_digest` updates `digest_text` / `spec_version` / `producer` / `generated_at` and **leaves `decision_history` untouched**. The history grows by append on bootstrap (one entry naming the recipe `strategy_id` and key bound parameters), area change (`set_project_area`), parent assignment (`set_parent_project`), archive (`archive_project`), rename (`rename_project`), and restore (when implemented). 
Cap is 20 entries per project — when the cap is hit, the oldest entry is dropped on the next append. The cap is configurable in code, not surfaced through the MCP layer in v1. **Digest generator rules (v0.21):** @@ -1507,7 +1740,9 @@ See [Appendix D](#appendix-d-mcp-tool-reference) for the complete tool reference - The set of primitive shapes is closed in v1: exactly three values are accepted — `filesystem-op`, `shell-command`, `mcp-tool`. Writes referencing any other shape are rejected with `Error [INVALID_PRIMITIVE_SHAPE]`. There is no `http-call` shape. There is no plugin/code-extension shape. The closed set is the surface that makes the no-arbitrary-code-execution guarantee in `#hard-constraints` (4.3) observable. - Each primitive has a unique name within the user's primitive set. Built-in primitive names are reserved — users cannot define a custom primitive named `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, or `mail-create-mailbox`. -- Built-in primitives are read-only: writes attempting to edit or delete a row with `built_in_flag = true` are rejected with `Error [BUILTIN_PRIMITIVE_IMMUTABLE]`. Their parameter shape, name, and description ship with the binary and may not be modified by the user. The seeded set as of spec 0.29 is five rows: `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, `mail-create-mailbox`. +- Built-in primitives are read-only: writes attempting to edit or delete a row with `is_builtin = true` (schema v18; same semantics as the prior `built_in_flag`) are rejected with `Error [BUILTIN_PRIMITIVE_IMMUTABLE]`. Their parameter shape, name, and description ship with the binary and may not be modified by the user. The seeded set as of spec 0.29 is five rows: `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, `mail-create-mailbox`. 
The desktop Settings → Primitives panel renders them as a read-only block with a **`Duplicate to customize`** affordance beside each row; clicking Duplicate creates a user-owned copy with `is_builtin = false` and a `forked_from` pointer to the original (see `#project-bootstrap` (2.13)). User-owned forks are fully editable; the original built-in is never touched by the fork. +- Each primitive's optional `self_test` declaration, when present, runs at pre-flight time against a sandbox/dry-target before the real invocation. A failed self-test routes the resulting failure as `env_unavailable: ` rather than `primitive_failure: ` — discriminating "the user's environment isn't set up" from "the primitive or its parameters are wrong." Self-tests must not write files, create external resources, or hold state; engine enforcement is by author convention (a self-test that side-effects is a primitive-author bug surfaced in code review). The Mail.app process-running check on `mail-create-mailbox` is the canonical self-test pattern; v0.34 generalizes it across the closed primitive set (see `#project-bootstrap` (2.13)). +- Every recipe carries a stable `strategy_id`. The bootstrap response includes it; `bootstrap-config.json` projects it into the new project's folder; project digests' `decision_history` records it on every successful bootstrap; `assess_health` surfaces it in the project's provenance facet. Renames are label-only (existing digest history retains the historical ID; future bootstraps use the new ID). - Deletion of a custom primitive is blocked when any recipe references it. The deletion fails with `Error [PRIMITIVE_HAS_REFERENCES]` and a count of referencing recipes; the Settings panel surfaces the names of those recipes for the user to clean up first. This matches the deletion semantics for areas (`AREA_HAS_PROJECTS`) and project types (`TYPE_HAS_PROJECTS`). - Recipes are ordered: the `position` column on each recipe row is its zero-based ordinal in the list. 
Reordering rewrites positions atomically. - The bootstrap engine executes recipe steps in declared order. There is no implicit reordering, no dependency resolution, no parallelism in v1. @@ -1659,9 +1894,13 @@ better-sqlite3 provides synchronous native SQLite bindings with no async wrapper - **SQLite via better-sqlite3, synchronous API.** The database binding is better-sqlite3, which provides synchronous, native SQLite access. This is a deliberate choice: synchronous calls are simpler, faster, and avoid the callback/promise complexity that async SQLite wrappers introduce for a local database. The library API may expose async signatures for ergonomic consistency, but the underlying operations are synchronous. -- **Schema v16 (current).** v16 adds the `email_account TEXT` column to the `projects` table (nullable) to support per-project Mail.app account targeting for the `mail-create-mailbox` bootstrap primitive. Builds on v15 (`project_digests.named_terms`), v14 (`bootstrap_primitives` + `project_type_recipe_steps` recipe-runner tables), v13 (user-managed `project_types` table with seeded Code project and Non-code project defaults; replaces the `projects.type` CHECK constraint with a foreign key into `project_types`; reclassifies the `areas` table from system-owned to user-managed), v12 (`project_digests`), and v11 (canonical `areas` table, `projects.area_id`, `projects.parent_project_id`, retired `area_of_focus` project type, area-scope memory remap). Migration history from v8 is documented in §5. Schema migrations are incremental and non-destructive; existing data is never lost during upgrades. 
+- **Schema v18 (current).** v18 adds four columns (one defaulted, three nullable): `bootstrap_primitives.is_builtin INTEGER NOT NULL DEFAULT 0` (the canonical-vs-fork marker for built-ins; replaces the prior `built_in_flag` with a name aligned to the surface vocabulary), `bootstrap_primitives.self_test_json TEXT` (optional per-primitive env-vs-bug pre-flight self-test declaration), `bootstrap_primitives.forked_from TEXT` (provenance pointer set when the row was created via `Duplicate to customize`), and `project_digests.decision_history_json TEXT` (optional bounded array of dated entries naming bootstrap strategy IDs, area changes, parent assignments, archives, renames — capped at 20 entries per project). The combined v17 → v18 migration is one atomic step (see §5.2). Builds on v17 (`interactions` append-only log), v16 (`email_account` on `projects`), v15 (`project_digests.named_terms`), v14 (`bootstrap_primitives` + `project_type_recipe_steps`), v13 (user-managed `project_types` with seeded defaults; `projects.type` CHECK → FK; `areas` reclassified to user-managed), v12 (`project_digests`), v11 (canonical `areas`, structural `area_id` / `parent_project_id`, retired `area_of_focus`). Migration history from v8 is documented in §5. Schema migrations are incremental and non-destructive; existing data is never lost during upgrades. -- **57 MCP tools.** The MCP server exposes 57 tools covering identity, capabilities, memory (agent and admin), ports, tasks, bootstrap, health, digests, areas, project types, primitives, and recipes. Tool names, parameter shapes, and response shapes are defined in this spec and stable across patch releases. +- **59 MCP tools.** The MCP server exposes 59 tools covering identity, capabilities, memory (agent and admin), ports, tasks, bootstrap, health, digests, areas, project types, primitives, recipes, vocabulary, and self-diagnosis (`setlist-doctor`, added in spec 0.34).
Tool names, parameter shapes, and response shapes are defined in this spec and stable across patch releases. + +- **No runtime npm dependencies in `shell-command` primitives.** A `shell-command` primitive's command string is handed verbatim to the user's shell — setlist does not install npm packages on the user's behalf, does not require a Node module to be reachable for the command to run, and does not impose a setlist-specific runtime environment. The user authors the command; the user owns its dependency surface. This is a deliberate constraint that already holds in the v0.28 implementation; v0.34 names it explicitly so future contributors don't drift toward "setlist could be more helpful by auto-installing a missing tool" — that helpfulness would also be an invisible side effect outside the user's shell environment, which is a different trust profile. Adopted as a named invariant from `app-it` (§6.1) — see `AGENTS.md:44` and `SKILL.md:163-178`. + +- **Trust disk over docs (inspection invariant).** When `workspace-inspection` finds a disagreement between documentation (`CLAUDE.md`, `README.md`, AGENTS-style files) and manifest-extracted ground truth (`package.json`, `Cargo.toml`, etc.), the inspector reports the manifest as authoritative and surfaces the disagreement as a `{kind: "doc-disk-disagreement"}` gap rather than reconciling silently. Documentation is producer-authored and may be stale; the manifest is the truth at scan time. See `#workspace-inspection` (2.16). Adopted as a named invariant from `app-it` (§6.1). - **ESM-only.** All packages produce ESM output. No CommonJS dual-publishing. @@ -1689,11 +1928,17 @@ better-sqlite3 provides synchronous native SQLite bindings with no async wrapper - **Schema evolution must be incremental and non-destructive.** Each version upgrade must handle the full migration path. Existing data must never be lost during upgrades. New columns use nullable defaults or sensible initial values. 
The `skill` → `procedural` type migration in v10 and the `area_of_focus` → `project` + canonical-area reclassification in v11 are data migrations within the table-recreate pattern. -- **Setlist must not re-invent MCP tool semantics.** The 56 tools have defined parameter names, types, and response shapes. Setlist implements them; it does not redesign them. +- **Setlist must not re-invent MCP tool semantics.** The 59 tools have defined parameter names, types, and response shapes. Setlist implements them; it does not redesign them. ### 4.5 Testing Discipline {#testing-discipline} -**Load-bearing invariant.** Scenarios are the contract. The holdout set in `.fctry/scenarios.md` (168 scenarios as of spec 0.29; S139–S154 cover the user-composable bootstrap primitives, S155–S168 cover the `mail-create-mailbox` built-in, the per-project `email_account` field, and the schema v16 migration introduced in this spec), evaluated by LLM-as-judge, is the only true signal that the system is correct. Every other check in the test stack — type checking, unit tests (vitest), end-to-end tests (Playwright), the pre-flight ABI check, and the automated Electron security check — is a canary: an early-warning system that catches drift before it reaches the scenario evaluator. Canaries are necessary, but none of them is sufficient. A passing PR signals "the canaries sing." Scenario satisfaction signals "the system is actually correct." The reader should feel the difference. +**Load-bearing invariant.** Scenarios are the contract. 
The holdout set in `.fctry/scenarios.md` (227 scenarios as of spec 0.34; S139–S154 cover the user-composable bootstrap primitives, S155–S168 cover the `mail-create-mailbox` built-in and the per-project `email_account` field, S169–S179 cover App Behavior, S180–S188 cover MCP client install/remove, S189–S196 cover workspace inspection, S197–S211 cover the directory-mcp pattern adoption introduced in spec 0.33, **S212–S227 cover the app-it pattern adoption introduced in spec 0.34**: agent-decision report on workspace inspection (S212), the `setlist-doctor` diagnose-only-by-default tool (S213) plus its `--fix-safe` scope ceiling (S227), three-bucket health verification with `needs_human_verification` (S214), named recipe strategy IDs propagated to bootstrap responses + digests + health (S215), `is_builtin` fork-to-customize with `forked_from` provenance (S216), primitive `self_test` env-vs-bug pre-flight routing (S217), `.fctry/bootstrap-config.json` projection + `no-unresolved-placeholders` pre-flight gate (S218–S219), dual-manifest `.claude-plugin/` + `.codex-plugin/` packaging with the `validate-skills-manifests` step (S220–S221), one-command `npm run validate` CI gate (S222), trust-disk-over-docs disagreement gap (S223), `decision_history` facet on project digests with cap-and-preserve semantics (S224), shared external-file-injection lineage acknowledgment (S225), and the dismissed-directions log shape (S226)), evaluated by LLM-as-judge, is the only true signal that the system is correct. Every other check in the test stack — type checking, unit tests (vitest), end-to-end tests (Playwright), the pre-flight ABI check, the `validate-skills-manifests` step, and the automated Electron security check — is a canary: an early-warning system that catches drift before it reaches the scenario evaluator. Canaries are necessary, but none of them is sufficient. A passing PR signals "the canaries sing." Scenario satisfaction signals "the system is actually correct." 
The reader should feel the difference. + +**`npm run validate` is the one-command CI gate.** All canaries fan out behind a single `npm run validate` entry point: `npm run typecheck`, `npm test`, `npm run build`, `npm run verify:mcp-abi`, and `npm run validate-skills-manifests`. One command, one exit code, green-or-not. Individual commands remain available for targeted runs during development (run just the type checker, just the tests, just the manifest validator), but `validate` is the meta gate — both CI and a human or agent doing a pre-push sweep look at one line of output to know whether the project is in a shippable state. Adopted as a named convention from `app-it` (§6.1) — see `AGENTS.md:13-17`. The asymmetry between CI (always runs the narrow set: typecheck + unit + build) and local-only checks (ABI verify, Playwright E2E) is preserved as documented below; `validate` is the **local-developer-facing** umbrella that includes the heavier checks, while CI runs the subset that's safe and fast in a hosted environment. + +**Rejected directions are logged, not silenced.** When the spec writer declines an adopted-elsewhere pattern or the user explicitly drops a considered design, the rationale is captured as a `Dismissals` section appended to the relevant `.fctry/changelog.md` entry (the per-evolve dismissals list). The changelog is the canonical anti-re-proposal record — when the same pattern surfaces again in a `/fctry:ref` six months later, the next session can grep the dismissals to find the prior rejection and either skip re-debating or re-evaluate against new evidence. The shape is deliberately light: no per-decision ADR files, no separate `.fctry/rejected.md` directory tree, no required template — one bullet per dismissed item naming the pattern, the source, and a one-sentence rationale.
Adopted from `app-it` (§6.1) with explicit lighter ceremony — app-it ships per-decision ADR files (`docs/decisions/0002-macos-only-scope.md` and similar); setlist consolidates into the changelog because the changelog is already the cross-evolve narrative the spec writer reviews on every change. The two surfaces serve the same purpose; setlist's choice favors fewer files. + +**One parametrized fixture per storage interface (forward-looking).** When a storage interface in setlist acquires multiple implementations — e.g., the real SQLite store and an in-memory double for fast unit tests, or a real `vocab` source and a test-double that returns a stub canonical set — the test suite uses **a single parametrized fixture that runs every test against every implementation**, so the test double cannot drift from the real implementation. This is a forward-looking discipline note, not a currently-active practice: setlist does not yet have a parallel in-memory store today, and the existing tests run only against the SQLite real implementation. The note exists so that the first time a new implementation is introduced, the discipline is already named. Footnote-prominence reference: `directory-mcp` (see §6.1), whose CLAUDE.md "Conventions" section articulates the same discipline. **CI scope is narrow by design.** Continuous integration runs exactly three gates on every pull request: type checking, unit tests, and build. Nothing else. The end-to-end Playwright suite and the pre-flight ABI check are deliberately local-only — both are too expensive and too flaky to run reliably in a hosted CI environment, and their failure modes are more informative when a human or an Observer agent is watching the local output. This asymmetry is an intentional design decision, not a coverage gap. The spec commits to keeping CI fast and high-signal; the heavier checks run where they work well, which is on the user's own machine. 
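
As a rough illustration of the split described in this section, the sketch below models `npm run validate` as an umbrella over five scripts and hosted CI as the three-script subset named in the prose. The script names come from the text; `runGate`, `RunScript`, `GateReport`, and the run-everything-then-aggregate behavior are assumptions made for illustration, not setlist's actual tooling.

```typescript
// Hypothetical sketch. Script names are taken from the spec text;
// the runner shape and the aggregation policy are assumptions.
type RunScript = (script: string) => number; // returns the script's exit code

// Local umbrella: everything behind `npm run validate`.
const VALIDATE_SCRIPTS: readonly string[] = [
  "typecheck",
  "test",
  "build",
  "verify:mcp-abi",
  "validate-skills-manifests",
];

// Hosted CI: the narrow set named in the prose (type checking, unit tests, build).
const CI_SCRIPTS: readonly string[] = ["typecheck", "test", "build"];

interface GateReport {
  exitCode: number; // 0 only when every script exited 0
  failed: string[]; // scripts that exited non-zero, in run order
}

// Runs every script (assumed: no short-circuit on first failure) and folds
// the results into a single exit code, so callers look at one value.
function runGate(scripts: readonly string[], run: RunScript): GateReport {
  const failed: string[] = [];
  for (const script of scripts) {
    if (run(script) !== 0) failed.push(script);
  }
  return { exitCode: failed.length === 0 ? 0 : 1, failed };
}
```

The runner is injected so the aggregation can be exercised without a shell; a real wrapper might call `npm run <script>` through Node's `child_process` module, but that wiring is outside what the spec describes.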
@@ -1712,6 +1957,7 @@ better-sqlite3 provides synchronous native SQLite bindings with no async wrapper | E2E (Playwright) | Local only | Desktop UI regressions end-to-end | | Pre-flight ABI check | Local only — advisory in dev, blocking at packaging | Wrong-ABI `better-sqlite3` binding in the packaged app (see §5.4 `#native-binding-hygiene`) | | Electron security check | Edit time (local hook) | `nodeIntegration: true` or `contextIsolation: false` introduced into main-process or renderer files | +| `validate-skills-manifests` | CI + local (via `npm run validate`) | Version drift or skill-listing drift between `.claude-plugin/plugin.json` and `.codex-plugin/plugin.json` for the bundled skills (see `#bundled-skills` (2.17)) | Everything above the scenario row is a canary. Everything below needs to be readable and actionable — when a canary fails, a human or agent should be able to tell, in one glance, which layer caught it and what to do. @@ -1796,9 +2042,9 @@ setlist/ ### 5.2 Schema Compatibility {#schema} -The SQLite schema v16 is the current schema. 
Evolution history: v8 was the initial schema carried over at the port point from `project-registry-service` (see §1.5); v9 added the `observation` memory type; v10 added unified memory types and chorus-compatible fields; v11 introduced canonical areas, first-class structural columns on `projects`, and retired the `area_of_focus` project type; v12 added the `project_digests` table for free-form project essence summaries; v13 introduced the `project_types` table, replaced the `projects.type` CHECK constraint with a foreign key into `project_types`, and reclassified the `areas` table from system-owned to user-managed; v14 added `bootstrap_primitives` and `project_type_recipe_steps` to back the user-composable bootstrap recipe runner; v15 added `project_digests.named_terms`; **v16 adds the `email_account TEXT` column to `projects` to support per-project Mail.app account targeting for the `mail-create-mailbox` bootstrap primitive.** +The SQLite schema v18 is the current schema. Evolution history: v8 was the initial schema carried over at the port point from `project-registry-service` (see §1.5); v9 added the `observation` memory type; v10 added unified memory types and chorus-compatible fields; v11 introduced canonical areas, first-class structural columns on `projects`, and retired the `area_of_focus` project type; v12 added the `project_digests` table for free-form project essence summaries; v13 introduced the `project_types` table, replaced the `projects.type` CHECK constraint with a foreign key into `project_types`, and reclassified the `areas` table from system-owned to user-managed; v14 added `bootstrap_primitives` and `project_type_recipe_steps` to back the user-composable bootstrap recipe runner; v15 added `project_digests.named_terms`; v16 added the `email_account TEXT` column to `projects` to support per-project Mail.app account targeting for the `mail-create-mailbox` bootstrap primitive; v17 added the `interactions` append-only log; **v18 (current) extends 
`bootstrap_primitives` with `is_builtin` / `self_test_json` / `forked_from` columns (canonical-vs-fork marker, env-vs-bug pre-flight hook, fork provenance) and extends `project_digests` with `decision_history_json` (bounded array of bootstrap-strategy and structural-move history entries, capped at 20 per project).** -**Tables (24):** +**Tables (25):** - `projects` — core identity columns (name PK, display_name, project_type_id INTEGER NOT NULL REFERENCES project_types(id), status, description, goals, area_id INTEGER REFERENCES areas(id), parent_project_id INTEGER REFERENCES projects(id) ON DELETE SET NULL, email_account TEXT, created_at, updated_at). `area_id`, `parent_project_id`, and `email_account` are nullable forever. (Implementation note: the agent may keep the legacy `type TEXT` column populated as a denormalized convenience or remove it entirely — the canonical type lookup is the FK.) - `areas` — user-managed organizational buckets (id INTEGER PK, name TEXT UNIQUE NOT NULL, display_name TEXT NOT NULL, description TEXT, color TEXT). Seeded at v10→v11 migration with seven rows (Work, Family, Home, Health, Finance, Personal, Infrastructure) and reclassified to user-managed in v13. The seven seeds hold no special status after install — they may be edited or deleted (subject to the deletion-when-projects-attached rule in `#rules` (3.3)). The legacy "no INSERT/UPDATE/DELETE from tools" constraint is removed. - `project_types` — user-managed project kinds governing bootstrap behavior (id INTEGER PK, name TEXT UNIQUE NOT NULL, default_directory TEXT NOT NULL, git_init INTEGER NOT NULL DEFAULT 0, template_directory TEXT, color TEXT, created_at TEXT). Seeded in v12→v13 migration with two rows: **Code project** (default_directory `~/Code`, git_init = 1, template_directory NULL) and **Non-code project** (default_directory `~/Projects`, git_init = 0, template_directory NULL). Both are user-editable and user-deletable once detached from any projects. 
@@ -1819,16 +2065,49 @@ The SQLite schema v16 is the current schema. Evolution history: v8 was the initi - `enrichment_log` — enrichment operation records (id PK, memory_id FK, engine_kind, engine_version, created_at) - `recall_audit` — recall operation log (id PK, query, mode CHECK(search|bootstrap|profile), budget_tokens, scope, project_id, memory_ids_returned, scores, timestamp) - `memory_fts` — FTS5 virtual table for memory full-text search -- `project_digests` — per-project free-form essence summaries (project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE, digest_kind TEXT NOT NULL DEFAULT 'essence', digest_text TEXT NOT NULL, spec_version TEXT NOT NULL, producer TEXT NOT NULL, generated_at TEXT NOT NULL, token_count INTEGER, PRIMARY KEY (project_id, digest_kind)). One row per (project, digest_kind); refresh replaces. +- `project_digests` — per-project free-form essence summaries (project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE, digest_kind TEXT NOT NULL DEFAULT 'essence', digest_text TEXT NOT NULL, spec_version TEXT NOT NULL, producer TEXT NOT NULL, generated_at TEXT NOT NULL, token_count INTEGER, **`decision_history_json` TEXT** (v18, nullable; bounded JSON array of `{at, kind, summary, source?}` entries capped at 20 per project), PRIMARY KEY (project_id, digest_kind)). One row per (project, digest_kind); refresh replaces `digest_text` / `spec_version` / `producer` / `generated_at` but preserves `decision_history_json` (registry-canonical state, not regenerated from source). **Indexes, constraints, and triggers** are defined by the v13 schema. v11 added indexes on `projects(area_id)` and `projects(parent_project_id)` to support area-filtered and parent/child lookups. v12 added an index on `project_digests(project_id)` to support per-project digest lookups. v13 adds an index on `projects(project_type_id)` to support type-filtered lookups (the Home view's Type filter dropdown reads through this index). 
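
The cap-and-preserve behavior described for `project_digests.decision_history_json` above can be sketched as two small pure functions. This is a minimal illustration under stated assumptions: the field names follow the column list in the spec, while `appendDecision`, `refreshDigest`, the `DigestRow` shape, and the ISO-style `at` value are hypothetical and not taken from setlist's code.

```typescript
// Hypothetical sketch: field names follow the spec's column list; the function
// names and the in-memory row shape are assumptions for illustration.
interface DecisionEntry {
  at: string; // timestamp (assumed ISO-8601)
  kind: string; // e.g. "bootstrap", "rename"
  summary: string;
  source?: string;
}

interface DigestRow {
  digest_text: string;
  spec_version: string;
  producer: string;
  generated_at: string;
  decision_history_json: string | null; // null until history accumulates
}

type FreshDigest = Pick<DigestRow, "digest_text" | "spec_version" | "producer" | "generated_at">;

const DECISION_HISTORY_CAP = 20;

// Appends one entry and keeps only the newest 20, dropping the oldest first.
function appendDecision(row: DigestRow, entry: DecisionEntry): DigestRow {
  const history: DecisionEntry[] =
    row.decision_history_json === null
      ? []
      : (JSON.parse(row.decision_history_json) as DecisionEntry[]);
  history.push(entry);
  const capped = history.slice(-DECISION_HISTORY_CAP);
  return { ...row, decision_history_json: JSON.stringify(capped) };
}

// A refresh replaces the generated fields but carries the history over untouched.
function refreshDigest(row: DigestRow, fresh: FreshDigest): DigestRow {
  return { ...fresh, decision_history_json: row.decision_history_json };
}
```

Keeping the history out of `FreshDigest` means a refresh cannot overwrite it by accident, which mirrors the "registry-canonical state, not regenerated from source" rule.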
**Schema v14.** v0.28 introduces the user-composable bootstrap primitive system, shipped in v0.5.0. Two new tables back the recipe runner: -- `bootstrap_primitives` — the set of primitives available for composition (id INTEGER PK, name TEXT UNIQUE NOT NULL, shape TEXT NOT NULL CHECK(shape IN ('filesystem-op', 'shell-command', 'mcp-tool')), description TEXT, params_json TEXT NOT NULL, built_in_flag INTEGER NOT NULL DEFAULT 0, created_at TEXT, updated_at TEXT). The `params_json` column carries the shape-specific parameter declarations. Seeded at v13→v14 migration with four built-in rows: `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, each with `built_in_flag = 1`. (A fifth built-in row, `mail-create-mailbox`, is added in v16 — see below.) User-authored primitives have `built_in_flag = 0`. +- `bootstrap_primitives` — the set of primitives available for composition (id INTEGER PK, name TEXT UNIQUE NOT NULL, shape TEXT NOT NULL CHECK(shape IN ('filesystem-op', 'shell-command', 'mcp-tool')), description TEXT, params_json TEXT NOT NULL, **`is_builtin` INTEGER NOT NULL DEFAULT 0** (v18; renamed from `built_in_flag`), **`self_test_json` TEXT** (v18, nullable), **`forked_from` TEXT** (v18, nullable), created_at TEXT, updated_at TEXT). The `params_json` column carries the shape-specific parameter declarations. Seeded at v13→v14 migration with four built-in rows: `create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, each with `is_builtin = 1`. (A fifth built-in row, `mail-create-mailbox`, is added in v16 — see below.) User-authored primitives have `is_builtin = 0`; primitives created via `Duplicate to customize` carry `forked_from = ` for provenance. 
- `project_type_recipe_steps` — the ordered list of primitive invocations attached to each project type (id INTEGER PRIMARY KEY AUTOINCREMENT, project_type_id INTEGER NOT NULL REFERENCES project_types(id) ON DELETE CASCADE, primitive_id INTEGER NOT NULL REFERENCES bootstrap_primitives(id) ON DELETE RESTRICT, position INTEGER NOT NULL, params_json TEXT NOT NULL, UNIQUE(project_type_id, position)). The `ON DELETE RESTRICT` on `primitive_id` enforces the rule that a primitive in use cannot be deleted (`Error [PRIMITIVE_HAS_REFERENCES]`). Seeded at migration with default recipes for the two existing project types matching v0.27 behavior. The `register-in-registry` engine step is NOT stored — it is the engine's automatic trailer. -**Schema v16 (current).** v0.29 adds the `email_account` column to `projects` and seeds the `mail-create-mailbox` built-in primitive: +**Schema v18 (current).** Spec 0.34 extends two existing tables with optional facets — no new tables, no destructive changes: + +- `bootstrap_primitives` adds three nullable/defaulted columns: `is_builtin INTEGER NOT NULL DEFAULT 0` (renamed from the prior `built_in_flag` for vocabulary alignment with the Settings panel and the `list_primitives` MCP response; the v17 → v18 migration copies the prior column's values into the new column under the new name and drops the old column in the same table-rebuild step, so the surface name across CLI, MCP, library, and UI is `is_builtin` from v18 onward), `self_test_json TEXT` (optional per-primitive env-vs-bug pre-flight self-test declaration; null for primitives without a self-test, including all `filesystem-op` primitives), and `forked_from TEXT` (optional pointer to the original built-in's name when the row was created via `Duplicate to customize` in Settings → Primitives; null for user-authored-from-scratch primitives and for built-ins themselves). 
+- `project_digests` adds one nullable column: `decision_history_json TEXT` (JSON array of `{at, kind, summary, source?}` entries — bootstrap, area-change, parent-change, archive, rename, restore — capped at 20 entries per project; null when no history has accumulated yet). + +**v17 → v18 migration plan (combined, atomic):** + +1. The rename is done as a table rebuild rather than an in-place `ALTER TABLE ... RENAME COLUMN`, which SQLite supports only from version 3.25.0 onward. The migration creates `bootstrap_primitives_v18` with the new column shape (`id, name, shape, description, params_json, is_builtin INTEGER NOT NULL DEFAULT 0, self_test_json TEXT, forked_from TEXT, created_at, updated_at`), copies every row from `bootstrap_primitives` mapping `built_in_flag` → `is_builtin` (same integer semantics), drops `bootstrap_primitives`, and renames `bootstrap_primitives_v18` → `bootstrap_primitives`. The seeded built-in rows (`create-folder`, `copy-template`, `git-init`, `update-parent-gitignore`, `mail-create-mailbox`) retain `is_builtin = 1`; user-authored rows retain `is_builtin = 0`. `self_test_json` and `forked_from` are NULL for every pre-existing row. +2. `ALTER TABLE project_digests ADD COLUMN decision_history_json TEXT;` — nullable, no default, every pre-existing digest row receives `decision_history_json = NULL` post-migration. +3. Foreign keys referencing `bootstrap_primitives(id)` (specifically the `project_type_recipe_steps.primitive_id` FK) survive the rebuild because the rebuild preserves row ids verbatim. SQLite's `PRAGMA legacy_alter_table=ON` and `PRAGMA foreign_keys=OFF` bracket the rebuild per the standard SQLite table-rebuild recipe. +4. Bump `schema_meta.schema_version` to 18. +5. The migration is **idempotent**: re-running v18 against an already-v18 database is a no-op (the rebuild is skipped when `is_builtin` is detected in the column list; the `ALTER TABLE` for `decision_history_json` is skipped when the column exists). +6.
A v17 binary opening a v18 database refuses to open with the standard "schema too new" message (refusing-to-downgrade is preserved across migrations). + +The migration is non-destructive: no row content is altered beyond the in-place rename of the `built_in_flag` column to `is_builtin`, no row is deleted, no other table is touched. No data needs to be backfilled into `self_test_json` or `forked_from` — both are honest NULLs until the user (for `forked_from`) or the primitive author (for `self_test_json`) populates them. + +--- + +**Schema v17.** Spec 0.33 added the `interactions` append-only log table backing derived recency and frequency signals across the registry: + +- `interactions` — append-only log of every registry touch (`id INTEGER PRIMARY KEY AUTOINCREMENT, project_id INTEGER REFERENCES projects(id) ON DELETE CASCADE, surface TEXT NOT NULL, query TEXT, at TEXT NOT NULL DEFAULT (datetime('now')), session_id TEXT, agent_role TEXT`). Indexed on `(project_id, at)` for recency-per-project queries and on `(at)` for global pruning. No UPDATE, no in-place mutation; rows are immutable until reflection prunes them per the retention policy. `ON DELETE CASCADE` on `project_id` means archiving and deleting a project clears its interaction history (the recency signal for an archived project is no longer interesting). When the touched project is unknown (e.g., a `search_projects` call that matched nothing), `project_id` is NULL. + +**v16 → v17 migration plan:** + +1. `CREATE TABLE interactions (id INTEGER PRIMARY KEY AUTOINCREMENT, project_id INTEGER REFERENCES projects(id) ON DELETE CASCADE, surface TEXT NOT NULL, query TEXT, at TEXT NOT NULL DEFAULT (datetime('now')), session_id TEXT, agent_role TEXT);` +2. `CREATE INDEX idx_interactions_project_at ON interactions(project_id, at);` +3. `CREATE INDEX idx_interactions_at ON interactions(at);` +4. No data migration — the log starts empty. 
Pre-existing recency proxies (project `updated_at`, memory `last_accessed`) continue to work; the interaction-log-derived signal starts contributing as soon as agents make calls against the new schema. +5. Bump `schema_meta.schema_version` to 17. +6. The migration is idempotent: re-running v17 against an already-v17 database is a no-op (the `CREATE TABLE IF NOT EXISTS` and index creation are skipped if present). A v16 binary opening a v17 database refuses to open with a "schema too new" message (the refusing-to-downgrade contract preserved across migrations). + +The migration is non-destructive: no row content is altered, no existing table is touched, no other column is added. The new table is empty post-migration and fills as agents make calls. + +**Schema v16.** v0.29 adds the `email_account` column to `projects` and seeds the `mail-create-mailbox` built-in primitive: - `projects.email_account TEXT` — nullable, no default. Used by the `mail-create-mailbox` bootstrap primitive to resolve the `{project.email_account}` token at pre-flight time (see `#project-bootstrap` (2.13) "Email account resolver precedence"). Settable via `register_project`, editable via `update_project` and the desktop project edit form. Clearing the field stores NULL; an empty-string value is treated as unresolved by the token resolver. - `bootstrap_primitives` row — one new seeded row in v16: `mail-create-mailbox` with `shape = 'shell-command'`, `built_in_flag = 1`, and `params_json` declaring the `account` (string template) and `mailbox_name` (string template, default `Projects/{project.name}`) parameter shape. **No new row is added to any existing project type's `project_type_recipe_steps`** — `mail-create-mailbox` is seeded as a built-in but is not in any default recipe; users opt in by adding it via Settings → Project types → Edit → Steps. 
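
The idempotency rule in step 5 of the v17 → v18 migration plan earlier in this hunk (skip the rebuild when `is_builtin` is already present, skip the `ALTER TABLE` when `decision_history_json` already exists) can be sketched as a planning function over column lists. The column names come from the spec; `planV18Migration`, the `V18Step` labels, and the choice to bump the version only when some step ran are assumptions for illustration. The column lists are taken as plain arrays here; how they are read from the database (for example via `PRAGMA table_info`) is left out.

```typescript
// Hypothetical sketch: decides which v18 steps still need to run, given the
// column names currently present on each table. Step labels are illustrative.
type V18Step =
  | "rebuild-bootstrap-primitives"
  | "add-decision-history-column"
  | "bump-schema-version";

function planV18Migration(
  primitiveColumns: readonly string[],
  digestColumns: readonly string[],
): V18Step[] {
  const steps: V18Step[] = [];
  // Rebuild only while the renamed column is still missing.
  if (!primitiveColumns.includes("is_builtin")) {
    steps.push("rebuild-bootstrap-primitives");
  }
  // Add the history column only while it is still missing.
  if (!digestColumns.includes("decision_history_json")) {
    steps.push("add-decision-history-column");
  }
  // Assumption: the version bump accompanies any structural change;
  // an already-v18 database yields an empty plan (a no-op).
  if (steps.length > 0) steps.push("bump-schema-version");
  return steps;
}
```

An empty plan is the no-op case: running the migration a second time finds both columns in place and does nothing.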
@@ -2008,6 +2287,10 @@ Inspirations: - **spfcraze/mindbank** (https://github.com/spfcraze/mindbank) — Memory/knowledge project that models every update as a new version row with `valid_from` / `valid_to` (soft-delete, non-destructive) and exposes an explicit history endpoint returning the full version chain, plus a **wake-up context snapshot** pattern: pre-compute a small "what matters right now" bundle (pinned + recent decisions + active procedurals) at session start, cached and served cheaply rather than recomputed per call. Source for two pattern studies cited inline in this spec: (a) **temporal versioning + history endpoint** in §2.12 — hardens setlist's already-declared `valid_from`/`valid_until` fields with a concrete endpoint-shape pattern (open question: dedicated `memory_history` MCP tool vs. extending `inspect_memory`); and (b) **wake-up context snapshot** in §2.3, cross-referenced from §3.5 — pre-baking `portfolio_brief` for session-start flows (open question: cache-invalidation trigger). **Both are pattern studies, not adopted.** Scope TBD for each. 
+- **ePaint/directory-mcp** (https://github.com/ePaint/directory-mcp) — A small-but-shaped MCP server for tracking people, projects, organizations, and accounts as a single connected directory, with an explicit emphasis on the agent-side experience: proactive-use directive in the server's `instructions` envelope (`directory/mcp/api.py:74-95`), open-vocabulary fields with canonical normalization and an alias reverse-map (`directory/vocab.py`), verb-shaped tool names (`whois`, `remember_*`, `relate`, `link`), every write surface idempotent on a documented collapse key, an append-only interaction log from which recency and frequency are derived by query (CLAUDE.md "Core model" bullet; README "How disambiguation works"), an `{ambiguous: true, alternatives: [...]}` envelope on identity queries (`directory/mcp/api.py:50-71`), bundled Claude Code skills shipped alongside the MCP server (README "Bundled skills"; `install.sh`), and an idempotent CLAUDE.md rule injection between sentinel markers (README "Make your agent reach for it"; `directory-rule.md`). **Source for nine adopted pattern decisions in spec 0.33**: proactive-use directive in `#capability-declarations` (2.11) mechanism #1, open-vocabulary normalization + the `vocab` tool in `#capability-declarations` (2.11), idempotent-writes contract generalized across all write surfaces in `#capability-declarations` (2.11), the append-only `interactions` log in `#portfolio-memory` (2.12), the ambiguous-envelope contract in `#cross-project` (2.9), the bundled-skills + CLAUDE.md rule installer in `#bundled-skills` (2.17), the verb-shaped-tool-names guideline (footnote) in `#capability-declarations` (2.11), and the parametrized-fixture testing discipline (footnote) in `#testing-discipline` (4.5). **One pattern dismissed**: the bounded-context `api.py` + `internal/` package shape, because it conflicts with setlist's established npm-workspace convention and Python's package conventions don't translate cleanly to TypeScript. 
The Setlist adaptations adjust three things directory-mcp's single-user design assumed away: (a) the interaction log has an explicit retention policy (90 days OR 10k rows per project) because setlist's portfolio is multi-project and longer-lived; (b) the bundled-skills installer adds `.local.md` sibling preservation and `--no-skills` / `--no-rule` opt-outs; (c) the existing 57 MCP tools are not renamed to the verb-shaped convention — only new tools added in 0.33+ follow it. The patterns are adopted as behavior, not as a dependency; setlist does not import or vendor any directory-mcp code. + +- **Christian-Katzmann/app-it** (https://github.com/Christian-Katzmann/app-it) — A small Claude Code + Codex bundled plugin for app-scaffold automation, structured as parallel `.claude-plugin/` and `.codex-plugin/` manifests over a single `skills/` tree with a `validate` step that version-locks the manifests against each other. The repo contributes a tight cluster of patterns shaped by hands-on app scaffolding under setlist's same constraints (macOS, dual-host CC + Codex, mailbox creation via Mail.app, primitive recipes per project type). 
**Source for thirteen adopted pattern decisions in spec 0.34**: agent-decision-shaped report on workspace inspection (`templates/inspect.sh:1-120`, `references/project-inspection.md:1-44`) cited inline in `#workspace-inspection` (2.16); named-outcome decision-tree strategies per bootstrap recipe with stable strategy IDs (`references/strategies.md:5-21`) cited inline in `#project-bootstrap` (2.13); **dual-manifest plugin packaging** (`.claude-plugin/marketplace.json`, `plugins/app-it/.codex-plugin/plugin.json`, `AGENTS.md:27-29`) cited with panel prominence in `#bundled-skills` (2.17); **`setlist-doctor` with `--json` and narrow `--fix-safe`** (`references/troubleshooting.md:40-55`, `references/verification.md:21-56`) cited with panel prominence in `#mcp-clients` (2.14.3); three-bucket verification on health assessment with `needs_human_verification` (`references/verification.md:1-7, 58-73`, `report-template.md:60-63`) cited inline in `#health-assessment` (2.15); built-in primitives as canonical with `is_builtin: true` and fork-to-customize (`AGENTS.md:33`, `references/generated-files.md:30`) cited inline in `#project-bootstrap` (2.13) and `#entities` (3.2); pre-flight `self_test` discriminating env-broken from primitive-broken (`references/verification.md:8-19`) cited inline in `#project-bootstrap` (2.13); single `bootstrap-config.json` per project + build-time placeholders as a projection of registry state (`references/generated-files.md:50-95`, `AGENTS.md:42`) cited inline in `#project-bootstrap` (2.13); single `npm run validate` as the one-command CI gate (`AGENTS.md:13-17`) cited inline in `#testing-discipline` (4.5); trust disk over docs as a named invariant (`SKILL.md:19-23`, `AGENTS.md:35`) cited as footnote in `#workspace-inspection` (2.16) and `#hard-constraints` (4.3); `decision_history` facet on project digests (`references/report-template.md:72-74`) cited inline in `#entities` (3.2) "Project digests"; `.fctry/rejected.md` lightweight log for dismissed 
directions (`AGENTS.md:46-48`, `docs/decisions/0002-macos-only-scope.md`) cited inline in this changelog entry as the canonical anti-re-proposal record; no runtime deps as a deliberate constraint on `shell-command` primitives (`AGENTS.md:44`, `SKILL.md:163-178`) cited as footnote in `#hard-constraints` (4.3). **One pattern dismissed**: Apple Event Cmd-Q test semantics (`SKILL.md:32-34, 174-178`, `references/troubleshooting.md:69`) — test-mechanism choice, not an experience contract; the hide-on-close vs Cmd-Q user behavior is already specced and shipped in S169–S179, and how the test is written is left to the test author. The setlist adaptations make four explicit changes from app-it's shape: (a) strategy IDs select a *linear* recipe rather than branching primitives, because setlist's closed three-shape primitive set is a §4.3 hard constraint and conditional primitives would re-open it; (b) `setlist-doctor --fix-safe` is bounded to the five named recoverables and forbidden from touching anything outside setlist artifacts, where app-it's equivalent has a looser scope; (c) the three-bucket health verification is scoped to health-only — bootstrap pre-flight is excluded because most setlist primitives are inherently programmatic; (d) the dismissed-directions log is a single `rejected` section appended to `changelog.md` rather than per-decision ADR files, matching the lighter ceremony already established by setlist's changelog convention. The patterns are adopted as behavior, not as a dependency; setlist does not import or vendor any app-it code. 
+ - **chiefautism/claude-intel** (https://github.com/chiefautism/claude-intel) — A two-file Claude Code plugin (Haiku-based `intel` subagent plus a bash `PostToolUse` hook) that gives a single CC project persistent structured codebase knowledge across five markdown files — `architecture.md`, `commands.md`, `patterns.md`, `gotchas.md`, `decisions.md` — each with its own one-line editorial charter and surgical-edit discipline rather than append-only growth. The centerpiece is that fixed, named, bounded taxonomy: the subagent is prompted to read the existing files, tail the event delta since a cursor, and update the right file in place. The `PostToolUse` hook writes low-cost JSONL events (`{ts, tool, file}` for Edit/Write/NotebookEdit; `{ts, tool:"Bash", cmd, exit, err}` for Bash; Read/Glob/Grep skipped as noisy), rotated at 500 lines with the last 300 kept. The subagent's processing loop is stateless read + cursor-based incremental update — not event-stream-in-memory — which means the knowledge files survive agent restarts and the cursor is the only state carried across runs. A `rescan` command is modeled as a first-class reset rather than composed from primitives. **Not adopted as a dependency or as a runtime model** — it is CC-plugin-specific, bash-only, and setlist's SQLite + MCP + reflection architecture already covers the same conceptual territory with far more capability (typed memory, four-level scoping, belief classification, temporal validity, outcome-aware reinforcement, cross-project intelligence). 
What claude-intel contributes is five shaping lenses distilled into pattern studies across this spec: (1) the named-taxonomy catalog idea, hardening the open question of whether `project_digests` should grow from single-kind `essence` to a small named catalog (see §2.12 Portfolio Memory, "Pattern study: named digest-kind catalog"); (2) a typed event-feed ingest surface, as a shape for a potential `feed_event` verb or `feedback` extension (see §2.12 "Pattern study: typed event-feed ingest surface"); (3) separation of continuous cursor-driven incremental reinforcement from threshold-triggered deep consolidation (see §2.12 "Pattern study: stateless read + cursor-based incremental reflection"); (4) `rescan_project` as a single named admin reset rather than a compose-from-forget-and-refresh workflow (see Appendix C `#deferred-futures`); and (5) bounded-file discipline generalized as a cross-cutting principle, with the digest ceiling and any future free-form surface as its concrete manifestations (see §1.3 `#design-principles`, "Bounded-file discipline"). All five are catalog-only; none commit setlist to behavior. ### 6.2 Ecosystem Context {#ecosystem-context} @@ -2256,7 +2539,7 @@ Load-bearing design choices: ## Appendix D: MCP Tool Reference {#appendix-d-mcp-tool-reference} -Complete tool reference for the 57 MCP tools. +Complete tool reference for the 59 MCP tools. **Project Identity:** @@ -2379,3 +2662,9 @@ Complete tool reference for the 57 MCP tools. | get_recipe | project_type_id | Ordered recipe rows `[{ position, primitive_id, params }]`. User-droppable steps only — the structural `register-in-registry` trailer is not stored and is rendered separately by the UI (S150) | | replace_recipe | project_type_id, steps[] | Confirmation. Atomically replaces the type's full ordered recipe; positions are renumbered 0..N-1. Empty arrays are valid (the trailer still runs). 
The trailer is structural and never part of this list (S150) | | append_recipe_step | project_type_id, primitive_id, params? | Appended step at position `MAX(position)+1` | + +**Diagnostics:** + +| Tool | Parameters | Returns | +|------|-----------|---------| +| setlist-doctor | fix_safe?, json?, check_filter? | `{ checks: [{ check_id, status, summary, evidence, recoverable }], summary, version, generated_at }`. Diagnose-only by default; `fix_safe=true` attempts the narrow recoverable fixes (broken managed MCP entries, stale port/pid files, ABI cache mismatches, malformed CLAUDE.md sentinel markers, orphaned bootstrap side-effect ledgers — never editing project code, killing processes, or touching anything outside setlist artifacts). `check_filter` accepts one or more `check_id` values to scope the run. See `#mcp-clients` (2.14.3) "setlist-doctor". | diff --git a/package.json b/package.json index 5d24d3b..f5bb7ca 100644 --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ { "name": "setlist", "private": true, - "version": "0.6.1-beta.11", + "version": "0.6.1-beta.18", "license": "Apache-2.0", "type": "module", "workspaces": [ diff --git a/packages/app/package.json b/packages/app/package.json index c290102..4aea2dd 100644 --- a/packages/app/package.json +++ b/packages/app/package.json @@ -1,6 +1,6 @@ { "name": "@setlist/app", - "version": "0.6.1-beta.11", + "version": "0.6.1-beta.18", "private": true, "type": "module", "main": "./out/main/index.js", diff --git a/packages/app/src/main/ipc.ts b/packages/app/src/main/ipc.ts index 9479299..b68dcac 100644 --- a/packages/app/src/main/ipc.ts +++ b/packages/app/src/main/ipc.ts @@ -104,7 +104,9 @@ function summarizePinnedProject(project: Record | null): Pinned export function listPinnedProjects(): PinnedProjectSummary[] { const reg = getRegistry(); return getPinnedProjects() - .map(projectName => summarizePinnedProject(reg.getProject(projectName, 'full'))) + // Pinned-menu refresh runs frequently in the background — not a user-initiated + // get_project. 
Suppress the interactions log so S204 stays meaningful. + .map(projectName => summarizePinnedProject(reg.getProject(projectName, 'full', { logInteraction: false }))) .filter((project): project is PinnedProjectSummary => project !== null); } @@ -170,7 +172,8 @@ export function registerIpcHandlers(ipcMain: IpcMain, opts?: { ipcMain.handle('updateCore', (_e, name: string, updates: Parameters[1]) => { reg.updateCore(name, updates); - return reg.getProject(name, 'standard'); + // Post-update read — internal, do not contribute to interactions signal. + return reg.getProject(name, 'standard', { logInteraction: false }); }); ipcMain.handle('updateFields', (_e, name: string, fields: Record, producer?: string) => { @@ -325,7 +328,9 @@ export function registerIpcHandlers(ipcMain: IpcMain, opts?: { }); ipcMain.handle('projectActions:openPath', async (_e, projectName: string, selectedPath?: string) => { - const project = reg.getProject(projectName, 'full'); + // openPath uses the project record to resolve the on-disk path; this is an + // implementation detail of the user's "open folder" intent, not a read query. + const project = reg.getProject(projectName, 'full', { logInteraction: false }); const paths = Array.isArray(project?.paths) ? project.paths : []; const path = selectedPath ?? paths[0]; if (!path || !paths.includes(path)) { @@ -336,7 +341,9 @@ export function registerIpcHandlers(ipcMain: IpcMain, opts?: { }); ipcMain.handle('projectActions:copyBriefCommand', (_e, projectName: string) => { - const project = reg.getProject(projectName, 'summary'); + // copyBriefCommand only needs the record to verify existence — not a user + // get_project read. Suppress the interactions log. 
+ const project = reg.getProject(projectName, 'summary', { logInteraction: false }); if (!project) throw new Error(`Project not found: ${projectName}`); const command = projectBriefCommand(projectName); clipboard.writeText(command); diff --git a/packages/cli/package.json b/packages/cli/package.json index b21b38c..9f1861a 100644 --- a/packages/cli/package.json +++ b/packages/cli/package.json @@ -1,6 +1,6 @@ { "name": "@setlist/cli", - "version": "0.6.1-beta.11", + "version": "0.6.1-beta.18", "license": "Apache-2.0", "type": "module", "main": "./dist/index.js", diff --git a/packages/cli/src/digest.ts b/packages/cli/src/digest.ts index e20f831..0e4d480 100644 --- a/packages/cli/src/digest.ts +++ b/packages/cli/src/digest.ts @@ -295,7 +295,9 @@ export interface RefreshResult { } export async function refreshProjectDigest(registry: Registry, projectName: string, opts?: { onlyStale?: boolean }): Promise { - const project = registry.getProject(projectName, 'full'); + // Digest refresh is an internal CLI worker pass — not a user-initiated read. + // Suppress the interactions log so S204 signal reflects only direct surfaces. + const project = registry.getProject(projectName, 'full', { logInteraction: false }); if (!project) return { project_name: projectName, status: 'error', error: `Project not found: ${projectName}` }; const paths = (project as { paths?: string[] }).paths ?? 
[]; if (paths.length === 0) return { project_name: projectName, status: 'skipped-no-path' }; diff --git a/packages/core/package.json b/packages/core/package.json index 2e90bc7..722a8c4 100644 --- a/packages/core/package.json +++ b/packages/core/package.json @@ -1,6 +1,6 @@ { "name": "@setlist/core", - "version": "0.6.1-beta.11", + "version": "0.6.1-beta.18", "license": "Apache-2.0", "type": "module", "main": "./dist/index.js", diff --git a/packages/core/src/ambiguity.ts b/packages/core/src/ambiguity.ts new file mode 100644 index 0000000..68552b5 --- /dev/null +++ b/packages/core/src/ambiguity.ts @@ -0,0 +1,73 @@ +// @fctry: #cross-project +// +// Spec 0.34 — ambiguous-envelope detection for identity-shaped queries. +// Shared between search_projects, recall, and cross_query. +// +// Contract (#cross-project 2.9, S208/S209): +// - When the 2nd-place candidate's score is within ~15% (relative gap) of +// the top candidate, the response is ambiguous. +// - Up to 4 alternatives surfaced, each as `{name, score, why}`. +// - The threshold is a RELATIVE gap: (top - second) / top <= 0.15. +// - Identity-resolution surfaces always return the envelope shape +// `{result, ambiguous, alternatives}` — `ambiguous` defaults to `false`, +// `alternatives` defaults to `[]`. Callers iterating `response.result` +// work on every call regardless of outcome. +// - cross_query filters the candidate pool to `source === 'registry'` so +// memory/cc_memory hits (often carrying project='global') never surface +// as registered-project alternatives. +// - Single- and two-character queries skip ambiguity detection entirely. + +/** Spec literal: 15% relative gap (S208). */ +export const AMBIGUITY_GAP_THRESHOLD = 0.15; + +/** Spec literal: up to 4 alternatives surfaced (S208). */ +export const MAX_ALTERNATIVES = 4; + +/** One candidate in the ranked list — the input to ambiguity detection. 
*/ +export interface AmbiguityCandidate { + name: string; + score: number; + /** One-line rationale; surfaced verbatim as the `why` field on alternatives. */ + why?: string; +} + +/** Ambiguity envelope returned by the detection helper. */ +export interface AmbiguityVerdict { + ambiguous: boolean; + alternatives: { name: string; score: number; why: string }[]; +} + +/** + * Detect ambiguity in a ranked candidate list. Returns `{ambiguous, + * alternatives}` — `alternatives` is empty when unambiguous. + * + * Requires `candidates` to be SORTED BY SCORE DESCENDING. The caller is + * responsible for ranking; this helper only looks at the relative gap + * between positions 0 and 1. + * + * Edge cases: + * - 0 or 1 candidates → unambiguous (nothing to be ambiguous about) + * - top score ≤ 0 → unambiguous (degenerate ranking) + * - 2nd-place score ≤ 0 → unambiguous (the top is clearly ahead) + */ +export function detectAmbiguity(candidates: readonly AmbiguityCandidate[]): AmbiguityVerdict { + if (candidates.length < 2) { + return { ambiguous: false, alternatives: [] }; + } + const top = candidates[0]; + const second = candidates[1]; + if (top.score <= 0 || second.score <= 0) { + return { ambiguous: false, alternatives: [] }; + } + const gap = (top.score - second.score) / top.score; + if (gap > AMBIGUITY_GAP_THRESHOLD) { + return { ambiguous: false, alternatives: [] }; + } + // Ambiguous — surface up to MAX_ALTERNATIVES (positions 1..MAX_ALTERNATIVES). + const alternatives = candidates.slice(1, 1 + MAX_ALTERNATIVES).map(c => ({ + name: c.name, + score: c.score, + why: c.why ?? 
'', + })); + return { ambiguous: true, alternatives }; +} diff --git a/packages/core/src/bootstrap.ts b/packages/core/src/bootstrap.ts index a68d76a..bc9df3f 100644 --- a/packages/core/src/bootstrap.ts +++ b/packages/core/src/bootstrap.ts @@ -385,7 +385,8 @@ export class Bootstrap { // Check project name isn't already registered const registry = new Registry(this._dbPath); - const existing = registry.getProject(opts.name); + // Bootstrap pre-flight duplicate check — internal, must not log to interactions. + const existing = registry.getProject(opts.name, 'standard', { logInteraction: false }); if (existing) { throw new RegistryError( 'DUPLICATE', @@ -488,8 +489,8 @@ export class Bootstrap { return { ...result, folders_moved }; } - // Get project paths - const project = registry.getProject(name, 'standard'); + // Get project paths — internal archive-move lookup, must not log to interactions. + const project = registry.getProject(name, 'standard', { logInteraction: false }); const paths = (project as Record)?.paths as string[] | undefined; if (!paths || paths.length === 0) { return { ...result, folders_moved }; @@ -665,7 +666,8 @@ export class Bootstrap { // If the project name is already registered, fail fast before running // the recipe. (The recipe runner does not know about the registry.) const registry = new Registry(this._dbPath); - if (registry.getProject(opts.name)) { + // Recipe-bootstrap pre-flight duplicate check — internal, must not log. 
+ if (registry.getProject(opts.name, 'standard', { logInteraction: false })) { throw new RegistryError( 'DUPLICATE', `A project named '${opts.name}' is already registered.`, diff --git a/packages/core/src/cross-query.ts b/packages/core/src/cross-query.ts index 3661828..601a00d 100644 --- a/packages/core/src/cross-query.ts +++ b/packages/core/src/cross-query.ts @@ -4,6 +4,7 @@ import { homedir } from 'node:os'; import type Database from 'better-sqlite3'; import { connect, getDbPath, initDb } from './db.js'; import { MemoryRetrieval, type RecallResult } from './memory-retrieval.js'; +import { logInteraction } from './interactions.js'; interface CrossQueryResult { source: 'registry' | 'memory' | 'cc_memory'; @@ -53,10 +54,44 @@ export class CrossQuery { // Score and rank results = this.rankResults(results, opts.query); + // Spec 0.34 (#portfolio-memory 2.12, S204): cross_query matching N + // projects produces N rows — one per matched project. Failed (zero + // matches) calls produce a single row with project_id=NULL. + this.logCrossQueryInteractions(opts.query, results); + const summary = this.synthesize(results, opts.query); return { results, summary }; } + /** + * Insert one interactions row per matched project (or a single NULL row + * when no matches). Best-effort — failure to log is swallowed by + * logInteraction itself. + */ + private logCrossQueryInteractions(query: string, results: CrossQueryResult[]): void { + const db = this.open(); + try { + const projectNames = new Set(); + for (const r of results) { + if (r.source === 'registry' && r.project) projectNames.add(r.project); + } + if (projectNames.size === 0) { + logInteraction(db, { surface: 'cross_query', projectId: null, query }); + return; + } + for (const name of projectNames) { + const row = db.prepare('SELECT id FROM projects WHERE name = ?').get(name) as { id: number } | undefined; + logInteraction(db, { + surface: 'cross_query', + projectId: row?.id ?? 
null, + query, + }); + } + } finally { + db.close(); + } + } + private searchRegistry(query: string): CrossQueryResult[] { const db = this.open(); try { @@ -169,7 +204,7 @@ export class CrossQuery { } portfolioBrief(): { - projects: { name: string; type: string; status: string; spec_version?: string; updated_at: string }[]; + projects: { name: string; type: string; status: string; spec_version?: string; updated_at: string; last_touched?: string | null; recent_activity_count?: number }[]; portfolio_memories: RecallResult[]; pending_observations: RecallResult[]; health_indicators: { project: string; issue: string }[]; @@ -177,14 +212,24 @@ export class CrossQuery { } { const db = this.open(); try { - // Active projects with basic identity + // Spec 0.34 (#portfolio-memory 2.12, S205): derived recency and frequency + // come from correlated subqueries (MAX(at), COUNT(*)) on the interactions + // table. Never stored, recomputed per call. Projects with zero interactions + // still appear, with a null last_touched and a recent_activity_count of 0. 
const projects = db.prepare(` SELECT p.name, p.type, p.status, p.updated_at, - (SELECT pf.field_value FROM project_fields pf WHERE pf.project_id = p.id AND pf.field_name = 'spec_version') as spec_version + (SELECT pf.field_value FROM project_fields pf + WHERE pf.project_id = p.id AND pf.field_name = 'spec_version') as spec_version, + (SELECT MAX(i.at) FROM interactions i WHERE i.project_id = p.id) as last_touched, + (SELECT COUNT(*) FROM interactions i + WHERE i.project_id = p.id AND i.at > datetime('now', '-7 days')) as recent_activity_count FROM projects p WHERE p.status NOT IN ('archived') ORDER BY p.name - `).all() as { name: string; type: string; status: string; updated_at: string; spec_version: string | null }[]; + `).all() as { + name: string; type: string; status: string; updated_at: string; + spec_version: string | null; last_touched: string | null; recent_activity_count: number | null; + }[]; // Portfolio-scoped and global memories via recall (bootstrap mode) const retrieval = new MemoryRetrieval(this._dbPath); @@ -289,6 +334,8 @@ export class CrossQuery { status: p.status, spec_version: p.spec_version ?? undefined, updated_at: p.updated_at, + last_touched: p.last_touched, + recent_activity_count: p.recent_activity_count ?? 0, })), portfolio_memories: portfolioMemories, pending_observations: pendingObservations, diff --git a/packages/core/src/db.ts b/packages/core/src/db.ts index c29269d..232780a 100644 --- a/packages/core/src/db.ts +++ b/packages/core/src/db.ts @@ -5,8 +5,9 @@ import { homedir } from 'node:os'; import { seedAreas as seedAreasFromModule, SEED_AREAS } from './areas.js'; import { seedProjectTypes } from './project-types.js'; import { seedBuiltinPrimitives, seedBuiltinRecipes } from './recipes/store.js'; +import { normalizeList, normalize } from './vocab.js'; -export const SCHEMA_VERSION = 16; +export const SCHEMA_VERSION = 17; /** * Legacy alias kept for any callers that still expect the constant name. 
@@ -291,6 +292,21 @@ CREATE TABLE IF NOT EXISTS project_type_recipe_steps ( UNIQUE(project_type_id, position) ); +-- v17 (spec 0.34): append-only interactions log backing derived recency +-- and frequency signals across every registry read surface. Rows are +-- immutable; recency and frequency derive from this table by query +-- (MAX(at), COUNT(*)) — never stored as mutable counters. Pruned by +-- reflection per the retention policy in #portfolio-memory. +CREATE TABLE IF NOT EXISTS interactions ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + project_id INTEGER REFERENCES projects(id) ON DELETE CASCADE, + surface TEXT NOT NULL, + query TEXT, + at TEXT NOT NULL DEFAULT (datetime('now')), + session_id TEXT, + agent_role TEXT +); + -- Indexes CREATE INDEX IF NOT EXISTS idx_projects_type ON projects(type); CREATE INDEX IF NOT EXISTS idx_projects_status ON projects(status); @@ -308,6 +324,10 @@ CREATE INDEX IF NOT EXISTS idx_project_ports_project_id ON project_ports(project CREATE INDEX IF NOT EXISTS idx_project_ports_port ON project_ports(port); CREATE INDEX IF NOT EXISTS idx_project_capabilities_project_id ON project_capabilities(project_id); CREATE INDEX IF NOT EXISTS idx_project_capabilities_type ON project_capabilities(capability_type); +-- v17 (spec 0.34): interactions indexes for recency-per-project queries +-- and global pruning passes. +CREATE INDEX IF NOT EXISTS idx_interactions_project_at ON interactions(project_id, at); +CREATE INDEX IF NOT EXISTS idx_interactions_at ON interactions(at); CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project_id, status); CREATE INDEX IF NOT EXISTS idx_memories_scope ON memories(scope, status); CREATE INDEX IF NOT EXISTS idx_memories_type ON memories(type, status); @@ -775,6 +795,23 @@ function upgradeSchema(db: Database.Database): void { // No data migration: every pre-existing row gets NULL for email_account. 
} + if (currentVersion >= 16 && currentVersion < 17) { + // v16 → v17 (spec 0.34): append-only `interactions` log backing the + // derived recency and frequency signals across every registry read + // surface. The table and its indexes are created by SCHEMA_SQL above + // (CREATE IF NOT EXISTS). The first writes happen when read surfaces + // (search_projects, recall, get_project, cross_query, query_capabilities) + // begin logging — the table starts empty on upgrade. + // + // Review finding #3: backfill open-vocabulary normalization for any + // pre-0.34 rows that still carry raw / aliased slugs. Without this, + // `countVocabInUse` would report both `cli-command` (legacy) and + // `command` (normalized) as in_use forever — a permanent orphan in + // the vocab tool's output. Rewriting at upgrade time clears the + // legacy form once and lets every subsequent write keep things clean. + backfillVocabNormalization(db); + } + if (currentVersion >= 13 && currentVersion < 14) { // v13 → v14: user-composable bootstrap primitives (spec 0.28). // @@ -811,6 +848,91 @@ function upgradeSchema(db: Database.Database): void { * via table-rebuild (SQLite cannot alter CHECK in place). * 6. Remap memories.scope value 'area_of_focus' → 'area' via table-rebuild. */ +/** + * v16 → v17 backfill: normalize pre-0.34 vocabulary slugs through the same + * helpers the write boundary now uses. Without this, `countVocabInUse` + * surfaces both legacy and normalized forms (e.g. `cli-command` AND + * `command`) as in_use forever, even though new writes only produce the + * canonical form. Review finding #3. + * + * What this rewrites: + * 1. `project_capabilities.capability_type` — single canonical slug per row. + * Rows whose canonical form already exists (e.g. a project carrying + * both 'command' AND 'cli-command' for different capability names) + * keep both rows; only the slug column is rewritten in place. + * 2. `projects.topics` — JSON array column. 
List-normalized so case + * variants and aliases collapse, duplicates drop. + * 3. `project_fields.field_value` for field_name in ('tech_stack', + * 'patterns'). JSON-array payloads are list-normalized; scalar + * payloads are single-value normalized. + * + * Idempotent: re-running skips the UPDATE on already-canonical rows. A + * thrown error rolls back the whole pass (one transaction), avoiding a + * half-rewritten state; unparseable JSON is skipped and left as-is. + */ +function backfillVocabNormalization(db: Database.Database): void { + const doBackfill = db.transaction(() => { + // (1) project_capabilities.capability_type + const capRows = db.prepare('SELECT id, capability_type FROM project_capabilities').all() as Array<{ id: number; capability_type: string }>; + const updateCap = db.prepare('UPDATE project_capabilities SET capability_type = ? WHERE id = ?'); + for (const row of capRows) { + const canon = normalize('capability_type', String(row.capability_type ?? '')); + if (canon && canon !== row.capability_type) { + updateCap.run(canon, row.id); + } + } + + // (2) projects.topics (JSON array column on the projects row) + const projectRows = db.prepare('SELECT id, topics FROM projects').all() as Array<{ id: number; topics: string | null }>; + const updateTopics = db.prepare("UPDATE projects SET topics = ?, updated_at = datetime('now') WHERE id = ?"); + for (const row of projectRows) { + if (!row.topics) continue; + let parsed: unknown; + try { + parsed = JSON.parse(row.topics); + } catch { + continue; // Malformed; leave as-is. 
+ } + if (!Array.isArray(parsed)) continue; + const stringified = parsed.map(v => String(v)); + const normalized = normalizeList('topics', stringified); + const next = JSON.stringify(normalized); + if (next !== row.topics) { + updateTopics.run(next, row.id); + } + } + + // (3) project_fields.field_value for tech_stack and patterns + const fieldRows = db.prepare( + "SELECT id, field_name, field_value FROM project_fields WHERE field_name IN ('tech_stack', 'patterns')", + ).all() as Array<{ id: number; field_name: 'tech_stack' | 'patterns'; field_value: string }>; + const updateField = db.prepare("UPDATE project_fields SET field_value = ?, updated_at = datetime('now') WHERE id = ?"); + for (const row of fieldRows) { + if (!row.field_value) continue; + let nextValue: string | null = null; + if (row.field_value.startsWith('[')) { + try { + const parsed = JSON.parse(row.field_value); + if (Array.isArray(parsed)) { + const normalized = normalizeList(row.field_name, parsed.map(v => String(v))); + nextValue = JSON.stringify(normalized); + } + } catch { + // Not JSON — fall through and treat as scalar. 
+ } + } + if (nextValue === null) { + const canon = normalize(row.field_name, row.field_value); + if (canon != null) nextValue = canon; + } + if (nextValue !== null && nextValue !== row.field_value) { + updateField.run(nextValue, row.id); + } + } + }); + doBackfill(); +} + function runV10ToV11Migration(db: Database.Database): void { // Step 1: areas table + seed (idempotent) db.exec(` diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts index 13125bb..244de17 100644 --- a/packages/core/src/index.ts +++ b/packages/core/src/index.ts @@ -184,3 +184,38 @@ export { } from './workspace-inspection.js'; export { introspectLibraryExports } from './introspect-exports.js'; export { computeNextSteps, type NextStep, type ProjectEnrichmentSnapshot } from './next-steps.js'; +export { + logInteraction, + lastTouchedAt, + touchCountSince, + pruneInteractions, + DEFAULT_RETENTION_POLICY, + type InteractionSurface, + type InteractionRow, + type LogInteractionInput, + type RetentionPolicy, +} from './interactions.js'; +export { + detectAmbiguity, + AMBIGUITY_GAP_THRESHOLD, + MAX_ALTERNATIVES, + type AmbiguityCandidate, + type AmbiguityVerdict, +} from './ambiguity.js'; +export { + VOCAB_FIELDS, + CANONICAL_VOCAB, + VOCAB_ALIASES, + normalize, + normalizeList, + normalizeFreeList, + normalizeFieldValue, + normalizeRecord, + isVocabField, + getCanonical, + getAliases, + assembleVocabResponse, + type VocabField, + type VocabInUseEntry, + type VocabResponse, +} from './vocab.js'; diff --git a/packages/core/src/interactions.ts b/packages/core/src/interactions.ts new file mode 100644 index 0000000..5b0ce8e --- /dev/null +++ b/packages/core/src/interactions.ts @@ -0,0 +1,204 @@ +// @fctry: #portfolio-memory +// +// Spec 0.34 — append-only `interactions` log backing derived recency and +// frequency signals across every registry read surface. Schema v17. 
+// +// Contract (#portfolio-memory 2.12): +// - Every read surface (search_projects, recall, get_project, cross_query, +// query_capabilities) writes one append-only row at the START of the call. +// - Failed calls are also logged — the failure is itself a signal. +// - `vocab` is the documented read-only exception (no row written). +// - Rows are immutable: no UPDATE, no DELETE except retention pruning. +// - Recency (MAX(at)) and frequency (COUNT(*)) derive by query, never +// stored as mutable counters. +// - Retention: each bound applies independently — a row is pruned when +// EITHER it is older than `interactions_retention_days` (default 90) OR +// the project's row count exceeds `interactions_max_rows_per_project` +// (default 10000), in which case the oldest excess is dropped. The two +// bounds compose restrictively (bounded growth is the safety property), +// not permissively. Reflection runs the prune. + +import type Database from 'better-sqlite3'; + +/** Names of the read surfaces that log to interactions. */ +export type InteractionSurface = + | 'search_projects' + | 'recall' + | 'get_project' + | 'cross_query' + | 'query_capabilities' + | 'portfolio_brief' + | 'list_projects' + | 'get_project_brief' + | 'get_project_digest' + | 'assess_health'; + +export interface LogInteractionInput { + surface: InteractionSurface | string; + /** Project id when known; NULL means the surface looked but found nothing. */ + projectId?: number | null; + /** The query text supplied by the caller, when applicable. */ + query?: string | null; + /** Calling session identifier, when available from MCP context. */ + sessionId?: string | null; + /** Calling agent role, when available from MCP context. 
*/ + agentRole?: string | null; +} + +export interface InteractionRow { + id: number; + project_id: number | null; + surface: string; + query: string | null; + at: string; + session_id: string | null; + agent_role: string | null; +} + +export interface RetentionPolicy { + /** Rows older than this many days are eligible for pruning (default 90). */ + retention_days: number; + /** Per-project ceiling above which oldest rows are pruned (default 10000). */ + max_rows_per_project: number; +} + +export const DEFAULT_RETENTION_POLICY: RetentionPolicy = { + retention_days: 90, + max_rows_per_project: 10000, +}; + +/** + * Append a single immutable row to the interactions log. Synchronous SQLite + * INSERT — microseconds, well below per-call latency floors. Best-effort: + * if the write fails for any reason we swallow the error so the caller's + * read surface still returns its result. The interactions log is + * observability, not the system of record. + */ +export function logInteraction( + db: Database.Database, + input: LogInteractionInput, +): void { + try { + db.prepare( + `INSERT INTO interactions (project_id, surface, query, session_id, agent_role) + VALUES (?, ?, ?, ?, ?)`, + ).run( + input.projectId ?? null, + input.surface, + input.query ?? null, + input.sessionId ?? null, + input.agentRole ?? null, + ); + } catch { + // Swallow — the read surface must not fail because the log did. + } +} + +/** + * Derived recency: most recent interaction timestamp for a project. + * Returns null when the project has no interactions yet. + */ +export function lastTouchedAt( + db: Database.Database, + projectId: number, +): string | null { + const row = db + .prepare<[number], { at: string | null }>( + `SELECT MAX(at) AS at FROM interactions WHERE project_id = ?`, + ) + .get(projectId); + return row?.at ?? null; +} + +/** + * Derived frequency: interactions count for a project within the trailing + * window (default 7 days). Used by portfolio_brief and Home view sorts. 
+ */ +export function touchCountSince( + db: Database.Database, + projectId: number, + windowDays = 7, +): number { + const row = db + .prepare<[number, string], { n: number }>( + `SELECT COUNT(*) AS n FROM interactions + WHERE project_id = ? AND at > datetime('now', ?)`, + ) + .get(projectId, `-${windowDays} days`); + return row?.n ?? 0; +} + +/** + * Prune the interactions log per the retention policy. Hard delete because + * the rows themselves are the source of truth for derived counters — + * soft-deleted rows would skew the COUNT(*). + * + * The two policy dimensions apply independently and compose restrictively + * (the implementation realizes "bounded growth", which is the actual safety + * property — review finding #6 aligned the docs with the implementation): + * 1. All rows older than `retention_days` days are dropped. + * 2. Per project: when remaining count > `max_rows_per_project`, the + * oldest excess is dropped so each project carries at most the cap. + * + * Whichever bound is more restrictive wins for any given row — a row is + * deleted if it falls outside EITHER the age floor OR the per-project cap. + * + * The whole prune runs in a single transaction (review finding A5): a mid- + * prune failure rolls back both phases together, so we never strand the + * table in a half-aged / half-capped state. + * + * Returns the total number of rows pruned. + */ +export function pruneInteractions( + db: Database.Database, + policy: RetentionPolicy = DEFAULT_RETENTION_POLICY, +): number { + let pruned = 0; + + const doPrune = db.transaction(() => { + // Age-based prune: drop rows older than the retention floor. + const ageDelta = db + .prepare(`DELETE FROM interactions WHERE at < datetime('now', ?)`) + .run(`-${policy.retention_days} days`); + pruned += Number(ageDelta.changes ?? 0); + + // Per-project cap: for any project with > cap rows remaining, delete the + // oldest excess. 
NULL project_id rows are pooled separately (the "no + // project matched" bucket from failed search_projects calls). + const projects = db + .prepare<[number], { project_id: number | null; n: number }>( + `SELECT project_id, COUNT(*) AS n FROM interactions + GROUP BY project_id HAVING n > ?`, + ) + .all(policy.max_rows_per_project); + + for (const { project_id, n } of projects) { + const excess = n - policy.max_rows_per_project; + if (excess <= 0) continue; + let capDelta; + if (project_id === null) { + capDelta = db + .prepare( + `DELETE FROM interactions WHERE id IN ( + SELECT id FROM interactions WHERE project_id IS NULL + ORDER BY at ASC LIMIT ? + )`, + ) + .run(excess); + } else { + capDelta = db + .prepare( + `DELETE FROM interactions WHERE id IN ( + SELECT id FROM interactions WHERE project_id = ? + ORDER BY at ASC LIMIT ? + )`, + ) + .run(project_id, excess); + } + pruned += Number(capDelta.changes ?? 0); + } + }); + doPrune(); + + return pruned; +} diff --git a/packages/core/src/introspect-exports.ts b/packages/core/src/introspect-exports.ts index 4361fea..061e569 100644 --- a/packages/core/src/introspect-exports.ts +++ b/packages/core/src/introspect-exports.ts @@ -428,6 +428,112 @@ const LIBRARY_EXPORTS_MANIFEST: LibraryExportManifestEntry[] = [ description: 'Compute the ordered enrichment recipe for a project from its field-presence snapshot. Powers the next_steps array returned by register_project, bootstrap_project, enrich_project, write_fields, and register_capabilities responses.', }, + // Spec 0.34: append-only interactions log (schema v17). + { + name: 'logInteraction', + kind: 'function', + description: 'Append one immutable row to the interactions log at the start of any registry read surface. Best-effort: failures are swallowed so the read still returns its result.', + }, + { + name: 'lastTouchedAt', + kind: 'function', + description: 'Derived recency: MAX(at) from interactions for one project. 
Powers portfolio_brief activity and the Home view "Recently active" sort.', + }, + { + name: 'touchCountSince', + kind: 'function', + description: 'Derived frequency: COUNT(*) of interactions for one project within a trailing window (default 7 days). Powers portfolio_brief frequency signals and recall reinforcement.', + }, + { + name: 'pruneInteractions', + kind: 'function', + description: 'Hard-delete interactions rows per the retention policy (age floor OR per-project cap). Called by the background reflection cycle.', + }, + { + name: 'DEFAULT_RETENTION_POLICY', + kind: 'constant', + description: 'Default interactions retention: 90 days and 10,000 rows per project, applied independently; whichever bound is more restrictive wins. Overridable via configure_memory.', + }, + + // Spec 0.34: open-vocabulary normalization (#capability-declarations 2.11). + { + name: 'VOCAB_FIELDS', + kind: 'constant', + description: 'The four fields that run through the open-vocabulary normalizer at the write boundary: tech_stack, patterns, topics, capability_type.', + }, + { + name: 'CANONICAL_VOCAB', + kind: 'constant', + description: 'Editorially-curated canonical slugs per normalized field. Surfaced by the vocab MCP tool as the canonical set.', + }, + { + name: 'VOCAB_ALIASES', + kind: 'constant', + description: 'Alias reverse-map per normalized field: canonical slug → array of variants the normalizer collapses on the write path.', + }, + { + name: 'normalize', + kind: 'function', + description: 'Normalize one open-vocabulary value for a field: lowercase + hyphenate + strip non-alphanumerics, then alias-lookup against VOCAB_ALIASES. Unknown terms pass through.', + }, + { + name: 'normalizeList', + kind: 'function', + description: 'Normalize a list of open-vocabulary values for a field, collapsing duplicates and preserving first-occurrence order.', + }, + { + name: 'normalizeFreeList', + kind: 'function', + description: 'Lowercase + trim + dedupe a free-form list. 
Used for goals/entities/concerns (non-canonical-vocab fields) so they share consistent casing semantics with topics. Preserves first-occurrence order.', + }, + { + name: 'normalizeFieldValue', + kind: 'function', + description: 'Normalize a free-form field value (string, array, or JSON-array string). Returns the canonical form ready for serializeFieldValue.', + }, + { + name: 'normalizeRecord', + kind: 'function', + description: 'Apply normalization to every recognized open-vocab field in a record. Non-vocab fields pass through. Used by register, updateFields, and registerCapabilities.', + }, + { + name: 'isVocabField', + kind: 'function', + description: 'Returns true if a field name is one of the four normalized open-vocab fields.', + }, + { + name: 'getCanonical', + kind: 'function', + description: 'Canonical vocabulary for a field (the editorially-curated set). Read-only.', + }, + { + name: 'getAliases', + kind: 'function', + description: 'Alias reverse-map for a field (canonical slug → alias variants). Read-only.', + }, + { + name: 'assembleVocabResponse', + kind: 'function', + description: 'Build the {field, canonical, in_use, aliases} envelope returned by the vocab MCP tool from a caller-supplied slug→count map. Pure function — no database access.', + }, + + // Spec 0.34: ambiguous-envelope detection (#cross-project 2.9). + { + name: 'detectAmbiguity', + kind: 'function', + description: 'Detect identity-query ambiguity in a score-sorted candidate list. Returns {ambiguous, alternatives} when the second-place score is within ~15% of the top.', + }, + { + name: 'AMBIGUITY_GAP_THRESHOLD', + kind: 'constant', + description: 'Spec literal: 15% relative-gap threshold for the ambiguous envelope (S208).', + }, + { + name: 'MAX_ALTERNATIVES', + kind: 'constant', + description: 'Spec literal: up to four alternatives surfaced in the ambiguous envelope (S208).', + }, + // Spec 0.28: user-composable bootstrap primitives. 
{ name: 'BUILTIN_PRIMITIVE_KEYS', diff --git a/packages/core/src/memory-reflection.ts b/packages/core/src/memory-reflection.ts index 8409c20..075c684 100644 --- a/packages/core/src/memory-reflection.ts +++ b/packages/core/src/memory-reflection.ts @@ -1,6 +1,8 @@ import { randomUUID } from 'node:crypto'; import type Database from 'better-sqlite3'; import { connect, getDbPath, initDb } from './db.js'; +import { pruneInteractions } from './interactions.js'; +import { MemoryStore } from './memory.js'; function newId(): string { return randomUUID().replace(/-/g, ''); @@ -19,6 +21,8 @@ export interface ReflectionResult { edges_created: number; memories_archived: number; summary_blocks_rewritten: number; + /** Spec 0.34, S206: rows pruned from the interactions log this cycle. */ + interactions_pruned: number; duration_ms: number; } @@ -59,6 +63,14 @@ export class MemoryReflection { // 3. Summary block rewriting summaryBlocksRewritten = this.rewriteSummaryBlocks(db, now); + // 3.5. Spec 0.34 (#portfolio-memory 2.12, S206): prune the + // append-only interactions log per the retention policy. Runs in the + // reflection cycle, never on the hot read/write path. + // Read the retention policy via MemoryStore so configure_memory + // overrides land here. + const policy = new MemoryStore(this._dbPath).getInteractionsRetentionPolicy(); + const interactionsPruned = pruneInteractions(db, policy); + // 4. 
Log the reflection — use a sentinel memory or skip if no memories exist const anyMemory = db.prepare("SELECT id FROM memories LIMIT 1").get() as { id: string } | undefined; if (anyMemory) { @@ -76,6 +88,7 @@ export class MemoryReflection { edges_created: edgesCreated, memories_archived: memoriesArchived, summary_blocks_rewritten: summaryBlocksRewritten, + interactions_pruned: interactionsPruned, duration_ms: duration, }; } finally { diff --git a/packages/core/src/memory-retrieval.ts b/packages/core/src/memory-retrieval.ts index 306a618..0c98846 100644 --- a/packages/core/src/memory-retrieval.ts +++ b/packages/core/src/memory-retrieval.ts @@ -1,6 +1,7 @@ import type Database from 'better-sqlite3'; import { randomUUID } from 'node:crypto'; import { connect, getDbPath, initDb } from './db.js'; +import { logInteraction } from './interactions.js'; function newId(): string { return randomUUID().replace(/-/g, ''); @@ -141,6 +142,21 @@ export class MemoryRetrieval { // Log the recall this.logRecall(db, opts.query ?? null, isBootstrap ? 'bootstrap' : 'search', budget, opts.project_id ?? null, results); + // Spec 0.34 (#portfolio-memory 2.12, S204): one append-only row per + // recall call, even when the project scope is unresolved or no + // memories matched. Resolve the project name (string scope arg) to + // the row id when possible. + let projectIdForLog: number | null = null; + if (opts.project_id) { + const row = db.prepare('SELECT id FROM projects WHERE name = ?').get(opts.project_id) as { id: number } | undefined; + projectIdForLog = row?.id ?? null; + } + logInteraction(db, { + surface: 'recall', + projectId: projectIdForLog, + query: opts.query ?? 
null, + }); + return results; } finally { db.close(); diff --git a/packages/core/src/memory.ts b/packages/core/src/memory.ts index 3512062..9e7338e 100644 --- a/packages/core/src/memory.ts +++ b/packages/core/src/memory.ts @@ -523,6 +523,12 @@ export class MemoryStore { embedding_provider?: string; reflect_schedule?: string; reflect_threshold?: number; + // Spec 0.34 (#portfolio-memory 2.12, S206): retention knobs for the + // append-only interactions log. Defaults live in + // DEFAULT_RETENTION_POLICY; setting either here persists an override + // the reflection cycle reads when it runs the prune. + interactions_retention_days?: number; + interactions_max_rows_per_project?: number; }): Record { const db = this.open(); try { @@ -539,10 +545,19 @@ export class MemoryStore { if (opts.reflect_threshold !== undefined) { upsert.run('reflect_threshold', String(opts.reflect_threshold)); } + if (opts.interactions_retention_days !== undefined) { + upsert.run('interactions_retention_days', String(opts.interactions_retention_days)); + } + if (opts.interactions_max_rows_per_project !== undefined) { + upsert.run('interactions_max_rows_per_project', String(opts.interactions_max_rows_per_project)); + } // Return current config const rows = db.prepare( - "SELECT key, value FROM schema_meta WHERE key IN ('embedding_provider', 'reflect_schedule', 'reflect_threshold')" + `SELECT key, value FROM schema_meta WHERE key IN ( + 'embedding_provider', 'reflect_schedule', 'reflect_threshold', + 'interactions_retention_days', 'interactions_max_rows_per_project' + )` ).all() as { key: string; value: string }[]; const config: Record = {}; @@ -552,4 +567,31 @@ export class MemoryStore { db.close(); } } + + /** + * Read the interactions retention policy from schema_meta, falling back + * to DEFAULT_RETENTION_POLICY for any unset key. Used by the reflection + * cycle to drive pruneInteractions(). + * + * Spec source: #portfolio-memory (2.12) "Retention policy", S206. 
+ */ + getInteractionsRetentionPolicy(): { retention_days: number; max_rows_per_project: number } { + const db = this.open(); + try { + const rows = db.prepare( + `SELECT key, value FROM schema_meta WHERE key IN ( + 'interactions_retention_days', 'interactions_max_rows_per_project' + )` + ).all() as { key: string; value: string }[]; + const map = new Map(rows.map(r => [r.key, r.value])); + const days = Number(map.get('interactions_retention_days')); + const max = Number(map.get('interactions_max_rows_per_project')); + return { + retention_days: Number.isFinite(days) && days > 0 ? days : 90, + max_rows_per_project: Number.isFinite(max) && max > 0 ? max : 10000, + }; + } finally { + db.close(); + } + } } diff --git a/packages/core/src/registry.ts b/packages/core/src/registry.ts index 2c63856..3ac4f3d 100644 --- a/packages/core/src/registry.ts +++ b/packages/core/src/registry.ts @@ -22,6 +22,9 @@ import { rowToProjectType, type ProjectType as UserProjectType, type ProjectTypeRow, } from './project-types.js'; import { writeFields, deserializeFieldValue } from './fields.js'; +import { normalizeRecord, normalizeList, normalize, normalizeFreeList, isVocabField } from './vocab.js'; +import { logInteraction } from './interactions.js'; +import { detectAmbiguity, type AmbiguityCandidate, type AmbiguityVerdict } from './ambiguity.js'; import { discoverPortsInPath, type DiscoveredPort } from './port-discovery.js'; import { computeNextSteps, type NextStep, type ProjectEnrichmentSnapshot } from './next-steps.js'; import { buildProjectBrief, type ProjectBrief, type ProjectBriefDigest } from './project-brief.js'; @@ -56,6 +59,14 @@ export interface RegisterExistingWorkspaceResult { // Normalize a project path before storage: expand ~ to $HOME and resolve to // absolute. Without this, literal "~/Code/foo" gets persisted and downstream // consumers (digest CLI, health checks) treat it as a real directory and miss. +/** + * Escape a string so it can be embedded as a regex literal. 
Used by the + * scoring helper for word-boundary name matches; cheap and self-contained. + */ +function escapeRegex(s: string): string { + return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); +} + export function normalizeProjectPath(p: string): string { if (p === '~') return homedir(); if (p.startsWith('~/')) return resolvePath(homedir(), p.slice(2)); @@ -160,7 +171,9 @@ export class Registry { } if (opts.fields) { - writeFields(db, projectId, opts.fields, producer); + // Spec 0.34 (#capability-declarations 2.11): normalize open-vocabulary + // fields once at the write boundary. Reads return whatever was stored. + writeFields(db, projectId, normalizeRecord(opts.fields), producer); } return projectId; @@ -171,10 +184,29 @@ export class Registry { // ── Querying ────────────────────────────────────────────────── - getProject(name: string, depth: QueryDepth = 'standard'): Record | null { + /** + * Public read surface for `get_project`. Logs an `interactions` row per + * S204 — including failed lookups, which carry project_id=NULL. + * + * Internal callers that need the formatted record without polluting the + * interactions signal must use `loadProjectRecord()` instead. Only the + * MCP `get_project` dispatcher (and equivalent UI surfaces a user touches + * directly) should call this. + */ + getProject(name: string, depth: QueryDepth = 'standard', opts?: { logInteraction?: boolean }): Record | null { + const shouldLog = opts?.logInteraction !== false; const db = this.open(); try { const record = this.loadRecord(db, { name }); + if (shouldLog) { + // Spec 0.34 (#portfolio-memory 2.12, S204): one row per call, even on + // failure. Failed lookups carry project_id=NULL — the failure is a signal. + logInteraction(db, { + surface: 'get_project', + projectId: record?.id ?? 
null, + query: null, + }); + } if (!record) return null; return this.formatRecord(db, record, depth); } finally { @@ -184,11 +216,23 @@ export class Registry { /** * Get a project by name, throwing NotFoundError with fuzzy suggestion if not found. + * + * Logs an `interactions` row per S204 by default. Pass `{logInteraction: false}` + * for internal callers that should not contribute to the read-surface signal. */ - getProjectOrThrow(name: string, depth: QueryDepth = 'standard'): Record { + getProjectOrThrow(name: string, depth: QueryDepth = 'standard', opts?: { logInteraction?: boolean }): Record { + const shouldLog = opts?.logInteraction !== false; const db = this.open(); try { const record = this.loadRecord(db, { name }); + if (shouldLog) { + // Spec 0.34: log the call regardless of outcome (S204). + logInteraction(db, { + surface: 'get_project', + projectId: record?.id ?? null, + query: null, + }); + } if (!record) { const allNames = db.prepare('SELECT name FROM projects').all() as { name: string }[]; const closest = findClosestMatch(name, allNames.map(r => r.name)); @@ -296,7 +340,8 @@ export class Registry { } return { - project: this.getProjectOrThrow(projectName, 'standard'), + // Internal caller — do not pollute the interactions signal. + project: this.getProjectOrThrow(projectName, 'standard', { logInteraction: false }), inspection, }; } @@ -377,7 +422,15 @@ export class Registry { type_filter?: string; status_filter?: string; area_filter?: string; + /** + * Internal callers (e.g. the recall handler probing for project-scope + * ambiguity) pass `false` so this query does NOT contribute a + * `search_projects` row to interactions. Defaults to `true` to preserve + * the S204 contract for direct MCP `search_projects` calls. 
+ */ + logInteraction?: boolean; + }): Record[] { + const shouldLog = opts.logInteraction !== false; const db = this.open(); try { const q = `%${opts.query}%`; @@ -412,6 +465,20 @@ export class Registry { sql += ' ORDER BY p.name'; const rows = db.prepare(sql).all(...params) as Record[]; + + if (shouldLog) { + // Spec 0.34 (#portfolio-memory 2.12, S204): log one row per call. For + // multi-result searches we pin the row to the first row returned (name + // order, not relevance; NULL when nothing matched). The surface name and + // query string are the signal for any consumer wanting per-query stats. + const topId = rows.length > 0 ? (rows[0].id as number) : null; + logInteraction(db, { + surface: 'search_projects', + projectId: topId, + query: opts.query, + }); + } + return rows.map(row => { const record = this.rowToRecord(db, row); return this.formatRecord(db, record, 'summary'); @@ -421,6 +488,126 @@ export class Registry { } } + /** + * Spec 0.34 (#cross-project 2.9, S208): search with the ambiguous + * envelope attached. The `result` array is unchanged from + * `searchProjects()`. The envelope adds `ambiguous: boolean` and + * `alternatives: [{name, score, why}]` when the second-place candidate + * is within ~15% of the top. + * + * Score function: a tiered name-match score (exact, prefix, word + * boundary, substring) plus a small bump for description hits; see + * `scoreSearchCandidates`. This is intentionally blunt: the registry's + * ambiguity contract is "raise the flag, let the LLM decide" (S209). A + * richer ranker is future work; the relative-gap detection matters today. + * + * Additive: callers reading only `result` see identical output. + */ + searchProjectsAmbiguous(opts: { + query: string; + type_filter?: string; + status_filter?: string; + area_filter?: string; + /** + * Forwarded to `searchProjects`. 
Internal callers (notably the recall + * handler in MCP server.ts, which probes for project-scope ambiguity) + * pass `false` to keep a phantom `search_projects` row out of the + * interactions table. Defaults to `true`. + */ + logInteraction?: boolean; + }): { result: Record[]; ambiguous: boolean; alternatives: { name: string; score: number; why: string }[] } { + const rows = this.searchProjects(opts); + + // Review finding #10: single- and two-character queries are too noisy to + // generate useful ambiguity verdicts — every project whose name contains + // the letter scores at the same substring tier, so detectAmbiguity would + // flag the whole portfolio as alternatives. Skip ambiguity detection + // entirely below the threshold; callers see the raw result list. + const trimmed = (opts.query ?? '').trim(); + if (trimmed.length < 3) { + return { result: rows, ambiguous: false, alternatives: [] }; + } + + const candidates = this.scoreSearchCandidates(rows, opts.query); + const verdict = detectAmbiguity(candidates); + return { result: rows, ambiguous: verdict.ambiguous, alternatives: verdict.alternatives }; + } + + /** + * Score every search-result row against the query and return the + * candidates sorted DESC by score. Helper for ambiguity detection; + * not exposed on the public read surfaces. + * + * Review finding #10: scores are differentiated by match quality so the + * top of a short-query result list isn't a tied-at-5 blob that the + * ambiguity detector flags as ambiguous on every common-letter search. + * + * Tier table (highest match wins per row; no double-counting on the + * name-shaped tiers): + * exact name match → 100 + * name starts with query → 50 + * name word-boundary match → 15 + * name substring match → 5 + * description match (any) → +2 additive + * + * With differentiated tiers, "all matches score 5" only happens when + * every match is a substring-only hit — which is the correct signal for + * the ambiguity detector to fire. 
Common short queries with one strong + * name match now produce a clear top vs. the noise behind it. + */ + private scoreSearchCandidates( + rows: Record[], + query: string, + ): AmbiguityCandidate[] { + const q = query.toLowerCase().trim(); + if (!q) return rows.map(r => ({ name: String(r.name), score: 0, why: '' })); + const candidates: AmbiguityCandidate[] = rows.map(row => { + const name = String(row.name); + const nameLower = name.toLowerCase(); + let score = 0; + let why = 'fuzzy match'; + + // Name match tier (highest wins, no double-count). + if (nameLower === q) { + score = 100; + why = 'exact name match'; + } else if (nameLower.startsWith(q)) { + score = 50; + why = `name starts with "${q}"`; + } else if (new RegExp(`(^|[-_\\s])${escapeRegex(q)}([-_\\s]|$)`).test(nameLower)) { + // Word-boundary match (hyphen/underscore/whitespace-delimited segment). + score = 15; + why = `name word match "${q}"`; + } else if (nameLower.includes(q)) { + score = 5; + why = `name contains "${q}"`; + } + + // Description and field hits are additive small bumps — they help + // break ties between substring-only matches without overwhelming a + // strong name signal. + const desc = String(row.description ?? '').toLowerCase(); + if (desc.includes(q)) { + score += 2; + if (why === 'fuzzy match') why = `description fragment "${q}"`; + } + + // Final fallback: any indirect mention bumps an otherwise-zero + // candidate above the unranked floor so detectAmbiguity sees it. 
+ if (score === 0) { + const blob = JSON.stringify(row).toLowerCase(); + if (blob.includes(q)) { + score = 1; + why = 'indirect match'; + } + } + + return { name, score, why }; + }); + candidates.sort((a, b) => b.score - a.score); + return candidates; + } + getRegistryStats(): { total: number; by_type: Record; @@ -1019,18 +1206,28 @@ export class Registry { const existingEntities: string[] = JSON.parse(row.entities || '[]'); const existingConcerns: string[] = JSON.parse(row.concerns || '[]'); - // Merge with union semantics + // Merge with union semantics. Spec 0.34: `topics` runs through the + // open-vocab normalizer (#capability-declarations 2.11) so case + // variants and known aliases collapse to canonical slugs. + // + // Review finding #9: goals / entities / concerns are NOT part of the + // four canonical-vocab fields, but they share a list-of-free-strings + // shape. Route them through `normalizeFreeList` so they share + // consistent lowercase-trim-dedupe semantics with each other. This + // closes the asymmetry where `enrichProject({entities: ['React.js']})` + // stored `react.js` while `topics: ['React.js']` stored `react`. After + // this fix, the same input collapses the same way per-field. const mergedGoals = profile.goals - ? [...new Set([...existingGoals, ...profile.goals])] + ? normalizeFreeList([...existingGoals, ...profile.goals]) : existingGoals; const mergedTopics = profile.topics - ? [...new Set([...existingTopics, ...profile.topics.map(t => t.toLowerCase())])] + ? normalizeList('topics', [...existingTopics, ...profile.topics]) : existingTopics; const mergedEntities = profile.entities - ? [...new Set([...existingEntities, ...profile.entities.map(e => e.toLowerCase())])] + ? normalizeFreeList([...existingEntities, ...profile.entities]) : existingEntities; const mergedConcerns = profile.concerns - ? [...new Set([...existingConcerns, ...profile.concerns.map(c => c.toLowerCase())])] + ? 
normalizeFreeList([...existingConcerns, ...profile.concerns]) : existingConcerns; // Write back @@ -1085,7 +1282,8 @@ export class Registry { throw new NotFoundError(name, findClosestMatch(name, allNames.map(r => r.name))); } - writeFields(db, row.id, fields, producer); + // Spec 0.34: normalize open-vocabulary fields at the write boundary. + writeFields(db, row.id, normalizeRecord(fields), producer); if (paths) { const insertPath = db.prepare( @@ -1328,10 +1526,13 @@ export class Registry { `); for (const cap of capabilities) { + // Spec 0.34: normalize capability_type at the write boundary. + // Unknown types pass through (open-vocabulary). + const normalizedType = normalize('capability_type', String(cap.capability_type)) ?? cap.capability_type; insert.run( project.id, cap.name, - cap.capability_type, + normalizedType, cap.description ?? '', cap.inputs ?? '', cap.outputs ?? '', @@ -1375,10 +1576,14 @@ export class Registry { if (!capabilityType) { throw new Error('registerCapabilitiesForType: capabilityType is required'); } + // Spec 0.34: normalize at the write boundary so the mixed-types guard + // compares canonical slugs, not raw casing. + const canonicalCapabilityType = normalize('capability_type', capabilityType) ?? capabilityType; for (const cap of capabilities) { - if (cap.capability_type !== capabilityType) { + const incomingCanonical = normalize('capability_type', String(cap.capability_type)) ?? cap.capability_type; + if (incomingCanonical !== canonicalCapabilityType) { throw new Error( - `registerCapabilitiesForType: mixed types not allowed — expected '${capabilityType}' but got '${cap.capability_type}' on capability '${cap.name}'`, + `registerCapabilitiesForType: mixed types not allowed — expected '${canonicalCapabilityType}' but got '${incomingCanonical}' on capability '${cap.name}'`, ); } } @@ -1391,7 +1596,7 @@ export class Registry { const doReplace = db.transaction(() => { db.prepare( 'DELETE FROM project_capabilities WHERE project_id = ? 
AND capability_type = ?', - ).run(project.id, capabilityType); + ).run(project.id, canonicalCapabilityType); const insert = db.prepare(` INSERT INTO project_capabilities @@ -1403,7 +1608,7 @@ export class Registry { insert.run( project.id, cap.name, - cap.capability_type, + canonicalCapabilityType, cap.description ?? '', cap.inputs ?? '', cap.outputs ?? '', @@ -1436,13 +1641,21 @@ export class Registry { const conditions: string[] = []; const params: unknown[] = []; + // Spec 0.34 (S204): attribute the interactions row to a project when the + // scope resolves cleanly. A query scoped to a project that does not + // exist still logs (with project_id NULL) — the failure is the signal. + let resolvedProjectId: number | null = null; if (opts?.project_name) { conditions.push('p.name = ?'); params.push(opts.project_name); + const row = db.prepare('SELECT id FROM projects WHERE name = ?').get(opts.project_name) as { id: number } | undefined; + if (row) resolvedProjectId = row.id; } if (opts?.capability_type) { + // Spec 0.34: query the canonical slug — agents may pass aliases. + const canonicalType = normalize('capability_type', opts.capability_type) ?? opts.capability_type; conditions.push('pc.capability_type = ?'); - params.push(opts.capability_type); + params.push(canonicalType); } if (opts?.keyword) { const kw = `%${opts.keyword}%`; @@ -1455,6 +1668,16 @@ export class Registry { sql += ' ORDER BY p.name, pc.name'; const rows = db.prepare(sql).all(...params) as Record[]; + + // Spec 0.34 (#portfolio-memory 2.12, S204): one row per call. Attribute + // to the resolved project when project_name was supplied and matched; + // otherwise NULL. + logInteraction(db, { + surface: 'query_capabilities', + projectId: resolvedProjectId, + query: opts?.keyword ?? opts?.capability_type ?? 
null, + }); + return rows.map(row => { const result: Record = { project: row.project_name, @@ -1474,6 +1697,84 @@ export class Registry { } } + // ── Vocabulary (spec 0.34, S200) ────────────────────────────── + + /** + * Build the in-use count map for one of the four open-vocabulary fields. + * Pure SQL — no LLM, no embedding, no side effects. Reads from: + * + * - `tech_stack`, `patterns`: `project_fields` (JSON arrays per project) + * - `topics`: `projects.topics` (JSON array on the projects row itself) + * - `capability_type`: `project_capabilities.capability_type` (one row per cap) + * + * Returns a map slug → count. Storage-format quirks (legacy comma-strings, + * empty arrays, NULLs) are tolerated quietly. + * + * Spec source: `#capability-declarations` (2.11) "The `vocab` MCP tool" + */ + countVocabInUse(field: 'tech_stack' | 'patterns' | 'topics' | 'capability_type'): Map { + const counts = new Map(); + const db = this.open(); + try { + if (field === 'capability_type') { + const rows = db.prepare( + 'SELECT capability_type AS slug, COUNT(*) AS n FROM project_capabilities GROUP BY capability_type', + ).all() as { slug: string | null; n: number }[]; + for (const r of rows) { + if (r.slug) counts.set(r.slug, r.n); + } + return counts; + } + + if (field === 'topics') { + const rows = db.prepare('SELECT topics FROM projects').all() as { topics: string | null }[]; + for (const r of rows) { + if (!r.topics) continue; + let parsed: unknown; + try { + parsed = JSON.parse(r.topics); + } catch { + continue; + } + if (!Array.isArray(parsed)) continue; + for (const v of parsed) { + if (typeof v === 'string' && v) counts.set(v, (counts.get(v) ?? 0) + 1); + } + } + return counts; + } + + // tech_stack / patterns — extended fields. 
+ const rows = db.prepare( + 'SELECT field_value FROM project_fields WHERE field_name = ?', + ).all(field) as { field_value: string | null }[]; + for (const r of rows) { + if (!r.field_value) continue; + let parsed: unknown; + if (r.field_value.startsWith('[')) { + try { + parsed = JSON.parse(r.field_value); + } catch { + parsed = null; + } + } else { + // Legacy: comma-separated string. + parsed = r.field_value.split(',').map(s => s.trim()).filter(Boolean); + } + if (Array.isArray(parsed)) { + for (const v of parsed) { + if (typeof v === 'string' && v) counts.set(v, (counts.get(v) ?? 0) + 1); + } + } else if (typeof parsed === 'string' && parsed) { + counts.set(parsed, (counts.get(parsed) ?? 0) + 1); + } + } + return counts; + } finally { + db.close(); + } + } + // ── Project Digests ────────────────────────────────────────── /** diff --git a/packages/core/src/vocab.ts b/packages/core/src/vocab.ts new file mode 100644 index 0000000..3d35cf6 --- /dev/null +++ b/packages/core/src/vocab.ts @@ -0,0 +1,389 @@ +// @fctry: #capability-declarations +// +// Spec 0.34 — open-vocabulary normalization for the four normalized fields: +// `tech_stack`, `patterns`, `topics`, `capability_type`. Write-path only — +// reads return whatever is stored. Pre-0.34 rows are not migrated; they +// remain in their original form until the next write rewrites them. +// +// Contract (#capability-declarations 2.11): +// - Normalize: lowercase → trim → strip non-[a-z0-9-] → collapse repeated +// hyphens → strip leading/trailing hyphens. +// - Alias lookup against the curated reverse-map (e.g. `ts` → `typescript`). +// - Unknown terms PASS THROUGH normalized but are never rejected. +// - Duplicates collapse after normalization (`["TypeScript", "typescript"]` +// stores `["typescript"]`). +// - Stable order: first-occurrence wins. + +/** The four fields that run through the open-vocabulary normalizer. 
*/ +export type VocabField = 'tech_stack' | 'patterns' | 'topics' | 'capability_type'; + +export const VOCAB_FIELDS: readonly VocabField[] = [ + 'tech_stack', + 'patterns', + 'topics', + 'capability_type', +] as const; + +/** + * Canonical curated vocabulary per field. These are the editorially-blessed + * slugs the `vocab` tool surfaces as the canonical set. Agents are NOT + * required to use these — anything unknown passes through normalized — but + * these are the slugs we promote in tooling and docs. + * + * Keep these lists short and intentional. Adding a slug here is a + * vocabulary-design decision, not a code change. + */ +export const CANONICAL_VOCAB: Record = { + tech_stack: [ + 'typescript', + 'javascript', + 'python', + 'rust', + 'go', + 'swift', + 'kotlin', + 'node', + 'electron', + 'react', + 'tailwind', + 'sqlite', + 'postgres', + 'redis', + 'docker', + 'fastapi', + 'express', + 'next', + 'vite', + ], + patterns: [ + 'mcp-server', + 'cli', + 'desktop-app', + 'web-app', + 'library', + 'monorepo', + 'event-driven', + 'repository-pattern', + 'factory-pattern', + 'observer-pattern', + 'singleton', + 'adapter', + 'plugin', + 'rest-api', + 'graphql', + 'webhook', + 'cron', + 'queue', + ], + topics: [ + 'agents', + 'memory', + 'registry', + 'portfolio', + 'spec-driven', + 'claude-code', + 'mcp', + 'electron', + 'tailwind', + 'typescript', + 'react', + 'vector-search', + 'fts5', + 'sqlite', + 'bootstrap', + 'health', + ], + capability_type: [ + 'tool', + 'command', + 'library', + 'export', + 'endpoint', + 'database', + 'event', + 'webhook', + 'resource', + ], +} as const; + +/** + * Alias reverse-map: each entry maps the canonical slug to the variants + * agents commonly write. The variants are matched after the canonical + * normalization (lowercase + hyphenate + strip), so we list both raw + * spellings ("TypeScript") and pre-normalized variants ("ts") so the + * normalizer can resolve them in one pass. 
+ * + * If an alias appears under two canonical slugs, the FIRST canonical wins + * (declaration order of the canonical slugs in this table). This is an invariant the + * `vocab` tool can verify at load time. + */ +export const VOCAB_ALIASES: Record<VocabField, Record<string, readonly string[]>> = { + tech_stack: { + typescript: ['TypeScript', 'ts', 'TS'], + javascript: ['JavaScript', 'js', 'JS', 'ecmascript'], + python: ['Python', 'py', 'Py'], + node: ['Node.js', 'nodejs', 'node-js'], + react: ['React.js', 'reactjs', 'react-js'], + next: ['Next.js', 'nextjs', 'next-js'], + postgres: ['postgresql', 'pg'], + sqlite: ['SQLite', 'sqlite3'], + tailwind: ['tailwindcss', 'tailwind-css'], + fastapi: ['FastAPI', 'fast-api'], + }, + patterns: { + 'mcp-server': ['MCP Server', 'mcp_server'], + cli: ['CLI', 'command-line'], + 'desktop-app': ['Desktop App', 'desktop_app'], + 'web-app': ['Web App', 'webapp', 'web_app'], + 'repository-pattern': ['Repository Pattern', 'repository', 'repo-pattern'], + 'factory-pattern': ['Factory Pattern', 'factory'], + 'observer-pattern': ['Observer Pattern', 'observer'], + 'event-driven': ['Event Driven', 'event_driven', 'eventdriven'], + 'rest-api': ['REST API', 'rest', 'restapi'], + graphql: ['GraphQL', 'gql'], + }, + topics: { + 'claude-code': ['Claude Code', 'claudecode', 'claude_code'], + mcp: ['MCP', 'Model Context Protocol', 'model-context-protocol'], + typescript: ['TypeScript', 'ts'], + react: ['React.js', 'reactjs'], + 'vector-search': ['Vector Search', 'vectors', 'embeddings'], + fts5: ['FTS5', 'full-text-search'], + 'spec-driven': ['Spec Driven', 'spec_driven'], + }, + capability_type: { + tool: ['MCP tool', 'mcp-tool', 'mcp_tool'], + command: ['CLI command', 'cli-command', 'cli_command', 'slash-command'], + export: ['library export', 'lib-export', 'symbol'], + endpoint: ['API endpoint', 'route', 'http-endpoint'], + database: ['db', 'data-store'], + event: ['lifecycle-event'], + webhook: ['hook'], + resource: ['MCP resource'], + }, +} as const; + +/** Internal: build the lowercase-keyed
reverse-map for fast lookup. */ +function buildReverseMap(field: VocabField): Map<string, string> { + const map = new Map<string, string>(); + for (const canonical of Object.keys(VOCAB_ALIASES[field])) { + // Canonical itself always maps to itself. + map.set(canonical, canonical); + for (const alias of VOCAB_ALIASES[field][canonical]) { + // Alias is matched AFTER normalization, so we pre-normalize it here. + const normAlias = normalizeRaw(alias); + if (normAlias && !map.has(normAlias)) { + map.set(normAlias, canonical); + } + } + } + // Canonical-only entries (no aliases listed) still match themselves. + for (const canonical of CANONICAL_VOCAB[field]) { + if (!map.has(canonical)) map.set(canonical, canonical); + } + return map; +} + +const REVERSE_MAPS: Record<VocabField, Map<string, string>> = { + tech_stack: buildReverseMap('tech_stack'), + patterns: buildReverseMap('patterns'), + topics: buildReverseMap('topics'), + capability_type: buildReverseMap('capability_type'), +}; + +/** + * Raw slug normalization — the structural step before alias lookup. + * `"TypeScript 5.x!"` → `"typescript-5-x"`. Returns empty string for + * inputs that normalize to nothing. + */ +function normalizeRaw(value: string): string { + return value + .toLowerCase() + .trim() + .replace(/[^a-z0-9-]+/g, '-') + .replace(/-+/g, '-') + .replace(/^-|-$/g, ''); +} + +/** + * Normalize a single value for a given field: + * 1. Apply structural normalization (lowercase + hyphenate + strip). + * 2. Look up the alias reverse-map; if present, return the canonical slug. + * 3. Otherwise return the structurally-normalized form (open-vocabulary — + * unknown terms pass through, never rejected). + * + * Returns `null` for inputs that normalize to empty (whitespace-only, + * punctuation-only). + */ +export function normalize(field: VocabField, value: string): string | null { + if (typeof value !== 'string') return null; + const raw = normalizeRaw(value); + if (!raw) return null; + return REVERSE_MAPS[field].get(raw) ??
raw; +} + +/** + * Normalize a list of values for a given field, collapsing duplicates + * and preserving first-occurrence order. Stable, deterministic — useful + * for tests and for the `vocab` tool's `in_use` rebuild. + */ +export function normalizeList(field: VocabField, values: readonly string[]): string[] { + const seen = new Set(); + const out: string[] = []; + for (const v of values) { + const norm = normalize(field, v); + if (norm && !seen.has(norm)) { + seen.add(norm); + out.push(norm); + } + } + return out; +} + +/** + * Normalize a free-form field value. Accepts string (single canonical), + * array (list), JSON-array string (legacy storage), or any other shape + * (returned unchanged). This is the entry point write paths call before + * persisting. Returns the canonical form ready for `serializeFieldValue`. + */ +export function normalizeFieldValue(field: VocabField, value: unknown): unknown { + if (value == null) return value; + if (Array.isArray(value)) { + return normalizeList(field, value.map(String)); + } + if (typeof value === 'string') { + // Distinguish JSON-array-string ("[a,b]") from a single scalar. + if (value.startsWith('[')) { + try { + const parsed = JSON.parse(value); + if (Array.isArray(parsed)) return normalizeList(field, parsed.map(String)); + } catch { + // Not JSON — fall through and normalize as a scalar. + } + } + return normalize(field, value) ?? value; + } + // Unknown shape — pass through. The serializer will JSON-stringify it. + return value; +} + +/** Returns true if `field` is one of the four normalized open-vocab fields. */ +export function isVocabField(field: string): field is VocabField { + return (VOCAB_FIELDS as readonly string[]).includes(field); +} + +/** + * Lowercase + trim + dedupe a free-form list. 
Unlike {@link normalizeList}, + * this does NOT apply structural slug-normalization or alias resolution — + * it is the right helper for fields that are not part of the four + * canonical-vocab fields (goals, entities, concerns) but should still share + * consistent casing semantics so the same input string canonicalizes the + * same way regardless of which field it's written to (review finding #9). + * + * Preserves first-occurrence order. Drops values that lowercase-trim to + * the empty string. + */ +export function normalizeFreeList(values: readonly string[]): string[] { + const seen = new Set(); + const out: string[] = []; + for (const v of values) { + if (typeof v !== 'string') continue; + const lower = v.toLowerCase().trim(); + if (!lower || seen.has(lower)) continue; + seen.add(lower); + out.push(lower); + } + return out; +} + +/** + * Apply normalization to every recognized open-vocab field in a record. + * Fields not in {@link VOCAB_FIELDS} are passed through untouched. + * + * Used by `register_project`, `enrich_project`, `write_fields` to normalize + * at the write boundary in a single pass. + */ +export function normalizeRecord>(record: T): T { + const out = { ...record } as Record; + for (const field of VOCAB_FIELDS) { + if (field in out) { + out[field] = normalizeFieldValue(field, out[field]); + } + } + return out as T; +} + +/** Canonical vocabulary for a field (the editorially-curated set). */ +export function getCanonical(field: VocabField): readonly string[] { + return CANONICAL_VOCAB[field]; +} + +/** + * One row of the `in_use` envelope returned by the `vocab` MCP tool: a + * stored slug and the count of rows that carry it. 
+ */ +export interface VocabInUseEntry { + slug: string; + count: number; +} + +/** + * The full response shape returned by the `vocab` MCP tool: + * { field, canonical, in_use, aliases } + * + * - `canonical` lists the editorially-curated slugs for this field + * - `in_use` enumerates the slugs currently present in the registry with + * their occurrence counts (including non-canonical pass-through values) + * - `aliases` is the canonical-slug → variant-list reverse-map so agents + * can see what already normalizes to each canonical + * + * The `in_use` counts come from caller-supplied tallies — this module does + * not bind to a database. The MCP server builds the tallies via a SQL query + * over `project_fields` (for tech_stack, patterns), `projects.topics` (for + * topics), or `project_capabilities` (for capability_type) and passes them in. + */ +export interface VocabResponse { + field: VocabField; + canonical: string[]; + in_use: VocabInUseEntry[]; + aliases: Record<string, string[]>; +} + +/** + * Assemble the `vocab` response for one field from a count map of stored + * slugs. Pure function — does not touch the database. The MCP server is + * responsible for the SQL count query; this helper formats the envelope. + * + * The `canonical` array is sorted alphabetically; `in_use` is sorted by + * (count DESC, slug ASC) so the most-common values surface first; the + * `aliases` map keys are sorted alphabetically. Stability matters for + * agent prompts that diff this response across calls.
+ */ +export function assembleVocabResponse( + field: VocabField, + counts: ReadonlyMap, +): VocabResponse { + const inUse: VocabInUseEntry[] = []; + for (const [slug, count] of counts) { + if (slug && count > 0) inUse.push({ slug, count }); + } + inUse.sort((a, b) => (b.count - a.count) || a.slug.localeCompare(b.slug)); + + const aliases: Record = {}; + const sortedCanonical = Object.keys(VOCAB_ALIASES[field]).sort(); + for (const canonical of sortedCanonical) { + aliases[canonical] = [...VOCAB_ALIASES[field][canonical]]; + } + + return { + field, + canonical: [...CANONICAL_VOCAB[field]].sort(), + in_use: inUse, + aliases, + }; +} + +/** Alias reverse-map for a field (canonical slug → alias variants). */ +export function getAliases(field: VocabField): Record { + return VOCAB_ALIASES[field]; +} diff --git a/packages/core/tests/ambiguity.test.ts b/packages/core/tests/ambiguity.test.ts new file mode 100644 index 0000000..77e11d7 --- /dev/null +++ b/packages/core/tests/ambiguity.test.ts @@ -0,0 +1,178 @@ +// @fctry: #cross-project +// +// Tests for ambiguous-envelope detection on identity-shaped queries +// (S208, S209) — spec 0.34. 
+ +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { mkdtempSync, rmSync } from 'node:fs'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; + +import { + detectAmbiguity, + AMBIGUITY_GAP_THRESHOLD, + MAX_ALTERNATIVES, + Registry, + initDb, +} from '../src/index.js'; + +describe('detectAmbiguity (pure function)', () => { + it('is unambiguous for 0 or 1 candidates', () => { + expect(detectAmbiguity([]).ambiguous).toBe(false); + expect(detectAmbiguity([{ name: 'a', score: 10 }]).ambiguous).toBe(false); + }); + + it('is unambiguous when top >> second by more than 15%', () => { + const v = detectAmbiguity([ + { name: 'top', score: 10 }, + { name: 'next', score: 1 }, + ]); + expect(v.ambiguous).toBe(false); + expect(v.alternatives).toEqual([]); + }); + + it('is ambiguous when top and second are within 15%', () => { + const v = detectAmbiguity([ + { name: 'top', score: 10 }, + { name: 'second', score: 9, why: 'fuzzy match' }, + ]); + expect(v.ambiguous).toBe(true); + expect(v.alternatives).toHaveLength(1); + expect(v.alternatives[0].name).toBe('second'); + expect(v.alternatives[0].why).toBe('fuzzy match'); + }); + + it('caps alternatives at MAX_ALTERNATIVES', () => { + const candidates = [ + { name: 'top', score: 10 }, + { name: 'a', score: 9 }, + { name: 'b', score: 9 }, + { name: 'c', score: 9 }, + { name: 'd', score: 9 }, + { name: 'e', score: 9 }, + { name: 'f', score: 9 }, + ]; + const v = detectAmbiguity(candidates); + expect(v.ambiguous).toBe(true); + expect(v.alternatives).toHaveLength(MAX_ALTERNATIVES); + }); + + it('uses relative gap, not absolute floor (high-score tie is ambiguous)', () => { + const v = detectAmbiguity([ + { name: 'top', score: 100 }, + { name: 'next', score: 95 }, + ]); + expect(v.ambiguous).toBe(true); + }); + + it('uses relative gap, not absolute floor (low-score wide gap is unambiguous)', () => { + const v = detectAmbiguity([ + { name: 'top', score: 2 }, + { name: 'next', score: 0.5 }, + 
]); + expect(v.ambiguous).toBe(false); + }); + + it('exposes the spec literal threshold constant', () => { + expect(AMBIGUITY_GAP_THRESHOLD).toBe(0.15); + }); +}); + +describe('Registry.searchProjectsAmbiguous (S208 integration)', () => { + let tmpDir: string; + let registry: Registry; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-amb-')); + const dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + registry = new Registry(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('returns envelope additively — result is always the bare array', () => { + registry.register({ name: 'setlist', description: 'the registry', status: 'active' }); + const env = registry.searchProjectsAmbiguous({ query: 'setlist' }); + expect(Array.isArray(env.result)).toBe(true); + expect(env.result[0]?.name).toBe('setlist'); + expect(env.ambiguous).toBe(false); + expect(env.alternatives).toEqual([]); + }); + + it('exact-name match dominates description-fragment matches (unambiguous)', () => { + registry.register({ name: 'registry', description: 'the registry service', status: 'active' }); + registry.register({ name: 'setlist', description: 'the project registry', status: 'active' }); + + const env = registry.searchProjectsAmbiguous({ query: 'registry' }); + // 'registry' exact-name = 10; 'setlist' description = 2 → unambiguous. + expect(env.ambiguous).toBe(false); + expect(env.result[0]?.name).toBe('registry'); + }); + + it('two name-fragment matches with no exact hit — ambiguous, alternative surfaced', () => { + registry.register({ name: 'knowmarks-web', description: 'web client', status: 'active' }); + registry.register({ name: 'knowmarks-ios', description: 'iOS client', status: 'active' }); + + const env = registry.searchProjectsAmbiguous({ query: 'knowmarks' }); + // Both names start with 'knowmarks' but neither is an exact match — both + // score 50 (name-prefix tier) → within 15% → ambiguous. 
+ expect(env.ambiguous).toBe(true); + expect(env.alternatives.length).toBeGreaterThanOrEqual(1); + const names = env.alternatives.map(a => a.name); + expect(['knowmarks-web', 'knowmarks-ios']).toContain(names[0]); + }); + + // Review finding #10: differentiated scoring + min-length gate. + it('short queries (length < 3) skip ambiguity detection entirely', () => { + // 4 projects all containing the letter 'e' — pre-fix this returned + // 4 alternatives as ambiguous. + registry.register({ name: 'one', status: 'active' }); + registry.register({ name: 'two', status: 'active' }); + registry.register({ name: 'three', status: 'active' }); + registry.register({ name: 'enterprise', status: 'active' }); + + const env = registry.searchProjectsAmbiguous({ query: 'e' }); + expect(env.ambiguous).toBe(false); + expect(env.alternatives).toEqual([]); + // Result array still populated. + expect(env.result.length).toBeGreaterThanOrEqual(2); + }); + + it('two-character queries also skip ambiguity detection', () => { + registry.register({ name: 'alpha', status: 'active' }); + registry.register({ name: 'altered', status: 'active' }); + const env = registry.searchProjectsAmbiguous({ query: 'al' }); + expect(env.ambiguous).toBe(false); + }); + + it('three-character queries are eligible for ambiguity detection', () => { + registry.register({ name: 'alpha-svc', status: 'active' }); + registry.register({ name: 'alpha-app', status: 'active' }); + const env = registry.searchProjectsAmbiguous({ query: 'alp' }); + // Both names start with 'alp' → tied at 50 → ambiguous. + expect(env.ambiguous).toBe(true); + }); + + it('exact name match (score=100) beats name-prefix match (score=50) — unambiguous', () => { + registry.register({ name: 'auth', status: 'active' }); + registry.register({ name: 'auth-service-internal', status: 'active' }); + // 'auth' is exact match (100), 'auth-service-internal' starts with 'auth' (50). + // Gap = (100-50)/100 = 50% → not ambiguous.
+ const env = registry.searchProjectsAmbiguous({ query: 'auth' }); + expect(env.ambiguous).toBe(false); + expect(env.result[0]?.name).toBe('auth'); + }); + + it('name prefix (score=50) beats substring match (score=5) — unambiguous', () => { + registry.register({ name: 'data-store', status: 'active' }); + registry.register({ name: 'metadata-cache', status: 'active' }); + // 'data-store' starts with 'data' → 50. 'metadata-cache' contains 'data' → 5. + // Gap = (50-5)/50 = 90% → not ambiguous. + const env = registry.searchProjectsAmbiguous({ query: 'data' }); + expect(env.ambiguous).toBe(false); + expect(env.result[0]?.name).toBe('data-store'); + }); +}); diff --git a/packages/core/tests/compatibility.test.ts b/packages/core/tests/compatibility.test.ts index edce237..a5863c8 100644 --- a/packages/core/tests/compatibility.test.ts +++ b/packages/core/tests/compatibility.test.ts @@ -27,7 +27,7 @@ describe('Schema Compatibility (S02)', () => { initDb(dbPath); const db = connect(dbPath); const meta = db.prepare("SELECT value FROM schema_meta WHERE key = 'schema_version'").get() as { value: string }; - expect(meta.value).toBe('16'); + expect(meta.value).toBe('17'); db.close(); }); @@ -115,7 +115,7 @@ describe('Schema Compatibility (S02)', () => { initDb(dbPath); const db2 = connect(dbPath); const meta = db2.prepare("SELECT value FROM schema_meta WHERE key = 'schema_version'").get() as { value: string }; - expect(meta.value).toBe('16'); + expect(meta.value).toBe('17'); // display_name column should exist now const row = db2.prepare('SELECT display_name FROM projects WHERE name = ?').get('old-project') as { display_name: string }; @@ -183,7 +183,7 @@ describe('Library Import (S22)', () => { expect(MemoryReflection).toBeDefined(); expect(initDb).toBeDefined(); expect(connect).toBeDefined(); - expect(SCHEMA_VERSION).toBe(16); + expect(SCHEMA_VERSION).toBe(17); expect(scanLocations).toBeDefined(); expect(applyProposals).toBeDefined(); expect(discoverPortsInPath).toBeDefined(); 
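Reviewer note, for orientation before the migration tests in the next file: the write-path contract from `packages/core/src/vocab.ts` (structural normalization, then alias lookup, with unknown terms passing through) can be sketched roughly as follows. This is a minimal illustration, not the package code: the `sketch*` names and the two-entry alias table are assumptions made up for this example, and only the regex pipeline is copied from `normalizeRaw` as it appears in the diff.

```typescript
// Hypothetical two-entry alias table, for illustration only.
// The real tables are VOCAB_ALIASES in packages/core/src/vocab.ts.
const SKETCH_ALIASES: Record<string, readonly string[]> = {
  typescript: ['TypeScript', 'ts'],
  node: ['Node.js', 'nodejs'],
};

// Structural step: lowercase, trim, replace each run of characters outside
// [a-z0-9-] with one hyphen, collapse repeated hyphens, trim edge hyphens.
function sketchNormalizeRaw(value: string): string {
  return value
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9-]+/g, '-')
    .replace(/-+/g, '-')
    .replace(/^-|-$/g, '');
}

// Build the alias -> canonical lookup. Aliases are pre-normalized so they
// match input that has gone through the same structural step.
function sketchBuildReverseMap(
  aliases: Record<string, readonly string[]>,
): Map<string, string> {
  const map = new Map<string, string>();
  for (const canonical of Object.keys(aliases)) {
    map.set(canonical, canonical);
    for (const alias of aliases[canonical]) {
      const norm = sketchNormalizeRaw(alias);
      // First mapping seen wins, so earlier-declared canonicals take priority.
      if (norm && !map.has(norm)) map.set(norm, canonical);
    }
  }
  return map;
}

const SKETCH_REVERSE = sketchBuildReverseMap(SKETCH_ALIASES);

// Returns the canonical slug for a known alias, the normalized form for an
// unknown term, and null when nothing survives normalization.
function sketchNormalize(value: string): string | null {
  const raw = sketchNormalizeRaw(value);
  if (!raw) return null;
  return SKETCH_REVERSE.get(raw) ?? raw;
}

// List form: duplicates collapse after normalization, first occurrence wins.
function sketchNormalizeList(values: readonly string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const v of values) {
    const norm = sketchNormalize(v);
    if (norm && !seen.has(norm)) {
      seen.add(norm);
      out.push(norm);
    }
  }
  return out;
}

console.log(sketchNormalize('Node.js'));
console.log(sketchNormalize('TypeScript 5.x!'));
console.log(sketchNormalizeList(['TypeScript', 'ts', 'Rust', 'typescript']));
```

Because punctuation is replaced with a hyphen rather than dropped, `Node.js` reaches the lookup as `node-js`, which is why the alias table only needs the raw spelling. An unknown value such as `TypeScript 5.x!` is stored as `typescript-5-x` rather than rejected.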
diff --git a/packages/core/tests/db.test.ts b/packages/core/tests/db.test.ts index f61aa03..801d3b8 100644 --- a/packages/core/tests/db.test.ts +++ b/packages/core/tests/db.test.ts @@ -2,7 +2,7 @@ import { describe, it, expect, beforeEach, afterEach } from 'vitest'; import { mkdtempSync, rmSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { initDb, connect, SCHEMA_VERSION, getTemplateFields } from '../src/index.js'; +import { initDb, connect, SCHEMA_VERSION, getTemplateFields, Registry } from '../src/index.js'; describe('Schema Initialization (S01)', () => { let tmpDir: string; @@ -23,7 +23,7 @@ describe('Schema Initialization (S01)', () => { try { const meta = db.prepare("SELECT value FROM schema_meta WHERE key = 'schema_version'").get() as { value: string }; expect(meta.value).toBe(String(SCHEMA_VERSION)); - expect(SCHEMA_VERSION).toBe(16); + expect(SCHEMA_VERSION).toBe(17); } finally { db.close(); } @@ -163,6 +163,85 @@ describe('Schema Initialization (S01)', () => { } }); + // Review finding #3: the v16 → v17 migration backfills pre-0.34 vocabulary + // slugs so countVocabInUse stops surfacing legacy forms as orphans. + it('v16 → v17 migration normalizes legacy capability_type slugs', () => { + initDb(dbPath); + // Force-downgrade schema_meta to v16, plant a legacy slug, then re-init. + const planter = connect(dbPath); + try { + planter.prepare("INSERT INTO projects (name, type, status, description) VALUES (?, 'project', 'active', '')").run('legacy-proj'); + const projectId = (planter.prepare('SELECT id FROM projects WHERE name = ?').get('legacy-proj') as { id: number }).id; + // Insert a row with the legacy alias 'cli-command' as capability_type. + // (Direct SQL — bypasses normalize().) 
+ planter.prepare( + `INSERT INTO project_capabilities (project_id, name, capability_type, description) VALUES (?, ?, ?, ?)` + ).run(projectId, 'foo-cmd', 'cli-command', 'A legacy CLI command row'); + // Also plant a legacy mcp-tool slug on a different capability. + planter.prepare( + `INSERT INTO project_capabilities (project_id, name, capability_type, description) VALUES (?, ?, ?, ?)` + ).run(projectId, 'bar-tool', 'mcp-tool', 'A legacy MCP tool row'); + // Plant a JSON-array tech_stack with legacy aliases. + planter.prepare( + `INSERT INTO project_fields (project_id, field_name, field_value, producer) VALUES (?, 'tech_stack', ?, 'system')` + ).run(projectId, JSON.stringify(['TypeScript', 'ts'])); + // Plant a topics column with legacy aliases. + planter.prepare("UPDATE projects SET topics = ? WHERE id = ?").run(JSON.stringify(['React.js', 'reactjs']), projectId); + // Downgrade the schema_version so initDb re-runs migrations. + planter.prepare("INSERT OR REPLACE INTO schema_meta (key, value) VALUES ('schema_version', '16')").run(); + } finally { + planter.close(); + } + + // Trigger the migration by re-initializing the db at the same path. + initDb(dbPath); + + const verify = connect(dbPath); + try { + // (1) capability_type slugs are now canonical. + const caps = verify.prepare('SELECT name, capability_type FROM project_capabilities ORDER BY name').all() as Array<{ name: string; capability_type: string }>; + const byName = new Map(caps.map(c => [c.name, c.capability_type])); + expect(byName.get('foo-cmd')).toBe('command'); + expect(byName.get('bar-tool')).toBe('tool'); + // No legacy 'cli-command' or 'mcp-tool' rows remain. + const legacyCount = caps.filter(c => c.capability_type === 'cli-command' || c.capability_type === 'mcp-tool').length; + expect(legacyCount).toBe(0); + + // (2) project_fields.tech_stack is normalized. 
+ const techField = verify.prepare( + "SELECT field_value FROM project_fields WHERE field_name = 'tech_stack'" + ).get() as { field_value: string }; + expect(JSON.parse(techField.field_value)).toEqual(['typescript']); + + // (3) projects.topics is normalized. + const topics = verify.prepare("SELECT topics FROM projects WHERE name = 'legacy-proj'").get() as { topics: string }; + expect(JSON.parse(topics.topics)).toEqual(['react']); + + // (4) schema_meta is at SCHEMA_VERSION. + const meta = verify.prepare("SELECT value FROM schema_meta WHERE key = 'schema_version'").get() as { value: string }; + expect(meta.value).toBe(String(SCHEMA_VERSION)); + } finally { + verify.close(); + } + }); + + it('v16 → v17 migration is idempotent (re-run is a no-op)', () => { + initDb(dbPath); + const registry = new Registry(dbPath); + registry.register({ name: 'clean', status: 'active', fields: { tech_stack: ['typescript'] } }); + // Re-init — the backfill block is a no-op on already-canonical data. + initDb(dbPath); + const verify = connect(dbPath); + try { + const techField = verify.prepare( + "SELECT field_value FROM project_fields WHERE field_name = 'tech_stack'" + ).get() as { field_value: string }; + expect(JSON.parse(techField.field_value)).toEqual(['typescript']); + } finally { + verify.close(); + } + }); + it('creates expected indexes', () => { initDb(dbPath); const db = connect(dbPath); diff --git a/packages/core/tests/edge-cases.test.ts b/packages/core/tests/edge-cases.test.ts index 7129e2e..f3eb29c 100644 --- a/packages/core/tests/edge-cases.test.ts +++ b/packages/core/tests/edge-cases.test.ts @@ -78,9 +78,11 @@ describe('Edge Cases — Project Identity', () => { it('list fields are stored as JSON arrays', () => { registry.register({ name: 'list-test', type: 'project', status: 'active' }); + // Spec 0.34: tech_stack is an open-vocab field. `ts` aliases to `typescript` + // at the write boundary. 
registry.updateFields('list-test', { tech_stack: ['ts', 'sqlite'] }, 'fctry'); const p = registry.getProject('list-test', 'full')!; - expect((p.fields as Record).tech_stack).toBe('["ts","sqlite"]'); + expect((p.fields as Record).tech_stack).toBe('["typescript","sqlite"]'); }); // ── Update edge cases ───────────────────────────────────── diff --git a/packages/core/tests/interactions-logging.test.ts b/packages/core/tests/interactions-logging.test.ts new file mode 100644 index 0000000..0df31a3 --- /dev/null +++ b/packages/core/tests/interactions-logging.test.ts @@ -0,0 +1,508 @@ +// @fctry: #portfolio-memory +// +// Integration tests for interactions logging on every registry read surface +// (S204), derived recency/frequency (S205), retention prune (S206), and +// cascade on project delete (S207). + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { mkdtempSync, rmSync } from 'node:fs'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; +import Database from 'better-sqlite3'; + +import { + Registry, + MemoryStore, + MemoryRetrieval, + MemoryReflection, + CrossQuery, + initDb, + getDbPath, +} from '../src/index.js'; + +function openDb(path: string): Database.Database { + return new Database(path); +} + +describe('interactions logging on registry read surfaces (S204)', () => { + let tmpDir: string; + let dbPath: string; + let registry: Registry; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-interactions-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + registry = new Registry(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + function countInteractions(opts?: { surface?: string; projectId?: number | null }): number { + const db = openDb(dbPath); + try { + let sql = 'SELECT COUNT(*) AS n FROM interactions WHERE 1=1'; + const params: unknown[] = []; + if (opts?.surface) { + sql += ' AND surface = ?'; + params.push(opts.surface); + } + 
if (opts?.projectId === null) { + sql += ' AND project_id IS NULL'; + } else if (opts?.projectId !== undefined) { + sql += ' AND project_id = ?'; + params.push(opts.projectId); + } + const row = db.prepare(sql).get(...params) as { n: number }; + return row.n; + } finally { + db.close(); + } + } + + it('search_projects logs one row per call with the query string', () => { + registry.register({ name: 'searchable', status: 'active' }); + registry.searchProjects({ query: 'searchable' }); + expect(countInteractions({ surface: 'search_projects' })).toBe(1); + + // Failed search still logs (project_id NULL). + registry.searchProjects({ query: 'absolutely-no-match' }); + expect(countInteractions({ surface: 'search_projects' })).toBe(2); + expect(countInteractions({ surface: 'search_projects', projectId: null })).toBe(1); + }); + + it('get_project logs both successful and failing calls', () => { + registry.register({ name: 'p1', status: 'active' }); + registry.getProject('p1'); + registry.getProject('does-not-exist'); + expect(countInteractions({ surface: 'get_project' })).toBe(2); + expect(countInteractions({ surface: 'get_project', projectId: null })).toBe(1); + }); + + it('query_capabilities logs one row per call', () => { + registry.register({ name: 'p2', status: 'active' }); + registry.registerCapabilities('p2', [ + { name: 'tool_a', capability_type: 'tool', description: 'A' }, + ]); + registry.queryCapabilities({ project_name: 'p2' }); + expect(countInteractions({ surface: 'query_capabilities' })).toBe(1); + }); + + it('vocab is the documented read-only exception — does NOT write a row', () => { + // vocab is exposed on the MCP server, not directly on Registry. The + // dispatcher must not call logInteraction for vocab. Since this is a + // core-level test, the closest analogue is to confirm registry has no + // entry-point that names "vocab" as a surface. 
+ registry.register({ name: 'p3', status: 'active' }); + // After a sample of read surfaces, ensure no row carries surface='vocab'. + registry.getProject('p3'); + registry.searchProjects({ query: 'p3' }); + expect(countInteractions({ surface: 'vocab' })).toBe(0); + }); + + // Review finding #4: internal callers must be able to suppress logging. + it('getProject({logInteraction:false}) does not write a row', () => { + registry.register({ name: 'silent', status: 'active' }); + const before = countInteractions({ surface: 'get_project' }); + registry.getProject('silent', 'standard', { logInteraction: false }); + expect(countInteractions({ surface: 'get_project' })).toBe(before); + // The default still logs. + registry.getProject('silent'); + expect(countInteractions({ surface: 'get_project' })).toBe(before + 1); + }); + + it('getProjectOrThrow({logInteraction:false}) does not write a row', () => { + registry.register({ name: 'silent2', status: 'active' }); + const before = countInteractions({ surface: 'get_project' }); + registry.getProjectOrThrow('silent2', 'standard', { logInteraction: false }); + expect(countInteractions({ surface: 'get_project' })).toBe(before); + }); + + it('searchProjects({logInteraction:false}) does not write a row', () => { + registry.register({ name: 'quiet', status: 'active' }); + const before = countInteractions({ surface: 'search_projects' }); + registry.searchProjects({ query: 'quiet', logInteraction: false }); + expect(countInteractions({ surface: 'search_projects' })).toBe(before); + // Default still logs. 
+ registry.searchProjects({ query: 'quiet' }); + expect(countInteractions({ surface: 'search_projects' })).toBe(before + 1); + }); + + it('searchProjectsAmbiguous({logInteraction:false}) does not write a row', () => { + registry.register({ name: 'ambi-quiet', status: 'active' }); + const before = countInteractions({ surface: 'search_projects' }); + registry.searchProjectsAmbiguous({ query: 'ambi-quiet', logInteraction: false }); + expect(countInteractions({ surface: 'search_projects' })).toBe(before); + }); + + // Review finding #7: query_capabilities attributes to the scoped project. + it('query_capabilities({project_name}) attributes the row to that project', () => { + registry.register({ name: 'attr-proj', status: 'active' }); + registry.registerCapabilities('attr-proj', [ + { name: 'tool_a', capability_type: 'tool', description: 'A' }, + ]); + const projectId = (() => { + const db = openDb(dbPath); + try { + return (db.prepare('SELECT id FROM projects WHERE name = ?').get('attr-proj') as { id: number }).id; + } finally { + db.close(); + } + })(); + + registry.queryCapabilities({ project_name: 'attr-proj' }); + expect(countInteractions({ surface: 'query_capabilities', projectId })).toBe(1); + }); + + it('query_capabilities({project_name}) for unknown project logs with NULL project_id', () => { + registry.queryCapabilities({ project_name: 'nope-not-here' }); + expect(countInteractions({ surface: 'query_capabilities', projectId: null })).toBe(1); + }); + + it('query_capabilities() with no project scope logs with NULL project_id', () => { + registry.register({ name: 'scopeless', status: 'active' }); + registry.registerCapabilities('scopeless', [ + { name: 'tool_b', capability_type: 'tool', description: 'B' }, + ]); + registry.queryCapabilities({ capability_type: 'tool' }); + expect(countInteractions({ surface: 'query_capabilities', projectId: null })).toBe(1); + }); +}); + +describe('cross_query interactions logging (S204)', () => { + let tmpDir: string; + let 
dbPath: string; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-cross-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('produces N rows when N projects match', () => { + const registry = new Registry(dbPath); + registry.register({ name: 'alpha-svc', description: 'Alpha service', status: 'active' }); + registry.register({ name: 'beta-svc', description: 'Beta service includes alpha', status: 'active' }); + + const cq = new CrossQuery(dbPath); + cq.query({ query: 'alpha' }); + + const db = openDb(dbPath); + try { + const rows = db.prepare("SELECT * FROM interactions WHERE surface = 'cross_query'").all() as { project_id: number | null; query: string }[]; + expect(rows.length).toBeGreaterThanOrEqual(2); + // alpha-svc and beta-svc both match, so the rows must pin to at least two distinct projects. + const projectIds = new Set(rows.map(r => r.project_id)); + expect(projectIds.size).toBeGreaterThanOrEqual(2); + } finally { + db.close(); + } + }); + + it('zero-match cross_query produces one NULL row', () => { + const cq = new CrossQuery(dbPath); + cq.query({ query: 'no-such-project-anywhere' }); + + const db = openDb(dbPath); + try { + const rows = db.prepare("SELECT * FROM interactions WHERE surface = 'cross_query'").all() as { project_id: number | null }[]; + expect(rows.length).toBe(1); + expect(rows[0].project_id).toBeNull(); + } finally { + db.close(); + } + }); +}); + +describe('recall interactions logging (S204)', () => { + let tmpDir: string; + let dbPath: string; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-recall-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('logs a recall row per call (bootstrap and search)', () => { + const registry = new Registry(dbPath); + registry.register({ name: 'memproj', status: 'active' }); +
const store = new MemoryStore(dbPath); + store.retain({ content: 'fact', type: 'pattern', project_id: 'memproj' }); + + const retrieval = new MemoryRetrieval(dbPath); + retrieval.recall({ query: 'fact', project_id: 'memproj' }); + retrieval.recall({}); // bootstrap mode + + const db = openDb(dbPath); + try { + const rows = db.prepare("SELECT * FROM interactions WHERE surface = 'recall'").all(); + expect(rows.length).toBe(2); + } finally { + db.close(); + } + }); + + // Review finding #1: the MCP recall handler's ambiguity probe must NOT + // generate a phantom search_projects row alongside the recall row. + it('recall + project-scope ambiguity probe yields exactly one recall row, zero search_projects rows', () => { + const registry = new Registry(dbPath); + registry.register({ name: 'memproj', status: 'active' }); + const store = new MemoryStore(dbPath); + store.retain({ content: 'fact', type: 'pattern', project_id: 'memproj' }); + + // Mirror the MCP recall handler's flow: probe for ambiguity (must not log), + // then call MemoryRetrieval.recall (which logs surface='recall'). 
+ registry.searchProjectsAmbiguous({ query: 'memproj', logInteraction: false }); + const retrieval = new MemoryRetrieval(dbPath); + retrieval.recall({ query: 'fact', project_id: 'memproj' }); + + const db = openDb(dbPath); + try { + const recallRows = db.prepare("SELECT * FROM interactions WHERE surface = 'recall'").all(); + const searchRows = db.prepare("SELECT * FROM interactions WHERE surface = 'search_projects'").all(); + expect(recallRows.length).toBe(1); + expect(searchRows.length).toBe(0); + } finally { + db.close(); + } + }); +}); + +describe('derived recency and frequency (S205)', () => { + let tmpDir: string; + let dbPath: string; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-derived-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('portfolio_brief reports last_touched and recent_activity_count per project', () => { + const registry = new Registry(dbPath); + registry.register({ name: 'touched', status: 'active' }); + registry.register({ name: 'untouched', status: 'active' }); + + // Touch 'touched' a few times. 
+ registry.getProject('touched'); + registry.getProject('touched'); + registry.getProject('touched'); + + const cq = new CrossQuery(dbPath); + const brief = cq.portfolioBrief(); + const touched = brief.projects.find(p => p.name === 'touched')!; + const untouched = brief.projects.find(p => p.name === 'untouched')!; + expect(touched.last_touched).not.toBeNull(); + expect(touched.recent_activity_count).toBeGreaterThanOrEqual(3); + expect(untouched.last_touched).toBeNull(); + expect(untouched.recent_activity_count).toBe(0); + }); + + it('schema has no mutable last_touched / touch_count column', () => { + const db = openDb(dbPath); + try { + const cols = db.prepare("PRAGMA table_info(projects)").all() as { name: string }[]; + const colNames = cols.map(c => c.name); + expect(colNames).not.toContain('last_touched'); + expect(colNames).not.toContain('touch_count'); + expect(colNames).not.toContain('recency'); + expect(colNames).not.toContain('frequency'); + } finally { + db.close(); + } + }); +}); + +describe('interactions retention policy (S206)', () => { + let tmpDir: string; + let dbPath: string; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-retention-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('configure_memory accepts retention knobs and round-trips them', () => { + const store = new MemoryStore(dbPath); + const cfg = store.configureMemory({ + interactions_retention_days: 30, + interactions_max_rows_per_project: 500, + }); + expect(cfg.interactions_retention_days).toBe('30'); + expect(cfg.interactions_max_rows_per_project).toBe('500'); + }); + + it('reflect() runs pruneInteractions and reports the count', () => { + const registry = new Registry(dbPath); + registry.register({ name: 'p', status: 'active' }); + const db = openDb(dbPath); + try { + // Seed a row older than the default 90-day floor. 
+ db.prepare( + "INSERT INTO interactions (project_id, surface, at) VALUES (?, ?, datetime('now', '-100 days'))", + ).run(1, 'search_projects'); + db.prepare( + "INSERT INTO interactions (project_id, surface, at) VALUES (?, ?, datetime('now', '-1 days'))", + ).run(1, 'search_projects'); + } finally { + db.close(); + } + + const reflection = new MemoryReflection(dbPath); + const result = reflection.reflect(); + expect(result.interactions_pruned).toBeGreaterThanOrEqual(1); + + const verify = openDb(dbPath); + try { + const remaining = verify.prepare('SELECT COUNT(*) AS n FROM interactions').get() as { n: number }; + expect(remaining.n).toBe(1); // the -1-day row survived + } finally { + verify.close(); + } + }); + + it('configure_memory override drives the prune', () => { + const registry = new Registry(dbPath); + registry.register({ name: 'p', status: 'active' }); + const store = new MemoryStore(dbPath); + store.configureMemory({ interactions_retention_days: 7 }); + + const db = openDb(dbPath); + try { + // -15 days is outside the configured 7-day floor. + db.prepare( + "INSERT INTO interactions (project_id, surface, at) VALUES (?, ?, datetime('now', '-15 days'))", + ).run(1, 'search_projects'); + } finally { + db.close(); + } + + const reflection = new MemoryReflection(dbPath); + const result = reflection.reflect(); + expect(result.interactions_pruned).toBeGreaterThanOrEqual(1); + }); + + // Review finding #6 (A5): the prune wraps both age-delete and cap-delete in + // a single transaction. A mid-prune failure rolls back both phases together + // — we never strand the table in a half-aged / half-capped state. + it('pruneInteractions is transactional: a cap-phase failure rolls back the age-phase deletes', async () => { + const registry = new Registry(dbPath); + registry.register({ name: 'p', status: 'active' }); + // Seed: 3 ancient rows (should be age-pruned) + many recent rows. 
+ const seed = openDb(dbPath); + try { + const insertOld = seed.prepare( + "INSERT INTO interactions (project_id, surface, at) VALUES (?, ?, datetime('now', ?))", + ); + for (let i = 0; i < 3; i++) insertOld.run(1, 'search_projects', '-100 days'); + for (let i = 0; i < 5; i++) insertOld.run(1, 'search_projects', '-1 days'); + } finally { + seed.close(); + } + + // Direct invocation of pruneInteractions to mock a cap-phase failure. + const { pruneInteractions } = await import('../src/interactions.js'); + const Database = (await import('better-sqlite3')).default; + const db = new Database(dbPath); + + // Monkey-patch prepare so the cap-phase DELETE throws. The age-phase + // DELETE runs first; if pruneInteractions weren't transactional, those + // deletes would land before the cap-phase failed. + const origPrepare = db.prepare.bind(db); + db.prepare = ((sql: string) => { + if (sql.includes("DELETE FROM interactions WHERE id IN (")) { + throw new Error('simulated cap-phase failure'); + } + return origPrepare(sql); + }) as typeof db.prepare; + + // Set policy so cap-phase actually fires (cap = 2 means our recent rows trigger the cap). + expect(() => pruneInteractions(db, { retention_days: 90, max_rows_per_project: 2 })).toThrow(/simulated cap-phase failure/); + db.close(); + + // Verify rollback: the ancient rows that the age-phase intended to drop + // are still present, because the cap-phase failure rolled back the txn. 
+ const verify = openDb(dbPath); + try { + const ancientLeft = verify.prepare( + "SELECT COUNT(*) AS n FROM interactions WHERE at < datetime('now', '-90 days')", + ).get() as { n: number }; + expect(ancientLeft.n).toBe(3); + } finally { + verify.close(); + } + }); +}); + +describe('archive cascade for interactions (S207)', () => { + let tmpDir: string; + let dbPath: string; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-archive-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('archive is soft — does NOT cascade interactions; hard-delete DOES cascade', () => { + const registry = new Registry(dbPath); + registry.register({ name: 'doomed', status: 'active' }); + // Generate some interactions. + registry.getProject('doomed'); + registry.getProject('doomed'); + + const db = openDb(dbPath); + try { + const beforeArchive = db.prepare("SELECT COUNT(*) AS n FROM interactions WHERE project_id = (SELECT id FROM projects WHERE name = 'doomed')").get() as { n: number }; + expect(beforeArchive.n).toBeGreaterThanOrEqual(2); + } finally { + db.close(); + } + + registry.archiveProject('doomed'); + + const afterArchive = openDb(dbPath); + try { + // archive is soft; interactions remain. + const stillThere = afterArchive.prepare("SELECT COUNT(*) AS n FROM interactions WHERE project_id = (SELECT id FROM projects WHERE name = 'doomed')").get() as { n: number }; + expect(stillThere.n).toBeGreaterThanOrEqual(2); + } finally { + afterArchive.close(); + } + + // Hard delete (admin path) — ON DELETE CASCADE drops the interactions rows. 
+ const deleteDb = openDb(dbPath); + try { + const proj = deleteDb.prepare("SELECT id FROM projects WHERE name = 'doomed'").get() as { id: number }; + deleteDb.prepare('DELETE FROM projects WHERE id = ?').run(proj.id); + const gone = deleteDb.prepare("SELECT COUNT(*) AS n FROM interactions WHERE project_id = ?").get(proj.id) as { n: number }; + expect(gone.n).toBe(0); + } finally { + deleteDb.close(); + } + }); +}); diff --git a/packages/core/tests/interactions.test.ts b/packages/core/tests/interactions.test.ts new file mode 100644 index 0000000..ed23620 --- /dev/null +++ b/packages/core/tests/interactions.test.ts @@ -0,0 +1,229 @@ +// @fctry: #portfolio-memory +// +// Schema v17 interactions table contract (S204, S205, S206, S207). + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { mkdtempSync, rmSync } from 'node:fs'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; +import Database from 'better-sqlite3'; +import { + initDb, + SCHEMA_VERSION, + logInteraction, + lastTouchedAt, + touchCountSince, + pruneInteractions, + DEFAULT_RETENTION_POLICY, +} from '../src/index.js'; + +describe('Interactions log — schema v17', () => { + let tmpDir: string; + let dbPath: string; + let db: Database.Database; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-interactions-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + db = new Database(dbPath); + db.pragma('foreign_keys = ON'); + }); + + afterEach(() => { + db?.close(); + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('bumps SCHEMA_VERSION to 17', () => { + expect(SCHEMA_VERSION).toBe(17); + }); + + it('creates the interactions table with the expected columns', () => { + const cols = db + .prepare(`PRAGMA table_info(interactions)`) + .all() as { name: string; type: string; notnull: number }[]; + const byName = Object.fromEntries(cols.map((c) => [c.name, c])); + expect(byName.id).toBeDefined(); + 
expect(byName.project_id).toBeDefined(); + expect(byName.surface).toBeDefined(); + expect(byName.surface.notnull).toBe(1); + expect(byName.query).toBeDefined(); + expect(byName.at).toBeDefined(); + expect(byName.at.notnull).toBe(1); + expect(byName.session_id).toBeDefined(); + expect(byName.agent_role).toBeDefined(); + }); + + it('creates the interactions indexes', () => { + const idx = db + .prepare( + `SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx_interactions_%'`, + ) + .all() as { name: string }[]; + const names = idx.map((r) => r.name).sort(); + expect(names).toContain('idx_interactions_project_at'); + expect(names).toContain('idx_interactions_at'); + }); + + it('logInteraction appends an immutable row with at = now()', () => { + // S204: every read surface writes one append-only row at the start of the call. + const project = db + .prepare( + `INSERT INTO projects (name, display_name, type, status) VALUES ('p1','p1','project','active') RETURNING id`, + ) + .get() as { id: number }; + + logInteraction(db, { + surface: 'search_projects', + projectId: project.id, + query: 'registry', + sessionId: 's-abc', + agentRole: 'executor', + }); + + const row = db.prepare(`SELECT * FROM interactions WHERE project_id = ?`).get(project.id) as + | { id: number; surface: string; query: string; session_id: string; agent_role: string; at: string } + | undefined; + expect(row).toBeDefined(); + expect(row!.surface).toBe('search_projects'); + expect(row!.query).toBe('registry'); + expect(row!.session_id).toBe('s-abc'); + expect(row!.agent_role).toBe('executor'); + expect(row!.at).toMatch(/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/); + }); + + it('logInteraction allows NULL project_id (failed search signal)', () => { + // S204 example: a failed get_project(name="does-not-exist") still + // produces an interactions row with project_id=NULL. 
+ logInteraction(db, { + surface: 'get_project', + projectId: null, + query: 'does-not-exist', + }); + + const row = db + .prepare(`SELECT * FROM interactions WHERE surface = 'get_project'`) + .get() as { project_id: number | null; query: string } | undefined; + expect(row).toBeDefined(); + expect(row!.project_id).toBeNull(); + expect(row!.query).toBe('does-not-exist'); + }); + + it('logInteraction swallows errors (read surface must not fail because the log did)', () => { + // Drop the table out from under it — the call must not throw. + db.prepare(`DROP TABLE interactions`).run(); + expect(() => logInteraction(db, { surface: 'recall' })).not.toThrow(); + }); + + it('lastTouchedAt derives recency from MAX(at)', () => { + // S205: recency is computed, not stored. + const project = db + .prepare(`INSERT INTO projects (name, display_name, type, status) VALUES ('p2','p2','project','active') RETURNING id`) + .get() as { id: number }; + + expect(lastTouchedAt(db, project.id)).toBeNull(); + + logInteraction(db, { surface: 'recall', projectId: project.id }); + const at1 = lastTouchedAt(db, project.id); + expect(at1).not.toBeNull(); + expect(at1).toMatch(/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/); + }); + + it('touchCountSince derives frequency from COUNT(*)', () => { + // S205: frequency is computed, not stored. + const project = db + .prepare(`INSERT INTO projects (name, display_name, type, status) VALUES ('p3','p3','project','active') RETURNING id`) + .get() as { id: number }; + + expect(touchCountSince(db, project.id, 7)).toBe(0); + + for (let i = 0; i < 5; i++) { + logInteraction(db, { surface: 'search_projects', projectId: project.id }); + } + expect(touchCountSince(db, project.id, 7)).toBe(5); + + // Insert an old row with an explicit at far outside the window. 
+ db.prepare( + `INSERT INTO interactions (project_id, surface, at) VALUES (?, 'recall', datetime('now', '-10 days'))`, + ).run(project.id); + expect(touchCountSince(db, project.id, 7)).toBe(5); + }); + + it('pruneInteractions drops rows older than retention_days', () => { + // S206: retention prune by age. + const project = db + .prepare(`INSERT INTO projects (name, display_name, type, status) VALUES ('p4','p4','project','active') RETURNING id`) + .get() as { id: number }; + + db.prepare( + `INSERT INTO interactions (project_id, surface, at) VALUES (?, 'recall', datetime('now', '-100 days'))`, + ).run(project.id); + db.prepare( + `INSERT INTO interactions (project_id, surface, at) VALUES (?, 'recall', datetime('now', '-10 days'))`, + ).run(project.id); + + const pruned = pruneInteractions(db, { retention_days: 90, max_rows_per_project: 10000 }); + expect(pruned).toBe(1); + + const remaining = db + .prepare(`SELECT COUNT(*) AS n FROM interactions WHERE project_id = ?`) + .get(project.id) as { n: number }; + expect(remaining.n).toBe(1); + }); + + it('pruneInteractions enforces per-project cap, oldest first', () => { + // S206: a project with > cap rows is trimmed to cap, oldest deleted. + const project = db + .prepare(`INSERT INTO projects (name, display_name, type, status) VALUES ('p5','p5','project','active') RETURNING id`) + .get() as { id: number }; + + // Insert 15 rows with monotonic at values. 
+ for (let i = 0; i < 15; i++) { + db.prepare( + `INSERT INTO interactions (project_id, surface, at) VALUES (?, 'recall', datetime('now', ?))`, + ).run(project.id, `-${15 - i} minutes`); + } + + const pruned = pruneInteractions(db, { retention_days: 365, max_rows_per_project: 10 }); + expect(pruned).toBe(5); + + const remaining = db + .prepare(`SELECT COUNT(*) AS n FROM interactions WHERE project_id = ?`) + .get(project.id) as { n: number }; + expect(remaining.n).toBe(10); + + // The oldest five (at -15..-11 minutes) should be gone; the newest + // ten (at -10..-1 minutes) should remain. + const oldest = db + .prepare(`SELECT MIN(at) AS at FROM interactions WHERE project_id = ?`) + .get(project.id) as { at: string }; + // The remaining oldest should be the row inserted at "now - 10 minutes" (30 s of slack; `at` is UTC). + expect(oldest.at >= new Date(Date.now() - 630_000).toISOString().slice(0, 19).replace('T', ' ')).toBe(true); + }); + + it('hard-deleting a project cascades to its interaction history (S207)', () => { + // S207: project row delete cascades to interactions via FK ON DELETE CASCADE. 
+ const project = db + .prepare(`INSERT INTO projects (name, display_name, type, status) VALUES ('p6','p6','project','active') RETURNING id`) + .get() as { id: number }; + + for (let i = 0; i < 3; i++) { + logInteraction(db, { surface: 'search_projects', projectId: project.id }); + } + expect( + (db.prepare(`SELECT COUNT(*) AS n FROM interactions WHERE project_id = ?`).get(project.id) as { n: number }).n, + ).toBe(3); + + db.prepare(`DELETE FROM projects WHERE id = ?`).run(project.id); + + expect( + (db.prepare(`SELECT COUNT(*) AS n FROM interactions WHERE project_id = ?`).get(project.id) as { n: number }).n, + ).toBe(0); + }); + + it('DEFAULT_RETENTION_POLICY matches spec (90 days, 10000 per project)', () => { + expect(DEFAULT_RETENTION_POLICY.retention_days).toBe(90); + expect(DEFAULT_RETENTION_POLICY.max_rows_per_project).toBe(10000); + }); +}); diff --git a/packages/core/tests/project-brief.test.ts b/packages/core/tests/project-brief.test.ts index d91582c..fc91f6c 100644 --- a/packages/core/tests/project-brief.test.ts +++ b/packages/core/tests/project-brief.test.ts @@ -49,11 +49,13 @@ describe('project agent brief', () => { expect(brief.project.name).toBe('code-proj'); expect(brief.project.workspace_kind).toBe('code'); expect(brief.purpose.summary).toBe('Registry-backed TypeScript service.'); - expect(brief.profile.tech_stack).toEqual(['TypeScript', 'SQLite']); + // Spec 0.34: tech_stack and capability_type normalize at write boundary. + // 'TypeScript' → 'typescript', 'SQLite' → 'sqlite', 'cli-command' → 'command'. 
+ expect(brief.profile.tech_stack).toEqual(['typescript', 'sqlite']); expect(brief.digest?.digest_text).toBe('Code project essence.'); expect(brief.digest?.stale).toBe(false); expect(brief.capabilities.count).toBe(2); - expect(brief.capabilities.by_type).toEqual({ 'cli-command': 1, tool: 1 }); + expect(brief.capabilities.by_type).toEqual({ command: 1, tool: 1 }); expect(brief.operations.ports.map(p => p.port)).toEqual([3777]); expect(brief.enrichment_gaps).toEqual([]); }); diff --git a/packages/core/tests/recipes-store.test.ts b/packages/core/tests/recipes-store.test.ts index 382baea..4479678 100644 --- a/packages/core/tests/recipes-store.test.ts +++ b/packages/core/tests/recipes-store.test.ts @@ -37,9 +37,9 @@ beforeEach(() => { db.pragma('foreign_keys = ON'); }); -describe('Schema v16', () => { - it('reports SCHEMA_VERSION = 16', () => { - expect(SCHEMA_VERSION).toBe(16); +describe('Schema v17', () => { + it('reports SCHEMA_VERSION = 17', () => { + expect(SCHEMA_VERSION).toBe(17); }); it('has bootstrap_primitives table', () => { @@ -56,11 +56,11 @@ describe('Schema v16', () => { expect(tables.length).toBe(1); }); - it('records schema_version = 16 in schema_meta', () => { + it('records schema_version = 17 in schema_meta', () => { const row = db .prepare(`SELECT value FROM schema_meta WHERE key = 'schema_version'`) .get() as { value: string }; - expect(row.value).toBe('16'); + expect(row.value).toBe('17'); }); it('has email_account column on projects (NULL for fresh-init existing-pattern test)', () => { diff --git a/packages/core/tests/registry.test.ts b/packages/core/tests/registry.test.ts index 4f767bb..b698237 100644 --- a/packages/core/tests/registry.test.ts +++ b/packages/core/tests/registry.test.ts @@ -676,7 +676,7 @@ describe('Registry', () => { registry.registerCapabilities('surf-proj', [ { name: 'tool_a', capability_type: 'tool', description: 'A' }, { name: 'tool_b', capability_type: 'tool', description: 'B' }, - { name: 'cmd_x', capability_type: 
'cli-command', description: 'X' }, + { name: 'cmd_x', capability_type: 'command', description: 'X' }, { name: 'Lib1', capability_type: 'library', description: 'Lib1' }, ]); @@ -692,21 +692,21 @@ describe('Registry', () => { (byType[t] ??= []).push(row.name as string); } expect(byType.tool).toEqual(['tool_c']); - expect(byType['cli-command']).toEqual(['cmd_x']); + expect(byType['command']).toEqual(['cmd_x']); expect(byType.library).toEqual(['Lib1']); }); it('empty array clears that type only', () => { registry.registerCapabilities('surf-proj', [ { name: 'tool_a', capability_type: 'tool', description: 'A' }, - { name: 'cmd_x', capability_type: 'cli-command', description: 'X' }, + { name: 'cmd_x', capability_type: 'command', description: 'X' }, ]); registry.registerCapabilitiesForType('surf-proj', 'tool', []); const remaining = registry.queryCapabilities({ project_name: 'surf-proj' }); expect(remaining.length).toBe(1); - expect(remaining[0].type).toBe('cli-command'); + expect(remaining[0].type).toBe('command'); expect(remaining[0].name).toBe('cmd_x'); }); @@ -736,7 +736,7 @@ describe('Registry', () => { expect(() => registry.registerCapabilitiesForType('surf-proj', 'tool', [ { name: 'tool_a', capability_type: 'tool', description: 'A' }, - { name: 'cmd_x', capability_type: 'cli-command', description: 'X' }, + { name: 'cmd_x', capability_type: 'command', description: 'X' }, ]), ).toThrow(/mixed types/); }); @@ -747,8 +747,8 @@ describe('Registry', () => { { name: 'tool_a', capability_type: 'tool', description: 'A' }, ]); // Step 2: register cli-command surface — tool surface unchanged - registry.registerCapabilitiesForType('surf-proj', 'cli-command', [ - { name: 'cmd_x', capability_type: 'cli-command', description: 'X' }, + registry.registerCapabilitiesForType('surf-proj', 'command', [ + { name: 'cmd_x', capability_type: 'command', description: 'X' }, ]); const tools = registry.queryCapabilities({ project_name: 'surf-proj', capability_type: 'tool' }); diff --git 
a/packages/core/tests/vocab.test.ts b/packages/core/tests/vocab.test.ts new file mode 100644 index 0000000..d769b2a --- /dev/null +++ b/packages/core/tests/vocab.test.ts @@ -0,0 +1,287 @@ +// @fctry: #capability-declarations +// +// Tests for open-vocabulary normalization (spec 0.34, S199/S201/S202 write side). + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { mkdtempSync, rmSync } from 'node:fs'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; + +import { + normalize, + normalizeList, + normalizeFreeList, + normalizeFieldValue, + normalizeRecord, + isVocabField, + VOCAB_FIELDS, + CANONICAL_VOCAB, + VOCAB_ALIASES, + Registry, + initDb, +} from '../src/index.js'; + +describe('vocab.normalize', () => { + it('maps known aliases to canonical slugs', () => { + expect(normalize('tech_stack', 'TypeScript')).toBe('typescript'); + expect(normalize('tech_stack', 'ts')).toBe('typescript'); + expect(normalize('tech_stack', 'TS')).toBe('typescript'); + expect(normalize('patterns', 'MCP Server')).toBe('mcp-server'); + expect(normalize('patterns', 'Repository Pattern')).toBe('repository-pattern'); + expect(normalize('patterns', 'repo-pattern')).toBe('repository-pattern'); + expect(normalize('topics', 'React.js')).toBe('react'); + expect(normalize('topics', 'reactjs')).toBe('react'); + expect(normalize('capability_type', 'MCP tool')).toBe('tool'); + }); + + it('lowercases + hyphenates + strips non-alphanumerics', () => { + expect(normalize('topics', 'Claude Code!!')).toBe('claude-code'); + expect(normalize('tech_stack', 'TypeScript 5.x')).toBe('typescript-5-x'); // pass-through; version-suffix is not aliased + expect(normalize('patterns', ' CLI ')).toBe('cli'); + }); + + it('passes unknown terms through normalized but unchanged in meaning', () => { + expect(normalize('topics', 'claude-code')).toBe('claude-code'); + expect(normalize('tech_stack', 'snowflake')).toBe('snowflake'); + expect(normalize('patterns', 
'novel-pattern')).toBe('novel-pattern'); + }); + + it('returns null for whitespace-only or punctuation-only inputs', () => { + expect(normalize('tech_stack', ' ')).toBeNull(); + expect(normalize('tech_stack', '!!!')).toBeNull(); + expect(normalize('tech_stack', '')).toBeNull(); + }); + + it('never rejects unknown — open-vocabulary contract', () => { + expect(() => normalize('tech_stack', 'definitely-not-a-real-language')).not.toThrow(); + expect(normalize('tech_stack', 'definitely-not-a-real-language')).toBe('definitely-not-a-real-language'); + }); +}); + +describe('vocab.normalizeList', () => { + it('collapses duplicates after normalization', () => { + expect(normalizeList('tech_stack', ['TypeScript', 'typescript', 'ts'])).toEqual(['typescript']); + expect(normalizeList('patterns', ['MCP Server', 'mcp-server'])).toEqual(['mcp-server']); + }); + + it('preserves first-occurrence order (stable)', () => { + expect(normalizeList('tech_stack', ['React.js', 'TypeScript', 'reactjs'])).toEqual(['react', 'typescript']); + }); + + it('filters empty inputs', () => { + expect(normalizeList('tech_stack', ['', ' ', 'ts', '!!!'])).toEqual(['typescript']); + }); +}); + +describe('vocab.normalizeFieldValue', () => { + it('handles arrays', () => { + expect(normalizeFieldValue('tech_stack', ['TypeScript', 'ts'])).toEqual(['typescript']); + }); + + it('handles single string values', () => { + expect(normalizeFieldValue('capability_type', 'MCP tool')).toBe('tool'); + }); + + it('handles JSON-array strings (legacy storage)', () => { + expect(normalizeFieldValue('tech_stack', '["TypeScript", "ts"]')).toEqual(['typescript']); + }); + + it('passes through null and undefined unchanged', () => { + expect(normalizeFieldValue('tech_stack', null)).toBeNull(); + expect(normalizeFieldValue('tech_stack', undefined)).toBeUndefined(); + }); +}); + +describe('vocab.normalizeRecord', () => { + it('normalizes only the four open-vocab fields', () => { + const out = normalizeRecord({ + tech_stack: 
['TypeScript', 'ts'], + patterns: ['MCP Server'], + topics: ['React.js'], + capability_type: 'MCP tool', + description: 'TypeScript is fun', // non-vocab field — untouched + }); + expect(out.tech_stack).toEqual(['typescript']); + expect(out.patterns).toEqual(['mcp-server']); + expect(out.topics).toEqual(['react']); + expect(out.capability_type).toBe('tool'); + expect(out.description).toBe('TypeScript is fun'); + }); + + it('passes through records without vocab fields', () => { + const input = { description: 'x', goals: ['y'] }; + expect(normalizeRecord(input)).toEqual(input); + }); +}); + +describe('vocab.isVocabField', () => { + it('identifies the four normalized fields', () => { + for (const f of VOCAB_FIELDS) expect(isVocabField(f)).toBe(true); + }); + + it('rejects other field names', () => { + expect(isVocabField('description')).toBe(false); + expect(isVocabField('goals')).toBe(false); + expect(isVocabField('entities')).toBe(false); + }); +}); + +describe('vocab.CANONICAL_VOCAB / VOCAB_ALIASES invariants', () => { + it('every canonical alias key has a canonical slug entry', () => { + for (const field of VOCAB_FIELDS) { + for (const canonical of Object.keys(VOCAB_ALIASES[field])) { + // The normalizer must round-trip the canonical to itself. + expect(normalize(field, canonical)).toBe(canonical); + } + } + }); + + it('canonical sets contain only properly-shaped slugs', () => { + for (const field of VOCAB_FIELDS) { + for (const slug of CANONICAL_VOCAB[field]) { + expect(slug).toMatch(/^[a-z0-9][a-z0-9-]*$/); + } + } + }); +}); + +// ──────────────────────────────────────────────────────────────────── +// Integration: registry write paths apply normalization once on write. 
+// ──────────────────────────────────────────────────────────────────── + +describe('Registry write paths apply vocab normalization', () => { + let tmpDir: string; + let dbPath: string; + let registry: Registry; + + beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), 'setlist-vocab-')); + dbPath = join(tmpDir, 'registry.db'); + initDb(dbPath); + registry = new Registry(dbPath); + }); + + afterEach(() => { + rmSync(tmpDir, { recursive: true, force: true }); + }); + + it('register() normalizes fields at the write boundary (S199)', () => { + registry.register({ + name: 'proj-a', + status: 'active', + fields: { + tech_stack: ['TypeScript', 'typescript', 'ts'], + patterns: ['MCP Server'], + }, + }); + const proj = registry.getProject('proj-a', 'full') as any; + expect(proj.fields.tech_stack).toBe(JSON.stringify(['typescript'])); + expect(proj.fields.patterns).toBe(JSON.stringify(['mcp-server'])); + }); + + it('updateFields() normalizes (S201 round-trip)', () => { + registry.register({ name: 'proj-b', status: 'active' }); + registry.updateFields( + 'proj-b', + { patterns: ['Repository Pattern', 'repository-pattern', 'repo-pattern'] }, + 'agent', + ); + const proj = registry.getProject('proj-b', 'full') as any; + expect(proj.fields.patterns).toBe(JSON.stringify(['repository-pattern'])); + }); + + it('enrichProject() normalizes topics', () => { + registry.register({ name: 'proj-c', status: 'active' }); + registry.enrichProject('proj-c', { topics: ['React.js', 'reactjs', 'TypeScript'] }); + const proj = registry.getProject('proj-c', 'full') as any; + // topics arrives as a parsed array via formatRecord + expect(proj.topics).toEqual(['react', 'typescript']); + }); + + it('registerCapabilities() normalizes capability_type', () => { + registry.register({ name: 'proj-d', status: 'active' }); + registry.registerCapabilities('proj-d', [ + { name: 'foo', capability_type: 'MCP tool', description: 'x' }, + { name: 'bar', capability_type: 'mcp-tool', description: 'y' }, + ]); + 
const caps = registry.queryCapabilities({ project_name: 'proj-d' }); + for (const c of caps) { + expect(c.type).toBe('tool'); + } + }); + + it('idempotent write boundary: pre-normalized input stays stable', () => { + registry.register({ + name: 'proj-e', + status: 'active', + fields: { tech_stack: ['typescript'] }, + }); + registry.updateFields('proj-e', { tech_stack: ['typescript'] }, 'system'); + const proj = registry.getProject('proj-e', 'full') as any; + expect(proj.fields.tech_stack).toBe(JSON.stringify(['typescript'])); + }); + + // Review finding #9: free-form list fields share consistent casing + // semantics. The same input string canonicalizes the same way regardless + // of which open-list field it's written to. + it('enrichProject() applies consistent free-list normalization to goals, entities, concerns', () => { + registry.register({ name: 'proj-asym', status: 'active' }); + registry.enrichProject('proj-asym', { + goals: ['Ship It', 'ship it', ' SHIP IT '], + entities: ['React.js', 'react.js'], + concerns: ['Perf', 'PERF', 'perf'], + }); + const proj = registry.getProject('proj-asym', 'full') as any; + // All three free-list fields lowercase-trim-dedupe. + expect(proj.goals).toEqual(['ship it']); + expect(proj.entities).toEqual(['react.js']); + expect(proj.concerns).toEqual(['perf']); + }); + + it('enrichProject() preserves first-occurrence order across free-list fields', () => { + registry.register({ name: 'proj-order', status: 'active' }); + registry.enrichProject('proj-order', { + goals: ['Beta', 'alpha', 'beta'], + entities: ['Z-Service', 'A-Service'], + }); + const proj = registry.getProject('proj-order', 'full') as any; + expect(proj.goals).toEqual(['beta', 'alpha']); + expect(proj.entities).toEqual(['z-service', 'a-service']); + }); + + // Review finding #9 (corollary): topics still gets the open-vocab + // canonicalization path — same input collapses to a canonical slug, + // while entities lowercase-trims the same input. 
The two fields now + // share *consistent* semantics for casing/dedup even though only topics + // applies alias resolution. + it('topics canonicalizes; entities preserves the original lowercase form', () => { + registry.register({ name: 'proj-asym2', status: 'active' }); + registry.enrichProject('proj-asym2', { + topics: ['React.js'], + entities: ['React.js'], + }); + const proj = registry.getProject('proj-asym2', 'full') as any; + // topics: canonicalized via alias to 'react' + expect(proj.topics).toEqual(['react']); + // entities: lowercase-trimmed (kept readable; not slug-normalized) + expect(proj.entities).toEqual(['react.js']); + }); +}); + +describe('vocab.normalizeFreeList (review finding #9)', () => { + it('lowercases, trims, and dedupes', () => { + expect(normalizeFreeList(['Foo', 'foo', ' FOO '])).toEqual(['foo']); + }); + + it('preserves first-occurrence order', () => { + expect(normalizeFreeList(['B', 'A', 'a', 'b'])).toEqual(['b', 'a']); + }); + + it('drops empty values', () => { + expect(normalizeFreeList(['', ' ', 'x'])).toEqual(['x']); + }); + + it('ignores non-string entries', () => { + expect(normalizeFreeList(['a', 42 as unknown as string, null as unknown as string, 'b'])).toEqual(['a', 'b']); + }); +}); diff --git a/packages/mcp/package.json b/packages/mcp/package.json index e95b415..53b7273 100644 --- a/packages/mcp/package.json +++ b/packages/mcp/package.json @@ -1,6 +1,6 @@ { "name": "@setlist/mcp", - "version": "0.6.1-beta.11", + "version": "0.6.1-beta.18", "license": "Apache-2.0", "type": "module", "main": "./dist/index.js", diff --git a/packages/mcp/src/onboarding.ts b/packages/mcp/src/onboarding.ts index 456c92e..3023627 100644 --- a/packages/mcp/src/onboarding.ts +++ b/packages/mcp/src/onboarding.ts @@ -21,16 +21,21 @@ export const ONBOARDING_RESOURCE_DESCRIPTION = export const ONBOARDING_RESOURCE_MIME_TYPE = 'text/markdown'; /** - * Paragraph returned at MCP `initialize`. 
Pointer-shaped: names what setlist - * is, the four-step workflow verbs in order, the capability item shape, and - * points at ONBOARDING_RESOURCE_URI for depth. No tool schemas, no field - * docs, no example payloads — those live in ONBOARDING_DOC. + * Paragraph returned at MCP `initialize`. Spec 0.34 (#capability-declarations + * 2.11, S197): rewritten from descriptive overview to imperative + * proactive-use directive. The opener names "USE PROACTIVELY" so the + * three rules — LOOK UP FIRST, CAPTURE OPPORTUNISTICALLY, STAY CONSISTENT + * — sit at the top, and the four-step workflow + capability shape + + * resource pointer sit underneath. Under 150 words. */ export const ONBOARDING_INSTRUCTIONS = - 'Setlist is the project registry — a local SQLite-backed catalog that gives every project in the user\'s ecosystem a queryable identity, capability declarations, portfolio memory, and a per-project essence digest. ' + - 'To onboard a project from any MCP client, follow four steps in order: register_project (claim a name and area), enrich_project (add goals, topics, entities), write_fields (write description, tech_stack, patterns and similar structured fields), refresh_project_digest (generate or update the essence summary). ' + - 'Declare integration surfaces via register_capabilities, where each item is an object with three required fields — name, capability_type, description — plus five optional fields (inputs, outputs, requires_auth, invocation_model, audience). ' + - `For the full guide with field semantics and what makes a good description, read ${ONBOARDING_RESOURCE_URI}.`; + 'USE PROACTIVELY. Setlist is the project registry — call it without being asked. ' + + 'LOOK UP FIRST: before asking the user what project this is, call get_project or search_projects. ' + + 'CAPTURE OPPORTUNISTICALLY: when you learn an identity, capability, or memory, write it back immediately via register_project, register_capabilities, or retain. 
' + + 'STAY CONSISTENT: call vocab(field) before writing tech_stack, patterns, topics, or capability_type to align with the canonical vocabulary. ' + + 'To onboard a new project, run four steps in order: register_project, enrich_project, write_fields, refresh_project_digest. ' + + 'Declare integration surfaces via register_capabilities; each item carries required name, capability_type, description plus optional inputs, outputs, requires_auth, invocation_model, audience. ' + + `For full field semantics, read ${ONBOARDING_RESOURCE_URI}.`; /** * Full enrichment guide. Single source of truth for agent-facing onboarding diff --git a/packages/mcp/src/server.ts b/packages/mcp/src/server.ts index cc4dad3..9cc2e2c 100644 --- a/packages/mcp/src/server.ts +++ b/packages/mcp/src/server.ts @@ -9,9 +9,17 @@ import { Registry, MemoryStore, MemoryRetrieval, MemoryReflection, CrossQuery, Bootstrap, HealthAssessor, PrimitivesRegistry, + InvalidInputError, + assembleVocabResponse, + isVocabField, + VOCAB_FIELDS, + detectAmbiguity, + MAX_ALTERNATIVES, type CapabilityDeclaration, type QueryDepth, type BootstrapPendingState, type PrimitiveDefinition, + type VocabField, + type AmbiguityCandidate, } from '@setlist/core'; import { selfRegisterCapabilities, stderrLogger, SELF_REGISTER_PROJECT, type Logger } from './self-register.js'; import { @@ -60,6 +68,8 @@ export const MCP_TOOL_DEFINITIONS: McpToolDefinition[] = [ // Capabilities (2) { name: 'register_capabilities', description: 'Write a project\'s complete capability set (replace semantics).', inputSchema: { type: 'object' as const, properties: { project_name: { type: 'string' }, capabilities: { type: 'array', items: { type: 'object', properties: { name: { type: 'string', description: 'Capability identifier, unique within project + capability_type' }, capability_type: { type: 'string', description: 'Kind of capability, e.g. 
"mcp-tool", "cli-command", "library-export"' }, description: { type: 'string' }, inputs: { type: 'string', description: 'Optional input contract' }, outputs: { type: 'string', description: 'Optional output contract' }, requires_auth: { type: 'boolean' }, invocation_model: { type: 'string' }, audience: { type: 'string' } }, required: ['name', 'capability_type', 'description'] } } }, required: ['project_name', 'capabilities'] } }, { name: 'query_capabilities', description: 'Discover capabilities across the ecosystem by project, type, or keyword.', inputSchema: { type: 'object' as const, properties: { project_name: { type: 'string' }, type: { type: 'string' }, keyword: { type: 'string' } } } }, + // Vocabulary (1) + { name: 'vocab', description: 'Inspect the open-vocabulary set for one of the four normalized fields (tech_stack, patterns, topics, capability_type). Returns the editorially-curated canonical slugs, the live in-use values with counts, and the alias reverse-map. Read-only — does NOT write a row to interactions. Idempotent contract.', inputSchema: { type: 'object' as const, properties: { field: { type: 'string', enum: ['tech_stack', 'patterns', 'topics', 'capability_type'], description: 'Which normalized field to inspect.' } }, required: ['field'] } }, // Memory Agent (5) { name: 'retain', description: 'Store a memory. 
Suggestion: use recall() to retrieve.', inputSchema: { type: 'object' as const, properties: { content: { type: 'string' }, type: { type: 'string', enum: ['decision', 'outcome', 'pattern', 'preference', 'dependency', 'correction', 'learning', 'context', 'procedural', 'observation'] }, project: { type: 'string' }, scope: { type: 'string', enum: ['project', 'area', 'portfolio', 'global'] }, tags: { type: 'array', items: { type: 'string' } }, session_id: { type: 'string' }, agent_role: { type: 'string' }, belief: { type: 'string', enum: ['fact', 'opinion', 'hypothesis'] }, extraction_confidence: { type: 'number' }, valid_from: { type: 'string' }, valid_until: { type: 'string' }, entities: { type: 'array', items: { type: 'object', properties: { name: { type: 'string' }, type: { type: 'string' } }, required: ['name', 'type'] } }, parent_version_id: { type: 'string' } }, required: ['content', 'type'] } }, { name: 'recall', description: 'Retrieve relevant memories. Omit query for bootstrap mode. 
Suggestion: use retain() to capture new knowledge.', inputSchema: { type: 'object' as const, properties: { query: { type: 'string' }, project: { type: 'string' }, token_budget: { type: 'number' } } } }, @@ -71,7 +81,7 @@ export const MCP_TOOL_DEFINITIONS: McpToolDefinition[] = [ { name: 'correct', description: 'Create a correction memory superseding an existing one (admin tool).', inputSchema: { type: 'object' as const, properties: { memory_id: { type: 'string' }, correction: { type: 'string' } }, required: ['memory_id', 'correction'] } }, { name: 'forget', description: 'Archive a specific memory (admin tool, soft delete).', inputSchema: { type: 'object' as const, properties: { memory_id: { type: 'string' } }, required: ['memory_id'] } }, { name: 'inspect_memory', description: 'View full memory details including provenance (admin tool).', inputSchema: { type: 'object' as const, properties: { memory_id: { type: 'string' } }, required: ['memory_id'] } }, - { name: 'configure_memory', description: 'Set memory configuration: embedding provider, reflect settings (admin tool).', inputSchema: { type: 'object' as const, properties: { embedding_provider: { type: 'string', enum: ['openai', 'ollama', 'none'] }, reflect_schedule: { type: 'string' }, reflect_threshold: { type: 'number' } } } }, + { name: 'configure_memory', description: 'Set memory configuration: embedding provider, reflect settings, and interactions retention knobs (admin tool). interactions_retention_days defaults to 90, interactions_max_rows_per_project defaults to 10000. The two bounds apply independently — a row is pruned when EITHER older than the retention floor OR the per-project count exceeds the cap (whichever is more restrictive wins). 
Both knobs must be positive integers; non-numeric or non-positive values are rejected.', inputSchema: { type: 'object' as const, properties: { embedding_provider: { type: 'string', enum: ['openai', 'ollama', 'none'] }, reflect_schedule: { type: 'string' }, reflect_threshold: { type: 'number' }, interactions_retention_days: { type: 'number', description: 'Age floor in days for interactions log retention. Positive integer; default 90.' }, interactions_max_rows_per_project: { type: 'number', description: 'Per-project row cap on interactions log. Positive integer; default 10000.' } } } }, // Ports (4) { name: 'claim_port', description: 'Claim a port for a project\'s service.', inputSchema: { type: 'object' as const, properties: { project_name: { type: 'string' }, service_label: { type: 'string' }, port: { type: 'number' }, protocol: { type: 'string', default: 'tcp' } }, required: ['project_name', 'service_label'] } }, { name: 'release_port', description: 'Release a previously claimed port.', inputSchema: { type: 'object' as const, properties: { project_name: { type: 'string' }, port: { type: 'number' } }, required: ['project_name', 'port'] } }, @@ -157,7 +167,8 @@ export function createServer(dbPath?: string, options: CreateServerOptions = {}) if (!options.skipSelfRegister) { const logger = options.logger ?? stderrLogger; try { - const existing = registry.getProject(SELF_REGISTER_PROJECT, 'minimal'); + // Self-register probe — internal call, must not contribute to S204 signal. 
+ const existing = registry.getProject(SELF_REGISTER_PROJECT, 'minimal', { logInteraction: false }); if (!existing) { registry.register({ name: SELF_REGISTER_PROJECT, @@ -253,14 +264,24 @@ export function createServer(dbPath?: string, options: CreateServerOptions = {}) case 'switch_project': result = registry.switchProject(a.name as string); break; - case 'search_projects': - result = registry.searchProjects({ + case 'search_projects': { + // Spec 0.34 (#cross-project 2.9, S208) — review finding #2: the + // payload shape is now consistent. Always return + // `{result: Project[], ambiguous: bool, alternatives: [...]}` so + // callers iterating `response.result` work regardless of whether + // the ambiguity envelope triggered. The pre-0.34 contract "bare + // array on unambiguous queries" produced unpredictable crashes + // when ambiguity surfaced; consistent shape costs a single + // wrapper object and removes the foot-gun. + const env = registry.searchProjectsAmbiguous({ query: a.query as string, type_filter: a.type_filter as string | undefined, status_filter: a.status_filter as string | undefined, area_filter: a.area_filter as string | undefined, }); + result = { result: env.result, ambiguous: env.ambiguous, alternatives: env.alternatives }; break; + } case 'get_registry_stats': result = registry.getRegistryStats(); break; @@ -300,7 +321,9 @@ export function createServer(dbPath?: string, options: CreateServerOptions = {}) // spec 0.29: allow updating email_account (null or "" clears) email_account: (a.email_account === null ? null : a.email_account as string | undefined), }); - result = registry.getProject(a.name as string, 'summary'); + // update_project return — internal post-update read, not a user-initiated + // get_project. Suppress the interactions log to keep S204 signal clean. 
+ result = registry.getProject(a.name as string, 'summary', { logInteraction: false }); break; case 'set_project_area': result = registry.setProjectArea( @@ -389,6 +412,19 @@ export function createServer(dbPath?: string, options: CreateServerOptions = {}) }); break; + // Vocabulary (spec 0.34, S200) — read-only, no interactions row. + case 'vocab': { + const field = a.field as string | undefined; + if (!field || !isVocabField(field)) { + throw new Error( + `vocab: unknown field '${field ?? '(missing)'}'. Supported fields: ${VOCAB_FIELDS.join(', ')}.`, + ); + } + const counts = registry.countVocabInUse(field as VocabField); + result = assembleVocabResponse(field as VocabField, counts); + break; + } + // Memory Agent case 'retain': { const retainResult = memoryStore.retain({ @@ -409,13 +445,30 @@ export function createServer(dbPath?: string, options: CreateServerOptions = {}) result = retainResult; break; } - case 'recall': - result = memoryRetrieval.recall({ + case 'recall': { + // Spec 0.34 (#cross-project 2.9, S208) — review finding #2: always + // envelope-shape so `response.result` is the memories array on + // every call. Ambiguity defaults to false / empty when project + // scope was unambiguous or absent. The pre-0.34 "bare array + // unless ambiguous" contract foot-gunned callers iterating the + // top-level response. + const projectScope = a.project as string | undefined; + let projectAmbiguity: { ambiguous: boolean; alternatives: { name: string; score: number; why: string }[] } = { ambiguous: false, alternatives: [] }; + if (projectScope) { + // Internal ambiguity probe — must not generate a phantom + // `search_projects` interactions row alongside the real `recall`. + // One recall call → exactly one interactions row (surface='recall'). 
+ const env = registry.searchProjectsAmbiguous({ query: projectScope, logInteraction: false }); + if (env.ambiguous) projectAmbiguity = { ambiguous: true, alternatives: env.alternatives }; + } + const memories = memoryRetrieval.recall({ query: a.query as string | undefined, - project_id: a.project as string | undefined, + project_id: projectScope, token_budget: a.token_budget as number | undefined, }); + result = { result: memories, ambiguous: projectAmbiguity.ambiguous, alternatives: projectAmbiguity.alternatives }; break; + } case 'feedback': result = memoryStore.feedback({ memory_ids: a.memory_ids as string[], @@ -449,13 +502,49 @@ export function createServer(dbPath?: string, options: CreateServerOptions = {}) case 'inspect_memory': result = memoryStore.inspectMemory(a.memory_id as string); break; - case 'configure_memory': + case 'configure_memory': { + // Review finding #5: validate the interactions retention knobs at the + // MCP boundary so non-numeric inputs (e.g. "forever") and negative or + // fractional values fail loudly instead of silently coercing to NaN + // and falling back to the 90-day default. + // + // Allowed shape: positive finite integer. The cast `as number` in + // memory.ts does no runtime check, so the boundary owns the contract. + const validateRetentionKnob = (name: string, value: unknown): number | undefined => { + if (value === undefined) return undefined; + if (typeof value !== 'number') { + throw new InvalidInputError( + `configure_memory.${name} must be a positive integer; got ${typeof value} '${String(value)}'.`, + ); + } + if (!Number.isFinite(value)) { + throw new InvalidInputError(`configure_memory.${name} must be finite; got ${value}.`); + } + if (!Number.isInteger(value)) { + throw new InvalidInputError( + `configure_memory.${name} must be an integer; got ${value}. 
Round to a whole number of days/rows.`, + ); + } + if (value <= 0) { + throw new InvalidInputError( + `configure_memory.${name} must be a positive integer; got ${value}.`, + ); + } + return value; + }; + const retentionDays = validateRetentionKnob('interactions_retention_days', a.interactions_retention_days); + const maxRows = validateRetentionKnob('interactions_max_rows_per_project', a.interactions_max_rows_per_project); + result = memoryStore.configureMemory({ embedding_provider: a.embedding_provider as string | undefined, reflect_schedule: a.reflect_schedule as string | undefined, reflect_threshold: a.reflect_threshold as number | undefined, + // Spec 0.34 (S206): interactions retention knobs. + interactions_retention_days: retentionDays, + interactions_max_rows_per_project: maxRows, }); break; + } // Ports case 'claim_port': @@ -498,7 +587,36 @@ export function createServer(dbPath?: string, options: CreateServerOptions = {}) query: a.query as string, scope: a.scope as string | undefined, }); - result = cqResult; + // Spec 0.34 (#cross-project 2.9, S208) — review finding #8: + // (a) only registry-source hits count as registered-project + // alternatives. Memory and cc_memory hits often carry a + // project field of 'global' or a memory project_id that is + // not a registered project; surfacing them as ambiguity + // alternatives misled callers into resolving to non- + // projects. + // (b) replace the inline 15% / slice(1,5) logic with the shared + // `detectAmbiguity` helper + AMBIGUITY_GAP_THRESHOLD + + // MAX_ALTERNATIVES constants from ambiguity.ts so search, + // recall, and cross_query never drift apart again. + // (c) review finding #2: always envelope-shape with + // `ambiguous` and `alternatives` present (defaulting to + // false / empty) for consistency with search_projects and + // recall. 
+ const registryCandidates: AmbiguityCandidate[] = cqResult.results + .filter(r => r.source === 'registry' && typeof r.project === 'string' && r.project.length > 0) + .map(r => ({ + name: r.project, + score: r.score, + why: r.memory_type ? `${r.source}/${r.memory_type}` : r.source, + })) + .sort((a, b) => b.score - a.score) + .slice(0, 1 + MAX_ALTERNATIVES); // top + up to MAX_ALTERNATIVES + const verdict = detectAmbiguity(registryCandidates); + result = { + ...cqResult, + ambiguous: verdict.ambiguous, + alternatives: verdict.alternatives, + }; break; } diff --git a/packages/mcp/tests/introspect-tools.test.ts b/packages/mcp/tests/introspect-tools.test.ts index dde99f4..ec2ff74 100644 --- a/packages/mcp/tests/introspect-tools.test.ts +++ b/packages/mcp/tests/introspect-tools.test.ts @@ -5,8 +5,8 @@ import { MCP_TOOL_DEFINITIONS } from '../src/server.js'; describe('introspectMcpTools (S112)', () => { const caps = introspectMcpTools(); - it('produces exactly 57 capability declarations — one per MCP tool', () => { - expect(caps).toHaveLength(57); + it('produces exactly 58 capability declarations — one per MCP tool', () => { + expect(caps).toHaveLength(58); expect(caps).toHaveLength(MCP_TOOL_DEFINITIONS.length); }); diff --git a/packages/mcp/tests/onboarding.test.ts b/packages/mcp/tests/onboarding.test.ts index 9b2a29a..5b2f1d6 100644 --- a/packages/mcp/tests/onboarding.test.ts +++ b/packages/mcp/tests/onboarding.test.ts @@ -110,6 +110,40 @@ describe('Onboarding instructions on initialize (S135)', () => { expect(ONBOARDING_INSTRUCTIONS).not.toContain('portfolio briefs'); }); + // ── S197: Server instructions are a proactive-use directive ── + + it('S197: instructions are imperative — names USE PROACTIVELY at the top', () => { + expect(ONBOARDING_INSTRUCTIONS).toContain('USE PROACTIVELY'); + // The directive must sit at or near the top — within the first 50 characters. 
+ expect(ONBOARDING_INSTRUCTIONS.indexOf('USE PROACTIVELY')).toBeLessThan(50); + }); + + it('S197: names all three rules — LOOK UP FIRST / CAPTURE OPPORTUNISTICALLY / STAY CONSISTENT', () => { + expect(ONBOARDING_INSTRUCTIONS).toContain('LOOK UP FIRST'); + expect(ONBOARDING_INSTRUCTIONS).toContain('CAPTURE OPPORTUNISTICALLY'); + expect(ONBOARDING_INSTRUCTIONS).toContain('STAY CONSISTENT'); + }); + + it('S197: the LOOK UP FIRST rule names get_project or search_projects', () => { + // Match the rule sentence containing both LOOK UP FIRST and one of the + // expected calls. + const lookUpMatch = ONBOARDING_INSTRUCTIONS.match(/LOOK UP FIRST[^.]*\./); + expect(lookUpMatch, 'LOOK UP FIRST sentence must exist').toBeTruthy(); + expect(lookUpMatch![0]).toMatch(/get_project|search_projects/); + }); + + it('S197: the STAY CONSISTENT rule names vocab', () => { + const consistentMatch = ONBOARDING_INSTRUCTIONS.match(/STAY CONSISTENT[^.]*\./); + expect(consistentMatch, 'STAY CONSISTENT sentence must exist').toBeTruthy(); + expect(consistentMatch![0]).toMatch(/vocab/); + }); + + it('S197: directive sits ABOVE the four-step workflow', () => { + const directiveIdx = ONBOARDING_INSTRUCTIONS.indexOf('USE PROACTIVELY'); + const workflowIdx = ONBOARDING_INSTRUCTIONS.indexOf('register_project'); + expect(directiveIdx).toBeLessThan(workflowIdx); + }); + it('returns the same instructions when the registry is empty', async () => { // No projects registered, no portfolio state. The bootstrap path must // not depend on anything except the constant. 
diff --git a/packages/mcp/tests/self-register-integration.test.ts b/packages/mcp/tests/self-register-integration.test.ts index 5f6f3ab..667a74b 100644 --- a/packages/mcp/tests/self-register-integration.test.ts +++ b/packages/mcp/tests/self-register-integration.test.ts @@ -61,16 +61,16 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, // ── S112: Each type filter returns the expected set by surface ── - it('S112: query_capabilities returns exactly 57 tool rows, all CLI commands, and all library exports', async () => { + it('S112: query_capabilities returns exactly 58 tool rows, all CLI commands, and all library exports', async () => { const server = createServer(dbPath); const tools = await callTool(server, 'query_capabilities', { project_name: SELF_REGISTER_PROJECT, type: 'tool' }) as Array>; - expect(tools).toHaveLength(57); + expect(tools).toHaveLength(58); expect(tools.every(r => r.type === 'tool')).toBe(true); - const cmds = await callTool(server, 'query_capabilities', { project_name: SELF_REGISTER_PROJECT, type: 'cli-command' }) as Array>; + const cmds = await callTool(server, 'query_capabilities', { project_name: SELF_REGISTER_PROJECT, type: 'command' }) as Array>; expect(cmds.length).toBeGreaterThanOrEqual(8); // at minimum: init, migrate, migrate-memories, update, archive, worker, digest, ui - expect(cmds.every(r => r.type === 'cli-command')).toBe(true); + expect(cmds.every(r => r.type === 'command')).toBe(true); const libs = await callTool(server, 'query_capabilities', { project_name: SELF_REGISTER_PROJECT, type: 'library' }) as Array>; expect(libs.length).toBeGreaterThan(0); @@ -84,7 +84,7 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, // Type strings are literal (not 'mcp-tool', not 'cli', not pluralized). 
const types = new Set([...tools, ...cmds, ...libs].map(r => r.type)); - expect(types).toEqual(new Set(['tool', 'cli-command', 'library'])); + expect(types).toEqual(new Set(['tool', 'command', 'library'])); }); // ── S113: Restart → identical capability set ──────────────────── @@ -164,7 +164,7 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, const registry = new Registry(dbPath); const before = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'tool' }); const beforeCount = before.length; - expect(beforeCount).toBe(57); + expect(beforeCount).toBe(58); // Simulate a code change: temporarily splice an extra tool into MCP_TOOL_DEFINITIONS, // remove an existing one, then re-create the server. @@ -196,7 +196,7 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, // After "fixing" the code (restoring the list), the next boot heals the gap. createServer(dbPath); const healed = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'tool' }); - expect(healed.length).toBe(57); + expect(healed.length).toBe(58); const healedNames = new Set(healed.map(r => r.name)); expect(healedNames.has('memory_status')).toBe(true); expect(healedNames.has('__fake_debug_tool')).toBe(false); @@ -212,17 +212,17 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, registry.register({ name: 'chorus-app', type: 'project', status: 'active' }); registry.registerCapabilities('chorus-app', [ { name: 'chorus_tool', capability_type: 'tool', description: 'Chorus tool' }, - { name: 'chorus_cmd', capability_type: 'cli-command', description: 'Chorus CLI' }, + { name: 'chorus_cmd', capability_type: 'command', description: 'Chorus CLI' }, { name: 'ChorusLib', capability_type: 'library', description: 'Chorus library' }, ]); const allTools = registry.queryCapabilities({ capability_type: 'tool' }); - // setlist 57 + chorus 1 = 58 - expect(allTools.length).toBe(58); 
+ // setlist 58 + chorus 1 = 59 + expect(allTools.length).toBe(59); expect(allTools.some(r => r.project === SELF_REGISTER_PROJECT && r.name === 'list_projects')).toBe(true); expect(allTools.some(r => r.project === 'chorus-app' && r.name === 'chorus_tool')).toBe(true); - const cmds = registry.queryCapabilities({ capability_type: 'cli-command' }); + const cmds = registry.queryCapabilities({ capability_type: 'command' }); expect(cmds.some(r => r.project === 'chorus-app' && r.name === 'chorus_cmd')).toBe(true); expect(cmds.some(r => r.project === SELF_REGISTER_PROJECT)).toBe(true); @@ -235,7 +235,7 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, expect(apiEndpoints).toEqual([]); // Combined filter ANDs. - const setlistCmds = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'cli-command' }); + const setlistCmds = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'command' }); expect(setlistCmds.every(r => r.project === SELF_REGISTER_PROJECT)).toBe(true); // Keyword crosses surfaces. @@ -243,7 +243,7 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, const matchedTypes = new Set(digestMatches.map(r => r.type)); // "digest" hits both a tool (refresh_project_digest) and a CLI command (digest). 
expect(matchedTypes.has('tool')).toBe(true); - expect(matchedTypes.has('cli-command')).toBe(true); + expect(matchedTypes.has('command')).toBe(true); }); // ── S117: One introspector fails → others land + warning + server responsive ── @@ -258,16 +258,16 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, description: SETLIST_CANONICAL_DESCRIPTION, area: SETLIST_CANONICAL_AREA, }); - registry.registerCapabilitiesForType(SELF_REGISTER_PROJECT, 'cli-command', [ - { name: 'prior_cmd', capability_type: 'cli-command', description: 'Prior good row' }, + registry.registerCapabilitiesForType(SELF_REGISTER_PROJECT, 'command', [ + { name: 'prior_cmd', capability_type: 'command', description: 'Prior good row' }, ], 'manual'); - // Break registerCapabilitiesForType('cli-command') so the cli introspector's + // Break registerCapabilitiesForType('command') so the cli introspector's // write path throws. (Spying on the prototype affects the Registry instance // that createServer will instantiate too, since it's the same class.) const spy = vi.spyOn(Registry.prototype, 'registerCapabilitiesForType').mockImplementation( function (this: Registry, projectName: string, capabilityType: string, capabilities, producer) { - if (capabilityType === 'cli-command') { + if (capabilityType === 'command') { throw new Error('simulated cli-command write failure'); } // Call through to the original for other types. @@ -284,7 +284,7 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114, // Cleaner approach: run the real createServer once to populate setlist + capabilities, // then spy more narrowly to induce only a cli-command failure on a SECOND boot. 
     createServer(dbPath);
-    const firstRun = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'cli-command' });
+    const firstRun = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'command' });
     expect(firstRun.length).toBeGreaterThan(0); // prior_cmd was replaced by introspected CLI command set

     // For the induced failure, patch via a bound reference on the instance the server will create.
@@ -300,6 +300,9 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114,
     const realMethod = Registry.prototype.registerCapabilitiesForType;
     const spy3 = vi.spyOn(Registry.prototype, 'registerCapabilitiesForType').mockImplementation(
       function (this: Registry, projectName: string, capabilityType: string, capabilities: any, producer?: string) {
+        // Surface label passed by self-register orchestrator is the raw label
+        // ('cli-command'); the registry would normalize it to 'command'
+        // internally, but the mock intercepts before normalization.
         if (capabilityType === 'cli-command') {
           throw new Error('simulated cli-command write failure');
         }
@@ -308,24 +311,24 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114,
     );

     // Seed a fresh prior-good cli-command so we can observe preservation.
-    realMethod.call(registry, SELF_REGISTER_PROJECT, 'cli-command', [
-      { name: 'prior_cmd_2', capability_type: 'cli-command', description: 'Second prior good row' },
+    realMethod.call(registry, SELF_REGISTER_PROJECT, 'command', [
+      { name: 'prior_cmd_2', capability_type: 'command', description: 'Second prior good row' },
     ], 'manual');

     const { logger, messages } = captureLogger();
     const server = createServer(dbPath, { logger });

     // Exactly one warning, naming cli-command.
-    const cliWarn = messages.filter(m => m.includes('cli-command'));
+    const cliWarn = messages.filter(m => m.includes('command'));
     expect(cliWarn).toHaveLength(1);
     expect(cliWarn[0]).toMatch(/simulated cli-command write failure/);

     // Other two surfaces landed.
     const tools = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'tool' });
-    expect(tools.length).toBe(57);
+    expect(tools.length).toBe(58);

     // The failing surface's prior-good rows are preserved.
-    const cli = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'cli-command' });
+    const cli = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'command' });
     expect(cli.some(r => r.name === 'prior_cmd_2')).toBe(true);

     // Server is responsive to a tool call.
@@ -337,7 +340,7 @@ describe('MCP server startup self-registration (goal gate — S112, S113, S114,

     // On a subsequent clean boot, the cli surface heals.
     const clean = createServer(dbPath);
-    const healed = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'cli-command' });
+    const healed = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'command' });
     expect(healed.length).toBeGreaterThan(0);
     expect(healed.every(r => r.name !== 'prior_cmd_2')).toBe(true); // replaced with introspected set
     expect(clean).toBeDefined();
diff --git a/packages/mcp/tests/self-register.test.ts b/packages/mcp/tests/self-register.test.ts
index 1056d2b..b32c92f 100644
--- a/packages/mcp/tests/self-register.test.ts
+++ b/packages/mcp/tests/self-register.test.ts
@@ -45,8 +45,10 @@ describe('selfRegisterCapabilities (S117)', () => {
       const t = row.type as string;
       byType[t] = (byType[t] ?? 0) + 1;
     }
-    expect(byType.tool).toBe(57);
-    expect(byType['cli-command']).toBeGreaterThan(0);
+    // Spec 0.34: capability_type normalizes at the write boundary.
+    // 'cli-command' surface label aliases to canonical slug 'command'.
+    expect(byType.tool).toBe(58);
+    expect(byType.command).toBeGreaterThan(0);
     expect(byType.library).toBeGreaterThan(0);
   });

@@ -100,7 +102,7 @@ describe('selfRegisterCapabilities (S117)', () => {

     // Other two surfaces wrote their rows.
     const tools = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'tool' });
-    expect(tools.length).toBe(57);
+    expect(tools.length).toBe(58);
     const libs = registry.queryCapabilities({ project_name: SELF_REGISTER_PROJECT, capability_type: 'library' });
     expect(libs.length).toBeGreaterThan(0);

diff --git a/packages/mcp/tests/server.test.ts b/packages/mcp/tests/server.test.ts
index 457e8cd..95dfb95 100644
--- a/packages/mcp/tests/server.test.ts
+++ b/packages/mcp/tests/server.test.ts
@@ -43,9 +43,9 @@ describe('MCP Server (S21)', () => {

   // ── Tool Registration ──────────────────────────────────────

-  it('registers exactly 57 tools', async () => {
+  it('registers exactly 58 tools', async () => {
     const tools = await listTools(server);
-    expect(tools).toHaveLength(57);
+    expect(tools).toHaveLength(58);
   });

   it('registers all expected tool names', async () => {
@@ -61,7 +61,7 @@ describe('MCP Server (S21)', () => {
       'portfolio_brief', 'queue_task', 'recall', 'reflect', 'refresh_project_digest',
       'register_capabilities', 'register_project', 'release_port', 'rename_project', 'replace_recipe',
       'retain', 'search_projects', 'set_parent_project', 'set_project_area', 'switch_project',
-      'update_area', 'update_primitive', 'update_project', 'update_project_type', 'query_capabilities', 'write_fields',
+      'update_area', 'update_primitive', 'update_project', 'update_project_type', 'query_capabilities', 'vocab', 'write_fields',
     ].sort());
   });

@@ -129,9 +129,14 @@ describe('MCP Server (S21)', () => {
     await callTool(server, 'register_project', { name: 'auth-service', description: 'Authentication microservice' });
     await callTool(server, 'register_project', { name: 'web-app', description: 'Frontend application' });

-    const results = await callTool(server, 'search_projects', { query: 'authentication' }) as Record[];
-    expect(results).toHaveLength(1);
-    expect(results[0].name).toBe('auth-service');
+    // Spec 0.34 / review finding #2: search_projects always returns the
+    // envelope shape {result, ambiguous, alternatives} so callers iterating
+    // response.result work regardless of ambiguity outcome.
+    const env = await callTool(server, 'search_projects', { query: 'authentication' }) as { result: Record[]; ambiguous: boolean; alternatives: unknown[] };
+    expect(env.result).toHaveLength(1);
+    expect(env.result[0].name).toBe('auth-service');
+    expect(env.ambiguous).toBe(false);
+    expect(env.alternatives).toEqual([]);
   });

   it('get_registry_stats returns counts including per-area distribution and unassigned', async () => {
@@ -279,6 +284,55 @@ describe('MCP Server (S21)', () => {
     expect(byKeyword).toHaveLength(1);
   });

+  // ── Vocabulary Tool (S200) ─────────────────────────────────
+
+  it('vocab returns canonical, in_use, and aliases for tech_stack', async () => {
+    await callTool(server, 'register_project', { name: 'vp-a' });
+    await callTool(server, 'write_fields', {
+      project_name: 'vp-a',
+      fields: { tech_stack: ['TypeScript', 'SQLite'] },
+    });
+    await callTool(server, 'register_project', { name: 'vp-b' });
+    await callTool(server, 'write_fields', {
+      project_name: 'vp-b',
+      fields: { tech_stack: ['ts', 'python'] }, // ts → typescript alias
+    });
+
+    const v = await callTool(server, 'vocab', { field: 'tech_stack' }) as Record;
+    expect(v.field).toBe('tech_stack');
+    expect(v.canonical).toEqual(expect.arrayContaining(['typescript', 'sqlite', 'python']));
+    // typescript counted twice (vp-a + vp-b's `ts` alias both store 'typescript')
+    const tsRow = v.in_use.find((r: any) => r.slug === 'typescript');
+    expect(tsRow?.count).toBe(2);
+    const sqliteRow = v.in_use.find((r: any) => r.slug === 'sqlite');
+    expect(sqliteRow?.count).toBe(1);
+    expect(v.aliases.typescript).toEqual(expect.arrayContaining(['ts', 'TS', 'TypeScript']));
+  });
+
+  it('vocab returns capability_type counts across projects', async () => {
+    await callTool(server, 'register_project', { name: 'vp-c' });
+    await callTool(server, 'register_capabilities', {
+      project_name: 'vp-c',
+      capabilities: [
+        { name: 'tool_a', capability_type: 'MCP tool', description: 'A' },
+        { name: 'cmd_b', capability_type: 'cli-command', description: 'B' },
+      ],
+    });
+    const v = await callTool(server, 'vocab', { field: 'capability_type' }) as Record;
+    const toolRow = v.in_use.find((r: any) => r.slug === 'tool');
+    const cmdRow = v.in_use.find((r: any) => r.slug === 'command');
+    expect(toolRow?.count).toBeGreaterThanOrEqual(1);
+    expect(cmdRow?.count).toBeGreaterThanOrEqual(1);
+    expect(v.canonical).toEqual(expect.arrayContaining(['tool', 'command', 'library']));
+  });
+
+  it('vocab rejects unknown field with helpful error', async () => {
+    const result = await callTool(server, 'vocab', { field: 'unknown_field' }) as string;
+    expect(typeof result).toBe('string');
+    expect(result).toContain('unknown_field');
+    expect(result).toContain('tech_stack');
+  });
+
   // ── Memory Tools ───────────────────────────────────────────

   it('retain creates a memory and recall retrieves it', async () => {
@@ -288,11 +342,13 @@ describe('MCP Server (S21)', () => {
     expect(retained.memory_id).toBeTruthy();
     expect(retained.is_new).toBe(true);

-    const recalled = await callTool(server, 'recall', {
+    // Spec 0.34 / review finding #2: recall always returns the envelope
+    // shape {result, ambiguous, alternatives}.
+    const env = await callTool(server, 'recall', {
       query: 'SQLite storage', project: 'my-proj',
-    }) as Record[];
-    expect(recalled.length).toBeGreaterThan(0);
-    expect(recalled[0].content).toContain('SQLite');
+    }) as { result: Record[]; ambiguous: boolean; alternatives: unknown[] };
+    expect(env.result.length).toBeGreaterThan(0);
+    expect(env.result[0].content).toContain('SQLite');
   });

   it('retain deduplicates identical content', async () => {
@@ -355,6 +411,51 @@ describe('MCP Server (S21)', () => {
     expect(config.embedding_provider).toBe('none');
   });

+  // Review finding #5: validate the interactions retention knobs at the MCP
+  // boundary so invalid types don't silently coerce to NaN (which then falls
+  // back to the 90-day default).
+  it('configure_memory rejects non-numeric interactions_retention_days', async () => {
+    const result = await callTool(server, 'configure_memory', { interactions_retention_days: 'forever' as unknown as number }) as string;
+    expect(typeof result).toBe('string');
+    expect(result).toMatch(/configure_memory\.interactions_retention_days/);
+    expect(result).toMatch(/positive integer/);
+  });
+
+  it('configure_memory rejects negative interactions_retention_days', async () => {
+    const result = await callTool(server, 'configure_memory', { interactions_retention_days: -1 }) as string;
+    expect(typeof result).toBe('string');
+    expect(result).toMatch(/positive integer/);
+  });
+
+  it('configure_memory rejects fractional interactions_retention_days', async () => {
+    const result = await callTool(server, 'configure_memory', { interactions_retention_days: 1.5 }) as string;
+    expect(typeof result).toBe('string');
+    expect(result).toMatch(/must be an integer/);
+  });
+
+  it('configure_memory rejects zero interactions_retention_days', async () => {
+    const result = await callTool(server, 'configure_memory', { interactions_retention_days: 0 }) as string;
+    expect(typeof result).toBe('string');
+    expect(result).toMatch(/positive integer/);
+  });
+
+  it('configure_memory rejects NaN interactions_retention_days', async () => {
+    const result = await callTool(server, 'configure_memory', { interactions_retention_days: Number.NaN }) as string;
+    expect(typeof result).toBe('string');
+    expect(result).toMatch(/finite/);
+  });
+
+  it('configure_memory rejects invalid interactions_max_rows_per_project too', async () => {
+    const result = await callTool(server, 'configure_memory', { interactions_max_rows_per_project: 'huge' as unknown as number }) as string;
+    expect(typeof result).toBe('string');
+    expect(result).toMatch(/configure_memory\.interactions_max_rows_per_project/);
+  });
+
+  it('configure_memory accepts a valid positive integer', async () => {
+    const config = await callTool(server, 'configure_memory', { interactions_retention_days: 30 }) as Record;
+    expect(config.interactions_retention_days).toBe('30');
+  });
+
   it('reflect runs consolidation cycle', async () => {
     await callTool(server, 'retain', { content: 'Something to reflect on', type: 'decision' });

@@ -419,6 +520,63 @@ describe('MCP Server (S21)', () => {
     }) as Record;
     expect(result.results).toBeTruthy();
     expect(result.summary).toBeTruthy();
+    // Review finding #2: always envelope-shape — ambiguous + alternatives present.
+    expect(result).toHaveProperty('ambiguous');
+    expect(result).toHaveProperty('alternatives');
+  });
+
+  // Review finding #2: ambiguous-envelope shape consistency across all three
+  // identity-resolution surfaces. Every call returns {result, ambiguous,
+  // alternatives} regardless of whether ambiguity triggered.
+  it('search_projects returns envelope shape even on zero matches', async () => {
+    const env = await callTool(server, 'search_projects', { query: 'definitely-no-match-here' }) as { result: unknown[]; ambiguous: boolean; alternatives: unknown[] };
+    expect(env.result).toEqual([]);
+    expect(env.ambiguous).toBe(false);
+    expect(env.alternatives).toEqual([]);
+  });
+
+  it('recall returns envelope shape on unambiguous project scope', async () => {
+    await callTool(server, 'register_project', { name: 'unique-proj' });
+    await callTool(server, 'retain', { content: 'x', type: 'pattern', project: 'unique-proj' });
+    const env = await callTool(server, 'recall', { query: 'x', project: 'unique-proj' }) as { result: unknown[]; ambiguous: boolean; alternatives: unknown[] };
+    expect(env.result).toBeInstanceOf(Array);
+    expect(env.ambiguous).toBe(false);
+    expect(env.alternatives).toEqual([]);
+  });
+
+  it('recall returns envelope shape with no project scope (bootstrap mode)', async () => {
+    const env = await callTool(server, 'recall', {}) as { result: unknown[]; ambiguous: boolean; alternatives: unknown[] };
+    expect(env.result).toBeInstanceOf(Array);
+    expect(env.ambiguous).toBe(false);
+    expect(env.alternatives).toEqual([]);
+  });
+
+  // Review finding #8: cross_query alternatives only carry source='registry'
+  // project names. Memory and cc_memory hits never surface as registered-
+  // project alternatives.
+  it('cross_query ambiguity alternatives filter to registry source only', async () => {
+    // Register two projects whose names are equally close to the query.
+    await callTool(server, 'register_project', { name: 'similar-svc-a', description: 'integrates with shared-keyword' });
+    await callTool(server, 'register_project', { name: 'similar-svc-b', description: 'integrates with shared-keyword' });
+    // Retain some 'global' memories that match the query keyword. These must
+    // NOT show up as alternatives.
+    await callTool(server, 'retain', { content: 'shared-keyword reference one', type: 'pattern' });
+    await callTool(server, 'retain', { content: 'shared-keyword reference two', type: 'pattern' });
+
+    const result = await callTool(server, 'cross_query', { query: 'shared-keyword', scope: 'all' }) as { results: { source: string; project: string }[]; ambiguous: boolean; alternatives: { name: string }[] };
+    // Every alternative name corresponds to a registry-source result.
+    if (result.ambiguous && result.alternatives.length > 0) {
+      const registryProjectNames = new Set(
+        result.results.filter(r => r.source === 'registry').map(r => r.project),
+      );
+      for (const alt of result.alternatives) {
+        expect(registryProjectNames.has(alt.name)).toBe(true);
+      }
+      // 'global' (the memory source's project field) must NOT appear.
+      for (const alt of result.alternatives) {
+        expect(alt.name).not.toBe('global');
+      }
+    }
   });

   // ── Health (S69) ──────────────────────────────────────────