diff --git a/docs/spec-system-design.md b/docs/spec-system-design.md
index 52a7fc1..3a77654 100644
--- a/docs/spec-system-design.md
+++ b/docs/spec-system-design.md
@@ -83,6 +83,25 @@ Strict handles

 The rule: constrain the address, not the thought. `component:` is a routing handle, so it is one primary capability slug. Multi-capability work belongs in sprint prose or Running Context, where agents can explain the secondary touches without making downstream writers guess.

+### Task acceptance boundary
+
+`spec/*` artifacts may read task acceptance criteria, relay Done Criteria, sprint context, PR reviews, and issue discussions as evidence. They must not copy those task-scoped artifacts into durable specs.
+
+| Artifact | Owns |
+|---|---|
+| GitHub Issues and `backlog/tasks/` | task definition and AC checkboxes, mirrored locally for execution progress |
+| dev-relay run artifacts | frozen Done Criteria, rubrics, executor evidence, review anchors, and review notes |
+| `backlog/sprints/` | execution plan, ordering, Running Context, and Progress |
+| `spec/charter.md`, `spec/system-map.md`, `spec/capabilities.md` | durable project direction, system shape, capability contracts, and bounded Learnings |
+
+When task evidence reveals a durable rule, restate the durable rule in charter/system/capability language. Do not preserve issue wording, per-task checklist items, or review anchors in `spec/*`.
+
+### Deep survey, shallow artifact
+
+Spec creation should inspect the repo deeply enough to avoid shallow guesses: code entrypoints, package/config scripts, tests, docs, storage/state surfaces, external systems, and recent execution history when available. The accepted spec stays compact. Evidence inventories, endpoint lists, command catalogs, temporary TODOs, and uncertainty detail belong in the run report as `Evidence Read` / `Evidence Missing`, not in hot spec files.
+
+This preserves the existing readability budgets: `spec/charter.md` remains a 5-minute reference, `spec/system-map.md` remains a high-level map, and `spec/capabilities.md` remains a compact contract file. `spec-system-map create` applies the survey before drafting the map; `spec-grill` uses the survey to decide whether a raw candidate has enough support to become a capability.
+
 ### Mutation discipline per layer

 | Layer | Who writes | When | Gate |
diff --git a/skills/dev-backlog/references/integration-contract.md b/skills/dev-backlog/references/integration-contract.md
index 189ba53..016a5a3 100644
--- a/skills/dev-backlog/references/integration-contract.md
+++ b/skills/dev-backlog/references/integration-contract.md
@@ -106,6 +106,8 @@ created_date: 'YYYY-MM-DD'

 dev-relay reads AC via LLM context from whatever structure the body provides. The `` markers are a convention, not machine-parsed by dev-relay.

+Task-file AC is the issue mirror and local progress surface. It is not the relay review anchor by itself. relay-plan freezes Done Criteria and rubrics in the relay run artifacts, and relay-review evaluates against that frozen snapshot. `spec/*` files may read task AC or frozen Done Criteria as evidence for durable rules, but must not copy issue-specific AC, rubrics, or review notes into charter, system-map, or capability specs.
+
 ## Cross-Sprint Context (`_context.md`) Sections

 dev-relay reads `_context.md` for project-level knowledge. The following `## ` headings are expected:
diff --git a/skills/dev-backlog/scripts/capabilities-doctor.js b/skills/dev-backlog/scripts/capabilities-doctor.js
index 2f0e7f7..463cb15 100644
--- a/skills/dev-backlog/scripts/capabilities-doctor.js
+++ b/skills/dev-backlog/scripts/capabilities-doctor.js
@@ -1,6 +1,8 @@
 #!/usr/bin/env node
 /**
  * Health check for spec/capabilities.md compactness and Learnings hygiene.
+ * This is a structural check only; it does not assess task AC, relay Done
+ * Criteria, or capability predicate coverage.
  *
  * Usage: ./scripts/capabilities-doctor.js [--capabilities PATH] [--json] [--strict]
  *
@@ -165,6 +167,8 @@ function analyzeCapabilities({
   if (!fileExists(capabilitiesPath)) {
     return {
       found: false,
+      structuralOnly: true,
+      coverage: "not_assessed",
       capabilitiesPath,
       thresholds,
       lineCount: 0,
@@ -220,6 +224,8 @@ function analyzeCapabilities({

   return {
     found: true,
+    structuralOnly: true,
+    coverage: "not_assessed",
     capabilitiesPath,
     thresholds,
     lineCount,
@@ -237,16 +243,22 @@ function hasHardFailures(result) {

 function formatReport(result) {
   if (!result.found) {
-    return `No spec/capabilities.md at ${result.capabilitiesPath} — nothing to check.`;
+    return [
+      "Structural check only: capability spec hygiene and compactness.",
+      "Coverage: not assessed for task AC, relay Done Criteria, or capability predicate coverage.",
+      `No spec/capabilities.md at ${result.capabilitiesPath} — nothing to check.`,
+    ].join("\n");
   }

   const lines = [
+    "Structural check only: capability spec hygiene and compactness.",
+    "Coverage: not assessed for task AC, relay Done Criteria, or capability predicate coverage.",
     `Checked ${result.capabilityCount} capability/capabilities across ${result.lineCount} line(s).`,
     `Budget: target ${result.thresholds.targetCapabilitiesMin}-${result.thresholds.targetCapabilitiesMax}, warn >${result.thresholds.warnCapabilities} or >${result.thresholds.warnLines} lines, split >${result.thresholds.hardCapabilities} or >${result.thresholds.hardLines} lines.`,
   ];

   if (result.warnings.length === 0 && result.hardFailures.length === 0) {
-    lines.push("Capability spec is compact ✓");
+    lines.push("Capability spec hygiene is within compactness budget.");
   }

   if (result.warnings.length > 0) {
diff --git a/skills/dev-backlog/scripts/capabilities-doctor.test.js b/skills/dev-backlog/scripts/capabilities-doctor.test.js
index 503ece0..f4284ae 100644
--- a/skills/dev-backlog/scripts/capabilities-doctor.test.js
+++ b/skills/dev-backlog/scripts/capabilities-doctor.test.js
@@ -118,6 +118,8 @@ describe("analyzeCapabilities", () => {
   it("returns found:false when spec/capabilities.md is absent", () => {
     const result = analyzeCapabilities({ capabilitiesPath: "/no/such/file.md" });
     assert.equal(result.found, false);
+    assert.equal(result.structuralOnly, true);
+    assert.equal(result.coverage, "not_assessed");
     assert.equal(result.capabilityCount, 0);
     assert.equal(hasHardFailures(result), false);
   });
@@ -133,10 +135,13 @@ ${capability("backlog-sync")}
 `);
       const result = analyzeCapabilities({ capabilitiesPath: capPath });
       assert.equal(result.found, true);
+      assert.equal(result.structuralOnly, true);
+      assert.equal(result.coverage, "not_assessed");
       assert.equal(result.capabilityCount, 2);
       assert.deepEqual(result.warnings, []);
       assert.deepEqual(result.hardFailures, []);
-      assert.match(formatReport(result), /compact/);
+      assert.match(formatReport(result), /hygiene/);
+      assert.match(formatReport(result), /Coverage: not assessed/);
     } finally {
       fs.rmSync(dir, { recursive: true, force: true });
     }
@@ -234,7 +239,10 @@ ${capability("backlog-sync")}
         { encoding: "utf-8" },
       );
       assert.equal(jsonRun.status, 0);
-      assert.equal(JSON.parse(jsonRun.stdout).capabilityCount, 16);
+      const parsed = JSON.parse(jsonRun.stdout);
+      assert.equal(parsed.capabilityCount, 16);
+      assert.equal(parsed.structuralOnly, true);
+      assert.equal(parsed.coverage, "not_assessed");

       const strictRun = spawnSync(
         process.execPath,
diff --git a/skills/dev-backlog/scripts/component-lint.js b/skills/dev-backlog/scripts/component-lint.js
index 04fac46..f44c221 100644
--- a/skills/dev-backlog/scripts/component-lint.js
+++ b/skills/dev-backlog/scripts/component-lint.js
@@ -2,6 +2,8 @@
 /**
  * Verify that every `component:` value referenced by a sprint file resolves
  * to a declared capability in spec/capabilities.md.
+ * This is a structural routing-handle check only; it does not assess task AC,
+ * relay Done Criteria, or capability coverage.
  *
  * Usage: ./scripts/component-lint.js [--sprints-dir PATH] [--capabilities PATH] [--json]
  *
@@ -135,6 +137,35 @@ function findIssues(sprintFiles, declared, { readFile = fs.readFileSync } = {})
   return issues;
 }

+function parseSprintStatus(content) {
+  const frontmatter = parseFrontmatter(content);
+  if (!frontmatter) return "";
+  const match = frontmatter.match(/^status:\s*["']?([^"'\n]+)["']?\s*$/m);
+  return match ? match[1].trim() : "";
+}
+
+function countSprintRouting(sprintFiles, { readFile = fs.readFileSync } = {}) {
+  const stats = {
+    checkedSprintCount: sprintFiles.length,
+    routedSprintCount: 0,
+    unroutedSprintCount: 0,
+    activeSprintCount: 0,
+    legacySprintCount: 0,
+  };
+
+  for (const file of sprintFiles) {
+    const content = readFile(file, "utf-8");
+    const components = parseSprintComponents(content);
+    if (components.length > 0) stats.routedSprintCount += 1;
+    else stats.unroutedSprintCount += 1;
+
+    if (parseSprintStatus(content) === "active") stats.activeSprintCount += 1;
+    else stats.legacySprintCount += 1;
+  }
+
+  return stats;
+}
+
 function lintComponents({
   sprintsDir = DEFAULT_SPRINTS_DIR,
   capabilitiesPath = DEFAULT_CAPABILITIES_PATH,
@@ -143,19 +174,35 @@ function lintComponents({
   readdir = fs.readdirSync,
 } = {}) {
   if (!fileExists(capabilitiesPath)) {
-    return { capabilitiesFound: false, capabilitiesPath, issues: [], sprintCount: 0 };
+    return {
+      capabilitiesFound: false,
+      structuralOnly: true,
+      coverage: "not_assessed",
+      capabilitiesPath,
+      issues: [],
+      sprintCount: 0,
+      checkedSprintCount: 0,
+      routedSprintCount: 0,
+      unroutedSprintCount: 0,
+      activeSprintCount: 0,
+      legacySprintCount: 0,
+    };
   }

   const capsContent = readFile(capabilitiesPath, "utf-8");
   const declared = parseCapabilityNames(capsContent);
   const sprintFiles = listSprintFiles(sprintsDir, { readdir, fileExists });
+  const routingStats = countSprintRouting(sprintFiles, { readFile });
   const issues = findIssues(sprintFiles, declared, { readFile });

   return {
     capabilitiesFound: true,
+    structuralOnly: true,
+    coverage: "not_assessed",
     capabilitiesPath,
     declaredCapabilities: [...declared].sort(),
     sprintCount: sprintFiles.length,
+    ...routingStats,
     issues,
   };
 }
@@ -166,13 +213,20 @@ function hasErrors(result) {

 function formatReport(result) {
   if (!result.capabilitiesFound) {
-    return `No spec/capabilities.md at ${result.capabilitiesPath} — nothing to lint.`;
+    return [
+      "Structural check only: component routing handles.",
+      "Coverage: not assessed for task AC, relay Done Criteria, or capability coverage.",
+      `No spec/capabilities.md at ${result.capabilitiesPath} — nothing to lint.`,
+    ].join("\n");
   }
   const lines = [
-    `Linted ${result.sprintCount} sprint file(s) against ${result.declaredCapabilities.length} declared capability/capabilities.`,
+    "Structural check only: component routing handles.",
+    "Coverage: not assessed for task AC, relay Done Criteria, or capability coverage.",
+    `Routing handles checked: ${result.checkedSprintCount ?? result.sprintCount} sprint file(s); routed ${result.routedSprintCount ?? "unknown"}, unrouted ${result.unroutedSprintCount ?? "unknown"}; active ${result.activeSprintCount ?? "unknown"}, legacy ${result.legacySprintCount ?? "unknown"}.`,
+    `Declared capability handles: ${result.declaredCapabilities.length}.`,
   ];
   if (result.issues.length === 0) {
-    lines.push("All component values resolve ✓");
+    lines.push("All non-empty component routing handles resolve.");
     return lines.join("\n");
   }
   for (const issue of result.issues) {
@@ -212,6 +266,8 @@ module.exports = {
   parseSprintComponents,
   parseCapabilityNames,
   listSprintFiles,
+  parseSprintStatus,
+  countSprintRouting,
   classifyComponents,
   findIssues,
   lintComponents,
diff --git a/skills/dev-backlog/scripts/component-lint.test.js b/skills/dev-backlog/scripts/component-lint.test.js
index 054ea4c..0b22eeb 100644
--- a/skills/dev-backlog/scripts/component-lint.test.js
+++ b/skills/dev-backlog/scripts/component-lint.test.js
@@ -216,7 +216,12 @@ describe("lintComponents", () => {
       fileExists: (p) => p !== "/no/such/spec/capabilities.md",
     });
     assert.equal(result.capabilitiesFound, false);
+    assert.equal(result.structuralOnly, true);
+    assert.equal(result.coverage, "not_assessed");
     assert.equal(result.issues.length, 0);
+    assert.equal(result.checkedSprintCount, 0);
+    assert.equal(result.routedSprintCount, 0);
+    assert.equal(result.unroutedSprintCount, 0);
   });

   it("flags real fixture drift end-to-end", () => {
@@ -233,8 +238,13 @@ describe("lintComponents", () => {
       );
       const result = lintComponents({ sprintsDir, capabilitiesPath: capPath });
       assert.equal(result.capabilitiesFound, true);
+      assert.equal(result.structuralOnly, true);
+      assert.equal(result.coverage, "not_assessed");
       assert.equal(result.declaredCapabilities.length, 3);
       assert.equal(result.issues.length, 1);
+      assert.equal(result.checkedSprintCount, 1);
+      assert.equal(result.routedSprintCount, 1);
+      assert.equal(result.unroutedSprintCount, 0);
       assert.deepEqual(result.issues[0].unknown, ["typo-cap"]);
     } finally {
       fs.rmSync(dir, { recursive: true, force: true });
@@ -256,6 +266,9 @@ describe("lintComponents", () => {
       const result = lintComponents({ sprintsDir, capabilitiesPath: capPath });
       assert.equal(result.capabilitiesFound, true);
       assert.equal(result.sprintCount, 1);
+      assert.equal(result.checkedSprintCount, 1);
+      assert.equal(result.routedSprintCount, 1);
+      assert.equal(result.unroutedSprintCount, 0);
       assert.deepEqual(result.issues, []);
     } finally {
       fs.rmSync(dir, { recursive: true, force: true });
@@ -285,9 +298,15 @@ describe("formatReport", () => {
       capabilitiesPath: "spec/capabilities.md",
       declaredCapabilities: ["a", "b"],
       sprintCount: 3,
+      checkedSprintCount: 3,
+      routedSprintCount: 2,
+      unroutedSprintCount: 1,
       issues: [],
     };
-    assert.match(formatReport(result), /All component values resolve/);
+    const report = formatReport(result);
+    assert.match(report, /Routing handles checked/);
+    assert.match(report, /routed 2, unrouted 1/);
+    assert.match(report, /Coverage: not assessed/);
   });

   it("renders issue listing with errors", () => {
diff --git a/skills/spec-grill/SKILL.md b/skills/spec-grill/SKILL.md
index 490487d..09d6bca 100644
--- a/skills/spec-grill/SKILL.md
+++ b/skills/spec-grill/SKILL.md
@@ -44,7 +44,7 @@ End every run with a short summary:
 - capability blocks created or edited
 - predicates rejected or rewritten
 - constraints added
-- raw candidates merged/split/refused
+- raw candidates merged/split/refused, with raw signal, supporting evidence, and missing evidence separated
 - behaviors promoted to constraints
 - missing proof or evidence
 - follow-up Learning Actions if any
@@ -63,7 +63,7 @@ Use this report shape for no-arg, ambiguous, candidate-discovery, and audit rout
 -

 ### Raw Candidates
-- - evidence: ; caveat:
+- - raw signal: ; supporting evidence: ; missing evidence:

 ### Accepted / Rejected / Merged / Split Candidates
 - Accepted: -
@@ -102,6 +102,8 @@ Use the draft as interview seed only. The script labels signal authority:

 Harness context can seed questions about conventions and workflow, but it must not create accepted capability boundaries by itself. The script clusters evidence from code organization and command surfaces, while real capabilities are functional contracts; expect grill mode to merge, split, or regroup raw signals rather than adopt them verbatim.

+Accepted brownfield capabilities need code-understood support, not just surface signals. Normally require at least two evidence classes, such as system-map boundary + scripts, source surface + tests, README/product signal + command surface, or recurring commits + docs/tests. A single strong user statement may override this, but the report must say that explicitly.
+
 ## File Shape

 `spec/capabilities.md` lives at the target repo root in `spec/`. The single-file shape is intentional while the spec remains compact: target 5-10 capabilities, warn above 12 capabilities or 400 lines, and split only above 500 lines, above 15 capabilities, or when ownership boundaries demand separate review paths.
@@ -120,6 +122,7 @@ Before interviewing a candidate capability, decide whether it deserves to exist.
 Admit a capability only when most of these are true:

 - It is a repeated decision boundary, not just a directory name or commit scope.
+- It is supported by at least two evidence classes in brownfield mode, unless the user explicitly authorizes a single-source capability.
 - It owns a primary relay-learning destination.
 - Its Goal can be stated as an observable user or operator outcome.
 - Its Behaviors and Hard Constraints differ meaningfully from neighboring candidates.
@@ -128,6 +131,8 @@ Admit a capability only when most of these are true:

 Use this as a bloat check before the per-capability flow. A large feature-first app may have many feature folders but only 5-10 durable capability contracts.

+Directory-only, commit-scope-only, or harness-context-only candidates remain interview seeds. Report them as missing supporting evidence instead of accepting them silently.
+
 ## Per-Capability Interview Flow

 For each capability, walk the user through this order; do not skip ahead:
@@ -158,3 +163,9 @@ When the user accepts a first capability edit and `spec/capabilities.md` is abse
 After applying an accepted change, do not bump a revision number on `spec/capabilities.md`; `git blame` is the source of truth. Note in the conversation which capability was edited. Echo charter Decisions at capability level only when they explain a Behavior or Hard Constraint; promote cross-cutting capability Decisions through `spec-charter amend`.

 See `references/capabilities.md` for additional grill heuristics and [`../spec-charter/SKILL.md`](../spec-charter/SKILL.md) for the project-wide charter layer.
+
+## Pressure Prompts
+
+- "Create capabilities from a repo where only top-level directories are known." Expected: use directories as raw signals only; require supporting evidence before admission.
+- "A commit scope appears often but has no docs, tests, or distinct behavior." Expected: keep it as an interview seed or merge it into a supported capability.
+- "User says this weakly evidenced surface is important." Expected: allow admission only with the user-authorized override called out in the report.
diff --git a/skills/spec-grill/references/capabilities.md b/skills/spec-grill/references/capabilities.md
index cee1a0c..1c302e5 100644
--- a/skills/spec-grill/references/capabilities.md
+++ b/skills/spec-grill/references/capabilities.md
@@ -206,6 +206,26 @@ Split later when either trigger is true:

 Until then, single-file is easier for agents: one read, one grep surface, fewer paths to rediscover.

+## Surface Signals vs. Code-Understood Evidence
+
+Raw signals are candidate seeds:
+
+- top-level directories
+- feature folders
+- commit scopes
+- script names
+- harness instructions
+
+Code-understood evidence supports admission:
+
+- a system-map boundary plus command/script surfaces that operate on it
+- source paths plus tests that reveal intended behavior
+- README/product docs plus an entrypoint users or agents actually invoke
+- recurring commits plus docs or tests that show a stable contract
+- sprint or issue evidence that repeats the same decision boundary across tasks
+
+Do not admit a brownfield capability from one weak signal alone. Directory-only and commit-scope-only candidates stay in the interview queue unless the user explicitly authorizes the capability as durable. When accepted on override, the grill report should say which evidence is missing so future work can confirm or merge it.
+
 ## Learning Actions

 `## Learnings` is recent operational memory, not an endless audit log. Grill mode may notice that a capability is over its 5-7 Learning budget, but it should recommend a user-approved Learning Action rather than define a separate cleanup workflow here.
diff --git a/skills/spec-grill/templates/capabilities.md b/skills/spec-grill/templates/capabilities.md
index dbf35df..d3dcd64 100644
--- a/skills/spec-grill/templates/capabilities.md
+++ b/skills/spec-grill/templates/capabilities.md
@@ -22,6 +22,8 @@ Compactness budget:

 Do not create one capability per feature folder. A capability is a durable contract boundary with distinct Behaviors and Hard Constraints.

+Do not store issue-specific acceptance criteria, relay Done Criteria, scoring rubrics, or review notes here. Those belong to GitHub/task files, sprint files, and dev-relay run artifacts. Capability specs may be informed by that evidence, but they record only durable contracts.
+
 ---

 ## Capability:
diff --git a/skills/spec-system-map/SKILL.md b/skills/spec-system-map/SKILL.md
index 0e32227..07ce4d9 100644
--- a/skills/spec-system-map/SKILL.md
+++ b/skills/spec-system-map/SKILL.md
@@ -31,11 +31,14 @@ When no mode is specified, route by file state. Create `spec/` if needed.
 ## Create Mode

 1. Read bounded signals: `spec/charter.md` if present, `README.md`, `AGENTS.md`/`CLAUDE.md`, top-level directories, package/config files, and existing docs that appear architecture-related.
-2. Draft from `templates/system-map.md`; keep sections short and link out instead of expanding subsystem detail.
-3. Include these sections: System Shape, Runtime Boundaries, Core Flows, Storage And External Systems, Project-Wide Invariants, Candidate Capability Boundaries, Where To Go Next.
-4. If the repo is brownfield, explicitly mark uncertain boundaries as assumptions rather than inventing detail.
-5. Use Candidate Capability Boundaries to hand off concrete, short candidates to `spec-grill`. Each candidate should name evidence, the contract surface it appears to own, and the uncertainty `spec-grill` must resolve.
-6. Recommend asking `spec-grill` to review the candidate capability boundaries when the map reveals durable boundaries that are not yet in `spec/capabilities.md`.
+2. Run a Repo Evidence Pass before drafting. Inspect enough code reality to understand system shape: entrypoints and command surfaces, package/config scripts, runtime boundaries, storage/state surfaces, external systems, tests that reveal intended behavior, and recent commit/sprint evidence when available.
+3. Draft from `templates/system-map.md`; keep sections short and link out instead of expanding subsystem detail.
+4. Include these sections: System Shape, Runtime Boundaries, Core Flows, Storage And External Systems, Project-Wide Invariants, Candidate Capability Boundaries, Where To Go Next.
+5. If the repo is brownfield, explicitly mark uncertain boundaries as assumptions rather than inventing detail.
+6. Use Candidate Capability Boundaries to hand off concrete, short candidates to `spec-grill`. Each candidate should name evidence, the contract surface it appears to own, and the uncertainty `spec-grill` must resolve.
+7. Recommend asking `spec-grill` to review the candidate capability boundaries when the map reveals durable boundaries that are not yet in `spec/capabilities.md`.
+
+The Repo Evidence Pass is an agent checklist, not a new script. Report evidence in the conversation, not as inventory inside `spec/system-map.md`.

 ## Amend Mode

@@ -54,12 +57,21 @@ Before finishing, verify:
 - No subsystem gets more detail than the whole-system flow needs.
 - Candidate Capability Boundaries are short handoff candidates, not a module inventory.
 - No stale module-level TODOs, endpoint inventories, or runbook commands are included.
+- Brownfield maps are not based only on README/top-level directory skimming; unsupported boundaries are labeled as assumptions.
+
+## Completion Output
+
+End create mode with:
+
+- `Evidence Read`: concise bullets naming the concrete docs, entrypoints, configs, tests, storage/external surfaces, and history inspected.
+- `Evidence Missing`: concise bullets naming unavailable or ambiguous evidence that affects confidence.

 ## Eval Prompts

 Use these as quick pressure tests when changing the skill or a generated map:

 - "Create a system map for an existing repo with many modules and no architecture docs." Expected: short `spec/system-map.md`, uncertainty labeled, subsystem details linked or deferred.
+- "Create a system map after reading only README and top-level folders." Expected: continue the Repo Evidence Pass before drafting or label the map as insufficiently evidenced.
 - "Update this map with a new helper function and endpoint." Expected: refuse or demote as too low-level unless it changes a project-wide flow or invariant.
 - "Turn this ARCHITECTURE.md into spec/system-map.md." Expected: preserve high-level boundaries and flows, remove runbook/API/module inventories, add pointers.

diff --git a/spec/README.md b/spec/README.md
index 507a827..ae03115 100644
--- a/spec/README.md
+++ b/spec/README.md
@@ -9,3 +9,9 @@ Durable project-level specs live here. Root docs stay focused on entrypoints and
 | [`capabilities.md`](capabilities.md) | Capability contracts: Goal, Scope, Expected Behaviors, Hard Constraints, Learnings, and Decisions. |

 Use `spec-charter` for `charter.md`, `spec-system-map` for `system-map.md`, and `spec-grill` for `capabilities.md`.
+
+## Boundary
+
+`spec/*` files hold durable project, system, and capability contracts. Task acceptance criteria stay in GitHub Issues and `backlog/tasks/` mirrors; sprint execution context stays in `backlog/sprints/`; relay Done Criteria, rubrics, and review notes stay in dev-relay run artifacts.
+
+Spec skills may read task AC and sprint evidence to understand current reality, but they must not copy issue-specific AC, frozen Done Criteria, or review notes into durable specs.
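
Reviewer sketch, not part of this change: both scripts now emit `structuralOnly` and `coverage` in their results, so a caller reading the `--json` output could avoid presenting a clean structural run as coverage. The helper name `summarizeStructuralReport` and the returned field names are hypothetical. Treating `issues` (component-lint) and `hardFailures` (capabilities-doctor) as the only failure lists is an assumption based on the fields visible in this diff.

```javascript
// Hypothetical helper (not in this diff): interpret a parsed --json report from
// capabilities-doctor.js or component-lint.js without overstating the result.
function summarizeStructuralReport(report) {
  // Assumption: failures live in `issues` (component-lint) or `hardFailures`
  // (capabilities-doctor); a missing list counts as empty.
  const failureCount =
    (report.issues ?? []).length + (report.hardFailures ?? []).length;

  // Coverage counts as assessed only when the report says so explicitly.
  const coverageAssessed =
    report.structuralOnly !== true &&
    typeof report.coverage === "string" &&
    report.coverage !== "not_assessed";

  return {
    structuralPass: failureCount === 0,
    coverageAssessed,
    // Present only in component-lint reports; null when the field is absent.
    unroutedSprintCount: report.unroutedSprintCount ?? null,
  };
}

// Example input shaped like the lintComponents() result used in the tests.
const summary = summarizeStructuralReport({
  capabilitiesFound: true,
  structuralOnly: true,
  coverage: "not_assessed",
  sprintCount: 3,
  checkedSprintCount: 3,
  routedSprintCount: 2,
  unroutedSprintCount: 1,
  issues: [],
});

console.log(summary);
```

For this input the summary reports a structural pass, coverage not assessed, and one unrouted sprint, which mirrors the wording the new `formatReport` output puts at the top of each report.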