19 changes: 19 additions & 0 deletions docs/spec-system-design.md
@@ -83,6 +83,25 @@ Strict handles

The rule: constrain the address, not the thought. `component:` is a routing handle, so it is one primary capability slug. Multi-capability work belongs in sprint prose or Running Context, where agents can explain the secondary touches without making downstream writers guess.

### Task acceptance boundary

`spec/*` artifacts may read task acceptance criteria, relay Done Criteria, sprint context, PR reviews, and issue discussions as evidence. They must not copy those task-scoped artifacts into durable specs.

| Artifact | Owns |
|---|---|
| GitHub Issues and `backlog/tasks/` | task definition and AC checkboxes, mirrored locally for execution progress |
| dev-relay run artifacts | frozen Done Criteria, rubrics, executor evidence, review anchors, and review notes |
| `backlog/sprints/` | execution plan, ordering, Running Context, and Progress |
| `spec/charter.md`, `spec/system-map.md`, `spec/capabilities.md` | durable project direction, system shape, capability contracts, and bounded Learnings |

When task evidence reveals a durable rule, restate the durable rule in charter/system/capability language. Do not preserve issue wording, per-task checklist items, or review anchors in `spec/*`.
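
A minimal sketch of how this boundary might be spot-checked, assuming task-scoped residue shows up in spec text as Markdown checkboxes or as `<!-- AC:BEGIN -->` / `<!-- AC:END -->` markers. `findTaskResidue` and its patterns are hypothetical and not part of this PR; a real check would likely need more signals than these two.

```javascript
// Hypothetical helper (not in the repo): flags lines in a spec file that look
// like task-scoped artifacts copied from issues or task files.
const TASK_RESIDUE_PATTERNS = [
  // Assumption: AC items are written as Markdown checkboxes, e.g. "- [ ] ...".
  { label: "AC checkbox", pattern: /^\s*[-*]\s+\[[ xX]\]\s+/ },
  // Assumption: the AC marker convention expands to separate BEGIN and END comments.
  { label: "AC marker", pattern: /<!--\s*AC:(BEGIN|END)\s*-->/ },
];

function findTaskResidue(specText) {
  const findings = [];
  specText.split("\n").forEach((line, index) => {
    for (const { label, pattern } of TASK_RESIDUE_PATTERNS) {
      if (pattern.test(line)) {
        // Report 1-based line numbers so findings match editor positions.
        findings.push({ line: index + 1, label, text: line.trim() });
      }
    }
  });
  return findings;
}
```

An empty result only means none of these surface patterns matched; it does not show that the spec is free of copied issue wording.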

### Deep survey, shallow artifact

Spec creation should inspect the repo deeply enough to avoid shallow guesses: code entrypoints, package/config scripts, tests, docs, storage/state surfaces, external systems, and recent execution history when available. The accepted spec stays compact. Evidence inventories, endpoint lists, command catalogs, temporary TODOs, and uncertainty detail belong in the run report as `Evidence Read` / `Evidence Missing`, not in hot spec files.

This preserves the existing readability budgets: `spec/charter.md` remains a 5-minute reference, `spec/system-map.md` remains a high-level map, and `spec/capabilities.md` remains a compact contract file. `spec-system-map create` applies the survey before drafting the map; `spec-grill` uses the survey to decide whether a raw candidate has enough support to become a capability.

### Mutation discipline per layer

| Layer | Who writes | When | Gate |
2 changes: 2 additions & 0 deletions skills/dev-backlog/references/integration-contract.md
@@ -106,6 +106,8 @@ created_date: 'YYYY-MM-DD'

dev-relay reads AC via LLM context from whatever structure the body provides. The `<!-- AC:BEGIN/END -->` markers are a convention, not machine-parsed by dev-relay.

Task-file AC is the issue mirror and local progress surface. It is not the relay review anchor by itself. relay-plan freezes Done Criteria and rubrics in the relay run artifacts, and relay-review evaluates against that frozen snapshot. `spec/*` files may read task AC or frozen Done Criteria as evidence for durable rules, but must not copy issue-specific AC, rubrics, or review notes into charter, system-map, or capability specs.

## Cross-Sprint Context (`_context.md`) Sections

dev-relay reads `_context.md` for project-level knowledge. The following `## ` headings are expected:
16 changes: 14 additions & 2 deletions skills/dev-backlog/scripts/capabilities-doctor.js
@@ -1,6 +1,8 @@
#!/usr/bin/env node
/**
* Health check for spec/capabilities.md compactness and Learnings hygiene.
* This is a structural check only; it does not assess task AC, relay Done
* Criteria, or capability predicate coverage.
*
* Usage: ./scripts/capabilities-doctor.js [--capabilities PATH] [--json] [--strict]
*
@@ -165,6 +167,8 @@ function analyzeCapabilities({
if (!fileExists(capabilitiesPath)) {
return {
found: false,
structuralOnly: true,
coverage: "not_assessed",
capabilitiesPath,
thresholds,
lineCount: 0,
@@ -220,6 +224,8 @@ function analyzeCapabilities({

return {
found: true,
structuralOnly: true,
coverage: "not_assessed",
capabilitiesPath,
thresholds,
lineCount,
@@ -237,16 +243,22 @@ function hasHardFailures(result) {

function formatReport(result) {
if (!result.found) {
- return `No spec/capabilities.md at ${result.capabilitiesPath} — nothing to check.`;
return [
"Structural check only: capability spec hygiene and compactness.",
"Coverage: not assessed for task AC, relay Done Criteria, or capability predicate coverage.",
`No spec/capabilities.md at ${result.capabilitiesPath} — nothing to check.`,
].join("\n");
}

const lines = [
"Structural check only: capability spec hygiene and compactness.",
"Coverage: not assessed for task AC, relay Done Criteria, or capability predicate coverage.",
`Checked ${result.capabilityCount} capability/capabilities across ${result.lineCount} line(s).`,
`Budget: target ${result.thresholds.targetCapabilitiesMin}-${result.thresholds.targetCapabilitiesMax}, warn >${result.thresholds.warnCapabilities} or >${result.thresholds.warnLines} lines, split >${result.thresholds.hardCapabilities} or >${result.thresholds.hardLines} lines.`,
];

if (result.warnings.length === 0 && result.hardFailures.length === 0) {
- lines.push("Capability spec is compact ✓");
lines.push("Capability spec hygiene is within compactness budget.");
}

if (result.warnings.length > 0) {
12 changes: 10 additions & 2 deletions skills/dev-backlog/scripts/capabilities-doctor.test.js
@@ -118,6 +118,8 @@ describe("analyzeCapabilities", () => {
it("returns found:false when spec/capabilities.md is absent", () => {
const result = analyzeCapabilities({ capabilitiesPath: "/no/such/file.md" });
assert.equal(result.found, false);
assert.equal(result.structuralOnly, true);
assert.equal(result.coverage, "not_assessed");
assert.equal(result.capabilityCount, 0);
assert.equal(hasHardFailures(result), false);
});
@@ -133,10 +135,13 @@ ${capability("backlog-sync")}
`);
const result = analyzeCapabilities({ capabilitiesPath: capPath });
assert.equal(result.found, true);
assert.equal(result.structuralOnly, true);
assert.equal(result.coverage, "not_assessed");
assert.equal(result.capabilityCount, 2);
assert.deepEqual(result.warnings, []);
assert.deepEqual(result.hardFailures, []);
- assert.match(formatReport(result), /compact/);
assert.match(formatReport(result), /hygiene/);
assert.match(formatReport(result), /Coverage: not assessed/);
} finally {
fs.rmSync(dir, { recursive: true, force: true });
}
@@ -234,7 +239,10 @@ ${capability("backlog-sync")}
{ encoding: "utf-8" },
);
assert.equal(jsonRun.status, 0);
- assert.equal(JSON.parse(jsonRun.stdout).capabilityCount, 16);
const parsed = JSON.parse(jsonRun.stdout);
assert.equal(parsed.capabilityCount, 16);
assert.equal(parsed.structuralOnly, true);
assert.equal(parsed.coverage, "not_assessed");

const strictRun = spawnSync(
process.execPath,
64 changes: 60 additions & 4 deletions skills/dev-backlog/scripts/component-lint.js
@@ -2,6 +2,8 @@
/**
* Verify that every `component:` value referenced by a sprint file resolves
* to a declared capability in spec/capabilities.md.
* This is a structural routing-handle check only; it does not assess task AC,
* relay Done Criteria, or capability coverage.
*
* Usage: ./scripts/component-lint.js [--sprints-dir PATH] [--capabilities PATH] [--json]
*
@@ -135,6 +137,35 @@ function findIssues(sprintFiles, declared, { readFile = fs.readFileSync } = {})
return issues;
}

function parseSprintStatus(content) {
const frontmatter = parseFrontmatter(content);
if (!frontmatter) return "";
const match = frontmatter.match(/^status:\s*["']?([^"'\n]+)["']?\s*$/m);
return match ? match[1].trim() : "";
}

function countSprintRouting(sprintFiles, { readFile = fs.readFileSync } = {}) {
const stats = {
checkedSprintCount: sprintFiles.length,
routedSprintCount: 0,
unroutedSprintCount: 0,
activeSprintCount: 0,
legacySprintCount: 0,
};

for (const file of sprintFiles) {
const content = readFile(file, "utf-8");
const components = parseSprintComponents(content);
if (components.length > 0) stats.routedSprintCount += 1;
else stats.unroutedSprintCount += 1;

if (parseSprintStatus(content) === "active") stats.activeSprintCount += 1;
else stats.legacySprintCount += 1;
}

return stats;
}

function lintComponents({
sprintsDir = DEFAULT_SPRINTS_DIR,
capabilitiesPath = DEFAULT_CAPABILITIES_PATH,
@@ -143,19 +174,35 @@ function lintComponents({
readdir = fs.readdirSync,
} = {}) {
if (!fileExists(capabilitiesPath)) {
- return { capabilitiesFound: false, capabilitiesPath, issues: [], sprintCount: 0 };
return {
capabilitiesFound: false,
structuralOnly: true,
coverage: "not_assessed",
capabilitiesPath,
issues: [],
sprintCount: 0,
checkedSprintCount: 0,
routedSprintCount: 0,
unroutedSprintCount: 0,
activeSprintCount: 0,
legacySprintCount: 0,
};
}
const capsContent = readFile(capabilitiesPath, "utf-8");
const declared = parseCapabilityNames(capsContent);

const sprintFiles = listSprintFiles(sprintsDir, { readdir, fileExists });
const routingStats = countSprintRouting(sprintFiles, { readFile });
const issues = findIssues(sprintFiles, declared, { readFile });

return {
capabilitiesFound: true,
structuralOnly: true,
coverage: "not_assessed",
capabilitiesPath,
declaredCapabilities: [...declared].sort(),
sprintCount: sprintFiles.length,
...routingStats,
issues,
};
}
@@ -166,13 +213,20 @@ function hasErrors(result) {

function formatReport(result) {
if (!result.capabilitiesFound) {
- return `No spec/capabilities.md at ${result.capabilitiesPath} — nothing to lint.`;
return [
"Structural check only: component routing handles.",
"Coverage: not assessed for task AC, relay Done Criteria, or capability coverage.",
`No spec/capabilities.md at ${result.capabilitiesPath} — nothing to lint.`,
].join("\n");
}
const lines = [
- `Linted ${result.sprintCount} sprint file(s) against ${result.declaredCapabilities.length} declared capability/capabilities.`,
"Structural check only: component routing handles.",
"Coverage: not assessed for task AC, relay Done Criteria, or capability coverage.",
`Routing handles checked: ${result.checkedSprintCount ?? result.sprintCount} sprint file(s); routed ${result.routedSprintCount ?? "unknown"}, unrouted ${result.unroutedSprintCount ?? "unknown"}; active ${result.activeSprintCount ?? "unknown"}, legacy ${result.legacySprintCount ?? "unknown"}.`,
`Declared capability handles: ${result.declaredCapabilities.length}.`,
];
if (result.issues.length === 0) {
- lines.push("All component values resolve ✓");
lines.push("All non-empty component routing handles resolve.");
return lines.join("\n");
}
for (const issue of result.issues) {
@@ -212,6 +266,8 @@ module.exports = {
parseSprintComponents,
parseCapabilityNames,
listSprintFiles,
parseSprintStatus,
countSprintRouting,
classifyComponents,
findIssues,
lintComponents,
21 changes: 20 additions & 1 deletion skills/dev-backlog/scripts/component-lint.test.js
@@ -216,7 +216,12 @@ describe("lintComponents", () => {
fileExists: (p) => p !== "/no/such/spec/capabilities.md",
});
assert.equal(result.capabilitiesFound, false);
assert.equal(result.structuralOnly, true);
assert.equal(result.coverage, "not_assessed");
assert.equal(result.issues.length, 0);
assert.equal(result.checkedSprintCount, 0);
assert.equal(result.routedSprintCount, 0);
assert.equal(result.unroutedSprintCount, 0);
});

it("flags real fixture drift end-to-end", () => {
@@ -233,8 +238,13 @@ describe("lintComponents", () => {
);
const result = lintComponents({ sprintsDir, capabilitiesPath: capPath });
assert.equal(result.capabilitiesFound, true);
assert.equal(result.structuralOnly, true);
assert.equal(result.coverage, "not_assessed");
assert.equal(result.declaredCapabilities.length, 3);
assert.equal(result.issues.length, 1);
assert.equal(result.checkedSprintCount, 1);
assert.equal(result.routedSprintCount, 1);
assert.equal(result.unroutedSprintCount, 0);
assert.deepEqual(result.issues[0].unknown, ["typo-cap"]);
} finally {
fs.rmSync(dir, { recursive: true, force: true });
@@ -256,6 +266,9 @@ describe("lintComponents", () => {
const result = lintComponents({ sprintsDir, capabilitiesPath: capPath });
assert.equal(result.capabilitiesFound, true);
assert.equal(result.sprintCount, 1);
assert.equal(result.checkedSprintCount, 1);
assert.equal(result.routedSprintCount, 1);
assert.equal(result.unroutedSprintCount, 0);
assert.deepEqual(result.issues, []);
} finally {
fs.rmSync(dir, { recursive: true, force: true });
@@ -285,9 +298,15 @@ describe("formatReport", () => {
capabilitiesPath: "spec/capabilities.md",
declaredCapabilities: ["a", "b"],
sprintCount: 3,
checkedSprintCount: 3,
routedSprintCount: 2,
unroutedSprintCount: 1,
issues: [],
};
- assert.match(formatReport(result), /All component values resolve/);
const report = formatReport(result);
assert.match(report, /Routing handles checked/);
assert.match(report, /routed 2, unrouted 1/);
assert.match(report, /Coverage: not assessed/);
});

it("renders issue listing with errors", () => {
15 changes: 13 additions & 2 deletions skills/spec-grill/SKILL.md
@@ -44,7 +44,7 @@ End every run with a short summary:
- capability blocks created or edited
- predicates rejected or rewritten
- constraints added
-- raw candidates merged/split/refused
- raw candidates merged/split/refused, with raw signal, supporting evidence, and missing evidence separated
- behaviors promoted to constraints
- missing proof or evidence
- follow-up Learning Actions if any
@@ -63,7 +63,7 @@ Use this report shape for no-arg, ambiguous, candidate-discovery, and audit rout
- <missing charter/system-map/tests/docs/surface that weakens confidence>

### Raw Candidates
-- <candidate> - evidence: <signals>; caveat: <why it is not accepted yet>
- <candidate> - raw signal: <surface>; supporting evidence: <docs/code/tests/history>; missing evidence: <gap>

### Accepted / Rejected / Merged / Split Candidates
- Accepted: <candidate> - <reason>
@@ -102,6 +102,8 @@ Use the draft as interview seed only. The script labels signal authority:

Harness context can seed questions about conventions and workflow, but it must not create accepted capability boundaries by itself. The script clusters evidence from code organization and command surfaces, while real capabilities are functional contracts; expect grill mode to merge, split, or regroup raw signals rather than adopt them verbatim.

Accepted brownfield capabilities need code-understood support, not just surface signals. Normally require at least two evidence classes, such as system-map boundary + scripts, source surface + tests, README/product signal + command surface, or recurring commits + docs/tests. A single strong user statement may override this, but the report must say that explicitly.
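
The evidence rule above can be sketched as a small predicate. This is an illustration only: `assessAdmission`, the `evidenceClasses` labels, and the `userOverride` flag are hypothetical names, and the skill text describes evidence classes by example rather than as a fixed enum. It also covers only the evidence criterion, not the other admission checks.

```javascript
// Hypothetical sketch of the "two evidence classes, or explicit override" rule.
// candidate.evidenceClasses: assumed array of labels such as "source-surface" or "tests".
// candidate.userOverride: assumed flag for an explicit user authorization.
function assessAdmission(candidate) {
  // A Set collapses repeated labels so the same class is not counted twice.
  const classes = new Set(candidate.evidenceClasses ?? []);

  if (classes.size >= 2) {
    return { admit: true, override: false, reason: `${classes.size} evidence classes` };
  }
  if (candidate.userOverride === true) {
    // The report must call out the override explicitly.
    return { admit: true, override: true, reason: "user-authorized single-source capability" };
  }
  return { admit: false, override: false, reason: "interview seed: missing supporting evidence" };
}
```

Returning the `override` flag separately from `admit` keeps the override visible to whatever writes the run report, which is the part the rule insists on.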

## File Shape

`spec/capabilities.md` lives at the target repo root in `spec/`. The single-file shape is intentional while the spec remains compact: target 5-10 capabilities, warn above 12 capabilities or 400 lines, and split only above 500 lines, above 15 capabilities, or when ownership boundaries demand separate review paths.
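
As a rough illustration of these budgets, the sketch below classifies a spec by capability count and line count. The threshold field names mirror the `thresholds` object that `capabilities-doctor.js` prints in its report, but the default values here are taken from the sentence above and are an assumption about the script's actual defaults. `classifyBudget` is a hypothetical name, and the ownership-boundary trigger is not modeled.

```javascript
// Assumed defaults, read off the prose budget: target 5-10, warn >12 or >400 lines,
// split >15 capabilities or >500 lines.
const BUDGET = {
  targetCapabilitiesMin: 5,
  targetCapabilitiesMax: 10,
  warnCapabilities: 12,
  warnLines: 400,
  hardCapabilities: 15,
  hardLines: 500,
};

// Hypothetical helper: returns "split", "warn", or "ok" for a measured spec file.
function classifyBudget({ capabilityCount, lineCount }, budget = BUDGET) {
  // Hard limits are checked first so a file over both limits reports "split".
  if (capabilityCount > budget.hardCapabilities || lineCount > budget.hardLines) {
    return "split";
  }
  if (capabilityCount > budget.warnCapabilities || lineCount > budget.warnLines) {
    return "warn";
  }
  return "ok";
}
```

Both comparisons are strict, so a file with exactly 12 capabilities or exactly 400 lines is still within budget under this reading of "above".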
@@ -120,6 +122,7 @@ Before interviewing a candidate capability, decide whether it deserves to exist.
Admit a capability only when most of these are true:

- It is a repeated decision boundary, not just a directory name or commit scope.
- It is supported by at least two evidence classes in brownfield mode, unless the user explicitly authorizes a single-source capability.
- It owns a primary relay-learning destination.
- Its Goal can be stated as an observable user or operator outcome.
- Its Behaviors and Hard Constraints differ meaningfully from neighboring candidates.
@@ -128,6 +131,8 @@ Admit a capability only when most of these are true:

Use this as a bloat check before the per-capability flow. A large feature-first app may have many feature folders but only 5-10 durable capability contracts.

Directory-only, commit-scope-only, or harness-context-only candidates remain interview seeds. Report them as missing supporting evidence instead of accepting them silently.

## Per-Capability Interview Flow

For each capability, walk the user through this order; do not skip ahead:
@@ -158,3 +163,9 @@ When the user accepts a first capability edit and `spec/capabilities.md` is abse
After applying an accepted change, do not bump a revision number on `spec/capabilities.md`; `git blame` is the source of truth. Note in the conversation which capability was edited. Echo charter Decisions at capability level only when they explain a Behavior or Hard Constraint; promote cross-cutting capability Decisions through `spec-charter amend`.

See `references/capabilities.md` for additional grill heuristics and [`../spec-charter/SKILL.md`](../spec-charter/SKILL.md) for the project-wide charter layer.

## Pressure Prompts

- "Create capabilities from a repo where only top-level directories are known." Expected: use directories as raw signals only; require supporting evidence before admission.
- "A commit scope appears often but has no docs, tests, or distinct behavior." Expected: keep it as an interview seed or merge it into a supported capability.
- "User says this weakly evidenced surface is important." Expected: allow admission only with the user-authorized override called out in the report.
20 changes: 20 additions & 0 deletions skills/spec-grill/references/capabilities.md
@@ -206,6 +206,26 @@ Split later when either trigger is true:

Until then, single-file is easier for agents: one read, one grep surface, fewer paths to rediscover.

## Surface Signals vs. Code-Understood Evidence

Raw signals are candidate seeds:

- top-level directories
- feature folders
- commit scopes
- script names
- harness instructions

Code-understood evidence supports admission:

- a system-map boundary plus command/script surfaces that operate on it
- source paths plus tests that reveal intended behavior
- README/product docs plus an entrypoint users or agents actually invoke
- recurring commits plus docs or tests that show a stable contract
- sprint or issue evidence that repeats the same decision boundary across tasks

Do not admit a brownfield capability from one weak signal alone. Directory-only and commit-scope-only candidates stay in the interview queue unless the user explicitly authorizes the capability as durable. When accepted on override, the grill report should say which evidence is missing so future work can confirm or merge it.
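
One way to keep the missing evidence visible is to render each raw candidate with its raw signal, supporting evidence, and missing evidence in separate fields. The sketch below is hypothetical: `formatRawCandidate` and its parameter names are illustrative, and the output only approximates the Raw Candidates line shape used in the grill report.

```javascript
// Hypothetical formatter for one Raw Candidates report line.
// Empty evidence lists render as "none" so a gap is stated, not silently omitted.
function formatRawCandidate({ name, rawSignal, supportingEvidence = [], missingEvidence = [] }) {
  const supporting = supportingEvidence.length > 0 ? supportingEvidence.join(", ") : "none";
  const missing = missingEvidence.length > 0 ? missingEvidence.join(", ") : "none";
  return `- ${name} - raw signal: ${rawSignal}; supporting evidence: ${supporting}; missing evidence: ${missing}`;
}
```

For a directory-only candidate, the supporting field would read `none` and the missing field would list the gaps, which is the information a later run needs to confirm or merge the capability.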

## Learning Actions

`## Learnings` is recent operational memory, not an endless audit log. Grill mode may notice that a capability is over its 5-7 Learning budget, but it should recommend a user-approved Learning Action rather than define a separate cleanup workflow here.
2 changes: 2 additions & 0 deletions skills/spec-grill/templates/capabilities.md
@@ -22,6 +22,8 @@ Compactness budget:

Do not create one capability per feature folder. A capability is a durable contract boundary with distinct Behaviors and Hard Constraints.

Do not store issue-specific acceptance criteria, relay Done Criteria, scoring rubrics, or review notes here. Those belong to GitHub/task files, sprint files, and dev-relay run artifacts. Capability specs may be informed by that evidence, but they record only durable contracts.

---

## Capability: <slug>