Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,8 @@
{
"name": "claude-code-toolkit",
"source": "./",
"description": "Skills: scaffold-mcp (generate a tested MCP server), verify-mcp (check an MCP is green + well-formed with a health report), public-ready (audit + publish a repo), coverage-audit (audit test coverage for negative space, with a completeness validator), handoff (write a PICKUP handoff), design-note (write a structured design note), note (frictionless auto-categorized note-to-self), failure-scan (review a diff against a failure-mode catalog). Plus hooks: failure-mode (auto-flags risky patterns, e.g. unpaginated DynamoDB scans), context-alert (warns when session context crosses a threshold and offers a handoff checkpoint), and coverage-nudge (flags source changes with no test changes and offers a coverage audit).",
"version": "0.7.1",
"description": "Skills: scaffold-mcp (generate a tested MCP server), verify-mcp (check an MCP is green + well-formed with a health report), public-ready (audit + publish a repo), coverage-audit (audit test coverage for negative space, with a completeness validator), handoff (write a PICKUP handoff), design-note (write a structured design note), orchestrate (recommend a good use of subagents), note (frictionless auto-categorized note-to-self), failure-scan (review a diff against a failure-mode catalog). Plus hooks: failure-mode (auto-flags risky patterns, e.g. unpaginated DynamoDB scans), context-alert (warns when session context crosses a threshold and offers a handoff checkpoint), subagent-nudge (flags parallelizable tasks and offers an orchestration plan), and coverage-nudge (flags source changes with no test changes and offers a coverage audit).",
"version": "0.9.0",
"license": "MIT",
"keywords": [
"mcp",
Expand Down
8 changes: 5 additions & 3 deletions .claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"name": "claude-code-toolkit",
"displayName": "Claude Code Toolkit",
"description": "Skills to scaffold MCP servers, verify an MCP is green and well-formed, audit repos for public release, audit test coverage for negative space, write session handoffs, write structured design notes, capture quick notes-to-self, and review a diff against a failure-mode catalog — plus hooks that auto-flag risky patterns, alert when session context crosses a threshold (offering a handoff checkpoint), and nudge when source changes outpace test changes (offering a coverage audit).",
"version": "0.8.1",
"description": "Skills to scaffold MCP servers, verify an MCP is green and well-formed, audit repos for public release, audit test coverage for negative space, write session handoffs, write structured design notes, recommend a good use of subagents, capture quick notes-to-self, and review a diff against a failure-mode catalog — plus hooks that auto-flag risky patterns, alert when session context crosses a threshold (offering a handoff checkpoint), nudge when a task looks parallelizable (offering an orchestration plan), and nudge when source changes outpace test changes (offering a coverage audit).",
"version": "0.9.0",
"author": {
"name": "Megan Schott"
},
Expand All @@ -21,6 +21,8 @@
"design",
"notes",
"coverage",
"testing"
"testing",
"subagents",
"orchestration"
]
}
2 changes: 2 additions & 0 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,8 @@ jobs:
run: python3 context-alert/test_hook.py
- name: Coverage-nudge hook regression test
run: python3 coverage-nudge/test_hook.py
- name: Subagent-nudge hook regression test
run: python3 subagent-nudge/test_hook.py
- name: Coverage-audit checklist validator test
run: python3 skills/coverage-audit/test_checklist.py
- name: Manifests are valid JSON
Expand Down
36 changes: 35 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ agents and MCP servers.
| [handoff](skills/handoff/) | `/handoff` | Write a `PICKUP-*.md` session handoff (task, what's done, what's in flight, next steps, key paths, working norms) so a fresh session resumes cold. Stays local, never committed. |
| [failure-scan](skills/failure-scan/) | `/failure-scan` | Review the current diff against the failure-mode catalog — the judgment classes a hook can't catch (plausible-but-wrong AI output, biased per-segment stats, dispatch-order coupling) — reporting concrete risks with each class's verify questions and fix pattern. |
| [design-note](skills/design-note/) | `/design-note` | Write a structured `<feature>-DESIGN.md` before building — problem, the core reframe, the design with options + tradeoffs, edge cases, and open questions to decide at build time. |
| [orchestrate](skills/orchestrate/) | `/orchestrate` | Recommend a good use of subagents — walk a 2-level decision tree (inline / single agent / multi-agent → pattern), defend item independence before any fan-out, hand back a copy-paste Agent/Workflow snippet, and run it only on your go-ahead. |
| [note](skills/note/) | `/note` | Frictionless note-to-self — dump a thought and it's auto-classified into a bucket (`note`/`skill`/`mcp`/`todo`) under `~/.claude/notes/`, no category-picking. Bare `/note` lists everything; `/note <bucket>` lists one. |
| [coverage-audit](skills/coverage-audit/) | `/coverage-audit` | Audit a project's coverage for **negative space** — run line coverage, then judge the misses against the gaps agent-written suites predictably leave (empty, boundary, error paths, scale/pagination, time, untested adapters, tests that can't fail). Reports the top gaps ranked by silent-failure risk + the cheapest tests to add, then proves its own completeness with a shipped validator (`checklist.py`). Verify proves it runs; coverage-audit proves it's protected. |

Expand Down Expand Up @@ -144,6 +145,36 @@ fail).
`coverage-nudge/test_hook.py` and `skills/coverage-audit/test_checklist.py`
keep both halves honest.

## Subagent-nudge hook + `/orchestrate`

The mechanisms for delegation (Agent, Workflow) exist and Claude is prompted to
use them — but nothing *proactively suggests* "this task is subagent-shaped,"
and the judgment of **whether to fan out, and in what shape** lives implicitly
in the model's head. This pair externalizes it, in the toolkit's usual
nudge-hook + judgment-skill shape.

- **A UserPromptSubmit hook** (auto-registered on install) that spots a
subagent-shaped prompt — breadth/decomposition signals like "across the whole
codebase", "for each X", "find/fix all Y", "comprehensive". Pure regex on the
prompt text (no FS scan, no LLM call, ~no latency), **flag-for-review** not
prescriptive ("this *might* parallelize; if the items aren't independent,
ignore me"), fires **once per session**, and is fully silenceable with
`SUBAGENT_NUDGE_DISABLED`. It doesn't orchestrate — it points Claude at
`/orchestrate`.
- **`/orchestrate`** is the judgment half: a **2-level decision tree** — Level 1
the quick call (inline / single agent / multi-agent), Level 2 the pattern
(parallel-read, worktree-mutate, pipeline, loop-until-dry, adversarial-verify,
judge-panel). It **advises → offers → executes only on your go-ahead**, hands
back a copy-paste Agent/Workflow snippet, and surfaces the rough token cost +
Workflow opt-in.
- Its load-bearing rule: **parallelism is only valid over INDEPENDENT items.**
The skill must state *why* items are independent before recommending fan-out;
if it can't, it picks a pipeline or stays inline. This is pinned to the
`parallel-without-independence` entry in the failure-mode catalog.

`subagent-nudge/test_hook.py` keeps the trigger boundary honest — breadth fires,
single-target prompts (`audit this function`) stay silent.

## Install (as a plugin)

```text
Expand Down Expand Up @@ -181,14 +212,17 @@ claude-code-toolkit/
design-note/ SKILL.md — write a <feature>-DESIGN.md
note/ SKILL.md — frictionless note-to-self capture
coverage-audit/ SKILL.md · checklist.py · test_checklist.py
orchestrate/ SKILL.md — the subagent-orchestration decision tree
hooks/
hooks.json registers the failure-mode, context-alert + coverage-nudge hooks
hooks.json registers the failure-mode, context-alert, subagent-nudge + coverage-nudge hooks
failure-modes/
catalog.md the bug-class catalog (the shared brain)
rules.json grep-able rules the hook runs
hook.py · test_hook.py
context-alert/
hook.py · test_hook.py context-threshold alert -> offers /handoff
subagent-nudge/
hook.py · test_hook.py parallelizable-task nudge -> offers /orchestrate
coverage-nudge/
hook.py · test_hook.py source-changed-without-tests nudge -> offers /coverage-audit
.github/workflows/
Expand Down
25 changes: 25 additions & 0 deletions failure-modes/catalog.md
Original file line number Diff line number Diff line change
Expand Up @@ -151,3 +151,28 @@ Detectability tiers:
cases", the scaffold `SKILL.md` "expect 49 passed" — all silently
stale from ordinary work. Now pinned by `test_readme.py`
(book-tracker), which goes red on drift instead of lying.

## parallel-without-independence: Fan-out over non-independent items
- **Detectability:** judgment
- **Smell:** Work is split across parallel subagents (or parallel
tasks) for speed, but the items aren't actually independent — they
share a file, a counter, an ordering, or coordinate via "skip if
already handled". The fan-out races or double-acts and the result is
silently wrong; the wall-clock win hides the correctness loss.
- **Signature:** none — behavioral, and it lives in the *plan*, not the
code. The tell is a fan-out justified only by "it's faster" with no
stated reason the items can't interfere. (Closely related to
dispatch-order-coupling, one level up: there the coordination is
implicit within one process; here it's across parallel workers.)
- **Verify:** Can you state in one sentence *why* these items are
independent — no shared mutable state, no ordering, no implicit
coordination? If two of them ran at the exact same instant, would the
outcome still be correct?
- **Fix pattern:** **Independence is a claim you must defend, not a
default.** If you can defend it, fan out. If you can't, model the
dependency explicitly — an ordered `pipeline()` (stages, not a
barrier), worktree isolation for parallel file mutations, or just stay
inline. Speed never licenses parallelism over coupled work.
- **Origin:** the `/orchestrate` advisor + subagent-nudge hook — the
guard that keeps "this is parallelizable" honest rather than merely
enthusiastic.
5 changes: 5 additions & 0 deletions hooks/hooks.json
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,11 @@
"type": "command",
"command": "python3 ${CLAUDE_PLUGIN_ROOT}/context-alert/hook.py",
"timeout": 10
},
{
"type": "command",
"command": "python3 ${CLAUDE_PLUGIN_ROOT}/subagent-nudge/hook.py",
"timeout": 10
}
]
}
Expand Down
82 changes: 82 additions & 0 deletions skills/orchestrate/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
---
name: orchestrate
description: Recommend a good use of subagents for a task — walk a 2-level decision tree (inline / single agent / multi-agent → pattern), explain why, hand back a ready-to-run Agent or Workflow snippet, and execute it only on the user's go-ahead. Use when a task looks broad or parallelizable ("audit/across the whole codebase", "for each X", "find all Y"), when the subagent-nudge hook fires, or whenever you're unsure whether to delegate and in what shape.
---

# /orchestrate

Decide whether a task warrants subagents and, if so, which **shape** — then
advise, offer, and (on go-ahead) run it. You are an *advisor*: the scarce thing
is the judgment, not another way to launch agents. Never fan out silently.

If invoked with a task, use it. If invoked bare, ask for the task in one line.

## The load-bearing rule (read first)

**Parallelism is only valid when the items are INDEPENDENT** — no shared
state, no ordering dependency, no implicit coordination (e.g. "skip if already
handled"). This is the #1 way orchestration goes wrong: work that *looks*
independent but isn't produces races and wrong results. So:

> Before recommending any fan-out, **state in one sentence why the items are
> independent.** If you can't, do NOT parallelize — use a pipeline (ordered
> stages) or stay inline. Cross-reference: `parallel-without-independence` in
> the failure-mode catalog.

## The decision tree

### Level 1 — the quick call

- **Gate 0 — is delegation even worth it?** Stay **inline (no subagent)** if the
task touches ≤1–2 files you already know, is a single lookup, or is
inherently sequential with tight feedback (debugging one stack trace).
Delegation costs spin-up + context handoff + tokens — don't pay it for small
or sequential work.
- **One bounded, single-perspective chunk that would just bloat your context?**
→ **a single general-purpose (or `Explore`) subagent.** Delegate to keep your
own context clean; no fan-out.
- **Genuinely decomposable into many items / perspectives?** → **multi-agent**,
go to Level 2.

### Level 2 — pick the multi-agent pattern

Walk these in order; take the first that fits:

1. **Known, independent, read-only** (search/understand many files)
→ **parallel `Explore` agents**, one per area.
2. **Known, independent, each MUTATES files** (could collide)
→ **Workflow**, `isolation: 'worktree'` per agent.
3. **Known items, multi-stage each** (find → verify → fix)
→ **Workflow `pipeline()`** — no barrier between stages.
4. **Unknown-size discovery** (find all bugs / every dead flag)
→ **loop-until-dry**: spawn finders until K consecutive empty rounds.
5. **A wrong answer is costly** (correctness / security)
→ **adversarial verify**: N skeptics per finding, kill on majority-refute
(or perspective-diverse lenses if it can fail multiple ways).
6. **Wide solution space** ("best approach to X")
→ **judge panel**: N independent attempts → score → synthesize the winner.

Patterns compose (e.g. find → dedup → pipeline-verify). Prefer `pipeline()`
over a `parallel()` barrier unless a stage genuinely needs *all* prior results
at once (dedup/merge, early-exit on zero).

## Two checks to always surface

- **Independence** — the sentence above. No sentence, no fan-out.
- **Cost / opt-in** — anything using the **Workflow** tool needs the user's
explicit opt-in and can burn many tokens (dozens of agents). Say so, with a
rough scale ("~6 readers + a verify pass"), rather than assuming it's allowed.

## Output: advise → offer → execute

1. **Recommend** — name the Level-1 call and (if multi-agent) the pattern, in
1–3 lines, with the independence sentence and a rough cost.
2. **Show the snippet** — a ready-to-run `Agent` call list, or a `Workflow`
script, that implements the recommendation. Make it copy-paste real, not
pseudocode.
3. **Offer** — end by asking whether to run it. **Execute only on an explicit
go-ahead**; otherwise leave the snippet for the user to run or adjust.

If the honest answer is "this doesn't need subagents," say that plainly — a
calm "do this inline" is a valid, valuable result. Manufacturing a fan-out to
look sophisticated is itself a failure mode.
Loading