illinigirl · illinigirl · Jun 22, 2026 · Jun 22, 2026 · Jun 22, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -8,8 +8,8 @@
     {
       "name": "claude-code-toolkit",
       "source": "./",
-      "description": "Skills: scaffold-mcp (generate a tested MCP server), verify-mcp (check an MCP is green + well-formed with a health report), public-ready (audit + publish a repo), coverage-audit (audit test coverage for negative space, with a completeness validator), handoff (write a PICKUP handoff), design-note (write a structured design note), note (frictionless auto-categorized note-to-self), failure-scan (review a diff against a failure-mode catalog). Plus hooks: failure-mode (auto-flags risky patterns, e.g. unpaginated DynamoDB scans), context-alert (warns when session context crosses a threshold and offers a handoff checkpoint), and coverage-nudge (flags source changes with no test changes and offers a coverage audit).",
-      "version": "0.7.1",
+      "description": "Skills: scaffold-mcp (generate a tested MCP server), verify-mcp (check an MCP is green + well-formed with a health report), public-ready (audit + publish a repo), coverage-audit (audit test coverage for negative space, with a completeness validator), handoff (write a PICKUP handoff), design-note (write a structured design note), orchestrate (recommend a good use of subagents), note (frictionless auto-categorized note-to-self), failure-scan (review a diff against a failure-mode catalog). Plus hooks: failure-mode (auto-flags risky patterns, e.g. unpaginated DynamoDB scans), context-alert (warns when session context crosses a threshold and offers a handoff checkpoint), subagent-nudge (flags parallelizable tasks and offers an orchestration plan), and coverage-nudge (flags source changes with no test changes and offers a coverage audit).",
+      "version": "0.9.0",
       "license": "MIT",
       "keywords": [
         "mcp",

diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,8 +1,8 @@
 {
   "name": "claude-code-toolkit",
   "displayName": "Claude Code Toolkit",
-  "description": "Skills to scaffold MCP servers, verify an MCP is green and well-formed, audit repos for public release, audit test coverage for negative space, write session handoffs, write structured design notes, capture quick notes-to-self, and review a diff against a failure-mode catalog — plus hooks that auto-flag risky patterns, alert when session context crosses a threshold (offering a handoff checkpoint), and nudge when source changes outpace test changes (offering a coverage audit).",
-  "version": "0.8.1",
+  "description": "Skills to scaffold MCP servers, verify an MCP is green and well-formed, audit repos for public release, audit test coverage for negative space, write session handoffs, write structured design notes, recommend a good use of subagents, capture quick notes-to-self, and review a diff against a failure-mode catalog — plus hooks that auto-flag risky patterns, alert when session context crosses a threshold (offering a handoff checkpoint), nudge when a task looks parallelizable (offering an orchestration plan), and nudge when source changes outpace test changes (offering a coverage audit).",
+  "version": "0.9.0",
   "author": {
     "name": "Megan Schott"
   },
@@ -21,6 +21,8 @@
     "design",
     "notes",
     "coverage",
-    "testing"
+    "testing",
+    "subagents",
+    "orchestration"
   ]
 }
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -71,6 +71,8 @@ jobs:
         run: python3 context-alert/test_hook.py
       - name: Coverage-nudge hook regression test
         run: python3 coverage-nudge/test_hook.py
+      - name: Subagent-nudge hook regression test
+        run: python3 subagent-nudge/test_hook.py
       - name: Coverage-audit checklist validator test
         run: python3 skills/coverage-audit/test_checklist.py
       - name: Manifests are valid JSON

diff --git a/README.md b/README.md
@@ -15,6 +15,7 @@ agents and MCP servers.
 | [handoff](skills/handoff/) | `/handoff` | Write a `PICKUP-*.md` session handoff (task, what's done, what's in flight, next steps, key paths, working norms) so a fresh session resumes cold. Stays local, never committed. |
 | [failure-scan](skills/failure-scan/) | `/failure-scan` | Review the current diff against the failure-mode catalog — the judgment classes a hook can't catch (plausible-but-wrong AI output, biased per-segment stats, dispatch-order coupling) — reporting concrete risks with each class's verify questions and fix pattern. |
 | [design-note](skills/design-note/) | `/design-note` | Write a structured `<feature>-DESIGN.md` before building — problem, the core reframe, the design with options + tradeoffs, edge cases, and open questions to decide at build time. |
+| [orchestrate](skills/orchestrate/) | `/orchestrate` | Recommend a good use of subagents — walk a 2-level decision tree (inline / single agent / multi-agent → pattern), defend item independence before any fan-out, hand back a copy-paste Agent/Workflow snippet, and run it only on your go-ahead. |
 | [note](skills/note/) | `/note` | Frictionless note-to-self — dump a thought and it's auto-classified into a bucket (`note`/`skill`/`mcp`/`todo`) under `~/.claude/notes/`, no category-picking. Bare `/note` lists everything; `/note <bucket>` lists one. |
 | [coverage-audit](skills/coverage-audit/) | `/coverage-audit` | Audit a project's coverage for **negative space** — run line coverage, then judge the misses against the gaps agent-written suites predictably leave (empty, boundary, error paths, scale/pagination, time, untested adapters, tests that can't fail). Reports the top gaps ranked by silent-failure risk + the cheapest tests to add, then proves its own completeness with a shipped validator (`checklist.py`). Verify proves it runs; coverage-audit proves it's protected. |
 
@@ -144,6 +145,36 @@ fail).
 `coverage-nudge/test_hook.py` and `skills/coverage-audit/test_checklist.py`
 keep both halves honest.
 
+## Subagent-nudge hook + `/orchestrate`
+
+The mechanisms for delegation (Agent, Workflow) exist and Claude is prompted to
+use them — but nothing *proactively suggests* "this task is subagent-shaped,"
+and the judgment of **whether to fan out, and in what shape** lives implicitly
+in the model's head. This pair externalizes it, in the toolkit's usual
+nudge-hook + judgment-skill shape.
+
+- **A UserPromptSubmit hook** (auto-registered on install) that spots a
+  subagent-shaped prompt — breadth/decomposition signals like "across the whole
+  codebase", "for each X", "find/fix all Y", "comprehensive". Pure regex on the
+  prompt text (no FS scan, no LLM call, ~no latency), **flag-for-review** not
+  prescriptive ("this *might* parallelize; if the items aren't independent,
+  ignore me"), fires **once per session**, and is fully silenceable with
+  `SUBAGENT_NUDGE_DISABLED`. It doesn't orchestrate — it points Claude at
+  `/orchestrate`.
+- **`/orchestrate`** is the judgment half: a **2-level decision tree** — Level 1
+  the quick call (inline / single agent / multi-agent), Level 2 the pattern
+  (parallel-read, worktree-mutate, pipeline, loop-until-dry, adversarial-verify,
+  judge-panel). It **advises → offers → executes only on your go-ahead**, hands
+  back a copy-paste Agent/Workflow snippet, and surfaces the rough token cost +
+  Workflow opt-in.
+- Its load-bearing rule: **parallelism is only valid over INDEPENDENT items.**
+  The skill must state *why* items are independent before recommending fan-out;
+  if it can't, it picks a pipeline or stays inline. This is pinned to the
+  `parallel-without-independence` entry in the failure-mode catalog.
+
+`subagent-nudge/test_hook.py` keeps the trigger boundary honest — breadth fires,
+single-target prompts (`audit this function`) stay silent.
+
 ## Install (as a plugin)
 
 ```text
@@ -181,14 +212,17 @@ claude-code-toolkit/
     design-note/        SKILL.md — write a <feature>-DESIGN.md
     note/               SKILL.md — frictionless note-to-self capture
     coverage-audit/     SKILL.md · checklist.py · test_checklist.py
+    orchestrate/        SKILL.md — the subagent-orchestration decision tree
   hooks/
-    hooks.json          registers the failure-mode, context-alert + coverage-nudge hooks
+    hooks.json          registers the failure-mode, context-alert, subagent-nudge + coverage-nudge hooks
   failure-modes/
     catalog.md          the bug-class catalog (the shared brain)
     rules.json          grep-able rules the hook runs
     hook.py · test_hook.py
   context-alert/
     hook.py · test_hook.py   context-threshold alert -> offers /handoff
+  subagent-nudge/
+    hook.py · test_hook.py   parallelizable-task nudge -> offers /orchestrate
   coverage-nudge/
     hook.py · test_hook.py   source-changed-without-tests nudge -> offers /coverage-audit
   .github/workflows/

diff --git a/failure-modes/catalog.md b/failure-modes/catalog.md
@@ -151,3 +151,28 @@ Detectability tiers:
   cases", the scaffold `SKILL.md` "expect 49 passed" — all silently
   stale from ordinary work. Now pinned by `test_readme.py`
   (book-tracker), which goes red on drift instead of lying.
+
+## parallel-without-independence: Fan-out over non-independent items
+- **Detectability:** judgment
+- **Smell:** Work is split across parallel subagents (or parallel
+  tasks) for speed, but the items aren't actually independent — they
+  share a file, a counter, an ordering, or coordinate via "skip if
+  already handled". The fan-out races or double-acts and the result is
+  silently wrong; the wall-clock win hides the correctness loss.
+- **Signature:** none — behavioral, and it lives in the *plan*, not the
+  code. The tell is a fan-out justified only by "it's faster" with no
+  stated reason the items can't interfere. (Closely related to
+  dispatch-order-coupling, one level up: there the coordination is
+  implicit within one process; here it's across parallel workers.)
+- **Verify:** Can you state in one sentence *why* these items are
+  independent — no shared mutable state, no ordering, no implicit
+  coordination? If two of them ran at the exact same instant, would the
+  outcome still be correct?
+- **Fix pattern:** **Independence is a claim you must defend, not a
+  default.** If you can defend it, fan out. If you can't, model the
+  dependency explicitly — an ordered `pipeline()` (stages, not a
+  barrier), worktree isolation for parallel file mutations, or just stay
+  inline. Speed never licenses parallelism over coupled work.
+- **Origin:** the `/orchestrate` advisor + subagent-nudge hook — the
+  guard that keeps "this is parallelizable" honest rather than merely
+  enthusiastic.
diff --git a/hooks/hooks.json b/hooks/hooks.json
@@ -28,6 +28,11 @@
             "type": "command",
             "command": "python3 ${CLAUDE_PLUGIN_ROOT}/context-alert/hook.py",
             "timeout": 10
+          },
+          {
+            "type": "command",
+            "command": "python3 ${CLAUDE_PLUGIN_ROOT}/subagent-nudge/hook.py",
+            "timeout": 10
           }
         ]
       }

diff --git a/skills/orchestrate/SKILL.md b/skills/orchestrate/SKILL.md
@@ -0,0 +1,82 @@
+---
+name: orchestrate
+description: Recommend a good use of subagents for a task — walk a 2-level decision tree (inline / single agent / multi-agent → pattern), explain why, hand back a ready-to-run Agent or Workflow snippet, and execute it only on the user's go-ahead. Use when a task looks broad or parallelizable ("audit/across the whole codebase", "for each X", "find all Y"), when the subagent-nudge hook fires, or whenever you're unsure whether to delegate and in what shape.
+---
+
+# /orchestrate
+
+Decide whether a task warrants subagents and, if so, which **shape** — then
+advise, offer, and (on go-ahead) run it. You are an *advisor*: the scarce thing
+is the judgment, not another way to launch agents. Never fan out silently.
+
+If invoked with a task, use it. If invoked bare, ask for the task in one line.
+
+## The load-bearing rule (read first)
+
+**Parallelism is only valid when the items are INDEPENDENT** — no shared
+state, no ordering dependency, no implicit coordination (e.g. "skip if already
+handled"). This is the #1 way orchestration goes wrong: work that *looks*
+independent but isn't produces races and wrong results. So:
+
+> Before recommending any fan-out, **state in one sentence why the items are
+> independent.** If you can't, do NOT parallelize — use a pipeline (ordered
+> stages) or stay inline. Cross-reference: `parallel-without-independence` in
+> the failure-mode catalog.
+
+## The decision tree
+
+### Level 1 — the quick call
+
+- **Gate 0 — is delegation even worth it?** Stay **inline (no subagent)** if the
+  task touches ≤1–2 files you already know, is a single lookup, or is
+  inherently sequential with tight feedback (debugging one stack trace).
+  Delegation costs spin-up + context handoff + tokens — don't pay it for small
+  or sequential work.
+- **One bounded, single-perspective chunk that would just bloat your context?**
+  → **a single general-purpose (or `Explore`) subagent.** Delegate to keep your
+  own context clean; no fan-out.
+- **Genuinely decomposable into many items / perspectives?** → **multi-agent**,
+  go to Level 2.
+
+### Level 2 — pick the multi-agent pattern
+
+Walk these in order; take the first that fits:
+
+1. **Known, independent, read-only** (search/understand many files)
+   → **parallel `Explore` agents**, one per area.
+2. **Known, independent, each MUTATES files** (could collide)
+   → **Workflow**, `isolation: 'worktree'` per agent.
+3. **Known items, multi-stage each** (find → verify → fix)
+   → **Workflow `pipeline()`** — no barrier between stages.
+4. **Unknown-size discovery** (find all bugs / every dead flag)
+   → **loop-until-dry**: spawn finders until K consecutive empty rounds.
+5. **A wrong answer is costly** (correctness / security)
+   → **adversarial verify**: N skeptics per finding, kill on majority-refute
+   (or perspective-diverse lenses if it can fail multiple ways).
+6. **Wide solution space** ("best approach to X")
+   → **judge panel**: N independent attempts → score → synthesize the winner.
+
+Patterns compose (e.g. find → dedup → pipeline-verify). Prefer `pipeline()`
+over a `parallel()` barrier unless a stage genuinely needs *all* prior results
+at once (dedup/merge, early-exit on zero).
+
+## Two checks to always surface
+
+- **Independence** — the sentence above. No sentence, no fan-out.
+- **Cost / opt-in** — anything using the **Workflow** tool needs the user's
+  explicit opt-in and can burn many tokens (dozens of agents). Say so, with a
+  rough scale ("~6 readers + a verify pass"), rather than assuming it's allowed.
+
+## Output: advise → offer → execute
+
+1. **Recommend** — name the Level-1 call and (if multi-agent) the pattern, in
+   1–3 lines, with the independence sentence and a rough cost.
+2. **Show the snippet** — a ready-to-run `Agent` call list, or a `Workflow`
+   script, that implements the recommendation. Make it copy-paste real, not
+   pseudocode.
+3. **Offer** — end by asking whether to run it. **Execute only on an explicit
+   go-ahead**; otherwise leave the snippet for the user to run or adjust.
+
+If the honest answer is "this doesn't need subagents," say that plainly — a
+calm "do this inline" is a valid, valuable result. Manufacturing a fan-out to
+look sophisticated is itself a failure mode.