
Per-plan PR model: each plan ships as its own stacked/parallel PR #26

Open
andrewemark wants to merge 3 commits into main from amark/stacked-plan-prs


Why this change: slowing down to speed up

We're introducing a deliberately more broken-out PR process (one PR per plan, each gated by /drvr:assess → /drvr:docs-artifacts → /drvr:open-pr) so we can review each plan's actual output in isolation and see where the agent's implementation or our planning falls short. A feature-level PR makes those gaps easy to miss; a per-plan PR forces them to the surface.

The eventual goal is to NOT need this much process. Once we trust the agent's planning + implementation cycle enough that per-plan review stops catching meaningful gaps, we can collapse this back toward fewer, larger PRs. Until then, slowing down here lets us:

  • See where the agent under-specifies plans or over-builds implementations
  • Revise the SDLC process itself based on real friction, not speculation
  • Get features across the line in the architectural form we actually want, not the form that emerges from accumulated drift across one large PR

TL;DR: slowing down to speed up.


What this PR does

Restructures the SDLC post-implementation flow so each plan in a feature ships as its own pull request with self-contained PR-body documentation, instead of one feature-level PR at the end.

Mental model: DAG of bases (not a linear stack)

Each plan's PR Base Branch is derived from its depends_on:

  • depends_on: [] → Base = feature parent (e.g. main). Independent plans get parallel PRs.
  • depends_on: [N] → Base = upstream plan N's Feature Branch. Dependent plans get stacked PRs.
  • depends_on: [N, M] → user picks one as the branch parent; others satisfied by interface contracts.

Plans can be implemented in any dependency-respecting order. Plan 8 before Plan 2 is fine if they're independent — both PRs target the feature parent in parallel.
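A minimal Python sketch of the Base Branch rule above. The `Plan` record, its field names, and the `amark/...` branch names are hypothetical stand-ins for what the plugin records in each plan's Environment. The multi-dependency case raises until the user's choice is supplied, mirroring the "user picks one" rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plan:
    # Hypothetical stand-in for a plan's Environment; field names are illustrative.
    name: str
    feature_branch: str
    depends_on: list[str] = field(default_factory=list)


def derive_base_branch(plan: Plan, plans: dict[str, Plan], feature_parent: str,
                       chosen_parent: Optional[str] = None) -> str:
    """Return the PR Base Branch for `plan` from its depends_on list."""
    if not plan.depends_on:
        # Independent plan: parallel PR against the feature parent.
        return feature_parent
    if len(plan.depends_on) == 1:
        # One upstream: stack on that plan's Feature Branch.
        return plans[plan.depends_on[0]].feature_branch
    # Several upstreams: the user picks one as the branch parent;
    # the others are satisfied by interface contracts, not git ancestry.
    if chosen_parent not in plan.depends_on:
        raise ValueError(f"{plan.name} depends on {plan.depends_on}; "
                         "pick one as the branch parent")
    return plans[chosen_parent].feature_branch


plans = {
    "02-api": Plan("02-api", "amark/02-api"),
    "08-cli": Plan("08-cli", "amark/08-cli"),
    "09-docs": Plan("09-docs", "amark/09-docs", depends_on=["02-api"]),
}
# Plan 08 can ship before Plan 02: both are independent and target main in parallel.
for name in ("08-cli", "02-api", "09-docs"):
    print(name, "->", derive_base_branch(plans[name], plans, "main"))
```

Because the base is computed per plan rather than from position in a list, implementation order doesn't affect it: only the depends_on edges do.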

Per-plan PR gate (REQUIRED before next plan)

Plan <name> bookkeeping complete
  → /drvr:assess <plan>           writes assessment/<plan>-test-curation.md
  → [/drvr:review <plan>]         only if assess found FAIL violations
  → /drvr:docs-artifacts <plan>   writes driver-docs/<plan>/* (PR body content)
  → /drvr:open-pr <plan>          push branch, gh pr create --base <plan's Base Branch>
  → THEN next unblocked plan

Each step is surfaced explicitly in sdlc-orchestration and implementation-guidance so the agent doesn't collapse the chain or skip ahead.
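The gate reads as a small state machine: given which steps a plan has completed, there is exactly one next command. A Python sketch, assuming the completed-step set is already known (for example, derived from FEATURE_LOG.md); the step names and return strings are illustrative, not the plugin's actual interface.

```python
GATE_STEPS = ("assess", "docs-artifacts", "open-pr")


def next_gate_command(plan: str, done: set[str], assess_found_fail: bool = False) -> str:
    """Return the next command to surface for `plan`, never skipping a step."""
    for step in GATE_STEPS:
        if step == "docs-artifacts" and assess_found_fail and "review" not in done:
            # /drvr:review is inserted only when assess reported FAIL violations.
            return f"/drvr:review {plan}"
        if step not in done:
            return f"/drvr:{step} {plan}"
    # All gate steps are done: only now may the next unblocked plan start.
    return "next unblocked plan"


print(next_gate_command("01-foo", {"assess"}))
print(next_gate_command("01-foo", {"assess"}, assess_found_fail=True))
```

Walking the steps in order means a record that claims open-pr without docs-artifacts still resolves to docs-artifacts, which is the "don't skip ahead" property the skills are meant to enforce.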

Per-plan handoff doc structure

driver-docs/
├── 00-feature-overview.md      # cross-plan rollup, updated as plans ship
├── 01-foo/
│   ├── feature-overview.md     # PR body — self-contained with Stack Position section
│   ├── architecture.md
│   ├── testing-guide.md
│   └── risk-assessment.md
└── 02-bar/
    └── ...

Each per-plan feature-overview.md has a mandatory Stack Position section (base/feature branch, upstream/downstream plans, link to cross-plan rollup) so a reviewer who hasn't seen the rest of the DAG can still evaluate the PR.
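Since the Stack Position section is mandatory, it is straightforward to lint. A Python sketch, assuming the section is a `## Stack Position` heading and that the field labels below appear inside it; both the heading level and the labels are assumptions, because the actual template lives in commands/docs-artifacts.md.

```python
import re

# Hypothetical field labels; the real template may word them differently.
REQUIRED_FIELDS = ("Base Branch", "Feature Branch", "Upstream", "Downstream", "Feature rollup")

# Capture everything from the Stack Position heading up to the next "## " heading or end of file.
SECTION_RE = re.compile(r"^## Stack Position[ \t]*$(.*?)(?=^## |\Z)", re.M | re.S)


def missing_stack_position_fields(markdown: str) -> list[str]:
    """Return the required Stack Position pieces absent from a per-plan feature-overview.md."""
    match = SECTION_RE.search(markdown)
    if match is None:
        return ["Stack Position section"]
    section = match.group(1)
    return [name for name in REQUIRED_FIELDS if name not in section]
```

Only text inside the section counts, so a field mentioned elsewhere in the PR body doesn't satisfy the check.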

Edge cases handled

  • Upstream merged before downstream PR opens: the recorded Base Branch may have been deleted along with the merged upstream branch. /drvr:open-pr checks git ls-remote and, if the branch is gone, asks the user which branch to target instead (usually the feature parent). The plan's recorded Base Branch is then updated to match.
  • Plans in different states simultaneously: phase detection runs per plan, scanning FEATURE_LOG.md for per-plan events (pr_created_<plan>, pr_merged_<plan>, etc.). For example, Plan 01 merged, Plan 02 in review, and Plan 03 mid-implementation is a valid simultaneous state.
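Per-plan phase detection can be sketched as "furthest lifecycle event seen for each plan". This Python version assumes each event appears in FEATURE_LOG.md as a token like `pr_created_01-foo`; the log layout and the phase labels are assumptions, and only the event names this PR mentions are modeled.

```python
import re

# Lifecycle rank and label per event; a later phase outranks an earlier one.
PHASE_BY_EVENT = {
    "assessment_complete": (1, "assessed"),
    "pr_created": (2, "in review"),
    "pr_merged": (3, "merged"),
}
EVENT_RE = re.compile(r"\b(assessment_complete|pr_created|pr_merged)_(\S+)")


def plan_phases(log_text: str, plans: list[str]) -> dict[str, str]:
    """Map each plan to the furthest phase its logged events reach."""
    best = {p: (0, "not yet assessed") for p in plans}
    for event, plan in EVENT_RE.findall(log_text):
        if plan in best and PHASE_BY_EVENT[event][0] > best[plan][0]:
            best[plan] = PHASE_BY_EVENT[event]
    return {p: label for p, (_, label) in best.items()}


log = """\
- assessment_complete_01-foo
- pr_created_01-foo
- pr_merged_01-foo
- assessment_complete_02-bar
- pr_created_02-bar
"""
print(plan_phases(log, ["01-foo", "02-bar", "03-baz"]))
```

Taking the maximum rank rather than the last line keeps the result stable if events are appended out of order.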

Files changed

| File | What changed |
|---|---|
| CLAUDE.md | Lifecycle diagram, folder layout, DAG-of-bases mental model, per-plan principles |
| commands/feature.md | Collects feature parent branch + branch prefix at scaffold time (not a single feature branch) |
| skills/planning-guidance/SKILL.md | Per-plan Environment (Base/Feature branch from depends_on); PR Stack table in overview; branch confirmation in Step 4.5 |
| skills/implementation-guidance/SKILL.md | Per-plan branch validation in pre-flight 2.3; Step 5.5 surfaces the per-plan gate (assess→docs→open-pr) explicitly |
| skills/sdlc-orchestration/SKILL.md | Per-plan gate transitions; per-plan FEATURE_LOG events; DAG-aware "next plan" hints (parallel vs stacked); per-plan post-PR phase detection |
| commands/assess.md | Scoped to one plan; output is assessment/<plan>-test-curation.md; gates /drvr:docs-artifacts <plan> |
| commands/docs-artifacts.md | Writes driver-docs/<plan>/* with mandatory Stack Position section; updates cross-plan rollup; gates /drvr:open-pr <plan> |
| commands/open-pr.md | Per-plan; base from plan Environment; prompts the user for a replacement base if the recorded one was deleted by an upstream merge |

Stats: 8 files changed, +605 / −291
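The merged-upstream check in /drvr:open-pr could look roughly like this Python sketch. `git ls-remote --heads <remote>` is real git and prints one `<sha><TAB>refs/heads/<name>` line per branch; `ask_user` is a hypothetical callback standing in for the agent's prompt, and the function names are illustrative.

```python
import subprocess
from typing import Callable


def parse_remote_heads(ls_remote_output: str) -> set[str]:
    """Turn `git ls-remote --heads` output into a set of branch names."""
    heads = set()
    for line in ls_remote_output.splitlines():
        _sha, _, ref = line.partition("\t")
        if ref.startswith("refs/heads/"):
            heads.add(ref[len("refs/heads/"):])
    return heads


def fetch_remote_heads(remote: str = "origin") -> set[str]:
    """Query the remote; needs a git repository and network access."""
    result = subprocess.run(["git", "ls-remote", "--heads", remote],
                            capture_output=True, text=True, check=True)
    return parse_remote_heads(result.stdout)


def resolve_pr_base(recorded_base: str, remote_heads: set[str], feature_parent: str,
                    ask_user: Callable[[str, str], str]) -> str:
    """Keep the recorded Base Branch if it still exists; otherwise ask for a replacement."""
    if recorded_base in remote_heads:
        return recorded_base
    # Most likely the upstream PR merged and its branch was deleted.
    # The feature parent is offered as the usual answer; the user decides.
    return ask_user(recorded_base, feature_parent)
```

The caller would then write the returned branch back into the plan's Environment before running gh pr create --base with it, so the recorded base and the actual PR base stay in sync.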

Test plan

This plugin can't be unit-tested end-to-end without running through a real feature, so the verification here is documentation review:

  • The lifecycle diagram in CLAUDE.md matches the per-plan gate described in sdlc-orchestration
  • planning-guidance Step 4.5 and Step 5 prescribe the same Base Branch derivation rule (from depends_on)
  • implementation-guidance Step 5.5 names the same three gate commands (assess, docs-artifacts, open-pr) in the same order as sdlc-orchestration Bookkeeping → Per-Plan PR Gate
  • commands/open-pr.md Step 5 correctly handles the merged-upstream case (asks user, updates plan Environment)
  • PR body template in commands/docs-artifacts.md includes the Stack Position section as a required section
  • After merge: run a small dogfood feature (2-3 plans, mix of independent + dependent) and confirm the agent surfaces each gate step without prompting
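Part of this checklist is mechanical. For instance, the ordering check on the three gate commands could be scripted. The Python sketch below assumes the commands appear in each file as literal `/drvr:<name>` mentions and treats first-mention order as the prescribed order, which is a simplification; reading the actual SKILL.md files is left out.

```python
import re

GATE_COMMANDS = ["assess", "docs-artifacts", "open-pr"]
GATE_RE = re.compile(r"/drvr:(assess|docs-artifacts|open-pr)\b")


def gate_order(text: str) -> list[str]:
    """Return the gate commands in order of first mention in `text`."""
    seen: list[str] = []
    for cmd in GATE_RE.findall(text):
        if cmd not in seen:
            seen.append(cmd)
    return seen


def same_gate_order(a: str, b: str) -> bool:
    """True when both documents name all three gate commands in the prescribed order."""
    return gate_order(a) == gate_order(b) == GATE_COMMANDS
```

The optional /drvr:review step is deliberately excluded, since it appears only conditionally.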

Out of scope (follow-ups if needed)

  • Branch creation automation: implementation-guidance pre-flight 2.3 detects missing Feature Branch and suggests git checkout -b <branch> <base>, but doesn't run it. Could be automated later if we trust the convention.
  • Rebase-on-upstream-merge automation: when an upstream merges while a downstream is in revision, the user handles the rebase. Could be helped by a future /drvr:rebase-on-upstream command.
  • Cascade-check integration with stacked PRs: if Plan N's PR is revised in a way that changes its interface, downstream plans may need rebasing. The Loop Handling table flags this; the agent doesn't yet automate it.
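For reference, the pre-flight 2.3 behavior in the first bullet (detect a missing Feature Branch, suggest the command, don't run it) amounts to something like this Python sketch. `git branch --format=%(refname:short)` is a real git invocation; the function names are hypothetical.

```python
import subprocess
from typing import Optional


def local_branches() -> set[str]:
    """List local branch names via git (requires a repository; not called below)."""
    out = subprocess.run(["git", "branch", "--format=%(refname:short)"],
                         capture_output=True, text=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def suggest_feature_branch(branches: set[str], feature_branch: str,
                           base_branch: str) -> Optional[str]:
    """Return a command to suggest if the plan's Feature Branch is missing, else None."""
    if feature_branch in branches:
        return None
    # Suggest only; creating the branch stays a user decision for now.
    return f"git checkout -b {feature_branch} {base_branch}"


print(suggest_feature_branch({"main", "amark/01-foo"}, "amark/02-bar", "amark/01-foo"))
```

Automating it later would mean running that command instead of returning it, which is exactly the step this PR leaves for when the convention is trusted.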

🤖 Generated with Claude Code

## Motivation: slowing down to speed up

We're introducing a deliberately more broken-out PR process — one PR per
plan, each gated by per-plan assess → docs → open-pr — so that we can
review each plan's actual output in isolation and see where the agent's
implementation or our planning falls short. A feature-level PR makes those
gaps easy to miss; a per-plan PR forces them to the surface.

The eventual goal is to NOT need this much process. Once we trust the
agent's planning + implementation cycle enough that per-plan review stops
catching meaningful gaps, we can collapse this back toward fewer, larger
PRs. Until then, slowing down here lets us:

- See where the agent under-specifies plans or over-builds implementations
- Revise the SDLC process itself based on real friction, not speculation
- Get features across the line in the architectural form we actually want,
  not the form that emerged from accumulated drift across one large PR

TL;DR: slowing down to speed up.

## What changes

Each plan in a feature now ships as its own pull request with self-contained
PR-body documentation, instead of a single feature-level PR at the end. PR
bases form a DAG derived from plan depends_on: independent plans open
parallel PRs against the feature parent; dependent plans stack on their
upstream plan's branch.

Per-plan PR gate (REQUIRED before next plan):
  /drvr:assess <plan> → /drvr:docs-artifacts <plan> → /drvr:open-pr <plan>

Each gate step is surfaced explicitly in the orchestration and
implementation skills so the agent doesn't collapse the chain or skip
ahead to the next plan.

Key changes:
- CLAUDE.md: lifecycle diagram, folder layout, DAG-of-bases mental model
- /drvr:feature: collects feature parent + branch prefix (not single feature branch)
- planning-guidance: per-plan Environment (Base/Feature branch from depends_on);
  PR Stack table in overview; branch confirmation in Step 4.5
- implementation-guidance: per-plan branch validation in pre-flight 2.3;
  Step 5.5 surfaces the per-plan gate explicitly with the assess→docs→open-pr chain
- sdlc-orchestration: per-plan gate transitions; per-plan FEATURE_LOG events
  (assessment_complete_<plan>, pr_created_<plan>, pr_merged_<plan>, etc.);
  DAG-aware "next plan" hints (parallel vs stacked)
- /drvr:assess: scoped to one plan; output is assessment/<plan>-test-curation.md
- /drvr:docs-artifacts: writes driver-docs/<plan>/* with mandatory Stack Position
  section so PR body is self-contained for reviewers without DAG context
- /drvr:open-pr: per-plan; base from plan Environment; asks the user for a new
  base if recorded Base Branch was deleted by a merged upstream

Files: 8 changed, +605 / -291

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Mark and others added 2 commits May 22, 2026 10:59
The assess command previously biased toward KEEP in four places ("when
uncertain, KEEP", "this is a pruning pass, not a purge", failure-handling
notes, etc.) on the rationale that false-keeps cost less than false-prunes.

In practice this lets tautological scaffolding ship — e.g., tests that
assert "the enum has three variants" against a three-variant enum literal.
That assertion passes iff the implementation is unchanged and catches no
real bug; it's the canonical PRUNE case. The KEEP-by-default rule defeated
the assess phase's purpose.

Replace with a shape-based rule: judge by what the test asserts.
- Structural (counts, enum membership, types, mock call shapes,
  internal state) → PRUNE
- Behavioral (inputs → outputs, error modes, contract boundaries) → KEEP
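The shape rule can be written down as a small decision function. This Python sketch assumes each assertion has already been labeled with a shape (the labeling is the judgment-heavy part and isn't shown), and it assumes a test that mixes shapes is KEEP if any assertion is behavioral, which the rule above doesn't spell out.

```python
from enum import Enum, auto


class Shape(Enum):
    # Structural: what the code is.
    COUNT = auto()
    ENUM_MEMBERSHIP = auto()
    TYPE = auto()
    MOCK_CALL_SHAPE = auto()
    INTERNAL_STATE = auto()
    # Behavioral: what the code does.
    INPUT_OUTPUT = auto()
    ERROR_MODE = auto()
    CONTRACT_BOUNDARY = auto()


STRUCTURAL = {Shape.COUNT, Shape.ENUM_MEMBERSHIP, Shape.TYPE,
              Shape.MOCK_CALL_SHAPE, Shape.INTERNAL_STATE}


def verdict(assertion_shapes: list[Shape]) -> str:
    """KEEP if any assertion is behavioral; PRUNE if all are structural (or there are none)."""
    if any(shape not in STRUCTURAL for shape in assertion_shapes):
        return "KEEP"
    return "PRUNE"
```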

Tautological structural assertions are called out explicitly as a PRUNE
signal in Step 3, with the enum-variant example. The "only-coverage-of-an-
edge-case" exception is preserved but reframed to require *behavioral*
coverage — structural-only coverage is not coverage. Step 7's post-prune
failure handling (restore if tests fail) is unchanged — that's a defensive
mechanism, not a default-bias issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous Step 7 said: "If tests fail after changes, investigate: a
pruned test was the only coverage for a real behavior → restore it as
KEEP." This doesn't track:

- A deleted test cannot fail (it's gone), and pruning test X does not
  cause unrelated test Y to fail in any normal sense
- The only way a pruned test's removal would surface as a test-suite
  failure is via test interdependence (shared state, ordering) — and
  the right fix there is to extract the dependency into a fixture, not
  to restore the pruned test
- "The pruned test was the only coverage for a real behavior" is not
  a signal a passing/failing test suite can give you — the surviving
  tests pass and the behavior is silently uncovered
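The interdependence case in the second bullet looks like this in practice. A Python sketch with hypothetical test names: `test_lookup_default` passes only because `test_register_default` ran first, so pruning the latter fails the former, and the fix is a fixture rather than restoring the pruned test. In pytest, `registry_fixture` would be a `@pytest.fixture`; a plain function and a tiny runner are used here so the example runs without pytest.

```python
# Module-level state: the hidden coupling between the two tests below.
_registry: dict[str, int] = {}


def test_register_default():  # the test that curation prunes
    _registry["default"] = 1
    assert _registry["default"] == 1


def test_lookup_default():  # silently relies on test_register_default running first
    assert _registry.get("default") == 1


def registry_fixture() -> dict[str, int]:
    """Extracted setup: each test builds the state it needs."""
    return {"default": 1}


def test_lookup_default_fixed():
    registry = registry_fixture()
    assert registry.get("default") == 1


def run(tests) -> list[str]:
    """Run tests in order from fresh module state; return the names that failed."""
    _registry.clear()
    failed = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failed.append(test.__name__)
    return failed


print(run([test_register_default, test_lookup_default]))  # before pruning
print(run([test_lookup_default]))  # after pruning: the survivor fails
print(run([test_lookup_default_fixed]))  # after extracting a fixture
```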

The "restore as KEEP" rule was a vestige of the KEEP-by-default bias
removed in 5f2013a, repackaged as defensive recovery.

Replace with the actual failure modes that arise after Step 7:
- Promoted test rewritten incorrectly → fix the rewrite (common case)
- Surviving test depended on pruned test's setup/state → extract a fixture
- Unrelated regression in the curation commit → revert and redo cleanly
- Pre-existing unrelated failure → address separately
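The four cases can be read as a triage table. A Python sketch under assumed inputs: the three boolean signals and the order they're checked in are one reasonable reading, not the assess command's actual procedure. Note that no branch leads to "restore as KEEP".

```python
from enum import Enum


class Cause(Enum):
    # Each member's value is the remedy for that failure mode.
    PRE_EXISTING = "address separately"
    BAD_PROMOTION_REWRITE = "fix the rewrite"
    SHARED_SETUP = "extract a fixture"
    CURATION_REGRESSION = "revert and redo cleanly"


def triage(failed_before_curation: bool, is_promoted_rewrite: bool,
           used_pruned_test_state: bool) -> Cause:
    """Pick the failure mode for a test that fails after Step 7; `.value` is the remedy."""
    if failed_before_curation:
        return Cause.PRE_EXISTING  # not caused by the curation commit
    if is_promoted_rewrite:
        return Cause.BAD_PROMOTION_REWRITE  # the common case
    if used_pruned_test_state:
        return Cause.SHARED_SETUP
    return Cause.CURATION_REGRESSION
```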

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>