plan_runner: fix deepening wedge + reliable email approval gate by Bernardo1998 · Pull Request #1 · Bernardo1998/steward

Bernardo1998 · 2026-06-01T03:15:08Z

Applies STEWARD_FIX.md Issues A and B (plus the D/E prerequisites that were missing in this tree) to the shared plan-runner engine. The scaffolder only generates per-task data plus a thin run.py that imports the package, so fixing these engine files means every existing and future SETUP_GUIDE-scaffolded task inherits the fixes automatically once this lands on master, the branch that SETUP_GUIDE §1 clones.

Issue A — deepening steps no longer wedge the plan (`runner.py`)

A step that passed its hard criteria but failed the soft rubric was parked in deepening with no work queued, so select_next_action returned `stuck` on every cycle, forever.

needs_deepening now generates deepening work for plan_step dispatches and advances the parent budget for deepening dispatches.
New promotion pass before _advance_stages flips spent/empty-queue deepening steps to done_enough, so the stage gate arms instead of wedging.
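
The promotion pass described above can be pictured roughly as follows. This is a minimal sketch under assumed data shapes, not the code in `runner.py`: the function name `promote_spent_deepening` and the step keys `status`, `deepening_budget`, and `deepening_queue` are hypothetical, and treating "spent" and "empty queue" as alternative triggers is an assumption.

```python
def promote_spent_deepening(steps):
    """Flip deepening steps that can make no further progress to done_enough.

    Hypothetical sketch: key names are assumptions, not the real state schema.
    Returns the ids of the steps that were promoted.
    """
    promoted = []
    for step in steps:
        if step.get("status") != "deepening":
            continue
        budget_spent = step.get("deepening_budget", 0) <= 0
        queue_empty = not step.get("deepening_queue")
        # Either condition means no more deepening work will ever be dispatched.
        if budget_spent or queue_empty:
            step["status"] = "done_enough"
            promoted.append(step["id"])
    return promoted
```

Running a pass like this before the stage-advance logic means a stage whose steps are all `done` or `done_enough` can arm its gate instead of waiting on work that will never be queued.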

Issue D — `/approve` marker was never detected (`context.py`, `plan_phases.py`)

check_stage_approval/check_intent_answer read only feedback["body"], but g10_parse_feedback stores text under raw_reply and body was never set.

load_context mirrors the reply into feedback["body"]; check_stage_approval also reads raw_reply for robustness.
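
A sketch of marker detection that tolerates either key might look like the following. The `feedback["body"]` and `feedback["raw_reply"]` keys come from the description above; the helper name `find_approval_marker` and the `/approve <stage>` token layout are assumptions made for illustration only.

```python
def find_approval_marker(feedback, stage):
    """Return True if the reply text carries an approval marker for `stage`.

    Hypothetical sketch: the `/approve <stage>` format is an assumption.
    """
    # Prefer the mirrored body, fall back to the raw reply text.
    text = feedback.get("body") or feedback.get("raw_reply") or ""
    for line in text.splitlines():
        tokens = line.strip().split()
        # Require an exact stage match so a token for another stage is ignored.
        if len(tokens) >= 2 and tokens[0] == "/approve" and tokens[1] == stage:
            return True
    return False
```

Matching the stage token exactly is what keeps a marker meant for a different stage from approving the current one.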

Issue E — replies were never fetched (`runner.py`, `feedback.py`)

_check_email_replies early-returns unless last_email_date is set, and the runner never set it.

Persist last_email_date + subject_prefix into state after a sent email and thread them via meta_for_context.
send_ltt_email falls back to global_state["subject_prefix"] before [LTT], so a plan_runner task's sent subject and its reply IMAP search use the same prefix (they previously mismatched → replies never found).
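
The prefix mismatch is easy to see in a small sketch. Only `global_state["subject_prefix"]` and the `[LTT]` default are taken from the description; the helper names, the `[PLAN]` prefix, and the substring match standing in for the IMAP subject search are hypothetical.

```python
DEFAULT_PREFIX = "[LTT]"


def resolve_subject_prefix(explicit_prefix, global_state):
    """Pick the subject prefix: explicit argument, then state, then the default."""
    return explicit_prefix or global_state.get("subject_prefix") or DEFAULT_PREFIX


def build_subject(prefix, title):
    """Compose the subject line of an outgoing email."""
    return f"{prefix} {title}"


def reply_matches(search_prefix, reply_subject):
    """Stand-in for the reply search: match on the prefix inside the subject."""
    return search_prefix in reply_subject


# Hypothetical task state: the reply search uses the task's own prefix.
state = {"subject_prefix": "[PLAN]"}
sent = build_subject(resolve_subject_prefix(None, state), "Stage 1 ready")
reply = "Re: " + sent
found = reply_matches(state["subject_prefix"], reply)
```

Under these assumptions, sending with the bare default while searching with the task prefix would leave `found` false on every cycle, which matches the "replies never found" symptom.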

Issue B — reliable email gate (`plan_phases.py`, `context.py`, `runner.py`)

New interpret_reply_nl: conservative LLM reader for plain-English approvals; no-ops unless a gate is armed.
Plumb the reply Message-ID into feedback["reply_msgid"].
One-reply-one-gate de-dup (last_reply_approval_msgid) + nl_reply fallback at cycle start, plus same-cycle approval after _advance_stages to close the arm-at-end / read-at-start timing gap.
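
The one-reply-one-gate rule can be sketched as below. The keys `last_reply_approval_msgid` and `reply_msgid` are named in the description; the function name `try_approve_gate`, the `gate_armed` flag, and the `"approve"` verdict string are assumptions for illustration and may not match the real code.

```python
def try_approve_gate(state, feedback, verdict):
    """Approve the armed gate at most once per reply Message-ID.

    Hypothetical sketch: `gate_armed` and the verdict values are assumptions.
    Returns True only when this call actually approved a gate.
    """
    if not state.get("gate_armed"):
        return False  # nothing to approve, so the reply is a no-op
    if verdict != "approve":
        return False
    msgid = feedback.get("reply_msgid")
    # A reply without an id, or one that already approved a gate, is ignored.
    if msgid is None or msgid == state.get("last_reply_approval_msgid"):
        return False
    state["last_reply_approval_msgid"] = msgid
    state["gate_armed"] = False
    return True
```

Because the check depends only on state and the current feedback, the same function could be called both at cycle start and again after the stages advance, which is one way to close the arm-at-end / read-at-start gap without letting a single reply approve two gates.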

scaffolder.py: add last_email_date, subject_prefix, last_reply_approval_msgid to the default state template.

Not touched: Issue C (already correct here), Issue F (out of scope per decision).

Verification

180/180 existing tests pass; py_compile clean on all changed files.
Issue A: reproduced the stuck wedge, confirmed fix → done_enough → gate armed.
Issue D: /approve detected via body and raw_reply; wrong-stage tokens don't jump ahead.
Issue B de-dup: one reply approves at most one gate; a fresh Message-ID approves the next.
Issue B live LLM smoke test: "approve, proceed with your tests." → approve; "hold on, let me check the numbers first" → hold.
Scaffold + verify of a fresh task succeeds with the new state keys.

🤖 Generated with Claude Code

Apply STEWARD_FIX.md Issues A and B (plus the D/E prerequisites that were missing in this tree) to the shared engine, so every existing and future SETUP_GUIDE-scaffolded task inherits them automatically.

Issue A — deepening steps no longer wedge the plan (runner.py):
- needs_deepening now generates deepening work for plan_step dispatches and advances the parent budget for deepening dispatches.
- New promotion pass before _advance_stages flips spent/empty-queue deepening steps to done_enough, so the stage gate arms instead of select_next_action returning `stuck` forever.

Issue D — /approve marker was never detected (context.py, plan_phases.py):
- load_context mirrors the reply text into feedback["body"] (repairs both check_stage_approval and check_intent_answer, which read only "body").
- check_stage_approval also reads feedback["raw_reply"] for robustness.

Issue E — replies were never fetched (runner.py, feedback.py):
- Persist last_email_date + subject_prefix into state after a sent email and thread them through meta_for_context, arming phase1's reply check.
- send_ltt_email falls back to global_state["subject_prefix"] before "[LTT]", so a plan_runner task's sent subject and its reply IMAP search use the same prefix (they previously mismatched -> replies never found).

Issue B — reliable email gate (plan_phases.py, context.py, runner.py):
- New interpret_reply_nl: conservative LLM reader for plain-English approvals; no-ops unless a stage gate is armed.
- Plumb the reply Message-ID into feedback["reply_msgid"].
- One-reply-one-gate de-dup (last_reply_approval_msgid) + nl_reply fallback at cycle start, plus same-cycle approval after _advance_stages to close the arm-at-end / read-at-start timing gap.

scaffolder.py: add last_email_date, subject_prefix, last_reply_approval_msgid to the default state template.

Not touched: Issue C (already correct here), Issue F (out of scope).

Verified: 180/180 existing tests pass; py_compile clean; Issue A wedge reproduced and fixed; D marker detection; B de-dup; live LLM smoke test of interpret_reply_nl (approve vs hold); scaffold+verify of a fresh task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Bernardo1998 wants to merge 1 commit into master from plan-runner-compression
