
plan_runner: fix deepening wedge + reliable email approval gate #1

Open
Bernardo1998 wants to merge 1 commit into master from plan-runner-compression

@Bernardo1998

Owner

Applies STEWARD_FIX.md Issues A and B (plus the D/E prerequisites that were missing in this tree) to the shared plan-runner engine. Because the scaffolder generates only per-task data plus a thin run.py that imports the package, fixing these engine files means every existing and future SETUP_GUIDE-scaffolded task inherits the fixes automatically once this lands on master, the branch that SETUP_GUIDE §1 clones.

Issue A — deepening steps no longer wedge the plan (runner.py)

A step that passed its hard criteria but failed the soft rubric was parked in deepening with no work queued, so select_next_action returned stuck on every cycle and the plan never advanced.

  • needs_deepening now generates deepening work for plan_step dispatches and advances the parent budget for deepening dispatches.
  • New promotion pass before _advance_stages flips spent/empty-queue deepening steps to done_enough, so the stage gate arms instead of wedging.
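
To illustrate the promotion pass described above, here is a minimal sketch. It is not the code in runner.py: the function name `promote_spent_deepening` and the three dictionary shapes are assumptions made for this example, and only the `deepening` and `done_enough` status names come from this PR.

```python
def promote_spent_deepening(steps, deepening_queue, budget_left):
    """Flip deepening steps with no remaining work to done_enough.

    All three argument shapes are assumptions made for this sketch:
      steps:           step_id -> {"status": str}
      deepening_queue: step_id -> list of queued deepening work items
      budget_left:     step_id -> remaining deepening dispatches
    """
    promoted = []
    for step_id, step in steps.items():
        if step["status"] != "deepening":
            continue
        queue_empty = not deepening_queue.get(step_id)
        budget_spent = budget_left.get(step_id, 0) <= 0
        # A step that cannot receive more work must not block the stage gate.
        if queue_empty or budget_spent:
            step["status"] = "done_enough"
            promoted.append(step_id)
    return promoted


steps = {
    "s1": {"status": "deepening"},  # nothing queued: would wedge the plan
    "s2": {"status": "deepening"},  # work queued and budget left: keep going
    "s3": {"status": "done"},       # not deepening: untouched
}
promoted = promote_spent_deepening(
    steps,
    deepening_queue={"s2": ["probe"]},
    budget_left={"s1": 2, "s2": 1},
)
```

Running a pass like this before the stage-advance check is what lets the gate arm: once no step is left in `deepening` without work, the selector has something other than "stuck" to report.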

Issue D — /approve marker was never detected (context.py, plan_phases.py)

check_stage_approval/check_intent_answer read only feedback["body"], but g10_parse_feedback stores text under raw_reply and body was never set.

  • load_context mirrors the reply into feedback["body"]; check_stage_approval also reads raw_reply for robustness.
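
The two-sided repair can be sketched as follows. The helper names and the `/approve <stage>` token layout are assumptions for illustration; the PR only states that a `/approve` marker exists, that the text lives under `raw_reply`, and that wrong-stage tokens must not match.

```python
def mirror_reply_body(feedback):
    """Copy raw_reply into body when body is missing or empty."""
    if feedback.get("raw_reply") and not feedback.get("body"):
        feedback["body"] = feedback["raw_reply"]
    return feedback


def reply_approves_stage(feedback, stage):
    """True if either text field carries the marker for exactly this stage.

    The "/approve <stage>" line format is a hypothetical choice for this sketch.
    """
    for key in ("body", "raw_reply"):
        text = feedback.get(key) or ""
        for line in text.splitlines():
            if line.split()[:2] == ["/approve", stage]:
                return True
    return False


feedback = {"raw_reply": "/approve design\nLooks good."}
approved_before_mirror = reply_approves_stage(feedback, "design")
wrong_stage = reply_approves_stage(feedback, "build")
mirror_reply_body(feedback)
```

Reading both keys means the check still works if a caller builds the feedback dict without going through the mirroring step.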

Issue E — replies were never fetched (runner.py, feedback.py)

_check_email_replies early-returns unless last_email_date is set, and the runner never set it.

  • Persist last_email_date + subject_prefix into state after a sent email and thread them via meta_for_context.
  • send_ltt_email falls back to global_state["subject_prefix"] before [LTT], so a plan_runner task's sent subject and its reply IMAP search use the same prefix (previously they mismatched, so replies were never found).
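
A rough sketch of the two pieces above, under stated assumptions: the helper names, the ISO date format, and the example prefix `[PLAN-7]` are hypothetical, while the `last_email_date` and `subject_prefix` keys and the `[LTT]` default come from this PR.

```python
from datetime import datetime, timezone

DEFAULT_PREFIX = "[LTT]"


def resolve_subject_prefix(explicit_prefix, global_state):
    """Prefer an explicit prefix, then the task's stored prefix, then the default."""
    return explicit_prefix or global_state.get("subject_prefix") or DEFAULT_PREFIX


def record_sent_email(state, prefix, sent_at):
    """Persist what the reply check needs (date format is an assumption)."""
    state["last_email_date"] = sent_at.isoformat()
    state["subject_prefix"] = prefix
    return state


def should_check_replies(state):
    """Mirror of the early return: no recorded send means nothing to look for."""
    return bool(state.get("last_email_date"))


state = {"last_email_date": None, "subject_prefix": None}
checked_before_send = should_check_replies(state)
prefix = resolve_subject_prefix(None, {"subject_prefix": "[PLAN-7]"})
record_sent_email(state, prefix, datetime(2024, 1, 2, tzinfo=timezone.utc))
```

Because the same stored prefix feeds both the outgoing subject and the later search, the two cannot drift apart the way a hard-coded default and a per-task prefix could.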

Issue B — reliable email gate (plan_phases.py, context.py, runner.py)

  • New interpret_reply_nl: conservative LLM reader for plain-English approvals; no-ops unless a gate is armed.
  • Plumb the reply Message-ID into feedback["reply_msgid"].
  • One-reply-one-gate de-dup (last_reply_approval_msgid) + nl_reply fallback at cycle start, plus same-cycle approval after _advance_stages to close the arm-at-end / read-at-start timing gap.
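
The one-reply-one-gate rule can be sketched like this. The function name, the `armed_gate` state key, and the `"approve"` verdict string are assumptions for the example; `reply_msgid` and `last_reply_approval_msgid` are the keys named in this PR.

```python
def try_approve_gate(state, feedback, verdict):
    """Approve the armed gate at most once per reply Message-ID.

    Returns the name of the approved gate, or None if nothing was approved.
    """
    gate = state.get("armed_gate")  # hypothetical key for the armed gate
    msgid = feedback.get("reply_msgid")
    if gate is None or verdict != "approve" or not msgid:
        return None  # no armed gate, no approval, or no identifiable reply
    if msgid == state.get("last_reply_approval_msgid"):
        return None  # this reply has already approved a gate
    state["last_reply_approval_msgid"] = msgid
    state["armed_gate"] = None
    return gate


state = {"armed_gate": "stage_1", "last_reply_approval_msgid": None}
reply = {"reply_msgid": "<a1@example.com>"}

first = try_approve_gate(state, reply, "approve")    # approves stage_1
state["armed_gate"] = "stage_2"                      # next gate arms later
repeat = try_approve_gate(state, reply, "approve")   # same reply: ignored
fresh = try_approve_gate(state, {"reply_msgid": "<b2@example.com>"}, "approve")
```

Calling a check like this both at cycle start and again right after the stage-advance step is one way to cover the case where the gate arms at the end of the cycle in which the reply was already read.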

scaffolder.py: add last_email_date, subject_prefix, last_reply_approval_msgid to the default state template.

Not touched: Issue C (already correct here), Issue F (out of scope per decision).

Verification

  • 180/180 existing tests pass; py_compile clean on all changed files.
  • Issue A: reproduced the stuck wedge, confirmed fix → done_enough → gate armed.
  • Issue D: /approve detected via body and raw_reply; wrong-stage tokens don't jump ahead.
  • Issue B de-dup: one reply approves at most one gate; a fresh Message-ID approves the next.
  • Issue B live LLM smoke test: "approve, proceed with your tests." → approve; "hold on, let me check the numbers first" → hold.
  • Scaffold + verify of a fresh task succeeds with the new state keys.

🤖 Generated with Claude Code

Apply STEWARD_FIX.md Issues A and B (plus the D/E prerequisites that
were missing in this tree) to the shared engine, so every existing and
future SETUP_GUIDE-scaffolded task inherits them automatically.

Issue A — deepening steps no longer wedge the plan (runner.py):
- needs_deepening now generates deepening work for plan_step dispatches
  and advances the parent budget for deepening dispatches.
- New promotion pass before _advance_stages flips spent/empty-queue
  deepening steps to done_enough, so the stage gate arms instead of
  select_next_action returning `stuck` forever.

Issue D — /approve marker was never detected (context.py, plan_phases.py):
- load_context mirrors the reply text into feedback["body"] (repairs both
  check_stage_approval and check_intent_answer, which read only "body").
- check_stage_approval also reads feedback["raw_reply"] for robustness.

Issue E — replies were never fetched (runner.py, feedback.py):
- Persist last_email_date + subject_prefix into state after a sent email
  and thread them through meta_for_context, arming phase1's reply check.
- send_ltt_email falls back to global_state["subject_prefix"] before
  "[LTT]", so a plan_runner task's sent subject and its reply IMAP search
  use the same prefix (they previously mismatched -> replies never found).

Issue B — reliable email gate (plan_phases.py, context.py, runner.py):
- New interpret_reply_nl: conservative LLM reader for plain-English
  approvals; no-ops unless a stage gate is armed.
- Plumb the reply Message-ID into feedback["reply_msgid"].
- One-reply-one-gate de-dup (last_reply_approval_msgid) + nl_reply
  fallback at cycle start, plus same-cycle approval after _advance_stages
  to close the arm-at-end / read-at-start timing gap.

scaffolder.py: add last_email_date, subject_prefix, last_reply_approval_msgid
to the default state template.

Not touched: Issue C (already correct here), Issue F (out of scope).

Verified: 180/180 existing tests pass; py_compile clean; Issue A wedge
reproduced and fixed; D marker detection; B de-dup; live LLM smoke test of
interpret_reply_nl (approve vs hold); scaffold+verify of a fresh task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>