plan_runner: fix deepening wedge + reliable email approval gate #1
Open
Bernardo1998 wants to merge 1 commit into
Apply STEWARD_FIX.md Issues A and B (plus the D/E prerequisites that were missing in this tree) to the shared engine, so every existing and future SETUP_GUIDE-scaffolded task inherits them automatically.

Issue A - deepening steps no longer wedge the plan (runner.py):
- needs_deepening now generates deepening work for plan_step dispatches and advances the parent budget for deepening dispatches.
- New promotion pass before _advance_stages flips spent/empty-queue deepening steps to done_enough, so the stage gate arms instead of select_next_action returning `stuck` forever.

Issue D - /approve marker was never detected (context.py, plan_phases.py):
- load_context mirrors the reply text into feedback["body"] (repairs both check_stage_approval and check_intent_answer, which read only "body").
- check_stage_approval also reads feedback["raw_reply"] for robustness.

Issue E - replies were never fetched (runner.py, feedback.py):
- Persist last_email_date + subject_prefix into state after a sent email and thread them through meta_for_context, arming phase1's reply check.
- send_ltt_email falls back to global_state["subject_prefix"] before "[LTT]", so a plan_runner task's sent subject and its reply IMAP search use the same prefix (they previously mismatched -> replies never found).

Issue B - reliable email gate (plan_phases.py, context.py, runner.py):
- New interpret_reply_nl: conservative LLM reader for plain-English approvals; no-ops unless a stage gate is armed.
- Plumb the reply Message-ID into feedback["reply_msgid"].
- One-reply-one-gate de-dup (last_reply_approval_msgid) + nl_reply fallback at cycle start, plus same-cycle approval after _advance_stages to close the arm-at-end / read-at-start timing gap.

scaffolder.py: add last_email_date, subject_prefix, last_reply_approval_msgid to the default state template.

Not touched: Issue C (already correct here), Issue F (out of scope).
Verified: 180/180 existing tests pass; py_compile clean; Issue A wedge reproduced and fixed; Issue D marker detection; Issue B de-dup; live LLM smoke test of interpret_reply_nl (approve vs hold); scaffold+verify of a fresh task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
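The Issue A promotion pass can be pictured with a small sketch. It is a hypothetical model, not the code in `runner.py`: the `Step` class, its `deepening_queue`, `budget_used`, and `budget_max` fields, the `done` status, and the helper names `promote_stalled_deepening` and `stage_gate_ready` are all assumptions. Only the `deepening` and `done_enough` statuses and the "spent or empty queue" rule come from this PR.

```python
from dataclasses import dataclass, field


# Hypothetical step model; field names are illustrative, not taken from runner.py.
@dataclass
class Step:
    name: str
    status: str  # assumed values: "pending", "deepening", "done", "done_enough"
    deepening_queue: list[str] = field(default_factory=list)
    budget_used: int = 0
    budget_max: int = 3


# Statuses treated as finished for gate purposes ("done" is an assumption).
TERMINAL = {"done", "done_enough"}


def promote_stalled_deepening(steps: list[Step]) -> list[str]:
    """Flip deepening steps with no queued work or no budget left to done_enough.

    Meant to run before the stage-advance check, so a stage whose only open
    steps are stalled in deepening can still arm its approval gate.
    Returns the names of the promoted steps.
    """
    promoted = []
    for step in steps:
        if step.status != "deepening":
            continue
        budget_spent = step.budget_used >= step.budget_max
        if budget_spent or not step.deepening_queue:
            step.status = "done_enough"
            promoted.append(step.name)
    return promoted


def stage_gate_ready(steps: list[Step]) -> bool:
    """In this sketch, a stage gate arms once every step is in a terminal state."""
    return all(step.status in TERMINAL for step in steps)
```

For example, a stage holding `Step("collect", "done")` and `Step("analyze", "deepening")` with nothing queued is not gate-ready until the pass promotes `analyze`. A step that still has queued work and unspent budget stays in `deepening`, so in this sketch the pass only unblocks steps that could never make progress on their own.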
Applies `STEWARD_FIX.md` Issues A and B (plus the D/E prerequisites that were missing in this tree) to the shared plan-runner engine. Because the scaffolder only generates per-task data plus a thin `run.py` that imports the package, fixing these engine files means every existing and future SETUP_GUIDE-scaffolded task inherits the fixes automatically, once this lands on `master` (what SETUP_GUIDE §1 clones).

Issue A: deepening steps no longer wedge the plan (`runner.py`)

A step that passed its hard criteria but failed the soft rubric parked in `deepening` with no work queued, so `select_next_action` returned `stuck` every cycle, forever.

- `needs_deepening` now generates deepening work for `plan_step` dispatches and advances the parent budget for `deepening` dispatches.
- A new promotion pass before `_advance_stages` flips spent/empty-queue `deepening` steps to `done_enough`, so the stage gate arms instead of wedging.

Issue D: `/approve` marker was never detected (`context.py`, `plan_phases.py`)

`check_stage_approval` / `check_intent_answer` read only `feedback["body"]`, but `g10_parse_feedback` stores the text under `raw_reply` and `body` was never set.

- `load_context` mirrors the reply into `feedback["body"]`; `check_stage_approval` also reads `raw_reply` for robustness.

Issue E: replies were never fetched (`runner.py`, `feedback.py`)

`_check_email_replies` returns early unless `last_email_date` is set, and the runner never set it.

- Persist `last_email_date` + `subject_prefix` into `state` after a sent email and thread them through `meta_for_context`.
- `send_ltt_email` falls back to `global_state["subject_prefix"]` before `[LTT]`, so a plan_runner task's sent subject and its reply IMAP search use the same prefix (they previously mismatched → replies never found).

Issue B: reliable email gate (`plan_phases.py`, `context.py`, `runner.py`)

- New `interpret_reply_nl`: conservative LLM reader for plain-English approvals; no-ops unless a gate is armed.
- Plumb the reply Message-ID into `feedback["reply_msgid"]`.
- One-reply-one-gate de-dup (`last_reply_approval_msgid`) + `nl_reply` fallback at cycle start, plus same-cycle approval after `_advance_stages` to close the arm-at-end / read-at-start timing gap.

`scaffolder.py`: add `last_email_date`, `subject_prefix`, `last_reply_approval_msgid` to the default state template.

Not touched: Issue C (already correct here), Issue F (out of scope per decision).
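A minimal sketch of how the Issue D normalization and the Issue B one-reply-one-gate rule could fit together, assuming dict-based `feedback` and `state`. The keys `body`, `raw_reply`, `reply_msgid`, and `last_reply_approval_msgid` come from this PR; the `armed_gate` key, the `/approve <stage>` token format, and the helper names `normalize_feedback`, `marker_approves`, and `try_approve` are hypothetical. `nl_reader` is a stand-in for an LLM-backed reader such as `interpret_reply_nl`, whose real signature is not shown here.

```python
import re
from typing import Callable, Optional


def normalize_feedback(feedback: dict) -> dict:
    """Mirror raw_reply into body so readers that only look at body see the text."""
    if not feedback.get("body") and feedback.get("raw_reply"):
        feedback["body"] = feedback["raw_reply"]
    return feedback


def marker_approves(feedback: dict, stage: str) -> bool:
    """True if body or raw_reply carries an explicit /approve token for this stage.

    The "/approve <stage>" format is an assumption for illustration; a token
    naming a different stage does not match, so it cannot jump ahead.
    """
    text = "\n".join(filter(None, (feedback.get("body"), feedback.get("raw_reply"))))
    pattern = r"/approve\s+" + re.escape(stage) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def try_approve(
    state: dict,
    feedback: dict,
    stage: str,
    nl_reader: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Let one reply approve at most one armed gate."""
    if state.get("armed_gate") != stage:
        return False  # hypothetical key: no matching gate armed, nothing to approve
    msgid = feedback.get("reply_msgid")
    if msgid is None or msgid == state.get("last_reply_approval_msgid"):
        return False  # no reply yet, or this reply already approved a gate
    normalize_feedback(feedback)
    approved = marker_approves(feedback, stage)
    if not approved and nl_reader is not None:
        # Plain-English fallback, consulted only when no explicit marker is present.
        approved = bool(nl_reader(feedback.get("body", "")))
    if approved:
        state["last_reply_approval_msgid"] = msgid  # consume this reply
        state["armed_gate"] = None  # disarm the gate it approved
    return approved
```

Because an approval records the reply's Message-ID, running the check both at cycle start and again right after `_advance_stages` arms a new gate is safe in this sketch: the reply that approved the previous stage cannot also approve the next one, even if the plain-English reader would read it as a yes.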
Verification
- `py_compile` clean on all changed files.
- Reproduced the Issue A `stuck` wedge; confirmed fix → `done_enough` → gate armed.
- `/approve` detected via `body` and `raw_reply`; wrong-stage tokens don't jump ahead.

🤖 Generated with Claude Code