
feat: report recoverable bug steps as degraded (#1641) #1644

Merged
gltanaka merged 10 commits into main from change/issue-1641 on Jun 19, 2026

Conversation

@prompt-driven-github

Contributor

Summary

This PR updates pdd bug reporting semantics so that recoverable step failures are shown as degraded/continuing, while terminal failures remain failed/aborting. Recoverable test-strategy failures in Step 8 now state that the workflow is using fallback/default planning, and the behavior of later steps that succeed is unchanged.

Closes #1641

Changes Made

Prompts Modified

  • pdd/prompts/agentic_common_python.prompt - adds explicit recoverable/fatal failure comment semantics while preserving legacy fallback behavior.
  • pdd/prompts/agentic_bug_orchestrator_python.prompt - updates bug workflow requirements for degraded soft failures, fatal abort comments, Step 8 fallback wording, and resume-compatible failed-step state.

Documentation Updated

  • README.md - documents degraded vs aborting step-comment statuses for pdd bug.
  • docs/TUTORIALS.md - adds guidance for interpreting degraded workflow comments.

Code and Tests Updated

  • pdd/agentic_common.py - implements explicit failure-mode comment templates.
  • pdd/agentic_bug_orchestrator.py - passes recoverable/fatal semantics to step-comment posting.
  • tests/test_agentic_common.py - covers degraded/fatal comment formatting and legacy compatibility.
  • tests/test_agentic_bug_orchestrator_step_comments.py - covers Step 8 degraded continuation and fatal provider abort behavior.
  • architecture.json - updates prompt architecture metadata.

User Stories

  • Policy: warn
  • user_stories/story__agentic_bug_orchestrator_agentic_common.md — issue-derived story linked to: agentic_bug_orchestrator_python.prompt, agentic_common_python.prompt
  • user_stories/contracts/agentic_bug_orchestrator_agentic_common.contract.md — generated machine-checkable contract
  • Validation: ❌ 1 of 1 linked story check(s) failed: user_stories/story__agentic_bug_orchestrator_agentic_common.md
  • ⚠️ Warning (non-blocking): failing linked story check(s): user_stories/story__agentic_bug_orchestrator_agentic_common.md

Review Checklist

  • Prompt syntax is valid
  • PDD conventions followed
  • Documentation is up to date

Next Steps After Merge

  1. Regenerate code from modified prompts in dependency order:
    ./sync_order.sh
    Or manually:
    pdd sync agentic_common
    pdd sync agentic_update
    pdd sync agentic_test_orchestrator
    pdd sync cli
    pdd sync architecture
    pdd sync agentic_verify
    pdd sync agentic_architecture_orchestrator
    pdd sync agentic_test_generate
    pdd sync agentic_common_worktree
    pdd sync agentic_crash
    pdd sync ci_validation
    pdd sync agentic_fix
    pdd sync git_update
    pdd sync agentic_split_orchestrator
    pdd sync duplicate_cli_guard
    pdd sync executor
    pdd sync fix_code_loop
    pdd sync fix_error_loop
    pdd sync fix_verification_errors_loop
    pdd sync agentic_split
    pdd sync sync_order
    pdd sync auto_deps_architecture
    pdd sync agentic_change
    pdd sync agentic_e2e_fix
    pdd sync agentic_e2e_fix_orchestrator
    pdd sync agentic_sync_runner
    pdd sync agentic_test
    pdd sync auth
    pdd sync commands
    pdd sync crash_main
    pdd sync fix
    pdd sync fix_verification_main
    pdd sync one_session_sync
    pdd sync pre_checkup_gate
    pdd sync utility
    pdd sync agentic_checkup_orchestrator
    pdd sync bug_main
    pdd sync durable_sync_runner
    pdd sync sync_main
    pdd sync __init__
    pdd sync agentic_sync
    pdd sync checkup
    pdd sync checkup_review_loop
    pdd sync ci_drift_heal
    pdd sync connect
    pdd sync maintenance
    pdd sync update_main
    pdd sync agentic_checkup
    pdd sync generate
    pdd sync modify
    pdd sync analysis
    pdd sync pin_example_hack
    pdd sync sync_orchestration
    pdd sync agentic_bug
    pdd sync agentic_bug_orchestrator
    
  2. Run tests to verify functionality
  3. Deploy if applicable

Created by pdd change workflow

@Serhan-Asad

Collaborator

Independent testing & verification

Verified this PR in an isolated worktree against origin/main. No regressions; the degraded/fatal/legacy reporting is confirmed end-to-end on real GitHub. Merge-ready. The only red check is a pre-existing flaky test unrelated to this change (details below).

Automated tests

  • Touched-module suites (test_agentic_common.py, test_agentic_bug_orchestrator*.py): 803 passed / 1 skipped on this branch vs 795 on main — exactly the +8 tests this PR adds, with zero existing-test regressions.
  • Legacy post_step_comment callers (change/checkup orchestrators that pass body=None): 472 passed. The legacy Status: FAILED template is unchanged except for an intentional footer change from an em-dash to a hyphen (covered by the updated test).

Live GitHub end-to-end

Drove the real post_step_comment (real gh) for all three fallback modes against a throwaway issue. Rendered output confirmed:

| mode | rendered status |
|---|---|
| recoverable | `**Status:** DEGRADED - workflow continuing` + recovery detail |
| fatal | `**Status:** FAILED - workflow aborting` |
| legacy (no mode) | unchanged `**Status:** FAILED` |

Secret redaction also runs on this path — a planted token rendered as [REDACTED_GITHUB_TOKEN]. This directly fixes the reported case where a continuing Step 8 looked fatal.
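
The three modes listed above can be sketched as a small formatter. This is a minimal illustration and not the code in pdd/agentic_common.py: the function name `render_status_line` and the `recovery_detail` parameter are hypothetical, and only the status strings and the `failure_mode` values are taken from this conversation.

```python
from typing import Optional


def render_status_line(failure_mode: Optional[str] = None,
                       recovery_detail: Optional[str] = None) -> str:
    """Return the status line for a failed-step comment (illustrative only)."""
    if failure_mode == "recoverable":
        # The step failed, but a fallback exists, so the workflow keeps going.
        line = "**Status:** DEGRADED - workflow continuing"
        if recovery_detail:
            line += f"\n{recovery_detail}"
        return line
    if failure_mode == "fatal":
        # Terminal failure: the workflow stops here.
        return "**Status:** FAILED - workflow aborting"
    # Legacy callers pass no mode and keep the original wording.
    return "**Status:** FAILED"
```

Keeping the legacy branch as the default means callers that never pass a mode see no change in output, which matches the compatibility result reported above.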

Resume-state logic

Traced the resume-validator change adversarially across the edge cases:

  • degraded Step 8 + later steps completed → does not downgrade last_completed_step or re-run the side-effectful Step 12 (PR creation);
  • Step 8 failed with a gap after it, or with nothing after it → still re-runs from Step 8;
  • an ordinary (non-fallback) failed step → still the resume point.

New unit tests cover all three, plus the transient degraded-comment retry on resume.
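
The edge cases above can be expressed as a contiguous walk over saved step outputs. The sketch below is an assumption about the shape of the logic, not the orchestrator's actual validator: the function name, the `step_outputs` mapping, and the default recoverable set `{8}` are hypothetical, while the `FAILED:` sentinel and Step 8 come from this PR.

```python
from typing import Dict, FrozenSet

FAILED_PREFIX = "FAILED:"


def last_completed_step(step_outputs: Dict[int, str],
                        recoverable_steps: FrozenSet[int] = frozenset({8})) -> int:
    """Walk steps from 1 upward and return the last step that counts as done."""
    last_completed = 0
    step = 1
    while step in step_outputs:  # a missing step (gap) ends the walk
        if step_outputs[step].startswith(FAILED_PREFIX):
            if step not in recoverable_steps:
                break  # ordinary failed step: this is the resume point
            # Recoverable failure: keep walking, but do not count it as done.
            # It only becomes non-blocking if a later contiguous step completed.
        else:
            last_completed = step
        step += 1
    return last_completed
```

Resume would then start at `last_completed_step(...) + 1`. A failed Step 8 followed by completed steps 9 to 12 yields 12, so nothing re-runs; a failed Step 8 with nothing (or a gap) after it yields 7, so the run resumes from Step 8.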

CI note (pre-existing flake, not this PR)

Run Unit Tests failed only on tests/test_update_main.py::test_update_main_with_input_code_and_no_git. This is a parallel (-n auto) test-isolation flake: the assertion trips on a global open mock leaked from test_codex_subscription.py (its auth.json / device_code writes appear in the failing call list). The test passes in isolation (133/133 in its file), this PR touches none of those files, and it reproduces on main. Re-triggered the job.
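
For context on how a leaked open mock can be avoided, here is a generic sketch of scoping a patch with a context manager so that builtins.open is restored when the block exits. It is not taken from the repository's tests; `read_config` and the test function are hypothetical names used only for illustration.

```python
import builtins
from unittest import mock


def read_config(path):
    """Hypothetical helper that reads a file through the built-in open."""
    with open(path) as handle:
        return handle.read()


def test_scoped_open_mock():
    real_open = builtins.open
    # The patch is active only inside the with-block.
    with mock.patch("builtins.open", mock.mock_open(read_data="token")) as fake_open:
        assert read_config("auth.json") == "token"
        fake_open.assert_called_once_with("auth.json")
    # After the block, the real open is back, so later tests are unaffected.
    assert builtins.open is real_open
```

A patch applied this way cannot outlive the test that created it, which is the property that matters when tests run in parallel workers that each execute many test files.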

pdd-bot and others added 4 commits June 18, 2026 18:13
A recoverable soft-failure on a step that hands downstream work a deterministic fallback (test strategy -> fallback test plan) is persisted with the FAILED: sentinel, but the workflow legitimately continues and produces valid later results. The resume validator treated that sentinel as a gap, downgrading last_completed_step and rerunning already-completed, side-effectful steps (test generation through PR creation) on resume.

Gate the contiguous walk to the recoverable-fallback step set so such a step is non-blocking once a later step has completed, while ordinary failed steps (e.g. triage) and end-of-run failures still rerun. Add a helper unit test and a resume regression test (proven to fail on the pre-fix walk), and document the semantics in the prompt so regeneration preserves them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When a recoverable step's degraded fallback comment post fails transiently, _maybe_post_step_comment records failed_pending=True intending a resume retry. But the on-entry backfill sweep skipped every FAILED: output before checking pending markers and only honored fallback_pending, so the pending comment was never re-posted. With recoverable-fallback steps no longer rerun on resume, that sweep is the comment's only retry path, and the required DEGRADED status could be lost permanently.

Have the sweep re-post a failed step whose entry is failed_pending and not failed_posted, via failure_mode=recoverable. Route the degraded recovery detail through a shared _step_failure_detail helper so the live post and the resume re-post can't drift. Adds a regression test proven to fail on the pre-fix sweep.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
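
The sweep rule described in the commit above can be sketched as a filter over persisted state. This is a hedged illustration: the function name and the layout of `step_outputs` and `comment_state` are assumptions, while `failed_pending`, `failed_posted`, and the `FAILED:` sentinel are the names used in the commit message.

```python
from typing import Dict, List

FAILED_PREFIX = "FAILED:"


def steps_needing_degraded_repost(step_outputs: Dict[int, str],
                                  comment_state: Dict[int, Dict[str, bool]]) -> List[int]:
    """Return failed steps whose degraded comment still has to be posted."""
    repost = []
    for step in sorted(step_outputs):
        if not step_outputs[step].startswith(FAILED_PREFIX):
            continue  # only failed steps carry a degraded comment
        entry = comment_state.get(step, {})
        # Check the pending marker instead of skipping every FAILED: output.
        if entry.get("failed_pending") and not entry.get("failed_posted"):
            repost.append(step)
    return repost
```

Each returned step would then be re-posted with failure_mode=recoverable, so the DEGRADED status is not lost when the first post fails transiently.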
@gltanaka

Contributor

Do not merge yet. The degraded/fatal reporting change is needed for #1641, but this branch still violates the prompt/code interface contract.

Required change:

  • pdd/prompts/agentic_common_python.prompt and architecture.json declare extract_step_report as a callable interface entry, but pdd/agentic_common.py currently exposes it only as extract_step_report = _extract_step_report at pdd/agentic_common.py:7395. The prompt signature verifier only accepts real FunctionDef/AsyncFunctionDef nodes, so the branch still fails architecture conformance with:
    Architecture conformance error for agentic_common_python.prompt: the prompt's <pdd-interface> declares function(s)/method(s) missing from the generated code: extract_step_report. Output: pdd/agentic_common.py.
    missing_symbols=['extract_step_report']

Fix by adding a real public wrapper with the declared signature, for example def extract_step_report(text: Optional[str]) -> Optional[str]: return _extract_step_report(text), or otherwise update the prompt/interface contract so the conformance tooling no longer declares it as a callable. After that, rerun the conformance/sync validation for agentic_common; targeted pytest currently passes, but this interface failure is a merge blocker.
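
To see why an assignment alias reads as a missing symbol, the sketch below uses Python's standard ast module to collect only FunctionDef/AsyncFunctionDef names, as the comment above says the verifier does. The helper names `declared_functions` and `missing_symbols` are hypothetical and are not the project's verifier; the body of `_extract_step_report` is a placeholder.

```python
import ast
from typing import List, Set

ALIAS_SOURCE = '''
def _extract_step_report(text):
    return text  # placeholder body

extract_step_report = _extract_step_report
'''

WRAPPER_SOURCE = '''
from typing import Optional

def _extract_step_report(text: Optional[str]) -> Optional[str]:
    return text  # placeholder body

def extract_step_report(text: Optional[str]) -> Optional[str]:
    return _extract_step_report(text)
'''


def declared_functions(source: str) -> Set[str]:
    """Names defined with def / async def; assignments are ignored."""
    tree = ast.parse(source)
    return {node.name for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}


def missing_symbols(source: str, declared: List[str]) -> List[str]:
    """Declared interface names that have no matching def in the source."""
    found = declared_functions(source)
    return [name for name in declared if name not in found]
```

With these helpers, `missing_symbols(ALIAS_SOURCE, ["extract_step_report"])` returns `['extract_step_report']`, while the wrapper version returns an empty list, because the alias is an ast.Assign node rather than a function definition.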

Serhan-Asad and others added 2 commits June 18, 2026 19:17
The prompt <pdd-interface> and architecture.json declare extract_step_report
as a callable, but the code exposed it only as an assignment alias
(extract_step_report = _extract_step_report). The conformance verifier
resolves declared symbols solely via FunctionDef/AsyncFunctionDef nodes, so
the alias read as a missing symbol and failed architecture conformance once
this branch's prompt edit re-triggered the agentic_common check.

Replace the alias with a behavior-identical public wrapper def and reword the
prompt prose so a future sync regenerates a real def, not an alias. Public
API and return values are unchanged; four orchestrators import the name.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Serhan-Asad

Collaborator

Fixed (commit 11dcf04e1).

extract_step_report is now a real def wrapper that returns _extract_step_report(text), instead of the extract_step_report = _extract_step_report assignment alias. The <pdd-interface>/architecture.json conformance verifier resolves declared symbols only via FunctionDef/AsyncFunctionDef nodes, so the alias read as a missing symbol; a real def resolves it. Public name, signature, and return contract are unchanged — the four orchestrators that import it are unaffected.

Verification:

  • Reproduced your exact error by calling the real verifier (_verify_architecture_conformance) on the pre-fix code → missing_symbols=['extract_step_report']. After the fix it returns clean for both agentic_common_python.prompt and agentic_bug_orchestrator_python.prompt.
  • Reworded the prompt prose from "Public alias" → "wrapper function (a real def, not an assignment alias)" so a future pdd sync regenerates the def rather than reintroducing the alias.
  • Touched-module suites stay green (821 passed / 1 skipped on the current head, including main's freshly-merged routing-policy changes).

gltanaka merged commit d77af7e into main on Jun 19, 2026
9 checks passed
gltanaka deleted the change/issue-1641 branch on June 19, 2026 06:16

Development

Successfully merging this pull request may close these issues.

pdd-bug: report recoverable step failures as degraded instead of fatal

3 participants