fix(checkup): honor out-of-scope/non-blocking flags in Step 7 targeted-PR gate (#1574)#1575
fix(checkup): honor out-of-scope/non-blocking flags in Step 7 targeted-PR gate (#1574)#1575DianaTao wants to merge 18 commits into
Conversation
…7 targeted-PR gate (#1574) `_step7_unfixed_critical_blocks_targeted_pr` only excluded an unfixed critical finding when it carried a free-text `*_reason` field (`non_blocking_reason` / `out_of_scope_reason` / `scope_reason` / `reason`) IN ADDITION to the `blocking: false` / `scope: out-of-scope` flags. The Step 7 verify prompt emits those flags without a separate reason, so `_step7_nonblocking_reason` returned "" and the `... and reason` guard let pre-existing, out-of-scope criticals fail clean targeted PRs — e.g. a README-only PR failed by a pre-existing `TS18003: frontend/src/ missing` build break it neither introduced nor can fix. Make an explicit verifier signal sufficient on its own: `blocking is False`, `in_scope is False`, or a non-blocking `scope` value. The structured reason is kept as an additional corroborating signal, never a precondition. Behavior is unchanged for genuinely in-scope/blocking criticals (still block) and for full-scope runs (the exclusion stays targeted-mode only). Adds regression tests for: out-of-scope+flags-only critical passes a targeted PR, `blocking:false` alone passes, in-scope blocking critical still fails, and an out-of-scope critical still fails a full-scope run. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PDD-Auto-Heal-Checkpoint: success
…hestrator" This reverts commit 2b448e0.
PDD-Auto-Heal-Checkpoint: success
PDD-Auto-Heal-Checkpoint: success
|
Jun 18 |
…#1574 review) P2: the Step 7 gate over-corrected #1574. `_step7_nonblocking_reason` matched a generic `reason` key, and `_step7_unfixed_critical_blocks_pr` treated any such reason as a standalone non-blocking signal. An in-scope PR critical (e.g. scope="pr-diff", in_scope=true) carrying ordinary explanatory text like reason="introduced token leak" would then bypass the critical gate. Narrow the helper to only explicitly non-blocking reason fields (non_blocking_reason / out_of_scope_reason / scope_reason); a generic `reason` is no longer a bypass. Adds a regression test. P3: restore the truncated executable tail of the orchestrator example — patch the per-step LLM runner, call run_agentic_checkup_orchestrator offline (use_github_state=False, no_fix=True), print the result, and add the `if __name__ == "__main__": main()` guard so the example runs again. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CodeQL flagged the example's `print(f"...{message/cost/model}")` calls as
clear-text logging of potentially sensitive information: the orchestrator
return tuple is taint-tracked, and `message` in particular can echo repo
output or other run context. Stop echoing those values — unpack them as
unused (`_message`/`_cost`/`_model`) and print only a static pass/fail
summary derived from the boolean result. A comment documents routing the
real values through a logging/redaction layer instead of stdout.
Verified the example still runs end-to-end (exit 0).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Required changes before merge. The #1574 fix is needed, but this PR is currently broader than the issue requires and creates a fail-open path in the Step 7 gate.
The issue is specifically targeted PR mode: a README-only targeted run was blocked by a baseline critical that Step 7 marked That means this now passes even when the critical is in the PR diff and there is no out-of-scope signal: issue = {
"module": "auth",
"file": "auth.py",
"severity": "critical",
"fixed": False,
"blocking": False,
"description": "PR-introduced token leak",
}
payload = {"success": True, "issue_aligned": True, "issues": [issue], "changed_files": ["auth.py"]}
_step7_passed("report\n" + json.dumps(payload), pr_mode=True, has_issue=True, pr_test_scope="targeted")
# current PR: (True, "")The Step 7 verify prompt still says full PR mode is the comprehensive gate, and only targeted PR scope gets the explicit unrelated baseline/project-wide carveout. Please either:
Please add a regression for a changed-file critical with
If the example is meant to be offline, it should patch/suppress the comment-posting helpers as well ( CI and the focused tests are green, but these two points should be fixed before this merges. |
…1574 review) The broadened gate let `blocking: false` alone wave through an unfixed critical in any PR mode — even when the finding is in the PR diff with no out-of-scope signal (e.g. a PR-introduced token leak in a changed file). That is a fail-open path. Tighten `_step7_unfixed_critical_blocks_pr`: an unfixed critical is now waved through only when the verifier marks it non-blocking AND it is demonstrably out of PR scope. A PR-scoped critical — tagged with a PR/in-scope `scope`, or whose file/module overlaps `changed_files` (new `_step7_finding_in_pr_diff` helper, ground-truth from the verifier's own changed_files) — always blocks, regardless of any non-blocking flag. The #1574 carveout (pre-existing, out-of-scope baseline criticals must not fail a clean PR) is preserved. Adds regressions: a changed-file critical with `blocking: false` and no out-of-scope signal still blocks, in both targeted and full PR scope. Also make the orchestrator example genuinely offline: stub the comment-posting helpers (post_step_comment_once / post_step_comment / post_pr_comment) and the steer-cursor seed so the run no longer attempts GitHub API calls against the fake repo (previously emitted "Failed to post step comment" / "Could not resolve to a Repository" warnings). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1574 review) Two review follow-ups on the Step 7 PR gate: - `_step7_finding_in_pr_diff` matched the coarse `module` label against changed files with bidirectional directory-prefix containment, so any changed file living *under* a finding's module dir (e.g. a doc edit `frontend/README.md` for `module: "frontend"`) re-blocked a pre-existing, out-of-scope baseline critical the PR neither introduced nor can fix. The `module` label is now matched only on an exact changed-path hit; the precise `file` field keeps exact + directory-prefix matching as the fail-closed overlap signal. Regression test verified to fail pre-fix. - Restore the trailing newline on the orchestrator example. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…scope signal (#1574 review) Address the pre-merge review: the gate was broader than #1574 needs and had a fail-open path. - Scope the non-blocking-critical carveout to **targeted** PR mode only (restore the `pr_test_scope == "targeted"` gate; rename the helper back to `_step7_unfixed_critical_blocks_targeted_pr`). Full PR mode is the comprehensive gate again — every unfixed critical blocks — matching the Step 7 prompt contract. - Require a **positive out-of-scope signal** to wave a targeted critical through: a non-blocking `scope`/`verification_scope`/`pr_scope` value, `in_scope: false`, or an explicit non-blocking `*_reason`. A bare `blocking: false` is no longer authoritative on its own — it is a self-reported severity downgrade, not a scope claim, so trusting it alone failed open on PR-introduced criticals. The fail-closed diff/PR-scope checks are unchanged. - Update the orchestrator prompt to state the targeted-only + positive-signal contract. Tests: invert the now-blocking cases (targeted `blocking: false`-alone, full-mode out-of-scope) and keep the fail-closed regressions (changed-file `blocking: false` still blocks). The real #1574 case (out-of-scope scope on a targeted run) still passes. 235 orchestrator + PR-mode tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
gltanaka
left a comment
There was a problem hiding this comment.
Requesting one more change before merge: please add true end-to-end verification evidence for the updated gate.
The unit/regression tests cover the helper behavior well, and the mocked offline example now runs cleanly, but this PR changes the top-level pdd checkup --pr pass/fail gate. The important risk is in the full runtime path: PR metadata/worktree setup, Step 7 output generation/parsing, _step7_passed, final gate reporting, and push/no-push behavior.
Please run and document a real E2E/staging check using the actual CLI against a PR-shaped repo, not only a mocked orchestrator example. Minimum useful evidence:
- command used, with secrets redacted
- PR/test scope used (
--test-scope targetedfor the #1574 case) - Step 7 JSON showing an unfixed critical marked out-of-scope/non-blocking without a
*_reason - final gate/result showing the targeted #1574 case passes
- a negative E2E or CLI-level check showing a PR-scoped/changed-file critical still blocks, or clear evidence that the new regression test covers this exact final gate path
The original failure was observed in staging, so a rerun of that reproduction or an equivalent staging PR would be the best evidence.
…out-of-scope-critical-gate # Conflicts: # architecture.json
Fixes #1574.
Problem
In PR mode the Step 7 gate (
_step7_passed→_step7_unfixed_critical_blocks_targeted_pr) failed a PR on an unfixedcriticalfinding the verifier itself tagged out-of-scope / non-blocking, whenever the finding lacked a free-text*_reasonfield. Each exclusion branch was gated on... and reason(_step7_nonblocking_reason(issue)non-empty), but the Step 7 verify prompt emitsblocking: false/scope: out-of-scopewithout a separate reason — so the guard never fired.Real-world impact: a README-only PR was failed by a pre-existing, repo-wide
TS18003: frontend/src/ missingTypeScript build break it neither introduced nor can fix. Step 7's own verdict saidsuccess: true,issue_aligned: true, "PR-scoped verification: all clear", with the finding taggedscope: out-of-scope,blocking: false— yet the gate returnedStep 7 reported unfixed critical issuesand the Final Gate reportedlayer1_status: failed.Fix
Treat an explicit verifier signal as authoritative on its own —
blocking is False,in_scope is False, or a non-blockingscopevalue each independently mark an unfixed critical as non-blocking. The free-text reason is corroboration, never a precondition.The gate is also broadened from targeted-only to all PR modes:
_step7_unfixed_critical_blocks_targeted_pris renamed to_step7_unfixed_critical_blocks_prand_step7_passednow applies it wheneverpr_modeis set (no longer justpr_test_scope == "targeted"). The final Step 7 verdict owns the PR-introduced-vs-baseline distinction even when a full suite ran locally, so an explicitly non-blocking critical is informational in both modes; everything else still blocks.Reviewer follow-ups addressed
reasonno longer bypasses in-scope criticals._step7_nonblocking_reasonpreviously matched a genericreasonkey, so an in-scope PR critical carrying ordinary explanatory text (e.g.scope: "pr-diff"+reason: "introduced token leak") could slip through. The helper now honors only explicitly non-blocking reason fields (non_blocking_reason/out_of_scope_reason/scope_reason); a genericreasoncannot bypass the gate. Covered by a regression test verified to fail pre-fix.context/agentic_checkup_orchestrator_example.pyhad been truncated mid-main()and no longer invoked the orchestrator. It now patches the per-step LLM runner, callsrun_agentic_checkup_orchestrator(...)offline (use_github_state=False,no_fix=True) against the mock repo, and restores theif __name__ == "__main__": main()guard. Verified to run end-to-end (exit 0).message/cost/modelas clear text (the return tuple is taint-tracked;messagecan echo repo output). Those values are no longer echoed — the example prints only a static pass/fail summary derived from the boolean result, with a comment pointing to routing real values through a logging/redaction layer.Behavior preserved
Tests
Gate regression coverage in
tests/test_agentic_checkup_orchestrator.pyandtests/test_checkup_pr_mode.py:*_reason) → PR passes (targeted and full scope)blocking: falseflag alone → PR passesblocking: truecritical → still failsreason→ still fails (P2 regression)The gate and
test_checkup_pr_mode.pysuites pass, and the exact staging reproduction from #1574 now returns(True, "").🤖 Generated with Claude Code