fix(checkup): honor out-of-scope/non-blocking flags in Step 7 targeted-PR gate (#1574) by DianaTao · Pull Request #1575 · promptdriven/pdd

DianaTao · 2026-06-13T09:06:57Z

Problem

In PR mode the Step 7 gate (_step7_passed → _step7_unfixed_critical_blocks_targeted_pr) failed a PR on an unfixed critical finding the verifier itself tagged out-of-scope / non-blocking, whenever the finding lacked a free-text *_reason field. Each exclusion branch was gated on ... and reason (_step7_nonblocking_reason(issue) non-empty), but the Step 7 verify prompt emits blocking: false / scope: out-of-scope without a separate reason — so the guard never fired.

Real-world impact: a README-only PR was failed by a pre-existing, repo-wide TS18003: frontend/src/ missing TypeScript build break it neither introduced nor can fix. Step 7's own verdict said success: true, issue_aligned: true, "PR-scoped verification: all clear", with the finding tagged scope: out-of-scope, blocking: false — yet the gate returned Step 7 reported unfixed critical issues and the Final Gate reported layer1_status: failed.

Fix

Treat an explicit verifier signal as authoritative on its own — blocking is False, in_scope is False, or a non-blocking scope value each independently mark an unfixed critical as non-blocking. The free-text reason is corroboration, never a precondition.

The gate is also broadened from targeted-only to all PR modes: _step7_unfixed_critical_blocks_targeted_pr is renamed to _step7_unfixed_critical_blocks_pr and _step7_passed now applies it whenever pr_mode is set (no longer just pr_test_scope == "targeted"). The final Step 7 verdict owns the PR-introduced-vs-baseline distinction even when a full suite ran locally, so an explicitly non-blocking critical is informational in both modes; everything else still blocks.

Reviewer follow-ups addressed

[P2] Generic reason no longer bypasses in-scope criticals. _step7_nonblocking_reason previously matched a generic reason key, so an in-scope PR critical carrying ordinary explanatory text (e.g. scope: "pr-diff" + reason: "introduced token leak") could slip through. The helper now honors only explicitly non-blocking reason fields (non_blocking_reason / out_of_scope_reason / scope_reason); a generic reason cannot bypass the gate. Covered by a regression test verified to fail pre-fix.
[P3] Restored the runnable orchestrator example. context/agentic_checkup_orchestrator_example.py had been truncated mid-main() and no longer invoked the orchestrator. It now patches the per-step LLM runner, calls run_agentic_checkup_orchestrator(...) offline (use_github_state=False, no_fix=True) against the mock repo, and restores the if __name__ == "__main__": main() guard. Verified to run end-to-end (exit 0).
[Security] No clear-text logging in the example. CodeQL flagged the example for printing the orchestrator's returned message / cost / model as clear text (the return tuple is taint-tracked; message can echo repo output). Those values are no longer echoed — the example prints only a static pass/fail summary derived from the boolean result, with a comment pointing to routing real values through a logging/redaction layer.

Behavior preserved

A genuinely in-scope / blocking unfixed critical still fails the gate (targeted and full scope).
A critical with no non-blocking signal (no flag, scope, or explicit non-blocking reason) still blocks.

Tests

Gate regression coverage in tests/test_agentic_checkup_orchestrator.py and tests/test_checkup_pr_mode.py:

out-of-scope critical with flags only (no *_reason) → PR passes (targeted and full scope)
blocking: false flag alone → PR passes
in-scope blocking: true critical → still fails
in-scope critical with only a generic reason → still fails (P2 regression)
changed-file critical → still fails

The gate and test_checkup_pr_mode.py suites pass, and the exact staging reproduction from #1574 now returns (True, "").

🤖 Generated with Claude Code

…7 targeted-PR gate (#1574) `_step7_unfixed_critical_blocks_targeted_pr` only excluded an unfixed critical finding when it carried a free-text `*_reason` field (`non_blocking_reason` / `out_of_scope_reason` / `scope_reason` / `reason`) IN ADDITION to the `blocking: false` / `scope: out-of-scope` flags. The Step 7 verify prompt emits those flags without a separate reason, so `_step7_nonblocking_reason` returned "" and the `... and reason` guard let pre-existing, out-of-scope criticals fail clean targeted PRs — e.g. a README-only PR failed by a pre-existing `TS18003: frontend/src/ missing` build break it neither introduced nor can fix. Make an explicit verifier signal sufficient on its own: `blocking is False`, `in_scope is False`, or a non-blocking `scope` value. The structured reason is kept as an additional corroborating signal, never a precondition. Behavior is unchanged for genuinely in-scope/blocking criticals (still block) and for full-scope runs (the exclusion stays targeted-mode only). Adds regression tests for: out-of-scope+flags-only critical passes a targeted PR, `blocking:false` alone passes, in-scope blocking critical still fails, and an out-of-scope critical still fails a full-scope run. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

PDD-Auto-Heal-Checkpoint: success

…hestrator" This reverts commit 2b448e0.

PDD-Auto-Heal-Checkpoint: success

DianaTao · 2026-06-18T18:05:25Z

Jun 18

…#1574 review) P2: the Step 7 gate over-corrected #1574. `_step7_nonblocking_reason` matched a generic `reason` key, and `_step7_unfixed_critical_blocks_pr` treated any such reason as a standalone non-blocking signal. An in-scope PR critical (e.g. scope="pr-diff", in_scope=true) carrying ordinary explanatory text like reason="introduced token leak" would then bypass the critical gate. Narrow the helper to only explicitly non-blocking reason fields (non_blocking_reason / out_of_scope_reason / scope_reason); a generic `reason` is no longer a bypass. Adds a regression test. P3: restore the truncated executable tail of the orchestrator example — patch the per-step LLM runner, call run_agentic_checkup_orchestrator offline (use_github_state=False, no_fix=True), print the result, and add the `if __name__ == "__main__": main()` guard so the example runs again. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

CodeQL flagged the example's `print(f"...{message/cost/model}")` calls as clear-text logging of potentially sensitive information: the orchestrator return tuple is taint-tracked, and `message` in particular can echo repo output or other run context. Stop echoing those values — unpack them as unused (`_message`/`_cost`/`_model`) and print only a static pass/fail summary derived from the boolean result. A comment documents routing the real values through a logging/redaction layer instead of stdout. Verified the example still runs end-to-end (exit 0). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gltanaka · 2026-06-18T19:34:12Z

Required changes before merge.

The #1574 fix is needed, but this PR is currently broader than the issue requires and creates a fail-open path in the Step 7 gate.

Keep the non-blocking critical carveout scoped to the contract that justifies it, or add stricter corroboration.

The issue is specifically targeted PR mode: a README-only targeted run was blocked by a baseline critical that Step 7 marked scope: "out-of-scope" and blocking: false. This PR removes the pr_test_scope distinction and makes blocking is False alone authoritative for every PR mode.

That means this now passes even when the critical is in the PR diff and there is no out-of-scope signal:

issue = {
    "module": "auth",
    "file": "auth.py",
    "severity": "critical",
    "fixed": False,
    "blocking": False,
    "description": "PR-introduced token leak",
}
payload = {"success": True, "issue_aligned": True, "issues": [issue], "changed_files": ["auth.py"]}
_step7_passed("report\n" + json.dumps(payload), pr_mode=True, has_issue=True, pr_test_scope="targeted")
# current PR: (True, "")

The Step 7 verify prompt still says full PR mode is the comprehensive gate, and only targeted PR scope gets the explicit unrelated baseline/project-wide carveout. Please either:

keep the Checkup Step 7 gate blocks targeted PRs on out-of-scope/non-blocking criticals when no free-text reason is present #1574 behavior targeted-only, or
require the bypass to be corroborated by an out-of-scope signal (scope / verification_scope / pr_scope, in_scope: false, or an explicit non-blocking reason) and fail closed when the finding's file/module touches changed_files or the scope is PR/in-scope.

Please add a regression for a changed-file critical with blocking: false but no out-of-scope signal; it should still block.

The restored example is not actually offline.

python context/agentic_checkup_orchestrator_example.py exits 0, but it still attempts GitHub issue comment calls despite use_github_state=False:

Warning: Failed to post step comment ... Could not resolve to a Repository with the name 'example-owner/example-repo'

If the example is meant to be offline, it should patch/suppress the comment-posting helpers as well (post_step_comment_once, post_step_comment, and any PR comment helper reachable in the selected mode), or otherwise avoid fake GitHub issue metadata that triggers remote calls.

CI and the focused tests are green, but these two points should be fixed before this merges.

…1574 review) The broadened gate let `blocking: false` alone wave through an unfixed critical in any PR mode — even when the finding is in the PR diff with no out-of-scope signal (e.g. a PR-introduced token leak in a changed file). That is a fail-open path. Tighten `_step7_unfixed_critical_blocks_pr`: an unfixed critical is now waved through only when the verifier marks it non-blocking AND it is demonstrably out of PR scope. A PR-scoped critical — tagged with a PR/in-scope `scope`, or whose file/module overlaps `changed_files` (new `_step7_finding_in_pr_diff` helper, ground-truth from the verifier's own changed_files) — always blocks, regardless of any non-blocking flag. The #1574 carveout (pre-existing, out-of-scope baseline criticals must not fail a clean PR) is preserved. Adds regressions: a changed-file critical with `blocking: false` and no out-of-scope signal still blocks, in both targeted and full PR scope. Also make the orchestrator example genuinely offline: stub the comment-posting helpers (post_step_comment_once / post_step_comment / post_pr_comment) and the steer-cursor seed so the run no longer attempts GitHub API calls against the fake repo (previously emitted "Failed to post step comment" / "Could not resolve to a Repository" warnings). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…1574 review) Two review follow-ups on the Step 7 PR gate: - `_step7_finding_in_pr_diff` matched the coarse `module` label against changed files with bidirectional directory-prefix containment, so any changed file living *under* a finding's module dir (e.g. a doc edit `frontend/README.md` for `module: "frontend"`) re-blocked a pre-existing, out-of-scope baseline critical the PR neither introduced nor can fix. The `module` label is now matched only on an exact changed-path hit; the precise `file` field keeps exact + directory-prefix matching as the fail-closed overlap signal. Regression test verified to fail pre-fix. - Restore the trailing newline on the orchestrator example. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…scope signal (#1574 review) Address the pre-merge review: the gate was broader than #1574 needs and had a fail-open path. - Scope the non-blocking-critical carveout to **targeted** PR mode only (restore the `pr_test_scope == "targeted"` gate; rename the helper back to `_step7_unfixed_critical_blocks_targeted_pr`). Full PR mode is the comprehensive gate again — every unfixed critical blocks — matching the Step 7 prompt contract. - Require a **positive out-of-scope signal** to wave a targeted critical through: a non-blocking `scope`/`verification_scope`/`pr_scope` value, `in_scope: false`, or an explicit non-blocking `*_reason`. A bare `blocking: false` is no longer authoritative on its own — it is a self-reported severity downgrade, not a scope claim, so trusting it alone failed open on PR-introduced criticals. The fail-closed diff/PR-scope checks are unchanged. - Update the orchestrator prompt to state the targeted-only + positive-signal contract. Tests: invert the now-blocking cases (targeted `blocking: false`-alone, full-mode out-of-scope) and keep the fail-closed regressions (changed-file `blocking: false` still blocks). The real #1574 case (out-of-scope scope on a targeted run) still passes. 235 orchestrator + PR-mode tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gltanaka

Requesting one more change before merge: please add true end-to-end verification evidence for the updated gate.

The unit/regression tests cover the helper behavior well, and the mocked offline example now runs cleanly, but this PR changes the top-level pdd checkup --pr pass/fail gate. The important risk is in the full runtime path: PR metadata/worktree setup, Step 7 output generation/parsing, _step7_passed, final gate reporting, and push/no-push behavior.

Please run and document a real E2E/staging check using the actual CLI against a PR-shaped repo, not only a mocked orchestrator example. Minimum useful evidence:

command used, with secrets redacted
PR/test scope used (--test-scope targeted for the #1574 case)
Step 7 JSON showing an unfixed critical marked out-of-scope/non-blocking without a *_reason
final gate/result showing the targeted #1574 case passes
a negative E2E or CLI-level check showing a PR-scoped/changed-file critical still blocks, or clear evidence that the new regression test covers this exact final gate path

The original failure was observed in staging, so a rerun of that reproduction or an equivalent staging PR would be the best evidence.

…out-of-scope-critical-gate # Conflicts: # architecture.json

DianaTao and others added 2 commits June 13, 2026 02:06

chore: auto-heal prompt/example drift for agentic_checkup_orchestrator

2b448e0

PDD-Auto-Heal-Checkpoint: success

github-advanced-security AI found potential problems Jun 13, 2026

View reviewed changes

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

DianaTao and others added 2 commits June 13, 2026 02:23

Revert "chore: auto-heal prompt/example drift for agentic_checkup_orc…

c70c77b

…hestrator" This reverts commit 2b448e0.

chore: auto-heal prompt/example drift for agentic_checkup_orchestrator

9821322

PDD-Auto-Heal-Checkpoint: success

github-advanced-security AI found potential problems Jun 13, 2026

View reviewed changes

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

This was referenced Jun 13, 2026

checkup example logs the orchestrator return in clear text (CodeQL: clear-text logging of sensitive information) #1576

Open

fix(checkup): stop clear-text logging of orchestrator return in example (#1576) #1577

Open

DianaTao and others added 5 commits June 13, 2026 10:58

Fix checkup Step 7 no-reason baseline gate

ee351d6

chore: auto-heal prompt/example drift for agentic_checkup_orchestrator

1046cc4

PDD-Auto-Heal-Checkpoint: success

Document Step 7 nonblocking critical gate

37b2f8c

Honor nonblocking Step 7 criticals in PR mode

86f3ec5

Merge branch 'main' into fix/issue-1574-step7-out-of-scope-critical-gate

410de3c

DianaTao self-assigned this Jun 15, 2026

DianaTao and others added 2 commits June 15, 2026 11:45

Fix Step 7 full-scope nonblocking critical test

42ed615

Merge branch 'main' into fix/issue-1574-step7-out-of-scope-critical-gate

6767042

github-advanced-security AI found potential problems Jun 18, 2026

View reviewed changes

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

Comment thread context/agentic_checkup_orchestrator_example.py Fixed

DianaTao and others added 3 commits June 18, 2026 14:09

gltanaka requested changes Jun 19, 2026

View reviewed changes

DianaTao added 2 commits June 18, 2026 19:16

Invalidate PR checkup cache when test scope changes

d845926

Merge remote-tracking branch 'origin/main' into fix/issue-1574-step7-…

2cc9b70

…out-of-scope-critical-gate # Conflicts: # architecture.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(checkup): honor out-of-scope/non-blocking flags in Step 7 targeted-PR gate (#1574)#1575

fix(checkup): honor out-of-scope/non-blocking flags in Step 7 targeted-PR gate (#1574)#1575
DianaTao wants to merge 18 commits into
mainfrom
fix/issue-1574-step7-out-of-scope-critical-gate

DianaTao commented Jun 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DianaTao commented Jun 18, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gltanaka commented Jun 18, 2026

Uh oh!

gltanaka left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

DianaTao commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Reviewer follow-ups addressed

Behavior preserved

Tests

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DianaTao commented Jun 18, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gltanaka commented Jun 18, 2026

Uh oh!

gltanaka left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

DianaTao commented Jun 13, 2026 •

edited

Loading