Skip to content

fix(checkup): honor out-of-scope/non-blocking flags in Step 7 targeted-PR gate (#1574)#1575

Open
DianaTao wants to merge 18 commits into
mainfrom
fix/issue-1574-step7-out-of-scope-critical-gate
Open

fix(checkup): honor out-of-scope/non-blocking flags in Step 7 targeted-PR gate (#1574)#1575
DianaTao wants to merge 18 commits into
mainfrom
fix/issue-1574-step7-out-of-scope-critical-gate

Conversation

@DianaTao

@DianaTao DianaTao commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Fixes #1574.

Problem

In PR mode the Step 7 gate (_step7_passed_step7_unfixed_critical_blocks_targeted_pr) failed a PR on an unfixed critical finding the verifier itself tagged out-of-scope / non-blocking, whenever the finding lacked a free-text *_reason field. Each exclusion branch was gated on ... and reason (_step7_nonblocking_reason(issue) non-empty), but the Step 7 verify prompt emits blocking: false / scope: out-of-scope without a separate reason — so the guard never fired.

Real-world impact: a README-only PR was failed by a pre-existing, repo-wide TS18003: frontend/src/ missing TypeScript build break it neither introduced nor can fix. Step 7's own verdict said success: true, issue_aligned: true, "PR-scoped verification: all clear", with the finding tagged scope: out-of-scope, blocking: false — yet the gate returned Step 7 reported unfixed critical issues and the Final Gate reported layer1_status: failed.

Fix

Treat an explicit verifier signal as authoritative on its ownblocking is False, in_scope is False, or a non-blocking scope value each independently mark an unfixed critical as non-blocking. The free-text reason is corroboration, never a precondition.

The gate is also broadened from targeted-only to all PR modes: _step7_unfixed_critical_blocks_targeted_pr is renamed to _step7_unfixed_critical_blocks_pr and _step7_passed now applies it whenever pr_mode is set (no longer just pr_test_scope == "targeted"). The final Step 7 verdict owns the PR-introduced-vs-baseline distinction even when a full suite ran locally, so an explicitly non-blocking critical is informational in both modes; everything else still blocks.

Reviewer follow-ups addressed

  • [P2] Generic reason no longer bypasses in-scope criticals. _step7_nonblocking_reason previously matched a generic reason key, so an in-scope PR critical carrying ordinary explanatory text (e.g. scope: "pr-diff" + reason: "introduced token leak") could slip through. The helper now honors only explicitly non-blocking reason fields (non_blocking_reason / out_of_scope_reason / scope_reason); a generic reason cannot bypass the gate. Covered by a regression test verified to fail pre-fix.
  • [P3] Restored the runnable orchestrator example. context/agentic_checkup_orchestrator_example.py had been truncated mid-main() and no longer invoked the orchestrator. It now patches the per-step LLM runner, calls run_agentic_checkup_orchestrator(...) offline (use_github_state=False, no_fix=True) against the mock repo, and restores the if __name__ == "__main__": main() guard. Verified to run end-to-end (exit 0).
  • [Security] No clear-text logging in the example. CodeQL flagged the example for printing the orchestrator's returned message / cost / model as clear text (the return tuple is taint-tracked; message can echo repo output). Those values are no longer echoed — the example prints only a static pass/fail summary derived from the boolean result, with a comment pointing to routing real values through a logging/redaction layer.

Behavior preserved

  • A genuinely in-scope / blocking unfixed critical still fails the gate (targeted and full scope).
  • A critical with no non-blocking signal (no flag, scope, or explicit non-blocking reason) still blocks.

Tests

Gate regression coverage in tests/test_agentic_checkup_orchestrator.py and tests/test_checkup_pr_mode.py:

  • out-of-scope critical with flags only (no *_reason) → PR passes (targeted and full scope)
  • blocking: false flag alone → PR passes
  • in-scope blocking: true critical → still fails
  • in-scope critical with only a generic reason → still fails (P2 regression)
  • changed-file critical → still fails

The gate and test_checkup_pr_mode.py suites pass, and the exact staging reproduction from #1574 now returns (True, "").

🤖 Generated with Claude Code

DianaTao and others added 2 commits June 13, 2026 02:06
…7 targeted-PR gate (#1574)

`_step7_unfixed_critical_blocks_targeted_pr` only excluded an unfixed critical
finding when it carried a free-text `*_reason` field (`non_blocking_reason` /
`out_of_scope_reason` / `scope_reason` / `reason`) IN ADDITION to the
`blocking: false` / `scope: out-of-scope` flags. The Step 7 verify prompt emits
those flags without a separate reason, so `_step7_nonblocking_reason` returned ""
and the `... and reason` guard let pre-existing, out-of-scope criticals fail
clean targeted PRs — e.g. a README-only PR failed by a pre-existing
`TS18003: frontend/src/ missing` build break it neither introduced nor can fix.

Make an explicit verifier signal sufficient on its own: `blocking is False`,
`in_scope is False`, or a non-blocking `scope` value. The structured reason is
kept as an additional corroborating signal, never a precondition. Behavior is
unchanged for genuinely in-scope/blocking criticals (still block) and for
full-scope runs (the exclusion stays targeted-mode only).

Adds regression tests for: out-of-scope+flags-only critical passes a targeted PR,
`blocking:false` alone passes, in-scope blocking critical still fails, and an
out-of-scope critical still fails a full-scope run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
@DianaTao DianaTao self-assigned this Jun 15, 2026
@DianaTao

Copy link
Copy Markdown
Collaborator Author

Jun 18

…#1574 review)

P2: the Step 7 gate over-corrected #1574. `_step7_nonblocking_reason`
matched a generic `reason` key, and `_step7_unfixed_critical_blocks_pr`
treated any such reason as a standalone non-blocking signal. An in-scope
PR critical (e.g. scope="pr-diff", in_scope=true) carrying ordinary
explanatory text like reason="introduced token leak" would then bypass
the critical gate. Narrow the helper to only explicitly non-blocking
reason fields (non_blocking_reason / out_of_scope_reason / scope_reason);
a generic `reason` is no longer a bypass. Adds a regression test.

P3: restore the truncated executable tail of the orchestrator example —
patch the per-step LLM runner, call run_agentic_checkup_orchestrator
offline (use_github_state=False, no_fix=True), print the result, and add
the `if __name__ == "__main__": main()` guard so the example runs again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
Comment thread context/agentic_checkup_orchestrator_example.py Fixed
CodeQL flagged the example's `print(f"...{message/cost/model}")` calls as
clear-text logging of potentially sensitive information: the orchestrator
return tuple is taint-tracked, and `message` in particular can echo repo
output or other run context. Stop echoing those values — unpack them as
unused (`_message`/`_cost`/`_model`) and print only a static pass/fail
summary derived from the boolean result. A comment documents routing the
real values through a logging/redaction layer instead of stdout.

Verified the example still runs end-to-end (exit 0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gltanaka

Copy link
Copy Markdown
Contributor

Required changes before merge.

The #1574 fix is needed, but this PR is currently broader than the issue requires and creates a fail-open path in the Step 7 gate.

  1. Keep the non-blocking critical carveout scoped to the contract that justifies it, or add stricter corroboration.

The issue is specifically targeted PR mode: a README-only targeted run was blocked by a baseline critical that Step 7 marked scope: "out-of-scope" and blocking: false. This PR removes the pr_test_scope distinction and makes blocking is False alone authoritative for every PR mode.

That means this now passes even when the critical is in the PR diff and there is no out-of-scope signal:

issue = {
    "module": "auth",
    "file": "auth.py",
    "severity": "critical",
    "fixed": False,
    "blocking": False,
    "description": "PR-introduced token leak",
}
payload = {"success": True, "issue_aligned": True, "issues": [issue], "changed_files": ["auth.py"]}
_step7_passed("report\n" + json.dumps(payload), pr_mode=True, has_issue=True, pr_test_scope="targeted")
# current PR: (True, "")

The Step 7 verify prompt still says full PR mode is the comprehensive gate, and only targeted PR scope gets the explicit unrelated baseline/project-wide carveout. Please either:

Please add a regression for a changed-file critical with blocking: false but no out-of-scope signal; it should still block.

  1. The restored example is not actually offline.

python context/agentic_checkup_orchestrator_example.py exits 0, but it still attempts GitHub issue comment calls despite use_github_state=False:

Warning: Failed to post step comment ... Could not resolve to a Repository with the name 'example-owner/example-repo'

If the example is meant to be offline, it should patch/suppress the comment-posting helpers as well (post_step_comment_once, post_step_comment, and any PR comment helper reachable in the selected mode), or otherwise avoid fake GitHub issue metadata that triggers remote calls.

CI and the focused tests are green, but these two points should be fixed before this merges.

DianaTao and others added 3 commits June 18, 2026 14:09
…1574 review)

The broadened gate let `blocking: false` alone wave through an unfixed
critical in any PR mode — even when the finding is in the PR diff with no
out-of-scope signal (e.g. a PR-introduced token leak in a changed file).
That is a fail-open path.

Tighten `_step7_unfixed_critical_blocks_pr`: an unfixed critical is now
waved through only when the verifier marks it non-blocking AND it is
demonstrably out of PR scope. A PR-scoped critical — tagged with a
PR/in-scope `scope`, or whose file/module overlaps `changed_files` (new
`_step7_finding_in_pr_diff` helper, ground-truth from the verifier's own
changed_files) — always blocks, regardless of any non-blocking flag. The
#1574 carveout (pre-existing, out-of-scope baseline criticals must not
fail a clean PR) is preserved.

Adds regressions: a changed-file critical with `blocking: false` and no
out-of-scope signal still blocks, in both targeted and full PR scope.

Also make the orchestrator example genuinely offline: stub the
comment-posting helpers (post_step_comment_once / post_step_comment /
post_pr_comment) and the steer-cursor seed so the run no longer attempts
GitHub API calls against the fake repo (previously emitted "Failed to
post step comment" / "Could not resolve to a Repository" warnings).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1574 review)

Two review follow-ups on the Step 7 PR gate:

- `_step7_finding_in_pr_diff` matched the coarse `module` label against
  changed files with bidirectional directory-prefix containment, so any
  changed file living *under* a finding's module dir (e.g. a doc edit
  `frontend/README.md` for `module: "frontend"`) re-blocked a pre-existing,
  out-of-scope baseline critical the PR neither introduced nor can fix. The
  `module` label is now matched only on an exact changed-path hit; the precise
  `file` field keeps exact + directory-prefix matching as the fail-closed
  overlap signal. Regression test verified to fail pre-fix.

- Restore the trailing newline on the orchestrator example.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…scope signal (#1574 review)

Address the pre-merge review: the gate was broader than #1574 needs and had a
fail-open path.

- Scope the non-blocking-critical carveout to **targeted** PR mode only
  (restore the `pr_test_scope == "targeted"` gate; rename the helper back to
  `_step7_unfixed_critical_blocks_targeted_pr`). Full PR mode is the
  comprehensive gate again — every unfixed critical blocks — matching the
  Step 7 prompt contract.

- Require a **positive out-of-scope signal** to wave a targeted critical
  through: a non-blocking `scope`/`verification_scope`/`pr_scope` value,
  `in_scope: false`, or an explicit non-blocking `*_reason`. A bare
  `blocking: false` is no longer authoritative on its own — it is a
  self-reported severity downgrade, not a scope claim, so trusting it alone
  failed open on PR-introduced criticals. The fail-closed diff/PR-scope checks
  are unchanged.

- Update the orchestrator prompt to state the targeted-only + positive-signal
  contract.

Tests: invert the now-blocking cases (targeted `blocking: false`-alone,
full-mode out-of-scope) and keep the fail-closed regressions (changed-file
`blocking: false` still blocks). The real #1574 case (out-of-scope scope on a
targeted run) still passes. 235 orchestrator + PR-mode tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gltanaka gltanaka left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting one more change before merge: please add true end-to-end verification evidence for the updated gate.

The unit/regression tests cover the helper behavior well, and the mocked offline example now runs cleanly, but this PR changes the top-level pdd checkup --pr pass/fail gate. The important risk is in the full runtime path: PR metadata/worktree setup, Step 7 output generation/parsing, _step7_passed, final gate reporting, and push/no-push behavior.

Please run and document a real E2E/staging check using the actual CLI against a PR-shaped repo, not only a mocked orchestrator example. Minimum useful evidence:

  • command used, with secrets redacted
  • PR/test scope used (--test-scope targeted for the #1574 case)
  • Step 7 JSON showing an unfixed critical marked out-of-scope/non-blocking without a *_reason
  • final gate/result showing the targeted #1574 case passes
  • a negative E2E or CLI-level check showing a PR-scoped/changed-file critical still blocks, or clear evidence that the new regression test covers this exact final gate path

The original failure was observed in staging, so a rerun of that reproduction or an equivalent staging PR would be the best evidence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Checkup Step 7 gate blocks targeted PRs on out-of-scope/non-blocking criticals when no free-text reason is present

3 participants