feat(adversarial-review): make self-collect path multi-turn so it can actually work by kingdoooo · Pull Request #328 · openai/codex-plugin-cc

kingdoooo · 2026-05-18T01:22:30Z

Summary

The self-collect path for /codex:adversarial-review has been broken since
it was introduced: a single turn pinned outputSchema, which forced every
agent emission to validate against the final-output schema, blocking the
shell investigation the prompt asked Codex to perform. Restructure the
path as a two-phase flow on the same Codex thread — a free-form
investigation phase with no schema, followed by a schema-enforced
finalization turn — so the contract is internally consistent and
large-diff reviews actually produce structured output. Codex's read-only
sandbox already permits the shell commands the investigation needs; no
new tool registration is required.

What changed

runAppServerInvestigation in lib/codex.mjs orchestrates a recon loop (no schema, free-form tool use) until convergence (finalAnswerSeen && commandCount === 0) or the turn cap (default 10), then runs a single finalize turn with the review schema set.
executeReviewRun branches on context.inputMode === "self-collect" to call the new orchestrator. Inline-diff payloads stay byte-identical.
New investigate/finalize prompt templates and matching builders.
New CLI flag --max-investigation-turns N with positive-integer validation.
Renderer prints Investigation truncated at N turns; findings may be shallow. Use --max-investigation-turns to raise the cap. on all three render paths (parse-error, validation-error, success) when the cap was hit.
Status normalization: runAppServerInvestigation always returns numeric status so payload.codex.status shape stays consistent with runAppServerTurn.
Queue-driven fake-codex harness and 22 new tests covering convergence, truncation, soft/hard error abort on phase 1, finalize-turn transport error, outputSchema discipline, banner placement, CLI flag propagation, end-to-end self-collect, inline regression guard.

Test plan

npm test — 109/109 passing
All new test cases run hermetically (each fixture creates and cleans up its own temp git repo)
Inline-diff path payload remains byte-identical (regression test added)
Self-collect e2e subprocess test exercises companion CLI → orchestrator → fake-codex round trip
Manual smoke test on a real branch with >2 changed files (covered by spec; reviewer's discretion)
Manual smoke test of inline path (1-2 file diff) to confirm no regression

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address review findings on the queue-driven harness: - close() now splices its own binDir out of PATH so handles can be closed in any order - Collapse redundant saveState calls in the queue-driven branch - Document the requests getter's per-access I/O cost Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address review finding: the soft-error test only checked truthiness of result.error. The fixture queues a deterministic message, so match against it explicitly to catch regressions where the error gets swallowed or replaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…gation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rtion The argument-hint assertion in commands.test.mjs hard-coded a contiguous match of "[--scope ...] [focus ...]". Splitting that into separate matches lets the new flag sit between the two without breaking the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The two transport-error returns in runAppServerInvestigation were emitting { code: 1, kind: "error" } as `status`, while the happy path and runAppServerTurn always return a number. Downstream payload.codex.status inherited this object on error paths but a number elsewhere. Normalizing the error returns to `status: 1` makes the public shape consistent and removes the need for the defensive typeof check at the executeReviewRun seam. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a unit test that runs both recon turns to convergence, then queues a transport error on the finalize turn. Verifies that the investigation metadata (turnCount=2, truncated=false) is preserved even when finalize fails, and that the numeric status contract holds. Also tightens the existing phase-1 hard-error assertion with the status check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extend the truncation banner to tell the user how to raise the cap. The banner only fires when investigation hit the turn cap, which is exactly when the flag is actionable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6992596928

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T01:28:34Z

+    const parsedTurns = Number.parseInt(rawMaxTurns, 10);
+    if (!Number.isFinite(parsedTurns) || parsedTurns <= 0) {


Reject non-integer max investigation turn values

The new --max-investigation-turns validation uses Number.parseInt, which silently accepts malformed values like 1.5 (becomes 1) or 10abc (becomes 10) even though the flag contract says this must be a positive integer. This can hide user typos and run significantly fewer or different investigation turns than requested, degrading review depth without any error signal. Please validate the full token (for example with a strict integer regex or Number(...) plus an integer check) before accepting it.

Useful? React with 👍 / 👎.

Fixed in d222542. The flag no longer uses `parseInt`; it validates the full token with a strict regex `/^[1-9][0-9]*$/` and throws on malformed input. Tests cover `1.5`, `10abc`, `1e2`, leading whitespace, `0`, negatives, and empty string — all rejected. (The same strict validation is applied to the new `--turn-idle-timeout` flag.)

Number.parseInt silently salvages "1.5" as 1, "10abc" as 10, " 5" as 5, and "1e2" via Number() as 100 — all of which violate the flag contract of "positive integer literal" and would silently change the depth of investigation without any error signal. Replace the parseInt-based check with a strict regex /^[1-9][0-9]*$/ so typos throw instead of degrading review depth. Reported by bot review on PR openai#328. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… message The legacy convergence check required `phase: "final_answer"` on the agent message, but recon turns run with outputSchema=null and the app-server does not always tag messages with that phase. Production runs hit the cap with the model emitting "My investigation is complete" on every turn — visible evidence the loop was ignoring the contract the investigate prompt teaches the model. Treat any 0-command turn that emits an agent message as convergence; this matches the prompt and unblocks the runaway-loop scenario. Test fixture: make finalAnswer.phase overridable so we can simulate the real-world case where recon agent messages lack the final_answer tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Production observation: even with a clean recon convergence, the model sometimes emits a tool-call stub (e.g. {"cmd": "wc -l README.md"}) during the finalize turn instead of the structured review JSON. The companion's parser then surfaces "Validation error: Missing string verdict" with the raw stub as the final message — a confusing failure for the user when the investigation produced real findings. Fix in two layers: 1. Tighten the finalize prompt: explicitly forbid running commands or emitting anything other than the schema-conforming JSON. 2. Detect the contract violation in the orchestrator: if the finalize turn's commandExecutions is non-empty, retry once with a STRICT FINALIZE preamble that reiterates the no-tool-call rule. After two attempts, surface whatever the model produced so the parser can still flag malformed output instead of looping forever. Test fixes: - Adapt `respects maxInvestigationTurns` so the 4th queued entry is the pure-JSON finalize turn (previous version queued 4 cmd-only entries before finalize, accidentally relying on the old leniency). - Add coverage for the new retry path: success-on-second-attempt and give-up-after-two-attempts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7a90c7d73

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T07:34:35Z

+    for (const cmd of finalizeState.commandExecutions) {
+      aggregatedCommandExecutions.push(cmd);
+    }


Prevent duplicate finalize command aggregation

When both finalize attempts run commands (the retry-exhausted path), the second attempt is appended once inside the retry loop and then appended again unconditionally after the loop, so commandExecutions/fileChanges contain duplicates of the last finalize turn. This skews downstream telemetry/audit consumers (e.g., command counts and changed-file traces) exactly in the contract-violation scenario this code is trying to report, so the final append should avoid re-adding entries already recorded in-loop.

Useful? React with 👍 / 👎.

Fixed in d2157c2. The in-loop aggregation now lives inside the `finalizeAttempts < MAX_FINALIZE_ATTEMPTS` retry branch, so only superseded attempts are folded in there; the final attempt is aggregated exactly once after the loop. Previously the last attempt was appended both in-loop and post-loop, duplicating its command/file traces on the retry-exhausted path. The retry-exhausted test now asserts `commandExecutions` equals `["wc -l a", "wc -l b"]` — each attempt recorded once.

Production observation: a 2-file untracked review of two large markdown docs (~100KB total) hit the inline-diff path, embedded both files into a single-turn schema-pinned prompt, and the model responded with a tool-call stub `{"cmd": "wc -l ..."}` instead of the review JSON. The companion's parser then surfaced "Validation error: Missing string verdict" — the bug PR openai#328 thought it had fixed, except inline-diff is a separate code path from self-collect and was never touched. Two layers: 1. Lower the inline cap from 2 files to 1. The inline path embeds full file contents into a single turn with outputSchema pinned and has no recovery if the model wants to investigate before producing a verdict. Anything larger than 1 file falls through to the two-phase self-collect path which tolerates exploratory recon turns. 2. Tighten the inline prompt: explicitly forbid running shell commands or emitting tool-call stubs, with the `{"cmd": "..."}` anti-pattern named so the model cannot rationalize emitting one. Tests: - New git.test.mjs case locks 2 files → self-collect. - New investigation.test.mjs cases lock the prompt-contract for both the inline and finalize prompts (no shell commands, no tool-call stubs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 324836e24b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T07:49:56Z

+
+      if (turnState.error) {
+        return {
+          status: buildResultStatus(turnState),


Return non-zero status when investigation turn reports error

When an investigation turn emits a soft failure (turnState.error is set), this branch still derives status from buildResultStatus(turnState), which returns 0 whenever the turn ended as completed. In practice that makes /codex:adversarial-review exit successfully even though no valid review JSON was produced (only parseError), so CI/automation can treat a failed review run as success. This error path should force a non-zero status whenever turnState.error is present.

Useful? React with 👍 / 👎.

Fixed in d222542. `buildResultStatus` now returns 1 whenever `turnState.error` is set, so all four call sites (review, turn, recon soft-error branch, finalize) report a non-zero status when an error was observed — even if the turn ended `completed`. Covered by the soft-error test asserting `result.status === 1`.

When captureTurn records an error notification but the app-server still emits turn/completed with status="completed", buildResultStatus was returning 0. That made /codex:adversarial-review exit 0 even though no valid review JSON was produced (only parseError), so CI/automation could treat a failed run as success. Honor turnState.error in buildResultStatus so all four call sites (review, turn, investigation soft-error branch, finalize) report a non-zero status whenever an error was observed. Strengthen the existing soft-error test to assert result.status === 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kingdoooo · 2026-05-23T04:47:21Z

Re: P1 — Return non-zero status when investigation turn reports error

Addressed in d222542. Centralized the fix in buildResultStatus so all four call sites (review, turn, investigation soft-error branch, finalize) honor turnState.error:

 function buildResultStatus(turnState) {
+  if (turnState.error) {
+    return 1;
+  }
   return turnState.finalTurn?.status === "completed" ? 0 : 1;
 }

End-to-end chain confirmed: result.status → executeAdversarialReview returns exitStatus → runForegroundCommand sets process.exitCode = execution.exitStatus (codex-companion.mjs:478, :674). So a soft error now propagates all the way to the shell exit code.

Regression test (`tests/investigation.test.mjs:213`)

Strengthened the existing phase-1 soft error (turn/failed) aborts before phase-2 finalize test with:

assert.equal(result.status, 1, "soft-error path returns numeric status 1");

The fake harness (tests/fake-codex-fixture.mjs:404-412) already reproduces the production sequence — it sends turn/completed with status:"completed" even after the error notification. So this test directly covers the case the reviewer described.

Proof the test isn't a no-op

Reverted just the buildResultStatus body to the old logic and re-ran the same test:

$ node --test --test-name-pattern="phase-1 soft error" tests/investigation.test.mjs
not ok 1 - phase-1 soft error (turn/failed) aborts before phase-2 finalize
    soft-error path returns numeric status 1
# pass 0
# fail 1

Restored the fix:

$ node --test --test-name-pattern="phase-1 soft error" tests/investigation.test.mjs
ok 1 - phase-1 soft error (turn/failed) aborts before phase-2 finalize
# pass 1 / fail 0

Full investigation suite: 24/24 pass.

…alls Three correctness fixes to the multi-turn self-collect investigation path, all surfaced by an unstable upstream connection (Reconnecting... storms): 1. converge-before-abort: the app-server multiplexes transient retry notices onto the same `error` notification channel as fatal failures. The recon loop aborted on ANY `turnState.error`, so a reconnect that recovered (and produced an agent message + completed turn) skipped the finalize turn and handed raw investigation prose to the JSON parser. Now abort only when the turn produced no usable output (`turnState.error && !turnRecovered`). 2. turn-level idle timeout: `client.request()` only settles on a response or a full socket close, so a half-dead reconnecting link left `turn/start` pending forever (observed as a permanent hang at "Investigation turn N"). captureTurn now races an idle watchdog against both the start request and the completion promise, re-armed on every progress notification so a slow-but-healthy turn is never killed. On timeout it fires a best-effort, non-blocking turn/interrupt and rejects. Default 180s, configurable via the new `--turn-idle-timeout SECONDS` flag. 3. failed run not parsed as JSON: when a turn fails (transport error/timeout) the leftover finalMessage is often just the model's opening prose line. parseStructuredOutput then surfaced a misleading "invalid JSON" parse error instead of the real failure reason. executeReviewRun now parses first and only reports failure when the run errored AND produced no usable structured output; renderReviewResult gained a "Codex could not complete the review" branch that states the real reason. A status-0 run returning malformed JSON still renders the existing parse-error path. Tests: queue-driven fake-codex gains queueTurnHang(); new coverage for the idle-timeout abort, the slow-but-healthy re-arm guard, transient-reconnect recovery at both recon and finalize boundaries, transport-failure rendering, the malformed-JSON regression guard, and --turn-idle-timeout validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 510e750018

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-06T09:42:22Z

    status: result.status,
    failureMessage: result.error?.message ?? result.stderr
  });
+  const parsed = (runErrored && !structured.parsed)


Normalize recovered structured reviews to success

When a finalize turn emits valid JSON after a transient reconnect/error notification, this new branch correctly avoids marking the payload as failed, but the returned execution still uses exitStatus: result.status below. Because buildResultStatus now returns 1 whenever result.error is set, the foreground command exits non-zero and background jobs are recorded as failed even though payload.result contains a valid verdict and payload.failed is false. In the recovered-with-valid-structured-output case handled here, compute and return an effective success status instead of propagating the stale transient error status.

Useful? React with 👍 / 👎.

Good catch — fixed in 7c68363. `executeReviewRun` now computes an effective exit status: `const exitStatus = (!payload.failed && parsed.parsed) ? 0 : result.status;`. When the finalize turn recovered and produced a valid verdict, the run exits 0 instead of propagating the stale transient error status, so the foreground command succeeds and background jobs are recorded as completed. A genuinely failed run still propagates its non-zero status. The finalize-recovery e2e test now asserts `result.status === 0`.

…d verdict Follow-up to the failed-run gate: executeReviewRun returned `exitStatus: result.status`, but buildResultStatus returns 1 whenever a transient reconnect set `result.error` — even when the finalize turn recovered and produced a valid structured verdict (payload.failed=false, payload.result populated). The foreground command then exited non-zero and background jobs were recorded as failed despite a successful review. Compute an effective exit status: 0 when we have a usable verdict and the run was not flagged failed, otherwise propagate the non-zero status. Extends the finalize-recovery e2e test to assert exit 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c68363aa6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-06T10:57:14Z

+This is a single-turn review. Do NOT run any shell commands or tool calls — the file contents you need are already embedded in the repository_context block below. Do not emit a tool-use stub like `{"cmd": "..."}` instead of the review; that is not the schema.
+Return only valid JSON matching the provided schema. Your entire output must be that JSON — no prose before or after, no shell commands, no tool-call payloads.


Route skipped single-file reviews to self-collect

When the working tree contains exactly one untracked file whose contents are skipped (for example >24 KiB, binary, or unreadable), collectReviewContext can still choose the inline path because its byte check only measures git diffs and the file count is now within the one-file cap; the prompt then tells Codex that the needed contents are embedded and forbids any shell commands. In that scenario the review can only see a (skipped: ...) marker and must approve or guess without inspecting the actual change, so skipped untracked single-file reviews should fall through to self-collect or otherwise allow read-only inspection.

Useful? React with 👍 / 👎.

When both finalize attempts ran commands (the retry-exhausted path), the last attempt's commandExecutions/fileChanges were appended in-loop AND again unconditionally after the loop, duplicating its command/file traces in exactly the contract-violation scenario this code reports. Move the in-loop aggregation inside the retry branch so only superseded attempts are folded in there; the final attempt is aggregated once after the loop. Extends the retry-exhausted test to assert each attempt's commands are recorded exactly once. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2157c2ca8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-06T11:26:08Z

+    result = await runAppServerTurn(context.repoRoot, {
+      prompt,
+      model: request.model,
+      sandbox: "read-only",
+      outputSchema: readOutputSchema(REVIEW_SCHEMA),
+      onProgress: request.onProgress


Wire idle timeout through inline reviews

For small single-file adversarial reviews that stay on the inline path, the newly advertised --turn-idle-timeout is never passed to runAppServerTurn here, and that helper still calls captureTurn with only onProgress. In this scenario a half-dead turn/start can still hang indefinitely despite the command docs promising the timeout applies when any turn stalls, so the same timeout needs to be forwarded and honored for the inline path as well as self-collect.

Useful? React with 👍 / 👎.

kingdoooo and others added 14 commits May 17, 2026 19:36

test: add queue-driven fake-codex harness for multi-turn tests

9046e47

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(adversarial-review): add investigate + finalize prompt templates

3c47c59

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(codex): add runAppServerInvestigation for two-phase reviews

53f9e75

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(adversarial-review): add investigate/finalize prompt builders

fbb2bc7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(adversarial-review): route self-collect through two-phase investi…

cfa0423

…gation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(adversarial-review): show banner when investigation is truncated

8b25e7b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(adversarial-review): plumb --max-investigation-turns CLI flag

93609a5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(adversarial-review): document --max-investigation-turns

a32d1d1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kingdoooo requested a review from a team May 18, 2026 01:22

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

kingdoooo and others added 3 commits May 18, 2026 09:58

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 6, 2026

View reviewed changes

		const parsedTurns = Number.parseInt(rawMaxTurns, 10);
		if (!Number.isFinite(parsedTurns) \|\| parsedTurns <= 0) {

		This is a single-turn review. Do NOT run any shell commands or tool calls — the file contents you need are already embedded in the repository_context block below. Do not emit a tool-use stub like `{"cmd": "..."}` instead of the review; that is not the schema.
		Return only valid JSON matching the provided schema. Your entire output must be that JSON — no prose before or after, no shell commands, no tool-call payloads.

Conversation

kingdoooo commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

kingdoooo Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

kingdoooo Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

kingdoooo Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

kingdoooo commented May 23, 2026

Re: P1 — Return non-zero status when investigation turn reports error

Regression test (tests/investigation.test.mjs:213)

Proof the test isn't a no-op

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

kingdoooo Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kingdoooo commented May 18, 2026 •

edited

Loading

Regression test (`tests/investigation.test.mjs:213`)