fix(agentic): spool Codex JSONL output to disk to cap parent RSS (#1646) by Serhan-Asad · Pull Request #1654 · promptdriven/pdd

Serhan-Asad · 2026-06-18T23:29:14Z

Problem

Closes #1646.

pdd/agentic_common.py::_run_with_provider() runs the Codex CLI (provider == "openai") with _subprocess_run(..., capture_output=True, text=True) → Popen(stdout=PIPE, stderr=PIPE) + communicate(), holding the entire codex exec --json NDJSON transcript in the parent heap as a decoded str. The openai parse path then made two more full copies — result.stdout.strip() and .splitlines(). Codex transcripts for long agentic runs reach tens-to-hundreds of MiB, so peak parent RSS hit ~3–4× the transcript, crossed container memory limits, and the process was SIGKILLed (exit_code=-9) — surfaced by the cloud wrapper as OOM.

Fix

Spool the Codex subprocess stdout/stderr to temp files and parse the NDJSON incrementally from disk, keeping only bounded head/tail snippets plus the few retained events in memory. openai-only; every other provider (anthropic/google/opencode/agy) keeps the existing in-RAM path unchanged.

_subprocess_run_spooled — communicate() with stdout/stderr bound to tempfile.TemporaryFile instead of pipes. Preserves the timeout + os.killpg process-group kill of _subprocess_run, and closes both temp files on any error. _subprocess_run itself is untouched, so its ~20 small-output callers are unaffected.
_parse_codex_jsonl — one shared incremental NDJSON parser used by both the spooled and in-RAM paths; preserves the existing result / item.completed→agent_message / session.end / turn.completed selection and the usage+model merge.
_iter_spooled_lines — streams decoded lines; an oversize line (>16 MiB) is skipped only when it carries none of the retained-event markers, so a genuinely large agent_message/result/error event can never be silently dropped (markers matched case-insensitively to mirror the error scanner).
PDD_AGENT_SPOOL_DIR — optional disk-backed spool dir. The system temp dir may be tmpfs/RAM-backed in some containers; pointing this at real disk also relieves cgroup memory.

Proof (peak RSS, identical 128 MiB Codex transcript, same parse result)

	peak RSS	result
before (origin/main, in-RAM)	776 MiB	ok, "Final answer."
after (this PR, spooled)	259 MiB	ok, "Final answer."

~67% reduction; peak no longer scales with transcript size.

Tests

test_openai_codex_spool_caps_peak_rss — the Codex JSONL agent output buffering can SIGKILL PDD parent #1646 reproduction: runs the same workload twice in fresh interpreters (tiny vs ≥128 MiB) and asserts the large-vs-tiny RSS delta stays small. Fails on the pre-fix path (delta ~473 MiB), passes here (~0).
test_openai_codex_jsonl_spooled_parses_large_transcript / …_error_extracted / …_blank_stdout — spooled-path parse/error/blank behavior through a real fake-codex binary.
test_iter_spooled_lines_skips_oversize_without_decoding / …_keeps_oversize_retained_event (parametrized over agent_message/result/error/failed incl. uppercase type) — the oversize-line guard never drops a retained event and never decodes a dropped one.
Cross-file: openai tests in test_e2e_issue_557_codex_ndjson.py, test_time_reasoning_effort_env.py, and test_agentic_common_issue_813_… updated to also mock _subprocess_run_spooled (they drive the openai path through _run_with_provider/run_agentic_task).

Validation commands run

# core + the cross-file mock updates (all green)
PDD_PATH=<repo> PYTHONPATH=<repo> pytest tests/test_agentic_common.py \
  tests/test_e2e_issue_557_codex_ndjson.py \
  tests/test_agentic_common_issue_813_anthropic_api_key_oauth_shadow.py \
  tests/test_time_reasoning_effort_env.py -p no:randomly -q
# -> 552 passed

Risk / tradeoff

This trades parent RAM for disk: the spool needs free space in the spool dir. A full spool fails the run with a clear error rather than a silent OOM — strictly better behavior for the memory-constrained path this fixes, and tunable via PDD_AGENT_SPOOL_DIR. Stream-from-pipe (no temp files) was considered but rejected for this fix as higher-regression (owning stdin/stdout/stderr drains, timeout, and process-group kill manually); disk spooling keeps the proven communicate() semantics.

🤖 Generated with Claude Code

Codex `exec --json` can emit tens-to-hundreds of MiB of NDJSON. The openai path captured it via `capture_output=True` + `communicate()` (the whole decoded transcript resident on the parent heap), then `.strip()` and `.splitlines()` copied it again — so peak parent RSS reached ~3-4x the transcript, crossed container memory limits, and the process was SIGKILLed (-9), surfaced by the cloud as OOM. Spool the Codex subprocess stdout/stderr to temp files and parse the NDJSON incrementally from disk with a bounded per-line read. openai only; every other provider keeps the existing in-RAM path unchanged. On a 128 MiB transcript, peak RSS drops from ~776 MiB to ~259 MiB with identical parse output. - `_subprocess_run_spooled`: `communicate()` with stdout/stderr bound to `tempfile.TemporaryFile` (preserves the timeout + process-group kill of `_subprocess_run`); closes both files on any error. `_subprocess_run` itself is unchanged so its ~20 small-output callers are unaffected. - `_parse_codex_jsonl`: one shared incremental NDJSON parser for both the spooled and in-RAM paths; preserves the result/agent_message/session.end/turn.completed selection and the usage+model merge. - `_iter_spooled_lines`: reads each line with a bounded `readline(cap)` (16 MiB) so the parent never materializes more than ~cap at a time. An oversize line is classified by its event TYPE in the already-read prefix (probing only the first few KiB): a giant tool-output blob is drained and dropped without being decoded, while a retained-type line (result/agent_message/session.end/turn.completed or any error/failed event) is read in full and yielded — never silently dropped — up to a larger hard ceiling (64 MiB) beyond which it too is dropped rather than risk OOM. - `PDD_AGENT_SPOOL_DIR`: optional disk-backed spool dir (system temp may be tmpfs/RAM-backed). A full spool fails the run with a clear error rather than a silent OOM; when set but not a writable directory, PDD warns once and falls back instead of silently ignoring it. Also update the generating prompt (so a future `pdd sync` preserves the contract instead of regenerating it away), document `PDD_AGENT_SPOOL_DIR` in the README, and refresh the .pdd fingerprint + run report so the module reads in-sync. Regression tests: a peak-RSS test (large vs tiny transcript in fresh interpreters) that fails on the pre-fix path; spooled-path parse/error/blank tests; oversize-line tests covering blob-drop, retained-event keep, the hard ceiling, and the per-line read bound; and spool-dir resolution/warning tests. The openai tests in test_e2e_issue_557 / test_time_reasoning_effort_env / test_agentic_common_issue_813 now also mock the spooled runner. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Serhan-Asad · 2026-06-19T01:22:31Z

Staging1 verification complete for @Serhan-Asad.

What was verified:

PR head: 7baee43b61a1556c5446d904b4ca2ec69f1da66d
PR CI is green: Unit Tests, Public CLI Regression, Package Preprocess Smoke, CodeQL, auto-heal/heal all succeeded.
Focused local verification already passed on the PR checkout:
- oversized retained Codex event probe preserved the final agent_message
- focused suite: 9 passed, 472 deselected
- broader agentic/Codex suite: 553 passed
Deployed only to staging1 project prompt-driven-development-stg; no prod deploy was run.
Staging image tag: us-docker.pkg.dev/prompt-driven-development-stg/pdd/worker:pr1654-7baee43b
Staging worker/executor are using that image.
Staging worker env includes PDD_AGENT_SPOOL_DIR=/home/appuser/pdd-agent-spool.
Staging health checks passed:
- worker /health: {"status":"healthy"}
- webhook /health: {"status":"healthy","webhook_subscriptions":"ok"}

Cloud Run staging verifier:

One-off job: pdd-pr1654-spool-verify
Execution: pdd-pr1654-spool-verify-ghrlb
Memory limit: 1Gi
Fake Codex JSONL transcript: 256 MiB
Verified installed PDD SHA from /opt/pdd-repo: 7baee43b61a1556c5446d904b4ca2ec69f1da66d
Result: PR1654_VERIFY_PASS
Parser preserved final text and model: PR1654 staging verifier final answer, gpt-5
Max RSS stayed flat: 214.36 MiB -> 220.48 MiB

This specifically exercises the Cloud Run worker image path that previously buffered large Codex JSONL output and validates that the spooled parser keeps parent RSS bounded.

gltanaka

Reviewed for necessity and merge readiness.

The fix is needed: #1646 is a real failure mode. The current OpenAI/Codex path captures the full codex exec --json stdout as a decoded string and then parses it with whole-transcript operations (strip() / splitlines()), so large JSONL transcripts can amplify parent RSS enough to OOM. The spooling direction and the focused tests look reasonable; I also ran the new targeted Codex spooling tests locally and they passed.

Do not merge this PR as-is because it is currently not mergeable with main. GitHub reports mergeable=CONFLICTING, and a local merge simulation against current origin/main conflicts in:

README.md
pdd/prompts/agentic_common_python.prompt

Required changes before merge:

Rebase or merge the latest main into this branch and resolve the conflicts.
Preserve both contracts while resolving: the new Codex/OpenAI spooling behavior from this PR and the newer agentic routing/effort behavior already on main.
Refresh any generated .pdd metadata/run report files that are expected to stay in sync after the prompt/code resolution.
Rerun the targeted spooling tests plus the normal required validation (make test / lint, or the project-equivalent CI set).

After that update, the spooling change is worth re-reviewing for merge.

Serhan-Asad · 2026-06-19T04:10:08Z

Addressed @gltanaka's merge-readiness review.

What changed:

Merged latest main into fix/1646-codex-jsonl-spooling.
Resolved the README.md conflict and kept both PDD_AGENT_SPOOL_DIR and main's PDD_ROUTING_POLICY docs.
Resolved pdd/prompts/agentic_common_python.prompt by preserving main's newer routing/effort contract and this PR's Codex JSONL spooling contract.
Refreshed the .pdd metadata/run report for agentic_common_python.
Confirmed git diff --cached --check before commit and no conflict markers remain in the touched prompt/docs.

Local validation:

tests/test_agentic_common.py -k 'spool or spooled or model_reasoning_effort or reasoning_effort or effort' -p no:randomly -q -> 36 passed
tests/test_agentic_common.py tests/test_e2e_issue_557_codex_ndjson.py tests/test_time_reasoning_effort_env.py tests/test_agentic_common_issue_813_anthropic_api_key_oauth_shadow.py -p no:randomly -q -> 571 passed
tests/test_routing_policy.py tests/test_llm_invoke_task_routing.py tests/test_agentic_multishot.py -p no:randomly -q -> 81 passed
tests/test_agentic_common.py tests/test_agentic_common_issue_813_anthropic_api_key_oauth_shadow.py tests/test_agentic_common_worktree.py -p no:randomly -q -> 613 passed

Staging verification from earlier still applies: staging1 Cloud Run job with a fake 256 MiB Codex JSONL transcript completed with PR1654_VERIFY_PASS; RSS stayed bounded at about 214.36 MiB -> 220.48 MiB. Details: #1654 (comment)

Remote CI on head 766bb94954f51c6cc796813dc6abf0dc0e8883e8 is now green:

Run Unit Tests
Public CLI Regression
Package Preprocess Smoke
CodeQL / Analyze actions / JS-TS / Python
auto-heal / heal

Note: I also attempted the broad local make test path, but it enables live LLM/local environment paths and was not a clean local gate in my machine environment. The required GitHub CI set is passing on the pushed head.

The PR is now MERGEABLE; the only remaining block is the existing CHANGES_REQUESTED review state. @gltanaka please re-review when you have a chance.

gltanaka

The updated branch resolves the earlier mergeability blocker. I do not think a real Codex/cloud E2E should be required for this PR: the risk being fixed is PDD's parent-process stdout buffering and JSONL parsing behavior, and the fake-Codex subprocess tests exercise that actual local runner/parser path deterministically without auth, provider cost, or cloud flake. CI is green and the focused spooling/RSS tests pass locally.

gltanaka

Re-approving after updating against current main. I still do not think a real Codex/cloud E2E should be required: the failure is in PDD's local subprocess buffering/parsing path, and the fake-Codex subprocess plus RSS regression tests exercise that deterministically without provider auth/cost/flakiness.

gltanaka

Re-approving after the latest update against main. I still do not think a real Codex/cloud E2E should be required for this local subprocess buffering/parser fix; the deterministic fake-Codex subprocess and RSS regression coverage target the actual failure surface.

gltanaka

Re-approving after updating against the latest main commit. True external Codex/cloud E2E is still not required for this deterministic local subprocess-buffering fix.

gltanaka

Re-approving after another update against current main. No true external Codex/cloud E2E required for this local subprocess buffering/parser fix.

Serhan-Asad force-pushed the fix/1646-codex-jsonl-spooling branch from 62ac9d1 to 551921d Compare June 19, 2026 00:19

Serhan-Asad force-pushed the fix/1646-codex-jsonl-spooling branch from 551921d to 7baee43 Compare June 19, 2026 00:50

Merge branch 'main' into fix/1646-codex-jsonl-spooling

cba136d

gltanaka requested changes Jun 19, 2026

View reviewed changes

Merge main into Codex spooling fix

766bb94