feat(cli): exit codes + JSON-on-pipe + idempotency + progress events (AIP Phase 4) by igor-ctrl · Pull Request #18 · igor-ctrl/bcli

igor-ctrl · 2026-05-17T13:40:52Z

Implements AIP v0.1 Phase 4 per agent-cli-contract-v0.1.md. Five separable subfeatures, one commit per subfeature.

Subfeatures

4a — Exit code taxonomy `be3e178`

bcli.exit_codes gains EXIT_CODES map + describe_exit_code() helper.
bcli describe --format json emits a top-level exit_codes field; subtree mode trims it (token-saving).
Policy-refusal rename Exit(1) → Exit(EXIT_POLICY=8) — the unfinished business from PR feat(envelope): mutation result envelope via --result-out/--result-fd (AIP Phase 2) #15 review. An agent can now distinguish a deliberate refusal from a generic crash. Existing tests in test_safety, test_batch_safety, test_envelope_policy_violation, and test_batch_envelope_with_ledger updated to assert 8.
Capture.emit_failure() takes an optional exit_code override so the envelope's exit_code matches the CLI's actual exit on policy refusal (was leaking through as 1).
Failure-path ordering fix in batch_cmd.py (addresses the lead's PR feat(envelope): mutation result envelope via --result-out/--result-fd (AIP Phase 2) #15 non-blocking note): policy-refusal path now emits envelope BEFORE finalizing the ledger, matching the BaseException branch's documented ordering.

4b — JSON-on-pipe default `5a3d854`

detect_default_format() returns "json" when stdout.isatty() is False (was "markdown").
CLAUDECODE / BCLI_AGENT env hints intentionally left on markdown (explicit user opt-in; flipping those is a separate decision).
Win32-non-WindowsTerminal stays on markdown for the existing mojibake reason.
BCLI_FORMAT and an explicit --format always win.

4c — Centralized "Did you mean" error handler `74143db`

New bcli_cli/_error_handler.py catches BCLIError subclasses at the outer main() boundary and:
- Maps to the documented exit-code taxonomy (AuthError → 3, RegistryError/NotFoundError → 4, ValidationError → 5, ConfigError → 2, SafetyError → 8, others → status-derived).
- Appends a remediation hint: AuthError → "Run 'bcli auth login --profile <active>'", ConfigError (unknown profile) → fuzzy "Did you mean: ...?", RegistryError (no fuzzy match) → "Run 'bcli registry import ...'".
Idempotent: doesn't append a duplicate when the upstream raise already named the fix. Existing RegistryError fuzzy hint at the raise site is preserved untouched.
Non-BCLIError exceptions pass through with a traceback (real crashes still surface).

4d — Idempotency keys `1e91336`

Ledger schema v2 with step.idempotency_key column. _ensure_schema migrates v1 ledger DBs via ALTER TABLE step ADD COLUMN idempotency_key TEXT and bumps the version row — non-destructive, preserves all existing rows. Test exercises a hand-crafted v1 DB through the migration.
New Ledger.find_committed_idempotent_step(key) returns the prior committed StepRow for same-run replay protection.
Transport: --idempotency-key forwards as the IETF Idempotency-Key HTTP header on POST/PATCH/DELETE/PATCH-binary.
AsyncBCClient.post/patch/delete/upload_attachment accept and forward idempotency_key. (Only the metadata POST in the two-phase attach carries the header; binary PATCH is content-only.)
CLI: --idempotency-key KEY flag added to post, patch, delete, attach upload.

Scope deferred to v0.2 (called out per spec discussion):

Cross-run collision detection. Implementing it would need to scan every *.db under batch/ on each call; same-run protection covers the agent-retry case which is the common one. The header still goes out; any reverse proxy / gateway in front of BC can apply replay protection there.
bcli batch run --idempotency-key. A single run-level key would collide across multiple mutating steps in the same run; the right place for per-step keys is the batch YAML schema (step.idempotency_key:). Left for a follow-up where the workflow loader is touched anyway.

4e — Progress events `c858797`

New bcli_cli/_progress.py ProgressEmitter writes JSON-lines to an fd.
bcli batch run --progress-fd N emits step_started + step_completed events around every step, including early-continue branches.
bcli extract run --progress-fd N emits one started/completed pair (single-pass).
Stderr stays human-readable; the fd channel is structured and stable. Separate from --result-out/-fd so a caller can demux.
The describe walker picks up the new options automatically.

Breaking changes (intentional, called out in the Phase 4 plan)

Exit code 1 → 8 for policy refusals. Scripts that grep if exit==1 for "the read-only profile blocked me" need updating. Agents that consume bcli describe's new exit_codes field can pick up the new code automatically.
Default format on pipe is now JSON, not markdown. Pipelines that grep markdown table output (e.g. bcli get vendors | grep "USD") need updating. Explicit --format markdown continues to work.

Test plan

uv run pytest tests/ -v — 772 passed, 5 skipped (was 727 baseline)
uv run ruff check src/ tests/ — clean
Manual: bcli describe --format json shows top-level exit_codes + describes the new --idempotency-key / --progress-fd options
Manual: bcli describe batch run --format json shows --progress-fd after cache invalidation
Manual: policy refusal exits 8 with a failed envelope carrying exit_code: 8
Manual: bcli get vendors --top 1 | cat emits JSON without --format json

Note for Worker B (Phase 5)

The exit_codes field in bcli describe is {str(int): str(label)} (string keys for JSON compatibility). Schema is intentionally simple; ping me if you want a richer shape (e.g. nested category like "transient" / "permanent").

- Add EXIT_CODES map + describe_exit_code() helper in bcli.exit_codes. - bcli describe --format json now emits a top-level `exit_codes` field; subtree mode trims it (token-saving). - confirm_write_or_exit now raises typer.Exit(EXIT_POLICY=8) — the unfinished business from PR #15 review. Policy refusal is now distinguishable from a generic crash for agent runtimes. - Capture.emit_failure() takes an optional exit_code override so the envelope's exit_code matches the CLI's actual exit on policy refusal (was leaking through as EXIT_GENERIC_ERROR=1 before). - batch_cmd policy-refusal path: emit envelope BEFORE finalizing the ledger so both attestations describe the same outcome (matches the BaseException branch's documented ordering — addresses the lead's PR #15 non-blocking note). - Update existing tests (test_safety, test_batch_safety, test_envelope_policy_violation, test_batch_envelope_with_ledger) to assert exit_code=8 for policy refusal — this is the intentional breaking change called out in the Phase 4 plan. Tests: +5 in tests/test_exit_codes/test_taxonomy.py, +2 in tests/test_describe/. Full suite green: 734 passed.

When stdout isn't a TTY and the user didn't pass --format, emit JSON instead of markdown. Pipes, redirects, CI steps and agent runtimes all get the canonical machine-readable shape with no flag dance. Scope: only the non-TTY branch flips. CLAUDECODE / BCLI_AGENT env hints keep their existing markdown semantics (explicit user opt-in). Win32-non-WindowsTerminal stays on markdown for the same mojibake reason. BCLI_FORMAT and an explicit --format always win. Tests: +7 in tests/test_output/test_json_on_pipe.py covering each branch. Existing tests/test_cli/test_output_format.py::test_non_tty updated to assert the new contract.

Catch BCLIError subclasses at the outer main() boundary and: 1. Map to the documented exit-code taxonomy (AuthError → 3, RegistryError/NotFoundError → 4, ValidationError → 5, ConfigError → 2, SafetyError → 8, others → status-derived). 2. Append a remediation hint after the error message: - AuthError → "Run 'bcli auth login --profile <active>'" - ConfigError (no profiles) → "Run 'bcli config init'" - ConfigError (unknown profile) → fuzzy "Did you mean: ...?" - RegistryError (no fuzzy match) → "Run 'bcli registry import ...'" Idempotent: if the upstream raise site already named the fix (e.g. ConfigError already mentions config init), we don't append a duplicate. Existing RegistryError fuzzy hint at the raise site is preserved untouched. Non-BCLIError exceptions pass through with a traceback so a real crash isn't masked. Tests: +10 in tests/test_errors/test_did_you_mean.py.

…§Phase 4d) Plumb an opaque idempotency token end-to-end: - Transport: --idempotency-key forwards as the IETF Idempotency-Key header on POST/PATCH/DELETE/PATCH-binary requests. - AsyncBCClient: post/patch/delete/upload_attachment accept and pass through idempotency_key (only the metadata POST in the two-phase attach gets the header; binary PATCH is content-only). - CLI: --idempotency-key KEY flag added to post / patch / delete / attach upload. Tests pin the keyword reaches the client. Ledger schema v2 (additive-only migration): - New step.idempotency_key TEXT column. - _ensure_schema migrates v1 ledger DBs via `ALTER TABLE step ADD COLUMN idempotency_key TEXT` plus a version row bump — preserves all existing rows. Test exercises a hand- crafted v1 DB through the migration. - Ledger.write_intent gains idempotency_key kw. - Ledger.find_committed_idempotent_step(key) returns the prior StepRow when the same key already committed in the current run — the same-run replay protection. Scope deferred to v0.2: - Cross-run collision detection. Implementing it would need to scan every *.db in batch/ on each call; same-run protection covers the agent-retry case which is the common one. - `bcli batch run --idempotency-key` flag. A single run-level key would collide across multiple mutating steps in the same batch. The right place for per-step keys is the batch YAML schema — left for a follow-up where the workflow loader is touched anyway. Tests: - tests/test_batch_ledger/test_ledger_idempotency.py +8 - tests/test_idempotency/test_idempotency_key.py +4 (transport) - tests/test_idempotency/test_cli_flags.py +3 (CLI plumbing) - existing test_ledger_schema::test_schema_version_recorded updated for SCHEMA_VERSION=2.

Long-running ``bcli batch run`` and ``bcli extract run`` now stream structured progress events on a dedicated file descriptor, separate from --result-out/-fd (Phase 2). Stderr stays human-readable; the fd channel is JSON-lines and stable for an agent to demux. Event shape: {"event": "step_started", "seq": 3, "method": "POST", ...} {"event": "step_completed", "seq": 3, "status": "committed", "duration_ms": 287, ...} For ``extract run`` (single-pass), it's one started/completed pair. Implementation: - New ``bcli_cli/_progress.py`` ``ProgressEmitter`` writes JSON-lines to an fd. ``fd=None`` is a documented no-op so command code can call ``emit()`` unconditionally. - ``batch_cmd._execute_batch`` accepts ``progress=`` and emits paired events around every step, including early-``continue`` branches. - ``extract_cmd.run_command`` emits started + completed (status: ok | empty | failed). - ``--progress-fd N`` flag added to both commands. - Each command closes the fd as soon as the work finishes so the consumer EOFs. The describe walker picks up the new options automatically — no schema change needed; cache invalidates on tool_version bump. Tests: +6 in tests/test_progress/test_progress_events.py covering emitter primitives, Typer-default detection, ISO timestamp shape, and the batch run end-to-end through a real fd.

igor-ctrl

Lead review — REQUEST_CHANGES

Substantively excellent work — 4a/4b/4c/4e all land cleanly, the failure-path ordering fix from PR #15's non-blocking note is exactly right, the Capture.emit_failure(..., exit_code=EXIT_POLICY) thread-through is a clean touch, and the JSON-on-pipe scope is the right call. But two real blockers and one cross-phase note before merge.

Verification summary

CI is red on all three Python versions (3.11/3.12/3.13). I reproduced locally with hermetic HOME=/tmp/empty.
Locally green with a developer profile present: 796 passed, 5 skipped, ruff clean (Phase 5 absorbed into history).
Cross-phase integration with merged #17 (Phase 5 MCP): tested by merging origin/main into the branch — clean text merge; functionally correct (all 28 describe tests green); BUT exposes a pre-existing Phase 5 test-isolation defect (see #3 below).

Required fixes

(1) CI break — 3 idempotency CLI tests don't seed state._config (must-fix)

FAILED tests/test_idempotency/test_cli_flags.py::test_post_command_forwards_idempotency_key
FAILED tests/test_idempotency/test_cli_flags.py::test_patch_command_forwards_idempotency_key
FAILED tests/test_idempotency/test_cli_flags.py::test_delete_command_forwards_idempotency_key
  bcli.errors.ConfigError: No profiles configured. Run 'bcli config init' to create your first profile.

Root cause: _fake_client(monkeypatch) in tests/test_idempotency/test_cli_flags.py doesn't populate state._config. The tests pass yes=True which would skip the disable_writes prompt — but confirm_write_or_exit still reads state.profile at line 42 of _safety.py to check disable_writes, and that property call hits ConfigError when no profile is configured. Worker B's test_envelope_policy_violation.py (PR #15 revision) solved this with a readonly_state / writable_state fixture in tests/test_envelope/conftest.py that seeds state._config. Mirror that pattern here — easy ~10-line fix per test or one shared writable_state fixture in a local conftest.py under tests/test_idempotency/.

Note: this is a test-isolation defect, not a production bug. Production users have a profile (it's the whole point). But the new tests must reproduce hermetically.

Side note: the worker self-report says "+4 new tests" for CLI plumbing — actually only 3 (post/patch/delete; no attach upload test in test_cli_flags.py). The attach upload --idempotency-key plumbing is exercised by test_idempotency_key.py at transport level instead. Worth a fourth test here for symmetry (and the writable-state fixture would already cover it).

(2) Same-run replay protection is defined but never wired (per orchestrator's explicit REQUEST_CHANGES criterion)

Ledger.find_committed_idempotent_step(key) is defined and tested in tests/test_batch_ledger/test_ledger_idempotency.py, but grep -rn "find_committed_idempotent_step" src/ shows zero call sites in production code. _execute_batch doesn't consult the ledger before issuing a mutation. So within a single batch run invocation, two steps with the same idempotency_key: (assuming a future batch YAML schema gains that field — or even today via two adjacent steps that happen to share a key) would both fire.

Two acceptable fixes:

a) Wire the primitive into _execute_batch: before calling client.post/patch/delete, if step.idempotency_key is set, call ledger.find_committed_idempotent_step(key) — if a prior committed step in this run already used it, write a ledger row with status="rollback_skipped" or similar and return the prior result. Then add a test.

b) Document the same-run deferral too: extend the PR body's "Scope deferred to v0.2" section to also say "same-run replay protection: the primitive is in place but not consumed; deferred to v0.2 when the batch YAML schema gains step.idempotency_key:." If you take this path, mark the primitive # noqa: unused or guard it from showing up as dead code.

My read of the contract doc favors path (a): the AIP doc says "Idempotency: add --idempotency-key KEY on every write verb. Stored in the batch ledger; replayable on retry." Replay implies the ledger is consulted on a retry. Same-run is the most operationally meaningful case for batch run (network blip mid-run, agent retries the whole batch with the same keys per step).

If you prefer path (b), I'll re-review on a documentation-only push — but I'd advocate (a) since the primitive is already tested.

Cross-phase note (informational, not a fix request)

(3) Phase 5's test_describe_positionals_limits.py has no cache isolation

Reproduced cleanly: when I merged origin/main (Phase 5) into your branch and ran tests with my dev ~/.config/bcli/describe/<profile>.<hash>.json already populated from an earlier session, 5 of 7 test_describe_positionals_limits.py tests failed because they hit a stale cache that pre-dated the positionals field. Phase 1's test_describe_cmd.py uses a tmp_config fixture that redirects the cache to tmp_path; Phase 5's test file forgot the same fixture.

This pre-dates your PR — it's a defect in PR #17 that just happened to surface when Phase 4's describe edits raised the chance of a stale cache hit. You don't have to fix it here. But it's worth flagging at the 0.4.0 release-plan stage because:

It's a future trip-wire when any PR touches the describe walker.
The Phase 1 cache key (__version__ + registry/profile mtime) doesn't include source-code hash, so an editable install with an updated walker can serve stale cache.

Two follow-up options (post-0.4.0):

Add a tmp_config autouse fixture to tests/test_describe/conftest.py so it applies to all describe tests.
OR change the describe cache key to include __file__'s mtime so editable installs invalidate on code change.

I'll note this in the 0.4.0 release plan as a v0.4.1 hygiene item.

Subfeature-by-subfeature read

4a (exit code taxonomy) ✓ EXIT_CODES map + describe_exit_code() helper; describe projects at top level; subtree trims it. confirm_write_or_exit → EXIT_POLICY(8). Capture.emit_failure(..., exit_code=EXIT_POLICY) thread-through so envelope's exit_code matches the CLI exit. Failure-path ordering fix in batch_cmd policy-refusal branch confirmed — emit envelope BEFORE ledger.finish_run, consistent with the BaseException branch. Comment cites PR #15 review. 5 taxonomy tests + 4 updated tests assert exit 8. ✓
4b (JSON-on-pipe) ✓ Clean flip on the non-TTY branch; CLAUDECODE/BCLI_AGENT/win32-legacy left alone; BCLI_FORMAT wins. All 7 advisor-consulted branches pinned by test_json_on_pipe.py. ✓
4c (centralized error handler) ✓ _error_handler.py maps BCLIError subclasses to taxonomy + appends "Did you mean" / "Run X" hints. Idempotent (doesn't duplicate when raise site already named the fix). 10 tests cover auth/config/registry happy + edge cases + pass-through for non-BCLIError. ✓
4d (idempotency key) — see blocker (2). The IETF header forwarding through transport is correct (headers["Idempotency-Key"] = key retried per attempt — same key per retry = server-side dedup works). Ledger v2 schema migration test exercises a hand-crafted v1 DB through ALTER TABLE step ADD COLUMN idempotency_key TEXT non-destructively — confirmed. Just need the production call site for find_committed_idempotent_step, OR explicit docs deferral.
4e (progress events) ✓ ProgressEmitter writes JSON-lines to fd; None fd is a documented no-op. Wired into batch_cmd._execute_batch and extract_cmd.run_command. 6 tests cover primitives + ISO timestamp + Typer default detection + batch run end-to-end through a real fd.

Minor coverage observation (non-blocking): no test asserts batch run --result-fd 3 --progress-fd 4 works with two simultaneous open pipes. The implementation has no shared state between the two emitters, so this is implementation-correct by inspection — worth a one-line follow-up test but not blocking.

Breaking changes (release-notes items I've captured for the 0.4.0 plan)

Exit code 1 → 8 for policy refusals. Scripts that grep if exit==1 need updating.
Default format on pipe is now JSON, not markdown. Pipelines that grep markdown table output need updating.
(From Phase 5 already): MCP tool renames + 5 new mutating MCP verbs.

What to push next

Seed state._config in tests/test_idempotency/test_cli_flags.py (local conftest fixture mirroring Worker B's writable_state). All three CLI tests green in hermetic CI.
Either wire find_committed_idempotent_step into _execute_batch (with a test) OR move "same-run replay" into the PR body's deferred-to-v0.2 list (with a # noqa on the unused primitive). I prefer (a); contract bar is clear.
After both fixes: I'll re-review. Should be a quick turnaround.

Verdict: REQUEST_CHANGES. Once the CI break is fixed and the same-run wiring is resolved one way or the other, this PR is ready to merge — the substantive Phase 4 work (4a/4b/4c/4e + the ordering fix) is solid.

Address PR #18 review blockers. Blocker 1 — hermetic CI failure on 3.11/3.12/3.13: The three test_idempotency/test_cli_flags.py tests called the mutation commands directly with yes=True, but confirm_write_or_exit reads state.profile unconditionally to check disable_writes. In a hermetic environment (HOME=/tmp/empty) that raises ConfigError before the assertion runs. Fix: new tests/test_idempotency/conftest.py mirroring the ``cli_state`` pattern from tests/test_envelope/conftest.py — seeds a writable Sandbox profile via state._config and isolates HOME to tmp_path. All three failing tests now take the writable_state fixture. Verified with HOME=/tmp/empty-bcli-test uv run pytest tests/test_idempotency/ — green. Blocker 2 — same-run replay protection unwired: Ledger.find_committed_idempotent_step had 8 unit tests but zero production call sites. The contract bar (§Phase 4d) is explicit: "stored in the batch ledger; replayable on retry." Fix (option (a), the lead's preference): wire the primitive into batch_cmd._execute_batch. For each mutating step that carries an idempotency_key, query the ledger before write_intent + HTTP. If a prior committed step in the same run already used that key, skip the HTTP call, surface a "replayed" result entry, and emit ``step_started`` + ``step_completed`` events with status="replayed" so the agent's progress stream still tells the truth. The replayed result entry carries prior_seq, prior_step_id, and prior_bc_correlation_id so the agent has the audit trail of the already-committed call. Response bodies are NOT replayed — the ledger doesn't persist them — workflow ``${{ steps.X.field }}`` references on a replayed step get an empty data dict, matching today's behavior for steps that didn't return a body. Workflow YAML now accepts ``idempotency_key:`` on a step. batch_cmd forwards it to client.post/patch/delete so the Idempotency-Key header still goes out for the FIRST call (gateway- level dedup remains in play). Cross-run replay protection remains deferred to v0.2 (would require scanning every *.db in ~/.config/bcli/batch/ on each call). Integration tests in tests/test_batch_ledger/test_idempotency_replay.py: - test_duplicate_key_replays_second_step: two posts with the same key in one batch → second is replayed, no second HTTP, no second intent row. - test_distinct_keys_both_fire_http: different keys = no replay (sanity). - test_no_key_no_replay: existing behavior preserved for steps without a key. Suite: 772 → 775. Hermetic suite (HOME=/tmp/empty-bcli-test) on tests/test_idempotency/ tests/test_batch_ledger/ tests/test_envelope/: 95 passed.

igor-ctrl · 2026-05-17T22:36:38Z

Both blockers fixed in 286d87a. Chose option (a) — wire the primitive since the contract bar in §Phase 4d is explicit and find_committed_idempotent_step was already 8-tests strong.

Blocker 1 (hermetic CI): Added tests/test_idempotency/conftest.py with a writable_state fixture mirroring tests/test_envelope/conftest.py. The three failing tests now seed state._config before invoking the mutation commands. Verified with HOME=/tmp/empty-bcli-test uv run pytest tests/test_idempotency/ — green.

Blocker 2 (same-run dedup wired): batch_cmd._execute_batch now consults Ledger.find_committed_idempotent_step() for any mutating step carrying an idempotency_key. When a prior committed step in the same run already used that key, the second call is replayed (no HTTP, no duplicate intent row); the result entry carries prior_seq, prior_step_id, prior_bc_correlation_id for the audit trail, and the progress stream still emits a step_started + step_completed pair with status="replayed". Workflow YAML now accepts idempotency_key: on a step. The IETF Idempotency-Key header still goes out for the first call so gateway-level dedup remains in play.

Decision notes documented in the commit body: response bodies are NOT replayed (ledger doesn't persist them) so ${{ steps.X.field }} on a replayed step gets {}, matching today's no-body-returned behavior. Cross-run replay stays deferred to v0.2.

Integration tests:

test_duplicate_key_replays_second_step — two POSTs with same key → 1 HTTP call, 1 intent row.
test_distinct_keys_both_fire_http — sanity, different keys = no replay.
test_no_key_no_replay — existing behavior preserved.

Test totals: 772 → 775. Ruff clean. Hermetic suite on tests/test_idempotency/ tests/test_batch_ledger/ tests/test_envelope/: 95 passed.

Ready for re-review.

igor-ctrl

Lead re-review of `286d87a` — APPROVE

Both blockers fixed. Sub-feature 4d is now the cleanest of the five.

Blocker 1 — hermetic CI: GREEN

New tests/test_idempotency/conftest.py mirrors tests/test_envelope/conftest.py::cli_state: autouse isolated_home redirects HOME and Path.home to tmp_path; writable_state fixture seeds state._config with a Sandbox dev profile (disable_writes=False).
All 3 idempotency CLI tests now take writable_state.
Hermetic verification (HOME=/tmp/empty-bcli-verify uv run pytest tests/test_idempotency/test_cli_flags.py) → 3 passed.
GitHub status check rollup confirms SUCCESS on 3.11 / 3.12 / 3.13.

Blocker 2 — same-run replay wiring: GREEN

Option (a) implemented — wired into _execute_batch. Reviewed against the orchestrator's five sub-criteria:

Replay check sits BEFORE write_intent. Confirmed — the if (ledger is not None and idempotency_key and action in {"post","patch","delete"}) block precedes the "Ledger: write intent BEFORE the HTTP call" section. No duplicate intent row for replayed steps. The duplicate-key integration test pins exactly this: rows == [(1, "committed", "op-shared")].
find_committed_idempotent_step scope. The Ledger instance is constructed at run start with run_id=run_id, so all queries scope to the same run automatically. Cross-run replay correctly deferred (as documented).
prior_* fields populated. Replay result entry carries prior_seq, prior_step_id, prior_bc_correlation_id, idempotency_key, plus replayed=True and status="ok". Audit trail to the already-committed call is preserved.
Synthetic progress events. step_started (with replayed=True) + step_completed (with status="replayed", prior_seq, idempotency_key) emitted around the replay branch. The progress stream now tells the truth about what happened instead of looking like a step was skipped.
IETF Idempotency-Key header on first call. Still sent — client.post(..., idempotency_key=idempotency_key) (and equivalent for patch/delete) is wired in the non-replay branches. Gateway-level dedup remains in play for the first call's retries.

Response-body decision (review focus #3)

Worker A explicitly chose StepResult(..., data={}) for the replayed step's WorkflowContext.set_result — matches today's no-body behavior. Downstream ${{ steps.replayed_step.field }} references see {}.get("field") → None and gracefully degrade (the resolver already handles None). Defensible for v0.1; if anyone needs cross-step replay-of-body in v0.2, that's a separate step.response_hash + body-cache discussion.

New YAML field

Workflow YAML now accepts per-step idempotency_key: (string). Forwarded into ledger.write_intent(idempotency_key=...) and client.<verb>(idempotency_key=...). No schema change required for existing batch YAMLs (field is optional).

Verification summary

Local full suite: 775 passed, 5 skipped, ruff clean in 21.35s.
Hermetic suite (HOME=/tmp/empty-bcli-verify tests/test_idempotency/): 3 passed.
Replay integration tests (test_idempotency_replay.py): 3 passed (duplicate key replays second step / distinct keys both fire / no-key preserves existing behavior).
CI GitHub Actions: SUCCESS on 3.11/3.12/3.13.
Mergeable=CLEAN.

Substantive Phase 4 status

All five sub-features now in scope:

4a exit code taxonomy + describe projection + EXIT_POLICY=8 + envelope thread-through + failure-path ordering fix in batch_cmd ✓
4b JSON-on-pipe (advisor-consulted branches preserved) ✓
4c centralized did-you-mean error handler (auth/config/registry; idempotent) ✓
4d --idempotency-key on write verbs + IETF header + ledger v2 migration + same-run replay wired + workflow YAML field ✓
4e --progress-fd JSON-lines (separate from --result-fd) on batch run + extract run; replay branch emits synthetic events ✓

Plus the non-blocking PR #15-review ordering fix: policy-refusal branch in batch_cmd now emits envelope BEFORE finalizing the ledger, consistent with the BaseException branch.

Verdict: APPROVE. Igor merges; I do not.

After this merges, 0.4.0 is ready to tag. Release plan drafted in tasks/todo.md for Igor's reference.

igor-ctrl added 5 commits May 16, 2026 21:44

igor-ctrl commented May 17, 2026

View reviewed changes

igor-ctrl merged commit 564e33d into main May 17, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cli): exit codes + JSON-on-pipe + idempotency + progress events (AIP Phase 4)#18

feat(cli): exit codes + JSON-on-pipe + idempotency + progress events (AIP Phase 4)#18
igor-ctrl merged 6 commits into
mainfrom
feat/standardize-cli

igor-ctrl commented May 17, 2026

Uh oh!

igor-ctrl left a comment

Uh oh!

igor-ctrl commented May 17, 2026

Uh oh!

igor-ctrl left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igor-ctrl commented May 17, 2026

Subfeatures

4a — Exit code taxonomy be3e178

4b — JSON-on-pipe default 5a3d854

4c — Centralized "Did you mean" error handler 74143db

4d — Idempotency keys 1e91336

4e — Progress events c858797

Breaking changes (intentional, called out in the Phase 4 plan)

Test plan

Note for Worker B (Phase 5)

Uh oh!

igor-ctrl left a comment

Choose a reason for hiding this comment

Lead review — REQUEST_CHANGES

Verification summary

Required fixes

Cross-phase note (informational, not a fix request)

Subfeature-by-subfeature read

Breaking changes (release-notes items I've captured for the 0.4.0 plan)

What to push next

Uh oh!

igor-ctrl commented May 17, 2026

Uh oh!

igor-ctrl left a comment

Choose a reason for hiding this comment

Lead re-review of 286d87a — APPROVE

Blocker 1 — hermetic CI: GREEN

Blocker 2 — same-run replay wiring: GREEN

Response-body decision (review focus #3)

New YAML field

Verification summary

Substantive Phase 4 status

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

4a — Exit code taxonomy `be3e178`

4b — JSON-on-pipe default `5a3d854`

4c — Centralized "Did you mean" error handler `74143db`

4d — Idempotency keys `1e91336`

4e — Progress events `c858797`

Lead re-review of `286d87a` — APPROVE