feat(envelope): mutation result envelope via --result-out/--result-fd (AIP Phase 2)#15
Conversation
… (AIP Phase 2) Adds an opt-in JSON result envelope an agent runtime can consume on a side channel — never on stdout. Five mutating commands now accept `--result-out PATH` (atomic write) or `--result-fd N` (write+close): bcli post / patch / delete / attach upload / batch run Envelope shape matches contract doc §Phase 2: version, invocation_id, tool_version, profile, environment, company, method, endpoint, resolved_url, record_id, dry_run, status, exit_code, bc_correlation_id, telemetry_event_id, audit_log_offset, started_at, duration_ms. Implementation: - `bcli.result_envelope` — frozen dataclass + atomic writer (tmp file + os.replace; fsync; mkdir parents). - `bcli.exit_codes` — taxonomy seeded for Phase 4 (Worker A) to extend. Phase 2 only wires 0/1/6/7 today. - `bcli_cli._envelope_wrap` — sibling to `_audit_wrap`, owns the context-manager lifecycle so the five commands share one implementation (validate flags, capture start_at/uuid, build+write the envelope on success or failure). Behaviour: - Stdout output is untouched; `--format` still drives the existing dump. - Envelope writes on BOTH success and failure paths (HTTP 4xx → exit 6, 5xx → exit 7, other → exit 1; bc_correlation_id captured when present). - `--dry-run` produces an envelope with `dry_run: true`, `status: "succeeded"`, no HTTP call. - `record_id` extracted from response body: `systemId` → `id` → first `*Id` field; null when none available. - `--result-out` and `--result-fd` are mutually exclusive (typer.BadParameter on conflict). - Both flags are strictly opt-in — when unset, commands behave exactly as before (additive change). Tests: 36 new (`tests/test_envelope/`). 667 total pass; ruff clean. Known limitations / follow-ups: - `telemetry_event_id` and `audit_log_offset` are always null; wiring needs a protocol extension on the telemetry + audit sinks. Deferred to a follow-up to keep this PR additive. - Policy violations (`disable_writes=true`, non-interactive, no `--yes`) exit before the capture block — no envelope is emitted. Phase 4 will define exit 8 and surface this in the envelope. - Existing direct-call tests (e.g. `test_batch_safety.py`) pass typer `OptionInfo` defaults to the new params; the wrapper treats those as "not provided" so we don't have to update every call site.
igor-ctrl
left a comment
There was a problem hiding this comment.
Lead review — REQUEST_CHANGES (one issue) + APPROVE the rest
Reviewed against agent-cli-contract-v0.1.md §Phase 2 and tasks/todo.md Phase 2 checklist. Worker self-report verified: 667 passed / 5 skipped, ruff clean, additive only. Beautiful PR — frozen dataclass + os.replace + fsync, mutual-exclusion on the flags, per-verb integration tests, dedicated _envelope_wrap module mirroring _audit_wrap. The ergonomics of with capture(...) as cap: are exactly right.
Verdict on the three flagged items
(1) telemetry_event_id / audit_log_offset always null — APPROVE.
The fields are present in the dataclass and pinned by test_has_all_spec_fields. The spec example in §Phase 2 shows non-null values but the schema permits null for unknown/unavailable. Wiring those needs a TelemetrySink / audit-sink protocol extension that touches every backend including the optional Azure Monitor extra — keeping this PR additive is the right call. Track as a Phase 2.1 follow-up so it doesn't drift indefinitely.
(3) Dry-run uses status="succeeded" + dry_run=true — APPROVE.
Per the contract doc §Phase 2 schema, status and dry_run are independent fields. Your interpretation is correct: a completed dry-run is a success at the "completed the dry-run preview" layer; dry_run=true is the disambiguator. Verified in test_envelope_on_dry_run. No change needed.
(2) Policy violations skip the envelope — REQUEST_CHANGES.
I empirically confirmed this with a readonly Production profile + --result-out:
typer.Exit code=1
NO ENVELOPE WRITTEN (policy gate exited before capture)
The four single-mutation commands (post, patch, delete, attach upload) call confirm_write_or_exit(...) before opening the with capture(...) block, so a policy refusal exits without emitting the envelope. batch_cmd.py already does this correctly — its confirm_write_or_exit("BATCH WRITE", preview, yes=yes) is inside with capture(...) and the envelope captures the policy exit. That asymmetry is the bug.
Why this matters for the AIP contract: the spec says "a write's value is its attestation: profile, environment, target, correlation id, outcome" — a refused write is an outcome. An agent runtime driving production needs to distinguish "BC said no" (4xx → envelope), "BC errored" (5xx → envelope), and "bcli policy refused locally" (no envelope today). The last case is the one a runtime is most likely to encounter when piloting against a disable_writes profile.
You correctly identified Phase 4 will add EXIT_POLICY = 8 to the taxonomy — but the envelope wiring is here, in Phase 2. Phase 4 can flip exit_code=1 → exit_code=8 without re-touching the envelope code path.
Asked fix:
Move the confirm_write_or_exit(...) call inside the with capture(...) block for the four single-mutation commands, mirroring batch_cmd.py (lines ~756 onward). The fix per command is ~3-line reorganization. The pattern:
with capture(method="POST", endpoint=endpoint,
result_out=result_out, result_fd=result_fd) as cap:
cap.set_resolved_url(try_resolve_url(...))
confirm_write_or_exit("POST", endpoint, yes=yes) # ← now inside; typer.Exit will hit the BaseException branch of capture(), which writes a failed envelope
...Then add one test per verb (e.g. test_post_envelope_on_policy_violation) — readonly profile + no --yes + non-TTY stdin + --result-out → file exists, status="failed", exit_code=1 (Phase 4 will update to 8 along with the taxonomy roll-out).
Smoke test result_out behavior on batch run confirms the pattern works — your existing test_batch_run_envelope_failed_when_any_step_fails already exercises a non-policy failure-on-exit; a sibling policy test would close the loop.
Verification summary
uv run pytest tests/→ 667 passed, 5 skipped (matches self-report)uv run ruff check src/ tests/→ clean- Envelope schema: 18 fields present per spec ✓
- Atomic write via
os.replace✓ (tested + verified viaspy_replace) - Mutual exclusion
--result-out/--result-fd✓ - Stdout never carries envelope keys ✓ (
test_stdout_does_not_contain_envelope) - fd mode closes the fd ✓ (
test_fd_is_closed_after_write) record_idextraction: systemId → id → first *Id ✓- Per-verb integration: post, patch, delete, attach upload, batch run all pinned
Other observations (non-blocking)
_envelope_wrap.capture()swallows atyper.Exitraised inside thewithblock to opportunistically emit an envelope — clever for catching forgottenemit_*calls, but also means a deeply-nestedtyper.Exit(1)fromconfirm_write_or_exitwould get caught if you moved it inside the block. The "salvage envelope on BaseException" path usesint(exc.exit_code)for failed exits, so the envelope will recordexit_code=1automatically._record_id_from's third clause (first key ending inId) iterates dict insertion order; for Python 3.7+ that's deterministic. Worth a comment but fine.- Synthetic
RuntimeError("N of M steps failed")inbatch_cmd.pyfor the envelope-failure path is an honest pattern given there isn't a single underlying exception. Phase 3's ledger will give a richer answer; this is the right placeholder.
Action items for worker B (next push):
- Move
confirm_write_or_exitinsidewith capture(...)for post / patch / delete / attach upload (mirror batch_cmd). - Add one
test_*_envelope_on_policy_violationper verb undertests/test_envelope/. - Keep
exit_code=1for now; Phase 4 will rename to 8 when the taxonomy lands.
After this lands, branch feat/mcp-describe-consumer off main (waiting on Phase 1 to merge first) and proceed with Phase 5.
Verdict: REQUEST_CHANGES on policy-envelope, APPROVE everything else. Igor merges once the policy-envelope fix lands.
Lead's review caught: a read-only profile (`disable_writes=true`) + non-interactive + no `--yes` runs `confirm_write_or_exit` BEFORE the `capture(...)` block, so the command exits 1 with no envelope file. An agent runtime can't distinguish that from a true crash — it should see `status="failed"` like any other refused write. Fix: move `confirm_write_or_exit` *inside* the `with capture(...)` block on `post`, `patch`, `delete`, and `attach upload`. The salvage path in `_envelope_wrap.capture` already turns a `typer.Exit(1)` into a `status="failed"` envelope, so the gate just had to move into the captured region. `batch run` already had it correct — its `confirm_write_or_exit` was already inside the capture block. Tests: 4 new under `tests/test_envelope/test_envelope_policy_violation.py` (one per verb). Each pins: - the command exits with code 1 - the envelope file exists on disk - `status="failed"`, `exit_code=1`, profile/environment captured - the underlying client write method was NOT awaited `exit_code` stays at 1 for now per lead's note — Phase 4 (Worker A) owns the taxonomy rename to `EXIT_POLICY` (8). The envelope shape remains stable; only the int constant changes later. Totals: 671 passed (was 667; +4 new), 5 skipped. Ruff clean.
igor-ctrl
left a comment
There was a problem hiding this comment.
Lead re-review of commit 113200e9 — APPROVE
Verified against the REQUEST_CHANGES item from the prior review.
Gate-move pattern (all four verbs)
For each of post_cmd, patch_cmd, delete_cmd, attach_cmd:
confirm_write_or_exit(...)removed from the pre-capture region.- Re-inserted inside
with capture(...) as cap:, aftercap.set_resolved_url(...)(andcap.set_record_id(...)where the command has one). - Inline comment cites PR #15 review and explains why (salvage path in
_envelope_wrap.captureturns thetyper.Exit(1)into astatus="failed"envelope).
This mirrors the pattern batch_cmd.py already had — confirmed unchanged.
New tests (tests/test_envelope/test_envelope_policy_violation.py)
One class per verb, four tests total:
TestPostEnvelopeOnPolicyViolationTestPatchEnvelopeOnPolicyViolationTestDeleteEnvelopeOnPolicyViolationTestAttachUploadEnvelopeOnPolicyViolation
Each pins exactly what the review asked for:
disable_writes=trueprofile (newreadonly_stateconftest fixture) +force_non_interactiveautouse → matches the policy-violation path.- Verb invoked with
--result-out=<tmp>,yes=False. pytest.raises(typer.Exit)→exit_code == 1(per the note that Phase 4 owns the taxonomy rename toEXIT_POLICY=8).- Shared
_assert_policy_failure_envelopehelper validates envelope on disk,status="failed",exit_code=1,method/endpoint/profile/environmentpopulated. fake_client.<verb>.assert_not_awaited()— proves the gate fired before any HTTP.
Bonus: the patch and delete tests additionally assert env["record_id"] == "vnd-1" after refusal — proves the gate now sits after set_record_id (structural evidence the move worked, not just behavioral).
Local verification (in /tmp/aip-review-15 worktree)
uv run pytest tests/test_envelope/ -v→ 40 passed in 0.33s (36 prior + 4 new).uv run ruff check src/bcli_cli/commands/{post,patch,delete,attach}_cmd.py tests/test_envelope/→ clean.- Worker totals match: 671 passed (was 667 + 4 new), 5 skipped, ruff clean.
Diff scope
6 files, +220/-12. Touches only the four single-mutation commands + tests/test_envelope/conftest.py (new readonly_state fixture) + new test_envelope_policy_violation.py. batch_cmd.py and _envelope_wrap.py not touched — exactly the surface area expected.
After this merges
Branch feat/mcp-describe-consumer off main (waiting on Phase 1 to merge first) and proceed with Phase 5 per tasks/todo.md.
Verdict: APPROVE. Igor merges; I do not.
PRs #15 (result envelope) and #16 (batch ledger) both modified ``batch_cmd.run_batch``'s control flow. This merge threads the envelope ``capture(...)`` wrap around the SQLite ledger writes so both attestations describe the same outcome. Integration order: - ``capture(...)`` wraps the entire ``run_batch`` body. - Inside it, ``ledger.start_run(...)`` writes the run row BEFORE any step fires (preserved from #16 — needed for SIGKILL realism). - ``cap.set_record_id(run_id)`` stashes the ledger run id onto the envelope. Agent runtimes now have one cross-reference: the envelope's ``record_id`` is the value to pass to ``bcli batch state <run-id>``. - The ``disable_writes`` gate (from #15) runs *inside* the capture block, AFTER ``start_run``. A refused write finalizes the ledger as ``failed`` and emits a failed envelope before re-raising — both attestations stay consistent. - ``cap.emit_success`` / ``emit_failure`` are paired with ``ledger.finish_run(...)`` on every exit path. Mixed-result batches (some ok, some error) get ``status="failed"`` on the envelope and ``state="partially_committed"`` on the ledger; the same RuntimeError feeds both. - The ``BaseException`` handler from #16 is preserved so a ``KeyboardInterrupt`` still re-raises after the ledger derives a truthful state via ``compute_run_state``. Dry-run decision: ``ledger.finish_run(run_id, "completed")`` IS called on dry-run, so the ledger row never lingers in ``planned`` state. The envelope's ``dry_run=true`` is the disambiguator for "no HTTP fired." Comment in the source explains the call. New integration test: - ``tests/test_envelope/test_batch_envelope_with_ledger.py`` — three scenarios (happy path, partial commit, policy refusal) pin that envelope and ledger stay in agreement, including the ``record_id == run_id`` cross-reference. Test isolation: added an autouse ``isolated_home`` fixture in ``tests/test_envelope/conftest.py`` so envelope tests that touch ``run_batch`` don't pollute the user's real ``~/.config/bcli/batch/`` dir. Test update: ``TestBatchRunEnvelope.test_batch_run_writes_envelope_all_ok`` now asserts ``record_id`` is the 32-char hex run id (was: ``None``) — matches the new ledger cross-reference behavior. Totals: 727 passed, 5 skipped. Ruff clean.
igor-ctrl
left a comment
There was a problem hiding this comment.
Lead re-review of merge commit d1ead8e — APPROVE
Post-conflict resolution validated. PR is MERGEABLE / CLEAN. d1ead8e is the only new commit on top of 113200e after #14 and #16 landed on main.
Integration walk-through (batch_cmd.py vs current main)
capture(...)wraps the entirerun_batchbody — confirmed (line 97 onward).ledger.start_runsits inside the capture block, after workflow params setup — confirmed (line 166-174, after_validate_step_names). Preserves #16's "intent row before any step" property.cap.set_record_id(run_id)— confirmed (line 179). Reuses the existing envelope field with new semantic meaning forBATCH_RUN: the envelope'srecord_idIS the ledger run id. No schema change; agents can pivot from envelope →bcli batch state <run-id>with one field. Clean.- Dry-run finalizes both attestations — confirmed (line 189-195):
_print_dry_run→ledger.finish_run(run_id, "completed")→ledger.close()→cap.mark_dry_run()→cap.emit_success()→ return. No zombieplannedledger row;dry_run=trueon the envelope is the disambiguator. The choice is defensible — operators inspecting the ledger seecompletedfor an intentional no-op, which matches the user's mental model better than a danglingrunning. - Policy refusal finalizes both as failed — confirmed (line 217-228):
try/except typer.Exitaroundconfirm_write_or_exit→compute_run_statederives → override tofailedif needed →ledger.finish_run(run_id, "failed")→cap.emit_failure(RuntimeError("BATCH WRITE refused by disable_writes gate"))→ re-raise. Both attestations consistent: envelopestatus=failed, ledgerstate=failed. - Mixed-result on happy path — confirmed (line 295-302): synthetic
RuntimeError(f"{failed_count} of {len(results)} steps failed")feedscap.emit_failure→ledger.finish_run(run_id, final_state)wherefinal_stateispartially_committedif any succeeded elsefailed→raise typer.Exit(1). Envelope reports one-line failure; ledger holds the long-formpartially_committeddetail. - BaseException ordering — confirmed (line 309-326): on real
Exception(notKeyboardInterrupt/SystemExit):cap.emit_failure(e)first, thencompute_run_state+finish_run+close. OnKeyboardInterrupt/SystemExit: no envelope emit, ledger still finalizes via derivation, re-raise. SIGKILL realism from #16 preserved — Python signal handlers shouldn't conjure a "complete" envelope. except typer.Exit: raise— confirmed (line 306-308). Lets the policy-refusaltyper.Exitand mixed-resulttyper.Exit(1)pass through without re-finalization. Important: prevents double-emit on the envelope (the salvage path incapture()is guarded bycap_state.writtenso even if it tried, it wouldn't double-write).
Integration test (test_batch_envelope_with_ledger.py)
Three scenarios, each asserting envelope ↔ ledger agreement plus the record_id == run_id cross-reference:
TestEnvelopeAndLedgerAgreeOnSuccess::test_success_envelope_matches_ledger_run_id— envelopestatus="succeeded"/exit_code=0/method="BATCH_RUN", ledger derived statecompleted,env["record_id"] == run_row["run_id"].TestEnvelopeAndLedgerAgreeOnPartialCommit::test_partial_commit_envelope_failed_ledger_partial— Step 1 commits, step 2 raisesRuntimeError; envelopestatus="failed"/exit_code=1, ledger derived statepartially_committed, cross-reference preserved.TestEnvelopeOnPolicyRefusalForBatch::test_policy_refusal_emits_failed_envelope_and_failed_ledger— readonly profile + non-interactive + no--yes; envelopestatus="failed"/exit_code=1/profile=prod/environment=Production, ledger row exists (start_run ran before gate),state="failed", step table empty,fake_client.post.assert_not_awaited(), cross-reference preserved.
The first two are exactly the "envelope-ledger coupling" check the lead asked about. The third documents the chosen behavior so a future refactor doesn't accidentally short-circuit start_run and lose the audit trail of "the run was attempted but refused."
Test hardening
isolated_homefixture intests/test_envelope/conftest.pyis now autouse — every envelope test redirectsHOMEandPath.home()totmp_path. Preventsrun_batch-touching tests from polluting~/.config/bcli/batch/. Right move.TestBatchRunEnvelope.test_batch_run_writes_envelope_all_okupdated:record_idassertion changed fromNonetolen == 32(uuid4 hex). Comment cites #15 + #16 integration. Consistent with new behavior.
Local verification (in /tmp/aip-review-15 worktree)
- Focused suites —
uv run pytest tests/test_envelope/ tests/test_batch_ledger/ tests/test_cli/test_batch_safety.py→ 85 passed in 0.67s (43 envelope incl. 3 new integration + 34 ledger + 8 batch_safety). - Full suite —
uv run pytest tests/→ 727 passed, 5 skipped in 20.73s (matches worker self-report). uv run ruff check src/ tests/→ clean.
Non-blocking observation
Failure-path ordering differs between branches:
- Policy refusal:
ledger.finish_runfirst, thencap.emit_failure, then re-raise. - BaseException:
cap.emit_failurefirst, thenledger.finish_run, then re-raise.
Both produce correct attestations because cap.set_record_id(run_id) runs unconditionally at line 179 — by the time either failure branch fires, the envelope already knows which ledger to point at. The BaseException docstring explicitly says "Order matters: emit the envelope BEFORE finalizing the ledger so both attestations describe the same outcome" — but the policy-refusal branch doesn't follow that principle. Not a correctness bug, just a minor inconsistency. Worth noting for Phase 4 worker if they happen to be touching this region.
After this merges
Standing down. Worker B picks up Phase 5 (bcli_mcp rewrite to consume bcli describe + --result-out); Worker A picks up Phase 4 (exit-code taxonomy + JSON-on-pipe + idempotency-key + progress events).
Verdict: APPROVE. Igor merges; I do not.
…(AIP Phase 4) (#18) * feat(exit-codes): wire AIP §Phase 4a taxonomy + project in describe - Add EXIT_CODES map + describe_exit_code() helper in bcli.exit_codes. - bcli describe --format json now emits a top-level `exit_codes` field; subtree mode trims it (token-saving). - confirm_write_or_exit now raises typer.Exit(EXIT_POLICY=8) — the unfinished business from PR #15 review. Policy refusal is now distinguishable from a generic crash for agent runtimes. - Capture.emit_failure() takes an optional exit_code override so the envelope's exit_code matches the CLI's actual exit on policy refusal (was leaking through as EXIT_GENERIC_ERROR=1 before). - batch_cmd policy-refusal path: emit envelope BEFORE finalizing the ledger so both attestations describe the same outcome (matches the BaseException branch's documented ordering — addresses the lead's PR #15 non-blocking note). - Update existing tests (test_safety, test_batch_safety, test_envelope_policy_violation, test_batch_envelope_with_ledger) to assert exit_code=8 for policy refusal — this is the intentional breaking change called out in the Phase 4 plan. Tests: +5 in tests/test_exit_codes/test_taxonomy.py, +2 in tests/test_describe/. Full suite green: 734 passed. * feat(output): JSON on non-TTY by default (AIP §Phase 4b) When stdout isn't a TTY and the user didn't pass --format, emit JSON instead of markdown. Pipes, redirects, CI steps and agent runtimes all get the canonical machine-readable shape with no flag dance. Scope: only the non-TTY branch flips. CLAUDECODE / BCLI_AGENT env hints keep their existing markdown semantics (explicit user opt-in). Win32-non-WindowsTerminal stays on markdown for the same mojibake reason. BCLI_FORMAT and an explicit --format always win. Tests: +7 in tests/test_output/test_json_on_pipe.py covering each branch. Existing tests/test_cli/test_output_format.py::test_non_tty updated to assert the new contract. * feat(errors): centralized did-you-mean error handler (AIP §Phase 4c) Catch BCLIError subclasses at the outer main() boundary and: 1. Map to the documented exit-code taxonomy (AuthError → 3, RegistryError/NotFoundError → 4, ValidationError → 5, ConfigError → 2, SafetyError → 8, others → status-derived). 2. Append a remediation hint after the error message: - AuthError → "Run 'bcli auth login --profile <active>'" - ConfigError (no profiles) → "Run 'bcli config init'" - ConfigError (unknown profile) → fuzzy "Did you mean: ...?" - RegistryError (no fuzzy match) → "Run 'bcli registry import ...'" Idempotent: if the upstream raise site already named the fix (e.g. ConfigError already mentions config init), we don't append a duplicate. Existing RegistryError fuzzy hint at the raise site is preserved untouched. Non-BCLIError exceptions pass through with a traceback so a real crash isn't masked. Tests: +10 in tests/test_errors/test_did_you_mean.py. * feat(idempotency): --idempotency-key on write verbs + ledger v2 (AIP §Phase 4d) Plumb an opaque idempotency token end-to-end: - Transport: --idempotency-key forwards as the IETF Idempotency-Key header on POST/PATCH/DELETE/PATCH-binary requests. - AsyncBCClient: post/patch/delete/upload_attachment accept and pass through idempotency_key (only the metadata POST in the two-phase attach gets the header; binary PATCH is content-only). - CLI: --idempotency-key KEY flag added to post / patch / delete / attach upload. Tests pin the keyword reaches the client. Ledger schema v2 (additive-only migration): - New step.idempotency_key TEXT column. - _ensure_schema migrates v1 ledger DBs via `ALTER TABLE step ADD COLUMN idempotency_key TEXT` plus a version row bump — preserves all existing rows. Test exercises a hand- crafted v1 DB through the migration. - Ledger.write_intent gains idempotency_key kw. - Ledger.find_committed_idempotent_step(key) returns the prior StepRow when the same key already committed in the current run — the same-run replay protection. Scope deferred to v0.2: - Cross-run collision detection. Implementing it would need to scan every *.db in batch/ on each call; same-run protection covers the agent-retry case which is the common one. - `bcli batch run --idempotency-key` flag. A single run-level key would collide across multiple mutating steps in the same batch. The right place for per-step keys is the batch YAML schema — left for a follow-up where the workflow loader is touched anyway. Tests: - tests/test_batch_ledger/test_ledger_idempotency.py +8 - tests/test_idempotency/test_idempotency_key.py +4 (transport) - tests/test_idempotency/test_cli_flags.py +3 (CLI plumbing) - existing test_ledger_schema::test_schema_version_recorded updated for SCHEMA_VERSION=2. * feat(progress): JSON step events to --progress-fd (AIP §Phase 4e) Long-running ``bcli batch run`` and ``bcli extract run`` now stream structured progress events on a dedicated file descriptor, separate from --result-out/-fd (Phase 2). Stderr stays human-readable; the fd channel is JSON-lines and stable for an agent to demux. Event shape: {"event": "step_started", "seq": 3, "method": "POST", ...} {"event": "step_completed", "seq": 3, "status": "committed", "duration_ms": 287, ...} For ``extract run`` (single-pass), it's one started/completed pair. Implementation: - New ``bcli_cli/_progress.py`` ``ProgressEmitter`` writes JSON-lines to an fd. ``fd=None`` is a documented no-op so command code can call ``emit()`` unconditionally. - ``batch_cmd._execute_batch`` accepts ``progress=`` and emits paired events around every step, including early-``continue`` branches. - ``extract_cmd.run_command`` emits started + completed (status: ok | empty | failed). - ``--progress-fd N`` flag added to both commands. - Each command closes the fd as soon as the work finishes so the consumer EOFs. The describe walker picks up the new options automatically — no schema change needed; cache invalidates on tool_version bump. Tests: +6 in tests/test_progress/test_progress_events.py covering emitter primitives, Typer-default detection, ISO timestamp shape, and the batch run end-to-end through a real fd. * fix(idempotency): seed state config in CLI tests + wire same-run dedup Address PR #18 review blockers. Blocker 1 — hermetic CI failure on 3.11/3.12/3.13: The three test_idempotency/test_cli_flags.py tests called the mutation commands directly with yes=True, but confirm_write_or_exit reads state.profile unconditionally to check disable_writes. In a hermetic environment (HOME=/tmp/empty) that raises ConfigError before the assertion runs. Fix: new tests/test_idempotency/conftest.py mirroring the ``cli_state`` pattern from tests/test_envelope/conftest.py — seeds a writable Sandbox profile via state._config and isolates HOME to tmp_path. All three failing tests now take the writable_state fixture. Verified with HOME=/tmp/empty-bcli-test uv run pytest tests/test_idempotency/ — green. Blocker 2 — same-run replay protection unwired: Ledger.find_committed_idempotent_step had 8 unit tests but zero production call sites. The contract bar (§Phase 4d) is explicit: "stored in the batch ledger; replayable on retry." Fix (option (a), the lead's preference): wire the primitive into batch_cmd._execute_batch. For each mutating step that carries an idempotency_key, query the ledger before write_intent + HTTP. If a prior committed step in the same run already used that key, skip the HTTP call, surface a "replayed" result entry, and emit ``step_started`` + ``step_completed`` events with status="replayed" so the agent's progress stream still tells the truth. The replayed result entry carries prior_seq, prior_step_id, and prior_bc_correlation_id so the agent has the audit trail of the already-committed call. Response bodies are NOT replayed — the ledger doesn't persist them — workflow ``${{ steps.X.field }}`` references on a replayed step get an empty data dict, matching today's behavior for steps that didn't return a body. Workflow YAML now accepts ``idempotency_key:`` on a step. batch_cmd forwards it to client.post/patch/delete so the Idempotency-Key header still goes out for the FIRST call (gateway- level dedup remains in play). Cross-run replay protection remains deferred to v0.2 (would require scanning every *.db in ~/.config/bcli/batch/ on each call). Integration tests in tests/test_batch_ledger/test_idempotency_replay.py: - test_duplicate_key_replays_second_step: two posts with the same key in one batch → second is replayed, no second HTTP, no second intent row. - test_distinct_keys_both_fire_http: different keys = no replay (sanity). - test_no_key_no_replay: existing behavior preserved for steps without a key. Suite: 772 → 775. Hermetic suite (HOME=/tmp/empty-bcli-test) on tests/test_idempotency/ tests/test_batch_ledger/ tests/test_envelope/: 95 passed.
Summary
Implements Phase 2 of the AIP v0.1 contract — an opt-in JSON result envelope an agent runtime can consume on a side channel, never on stdout. Adds
--result-out PATHand--result-fd Nto every mutating command:bcli post/patch/delete/attach upload/batch runEnvelope shape matches the contract:
version, invocation_id, tool_version, profile, environment, company, method, endpoint, resolved_url, record_id, dry_run, status, exit_code, bc_correlation_id, telemetry_event_id, audit_log_offset, started_at, duration_ms.Why
Agent runtimes (and
bcli_mcp) need to parse mutation outcomes deterministically. Today they have to scrape stdout prose — fragile when--formatchanges, when the user sees[green]Error:[/red]styling, or when the BC correlation id only appears in stderr. The envelope is a stable, atomic, side-channel attestation: profile + environment + company + correlation id + outcome.This unblocks Phase 5 (the
bcli_mcprewrite that consumes--result-out).Implementation
bcli.result_envelope— frozen dataclass + atomic writer (tmp file +os.replace,fsync,mkdir(parents=True)).bcli.exit_codes— taxonomy seeded for Phase 4 (Worker A) to extend; Phase 2 only wires0/1/6/7today.bcli_cli._envelope_wrap— sibling to_audit_wrap. Context-manager lifecycle so the five commands share one implementation: validate flags, capturestarted_at+invocation_id, build + write the envelope on success or failure.The envelope is written on both success and failure paths:
status="failed",exit_code=6,bc_correlation_idcaptured.status="failed",exit_code=7,bc_correlation_idcaptured.status="failed",exit_code=1.--dry-run→dry_run=true,status="succeeded", no HTTP call.stdout is untouched; the existing
--formatflag still drives whatever the human / CSV / JSON dump looks like. Stderr too.record_idis extracted from the response body in order:systemId→id→ first*Idfield;nullwhen none available.--result-outand--result-fdare mutually exclusive (typer.BadParameteron conflict).Both flags are strictly opt-in — when unset, commands behave exactly as before. This is an additive change.
Smoke proof
Known limitations / follow-ups
telemetry_event_id/audit_log_offsetare alwaysnull. Wiring needs a protocol extension on the telemetry + audit sinks (returning the emitted event id / log offset). Deferred to keep this PR additive — extending those Protocols touches every backend including the optional Azure Monitor extra.confirm_write_or_exitraisestyper.Exit(1)before thecapture(...)block. A read-only profile in non-interactive mode without--yesexits 1 with no envelope. Phase 4 (exit-code taxonomy) definesexit 8for this case — the right place to wire envelope-on-policy-violation is together with that change.status="succeeded"for dry-run. I chose this overstatus="dry_run"because the dry-run completed successfully;dry_run=trueis the disambiguator. Defensible vs. the spec's example which only shows the happy path. Open to flipping if you'd prefer.OptionInfotolerance. The wrapper treatstyper.OptionInfoplaceholders as "not provided" so existing direct-call tests (test_batch_safety.pycallsrun_batch(file=..., dry_run=..., …)without the new kwargs) keep working. A separate cleanup PR could update every call site; this helper keeps the change contained.Test plan
uv run pytest tests/ -v— 667 passed, 5 skipped.uv run ruff check src/ tests/— clean.tests/test_envelope/(core module + per-verb integration).--helpshows the new flags on post / patch / delete / attach upload / batch run.--formatexactly on a real post (no envelope keys leaking).mkdir(parents=True, exist_ok=True)for--result-outis OK (matches existing--outputflag elsewhere).Spec / context
agent-cli-contract-v0.1.md§Phase 2.tasks/todo.mdPhase 2 checklist.