Skip to content

feat(envelope): mutation result envelope via --result-out/--result-fd (AIP Phase 2)#15

Merged
igor-ctrl merged 3 commits into
mainfrom
feat/result-envelope
May 17, 2026
Merged

feat(envelope): mutation result envelope via --result-out/--result-fd (AIP Phase 2)#15
igor-ctrl merged 3 commits into
mainfrom
feat/result-envelope

Conversation

@igor-ctrl
Copy link
Copy Markdown
Owner

Summary

Implements Phase 2 of the AIP v0.1 contract — an opt-in JSON result envelope an agent runtime can consume on a side channel, never on stdout. Adds --result-out PATH and --result-fd N to every mutating command:

  • bcli post / patch / delete / attach upload / batch run

Envelope shape matches the contract: version, invocation_id, tool_version, profile, environment, company, method, endpoint, resolved_url, record_id, dry_run, status, exit_code, bc_correlation_id, telemetry_event_id, audit_log_offset, started_at, duration_ms.

Why

Agent runtimes (and bcli_mcp) need to parse mutation outcomes deterministically. Today they have to scrape stdout prose — fragile when --format changes, when the user sees [green]Error:[/red] styling, or when the BC correlation id only appears in stderr. The envelope is a stable, atomic, side-channel attestation: profile + environment + company + correlation id + outcome.

This unblocks Phase 5 (the bcli_mcp rewrite that consumes --result-out).

Implementation

  • bcli.result_envelope — frozen dataclass + atomic writer (tmp file + os.replace, fsync, mkdir(parents=True)).
  • bcli.exit_codes — taxonomy seeded for Phase 4 (Worker A) to extend; Phase 2 only wires 0/1/6/7 today.
  • bcli_cli._envelope_wrap — sibling to _audit_wrap. Context-manager lifecycle so the five commands share one implementation: validate flags, capture started_at + invocation_id, build + write the envelope on success or failure.

The envelope is written on both success and failure paths:

  • HTTP 4xx → status="failed", exit_code=6, bc_correlation_id captured.
  • HTTP 5xx → status="failed", exit_code=7, bc_correlation_id captured.
  • Other exceptions → status="failed", exit_code=1.
  • --dry-rundry_run=true, status="succeeded", no HTTP call.

stdout is untouched; the existing --format flag still drives whatever the human / CSV / JSON dump looks like. Stderr too.

record_id is extracted from the response body in order: systemIdid → first *Id field; null when none available.

--result-out and --result-fd are mutually exclusive (typer.BadParameter on conflict).

Both flags are strictly opt-in — when unset, commands behave exactly as before. This is an additive change.

Smoke proof

$ bcli post vendors --data '{"displayName": "Smoke"}' --result-out /tmp/x.json
$ cat /tmp/x.json
{
  "version": "0.1",
  "invocation_id": "3b673fa7c6e743fb816801842f0cd990",
  "tool_version": "0.3.0",
  "profile": "dev",
  "environment": "Sandbox",
  "company": "c-1",
  "method": "POST",
  "endpoint": "vendors",
  "resolved_url": "https://example.test/vendors",
  "record_id": "vnd-real-1",
  "dry_run": false,
  "status": "succeeded",
  "exit_code": 0,
  "bc_correlation_id": null,
  "telemetry_event_id": null,
  "audit_log_offset": null,
  "started_at": "2026-05-17T01:00:51.355583Z",
  "duration_ms": 1
}

Known limitations / follow-ups

  1. telemetry_event_id / audit_log_offset are always null. Wiring needs a protocol extension on the telemetry + audit sinks (returning the emitted event id / log offset). Deferred to keep this PR additive — extending those Protocols touches every backend including the optional Azure Monitor extra.
  2. Policy violations don't emit an envelope. confirm_write_or_exit raises typer.Exit(1) before the capture(...) block. A read-only profile in non-interactive mode without --yes exits 1 with no envelope. Phase 4 (exit-code taxonomy) defines exit 8 for this case — the right place to wire envelope-on-policy-violation is together with that change.
  3. status="succeeded" for dry-run. I chose this over status="dry_run" because the dry-run completed successfully; dry_run=true is the disambiguator. Defensible vs. the spec's example which only shows the happy path. Open to flipping if you'd prefer.
  4. OptionInfo tolerance. The wrapper treats typer.OptionInfo placeholders as "not provided" so existing direct-call tests (test_batch_safety.py calls run_batch(file=..., dry_run=..., …) without the new kwargs) keep working. A separate cleanup PR could update every call site; this helper keeps the change contained.

Test plan

  • uv run pytest tests/ -v — 667 passed, 5 skipped.
  • uv run ruff check src/ tests/ — clean.
  • 36 new tests under tests/test_envelope/ (core module + per-verb integration).
  • CLI --help shows the new flags on post / patch / delete / attach upload / batch run.
  • Live smoke run with mocked HTTP layer produces the envelope shown above.
  • Reviewer: spot-check that stdout still matches --format exactly on a real post (no envelope keys leaking).
  • Reviewer: confirm mkdir(parents=True, exist_ok=True) for --result-out is OK (matches existing --output flag elsewhere).

Spec / context

… (AIP Phase 2)

Adds an opt-in JSON result envelope an agent runtime can consume on a
side channel — never on stdout. Five mutating commands now accept
`--result-out PATH` (atomic write) or `--result-fd N` (write+close):

  bcli post / patch / delete / attach upload / batch run

Envelope shape matches contract doc §Phase 2: version, invocation_id,
tool_version, profile, environment, company, method, endpoint,
resolved_url, record_id, dry_run, status, exit_code, bc_correlation_id,
telemetry_event_id, audit_log_offset, started_at, duration_ms.

Implementation:
- `bcli.result_envelope` — frozen dataclass + atomic writer
  (tmp file + os.replace; fsync; mkdir parents).
- `bcli.exit_codes` — taxonomy seeded for Phase 4 (Worker A) to extend.
  Phase 2 only wires 0/1/6/7 today.
- `bcli_cli._envelope_wrap` — sibling to `_audit_wrap`, owns the
  context-manager lifecycle so the five commands share one implementation
  (validate flags, capture start_at/uuid, build+write the envelope on
  success or failure).

Behaviour:
- Stdout output is untouched; `--format` still drives the existing dump.
- Envelope writes on BOTH success and failure paths (HTTP 4xx → exit 6,
  5xx → exit 7, other → exit 1; bc_correlation_id captured when present).
- `--dry-run` produces an envelope with `dry_run: true`, `status:
  "succeeded"`, no HTTP call.
- `record_id` extracted from response body: `systemId` → `id` → first
  `*Id` field; null when none available.
- `--result-out` and `--result-fd` are mutually exclusive
  (typer.BadParameter on conflict).
- Both flags are strictly opt-in — when unset, commands behave exactly as
  before (additive change).

Tests: 36 new (`tests/test_envelope/`). 667 total pass; ruff clean.

Known limitations / follow-ups:
- `telemetry_event_id` and `audit_log_offset` are always null; wiring
  needs a protocol extension on the telemetry + audit sinks. Deferred to
  a follow-up to keep this PR additive.
- Policy violations (`disable_writes=true`, non-interactive, no `--yes`)
  exit before the capture block — no envelope is emitted. Phase 4 will
  define exit 8 and surface this in the envelope.
- Existing direct-call tests (e.g. `test_batch_safety.py`) pass typer
  `OptionInfo` defaults to the new params; the wrapper treats those as
  "not provided" so we don't have to update every call site.
Copy link
Copy Markdown
Owner Author

@igor-ctrl igor-ctrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lead review — REQUEST_CHANGES (one issue) + APPROVE the rest

Reviewed against agent-cli-contract-v0.1.md §Phase 2 and tasks/todo.md Phase 2 checklist. Worker self-report verified: 667 passed / 5 skipped, ruff clean, additive only. Beautiful PR — frozen dataclass + os.replace + fsync, mutual-exclusion on the flags, per-verb integration tests, dedicated _envelope_wrap module mirroring _audit_wrap. The ergonomics of with capture(...) as cap: are exactly right.

Verdict on the three flagged items

(1) telemetry_event_id / audit_log_offset always null — APPROVE.
The fields are present in the dataclass and pinned by test_has_all_spec_fields. The spec example in §Phase 2 shows non-null values but the schema permits null for unknown/unavailable. Wiring those needs a TelemetrySink / audit-sink protocol extension that touches every backend including the optional Azure Monitor extra — keeping this PR additive is the right call. Track as a Phase 2.1 follow-up so it doesn't drift indefinitely.

(3) Dry-run uses status="succeeded" + dry_run=true — APPROVE.
Per the contract doc §Phase 2 schema, status and dry_run are independent fields. Your interpretation is correct: a completed dry-run is a success at the "completed the dry-run preview" layer; dry_run=true is the disambiguator. Verified in test_envelope_on_dry_run. No change needed.

(2) Policy violations skip the envelope — REQUEST_CHANGES.

I empirically confirmed this with a readonly Production profile + --result-out:

typer.Exit code=1
NO ENVELOPE WRITTEN (policy gate exited before capture)

The four single-mutation commands (post, patch, delete, attach upload) call confirm_write_or_exit(...) before opening the with capture(...) block, so a policy refusal exits without emitting the envelope. batch_cmd.py already does this correctly — its confirm_write_or_exit("BATCH WRITE", preview, yes=yes) is inside with capture(...) and the envelope captures the policy exit. That asymmetry is the bug.

Why this matters for the AIP contract: the spec says "a write's value is its attestation: profile, environment, target, correlation id, outcome" — a refused write is an outcome. An agent runtime driving production needs to distinguish "BC said no" (4xx → envelope), "BC errored" (5xx → envelope), and "bcli policy refused locally" (no envelope today). The last case is the one a runtime is most likely to encounter when piloting against a disable_writes profile.

You correctly identified Phase 4 will add EXIT_POLICY = 8 to the taxonomy — but the envelope wiring is here, in Phase 2. Phase 4 can flip exit_code=1 → exit_code=8 without re-touching the envelope code path.

Asked fix:

Move the confirm_write_or_exit(...) call inside the with capture(...) block for the four single-mutation commands, mirroring batch_cmd.py (lines ~756 onward). The fix per command is ~3-line reorganization. The pattern:

with capture(method="POST", endpoint=endpoint,
             result_out=result_out, result_fd=result_fd) as cap:
    cap.set_resolved_url(try_resolve_url(...))
    confirm_write_or_exit("POST", endpoint, yes=yes)   # ← now inside; typer.Exit will hit the BaseException branch of capture(), which writes a failed envelope
    ...

Then add one test per verb (e.g. test_post_envelope_on_policy_violation) — readonly profile + no --yes + non-TTY stdin + --result-out → file exists, status="failed", exit_code=1 (Phase 4 will update to 8 along with the taxonomy roll-out).

Smoke test result_out behavior on batch run confirms the pattern works — your existing test_batch_run_envelope_failed_when_any_step_fails already exercises a non-policy failure-on-exit; a sibling policy test would close the loop.

Verification summary

  • uv run pytest tests/667 passed, 5 skipped (matches self-report)
  • uv run ruff check src/ tests/clean
  • Envelope schema: 18 fields present per spec ✓
  • Atomic write via os.replace ✓ (tested + verified via spy_replace)
  • Mutual exclusion --result-out / --result-fd
  • Stdout never carries envelope keys ✓ (test_stdout_does_not_contain_envelope)
  • fd mode closes the fd ✓ (test_fd_is_closed_after_write)
  • record_id extraction: systemId → id → first *Id ✓
  • Per-verb integration: post, patch, delete, attach upload, batch run all pinned

Other observations (non-blocking)

  • _envelope_wrap.capture() swallows a typer.Exit raised inside the with block to opportunistically emit an envelope — clever for catching forgotten emit_* calls, but also means a deeply-nested typer.Exit(1) from confirm_write_or_exit would get caught if you moved it inside the block. The "salvage envelope on BaseException" path uses int(exc.exit_code) for failed exits, so the envelope will record exit_code=1 automatically.
  • _record_id_from's third clause (first key ending in Id) iterates dict insertion order; for Python 3.7+ that's deterministic. Worth a comment but fine.
  • Synthetic RuntimeError("N of M steps failed") in batch_cmd.py for the envelope-failure path is an honest pattern given there isn't a single underlying exception. Phase 3's ledger will give a richer answer; this is the right placeholder.

Action items for worker B (next push):

  1. Move confirm_write_or_exit inside with capture(...) for post / patch / delete / attach upload (mirror batch_cmd).
  2. Add one test_*_envelope_on_policy_violation per verb under tests/test_envelope/.
  3. Keep exit_code=1 for now; Phase 4 will rename to 8 when the taxonomy lands.

After this lands, branch feat/mcp-describe-consumer off main (waiting on Phase 1 to merge first) and proceed with Phase 5.

Verdict: REQUEST_CHANGES on policy-envelope, APPROVE everything else. Igor merges once the policy-envelope fix lands.

Lead's review caught: a read-only profile (`disable_writes=true`) +
non-interactive + no `--yes` runs `confirm_write_or_exit` BEFORE the
`capture(...)` block, so the command exits 1 with no envelope file.
An agent runtime can't distinguish that from a true crash — it should
see `status="failed"` like any other refused write.

Fix: move `confirm_write_or_exit` *inside* the `with capture(...)`
block on `post`, `patch`, `delete`, and `attach upload`. The salvage
path in `_envelope_wrap.capture` already turns a `typer.Exit(1)` into a
`status="failed"` envelope, so the gate just had to move into the
captured region. `batch run` already had it correct — its
`confirm_write_or_exit` was already inside the capture block.

Tests: 4 new under `tests/test_envelope/test_envelope_policy_violation.py`
(one per verb). Each pins:
- the command exits with code 1
- the envelope file exists on disk
- `status="failed"`, `exit_code=1`, profile/environment captured
- the underlying client write method was NOT awaited

`exit_code` stays at 1 for now per lead's note — Phase 4 (Worker A)
owns the taxonomy rename to `EXIT_POLICY` (8). The envelope shape
remains stable; only the int constant changes later.

Totals: 671 passed (was 667; +4 new), 5 skipped. Ruff clean.
Copy link
Copy Markdown
Owner Author

@igor-ctrl igor-ctrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lead re-review of commit 113200e9 — APPROVE

Verified against the REQUEST_CHANGES item from the prior review.

Gate-move pattern (all four verbs)

For each of post_cmd, patch_cmd, delete_cmd, attach_cmd:

  • confirm_write_or_exit(...) removed from the pre-capture region.
  • Re-inserted inside with capture(...) as cap:, after cap.set_resolved_url(...) (and cap.set_record_id(...) where the command has one).
  • Inline comment cites PR #15 review and explains why (salvage path in _envelope_wrap.capture turns the typer.Exit(1) into a status="failed" envelope).

This mirrors the pattern batch_cmd.py already had — confirmed unchanged.

New tests (tests/test_envelope/test_envelope_policy_violation.py)

One class per verb, four tests total:

  • TestPostEnvelopeOnPolicyViolation
  • TestPatchEnvelopeOnPolicyViolation
  • TestDeleteEnvelopeOnPolicyViolation
  • TestAttachUploadEnvelopeOnPolicyViolation

Each pins exactly what the review asked for:

  1. disable_writes=true profile (new readonly_state conftest fixture) + force_non_interactive autouse → matches the policy-violation path.
  2. Verb invoked with --result-out=<tmp>, yes=False.
  3. pytest.raises(typer.Exit)exit_code == 1 (per the note that Phase 4 owns the taxonomy rename to EXIT_POLICY=8).
  4. Shared _assert_policy_failure_envelope helper validates envelope on disk, status="failed", exit_code=1, method/endpoint/profile/environment populated.
  5. fake_client.<verb>.assert_not_awaited() — proves the gate fired before any HTTP.

Bonus: the patch and delete tests additionally assert env["record_id"] == "vnd-1" after refusal — proves the gate now sits after set_record_id (structural evidence the move worked, not just behavioral).

Local verification (in /tmp/aip-review-15 worktree)

  • uv run pytest tests/test_envelope/ -v40 passed in 0.33s (36 prior + 4 new).
  • uv run ruff check src/bcli_cli/commands/{post,patch,delete,attach}_cmd.py tests/test_envelope/clean.
  • Worker totals match: 671 passed (was 667 + 4 new), 5 skipped, ruff clean.

Diff scope

6 files, +220/-12. Touches only the four single-mutation commands + tests/test_envelope/conftest.py (new readonly_state fixture) + new test_envelope_policy_violation.py. batch_cmd.py and _envelope_wrap.py not touched — exactly the surface area expected.

After this merges

Branch feat/mcp-describe-consumer off main (waiting on Phase 1 to merge first) and proceed with Phase 5 per tasks/todo.md.

Verdict: APPROVE. Igor merges; I do not.

PRs #15 (result envelope) and #16 (batch ledger) both modified
``batch_cmd.run_batch``'s control flow. This merge threads the envelope
``capture(...)`` wrap around the SQLite ledger writes so both
attestations describe the same outcome.

Integration order:
- ``capture(...)`` wraps the entire ``run_batch`` body.
- Inside it, ``ledger.start_run(...)`` writes the run row BEFORE any
  step fires (preserved from #16 — needed for SIGKILL realism).
- ``cap.set_record_id(run_id)`` stashes the ledger run id onto the
  envelope. Agent runtimes now have one cross-reference: the
  envelope's ``record_id`` is the value to pass to
  ``bcli batch state <run-id>``.
- The ``disable_writes`` gate (from #15) runs *inside* the capture
  block, AFTER ``start_run``. A refused write finalizes the ledger
  as ``failed`` and emits a failed envelope before re-raising — both
  attestations stay consistent.
- ``cap.emit_success`` / ``emit_failure`` are paired with
  ``ledger.finish_run(...)`` on every exit path. Mixed-result batches
  (some ok, some error) get ``status="failed"`` on the envelope and
  ``state="partially_committed"`` on the ledger; the same RuntimeError
  feeds both.
- The ``BaseException`` handler from #16 is preserved so a
  ``KeyboardInterrupt`` still re-raises after the ledger derives a
  truthful state via ``compute_run_state``.

Dry-run decision: ``ledger.finish_run(run_id, "completed")`` IS called
on dry-run, so the ledger row never lingers in ``planned`` state. The
envelope's ``dry_run=true`` is the disambiguator for "no HTTP fired."
Comment in the source explains the call.

New integration test:
- ``tests/test_envelope/test_batch_envelope_with_ledger.py`` — three
  scenarios (happy path, partial commit, policy refusal) pin that
  envelope and ledger stay in agreement, including the
  ``record_id == run_id`` cross-reference.

Test isolation: added an autouse ``isolated_home`` fixture in
``tests/test_envelope/conftest.py`` so envelope tests that touch
``run_batch`` don't pollute the user's real
``~/.config/bcli/batch/`` dir.

Test update: ``TestBatchRunEnvelope.test_batch_run_writes_envelope_all_ok``
now asserts ``record_id`` is the 32-char hex run id (was: ``None``) —
matches the new ledger cross-reference behavior.

Totals: 727 passed, 5 skipped. Ruff clean.
Copy link
Copy Markdown
Owner Author

@igor-ctrl igor-ctrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lead re-review of merge commit d1ead8e — APPROVE

Post-conflict resolution validated. PR is MERGEABLE / CLEAN. d1ead8e is the only new commit on top of 113200e after #14 and #16 landed on main.

Integration walk-through (batch_cmd.py vs current main)

  • capture(...) wraps the entire run_batch body — confirmed (line 97 onward).
  • ledger.start_run sits inside the capture block, after workflow params setup — confirmed (line 166-174, after _validate_step_names). Preserves #16's "intent row before any step" property.
  • cap.set_record_id(run_id) — confirmed (line 179). Reuses the existing envelope field with new semantic meaning for BATCH_RUN: the envelope's record_id IS the ledger run id. No schema change; agents can pivot from envelope → bcli batch state <run-id> with one field. Clean.
  • Dry-run finalizes both attestations — confirmed (line 189-195): _print_dry_runledger.finish_run(run_id, "completed")ledger.close()cap.mark_dry_run()cap.emit_success() → return. No zombie planned ledger row; dry_run=true on the envelope is the disambiguator. The choice is defensible — operators inspecting the ledger see completed for an intentional no-op, which matches the user's mental model better than a dangling running.
  • Policy refusal finalizes both as failed — confirmed (line 217-228): try/except typer.Exit around confirm_write_or_exitcompute_run_state derives → override to failed if needed → ledger.finish_run(run_id, "failed")cap.emit_failure(RuntimeError("BATCH WRITE refused by disable_writes gate")) → re-raise. Both attestations consistent: envelope status=failed, ledger state=failed.
  • Mixed-result on happy path — confirmed (line 295-302): synthetic RuntimeError(f"{failed_count} of {len(results)} steps failed") feeds cap.emit_failureledger.finish_run(run_id, final_state) where final_state is partially_committed if any succeeded else failedraise typer.Exit(1). Envelope reports one-line failure; ledger holds the long-form partially_committed detail.
  • BaseException ordering — confirmed (line 309-326): on real Exception (not KeyboardInterrupt/SystemExit): cap.emit_failure(e) first, then compute_run_state + finish_run + close. On KeyboardInterrupt/SystemExit: no envelope emit, ledger still finalizes via derivation, re-raise. SIGKILL realism from #16 preserved — Python signal handlers shouldn't conjure a "complete" envelope.
  • except typer.Exit: raise — confirmed (line 306-308). Lets the policy-refusal typer.Exit and mixed-result typer.Exit(1) pass through without re-finalization. Important: prevents double-emit on the envelope (the salvage path in capture() is guarded by cap_state.written so even if it tried, it wouldn't double-write).

Integration test (test_batch_envelope_with_ledger.py)

Three scenarios, each asserting envelope ↔ ledger agreement plus the record_id == run_id cross-reference:

  1. TestEnvelopeAndLedgerAgreeOnSuccess::test_success_envelope_matches_ledger_run_id — envelope status="succeeded"/exit_code=0/method="BATCH_RUN", ledger derived state completed, env["record_id"] == run_row["run_id"].
  2. TestEnvelopeAndLedgerAgreeOnPartialCommit::test_partial_commit_envelope_failed_ledger_partial — Step 1 commits, step 2 raises RuntimeError; envelope status="failed"/exit_code=1, ledger derived state partially_committed, cross-reference preserved.
  3. TestEnvelopeOnPolicyRefusalForBatch::test_policy_refusal_emits_failed_envelope_and_failed_ledger — readonly profile + non-interactive + no --yes; envelope status="failed"/exit_code=1/profile=prod/environment=Production, ledger row exists (start_run ran before gate), state="failed", step table empty, fake_client.post.assert_not_awaited(), cross-reference preserved.

The first two are exactly the "envelope-ledger coupling" check the lead asked about. The third documents the chosen behavior so a future refactor doesn't accidentally short-circuit start_run and lose the audit trail of "the run was attempted but refused."

Test hardening

  • isolated_home fixture in tests/test_envelope/conftest.py is now autouse — every envelope test redirects HOME and Path.home() to tmp_path. Prevents run_batch-touching tests from polluting ~/.config/bcli/batch/. Right move.
  • TestBatchRunEnvelope.test_batch_run_writes_envelope_all_ok updated: record_id assertion changed from None to len == 32 (uuid4 hex). Comment cites #15 + #16 integration. Consistent with new behavior.

Local verification (in /tmp/aip-review-15 worktree)

  • Focused suites — uv run pytest tests/test_envelope/ tests/test_batch_ledger/ tests/test_cli/test_batch_safety.py85 passed in 0.67s (43 envelope incl. 3 new integration + 34 ledger + 8 batch_safety).
  • Full suite — uv run pytest tests/727 passed, 5 skipped in 20.73s (matches worker self-report).
  • uv run ruff check src/ tests/clean.

Non-blocking observation

Failure-path ordering differs between branches:

  • Policy refusal: ledger.finish_run first, then cap.emit_failure, then re-raise.
  • BaseException: cap.emit_failure first, then ledger.finish_run, then re-raise.

Both produce correct attestations because cap.set_record_id(run_id) runs unconditionally at line 179 — by the time either failure branch fires, the envelope already knows which ledger to point at. The BaseException docstring explicitly says "Order matters: emit the envelope BEFORE finalizing the ledger so both attestations describe the same outcome" — but the policy-refusal branch doesn't follow that principle. Not a correctness bug, just a minor inconsistency. Worth noting for Phase 4 worker if they happen to be touching this region.

After this merges

Standing down. Worker B picks up Phase 5 (bcli_mcp rewrite to consume bcli describe + --result-out); Worker A picks up Phase 4 (exit-code taxonomy + JSON-on-pipe + idempotency-key + progress events).

Verdict: APPROVE. Igor merges; I do not.

@igor-ctrl igor-ctrl merged commit 7a3ac95 into main May 17, 2026
3 checks passed
igor-ctrl added a commit that referenced this pull request May 17, 2026
…(AIP Phase 4) (#18)

* feat(exit-codes): wire AIP §Phase 4a taxonomy + project in describe

- Add EXIT_CODES map + describe_exit_code() helper in bcli.exit_codes.
- bcli describe --format json now emits a top-level `exit_codes`
  field; subtree mode trims it (token-saving).
- confirm_write_or_exit now raises typer.Exit(EXIT_POLICY=8) — the
  unfinished business from PR #15 review. Policy refusal is now
  distinguishable from a generic crash for agent runtimes.
- Capture.emit_failure() takes an optional exit_code override so the
  envelope's exit_code matches the CLI's actual exit on policy refusal
  (was leaking through as EXIT_GENERIC_ERROR=1 before).
- batch_cmd policy-refusal path: emit envelope BEFORE finalizing the
  ledger so both attestations describe the same outcome (matches the
  BaseException branch's documented ordering — addresses the lead's
  PR #15 non-blocking note).
- Update existing tests (test_safety, test_batch_safety,
  test_envelope_policy_violation, test_batch_envelope_with_ledger) to
  assert exit_code=8 for policy refusal — this is the intentional
  breaking change called out in the Phase 4 plan.

Tests: +5 in tests/test_exit_codes/test_taxonomy.py, +2 in
tests/test_describe/. Full suite green: 734 passed.

* feat(output): JSON on non-TTY by default (AIP §Phase 4b)

When stdout isn't a TTY and the user didn't pass --format, emit JSON
instead of markdown. Pipes, redirects, CI steps and agent runtimes
all get the canonical machine-readable shape with no flag dance.

Scope: only the non-TTY branch flips. CLAUDECODE / BCLI_AGENT env
hints keep their existing markdown semantics (explicit user opt-in).
Win32-non-WindowsTerminal stays on markdown for the same mojibake
reason. BCLI_FORMAT and an explicit --format always win.

Tests: +7 in tests/test_output/test_json_on_pipe.py covering each
branch. Existing tests/test_cli/test_output_format.py::test_non_tty
updated to assert the new contract.

* feat(errors): centralized did-you-mean error handler (AIP §Phase 4c)

Catch BCLIError subclasses at the outer main() boundary and:

1. Map to the documented exit-code taxonomy
   (AuthError → 3, RegistryError/NotFoundError → 4, ValidationError →
   5, ConfigError → 2, SafetyError → 8, others → status-derived).
2. Append a remediation hint after the error message:
   - AuthError → "Run 'bcli auth login --profile <active>'"
   - ConfigError (no profiles) → "Run 'bcli config init'"
   - ConfigError (unknown profile) → fuzzy "Did you mean: ...?"
   - RegistryError (no fuzzy match) → "Run 'bcli registry import ...'"

Idempotent: if the upstream raise site already named the fix
(e.g. ConfigError already mentions config init), we don't append a
duplicate. Existing RegistryError fuzzy hint at the raise site is
preserved untouched.

Non-BCLIError exceptions pass through with a traceback so a real
crash isn't masked.

Tests: +10 in tests/test_errors/test_did_you_mean.py.

* feat(idempotency): --idempotency-key on write verbs + ledger v2 (AIP §Phase 4d)

Plumb an opaque idempotency token end-to-end:

- Transport: --idempotency-key forwards as the IETF Idempotency-Key
  header on POST/PATCH/DELETE/PATCH-binary requests.
- AsyncBCClient: post/patch/delete/upload_attachment accept and
  pass through idempotency_key (only the metadata POST in the
  two-phase attach gets the header; binary PATCH is content-only).
- CLI: --idempotency-key KEY flag added to post / patch / delete /
  attach upload. Tests pin the keyword reaches the client.

Ledger schema v2 (additive-only migration):
- New step.idempotency_key TEXT column.
- _ensure_schema migrates v1 ledger DBs via
  `ALTER TABLE step ADD COLUMN idempotency_key TEXT` plus a version
  row bump — preserves all existing rows. Test exercises a hand-
  crafted v1 DB through the migration.
- Ledger.write_intent gains idempotency_key kw.
- Ledger.find_committed_idempotent_step(key) returns the prior
  StepRow when the same key already committed in the current run —
  the same-run replay protection.

Scope deferred to v0.2:
- Cross-run collision detection. Implementing it would need to scan
  every *.db in batch/ on each call; same-run protection covers the
  agent-retry case which is the common one.
- `bcli batch run --idempotency-key` flag. A single run-level key
  would collide across multiple mutating steps in the same batch.
  The right place for per-step keys is the batch YAML schema — left
  for a follow-up where the workflow loader is touched anyway.

Tests:
- tests/test_batch_ledger/test_ledger_idempotency.py +8
- tests/test_idempotency/test_idempotency_key.py +4 (transport)
- tests/test_idempotency/test_cli_flags.py +3 (CLI plumbing)
- existing test_ledger_schema::test_schema_version_recorded
  updated for SCHEMA_VERSION=2.

* feat(progress): JSON step events to --progress-fd (AIP §Phase 4e)

Long-running ``bcli batch run`` and ``bcli extract run`` now stream
structured progress events on a dedicated file descriptor, separate
from --result-out/-fd (Phase 2). Stderr stays human-readable; the fd
channel is JSON-lines and stable for an agent to demux.

Event shape:

  {"event": "step_started",   "seq": 3, "method": "POST", ...}
  {"event": "step_completed", "seq": 3, "status": "committed",
                              "duration_ms": 287, ...}

For ``extract run`` (single-pass), it's one started/completed pair.

Implementation:
- New ``bcli_cli/_progress.py`` ``ProgressEmitter`` writes JSON-lines
  to an fd. ``fd=None`` is a documented no-op so command code can call
  ``emit()`` unconditionally.
- ``batch_cmd._execute_batch`` accepts ``progress=`` and emits paired
  events around every step, including early-``continue`` branches.
- ``extract_cmd.run_command`` emits started + completed (status: ok |
  empty | failed).
- ``--progress-fd N`` flag added to both commands.
- Each command closes the fd as soon as the work finishes so the
  consumer EOFs.

The describe walker picks up the new options automatically — no
schema change needed; cache invalidates on tool_version bump.

Tests: +6 in tests/test_progress/test_progress_events.py covering
emitter primitives, Typer-default detection, ISO timestamp shape,
and the batch run end-to-end through a real fd.

* fix(idempotency): seed state config in CLI tests + wire same-run dedup

Address PR #18 review blockers.

Blocker 1 — hermetic CI failure on 3.11/3.12/3.13:
The three test_idempotency/test_cli_flags.py tests called the mutation
commands directly with yes=True, but confirm_write_or_exit reads
state.profile unconditionally to check disable_writes. In a hermetic
environment (HOME=/tmp/empty) that raises ConfigError before the
assertion runs.

Fix: new tests/test_idempotency/conftest.py mirroring the
``cli_state`` pattern from tests/test_envelope/conftest.py — seeds a
writable Sandbox profile via state._config and isolates HOME to
tmp_path. All three failing tests now take the writable_state fixture.

Verified with HOME=/tmp/empty-bcli-test uv run pytest
tests/test_idempotency/ — green.

Blocker 2 — same-run replay protection unwired:
Ledger.find_committed_idempotent_step had 8 unit tests but zero
production call sites. The contract bar (§Phase 4d) is explicit:
"stored in the batch ledger; replayable on retry."

Fix (option (a), the lead's preference): wire the primitive into
batch_cmd._execute_batch. For each mutating step that carries an
idempotency_key, query the ledger before write_intent + HTTP. If a
prior committed step in the same run already used that key, skip the
HTTP call, surface a "replayed" result entry, and emit
``step_started`` + ``step_completed`` events with status="replayed"
so the agent's progress stream still tells the truth.

The replayed result entry carries prior_seq, prior_step_id, and
prior_bc_correlation_id so the agent has the audit trail of the
already-committed call. Response bodies are NOT replayed — the ledger
doesn't persist them — workflow ``${{ steps.X.field }}`` references
on a replayed step get an empty data dict, matching today's behavior
for steps that didn't return a body.

Workflow YAML now accepts ``idempotency_key:`` on a step.
batch_cmd forwards it to client.post/patch/delete so the
Idempotency-Key header still goes out for the FIRST call (gateway-
level dedup remains in play).

Cross-run replay protection remains deferred to v0.2 (would require
scanning every *.db in ~/.config/bcli/batch/ on each call).

Integration tests in tests/test_batch_ledger/test_idempotency_replay.py:
- test_duplicate_key_replays_second_step: two posts with the same
  key in one batch → second is replayed, no second HTTP, no second
  intent row.
- test_distinct_keys_both_fire_http: different keys = no replay
  (sanity).
- test_no_key_no_replay: existing behavior preserved for steps
  without a key.

Suite: 772 → 775. Hermetic suite (HOME=/tmp/empty-bcli-test) on
tests/test_idempotency/ tests/test_batch_ledger/ tests/test_envelope/:
95 passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant