feat(batch): SQLite operation ledger + state/list/rollback (AIP Phase 3)#16
Conversation
Adds a durable per-run SQLite ledger to bcli batch. Every run writes an intent row before each HTTP call and an outcome row after, so a SIGKILLed batch leaves a recoverable record of what landed server-side. New module: - src/bcli/batch/ledger.py — Ledger class with WAL + synchronous=NORMAL, schema_version table, derived state via compute_run_state() New CLI commands: - bcli batch state <run-id> [--format json|table] — full ledger dump - bcli batch list [--state STATE] [--limit N] [--format json|table] - bcli batch rollback <run-id> [--dry-run] [--yes] — DELETEs captured POST URLs; PATCH/DELETE marked rollback_skipped (no clean inverse without pre-image snapshots in v0.1) Safety: - disable_writes is a hard refusal on rollback; --yes does NOT bypass - run state derivation honours finish_run stamps but falls back to step rows for SIGKILLed runs (operator sees partially_committed instead of the stale "running") SDK addition: - AsyncBCClient.delete_url() so rollback can DELETE the captured absolute URL without re-resolving the registry Behavior unchanged on stdout: ledger path + run id go to stderr. Tests: - 34 new tests in tests/test_batch_ledger/ (schema, intent-before-HTTP, partially_committed derivation, list filter, state/list/rollback CLI command surfaces, disable_writes refusal, PATCH/DELETE skipping) - 5 existing test_batch_safety tests get an autouse HOME-isolation fixture so the ledger lands in tmp_path, not the developer's real ~/.config/bcli/batch - Full suite: 665 passed, 5 skipped; ruff clean
igor-ctrl
left a comment
There was a problem hiding this comment.
Lead review — APPROVE (pending Igor's merge)
Reviewed against agent-cli-contract-v0.1.md §Phase 3 and tasks/todo.md Phase 3 checklist. Worker self-report verified: 665 passed / 5 skipped, ruff clean, ledger.py at 94% coverage, stdlib sqlite3 only. Strongest PR of the trio — thoughtful design notes in the PR body, defensive guards in the right places, and the read-side derivation makes "POST committed, then we died" actually visible to an operator.
Six review foci → findings
1. Schema matches contract §Phase 3. All run columns present; CHECK enum has all seven states including partially_committed; step columns match. test_state_enum_allowed_values round-trips every enum value through finish_run. Bad values rejected by sqlite3.IntegrityError as expected. ✓
2. compute_run_state derives partially_committed correctly.
Logic: committed + (unknown OR failed) → partially_committed. Stamped state wins for finished runs; un-stamped runs derive from step rows. Pinned by test_intent_without_outcome_is_partially_committed and test_failed_after_committed_is_partially_committed. ✓
3. SIGKILL test realism.
Two-part coverage that I think correctly maps the contract:
test_partially_committed_after_simulated_crashexercises theexcept BaseException:path inrun_batch— handler runs,compute_run_state+finish_runstamp it.test_list_uses_computed_state_for_unfinished_runsseeds a ledger by hand with nofinish_runand an intent row missing its outcome — proves the derivation handles a truly un-stamped run.
Real SIGKILL (no handler runs at all) falls between those two, but the read-side derivation is what matters at recovery time — and that's what test #2 directly validates. WAL + synchronous=NORMAL makes the durability claim (intent row hits disk before HTTP) credible; pinned by test_wal_mode_enabled. ✓
4. rollback_url is sufficient for a clean DELETE.
Pattern: _compose_rollback_url(client, entity, post_result) → {client._resolve_url(entity)}({id}). The new AsyncBCClient.delete_url(url, etag="*") takes that captured absolute URL directly — no registry re-resolution at rollback time (registry could have shifted between runs). test_post_step_stores_rollback_url validates capture; test_committed_post_issues_delete validates dispatch via delete_url.await_count. ✓
One minor concern: etag="*" is hardcoded. Means a rolled-back record modified between POST and rollback gets clobbered. Defensible for v0.1 since the rollback context is "undo what I just created," but tracking etag in the ledger for v0.2 would let rollback honor concurrency. Not blocking.
5. Stderr routing.
Dedicated _stderr = Console(stderr=True). Ledger path + run id emitted via _stderr.print(...) — never touches stdout. The state/list JSON outputs use typer.echo() direct to stdout. test_state_json_returns_full_ledger does json.loads(result.stdout) which proves nothing non-JSON leaks. Step-level progress messages still use the regular console (stdout) which matches legacy behavior — exactly the additive constraint required. ✓
6. 94% coverage gap.
Inspected the 8 uncovered lines:
__enter__/__exit__(191-192, 195) — unused; tests use explicit lifecycle. Harmless.LookupErrorraise inget_run(378) — defensive raise; the matching path forget_stepsis tested implicitly.rolled_backderivation path (425) — only reachable if all steps arerolled_back/rollback_skippedand the run is un-stamped. Rollback's ownfinish_run("rolled_back")stamps the run so the read goes through line 411-412 first. The derivation is theoretical and correct.- Empty-dir early return in
list_runs(459) — trivially correct. - Malformed-DB skip (482-484) — defensive catch for unrelated
.dbfiles in~/.config/bcli/batch/.
None are crash-safety-critical. 94% is the right place to land. ✓
Additional observations
_close_if_coroutinehelper for AsyncMock leak suppression — well-documented, contained, pragmatic. ResourceWarnings in pytest output are test-only.- Step
statusdeliberately not CHECK-constrained — worker documents the vocabulary in the module docstring (committed/failed/rollback_skipped/rolled_back/unknown/skipped/rollback_failed). Reasonable since step statuses evolve faster than run states; module docstring is the source of truth. _isolate_ledger_homeautouse fixture added totest_batch_safety.py— exactly the right defensive isolation; prevents the 12 pre-existing batch safety tests from polluting~/.config/bcli/batch/on the developer's box.disable_writesis a hard refusal on rollback (no--yesbypass) — correctly tested intest_readonly_profile_aborts_before_any_http. This is stricter thanbatch run(which allows--yesto bypass) and the right call: rollback issues unprompted inverse mutations on profiles the operator marked read-only.
New SDK surface
AsyncBCClient.delete_url(url, *, etag="*") is a new public method. The docstring explains why (rollback can't re-resolve the registry). Worth a one-line mention in AGENTS.md so agents know it exists for absolute-URL DELETE; not blocking.
Spec deferrals (worker called out; agree)
- PATCH/DELETE rollback requires pre-image snapshots — out of scope for v0.1, documented in the rollback help text and in tests
test_patch_step_is_rollback_skipped/test_delete_step_is_rollback_skipped. - Ledger garbage collection — one file per run keeps it trivially deletable; no retention policy in v0.1.
- Idempotency keys — Phase 4 territory.
Verification summary
uv run pytest tests/→ 665 passed, 5 skipped (matches self-report)uv run ruff check src/ tests/→ cleanuv run pytest tests/test_batch_ledger/ --cov=bcli.batch.ledger→ 94% (matches self-report)- Smoke:
bcli batch list --format json→ pure JSON[]on stdout, nothing on stderr (verified viajson.load) - Live ledger commands present:
bcli batch state <id>,bcli batch list,bcli batch rollback <id>
After this merges
Stand by — no immediate next phase for Worker C (Phase 6/7 come after 0.4.0 ships). Worker A picks up Phase 4 after Phase 1 merges; Worker B picks up Phase 5 after Phases 1 + 2 merge.
Verdict: APPROVE. Igor merges; I do not.
PRs #15 (result envelope) and #16 (batch ledger) both modified ``batch_cmd.run_batch``'s control flow. This merge threads the envelope ``capture(...)`` wrap around the SQLite ledger writes so both attestations describe the same outcome. Integration order: - ``capture(...)`` wraps the entire ``run_batch`` body. - Inside it, ``ledger.start_run(...)`` writes the run row BEFORE any step fires (preserved from #16 — needed for SIGKILL realism). - ``cap.set_record_id(run_id)`` stashes the ledger run id onto the envelope. Agent runtimes now have one cross-reference: the envelope's ``record_id`` is the value to pass to ``bcli batch state <run-id>``. - The ``disable_writes`` gate (from #15) runs *inside* the capture block, AFTER ``start_run``. A refused write finalizes the ledger as ``failed`` and emits a failed envelope before re-raising — both attestations stay consistent. - ``cap.emit_success`` / ``emit_failure`` are paired with ``ledger.finish_run(...)`` on every exit path. Mixed-result batches (some ok, some error) get ``status="failed"`` on the envelope and ``state="partially_committed"`` on the ledger; the same RuntimeError feeds both. - The ``BaseException`` handler from #16 is preserved so a ``KeyboardInterrupt`` still re-raises after the ledger derives a truthful state via ``compute_run_state``. Dry-run decision: ``ledger.finish_run(run_id, "completed")`` IS called on dry-run, so the ledger row never lingers in ``planned`` state. The envelope's ``dry_run=true`` is the disambiguator for "no HTTP fired." Comment in the source explains the call. New integration test: - ``tests/test_envelope/test_batch_envelope_with_ledger.py`` — three scenarios (happy path, partial commit, policy refusal) pin that envelope and ledger stay in agreement, including the ``record_id == run_id`` cross-reference. Test isolation: added an autouse ``isolated_home`` fixture in ``tests/test_envelope/conftest.py`` so envelope tests that touch ``run_batch`` don't pollute the user's real ``~/.config/bcli/batch/`` dir. Test update: ``TestBatchRunEnvelope.test_batch_run_writes_envelope_all_ok`` now asserts ``record_id`` is the 32-char hex run id (was: ``None``) — matches the new ledger cross-reference behavior. Totals: 727 passed, 5 skipped. Ruff clean.
Summary
AIP v0.1 Phase 3 — durable per-run SQLite ledger that gives
bcli batcha real answer to "POST committed, then stdout died":bcli batch state,bcli batch list,bcli batch rollbackread the ledger.partially_committedinstead of the stalerunningstamp.disable_writesis a hard refusal on rollback (no--yesescape hatch).What's new
Module
src/bcli/batch/ledger.py—Ledgerclass, stdlibsqlite3, autocommit + WAL +synchronous=NORMALso the intent row hits disk before HTTP fires.run(run_id PK, manifest_path, manifest_hash, profile, environment, company, state CHECK (...), started_at, finished_at)step(step_id PK AUTOINCREMENT, run_id FK, seq, intent_ts, method, url, body_hash, outcome_ts, status, bc_correlation_id, error_message, rollback_url)schema_version(version)Commands
SDK
AsyncBCClient.delete_url(url, *, etag)— rollback DELETEs an absolute URL captured at POST time, without re-resolving the registry.Rollback v0.1 limitations
rollback_url(record URL composed from responseid/systemId). Step →rolled_back.The rollback command help text documents this and operators get a per-step preview before confirming.
Design notes for reviewers
finish_runstamps;compute_run_statederives. Oncefinish_runran (i.e.finished_at IS NOT NULL), the stamp wins. For unfinished runs (the SIGKILL case) the step rows decide. This is the only way a process that died withoutfinish_runcan still tell the truth at read time. See the docstring oncompute_run_statefor the table.status="unknown"for intent-only steps. I chose to leave the stepstatusNULL and let the run-level derivation interpret "intent without outcome" aspartially_committed. Cleaner than synthesising a transient enum value. Behaviour parity with the brief's intent.test_partially_committed_after_simulated_crashtest raises aBaseExceptionmid-batch — this exercises the gracefulexcept BaseException:path inrun_batch. Real SIGKILL (no finish_run, no exception handler) is exercised bytest_list_uses_computed_state_for_unfinished_runs(a ledger seeded by hand with nofinished_at). Together the two cover the contract._stderrconsole), not stdout — stdout still matches the legacy batch output byte-for-byte. Required by the additive constraint.sqlite3only.Test plan
uv run pytest tests/ -v— 665 passed, 5 skippeduv run ruff check src/ tests/— cleantests/test_batch_ledger/cover:bc_correlation_id+rollback_urlcapturepartially_committedderivationbatch stateJSON + table modesbatch listwith--statefilter + empty-state behaviourbatch rollbackissues DELETE for POST, skips PATCH/DELETE, refuses ondisable_writes, leaves intent-only steps aloneRunLedgerExistsErrordefensive guard against run-id collisionuv run bcli batch --help, plus each subcommand)bcli batch listend-to-end returns[]for empty ledgertests/test_cli/test_batch_safety.pyadapted with an autouse HOME-isolation fixture so the ledger lands intmp_pathinstead of the developer's~/.config/bcli/batch/Out of scope (deferred)
~/.config/bcli/batch/*.dbby hand for now; one file per run, easy to ship/delete.