feat(batch): SQLite operation ledger + state/list/rollback (AIP Phase 3) by igor-ctrl · Pull Request #16 · igor-ctrl/bcli

igor-ctrl · 2026-05-17T01:11:17Z

Summary

AIP v0.1 Phase 3 — durable per-run SQLite ledger that gives bcli batch a real answer to "POST committed, then stdout died":

Every run writes an intent row before each HTTP call and an outcome row after, surviving SIGKILL in the gap.
New commands bcli batch state, bcli batch list, bcli batch rollback read the ledger.
Run state derived from step rows so a SIGKILLed run surfaces as partially_committed instead of the stale running stamp.
disable_writes is a hard refusal on rollback (no --yes escape hatch).

What's new

Module

src/bcli/batch/ledger.py — Ledger class, stdlib sqlite3, autocommit + WAL + synchronous=NORMAL so the intent row hits disk before HTTP fires.
Schema (version 1):
- run(run_id PK, manifest_path, manifest_hash, profile, environment, company, state CHECK (...), started_at, finished_at)
- step(step_id PK AUTOINCREMENT, run_id FK, seq, intent_ts, method, url, body_hash, outcome_ts, status, bc_correlation_id, error_message, rollback_url)
- schema_version(version)

Commands

bcli batch run     # already existed; now writes a ledger row every step
bcli batch state <run-id> [--format json|table]
bcli batch list   [--state STATE] [--limit N] [--format json|table]
bcli batch rollback <run-id> [--dry-run] [--yes]

SDK

AsyncBCClient.delete_url(url, *, etag) — rollback DELETEs an absolute URL captured at POST time, without re-resolving the registry.

Rollback v0.1 limitations

Method	Behaviour
POST → DELETE	issued automatically against the captured `rollback_url` (record URL composed from response `id`/`systemId`). Step → `rolled_back`.
PATCH	rollback_skipped — no clean inverse without a pre-image snapshot. Help text spells this out.
DELETE	rollback_skipped — same reason.
Intent-only steps (no outcome)	skipped — we don't know if the HTTP succeeded server-side.

The rollback command help text documents this and operators get a per-step preview before confirming.

Design notes for reviewers

Run state stamp vs derivation. finish_run stamps; compute_run_state derives. Once finish_run ran (i.e. finished_at IS NOT NULL), the stamp wins. For unfinished runs (the SIGKILL case) the step rows decide. This is the only way a process that died without finish_run can still tell the truth at read time. See the docstring on compute_run_state for the table.
Spec brief mentioned status="unknown" for intent-only steps. I chose to leave the step status NULL and let the run-level derivation interpret "intent without outcome" as partially_committed. Cleaner than synthesising a transient enum value. Behaviour parity with the brief's intent.
SIGKILL test. The test_partially_committed_after_simulated_crash test raises a BaseException mid-batch — this exercises the graceful except BaseException: path in run_batch. Real SIGKILL (no finish_run, no exception handler) is exercised by test_list_uses_computed_state_for_unfinished_runs (a ledger seeded by hand with no finished_at). Together the two cover the contract.
Behaviour unchanged. The ledger path + run id are emitted to stderr (_stderr console), not stdout — stdout still matches the legacy batch output byte-for-byte. Required by the additive constraint.
No new deps. stdlib sqlite3 only.

Test plan

uv run pytest tests/ -v — 665 passed, 5 skipped
uv run ruff check src/ tests/ — clean
34 new tests under tests/test_batch_ledger/ cover:
- schema creation + CHECK enum
- intent written before HTTP (enforced via mock side_effect that inspects the DB at call time)
- outcome written after with bc_correlation_id + rollback_url capture
- SIGKILL recovery → partially_committed derivation
- batch state JSON + table modes
- batch list with --state filter + empty-state behaviour
- batch rollback issues DELETE for POST, skips PATCH/DELETE, refuses on disable_writes, leaves intent-only steps alone
- RunLedgerExistsError defensive guard against run-id collision
- WAL + synchronous=NORMAL pragmas
Manual help-text smoke test (uv run bcli batch --help, plus each subcommand)
Manual bcli batch list end-to-end returns [] for empty ledger
Existing tests/test_cli/test_batch_safety.py adapted with an autouse HOME-isolation fixture so the ledger lands in tmp_path instead of the developer's ~/.config/bcli/batch/

Out of scope (deferred)

Pre-image snapshots for PATCH/DELETE rollback — would need a separate ledger column + body-fetch dance; not in v0.1.
Idempotency keys — Phase 4 territory.
Ledger garbage collection / retention — operators delete ~/.config/bcli/batch/*.db by hand for now; one file per run, easy to ship/delete.

Adds a durable per-run SQLite ledger to bcli batch. Every run writes an intent row before each HTTP call and an outcome row after, so a SIGKILLed batch leaves a recoverable record of what landed server-side. New module: - src/bcli/batch/ledger.py — Ledger class with WAL + synchronous=NORMAL, schema_version table, derived state via compute_run_state() New CLI commands: - bcli batch state <run-id> [--format json|table] — full ledger dump - bcli batch list [--state STATE] [--limit N] [--format json|table] - bcli batch rollback <run-id> [--dry-run] [--yes] — DELETEs captured POST URLs; PATCH/DELETE marked rollback_skipped (no clean inverse without pre-image snapshots in v0.1) Safety: - disable_writes is a hard refusal on rollback; --yes does NOT bypass - run state derivation honours finish_run stamps but falls back to step rows for SIGKILLed runs (operator sees partially_committed instead of the stale "running") SDK addition: - AsyncBCClient.delete_url() so rollback can DELETE the captured absolute URL without re-resolving the registry Behavior unchanged on stdout: ledger path + run id go to stderr. Tests: - 34 new tests in tests/test_batch_ledger/ (schema, intent-before-HTTP, partially_committed derivation, list filter, state/list/rollback CLI command surfaces, disable_writes refusal, PATCH/DELETE skipping) - 5 existing test_batch_safety tests get an autouse HOME-isolation fixture so the ledger lands in tmp_path, not the developer's real ~/.config/bcli/batch - Full suite: 665 passed, 5 skipped; ruff clean

igor-ctrl

Lead review — APPROVE (pending Igor's merge)

Reviewed against agent-cli-contract-v0.1.md §Phase 3 and tasks/todo.md Phase 3 checklist. Worker self-report verified: 665 passed / 5 skipped, ruff clean, ledger.py at 94% coverage, stdlib sqlite3 only. Strongest PR of the trio — thoughtful design notes in the PR body, defensive guards in the right places, and the read-side derivation makes "POST committed, then we died" actually visible to an operator.

Six review foci → findings

1. Schema matches contract §Phase 3. All run columns present; CHECK enum has all seven states including partially_committed; step columns match. test_state_enum_allowed_values round-trips every enum value through finish_run. Bad values rejected by sqlite3.IntegrityError as expected. ✓

2. compute_run_state derives partially_committed correctly.
Logic: committed + (unknown OR failed) → partially_committed. Stamped state wins for finished runs; un-stamped runs derive from step rows. Pinned by test_intent_without_outcome_is_partially_committed and test_failed_after_committed_is_partially_committed. ✓

3. SIGKILL test realism.
Two-part coverage that I think correctly maps the contract:

test_partially_committed_after_simulated_crash exercises the except BaseException: path in run_batch — handler runs, compute_run_state + finish_run stamp it.
test_list_uses_computed_state_for_unfinished_runs seeds a ledger by hand with no finish_run and an intent row missing its outcome — proves the derivation handles a truly un-stamped run.

Real SIGKILL (no handler runs at all) falls between those two, but the read-side derivation is what matters at recovery time — and that's what test #2 directly validates. WAL + synchronous=NORMAL makes the durability claim (intent row hits disk before HTTP) credible; pinned by test_wal_mode_enabled. ✓

4. rollback_url is sufficient for a clean DELETE.
Pattern: _compose_rollback_url(client, entity, post_result) → {client._resolve_url(entity)}({id}). The new AsyncBCClient.delete_url(url, etag="*") takes that captured absolute URL directly — no registry re-resolution at rollback time (registry could have shifted between runs). test_post_step_stores_rollback_url validates capture; test_committed_post_issues_delete validates dispatch via delete_url.await_count. ✓

One minor concern: etag="*" is hardcoded. Means a rolled-back record modified between POST and rollback gets clobbered. Defensible for v0.1 since the rollback context is "undo what I just created," but tracking etag in the ledger for v0.2 would let rollback honor concurrency. Not blocking.

5. Stderr routing.
Dedicated _stderr = Console(stderr=True). Ledger path + run id emitted via _stderr.print(...) — never touches stdout. The state/list JSON outputs use typer.echo() direct to stdout. test_state_json_returns_full_ledger does json.loads(result.stdout) which proves nothing non-JSON leaks. Step-level progress messages still use the regular console (stdout) which matches legacy behavior — exactly the additive constraint required. ✓

6. 94% coverage gap.
Inspected the 8 uncovered lines:

__enter__/__exit__ (191-192, 195) — unused; tests use explicit lifecycle. Harmless.
LookupError raise in get_run (378) — defensive raise; the matching path for get_steps is tested implicitly.
rolled_back derivation path (425) — only reachable if all steps are rolled_back/rollback_skipped and the run is un-stamped. Rollback's own finish_run("rolled_back") stamps the run so the read goes through line 411-412 first. The derivation is theoretical and correct.
Empty-dir early return in list_runs (459) — trivially correct.
Malformed-DB skip (482-484) — defensive catch for unrelated .db files in ~/.config/bcli/batch/.

None are crash-safety-critical. 94% is the right place to land. ✓

Additional observations

_close_if_coroutine helper for AsyncMock leak suppression — well-documented, contained, pragmatic. ResourceWarnings in pytest output are test-only.
Step status deliberately not CHECK-constrained — worker documents the vocabulary in the module docstring (committed/failed/rollback_skipped/rolled_back/unknown/skipped/rollback_failed). Reasonable since step statuses evolve faster than run states; module docstring is the source of truth.
_isolate_ledger_home autouse fixture added to test_batch_safety.py — exactly the right defensive isolation; prevents the 12 pre-existing batch safety tests from polluting ~/.config/bcli/batch/ on the developer's box.
disable_writes is a hard refusal on rollback (no --yes bypass) — correctly tested in test_readonly_profile_aborts_before_any_http. This is stricter than batch run (which allows --yes to bypass) and the right call: rollback issues unprompted inverse mutations on profiles the operator marked read-only.

New SDK surface

AsyncBCClient.delete_url(url, *, etag="*") is a new public method. The docstring explains why (rollback can't re-resolve the registry). Worth a one-line mention in AGENTS.md so agents know it exists for absolute-URL DELETE; not blocking.

Spec deferrals (worker called out; agree)

PATCH/DELETE rollback requires pre-image snapshots — out of scope for v0.1, documented in the rollback help text and in tests test_patch_step_is_rollback_skipped / test_delete_step_is_rollback_skipped.
Ledger garbage collection — one file per run keeps it trivially deletable; no retention policy in v0.1.
Idempotency keys — Phase 4 territory.

Verification summary

uv run pytest tests/ → 665 passed, 5 skipped (matches self-report)
uv run ruff check src/ tests/ → clean
uv run pytest tests/test_batch_ledger/ --cov=bcli.batch.ledger → 94% (matches self-report)
Smoke: bcli batch list --format json → pure JSON [] on stdout, nothing on stderr (verified via json.load)
Live ledger commands present: bcli batch state <id>, bcli batch list, bcli batch rollback <id>

After this merges

Stand by — no immediate next phase for Worker C (Phase 6/7 come after 0.4.0 ships). Worker A picks up Phase 4 after Phase 1 merges; Worker B picks up Phase 5 after Phases 1 + 2 merge.

Verdict: APPROVE. Igor merges; I do not.

PRs #15 (result envelope) and #16 (batch ledger) both modified ``batch_cmd.run_batch``'s control flow. This merge threads the envelope ``capture(...)`` wrap around the SQLite ledger writes so both attestations describe the same outcome. Integration order: - ``capture(...)`` wraps the entire ``run_batch`` body. - Inside it, ``ledger.start_run(...)`` writes the run row BEFORE any step fires (preserved from #16 — needed for SIGKILL realism). - ``cap.set_record_id(run_id)`` stashes the ledger run id onto the envelope. Agent runtimes now have one cross-reference: the envelope's ``record_id`` is the value to pass to ``bcli batch state <run-id>``. - The ``disable_writes`` gate (from #15) runs *inside* the capture block, AFTER ``start_run``. A refused write finalizes the ledger as ``failed`` and emits a failed envelope before re-raising — both attestations stay consistent. - ``cap.emit_success`` / ``emit_failure`` are paired with ``ledger.finish_run(...)`` on every exit path. Mixed-result batches (some ok, some error) get ``status="failed"`` on the envelope and ``state="partially_committed"`` on the ledger; the same RuntimeError feeds both. - The ``BaseException`` handler from #16 is preserved so a ``KeyboardInterrupt`` still re-raises after the ledger derives a truthful state via ``compute_run_state``. Dry-run decision: ``ledger.finish_run(run_id, "completed")`` IS called on dry-run, so the ledger row never lingers in ``planned`` state. The envelope's ``dry_run=true`` is the disambiguator for "no HTTP fired." Comment in the source explains the call. New integration test: - ``tests/test_envelope/test_batch_envelope_with_ledger.py`` — three scenarios (happy path, partial commit, policy refusal) pin that envelope and ledger stay in agreement, including the ``record_id == run_id`` cross-reference. Test isolation: added an autouse ``isolated_home`` fixture in ``tests/test_envelope/conftest.py`` so envelope tests that touch ``run_batch`` don't pollute the user's real ``~/.config/bcli/batch/`` dir. Test update: ``TestBatchRunEnvelope.test_batch_run_writes_envelope_all_ok`` now asserts ``record_id`` is the 32-char hex run id (was: ``None``) — matches the new ledger cross-reference behavior. Totals: 727 passed, 5 skipped. Ruff clean.

igor-ctrl commented May 17, 2026

View reviewed changes

igor-ctrl merged commit c6b74c1 into main May 17, 2026
3 checks passed

igor-ctrl mentioned this pull request May 17, 2026

feat(envelope): mutation result envelope via --result-out/--result-fd (AIP Phase 2) #15

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(batch): SQLite operation ledger + state/list/rollback (AIP Phase 3)#16

feat(batch): SQLite operation ledger + state/list/rollback (AIP Phase 3)#16
igor-ctrl merged 1 commit into
mainfrom
feat/batch-ledger

igor-ctrl commented May 17, 2026

Uh oh!

igor-ctrl left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igor-ctrl commented May 17, 2026

Summary

What's new

Module

Commands

SDK

Rollback v0.1 limitations

Design notes for reviewers

Test plan

Out of scope (deferred)

Uh oh!

igor-ctrl left a comment

Choose a reason for hiding this comment

Lead review — APPROVE (pending Igor's merge)

Six review foci → findings

Additional observations

New SDK surface

Spec deferrals (worker called out; agree)

Verification summary

After this merges

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant