Skip to content

feat(batch): SQLite operation ledger + state/list/rollback (AIP Phase 3)#16

Merged
igor-ctrl merged 1 commit into
mainfrom
feat/batch-ledger
May 17, 2026
Merged

feat(batch): SQLite operation ledger + state/list/rollback (AIP Phase 3)#16
igor-ctrl merged 1 commit into
mainfrom
feat/batch-ledger

Conversation

@igor-ctrl
Copy link
Copy Markdown
Owner

Summary

AIP v0.1 Phase 3 — durable per-run SQLite ledger that gives bcli batch a real answer to "POST committed, then stdout died":

  • Every run writes an intent row before each HTTP call and an outcome row after, surviving SIGKILL in the gap.
  • New commands bcli batch state, bcli batch list, bcli batch rollback read the ledger.
  • Run state derived from step rows so a SIGKILLed run surfaces as partially_committed instead of the stale running stamp.
  • disable_writes is a hard refusal on rollback (no --yes escape hatch).

What's new

Module

  • src/bcli/batch/ledger.pyLedger class, stdlib sqlite3, autocommit + WAL + synchronous=NORMAL so the intent row hits disk before HTTP fires.
  • Schema (version 1):
    • run(run_id PK, manifest_path, manifest_hash, profile, environment, company, state CHECK (...), started_at, finished_at)
    • step(step_id PK AUTOINCREMENT, run_id FK, seq, intent_ts, method, url, body_hash, outcome_ts, status, bc_correlation_id, error_message, rollback_url)
    • schema_version(version)

Commands

bcli batch run     # already existed; now writes a ledger row every step
bcli batch state <run-id> [--format json|table]
bcli batch list   [--state STATE] [--limit N] [--format json|table]
bcli batch rollback <run-id> [--dry-run] [--yes]

SDK

  • AsyncBCClient.delete_url(url, *, etag) — rollback DELETEs an absolute URL captured at POST time, without re-resolving the registry.

Rollback v0.1 limitations

Method Behaviour
POST → DELETE issued automatically against the captured rollback_url (record URL composed from response id/systemId). Step → rolled_back.
PATCH rollback_skipped — no clean inverse without a pre-image snapshot. Help text spells this out.
DELETE rollback_skipped — same reason.
Intent-only steps (no outcome) skipped — we don't know if the HTTP succeeded server-side.

The rollback command help text documents this and operators get a per-step preview before confirming.

Design notes for reviewers

  • Run state stamp vs derivation. finish_run stamps; compute_run_state derives. Once finish_run ran (i.e. finished_at IS NOT NULL), the stamp wins. For unfinished runs (the SIGKILL case) the step rows decide. This is the only way a process that died without finish_run can still tell the truth at read time. See the docstring on compute_run_state for the table.
  • Spec brief mentioned status="unknown" for intent-only steps. I chose to leave the step status NULL and let the run-level derivation interpret "intent without outcome" as partially_committed. Cleaner than synthesising a transient enum value. Behaviour parity with the brief's intent.
  • SIGKILL test. The test_partially_committed_after_simulated_crash test raises a BaseException mid-batch — this exercises the graceful except BaseException: path in run_batch. Real SIGKILL (no finish_run, no exception handler) is exercised by test_list_uses_computed_state_for_unfinished_runs (a ledger seeded by hand with no finished_at). Together the two cover the contract.
  • Behaviour unchanged. The ledger path + run id are emitted to stderr (_stderr console), not stdout — stdout still matches the legacy batch output byte-for-byte. Required by the additive constraint.
  • No new deps. stdlib sqlite3 only.

Test plan

  • uv run pytest tests/ -v — 665 passed, 5 skipped
  • uv run ruff check src/ tests/ — clean
  • 34 new tests under tests/test_batch_ledger/ cover:
    • schema creation + CHECK enum
    • intent written before HTTP (enforced via mock side_effect that inspects the DB at call time)
    • outcome written after with bc_correlation_id + rollback_url capture
    • SIGKILL recovery → partially_committed derivation
    • batch state JSON + table modes
    • batch list with --state filter + empty-state behaviour
    • batch rollback issues DELETE for POST, skips PATCH/DELETE, refuses on disable_writes, leaves intent-only steps alone
    • RunLedgerExistsError defensive guard against run-id collision
    • WAL + synchronous=NORMAL pragmas
  • Manual help-text smoke test (uv run bcli batch --help, plus each subcommand)
  • Manual bcli batch list end-to-end returns [] for empty ledger
  • Existing tests/test_cli/test_batch_safety.py adapted with an autouse HOME-isolation fixture so the ledger lands in tmp_path instead of the developer's ~/.config/bcli/batch/

Out of scope (deferred)

  • Pre-image snapshots for PATCH/DELETE rollback — would need a separate ledger column + body-fetch dance; not in v0.1.
  • Idempotency keys — Phase 4 territory.
  • Ledger garbage collection / retention — operators delete ~/.config/bcli/batch/*.db by hand for now; one file per run, easy to ship/delete.

Adds a durable per-run SQLite ledger to bcli batch. Every run writes an
intent row before each HTTP call and an outcome row after, so a
SIGKILLed batch leaves a recoverable record of what landed server-side.

New module:
- src/bcli/batch/ledger.py — Ledger class with WAL + synchronous=NORMAL,
  schema_version table, derived state via compute_run_state()

New CLI commands:
- bcli batch state <run-id> [--format json|table] — full ledger dump
- bcli batch list [--state STATE] [--limit N] [--format json|table]
- bcli batch rollback <run-id> [--dry-run] [--yes] — DELETEs captured
  POST URLs; PATCH/DELETE marked rollback_skipped (no clean inverse
  without pre-image snapshots in v0.1)

Safety:
- disable_writes is a hard refusal on rollback; --yes does NOT bypass
- run state derivation honours finish_run stamps but falls back to step
  rows for SIGKILLed runs (operator sees partially_committed instead of
  the stale "running")

SDK addition:
- AsyncBCClient.delete_url() so rollback can DELETE the captured
  absolute URL without re-resolving the registry

Behavior unchanged on stdout: ledger path + run id go to stderr.

Tests:
- 34 new tests in tests/test_batch_ledger/ (schema, intent-before-HTTP,
  partially_committed derivation, list filter, state/list/rollback CLI
  command surfaces, disable_writes refusal, PATCH/DELETE skipping)
- 5 existing test_batch_safety tests get an autouse HOME-isolation
  fixture so the ledger lands in tmp_path, not the developer's real
  ~/.config/bcli/batch
- Full suite: 665 passed, 5 skipped; ruff clean
Copy link
Copy Markdown
Owner Author

@igor-ctrl igor-ctrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lead review — APPROVE (pending Igor's merge)

Reviewed against agent-cli-contract-v0.1.md §Phase 3 and tasks/todo.md Phase 3 checklist. Worker self-report verified: 665 passed / 5 skipped, ruff clean, ledger.py at 94% coverage, stdlib sqlite3 only. Strongest PR of the trio — thoughtful design notes in the PR body, defensive guards in the right places, and the read-side derivation makes "POST committed, then we died" actually visible to an operator.

Six review foci → findings

1. Schema matches contract §Phase 3. All run columns present; CHECK enum has all seven states including partially_committed; step columns match. test_state_enum_allowed_values round-trips every enum value through finish_run. Bad values rejected by sqlite3.IntegrityError as expected. ✓

2. compute_run_state derives partially_committed correctly.
Logic: committed + (unknown OR failed) → partially_committed. Stamped state wins for finished runs; un-stamped runs derive from step rows. Pinned by test_intent_without_outcome_is_partially_committed and test_failed_after_committed_is_partially_committed. ✓

3. SIGKILL test realism.
Two-part coverage that I think correctly maps the contract:

  • test_partially_committed_after_simulated_crash exercises the except BaseException: path in run_batch — handler runs, compute_run_state + finish_run stamp it.
  • test_list_uses_computed_state_for_unfinished_runs seeds a ledger by hand with no finish_run and an intent row missing its outcome — proves the derivation handles a truly un-stamped run.

Real SIGKILL (no handler runs at all) falls between those two, but the read-side derivation is what matters at recovery time — and that's what test #2 directly validates. WAL + synchronous=NORMAL makes the durability claim (intent row hits disk before HTTP) credible; pinned by test_wal_mode_enabled. ✓

4. rollback_url is sufficient for a clean DELETE.
Pattern: _compose_rollback_url(client, entity, post_result){client._resolve_url(entity)}({id}). The new AsyncBCClient.delete_url(url, etag="*") takes that captured absolute URL directly — no registry re-resolution at rollback time (registry could have shifted between runs). test_post_step_stores_rollback_url validates capture; test_committed_post_issues_delete validates dispatch via delete_url.await_count. ✓

One minor concern: etag="*" is hardcoded. Means a rolled-back record modified between POST and rollback gets clobbered. Defensible for v0.1 since the rollback context is "undo what I just created," but tracking etag in the ledger for v0.2 would let rollback honor concurrency. Not blocking.

5. Stderr routing.
Dedicated _stderr = Console(stderr=True). Ledger path + run id emitted via _stderr.print(...) — never touches stdout. The state/list JSON outputs use typer.echo() direct to stdout. test_state_json_returns_full_ledger does json.loads(result.stdout) which proves nothing non-JSON leaks. Step-level progress messages still use the regular console (stdout) which matches legacy behavior — exactly the additive constraint required. ✓

6. 94% coverage gap.
Inspected the 8 uncovered lines:

  • __enter__/__exit__ (191-192, 195) — unused; tests use explicit lifecycle. Harmless.
  • LookupError raise in get_run (378) — defensive raise; the matching path for get_steps is tested implicitly.
  • rolled_back derivation path (425) — only reachable if all steps are rolled_back/rollback_skipped and the run is un-stamped. Rollback's own finish_run("rolled_back") stamps the run so the read goes through line 411-412 first. The derivation is theoretical and correct.
  • Empty-dir early return in list_runs (459) — trivially correct.
  • Malformed-DB skip (482-484) — defensive catch for unrelated .db files in ~/.config/bcli/batch/.

None are crash-safety-critical. 94% is the right place to land.

Additional observations

  • _close_if_coroutine helper for AsyncMock leak suppression — well-documented, contained, pragmatic. ResourceWarnings in pytest output are test-only.
  • Step status deliberately not CHECK-constrained — worker documents the vocabulary in the module docstring (committed/failed/rollback_skipped/rolled_back/unknown/skipped/rollback_failed). Reasonable since step statuses evolve faster than run states; module docstring is the source of truth.
  • _isolate_ledger_home autouse fixture added to test_batch_safety.py — exactly the right defensive isolation; prevents the 12 pre-existing batch safety tests from polluting ~/.config/bcli/batch/ on the developer's box.
  • disable_writes is a hard refusal on rollback (no --yes bypass) — correctly tested in test_readonly_profile_aborts_before_any_http. This is stricter than batch run (which allows --yes to bypass) and the right call: rollback issues unprompted inverse mutations on profiles the operator marked read-only.

New SDK surface

AsyncBCClient.delete_url(url, *, etag="*") is a new public method. The docstring explains why (rollback can't re-resolve the registry). Worth a one-line mention in AGENTS.md so agents know it exists for absolute-URL DELETE; not blocking.

Spec deferrals (worker called out; agree)

  • PATCH/DELETE rollback requires pre-image snapshots — out of scope for v0.1, documented in the rollback help text and in tests test_patch_step_is_rollback_skipped / test_delete_step_is_rollback_skipped.
  • Ledger garbage collection — one file per run keeps it trivially deletable; no retention policy in v0.1.
  • Idempotency keys — Phase 4 territory.

Verification summary

  • uv run pytest tests/665 passed, 5 skipped (matches self-report)
  • uv run ruff check src/ tests/clean
  • uv run pytest tests/test_batch_ledger/ --cov=bcli.batch.ledger94% (matches self-report)
  • Smoke: bcli batch list --format json → pure JSON [] on stdout, nothing on stderr (verified via json.load)
  • Live ledger commands present: bcli batch state <id>, bcli batch list, bcli batch rollback <id>

After this merges

Stand by — no immediate next phase for Worker C (Phase 6/7 come after 0.4.0 ships). Worker A picks up Phase 4 after Phase 1 merges; Worker B picks up Phase 5 after Phases 1 + 2 merge.

Verdict: APPROVE. Igor merges; I do not.

@igor-ctrl igor-ctrl merged commit c6b74c1 into main May 17, 2026
3 checks passed
igor-ctrl added a commit that referenced this pull request May 17, 2026
PRs #15 (result envelope) and #16 (batch ledger) both modified
``batch_cmd.run_batch``'s control flow. This merge threads the envelope
``capture(...)`` wrap around the SQLite ledger writes so both
attestations describe the same outcome.

Integration order:
- ``capture(...)`` wraps the entire ``run_batch`` body.
- Inside it, ``ledger.start_run(...)`` writes the run row BEFORE any
  step fires (preserved from #16 — needed for SIGKILL realism).
- ``cap.set_record_id(run_id)`` stashes the ledger run id onto the
  envelope. Agent runtimes now have one cross-reference: the
  envelope's ``record_id`` is the value to pass to
  ``bcli batch state <run-id>``.
- The ``disable_writes`` gate (from #15) runs *inside* the capture
  block, AFTER ``start_run``. A refused write finalizes the ledger
  as ``failed`` and emits a failed envelope before re-raising — both
  attestations stay consistent.
- ``cap.emit_success`` / ``emit_failure`` are paired with
  ``ledger.finish_run(...)`` on every exit path. Mixed-result batches
  (some ok, some error) get ``status="failed"`` on the envelope and
  ``state="partially_committed"`` on the ledger; the same RuntimeError
  feeds both.
- The ``BaseException`` handler from #16 is preserved so a
  ``KeyboardInterrupt`` still re-raises after the ledger derives a
  truthful state via ``compute_run_state``.

Dry-run decision: ``ledger.finish_run(run_id, "completed")`` IS called
on dry-run, so the ledger row never lingers in ``planned`` state. The
envelope's ``dry_run=true`` is the disambiguator for "no HTTP fired."
Comment in the source explains the call.

New integration test:
- ``tests/test_envelope/test_batch_envelope_with_ledger.py`` — three
  scenarios (happy path, partial commit, policy refusal) pin that
  envelope and ledger stay in agreement, including the
  ``record_id == run_id`` cross-reference.

Test isolation: added an autouse ``isolated_home`` fixture in
``tests/test_envelope/conftest.py`` so envelope tests that touch
``run_batch`` don't pollute the user's real
``~/.config/bcli/batch/`` dir.

Test update: ``TestBatchRunEnvelope.test_batch_run_writes_envelope_all_ok``
now asserts ``record_id`` is the 32-char hex run id (was: ``None``) —
matches the new ledger cross-reference behavior.

Totals: 727 passed, 5 skipped. Ruff clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant