refactor: append-only transcript persistence + fsync batching #13

Closed
yanmxa wants to merge 1 commit into main from trace/1-persistence

@yanmxa yanmxa commented May 15, 2026

PR 1 of 5 — foundation.

Replaces full-file rewrite (FileStore.Replace) with per-event append.

  • Store.Save now calls Start, then AppendMessage once per node, then a single PatchState. The old rewrite path is deleted.
  • AppendMessage is deduplicated via an in-memory persistedIDs cache, populated lazily by scanning the file once. After that scan, each dedup check is O(1) instead of O(N) per call.
  • appendRecord gains a sync parameter:
    • sync=true: session.started, message.appended, session.compacted, inference.responded (the turn-flush point).
    • sync=false: telemetry (state.patched, inference.requested, system.section.*, tools.*), buffered in the page cache and flushed with the next turn flush.
  • A typical turn drops from ~5 fsync calls to 1.
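
The lazy dedup cache described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `fileStore` and `record` types, the `id` field, and the `demo` helper are all hypothetical, and the real JSONL schema is not shown here.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// record is a hypothetical JSONL line shape (assumption, not the real schema).
type record struct {
	Type string `json:"type"`
	ID   string `json:"id"`
}

// fileStore is a hypothetical stand-in for the PR's FileStore.
type fileStore struct {
	path         string
	persistedIDs map[string]struct{} // nil until the first scan
}

// loadIDs scans the file at most once and caches the message IDs it finds.
// bufio.Scanner's default line limit (64 KiB) is kept for brevity.
func (s *fileStore) loadIDs() error {
	if s.persistedIDs != nil {
		return nil
	}
	ids := make(map[string]struct{})
	f, err := os.Open(s.path)
	if os.IsNotExist(err) {
		s.persistedIDs = ids // no file yet: nothing persisted
		return nil
	}
	if err != nil {
		return err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var r record
		if json.Unmarshal(sc.Bytes(), &r) == nil && r.Type == "message.appended" {
			ids[r.ID] = struct{}{}
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	s.persistedIDs = ids
	return nil
}

// appendMessage appends one record unless the ID is already persisted.
// It reports whether a line was actually written.
func (s *fileStore) appendMessage(id string) (bool, error) {
	if err := s.loadIDs(); err != nil {
		return false, err
	}
	if _, ok := s.persistedIDs[id]; ok {
		return false, nil // O(1) map lookup, no file scan
	}
	line, err := json.Marshal(record{Type: "message.appended", ID: id})
	if err != nil {
		return false, err
	}
	f, err := os.OpenFile(s.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return false, err
	}
	defer f.Close()
	if _, err := f.Write(append(line, '\n')); err != nil {
		return false, err
	}
	s.persistedIDs[id] = struct{}{}
	return true, nil
}

// demo appends the same ID twice, then once more through a fresh store
// that has to rebuild its cache from the file.
func demo() ([3]bool, error) {
	var out [3]bool
	dir, err := os.MkdirTemp("", "transcript")
	if err != nil {
		return out, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "session.jsonl")
	s := &fileStore{path: path}
	if out[0], err = s.appendMessage("m1"); err != nil {
		return out, err
	}
	if out[1], err = s.appendMessage("m1"); err != nil {
		return out, err
	}
	reopened := &fileStore{path: path}
	if out[2], err = reopened.appendMessage("m1"); err != nil {
		return out, err
	}
	return out, nil
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [true false false]
}
```

Only the first call writes a line. The second is rejected by the in-memory map, and the third is rejected after the fresh store's one-time scan.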

Behavior preserved: existing tests including resume/fork/JSONL integrity continue to pass.

Test

  • go test ./internal/session/... ./tests/integration/session/... ./tests/integration/cli/... passes
  • TestFileStoreAppendMessageIsIdempotent replaces the deleted TestFileStoreReplace

🤖 Generated with Claude Code

The session save path previously rewrote the entire JSONL on every Save
via FileStore.Replace, producing O(file_size) writes per turn even though
each turn only adds one or two messages. This commit switches the path
to per-event append:

  Store.Save now calls Start (idempotent) + AppendMessage per node
  (deduped via an in-memory persistedIDs cache, populated lazily by
  scanning the file once) + a single PatchState.

  FileStore.Replace, recordsForTranscript, TranscriptFromSnapshot,
  ReplaceCommand, and messageExistsLocked are removed — the rewrite
  path no longer exists.
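
A rough sketch of the Save sequence just described (Start, then AppendMessage per node, then one PatchState). The `memLog`, `node`, and `patchOp` types are hypothetical and in-memory only; they stand in for the real store to show why calling Save repeatedly appends only the new messages.

```go
package main

import "fmt"

// node and patchOp are hypothetical shapes for illustration only.
type node struct{ ID, Text string }
type patchOp struct{ Key, Value string }

// memLog records the events a Save would append (assumption: in-memory stand-in).
type memLog struct {
	started bool
	seen    map[string]bool
	events  []string
}

// Start is idempotent: it only emits session.started once.
func (l *memLog) Start() {
	if l.started {
		return
	}
	l.started = true
	l.events = append(l.events, "session.started")
}

// AppendMessage skips nodes whose ID was already appended.
func (l *memLog) AppendMessage(n node) {
	if l.seen == nil {
		l.seen = map[string]bool{}
	}
	if l.seen[n.ID] {
		return
	}
	l.seen[n.ID] = true
	l.events = append(l.events, "message.appended:"+n.ID)
}

// PatchState emits one state.patched event carrying the whole op list.
func (l *memLog) PatchState(ops []patchOp) {
	l.events = append(l.events, fmt.Sprintf("state.patched:%d", len(ops)))
}

// save mirrors the described sequence: Start + AppendMessage per node + one PatchState.
func save(l *memLog, nodes []node, ops []patchOp) {
	l.Start()
	for _, n := range nodes {
		l.AppendMessage(n)
	}
	l.PatchState(ops)
}

// demo saves a one-message transcript, then saves again after a second message.
func demo() []string {
	l := &memLog{}
	ops := []patchOp{{Key: "mode", Value: "plan"}}
	nodes := []node{{ID: "m1", Text: "hi"}}
	save(l, nodes, ops)
	nodes = append(nodes, node{ID: "m2", Text: "hello"})
	save(l, nodes, ops)
	return l.events
}

func main() {
	for _, e := range demo() {
		fmt.Println(e)
	}
}
```

The second save adds only `message.appended:m2` and a new `state.patched` entry, so the work per turn tracks the number of new messages rather than the file size.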

Bundled in this commit: fsync is now gated by a `sync` parameter on
appendRecord, batched at turn boundaries:

  - sync=true: session.started, message.appended, session.compacted,
    inference.responded (the turn-flush point).
  - sync=false: state.patched, inference.requested, system.section.*,
    tools.* — buffered in the page cache, flushed when the matching
    inference.responded lands.

A typical turn now does one fsync instead of five. On crash, the
in-flight turn's telemetry may be lost, but messages and state from
prior turns are durable.
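
The fsync gating can be sketched as below. This is a minimal illustration under assumptions: the `appendRecord` signature, the `needsSync` helper, and the `countingFile` wrapper are hypothetical. Only the event-type classification is taken from the list above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// syncWriter is the subset of *os.File the sketch needs.
type syncWriter interface {
	Write(p []byte) (int, error)
	Sync() error
}

// countingFile wraps a file and counts Sync calls (test aid, hypothetical).
type countingFile struct {
	*os.File
	syncs int
}

func (c *countingFile) Sync() error {
	c.syncs++
	return c.File.Sync()
}

// needsSync mirrors the classification in the commit message:
// durable events fsync, telemetry stays in the page cache.
func needsSync(eventType string) bool {
	switch eventType {
	case "session.started", "message.appended",
		"session.compacted", "inference.responded":
		return true
	}
	return false
}

// appendRecord writes one line and fsyncs only when sync is true.
// A later fsync on the same file also covers earlier unsynced writes.
func appendRecord(w syncWriter, line string, sync bool) error {
	if _, err := w.Write([]byte(line + "\n")); err != nil {
		return err
	}
	if !sync {
		return nil
	}
	return w.Sync()
}

// demoTurn appends two telemetry records and one turn-flush record
// and returns how many fsync calls were made.
func demoTurn() (int, error) {
	dir, err := os.MkdirTemp("", "fsync")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	f, err := os.OpenFile(filepath.Join(dir, "session.jsonl"),
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	cf := &countingFile{File: f}
	events := []string{"inference.requested", "state.patched", "inference.responded"}
	for _, ev := range events {
		line := fmt.Sprintf(`{"type":%q}`, ev)
		if err := appendRecord(cf, line, needsSync(ev)); err != nil {
			return cf.syncs, err
		}
	}
	return cf.syncs, nil
}

func main() {
	n, err := demoTurn()
	if err != nil {
		panic(err)
	}
	fmt.Println("fsync calls:", n) // fsync calls: 1
}
```

Three records are written but only `inference.responded` triggers an fsync. If the process crashes before that point, the two telemetry lines may never reach disk.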

Also exports StateOpsFor + PatchTag/Mode/Worktree helpers so the new
Save path can express the projected state as a single patch list.

yanmxa commented May 15, 2026

Closing — the file-level split produced commits that don't compile standalone. Reopening as a proper hunk-aware re-split. See follow-ups.

@yanmxa yanmxa closed this May 15, 2026
@yanmxa yanmxa deleted the trace/1-persistence branch May 15, 2026 09:45