feat: transcript tracing pipeline + gen trace viewer #12

Closed
yanmxa wants to merge 5 commits into main from feature/transcript-tracing

yanmxa commented May 15, 2026

Summary

Adds end-to-end session tracing: every byte that reaches the model is captured as a structured event in the session's JSONL, and a local gen trace viewer renders the timeline. See docs/tracing.md for the first-principles design.

What's in this PR

5 commits, layered:

  1. refactor: append-only transcript persistence + fsync batching — replaces FileStore.Replace (full-file rewrite per turn) with Start + per-event AppendMessage + single PatchState. Adds an in-memory persistedIDs cache so dedup is O(1) after the first scan. fsync is now turn-boundary batched: critical writes (message.appended, inference.responded) sync; telemetry writes (system.section.*, tools.*, inference.requested, state.patched) buffer in the page cache and ride along on the next turn-boundary flush. Typical turn drops from ~5 fsync calls to 1.

  2. refactor: unify record/payload naming + ContentBlock.Source field — every record type now follows <entity>[.<sub-entity>].<past-tense-verb>. Renames: transcript.* → session.*, TranscriptID → SessionID everywhere (Go field + JSON tag), SystemRecord → SessionRecord (the struct never held system-prompt data; name now matches intent), Record.System → Record.Session. Adds ContentBlock.Source for provenance; renames the existing image-data field to ImageSource to free the namespace. Projector tolerates unknown patch paths for forward compatibility.

  3. feat: trace recorder for inference / system / tools + content provenance — new session.Recorder wired through core.Config.OnEvent translates lifecycle events into transcript records:

    • inference.requested (digests of system prompt / tool list / message chain) and inference.responded (stop reason, latency, token usage)
    • system.section.added / system.section.removed driven by a new observer on core.System; Use(sec, caller) / Drop(name, caller) carry caller info (system:init, command:identity, subagent:init, etc.). SetObserver replays existing sections on attach so the event chain is complete from t0.
    • tools.added / tools.removed via the same pattern on core.Tools. MCP registrations use caller mcp:<toolname>. Both wrappers (permissionTools, progressTools) forward the observer and the mutations unchanged.
    • Content provenance: splitTextByProvenance splits user-message content on <system-reminder> boundaries and tags those blocks with Source="reminder". Round-trip safe — extractUserContent concatenates all text blocks back into the original core.Message.Content string.
    • Non-blocking emitTelemetry for observer-fired events so the agent goroutine never blocks on a slow TUI consumer.

  4. feat: gen trace web viewer — new internal/trace package + gen trace subcommand. Localhost-only HTTP server, polling-based tailer (no fsnotify dependency), SSE live tail, JSONL is the wire format (server doesn't reshape records). Vanilla JS frontend, no build step, assets embedded via go:embed. CLI refuses any non-loopback --addr.

  5. docs: tracing event model — first-principles doc covering event taxonomy, payload shapes, replay algorithm, troubleshooting recipes, and the viewer's HTTP API.

Architecture decisions

  • Cause not effect: observers fire on Use/Drop/Add/Remove, with explicit caller strings. Avoids digest-comparison-after-the-fact which would lose the "who changed this?" attribution.
  • Replay on attach: SetObserver snapshots current state and replays it as synthetic added events. Lets the recorder be wired at agent-construction time without losing the initial state established by system.Build().
  • Event bus reuse: extended existing core.Event types (OnSystemChange, OnToolsChange) instead of introducing a separate Recorder interface. The TUI already subscribes to the outbox; the recorder is a second consumer on the same stream.
  • JSONL is the wire format: the trace viewer's /api/sessions/{id}/records returns the JSONL lines untouched. One schema across disk, network, and browser.
  • Round-trip safety on splitting: splitTextByProvenance never trims; concatenating all returned blocks' Text fields reproduces the input byte-for-byte. extractUserContent does that concat. Net effect on core.Message.Content is zero.

Test plan

  • go build ./...
  • go test ./... — 51 packages pass, 0 FAIL
  • Targeted unit tests added:
    • TestRecorderWritesRequestedAndRespondedPerTurn
    • TestRecorderWritesSystemSectionEvents
    • TestRecorderWritesToolsChangeEvents
    • TestRecorderNilSafe
    • Test_userContentToBlocks_splitsByProvenance
    • Test_userContentToBlocks_plainTextOneBlock
    • TestServerListAndRecords (+ path-traversal guard)
    • TestServerListNoTranscripts
    • TestFileStoreAppendMessageIsIdempotent
  • Round-trip preservation: Test_messagesToEntries_roundtrip still passes despite the ContentBlock split
  • Manual smoke test: built binary, ran gen trace --addr 127.0.0.1:38081 --no-open, verified /api/sessions returned real local sessions and SPA assets served correctly

Compatibility

  • No backward compatibility is preserved for the on-disk schema. Existing transcripts written before this PR will not load: the JSON tag rename (transcriptId → sessionId) and record-type rename (transcript.* → session.*) are breaking.
  • No external callers outside internal/session/ and the in-process agent build paths use the renamed types.

🤖 Generated with Claude Code

yanmxa added 5 commits May 15, 2026 17:30

refactor: append-only transcript persistence + fsync batching

The session save path previously rewrote the entire JSONL on every Save
via FileStore.Replace, producing O(file_size) writes per turn even though
each turn only adds one or two messages. This commit switches the path
to per-event append:

  Store.Save now calls Start (idempotent) + AppendMessage per node
  (deduped via an in-memory persistedIDs cache, populated lazily by
  scanning the file once) + a single PatchState.
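
The dedup step described above could look roughly like the following. This is a minimal sketch, not the PR's code: the `store` type, its `lines` slice (standing in for the JSONL file), and the `scans` counter are hypothetical, and only the names `AppendMessage` and `persistedIDs` come from the commit message.

```go
package main

import "fmt"

// store is a hypothetical stand-in for FileStore; lines plays the role
// of the on-disk JSONL file, holding one message ID per record.
type store struct {
	lines        []string
	persistedIDs map[string]struct{} // nil until the first lazy scan
	scans        int                 // how many full scans were needed
}

// loadIDs simulates the one-time file scan that fills the cache.
func (s *store) loadIDs() {
	s.scans++
	s.persistedIDs = make(map[string]struct{}, len(s.lines))
	for _, id := range s.lines {
		s.persistedIDs[id] = struct{}{}
	}
}

// AppendMessage appends id unless it is already persisted.
// It reports whether a record was actually written.
func (s *store) AppendMessage(id string) bool {
	if s.persistedIDs == nil {
		s.loadIDs()
	}
	if _, ok := s.persistedIDs[id]; ok {
		return false // duplicate: O(1) map lookup, no file access
	}
	s.lines = append(s.lines, id)
	s.persistedIDs[id] = struct{}{}
	return true
}

func main() {
	s := &store{lines: []string{"m1"}}
	for _, id := range []string{"m1", "m2", "m2"} {
		fmt.Println(id, "written:", s.AppendMessage(id))
	}
	fmt.Println("scans:", s.scans)
}
```

Calling Save repeatedly with the same nodes then costs one scan in total, which is what makes Start + AppendMessage safe to call idempotently.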

  FileStore.Replace, recordsForTranscript, TranscriptFromSnapshot,
  ReplaceCommand, and messageExistsLocked are removed — the rewrite
  path no longer exists.

Bundled in this commit: fsync is now gated by a `sync` parameter on
appendRecord, batched at turn boundaries:

  - sync=true: session.started, message.appended, session.compacted,
    inference.responded (the turn-flush point).
  - sync=false: state.patched, inference.requested, system.section.*,
    tools.* — buffered in the page cache, flushed when the matching
    inference.responded lands.

A typical turn now does one fsync instead of five. On crash, the
in-flight turn's telemetry may be lost, but messages and state from
prior turns are durable.
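
As an illustration of the sync gating, here is a small sketch under stated assumptions. The record-type lists are taken from the text above; the `syncWriter` interface, the `countingFile` test double, and the `needsSync` helper are hypothetical names introduced only for this example (an `*os.File` would satisfy the same interface).

```go
package main

import (
	"bytes"
	"fmt"
)

// syncWriter is the subset of *os.File this sketch relies on.
type syncWriter interface {
	Write(p []byte) (int, error)
	Sync() error
}

// countingFile is an in-memory stand-in that counts Sync calls.
type countingFile struct {
	buf   bytes.Buffer
	syncs int
}

func (f *countingFile) Write(p []byte) (int, error) { return f.buf.Write(p) }
func (f *countingFile) Sync() error                 { f.syncs++; return nil }

// needsSync mirrors the sync=true list from the commit message.
func needsSync(recordType string) bool {
	switch recordType {
	case "session.started", "message.appended",
		"session.compacted", "inference.responded":
		return true
	}
	return false
}

// appendRecord writes one JSONL line and fsyncs only when asked to.
func appendRecord(w syncWriter, line []byte, sync bool) error {
	buf := make([]byte, 0, len(line)+1)
	buf = append(buf, line...)
	buf = append(buf, '\n')
	if _, err := w.Write(buf); err != nil {
		return err
	}
	if sync {
		return w.Sync()
	}
	return nil
}

func main() {
	f := &countingFile{}
	turn := []string{"inference.requested", "tools.added",
		"state.patched", "inference.responded"}
	for _, t := range turn {
		line := []byte(fmt.Sprintf(`{"type":%q}`, t))
		if err := appendRecord(f, line, needsSync(t)); err != nil {
			panic(err)
		}
	}
	fmt.Println("records:", len(turn), "syncs:", f.syncs)
}
```

Because the unsynced records are written to the same file descriptor before the final Sync, that single call flushes them along with the inference.responded record.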

Also exports StateOpsFor + PatchTag/Mode/Worktree helpers so the new
Save path can express the projected state as a single patch list.

refactor: unify record/payload naming + ContentBlock.Source field

Naming convention (see docs/tracing.md): every record type follows
<entity>[.<sub-entity>].<past-tense-verb>, lowercase, dot-separated.
Payload key matches the first segment of `type`.
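
A quick way to check the shape of a record type and derive its payload key is sketched below. Both helpers (`validRecordType`, `payloadKey`) are hypothetical and not part of the PR; the regular expression only checks the lowercase, dot-separated, two-or-three-segment shape and cannot verify that the last segment is a past-tense verb.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// recordTypeRe matches <entity>[.<sub-entity>].<verb> in lowercase.
var recordTypeRe = regexp.MustCompile(`^[a-z]+(\.[a-z]+)?\.[a-z]+$`)

// validRecordType reports whether t has the expected shape.
func validRecordType(t string) bool {
	return recordTypeRe.MatchString(t)
}

// payloadKey returns the first segment of the type, which by the
// convention is also the JSON key holding the payload.
func payloadKey(t string) string {
	key, _, _ := strings.Cut(t, ".")
	return key
}

func main() {
	for _, t := range []string{
		"session.started",
		"system.section.added",
		"Transcript.Started",
	} {
		fmt.Println(t, validRecordType(t), payloadKey(t))
	}
}
```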

Renames:
  RecordStarted/Forked/Compacted   → SessionStarted/Forked/Compacted
  RecordMessageAppended            → MessageAppended
  RecordStatePatched               → StatePatched
  transcript.started/forked/compacted (JSON value) → session.*
  SystemRecord (lifecycle payload) → SessionRecord (name now matches
    intent — that struct never held system-prompt data anyway)
  Record.System field              → Record.Session, JSON tag "system"→"session"
  Record.TranscriptID              → Record.SessionID, JSON tag
    "transcriptId"→"sessionId"
  *Command.TranscriptID            → SessionID (all five commands)
  ForkCommand.Source/NewTranscriptID → Source/NewSessionID
  ListItem.TranscriptID, fileIndexEntry.TranscriptID → SessionID

ContentBlock gains a Source field for provenance attribution
(populated in a later commit). The existing inline-image data field,
which used the json tag "source", is renamed to ImageSource with tag
"imageSource" to free up the namespace.

applyStatePatch now ignores unknown patch paths (default: continue)
instead of returning an error — keeps the projector forward-compatible
with patch paths added by later schema iterations.
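
The forward-compatible behaviour can be illustrated with a sketch like this one. Only the function name `applyStatePatch` and the default-continue behaviour come from the commit message; the `State` and `PatchOp` types and the path strings ("/tag", "/mode", "/worktree") are assumptions made for the example.

```go
package main

import "fmt"

// State and PatchOp are hypothetical, simplified shapes.
type State struct {
	Tag, Mode, Worktree string
}

type PatchOp struct {
	Path  string
	Value string
}

// applyStatePatch applies known ops and silently skips unknown paths,
// so a transcript written by a newer schema still projects.
func applyStatePatch(s *State, ops []PatchOp) {
	for _, op := range ops {
		switch op.Path {
		case "/tag":
			s.Tag = op.Value
		case "/mode":
			s.Mode = op.Value
		case "/worktree":
			s.Worktree = op.Value
		default:
			continue // unknown path: ignore instead of returning an error
		}
	}
}

func main() {
	var s State
	applyStatePatch(&s, []PatchOp{
		{Path: "/tag", Value: "demo"},
		{Path: "/futureField", Value: "ignored"},
		{Path: "/mode", Value: "plan"},
	})
	fmt.Printf("%+v\n", s)
}
```

The trade-off is that a typo in a patch path is also ignored silently, so a stricter mode may be worth having in tests.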

feat: trace recorder for inference / system / tools + content provenance

Adds session.Recorder, a synchronous observer on core.Agent's event bus
that translates lifecycle events into transcript records. Wired via
core.Config.OnEvent (new BuildParams.OnEvent passthrough in
internal/agent/build.go; constructed by Session.NewRecorder at agent
start in internal/app/agent.go).

New records persisted per turn:

  inference.requested  — emitted in streamInfer with sha256 digests of
                         the rendered system prompt, canonicalized tool
                         list, and the active message-ID chain. Carries
                         provider, model, max_tokens, turn number.
  inference.responded  — emitted on PostInfer with stop reason, latency,
                         and token usage (input / output / cache read /
                         cache create).
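
For readers unfamiliar with chain digests, the sketch below shows one way to hash an ordered list of message IDs with sha256. The canonicalization used here (each ID followed by a NUL byte) and the `chainDigest` name are assumptions; the PR description does not say how the real digest is computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chainDigest hashes the IDs in order. The NUL terminator after each
// ID keeps ["ab","c"] and ["a","bc"] from hashing to the same value.
func chainDigest(ids []string) string {
	h := sha256.New()
	for _, id := range ids {
		h.Write([]byte(id))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := chainDigest([]string{"m1", "m2", "m3"})
	b := chainDigest([]string{"m1", "m3", "m2"})
	fmt.Println(a)
	fmt.Println("order-sensitive:", a != b)
}
```

Comparing the digest between two consecutive inference.requested records is then enough to tell whether the message chain changed, without storing the chain twice.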

System mutations now flow through:

  Use(sec, caller) and Drop(name, caller) on core.System (Refresh
  similarly). New System.SetObserver registers a callback that fires
  on every subsequent mutation AND replays existing sections as
  synthetic "added" events with caller="system:init", so observers
  attached after Build still see the complete history.
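
  The replay-on-attach idea can be sketched as follows. This is a simplified illustration, not the code in core.System: sections are reduced to bare names, and the `SectionEvent` type and `notify` helper are hypothetical.

```go
package main

import "fmt"

// SectionEvent is a hypothetical, simplified event shape.
type SectionEvent struct {
	Kind   string // "added" or "removed"
	Name   string
	Caller string
}

// System keeps section names in registration order.
type System struct {
	names    []string
	observer func(SectionEvent)
}

func (s *System) notify(ev SectionEvent) {
	if s.observer != nil {
		s.observer(ev)
	}
}

// Use registers (or re-registers) a section and reports who did it.
func (s *System) Use(name, caller string) {
	found := false
	for _, n := range s.names {
		if n == name {
			found = true
			break
		}
	}
	if !found {
		s.names = append(s.names, name)
	}
	s.notify(SectionEvent{Kind: "added", Name: name, Caller: caller})
}

// Drop removes a section if present.
func (s *System) Drop(name, caller string) {
	for i, n := range s.names {
		if n == name {
			s.names = append(s.names[:i], s.names[i+1:]...)
			s.notify(SectionEvent{Kind: "removed", Name: name, Caller: caller})
			return
		}
	}
}

// SetObserver attaches fn and replays existing sections as synthetic
// "added" events so a late observer still sees the full history.
func (s *System) SetObserver(fn func(SectionEvent)) {
	s.observer = fn
	if fn == nil {
		return
	}
	for _, n := range s.names {
		fn(SectionEvent{Kind: "added", Name: n, Caller: "system:init"})
	}
}

func main() {
	sys := &System{}
	sys.Use("identity", "system:init") // registered before any observer
	sys.SetObserver(func(ev SectionEvent) { fmt.Printf("%+v\n", ev) })
	sys.Drop("identity", "command:identity")
}
```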

  catalog.go's 13 Use call sites pass caller strings: "system:init"
  for default registrations, "command:identity" for SwapIdentity,
  "subagent:init" for WithSubagentIdentity.

  Recorder writes system.section.added / system.section.removed
  records carrying name, slot, content, caller.

Tool registry events follow the same pattern:

  core.Tools gains Add(tool, caller), Remove(name, caller), and
  SetObserver(fn). The two wrappers (permissionTools, progressTools)
  forward both observer and mutations.

  MCP registrations pass caller="mcp:<toolname>". The recorder writes
  tools.added (with schema) and tools.removed (with name).

Content provenance: splitTextByProvenance splits user-message content
on <system-reminder> XML boundaries and tags those blocks with
Source="reminder". Round-trip safe (extractUserContent concatenates
all text blocks back into a single string).
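
To make the round-trip property concrete, here is an independent sketch of such a splitter. It reuses the names from the commit message for readability, but the implementation, the simplified `ContentBlock` struct, and the handling of an unterminated tag (kept as plain text) are assumptions rather than the PR's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// ContentBlock is reduced to the two fields this sketch needs.
type ContentBlock struct {
	Text   string
	Source string // "reminder" for <system-reminder> spans, else ""
}

const (
	openTag  = "<system-reminder>"
	closeTag = "</system-reminder>"
)

// splitTextByProvenance cuts s at reminder boundaries without trimming
// anything, so concatenating the blocks reproduces s byte-for-byte.
func splitTextByProvenance(s string) []ContentBlock {
	var blocks []ContentBlock
	for len(s) > 0 {
		start := strings.Index(s, openTag)
		if start < 0 {
			blocks = append(blocks, ContentBlock{Text: s})
			break
		}
		end := strings.Index(s[start:], closeTag)
		if end < 0 { // unterminated tag: keep the rest as plain text
			blocks = append(blocks, ContentBlock{Text: s})
			break
		}
		end += start + len(closeTag)
		if start > 0 {
			blocks = append(blocks, ContentBlock{Text: s[:start]})
		}
		blocks = append(blocks, ContentBlock{Text: s[start:end], Source: "reminder"})
		s = s[end:]
	}
	return blocks
}

// extractUserContent concatenates all block texts back together.
func extractUserContent(blocks []ContentBlock) string {
	var b strings.Builder
	for _, blk := range blocks {
		b.WriteString(blk.Text)
	}
	return b.String()
}

func main() {
	in := "hello\n<system-reminder>be brief</system-reminder>\nbye"
	blocks := splitTextByProvenance(in)
	for _, blk := range blocks {
		fmt.Printf("%q source=%q\n", blk.Text, blk.Source)
	}
	fmt.Println("round-trip ok:", extractUserContent(blocks) == in)
}
```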

Telemetry events go through a new emitTelemetry path: non-blocking
outbox send with select-default fallback. System/tools observers must
not block the agent goroutine even if the TUI consumer falls behind.
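
The select-default pattern mentioned above is standard Go; a minimal sketch follows. The `Event` type and the boolean return value are assumptions added for the example, and what the real emitTelemetry does with a dropped event is not stated in the PR.

```go
package main

import "fmt"

// Event is a hypothetical placeholder for core.Event.
type Event struct {
	Type string
}

// emitTelemetry tries to enqueue ev and returns immediately.
// It reports false when the outbox is full and the event was dropped.
func emitTelemetry(outbox chan<- Event, ev Event) bool {
	select {
	case outbox <- ev:
		return true
	default:
		return false // consumer is behind; never block the caller
	}
}

func main() {
	outbox := make(chan Event, 1)
	fmt.Println(emitTelemetry(outbox, Event{Type: "tools.added"}))
	fmt.Println(emitTelemetry(outbox, Event{Type: "tools.removed"}))
}
```

The cost of this design is that telemetry can be lost under backpressure, which is consistent with treating these records as best-effort.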

Tests cover inference pair recording, system section add/replace/remove,
tools add/remove, content-provenance splitting, and Recorder nil-safety.

feat: gen trace web viewer

A localhost-only web UI for inspecting session transcripts under
~/.gen/projects/<encoded-cwd>/transcripts/. Read-only, single binary
(assets embedded via go:embed), no build step on the frontend.

Backend (internal/trace):
  - HTTP server with four routes (the SPA shell plus three API endpoints):
      GET /                              SPA shell
      GET /api/sessions                  list transcripts in the project
      GET /api/sessions/{id}/records     paginated record fetch
      GET /api/sessions/{id}/stream      SSE live tail
  - The wire format is the JSONL on disk verbatim — server doesn't
    reshape records. One schema, not two.
  - Polling-based tailer (500ms tick); no fsnotify dependency.
  - Path-traversal guard rejects sessionIDs containing slashes.
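
A guard of the kind described in the last bullet might look like the sketch below. The commit message only says that session IDs containing slashes are rejected; the extra checks for backslashes, "." and "..", the `.jsonl` extension, and both helper names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// validSessionID rejects IDs that could escape the transcripts dir.
func validSessionID(id string) bool {
	if id == "" || id == "." || id == ".." {
		return false
	}
	return !strings.ContainsAny(id, `/\`)
}

// transcriptPath maps a session ID to a file inside dir.
func transcriptPath(dir, id string) (string, error) {
	if !validSessionID(id) {
		return "", fmt.Errorf("invalid session id %q", id)
	}
	return filepath.Join(dir, id+".jsonl"), nil
}

func main() {
	for _, id := range []string{"abc123", "../secret", "a/b"} {
		p, err := transcriptPath("transcripts", id)
		fmt.Println(id, "->", p, err)
	}
}
```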

Frontend (internal/trace/ui/assets):
  - Vanilla JS, ~200 lines. EventSource for SSE.
  - Sessions sidebar | colored timeline | JSON detail panel.
  - Per-group filter checkboxes (state, tools, system, inference,
    message).
  - Auto-scrolls when near the bottom; pauses on user scroll-up.

CLI (cmd/gen/trace.go):
  - `gen trace` binds 127.0.0.1 on a random port by default, opens the
    browser, blocks until Ctrl-C.
  - --addr to pin a port; refuses any non-loopback host.
  - --no-open to skip the browser launch (for headless / CI).
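
One plausible way to enforce the loopback-only rule for --addr uses only the standard library's `net` package. The `isLoopbackAddr` helper, and the decision to accept the literal host name "localhost" without resolving it, are assumptions and may differ from what cmd/gen/trace.go does.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackAddr reports whether a host:port address is loopback-only.
// An empty host (":8080") binds every interface, so it is refused.
func isLoopbackAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, addr := range []string{
		"127.0.0.1:38081", "[::1]:0", "0.0.0.0:8080", ":8080",
	} {
		fmt.Println(addr, isLoopbackAddr(addr))
	}
}
```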

Tests cover the records endpoint, empty-project case, and the
path-traversal guard. Smoke-tested locally against real session files.

docs: tracing event model

A first-principles description of the transcript event taxonomy,
record envelope, payload shapes per group, replay algorithm, common
troubleshooting recipes (jq one-liners), and the gen trace viewer's
HTTP API. Companion to docs/transcriptstore.md, which covers storage
layout and resume mechanics; this doc focuses on the event schema and
the viewer.

yanmxa commented May 15, 2026

Closing in favor of 5 smaller stacked PRs for easier review. Splitting into #N+1..#N+5 by C-slice.

@yanmxa yanmxa closed this May 15, 2026
@yanmxa yanmxa deleted the feature/transcript-tracing branch May 15, 2026 15:26