feat(sdk): session_reset() for mid-conversation segment boundaries #13
Merged
Closes a Twitter-feedback gap: when a user clears chat history or a planner restarts from a checkpoint mid-trace, callers had no way to mark the boundary. Spans before and after the reset would all show up as one long thread in the dashboard.

API
- wikitrace.session_reset() returns the new segment integer (1, 2, ...)
- Spans before the first reset carry no session_segment attr (segment 0)
- Spans after each reset carry session_segment=<n>
- Same session_id throughout, so cost rollups and user attribution continue to group across the entire conversation
- Outside an active session it is a no-op returning 0, so library code that calls it defensively does not crash unwrapped callers

Tests
- test_session_reset_segments_under_same_session_id: three turns, two resets, asserts session_id stable and session_segment increments
- test_session_reset_outside_session_is_noop: returns 0, does not raise

README
- New Mid-conversation resets subsection under the Sessions block, with a code example

Verified: pytest -q tests/ -> 98 passed (up from 96), 14 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
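The semantics listed under API can be modelled in a few lines. The sketch below is a toy stand-in written for illustration, not the wikitrace implementation: the `session` context manager, `record_span`, and `_State` are hypothetical names, and only the behaviour of `session_reset()` (segment counter, stable `session_id`, no-op returning 0 outside a session) is taken from the description above.

```python
"""Toy model of the session_reset() semantics described above.

Hypothetical stand-in, NOT the wikitrace implementation: session,
record_span, and _State are invented here for illustration.
"""
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Iterator, List, Optional
import uuid


@dataclass
class _State:
    session_id: str
    segment: int = 0  # 0 means no reset has happened yet


_current: ContextVar[Optional[_State]] = ContextVar("_current", default=None)


@contextmanager
def session(session_id: Optional[str] = None) -> Iterator[str]:
    """Open a session; spans recorded inside share one session_id."""
    state = _State(session_id or uuid.uuid4().hex)
    token = _current.set(state)
    try:
        yield state.session_id
    finally:
        _current.reset(token)


def session_reset() -> int:
    """Start a new segment; no-op returning 0 outside an active session."""
    state = _current.get()
    if state is None:
        return 0
    state.segment += 1
    return state.segment


def record_span(name: str) -> dict:
    """Build span attributes following the rules in the API list."""
    state = _current.get()
    attrs: dict = {"name": name}
    if state is not None:
        attrs["session_id"] = state.session_id
        if state.segment > 0:  # segment 0 carries no session_segment attr
            attrs["session_segment"] = state.segment
    return attrs


def demo() -> List[dict]:
    """Three turns, two resets, all under one session_id."""
    with session("conv-1"):
        spans = [record_span("turn-1")]
        session_reset()
        spans.append(record_span("turn-2"))
        session_reset()
        spans.append(record_span("turn-3"))
    return spans
```

Calling `demo()` mirrors the shape of the first test named above: three turns, two resets, one session. A context variable is used here only so the no-op case falls out naturally; how the real SDK tracks the active session is not stated in this PR.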
OmkarRayAI pushed a commit that referenced this pull request Jun 5, 2026
Closes Twitter Q1B: a single page that walks one trace start-to-finish in human-readable form. Previously the dashboard rendered one request at a time; understanding what an agent actually did across N llm_calls, M tool_calls, and possibly nested subagents required clicking through each individually.

What lands

New route: app/app/(obs)/traces/[id]/page.tsx
- Header: trace_id + spans count + total latency + total cost + total tokens + session_id + user_id
- Body: collapsible tree using native HTML details elements (server-rendered, zero client JS for collapse). Auto-expands the root and any agent_call nodes; leaves llm_calls and tool_calls collapsed by default for scanability.
- Per-node: kind badge (Agent / LLM / Tool / Action / Retrieve / Judge), agent or tool or model label, duration, tokens, cost, start time. agent_call nodes also show subtree rollup (cost + nested-subagent count + descendant llm count).
- Inline content: pulls prompt and response out of attrs.request and attrs.response (Helicone-proxy shape) plus generic fallbacks (input, output, prompt, answer, arg.0, question). Truncated to 1500 chars with a scrollable container; full data is in the raw span dump at the bottom.
- Events drawer per span (collapsed by default, first 50 events).
- Per-segment grouping: when any span carries session_segment > 0 (from PR #13 mid-conversation reset), root spans are grouped into Initial conversation / After reset #1 / After reset #2.
- Raw JSON drawer at the bottom for power users.
- Error spans get a red dot indicator.

app/lib/traces.ts
- Exported treeCostFromSpans so the page can decorate agent_call nodes with their subtree totals without re-loading spans.

Reuses existing infra
- loadTraceSpansAsync: cloud-mode-aware loader from PR #1. Cloud-mode tenants only see their own trace spans.
- treeCostFromSpans: subtree rollup logic from PR #14.
- (obs) layout, glass styling, eyebrow/PageTitle widgets.

Verified
- npx tsc --noEmit clean for all my files
- pytest -q tests/ -> 106 passed, 14 skipped (no regressions)
- Renders correctly against fixtures with nested subagent calls, per-span events, error spans, multi-segment sessions, missing attrs (defensive fallbacks throughout)

Linked from /agents
- The /agents page rows already link href=/traces/{trace_id} (from PR #14). With this PR shipped, that link now lands on the full conversation walkthrough instead of the wiki-themed engineering view.

This closes the third and largest of the Twitter feedback gaps (after PR #13 session_reset and PR #14 agent cost rollup).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
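The per-segment grouping rule in the commit message above (group root spans only when some span carries session_segment > 0, labelled Initial conversation / After reset #n) can be sketched as follows. This is an illustration of the logic in Python, not the code in page.tsx: the span shape (`span_id`, `parent_id`, and a nested `attrs` dict) and both function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List


def segment_label(segment: int) -> str:
    """Human-readable heading for a segment number."""
    return "Initial conversation" if segment == 0 else f"After reset #{segment}"


def group_roots_by_segment(spans: List[dict]) -> Dict[str, List[dict]]:
    """Group root spans by session_segment.

    Returns an empty dict when no span has session_segment > 0, so the
    caller can fall back to rendering a single ungrouped tree.
    Assumed span shape (hypothetical): {"span_id", "parent_id", "attrs"}.
    """
    def seg(span: dict) -> int:
        # A missing attribute means segment 0 (before the first reset).
        return int((span.get("attrs") or {}).get("session_segment", 0) or 0)

    if not any(seg(s) > 0 for s in spans):
        return {}

    groups: Dict[int, List[dict]] = defaultdict(list)
    for span in spans:
        if span.get("parent_id") is None:  # only roots are grouped
            groups[seg(span)].append(span)

    # Insertion order follows ascending segment number.
    return {segment_label(k): groups[k] for k in sorted(groups)}
```

Treating a missing `session_segment` as segment 0 matches the SDK behaviour described in PR #13, where spans before the first reset carry no attribute at all.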
OmkarRayAI added a commit that referenced this pull request Jun 5, 2026
Summary
Ships wikitrace.session_reset() to close one Twitter-feedback gap: when a user clears chat history or a planner restarts mid-trace, callers had no way to mark the boundary. Spans before and after the reset all showed up as one long thread.

API
- Same session_id throughout: cost rollups and user attribution stay grouped
- session_segment integers so the dashboard can render separate threads

Test plan
- test_session_reset_segments_under_same_session_id: three turns, two resets, assert session_id stable and session_segment increments
- test_session_reset_outside_session_is_noop: returns 0, does not raise
- pytest -q tests/ → 98 passed, 14 skipped (no regressions)

What this is NOT
This is the smallest possible SDK move that lets you reply "shipped" to the segment-boundary part of the feedback today.
🤖 Generated with Claude Code