feat: subagent cost rollup at the agent_call level #14
Merged
Closes Twitter Q2: does it work for subagent convos / cost is dictated by those now. The data model already supported nesting via parent_id; this PR surfaces the rollup that users actually need.

Python (wikitrace/agents.py)
- tree_cost(spans, root_span_id) -> CostRollup or None
  Walks all descendants (BFS), sums cost_usd / input_tokens / output_tokens / total_tokens, counts nested agent_calls / llm_calls / tool_calls / errors, tracks tree depth.
- agent_rollups(trace_dir=, only_top_level=True, limit=None)
  Computes a CostRollup per agent_call in a trace dir. only_top_level=True (default) skips subagents whose parent is itself an agent_call, giving the canonical view: what did each user-visible agent run cost end-to-end.
- CostRollup dataclass exported from the public package.

Dashboard (app/app/(obs)/agents/page.tsx)
- New /agents route in the obs theme. One row per top-level agent_call with: agent label, nested-subagent count, llm_calls, tool_calls, total tokens, rolled-up cost, root latency, time ago. Error badge when any descendant span has status=error.
- Sidebar nav gains an Agents link in the obs section, positioned between Requests and Sessions.
- TS port of tree_cost / agent_rollups in app/lib/traces.ts (agentRollupsAsync). Uses the existing loadSpansAsync backend abstraction so cloud-mode tenants only see their own rollups.

Tests (tests/test_agents.py, 8 cases)
- One agent_call to one llm_call leaf: rollup matches the leaf
- Top-level planner spawns 3 subagents, each with own llm_call: asserts agent_calls=3, llm_calls=3, descendants=6, depth=2, cost summed across all leaves
- Unknown root_span_id returns None
- Spans without cost attrs contribute structurally only
- only_top_level True skips nested agent_calls; False produces one row per agent_call
- Missing trace_dir returns empty list (no exception)
- Root latency uses root span wall-time

Verified
- pytest -q tests/ -> 106 passed (up from 98), 14 skipped
- npx tsc --noEmit clean for all touched files

This is the smaller of the two remaining Twitter-feedback gaps; the dashboard tree view (~1 day) is still pending.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
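For readers who want to see the shape of the rollup, here is a minimal sketch of a BFS subtree rollup in the spirit of tree_cost. It is not the code in wikitrace/agents.py. The span shape (dicts with span_id, parent_id, kind, status, attrs), the CostRollup field names, and the choice to count only descendants (not the root span's own attrs) are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostRollup:
    # Field names are assumptions that mirror the description above.
    root_span_id: str
    cost_usd: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    agent_calls: int = 0
    llm_calls: int = 0
    tool_calls: int = 0
    errors: int = 0
    descendants: int = 0
    depth: int = 0


def tree_cost(spans: list[dict], root_span_id: str) -> Optional[CostRollup]:
    """Roll up cost, tokens, and counts over all descendants of one span."""
    by_id = {s["span_id"]: s for s in spans}
    if root_span_id not in by_id:
        return None  # unknown root

    # Index children by parent_id once so the walk is linear in span count.
    children: dict[str, list[dict]] = {}
    for s in spans:
        parent = s.get("parent_id")
        if parent is not None:
            children.setdefault(parent, []).append(s)

    rollup = CostRollup(root_span_id=root_span_id)
    seen = {root_span_id}  # guards against malformed parent_id cycles
    queue = deque((c, 1) for c in children.get(root_span_id, []))
    while queue:
        span, depth = queue.popleft()
        if span["span_id"] in seen:
            continue
        seen.add(span["span_id"])

        # Spans without cost attrs still count structurally.
        attrs = span.get("attrs") or {}
        rollup.descendants += 1
        rollup.depth = max(rollup.depth, depth)
        rollup.cost_usd += float(attrs.get("cost_usd") or 0.0)
        rollup.input_tokens += int(attrs.get("input_tokens") or 0)
        rollup.output_tokens += int(attrs.get("output_tokens") or 0)
        rollup.total_tokens += int(attrs.get("total_tokens") or 0)

        kind = span.get("kind")
        if kind == "agent_call":
            rollup.agent_calls += 1
        elif kind == "llm_call":
            rollup.llm_calls += 1
        elif kind == "tool_call":
            rollup.tool_calls += 1
        if span.get("status") == "error":
            rollup.errors += 1

        queue.extend((c, depth + 1) for c in children.get(span["span_id"], []))
    return rollup
```

Under these assumptions, a planner with three subagents that each make one llm_call yields agent_calls=3, llm_calls=3, descendants=6, depth=2, which lines up with the scenario listed in the tests.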
OmkarRayAI pushed a commit that referenced this pull request Jun 5, 2026
Closes Twitter Q1B: a single page that walks one trace start-to-finish in human-readable form. Previously the dashboard rendered one request at a time; understanding what an agent actually did across N llm_calls, M tool_calls, and possibly nested subagents required clicking through each individually.

What lands

New route: app/app/(obs)/traces/[id]/page.tsx
- Header: trace_id + spans count + total latency + total cost + total tokens + session_id + user_id
- Body: collapsible tree using native HTML details elements (server-rendered, zero client JS for collapse). Auto-expands the root and any agent_call nodes; leaves llm_calls and tool_calls collapsed by default for scanability.
- Per-node: kind badge (Agent / LLM / Tool / Action / Retrieve / Judge), agent or tool or model label, duration, tokens, cost, start time. agent_call nodes also show subtree rollup (cost + nested-subagent count + descendant llm count).
- Inline content: pulls prompt and response out of attrs.request and attrs.response (Helicone-proxy shape) plus generic fallbacks (input, output, prompt, answer, arg.0, question). Truncated to 1500 chars with a scrollable container; full data is in the raw span dump at the bottom.
- Events drawer per span (collapsed by default, first 50 events).
- Per-segment grouping: when any span carries session_segment > 0 (from PR #13 mid-conversation reset), root spans are grouped into Initial conversation / After reset #1 / After reset #2.
- Raw JSON drawer at the bottom for power users.
- Error spans get a red dot indicator.

app/lib/traces.ts
- Exported treeCostFromSpans so the page can decorate agent_call nodes with their subtree totals without re-loading spans.

Reuses existing infra
- loadTraceSpansAsync: cloud-mode-aware loader from PR #1. Cloud-mode tenants only see their own trace spans.
- treeCostFromSpans: subtree rollup logic from PR #14.
- (obs) layout, glass styling, eyebrow/PageTitle widgets.

Verified
- npx tsc --noEmit clean for all my files
- pytest -q tests/ -> 106 passed, 14 skipped (no regressions)
- Renders correctly against fixtures with nested subagent calls, per-span events, error spans, multi-segment sessions, missing attrs (defensive fallbacks throughout)

Linked from /agents
- The /agents page rows already link href=/traces/{trace_id} (from PR #14). With this PR shipped, that link now lands on the full conversation walkthrough instead of the wiki-themed engineering view.

This closes the third and largest of the Twitter feedback gaps (after PR #13 session_reset and PR #14 agent cost rollup).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
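The per-segment grouping described above can be sketched as follows. The actual page is TypeScript; this is an illustrative Python version, not the shipped code. It assumes spans are dicts with span_id, parent_id, and attrs, and that session_segment is stored under attrs; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def segment_of(span: dict) -> int:
    # Assumption: session_segment lives in attrs and defaults to 0.
    return int((span.get("attrs") or {}).get("session_segment") or 0)


def segment_label(segment: int) -> str:
    return "Initial conversation" if segment == 0 else f"After reset #{segment}"


def group_roots_by_segment(
    spans: list[dict],
) -> list[tuple[Optional[str], list[dict]]]:
    """Group root spans by segment, but only if some span has segment > 0."""
    ids = {s["span_id"] for s in spans}
    # A root is a span with no parent, or whose parent is not in this trace.
    roots = [s for s in spans if s.get("parent_id") not in ids]

    if not any(segment_of(s) > 0 for s in spans):
        return [(None, roots)]  # no reset happened: one unlabeled group

    groups: dict[int, list[dict]] = defaultdict(list)
    for root in roots:
        groups[segment_of(root)].append(root)
    return [(segment_label(seg), groups[seg]) for seg in sorted(groups)]
```

Returning a single unlabeled group when no span carries a positive segment keeps traces without a reset free of segment headings.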
OmkarRayAI added a commit that referenced this pull request Jun 5, 2026
Summary
Closes Twitter Q2: "does it work for subagent convos? cost is dictated by those now". The data model already supported arbitrary nesting via parent_id; this PR surfaces the rollup that users actually need.

What

Python (wikitrace/agents.py)
- tree_cost(spans, root_span_id): BFS over the subtree, sums cost_usd / input_tokens / output_tokens / total_tokens, counts nested agent_calls / llm_calls / tool_calls / errors, tracks depth.
- agent_rollups(trace_dir=, only_top_level=True, limit=None): one CostRollup per agent_call.
- Public exports: wikitrace.tree_cost, wikitrace.agent_rollups, wikitrace.CostRollup.

Dashboard (app/app/(obs)/agents/page.tsx)
New /agents route. One row per top-level agent_call with: agent label, nested-subagent count, llm_calls, tool_calls, total tokens, rolled-up cost, root latency, time ago. Error badge when any descendant has status=error. Sidebar nav gains an "Agents" link in the obs section.

Tests
8 new cases in tests/test_agents.py, including the canonical scenario.
Full suite: pytest -q tests/ → 106 passed (up from 98), 14 skipped, 0 failures.

What this is NOT

Test plan
- … /agents in the dashboard, see the rollup row
- … loadSpansAsync backend abstraction)

🤖 Generated with Claude Code
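To illustrate what only_top_level does in agent_rollups, here is a small sketch of the selection step alone. It is an assumption-laden illustration rather than the library's code: the function name select_agent_calls is hypothetical, spans are assumed to be dicts with span_id, parent_id, and kind, and the rule is applied to the direct parent only, as the commit message describes it.

```python
from typing import Optional


def select_agent_calls(
    spans: list[dict],
    only_top_level: bool = True,
    limit: Optional[int] = None,
) -> list[dict]:
    """Pick the agent_call spans that would each get one rollup row."""
    by_id = {s["span_id"]: s for s in spans}
    selected = []
    for span in spans:
        if span.get("kind") != "agent_call":
            continue
        parent = by_id.get(span.get("parent_id"))
        # A subagent is an agent_call whose direct parent is an agent_call.
        is_subagent = parent is not None and parent.get("kind") == "agent_call"
        if only_top_level and is_subagent:
            continue
        selected.append(span)
    return selected if limit is None else selected[:limit]
```

With only_top_level=True a planner and its subagent produce one row (the planner); with only_top_level=False they produce two. Each selected span would then be passed to the subtree rollup as the root.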