feat: subagent cost rollup at the agent_call level by OmkarRayAI · Pull Request #14 · OmkarRayAI/wiki-trace

OmkarRayAI · 2026-06-05T13:00:32Z

Summary

Closes Twitter Q2: "does it work for subagent convos? cost is dictated by those now". The data model already supported arbitrary nesting via parent_id; this PR surfaces the rollup that users actually need.

What

Python (wikitrace/agents.py)

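from wikitrace.agents import tree_cost, agent_rollups

# One agent's recursive cost across all its descendants.
# `spans`: the span records of one trace (loading them is outside this snippet).
# tree_cost returns None for an unknown root_span_id, so guard before reading fields.
r = tree_cost(spans, root_span_id="abc1234567890def")
if r is not None:
    print(r.cost_usd, r.agent_calls, r.llm_calls, r.depth)

# One CostRollup per top-level agent_call span in a trace dir
for r in agent_rollups(trace_dir=".wikitrace"):
    print(r.agent, r.cost_usd, "→", r.agent_calls, "subagents")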

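- tree_cost(spans, root_span_id): BFS over the subtree. Sums cost_usd / input_tokens / output_tokens / total_tokens, counts nested agent_calls / llm_calls / tool_calls / errors, and tracks depth. Returns None for an unknown root_span_id.
- agent_rollups(trace_dir=, only_top_level=True, limit=None): one CostRollup per agent_call. With only_top_level=True (the default), subagents whose parent is itself an agent_call are skipped.
- All three are exported from the top-level package as wikitrace.tree_cost, wikitrace.agent_rollups and wikitrace.CostRollup.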
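
A minimal sketch of the BFS rollup described above, assuming each span is a dict with span_id, parent_id, kind, optional status and an optional attrs dict holding cost_usd and the token fields. The span shape and the names CostRollupSketch, tree_cost_sketch, KIND_COUNTERS and TOKEN_FIELDS are hypothetical illustrations, not the code in wikitrace/agents.py.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostRollupSketch:
    # Hypothetical field set mirroring the counters listed above.
    root_span_id: str
    cost_usd: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    agent_calls: int = 0   # nested agent_calls, root excluded
    llm_calls: int = 0
    tool_calls: int = 0
    errors: int = 0        # spans in the subtree (root included) with status == "error"
    descendants: int = 0   # every span below the root
    depth: int = 0         # deepest level below the root (root = 0)


KIND_COUNTERS = {"agent_call": "agent_calls", "llm_call": "llm_calls", "tool_call": "tool_calls"}
TOKEN_FIELDS = ("input_tokens", "output_tokens", "total_tokens")


def tree_cost_sketch(spans: list, root_span_id: str) -> Optional[CostRollupSketch]:
    by_id = {s["span_id"]: s for s in spans}
    if root_span_id not in by_id:
        return None  # unknown root
    children = defaultdict(list)  # parent_id -> [child span_id, ...]
    for s in spans:
        if s.get("parent_id") is not None:
            children[s["parent_id"]].append(s["span_id"])

    r = CostRollupSketch(root_span_id=root_span_id)
    queue = deque([(root_span_id, 0)])  # (span_id, depth below root)
    while queue:
        span_id, depth = queue.popleft()
        span = by_id[span_id]
        attrs = span.get("attrs", {})
        # Spans without cost/token attrs add 0 but still count structurally.
        r.cost_usd += attrs.get("cost_usd", 0.0)
        for key in TOKEN_FIELDS:
            setattr(r, key, getattr(r, key) + attrs.get(key, 0))
        if span.get("status") == "error":
            r.errors += 1
        if span_id != root_span_id:
            r.descendants += 1
            counter = KIND_COUNTERS.get(span.get("kind"))
            if counter:
                setattr(r, counter, getattr(r, counter) + 1)
        r.depth = max(r.depth, depth)
        queue.extend((child, depth + 1) for child in children[span_id])
    return r
```

The iterative queue (rather than recursion) keeps deep subagent chains clear of Python's recursion limit; the children index is built once, so the walk is linear in the number of spans.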

Dashboard (app/app/(obs)/agents/page.tsx)

New /agents route. One row per top-level agent_call with: agent label, nested-subagent count, llm_calls, tool_calls, total tokens, rolled-up cost, root latency, time ago. Error badge when any descendant has status=error. Sidebar nav gains an "Agents" link in the obs section.
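
The real page is TSX; the row logic is sketched here in Python for consistency with the rest of the PR. It assumes a rollup dict with hypothetical keys modeled on the CostRollup counters plus a root_latency_ms value, and the "time ago" buckets (s/m/h/d) are an illustrative choice, not the page's actual formatting.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AgentRow:
    """Hypothetical shape of one /agents table row."""
    agent: str
    subagents: int
    llm_calls: int
    tool_calls: int
    total_tokens: int
    cost_usd: float
    root_latency_ms: float
    time_ago: str
    has_error: bool  # drives the error badge


def time_ago(started_at: datetime, now: datetime) -> str:
    """Coarse relative-time label such as '30s ago', '5m ago', '3h ago', '2d ago'."""
    secs = max(0, int((now - started_at).total_seconds()))
    if secs < 60:
        return f"{secs}s ago"
    if secs < 3600:
        return f"{secs // 60}m ago"
    if secs < 86400:
        return f"{secs // 3600}h ago"
    return f"{secs // 86400}d ago"


def to_row(rollup: dict, started_at: datetime, now: datetime) -> AgentRow:
    # `rollup` keys are hypothetical, modeled on the CostRollup counters.
    return AgentRow(
        agent=rollup["agent"],
        subagents=rollup["agent_calls"],
        llm_calls=rollup["llm_calls"],
        tool_calls=rollup["tool_calls"],
        total_tokens=rollup["total_tokens"],
        cost_usd=rollup["cost_usd"],
        root_latency_ms=rollup["root_latency_ms"],
        time_ago=time_ago(started_at, now),
        has_error=rollup["errors"] > 0,  # any errored span in the subtree
    )
```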

Tests

8 new cases in tests/test_agents.py, including the canonical scenario:

Top-level "planner" spawns 3 subagents.
Each subagent makes 1 llm_call at $0.001.
→ rollup: cost_usd=$0.003, agent_calls=3, llm_calls=3,
          descendants=6, depth=2, sum of all tokens.
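
The expected numbers in the canonical scenario can be derived independently by walking parent pointers: 3 × $0.001 = $0.003, 3 subagents + 3 llm_calls = 6 descendants, and the chain planner → subagent → llm_call gives depth 2. A small sketch of that derivation, using hypothetical tuple-shaped span records rather than the fixtures in tests/test_agents.py:

```python
# Hypothetical span records: (span_id, parent_id, kind, cost_usd)
SPANS = [("planner", None, "agent_call", 0.0)]
for i in range(3):
    SPANS.append((f"sub{i}", "planner", "agent_call", 0.0))
    SPANS.append((f"llm{i}", f"sub{i}", "llm_call", 0.001))

PARENT = {span_id: parent_id for span_id, parent_id, _, _ in SPANS}


def ancestors(span_id):
    """Yield parent ids from the immediate parent up to the trace root."""
    parent = PARENT[span_id]
    while parent is not None:
        yield parent
        parent = PARENT[parent]


def expected_rollup(root):
    below = [s for s in SPANS if root in ancestors(s[0])]
    subtree = [s for s in SPANS if s[0] == root] + below
    return {
        "cost_usd": sum(s[3] for s in subtree),
        "agent_calls": sum(1 for s in below if s[2] == "agent_call"),
        "llm_calls": sum(1 for s in below if s[2] == "llm_call"),
        "descendants": len(below),
        # depth = distance from `root` to the deepest span below it
        "depth": max((list(ancestors(s[0])).index(root) + 1 for s in below), default=0),
    }
```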

Full suite: pytest -q tests/ → 106 passed (up from 98), 14 skipped, 0 failures.

What this is NOT

- Not the human-readable trace tree view (the main half of Twitter Q1); that is a separate, larger PR (~1 day of dashboard work).
- Not per-llm_call attribution under each agent_call; only the summed totals at the agent level.

Test plan

CI green on this PR (Python + dashboard typecheck + lint)
Run an agent that spawns subagents, hit /agents in the dashboard, see the rollup row
Verify cloud-mode: each tenant sees only their own rollups (uses existing loadSpansAsync backend abstraction)

🤖 Generated with Claude Code

Closes Twitter Q2: does it work for subagent convos / cost is dictated by those now. The data model already supported nesting via parent_id; this PR surfaces the rollup that users actually need.

Python (wikitrace/agents.py)
- tree_cost(spans, root_span_id) -> CostRollup or None. Walks all descendants (BFS), sums cost_usd / input_tokens / output_tokens / total_tokens, counts nested agent_calls / llm_calls / tool_calls / errors, tracks tree depth.
- agent_rollups(trace_dir=, only_top_level=True, limit=None). Computes a CostRollup per agent_call in a trace dir. only_top_level=True (default) skips subagents whose parent is itself an agent_call, giving the canonical view: what did each user-visible agent run cost end-to-end.
- CostRollup dataclass exported from the public package.

Dashboard (app/app/(obs)/agents/page.tsx)
- New /agents route in the obs theme. One row per top-level agent_call with: agent label, nested-subagent count, llm_calls, tool_calls, total tokens, rolled-up cost, root latency, time ago. Error badge when any descendant span has status=error.
- Sidebar nav gains an Agents link in the obs section, positioned between Requests and Sessions.
- TS port of tree_cost / agent_rollups in app/lib/traces.ts (agentRollupsAsync). Uses the existing loadSpansAsync backend abstraction so cloud-mode tenants only see their own rollups.
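
The only_top_level and limit behavior of agent_rollups can be sketched as a root-selection step that runs before any per-root rollup. This follows the description literally (skip an agent_call whose direct parent is an agent_call) and assumes dict-shaped spans with span_id, parent_id and kind; select_rollup_roots is a hypothetical helper, not the shipped code.

```python
from typing import Optional


def select_rollup_roots(spans: list, only_top_level: bool = True,
                        limit: Optional[int] = None) -> list:
    """Return the span_ids of agent_calls that should each get a rollup row."""
    by_id = {s["span_id"]: s for s in spans}
    roots = []
    for s in spans:
        if s.get("kind") != "agent_call":
            continue
        parent = by_id.get(s.get("parent_id"))  # None for trace roots
        # A subagent here means: an agent_call whose direct parent is an agent_call.
        if only_top_level and parent is not None and parent.get("kind") == "agent_call":
            continue
        roots.append(s["span_id"])
    return roots if limit is None else roots[:limit]
```

Each selected id would then be passed to a subtree walk such as tree_cost; applying limit after filtering keeps the cap on user-visible rows rather than on raw spans.
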
Tests (tests/test_agents.py, 8 cases)
- One agent_call to one llm_call leaf: rollup matches the leaf
- Top-level planner spawns 3 subagents, each with own llm_call: asserts agent_calls=3, llm_calls=3, descendants=6, depth=2, cost summed across all leaves
- Unknown root_span_id returns None
- Spans without cost attrs contribute structurally only
- only_top_level=True skips nested agent_calls; False produces one row per agent_call
- Missing trace_dir returns empty list (no exception)
- Root latency uses root span wall-time

Verified
- pytest -q tests/ -> 106 passed (up from 98), 14 skipped
- npx tsc --noEmit clean for all touched files

This is the smaller of the two remaining Twitter-feedback gaps; the dashboard tree view (~1 day) is still pending.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
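
Two of the edge cases above (missing trace_dir and root latency) can be sketched as follows. The on-disk layout (one JSON span per line in *.jsonl files) and the start_ms / end_ms field names are assumptions for illustration only; load_spans_sketch and root_latency_ms are hypothetical helpers.

```python
import json
from pathlib import Path


def load_spans_sketch(trace_dir: str) -> list:
    """Read spans from a trace dir; a missing dir yields [] instead of raising."""
    root = Path(trace_dir)
    if not root.is_dir():
        return []
    spans = []
    # Assumed layout: one JSON span object per line in *.jsonl files.
    for path in sorted(root.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            spans.extend(json.loads(line) for line in fh if line.strip())
    return spans


def root_latency_ms(root_span: dict) -> float:
    # Wall time of the root span alone; child durations are not summed,
    # so concurrently running subagents are not double-counted.
    return root_span["end_ms"] - root_span["start_ms"]
```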

Closes Twitter Q1B: a single page that walks one trace start-to-finish in human-readable form. Previously the dashboard rendered one request at a time; understanding what an agent actually did across N llm_calls, M tool_calls, and possibly nested subagents required clicking through each individually.

What lands

New route: app/app/(obs)/traces/[id]/page.tsx
- Header: trace_id + spans count + total latency + total cost + total tokens + session_id + user_id
- Body: collapsible tree using native HTML details elements (server-rendered, zero client JS for collapse). Auto-expands the root and any agent_call nodes; leaves llm_calls and tool_calls collapsed by default for scannability.
- Per-node: kind badge (Agent / LLM / Tool / Action / Retrieve / Judge), agent or tool or model label, duration, tokens, cost, start time. agent_call nodes also show subtree rollup (cost + nested-subagent count + descendant llm count).
- Inline content: pulls prompt and response out of attrs.request and attrs.response (Helicone-proxy shape) plus generic fallbacks (input, output, prompt, answer, arg.0, question). Truncated to 1500 chars with a scrollable container; full data is in the raw span dump at the bottom.
- Events drawer per span (collapsed by default, first 50 events).
- Per-segment grouping: when any span carries session_segment > 0 (from PR #13 mid-conversation reset), root spans are grouped into Initial conversation / After reset #1 / After reset #2.
- Raw JSON drawer at the bottom for power users.
- Error spans get a red dot indicator.

app/lib/traces.ts
- Exported treeCostFromSpans so the page can decorate agent_call nodes with their subtree totals without re-loading spans.

Reuses existing infra
- loadTraceSpansAsync: cloud-mode-aware loader from PR #1. Cloud-mode tenants only see their own trace spans.
- treeCostFromSpans: subtree rollup logic from PR #14.
- (obs) layout, glass styling, eyebrow/PageTitle widgets.
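
The inline-content and per-segment rules above can be sketched in Python (the page itself is TSX). Assumptions, all hypothetical: how the fallback keys split between the prompt and response panes, their priority order, session_segment living at the top level of each span dict, and an "..." suffix marking truncation.

```python
import json
from typing import Optional

MAX_INLINE_CHARS = 1500
# Assumed split and priority of the fallback keys between the two panes.
PROMPT_KEYS = ("request", "input", "prompt", "arg.0", "question")
RESPONSE_KEYS = ("response", "output", "answer")


def _first_present(attrs: dict, keys: tuple) -> Optional[str]:
    for key in keys:
        value = attrs.get(key)
        if value is not None:
            # Non-string payloads (e.g. a proxied request body) are shown as JSON.
            return value if isinstance(value, str) else json.dumps(value, default=str)
    return None


def _truncate(text: Optional[str], limit: int = MAX_INLINE_CHARS) -> Optional[str]:
    if text is None or len(text) <= limit:
        return text
    return text[:limit] + "..."


def inline_content(attrs: dict) -> tuple:
    """(prompt, response) for one span; full data stays in the raw JSON drawer."""
    return (_truncate(_first_present(attrs, PROMPT_KEYS)),
            _truncate(_first_present(attrs, RESPONSE_KEYS)))


def segment_label(segment: int) -> str:
    return "Initial conversation" if segment == 0 else f"After reset #{segment}"


def group_roots_by_segment(spans: list) -> Optional[dict]:
    """Group root spans by session_segment; None when no reset occurred."""
    if not any(s.get("session_segment", 0) > 0 for s in spans):
        return None  # render one ungrouped tree
    groups: dict = {}
    for span in spans:
        if span.get("parent_id") is None:
            groups.setdefault(span.get("session_segment", 0), []).append(span)
    return {segment_label(seg): groups[seg] for seg in sorted(groups)}
```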
Verified
- npx tsc --noEmit clean for all my files
- pytest -q tests/ -> 106 passed, 14 skipped (no regressions)
- Renders correctly against fixtures with nested subagent calls, per-span events, error spans, multi-segment sessions, missing attrs (defensive fallbacks throughout)

Linked from /agents
- The /agents page rows already link href=/traces/{trace_id} (from PR #14). With this PR shipped, that link now lands on the full conversation walkthrough instead of the wiki-themed engineering view.

This closes the third and largest of the Twitter feedback gaps (after PR #13 session_reset and PR #14 agent cost rollup).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


OmkarRayAI merged commit f6d7da0 into main Jun 5, 2026
4 checks passed

OmkarRayAI deleted the feat/agent-cost-rollup branch June 5, 2026 13:02

OmkarRayAI mentioned this pull request Jun 5, 2026

feat(dashboard): /traces/[id] human-readable conversation walkthrough #15 (Merged, 4 tasks)

OmkarRayAI merged 1 commit into main from feat/agent-cost-rollup

OmkarRayAI commented Jun 5, 2026
