feat: subagent cost rollup at the agent_call level #14

Merged: OmkarRayAI merged 1 commit into main from feat/agent-cost-rollup on Jun 5, 2026

@OmkarRayAI

Owner

Summary

Closes Twitter Q2: "does it work for subagent convos? cost is dictated by those now". The data model already supported arbitrary nesting via parent_id; this PR surfaces the rollup that users actually need.

What

Python (wikitrace/agents.py)

from wikitrace.agents import tree_cost, agent_rollups

# One agent's recursive cost across all its descendants
r = tree_cost(spans, root_span_id="abc1234567890def")
print(r.cost_usd, r.agent_calls, r.llm_calls, r.depth)

# All top-level agent_call spans in a trace dir
for r in agent_rollups(trace_dir=".wikitrace"):
    print(r.agent, r.cost_usd, "→", r.agent_calls, "subagents")
  • tree_cost(spans, root_span_id) — BFS over the subtree, sums cost_usd / input_tokens / output_tokens / total_tokens, counts nested agent_calls / llm_calls / tool_calls / errors, tracks depth.
  • agent_rollups(trace_dir=, only_top_level=True, limit=None) — one CostRollup per agent_call.
  • Exported at the package top level as wikitrace.tree_cost, wikitrace.agent_rollups, and wikitrace.CostRollup.

Dashboard (app/app/(obs)/agents/page.tsx)

New /agents route. One row per top-level agent_call with: agent label, nested-subagent count, llm_calls, tool_calls, total tokens, rolled-up cost, root latency, time ago. Error badge when any descendant has status=error. Sidebar nav gains an "Agents" link in the obs section.

Tests

8 new cases in tests/test_agents.py, including the canonical scenario:

Top-level "planner" spawns 3 subagents.
Each subagent makes 1 llm_call at $0.001.
→ rollup: cost_usd=$0.003, agent_calls=3, llm_calls=3,
          descendants=6, depth=2, sum of all tokens.
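
The numbers in the canonical scenario can be reproduced with a small breadth-first walk. The following is a minimal sketch, not the code in wikitrace/agents.py: the span dict keys (span_id, parent_id, kind, cost_usd), the RollupSketch class, and the tree_cost_sketch name are assumptions made for illustration, and it only tracks a subset of the fields the PR describes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollupSketch:
    """Hypothetical stand-in for CostRollup; the real field set may differ."""
    cost_usd: float = 0.0
    agent_calls: int = 0   # nested agent_call spans, root excluded
    llm_calls: int = 0
    descendants: int = 0
    depth: int = 0


def tree_cost_sketch(spans: list, root_span_id: str) -> Optional[RollupSketch]:
    """BFS over the subtree under root_span_id, following parent_id links."""
    by_id = {s["span_id"]: s for s in spans}
    if root_span_id not in by_id:
        return None  # unknown root, matching the behaviour listed in the tests

    # Index children by parent_id so each BFS step is a dict lookup.
    children = {}
    for s in spans:
        if s.get("parent_id") is not None:
            children.setdefault(s["parent_id"], []).append(s)

    # Assumption: the root's own cost (if any) is part of the total.
    rollup = RollupSketch(cost_usd=by_id[root_span_id].get("cost_usd") or 0.0)
    seen = {root_span_id}  # guards against malformed, cyclic parent links
    queue = deque([(root_span_id, 0)])
    while queue:
        span_id, level = queue.popleft()
        for child in children.get(span_id, []):
            cid = child["span_id"]
            if cid in seen:
                continue
            seen.add(cid)
            rollup.descendants += 1
            rollup.depth = max(rollup.depth, level + 1)
            # Spans without a cost still count structurally.
            rollup.cost_usd += child.get("cost_usd") or 0.0
            if child.get("kind") == "agent_call":
                rollup.agent_calls += 1
            elif child.get("kind") == "llm_call":
                rollup.llm_calls += 1
            queue.append((cid, level + 1))
    return rollup


# Canonical scenario: planner -> 3 subagents -> 1 llm_call each at $0.001.
spans = [{"span_id": "planner", "parent_id": None, "kind": "agent_call"}]
for i in range(3):
    spans.append({"span_id": f"sub{i}", "parent_id": "planner", "kind": "agent_call"})
    spans.append({"span_id": f"llm{i}", "parent_id": f"sub{i}",
                  "kind": "llm_call", "cost_usd": 0.001})

r = tree_cost_sketch(spans, "planner")
print(f"${r.cost_usd:.3f}", r.agent_calls, r.llm_calls, r.descendants, r.depth)
# prints: $0.003 3 3 6 2
```

Depth is counted in edges from the root, so planner → subagent → llm_call gives depth=2, and the six descendants are the three subagents plus their three llm_calls.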

Full suite: pytest -q tests/ → 106 passed (up from 98), 14 skipped, 0 failures.

What this is NOT

  • Not the human-readable trace tree view (Twitter Q1's main half) — that's a separate, larger PR (~1 day of dashboard work).
  • Not a per-llm-call attribution under each agent_call — just the summed totals at the agent level.

Test plan

  • CI green on this PR (Python + dashboard typecheck + lint)
  • Run an agent that spawns subagents, hit /agents in the dashboard, see the rollup row
  • Verify cloud-mode: each tenant sees only their own rollups (uses existing loadSpansAsync backend abstraction)

🤖 Generated with Claude Code

Closes Twitter Q2: does it work for subagent convos / cost is
dictated by those now. The data model already supported nesting
via parent_id; this PR surfaces the rollup that users actually need.

Python (wikitrace/agents.py)
- tree_cost(spans, root_span_id) -> CostRollup or None
  Walks all descendants (BFS), sums cost_usd / input_tokens /
  output_tokens / total_tokens, counts nested agent_calls /
  llm_calls / tool_calls / errors, tracks tree depth.
- agent_rollups(trace_dir=, only_top_level=True, limit=None)
  Computes a CostRollup per agent_call in a trace dir.
  only_top_level=True (default) skips subagents whose parent is
  itself an agent_call, giving the canonical view: what did each
  user-visible agent run cost end-to-end.
- CostRollup dataclass exported from the public package.
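
As an illustration of the only_top_level rule described above, here is a minimal sketch of the filter, assuming spans are dicts with span_id, parent_id, and kind keys (these names and the top_level_agent_calls helper are hypothetical, not taken from wikitrace/agents.py). It follows the wording of the commit message literally: an agent_call is skipped only when its direct parent is itself an agent_call.

```python
def top_level_agent_calls(spans: list) -> list:
    """Return agent_call spans whose direct parent is not an agent_call."""
    by_id = {s["span_id"]: s for s in spans}
    result = []
    for s in spans:
        if s.get("kind") != "agent_call":
            continue
        parent = by_id.get(s.get("parent_id"))  # None for roots or unknown parents
        if parent is not None and parent.get("kind") == "agent_call":
            continue  # nested subagent: its cost is rolled up under the parent
        result.append(s)
    return result


spans = [
    {"span_id": "planner", "parent_id": None, "kind": "agent_call"},
    {"span_id": "sub", "parent_id": "planner", "kind": "agent_call"},
    {"span_id": "llm", "parent_id": "sub", "kind": "llm_call"},
    {"span_id": "solo", "parent_id": None, "kind": "agent_call"},
]
ids = [s["span_id"] for s in top_level_agent_calls(spans)]
print(ids)  # prints: ['planner', 'solo']
```

With only_top_level=False the equivalent would simply keep every agent_call span, giving one row per agent_call.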

Dashboard (app/app/(obs)/agents/page.tsx)
- New /agents route in the obs theme. One row per top-level
  agent_call with: agent label, nested-subagent count, llm_calls,
  tool_calls, total tokens, rolled-up cost, root latency, time
  ago. Error badge when any descendant span has status=error.
- Sidebar nav gains an Agents link in the obs section,
  positioned between Requests and Sessions.
- TS port of tree_cost / agent_rollups in app/lib/traces.ts
  (agentRollupsAsync). Uses the existing loadSpansAsync backend
  abstraction so cloud-mode tenants only see their own rollups.

Tests (tests/test_agents.py, 8 cases)
- One agent_call to one llm_call leaf: rollup matches the leaf
- Top-level planner spawns 3 subagents, each with own llm_call:
  asserts agent_calls=3, llm_calls=3, descendants=6, depth=2,
  cost summed across all leaves
- Unknown root_span_id returns None
- Spans without cost attrs contribute structurally only
- only_top_level True skips nested agent_calls; False produces
  one row per agent_call
- Missing trace_dir returns empty list (no exception)
- Root latency uses root span wall-time

Verified
- pytest -q tests/ -> 106 passed (up from 98), 14 skipped
- npx tsc --noEmit clean for all touched files

This is the smaller of the two remaining Twitter-feedback gaps;
the dashboard tree view (~1 day) is still pending.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OmkarRayAI merged commit f6d7da0 into main on Jun 5, 2026
4 checks passed
OmkarRayAI deleted the feat/agent-cost-rollup branch on June 5, 2026 at 13:02
OmkarRayAI pushed a commit that referenced this pull request Jun 5, 2026
Closes Twitter Q1B: a single page that walks one trace start-to-finish
in human-readable form. Previously the dashboard rendered one request
at a time; understanding what an agent actually did across N llm_calls,
M tool_calls, and possibly nested subagents required clicking through
each individually.

What lands

New route: app/app/(obs)/traces/[id]/page.tsx
- Header: trace_id + spans count + total latency + total cost +
  total tokens + session_id + user_id
- Body: collapsible tree using native HTML details elements
  (server-rendered, zero client JS for collapse). Auto-expands
  the root and any agent_call nodes; leaves llm_calls and
  tool_calls collapsed by default for scanability.
- Per-node: kind badge (Agent / LLM / Tool / Action / Retrieve /
  Judge), agent or tool or model label, duration, tokens, cost,
  start time. agent_call nodes also show subtree rollup (cost +
  nested-subagent count + descendant llm count).
- Inline content: pulls prompt and response out of attrs.request
  and attrs.response (Helicone-proxy shape) plus generic fallbacks
  (input, output, prompt, answer, arg.0, question). Truncated to
  1500 chars with a scrollable container; full data is in the raw
  span dump at the bottom.
- Events drawer per span (collapsed by default, first 50 events).
- Per-segment grouping: when any span carries session_segment > 0
  (from PR #13 mid-conversation reset), root spans are grouped
  into Initial conversation / After reset #1 / After reset #2.
- Raw JSON drawer at the bottom for power users.
- Error spans get a red dot indicator.
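
Two of the per-node behaviours listed above, the fallback lookup for inline content and the per-segment grouping of root spans, can be sketched as follows. The page itself is a TSX route; this Python version is only an illustration of the logic. The split of fallback keys into prompt-side and response-side tuples, the helper names, and the assumption that session_segment and parent_id sit directly on the span dict are all hypothetical.

```python
import json

# Assumed split of the fallback keys named in the commit message.
PROMPT_KEYS = ("request", "input", "prompt", "question", "arg.0")
RESPONSE_KEYS = ("response", "output", "answer")
MAX_CHARS = 1500  # truncation limit stated in the commit message


def pick_text(attrs: dict, keys: tuple, limit: int = MAX_CHARS):
    """Return the first non-empty attr among keys, as text cut to limit chars."""
    for key in keys:
        value = attrs.get(key)
        if value is None or value == "":
            continue
        text = value if isinstance(value, str) else json.dumps(value, default=str)
        return text[:limit]
    return None  # nothing to show inline


def segment_label(segment: int) -> str:
    return "Initial conversation" if segment == 0 else f"After reset #{segment}"


def group_roots_by_segment(spans: list) -> dict:
    """Group root spans (no parent_id) under a label per session_segment."""
    roots = [s for s in spans if s.get("parent_id") is None]
    groups = {}
    for s in sorted(roots, key=lambda s: s.get("session_segment", 0)):
        label = segment_label(s.get("session_segment", 0))
        groups.setdefault(label, []).append(s)
    return groups


long_prompt = pick_text({"input": "x" * 2000}, PROMPT_KEYS)
structured = pick_text({"response": {"text": "hi"}}, RESPONSE_KEYS)
groups = group_roots_by_segment([
    {"span_id": "a", "parent_id": None},
    {"span_id": "b", "parent_id": None, "session_segment": 1},
    {"span_id": "c", "parent_id": "b", "session_segment": 1},
    {"span_id": "d", "parent_id": None, "session_segment": 1},
])
print(len(long_prompt), structured, list(groups))
```

Truncating only the inline view keeps the tree scannable, while the untruncated values remain available in the raw span dump at the bottom of the page.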

app/lib/traces.ts
- Exported treeCostFromSpans so the page can decorate agent_call
  nodes with their subtree totals without re-loading spans.

Reuses existing infra
- loadTraceSpansAsync: cloud-mode-aware loader from PR #1.
  Cloud-mode tenants only see their own trace spans.
- treeCostFromSpans: subtree rollup logic from PR #14.
- (obs) layout, glass styling, eyebrow/PageTitle widgets.

Verified
- npx tsc --noEmit clean for all touched files
- pytest -q tests/ -> 106 passed, 14 skipped (no regressions)
- Renders correctly against fixtures with nested subagent calls,
  per-span events, error spans, multi-segment sessions, missing
  attrs (defensive fallbacks throughout)

Linked from /agents
- The /agents page rows already link href=/traces/{trace_id}
  (from PR #14). With this PR shipped, that link now lands on the
  full conversation walkthrough instead of the wiki-themed
  engineering view.

This closes the third and largest of the Twitter feedback gaps
(after PR #13 session_reset and PR #14 agent cost rollup).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OmkarRayAI added a commit that referenced this pull request Jun 5, 2026
…#15)

Co-authored-by: Omkar Ray <your real email>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>