
Sub-workflow agent usage/cost is omitted from the parent workflow's Token Usage Summary (+ missing pricing for claude-opus-4.8 / gpt-5.5) #266

Description

@brrusino

Summary

The end-of-run Token Usage Summary / Cost Breakdown badly under-reports any workflow that uses type: workflow sub-workflow steps, because a child sub-workflow's UsageTracker is never merged back into the parent's. The printed total reflects only the agents the root engine recorded directly, so every sub-workflow agent is invisible.

On a real run with 3 sub-workflow agents, the summary printed 116,529 input tokens / $0.43, while the actual figure (summed from the agent_completed events) was 462,844 input tokens. The analyze (119k), fix_code (135k), and checker (92k) agents were all missing from the aggregate.
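Working the numbers from that run: the gap between the two totals accounts for the three missing agents (their per-agent figures are rounded to the nearest thousand), and the printed summary captured only about a quarter of the real input tokens:

```latex
\begin{aligned}
462{,}844 - 116{,}529 &= 346{,}315 \approx 119\text{k} + 135\text{k} + 92\text{k} = 346\text{k} \\
\frac{116{,}529}{462{,}844} &\approx 0.25
\end{aligned}
```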

A second, independent bug compounds it: two current models are absent from DEFAULT_PRICING, so even where their tokens are counted they're costed at $0.

Version: conductor-cli v0.1.19.

Bug 1 — sub-workflow usage is never rolled into the parent

type: workflow steps run in a child WorkflowEngine with its own UsageTracker. The child is run via _run_child_engine, which both sub-workflow entry points funnel through:

  • _execute_subworkflow (≈ line 1367): return await self._run_child_engine(child_engine, sub_inputs, agent). It returns only the output; child_engine.usage_tracker is dropped.
  • _execute_subworkflow_with_inputs (≈ line 1480): captures usage = child_engine.usage_tracker.get_summary(), but the for-each caller (≈ line 5073) uses it only for a for_each_item_completed event payload and never merges it into self.usage_tracker.

The final summary is built solely from the root engine's tracker:

# get_execution_summary(), ≈ line 5577
usage = self.usage_tracker.get_summary()
summary["usage"] = { "total_input_tokens": usage.total_input_tokens, ... }

Since no code path ever extends the parent tracker with child records, every sub-workflow agent is omitted from the printed total. (The live --web dashboard looks roughly right because each child engine shares the parent's event_emitter and emits its own agent_completed events. Only the aggregate is wrong, which makes this easy to misdiagnose.)

Suggested fix

_run_child_engine is the single chokepoint both paths use, and it already receives the child engine. Merging there is sufficient and transitive (a grandchild merges into its child, and that child, now carrying the grandchild's records, merges into the parent). There is no double counting, because the parent never records sub-workflow agents itself:

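async def _run_child_engine(self, child_engine, sub_inputs, agent):
    try:
        # _orig_run_child_engine stands in for the current implementation
        # (the saved original when monkeypatching; in-tree, just the existing body).
        return await self._orig_run_child_engine(child_engine, sub_inputs, agent)
    finally:
        # Roll the sub-workflow's per-agent usage into the parent so the
        # final summary + cost breakdown include sub-workflow agents.
        # (_agents is UsageTracker's private list; a public merge method would be cleaner.)
        self.usage_tracker._agents.extend(
            child_engine.usage_tracker.get_summary().agents
        )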

(Doing the merge in finally also means an expensive child that ultimately fails is still costed.)
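To check the transitivity and no-double-counting argument, here is a toy model with a root, a child and a grandchild engine. ToyEngine, ToyTracker and AgentUsage are hypothetical stand-ins, not conductor classes. With the merge disabled it reproduces the bug; with it enabled, every level is counted exactly once, including when the grandchild raises after spending tokens:

```python
from dataclasses import dataclass, field


@dataclass
class AgentUsage:
    # Hypothetical per-agent record; conductor's real record type may differ.
    name: str
    input_tokens: int


@dataclass
class ToyTracker:
    # Stand-in for UsageTracker: just a flat list of per-agent records.
    agents: list = field(default_factory=list)

    def total_input_tokens(self) -> int:
        return sum(a.input_tokens for a in self.agents)


class ToyEngine:
    """Hypothetical engine: records its own agents, then runs child engines."""

    def __init__(self, agents, children=(), merge=True, fail=False):
        self.tracker = ToyTracker()
        self.agents = agents            # list of (name, input_tokens) pairs
        self.children = list(children)
        self.merge = merge              # False reproduces the bug
        self.fail = fail                # raise after recording, like a failed child

    def run_child(self, child):
        try:
            child.run()
        finally:
            # The suggested fix: fold the child's records into this tracker.
            # They already include the grandchild's records (transitive), and
            # this engine never recorded them itself (no double counting).
            if self.merge:
                self.tracker.agents.extend(child.tracker.agents)

    def run(self):
        for name, tokens in self.agents:
            self.tracker.agents.append(AgentUsage(name, tokens))
        if self.fail:
            raise RuntimeError("child failed after spending tokens")
        for child in self.children:
            self.run_child(child)


def build(merge, fail_grandchild=False):
    grandchild = ToyEngine([("grandchild_agent", 300)], merge=merge, fail=fail_grandchild)
    child = ToyEngine([("child_agent", 200)], [grandchild], merge=merge)
    return ToyEngine([("root_agent", 100)], [child], merge=merge)


buggy = build(merge=False)
buggy.run()
print(buggy.tracker.total_input_tokens())    # 100: only the root's own agent

fixed = build(merge=True)
fixed.run()
print(fixed.tracker.total_input_tokens())    # 600: every level, each counted once

failing = build(merge=True, fail_grandchild=True)
try:
    failing.run()
except RuntimeError:
    pass
print(failing.tracker.total_input_tokens())  # 600: the failed grandchild is still costed
```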

Bug 2 — DEFAULT_PRICING is missing current models

conductor/engine/pricing.py DEFAULT_PRICING has no entry for claude-opus-4.8 or gpt-5.5. get_pricing returns None for both (the '-'-delimited versioned-suffix fallback doesn't match either), so calculate_cost returns None and those agents are costed at $0. The fuzzy fallback can also mis-price: had the model been named claude-opus-4-... rather than claude-opus-4.8, it would have matched the older claude-opus-4 entry ($15/$75 per Mtok), i.e. ~3× the real Opus 4.x rate. Silent fuzzy matching across model families is a sharp edge in its own right.
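To make both failure modes concrete, here is a minimal sketch of a '-'-delimited suffix fallback. It is an assumption about the general shape of get_pricing's fallback, not its actual code, and the one-entry PRICING table is illustrative:

```python
from typing import Optional

# Hypothetical one-entry table: (input, output) USD per million tokens.
PRICING = {
    "claude-opus-4": (15.00, 75.00),
}


def get_pricing_sketch(model: str) -> Optional[tuple]:
    """Assumed fallback: exact match first, then drop trailing
    '-'-separated segments one at a time until some key matches."""
    if model in PRICING:
        return PRICING[model]
    parts = model.split("-")
    while len(parts) > 1:
        parts.pop()
        candidate = "-".join(parts)
        if candidate in PRICING:
            return PRICING[candidate]
    return None


print(get_pricing_sketch("claude-opus-4.8"))  # None: tries "claude-opus", "claude"
print(get_pricing_sketch("gpt-5.5"))          # None: tries "gpt"
print(get_pricing_sketch("claude-opus-4-8"))  # (15.0, 75.0): the old Opus 4 rate
```

The "4.8" suffix is never split because it contains no '-', which is why the exact model name slips through to $0, while a dash-versioned name silently inherits an older generation's price.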

Suggested fix

Add current entries (and consider sourcing this table so it doesn't go stale):

"claude-opus-4.8": ModelPricing(input_per_mtok=5.00, output_per_mtok=25.00,
                                cache_read_per_mtok=0.50, cache_write_per_mtok=6.25),
"gpt-5.5": ModelPricing(input_per_mtok=2.00, output_per_mtok=8.00),

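The arithmetic behind a per-Mtok cost calculation is sketched below: each token count divided by a million, times its rate. ModelPricing's cache-rate defaults, estimate_cost and its signature are hypothetical stand-ins rather than conductor's actual calculate_cost, and the rates are simply the ones proposed above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelPricing:
    # Mirrors the field names above; the 0.0 cache-rate defaults are an assumption.
    input_per_mtok: float
    output_per_mtok: float
    cache_read_per_mtok: float = 0.0
    cache_write_per_mtok: float = 0.0


PRICING = {
    "claude-opus-4.8": ModelPricing(5.00, 25.00, cache_read_per_mtok=0.50,
                                    cache_write_per_mtok=6.25),
    "gpt-5.5": ModelPricing(2.00, 8.00),
}


def estimate_cost(model, input_tokens, output_tokens,
                  cache_read_tokens=0, cache_write_tokens=0) -> Optional[float]:
    """Hypothetical helper: USD = sum over token kinds of tokens / 1e6 * rate."""
    p = PRICING.get(model)
    if p is None:
        return None  # unknown model: a caller that treats None as 0 under-reports
    return (input_tokens * p.input_per_mtok
            + output_tokens * p.output_per_mtok
            + cache_read_tokens * p.cache_read_per_mtok
            + cache_write_tokens * p.cache_write_per_mtok) / 1_000_000


print(estimate_cost("claude-opus-4.8", 1_000_000, 100_000))      # 7.5
print(estimate_cost("some-unlisted-model", 1_000_000, 100_000))  # None
```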
Repro

  1. Run any workflow with a type: workflow step whose sub-workflow invokes at least one agent.
  2. Compare the printed "Token Usage Summary" totals against the sum of total_input_tokens / cost_usd across all agent_completed events (e.g. from GET /api/state on the --web dashboard, or the --web logs export).
  3. The summary totals exclude every sub-workflow agent.
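For step 2, a small helper can do the summing. The event shape here is an assumption (a list of dicts with a "type" key and the per-agent fields either at the top level or under "data"), and the sample events are made up; adapt the field access to whatever GET /api/state or the logs export actually returns:

```python
import json
from typing import Iterable, Mapping


def sum_agent_completed(events: Iterable[Mapping]) -> dict:
    """Total input tokens and cost across agent_completed events.

    Assumed (hypothetical) shape: {"type": ..., "data": {...}} or a flat dict.
    Missing or null fields count as zero.
    """
    totals = {"agents": 0, "total_input_tokens": 0, "cost_usd": 0.0}
    for event in events:
        if event.get("type") != "agent_completed":
            continue
        data = event.get("data") or event
        totals["agents"] += 1
        totals["total_input_tokens"] += int(data.get("total_input_tokens") or 0)
        totals["cost_usd"] += float(data.get("cost_usd") or 0.0)
    return totals


sample = json.loads("""[
  {"type": "agent_completed", "data": {"total_input_tokens": 1000, "cost_usd": 0.01}},
  {"type": "workflow_started"},
  {"type": "agent_completed", "data": {"total_input_tokens": 2500, "cost_usd": null}}
]""")
print(sum_agent_completed(sample))
```

Comparing the returned total_input_tokens with the printed "Token Usage Summary" should show the gap immediately on any run with a sub-workflow agent.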

Workaround

For anyone hitting this before a fix lands: both bugs can be worked around in-process, without editing site-packages, via a sitecustomize.py on PYTHONPATH that (a) adds the missing DEFAULT_PRICING entries in place and (b) wraps WorkflowEngine._run_child_engine with the finally-merge above. Happy to send a PR if a maintainer confirms that the chokepoint merge is the preferred approach.
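A minimal sketch of the two helpers such a sitecustomize.py could call. The helper names are hypothetical, the WorkflowEngine module path and ModelPricing's location in the wiring comment are guesses, and the wrapper reaches into UsageTracker's private _agents list exactly as the suggested fix does:

```python
import functools


def add_missing_pricing(table, entries):
    """Add pricing entries to an existing dict in place (e.g. DEFAULT_PRICING),
    leaving any model the table already defines untouched."""
    for model, pricing in entries.items():
        table.setdefault(model, pricing)


def patch_child_usage_rollup(engine_cls):
    """Wrap engine_cls._run_child_engine so each child's per-agent usage is
    appended to the parent's tracker, even when the child raises."""
    original = engine_cls._run_child_engine
    if getattr(original, "_usage_rollup_patched", False):
        return  # already wrapped: wrapping twice would double count

    @functools.wraps(original)
    async def _run_child_engine(self, child_engine, sub_inputs, agent):
        try:
            return await original(self, child_engine, sub_inputs, agent)
        finally:
            # Same merge as the suggested fix; _agents is a private attribute.
            self.usage_tracker._agents.extend(
                child_engine.usage_tracker.get_summary().agents
            )

    _run_child_engine._usage_rollup_patched = True
    engine_cls._run_child_engine = _run_child_engine


# Example sitecustomize.py wiring (everything except conductor/engine/pricing.py
# and DEFAULT_PRICING is a guess; adjust to your install):
#
#   from conductor.engine import pricing
#   from conductor.engine.workflow import WorkflowEngine  # hypothetical path
#   add_missing_pricing(pricing.DEFAULT_PRICING, {
#       "claude-opus-4.8": pricing.ModelPricing(  # assumed to live in pricing.py
#           input_per_mtok=5.00, output_per_mtok=25.00,
#           cache_read_per_mtok=0.50, cache_write_per_mtok=6.25),
#       "gpt-5.5": pricing.ModelPricing(input_per_mtok=2.00, output_per_mtok=8.00),
#   })
#   patch_child_usage_rollup(WorkflowEngine)
```

Using setdefault means a future release that ships its own entries wins over the workaround's, and the patched-flag check keeps an accidental second call from wrapping twice and double counting.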
