Skip to content

fix(cost): never fabricate spend — flat-rate engines $0, cache-aware …#204

Merged
cukas merged 2 commits into
mainfrom
feat/honest-cost-tracking
Jun 11, 2026
Merged

fix(cost): never fabricate spend — flat-rate engines $0, cache-aware …#204
cukas merged 2 commits into
mainfrom
feat/honest-cost-tracking

Conversation

@cukas

@cukas cukas commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

…pricing, budget checks on metered cost only

Root cause of the '$8.79 actual vs $0.07 estimated' budget warning on a kimi coding-plan run: (1) kimi-for-coding/minimax-/zai-coding-plan had no price-table entry and fell to a made-up $2/Mtok default although their marginal cost is $0 (flat-rate subscription); (2) AI-SDK promptTokens include cache reads, so a multi-turn tool loop re-priced the whole context every call (~4.4M 'tokens'); (3) the plan budget check and plan-mode planning watchdog read the ballpark totalCostUsd that token-tracker itself documents as 'NOT a bill'.

  • isFlatRateEngine (naming-convention heuristic until EngineDefinition carries billing): estimateCost returns 0; sdk usage on flat-rate engines counts as unmetered in getStats.
  • estimateCostCacheAware: cached input priced at the 10% cache-read rate, clamped against promptTokens only (never discounts output tokens).
  • plan-mode snapshotTokens + session planning-budget warning read meteredCostUsd.
  • Review round (claude/codex/agy): both codex findings fixed (planning watchdog ballpark, clamp vs promptTokens).

cukas and others added 2 commits June 11, 2026 16:54
…pricing, budget checks on metered cost only

Root cause of the '$8.79 actual vs $0.07 estimated' budget warning on a kimi
coding-plan run: (1) kimi-for-coding/minimax-/zai-coding-plan had no price-table
entry and fell to a made-up $2/Mtok default although their marginal cost is $0
(flat-rate subscription); (2) AI-SDK promptTokens include cache reads, so a
multi-turn tool loop re-priced the whole context every call (~4.4M 'tokens');
(3) the plan budget check and plan-mode planning watchdog read the ballpark
totalCostUsd that token-tracker itself documents as 'NOT a bill'.

- isFlatRateEngine (naming-convention heuristic until EngineDefinition carries
  billing): estimateCost returns 0; sdk usage on flat-rate engines counts as
  unmetered in getStats.
- estimateCostCacheAware: cached input priced at the 10% cache-read rate,
  clamped against promptTokens only (never discounts output tokens).
- plan-mode snapshotTokens + session planning-budget warning read meteredCostUsd.
- Review round (claude/codex/agy): both codex findings fixed (planning watchdog
  ballpark, clamp vs promptTokens).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cukas cukas merged commit ef46f78 into main Jun 11, 2026
2 checks passed
@cukas cukas deleted the feat/honest-cost-tracking branch June 11, 2026 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant