feat(cost): cost-aware model router for task complexity (#375) #411
Prateeks16 wants to merge 4 commits into
…nth#375)
CST-004 scope: dynamic cost-aware routing. Scores a task's complexity and routes it to the cheapest model capable of handling it: simple subtasks go to cheap models (e.g. Gemini Flash) while expensive, advanced models are reserved for complex work, optimizing the project budget.
- agentwatch/cost/complexity_router.py: TaskComplexity tiers, a heuristic score_complexity(), a per-model capability table, and CostAwareRouter.route() returning a structured RoutingDecision (chosen model, complexity, estimated cost, alternatives considered, reason). Supports a budget ceiling (downgrades rather than overspending) and a manual model override. Reuses comparator.DEFAULT_PRICING / estimate() as the single source of truth for rates.
- Tests: tier routing (simple/standard/complex), explicit complexity, override + unknown-override error, budget downgrade, considered-list ordering, to_dict shape, scorer boundaries, and custom capability tables.
- Docs: MASTERLIST_STATUS (CST-008) and a CHANGELOG entry.
📝 Walkthrough
Changes: Cost-Aware Model Router (CST-009)
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant CostAwareRouter
participant score_complexity
participant estimate
Caller->>CostAwareRouter: route(signals, override_model, budget_ceiling)
alt override_model provided
CostAwareRouter->>CostAwareRouter: validate override_model in pricing
CostAwareRouter-->>Caller: RoutingDecision(reason="manual_override")
else
CostAwareRouter->>score_complexity: score_complexity(signals)
score_complexity-->>CostAwareRouter: TaskComplexity tier
loop each model in pricing
CostAwareRouter->>estimate: estimate(model, input_tokens, output_tokens)
estimate-->>CostAwareRouter: cost float
end
CostAwareRouter->>CostAwareRouter: filter by capability >= tier
alt no capable model
CostAwareRouter-->>Caller: RoutingDecision(most_capable, reason="fallback")
else cheapest capable cost > budget_ceiling
CostAwareRouter-->>Caller: RoutingDecision(cheapest_overall, reason="budget_ceiling_exceeded")
else
CostAwareRouter-->>Caller: RoutingDecision(cheapest_capable, reason="cost_optimized")
end
end
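
The decision flow in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not the PR's implementation: the model names, per-call costs, and capability ratings are hypothetical, the `route` function is a free function rather than a `CostAwareRouter` method, and the reason strings are copied from the diagram (the actual module may use different ones).

```python
from dataclasses import dataclass
from enum import IntEnum


class TaskComplexity(IntEnum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


@dataclass
class RoutingDecision:
    model: str
    complexity: TaskComplexity
    estimated_cost: float
    reason: str


# Hypothetical per-call costs and capability ratings, for illustration only.
COSTS = {"small": 0.001, "medium": 0.01, "large": 0.10}
CAPABILITY = {
    "small": TaskComplexity.SIMPLE,
    "medium": TaskComplexity.STANDARD,
    "large": TaskComplexity.COMPLEX,
}


def route(complexity, costs=COSTS, capability=CAPABILITY,
          override_model=None, budget_ceiling=None):
    # 1. A manual override wins, but must name a priced model.
    if override_model is not None:
        if override_model not in costs:
            raise ValueError(f"unknown model: {override_model}")
        return RoutingDecision(override_model, complexity,
                               costs[override_model], "manual_override")

    # 2. Rank every priced model from cheapest to most expensive.
    by_cost = sorted(costs, key=costs.get)

    # 3. Keep only models rated at or above the requested tier.
    capable = [m for m in by_cost
               if m in capability and capability[m] >= complexity]
    if not capable:
        # No model is rated for this tier: use the highest-rated one.
        # Unrated models are treated as SIMPLE here (an assumption).
        best = max(costs, key=lambda m: capability.get(m, TaskComplexity.SIMPLE))
        return RoutingDecision(best, complexity, costs[best], "fallback")

    # 4. Downgrade to the cheapest model overall if the ceiling is exceeded.
    choice = capable[0]
    if budget_ceiling is not None and costs[choice] > budget_ceiling:
        cheapest = by_cost[0]
        return RoutingDecision(cheapest, complexity,
                               costs[cheapest], "budget_ceiling_exceeded")
    return RoutingDecision(choice, complexity, costs[choice], "cost_optimized")
```

With these made-up tables, `route(TaskComplexity.STANDARD)` selects `"medium"`, and adding `budget_ceiling=0.005` downgrades the same request to `"small"`.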
Estimated code review effort: 3 (Moderate), ~25 minutes
Pre-merge checks: 5 passed
🧪 PR Test Results
Python 3.12 · commit 96c1d7d
Pull request overview
Adds a new cost-intelligence routing utility that classifies task complexity and selects the cheapest model that is rated capable for that complexity tier, reusing the existing CST-002 pricing/estimation logic.
Changes:
- Introduces `agentwatch.cost.complexity_router` with complexity scoring, capability filtering, budget ceiling downgrade behavior, and manual override support.
- Adds a dedicated test suite for the new router behaviors.
- Updates project documentation (`MASTERLIST_STATUS.md`) and release notes (`CHANGELOG.md`) to reflect the new feature.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| agentwatch/cost/complexity_router.py | Implements complexity tiers, heuristic scorer, capability table, and routing decision logic on top of CST-002 cost estimates. |
| tests/test_cost_router.py | Adds unit tests covering routing tiers, overrides, budget ceiling downgrade, scoring boundaries, and custom capability maps. |
| MASTERLIST_STATUS.md | Adds a status row documenting the new CST item for the router. |
| CHANGELOG.md | Adds an Unreleased “Added” entry describing the new cost-aware router. |
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_cost_router.py (1)
93-99: ⚡ Quick win: Add regression tests for empty capability-map semantics and negative token rejection.
These two edge cases are part of the router’s public contract and are currently unpinned by tests.
Possible test additions

     def test_custom_capability_table_is_respected():
    @@
         d = router.route(complexity=TaskComplexity.COMPLEX, input_tokens=1000)
         assert d.model == "claude-haiku-4-5"
    +
    +
    +def test_empty_capability_map_is_respected():
    +    router = CostAwareRouter(capability={})
    +    d = router.route(complexity=TaskComplexity.COMPLEX, input_tokens=1000)
    +    assert d.reason == "no_model_rated_for_tier_using_most_capable"
    +
    +
    +@pytest.mark.parametrize("input_tokens,output_tokens", [(-1, 1), (1, -1)])
    +def test_negative_tokens_rejected(input_tokens, output_tokens):
    +    with pytest.raises(ValueError, match="must be >= 0"):
    +        CostAwareRouter().route(
    +            complexity=TaskComplexity.SIMPLE,
    +            input_tokens=input_tokens,
    +            output_tokens=output_tokens,
    +        )

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cost_router.py` around lines 93 - 99, Add two new regression test functions to cover edge cases in the CostAwareRouter's public contract. First, create a test that verifies the router's behavior when initialized with an empty capability map dictionary, ensuring it falls back to default model selection logic. Second, create a test that verifies the router properly rejects or handles negative token counts passed to the route method, confirming it either raises an appropriate exception or returns a valid response. Both tests should use the CostAwareRouter class and its route method to validate these boundary conditions are handled correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agentwatch/cost/complexity_router.py`:
- Around line 119-120: The constructor for complexity_router.py uses the `or`
operator for default fallbacks on the _pricing and _capability attributes, which
treats empty dictionaries as falsy and silently replaces them with
DEFAULT_PRICING and DEFAULT_CAPABILITY. This breaks the API contract because
callers may explicitly pass empty dictionaries to indicate "use no overrides" or
"use fallback behavior". Replace the `or DEFAULT_PRICING` and `or
DEFAULT_CAPABILITY` fallbacks with explicit None checks using the pattern
`attribute if attribute is not None else DEFAULT_VALUE` so that empty
dictionaries passed by callers are preserved instead of being silently replaced
with defaults.
- Around line 141-145: Add validation to ensure that the token counts are
non-negative before calling the estimate() function. Check both tokens_in (which
is set from input_tokens or signals.input_tokens) and output_tokens to confirm
they are greater than or equal to zero. If either token count is negative,
handle the validation error appropriately (such as raising an exception or
returning an error response) to prevent negative cost calculations that could
lead to incorrect model selection in the totals list.
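
The two inline findings above come down to a falsy-default pitfall and a missing input guard. A minimal sketch of both, assuming a stand-in `DEFAULT_CAPABILITY` table and hypothetical helper names (`pick_with_or`, `pick_with_none_check`, `validate_tokens`) that are not part of the PR:

```python
# Hypothetical stand-in for the module's real default table.
DEFAULT_CAPABILITY = {"model-a": 1}


def pick_with_or(capability=None):
    # Flagged pattern: {} is falsy, so an intentionally empty table
    # is silently replaced by the default.
    return capability or DEFAULT_CAPABILITY


def pick_with_none_check(capability=None):
    # Suggested pattern: only a missing argument falls back to the default.
    return capability if capability is not None else DEFAULT_CAPABILITY


def validate_tokens(input_tokens, output_tokens):
    # Reject negative counts before any cost is computed, since a negative
    # count would produce a negative (and therefore "cheapest") estimate.
    # The message wording is an assumption.
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be >= 0")
```

Passing `{}` to `pick_with_or` returns the default table, while `pick_with_none_check` returns the empty dict unchanged, which is the behavior the review asks for.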
- Renumber to CST-009 (CST-008 is already used by stale-session eviction in test_cost.py) across module/test docstrings and MASTERLIST.
- Use explicit `is None` checks for pricing/capability so a caller can pass an intentionally empty table, and reject an empty pricing table with a clear ValueError (estimate() would otherwise coerce it back to DEFAULT_PRICING).
- Type capability as dict[str, TaskComplexity] (annotation + param) to match the enum values actually stored/passed.
- Default unknown-model rank to TaskComplexity.SIMPLE in the no-capable-model fallback for consistency.
- Test: assert the considered list length against len(DEFAULT_PRICING) instead of a hardcoded 7, and add an empty-pricing guard test.
…vanth#375)
- route() rejects negative input/output token counts with a ValueError before computing costs (negative tokens would yield negative "cheapest" estimates).
- Tests: explicit empty-capability-map fallback and negative-token rejection.
The `or` → `is None` constructor fix CodeRabbit flagged was already applied in 869cd46.
@sreerevanth Please review this pull request
Resolve CHANGELOG conflict by keeping both Unreleased entries (cost-aware model router and the already-merged scheduled red-team automation).
Closes #375
Summary
Implements dynamic cost-aware routing (CST-004 cost-intelligence scope): score a task's complexity and route it to the cheapest model capable of handling it. Simple subtasks go to cheap models (e.g. Gemini Flash); expensive, advanced models are reserved for genuinely complex work — optimizing the project budget.
The existing cost modules don't cover this: `comparator.py` (CST-002) gives per-model estimates, `router.py` (CST-003) does health-based failover, and `predictor.py` (CST-004) predicts cost from history. This adds the missing complexity → capability → cheapest-model selector, reusing `comparator.DEFAULT_PRICING` / `estimate()` as the single source of truth for rates.
What's added: `agentwatch/cost/complexity_router.py`
- `TaskComplexity` (SIMPLE / STANDARD / COMPLEX) and a per-model capability table.
- `score_complexity(TaskSignals)`: heuristic from prompt size + reasoning/tool flags; callers can pass an explicit complexity instead.
- `CostAwareRouter.route(...)` → `RoutingDecision(model, complexity, estimated_cost, reason, considered)`: when a budget ceiling applies, the decision reports `budget_ceiling_exceeded_downgraded` rather than silently overspending.
Behavior
- SIMPLE: `gemini-1.5-flash` (cheapest)
- STANDARD: `gemini-1.5-pro` (cheapest with capability ≥ standard)
- COMPLEX: `claude-opus-4-5`
Design notes
- `route(scorer=...)` accepts a pluggable classifier (e.g. an LLM-based one), matching the design question I raised on the issue.
Testing
- `tests/test_cost_router.py` (14 tests): tier routing, explicit complexity, override + unknown-override error, budget downgrade, considered-list ordering/completeness, `to_dict` shape, scorer boundaries (parametrized), and custom capability tables.
Docs
- `MASTERLIST_STATUS.md`: new `CST-008` row (CST-004/005 are already taken by the predictor/anomaly detector; the issue uses CST-004 as a category label).
- `CHANGELOG.md`: `Unreleased > Added`.
The repo-wide `ruff check .` gate has pre-existing failures in unrelated test files on `main`; this PR's changed files are lint-clean.
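
The heuristic scorer and the pluggable `scorer` hook described in this PR description could look roughly like the sketch below. The field names `needs_reasoning` and `uses_tools`, the token thresholds, and the `classify` helper are all assumptions made for illustration; only `TaskSignals`, `TaskComplexity`, `score_complexity`, and `input_tokens` are names that appear in the PR.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class TaskComplexity(IntEnum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


@dataclass
class TaskSignals:
    input_tokens: int = 0
    needs_reasoning: bool = False  # hypothetical field name
    uses_tools: bool = False       # hypothetical field name


def score_complexity(signals: TaskSignals) -> TaskComplexity:
    # Thresholds are invented for this sketch, not taken from the PR.
    if signals.needs_reasoning or signals.input_tokens > 8000:
        return TaskComplexity.COMPLEX
    if signals.uses_tools or signals.input_tokens > 1000:
        return TaskComplexity.STANDARD
    return TaskComplexity.SIMPLE


# Any callable with this shape can stand in for the heuristic,
# for example a wrapper around an LLM-based classifier.
Scorer = Callable[[TaskSignals], TaskComplexity]


def classify(signals: TaskSignals,
             scorer: Scorer = score_complexity) -> TaskComplexity:
    # Delegate to whichever scorer the caller supplied.
    return scorer(signals)
```

Keeping the scorer as a plain callable means a caller can swap in a different classifier without touching the routing logic, which is the design question the description refers to.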