feat(cost): cost-aware model router for task complexity (#375) by Prateeks16 · Pull Request #411 · sreerevanth/AgentWatch

Prateeks16 · 2026-06-18T08:34:53Z

Closes #375

Summary

Implements dynamic cost-aware routing (CST-004 cost-intelligence scope): score a task's complexity and route it to the cheapest model capable of handling it. Simple subtasks go to cheap models (e.g. Gemini Flash); expensive, advanced models are reserved for genuinely complex work — optimizing the project budget.

The existing cost modules don't cover this: comparator.py (CST-002) gives per-model estimates, router.py (CST-003) does health-based failover, and predictor.py (CST-004) predicts cost from history. This adds the missing complexity → capability → cheapest-model selector, reusing comparator.DEFAULT_PRICING / estimate() as the single source of truth for rates.

What's added — `agentwatch/cost/complexity_router.py`

- `TaskComplexity` (SIMPLE / STANDARD / COMPLEX) and a per-model capability table.
- `score_complexity(TaskSignals)`: a heuristic based on prompt size plus reasoning/tool flags; callers can pass an explicit complexity instead.
- `CostAwareRouter.route(...)` → `RoutingDecision(model, complexity, estimated_cost, reason, considered)`:
  - selects the cheapest model whose capability ≥ task complexity,
  - budget ceiling: if even the cheapest capable model exceeds it, downgrades to the cheapest model overall and flags `budget_ceiling_exceeded_downgraded` rather than silently overspending,
  - manual override to force a specific model.
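The selection rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `complexity_router.py`: `pick_model`, `Decision`, the model names, the costs, and every reason string except `budget_ceiling_exceeded_downgraded` are assumptions made for the example, and the real module takes its costs from `comparator.estimate()` rather than a hard-coded dict.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class TaskComplexity(IntEnum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


@dataclass(frozen=True)
class Decision:
    # Hypothetical, trimmed-down stand-in for RoutingDecision.
    model: str
    estimated_cost: float
    reason: str


def pick_model(
    complexity: TaskComplexity,
    costs: dict[str, float],
    capability: dict[str, TaskComplexity],
    budget_ceiling: Optional[float] = None,
    override_model: Optional[str] = None,
) -> Decision:
    """Pick the cheapest model rated for `complexity` (illustrative only)."""
    if not costs:
        raise ValueError("costs must not be empty")

    # A manual override wins, but only for a model we can price.
    if override_model is not None:
        if override_model not in costs:
            raise ValueError(f"unknown model: {override_model}")
        return Decision(override_model, costs[override_model], "manual_override")

    # Cheapest first; models missing from the capability table count as SIMPLE.
    ranked = sorted(costs.items(), key=lambda item: item[1])
    capable = [
        (name, cost)
        for name, cost in ranked
        if capability.get(name, TaskComplexity.SIMPLE) >= complexity
    ]

    if not capable:
        # Nothing is rated for this tier: fall back to the most capable model.
        name, cost = max(
            ranked, key=lambda item: capability.get(item[0], TaskComplexity.SIMPLE)
        )
        return Decision(name, cost, "no_model_rated_for_tier")

    name, cost = capable[0]
    if budget_ceiling is not None and cost > budget_ceiling:
        # Downgrade to the cheapest model overall and say so in the reason.
        name, cost = ranked[0]
        return Decision(name, cost, "budget_ceiling_exceeded_downgraded")

    return Decision(name, cost, "cheapest_capable")


# Hypothetical per-call costs (USD) and capability ranks for made-up models.
COSTS = {"small": 0.001, "medium": 0.010, "large": 0.100}
CAPABILITY = {
    "small": TaskComplexity.SIMPLE,
    "medium": TaskComplexity.STANDARD,
    "large": TaskComplexity.COMPLEX,
}
```

With these made-up numbers, a COMPLEX task and no ceiling selects `large`, while the same task with `budget_ceiling=0.05` falls back to `small` and carries the downgrade reason, so the caller can see that capability was traded for budget.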

Behavior

| Task | Routed to |
| --- | --- |
| Simple (short, no tools) | `gemini-1.5-flash` (cheapest) |
| Standard | `gemini-1.5-pro` (cheapest with capability ≥ standard) |
| Complex (reasoning / huge input) | `claude-opus-4-5` |
| Complex + tight budget ceiling | downgraded to cheapest overall, flagged |

Design notes

- Heuristic scorer for v1, but `route(scorer=...)` accepts a pluggable classifier (e.g. an LLM-based one), matching the design question I raised on the issue.
- Capability ranks are conservative: a model absent from the table is treated as SIMPLE-only, so an unknown/cheap model never gets a complex task.
- Standalone module that callers opt into; no existing dispatch point is forced to change.
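A rough sketch of how a heuristic scorer, a pluggable replacement, and the conservative capability default might fit together. The `Signals` fields, the four-characters-per-token estimate, and the token thresholds are all assumptions made for illustration; the PR text does not state the module's actual field names or cut-offs.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class TaskComplexity(IntEnum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


@dataclass(frozen=True)
class Signals:
    # Hypothetical stand-in for TaskSignals.
    prompt_chars: int = 0
    needs_reasoning: bool = False
    uses_tools: bool = False

    @property
    def input_tokens(self) -> int:
        # Assumed rule of thumb: roughly four characters per token.
        return self.prompt_chars // 4


Scorer = Callable[[Signals], TaskComplexity]


def heuristic_scorer(signals: Signals) -> TaskComplexity:
    # Thresholds are illustrative, not the module's real values.
    if signals.needs_reasoning or signals.input_tokens > 20_000:
        return TaskComplexity.COMPLEX
    if signals.uses_tools or signals.input_tokens > 2_000:
        return TaskComplexity.STANDARD
    return TaskComplexity.SIMPLE


def resolve_complexity(
    signals: Signals,
    explicit: Optional[TaskComplexity] = None,
    scorer: Scorer = heuristic_scorer,
) -> TaskComplexity:
    # An explicit tier from the caller takes precedence over any scorer.
    if explicit is not None:
        return explicit
    return scorer(signals)


def capability_of(model: str, table: dict[str, TaskComplexity]) -> TaskComplexity:
    # Unknown models are treated as SIMPLE-only, the conservative default.
    return table.get(model, TaskComplexity.SIMPLE)
```

Because the scorer is just a callable from signals to a tier, an LLM-based classifier could be swapped in without touching the selection logic, for example `resolve_complexity(signals, scorer=my_llm_scorer)` where `my_llm_scorer` is whatever the caller provides.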

Testing

tests/test_cost_router.py (14 tests): tier routing, explicit complexity, override + unknown-override error, budget downgrade, considered-list ordering/completeness, to_dict shape, scorer boundaries (parametrized), and custom capability tables.

pytest tests/test_cost_router.py tests/test_cost.py  -> 36 passed
ruff check / format (changed files)                  -> clean

Docs

MASTERLIST_STATUS.md — new CST-008 row (CST-004/005 are already taken by the predictor/anomaly detector; the issue uses CST-004 as a category label).
CHANGELOG.md — Unreleased > Added.

_The repo-wide `ruff check .` gate has pre-existing failures in unrelated test files on main; this PR's changed files are lint-clean._

Summary by CodeRabbit

Release Notes

New Features
- Added a cost-aware model router that evaluates task complexity and routes to the cheapest capable model, improving cost efficiency.
- Supports a configurable budget ceiling with an option to manually override the selected model.
- Provides detailed routing diagnostics (estimated cost, complexity tier, and selection rationale).
Documentation
- Updated the changelog and the feature status tracker to reflect the new cost intelligence capability.
Tests
- Added coverage for routing decisions, validation, budget downgrades, and router diagnostics.

…nth#375)

CST-004 scope: dynamic cost-aware routing. Scores a task's complexity and routes it to the cheapest model capable of handling it: simple subtasks go to cheap models (e.g. Gemini Flash) while expensive, advanced models are reserved for complex work, optimizing the project budget.

- agentwatch/cost/complexity_router.py: TaskComplexity tiers, a heuristic score_complexity(), a per-model capability table, and CostAwareRouter.route() returning a structured RoutingDecision (chosen model, complexity, estimated cost, alternatives considered, reason). Supports a budget ceiling (downgrades rather than overspending) and a manual model override. Reuses comparator.DEFAULT_PRICING / estimate() as the single source of truth for rates.
- Tests: tier routing (simple/standard/complex), explicit complexity, override + unknown-override error, budget downgrade, considered-list ordering, to_dict shape, scorer boundaries, and custom capability tables.
- Docs: MASTERLIST_STATUS (CST-008) and a CHANGELOG entry.

coderabbitai · 2026-06-18T08:35:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ef3e1b08-7c24-48f6-b890-34ee7d899959

📥 Commits

Reviewing files that changed from the base of the PR and between ffe5f69 and 96c1d7d.

📒 Files selected for processing (4)

CHANGELOG.md
MASTERLIST_STATUS.md
agentwatch/cost/complexity_router.py
tests/test_cost_router.py

✅ Files skipped from review due to trivial changes (2)

MASTERLIST_STATUS.md
CHANGELOG.md

🚧 Files skipped from review as they are similar to previous changes (1)

agentwatch/cost/complexity_router.py

📝 Walkthrough

A new agentwatch/cost/complexity_router.py module is added implementing a cost-aware model router. It defines TaskComplexity tiers, TaskSignals for heuristic scoring, CostAwareRouter that routes to the cheapest sufficiently capable model using DEFAULT_PRICING, and RoutingDecision for structured output. A comprehensive test suite and changelog/status entries accompany it.

Changes

Cost-Aware Model Router (CST-009)

| Layer / File(s) | Summary |
| --- | --- |
| Complexity types, signals, and scoring `agentwatch/cost/complexity_router.py` | Defines `TaskComplexity` enum (`SIMPLE=1`, `STANDARD=2`, `COMPLEX=3`), `DEFAULT_CAPABILITY` mapping, `TaskSignals` dataclass with computed `input_tokens`, and `score_complexity()` which classifies tasks from token count and requirement flags. |
| RoutingDecision and CostAwareRouter `agentwatch/cost/complexity_router.py` | Defines `RoutingDecision` with `to_dict()` serialization. `CostAwareRouter.route()` computes per-model costs via `estimate()`, resolves complexity tier, validates `override_model`, falls back to most-capable model when no rated model exists, selects cheapest capable model, and downgrades to overall cheapest when `budget_ceiling` is exceeded. Exports all public symbols via `__all__`. |
| Comprehensive test suite `tests/test_cost_router.py` | Adds test functions covering all routing branches (simple/standard/complex, explicit complexity override, override_model with unknown-model rejection, budget-ceiling downgrade), diagnostic validation (considered-list completeness and cost-sort order, `to_dict()` shape), complexity-scoring boundaries via parametrized `TaskSignals`, custom capability table behavior, empty pricing table validation, and negative token input validation. |
| Changelog and status tracking `CHANGELOG.md`, `MASTERLIST_STATUS.md` | Records CST-009 feature addition in the Unreleased section and marks the feature complete in the Phase 7 — Cost Intelligence table with test suite reference. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant CostAwareRouter
    participant score_complexity
    participant estimate

    Caller->>CostAwareRouter: route(signals, override_model, budget_ceiling)
    alt override_model provided
        CostAwareRouter->>CostAwareRouter: validate override_model in pricing
        CostAwareRouter-->>Caller: RoutingDecision(reason="manual_override")
    else
        CostAwareRouter->>score_complexity: score_complexity(signals)
        score_complexity-->>CostAwareRouter: TaskComplexity tier
        loop each model in pricing
            CostAwareRouter->>estimate: estimate(model, input_tokens, output_tokens)
            estimate-->>CostAwareRouter: cost float
        end
        CostAwareRouter->>CostAwareRouter: filter by capability >= tier
        alt no capable model
            CostAwareRouter-->>Caller: RoutingDecision(most_capable, reason="fallback")
        else cheapest capable cost > budget_ceiling
            CostAwareRouter-->>Caller: RoutingDecision(cheapest_overall, reason="budget_ceiling_exceeded")
        else
            CostAwareRouter-->>Caller: RoutingDecision(cheapest_capable, reason="cost_optimized")
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

Medium

Poem

🐇 A router hops through models with care,
Counting tokens floating through the air.
Simple tasks take the cheap path down,
Complex thoughts earn the capable crown.
Budget ceiling? No worries here—
The cheapest fallback will appear!
route() away, the savings are near. 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: implementing a cost-aware model router that handles task complexity-based routing, which is the core focus of the PR. |
| Linked Issues check | ✅ Passed | The PR fully implements all coding requirements from issue `#375`: cost-aware routing with task complexity scoring, routing to cheaper models for simple tasks, reserving advanced models for complex operations, and budget optimization. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the linked issue `#375`: core router implementation, tests, documentation updates (CHANGELOG and MASTERLIST_STATUS), with no unrelated modifications. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 89.47% which is sufficient. The required threshold is 80.00%. |

github-actions · 2026-06-18T08:36:04Z

🧪 PR Test Results

| Check | Result |
| --- | --- |
| Tests (`pytest tests/`) | ❌ failure |
| Lint (`ruff check .`) | ❌ failure |
| Coverage (`agentwatch`) | 72.92% |

_Python 3.12 · commit 96c1d7d_

Copilot

Pull request overview

Adds a new cost-intelligence routing utility that classifies task complexity and selects the cheapest model that is rated capable for that complexity tier, reusing the existing CST-002 pricing/estimation logic.

Changes:

Introduces agentwatch.cost.complexity_router with complexity scoring, capability filtering, budget ceiling downgrade behavior, and manual override support.
Adds a dedicated test suite for the new router behaviors.
Updates project documentation (MASTERLIST_STATUS.md) and release notes (CHANGELOG.md) to reflect the new feature.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| agentwatch/cost/complexity_router.py | Implements complexity tiers, heuristic scorer, capability table, and routing decision logic on top of CST-002 cost estimates. |
| tests/test_cost_router.py | Adds unit tests covering routing tiers, overrides, budget ceiling downgrade, scoring boundaries, and custom capability maps. |
| MASTERLIST_STATUS.md | Adds a status row documenting the new CST item for the router. |
| CHANGELOG.md | Adds an Unreleased “Added” entry describing the new cost-aware router. |

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/test_cost_router.py (1)

93-99: ⚡ Quick win

Add regression tests for empty capability-map semantics and negative token rejection.

These two edge cases are part of the router’s public contract and are currently unpinned by tests.

Possible test additions

 def test_custom_capability_table_is_respected():
@@
     d = router.route(complexity=TaskComplexity.COMPLEX, input_tokens=1000)
     assert d.model == "claude-haiku-4-5"
+
+
+def test_empty_capability_map_is_respected():
+    router = CostAwareRouter(capability={})
+    d = router.route(complexity=TaskComplexity.COMPLEX, input_tokens=1000)
+    assert d.reason == "no_model_rated_for_tier_using_most_capable"
+
+
+@pytest.mark.parametrize("input_tokens,output_tokens", [(-1, 1), (1, -1)])
+def test_negative_tokens_rejected(input_tokens, output_tokens):
+    with pytest.raises(ValueError, match="must be >= 0"):
+        CostAwareRouter().route(
+            complexity=TaskComplexity.SIMPLE,
+            input_tokens=input_tokens,
+            output_tokens=output_tokens,
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cost_router.py` around lines 93 - 99, Add two new regression test
functions to cover edge cases in the CostAwareRouter's public contract. First,
create a test that verifies the router's behavior when initialized with an empty
capability map dictionary, ensuring it falls back to default model selection
logic. Second, create a test that verifies the router properly rejects or
handles negative token counts passed to the route method, confirming it either
raises an appropriate exception or returns a valid response. Both tests should
use the CostAwareRouter class and its route method to validate these boundary
conditions are handled correctly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/cost/complexity_router.py`:
- Around line 119-120: The constructor for complexity_router.py uses the `or`
operator for default fallbacks on the _pricing and _capability attributes, which
treats empty dictionaries as falsy and silently replaces them with
DEFAULT_PRICING and DEFAULT_CAPABILITY. This breaks the API contract because
callers may explicitly pass empty dictionaries to indicate "use no overrides" or
"use fallback behavior". Replace the `or DEFAULT_PRICING` and `or
DEFAULT_CAPABILITY` fallbacks with explicit None checks using the pattern
`attribute if attribute is not None else DEFAULT_VALUE` so that empty
dictionaries passed by callers are preserved instead of being silently replaced
with defaults.
- Around line 141-145: Add validation to ensure that the token counts are
non-negative before calling the estimate() function. Check both tokens_in (which
is set from input_tokens or signals.input_tokens) and output_tokens to confirm
they are greater than or equal to zero. If either token count is negative,
handle the validation error appropriately (such as raising an exception or
returning an error response) to prevent negative cost calculations that could
lead to incorrect model selection in the totals list.

---

Nitpick comments:
In `@tests/test_cost_router.py`:
- Around line 93-99: Add two new regression test functions to cover edge cases
in the CostAwareRouter's public contract. First, create a test that verifies the
router's behavior when initialized with an empty capability map dictionary,
ensuring it falls back to default model selection logic. Second, create a test
that verifies the router properly rejects or handles negative token counts
passed to the route method, confirming it either raises an appropriate exception
or returns a valid response. Both tests should use the CostAwareRouter class and
its route method to validate these boundary conditions are handled correctly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e2675a1d-8c6a-45c1-825a-3223559b7d4a

📥 Commits

Reviewing files that changed from the base of the PR and between 19bbbeb and ffe5f69.

📒 Files selected for processing (4)

CHANGELOG.md
MASTERLIST_STATUS.md
agentwatch/cost/complexity_router.py
tests/test_cost_router.py

- Renumber to CST-009 (CST-008 is already used by stale-session eviction in test_cost.py) across module/test docstrings and MASTERLIST.
- Use explicit `is None` checks for pricing/capability so a caller can pass an intentionally empty table, and reject an empty pricing table with a clear ValueError (estimate() would otherwise coerce it back to DEFAULT_PRICING).
- Type capability as dict[str, TaskComplexity] (annotation + param) to match the enum values actually stored/passed.
- Default unknown-model rank to TaskComplexity.SIMPLE in the no-capable-model fallback for consistency.
- Test: assert the considered list length against len(DEFAULT_PRICING) instead of a hardcoded 7, and add an empty-pricing guard test.
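The difference between the `or` fallback and the explicit `is None` check mentioned in this commit can be shown with a small standalone sketch. The helper names and `DEFAULT_TABLE` are hypothetical and stand in for the router's constructor and its default tables; the exact error message is also an assumption.

```python
from typing import Optional

# Hypothetical default, standing in for DEFAULT_PRICING / DEFAULT_CAPABILITY.
DEFAULT_TABLE = {"model-a": 1.0}


def resolve_with_or(table: Optional[dict] = None) -> dict:
    # Pitfall: an empty dict is falsy, so `{}` is silently replaced.
    return table or DEFAULT_TABLE


def resolve_with_is_none(table: Optional[dict] = None) -> dict:
    # Only a missing argument (None) falls back to the default.
    return table if table is not None else DEFAULT_TABLE


def resolve_pricing(pricing: Optional[dict] = None) -> dict:
    # Pricing is stricter: an explicitly empty table is rejected outright,
    # since there would be nothing to estimate costs from.
    resolved = pricing if pricing is not None else DEFAULT_TABLE
    if not resolved:
        raise ValueError("pricing table must not be empty")
    return resolved
```

Here `resolve_with_or({})` returns the default table, while `resolve_with_is_none({})` returns the caller's empty dict unchanged, which is what lets an intentionally empty capability map reach the fallback path.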

…vanth#375)

- route() rejects negative input/output token counts with a ValueError before computing costs (negative tokens would yield negative "cheapest" estimates).
- Tests: explicit empty-capability-map fallback and negative-token rejection.

The `or` → `is None` constructor fix CodeRabbit flagged was already applied in 869cd46.
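To see why negative token counts matter for a cheapest-first selector, consider this small sketch. The per-1k-token rates, model names, and helper functions are invented for the example and do not reflect `comparator.DEFAULT_PRICING` or the real `estimate()` signature.

```python
# Hypothetical (input, output) rates in USD per 1,000 tokens.
RATES = {
    "cheap": (0.10, 0.20),
    "pricey": (10.00, 20.00),
}


def linear_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Plain linear pricing: cost scales with token counts, sign included.
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1000


def validate_tokens(input_tokens: int, output_tokens: int) -> None:
    # Guard run before any cost is computed (message text is an assumption).
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be >= 0")


def cheapest(input_tokens: int, output_tokens: int, validate: bool = True) -> str:
    if validate:
        validate_tokens(input_tokens, output_tokens)
    return min(RATES, key=lambda m: linear_cost(m, input_tokens, output_tokens))
```

Without the guard, a negative count flips the ordering: the model with the highest rate produces the most negative cost and therefore looks cheapest, which is the failure mode the validation is meant to rule out.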

Prateeks16 · 2026-06-19T07:57:16Z

@sreerevanth Please review this pull request

Resolve CHANGELOG conflict by keeping both Unreleased entries (cost-aware model router and the already-merged scheduled red-team automation).

Copilot AI review requested due to automatic review settings June 18, 2026 08:34

Copilot started reviewing on behalf of Prateeks16 June 18, 2026 08:35

Copilot AI reviewed Jun 18, 2026


coderabbitai Bot reviewed Jun 18, 2026


Comment thread agentwatch/cost/complexity_router.py Outdated

Comment thread agentwatch/cost/complexity_router.py

Prateeks16 added 2 commits June 18, 2026 14:13

Merge origin/main into cost-router branch

96c1d7d


Prateeks16 mentioned this pull request Jun 20, 2026

fix(cli): route all session subcommands through one Typer group #449

Open

Prateeks16 wants to merge 4 commits into sreerevanth:main from Prateeks16:feat/model-cost-comparator-375

2 participants
