feat(cost): cost-aware model router for task complexity (#375) #411

Open
Prateeks16 wants to merge 4 commits into sreerevanth:main from Prateeks16:feat/model-cost-comparator-375

Conversation

@Prateeks16 Prateeks16 commented Jun 18, 2026

Contributor

Closes #375

Summary

Implements dynamic cost-aware routing (CST-004 cost-intelligence scope): score a task's complexity and route it to the cheapest model capable of handling it. Simple subtasks go to cheap models (e.g. Gemini Flash); expensive, advanced models are reserved for genuinely complex work — optimizing the project budget.

The existing cost modules don't cover this: comparator.py (CST-002) gives per-model estimates, router.py (CST-003) does health-based failover, and predictor.py (CST-004) predicts cost from history. This adds the missing complexity → capability → cheapest-model selector, reusing comparator.DEFAULT_PRICING / estimate() as the single source of truth for rates.

What's added — agentwatch/cost/complexity_router.py

  • TaskComplexity (SIMPLE / STANDARD / COMPLEX) and a per-model capability table.
  • score_complexity(TaskSignals) — heuristic from prompt size + reasoning/tool flags; callers can pass an explicit complexity instead.
  • CostAwareRouter.route(...) → RoutingDecision(model, complexity, estimated_cost, reason, considered):
    • selects the cheapest model whose capability ≥ task complexity,
    • budget ceiling: if even the cheapest capable model exceeds it, downgrades to the cheapest model overall and flags budget_ceiling_exceeded_downgraded rather than silently overspending,
    • manual override to force a specific model.
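
The selection rule listed above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the model names, the cost figures, the `Decision` class, and the `"cheapest_capable"` reason string are all hypothetical, and costs are hard-coded here rather than computed with comparator.estimate().

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class TaskComplexity(IntEnum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


# Hypothetical per-request cost estimates (USD) and capability ranks.
COSTS = {"model-cheap": 0.001, "model-mid": 0.010, "model-top": 0.100}
CAPABILITY = {
    "model-cheap": TaskComplexity.SIMPLE,
    "model-mid": TaskComplexity.STANDARD,
    "model-top": TaskComplexity.COMPLEX,
}


@dataclass
class Decision:
    model: str
    estimated_cost: float
    reason: str


def rank(name: str) -> TaskComplexity:
    # A model missing from the capability table counts as SIMPLE-only.
    return CAPABILITY.get(name, TaskComplexity.SIMPLE)


def route(
    complexity: TaskComplexity,
    budget_ceiling: Optional[float] = None,
    override_model: Optional[str] = None,
) -> Decision:
    # A manual override wins, but only for a model with known pricing.
    if override_model is not None:
        if override_model not in COSTS:
            raise ValueError(f"unknown model: {override_model!r}")
        return Decision(override_model, COSTS[override_model], "manual_override")

    # Sort cheapest first, then keep the models rated for this tier.
    by_cost = sorted(COSTS.items(), key=lambda item: item[1])
    capable = [(name, cost) for name, cost in by_cost if rank(name) >= complexity]

    if not capable:
        # Nothing is rated for the tier: take the most capable model
        # (the cheapest one among ties, since by_cost is sorted).
        name, cost = max(by_cost, key=lambda item: rank(item[0]))
        return Decision(name, cost, "no_model_rated_for_tier_using_most_capable")

    name, cost = capable[0]
    if budget_ceiling is not None and cost > budget_ceiling:
        # Downgrade to the cheapest model overall and flag it.
        cheapest_name, cheapest_cost = by_cost[0]
        return Decision(
            cheapest_name, cheapest_cost, "budget_ceiling_exceeded_downgraded"
        )
    return Decision(name, cost, "cheapest_capable")
```

With these toy tables, `route(TaskComplexity.STANDARD)` picks `model-mid`, while `route(TaskComplexity.COMPLEX, budget_ceiling=0.05)` downgrades to `model-cheap` and carries the flag in `reason`, so the caller can see the downgrade instead of silently receiving a weaker model.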

Behavior

| Task | Routed to |
| --- | --- |
| Simple (short, no tools) | gemini-1.5-flash (cheapest) |
| Standard | gemini-1.5-pro (cheapest with capability ≥ standard) |
| Complex (reasoning / huge input) | claude-opus-4-5 |
| Complex + tight budget ceiling | downgraded to cheapest overall, flagged |

Design notes

  • Heuristic scorer for v1, but route(scorer=...) accepts a pluggable classifier (e.g. an LLM-based one) — matching the design question I raised on the issue.
  • Capability ranks are conservative: a model absent from the table is treated as SIMPLE-only, so an unknown/cheap model never gets a complex task.
  • Standalone module that callers opt into; no existing dispatch point is forced to change.
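
For illustration, the pluggable-scorer idea from the design notes might look like the sketch below. The `TaskSignals` field names, the 4-characters-per-token estimate, the numeric thresholds, and the `resolve_complexity` helper are assumptions made for this example and are not taken from the module.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class TaskComplexity(IntEnum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


@dataclass
class TaskSignals:
    # Hypothetical field names; the real dataclass may differ.
    prompt_chars: int = 0
    needs_reasoning: bool = False
    needs_tools: bool = False

    @property
    def input_tokens(self) -> int:
        # Rough rule of thumb (assumption): about 4 characters per token.
        return self.prompt_chars // 4


Scorer = Callable[[TaskSignals], TaskComplexity]


def heuristic_scorer(signals: TaskSignals) -> TaskComplexity:
    # Illustrative thresholds only: reasoning or a very large input is
    # COMPLEX, tool use or a medium input is STANDARD, the rest is SIMPLE.
    if signals.needs_reasoning or signals.input_tokens > 20_000:
        return TaskComplexity.COMPLEX
    if signals.needs_tools or signals.input_tokens > 2_000:
        return TaskComplexity.STANDARD
    return TaskComplexity.SIMPLE


def resolve_complexity(
    signals: TaskSignals,
    complexity: Optional[TaskComplexity] = None,
    scorer: Scorer = heuristic_scorer,
) -> TaskComplexity:
    # An explicit complexity from the caller takes precedence over scoring.
    if complexity is not None:
        return complexity
    return scorer(signals)
```

Because a scorer is just a callable from signals to a tier, an LLM-based classifier could be swapped in by passing `scorer=my_classifier` without touching the selection logic.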

Testing

tests/test_cost_router.py (14 tests): tier routing, explicit complexity, override + unknown-override error, budget downgrade, considered-list ordering/completeness, to_dict shape, scorer boundaries (parametrized), and custom capability tables.

pytest tests/test_cost_router.py tests/test_cost.py  -> 36 passed
ruff check / format (changed files)                  -> clean

Docs

  • MASTERLIST_STATUS.md — new CST-008 row (CST-004/005 are already taken by the predictor/anomaly detector; the issue uses CST-004 as a category label).
  • CHANGELOG.md → Unreleased > Added.

The repo-wide ruff check . gate has pre-existing failures in unrelated test files on main; this PR's changed files are lint-clean.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a cost-aware model router that evaluates task complexity and routes to the cheapest capable model, improving cost efficiency.
    • Supports a configurable budget ceiling with an option to manually override the selected model.
    • Provides detailed routing diagnostics (estimated cost, complexity tier, and selection rationale).
  • Documentation

    • Updated the changelog and the feature status tracker to reflect the new cost intelligence capability.
  • Tests

    • Added coverage for routing decisions, validation, budget downgrades, and router diagnostics.

…nth#375)

CST-004 scope — dynamic cost-aware routing. Scores a task's complexity and
routes it to the cheapest model capable of handling it: simple subtasks go to
cheap models (e.g. Gemini Flash) while expensive, advanced models are reserved
for complex work, optimizing the project budget.

- agentwatch/cost/complexity_router.py: TaskComplexity tiers, a heuristic
  score_complexity(), a per-model capability table, and CostAwareRouter.route()
  returning a structured RoutingDecision (chosen model, complexity, estimated
  cost, alternatives considered, reason). Supports a budget ceiling (downgrades
  rather than overspending) and a manual model override. Reuses
  comparator.DEFAULT_PRICING / estimate() as the single source of truth for rates.
- Tests: tier routing (simple/standard/complex), explicit complexity, override
  + unknown-override error, budget downgrade, considered-list ordering,
  to_dict shape, scorer boundaries, and custom capability tables.
- Docs: MASTERLIST_STATUS (CST-008) and a CHANGELOG entry.
Copilot AI review requested due to automatic review settings June 18, 2026 08:34
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ef3e1b08-7c24-48f6-b890-34ee7d899959

📥 Commits

Reviewing files that changed from the base of the PR and between ffe5f69 and 96c1d7d.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • MASTERLIST_STATUS.md
  • agentwatch/cost/complexity_router.py
  • tests/test_cost_router.py
✅ Files skipped from review due to trivial changes (2)
  • MASTERLIST_STATUS.md
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • agentwatch/cost/complexity_router.py

Walkthrough

A new agentwatch/cost/complexity_router.py module is added implementing a cost-aware model router. It defines TaskComplexity tiers, TaskSignals for heuristic scoring, CostAwareRouter that routes to the cheapest sufficiently capable model using DEFAULT_PRICING, and RoutingDecision for structured output. A comprehensive test suite and changelog/status entries accompany it.
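
As a rough picture of the structured output mentioned above, a decision record with the fields named in the PR description (model, complexity, estimated_cost, reason, considered) could be shaped like this. The exact types and the layout returned by `to_dict()` are assumptions for illustration, not the module's confirmed format.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class TaskComplexity(IntEnum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


@dataclass
class RoutingDecision:
    model: str
    complexity: TaskComplexity
    estimated_cost: float
    reason: str
    # Every model that was priced, as (name, estimated cost) pairs.
    considered: list[tuple[str, float]] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Assumed shape: enum rendered by name so the result is JSON-friendly.
        return {
            "model": self.model,
            "complexity": self.complexity.name,
            "estimated_cost": self.estimated_cost,
            "reason": self.reason,
            "considered": [
                {"model": name, "estimated_cost": cost}
                for name, cost in self.considered
            ],
        }
```

Returning the full `considered` list alongside the chosen model lets a caller log or audit why a cheaper or more expensive alternative was not selected.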

Changes

Cost-Aware Model Router (CST-009)

| Layer / File(s) | Summary |
| --- | --- |
| Complexity types, signals, and scoring (agentwatch/cost/complexity_router.py) | Defines TaskComplexity enum (SIMPLE=1, STANDARD=2, COMPLEX=3), DEFAULT_CAPABILITY mapping, TaskSignals dataclass with computed input_tokens, and score_complexity() which classifies tasks from token count and requirement flags. |
| RoutingDecision and CostAwareRouter (agentwatch/cost/complexity_router.py) | Defines RoutingDecision with to_dict() serialization. CostAwareRouter.route() computes per-model costs via estimate(), resolves complexity tier, validates override_model, falls back to most-capable model when no rated model exists, selects cheapest capable model, and downgrades to overall cheapest when budget_ceiling is exceeded. Exports all public symbols via __all__. |
| Comprehensive test suite (tests/test_cost_router.py) | Adds test functions covering all routing branches (simple/standard/complex, explicit complexity override, override_model with unknown-model rejection, budget-ceiling downgrade), diagnostic validation (considered-list completeness and cost-sort order, to_dict() shape), complexity-scoring boundaries via parametrized TaskSignals, custom capability table behavior, empty pricing table validation, and negative token input validation. |
| Changelog and status tracking (CHANGELOG.md, MASTERLIST_STATUS.md) | Records CST-009 feature addition in the Unreleased section and marks the feature complete in the Phase 7 — Cost Intelligence table with test suite reference. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant CostAwareRouter
    participant score_complexity
    participant estimate

    Caller->>CostAwareRouter: route(signals, override_model, budget_ceiling)
    alt override_model provided
        CostAwareRouter->>CostAwareRouter: validate override_model in pricing
        CostAwareRouter-->>Caller: RoutingDecision(reason="manual_override")
    else
        CostAwareRouter->>score_complexity: score_complexity(signals)
        score_complexity-->>CostAwareRouter: TaskComplexity tier
        loop each model in pricing
            CostAwareRouter->>estimate: estimate(model, input_tokens, output_tokens)
            estimate-->>CostAwareRouter: cost float
        end
        CostAwareRouter->>CostAwareRouter: filter by capability >= tier
        alt no capable model
            CostAwareRouter-->>Caller: RoutingDecision(most_capable, reason="fallback")
        else cheapest capable cost > budget_ceiling
            CostAwareRouter-->>Caller: RoutingDecision(cheapest_overall, reason="budget_ceiling_exceeded")
        else
            CostAwareRouter-->>Caller: RoutingDecision(cheapest_capable, reason="cost_optimized")
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

Medium

Poem

🐇 A router hops through models with care,
Counting tokens floating through the air.
Simple tasks take the cheap path down,
Complex thoughts earn the capable crown.
Budget ceiling? No worries here—
The cheapest fallback will appear!
route() away, the savings are near. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: implementing a cost-aware model router that handles task complexity-based routing, which is the core focus of the PR. |
| Linked Issues check | ✅ Passed | The PR fully implements all coding requirements from issue #375: cost-aware routing with task complexity scoring, routing to cheaper models for simple tasks, reserving advanced models for complex operations, and budget optimization. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the linked issue #375: core router implementation, tests, documentation updates (CHANGELOG and MASTERLIST_STATUS), with no unrelated modifications. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 89.47% which is sufficient. The required threshold is 80.00%. |

@github-actions

github-actions Bot commented Jun 18, 2026

🧪 PR Test Results

| Check | Result |
| --- | --- |
| Tests (pytest tests/) | ❌ failure |
| Lint (ruff check .) | ❌ failure |
| Coverage (agentwatch) | 72.92% |

Python 3.12 · commit 96c1d7d

Copilot AI left a comment

Pull request overview

Adds a new cost-intelligence routing utility that classifies task complexity and selects the cheapest model that is rated capable for that complexity tier, reusing the existing CST-002 pricing/estimation logic.

Changes:

  • Introduces agentwatch.cost.complexity_router with complexity scoring, capability filtering, budget ceiling downgrade behavior, and manual override support.
  • Adds a dedicated test suite for the new router behaviors.
  • Updates project documentation (MASTERLIST_STATUS.md) and release notes (CHANGELOG.md) to reflect the new feature.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| agentwatch/cost/complexity_router.py | Implements complexity tiers, heuristic scorer, capability table, and routing decision logic on top of CST-002 cost estimates. |
| tests/test_cost_router.py | Adds unit tests covering routing tiers, overrides, budget ceiling downgrade, scoring boundaries, and custom capability maps. |
| MASTERLIST_STATUS.md | Adds a status row documenting the new CST item for the router. |
| CHANGELOG.md | Adds an Unreleased “Added” entry describing the new cost-aware router. |

Comment thread agentwatch/cost/complexity_router.py Outdated
Comment thread agentwatch/cost/complexity_router.py
Comment thread tests/test_cost_router.py Outdated
Comment thread MASTERLIST_STATUS.md Outdated
Comment thread agentwatch/cost/complexity_router.py Outdated
Comment thread agentwatch/cost/complexity_router.py Outdated
Comment thread agentwatch/cost/complexity_router.py

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_cost_router.py (1)

93-99: ⚡ Quick win

Add regression tests for empty capability-map semantics and negative token rejection.

These two edge cases are part of the router’s public contract and are currently unpinned by tests.

Possible test additions
 def test_custom_capability_table_is_respected():
@@
     d = router.route(complexity=TaskComplexity.COMPLEX, input_tokens=1000)
     assert d.model == "claude-haiku-4-5"
+
+
+def test_empty_capability_map_is_respected():
+    router = CostAwareRouter(capability={})
+    d = router.route(complexity=TaskComplexity.COMPLEX, input_tokens=1000)
+    assert d.reason == "no_model_rated_for_tier_using_most_capable"
+
+
+@pytest.mark.parametrize("input_tokens,output_tokens", [(-1, 1), (1, -1)])
+def test_negative_tokens_rejected(input_tokens, output_tokens):
+    with pytest.raises(ValueError, match="must be >= 0"):
+        CostAwareRouter().route(
+            complexity=TaskComplexity.SIMPLE,
+            input_tokens=input_tokens,
+            output_tokens=output_tokens,
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cost_router.py` around lines 93 - 99, Add two new regression test
functions to cover edge cases in the CostAwareRouter's public contract. First,
create a test that verifies the router's behavior when initialized with an empty
capability map dictionary, ensuring it falls back to default model selection
logic. Second, create a test that verifies the router properly rejects or
handles negative token counts passed to the route method, confirming it either
raises an appropriate exception or returns a valid response. Both tests should
use the CostAwareRouter class and its route method to validate these boundary
conditions are handled correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/cost/complexity_router.py`:
- Around line 119-120: The constructor for complexity_router.py uses the `or`
operator for default fallbacks on the _pricing and _capability attributes, which
treats empty dictionaries as falsy and silently replaces them with
DEFAULT_PRICING and DEFAULT_CAPABILITY. This breaks the API contract because
callers may explicitly pass empty dictionaries to indicate "use no overrides" or
"use fallback behavior". Replace the `or DEFAULT_PRICING` and `or
DEFAULT_CAPABILITY` fallbacks with explicit None checks using the pattern
`attribute if attribute is not None else DEFAULT_VALUE` so that empty
dictionaries passed by callers are preserved instead of being silently replaced
with defaults.
- Around line 141-145: Add validation to ensure that the token counts are
non-negative before calling the estimate() function. Check both tokens_in (which
is set from input_tokens or signals.input_tokens) and output_tokens to confirm
they are greater than or equal to zero. If either token count is negative,
handle the validation error appropriately (such as raising an exception or
returning an error response) to prevent negative cost calculations that could
lead to incorrect model selection in the totals list.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e2675a1d-8c6a-45c1-825a-3223559b7d4a

📥 Commits

Reviewing files that changed from the base of the PR and between 19bbbeb and ffe5f69.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • MASTERLIST_STATUS.md
  • agentwatch/cost/complexity_router.py
  • tests/test_cost_router.py

Comment thread agentwatch/cost/complexity_router.py Outdated
Comment thread agentwatch/cost/complexity_router.py
- Renumber to CST-009 (CST-008 is already used by stale-session eviction in
  test_cost.py) across module/test docstrings and MASTERLIST.
- Use explicit `is None` checks for pricing/capability so a caller can pass an
  intentionally empty table, and reject an empty pricing table with a clear
  ValueError (estimate() would otherwise coerce it back to DEFAULT_PRICING).
- Type capability as dict[str, TaskComplexity] (annotation + param) to match
  the enum values actually stored/passed.
- Default unknown-model rank to TaskComplexity.SIMPLE in the no-capable-model
  fallback for consistency.
- Test: assert the considered list length against len(DEFAULT_PRICING) instead
  of a hardcoded 7, and add an empty-pricing guard test.
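
The difference between the `or` fallback and the explicit `is None` check described in this commit can be shown with a small sketch. The table contents, the helper names, and the `Router` class below are hypothetical stand-ins, not the module's code.

```python
# Hypothetical default tables, for illustration only.
DEFAULT_PRICING = {"model-cheap": 0.001}
DEFAULT_CAPABILITY = {"model-cheap": 1}


def capability_with_or(capability=None):
    # Pitfall: {} is falsy, so an intentionally empty table is silently
    # replaced by the defaults.
    return capability or DEFAULT_CAPABILITY


def capability_with_is_none(capability=None):
    # Only a missing argument falls back to the defaults.
    return DEFAULT_CAPABILITY if capability is None else capability


class Router:
    def __init__(self, pricing=None, capability=None):
        self.pricing = DEFAULT_PRICING if pricing is None else pricing
        self.capability = DEFAULT_CAPABILITY if capability is None else capability
        # An empty pricing table leaves nothing to route to, so fail loudly.
        if not self.pricing:
            raise ValueError("pricing table must not be empty")


print(capability_with_or({}))       # {'model-cheap': 1}
print(capability_with_is_none({}))  # {}
```

With the `is None` form, `Router(capability={})` keeps the empty table, which is what lets the no-rated-model fallback be exercised in a test, while `Router(pricing={})` raises instead of quietly reverting to the default rates.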
…vanth#375)

- route() rejects negative input/output token counts with a ValueError before
  computing costs (negative tokens would yield negative "cheapest" estimates).
- Tests: explicit empty-capability-map fallback and negative-token rejection.

The `or` → `is None` constructor fix CodeRabbit flagged was already applied in
869cd46.
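
A minimal sketch of the negative-token guard from this commit, assuming rates are quoted in USD per million tokens. The `estimate_cost` function and its parameters are hypothetical and stand in for the real estimate() call.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,
    output_rate: float,
) -> float:
    """Return the cost in USD for rates given per million tokens (assumed unit)."""
    # Reject negative counts up front: a negative count would produce a
    # negative cost, which would then sort as the "cheapest" option.
    for name, value in (
        ("input_tokens", input_tokens),
        ("output_tokens", output_tokens),
    ):
        if value < 0:
            raise ValueError(f"{name} must be >= 0, got {value}")
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

Validating before any cost is computed keeps the cheapest-first ordering meaningful, since every estimate that reaches the sort is guaranteed to be non-negative.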
@Prateeks16

Contributor Author

@sreerevanth Please review this pull request

Resolve CHANGELOG conflict by keeping both Unreleased entries (cost-aware model
router and the already-merged scheduled red-team automation).

Development

Successfully merging this pull request may close these issues.

[Feat] [ELUSOC] Implement Model Cost Comparator for Task Complexity Analysis

2 participants