feat: Free: Add `agentwatch compare` command to A/B test two models side-by-side by SHAURYASANYAL3 · Pull Request #429 · sreerevanth/AgentWatch

SHAURYASANYAL3 · 2026-06-18T18:31:40Z

Resolves #423

Overview

This pull request implements the `agentwatch compare` command requested in #423, which lets the AgentWatch CLI A/B test two models side by side.

Why do we need this?

For a 5-year-old: If we have a red robot and a blue robot, we want a command that watches them race and tells us who was faster and who used less energy!

For developers: Choosing between models (e.g., gpt-3.5 vs claude-3-haiku) involves trade-offs between cost, latency, and success rate. A simple A/B testing command lets developers base that choice on measured data.

What is it?

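`agentwatch compare` is a new CLI command, invoked as `agentwatch compare --task "..." --model-a gpt-3.5 --model-b claude-3`, that runs the same prompt on both models concurrently. It is a free feature. (A later commit, 21944d6, renames the command to `compare-models` to avoid a naming conflict.)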

Suggestions for Implementation

- Use `asyncio.gather` to launch both agents concurrently and save time.
- Capture latency, total tokens, and output length.
- Print a `rich` comparison table highlighting the winner in each category.
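The suggestions above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `run_model` is a hypothetical stand-in for a real provider call (it only sleeps and echoes the task), tokens are approximated by whitespace-separated word counts, and treating the lower value as the winner in every category, output length included, is an assumption.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    model: str
    latency_s: float
    total_tokens: int
    output_length: int


async def run_model(model: str, task: str) -> tuple[str, int]:
    """Hypothetical stand-in for a real model call.

    Returns (output_text, total_tokens). The sleep simulates network
    latency; tokens are approximated as whitespace-separated words.
    """
    await asyncio.sleep(0.01)
    output = f"[{model}] answer to: {task}"
    return output, len(task.split()) + len(output.split())


async def timed_run(model: str, task: str) -> RunResult:
    # Measure wall-clock latency around the (hypothetical) call.
    start = time.perf_counter()
    output, tokens = await run_model(model, task)
    latency = time.perf_counter() - start
    return RunResult(model, latency, tokens, len(output))


async def compare(task: str, model_a: str, model_b: str) -> tuple[RunResult, RunResult]:
    # asyncio.gather runs both coroutines concurrently, so the total wall
    # time is roughly that of the slower model rather than the sum of both.
    result_a, result_b = await asyncio.gather(
        timed_run(model_a, task), timed_run(model_b, task)
    )
    return result_a, result_b


def pick_winners(a: RunResult, b: RunResult) -> dict[str, str]:
    """Winning model per metric, or "tie". Assumes lower is better everywhere."""
    winners = {}
    for metric in ("latency_s", "total_tokens", "output_length"):
        va, vb = getattr(a, metric), getattr(b, metric)
        winners[metric] = "tie" if va == vb else (a.model if va < vb else b.model)
    return winners


if __name__ == "__main__":
    a, b = asyncio.run(compare("hello world", "gpt-3.5", "claude-3"))
    print(pick_winners(a, b))
```

Timing each call inside its own coroutine gives an approximate per-model latency even though both calls share one event loop; with real network I/O the measured value also includes any scheduling delay on that loop.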

Implementation Notes 🛠️

- Implemented as a `typer` command in `agentwatch/cli/main.py`.
- Renders the comparison as a terminal table using `rich`.
- The test suite passes in CI (`pytest tests/`); the MCP test is now skipped when the `mcp` module is not installed.
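For the rendering step, a comparison table could be built with `rich` (`Table`, `Console`) along these lines. This is a sketch, not the PR's code: the `render_comparison` name, the `(label, value_a, value_b, winner)` row format, and the plain-text fallback used when `rich` is not installed are all assumptions, and the `typer` command registration is not shown.

```python
def render_comparison(model_a: str, model_b: str,
                      rows: list[tuple[str, str, str, str]]) -> str:
    """Render (label, value_a, value_b, winner) rows and return the text."""
    try:
        from rich.console import Console
        from rich.table import Table
    except ImportError:
        # Assumed fallback: a plain-text table when rich is unavailable.
        lines = [f"{'Metric':<24}{model_a:>12}{model_b:>12}  Winner"]
        for label, va, vb, winner in rows:
            lines.append(f"{label:<24}{va:>12}{vb:>12}  {winner}")
        text = "\n".join(lines)
        print(text)
        return text

    table = Table(title=f"{model_a} vs {model_b}")
    table.add_column("Metric")
    table.add_column(model_a, justify="right")
    table.add_column(model_b, justify="right")
    table.add_column("Winner")
    for label, va, vb, winner in rows:
        # Highlight the winning value in bold green; ties stay unstyled.
        cell_a = f"[bold green]{va}[/]" if winner == model_a else va
        cell_b = f"[bold green]{vb}[/]" if winner == model_b else vb
        table.add_row(label, cell_a, cell_b, winner)

    # record=True captures the rendered output so it can be returned as text.
    console = Console(record=True)
    console.print(table)
    return console.export_text()


if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements.
    render_comparison("gpt-3.5", "claude-3", [
        ("Latency (s)", "1.20", "0.85", "claude-3"),
        ("Output length", "1530", "1702", "gpt-3.5"),
    ])
```

Returning the rendered text (rather than only printing it) keeps the function easy to assert on in tests.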

coderabbitai · 2026-06-18T18:32:36Z

Warning

Review limit reached

@SHAURYASANYAL3, we couldn't start this review because you've reached your PR review rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 19bbbeb and 21944d6.

📒 Files selected for processing (2)

agentwatch/cli/main.py
tests/test_protocol.py


github-actions · 2026-06-18T18:32:39Z

🧪 PR Test Results

| Check | Result |
|---|---|
| Tests (`pytest tests/`) | ✅ success |
| Lint (`ruff check .`) | ❌ failure |
| Coverage (`agentwatch`) | 74.06% |

_Python 3.12 · commit 21944d6_

SHAURYASANYAL3 force-pushed the feat/issue-423 branch from 76880dc to 507f856 on June 18, 2026 18:34

SHAURYASANYAL3 added 2 commits June 19, 2026 00:07

- test: skip mcp test when mcp module not found (618ac4b)
- Fixes sreerevanth#423: Implement compare command (259bf31)

SHAURYASANYAL3 force-pushed the feat/issue-423 branch from 507f856 to 259bf31 on June 18, 2026 18:38

- fix: rename compare to compare-models to avoid conflict (21944d6)

SHAURYASANYAL3 changed the title ~~Fixes #423: Add compare command~~ feat: Free: Add agentwatch compare command to A/B test two models side-by-side Jun 18, 2026

SHAURYASANYAL3 wants to merge 3 commits into sreerevanth:main from SHAURYASANYAL3:feat/issue-423
