Three AI models debate your code. No human input between passes.
A multi-agent code review system in which models hosted on Groq, Mistral, and NVIDIA NIM run as an autonomous pipeline. Each model challenges and builds on the previous one's findings, and the result is a structured, verified code quality report.
Supports single files, multi-file projects, entire directories, and GitHub PRs. Comes with both a CLI and a dark-themed Gradio web UI with live streaming output.
>> Scanner (Groq / Llama 3.3 70B) → Found 11 raw issues
>> Critic (Mistral Large) → Confirmed 9, rejected 2 false positives, added 3 missed
>> Arbitrator (NVIDIA / Llama 3.1 70B) → Final report: Score 42/100
- How It Works
- Agent Pipeline
- System Architecture
- Multi-File Flow
- Input Modes
- Installation
- Configuration
- Usage — CLI
- Usage — Web UI
- File Structure
- Report Output
Unlike single-model reviews, this system runs a 3-agent debate. Each agent has a different role, receives the previous agent's output as context, and is allowed to challenge, reject, or extend earlier findings before the final report is written.
flowchart TD
INPUT([Code Input\nfile · files · directory · GitHub PR · paste])
INPUT --> SCANNER
subgraph AGENT1 ["🔍 Agent 1 — Scanner (Groq · Llama 3.3 70B)"]
SCANNER[Reads raw code\nFinds ALL potential issues\nCasts wide net — over-reporting is OK]
end
SCANNER -->|"JSON array of raw issues"| AGENT2
subgraph AGENT2 ["🧐 Agent 2 — Critic (Mistral Large)"]
CRITIC[Receives code + Scanner findings\nConfirms real issues\nRejects false positives with reasoning\nAdds issues Scanner missed]
end
AGENT2 -->|"Verified + enriched issue list"| AGENT3
subgraph AGENT3 ["⚖️ Agent 3 — Arbitrator (NVIDIA NIM · Llama 3.1 70B)"]
ARBITRATOR[Receives full debate transcript\nResolves conflicts between agents\nWrites concrete fix suggestions\nAssigns quality score 0–100\nProduces executive summary]
end
ARBITRATOR --> REPORT([Final Report\nMarkdown · JSON · Web UI · Terminal])
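The three passes in the flowchart can be read as a simple function chain. The sketch below is illustrative only: `run_pipeline`, the `Agent` callable type, and the stub agents are hypothetical names, and the actual prompts and JSON shapes in `multi_reviewer.py` may differ.

```python
from typing import Callable

# Hypothetical signature: an agent receives the code plus the outputs of the
# earlier passes and returns its own output as text.
Agent = Callable[[str, list[str]], str]


def run_pipeline(code: str, scanner: Agent, critic: Agent, arbitrator: Agent) -> dict:
    """Run the three passes in order, feeding each agent the earlier outputs."""
    transcript: list[str] = []
    for agent in (scanner, critic, arbitrator):
        # Pass a copy so an agent cannot mutate the shared transcript.
        output = agent(code, list(transcript))
        transcript.append(output)
    return {"transcript": transcript, "final_report": transcript[-1]}


# Stub agents standing in for the Groq, Mistral, and NVIDIA NIM calls.
def stub_scanner(code: str, prior: list[str]) -> str:
    return "scanner: 2 raw issues"


def stub_critic(code: str, prior: list[str]) -> str:
    return f"critic: reviewed [{prior[0]}]"


def stub_arbitrator(code: str, prior: list[str]) -> str:
    return f"arbitrator: saw {len(prior)} earlier outputs"


result = run_pipeline(
    "def divide(a, b): return a/b", stub_scanner, stub_critic, stub_arbitrator
)
print(result["final_report"])
```

The point of the shape is that no pass runs in isolation: the Critic sees the Scanner's output, and the Arbitrator sees both.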
Each agent has a fixed role and a fixed provider:
| Agent | Provider | Model | Responsibility |
|---|---|---|---|
| Scanner | Groq | llama-3.3-70b-versatile | Exhaustive, fast issue discovery; finds everything, including edge cases |
| Critic | Mistral | mistral-large-latest | Challenges Scanner findings, eliminates false positives, adds missed issues |
| Arbitrator | NVIDIA NIM | meta/llama-3.1-70b-instruct | Reads the full debate, resolves disagreements, writes the final authoritative report |
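One way to hold these fixed role-to-provider assignments in code is a small immutable record per agent. This is a sketch under assumptions: `AgentSpec` and `AGENTS` are hypothetical names, and the real definitions in `agents.py` also carry personas and prompts that are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """One row of the agent table. Field names are illustrative."""

    name: str
    provider: str
    model: str
    responsibility: str


# Order matters: it is the order in which the passes run.
AGENTS = (
    AgentSpec("Scanner", "Groq", "llama-3.3-70b-versatile",
              "Exhaustive issue discovery"),
    AgentSpec("Critic", "Mistral", "mistral-large-latest",
              "Challenge findings, remove false positives, add missed issues"),
    AgentSpec("Arbitrator", "NVIDIA NIM", "meta/llama-3.1-70b-instruct",
              "Resolve disagreements, write the final report"),
)

for position, agent in enumerate(AGENTS, start=1):
    print(f"Pass {position}: {agent.name} -> {agent.provider} ({agent.model})")
```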
flowchart LR
A["Single model\nreviews code"] -->|"One perspective\nNo verification\nHigh false-positive rate"| B["Report"]
C["Scanner\nfinds issues"] --> D["Critic\nchallenges them"] --> E["Arbitrator\nfinal verdict"] --> F["Verified Report"]
style A fill:#ef4444,color:#fff
style B fill:#ef4444,color:#fff
style C fill:#6366f1,color:#fff
style D fill:#f97316,color:#fff
style E fill:#22c55e,color:#fff
style F fill:#22c55e,color:#fff
The Critic's job is to be skeptical: it actively tries to disprove the Scanner's findings. A Scanner finding reaches the final report only if it survives the Critic's review, and any issues the Critic adds itself are passed along with the confirmed ones. The Arbitrator then resolves any remaining disagreements with a senior engineer's judgment.
flowchart TD
subgraph INPUT_LAYER ["Input Layer (inputs.py)"]
F1[Single File]
F2[Multiple Files]
F3[Project Directory]
F4[GitHub PR]
F5[Pasted Code]
COMBINE[combine_files\nBuilds multi-file context\nwith ### File: headers]
F1 & F2 & F3 & F4 & F5 --> COMBINE
end
subgraph PROVIDER_LAYER ["Provider Layer (providers.py)"]
GROQ_CLIENT[Groq Client\nOpenAI SDK → api.groq.com]
MISTRAL_CLIENT[Mistral Client\nOpenAI SDK → api.mistral.ai]
NVIDIA_CLIENT[NVIDIA Client\nOpenAI SDK → integrate.api.nvidia.com]
RETRY[Tenacity retry\n3 attempts · exponential backoff]
GROQ_CLIENT & MISTRAL_CLIENT & NVIDIA_CLIENT --> RETRY
end
subgraph REVIEW_LAYER ["Review Layer (multi_reviewer.py)"]
PASS1[Pass 1 — Scanner\nSCANNER_PROMPT]
PASS2[Pass 2 — Critic\nCRITIC_PROMPT\nincludes Pass 1 output]
PASS3[Pass 3 — Arbitrator\nARBITRATOR_PROMPT\nincludes Pass 1 + Pass 2 output]
PASS1 -->|raw issues JSON| PASS2
PASS2 -->|verified issues JSON| PASS3
end
subgraph PARSE_LAYER ["Parse Layer (parsers.py)"]
P1[parse_issues\nJSON array → list of Issue]
P2[parse_verified_issues\nJSON array → confirmed + rejected]
P3[parse_report\nJSON object → ReviewReport\nhandles array fallback]
end
subgraph OUTPUT_LAYER ["Output Layer"]
CLI[CLI — Rich tables\nagent.py]
WEB[Web UI — Gradio\napp.py\nLive streaming]
MD[Markdown Report\nreport.py]
JSON_OUT[JSON Report\nreport.py]
end
INPUT_LAYER --> REVIEW_LAYER
REVIEW_LAYER <--> PROVIDER_LAYER
REVIEW_LAYER --> PARSE_LAYER
PARSE_LAYER --> OUTPUT_LAYER
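The provider layer in the diagram wraps every client in a Tenacity retry with 3 attempts and exponential backoff. To keep the example dependency-free, the sketch below re-implements that idea with the standard library; it is not Tenacity's API. `call_with_retry` and the 1-second base delay are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Waits are base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demo: a call that fails twice, then succeeds on the third attempt.
calls = {"count": 0}
waits: list[float] = []


def flaky_request() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the waits instead of sleeping so the demo runs instantly.
result = call_with_retry(flaky_request, sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the helper testable; a real client would leave the default in place.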
When reviewing connected files, all files are combined into one structured context and sent through the same 3-agent pipeline. The Scanner is explicitly instructed to look for cross-file issues.
flowchart TD
subgraph FILES ["Project Files"]
FA[models.py]
FB[api.py]
FC[utils.py]
FD[auth.py]
end
FILES --> COMBINER
subgraph COMBINER ["combine_files (inputs.py)"]
HEADER["## Multi-File Project Review\n[1] models.py Python\n[2] api.py Python\n..."]
BLOCKS["### File: models.py\n```python\n...\n```\n---\n### File: api.py\n```python\n...\n```"]
HEADER --> BLOCKS
end
COMBINER --> SCANNER_MULTI
subgraph SCANNER_MULTI ["Scanner — Cross-File Awareness"]
XF1[Broken imports between files]
XF2[API contract mismatches]
XF3[Type inconsistencies across modules]
XF4[Circular dependency risks]
XF5[Dead code — exported but never imported]
XF6[Shared mutable state across files]
end
SCANNER_MULTI --> CRITIC_MULTI[Critic verifies\ncross-file findings]
CRITIC_MULTI --> ARBITRATOR_MULTI[Arbitrator writes\nfinal multi-file report]
Auto-excluded from directory scans: __pycache__, node_modules, .git, .venv, dist, build, target, .next, and all binary/lock files.
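The combined context and the directory exclusions can be sketched as follows. The header and `### File:` layout follow the diagram; the extension map, the `is_excluded` helper, and the dict-based signature are assumptions, and the binary and lock file filtering is not covered here.

```python
from pathlib import Path

# Directory names listed as auto-excluded from directory scans.
EXCLUDED_DIRS = {"__pycache__", "node_modules", ".git", ".venv",
                 "dist", "build", "target", ".next"}

# Assumption: a tiny extension map; real language detection is likely richer.
LANGUAGES = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript"}

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


def is_excluded(path: Path) -> bool:
    """True if any component of the path is an excluded directory."""
    return any(part in EXCLUDED_DIRS for part in path.parts)


def combine_files(files: dict[str, str]) -> str:
    """Join {filename: source} into one context: an index, then per-file blocks."""
    kept = {name: src for name, src in files.items()
            if not is_excluded(Path(name))}
    header = ["## Multi-File Project Review"]
    blocks = []
    for index, (name, src) in enumerate(kept.items(), start=1):
        language = LANGUAGES.get(Path(name).suffix, "Text")
        header.append(f"[{index}] {name} {language}")
        blocks.append(
            f"### File: {name}\n{FENCE}{language.lower()}\n{src}\n{FENCE}"
        )
    return "\n".join(header) + "\n\n" + "\n---\n".join(blocks)


context = combine_files({
    "models.py": "class User: ...",
    "api.py": "from models import User",
    "node_modules/pkg/index.js": "module.exports = {}",
})
print(context)
```

Keeping every file in one prompt is what lets the Scanner notice that `api.py` imports a name from `models.py`.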
| Mode | CLI | Web UI |
|---|---|---|
| Paste code | --code "def foo(): ..." | Paste Code tab |
| Single file | --file app.py | Upload Files → pick 1 file |
| Multiple files | --files models.py api.py utils.py | Upload Files → Ctrl+click multiple |
| Entire directory | --dir ./src | Upload Files → enter folder path |
| GitHub PR | --github https://github.com/owner/repo/pull/42 | GitHub PR tab |
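For the GitHub PR mode, the URL has to be split into owner, repository, and PR number before a diff can be fetched. A minimal sketch, assuming only the URL shape shown in the table; `parse_pr_url` and `PullRequestRef` are hypothetical names, and the project itself may rely on PyGithub for this step.

```python
import re
from typing import NamedTuple


class PullRequestRef(NamedTuple):
    owner: str
    repo: str
    number: int


# Matches https://github.com/<owner>/<repo>/pull/<number>, optionally followed
# by a sub-path, query string, or fragment (for example /files).
PR_URL = re.compile(
    r"^https://github\.com/([^/\s]+)/([^/\s]+)/pull/(\d+)(?:[/?#].*)?$"
)


def parse_pr_url(url: str) -> PullRequestRef:
    """Extract (owner, repo, number) or raise ValueError for other URLs."""
    match = PR_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {url!r}")
    owner, repo, number = match.groups()
    return PullRequestRef(owner, repo, int(number))


print(parse_pr_url("https://github.com/owner/repo/pull/42"))
```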
- Python 3.11+
- API keys for Groq, Mistral, and NVIDIA NIM
# 1. Clone the repo
git clone https://github.com/Zem-0/Autonomous-Code-Review-Agent.git
cd Autonomous-Code-Review-Agent
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure API keys
cp .env.example .env
# Edit .env and add your three keys
Copy .env.example to .env and fill in your keys:
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...
NVIDIA_API_KEY=nvapi-...
# Optional — only needed for GitHub PR reviews
GITHUB_TOKEN=ghp_...
You can also pass keys directly via CLI flags or enter them in the web UI; the .env file is a convenience default.
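The precedence described here (an explicit key wins, the environment is the fallback) can be sketched like this. `resolve_keys` is a hypothetical helper, and the sketch assumes the .env file has already been loaded into the process environment, for example by python-dotenv.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("GROQ_API_KEY", "MISTRAL_API_KEY", "NVIDIA_API_KEY")


def resolve_keys(
    overrides: Mapping[str, Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Pick each key from explicit overrides first, then from the environment."""
    resolved: dict[str, str] = {}
    missing: list[str] = []
    for name in REQUIRED_KEYS:
        value = overrides.get(name) or env.get(name)
        if value:
            resolved[name] = value
        else:
            missing.append(name)
    if missing:
        # Fail early with every missing name, not just the first one.
        raise RuntimeError("missing API keys: " + ", ".join(missing))
    return resolved


# Demo with a fake environment; the override wins for GROQ_API_KEY.
fake_env = {
    "GROQ_API_KEY": "gsk_env",
    "MISTRAL_API_KEY": "m_env",
    "NVIDIA_API_KEY": "nvapi-env",
}
keys = resolve_keys({"GROQ_API_KEY": "gsk_cli"}, env=fake_env)
print(keys["GROQ_API_KEY"])
```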
# Review a single file
python agent.py --file mycode.py
# Review multiple connected files together
python agent.py --files models.py api.py utils.py auth.py
# Review an entire project directory
python agent.py --dir ./src --output report.md
# Review a GitHub PR
python agent.py --github https://github.com/owner/repo/pull/42
# Paste a snippet inline
python agent.py --code "def divide(a, b): return a/b"
# Save both markdown and JSON
python agent.py --file app.py --output report.md --output-json report.json
# Preview what would happen without calling the API
python agent.py --dir ./src --dry-run
# Get raw JSON output (useful for piping)
python agent.py --file app.py --json | jq '.issues[] | select(.severity == "CRITICAL")'

| Flag | Description |
|---|---|
| --file PATH | Review a single local file |
| --files PATH [PATH ...] | Review multiple connected files |
| --dir PATH | Review an entire project directory |
| --code STRING | Review an inline snippet |
| --github URL | Review a GitHub PR |
| --language LANG | Override auto-detected language |
| --output PATH | Save Markdown report |
| --output-json PATH | Save JSON report |
| --json | Print JSON to stdout |
| --dry-run | Preview without calling APIs |
| --groq-key KEY | Override GROQ_API_KEY |
| --mistral-key KEY | Override MISTRAL_API_KEY |
| --nvidia-key KEY | Override NVIDIA_API_KEY |
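For readers who want to see how such a flag set maps onto `argparse`, here is a minimal sketch. The flag names come from the table; treating the five input modes as mutually exclusive and required is an assumption, and `build_parser` is a hypothetical name rather than the function in `agent.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="agent.py", description="Multi-agent code review"
    )
    # Assumption: exactly one input mode per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", metavar="PATH")
    source.add_argument("--files", metavar="PATH", nargs="+")
    source.add_argument("--dir", metavar="PATH")
    source.add_argument("--code", metavar="STRING")
    source.add_argument("--github", metavar="URL")

    parser.add_argument("--language", metavar="LANG")
    parser.add_argument("--output", metavar="PATH")
    parser.add_argument("--output-json", metavar="PATH")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    # One optional key override per provider: --groq-key, --mistral-key, ...
    for provider in ("groq", "mistral", "nvidia"):
        parser.add_argument(f"--{provider}-key", metavar="KEY")
    return parser


args = build_parser().parse_args(
    ["--dir", "./src", "--output", "report.md", "--dry-run"]
)
print(args.dir, args.output, args.dry_run)
```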
python app.py
# Opens at http://localhost:7860
flowchart LR
subgraph LEFT ["Left Panel — Input"]
TAB1[Paste Code tab]
TAB2[Upload Files tab\nMulti-select · Folder path]
TAB3[GitHub PR tab]
LANG[Language dropdown]
KEYS[API Key fields\nGroq · Mistral · NVIDIA]
BTN[Start Review button]
end
subgraph RIGHT ["Right Panel — Live Results"]
PROG[Agent Progress\nScanner · Critic · Arbitrator\nwith live state badges]
STREAM[Debate Transcript\nRaw streaming output\nfrom each agent]
SCORE[Quality Score\n0–100 with colour badge]
STATS[Issue counts\nCritical · High · Medium · Low · Info]
TABLE[Issues Table\nSeverity · Category · Line · Description · Fix]
DL[Download buttons\nMarkdown report · JSON report]
end
BTN --> PROG --> STREAM --> SCORE --> STATS --> TABLE --> DL
The right panel updates in real time as each agent streams its output — you can watch the Critic disagree with the Scanner live.
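Live updates of this kind are commonly built on a generator: Gradio accepts a generator function as an event handler and re-renders the outputs on every `yield`. The sketch below shows only that generator shape, using the standard library and fake chunk streams; `stream_review` is a hypothetical name and not the function in `app.py`.

```python
from typing import Iterable, Iterator


def stream_review(
    agent_chunks: dict[str, Iterable[str]],
) -> Iterator[tuple[str, str]]:
    """Yield (active_agent, transcript_so_far) after every chunk of output."""
    transcript = ""
    for agent, chunks in agent_chunks.items():
        transcript += f"[{agent}]\n"
        for chunk in chunks:
            transcript += chunk
            # Each yield is one UI refresh: a badge update plus the transcript.
            yield agent, transcript
        transcript += "\n"


# Fake token streams standing in for the providers' streaming responses.
fake_streams = {
    "Scanner": ["Found ", "2 issues"],
    "Critic": ["Rejected ", "1"],
    "Arbitrator": ["Score 80"],
}
updates = list(stream_review(fake_streams))
print(updates[-1][1])
```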
Autonomous-Code-Review-Agent/
├── agent.py # CLI entry point — Rich terminal UI
├── app.py # Gradio web UI entry point
│
├── multi_reviewer.py # 3-agent agentic loop (streaming + blocking)
├── reviewer.py # Legacy single-provider loop
│
├── providers.py # Unified API client for Groq, Mistral, NVIDIA NIM
├── agents.py # Agent role definitions (persona, provider, prompts)
├── grok_client.py # Original xAI/Grok client (backward compat)
│
├── inputs.py # All input modes: file, files, dir, GitHub, paste
├── parsers.py # Parse LLM JSON responses → Pydantic models
├── models.py # Pydantic models: Issue, ReviewReport, ReviewSession
├── prompts.py # Prompt templates for scanner / critic / arbitrator
├── report.py # Markdown and JSON report generation
│
├── requirements.txt
├── .env.example # API key template
└── .gitignore # .env is blocked from commits
Every review produces:
A 0–100 score assigned by the Arbitrator based on the severity and breadth of confirmed issues:
| Score | Label | Meaning |
|---|---|---|
| 85–100 | Excellent | Production-ready, minor improvements only |
| 70–84 | Good | A few issues worth fixing |
| 40–69 | Fair | Meaningful bugs or security gaps present |
| 0–39 | Needs Work | Critical or high issues that must be fixed |
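The score bands translate directly into a lookup. A minimal sketch, assuming integer scores; `score_label` is a hypothetical helper name.

```python
def score_label(score: int) -> str:
    """Map a 0-100 quality score to the label from the score table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Needs Work"


print(score_label(42))  # Fair
```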
| Severity | Description |
|---|---|
| CRITICAL | Must fix before shipping — security vulnerabilities, data loss risks |
| HIGH | Significant bugs or security weaknesses |
| MEDIUM | Logic errors, connection leaks, performance problems |
| LOW | Style, naming, missing validation |
| INFO | Observations and improvement suggestions |
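These severity levels feed the per-severity counts in the JSON report. The sketch below loads a report, tolerates a bare issue array (the array fallback mentioned for `parse_report` in the architecture diagram), and recomputes the counts from the issue list. `load_report` is a hypothetical name, and the `medium_count`, `low_count`, and `info_count` fields are assumed by analogy with `critical_count` and `high_count`.

```python
import json
from collections import Counter

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO")


def load_report(raw: str) -> dict:
    """Parse report JSON; accept a bare issue array as a fallback shape."""
    data = json.loads(raw)
    if isinstance(data, list):
        # Assumption: a bare array is a list of issues with no score or summary.
        data = {"issues": data}
    if not isinstance(data, dict):
        raise ValueError("report must be a JSON object or array")
    issues = data.get("issues", [])
    counts = Counter(
        str(issue.get("severity", "INFO")).upper() for issue in issues
    )
    # Design choice in this sketch: counts are recomputed, not trusted.
    data["total_issues"] = len(issues)
    for severity in SEVERITIES:
        data[f"{severity.lower()}_count"] = counts.get(severity, 0)
    return data


report = load_report(
    '[{"severity": "critical"}, {"severity": "HIGH"}, {"severity": "HIGH"}]'
)
print(report["critical_count"], report["high_count"], report["total_issues"])
```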
{
"overall_score": 42,
"summary": "The code contains a SQL injection vulnerability and uses MD5 for password hashing, both of which are critical security issues that must be fixed before deployment.",
"total_issues": 9,
"critical_count": 2,
"high_count": 3,
"issues": [
{
"id": "a3f1b2c4",
"severity": "CRITICAL",
"category": "security",
"line_number": 7,
"description": "SQL injection via string concatenation in login()",
"suggestion": "Use parameterized queries:\ncursor.execute('SELECT * FROM users WHERE username=? AND password=?', (username, password))",
"confidence": "high",
"confirmed": true
}
]
}

| Package | Purpose |
|---|---|
| openai | API client for all three providers (all use OpenAI-compatible endpoints) |
| gradio | Web UI |
| pydantic | Data models and validation |
| tenacity | Retry logic with exponential backoff |
| rich | Terminal tables and progress display |
| python-dotenv | .env file loading |
| PyGithub | GitHub PR diff fetching |
MIT