Autonomous Code Review Agent

Three AI models debate your code. No human input between passes.

A multi-agent code review system where Groq, Mistral, and NVIDIA NIM work together in an autonomous pipeline — each model challenging and building on the previous one's findings — to produce a structured, verified code quality report.

Supports single files, multi-file projects, entire directories, and GitHub PRs. Comes with both a CLI and a dark-themed Gradio web UI with live streaming output.

Demo

>> Scanner  (Groq / Llama 3.3 70B)     →  Found 11 raw issues
>> Critic   (Mistral Large)             →  Confirmed 9, rejected 2 false positives, added 3 missed
>> Arbitrator (NVIDIA / Llama 3.1 70B) →  Final report: Score 42/100

How It Works

Unlike single-model reviews, this system runs a 3-agent debate. Each agent has a different role, receives the previous agent's output as context, and is allowed to challenge, reject, or extend earlier findings before the final report is written.

flowchart TD
    INPUT([Code Input\nfile · files · directory · GitHub PR · paste])
    INPUT --> SCANNER

    subgraph AGENT1 ["🔍  Agent 1 — Scanner  (Groq · Llama 3.3 70B)"]
        SCANNER[Reads raw code\nFinds ALL potential issues\nCasts wide net — over-reporting is OK]
    end

    SCANNER -->|"JSON array of raw issues"| AGENT2

    subgraph AGENT2 ["🧐  Agent 2 — Critic  (Mistral Large)"]
        CRITIC[Receives code + Scanner findings\nConfirms real issues\nRejects false positives with reasoning\nAdds issues Scanner missed]
    end

    AGENT2 -->|"Verified + enriched issue list"| AGENT3

    subgraph AGENT3 ["⚖️  Agent 3 — Arbitrator  (NVIDIA NIM · Llama 3.1 70B)"]
        ARBITRATOR[Receives full debate transcript\nResolves conflicts between agents\nWrites concrete fix suggestions\nAssigns quality score 0–100\nProduces executive summary]
    end

    ARBITRATOR --> REPORT([Final Report\nMarkdown · JSON · Web UI · Terminal])

Agent Pipeline

Each agent has a fixed role and a fixed provider:

Agent	Provider	Model	Responsibility
Scanner	Groq	`llama-3.3-70b-versatile`	Exhaustive, fast issue discovery — finds everything including edge cases
Critic	Mistral	`mistral-large-latest`	Challenges Scanner findings, eliminates false positives, adds missed issues
Arbitrator	NVIDIA NIM	`meta/llama-3.1-70b-instruct`	Reads the full debate, resolves disagreements, writes the final authoritative report

Why three models?

flowchart LR
    A["Single model\nreviews code"] -->|"One perspective\nNo verification\nHigh false-positive rate"| B["Report"]

    C["Scanner\nfinds issues"] --> D["Critic\nchallenges them"] --> E["Arbitrator\nfinal verdict"] --> F["Verified Report"]

    style A fill:#ef4444,color:#fff
    style B fill:#ef4444,color:#fff
    style C fill:#6366f1,color:#fff
    style D fill:#f97316,color:#fff
    style E fill:#22c55e,color:#fff
    style F fill:#22c55e,color:#fff

The Critic's job is to be skeptical — it actively tries to disprove the Scanner's findings. Only issues that survive both the Scanner and the Critic make it into the final report. The Arbitrator then resolves any remaining disagreements with a senior engineer's judgment.

System Architecture

flowchart TD
    subgraph INPUT_LAYER ["Input Layer  (inputs.py)"]
        F1[Single File]
        F2[Multiple Files]
        F3[Project Directory]
        F4[GitHub PR]
        F5[Pasted Code]
        COMBINE[combine_files\nBuilds multi-file context\nwith ### File: headers]
        F1 & F2 & F3 & F4 & F5 --> COMBINE
    end

    subgraph PROVIDER_LAYER ["Provider Layer  (providers.py)"]
        GROQ_CLIENT[Groq Client\nOpenAI SDK → api.groq.com]
        MISTRAL_CLIENT[Mistral Client\nOpenAI SDK → api.mistral.ai]
        NVIDIA_CLIENT[NVIDIA Client\nOpenAI SDK → integrate.api.nvidia.com]
        RETRY[Tenacity retry\n3 attempts · exponential backoff]
        GROQ_CLIENT & MISTRAL_CLIENT & NVIDIA_CLIENT --> RETRY
    end

    subgraph REVIEW_LAYER ["Review Layer  (multi_reviewer.py)"]
        PASS1[Pass 1 — Scanner\nSCANNER_PROMPT]
        PASS2[Pass 2 — Critic\nCRITIC_PROMPT\nincludes Pass 1 output]
        PASS3[Pass 3 — Arbitrator\nARBITRATOR_PROMPT\nincludes Pass 1 + Pass 2 output]
        PASS1 -->|raw issues JSON| PASS2
        PASS2 -->|verified issues JSON| PASS3
    end

    subgraph PARSE_LAYER ["Parse Layer  (parsers.py)"]
        P1[parse_issues\nJSON array → list of Issue]
        P2[parse_verified_issues\nJSON array → confirmed + rejected]
        P3[parse_report\nJSON object → ReviewReport\nhandles array fallback]
    end

    subgraph OUTPUT_LAYER ["Output Layer"]
        CLI[CLI — Rich tables\nagent.py]
        WEB[Web UI — Gradio\napp.py\nLive streaming]
        MD[Markdown Report\nreport.py]
        JSON_OUT[JSON Report\nreport.py]
    end

    INPUT_LAYER --> REVIEW_LAYER
    REVIEW_LAYER <--> PROVIDER_LAYER
    REVIEW_LAYER --> PARSE_LAYER
    PARSE_LAYER --> OUTPUT_LAYER

Multi-File Flow

When reviewing connected files, all files are combined into one structured context and sent through the same 3-agent pipeline. The Scanner is explicitly instructed to look for cross-file issues.

flowchart TD
    subgraph FILES ["Project Files"]
        FA[models.py]
        FB[api.py]
        FC[utils.py]
        FD[auth.py]
    end

    FILES --> COMBINER

    subgraph COMBINER ["combine_files  (inputs.py)"]
        HEADER["## Multi-File Project Review\n[1] models.py  Python\n[2] api.py     Python\n..."]
        BLOCKS["### File: models.py\n```python\n...\n```\n---\n### File: api.py\n```python\n...\n```"]
        HEADER --> BLOCKS
    end

    COMBINER --> SCANNER_MULTI

    subgraph SCANNER_MULTI ["Scanner — Cross-File Awareness"]
        XF1[Broken imports between files]
        XF2[API contract mismatches]
        XF3[Type inconsistencies across modules]
        XF4[Circular dependency risks]
        XF5[Dead code — exported but never imported]
        XF6[Shared mutable state across files]
    end

    SCANNER_MULTI --> CRITIC_MULTI[Critic verifies\ncross-file findings]
    CRITIC_MULTI --> ARBITRATOR_MULTI[Arbitrator writes\nfinal multi-file report]

Auto-excluded from directory scans: __pycache__, node_modules, .git, .venv, dist, build, target, .next, and all binary/lock files.

Input Modes

Mode	CLI	Web UI
Paste code	`--code "def foo(): ..."`	Paste Code tab
Single file	`--file app.py`	Upload Files → pick 1 file
Multiple files	`--files models.py api.py utils.py`	Upload Files → Ctrl+click multiple
Entire directory	`--dir ./src`	Upload Files → enter folder path
GitHub PR	`--github https://github.com/owner/repo/pull/42`	GitHub PR tab

Installation

Prerequisites

Python 3.11+
API keys for Groq, Mistral, and NVIDIA NIM

Steps

# 1. Clone the repo
git clone https://github.com/Zem-0/Autonomous-Code-Review-Agent.git
cd Autonomous-Code-Review-Agent

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure API keys
cp .env.example .env
# Edit .env and add your three keys

Configuration

Copy .env.example to .env and fill in your keys:

GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...
NVIDIA_API_KEY=nvapi-...

# Optional — only needed for GitHub PR reviews
GITHUB_TOKEN=ghp_...

You can also pass keys directly via CLI flags or enter them in the web UI — the .env file is a convenience default.

Usage — CLI

# Review a single file
python agent.py --file mycode.py

# Review multiple connected files together
python agent.py --files models.py api.py utils.py auth.py

# Review an entire project directory
python agent.py --dir ./src --output report.md

# Review a GitHub PR
python agent.py --github https://github.com/owner/repo/pull/42

# Paste a snippet inline
python agent.py --code "def divide(a, b): return a/b"

# Save both markdown and JSON
python agent.py --file app.py --output report.md --output-json report.json

# Preview what would happen without calling the API
python agent.py --dir ./src --dry-run

# Get raw JSON output (useful for piping)
python agent.py --file app.py --json | jq '.issues[] | select(.severity == "CRITICAL")'

CLI flags

Flag	Description
`--file PATH`	Review a single local file
`--files PATH [PATH ...]`	Review multiple connected files
`--dir PATH`	Review an entire project directory
`--code STRING`	Review an inline snippet
`--github URL`	Review a GitHub PR
`--language LANG`	Override auto-detected language
`--output PATH`	Save Markdown report
`--output-json PATH`	Save JSON report
`--json`	Print JSON to stdout
`--dry-run`	Preview without calling APIs
`--groq-key KEY`	Override `GROQ_API_KEY`
`--mistral-key KEY`	Override `MISTRAL_API_KEY`
`--nvidia-key KEY`	Override `NVIDIA_API_KEY`

Usage — Web UI

python app.py
# Opens at http://localhost:7860

flowchart LR
    subgraph LEFT ["Left Panel — Input"]
        TAB1[Paste Code tab]
        TAB2[Upload Files tab\nMulti-select · Folder path]
        TAB3[GitHub PR tab]
        LANG[Language dropdown]
        KEYS[API Key fields\nGroq · Mistral · NVIDIA]
        BTN[Start Review button]
    end

    subgraph RIGHT ["Right Panel — Live Results"]
        PROG[Agent Progress\nScanner · Critic · Arbitrator\nwith live state badges]
        STREAM[Debate Transcript\nRaw streaming output\nfrom each agent]
        SCORE[Quality Score\n0–100 with colour badge]
        STATS[Issue counts\nCritical · High · Medium · Low · Info]
        TABLE[Issues Table\nSeverity · Category · Line · Description · Fix]
        DL[Download buttons\nMarkdown report · JSON report]
    end

    BTN --> PROG --> STREAM --> SCORE --> STATS --> TABLE --> DL

The right panel updates in real time as each agent streams its output — you can watch the Critic disagree with the Scanner live.

File Structure

Autonomous-Code-Review-Agent/
├── agent.py            # CLI entry point — Rich terminal UI
├── app.py              # Gradio web UI entry point
│
├── multi_reviewer.py   # 3-agent agentic loop (streaming + blocking)
├── reviewer.py         # Legacy single-provider loop
│
├── providers.py        # Unified API client for Groq, Mistral, NVIDIA NIM
├── agents.py           # Agent role definitions (persona, provider, prompts)
├── grok_client.py      # Original xAI/Grok client (backward compat)
│
├── inputs.py           # All input modes: file, files, dir, GitHub, paste
├── parsers.py          # Parse LLM JSON responses → Pydantic models
├── models.py           # Pydantic models: Issue, ReviewReport, ReviewSession
├── prompts.py          # Prompt templates for scanner / critic / arbitrator
├── report.py           # Markdown and JSON report generation
│
├── requirements.txt
├── .env.example        # API key template
└── .gitignore          # .env is blocked from commits

Report Output

Every review produces:

Quality Score

A 0–100 score assigned by the Arbitrator based on the severity and breadth of confirmed issues:

Score	Label	Meaning
85–100	Excellent	Production-ready, minor improvements only
70–84	Good	A few issues worth fixing
40–69	Fair	Meaningful bugs or security gaps present
0–39	Needs Work	Critical or high issues that must be fixed

Issue Severity Levels

Severity	Description
CRITICAL	Must fix before shipping — security vulnerabilities, data loss risks
HIGH	Significant bugs or security weaknesses
MEDIUM	Logic errors, connection leaks, performance problems
LOW	Style, naming, missing validation
INFO	Observations and improvement suggestions

Sample JSON output

{
  "overall_score": 42,
  "summary": "The code contains a SQL injection vulnerability and uses MD5 for password hashing, both of which are critical security issues that must be fixed before deployment.",
  "total_issues": 9,
  "critical_count": 2,
  "high_count": 3,
  "issues": [
    {
      "id": "a3f1b2c4",
      "severity": "CRITICAL",
      "category": "security",
      "line_number": 7,
      "description": "SQL injection via string concatenation in login()",
      "suggestion": "Use parameterized queries:\ncursor.execute('SELECT * FROM users WHERE username=? AND password=?', (username, password))",
      "confidence": "high",
      "confirmed": true
    }
  ]
}

Dependencies

Package	Purpose
`openai`	API client for all three providers (all use OpenAI-compatible endpoints)
`gradio`	Web UI
`pydantic`	Data models and validation
`tenacity`	Retry logic with exponential backoff
`rich`	Terminal tables and progress display
`python-dotenv`	`.env` file loading
`PyGithub`	GitHub PR diff fetching

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Autonomous Code Review Agent

Demo

Table of Contents

How It Works

Agent Pipeline

Why three models?

System Architecture

Multi-File Flow

Input Modes

Installation

Prerequisites

Steps

Configuration

Usage — CLI

CLI flags

Usage — Web UI

File Structure

Report Output

Quality Score

Issue Severity Levels

Sample JSON output

Dependencies

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
agent.py		agent.py
agents.py		agents.py
app.py		app.py
grok_client.py		grok_client.py
inputs.py		inputs.py
models.py		models.py
multi_reviewer.py		multi_reviewer.py
parsers.py		parsers.py
prompts.py		prompts.py
providers.py		providers.py
report.py		report.py
requirements.txt		requirements.txt
reviewer.py		reviewer.py

Folders and files

Latest commit

History

Repository files navigation

Autonomous Code Review Agent

Demo

Table of Contents

How It Works

Agent Pipeline

Why three models?

System Architecture

Multi-File Flow

Input Modes

Installation

Prerequisites

Steps

Configuration

Usage — CLI

CLI flags

Usage — Web UI

File Structure

Report Output

Quality Score

Issue Severity Levels

Sample JSON output

Dependencies

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages