Elevate your pull requests with an autonomous squad of expert AI engineers.
Code reviews are critical but time-consuming. AutoReviewer goes beyond simple "wrapper" LLM scripts by introducing a highly asynchronous, multi-agent orchestration layer. It evaluates Git diffs and full repositories through the lens of specialized AI agents—filtering out noise, checking for security vulnerabilities, and ensuring architectural alignment—all before a human reviewer even looks at the code.
Constructed with enterprise-grade engineering practices, this system leverages Celery & Redis for scalable task queuing, FastAPI for high-throughput webhook ingestion, LangGraph for multi-agent workflows, and MLflow for rigorous LLM evaluation.
- 🤖 Multi-Agent Orchestration (LangGraph): A routed workflow of specialized AI agents (Planner, Bug Detector, Security, Style).
- ⚡ Completely Asynchronous: Built on FastAPI, Redis, and Celery. Fire-and-forget API drops heavy AI tasks into background queues seamlessly.
- 🔍 AST-Powered Context Engine: Intelligently extracts localized AST (Abstract Syntax Tree) contexts from raw diffs to prevent LLM hallucination and manage token limits.
- 🛡️ Enterprise Observability: Fully integrated with LangSmith for real-time execution tracing and debugging.
- 📈 Automated Prompt Evaluation: Uses MLflow LLM-as-a-judge to mathematically score new prompt versions against a "golden dataset" of historical pull requests.
- 📦 Universal Ingestion: Supports bare code `.zip` uploads, diff extraction, and full-repository fallback analysis.
- 🐳 Fully Dockerized: Spin up the API, Worker, and Redis broker with a single `docker-compose up`.
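To make the AST-powered context idea concrete, here is a minimal sketch that uses only Python's standard `ast` module. It assumes the changed line numbers have already been pulled out of the diff; the function name `enclosing_definitions` and the sample source are hypothetical and are not taken from the AutoReviewer codebase.

```python
import ast


def enclosing_definitions(source: str, changed_lines: set[int]) -> dict[str, str]:
    """Map each function or class that contains a changed line to its source text."""
    tree = ast.parse(source)
    contexts: dict[str, str] = {}
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # lineno and end_lineno are 1-based and inclusive (end_lineno needs Python 3.8+).
        span = range(node.lineno, node.end_lineno + 1)
        if any(line in span for line in changed_lines):
            contexts[node.name] = ast.get_source_segment(source, node)
    return contexts


SAMPLE = '''\
def add(a, b):
    return a + b


def divide(a, b):
    return a / b
'''

# Pretend the diff touched line 6, the body of divide().
context = enclosing_definitions(SAMPLE, {6})
```

Only `divide` ends up in `context`, so a prompt built from it would carry the one function the change affects rather than the whole file.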
The architecture separates the high-throughput web layer from blocking LLM inference calls, so the system can process hundreds of pull requests concurrently without dropping webhooks.
flowchart LR
classDef primary fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
classDef secondary fill:#059669,stroke:#047857,stroke-width:2px,color:#fff
classDef agent fill:#7c3aed,stroke:#5b21b6,stroke-width:2px,color:#fff
classDef db fill:#dc2626,stroke:#b91c1c,stroke-width:2px,color:#fff
Client((Client / CI))
subgraph Infrastructure [Async Event Infrastructure]
API[FastAPI Gateway]:::primary
Redis[(Redis Broker)]:::db
Worker[Celery Task Worker]:::primary
end
subgraph Engine [LangGraph Agentic Engine]
AST[AST Diff Parser]:::secondary
Planner{Planner Strategy}:::agent
Bug[Bug Detector]:::agent
Sec[Security Scanner]:::agent
Style[Style Checker]:::agent
Synth{Synthesizer}:::agent
end
Client -->|1. POST .zip| API
API -.->|2. Enqueue| Redis
Redis -.->|3. Consume| Worker
Worker ==>|4. Execute Graph| AST
AST --> Planner
Planner -->|async dispatch| Bug
Planner -->|async dispatch| Sec
Planner -->|async dispatch| Style
Bug --> Synth
Sec --> Synth
Style --> Synth
Synth ==>|5. Final Markdown| Worker
Worker -.->|6. Store Result| Redis
Client -.->|7. GET /results/id| API
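The numbered flow in the diagram can be sketched with the standard library alone. In this illustration an in-process `queue.Queue` stands in for the Redis broker and a daemon thread stands in for the Celery worker. All function names and the `PENDING` / `SUCCESS` / `FAILURE` / `UNKNOWN` status values are assumptions made for the example, not the project's actual API.

```python
import queue
import threading
import uuid

task_queue: queue.Queue = queue.Queue()  # stand-in for the Redis broker
results: dict[str, dict] = {}            # stand-in for the result store


def submit_review(payload: str) -> dict:
    """Steps 1-2: accept the upload, enqueue it, and return immediately."""
    task_id = str(uuid.uuid4())
    results[task_id] = {"status": "PENDING"}
    task_queue.put((task_id, payload))
    return {"task_id": task_id, "status": "Task submitted successfully."}


def run_review_graph(payload: str) -> str:
    """Placeholder for the slow agent graph (steps 4-5)."""
    return f"# Review\n\nReviewed {len(payload)} characters."


def worker() -> None:
    """Steps 3 and 6: consume tasks and store each result."""
    while True:
        task_id, payload = task_queue.get()
        try:
            results[task_id] = {"status": "SUCCESS", "report": run_review_graph(payload)}
        except Exception as exc:  # keep the worker alive if one review fails
            results[task_id] = {"status": "FAILURE", "error": str(exc)}
        finally:
            task_queue.task_done()


def get_result(task_id: str) -> dict:
    """Step 7: the client looks up the outcome by task id."""
    return results.get(task_id, {"status": "UNKNOWN"})


threading.Thread(target=worker, daemon=True).start()
ticket = submit_review("print('hello')")
task_queue.join()  # demo only; a real client would poll the results endpoint
final = get_result(ticket["task_id"])
```

The point of the split is that `submit_review` never waits on `run_review_graph`, so the web layer stays responsive no matter how long a review takes.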
The system utilizes a directed acyclic graph (DAG) to coordinate responsibilities rather than relying on a single, easily confused LLM prompt:
- The Planner: 🧭 Reads the initial diff and maps out a strategic review plan. Only activates the necessary downstream agents to save tokens and time.
- Bug Detector: 🐛 Deep-dives into logic to find edge cases, off-by-one errors, and runtime exceptions.
- Security Scanner: 🔐 Acts as an AppSec engineer, aggressively hunting for OWASP Top 10 vulnerabilities and hardcoded secrets.
- Style Checker: 👔 Checks PEP 8 (or language-specific) guidelines, variable naming conventions, and linting rules.
- The Synthesizer (Lead Dev): 🎯 Consolidates all agent reports, filters out contradictory hallucinations, and formats a beautiful, actionable Markdown report for the final output.
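The plan, fan-out, and synthesize shape described above can be illustrated without LangGraph or any LLM. The sketch below uses plain `asyncio` with toy string heuristics in place of model calls. Every function name and heuristic in it is hypothetical, and the real graph presumably wires these roles together through LangGraph nodes instead.

```python
import asyncio


# Toy agent stubs: each inspects the diff text and returns a list of findings.
async def bug_detector(diff: str) -> list[str]:
    return ["possible off-by-one in range()"] if "range(" in diff else []


async def security_scanner(diff: str) -> list[str]:
    return ["hardcoded secret"] if "API_KEY =" in diff else []


async def style_checker(diff: str) -> list[str]:
    too_long = any(len(line) > 79 for line in diff.splitlines())
    return ["line longer than 79 characters"] if too_long else []


AGENTS = {"bug": bug_detector, "security": security_scanner, "style": style_checker}


def plan(changed_files: list[str]) -> list[str]:
    """Planner: activate only the agents this change needs (toy rule, not an LLM)."""
    if all(name.endswith(".md") for name in changed_files):
        return ["style"]
    return ["bug", "security", "style"]


def synthesize(reports: dict[str, list[str]]) -> str:
    """Synthesizer: merge agent reports into one Markdown summary."""
    lines = ["## Review summary"]
    for name, findings in reports.items():
        for finding in dict.fromkeys(findings):  # drop duplicates, keep order
            lines.append(f"- [{name}] {finding}")
    if len(lines) == 1:
        lines.append("- No issues found.")
    return "\n".join(lines)


async def review(diff: str, changed_files: list[str]) -> str:
    selected = plan(changed_files)
    # Fan out: the selected agents run concurrently, then their reports are merged.
    reports = await asyncio.gather(*(AGENTS[name](diff) for name in selected))
    return synthesize(dict(zip(selected, reports)))


DIFF = 'API_KEY = "sk-test"\nfor i in range(len(items) + 1):\n    print(items[i])\n'
report = asyncio.run(review(DIFF, ["app/main.py"]))
```

For a documentation-only change, `plan` returns just `["style"]`, which is the token-saving behaviour the Planner bullet describes.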
- Docker & Docker Compose
- OpenAI / Anthropic API Key (configurable via environment variables)
1. Clone the repository

       git clone https://github.com/yourusername/autoreviewer.git
       cd autoreviewer

2. Configure Environment Variables

       cp .env.example .env
       # Add your LLM keys (e.g., OPENAI_API_KEY) and LangSmith API keys to .env

3. Spin up the stack

       docker-compose up --build -d

   This starts the FastAPI server on port `8000`, the Redis broker, and the Celery background workers.
For larger deployments, run AutoReviewer on Kubernetes. The manifests configure horizontally scalable API instances and Celery workers.
1. Configure Secrets

       cp k8s/secret.yaml.example k8s/secret.yaml
       # Edit secret.yaml with your OPENAI_API_KEY
       kubectl apply -f k8s/secret.yaml

2. Apply Manifests

       kubectl apply -f k8s/manifests.yaml

   This deploys Redis, the FastAPI LoadBalancer, and the Celery Worker deployment.
1. Submit a repository for review:

       curl -X POST "http://localhost:8000/review" \
         -H "accept: application/json" \
         -H "Content-Type: multipart/form-data" \
         -F "file=@/path/to/your/repo.zip"

   Response:

       {
         "task_id": "b1a2c3d4-e5f6-7890-abcd-ef1234567890",
         "status": "Task submitted successfully."
       }

2. Poll for the final AI synthesis:

       curl -X GET "http://localhost:8000/results/b1a2c3d4-e5f6-7890-abcd-ef1234567890"

We treat our AI prompts as actual software. This repository features an aggressive CI/CD pipeline and mathematical LLM evaluations.
- Test Coverage: Maintained at >85% with `pytest` and `pytest-mock`, heavily utilizing Dependency Injection to mock LLM calls.
- CI/CD: GitHub Actions automatically tests every PR.
- Prompt Evals: Run `python eval/mlflow_evaluate.py` to trigger local MLflow deployments. It uses LLM-as-a-judge against a golden dataset of past pull requests to ensure our Bug Detector and Security Scanners aren't dropping in accuracy when we tweak the system prompt.
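As a rough illustration of how dependency injection and a golden dataset fit together, the sketch below scores two prompt versions with injected fakes in place of real LLM calls. It does not use MLflow and is not the contents of `eval/mlflow_evaluate.py`. The `GoldenCase` shape, the function names, and the 0.02 tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    diff: str      # historical pull request diff
    expected: str  # finding a human reviewer confirmed


def evaluate_prompt(
    system_prompt: str,
    cases: list[GoldenCase],
    run_agent: Callable[[str, str], str],
    judge: Callable[[str, str], float],
) -> float:
    """Mean judge score (0.0 to 1.0) for one prompt version over the golden set."""
    return mean(judge(run_agent(system_prompt, case.diff), case.expected) for case in cases)


def passes_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Fail when the candidate prompt scores noticeably below the baseline."""
    return candidate >= baseline - tolerance


# Deterministic fakes, injected where the real agent and judge LLM calls would go.
def fake_agent(system_prompt: str, diff: str) -> str:
    if "security" in system_prompt and "SELECT" in diff:
        return "SQL injection"
    return "no findings"


def fake_judge(answer: str, expected: str) -> float:
    return 1.0 if answer == expected else 0.0


GOLDEN = [
    GoldenCase(diff='cursor.execute("SELECT * FROM users WHERE id=" + uid)', expected="SQL injection"),
    GoldenCase(diff="x = 1", expected="no findings"),
]

baseline = evaluate_prompt("You are a security reviewer.", GOLDEN, fake_agent, fake_judge)
candidate = evaluate_prompt("You are a helpful reviewer.", GOLDEN, fake_agent, fake_judge)
```

Because `run_agent` and `judge` are plain parameters, unit tests can pass in fakes like these while the evaluation script passes in real model-backed callables.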
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request