| title | Code Review Env |
|---|---|
| emoji | π |
| colorFrom | blue |
| colorTo | purple |
| sdk | docker |
| pinned | false |
An OpenEnv-compliant Reinforcement Learning environment where AI agents learn to perform Python code review.
Built for the Scaler x Hugging Face OpenEnv Hackathon.
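[HuggingFace Space](https://huggingface.co/spaces/shiva0999/code-review-env) • [GitHub](https://github.com/shivakewat1/Code-Review-env) • [Live API Docs](https://shiva0999-code-review-env.hf.space/docs)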
| Name | Role |
|---|---|
| Shiva Kewat | Environment Architecture & API |
| Aayush Pandey | Graders & Reward Design |
| Roshan Kumar Singh | Tasks & Inference Pipeline |
An AI agent is given real Python source files and must act as a code reviewer, identifying bugs, lint violations, and security vulnerabilities. For every action the agent takes, it receives a structured reward signal between 0.0 and 1.0.
The environment is fully OpenEnv-compliant with a REST API, deterministic graders, and three progressively harder tasks.
```
Agent receives Python code
        ↓
Agent identifies issues (line, type, severity, fix)
        ↓
Grader evaluates: location accuracy + explanation + fix validity
        ↓
Reward score returned (0.0 to 1.0)
```
| Resource | URL |
|---|---|
| HuggingFace Space | https://huggingface.co/spaces/shiva0999/code-review-env |
| GitHub Repo | https://github.com/shivakewat1/Code-Review-env |
| Live API | https://shiva0999-code-review-env.hf.space |
| Swagger UI | https://shiva0999-code-review-env.hf.space/docs |
**Task 1: `lint-fix` (Easy)**

- Input: Python file with unused imports, bad naming (camelCase, non-CapWords), missing docstrings, and spacing violations
- Agent must: Return a list of issues with line numbers, types, and fix suggestions
- Grader: Runs `pylint` and computes precision/recall F1 against the real `pylint` output
- Baseline score: 0.72
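The F1 step can be illustrated with a minimal sketch. It assumes that reported issues are matched to `pylint` findings by line number only. The function name `lint_f1` is hypothetical, and the real grader may also compare message types.

```python
def lint_f1(reported_lines: set[int], pylint_lines: set[int]) -> float:
    """F1 between lines flagged by the agent and lines flagged by pylint.

    Hypothetical sketch: a reported issue counts as a true positive when
    pylint also flags that line. Empty inputs score 0.0.
    """
    if not reported_lines or not pylint_lines:
        return 0.0
    true_positives = len(reported_lines & pylint_lines)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(reported_lines)  # how much of the report is real
    recall = true_positives / len(pylint_lines)       # how much of pylint was found
    return 2 * precision * recall / (precision + recall)
```

Take an agent that reports lines `{1, 3, 7}` while `pylint` flags `{1, 3, 5, 9}`. That gives precision 2/3 and recall 1/2, so F1 is 4/7 ≈ 0.57. In practice, `pylint_lines` could be collected from `pylint --output-format=json`, where each message carries a `line` field.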
**Task 2: `bug-detect` (Medium)**

- Input: Python file with 4 subtle logical bugs: off-by-one errors, wrong operators (`=+` vs `+=`), float division used as an index, and out-of-bounds indexing
- Agent must: Identify the exact line, explain the bug, and suggest corrected code
- Grader: AST-verified line matching + keyword-based explanation check + syntax validation of the fix
- Baseline score: 0.54
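The explanation and fix checks can be approximated with the standard library. This is a sketch, not the repository's grader. The helper names are hypothetical, and so is the trick of appending a placeholder `pass` so that a lone block header parses.

```python
import ast
import textwrap


def fix_is_valid(suggestion: str) -> bool:
    """Return True if the suggested fix is syntactically valid Python.

    Hypothetical sketch: a lone block header such as
    "for i in range(1, len(numbers)):" does not parse by itself,
    so a placeholder `pass` body is appended before parsing.
    """
    source = textwrap.dedent(suggestion).strip()
    if source.endswith(":"):
        source += "\n    pass"
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def explanation_matches(description: str, keywords: list[str]) -> bool:
    """Keyword check: does the explanation mention any expected term?"""
    text = description.lower()
    return any(keyword.lower() in text for keyword in keywords)
```

Syntax validation alone cannot catch every bug in this task. `total =+ x` parses fine because it means `total = +x`, which is why line matching and the explanation check are needed alongside it.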
**Task 3: `security-audit` (Hard)**

- Input: Flask web app with SQL injection, hardcoded API keys, unsafe `eval()`, command injection via `os.system()`, and path traversal
- Agent must: List all vulnerabilities with severity (`low`/`medium`/`high`/`critical`) and a secure fix
- Grader: `bandit` static analysis + ground-truth cross-reference + severity tolerance check
- Baseline score: 0.38
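A severity tolerance check could look like the sketch below. The README does not say how wide the tolerance is, so the one-level default and the name `severity_within_tolerance` are assumptions.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def severity_within_tolerance(reported: str, expected: str, tolerance: int = 1) -> bool:
    """Hypothetical check: accept a severity within `tolerance` levels.

    Labels are compared case-insensitively; unknown labels are rejected.
    """
    reported, expected = reported.lower(), expected.lower()
    if reported not in SEVERITY_ORDER or expected not in SEVERITY_ORDER:
        return False
    distance = abs(SEVERITY_ORDER.index(reported) - SEVERITY_ORDER.index(expected))
    return distance <= tolerance
```

With a tolerance of one level, reporting `high` for a `critical` issue passes, but reporting `low` does not. `bandit` itself only reports `LOW`, `MEDIUM` and `HIGH` severities, so a `critical` label presumably comes from the ground-truth annotations.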
| Event | Score Delta |
|---|---|
| Correct line identified | +0.4 |
| Correct issue type | +0.3 |
| Valid fix suggestion (AST-verified) | +0.3 |
| Hallucinated non-existent issue | -0.1 |
All scores are deterministically clamped to `[0.0, 1.0]`; no randomness is involved.
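To make the reward table concrete, here is a sketch of per-issue scoring followed by clamping. Two parts are assumptions, not documented behavior:

- Type and fix points only count when the line matches.
- The step score is the sum over reported issues divided by the number of known issues.

The names `IssueMatch`, `issue_score` and `step_reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IssueMatch:
    """Outcome of comparing one reported issue with the ground truth."""
    matched_line: bool  # a known issue exists at the reported line
    correct_type: bool  # issue_type agrees with the known issue
    valid_fix: bool     # suggestion passed AST validation


def issue_score(match: IssueMatch) -> float:
    """Per-issue score using the deltas from the reward table."""
    if not match.matched_line:
        return -0.1  # hallucinated, non-existent issue
    score = 0.4  # correct line identified
    if match.correct_type:
        score += 0.3
    if match.valid_fix:
        score += 0.3
    return score


def step_reward(matches: list[IssueMatch], num_known_issues: int) -> float:
    """Hypothetical aggregation: normalize by known issues, then clamp to [0, 1]."""
    total = sum(issue_score(m) for m in matches)
    return min(1.0, max(0.0, total / max(num_known_issues, 1)))
```

Under these assumptions, a file with 4 known bugs, two fully correct reports and one hallucination scores (1.0 + 1.0 - 0.1) / 4 = 0.475. A report made only of hallucinations is clamped to 0.0.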
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Returns `{"status": "ok"}` |
| POST | `/reset` | Resets the environment and returns the initial Observation |
| POST | `/step` | Submits an Action and returns the Reward plus the next Observation |
| GET | `/state` | Returns the current task, step count, and cumulative reward |
| GET | `/docs` | Swagger UI |
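Scripted agents can call these endpoints with nothing but Python's standard library. The sketch below builds `urllib.request.Request` objects and sends them with `urlopen`. The helper names (`reset_request`, `step_request`, `send`) are hypothetical and not part of this repository. The body shape follows the `/step` request in this README: an object with an `issues` list.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://shiva0999-code-review-env.hf.space"


def reset_request(task: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /reset?task=<task> (no request body)."""
    query = urllib.parse.urlencode({"task": task})
    return urllib.request.Request(f"{base_url}/reset?{query}", method="POST")


def step_request(issues: list[dict], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /step with a JSON Action body: {"issues": [...]}."""
    body = json.dumps({"issues": issues}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Typical use is `observation = send(reset_request("bug-detect"))`, followed by `send(step_request([...]))` with one dict per issue. Building the request separately from sending it lets the payload be inspected without network access.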
Reset to a task:

```bash
curl -X POST "https://shiva0999-code-review-env.hf.space/reset?task=bug-detect"
```

Submit an action:

```bash
curl -X POST "https://shiva0999-code-review-env.hf.space/step" \
  -H "Content-Type: application/json" \
  -d '{
    "issues": [
      {
        "line": 6,
        "issue_type": "bug",
        "severity": "error",
        "description": "Off-by-one error: range goes out of bounds",
        "suggestion": "for i in range(1, len(numbers)):"
      }
    ]
  }'
```

Example response:

```json
{
  "observation": { "task_id": "bug-detect-001", "step_count": 1, ... },
  "reward": {
    "score": 0.72,
    "breakdown": { "line_score": 0.4, "explanation_score": 0.3, "fix_score": 0.3 },
    "feedback": "Found 2/4 known bugs. Fix suggestions were syntactically valid."
  },
  "done": false,
  "info": { "cumulative_reward": 0.72 }
}
```

Data models (Pydantic v2):

```python
from typing import List, Optional

from pydantic import BaseModel


class Observation(BaseModel):
    task_id: str
    code: str
    language: str = "python"
    task_description: str
    step_count: int
    previous_feedback: Optional[str]


# Issue is defined before Action because Action references it
class Issue(BaseModel):
    line: int
    issue_type: str  # "lint" | "bug" | "security"
    severity: str    # "info" | "warning" | "error" | "critical"
    description: str
    suggestion: str


class Action(BaseModel):
    issues: List[Issue]
```

| Component | Technology |
|---|---|
| Language | Python 3.11 |
| API Framework | FastAPI + Uvicorn |
| Data Validation | Pydantic v2 |
| Lint Grader | Pylint |
| Security Grader | Bandit |
| Bug Grader | AST parsing (stdlib) |
| LLM Client | OpenAI SDK (HF Router) |
| Containerization | Docker |
| Deployment | HuggingFace Spaces |
Run locally:

```bash
git clone https://github.com/shivakewat1/Code-Review-env
cd Code-Review-env
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

Open http://localhost:7860/docs for the Swagger UI.
Or run with Docker:

```bash
docker build -t code-review-env .
docker run -p 7860:7860 code-review-env
```

To run the inference agent against the live Space, use the commands below. They use Windows `cmd` syntax; on macOS/Linux, use `export` instead of `set`.

```bat
set HF_TOKEN=your_hf_token_here
set ENV_BASE_URL=https://shiva0999-code-review-env.hf.space
python inference.py
```

By default, the agent uses `Qwen/Qwen2.5-72B-Instruct` via the HuggingFace router. Override the model with:

```bat
set MODEL_NAME=meta-llama/Llama-3.3-70B-Instruct
```

Expected output format:

```
[START] task=lint-fix env=code-review-env model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action={...} reward=0.72 done=false error=null
[END] success=true steps=3 score=0.72 rewards=0.72,0.80,0.85
```
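To reproduce these log lines in a custom agent, a small set of formatters is enough. This sketch infers the format from the sample lines above:

- Booleans are lowercase.
- A missing error prints as `null`.
- Rewards use two decimals.

Serializing the action with `json.dumps` is an assumption, since the sample elides it as `{...}`. The function names are hypothetical.

```python
import json
from typing import Optional


def format_start(task: str, env: str, model: str) -> str:
    """[START] line, emitted once per task."""
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, action: dict, reward: float, done: bool, error: Optional[str]) -> str:
    """[STEP] line: lowercase booleans, missing error printed as null."""
    return (
        f"[STEP] step={step} action={json.dumps(action)} "
        f"reward={reward:.2f} done={str(done).lower()} "
        f"error={'null' if error is None else error}"
    )


def format_end(success: bool, steps: int, score: float, rewards: list) -> str:
    """[END] line: per-step rewards joined with commas, two decimals each."""
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={joined}"
```

`format_end` takes `score` as an argument instead of computing it. The sample does not say how the final score is derived from the per-step rewards.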
| Task | Difficulty | Model | Score | Steps |
|---|---|---|---|---|
| lint-fix | Easy | Qwen2.5-72B | 0.72 | 3 |
| bug-detect | Medium | Qwen2.5-72B | 0.54 | 5 |
| security-audit | Hard | Qwen2.5-72B | 0.38 | 6 |
- Runs on 2 vCPU, 8GB RAM
- All graders are fully deterministic (no randomness)
- Scores are always in `[0.0, 1.0]`
- `reset()` always returns a clean state
- All task code samples are hardcoded, so the environment itself needs no external APIs
- Max 8 steps per task, total inference runtime under 20 minutes
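The step cap can be respected with an episode loop like the one sketched here. It is not the repository's `inference.py`. The loop receives `reset`, `step` and `policy` as callables, so it runs without the live server. It expects `step` to return a dict shaped like the `/step` response, with `observation`, `reward.score` and `done`. The name `run_episode` is hypothetical.

```python
from typing import Any, Callable

MAX_STEPS = 8  # per-task step limit from the list above

Obs = dict[str, Any]


def run_episode(
    reset: Callable[[], Obs],
    step: Callable[[Obs], Obs],
    policy: Callable[[Obs], Obs],
    max_steps: int = MAX_STEPS,
) -> list[float]:
    """Run one episode and return the per-step reward scores.

    Stops early when the environment reports done=True, otherwise
    after max_steps actions.
    """
    observation = reset()
    rewards: list[float] = []
    for _ in range(max_steps):
        action = policy(observation)        # e.g. {"issues": [...]}
        result = step(action)               # shaped like the /step response
        rewards.append(result["reward"]["score"])
        observation = result["observation"]
        if result["done"]:
            break
    return rewards
```

An environment that never sets `done` still yields at most 8 rewards per task.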
```
code-review-env/
├── Dockerfile
├── openenv.yaml
├── pyproject.toml
├── requirements.txt
├── inference.py
├── server/
│   └── app.py              # OpenEnv entry point with main()
└── app/
    ├── main.py             # FastAPI routes
    ├── env.py              # Core RL environment logic
    ├── models.py           # Pydantic models
    ├── tasks/
    │   ├── task1_lint.py
    │   ├── task2_bugdetect.py
    │   └── task3_security.py
    └── graders/
        ├── grader1_lint.py
        ├── grader2_bug.py
        └── grader3_security.py
```
Built with dedication by Team Terminal Agents for the Scaler x HuggingFace OpenEnv Hackathon