AutoJudge: Programming Problem Difficulty Prediction System

A comprehensive system for automatically classifying programming problems into difficulty categories (Easy, Medium, Hard) and predicting numerical difficulty scores on a 1-10 scale.

🎥 Demo Video

Watch Live Demo - Complete system demonstration showing real-time predictions and technical implementation.

Overview

AutoJudge analyzes programming problem descriptions using natural language processing and statistical learning techniques to predict their difficulty. The system is designed for competitive programming platforms, educational institutions, and coding assessment services.

Performance Metrics

Classification Results

Overall Accuracy: 55.0%
Dataset Size: 4,112 programming problems
Training/Test Split: 3,289 / 823 samples

Confusion Matrix

                 Predicted
Actual      Easy  Medium  Hard   Total
Easy         69     49     35     153
Medium       36    107    138     281
Hard         25     87    277     389
Total       130    243    450     823

Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Easy | 0.531 | 0.451 | 0.488 | 153 |
| Medium | 0.440 | 0.381 | 0.408 | 281 |
| Hard | 0.616 | 0.712 | 0.660 | 389 |

Weighted Average: Precision=0.540, Recall=0.550, F1-Score=0.542
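
The per-class figures can be re-derived from the confusion matrix. The sketch below is not the project's evaluation code; it only assumes the matrix is laid out with actual classes as rows and predicted classes as columns, as in the table above. The names `MATRIX`, `per_class_metrics`, and `accuracy` are illustrative.

```python
# Confusion matrix from the table above: rows = actual, columns = predicted.
LABELS = ["Easy", "Medium", "Hard"]
MATRIX = [
    [69, 49, 35],
    [36, 107, 138],
    [25, 87, 277],
]


def per_class_metrics(matrix):
    """Return {class index: (precision, recall, f1, support)}."""
    n = len(matrix)
    metrics = {}
    for i in range(n):
        true_pos = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column total
        support = sum(matrix[i])                          # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[i] = (precision, recall, f1, support)
    return metrics


def accuracy(matrix):
    """Fraction of samples on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    for i, (p, r, f1, support) in per_class_metrics(MATRIX).items():
        print(f"{LABELS[i]:<7} P={p:.3f} R={r:.3f} F1={f1:.3f} n={support}")
    print(f"Accuracy: {accuracy(MATRIX):.3f}")
```

Rounded to three decimals, these values agree with the reported precision, recall, F1, and overall accuracy.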

Regression Performance

Mean Absolute Error (MAE): 1.735 points
Root Mean Square Error (RMSE): 2.071 points
R² Score: 0.116
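
To put the R² value in context: since R² = 1 − MSE / Var(y), the reported RMSE and R² together imply the spread of the test-set scores, which is also the RMSE of a baseline that always predicts the mean. The sketch below is a back-of-the-envelope check, assuming R² was computed in the standard way on the same test split; the function names are illustrative.

```python
import math

RMSE = 2.071  # reported root mean square error
R2 = 0.116    # reported R² score


def implied_target_std(rmse, r2):
    """Standard deviation of the true scores implied by RMSE and R²."""
    return rmse / math.sqrt(1.0 - r2)


def r2_from(rmse, target_std):
    """R² of a model with the given RMSE on targets with the given std."""
    return 1.0 - (rmse / target_std) ** 2


if __name__ == "__main__":
    baseline = implied_target_std(RMSE, R2)
    print(f"Implied RMSE of a predict-the-mean baseline: {baseline:.2f}")
    print(f"Improvement over that baseline: {baseline - RMSE:.2f} points")
```

Under these assumptions, the regressor improves on a constant prediction by only a small fraction of a point.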

Score Prediction by Class

| Class | MAE | Actual Score | Predicted Score |
|---|---|---|---|
| Easy | 2.267 | 1.99 ± 0.43 | 4.25 ± 0.54 |
| Medium | 0.817 | 4.13 ± 0.75 | 4.71 ± 0.52 |
| Hard | 2.189 | 7.11 ± 1.13 | 4.92 ± 0.48 |
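
The overall MAE is the support-weighted mean of the per-class MAEs. The sketch below checks this, assuming the regression was evaluated on the same 823-sample test split as the classifier, so the per-class supports (153, 281, 389) carry over; `PER_CLASS` and `overall_mae` are illustrative names.

```python
# (per-class MAE, number of test samples); supports are taken from the
# classification results and assumed to apply to the regression test set.
PER_CLASS = {
    "Easy": (2.267, 153),
    "Medium": (0.817, 281),
    "Hard": (2.189, 389),
}


def overall_mae(per_class):
    """Support-weighted mean of the per-class mean absolute errors."""
    total = sum(n for _, n in per_class.values())
    return sum(mae * n for mae, n in per_class.values()) / total


if __name__ == "__main__":
    print(f"Overall MAE: {overall_mae(PER_CLASS):.3f}")
```

The result matches the reported overall MAE of 1.735, and it shows that most of the error comes from the Easy and Hard classes.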

Dataset Statistics

Total Problems: 4,112
Class Distribution:
- Hard: 1,941 problems (47.2%)
- Medium: 1,405 problems (34.2%)
- Easy: 766 problems (18.6%)
Score Range: 1.1 - 9.7
Mean Score: 5.11 ± 2.18
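
The Logistic Regression component uses balanced class weights, which counteract this imbalance. The sketch below applies the usual "balanced" heuristic, `n_samples / (n_classes * class_count)`, to the full-dataset counts above. In practice the weights would be computed on the training split only, so the exact values here are illustrative.

```python
# Class counts for the full dataset, as listed above.
COUNTS = {"Easy": 766, "Medium": 1405, "Hard": 1941}


def balanced_class_weights(counts):
    """Weight each class inversely to its frequency."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


if __name__ == "__main__":
    for label, weight in balanced_class_weights(COUNTS).items():
        print(f"{label:<7} weight = {weight:.3f}")
```

Easy problems receive roughly 2.5 times the weight of Hard problems, so errors on the rarest class cost more during training.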

Technical Architecture

Feature Engineering

Total Features: 3,015
TF-IDF Features: 3,000 (selected from 4,000 using chi-square feature selection)
Custom Features: 15 domain-specific features

Custom Features

Text Metrics: Length, word count, vocabulary richness
Algorithm Indicators: Graph algorithms, dynamic programming, data structures
Complexity Markers: Sorting/searching, string processing, mathematical content
Problem Characteristics: Constraints, optimization keywords, test case patterns
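
As a rough illustration of how features like these could be computed, here is a minimal sketch. The keyword lists, the `vocabularyRichness` definition (unique words divided by total words), and the function name are all assumptions made for the example; the project's actual vocabularies and formulas are not documented here.

```python
import re

# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "graphAlgorithms": ("graph", "shortest path", "dijkstra", "bfs", "dfs"),
    "dynamicProgramming": ("dynamic programming", "memoization", "knapsack"),
    "dataStructures": ("stack", "queue", "heap", "segment tree", "trie"),
    "sortingSearching": ("sort", "binary search"),
    "stringProcessing": ("string", "substring", "palindrome"),
}


def extract_custom_features(description):
    """Return text metrics plus one 0/1 indicator per keyword group."""
    text = re.sub(r"\s+", " ", description.lower()).strip()
    words = re.findall(r"[a-z0-9']+", text)
    features = {
        "textLength": float(len(text)),
        "wordCount": float(len(words)),
        # Assumed definition: share of distinct words in the description.
        "vocabularyRichness": len(set(words)) / len(words) if words else 0.0,
    }
    for name, terms in KEYWORDS.items():
        # Match at a word start so "sort" hits "sorted" but "graph"
        # does not hit "paragraph".
        hit = any(re.search(r"\b" + re.escape(term), text) for term in terms)
        features[name] = 1.0 if hit else 0.0
    return features


if __name__ == "__main__":
    print(extract_custom_features("Find the shortest path in a weighted graph"))
```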

System Architecture

Classification System

Type: Voting Classifier (Ensemble)
Components:
- Logistic Regression (C=2.0, balanced class weights)
- Random Forest Classifier (400 estimators, max_depth=35)
- Gradient Boosting Classifier (300 estimators, max_depth=12)
Voting Strategy: Soft voting
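
The configuration above could be expressed in scikit-learn roughly as follows. This is a sketch, not the project's training code: the estimator names (`lr`, `rf`, `gb`), `max_iter`, and `random_state` are assumptions, and all other hyperparameters are left at library defaults.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression


def build_classifier(random_state=42):
    """Soft-voting ensemble with the hyperparameters listed above."""
    return VotingClassifier(
        estimators=[
            # max_iter is an assumption to help convergence on sparse text.
            ("lr", LogisticRegression(C=2.0, class_weight="balanced",
                                      max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=400, max_depth=35,
                                          random_state=random_state)),
            ("gb", GradientBoostingClassifier(n_estimators=300, max_depth=12,
                                              random_state=random_state)),
        ],
        voting="soft",  # average predicted class probabilities
    )
```

With soft voting, the ensemble averages the class probabilities of the three models and picks the class with the highest mean, so every component must support `predict_proba`.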

Regression System

Type: Random Forest Regressor
Parameters:
- n_estimators: 350
- max_depth: 30
- min_samples_split: 5
- min_samples_leaf: 2
- max_features: sqrt
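
A minimal scikit-learn sketch of this regressor is shown below. `random_state`, `n_jobs`, and the `clip_score` helper (which keeps predictions on the 1-10 scale) are assumptions added for the example, not documented behavior.

```python
from sklearn.ensemble import RandomForestRegressor


def build_regressor(random_state=42):
    """Random forest regressor with the parameters listed above."""
    return RandomForestRegressor(
        n_estimators=350,
        max_depth=30,
        min_samples_split=5,
        min_samples_leaf=2,
        max_features="sqrt",
        random_state=random_state,  # assumption: fixed seed for repeatability
        n_jobs=-1,                  # assumption: use all CPU cores
    )


def clip_score(score, low=1.0, high=10.0):
    """Hypothetical helper: keep a predicted score on the 1-10 scale."""
    return max(low, min(high, float(score)))
```

A random forest predicts by averaging training targets, so its outputs stay within the range of scores seen in training and tend to sit closer to the mean than the true values do.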

Text Processing Pipeline

Preprocessing: Lowercasing, whitespace normalization, abbreviation expansion
TF-IDF Vectorization:
- Max features: 4,000
- N-gram range: (1, 3)
- Stop words: English
- Min/Max document frequency: 2 / 0.85
Feature Selection: Chi-square test (k=3,000)
Feature Scaling: StandardScaler for custom features
Feature Combination: Sparse matrix concatenation
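
The steps above can be sketched with scikit-learn and SciPy as follows. This is an illustrative reconstruction under stated assumptions, not the project's code: the function name, argument layout, and the `min(k, n_features)` guard for small corpora are assumptions, and the custom preprocessing step (abbreviation expansion) is omitted.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import StandardScaler


def build_feature_matrix(texts, labels, custom, max_features=4000, k=3000):
    """Fit TF-IDF -> chi-square selection -> scaling -> concatenation.

    texts:  list of problem descriptions
    labels: class ids, one per description
    custom: dense array of shape (n_samples, n_custom_features)
    """
    vectorizer = TfidfVectorizer(
        lowercase=True,
        max_features=max_features,
        ngram_range=(1, 3),
        stop_words="english",
        min_df=2,
        max_df=0.85,
    )
    tfidf = vectorizer.fit_transform(texts)

    # chi2 requires non-negative inputs, which TF-IDF values satisfy.
    selector = SelectKBest(chi2, k=min(k, tfidf.shape[1]))
    selected = selector.fit_transform(tfidf, labels)

    # Only the dense custom features are standardized.
    scaler = StandardScaler()
    scaled = scaler.fit_transform(np.asarray(custom, dtype=float))

    features = hstack([selected, csr_matrix(scaled)], format="csr")
    return features, (vectorizer, selector, scaler)
```

The fitted vectorizer, selector, and scaler are returned so that the same transformations can be applied to new descriptions at prediction time with `transform` rather than `fit_transform`.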

API Documentation

Prediction Endpoint

POST /predict
Content-Type: application/json

{
  "description": "Find the maximum sum of a subarray using dynamic programming"
}

Response:

{
  "class": "medium",
  "score": 5.2,
  "confidence": 0.678,
  "reliable": true,
  "features": {
    "textLength": 65,
    "wordCount": 11,
    "dynamicProgramming": 1.0,
    "graphAlgorithms": 0.0,
    "dataStructures": 0.0,
    "sortingSearching": 0.0,
    "stringProcessing": 0.0,
    "basicMath": 0.0,
    "advancedMath": 0.0,
    "complexityNotation": 0.0
  }
}
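
On the client side, a response like this could be validated and wrapped in a small typed object. The sketch below assumes only the fields shown in the example response; the `Prediction` class and `parse_prediction` function are illustrative names, not part of the project.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Prediction:
    difficulty: str   # the "class" field ("class" is a Python keyword)
    score: float
    confidence: float
    reliable: bool
    features: dict = field(default_factory=dict)


def parse_prediction(payload):
    """Build a Prediction from a JSON string/bytes or an already-parsed dict."""
    data = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    missing = {"class", "score", "confidence", "reliable"} - data.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return Prediction(
        difficulty=str(data["class"]),
        score=float(data["score"]),
        confidence=float(data["confidence"]),
        reliable=bool(data["reliable"]),
        features=dict(data.get("features", {})),
    )
```

For example, `parse_prediction(response.text).difficulty` would give `"medium"` for the response above.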

Structured Input Format

POST /predict/structured
Content-Type: application/json

{
  "description": "Find the shortest path in a weighted graph",
  "input_desc": "Graph represented as adjacency matrix with weights",
  "output_desc": "Array of distances from source to all vertices"
}
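
A structured request can be built with the standard library alone. In the sketch below, the function names and the default base URL (the local address used elsewhere in this README) are assumptions, and the response is assumed to be JSON, since its exact shape is not documented for this endpoint.

```python
import json
import urllib.request


def build_structured_request(description, input_desc, output_desc,
                             base_url="http://localhost:5000"):
    """Create a POST request for /predict/structured without sending it."""
    body = json.dumps({
        "description": description,
        "input_desc": input_desc,
        "output_desc": output_desc,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict/structured",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_structured_request(
        "Find the shortest path in a weighted graph",
        "Graph represented as adjacency matrix with weights",
        "Array of distances from source to all vertices",
    )
    print(send(req))  # requires the server to be running
```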

Health Check

GET /health

Returns system status and component health information.
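
A client or deployment script might poll this endpoint until the service is ready. The sketch below assumes that a healthy service answers with HTTP 200; the response body is not documented here, so it is not inspected. `http_probe` and `wait_until_healthy` are illustrative names.

```python
import time
import urllib.error
import urllib.request


def http_probe(url="http://localhost:5000/health", timeout=2.0):
    """Return True if the endpoint answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe=http_probe, attempts=10, delay=1.0):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("healthy" if wait_until_healthy() else "service did not come up")
```

Passing the probe as an argument keeps the retry loop testable without a running server.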

Installation and Setup

Prerequisites

Python 3.8+
Flask 2.0+
scikit-learn 1.0+
pandas, numpy, scipy

Local Installation

# Clone the repository
git clone https://github.com/oldhero07/Autojudge.git
cd Autojudge

# Install dependencies
pip install -r flask_app/requirements.txt

# Run the application
cd flask_app
python app.py

The application will start on http://localhost:5000

Docker Deployment

# Build the image
docker build -t autojudge .

# Run the container
docker run -p 5000:5000 autojudge

Docker Compose

# Start all services
docker-compose up -d

Usage Examples

Web Interface

Navigate to http://localhost:5000 to access the web interface for problem submission and prediction.

API Usage

import requests

# Predict problem difficulty
response = requests.post(
    'http://localhost:5000/predict',
    json={'description': 'Implement a binary search algorithm to find an element in a sorted array'},
    timeout=10,  # avoid hanging if the server is unreachable
)
response.raise_for_status()  # fail loudly on HTTP errors

result = response.json()
print(f"Class: {result['class']}")
print(f"Score: {result['score']}/10")
print(f"Confidence: {result['confidence']:.3f}")

Batch Processing

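import requests

problems = [
    "Sort an array of integers",
    "Find shortest path using Dijkstra's algorithm",
    "Implement suffix array with linear time complexity"
]

# Reuse one HTTP connection for all requests
with requests.Session() as session:
    for problem in problems:
        response = session.post('http://localhost:5000/predict',
                                json={'description': problem}, timeout=10)
        response.raise_for_status()
        result = response.json()
        # Add an ellipsis only when the description is actually truncated
        label = problem if len(problem) <= 50 else problem[:50] + '...'
        print(f"{label} -> {result['class']} ({result['score']}/10)")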

System Limitations

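Class Imbalance: Predictions skew toward "Hard". On the test set, 450 of 823 predictions (54.7%) are Hard although only 389 problems (47.3%) actually are, and 138 of 281 Medium problems are misclassified as Hard. This likely reflects the dataset imbalance (47.2% Hard problems overall).
Score Regression: The low R² score (0.116) means the regressor explains little of the variance in scores. Mean predicted scores fall between 4.25 and 4.92 for every class, while mean actual scores range from 1.99 (Easy) to 7.11 (Hard), so Easy scores are overestimated and Hard scores underestimated.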
Domain Specificity: Optimized for competitive programming problems
Language Dependency: Designed for English problem descriptions

Performance Considerations

Inference Time: ~50ms per prediction
Memory Usage: ~200MB for loaded components
Throughput: ~20 requests/second on standard hardware
Storage: ~15MB serialized components

Project Structure

AutoJudge/
├── README.md                    # Project documentation  
├── PROJECT_REPORT.md           # Comprehensive technical report
├── DEPLOYMENT.md               # Deployment guide
├── Dockerfile                  # Container configuration
├── docker-compose.yml          # Multi-service setup
├── flask_app/                  # Flask application
│   ├── app.py                 # Main application (1,393 lines)
│   ├── requirements.txt       # Python dependencies
│   ├── models/               # Trained ML models
│   ├── templates/            # HTML templates
│   └── static/              # Static assets
├── problems_data.jsonl        # Training dataset (4,112 problems)
├── test_api.py               # API validation tests
└── docs/                    # Additional documentation


## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/improvement`)
3. Commit your changes (`git commit -am 'Add new feature'`)
4. Push to the branch (`git push origin feature/improvement`)
5. Create a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Citation

If you use AutoJudge in your research or applications, please cite:

```bibtex
@software{autojudge2026,
  title={AutoJudge: Programming Problem Difficulty Prediction System},
  author={oldhero07},
  year={2026},
  url={https://github.com/oldhero07/Autojudge}
}
```

Acknowledgments

Dataset sourced from competitive programming platforms
Built with scikit-learn, Flask, and modern NLP techniques
Inspired by the need for automated problem categorization in educational technology
