Craft your own LoRA adapters with LoRA Craft - A web-based interface for fine-tuning pre-trained language models using GRPO (Group Relative Policy Optimization)
🌅 Sunset Notice: LoRA Craft is no longer actively maintained. This was a fun project but sometimes the world moves a little too fast. I encourage anyone interested to check out Unsloth Studio.
- Overview
- Prerequisites
- Installation
- Quick Start
- User Guide
- Key Concepts
- Troubleshooting
- Technical Reference
- Appendix
LoRA Craft is a web-based application that enables fine-tuning of large language models without requiring extensive machine learning expertise. The application uses GRPO (Group Relative Policy Optimization) to train models through reinforcement learning with custom reward functions.
- Fine-tune models for specific tasks: Math reasoning, code generation, question answering, and general instruction-following
- Use pre-configured datasets: Access popular datasets like Alpaca, GSM8K, OpenMath, and Code Alpaca
- Upload custom datasets: Train on your own data in JSON, JSONL, CSV, or Parquet formats
- Monitor training in real-time: Track loss, rewards, KL divergence, and other metrics through interactive charts
- Export trained models: Convert models to GGUF format for use with llama.cpp, Ollama, or LM Studio
- No-code interface: Configure and train models through a web browser
- Preset reward functions: Choose from pre-built reward functions for common tasks (math, coding, reasoning)
- Real-time monitoring: WebSocket-based live updates during training
- Configuration management: Save and load training configurations for reproducibility
- GPU monitoring: Track VRAM usage and system resources during training
GPU Mode (Recommended):
- GPU: NVIDIA GPU with CUDA support
- 8GB VRAM: Small models (0.6B - 1.7B parameters)
- 12GB VRAM: Medium models (3B - 4B parameters)
- 16GB+ VRAM: Large models (7B - 8B parameters)
- RAM: Minimum 16GB system memory (32GB+ recommended)
- Storage: At least 64GB free disk space for models and datasets
CPU-Only Mode (Supported):
- CPU: Modern multi-core processor (4+ cores)
- RAM: Minimum 16GB system memory (32GB+ strongly recommended)
- Storage: At least 64GB free disk space
- Note: Training will be 5-10x slower than GPU mode. Best for development, testing, or small-scale training.
- Operating System: Windows, Linux, or macOS
- Python: Version 3.11 or higher (for native installation)
- CUDA: CUDA Toolkit 12.8 or compatible version (GPU mode only)
- Git: For cloning the repository
- Docker: For Docker installation (optional but recommended)
Fast Windows Install for CUDA
pip install -r requirements.txt
pip install torch==2.8.0+cu128 --index-url https://download.pytorch.org/whl/cu128
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
Choose your installation method:
- Docker Installation (Recommended) - Easiest setup, works on any system
- Native Installation - Direct installation on your system
Docker provides the easiest and most reliable way to run LoRA Craft with all dependencies pre-configured. The Docker setup works on Windows (with WSL2), Linux, and macOS.
- Zero dependency management - No need to install Python, CUDA, or PyTorch manually
- Consistent environment - Works the same on any system
- Isolated installation - Won't conflict with other Python projects
- Easy updates - Pull latest image and restart
- Production-ready - Includes health checks, logging, and volume management
- CPU/GPU flexibility - Automatically detects and uses available hardware
Required (All Systems):
- Docker 20.10+ and Docker Compose 2.0+
For GPU Mode (Optional):
- NVIDIA GPU with CUDA support
- NVIDIA Driver 535+ installed on host
- NVIDIA Container Toolkit - See DOCKER-QUICKSTART.md for installation
For CPU-Only Mode:
- No additional setup needed - Docker is sufficient
- Recommended: 16GB+ RAM (32GB+ preferred)
# Clone the repository
git clone https://github.com/jwest33/lora_craft.git
cd lora_craft
# Optional: Configure environment
cp .env.example .env
# Edit .env to customize PORT, FLASK_SECRET_KEY, etc.
# Start the application (builds image on first run)
docker compose up -d
# View logs to verify startup
docker compose logs -f
# Access the web interface
# Open browser to http://localhost:5000
First startup takes 5-10 minutes to download the base image and install dependencies. Subsequent starts are nearly instant.
The LoRA Craft Docker image provides:
- NVIDIA CUDA 12.8 runtime with cuDNN 9.7 (works on both GPU and CPU)
- Python 3.11 with all dependencies pre-installed
- PyTorch 2.8.0 with CUDA 12.8 support
- nvidia-smi utility for GPU monitoring (when GPU available)
- Automatic GPU/CPU detection on startup
- CPU fallback when GPU not available
- Persistent volumes for models, datasets, configs, and outputs
- Health checks to monitor application status
- Optimized training libraries (Unsloth, Transformers, PEFT, TRL)
Image size: ~20GB (includes all ML libraries)
Docker automatically creates persistent volumes for your data:
| Host Directory | Container Path | Purpose |
|---|---|---|
| ./outputs/ | /app/outputs | Trained model checkpoints |
| ./exports/ | /app/exports | Exported GGUF models |
| ./configs/ | /app/configs | Saved training configurations |
| ./uploads/ | /app/uploads | Uploaded dataset files |
| ./logs/ | /app/logs | Application logs |
Named volumes (stored in Docker):
- huggingface-cache - Downloaded models from HuggingFace
- transformers-cache - Transformers model cache
- datasets-cache - HuggingFace datasets cache
- torch-cache - PyTorch cache
Backup your data: Simply copy the host directories listed above.
# Start application
docker compose up -d
# Stop application (preserves data)
docker compose down
# View live logs
docker compose logs -f
# Restart after changes
docker compose restart
# Check container status
docker compose ps
# Execute commands in container
docker compose exec lora-craft bash
# Check GPU inside container
docker compose exec lora-craft nvidia-smi
# Rebuild image (after Dockerfile changes)
docker compose build --no-cache
docker compose up -d
# Update to latest version
git pull
docker compose build
docker compose up -d
# Remove everything including volumes (WARNING: deletes all data)
docker compose down -v
Windows (Docker Desktop with WSL2):
- GPU support requires WSL2 backend (enabled by default)
- No need to install CUDA Toolkit on Windows host
- NVIDIA Driver must be installed on Windows (not in WSL2)
- Docker Desktop automatically includes NVIDIA Container Toolkit
Linux:
- Requires NVIDIA Container Toolkit installation
- See DOCKER-QUICKSTART.md for setup instructions
- Use sudo for Docker commands or add your user to the docker group
macOS:
- GPU acceleration not available (no NVIDIA GPU support)
- CPU-only mode - works for development and testing
- Training will be 5-10x slower than GPU mode
- Recommended for small models (Qwen3-0.6B, Qwen3-1.7B)
- For production training, consider using cloud GPU instance
CPU Mode Detection:
The application automatically detects available hardware on startup:
- With GPU: Uses CUDA acceleration and Unsloth optimizations
- Without GPU: Falls back to CPU mode with standard PyTorch
Check logs after startup to see which mode is active:
docker compose logs lora-craft | grep -i "gpu\|cuda\|cpu mode"
For detailed Docker setup, CPU mode configuration, troubleshooting, and platform-specific instructions, see DOCKER-QUICKSTART.md.
For users who prefer to install directly on their system.
git clone https://github.com/jwest33/lora_craft.git
cd lora_craft
For GPU (CUDA 12.8):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
For CPU-only:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
For other CUDA versions or more options, visit PyTorch's installation page.
For GPU mode:
pip install -r requirements.txt
For CPU-only mode:
# Install PyTorch CPU first (if not done in Step 2)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Comment out GPU-specific packages in requirements.txt:
# - unsloth, unsloth_zoo
# - bitsandbytes
# - xformers
# - triton-windows (Windows only)
# Then install:
pip install -r requirements.txt
This will install all required packages including:
- Unsloth (optimized training framework, GPU-only)
- Transformers and PEFT (model handling)
- Flask and SocketIO (web interface)
- Training utilities (accelerate, TRL, bitsandbytes)
Check that PyTorch is working:
GPU Mode:
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
You should see CUDA available: True.
CPU Mode:
python -c "import torch; print(f'PyTorch version: {torch.__version__}')"
You should see the PyTorch version printed.
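The two one-line checks above can also be folded into a single script. This is a minimal sketch, not part of LoRA Craft: `detect_device` is a hypothetical helper name, and it treats a missing PyTorch install the same as CPU-only mode.

```python
import importlib.util


def detect_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    # PyTorch may not be installed yet; treat that as CPU-only.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training device: {detect_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the PyTorch build is most likely the CPU wheel or the driver is not visible to Python.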
- Navigate to the project directory:
  cd lora_craft
- Start the Flask server:
  python server.py
- Open your web browser and navigate to:
  http://localhost:5000
- Select a model from the Model tab (e.g., Qwen3 1.7B)
- Choose a dataset from the Dataset tab (e.g., GSM8K for math)
- Configure training parameters in the Config tab
- Select a reward function matching your task
- Start training and monitor progress in real-time
- Export your model when training completes
- Test the model with sample prompts
The Model Configuration page allows you to select the base model for fine-tuning.
- Recommended: Uses best default settings for most use cases
- Custom: Configure LoRA parameters (rank, alpha, dropout)
- Advanced: Full control over all training parameters
Choose from several model families:
- Qwen3: Efficient models ranging from 0.6B to 8B parameters
- Llama: Popular open-source models
- Mistral: High-quality instruction-following models
- Phi: Microsoft's compact models
Select a model size based on your available VRAM. Examples:
- 0.6B - 1.7B: Works on 4GB+ VRAM
- 3B - 4B: Requires 8GB+ VRAM
- 7B - 8B: Requires 16GB+ VRAM
Configure the training data for your model.
- Public Datasets: Browse curated datasets from HuggingFace
  - Filter by category: Math, Coding, General, Q&A
  - View dataset size and sample count
  - Preview dataset samples before training
- Custom HF Dataset: Enter any HuggingFace dataset path
  - Format: username/dataset-name
  - Specify split (train, test, validation)
- Upload File: Use your own data
  - Supported formats: JSON, JSONL, CSV, Parquet
  - Maximum size: 10GB
- Alpaca (52K samples): General instruction-following
- GSM8K (8.5K problems): Grade school math reasoning
- OpenMath Reasoning (100K problems): Advanced math problems
- Code Alpaca (20K examples): Code generation tasks
- Dolly 15k (15K samples): Diverse instruction tasks
- Orca Math (200K problems): Math word problems
- SQuAD v2 (130K questions): Question answering
Map your dataset columns to expected fields:
- Instruction: The input prompt or question
- Response: The expected output or answer
The system auto-detects common field names (question, answer, prompt, completion, etc.).
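Auto-detection of this kind can be sketched as a case-insensitive lookup against lists of known aliases. The alias lists and the `detect_fields` helper below are hypothetical; the names LoRA Craft actually recognizes, and their priority order, may differ.

```python
# Hypothetical alias lists; the names LoRA Craft actually recognizes may differ.
INSTRUCTION_ALIASES = ("instruction", "question", "prompt", "input")
RESPONSE_ALIASES = ("response", "answer", "completion", "output")


def detect_fields(columns):
    """Return (instruction_column, response_column); each is None when no alias matches."""
    # Map lowercase names back to the original column names.
    lowered = {name.lower(): name for name in columns}

    def first_match(aliases):
        for alias in aliases:
            if alias in lowered:
                return lowered[alias]
        return None

    return first_match(INSTRUCTION_ALIASES), first_match(RESPONSE_ALIASES)


print(detect_fields(["question", "answer"]))
print(detect_fields(["Prompt", "completion", "id"]))
```

When either value comes back as `None`, the column has to be mapped by hand.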
Define the instruction format for your model:
- Template Type: Choose GRPO Default or create custom templates
- System Prompt: Instructions given to the model
- Reasoning Markers: Tags to structure model thinking process
- Solution Markers: Tags to identify final answers
Configure hyperparameters for the training process.
Training Duration
- Epochs: Number of complete passes through the dataset (typical: 1-5)
- Samples Per Epoch: Limit samples per epoch, or use "All" for full dataset
Batch Settings
- Batch Size: Samples processed simultaneously (typical: 1-4)
- Gradient Accumulation Steps: Effective batch size multiplier (typical: 4-8)
- Effective batch size = batch_size × gradient_accumulation_steps
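As a worked example of the formula above, the sketch below computes the effective batch size and the resulting number of optimizer updates per epoch. The function names are illustrative, and the step count assumes one update per effective batch, ignoring the multiple generations GRPO samples for each prompt.

```python
import math


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Samples contributing to each optimizer update."""
    return batch_size * gradient_accumulation_steps


def optimizer_steps_per_epoch(num_samples: int, batch_size: int,
                              gradient_accumulation_steps: int) -> int:
    """Optimizer updates in one pass over the data."""
    # A trailing partial batch is assumed to still trigger an update.
    per_update = effective_batch_size(batch_size, gradient_accumulation_steps)
    return math.ceil(num_samples / per_update)


print(effective_batch_size(1, 8))             # 8
print(optimizer_steps_per_epoch(1000, 1, 8))  # 125
```

Batch size 1 with 8 accumulation steps uses about the memory of a single sample while updating on 8.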
Learning Rate
- Learning Rate: Step size for model updates (typical: 5e-5 to 5e-4)
- Warmup Steps: Gradual learning rate increase at start (typical: 10-100)
- LR Scheduler: Learning rate adjustment strategy
  - constant: No change during training
  - linear: Linear decay from peak to zero
  - cosine: Smooth cosine decay
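The three schedulers and the warmup phase can be sketched in a few lines. This is a simplified illustration rather than the scheduler code the training libraries use: `learning_rate` is a hypothetical function, and it applies linear warmup to all three schedules for simplicity.

```python
import math


def learning_rate(step, total_steps, peak_lr, warmup_steps=10, scheduler="cosine"):
    """Learning rate at a 0-indexed step: linear warmup, then the chosen schedule."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    if scheduler == "constant":
        return peak_lr
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    if scheduler == "linear":
        return peak_lr * (1.0 - progress)
    if scheduler == "cosine":
        return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown scheduler: {scheduler}")


for step in (0, 5, 10, 55, 100):
    print(step, learning_rate(step, total_steps=100, peak_lr=2e-4))
```

Printing a handful of steps like this is a quick way to sanity-check a configuration against the Learning Rate Schedule chart.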
Optimization
- Optimizer: Algorithm for updating model weights
  - paged_adamw_32bit: Memory-efficient (recommended)
  - adamw_8bit: More memory-efficient
- Weight Decay: Regularization to prevent overfitting (typical: 0.001-0.01)
- Max Gradient Norm: Gradient clipping threshold (typical: 0.3-1.0)
- KL Penalty: Prevents model from deviating too far from base model (typical: 0.01-0.1)
- Clip Range: PPO-style clipping for stable training (typical: 0.2)
- Importance Sampling Level: Token-level or sequence-level weighting
- LoRA Rank: Controls adapter capacity (typical: 8-32)
- LoRA Alpha: Scaling factor for adapter (typically 2x rank)
- LoRA Dropout: Regularization to prevent overfitting (typical: 0.0-0.1)
- Max Sequence Length: Maximum input length in tokens (typical: 1024-4096)
- Max New Tokens: Maximum generated response length (typical: 256-1024)
- Temperature: Randomness in generation (0.7 = balanced, lower = deterministic)
- Top-P: Nucleus sampling threshold (typical: 0.9-0.95)
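To make Temperature and Top-P concrete, here is a small pure-Python sketch of both operations on a toy list of logits. It illustrates the general technique only; the function names are hypothetical and real generation code works on tensors over the full vocabulary.

```python
import math


def softmax_with_temperature(logits, temperature=0.7):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


probs = softmax_with_temperature([2.0, 1.0, 0.1], temperature=0.7)
print(probs)
print(top_p_filter(probs, top_p=0.9))
```

Sampling then draws from the filtered distribution, so unlikely tokens in the tail are never chosen.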
Optional supervised fine-tuning phase before GRPO:
- Enabled: Toggle pre-training on/off
- Epochs: Number of pre-training epochs (typical: 1-2)
- Max Samples: Limit pre-training samples (or use "All")
- Learning Rate: Separate learning rate for pre-training (typical: 5e-5)
Pre-training helps the model learn output formatting before reinforcement learning.
Reward functions evaluate model outputs and guide training. Choose functions that match your task.
Algorithm Implementation
- Rewards correct algorithm implementation with efficiency considerations
- Use for: Code generation, algorithm design
Chain of Thought
- Rewards step-by-step reasoning processes
- Use for: Math problems, logical reasoning, complex analysis
Citation Format
- Rewards proper citation formatting (APA/MLA style)
- Use for: Academic writing, research tasks
Code Generation
- Rewards well-formatted code with proper syntax and structure
- Use for: Programming tasks, code completion
Concise Summarization
- Rewards accurate, concise summaries that capture key points
- Use for: Text summarization, data reporting
Creative Writing
- Rewards engaging text with good flow and vocabulary
- Use for: Content generation, storytelling
Math & Science
- Rewards correct mathematical solutions and scientific accuracy
- Use for: Math problems, scientific reasoning
Programming
- Rewards executable, efficient code
- Use for: Software development tasks
Reasoning
- Rewards logical reasoning and inference
- Use for: General problem-solving
Question Answering
- Rewards accurate, relevant answers
- Use for: Q&A systems, information retrieval
- Select Algorithm Type: GRPO (standard) or GSPO (sequence-level)
- Choose Reward Source:
  - Quick Start: Auto-configured based on dataset
  - Preset Library: Browse categorized reward functions
  - Custom Builder: Create custom reward logic (advanced)
- Map Dataset Fields:
  - Instruction: Field containing the input prompt
  - Response: Field containing the expected output
  - Additional fields may be required depending on the reward function
- Test Reward: Verify the reward function works with sample data before training
Once training starts, monitor progress through real-time metrics.
Top Metrics Bar
- KL Divergence: Measures model deviation from base model (lower is more conservative)
- Completion Length: Average length of generated responses
- Clipped Ratio: Percentage of updates clipped by PPO (indicates training stability)
- Clip Reason: Whether clipping is due to min or max bounds
- Grad Norm: Gradient magnitude (monitors training health)
Reward Metrics Chart
- Mean Reward: Average reward across training samples
- Reward Std: Standard deviation of rewards (measures consistency)
- Tracks how well the model is learning to maximize rewards
Training Loss Chart
- Training Loss: Primary optimization objective
- Validation Loss: Performance on held-out data (if validation set provided)
- Both should decrease over time
KL Divergence Chart
- Tracks how much the model diverges from the base model
- Should remain relatively stable (controlled by KL penalty)
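For intuition about the plotted value, the sketch below shows one estimator commonly used in GRPO implementations: a per-token, always non-negative approximation of the KL divergence computed from log-probabilities. Whether LoRA Craft reports exactly this quantity is an assumption, and the function names are illustrative.

```python
import math


def per_token_kl(policy_logprob: float, reference_logprob: float) -> float:
    """Non-negative estimate of KL(policy || reference) for one sampled token."""
    log_ratio = reference_logprob - policy_logprob
    return math.exp(log_ratio) - log_ratio - 1.0


def mean_kl(policy_logprobs, reference_logprobs):
    """Average the per-token estimate over a generated response."""
    pairs = list(zip(policy_logprobs, reference_logprobs))
    return sum(per_token_kl(p, r) for p, r in pairs) / len(pairs)


# Identical log-probabilities give zero; any disagreement gives a positive value.
print(mean_kl([-1.0, -2.0], [-1.0, -2.0]))
print(mean_kl([-0.5, -2.0], [-1.0, -2.5]))
```

A steadily rising value means the fine-tuned model is drifting from the base model faster than the KL penalty is holding it back.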
Completion Length Statistics
- Mean Length: Average response length
- Min Length: Shortest response
- Max Length: Longest response
- Helps identify if model is generating appropriate response lengths
Policy Clip Ratios
- Target Mean: Desired clip ratio
- Clip Mean: Actual clip ratio
- Clip Median: Median clip ratio
- Indicates training stability (high clipping = aggressive updates)
Learning Rate Schedule
- Shows learning rate over training steps
- Helps verify scheduler configuration
- Stop Training: Gracefully halt training and save current checkpoint
- View Logs: Access detailed training logs
- Session Management: Track multiple training sessions
The left sidebar shows all training sessions:
- Active sessions show real-time status
- Completed sessions remain available for review
- Click a session to view its metrics and model path
After training completes, export your model for deployment.
HuggingFace Format
- Standard format for Transformers library
- Includes base model + LoRA adapter
- Location: outputs/<session_id>/
GGUF Format
- Optimized format for llama.cpp, Ollama, LM Studio
- Multiple quantization levels available:
- Q4_K_M: 4-bit quantization (balanced)
- Q5_K_M: 5-bit quantization (higher quality)
- Q8_0: 8-bit quantization (best quality)
- F16: 16-bit floating point (no quantization)
- Location: exports/<session_id>/
Quantization reduces model size for deployment:
- Q4_K_M: ~4GB for 7B model (recommended for most users)
- Q5_K_M: ~5GB for 7B model (better quality)
- Q8_0: ~8GB for 7B model (minimal quality loss)
- F16: ~14GB for 7B model (no quality loss)
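These sizes follow roughly from parameter count times bits per weight. The sketch below uses the nominal bit widths as an assumption; `estimate_size_gb` is an illustrative helper, not a LoRA Craft function.

```python
def estimate_size_gb(num_params_billions: float, bits_per_weight: float) -> float:
    """Approximate file size in GB (10^9 bytes), ignoring metadata and mixed-precision overhead."""
    return num_params_billions * bits_per_weight / 8


# Nominal bit widths; actual GGUF files store slightly more than this per weight.
for name, bits in [("Q4_K_M", 4), ("Q5_K_M", 5), ("Q8_0", 8), ("F16", 16)]:
    print(f"{name}: ~{estimate_size_gb(7, bits):.1f} GB")
```

The nominal estimates for the quantized formats come out somewhat below the figures listed above, because those formats keep some tensors at higher precision and store per-block scaling data.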
With llama.cpp
# Newer llama.cpp builds name this binary llama-cli instead of main
./main -m exports/<session_id>/model-q4_k_m.gguf -p "Your prompt here"
With Ollama
ollama create my-model -f exports/<session_id>/Modelfile
ollama run my-model
With LM Studio
- Open LM Studio
- Navigate to "Local Models"
- Click "Import" and select your GGUF file
Test your fine-tuned model with custom prompts.
- Select Model: Choose from trained models or active training sessions
- Enter Prompt: Type or paste your test question
- Configure Generation:
- Temperature: Control randomness (0.1 = deterministic, 1.0 = creative)
- Max Tokens: Maximum response length
- Top-P: Nucleus sampling threshold
- Generate: Click "Test Model" to generate response
Test multiple prompts at once:
- Upload a file with test prompts (one per line)
- Configure generation parameters
- Run batch test
- Export results to JSON or CSV
Evaluate model outputs using the same reward functions from training:
- Select reward function
- Enter prompt and expected response
- Generate model output
- View reward score and feedback
This helps quantify model improvement on your specific task.
GRPO is a reinforcement learning algorithm for training language models. Unlike supervised learning (which simply teaches the model to imitate examples), GRPO teaches the model to maximize rewards.
How GRPO Works:
- Model generates multiple responses for each prompt
- Reward function scores each response
- Model learns to increase probability of high-reward responses
- Model learns to decrease probability of low-reward responses
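The "group relative" part of these steps can be sketched as follows: each response's reward is compared with the mean and spread of the rewards in its own group. This is a minimal illustration using the population standard deviation and a small epsilon, both of which are assumptions; training libraries may normalize slightly differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and standard deviation of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)

# When every response earns the same reward, all advantages are zero.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

Responses with a positive advantage have their probability increased, and those with a negative advantage have it decreased. A group with identical rewards contributes no learning signal, which is why varied rewards matter.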
Benefits:
- Models learn to optimize for specific objectives (correctness, format, style)
- Often generalizes better than pure supervised learning
- Can improve beyond training data quality
GRPO vs Other Algorithms:
- GRPO: Token-level importance weighting (standard)
- GSPO: Sequence-level optimization (simpler, less granular)
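The difference between the two shows up in how the importance ratio between the updated policy and the policy that generated the samples is formed. The sketch below is an illustration under assumptions: the sequence-level ratio uses the length-normalized form from the published GSPO formulation, the function names are hypothetical, and LoRA Craft's internals may differ.

```python
import math


def token_level_ratios(new_logprobs, old_logprobs):
    """One importance ratio per token (GRPO-style)."""
    return [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]


def sequence_level_ratio(new_logprobs, old_logprobs):
    """One length-normalized ratio for the whole response (GSPO-style)."""
    diffs = [n - o for n, o in zip(new_logprobs, old_logprobs)]
    return math.exp(sum(diffs) / len(diffs))


def clipped_objective(ratio, advantage, clip_range=0.2):
    """PPO-style clipped surrogate for a single ratio."""
    clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    return min(ratio * advantage, clipped * advantage)


new, old = [-0.9, -2.1, -0.4], [-1.0, -2.0, -0.5]
print(token_level_ratios(new, old))
print(sequence_level_ratio(new, old))
print(clipped_objective(1.5, advantage=1.0))
```

With token-level weighting each token is clipped on its own ratio, while the sequence-level variant clips the whole response at once.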
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method.
Key Concepts:
- Instead of updating all model parameters (billions), LoRA adds small "adapter" layers
- Adapters are typically 1-2% the size of the full model
- Base model remains frozen, only adapters are trained
- Multiple adapters can be applied to the same base model
Benefits:
- Memory Efficient: Train on consumer GPUs (4-8GB VRAM)
- Fast Training: Fewer parameters to update
- Easy Sharing: Adapter files are small (typically 10-100MB)
- Modular: Switch adapters for different tasks
LoRA Parameters:
- Rank: Number of dimensions in adapter (higher = more capacity, slower training)
- Alpha: Scaling factor (controls adapter influence)
- Dropout: Regularization to prevent overfitting
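The effect of rank and alpha can be worked out with simple arithmetic. In the sketch below, the 4096 x 4096 weight matrix is a hypothetical example size, and the helper names are illustrative.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update before it is added to the frozen weight."""
    return alpha / rank


full = 4096 * 4096  # hypothetical projection matrix
added = lora_params(4096, 4096, rank=16)
print(added, f"({100 * added / full:.2f}% of the full matrix)")
print(lora_scale(alpha=32, rank=16))
```

Doubling the rank doubles the adapter's parameter count. Keeping alpha at twice the rank holds the scale at 2.0, which is why the two are usually changed together.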
Reward functions are Python functions that evaluate model outputs and return scores.
Components of a Reward Function:
- Input: Model's generated response + reference data
- Evaluation Logic: Checks correctness, format, quality
- Output: Numerical score (typically 0.0 to 1.0)
Example: Math Reward Function
def math_reward(response, expected_answer):
    # Extract the answer from the response (extract_solution is a helper not shown here)
    model_answer = extract_solution(response)
    # Check correctness
    if model_answer == expected_answer:
        return 1.0  # Correct
    else:
        return 0.0  # Incorrect
Types of Reward Functions:
- Exact Match: Binary reward (correct/incorrect)
- Partial Credit: Gradual scoring (0.0 to 1.0)
- Multi-Component: Combines multiple criteria (correctness + format + efficiency)
- Heuristic: Rule-based evaluation
- Model-Based: Uses another model to evaluate quality
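A multi-component reward can be sketched as a weighted sum of simpler checks. The example below combines an exact-match check with a format check, using the `<SOLUTION>` tags from the GRPO default system prompt. The function names and the 0.8/0.2 weights are illustrative assumptions, not LoRA Craft's preset implementation.

```python
import re

SOLUTION_RE = re.compile(r"<SOLUTION>(.*?)</SOLUTION>", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 when the response contains exactly one <SOLUTION> block, else 0.0."""
    return 1.0 if len(SOLUTION_RE.findall(response)) == 1 else 0.0


def correctness_reward(response: str, expected_answer: str) -> float:
    """1.0 when the first <SOLUTION> block matches the expected answer exactly."""
    match = SOLUTION_RE.search(response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == expected_answer.strip() else 0.0


def combined_reward(response, expected_answer, w_correct=0.8, w_format=0.2):
    """Weighted sum of the two components; the weights here are illustrative."""
    return (w_correct * correctness_reward(response, expected_answer)
            + w_format * format_reward(response))


print(combined_reward("<SOLUTION>4</SOLUTION>", "4"))
print(combined_reward("<SOLUTION>5</SOLUTION>", "4"))
print(combined_reward("The answer is 4", "4"))
```

The small format weight gives partial credit for following the template even when the answer is wrong, so the model gets some signal early in training.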
Best Practices:
- Start with simple, interpretable reward functions
- Ensure rewards align with your desired behavior
- Test rewards on sample data before training
- Monitor reward distributions during training
System prompts define the instruction format and expected output structure.
Components:
- System Message: High-level instructions for the model
- Instruction Template: How to format input prompts
- Response Template: Expected output structure
- Special Markers: Tags for reasoning and solutions
Example System Prompt (GRPO Default):
You are given a problem.
Think about the problem and provide your working out.
Place it between <start_working_out> and <end_working_out>.
Then, provide your solution between <SOLUTION></SOLUTION>
Why Use Structured Outputs?
- Separates reasoning from final answer
- Makes reward function evaluation easier
- Improves model interpretability
- Enables extraction of specific components
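Extraction of the two components can be sketched with a regular expression built from the markers in the GRPO default prompt. `parse_response` is a hypothetical helper, and it assumes the reasoning block comes before the solution block.

```python
import re

PATTERN = re.compile(
    r"<start_working_out>(?P<reasoning>.*?)<end_working_out>\s*"
    r"<SOLUTION>(?P<solution>.*?)</SOLUTION>",
    re.DOTALL,
)


def parse_response(text: str):
    """Return (reasoning, solution), or None when the response does not follow the template."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("reasoning").strip(), match.group("solution").strip()


sample = "<start_working_out>2 + 2 is 4<end_working_out>\n<SOLUTION>4</SOLUTION>"
print(parse_response(sample))
print(parse_response("4"))
```

A reward function can score the solution part for correctness and treat a `None` result as a formatting failure.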
Note: LoRA Craft automatically detects and uses available hardware. If you see "CPU Mode" in the logs but have an NVIDIA GPU, see the GPU troubleshooting section below.
Expected CPU Mode Indicators:
- Log message: x No GPU detected - running in CPU mode
- Log message: x Unsloth optimizations: DISABLED (requires CUDA)
- System status shows: "CPU Mode (No GPU Detected)"
This is normal if:
- You don't have an NVIDIA GPU
- Running on macOS
- Intentionally using CPU mode for testing
- NVIDIA Container Toolkit not installed (Docker)
Symptom: Container logs show "CPU Mode" or "CUDA Available: False" even though you have an NVIDIA GPU
Solutions:
- Verify GPU works with Docker:
  docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
  If this fails, your Docker GPU setup needs configuration.
- Check docker-compose.yml has correct GPU configuration:
  runtime: nvidia
  environment:
    - NVIDIA_VISIBLE_DEVICES=all
    - NVIDIA_DRIVER_CAPABILITIES=compute,utility
- For Docker Desktop (Windows/macOS):
  - Restart Docker Desktop
  - Settings → Resources → WSL Integration (ensure enabled)
  - Verify NVIDIA driver installed on Windows host
- For Linux:
  - Ensure NVIDIA Container Toolkit installed
  - Run: sudo nvidia-ctk runtime configure --runtime=docker
  - Restart Docker: sudo systemctl restart docker
Symptom: "exec /app/src/entrypoint.sh: no such file or directory"
Cause: Line ending issues when building on Windows
Solution:
# Rebuild without cache
docker compose build --no-cache
docker compose up -d
The Dockerfile automatically fixes line endings, so rebuilding should resolve this.
Problem: "CUDA out of memory" error during training
Solutions:
- Reduce batch size to 1
- Increase gradient accumulation steps (maintains effective batch size)
- Reduce max sequence length (e.g., 2048 → 1024)
- Use smaller model (e.g., 1.7B instead of 4B)
- Enable gradient checkpointing (trades compute for memory)
- Use 8-bit or 4-bit quantization (reduces memory usage)
Problem: Training session created but doesn't start
Solutions:
- Check the logs folder for error messages (logs/)
- Verify the dataset downloaded successfully (check the cache/ folder)
- Ensure reward function is properly configured
- Check that all required fields are mapped
- Restart the Flask server and try again
Problem: "Failed to load dataset" error
Solutions:
- Verify dataset name is correct (case-sensitive)
- Check internet connection for HuggingFace downloads
- For uploaded files, verify format:
- JSON: Must be list of objects or object with data field
- JSONL: One JSON object per line
- CSV: Must have column headers
- Parquet: Standard Apache Parquet format
- Ensure instruction and response fields exist in dataset
Problem: Training is slower than expected
Solutions:
- Verify GPU is being used: Check system monitoring (top bar should show GPU usage)
- Reduce gradient accumulation steps (increases update frequency)
- Enable flash attention if using supported model (Llama, Mistral)
- Disable gradient checkpointing if memory allows
- Use larger batch size if VRAM permits
- Check that CUDA and PyTorch are properly installed
Problem: Model outputs are nonsensical or low quality
Solutions:
- Check reward signal: Ensure rewards are varying (not all 0.0 or 1.0)
- Increase pre-training epochs: Model needs to learn format first
- Adjust KL penalty: Lower values allow more deviation from base model
- Verify dataset quality: Check that training data is clean and relevant
- Increase training epochs: Model may need more training time
- Check system prompt: Ensure it clearly describes expected output format
- Test with different temperatures: Lower temperature (0.3-0.5) for more deterministic outputs
Problem: Real-time metrics not updating
Solutions:
- Refresh browser page
- Check browser console for WebSocket errors (F12)
- Verify Flask server is running
- Check firewall settings (port 5000 must be accessible)
- Try a different browser (Chrome/Firefox recommended)
Problem: GGUF export fails or produces invalid files
Solutions:
- Ensure training completed successfully
- Check that model checkpoint exists (outputs/<session_id>/)
- Verify sufficient disk space for export
- Check logs for llama.cpp converter errors
- Try exporting with different quantization level
The Flask server provides RESTful API endpoints for programmatic access.
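Requests to the endpoints listed in this section can be prepared from Python with the standard library alone. The sketch below builds a request for the stop-training endpoint without sending it. `build_json_post` is a hypothetical helper, and the assumption that the server replies with JSON is not confirmed by this document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def build_json_post(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for the given API path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_post("/api/training/stop", {"session_id": "session-id-to-stop"})
print(request.get_method(), request.full_url)

# With the server running, send it and decode the reply (assumed to be JSON):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The same helper works for the other JSON POST endpoints by changing the path and payload.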
Start Training
POST /api/training/start
Content-Type: application/json
{
"session_id": "unique-id",
"config": { ... training configuration ... }
}
Stop Training
POST /api/training/stop
Content-Type: application/json
{
"session_id": "session-id-to-stop"
}
List Training Sessions
GET /api/training/sessions
List Datasets
GET /api/datasets/list
Upload Dataset
POST /api/datasets/upload
Content-Type: multipart/form-data
file=@dataset.json
Preview Dataset
POST /api/datasets/preview
Content-Type: application/json
{
"path": "tatsu-lab/alpaca",
"samples": 5
}
Test Model
POST /api/models/test
Content-Type: application/json
{
"model_path": "outputs/session-id/",
"prompt": "What is 2+2?",
"temperature": 0.7,
"max_tokens": 256
}
List Trained Models
GET /api/models/list
Export Model
POST /api/exports/create
Content-Type: application/json
{
"session_id": "session-id",
"format": "gguf",
"quantization": "q4_k_m"
}
Save Configuration
POST /api/configs/save
Content-Type: application/json
{
"name": "my-config",
"config": { ... configuration object ... }
}
Load Configuration
GET /api/configs/load?name=my-config
List Configurations
GET /api/configs/list
Real-time updates are delivered via Socket.IO.
Connect to Socket
const socket = io('http://localhost:5000');
Subscribe to Training Updates
socket.on('training_update', (data) => {
console.log('Step:', data.step);
console.log('Loss:', data.loss);
console.log('Reward:', data.reward);
});
Subscribe to System Updates
socket.on('system_update', (data) => {
console.log('GPU Memory:', data.gpu_memory);
console.log('GPU Utilization:', data.gpu_utilization);
});
Saved configurations are stored as JSON in the configs/ directory.
Example Configuration:
{
"name": "math-reasoning-config",
"model": {
"name": "unsloth/Qwen3-1.7B",
"lora_rank": 16,
"lora_alpha": 32,
"lora_dropout": 0.0
},
"dataset": {
"source": "openai/gsm8k",
"split": "train",
"instruction_field": "question",
"response_field": "answer",
"max_samples": null
},
"training": {
"num_epochs": 3,
"batch_size": 1,
"gradient_accumulation_steps": 8,
"learning_rate": 0.0002,
"warmup_steps": 10,
"weight_decay": 0.001,
"max_grad_norm": 0.3,
"lr_scheduler_type": "constant",
"optim": "paged_adamw_32bit",
"max_sequence_length": 2048,
"max_new_tokens": 512,
"temperature": 0.7
},
"grpo": {
"kl_penalty": 0.05,
"clip_range": 0.2,
"importance_sampling_level": "token"
},
"reward": {
"type": "preset",
"preset_name": "math"
},
"pre_training": {
"enabled": true,
"epochs": 2,
"max_samples": 100,
"learning_rate": 0.00005
}
}
JSON Format
[
{
"instruction": "What is the capital of France?",
"response": "The capital of France is Paris."
},
{
"instruction": "Solve 2+2",
"response": "2+2 = 4"
}
]
JSONL Format
{"instruction": "What is the capital of France?", "response": "The capital of France is Paris."}
{"instruction": "Solve 2+2", "response": "2+2 = 4"}
CSV Format
instruction,response
"What is the capital of France?","The capital of France is Paris."
"Solve 2+2","2+2 = 4"
Parquet Format
- Standard Apache Parquet files with instruction and response columns
- Supports nested structures and efficient compression
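Before uploading, a file in one of the text formats can be checked locally. This is a minimal sketch using only the standard library. `load_records` and `check_fields` are hypothetical helpers, and Parquet is left out because reading it needs a third-party package.

```python
import csv
import io
import json


def load_records(text: str, fmt: str):
    """Parse JSON, JSONL, or CSV text into a list of dicts."""
    if fmt == "json":
        data = json.loads(text)
        # Accept either a list of objects or an object with a "data" field.
        return data["data"] if isinstance(data, dict) else data
    if fmt == "jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")


def check_fields(records, required=("instruction", "response")):
    """Return the indexes of records missing a required field."""
    return [i for i, rec in enumerate(records) if any(k not in rec for k in required)]


jsonl_text = '{"instruction": "Solve 2+2", "response": "2+2 = 4"}\n'
csv_text = 'instruction,response\n"Solve 2+2","2+2 = 4"\n'
print(load_records(jsonl_text, "jsonl") == load_records(csv_text, "csv"))
print(check_fields([{"instruction": "Solve 2+2"}]))
```

An empty list from `check_fields` means every record carries both fields.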
Adapter: Small trainable module added to a frozen base model (see LoRA)
Base Model: Pre-trained language model before fine-tuning
Batch Size: Number of samples processed simultaneously during training
CUDA: NVIDIA's parallel computing platform for GPU acceleration
Epoch: One complete pass through the entire training dataset
Fine-tuning: Training a pre-trained model on new data for a specific task
GGUF: File format for quantized models (used by llama.cpp ecosystem)
Gradient Accumulation: Technique to simulate larger batch sizes with limited memory
Gradient Clipping: Technique to prevent exploding gradients by limiting their magnitude
GRPO: Group Relative Policy Optimization (reinforcement learning algorithm)
KL Divergence: Measure of how much the fine-tuned model differs from the base model
Learning Rate: Step size for model parameter updates
LoRA: Low-Rank Adaptation (parameter-efficient fine-tuning method)
Quantization: Reducing model precision (e.g., from 16-bit to 4-bit) to save memory
Reinforcement Learning: Training paradigm where model learns from reward signals
Reward Function: Function that evaluates model outputs and assigns scores
System Prompt: Instructions that define expected model behavior and output format
Token: Smallest unit of text processed by language models (roughly 3/4 of a word)
VRAM: Video RAM (GPU memory)
Warmup: Gradual increase of learning rate at training start
Documentation
- Unsloth Documentation
- HuggingFace Transformers
- PEFT Library
- TRL (Transformer Reinforcement Learning)
Model Sources
Dataset Sources
Deployment Tools
Community & Support
Acknowledgments: Built with Unsloth, HuggingFace Transformers, and Flask.
Created and maintained by jwest33 (loracraft.org)
Licensed under the MIT License.
If you reuse or distribute this project, please retain attribution. Thank you, and happy crafting!






