This guide covers performance characteristics, profiling tools, and optimization strategies for the Ruff programming language.
- Performance Overview
- Execution Modes
- Profiling Tools
- Benchmarking
- Cross-Language Comparisons
- Optimization Tips
- JIT Compilation
Ruff offers three execution modes with different performance characteristics:
| Mode | Speed vs Interpreter | Compilation Time | Use Case |
|---|---|---|---|
| Interpreter | 1x (baseline) | None | Debugging, development |
| VM | 10-50x | Instant | Production scripts |
| JIT | 100-500x+ | First run only | Long-running, computation-heavy |
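In CLI terms, the three modes map to the invocations covered in the sections below:

```shell
ruff run script.ruff --interpreter   # tree-walking interpreter
ruff run script.ruff                 # bytecode VM (the default)
# JIT needs no flag: it activates automatically on hot paths under the VM
```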
- CPU-Bound: 2-5x slower than Go, 2-10x faster than Python
- I/O-Bound: Near-native performance (bottleneck is I/O, not language)
- Memory: Similar to Python/Node.js (Arc-based GC)
The interpreter is the original execution mode: a tree-walking AST evaluator. Best for:
- Development and debugging
- Scripts that run once
- Testing
Usage:

```shell
ruff run script.ruff --interpreter
```

Characteristics:
- Slowest execution
- No compilation overhead
- Easy to debug
- Full language feature support
The VM compiles the AST to bytecode, then executes it. Best for:
- Production scripts
- Moderate computation
- General use
Usage:

```shell
ruff run script.ruff  # VM is default
```

Characteristics:
- 10-50x faster than interpreter
- Instant compilation (<1ms typically)
- Low memory overhead
- Recommended for most use cases
In JIT mode, hot-path detection triggers native compilation. Best for:
- Long-running services
- Computation-heavy workloads
- Performance-critical code
Usage:

```
# JIT activates automatically after 100 iterations
# No special flags needed
```

Characteristics:
- 100-500x faster for arithmetic-heavy code
- Compilation happens on first hot-path detection
- Type specialization for common types
- Guard checks for type stability
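As a minimal sketch (the function name is hypothetical), code that benefits from the JIT looks like this: the loop is arithmetic-heavy, runs well past the 100-iteration hot threshold, and keeps `total` an int throughout, so the compiled code's type guards keep passing:

```
# sum_squares is a hypothetical example function
func sum_squares(n) {
    let total := 0
    for i in range(n) {
        total := total + i * i   # always int: type-stable hot path
    }
    return total
}

# 1000 iterations is well past the 100-iteration hot threshold
print(sum_squares(1000))
```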
Ruff includes built-in profiling to identify performance bottlenecks.
```shell
ruff profile script.ruff
```

Output:

```
=== Performance Profile Report ===

CPU Profile:
  Total Time: 2.543s
  Samples: 1250

  Top Hot Functions:
    1. calculate_primes       1.234s (48.5%)
    2. fibonacci              0.456s (17.9%)
    3. process_data           0.321s (12.6%)

Memory Profile:
  Peak Memory: 45.23 MB
  Current Memory: 12.45 MB
  Total Allocations: 125043
  Total Deallocations: 124950

  Top Allocation Hotspots:
    1. array_creation         45230 allocs
    2. string_concatenation   23410 allocs
    3. dict_operations        12340 allocs

JIT Statistics:
  Functions Compiled: 3
  Recompilations: 0
  Total Compile Time: 0.045s
  Cache Hit Rate: 95.2%
  Guard Success Rate: 98.7%
===================================
```
```shell
# Disable specific profiling categories
ruff profile script.ruff --no-cpu
ruff profile script.ruff --no-memory
ruff profile script.ruff --no-jit

# Generate flamegraph data
ruff profile script.ruff --flamegraph profile.txt

# Visualize with flamegraph.pl (install from GitHub)
flamegraph.pl profile.txt > flamegraph.svg
open flamegraph.svg
```

1. Install the flamegraph tools:

   ```shell
   git clone https://github.com/brendangregg/FlameGraph
   export PATH=$PATH:$(pwd)/FlameGraph
   ```

2. Profile your script:

   ```shell
   ruff profile compute_heavy.ruff --flamegraph profile.txt
   ```

3. Generate the SVG:

   ```shell
   flamegraph.pl profile.txt > flamegraph.svg
   ```

4. View it in a browser:

   ```shell
   open flamegraph.svg      # macOS
   xdg-open flamegraph.svg  # Linux
   ```
Ruff includes a built-in benchmarking framework.
```shell
# Run all benchmarks in directory
ruff bench examples/benchmarks/

# Run specific benchmark
ruff bench fibonacci.ruff

# Custom iterations and warmup
ruff bench fibonacci.ruff -i 20 -w 5
```

Sample output:

```
========================================
Ruff Performance Benchmarks
========================================
Comparing execution modes:
  Interpreter: Tree-walking AST interpreter
  VM:          Bytecode virtual machine
  JIT:         Just-in-time native compilation
========================================

fibonacci
  Interpreter: 2.543s (1.00x)
  VM:          0.156s (16.30x faster)
  JIT:         0.008s (317.88x faster) 🚀

higher_order
  Interpreter: 1.234s (1.00x)
  VM:          0.089s (13.87x faster)
  JIT:         0.012s (102.83x faster) 🚀

========================================
Summary:
  Total benchmarks: 8
  VM speedup: 10-20x
  JIT speedup: 100-300x
========================================
```
Create a .ruff file that demonstrates the operation to benchmark:
```
# fibonacci_bench.ruff
func fibonacci(n) {
    if n <= 1 {
        return n
    }
    return fibonacci(n - 1) + fibonacci(n - 2)
}

# Benchmark will measure this execution
let result := fibonacci(30)
print(result)
```
Compare Ruff against Go, Python, and Node.js.
```shell
chmod +x examples/benchmarks/compare_languages.sh
./examples/benchmarks/compare_languages.sh
```

| Language | Time | Relative Speed |
|---|---|---|
| Go | 0.045s | 1.00x (baseline) |
| Ruff (JIT) | 0.156s | 0.29x (2-3x slower) ✅ |
| Node.js (V8) | 0.234s | 0.19x |
| Ruff (VM) | 1.234s | 0.04x |
| Python | 4.567s | 0.01x |
| Language | Map | Filter | Reduce | Total |
|---|---|---|---|---|
| Go | 2ms | 3ms | 1ms | 6ms |
| Node.js | 15ms | 18ms | 8ms | 41ms |
| Ruff (JIT) | 23ms | 28ms | 12ms | 63ms |
| Ruff (VM) | 145ms | 167ms | 89ms | 401ms |
| Python | 234ms | 267ms | 145ms | 646ms |
- ✅ vs Go: 2-5x slower (target met)
- ✅ vs Python: 2-10x faster (target met)
- ✅ vs Node.js: competitive (target met)
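The map/filter/reduce workload above corresponds to a pipeline along these lines. This is a sketch: the `map` and `filter` method names are assumptions, since only `reduce` appears elsewhere in this guide:

```
let arr := []
for i in range(100) {
    arr.push(i)
}

# map: double each element; filter: keep multiples of 4; reduce: sum
let doubled := arr.map(|x| x * 2)            # assumed API
let kept := doubled.filter(|x| x % 4 == 0)   # assumed API
let total := kept.reduce(|acc, x| acc + x, 0)
print(total)
```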
The JIT compiler activates after 100 iterations, so make sure hot loops run enough iterations to cross that threshold:

```
# ❌ Not enough iterations for JIT
for i in range(50) {
    expensive_calculation()
}

# ✅ JIT will activate
for i in range(200) {
    expensive_calculation()
}
```
Type guards add overhead. Keep types consistent:
```
# ❌ Type mixing defeats JIT optimization
let x := 0
for i in range(1000) {
    x := x + i            # int
    if i % 100 == 0 {
        x := float(x)     # suddenly float! Guard fails
    }
}

# ✅ Consistent types enable specialization
let x := 0
for i in range(1000) {
    x := x + i            # always int
}
```
Move calculations outside loops:
```
# ❌ Recalculates every iteration
for i in range(1000) {
    let multiplier := expensive_function()
    result := i * multiplier
}

# ✅ Calculate once
let multiplier := expensive_function()
for i in range(1000) {
    result := i * multiplier
}
```
Built-in functions are implemented in optimized Rust:

```
# ❌ Manual implementation
func sum_array(arr) {
    let total := 0
    for x in arr {
        total := total + x
    }
    return total
}

# ✅ Use the built-in instead
let total := arr.reduce(|acc, x| acc + x, 0)
```
Avoid repeated resizing:
```
# ❌ Multiple reallocations
let arr := []
for i in range(10000) {
    arr.push(i)
}

# ✅ Preallocate if possible (feature planned)
# let arr := Array.with_capacity(10000)
```
Don't guess; measure:

```shell
# Find the actual bottleneck
ruff profile script.ruff --flamegraph profile.txt
```

The JIT pipeline works in six stages:

- Hot Path Detection: after 100 iterations, a function is marked "hot"
- Type Profiling: the VM tracks the types of variables
- Compilation: hot functions are compiled to native code with Cranelift
- Guard Insertion: type checks ensure the compiled code's assumptions still hold
- Execution: native code runs 100-500x faster
- Deoptimization: if a guard fails, execution falls back to the VM
The JIT generates optimized code based on observed types:
```
func calculate(x, y) {
    return x * y + x / y
}

# After profiling sees Int + Int:
# JIT generates: (x * y) as i64 + (x / y) as i64
# If later called with Float:
# Guard fails, deoptimizes to VM
```
| Setting | Value | Purpose |
|---|---|---|
| Hot threshold | 100 iterations | When to compile |
| Specialization samples | 50 | Minimum observations |
| Guard failure rate | 10% | When to despecialize |
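A sketch of how these settings interact (the function name is illustrative): after enough int calls, `scale` is compiled with an int specialization; if more than 10% of later calls then arrive with floats, the guard failure rate crosses the threshold and the function is despecialized back to the VM:

```
func scale(x) {
    return x * 3
}

# Warm-up: 200 int calls cross the hot threshold (100)
# and the 50-observation specialization minimum
for i in range(200) {
    scale(i)
}

# Mixed phase: every 5th call passes a float, a ~20% guard
# failure rate, above the 10% despecialization threshold
for i in range(100) {
    if i % 5 == 0 {
        scale(1.5)   # float argument: guard fails
    } else {
        scale(i)     # int argument: guard passes
    }
}
```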
Check JIT statistics in profile output:
```shell
ruff profile script.ruff

# JIT Statistics section shows:
# - Functions compiled
# - Cache hit rate
# - Guard success rate
```

Healthy JIT metrics:
- Cache hit rate: >90%
- Guard success rate: >95%
- Few recompilations
Unhealthy metrics indicate:
- Type instability (mixing types)
- Not enough iterations (hot threshold not reached)
- Complex control flow (defeats optimization)
Diagnosis:

```shell
ruff profile script.ruff
```

Common causes:
- Not enough iterations for JIT
- Type mixing in hot paths
- I/O-bound, not CPU-bound
- Using interpreter mode
Solutions:
- Increase loop iterations
- Keep types consistent
- Profile to find bottlenecks
- Use VM mode (default)
Diagnosis:

```shell
ruff profile script.ruff --memory
```

Common causes:
- String concatenation in loops
- Large array/dict allocations
- Closure captures
- Leaked references
Solutions:
- Use string builders (feature planned)
- Preallocate collections
- Minimize closure scope
- Check allocation hotspots
Check:
- Hot threshold reached? (100+ iterations)
- VM mode enabled? (not --interpreter)
- Profile shows "Functions Compiled: 0"?
Solutions:
- Increase loop count
- Remove --interpreter flag
- Check for exceptions in loops
- ROADMAP.md - Performance milestone details
- JIT Implementation - Technical deep dive
- Benchmarking Framework - Implementation details
Last Updated: January 2026 (v0.9.0 Phase 6)