Ruff Performance Guide

This guide covers performance characteristics, profiling tools, and optimization strategies for the Ruff programming language.

Table of Contents

Performance Overview
Execution Modes
Profiling Tools
Benchmarking
Cross-Language Comparisons
Optimization Tips
JIT Compilation
Troubleshooting Performance

Performance Overview

Ruff offers three execution modes with different performance characteristics:

Mode	Speed vs Interpreter	Compilation Time	Use Case
Interpreter	1x (baseline)	None	Debugging, development
VM	10-50x	Instant	Production scripts
JIT	100-500x+	Once per hot path, on detection	Long-running, computation-heavy

Performance Targets

CPU-Bound: 2-5x slower than Go, 2-10x faster than Python
I/O-Bound: Near-native performance (bottleneck is I/O, not language)
Memory: Similar to Python/Node.js (Arc-based reference counting)

Execution Modes

1. Tree-Walking Interpreter

The original execution mode. Best for:

Development and debugging
Scripts that run once
Testing

Usage:

ruff run script.ruff --interpreter

Characteristics:

Slowest execution
No compilation overhead
Easy to debug
Full language feature support
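To show what "tree-walking" means, here is a minimal Python sketch. It is not Ruff's implementation, which is written in Rust. The evaluator recursively visits every AST node each time the expression runs. That is why this mode has no compilation step but is the slowest. The node classes `Num`, `Var`, and `BinOp` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, Var, BinOp]

def evaluate(node: Node, env: dict) -> int:
    """Recursively walk the AST; every evaluation re-visits every node."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise ValueError(f"unknown operator {node.op!r}")
    raise TypeError(f"unknown node {node!r}")

# x * 2 + 1
expr = BinOp("+", BinOp("*", Var("x"), Num(2)), Num(1))
print(evaluate(expr, {"x": 20}))  # 41
```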

2. Bytecode VM (Default)

Compiles the AST to bytecode, then executes the bytecode. Best for:

Production scripts
Moderate computation
General use

Usage:

ruff run script.ruff  # VM is default

Characteristics:

10-50x faster than the interpreter
Near-instant compilation (typically <1ms)
Low memory overhead
Recommended for most use cases
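The VM's advantage is that it walks the tree once, at compile time, and then runs a flat instruction list in a tight loop. The Python sketch below shows only that two-phase structure. A Python VM is not faster than a Python tree-walker. The stack-based instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`) and the tuple-encoded AST are hypothetical, not Ruff's real opcodes.

```python
# Hypothetical AST encoded as tuples: ("num", v), ("var", name), (op, left, right)
def compile_expr(node, code):
    """Compile an AST into a flat list of stack instructions (done once)."""
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    elif kind in ("+", "*"):
        compile_expr(node[1], code)
        compile_expr(node[2], code)
        code.append(("ADD",) if kind == "+" else ("MUL",))
    else:
        raise ValueError(f"unknown node {node!r}")
    return code

def run(code, env):
    """Execute bytecode on a simple stack machine (done on every run)."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "LOAD":
            stack.append(env[instr[1]])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

# x * 2 + 1: compiled once, executed many times
bytecode = compile_expr(("+", ("*", ("var", "x"), ("num", 2)), ("num", 1)), [])
print(bytecode)
print([run(bytecode, {"x": x}) for x in range(3)])  # [1, 3, 5]
```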

3. JIT Compilation

Hot-path detection triggers native compilation. Best for:

Long-running services
Computation-heavy workloads
Performance-critical code

Usage:

ruff run script.ruff  # no special flags needed; the JIT runs inside the default VM mode
# Hot code is compiled automatically after 100 iterations

Characteristics:

100-500x faster than the interpreter for arithmetic-heavy code
Compilation happens on first hot-path detection
Type specialization for common types
Guard checks for type stability
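You can picture hot-path detection as a per-function (or per-loop) counter. Each call increments it. When the counter reaches the threshold (100 in Ruff), the function is compiled once, and the compiled version is used from then on. This Python sketch models only that bookkeeping. `TieredFunction` and `compile_native` are hypothetical stand-ins for Ruff's Cranelift code generation.

```python
HOT_THRESHOLD = 100  # Ruff's documented hot threshold

class TieredFunction:
    """Runs `func` in the 'VM' tier until it becomes hot, then switches tiers."""

    def __init__(self, func):
        self.func = func
        self.calls = 0
        self.compiled = None      # stand-in for native code
        self.compile_count = 0

    def compile_native(self):
        # Hypothetical: a real JIT would emit machine code here
        self.compile_count += 1
        return self.func

    def __call__(self, *args):
        if self.compiled is not None:
            return self.compiled(*args)  # fast tier, no more counting
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = self.compile_native()
        return self.func(*args)

square = TieredFunction(lambda n: n * n)
for i in range(50):
    square(i)
print(square.compiled is not None)  # False: only 50 calls
for i in range(200):
    square(i)
print(square.compiled is not None, square.compile_count)  # True 1
```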

Profiling Tools

Ruff includes built-in profiling to identify performance bottlenecks.

Basic Profiling

ruff profile script.ruff

Output:

=== Performance Profile Report ===

CPU Profile:
  Total Time: 2.543s
  Samples: 1250

  Top Hot Functions:
    1. calculate_primes          1.234s (48.5%)
    2. fibonacci                 0.456s (17.9%)
    3. process_data              0.321s (12.6%)

Memory Profile:
  Peak Memory: 45.23 MB
  Current Memory: 12.45 MB
  Total Allocations: 125043
  Total Deallocations: 124950

  Top Allocation Hotspots:
    1. array_creation            45230 allocs
    2. string_concatenation      23410 allocs
    3. dict_operations           12340 allocs

JIT Statistics:
  Functions Compiled: 3
  Recompilations: 0
  Total Compile Time: 0.045s
  Cache Hit Rate: 95.2%
  Guard Success Rate: 98.7%
===================================
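The CPU percentages in the sample report match each function's time divided by Total Time. The difference between allocations and deallocations is the number of allocations not yet freed when the report was taken. A quick Python check using the numbers from the sample report:

```python
total_time = 2.543  # seconds, from "Total Time"
hot_functions = {
    "calculate_primes": 1.234,
    "fibonacci": 0.456,
    "process_data": 0.321,
}

for name, seconds in hot_functions.items():
    share = 100 * seconds / total_time
    print(f"{name:<20} {seconds:.3f}s ({share:.1f}%)")

allocations, deallocations = 125043, 124950
print("live allocations:", allocations - deallocations)  # 93
```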

Advanced Profiling Options

# Disable specific profiling categories
ruff profile script.ruff --no-cpu
ruff profile script.ruff --no-memory
ruff profile script.ruff --no-jit

# Generate flamegraph data
ruff profile script.ruff --flamegraph profile.txt

# Visualize with flamegraph.pl (install from GitHub)
flamegraph.pl profile.txt > flamegraph.svg
open flamegraph.svg

Flamegraph Workflow

Install flamegraph tools:

git clone https://github.com/brendangregg/FlameGraph
export PATH=$PATH:$(pwd)/FlameGraph

Profile your script:

ruff profile compute_heavy.ruff --flamegraph profile.txt

Generate SVG:

flamegraph.pl profile.txt > flamegraph.svg

View in browser:

open flamegraph.svg  # macOS
xdg-open flamegraph.svg  # Linux
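`flamegraph.pl` reads "folded" stacks. Each line holds one unique call stack, with frames joined by semicolons from root to leaf, then a space and a sample count. This guide does not show the contents of the `--flamegraph` file, so it is an assumption that the file uses this format. The Python sketch below builds the same kind of file from raw stack samples, which can help you predict what the SVG will show. The sample stacks are made up.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse sampled call stacks (root first) into flamegraph.pl's folded format."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]

# Hypothetical samples: each list is one captured call stack, root first
samples = [
    ["main", "calculate_primes", "is_prime"],
    ["main", "calculate_primes", "is_prime"],
    ["main", "fibonacci"],
    ["main", "calculate_primes"],
]

lines = fold_stacks(samples)
print("\n".join(lines))
# main;calculate_primes 1
# main;calculate_primes;is_prime 2
# main;fibonacci 1
```

If you write `lines` to a file and pass it to `flamegraph.pl`, `calculate_primes` should appear as the widest frame under `main`, because it appears in 3 of the 4 samples.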

Benchmarking

Ruff includes a built-in benchmarking framework.

Running Benchmarks

# Run all benchmarks in directory
ruff bench examples/benchmarks/

# Run specific benchmark
ruff bench fibonacci.ruff

# Custom iterations and warmup
ruff bench fibonacci.ruff -i 20 -w 5
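The `-i` (iterations) and `-w` (warmup) options follow the usual harness pattern. The workload first runs a few times untimed, which for Ruff gives the JIT a chance to compile hot paths. The remaining runs are timed. Below is a minimal Python sketch of that pattern. It is not `ruff bench`'s implementation, and reporting mean and standard deviation is an assumption.

```python
import statistics
import time

def bench(workload, iterations=20, warmup=5):
    """Run `workload` untimed `warmup` times, then time `iterations` runs."""
    if iterations < 2:
        raise ValueError("need at least 2 timed iterations for a standard deviation")
    for _ in range(warmup):
        workload()  # results discarded: lets caches (or a JIT) settle
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

mean, stdev = bench(lambda: fib(20), iterations=20, warmup=5)
print(f"fib(20): {mean * 1000:.2f}ms +/- {stdev * 1000:.2f}ms")
```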

Benchmark Output

========================================
Ruff Performance Benchmarks
========================================

Comparing execution modes:
  Interpreter: Tree-walking AST interpreter
  VM: Bytecode virtual machine
  JIT: Just-in-time native compilation
========================================

fibonacci
  Interpreter:  2.543s  (1.00x)
  VM:           0.156s  (16.30x faster)
  JIT:          0.008s  (317.88x faster) 🚀

higher_order
  Interpreter:  1.234s  (1.00x)
  VM:           0.089s  (13.87x faster)
  JIT:          0.012s  (102.83x faster) 🚀

========================================
Summary:
  Total benchmarks: 8
  VM speedup: 10-20x
  JIT speedup: 100-300x
========================================

Creating Custom Benchmarks

Create a .ruff file that performs the operation you want to benchmark:

# fibonacci_bench.ruff

func fibonacci(n) {
    if n <= 1 {
        return n
    }
    return fibonacci(n - 1) + fibonacci(n - 2)
}

# Benchmark will measure this execution
let result := fibonacci(30)
print(result)
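When you compare this benchmark across languages, first confirm that each version computes the same result. Here is a direct Python translation of the benchmark above. It is a sketch, and the Python program used by the comparison script may differ.

```python
def fibonacci(n):
    # Same naive doubly-recursive algorithm as fibonacci_bench.ruff
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

result = fibonacci(30)
print(result)  # 832040
```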

Cross-Language Comparisons

Compare Ruff against Go, Python, and Node.js.

Running Comparisons

chmod +x examples/benchmarks/compare_languages.sh
./examples/benchmarks/compare_languages.sh

Example Results

Fibonacci (Recursive, N=30)

Language	Time	Relative Speed
Go	0.045s	1.00x (baseline)
Ruff (JIT)	0.156s	0.29x (~3.5x slower) ✅
Node.js (V8)	0.234s	0.19x
Ruff (VM)	1.234s	0.04x
Python	4.567s	0.01x
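The Relative Speed column is Go's time divided by each language's time. Its reciprocal is the "N x slower" factor. Recomputing both from the table in Python:

```python
go_time = 0.045  # seconds, baseline
times = {
    "Ruff (JIT)": 0.156,
    "Node.js (V8)": 0.234,
    "Ruff (VM)": 1.234,
    "Python": 4.567,
}

for lang, t in times.items():
    relative = go_time / t   # fraction of Go's speed
    slowdown = t / go_time   # how many times slower than Go
    print(f"{lang:<13} {relative:.2f}x  ({slowdown:.1f}x slower)")
```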

Array Operations (100K elements)

Language	Map	Filter	Reduce	Total
Go	2ms	3ms	1ms	6ms
Node.js	15ms	18ms	8ms	41ms
Ruff (JIT)	23ms	28ms	12ms	63ms
Ruff (VM)	145ms	167ms	89ms	401ms
Python	234ms	267ms	145ms	646ms

Performance Targets vs Reality

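✅ Go: 2-5x slower (target met for Fibonacci, where the JIT is ~3.5x slower; array operations are ~10x slower)
✅ Python: 2-10x faster (target met or exceeded: the JIT is ~29x faster on Fibonacci and ~10x faster on array operations)
✅ Node.js: Competitive (the JIT is faster on Fibonacci and ~1.5x slower on array operations)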

Optimization Tips

1. Let JIT Warm Up

The JIT compiler activates after 100 iterations, so make sure hot loops run enough iterations to reach that threshold:

# ❌ Not enough iterations for JIT
for i in range(50) {
    expensive_calculation()
}

# ✅ JIT will activate
for i in range(200) {
    expensive_calculation()
}

2. Avoid Type Mixing in Hot Loops

Type guards add overhead, and a failed guard forces execution back to the VM. Keep types consistent:

# ❌ Type mixing defeats JIT optimization
let x := 0
for i in range(1000) {
    x := x + i  # int
    if i % 100 == 0 {
        x := float(x)  # suddenly float! Guard fails
    }
}

# ✅ Consistent types enable specialization
let x := 0
for i in range(1000) {
    x := x + i  # always int
}

3. Hoist Invariant Calculations

Move calculations outside loops:

# ❌ Recalculates every iteration
for i in range(1000) {
    let multiplier := expensive_function()
    result := i * multiplier
}

# ✅ Calculate once
let multiplier := expensive_function()
for i in range(1000) {
    result := i * multiplier
}

4. Use Native Functions

Built-in functions are implemented natively in Rust:

# ❌ Manual implementation
func sum_array(arr) {
    let total := 0
    for x in arr {
        total := total + x
    }
    return total
}

# ✅ Use built-in
let total := array.reduce(|acc, x| acc + x, 0)

5. Preallocate Collections

Avoid repeated resizing:

# ❌ Multiple reallocations
let arr := []
for i in range(10000) {
    arr.push(i)
}

# ✅ Preallocate if possible (feature planned)
# let arr := Array.with_capacity(10000)

6. Profile Before Optimizing

Don't guess; measure:

# Find the actual bottleneck
ruff profile script.ruff --flamegraph profile.txt

JIT Compilation

How It Works

Hot Path Detection: After 100 iterations, a function is marked "hot"
Type Profiling: The VM records the types each variable holds
Compilation: Hot functions are compiled to native code with Cranelift
Guard Insertion: Type checks verify that the profiled assumptions still hold
Execution: Native code runs 100-500x faster than the interpreter
Deoptimization: If a guard fails, execution falls back to the VM

Type Specialization

The JIT generates optimized code based on observed types:

func calculate(x, y) {
    return x * y + x / y
}

# After profiling observes (Int, Int) arguments:
# JIT generates: (x * y) as i64 + (x / y) as i64

# If later called with Float:
# Guard fails, deoptimizes to VM
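Conceptually, a specialized function is a fast path behind a guard. The guard checks that the arguments still have the types the profile saw, and if they do not, execution falls back to the generic path. The Python sketch below models that control flow for `calculate`. The names are hypothetical, and Python cannot express native `i64` arithmetic, so the sketch only shows the guard and the fallback. It assumes Int / Int truncates toward zero like Rust's `i64` division, as the `as i64` comment suggests.

```python
def trunc_div(a: int, b: int) -> int:
    # i64 division truncates toward zero; Python's // floors, so adjust the sign
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def calculate_generic(x, y):
    # Generic (VM-like) path: re-checks operand types on every call
    if isinstance(x, int) and isinstance(y, int):
        div = trunc_div(x, y)
    else:
        div = x / y
    return x * y + div

def calculate_int_specialized(x: int, y: int) -> int:
    # Specialized path: assumes both operands are ints, no type dispatch
    return x * y + trunc_div(x, y)

class GuardedCalculate:
    """Specialized code behind a type guard, with fallback (deoptimization)."""

    def __init__(self):
        self.guard_hits = 0
        self.guard_failures = 0

    def __call__(self, x, y):
        if type(x) is int and type(y) is int:  # guard: profiled (Int, Int)
            self.guard_hits += 1
            return calculate_int_specialized(x, y)
        self.guard_failures += 1               # guard failed
        return calculate_generic(x, y)         # fall back to the generic path

calculate = GuardedCalculate()
print(calculate(7, 2))      # 17   (specialized path)
print(calculate(7.0, 2.0))  # 17.5 (guard fails, generic path)
print(calculate.guard_hits, calculate.guard_failures)  # 1 1
```

Both paths give the same answer for integer inputs, which a real deoptimization must also guarantee. Only the speed differs.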

JIT Thresholds

Setting	Value	Purpose
Hot threshold	100 iterations	When to compile
Specialization samples	50	Minimum observations
Guard failure rate	10%	When to despecialize
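One reading of these thresholds: compile once a function is hot, wait for at least 50 type observations before specializing, and despecialize once 10% of guard checks fail. Below is a Python sketch of that decision logic with the table's values as constants. The guide does not say exactly how Ruff combines these values. For example, it does not say whether the failure rate uses a sliding window or whether 10% is an inclusive limit. Treat the helper functions as a hypothetical interpretation.

```python
HOT_THRESHOLD = 100            # iterations before a function is compiled
SPECIALIZATION_SAMPLES = 50    # type observations required before specializing
GUARD_FAILURE_LIMIT = 0.10     # failure rate at which to despecialize

def should_compile(iterations: int) -> bool:
    return iterations >= HOT_THRESHOLD

def can_specialize(type_samples: int) -> bool:
    return type_samples >= SPECIALIZATION_SAMPLES

def should_despecialize(guard_checks: int, guard_failures: int) -> bool:
    # Assumption: the 10% limit is inclusive and measured over all checks so far
    if guard_checks == 0:
        return False
    return guard_failures / guard_checks >= GUARD_FAILURE_LIMIT

print(should_compile(99), should_compile(100))   # False True
print(can_specialize(49), can_specialize(50))    # False True
print(should_despecialize(1000, 50))             # False (5% failures)
print(should_despecialize(1000, 150))            # True (15% failures)
```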

Monitoring JIT

Check JIT statistics in profile output:

ruff profile script.ruff

# JIT Statistics section shows:
# - Functions compiled
# - Cache hit rate
# - Guard success rate

Healthy JIT metrics:

Cache hit rate: >90%
Guard success rate: >95%
Few recompilations

Unhealthy metrics indicate:

Type instability (mixing types)
Not enough iterations (hot threshold not reached)
Complex control flow (defeats optimization)
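To check these ranges automatically, for example in CI, you could parse the JIT Statistics block of the `ruff profile` output. This Python sketch assumes the report uses the `Name: value` lines shown in the sample report earlier in this guide. The `jit_health` function and its `max_recompilations=2` cutoff for "few recompilations" are hypothetical choices.

```python
import re

def jit_health(report: str, max_recompilations: int = 2) -> list[str]:
    """Return warnings for JIT metrics outside the healthy ranges."""
    def metric(name: str) -> float:
        match = re.search(rf"{re.escape(name)}:\s*([\d.]+)", report)
        if match is None:
            raise ValueError(f"{name!r} not found in report")
        return float(match.group(1))

    warnings = []
    if metric("Cache Hit Rate") <= 90:
        warnings.append("cache hit rate <= 90%")
    if metric("Guard Success Rate") <= 95:
        warnings.append("guard success rate <= 95%: check for type instability")
    if metric("Recompilations") > max_recompilations:
        warnings.append("frequent recompilations")
    if metric("Functions Compiled") == 0:
        warnings.append("no functions compiled: hot threshold not reached?")
    return warnings

sample = """JIT Statistics:
  Functions Compiled: 3
  Recompilations: 0
  Total Compile Time: 0.045s
  Cache Hit Rate: 95.2%
  Guard Success Rate: 98.7%
"""
print(jit_health(sample))  # []
```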

Troubleshooting Performance

Problem: Slower than expected

Diagnosis:

ruff profile script.ruff

Common causes:

Not enough iterations for JIT
Type mixing in hot paths
I/O-bound, not CPU-bound
Using interpreter mode

Solutions:

Increase loop iterations
Keep types consistent
Profile to find bottlenecks
Use VM mode (default)

Problem: High memory usage

Diagnosis:

ruff profile script.ruff --no-cpu --no-jit  # memory profile only

Common causes:

String concatenation in loops
Large array/dict allocations
Closure captures
Leaked references

Solutions:

Use string builders (feature planned)
Preallocate collections (once supported)
Minimize closure scope
Check allocation hotspots

Problem: JIT not activating

Check:

Hot threshold reached? (100+ iterations)
VM mode enabled? (not --interpreter)
Profile shows "Functions Compiled: 0"?

Solutions:

Increase loop count
Remove --interpreter flag
Check for exceptions in loops
