# hitter

A high-performance load testing tool for Bifrost's chat completion endpoints, with support for multiple models, providers, and streaming responses.

## Features
- Configurable requests per second (RPS)
- Customizable test duration
- Streaming and non-streaming support
- Multiple models and providers
- Real-time statistics
- Virtual key authentication
- Success rate tracking
## Installation

```bash
cd /path/to/bifrost-enterprise/cmd/hitter
go build -o hitter main.go
```

## Usage

Run directly with `go run`, or use the built binary:

```bash
go run main.go [flags]
```

Basic run:

```bash
./hitter --rps 100 --duration 60s
```

Test multiple models and providers:

```bash
./hitter \
  --models "gpt-4o,gpt-4o-mini,claude-3-opus" \
  --providers "openai,anthropic" \
  --rps 50 \
  --duration 120s
```

Test streaming responses with a virtual key:

```bash
./hitter \
  --stream \
  --models "gpt-4o,gpt-5.2" \
  --rps 100 \
  --duration 60s \
  --virtual-key sk-bf-xxxxx
```

## Flags

| Flag | Type | Default | Description |
|---|---|---|---|
| `--url` | string | `http://localhost:8080/v1/chat/completions` | Target API endpoint |
| `--rps` | int | `100` | Requests per second |
| `--duration` | duration | `60s` | Test duration (e.g., `30s`, `5m`, `1h`) |
| `--models` | string | `gpt-4,gpt-4o,gpt-4o-mini,gpt-4.1,gpt-5` | Comma-separated list of models to test |
| `--providers` | string | `""` | Comma-separated list of providers (optional) |
| `--max-tokens` | int | `150` | Maximum tokens per request |
| `--temperature` | float | `0.7` | Temperature for model responses |
| `--stream` | bool | `false` | Enable streaming responses |
| `--verbose` | bool | `false` | Enable verbose logging |
| `--virtual-key` | string | `""` | Virtual API key for authentication |
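The table above maps naturally onto Go's standard `flag` package. The following is a minimal, hypothetical sketch of how these flags might be declared (defaults taken from the table; the tool's actual main.go may differ):

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// config mirrors the flag table; defaults match the documented values.
type config struct {
	url         string
	rps         int
	duration    time.Duration
	models      string
	providers   string
	maxTokens   int
	temperature float64
	stream      bool
	verbose     bool
	virtualKey  string
}

// parseFlags declares one flag per table row and parses the given args.
func parseFlags(args []string) (*config, error) {
	fs := flag.NewFlagSet("hitter", flag.ContinueOnError)
	c := &config{}
	fs.StringVar(&c.url, "url", "http://localhost:8080/v1/chat/completions", "Target API endpoint")
	fs.IntVar(&c.rps, "rps", 100, "Requests per second")
	fs.DurationVar(&c.duration, "duration", 60*time.Second, "Test duration")
	fs.StringVar(&c.models, "models", "gpt-4,gpt-4o,gpt-4o-mini,gpt-4.1,gpt-5", "Comma-separated list of models")
	fs.StringVar(&c.providers, "providers", "", "Comma-separated list of providers (optional)")
	fs.IntVar(&c.maxTokens, "max-tokens", 150, "Maximum tokens per request")
	fs.Float64Var(&c.temperature, "temperature", 0.7, "Temperature for model responses")
	fs.BoolVar(&c.stream, "stream", false, "Enable streaming responses")
	fs.BoolVar(&c.verbose, "verbose", false, "Enable verbose logging")
	fs.StringVar(&c.virtualKey, "virtual-key", "", "Virtual API key for authentication")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	c, _ := parseFlags([]string{"--rps", "50", "--duration", "2m"})
	fmt.Println(c.rps, c.duration)
}
```

Go's `flag` package accepts both `-rps` and `--rps`, which matches the double-dash style used throughout this README.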
## Examples

Test with 500 requests per second for 5 minutes:

```bash
./hitter --rps 500 --duration 5m --verbose
```

Test different models simultaneously:

```bash
./hitter \
  --models "gpt-4o,gpt-4o-mini,gpt-5.2,claude-3-opus" \
  --rps 200 \
  --duration 180s
```

Test with specific providers:

```bash
./hitter \
  --models "gpt-4o,claude-3-opus" \
  --providers "openai,anthropic" \
  --rps 100 \
  --duration 120s
```

Test streaming with verbose logging:

```bash
./hitter \
  --stream \
  --models "gpt-4o-mini" \
  --providers "openai" \
  --rps 50 \
  --duration 60s \
  --virtual-key sk-bf-your-key-here \
  --verbose
```

Test a custom endpoint:

```bash
./hitter \
  --url "https://api.example.com/v1/chat/completions" \
  --models "gpt-4o" \
  --rps 100 \
  --duration 30s \
  --virtual-key sk-bf-your-key-here
```

## Flag Formatting

**Boolean flags:**

- ✅ Correct: `--stream` or `--stream=true`
- ❌ Wrong: `--stream true` (will break subsequent flags)

**Comma-separated values:**

- ✅ Correct: `--models "gpt-4o,gpt-5.2,claude-3"` (quoted with spaces)
- ✅ Correct: `--models gpt-4o,gpt-5.2,claude-3` (no spaces)
- ❌ Wrong: `--models gpt-4o, gpt-5.2, claude-3` (spaces without quotes)

**Duration format:**

- Valid: `30s`, `5m`, `1h`, `90s`, `2h30m`
- Examples: `--duration 30s`, `--duration 5m`, `--duration 1h30m`
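The boolean-flag pitfall has a concrete explanation if the tool uses Go's standard `flag` package (an assumption; main.go may parse flags differently): a bool flag takes no separate value, so in `--stream true` the word `true` becomes the first positional argument and parsing of everything after it stops. The duration format likewise matches Go's `time.ParseDuration`. A runnable sketch:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// parseArgs demonstrates the bool-flag pitfall: "--stream true" leaves
// "true" as a positional argument, and flag parsing stops there, so
// later flags such as --rps are never seen.
func parseArgs(args []string) (stream bool, rps int, rest []string) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.BoolVar(&stream, "stream", false, "enable streaming")
	fs.IntVar(&rps, "rps", 100, "requests per second")
	fs.Parse(args)
	return stream, rps, fs.Args()
}

func main() {
	// Correct: --stream=true, so --rps is parsed.
	s, r, _ := parseArgs([]string{"--stream=true", "--rps", "50"})
	fmt.Println(s, r) // true 50

	// Wrong: "true" is positional, parsing stops, rps keeps its default.
	s, r, rest := parseArgs([]string{"--stream", "true", "--rps", "50"})
	fmt.Println(s, r, rest) // true 100 [true --rps 50]

	// Duration strings follow time.ParseDuration syntax.
	d, _ := time.ParseDuration("2h30m")
	fmt.Println(d) // 2h30m0s
}
```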
## Behavior

- **Random Selection:** For each request, a random model, provider, and prompt are selected from the configured options
- **Token Variation:** Max tokens vary by ±25 tokens from the configured value
- **Temperature Variation:** Temperature varies by ±0.1 from the configured value
- **Graceful Shutdown:** Press `Ctrl+C` to stop the test early and see final statistics
When providers are specified, requests use the format `provider/model`.

For example, `--models "gpt-4o,gpt-5.2" --providers "openai,anthropic"` will generate requests like:

- `openai/gpt-4o`
- `openai/gpt-5.2`
- `anthropic/gpt-4o`
- `anthropic/gpt-5.2`
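The expansion above is a simple cross product of the two lists. A sketch under the assumption that, with no providers set, bare model names are used (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// expandTargets builds every provider/model combination, matching the
// request format described above. With an empty providers list it falls
// back to bare model names.
func expandTargets(models, providers string) []string {
	ms := strings.Split(models, ",")
	if providers == "" {
		return ms
	}
	var out []string
	for _, p := range strings.Split(providers, ",") {
		for _, m := range ms {
			out = append(out, p+"/"+m)
		}
	}
	return out
}

func main() {
	fmt.Println(expandTargets("gpt-4o,gpt-5.2", "openai,anthropic"))
	// [openai/gpt-4o openai/gpt-5.2 anthropic/gpt-4o anthropic/gpt-5.2]
}
```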
## Sample Output

```
[10s] Requests: 1000 | Success: 98.5% | RPS: 100.0
[20s] Requests: 2000 | Success: 98.7% | RPS: 100.0

FINAL STATISTICS
Duration:       60.123s
Total Requests: 6012
Successful:     5934 (98.7%)
Errors:         78
Average RPS:    100.0
```
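The final-statistics line is straightforward arithmetic: success percentage is successes over total requests, and average RPS is total requests over elapsed seconds. A sketch reproducing the numbers above (function name illustrative):

```go
package main

import "fmt"

// summary computes the success percentage and average RPS shown in the
// FINAL STATISTICS block from totals and elapsed seconds.
func summary(total, success int, seconds float64) (pct, avgRPS float64) {
	pct = float64(success) / float64(total) * 100
	avgRPS = float64(total) / seconds
	return pct, avgRPS
}

func main() {
	pct, rps := summary(6012, 5934, 60.123)
	fmt.Printf("Successful: %d (%.1f%%) | Average RPS: %.1f\n", 5934, pct, rps)
	// Successful: 5934 (98.7%) | Average RPS: 100.0
}
```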
## Test Prompts

The tool uses a variety of prompts, including:

- Technical explanations (quantum computing, machine learning, neural networks)
- Creative writing (short stories, poems)
- Educational content (photosynthesis, climate change)
- Technical processes (blockchain, GPS systems)
## Troubleshooting

**Connection failures.** Check:

- Is the URL correct and reachable?
- Is the server running on the specified port?
- Are you using the correct virtual key?

```bash
# Test with verbose logging
./hitter --verbose --rps 1 --duration 10s
```

**High error rates.** Common causes:

- Invalid virtual key
- Server not running
- Wrong endpoint URL
- Network issues

Debug with a single slow request:

```bash
./hitter --verbose --rps 1 --duration 10s --virtual-key your-key
```

**Flags not taking effect.** Check:

- Are boolean flags formatted correctly? (use `--stream`, not `--stream true`)
- Are comma-separated values quoted if they contain spaces?
- Are you using double dashes (`--`) for long flags?
## Best Practices

- **Start Small:** Begin with low RPS (10-50) and gradually increase
- **Use Streaming:** Streaming tests better simulate real-world usage
- **Monitor Server:** Watch server metrics during load tests
- **Timeout Settings:** Default HTTP timeout is 30 seconds
- **System Resources:** Ensure your system can handle the target RPS
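The 30-second default timeout mentioned above corresponds to how an overall request deadline is typically set on a Go HTTP client; this is a sketch of the conventional setup, not necessarily the tool's actual client code:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newClient builds an HTTP client with a 30-second overall timeout,
// covering connection, request, and response body read.
func newClient() *http.Client {
	return &http.Client{Timeout: 30 * time.Second}
}

func main() {
	fmt.Println(newClient().Timeout) // 30s
}
```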
## Contributing

When modifying the tool:

- Update this README with new flags or features
- Test with various RPS and duration combinations
- Ensure graceful shutdown works correctly
- Add appropriate error handling

---

Part of the Bifrost Enterprise project.