A local application with a web UI for benchmarking multiple small language models (SLMs) running via Microsoft Foundry Local.
Read the full story: How we built FLPerformance to learn about the architecture decisions, challenges faced, and how to obtain real-world LLM performance metrics on your local hardware.
Windows users: If you have Node.js installed, run .\START_APP.ps1 to start everything. This opens two terminals and opens the browser automatically.
- Complete Benchmark System: Full end-to-end benchmarking with accurate metrics
- Enhanced Visualisations: Performance cards, comparison charts, and radar graphs
- Non-blocking Model Loading: Models download and load in the background with real-time status polling
- Real-time Progress: Polling-based status updates every two seconds during runs
- Pre-test Validation: Test button to verify model inference before benchmarking
- Results Export: JSON and CSV export functionality
- Hardware Detection: Comprehensive system information capture
- Storage System: JSON-based storage with optional SQLite support
- Custom Cache Support: Switch between model cache directories via the Cache tab
- Multi-Model Comparison: Side-by-side performance analysis with visual insights
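The polling-based status updates mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `fetchStatus`, the `state` values, and the attempt limit are hypothetical, and only the two-second interval comes from the feature list.

```javascript
// Minimal polling sketch. `fetchStatus` is a hypothetical callback that
// resolves to an object such as { state: 'loading' | 'loaded' | 'error' }.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollUntilDone(
  fetchStatus,
  { intervalMs = 2000, maxAttempts = 150, onUpdate = () => {} } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    onUpdate(status, attempt); // e.g. refresh a progress indicator in the UI
    if (status.state === 'loaded' || status.state === 'error') {
      return status; // terminal state reached, stop polling
    }
    await sleep(intervalMs);
  }
  throw new Error(`Status did not settle after ${maxAttempts} polls`);
}
```

Because the caller only awaits a promise, the UI stays responsive while a model downloads or loads in the background.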
FLPerformance enables you to:
- Manage the Foundry Local service using the official JavaScript SDK
- Load and benchmark multiple models simultaneously
- Run standardised benchmark tests across models
- Display clear performance statistics with tables and charts
- Export results for analysis
Dashboard showing system status, benchmark run history, and quick actions
Models page for loading, testing, and managing AI models
Benchmarks page with suite selection, model selection, and configuration
Comprehensive results with performance scores, comparison charts, and detailed metrics
Cache management page for switching model directories and viewing cached models
Settings page with system information, API endpoint, and application details
Required: Install Microsoft Foundry Local first
# Windows
winget install Microsoft.FoundryLocal
# macOS
brew tap microsoft/foundrylocal
brew install foundrylocal
# Or download from: https://aka.ms/foundry-local-installer
Verify installation:
foundry --version
Step 1: Navigate to project directory
cd C:\Users\YourUsername\path\to\FLPerformance
Step 2: Install Node.js (if not already installed)
# Windows - Install Node.js LTS
winget install --id OpenJS.NodeJS.LTS --accept-package-agreements --accept-source-agreements
# After installation, RESTART YOUR TERMINAL for PATH updates
macOS:
brew install node
Or download from: https://nodejs.org/
Step 3: Run installation script
# Windows
.\scripts\install.ps1
# macOS/Linux
chmod +x scripts/install.sh && ./scripts/install.sh
Note: Installation uses the --no-optional flag to skip the SQLite database dependency, which requires build tools.
Results are saved as JSON files instead, and all features work without SQLite.
Step 4: Start the application
# Easy Mode - Opens 2 terminals + browser automatically (Windows)
.\START_APP.ps1
# Manual Mode - Starts both servers
npm run dev
Once the server starts, open your browser at http://localhost:3000.
You will see:
- Models tab: Add and load AI models
- Benchmarks tab: Run performance tests
- Results tab: View comparison charts
- Cache tab: Switch to custom model cache directories
- Click Models, then Initialise Foundry Local (one-time setup)
- Click Add Model and select phi-3-mini-4k-instruct
- Click Load Model (downloads roughly 2 GB; the model loads in the background while you see real-time status)
- Go to Benchmarks, select your model, and click Run Benchmark
- View results in the Results tab
- Use the Cache tab to switch the Foundry cache directory
- Point to directories containing custom ONNX models
- Custom models appear in the Models dropdown with a wrench badge
- Benchmark custom models in the same way as catalogue models
If the automated installation script does not work, follow these manual steps:
- Microsoft Foundry Local
  - Download from: https://aka.ms/foundry-local-installer
  - Verify installation: foundry --version
  - Note: Foundry Local CLI must be in your PATH
- Node.js & NPM
  - Node.js v18 or higher
  - NPM v9 or higher
  - Download from: https://nodejs.org/
  - Verify: node --version and npm --version
- System Requirements
  - Windows 10/11, macOS, or Linux
  - Minimum 16 GB RAM (32 GB+ recommended for multiple models)
  - GPU with CUDA support (optional but recommended)
  - Adequate disk space for model storage (varies by model, typically 5-50 GB per model)
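A short preflight script can check some of these prerequisites before installation. The sketch below is an illustration only and is not part of the project's install scripts. The thresholds are taken from the list above, and `checkRequirements` is a hypothetical helper name.

```javascript
const os = require('os');

// Thresholds taken from the requirements list in this README.
const MIN_NODE_MAJOR = 18;
const MIN_RAM_GB = 16;

// Returns { ok, problems } for a given Node.js version string and RAM size.
function checkRequirements(
  nodeVersion = process.versions.node,
  totalMemBytes = os.totalmem()
) {
  const nodeMajor = Number.parseInt(nodeVersion.split('.')[0], 10);
  const ramGb = totalMemBytes / 1024 ** 3;
  const problems = [];
  if (nodeMajor < MIN_NODE_MAJOR) {
    problems.push(`Node.js v${MIN_NODE_MAJOR}+ required, found v${nodeVersion}`);
  }
  // Note: the OS may report slightly less than the installed amount.
  if (ramGb < MIN_RAM_GB) {
    problems.push(`${MIN_RAM_GB} GB RAM expected, found ${ramGb.toFixed(1)} GB`);
  }
  return { ok: problems.length === 0, problems };
}

const report = checkRequirements();
console.log(report.ok ? 'Requirements met' : report.problems.join('\n'));
```

Passing the version and memory as parameters keeps the check easy to test. Foundry Local itself still needs to be verified with foundry --version.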
# Skip optional SQLite (requires build tools)
npm install --no-optional
# Install frontend dependencies
cd src/client
npm install
cd ../..
# Create results directory
mkdir results
Want SQLite database support? Install Visual Studio Build Tools first:
# Windows only - needed for better-sqlite3
winget install Microsoft.VisualStudio.2022.BuildTools --silent --override "--wait --passive --add Microsoft.VisualStudio.Workload.VCTools"
# Then install with optional dependencies
npm install
# Create results directory
mkdir results
# Development mode (with hot reload)
npm run dev
Access the application at: http://localhost:3000
The application will be available at:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:3001
- Open the UI at http://localhost:3000
- Navigate to the Models tab
- Click Initialise Foundry Local to start the service
- Click Add Model
- Select a model from the available Foundry Local catalogue (for example, phi-3-mini-4k-instruct)
- Click Load Model to download (if needed) and load the model into memory
Note: Foundry Local uses a single service instance that can load multiple models simultaneously. Models are differentiated by their model ID when making inference requests.
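To illustrate how a single service can serve several models, the sketch below builds two OpenAI-style chat requests that differ only in the model field. The base URL and route are assumptions: port 8080 is the default mentioned in the troubleshooting notes, and /v1/chat/completions is the conventional OpenAI-compatible path, so check both against your running service. `buildChatRequest` and 'another-model-id' are hypothetical.

```javascript
// Builds an OpenAI-style chat completion request for a given model ID.
// baseUrl and the route are assumptions; verify them for your installation.
function buildChatRequest(modelId, prompt, baseUrl = 'http://localhost:8080') {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: modelId, // selects which loaded model handles the request
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Same service endpoint, different models ('another-model-id' is a placeholder).
const first = buildChatRequest('phi-3-mini-4k-instruct', 'Say hello');
const second = buildChatRequest('another-model-id', 'Say hello');
console.log(first.url === second.url); // true: only the model field differs

// To send a request (Node.js 18+): const res = await fetch(first.url, first.options);
```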
- Navigate to the Benchmarks tab
- Select the default benchmark suite
- Choose one or more models to benchmark
- Configure settings (iterations, concurrency, and so on)
- Click Run Benchmark
- Watch live progress as tests execute
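The iterations and concurrency settings can be pictured as a small worker pool. The following is a rough sketch under stated assumptions, not the project's benchmark engine: `task` is a hypothetical async function standing in for one inference request, and the result shape is illustrative.

```javascript
// Runs `iterations` calls of `task` with at most `concurrency` in flight.
// Failures are recorded per iteration instead of aborting the whole run.
async function runIterations(task, { iterations = 5, concurrency = 1 } = {}) {
  const results = new Array(iterations);
  let next = 0;

  async function worker() {
    while (next < iterations) {
      const index = next++; // claimed synchronously, so no two workers share an index
      const started = performance.now();
      try {
        const value = await task(index);
        results[index] = { ok: true, value, ms: performance.now() - started };
      } catch (error) {
        results[index] = { ok: false, error: String(error), ms: performance.now() - started };
      }
    }
  }

  // Start the workers and wait for all of them to drain the queue.
  const workers = Array.from({ length: Math.min(concurrency, iterations) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Raising `concurrency` increases load on the service, so latency per request typically grows. This is one reason the troubleshooting notes suggest reducing concurrency when runs time out.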
- Navigate to the Results tab
- View comparison tables and charts
- Filter by run, model, or benchmark type
- Export results as JSON or CSV
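For readers who want to post-process exported results, the sketch below shows one way to turn result rows into CSV and JSON. The row fields and values are placeholders rather than real measurements, and the actual export schema may differ.

```javascript
// Quote a field when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialise rows to CSV using an explicit column order.
function toCsv(rows, columns) {
  const header = columns.map(csvField).join(',');
  const lines = rows.map((row) => columns.map((col) => csvField(row[col])).join(','));
  return [header, ...lines].join('\n');
}

// Placeholder rows: field names and numbers are illustrative only.
const sampleRows = [
  { model: 'phi-3-mini-4k-instruct', tps: 0, ttftMs: 0 },
  { model: 'custom, local', tps: 0, ttftMs: 0 },
];

console.log(toCsv(sampleRows, ['model', 'tps', 'ttftMs']));
console.log(JSON.stringify(sampleRows, null, 2));
```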
FLPerformance/
├── src/
│   ├── server/                  # Backend API
│   │   ├── index.js             # Express server entry point
│   │   ├── orchestrator.js      # Foundry Local service orchestration
│   │   ├── benchmark.js         # Benchmark engine
│   │   ├── cacheManager.js      # Model cache management (filesystem-based)
│   │   ├── storage.js           # Results storage (JSON + SQLite)
│   │   └── logger.js            # Structured logging
│   └── client/                  # Frontend UI (React + Vite)
│       └── src/
│           ├── pages/           # Page views
│           └── utils/           # Client utilities
├── benchmarks/
│   └── suites/
│       └── default.json         # Default benchmark suite definition
├── docs/
│   ├── architecture.md          # System architecture
│   ├── api.md                   # REST API reference
│   ├── setup.md                 # Setup documentation
│   ├── BENCHMARK_GUIDE.md       # Troubleshooting guide
│   ├── QUICK_REFERENCE.md       # Commands and code patterns cheat sheet
│   ├── TESTING_CHECKLIST.md     # Comprehensive test cases
│   ├── VALIDATION_STEPS.md      # Validation procedures
│   └── images/                  # Screenshots and diagrams
├── scripts/
│   └── helpers/                 # Utility scripts
├── results/
│   └── example/                 # Example benchmark results
├── package.json
└── README.md
- Unified service management using foundry-local-sdk
- Add/remove models from the Foundry Local catalogue
- Load multiple models simultaneously in a single service
- Custom Model Support: Benchmark custom ONNX models from alternate cache directories via the Cache tab
- Monitor model health and status in real time
- Automatic model download and caching
- Throughput (TPS): Tokens generated per second (overall)
- Latency: Time to first token (TTFT), time per output token (TPOT), and end-to-end completion time
- Generation Speed (GenTPS): Token generation rate after the first token (1000/TPOT, with TPOT in milliseconds)
- Percentile Metrics: P50, P95, and P99 latency measurements for reliability analysis
- Performance Scoring: 0-100 score based on throughput, latency, and reliability
- Stability: Error rate and timeout tracking
- Resource Usage: CPU, RAM, and GPU utilisation (platform-dependent)
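The latency and throughput metrics above can be derived from a few timestamps per request. The sketch below uses common definitions of TTFT, TPOT, and nearest-rank percentiles. It is an illustration under those assumptions, not necessarily how the benchmark engine computes them, and the field names (`startMs`, `firstTokenMs`, `endMs`, `outputTokens`) are hypothetical.

```javascript
// Per-request metrics from timestamps in milliseconds.
function requestMetrics({ startMs, firstTokenMs, endMs, outputTokens }) {
  const totalMs = endMs - startMs;       // end-to-end completion time
  const ttftMs = firstTokenMs - startMs; // time to first token
  // TPOT: average gap between tokens after the first one.
  const tpotMs = outputTokens > 1 ? (endMs - firstTokenMs) / (outputTokens - 1) : 0;
  return {
    ttftMs,
    tpotMs,
    totalMs,
    tps: totalMs > 0 ? outputTokens / (totalMs / 1000) : 0, // overall throughput
    genTps: tpotMs > 0 ? 1000 / tpotMs : 0,                 // 1000/TPOT
  };
}

// Nearest-rank percentile (p in 0..100) over a list of latencies.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Synthetic timestamps chosen for easy arithmetic, not measured data.
const m = requestMetrics({ startMs: 0, firstTokenMs: 200, endMs: 1200, outputTokens: 51 });
console.log(m.ttftMs, m.tpotMs, m.genTps); // 200 20 50
```

GenTPS excludes the time to first token, so it is usually higher than overall TPS for the same request.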
- Performance Score Cards: Visual 0-100 ratings for each model
- "Best Model For..." Cards: Automatic recommendations for throughput, latency, reliability, and TTFT
- Side-by-side Comparison Table: Detailed metrics with colour-coded scores
- Interactive Charts:
- Throughput comparison (TPS)
- Latency comparison (P50/P95/P99)
- Generation performance (TTFT, TPOT, GenTPS)
- Performance radar chart showing multidimensional analysis
- Detailed Results Table: Per-scenario breakdowns with all metrics
- Export Options: JSON and CSV export for further analysis
Default settings can be modified in the Settings tab:
- Default iterations per benchmark
- Concurrency level
- Request timeout values
- Results storage path
- Streaming mode (if supported)
FLPerformance uses the official foundry-local-sdk JavaScript package to manage the Foundry Local service:
- Single Service Instance: One Foundry Local service handles all models
- Multiple Loaded Models: Models are loaded on demand and run simultaneously
- OpenAI-Compatible API: Standard OpenAI client for inference requests
- Model Differentiation: Models are identified by their model ID in API calls
See Architecture Documentation for details.
- Ensure Foundry Local is installed: foundry --version
- Verify Foundry Local CLI is in your PATH
- Check that port 8080 is available (default Foundry Local port)
- View logs in the Models tab for specific error messages
- Verify sufficient disk space for model download
- Check network connectivity for first-time downloads
- Ensure adequate RAM for model size
- Try manually loading with Foundry Local CLI: foundry model run <model-name>
- Increase timeout values in Settings
- Reduce concurrency level
- Check system resource availability (RAM, GPU memory)
- Use the Test button in the Models tab to verify inference works
- A successful test indicates the model should work in benchmarks
- The test validates both model loading and inference response
- It is a quick way to catch configuration issues early
- Run the appropriate installation script (install.ps1 or install.sh) for detailed diagnostics
- Check the Quick Start Guide for common installation issues
- Verify Node.js version: node --version (must be v18+)
For more detailed information, see:
- Quick Start Guide - Comprehensive getting started guide
- Quick Reference - Commands and code patterns cheat sheet
- Architecture Documentation - System design and SDK integration
- API Reference - REST API endpoint documentation
- Setup Guide - Detailed installation and configuration
- Benchmark Guide - Troubleshooting and testing guide
- Testing Checklist - Comprehensive test cases
For issues or questions:
- Check the documentation in /docs
- Review logs in the UI under each service
- Examine results in the /results directory
MIT License