Run private AI experiments directly in your browser using WebGPU.
A privacy-first WebGPU playground for running AI inference, semantic search, and GPU benchmarks directly in the browser. No backend, no API keys, no accounts.
Built by David Vazquez — GitHub · LinkedIn
- WebGPU detection — detects support, adapter, device, features, and limits with safe fallbacks
- GPU Benchmark — real WebGPU compute-shader matrix multiplication benchmark
- Image AI — local image classification using MobileNet v2 via Transformers.js
- Semantic Search — local TF-vector search over user-provided documents
- CPU vs WebGPU comparison — same matrix multiplication run on both backends at the same size for a direct speedup ratio
- Browser diagnostics panel — collapsible sections showing browser environment, WebGPU adapter, limits, and features
- Shareable benchmark reports — export results as Markdown, JSON, or a PNG share card, fully client-side
- Local experiment history — benchmark results persisted in IndexedDB; view, delete, or clear runs
- Local storage manager — visibility into what the app stores locally, with clear actions
- Analyze My Browser — one-click browser readiness analysis summarizing which AI workloads are realistic (Excellent / Good / Limited / Not Recommended)
- AI Capability Matrix — a full table of AI task categories with readiness level, recommended backend, throughput estimates, and memory pressure — generated from live browser signals
- Model Catalog (
/models) — static catalog of browser-compatible AI models with search, task/provider filters, compatibility ratings, and risk labels - Model Compatibility Checker — evaluates each catalog model against current browser signals; shows status, confidence, recommended backend, reasons, and warnings
- Tokens/min Estimator — heuristic throughput estimator for LLMs; no model download required; clearly labeled Estimated
- Model Load Profiler — measures real init, warmup, and inference timings for a lightweight model (all-MiniLM-L6-v2); clearly labeled Measured
- Memory & Storage Estimator — calculate weights, KV cache, and runtime overhead for any model size + quantization combination
- Context Window Calculator — visualize how system prompt, user input, retrieved chunks, and generation budget consume the model's context window
- Local AI Readiness Report — export a full readiness report as Markdown or JSON; includes capability summary, model recommendations, and privacy note
| Layer | Technology |
|---|---|
| Framework | Next.js 16 App Router |
| Language | TypeScript (strict) |
| UI | React, shadcn/ui, Tailwind CSS v4, lucide-react |
| Toasts | sonner |
| Browser compute | WebGPU API, Web Workers |
| AI inference | @huggingface/transformers (MobileNet v2) |
| Local storage | IndexedDB via Dexie |
| PNG export | html-to-image |
| Unit tests | Vitest |
| E2E tests | Playwright |
The project uses pragmatic Domain-Driven Design with clean one-way dependency flow:
presentation → application → domain
application → infrastructure (via interfaces)
domain → no external dependencies
src/
app/ # Next.js App Router pages
modules/
ai-calculators/ # Memory estimator, context budget calculator, token count estimator
benchmark/ # BenchmarkRunner interface, CPU + WebGPU runners, comparison logic
diagnostics/ # Browser capability and WebGPU diagnostics collection
history/ # ExperimentRepository (Dexie IndexedDB), CRUD use cases
local-ai/ # ImageClassificationProvider interface, TransformersJS adapter
local-storage/ # Storage summary and clear use cases
model-capabilities/ # AI workload capability matrix generation from browser signals
model-catalog/ # Static model catalog, compatibility checker, model filtering
model-profiler/ # Token throughput estimator, model load profiler
readiness/ # Browser AI readiness analysis (per task category)
reports/ # BenchmarkReport + LocalAiReadinessReport serializers
semantic-search/ # EmbeddingProvider interface, TF-vector fallback
webgpu/ # WebGPU capability detection
components/
ui/ # shadcn/ui components
layout/ # AppShell, SiteHeader, SiteFooter
shared/ # MetricCard, EmptyState, ErrorBoundary
workers/ # Web Worker stubs for heavy compute
lib/ # cn, env, logger utilities
domain— pure TypeScript types and logic; no React, no browser APIsapplication— use cases that orchestrate domain + infrastructureinfrastructure— WebGPU, IndexedDB, Cache API, model librariespresentation— React components using shadcn/ui
For small workloads (matrix size ≤ 128), CPU may outperform WebGPU. GPU acceleration has real overhead: device initialization, buffer allocation, data transfer to the GPU, shader compilation, and result synchronization. This overhead amortizes only for large parallel workloads.
At 512×512 matrix multiplication on a modern GPU, WebGPU typically achieves a meaningful speedup. At 64×64, CPU often wins. GPU Local Lab shows this clearly, including a warning message when CPU wins.
npm install
npm run devOpen http://localhost:3000.
| Command | Description |
|---|---|
npm run dev |
Start development server |
npm run build |
Production build |
npm run test |
Run unit tests (Vitest) |
npm run test:watch |
Watch mode |
npm run typecheck |
TypeScript strict check |
npm run test:e2e |
Playwright E2E tests |
- Push to GitHub
- Import the repository in vercel.com
- No environment variables required for the MVP
- WebGPU requires HTTPS — Vercel provides this by default
| Browser | WebGPU | Image AI | Semantic Search |
|---|---|---|---|
| Chrome 113+ | ✓ | ✓ | ✓ |
| Edge 113+ | ✓ | ✓ | ✓ |
| Safari 18+ | Partial | ✓ | ✓ |
| Firefox | ✗ (flag only) | ✓ | ✓ |
The app degrades gracefully on unsupported browsers — WebGPU features show clear "unsupported" states, and the CPU-only benchmark path always works.
GPU Local Lab distinguishes between estimated and measured performance throughout the UI.
Estimated results are based on browser signals, model metadata, and previous local benchmarks. They are heuristic — they do not require downloading or running a model.
Measured results come from workloads executed directly in the browser (e.g., the model load profiler, the GPU benchmark).
Do not treat estimates as guarantees. Browser APIs expose limited hardware information, and real performance can vary because of model architecture, quantization, thermal throttling, browser backend implementation, and active system load. Every estimate in the UI is clearly labeled Estimated.
| What | Where |
|---|---|
| Uploaded images | Stay in your browser — never uploaded |
| Text for search | Stays in your browser — never uploaded |
| Benchmark results | Stored in your browser's IndexedDB |
| Readiness analysis | Computed locally from browser APIs — never sent anywhere |
| AI model files | Downloaded from Hugging Face CDN on first use, then cached locally |
| Any server inference | None — all inference is local |
You can clear all locally stored data from the Lab → Local Storage panel.
No model is ever downloaded without an explicit user action (clicking Profile or a similar button).
- CPU benchmark at large sizes blocks the main thread. 512×512 CPU matrix multiply is synchronous and may cause brief UI unresponsiveness. A Web Worker path is scaffolded but not fully wired for the UI.
- AI model downloads are not bundled. Model files are downloaded from Hugging Face CDN on first use.
- Semantic search uses TF-vector fallback. Full embedding-based search via WebGPU is not yet stable across browsers; the current implementation uses a local term-frequency vector with cosine similarity.
- PNG export may fail in some browser environments (cross-origin restrictions). The Markdown and JSON export paths are always available.
- IndexedDB is unavailable in some private/incognito contexts. History storage failures are caught silently and do not break benchmark execution.
- Readiness estimates are heuristic. Device memory detection (
navigator.deviceMemory) is not available in all browsers. When unavailable, the analyzer assumes 4 GB. - Token throughput estimates are rough heuristics. They are based on model size, quantization, and backend — not on measured model execution.
- Model load profiler only supports one model (all-MiniLM-L6-v2). Other catalog models show "Profiler not available for this model yet."
Coming soon
- Public benchmark leaderboard (Neon Postgres, anonymous submissions, explicit consent)
- GitHub OAuth for saving experiments across devices
- Full WebGPU embedding model for semantic search (once browser support stabilizes)
- WebNN fallback
- WASM fallback for CPU path
- Shareable benchmark result URLs (URL-encoded, no backend)
- Drag-and-drop PDF/text file ingestion for semantic search
- Visual WGSL shader editor
- Advanced GPU profiling panel
- PWA support with offline shell