GPU Local Lab

Run private AI experiments directly in your browser using WebGPU.

A privacy-first WebGPU playground for running AI inference, semantic search, and GPU benchmarks directly in the browser. No backend, no API keys, no accounts.

Built by David Vazquez — GitHub · LinkedIn

Features

MVP 1

WebGPU detection — detects support, adapter, device, features, and limits with safe fallbacks
GPU Benchmark — real WebGPU compute-shader matrix multiplication benchmark
Image AI — local image classification using MobileNet v2 via Transformers.js
Semantic Search — local TF-vector search over user-provided documents

MVP 2

CPU vs WebGPU comparison — same matrix multiplication run on both backends at the same size for a direct speedup ratio
Browser diagnostics panel — collapsible sections showing browser environment, WebGPU adapter, limits, and features
Shareable benchmark reports — export results as Markdown, JSON, or a PNG share card, fully client-side
Local experiment history — benchmark results persisted in IndexedDB; view, delete, or clear runs
Local storage manager — visibility into what the app stores locally, with clear actions

MVP 3 — Local AI Readiness Lab

Analyze My Browser — one-click browser readiness analysis summarizing which AI workloads are realistic (Excellent / Good / Limited / Not Recommended)
AI Capability Matrix — a full table of AI task categories with readiness level, recommended backend, throughput estimates, and memory pressure — generated from live browser signals
Model Catalog (/models) — static catalog of browser-compatible AI models with search, task/provider filters, compatibility ratings, and risk labels
Model Compatibility Checker — evaluates each catalog model against current browser signals; shows status, confidence, recommended backend, reasons, and warnings
Tokens/min Estimator — heuristic throughput estimator for LLMs; no model download required; clearly labeled Estimated
Model Load Profiler — measures real init, warmup, and inference timings for a lightweight model (all-MiniLM-L6-v2); clearly labeled Measured
Memory & Storage Estimator — calculate weights, KV cache, and runtime overhead for any model size + quantization combination
Context Window Calculator — visualize how system prompt, user input, retrieved chunks, and generation budget consume the model's context window
Local AI Readiness Report — export a full readiness report as Markdown or JSON; includes capability summary, model recommendations, and privacy note

Tech Stack

Layer	Technology
Framework	Next.js 16 App Router
Language	TypeScript (strict)
UI	React, shadcn/ui, Tailwind CSS v4, lucide-react
Toasts	sonner
Browser compute	WebGPU API, Web Workers
AI inference	@huggingface/transformers (MobileNet v2)
Local storage	IndexedDB via Dexie
PNG export	html-to-image
Unit tests	Vitest
E2E tests	Playwright

Architecture

The project uses pragmatic Domain-Driven Design with clean one-way dependency flow:

presentation → application → domain
application → infrastructure (via interfaces)
domain → no external dependencies

src/
  app/                      # Next.js App Router pages
  modules/
    ai-calculators/         # Memory estimator, context budget calculator, token count estimator
    benchmark/              # BenchmarkRunner interface, CPU + WebGPU runners, comparison logic
    diagnostics/            # Browser capability and WebGPU diagnostics collection
    history/                # ExperimentRepository (Dexie IndexedDB), CRUD use cases
    local-ai/               # ImageClassificationProvider interface, TransformersJS adapter
    local-storage/          # Storage summary and clear use cases
    model-capabilities/     # AI workload capability matrix generation from browser signals
    model-catalog/          # Static model catalog, compatibility checker, model filtering
    model-profiler/         # Token throughput estimator, model load profiler
    readiness/              # Browser AI readiness analysis (per task category)
    reports/                # BenchmarkReport + LocalAiReadinessReport serializers
    semantic-search/        # EmbeddingProvider interface, TF-vector fallback
    webgpu/                 # WebGPU capability detection
  components/
    ui/                     # shadcn/ui components
    layout/                 # AppShell, SiteHeader, SiteFooter
    shared/                 # MetricCard, EmptyState, ErrorBoundary
  workers/                  # Web Worker stubs for heavy compute
  lib/                      # cn, env, logger utilities

Module boundary rules

domain — pure TypeScript types and logic; no React, no browser APIs
application — use cases that orchestrate domain + infrastructure
infrastructure — WebGPU, IndexedDB, Cache API, model libraries
presentation — React components using shadcn/ui

Why WebGPU is not always faster

For small workloads (matrix size ≤ 128), CPU may outperform WebGPU. GPU acceleration has real overhead: device initialization, buffer allocation, data transfer to the GPU, shader compilation, and result synchronization. This overhead amortizes only for large parallel workloads.

At 512×512 matrix multiplication on a modern GPU, WebGPU typically achieves a meaningful speedup. At 64×64, CPU often wins. GPU Local Lab shows this clearly, including a warning message when CPU wins.

Local Development

npm install
npm run dev

Open http://localhost:3000.

Commands

Command	Description
`npm run dev`	Start development server
`npm run build`	Production build
`npm run test`	Run unit tests (Vitest)
`npm run test:watch`	Watch mode
`npm run typecheck`	TypeScript strict check
`npm run test:e2e`	Playwright E2E tests

Deployment on Vercel

Push to GitHub
Import the repository in vercel.com
No environment variables required for the MVP
WebGPU requires HTTPS — Vercel provides this by default

Browser Support

Browser	WebGPU	Image AI	Semantic Search
Chrome 113+	✓	✓	✓
Edge 113+	✓	✓	✓
Safari 18+	Partial	✓	✓
Firefox	✗ (flag only)	✓	✓

The app degrades gracefully on unsupported browsers — WebGPU features show clear "unsupported" states, and the CPU-only benchmark path always works.

Estimates vs Measured Results

GPU Local Lab distinguishes between estimated and measured performance throughout the UI.

Estimated results are based on browser signals, model metadata, and previous local benchmarks. They are heuristic — they do not require downloading or running a model.

Measured results come from workloads executed directly in the browser (e.g., the model load profiler, the GPU benchmark).

Do not treat estimates as guarantees. Browser APIs expose limited hardware information, and real performance can vary because of model architecture, quantization, thermal throttling, browser backend implementation, and active system load. Every estimate in the UI is clearly labeled Estimated.

Privacy Model

What	Where
Uploaded images	Stay in your browser — never uploaded
Text for search	Stays in your browser — never uploaded
Benchmark results	Stored in your browser's IndexedDB
Readiness analysis	Computed locally from browser APIs — never sent anywhere
AI model files	Downloaded from Hugging Face CDN on first use, then cached locally
Any server inference	None — all inference is local

You can clear all locally stored data from the Lab → Local Storage panel.

No model is ever downloaded without an explicit user action (clicking Profile or a similar button).

Known Limitations

CPU benchmark at large sizes blocks the main thread. 512×512 CPU matrix multiply is synchronous and may cause brief UI unresponsiveness. A Web Worker path is scaffolded but not fully wired for the UI.
AI model downloads are not bundled. Model files are downloaded from Hugging Face CDN on first use.
Semantic search uses TF-vector fallback. Full embedding-based search via WebGPU is not yet stable across browsers; the current implementation uses a local term-frequency vector with cosine similarity.
PNG export may fail in some browser environments (cross-origin restrictions). The Markdown and JSON export paths are always available.
IndexedDB is unavailable in some private/incognito contexts. History storage failures are caught silently and do not break benchmark execution.
Readiness estimates are heuristic. Device memory detection (navigator.deviceMemory) is not available in all browsers. When unavailable, the analyzer assumes 4 GB.
Token throughput estimates are rough heuristics. They are based on model size, quantization, and backend — not on measured model execution.
Model load profiler only supports one model (all-MiniLM-L6-v2). Other catalog models show "Profiler not available for this model yet."

Screenshots

Coming soon

Roadmap

Public benchmark leaderboard (Neon Postgres, anonymous submissions, explicit consent)
GitHub OAuth for saving experiments across devices
Full WebGPU embedding model for semantic search (once browser support stabilizes)
WebNN fallback
WASM fallback for CPU path
Shareable benchmark result URLs (URL-encoded, no backend)
Drag-and-drop PDF/text file ingestion for semantic search
Visual WGSL shader editor
Advanced GPU profiling panel
PWA support with offline shell

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
docs		docs
public		public
src		src
tests/unit		tests/unit
video		video
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
README.md		README.md
components.json		components.json
next.config.ts		next.config.ts
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
postcss.config.mjs		postcss.config.mjs
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU Local Lab

Features

MVP 1

MVP 2

MVP 3 — Local AI Readiness Lab

Tech Stack

Architecture

Module boundary rules

Why WebGPU is not always faster

Local Development

Commands

Deployment on Vercel

Browser Support

Estimates vs Measured Results

Privacy Model

Known Limitations

Screenshots

Roadmap

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GPU Local Lab

Features

MVP 1

MVP 2

MVP 3 — Local AI Readiness Lab

Tech Stack

Architecture

Module boundary rules

Why WebGPU is not always faster

Local Development

Commands

Deployment on Vercel

Browser Support

Estimates vs Measured Results

Privacy Model

Known Limitations

Screenshots

Roadmap

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages