DaveOval/gpu-local-lab

GPU Local Lab

Run private AI experiments directly in your browser using WebGPU.

A privacy-first WebGPU playground for running AI inference, semantic search, and GPU benchmarks directly in the browser. No backend, no API keys, no accounts.

Built by David Vazquez · GitHub · LinkedIn


Features

MVP 1

  • WebGPU detection — detects support, adapter, device, features, and limits with safe fallbacks
  • GPU Benchmark — real WebGPU compute-shader matrix multiplication benchmark
  • Image AI — local image classification using MobileNet v2 via Transformers.js
  • Semantic Search — local TF-vector search over user-provided documents
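
To make the TF-vector search concrete, here is a minimal sketch of term-frequency vectors ranked by cosine similarity. The function names (`tokenize`, `termFrequency`, `cosineSimilarity`, `search`) and the naive tokenizer are assumptions for illustration; the README does not show the repository's actual implementation.

```typescript
// Hypothetical helpers: a sketch of TF-vector search, not the project's real code.
type TermVector = Map<string, number>;

// Naive tokenizer: lowercase, split on anything that is not a-z or 0-9.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Count how often each term occurs in the text.
function termFrequency(text: string): TermVector {
  const tf: TermVector = new Map();
  for (const token of tokenize(text)) {
    tf.set(token, (tf.get(token) ?? 0) + 1);
  }
  return tf;
}

// Euclidean length of a term vector.
function norm(v: TermVector): number {
  let sum = 0;
  v.forEach((count) => {
    sum += count * count;
  });
  return Math.sqrt(sum);
}

// Cosine similarity; returns 0 when either vector is empty.
function cosineSimilarity(a: TermVector, b: TermVector): number {
  let dot = 0;
  a.forEach((count, term) => {
    dot += count * (b.get(term) ?? 0);
  });
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Rank documents by similarity to the query, best match first.
function search(query: string, docs: string[]): { doc: string; score: number }[] {
  const q = termFrequency(query);
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(q, termFrequency(doc)) }))
    .sort((x, y) => y.score - x.score);
}
```

Because the vectors only count literal terms, a query matches documents that share its words, not documents that share its meaning. That is the trade-off of a TF fallback compared with embedding-based search.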

MVP 2

  • CPU vs WebGPU comparison — same matrix multiplication run on both backends at the same size for a direct speedup ratio
  • Browser diagnostics panel — collapsible sections showing browser environment, WebGPU adapter, limits, and features
  • Shareable benchmark reports — export results as Markdown, JSON, or a PNG share card, fully client-side
  • Local experiment history — benchmark results persisted in IndexedDB; view, delete, or clear runs
  • Local storage manager — visibility into what the app stores locally, with clear actions

MVP 3 — Local AI Readiness Lab

  • Analyze My Browser — one-click browser readiness analysis summarizing which AI workloads are realistic (Excellent / Good / Limited / Not Recommended)
  • AI Capability Matrix — a full table of AI task categories with readiness level, recommended backend, throughput estimates, and memory pressure — generated from live browser signals
  • Model Catalog (/models) — static catalog of browser-compatible AI models with search, task/provider filters, compatibility ratings, and risk labels
  • Model Compatibility Checker — evaluates each catalog model against current browser signals; shows status, confidence, recommended backend, reasons, and warnings
  • Tokens/min Estimator — heuristic throughput estimator for LLMs; no model download required; clearly labeled Estimated
  • Model Load Profiler — measures real init, warmup, and inference timings for a lightweight model (all-MiniLM-L6-v2); clearly labeled Measured
  • Memory & Storage Estimator — calculate weights, KV cache, and runtime overhead for any model size + quantization combination
  • Context Window Calculator — visualize how system prompt, user input, retrieved chunks, and generation budget consume the model's context window
  • Local AI Readiness Report — export a full readiness report as Markdown or JSON; includes capability summary, model recommendations, and privacy note
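
As a rough illustration of what the Memory & Storage Estimator adds up, the sketch below computes weights, KV cache, and runtime overhead for a transformer-style model. The `ModelShape` fields, the function name, and the 20% default overhead are assumptions chosen for the example, not values taken from the project.

```typescript
// Hypothetical input shape; field names are illustrative, not the project's API.
interface ModelShape {
  parameters: number;      // total parameter count
  bitsPerWeight: number;   // e.g. 16 for fp16, 4 for 4-bit quantization
  layers: number;          // number of transformer layers
  kvHeads: number;         // number of key/value heads
  headDim: number;         // dimension of each head
  contextTokens: number;   // tokens held in the KV cache
  kvBytesPerValue: number; // e.g. 2 for an fp16 cache
}

// overheadFraction is an assumed placeholder (20%), not a measured figure.
function estimateMemoryBytes(m: ModelShape, overheadFraction = 0.2) {
  // Weights: one value per parameter at the chosen bit width.
  const weights = (m.parameters * m.bitsPerWeight) / 8;
  // KV cache: one key and one value vector per layer, per KV head, per token.
  const kvCache =
    2 * m.layers * m.kvHeads * m.headDim * m.contextTokens * m.kvBytesPerValue;
  const overhead = (weights + kvCache) * overheadFraction;
  return { weights, kvCache, overhead, total: weights + kvCache + overhead };
}
```

For example, a 1-billion-parameter model at 4 bits per weight needs about 500 MB for weights alone. The KV cache grows linearly with context length, which is why long contexts can dominate memory on small models.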

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | Next.js 16 App Router |
| Language | TypeScript (strict) |
| UI | React, shadcn/ui, Tailwind CSS v4, lucide-react |
| Toasts | sonner |
| Browser compute | WebGPU API, Web Workers |
| AI inference | @huggingface/transformers (MobileNet v2) |
| Local storage | IndexedDB via Dexie |
| PNG export | html-to-image |
| Unit tests | Vitest |
| E2E tests | Playwright |

Architecture

The project uses pragmatic Domain-Driven Design with clean one-way dependency flow:

presentation → application → domain
application → infrastructure (via interfaces)
domain → no external dependencies

src/
  app/                      # Next.js App Router pages
  modules/
    ai-calculators/         # Memory estimator, context budget calculator, token count estimator
    benchmark/              # BenchmarkRunner interface, CPU + WebGPU runners, comparison logic
    diagnostics/            # Browser capability and WebGPU diagnostics collection
    history/                # ExperimentRepository (Dexie IndexedDB), CRUD use cases
    local-ai/               # ImageClassificationProvider interface, TransformersJS adapter
    local-storage/          # Storage summary and clear use cases
    model-capabilities/     # AI workload capability matrix generation from browser signals
    model-catalog/          # Static model catalog, compatibility checker, model filtering
    model-profiler/         # Token throughput estimator, model load profiler
    readiness/              # Browser AI readiness analysis (per task category)
    reports/                # BenchmarkReport + LocalAiReadinessReport serializers
    semantic-search/        # EmbeddingProvider interface, TF-vector fallback
    webgpu/                 # WebGPU capability detection
  components/
    ui/                     # shadcn/ui components
    layout/                 # AppShell, SiteHeader, SiteFooter
    shared/                 # MetricCard, EmptyState, ErrorBoundary
  workers/                  # Web Worker stubs for heavy compute
  lib/                      # cn, env, logger utilities

Module boundary rules

  • domain — pure TypeScript types and logic; no React, no browser APIs
  • application — use cases that orchestrate domain + infrastructure
  • infrastructure — WebGPU, IndexedDB, Cache API, model libraries
  • presentation — React components using shadcn/ui
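
The boundary rules above can be sketched in a single file. The README names a `BenchmarkRunner` interface, but its method signature here is an assumption, as are `runBenchmark`, `assertValidMatrixSize`, and the stand-in `FixedDurationRunner` adapter.

```typescript
// --- domain: pure types and rules, no React or browser APIs ---
type Backend = "cpu" | "webgpu";

interface BenchmarkResult {
  backend: Backend;
  matrixSize: number;
  durationMs: number;
}

function assertValidMatrixSize(size: number): void {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError(`Invalid matrix size: ${size}`);
  }
}

// --- application: the use case depends on an interface, not an implementation ---
interface BenchmarkRunner {
  readonly backend: Backend;
  // Assumed signature: resolves to the elapsed time in milliseconds.
  run(matrixSize: number): Promise<number>;
}

async function runBenchmark(
  runner: BenchmarkRunner,
  matrixSize: number
): Promise<BenchmarkResult> {
  assertValidMatrixSize(matrixSize);
  const durationMs = await runner.run(matrixSize);
  return { backend: runner.backend, matrixSize, durationMs };
}

// --- infrastructure: a concrete adapter (a stand-in that returns a fixed duration) ---
class FixedDurationRunner implements BenchmarkRunner {
  readonly backend: Backend = "cpu";
  private readonly durationMs: number;

  constructor(durationMs: number) {
    this.durationMs = durationMs;
  }

  async run(_matrixSize: number): Promise<number> {
    return this.durationMs;
  }
}
```

Since `runBenchmark` only sees the interface, a CPU runner, a WebGPU runner, or a test double can be swapped in without touching the use case or the domain types.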

Why WebGPU is not always faster

For small workloads (matrix size ≤ 128), CPU may outperform WebGPU. GPU acceleration has real overhead: device initialization, buffer allocation, data transfer to the GPU, shader compilation, and result synchronization. This overhead amortizes only for large parallel workloads.

At 512×512 matrix multiplication on a modern GPU, WebGPU typically achieves a meaningful speedup. At 64×64, CPU often wins. GPU Local Lab shows this clearly, including a warning message when CPU wins.
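
A minimal sketch of the CPU side of such a comparison follows. `multiplyCpu`, `timeCpuMs`, and `compareBackends` are hypothetical names, and the warning text is illustrative rather than the app's actual message.

```typescript
// Naive O(n^3) multiplication of two row-major n×n matrices.
function multiplyCpu(a: Float32Array, b: Float32Array, n: number): Float32Array {
  const out = new Float32Array(n * n);
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < n; k++) {
      const aik = a[i * n + k];
      for (let j = 0; j < n; j++) {
        out[i * n + j] += aik * b[k * n + j];
      }
    }
  }
  return out;
}

// Time one multiplication of constant-filled n×n matrices.
function timeCpuMs(n: number): number {
  const a = new Float32Array(n * n).fill(1);
  const b = new Float32Array(n * n).fill(2);
  const start = performance.now();
  multiplyCpu(a, b, n);
  return performance.now() - start;
}

// Speedup > 1 means WebGPU was faster; < 1 means the CPU won.
function compareBackends(
  cpuMs: number,
  gpuMs: number
): { speedup: number; warning?: string } {
  const speedup = cpuMs / gpuMs;
  return speedup < 1
    ? { speedup, warning: "CPU was faster at this size; GPU overhead dominated." }
    : { speedup };
}
```

To be a fair comparison, the GPU timing passed to `compareBackends` would need to include buffer upload and result readback, since those are part of the overhead described above.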


Local Development

npm install
npm run dev

Open http://localhost:3000.

Commands

| Command | Description |
| --- | --- |
| npm run dev | Start development server |
| npm run build | Production build |
| npm run test | Run unit tests (Vitest) |
| npm run test:watch | Watch mode |
| npm run typecheck | TypeScript strict check |
| npm run test:e2e | Playwright E2E tests |

Deployment on Vercel

  1. Push to GitHub
  2. Import the repository in vercel.com
  3. No environment variables required for the MVP
  4. WebGPU requires HTTPS — Vercel provides this by default

Browser Support

Browser WebGPU Image AI Semantic Search
Chrome 113+
Edge 113+
Safari 18+ Partial
Firefox ✗ (flag only)

The app degrades gracefully on unsupported browsers — WebGPU features show clear "unsupported" states, and the CPU-only benchmark path always works.
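
One way to implement this kind of graceful degradation is to reduce detection to a single status value that the UI can branch on. The sketch below is an assumption about how that might look: `GpuLike`, `AdapterLike`, `WebGpuStatus`, and `detectWebGpu` are hypothetical names, and the structural types exist only so the example compiles without `@webgpu/types`.

```typescript
// Minimal structural stand-ins for the parts of the WebGPU API used here.
interface AdapterLike {
  features: { forEach(cb: (name: string) => void): void };
  requestDevice(): Promise<unknown>;
}

interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

type WebGpuStatus =
  | { supported: true; features: string[] }
  | { supported: false; reason: string };

// Never throws: every failure path resolves to an "unsupported" status.
async function detectWebGpu(gpu: GpuLike | undefined): Promise<WebGpuStatus> {
  if (!gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  try {
    const adapter = await gpu.requestAdapter();
    if (!adapter) {
      return { supported: false, reason: "No suitable GPU adapter found" };
    }
    // Requesting a device confirms the adapter is actually usable.
    await adapter.requestDevice();
    const features: string[] = [];
    adapter.features.forEach((name) => features.push(name));
    return { supported: true, features };
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    return { supported: false, reason };
  }
}
```

In a browser the argument would be `navigator.gpu`, which is `undefined` where WebGPU is not exposed. Taking it as a parameter keeps the function testable outside a browser.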


Estimates vs Measured Results

GPU Local Lab distinguishes between estimated and measured performance throughout the UI.

Estimated results are based on browser signals, model metadata, and previous local benchmarks. They are heuristic — they do not require downloading or running a model.

Measured results come from workloads executed directly in the browser (e.g., the model load profiler, the GPU benchmark).

Do not treat estimates as guarantees. Browser APIs expose limited hardware information, and real performance can vary because of model architecture, quantization, thermal throttling, browser backend implementation, and active system load. Every estimate in the UI is clearly labeled Estimated.


Privacy Model

| What | Where |
| --- | --- |
| Uploaded images | Stay in your browser — never uploaded |
| Text for search | Stays in your browser — never uploaded |
| Benchmark results | Stored in your browser's IndexedDB |
| Readiness analysis | Computed locally from browser APIs — never sent anywhere |
| AI model files | Downloaded from Hugging Face CDN on first use, then cached locally |
| Any server inference | None — all inference is local |

You can clear all locally stored data from the Lab → Local Storage panel.

No model is ever downloaded without an explicit user action (clicking Profile or a similar button).


Known Limitations

  • CPU benchmark at large sizes blocks the main thread. 512×512 CPU matrix multiply is synchronous and may cause brief UI unresponsiveness. A Web Worker path is scaffolded but not yet wired into the UI.
  • AI model downloads are not bundled. Model files are downloaded from Hugging Face CDN on first use.
  • Semantic search uses TF-vector fallback. Full embedding-based search via WebGPU is not yet stable across browsers; the current implementation uses a local term-frequency vector with cosine similarity.
  • PNG export may fail in some browser environments (cross-origin restrictions). The Markdown and JSON export paths are always available.
  • IndexedDB is unavailable in some private/incognito contexts. History storage failures are caught silently and do not break benchmark execution.
  • Readiness estimates are heuristic. Device memory detection (navigator.deviceMemory) is not available in all browsers. When unavailable, the analyzer assumes 4 GB.
  • Token throughput estimates are rough heuristics. They are based on model size, quantization, and backend — not on measured model execution.
  • Model load profiler only supports one model (all-MiniLM-L6-v2). Other catalog models show "Profiler not available for this model yet."
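
The 4 GB device-memory fallback mentioned above can be sketched as follows. `resolveDeviceMemoryGb` and the `assumed` flag are hypothetical; the point is that a result can record whether the value was reported by the browser or filled in by default.

```typescript
// Fallback from the limitation above; the constant name is illustrative.
const ASSUMED_DEVICE_MEMORY_GB = 4;

// Accepts any navigator-like object so it can be exercised outside a browser.
function resolveDeviceMemoryGb(nav: { deviceMemory?: number }): {
  gb: number;
  assumed: boolean;
} {
  const reported = nav.deviceMemory;
  if (typeof reported === "number" && reported > 0) {
    // Browsers that expose this value round it and cap it, so treat it as a floor.
    return { gb: reported, assumed: false };
  }
  return { gb: ASSUMED_DEVICE_MEMORY_GB, assumed: true };
}
```

Carrying the `assumed` flag through to the report lets the UI mark readiness results derived from the fallback as lower confidence.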

Screenshots

Coming soon


Roadmap

  1. Public benchmark leaderboard (Neon Postgres, anonymous submissions, explicit consent)
  2. GitHub OAuth for saving experiments across devices
  3. Full WebGPU embedding model for semantic search (once browser support stabilizes)
  4. WebNN fallback
  5. WASM fallback for CPU path
  6. Shareable benchmark result URLs (URL-encoded, no backend)
  7. Drag-and-drop PDF/text file ingestion for semantic search
  8. Visual WGSL shader editor
  9. Advanced GPU profiling panel
  10. PWA support with offline shell
