
johnmikel/data-copilot


Data Copilot

A real language model that reads your dataset and answers questions — running entirely in the browser tab via WebGPU, with no API key and no server.

▶ Live demo: https://www.johnmikelregida.com/labs/copilot

Runs entirely in your browser. No API key, no backend, no telemetry — your data never leaves the device.

What it is

Paste a small dataset (CSV or a plain list), then ask a question or request a data-quality read. A quantised open-weights LLM is downloaded once, compiled to your GPU, and runs inference locally; the answer streams back token-by-token without a single network request leaving the page. This is interesting to anyone who deals with data that cannot leave the device — regulated, on-prem, air-gapped, or simply privacy-sensitive workflows — and to engineers tracking the browser's emergence as a genuine inference runtime.

It is deliberately scoped: a ~0.5B-parameter model is small. It is good for structural and data-quality observations over a few rows, not for heavy reasoning over large tables. The honesty matters more than the demo.

How it works

The whole app is a single self-contained index.html — no build step, no framework, no bundler.

  • Inference engine: WebLLM (@mlc-ai/web-llm), loaded as an ES module straight from https://esm.run/@mlc-ai/web-llm. WebLLM uses the MLC/Apache TVM runtime to compile the model to WebGPU compute shaders and run it locally.
  • Model: Qwen2.5-0.5B-Instruct-q4f16_1-MLC — Qwen2.5 0.5B Instruct, 4-bit weight quantisation with fp16 activations (q4f16_1). The download is ~0.5 GB and is cached by the browser after the first load.
  • Gated load: the model is not fetched on page open. WebGPU is feature-detected with navigator.gpu; if it's missing, the load button is disabled and the page shows a clear fallback message plus a worked example so it stays useful without WebGPU. The ~0.5 GB download is only triggered by an explicit Load model button, with a live progress bar wired to WebLLM's initProgressCallback.
  • Prompt construction: a fixed system prompt frames the model as a precise data analyst that must cite real values and never invent data. The user's pasted dataset is truncated to the first 2000 characters and concatenated with the question.
  • Generation: engine.chat.completions.create({ ..., temperature: 0.3, stream: true }), an OpenAI-shaped streaming API. Output is accumulated and HTML-escaped before rendering, so neither the model's output nor any pasted data it echoes back can inject markup.

page load → navigator.gpu check → (button) import WebLLM via esm.run
          → CreateMLCEngine("Qwen2.5-0.5B-Instruct-q4f16_1-MLC")  [~0.5 GB, cached]
          → system prompt + DATA (≤2000 chars) + QUESTION
          → streamed completion (temp 0.3) → escaped, rendered token-by-token
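The flow above can be sketched as a handful of plain functions. This is a minimal sketch under stated assumptions, not the code in index.html: the helper names (`hasWebGPU`, `progressPercent`, `buildMessages`, `escapeHtml`, `streamAnswer`), the exact system-prompt wording, and the `DATA:`/`QUESTION:` message layout are hypothetical. The engine is passed in as a parameter, so anything exposing WebLLM's OpenAI-shaped `engine.chat.completions.create` can drive it, including a test stub.

```javascript
// Sketch of: gpu check → progress → prompt construction → streamed, escaped output.
// ASSUMPTIONS: helper names, system-prompt wording and the DATA/QUESTION layout
// are hypothetical; only the 2000-char limit and temperature 0.3 come from the README.

const MAX_DATA_CHARS = 2000; // truncation limit (JS string slice counts UTF-16 code units)

// Gate: only offer the "Load model" button when WebGPU is exposed.
function hasWebGPU(nav) {
  return Boolean(nav && nav.gpu);
}

// Map a WebLLM init progress report (progress in 0..1) to a whole percentage.
function progressPercent(report) {
  const p = Math.min(Math.max(report.progress ?? 0, 0), 1);
  return Math.round(p * 100);
}

// Hypothetical wording; the README only says the model is framed as a precise
// data analyst that must cite real values and never invent data.
const SYSTEM_PROMPT =
  "You are a precise data analyst. Cite real values from the data. Never invent data.";

// Truncate the pasted dataset and concatenate it with the question.
function buildMessages(data, question) {
  const clipped = data.slice(0, MAX_DATA_CHARS);
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: `DATA:\n${clipped}\n\nQUESTION:\n${question}` },
  ];
}

// Escape text before it is assigned to innerHTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Stream a completion, passing the escaped text-so-far to onUpdate after each chunk.
async function streamAnswer(engine, messages, onUpdate) {
  const chunks = await engine.chat.completions.create({
    messages,
    temperature: 0.3,
    stream: true,
  });
  let text = "";
  for await (const chunk of chunks) {
    // Some chunks (e.g. a final usage chunk) may carry no choices or no content.
    text += chunk.choices[0]?.delta?.content ?? "";
    onUpdate(escapeHtml(text));
  }
  return text; // raw, unescaped answer
}
```

In the page, the engine would come from WebLLM after the button click, for example `const { CreateMLCEngine } = await import("https://esm.run/@mlc-ai/web-llm")` followed by `await CreateMLCEngine("Qwen2.5-0.5B-Instruct-q4f16_1-MLC", { initProgressCallback: (report) => { /* set the bar to progressPercent(report) */ } })`. Because the escape works character by character, escaping each chunk or the accumulated string gives the same result; assigning to `textContent` instead of `innerHTML` would avoid the need for an escape function altogether.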

A note on provenance: the default dataset shipped in the textarea is a small synthetic stops table (ATCO-style codes, deliberately seeded with empty coordinates, stale modified dates, an inactive-but-present row, and a "Bank of Engalnd" misspelling) used to demonstrate the data-quality read. It is illustrative example data, not a real export.
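To make the seeded defects concrete, here is a hypothetical rule-based cross-check for three of them: empty coordinates, stale modified dates, and inactive-but-present rows. It is not part of the app, which leaves the data-quality read to the model. The column names (`AtcoCode`, `Latitude`, `Longitude`, `Modified`, `Status`), the one-year staleness threshold, and the naive comma-split CSV parsing are all assumptions; the shipped table may use different headers.

```javascript
// Hypothetical cross-check for the demo table's seeded defects.
// ASSUMPTIONS: column names, the 365-day threshold and "active" as the live status.
const STALE_DAYS = 365;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Naive CSV parser: splits on commas, so quoted fields containing commas are unsupported.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const columns = header.split(",").map((c) => c.trim());
  return lines.map((line) => {
    const cells = line.split(",");
    // Missing trailing cells become empty strings.
    return Object.fromEntries(columns.map((c, i) => [c, (cells[i] ?? "").trim()]));
  });
}

// Returns one { id, issue } entry per defect found, in row order.
function findIssues(rows, today = new Date()) {
  const issues = [];
  rows.forEach((row, i) => {
    const id = row.AtcoCode || `row ${i + 1}`;
    // Empty or missing coordinate cells.
    if (!row.Latitude || !row.Longitude) {
      issues.push({ id, issue: "empty coordinates" });
    }
    const modified = new Date(row.Modified);
    // Unparseable dates are skipped rather than flagged.
    if (!Number.isNaN(modified.getTime()) && (today - modified) / MS_PER_DAY > STALE_DAYS) {
      issues.push({ id, issue: "stale modified date" });
    }
    if (row.Status && row.Status.toLowerCase() !== "active") {
      issues.push({ id, issue: "inactive but present" });
    }
  });
  return issues;
}
```

A deterministic pass like this gives a baseline to compare the model's answer against. It does not catch the "Bank of Engalnd" misspelling, which is the kind of fuzzy observation the model is there to make.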

Why it matters

On-device inference removes the two hardest objections to putting an LLM near sensitive data: the data never crosses a trust boundary, and there is no per-call API cost or rate limit — the compute is the user's own GPU. The same pattern generalises to data-platform work: a data-quality "linter" that runs in the analyst's browser, contract validation that never ships rows to a vendor, or an agent step that can operate offline and air-gapped. As WebGPU support matures, "the model runs where the data already is" becomes a real architectural option rather than a compliance compromise.

Run it locally

The app is a static page that fetches ES modules and WebGPU/WASM assets from CDNs, so it must be served over HTTP; file:// will not work.

cd data-copilot
python3 -m http.server 8000
# then open http://localhost:8000/index.html in a WebGPU-capable browser

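There is no build step, no npm install, and no data pipeline: index.html is the entire application. To run the model live you need a desktop browser with WebGPU enabled (recent Chrome, Edge, or Safari) and enough VRAM for the quantised 0.5B model. Serve the page from localhost (as above) or over HTTPS: WebGPU is only exposed in secure contexts, so if you open it over plain HTTP via another machine's address, navigator.gpu is undefined and you get the fallback. Without WebGPU the page still loads and shows the worked example.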

Tech

  • WebLLM (@mlc-ai/web-llm) over the MLC / Apache TVM runtime
  • WebGPU for on-device compute
  • Qwen2.5-0.5B-Instruct, q4f16_1 quantisation
  • Vanilla JS ES modules via esm.run (no framework, no bundler)
  • Single self-contained index.html

Built by John Mikel Regida — Lead Data Architect (Thoughtworks; UK Dept for Transport / NaPTAN; ex-CTO; 5× Google Cloud Professional). GitHub: github.com/johnmikel. Site: https://www.johnmikelregida.com

Part of the JMR Labs suite — https://www.johnmikelregida.com/labs

About

A real LLM running entirely in your browser via WebGPU (WebLLM, Qwen2.5-0.5B) — paste a dataset and ask questions or get a data-quality read. No API key, nothing leaves the device.
