Data Copilot

A real language model that reads your dataset and answers questions — running entirely in the browser tab via WebGPU, with no API key and no server.

▶ Live demo: https://www.johnmikelregida.com/labs/copilot

Runs entirely in your browser. No API key, no backend, no telemetry — your data never leaves the device.

What it is

Paste a small dataset (CSV or a plain list), then ask a question or request a data-quality read. A quantised open-weights LLM is downloaded once, compiled to your GPU, and runs inference locally; once the model is loaded, the answer streams back token-by-token without a single network request leaving the page. This is interesting to anyone who deals with data that cannot leave the device (regulated, on-prem, air-gapped, or simply privacy-sensitive workflows) and to engineers tracking the browser's emergence as a genuine inference runtime.

It is deliberately scoped. A ~0.5B-parameter model is small: it is good for structural and data-quality observations over a few rows, not for heavy reasoning over large tables. The honesty matters more than the demo.

How it works

The whole app is a single self-contained index.html — no build step, no framework, no bundler.

Inference engine: WebLLM (@mlc-ai/web-llm), loaded as an ES module straight from https://esm.run/@mlc-ai/web-llm. WebLLM uses the MLC/Apache TVM runtime to compile the model to WebGPU compute shaders and run it locally.
Model: Qwen2.5-0.5B-Instruct-q4f16_1-MLC — Qwen2.5 0.5B Instruct, 4-bit weight quantisation with fp16 activations (q4f16_1). The download is ~0.5 GB and is cached by the browser after the first load.
Gated load: the model is not fetched on page open. WebGPU is feature-detected with navigator.gpu; if it's missing, the load button is disabled and the page shows a clear fallback message plus a worked example so it stays useful without WebGPU. The ~0.5 GB download is only triggered by an explicit Load model button, with a live progress bar wired to WebLLM's initProgressCallback.
Prompt construction: a fixed system prompt frames the model as a precise data analyst that must cite real values and never invent data. The user's pasted dataset is truncated to the first 2000 characters and concatenated with the question.
Generation: engine.chat.completions.create({ ..., temperature: 0.3, stream: true }) — an OpenAI-shaped streaming API. Output is accumulated and HTML-escaped before rendering, so pasted data can't inject markup.
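
The prompt-construction and escaping steps above can be sketched as two small pure functions. This is a minimal illustration, not the code in index.html: the names buildMessages, escapeHtml, MAX_DATA_CHARS and SYSTEM_PROMPT are hypothetical, the system prompt wording is a placeholder (the real prompt is not quoted here), and the DATA/QUESTION labels are an assumed layout. Only the 2000-character cap comes from the description above.

```javascript
// Hypothetical names throughout; only the 2000-character cap is taken from the text.
const MAX_DATA_CHARS = 2000;

// Placeholder wording: the real system prompt is not quoted in this README.
const SYSTEM_PROMPT =
  "You are a precise data analyst. Cite real values from the data and never invent data.";

// Build an OpenAI-shaped message list: the fixed system prompt, then the
// truncated dataset concatenated with the question in a single user message.
function buildMessages(dataset, question) {
  // slice() counts UTF-16 code units, so this keeps at most the first 2000 of them.
  const data = String(dataset).slice(0, MAX_DATA_CHARS);
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: `DATA:\n${data}\n\nQUESTION:\n${String(question).trim()}` },
  ];
}

// Escape the five HTML-significant characters so that text echoed from the
// pasted data is rendered as text rather than interpreted as markup.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}
```

Truncating before building the message keeps the prompt size bounded regardless of what is pasted, and escaping the accumulated output (rather than trusting the model) means a dataset cell containing a script tag is displayed literally.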

page load → navigator.gpu check → (button) import WebLLM via esm.run
          → CreateMLCEngine("Qwen2.5-0.5B-Instruct-q4f16_1-MLC")  [~0.5 GB, cached]
          → system prompt + DATA (≤2000 chars) + QUESTION
          → streamed completion (temp 0.3) → escaped, rendered token-by-token
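
The flow above might look roughly like this in code. It is a sketch under assumptions, not a copy of index.html: hasWebGPU, loadEngine, streamAnswer, escapeText and the onProgress/render callbacks are hypothetical names. CreateMLCEngine, initProgressCallback and engine.chat.completions.create are the WebLLM calls named in this README; the progress report is assumed to carry a 0..1 progress fraction and a text status string.

```javascript
const MODEL_ID = "Qwen2.5-0.5B-Instruct-q4f16_1-MLC";

// Feature detection: navigator.gpu is only defined where WebGPU is available.
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && nav.gpu);
}

// Intended to be called from an explicit button click, so nothing is
// downloaded on page open. The dynamic import runs only when this is called.
async function loadEngine(onProgress) {
  const { CreateMLCEngine } = await import("https://esm.run/@mlc-ai/web-llm");
  return CreateMLCEngine(MODEL_ID, {
    // Assumption: report.progress is a 0..1 fraction, report.text a status line.
    initProgressCallback: (report) => onProgress(report.progress, report.text),
  });
}

// Minimal HTML escaping so streamed text cannot inject markup.
function escapeText(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Stream a completion, accumulate the tokens, and pass the escaped text so
// far to a render callback after every chunk. Returns the raw final answer.
async function streamAnswer(engine, messages, render) {
  const stream = await engine.chat.completions.create({
    messages,
    temperature: 0.3,
    stream: true,
  });
  let answer = "";
  for await (const chunk of stream) {
    // Some chunks carry no content (for example the final one), hence the fallback.
    answer += chunk.choices[0]?.delta?.content ?? "";
    render(escapeText(answer));
  }
  return answer;
}
```

In a page, the load button's click handler would check hasWebGPU() first, call loadEngine with a callback that updates the progress bar, and then pass the returned engine to streamAnswer with a render callback that sets the output element's innerHTML. Taking the engine as a parameter keeps streamAnswer easy to exercise with a stub in place of the real model.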

A note on provenance: the default dataset shipped in the textarea is a small synthetic stops table (ATCO-style codes, deliberately seeded with empty coordinates, stale modified dates, an inactive-but-present row, and a "Bank of Engalnd" misspelling) used to demonstrate the data-quality read. It is illustrative example data, not a real export.

Why it matters

On-device inference removes the two hardest objections to putting an LLM near sensitive data: the data never crosses a trust boundary, and there is no per-call API cost or rate limit — the compute is the user's own GPU. The same pattern generalises to data-platform work: a data-quality "linter" that runs in the analyst's browser, contract validation that never ships rows to a vendor, or an agent step that can operate offline and air-gapped. As WebGPU support matures, "the model runs where the data already is" becomes a real architectural option rather than a compliance compromise.

Run it locally

The app is a static page that fetches ES modules and WebGPU/WASM assets from CDNs, so it must be served over HTTP; opening it from file:// will not work.

cd data-copilot
python3 -m http.server 8000
# then open http://localhost:8000/index.html in a WebGPU-capable browser

There is no build step, no npm install, and no data pipeline — index.html is the entire application. To run the model live you need a desktop browser with WebGPU enabled (recent Chrome, Edge, or Safari) and enough VRAM for the quantised 0.5B model. Without WebGPU the page still loads and shows the worked example.

Tech

WebLLM (@mlc-ai/web-llm) over the MLC / Apache TVM runtime
WebGPU for on-device compute
Qwen2.5-0.5B-Instruct, q4f16_1 quantisation
Vanilla JS ES modules via esm.run (no framework, no bundler)
Single self-contained index.html

Built by John Mikel Regida — Lead Data Architect (Thoughtworks; UK Dept for Transport / NaPTAN; ex-CTO; 5× Google Cloud Professional). GitHub: github.com/johnmikel. Site: https://www.johnmikelregida.com

Part of the JMR Labs suite — https://www.johnmikelregida.com/labs
