A high-performance, private, and zero-cost RAG (Retrieval-Augmented Generation) engine that runs entirely in your browser.
- Zero Cost: No server-side GPUs, no API keys, and no subscription fees. Everything runs on the client machine.
- Orama Indexing: Ultra-fast, browser-native vector search powered by Orama. Handles thousands of documents with sub-millisecond search times.
- WebLLM: First-class support for on-device LLMs via WebLLM and WebGPU. Stream answers directly from local models like Qwen2.5.
- Framework: Angular 19
- Vector Search: Orama (Sharded & lazy-loaded)
- Embeddings: Transformers.js (all-MiniLM-L6-v2)
- Inference: WebLLM (Qwen2.5-0.5B-Instruct)
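
To make the embedding and vector-search entries above concrete, here is a minimal sketch of what a vector search does conceptually: it ranks stored embeddings by cosine similarity to a query embedding. This is a brute-force illustration, not Orama's implementation. The `IndexedChunk` type and the `topK` function are hypothetical names, and the 3-dimensional vectors stand in for the 384-dimensional vectors that all-MiniLM-L6-v2 produces.

```typescript
// Hypothetical record shape: one embedded chunk of a source document.
interface IndexedChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors.
// Returns 0 when either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Score every chunk against the query and keep the k best matches.
function topK(
  query: number[],
  chunks: IndexedChunk[],
  k: number
): Array<{ chunk: IndexedChunk; score: number }> {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k));
}

// Toy data: real embeddings would come from the embedding model.
const sampleChunks: IndexedChunk[] = [
  { id: "a", text: "Install dependencies", embedding: [1, 0, 0] },
  { id: "b", text: "Build the search index", embedding: [0, 1, 0] },
  { id: "c", text: "Run locally", embedding: [0.9, 0.1, 0] },
];

console.log(topK([1, 0, 0], sampleChunks, 2).map((r) => r.chunk.id));
```

A linear scan like this costs time proportional to the number of chunks per query, which is one reason to delegate to a dedicated search library once the corpus grows.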
- Install Dependencies

      npm install
      cd indexer && npm install && cd ..

- Build Search Index

  Place your .md files in content/, then run:

      cd indexer && node build-index.mjs

- Run Locally

      npm start

Open http://localhost:4200 in your browser. Note: A WebGPU-enabled browser (like Chrome 113+) is required for on-device LLM inference.
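
Because on-device inference depends on WebGPU, an app can check for it before trying to load a model. The following is a minimal sketch of such a check, not this project's actual code: `hasWebGpuApi` and `canRunLocalInference` are hypothetical names, and the small interfaces are local stand-ins for the browser's `navigator.gpu` types so the snippet compiles without WebGPU type definitions.

```typescript
// Structural stand-ins for the parts of the WebGPU API this sketch touches.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Synchronous check: does the environment expose the WebGPU entry point at all?
function hasWebGpuApi(nav: NavigatorLike | undefined): boolean {
  const gpu = nav?.gpu;
  return gpu !== undefined && typeof gpu.requestAdapter === "function";
}

// Asynchronous check: the API can exist while no usable adapter is available,
// in which case requestAdapter() resolves to null.
async function canRunLocalInference(nav: NavigatorLike | undefined): Promise<boolean> {
  const gpu = nav?.gpu;
  if (gpu === undefined || typeof gpu.requestAdapter !== "function") {
    return false;
  }
  try {
    const adapter = await gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}

// In a browser, this could be called as:
//   canRunLocalInference(navigator as NavigatorLike).then((ok) => { ... });
```

When the check fails, one reasonable design is to keep search available and disable only the answer-generation step, since the requirement above applies to LLM inference.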
To deploy to GitHub Pages:

    npm run deploy

This will build the project with the correct base-href (/Peach/) and push the dist/slm/browser folder to the gh-pages branch.
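
The base-href matters because the browser resolves relative asset URLs against it. The sketch below illustrates the effect with the standard `URL` constructor; the host `example.github.io` and the file name `index/shard-0.json` are hypothetical and used only for illustration.

```typescript
// Resolve an asset path against a base URL, the same way a browser resolves
// relative URLs against the document's <base href>.
function resolveAsset(path: string, baseUrl: string): string {
  return new URL(path, baseUrl).href;
}

// Hypothetical GitHub Pages origin combined with the /Peach/ base-href.
const pagesBase = "https://example.github.io/Peach/";

// A relative path stays inside the project folder.
console.log(resolveAsset("index/shard-0.json", pagesBase));
// https://example.github.io/Peach/index/shard-0.json

// A root-relative path escapes it and would miss the deployed files.
console.log(resolveAsset("/index/shard-0.json", pagesBase));
// https://example.github.io/index/shard-0.json
```

The trailing slash in the base is significant: without it, the last path segment is treated as a file name and dropped during resolution.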
- Local Inference: Your questions and the LLM's answers are processed entirely on your machine via WebGPU.
- No Tracking: No user data, queries, or chat history are ever sent to a server.
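
As an illustration of the local inference path, here is a hedged sketch of streaming an answer from an on-device engine. It assumes the OpenAI-style streaming shape that WebLLM documents (`chat.completions.create` with `stream: true`, text arriving in `choices[0].delta.content`). The interfaces are declared locally so the snippet compiles without the `@mlc-ai/web-llm` package, and `deltaText`, `streamAnswer`, and the prompt wording are hypothetical, not taken from this project.

```typescript
// Local stand-ins for the OpenAI-style shapes assumed by this sketch.
interface ChatChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}
interface ChatEngineLike {
  chat: {
    completions: {
      create(request: {
        messages: Array<{ role: "system" | "user" | "assistant"; content: string }>;
        stream: true;
      }): Promise<AsyncIterable<ChatChunkLike>>;
    };
  };
}

// Extract the text fragment carried by one streamed chunk ("" if none).
function deltaText(chunk: ChatChunkLike): string {
  return chunk.choices[0]?.delta?.content ?? "";
}

// Yield answer fragments as the local model produces them.
// `context` is the retrieved text; `question` is the user's query.
async function* streamAnswer(
  engine: ChatEngineLike,
  context: string,
  question: string
): AsyncGenerator<string> {
  const stream = await engine.chat.completions.create({
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
    stream: true,
  });
  for await (const chunk of stream) {
    const text = deltaText(chunk);
    if (text) yield text;
  }
}
```

Nothing in this flow makes a network request for the question or the answer: the engine object runs in the page, so the only data that would leave the machine is whatever the app itself chooses to send.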
License: MIT