A fully local document chat application. Upload PDFs or text files and ask questions about them — powered by RAG (Retrieval-Augmented Generation) with local AI models via Ollama. No API keys, no cloud, no data leaves your machine.
- Local-First — all processing runs via Ollama; your documents never leave your machine
- PDF & TXT Support — upload and parse documents up to 10 MB
- Semantic Search — documents are chunked, embedded, and stored in LanceDB for vector similarity search
- Streaming Responses — answers stream token-by-token via Server-Sent Events
- Source Citations — every answer shows the retrieved chunks and their source locations
- Document Summarization — generate a full summary of any uploaded document
- Document Management — upload, select, and delete documents from the sidebar
- Chat History — conversations are persisted in browser localStorage
- Configurable LLM Params — tune temperature, context window, top-p, and top-k
- Markdown Rendering — responses render with full Markdown and code highlighting
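
The tunable parameters listed above correspond to entries in Ollama's request `options`. As a minimal sketch, assuming the backend forwards them to Ollama's `/api/generate` endpoint on its default port, a request body could be built like this. The helper name `build_generate_payload` and the default values are hypothetical and not taken from this project.

```python
import json

# Ollama's default local endpoint (assumes a stock Ollama install).
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_payload(prompt, temperature=0.7, num_ctx=4096, top_p=0.9, top_k=40):
    """Build a JSON-serializable body for Ollama's generate endpoint.

    The default values are illustrative assumptions, not this project's settings.
    """
    return {
        "model": "phi3:mini",
        "prompt": prompt,
        "stream": True,  # ask Ollama to return the answer incrementally
        "options": {
            "temperature": temperature,  # sampling randomness
            "num_ctx": num_ctx,          # context window size in tokens
            "top_p": top_p,              # nucleus sampling cutoff
            "top_k": top_k,              # sample only from the k most likely tokens
        },
    }


if __name__ == "__main__":
    payload = build_generate_payload("Summarize the document.", temperature=0.2)
    print(json.dumps(payload, indent=2))
```

Lower `temperature` values tend to give more deterministic answers, which usually suits document question answering; a larger `num_ctx` leaves more room for retrieved chunks at the cost of memory.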

Backend:

| Layer | Technology |
|---|---|
| API | FastAPI + Uvicorn |
| LLM Runtime | Ollama |
| Language Model | phi3:mini (2.2 GB) |
| Embeddings | nomic-embed-text (274 MB) |
| Vector Store | LanceDB |
| PDF Parsing | pdfplumber |
| HTTP Client | httpx |

Frontend:

| Layer | Technology |
|---|---|
| Framework | React 19 + Vite |
| Language | TypeScript |
| Animations | Framer Motion |
| File Upload | react-dropzone |
| Markdown | react-markdown + react-syntax-highlighter |
| HTTP | Axios |
- Ollama — local AI runtime
- Python 3.10+
- Node.js 18+
ollama pull phi3:mini
ollama pull nomic-embed-text

cd backend
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload

The API will be available at http://localhost:8000.
cd frontend
npm install
npm run dev

Open http://localhost:5173.
- Upload a PDF or TXT file via drag-and-drop
- The backend chunks the document and generates vector embeddings using nomic-embed-text
- Embeddings are stored in LanceDB (a local vector database)
- When you ask a question, the backend performs a semantic search to retrieve the most relevant chunks
- The retrieved chunks are injected into the prompt and sent to phi3:mini via Ollama
- The response streams back to the frontend token-by-token
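
The retrieval steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: in the real app the vectors come from nomic-embed-text and the search runs in LanceDB, whereas here the vectors are passed in as plain lists and ranked by brute-force cosine similarity. The function names, chunk size, overlap, and prompt wording are all hypothetical.

```python
import math


def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, indexed_chunks, k=3):
    """Return the k chunk texts whose vectors are most similar to the query.

    indexed_chunks is a list of (chunk_text, vector) pairs.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Inject the retrieved chunks into a prompt as numbered context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Overlapping chunks reduce the chance that a sentence relevant to the question is cut in half at a chunk boundary, and numbering the chunks in the prompt makes it straightforward to map an answer back to its sources.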
├── backend/
│   ├── main.py            # FastAPI app: all API routes
│   ├── embedder.py        # Text → vector embeddings via Ollama
│   ├── vectorstore.py     # LanceDB wrapper (store & search)
│   ├── ingestor.py        # PDF/TXT → chunks → embed → store pipeline
│   └── requirements.txt
└── frontend/
    └── src/
        ├── App.tsx        # Main React component
        ├── api.ts         # Backend API calls
        └── chatStorage.ts # Chat history (localStorage)
MIT