Novel-RAG is a specialized Retrieval-Augmented Generation system built to navigate the complexities of long-form fiction. While standard RAG often struggles with character pronouns and evolving plot points, this system uses strategic chunking and high-precision retrieval to provide accurate answers about character relationships, past events, and hidden lore.
- 📚 Deep Novel Indexing: Process hundreds of markdown chapters into a persistent local vector database.
- 🤖 Relationship Mapping: Designed specifically to track "who did what to whom" across 500,000+ words.
- ⚡ Hybrid Embeddings: Switch between local `sentence-transformers` for privacy/cost and the Gemini API for maximum accuracy.
- 🛡️ Anti-Hallucination: Restrictive prompting ensures the AI acts as a historian, not a co-author, citing its sources for every claim.
- 🌐 Web Interface: Modern chat UI with streaming responses, session history, and a settings panel.
- 🐳 Docker-Ready: One command to build and run the entire application.
The easiest way to run Novel-RAG is with Docker: everything is containerized.
```bash
# 1. Clone the repository
git clone https://github.com/your-username/novel_rag.git
cd novel_rag

# 2. Create your .env file with your API key
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

# 3. Add your novels (markdown chapters in subfolders of data/)
mkdir -p data/my-novel
# Copy your .md chapter files into data/my-novel/

# 4. Build and run
docker compose up -d

# 5. Open your browser
# http://localhost:8000
```

Prefer to run without Docker? Use a local virtual environment:

```bash
# 1. Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Or `.venv\Scripts\activate` on Windows

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

# 4. Add your novels to data/
mkdir -p data/my-novel
# Copy .md files into data/my-novel/

# 5a. Run the Web UI
uvicorn src.main:app --host 0.0.0.0 --port 8000 --reload
# Open http://localhost:8000

# 5b. Or run the CLI (original terminal mode)
python -m src.cli
```

Choose a novel from the sidebar dropdown. The status indicator shows whether embeddings exist.
If a novel hasn't been processed yet, click Re-Embed. A progress bar shows real-time embedding progress via WebSocket.
Click New Chat to start asking questions. Responses stream in token-by-token β no waiting for the full answer.
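The exact wire format of the streamed messages is documented in the WebSocket Guide, not here, so the sketch below assumes a simple hypothetical JSON protocol with `token` and `done` message types. It shows the client-side idea: accumulate token frames until the stream signals completion.

```python
import json

def accumulate_stream(messages):
    """Rebuild the full answer from a sequence of streamed WebSocket
    frames. Assumes a hypothetical protocol where each frame is
    {"type": "token", "data": "..."} and the stream ends with
    {"type": "done"}. The real protocol is in the WebSocket Guide."""
    parts = []
    for raw in messages:
        msg = json.loads(raw)
        if msg["type"] == "token":
            parts.append(msg["data"])  # append each streamed token
        elif msg["type"] == "done":
            break  # server signalled end of answer
    return "".join(parts)

# Example: three token frames followed by a done frame.
frames = [
    json.dumps({"type": "token", "data": "The heir "}),
    json.dumps({"type": "token", "data": "fled the "}),
    json.dumps({"type": "token", "data": "capital."}),
    json.dumps({"type": "done"}),
]
answer = accumulate_stream(frames)
```

In the real UI this loop runs in JavaScript against the WebSocket connection, appending each token to the chat bubble as it arrives.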
Previous chats appear in the sidebar. Click any session to resume it. Sessions auto-title from your first message.
Click the ⚙️ gear icon to change:
- API Key: your Gemini API key (stored securely, masked in the UI)
- Generation Model: which Gemini model generates answers
- Embedding Mode: Local (HuggingFace, free) or API (Gemini, accurate)
- Model Names: customize embedding model names
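Settings saved here feed the "3-tier priority" resolution handled by `src/core/config.py`. The README doesn't spell out the tiers, so the sketch below assumes the common pattern (UI-saved overrides beat `.env` values, which beat built-in defaults); the function name and setting keys are illustrative, not the project's actual API.

```python
def resolve_setting(key, ui_overrides, env_vars, defaults):
    """Hypothetical 3-tier lookup: UI override > .env > default.
    The actual logic lives in src/core/config.py and may differ."""
    for tier in (ui_overrides, env_vars, defaults):
        if tier.get(key) is not None:
            return tier[key]
    raise KeyError(f"Unknown setting: {key}")

defaults = {"embedding_mode": "local", "generation_model": "gemini-flash"}
env_vars = {"generation_model": "gemini-pro"}   # from .env
ui_overrides = {"embedding_mode": "api"}        # saved via settings panel

mode = resolve_setting("embedding_mode", ui_overrides, env_vars, defaults)
model = resolve_setting("generation_model", ui_overrides, env_vars, defaults)
```

The design choice is the usual one: the most user-specific source wins, while defaults keep the app bootable with no configuration at all.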
Toggle between light and dark mode with the 🌙/☀️ button. Your preference is saved in the browser.
Create a new folder in data/ and add your .md files. The novel appears in the dropdown on next page load.
Drop new .md files into the novel's folder, then click Re-Embed in the UI (or type !reingest in CLI mode).
```bash
# Run the full test suite
python -m pytest tests/ -v

# Run specific test files
python -m pytest tests/test_document_processor.py -v
python -m pytest tests/test_chat_store.py -v
python -m pytest tests/test_api.py -v
python -m pytest tests/test_settings.py -v
```

Tests run automatically on every push/PR via GitHub Actions CI/CD.
This application is split into highly modular parts so you can study exactly how the data flows. For a deep dive, see the dedicated documentation:
| Document | What It Covers |
|---|---|
| Architecture Guide | System diagrams, design decisions, patterns, and rationale |
| WebSocket Guide | Streaming protocol, annotated code, connection lifecycle |
- Chunking (`src/core/document_processor.py`): Chapters are split into ~600-char overlapping chunks using a sliding window. The 150-char overlap prevents pronoun references from being lost between chunks.
- Embeddings (`src/core/embeddings.py`): Text chunks are converted to numerical vectors (arrays of floats). You can switch between free local embeddings (HuggingFace) and paid Gemini API embeddings.
- Vector Database (`src/core/vector_db.py`): ChromaDB stores each chunk's text alongside its embedding vector. When you ask a question, it finds the mathematically nearest chunks.
- Generation (`src/api/routes.py`): The 7 most relevant chunks are injected into a carefully engineered prompt, and Gemini generates an answer citing its sources.
- Streaming: Responses stream token-by-token via WebSocket, so you see the answer being written in real time.
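As a rough illustration of the sliding-window step above, here is a minimal sketch of overlapping chunking with the stated sizes (~600-char chunks, 150-char overlap). It is not the actual `document_processor.py` code, which may handle details like sentence boundaries differently.

```python
def chunk_text(text: str, chunk_size: int = 600, overlap: int = 150) -> list[str]:
    """Split text into overlapping chunks with a sliding window.
    Each chunk starts (chunk_size - overlap) characters after the
    previous one, so consecutive chunks share `overlap` characters."""
    step = chunk_size - overlap  # 450 chars of new text per chunk
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks

chapter = "x" * 1000          # stand-in for a 1000-char chapter
chunks = chunk_text(chapter)  # two chunks: [0:600] and [450:1000]
```

Because consecutive chunks share 150 characters, a pronoun sitting just after a chunk boundary is retrieved together with its antecedent from the end of the previous window.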
```
novel_rag/
├── src/
│   ├── core/                     # Core RAG engine (framework-independent)
│   │   ├── config.py             # Settings manager with 3-tier priority
│   │   ├── document_processor.py # Chapter chunking algorithm
│   │   ├── embeddings.py         # Text-to-vector conversion
│   │   ├── vector_db.py          # ChromaDB wrapper
│   │   └── utils.py              # Logging and timing utilities
│   ├── api/                      # FastAPI web layer
│   │   ├── routes.py             # REST + WebSocket endpoints
│   │   └── chat_store.py         # SQLite session persistence
│   ├── main.py                   # FastAPI app entry point
│   └── cli.py                    # Original terminal interface
├── frontend/                     # Vanilla HTML/CSS/JS chat UI
│   ├── index.html
│   ├── style.css
│   └── app.js
├── tests/                        # Test suite (39 tests)
├── docs/                         # Architecture & WebSocket documentation
├── data/                         # Your novel chapters (gitignored)
├── db/                           # ChromaDB + SQLite data (gitignored)
├── Dockerfile                    # Container build instructions
├── docker-compose.yml            # Service orchestration
├── .github/workflows/ci.yml      # CI/CD pipeline
└── requirements.txt              # Python dependencies
```
- Multi-modal RAG – Support images/illustrations embedded in novel chapters
- Chapter-level filtering – Ask questions scoped to specific chapters
- Export conversations – Download chat history as Markdown
- Authentication – Multi-user support with login
- Novel upload via UI – Drag-and-drop `.md` files
- Semantic search UI – Show retrieved chunks alongside the answer for transparency
- Chunk visualization – Interactive view of how a chapter was split
- Comparison mode – Compare lore across multiple novels side-by-side
This project is licensed under the MIT License.
This repository was built using an AI-assisted "vibe coding" approach, focusing on rapid iteration, intuitive flow, and collaborative generation to bridge the gap between idea and implementation.
