A Python implementation of Retrieval-Augmented Generation (RAG) using LangChain, ChromaDB, and language models served through the Groq API. This project demonstrates how to build a system that retrieves relevant documents and uses them to generate contextual responses.
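At its core, the "retrieve, then generate" flow places the retrieved text into the prompt sent to the language model. The following is a minimal, dependency-free sketch of that step, assuming the chunks arrive already sorted by relevance; the function name `build_rag_prompt`, the `max_chunks` parameter, and the prompt wording are illustrative assumptions, not code from this project.

```python
from typing import Sequence


def build_rag_prompt(question: str, chunks: Sequence[str], max_chunks: int = 4) -> str:
    """Combine the top retrieved chunks and the user question into one prompt.

    Hypothetical helper: assumes `chunks` is already ranked by relevance.
    """
    # Keep only the highest-ranked chunks so the prompt stays within the context window.
    selected = list(chunks)[:max_chunks]
    # Number each chunk so the model (and a reader) can tell the sources apart.
    context = "\n\n".join(f"[{i + 1}] {text}" for i, text in enumerate(selected))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_rag_prompt("What is RAG?", ["RAG pairs retrieval with generation.", "Chroma stores embeddings."]))
```

Truncating to a fixed number of chunks is the simplest way to bound prompt size; a token-based budget would be more precise but needs a tokenizer.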
- Multi-format Document Loading: Support for PDF and text files
- Vector Search: Efficient similarity search using FAISS and ChromaDB
- LLM Integration: Powered by Groq's fast language models
- Embeddings: Sentence-transformers for high-quality document embeddings
- LangChain Integration: Built with LangChain for flexible RAG pipelines
- Jupyter Notebooks: Ready-to-use notebooks for PDF loading and document querying
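To make "similarity search" concrete, here is a small brute-force sketch that ranks stored vectors by cosine similarity to a query vector. It is an illustration only: FAISS and ChromaDB use optimized indexes and may be configured with other distance metrics, and the toy 2-dimensional vectors stand in for real sentence-transformer embeddings. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Exhaustively score every stored vector and return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # toy "document embeddings"
    print(top_k([1.0, 0.1], stored, k=2))
```

Exhaustive scoring is linear in the number of stored vectors, which is why dedicated vector stores exist for larger collections.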
traditional-rag/
├── main.py              # Main application entry point
├── pyproject.toml       # Project metadata and dependencies
├── requirements.txt     # pip dependencies
├── README.md            # This file
├── CONTRIBUTING.md      # Contribution guidelines
├── LICENSE              # MIT License
├── .gitignore           # Git ignore rules
├── .env.example         # Environment variables template
│
├── data/                # Data directory
│   ├── pdf_files/       # PDF documents for processing
│   ├── text_files/      # Text documents
│   │   ├── sample1.txt
│   │   └── sample2.txt
│   └── vector_store/    # ChromaDB vector store
│
└── notebook/            # Jupyter notebooks
    ├── pdf_loader.ipynb # PDF loading examples
    └── document.ipynb   # Document querying examples
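Given the layout above, ingestion starts by finding the source documents. This sketch groups PDF and text files found under `data/` by type and skips the ChromaDB `vector_store/` directory. The helper name `collect_documents` and the extension-to-type mapping are assumptions for illustration; the project's own loaders may discover files differently.

```python
from pathlib import Path
from typing import Dict, List, Union

# Hypothetical mapping from file extension to document type.
SUPPORTED_SUFFIXES = {".pdf": "pdf", ".txt": "text"}


def collect_documents(data_dir: Union[str, Path] = "data") -> Dict[str, List[Path]]:
    """Return PDF and text files under data_dir, grouped by type and sorted by path."""
    root = Path(data_dir)
    found: Dict[str, List[Path]] = {kind: [] for kind in SUPPORTED_SUFFIXES.values()}
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the persisted vector store.
        if not path.is_file() or "vector_store" in path.relative_to(root).parts:
            continue
        kind = SUPPORTED_SUFFIXES.get(path.suffix.lower())
        if kind is not None:
            found[kind].append(path)
    return found


if __name__ == "__main__":
    for kind, paths in collect_documents("data").items():
        print(kind, [p.name for p in paths])
```

Each group could then be handed to the matching loader, for example `PyPDFLoader` for the PDF paths.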
- Python 3.13 or higher
- A valid Groq API key (get one at https://console.groq.com)
- Virtual environment (recommended)
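If you want a script to fail early on an unsupported interpreter, a check like the following works. It is a sketch, assuming the 3.13 minimum from the list above; the function name `python_is_supported` is hypothetical and not part of this project.

```python
import sys
from typing import Tuple

MIN_PYTHON = (3, 13)  # minimum version stated in the prerequisites


def python_is_supported(version_info: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Compare the (major, minor) part of a version tuple against MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {sys.version.split()[0]}")
    print("Python version OK")
```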
# Option 1: using uv
# Clone the repository
git clone https://github.com/yourusername/traditional-rag.git
cd traditional-rag
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

# Option 2: using pip
git clone https://github.com/yourusername/traditional-rag.git
cd traditional-rag
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
- Create the environment file:
  cp .env.example .env
- Add your Groq API key to .env:
  GROQ_API_KEY=your_api_key_here
The .env file is loaded automatically by python-dotenv.
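A minimal sketch of how application code can read the key, assuming it is run from the project root so that `.env` is in the working directory. The helper name `get_groq_api_key` and the check that rejects the template placeholder are assumptions, not taken from `main.py`; the `ImportError` fallback simply lets the sketch run when python-dotenv is not installed.

```python
import os

try:
    # python-dotenv is one of the project's dependencies.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_groq_api_key() -> str:
    """Load .env (if python-dotenv is available) and return GROQ_API_KEY."""
    if load_dotenv is not None:
        # Reads ./.env; by default, variables already set in the environment are not overridden.
        load_dotenv(".env")
    key = os.getenv("GROQ_API_KEY", "").strip()
    # Hypothetical guard: treat the template placeholder as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; copy .env.example to .env and add your key.")
    return key


if __name__ == "__main__":
    print("Key loaded, length:", len(get_groq_api_key()))
```

Failing with a clear message at startup is usually easier to debug than an authentication error from the first API call.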
Run the main application:
python main.py
Or explore the notebooks:
- PDF Loader Notebook: learn how to load and process PDF files
  jupyter notebook notebook/pdf_loader.ipynb
- Document Query Notebook: explore RAG querying with your documents
  jupyter notebook notebook/document.ipynb
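from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter
from langchain_groq import ChatGroq
# Load documents
loader = PyPDFLoader("path/to/document.pdf")
documents = loader.load()
# Split documents into overlapping chunks
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.split_documents(documents)
# Embed the chunks with a sentence-transformers model and index them in Chroma
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(docs, embedding=embeddings, persist_directory="data/vector_store")
# Query with RAG: retrieve relevant chunks, then pass them to the LLM as context
question = "Your question here"
retrieved = vectorstore.similarity_search(question, k=4)
context = "\n\n".join(doc.page_content for doc in retrieved)
# Model IDs available on Groq change over time; check https://console.groq.com for current ones
llm = ChatGroq(model="mixtral-8x7b-32768")
response = llm.invoke(f"Answer using this context:\n{context}\n\nQuestion: {question}")
print(response.content)
- langchain & langchain-community: RAG framework and integrations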
- chromadb: Vector database for storing embeddings
- faiss-cpu: Efficient similarity search
- sentence-transformers: Creating embeddings from text
- groq: Fast language model API
- pymupdf & pypdf: PDF document loading
- python-dotenv: Environment variable management
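The `chunk_size` and `chunk_overlap` parameters in the usage example control how long each chunk is and how much adjacent chunks share. The sketch below illustrates those two parameters with a plain sliding character window. It is deliberately simpler than LangChain's `CharacterTextSplitter`, which first splits on a separator (by default a blank line) and then merges pieces up to `chunk_size`, so its chunk boundaries will differ. The name `split_text` is hypothetical.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Cut text into windows of chunk_size characters; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing chunk is pure overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap keeps a sentence that straddles a boundary partially visible in both neighboring chunks, at the cost of storing some text twice.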
Create a .env file in the root directory:
GROQ_API_KEY=your_groq_api_key_here
# Install in editable mode with dev dependencies
uv pip install -e ".[dev]"
# Run tests
pytest tests/
# Format code
black .
# Lint
pylint src/
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Add more embedding model options
- Implement query result caching
- Add support for more document formats (Word, Excel, etc.)
- Implement document metadata filtering
- Add comprehensive test suite
- Create CLI tool for document ingestion
- Ensure your GROQ_API_KEY is set in .env
- Check that your API key is valid at https://console.groq.com
- Delete the data/vector_store/ directory to reset ChromaDB
- Ensure you have write permissions to the data directory
- Update all dependencies: uv pip install --upgrade -r requirements.txt
- Recreate the virtual environment: run rm -rf .venv, then reinstall
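Resetting ChromaDB amounts to deleting its persisted directory. A cross-platform Python sketch of that step follows, assuming the store lives at `data/vector_store` as in the project layout; the helper name `reset_vector_store` is hypothetical. Deletion is permanent, so the store has to be rebuilt from the source documents afterwards.

```python
import shutil
from pathlib import Path
from typing import Union


def reset_vector_store(store_dir: Union[str, Path] = "data/vector_store") -> bool:
    """Delete the persisted vector store directory; return False if it did not exist."""
    path = Path(store_dir)
    if not path.exists():
        return False
    if not path.is_dir():
        # Refuse to delete a plain file that happens to have this name.
        raise NotADirectoryError(str(path))
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    print("Removed vector store" if reset_vector_store() else "No vector store found")
```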
For issues and questions:
- GitHub Issues
- Discussions
Made with ❤️ for the RAG community