Traditional RAG

A Python implementation of Retrieval-Augmented Generation (RAG) using LangChain, ChromaDB, and language models served through the Groq API. This project demonstrates how to build a system that retrieves relevant documents and generates responses grounded in them.

Features

📄 Multi-format Document Loading: Support for PDF and text files
🔍 Vector Search: Efficient similarity search using FAISS and ChromaDB
🧠 LLM Integration: Powered by Groq's fast language models
📝 Embeddings: Sentence-transformers for high-quality document embeddings
🔗 LangChain Integration: Built with LangChain for flexible RAG pipelines
📊 Jupyter Notebooks: Ready-to-use notebooks for PDF loading and document querying
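
To make the vector search feature concrete, here is a minimal sketch of similarity ranking in plain Python. It is a toy illustration, not how FAISS or ChromaDB work internally (both use optimized index structures), and the function names and the 3-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_id, score) pairs for the k documents most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real embedding models produce
# vectors with hundreds of dimensions.
toy_index = {
    "sample1.txt": [1.0, 0.0, 0.0],
    "sample2.txt": [0.0, 1.0, 0.0],
    "report.pdf": [0.7, 0.7, 0.0],
}
results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
```

A vector store does the same job at scale: it embeds the query, then returns the stored chunks whose embeddings score highest.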

Project Structure

traditional-rag/
├── main.py                      # Main application entry point
├── pyproject.toml              # Project metadata and dependencies
├── requirements.txt            # pip dependencies
├── README.md                   # This file
├── CONTRIBUTING.md             # Contribution guidelines
├── LICENSE                     # MIT License
├── .gitignore                  # Git ignore rules
├── .env.example                # Environment variables template
│
├── data/                       # Data directory
│   ├── pdf_files/             # PDF documents for processing
│   ├── text_files/            # Text documents
│   │   ├── sample1.txt
│   │   └── sample2.txt
│   └── vector_store/          # ChromaDB vector store
│
└── notebook/                   # Jupyter notebooks
    ├── pdf_loader.ipynb       # PDF loading examples
    └── document.ipynb         # Document querying examples
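
Given the layout above, the following sketch shows one way the files under `data/` might be discovered and matched to a loader by extension. The helper name `discover_documents` and the loader labels are assumptions for illustration and are not taken from the project's code.

```python
from pathlib import Path

# Assumed mapping from file extension to a loader label (illustrative only).
LOADER_BY_SUFFIX = {".pdf": "pdf", ".txt": "text"}


def discover_documents(data_dir):
    """Return sorted (relative_path, loader_label) pairs for supported files."""
    root = Path(data_dir)
    found = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if relative.parts[0] == "vector_store":
            continue  # skip the persisted index, which is not source material
        label = LOADER_BY_SUFFIX.get(path.suffix.lower())
        if path.is_file() and label is not None:
            found.append((relative.as_posix(), label))
    return found
```

Calling `discover_documents("data")` from the repository root would list the PDF and text files while ignoring anything stored under `vector_store/`.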

Prerequisites

Python 3.13 or higher
A valid Groq API key (get one at https://console.groq.com)
Virtual environment (recommended)

Installation

Using `uv` (Recommended)

# Clone the repository
git clone https://github.com/yourusername/traditional-rag.git
cd traditional-rag

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

Using `pip`

git clone https://github.com/yourusername/traditional-rag.git
cd traditional-rag

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Configuration

Create environment file:
```
cp .env.example .env
```
Add your Groq API key:
```
GROQ_API_KEY=your_api_key_here
```

python-dotenv reads the .env file when the application calls load_dotenv(), so the key does not need to be exported in your shell.
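
As a sketch of how the key could be read and validated at startup, the snippet below falls back to the shell environment if python-dotenv is not installed. The helper names `load_env_file` and `get_groq_api_key` are hypothetical and may not match what `main.py` actually does.

```python
import os


def load_env_file():
    """Load .env if python-dotenv is installed; otherwise rely on the shell environment."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    return load_dotenv()


def get_groq_api_key(env=None):
    """Return the Groq API key, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = (env.get("GROQ_API_KEY") or "").strip()
    # Treat the placeholder from the template as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "GROQ_API_KEY is not set. Copy .env.example to .env and add your key."
        )
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error raised later by the API client.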

Usage

Running the Main Application

python main.py

Using Jupyter Notebooks

PDF Loader Notebook: Learn how to load and process PDF files
```
jupyter notebook notebook/pdf_loader.ipynb
```
Document Query Notebook: Explore RAG querying with your documents
```
jupyter notebook notebook/document.ipynb
```

Basic RAG Pipeline Example

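from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter
from langchain_groq import ChatGroq

# Load documents
loader = PyPDFLoader("path/to/document.pdf")
documents = loader.load()

# Split documents into overlapping chunks
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.split_documents(documents)

# Embed the chunks (the model name is an example; any sentence-transformers model should work)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create vector store
vectorstore = Chroma.from_documents(docs, embedding=embeddings)

# Retrieve the chunks most similar to the question
question = "Your question here"
retrieved = vectorstore.similarity_search(question, k=4)
context = "\n\n".join(doc.page_content for doc in retrieved)

# Query with RAG: send the retrieved context to the LLM together with the question
# (check the Groq console for the models currently available to your account)
llm = ChatGroq(model="mixtral-8x7b-32768")
response = llm.invoke(f"Answer using only this context:\n\n{context}\n\nQuestion: {question}")
print(response.content)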
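
To show what `chunk_size` and `chunk_overlap` mean in the example above, here is a simplified fixed-window splitter in plain Python. It is a sketch for intuition only: LangChain's `CharacterTextSplitter` splits on a separator first and then merges the pieces, so its output will generally differ. The function name is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


example_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.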

Key Dependencies

langchain & langchain-community: RAG framework and integrations
chromadb: Vector database for storing embeddings
faiss-cpu: Efficient similarity search
sentence-transformers: Creating embeddings from text
groq: Fast language model API
pymupdf & pypdf: PDF document loading
python-dotenv: Environment variable management

Environment Variables

Create a .env file in the root directory:

GROQ_API_KEY=your_groq_api_key_here

Development

Setting up for development:

# Install in editable mode with dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest tests/

# Format code
black .

# Lint
pylint src/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Roadmap

Add more embedding model options
Implement query result caching
Add support for more document formats (Word, Excel, etc.)
Implement document metadata filtering
Add comprehensive test suite
Create CLI tool for document ingestion

Troubleshooting

API Key Issues

Ensure your GROQ_API_KEY is set in .env
Check that your API key is valid at https://console.groq.com

Vector Store Issues

Delete the data/vector_store/ directory to reset ChromaDB
Ensure you have write permissions to the data directory
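
The two steps above can be sketched as a small helper that clears the store and checks that the directory is writable. The function name `reset_vector_store` is hypothetical, and the default path assumes the layout shown in Project Structure.

```python
import os
import shutil
from pathlib import Path


def reset_vector_store(store_dir="data/vector_store"):
    """Delete and recreate the vector store directory.

    Returns True if an existing store was removed, False if there was nothing to delete.
    """
    store = Path(store_dir)
    existed = store.exists()
    if existed:
        shutil.rmtree(store)  # removes the persisted index and all its files
    store.mkdir(parents=True, exist_ok=True)
    if not os.access(store, os.W_OK):
        raise PermissionError(f"No write permission for {store}")
    return existed
```

After a reset, the documents need to be embedded and indexed again before queries return results.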

Dependency Issues

Update all dependencies: uv pip install --upgrade -r requirements.txt
Recreate the virtual environment: delete it with rm -rf .venv, then repeat the installation steps

Support

For issues and questions:

🐛 GitHub Issues
💬 Discussions

Made with ❤️ for the RAG community
