A powerful Retrieval-Augmented Generation (RAG) application that allows you to chat with your PDF documents using state-of-the-art language models. Upload any PDF and ask questions about its content through an intuitive Streamlit web interface.
- PDF Upload & Processing: Load any PDF document through the web interface
- Advanced RAG Pipeline: Uses OpenAI's GPT-OSS models via Hugging Face router
- Interactive Chat Interface: Ask questions and get intelligent answers about your PDF content
- Document Chunking: Efficiently splits large documents into manageable chunks
- Semantic Search: Uses sentence-transformers for accurate content retrieval
- Vector Storage: ChromaDB for persistent embedding storage
- Chat History: Maintains conversation context across multiple questions
- Beautiful UI: Clean and responsive Streamlit interface
- Real-time Processing: Fast document processing and question answering
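The chunking and semantic-search features can be illustrated with a small, dependency-free sketch. This is not the project's implementation: the dependency list suggests that langchain-text-splitters and sentence-transformers do the real work. The names `chunk_text`, `embed`, `cosine`, and `top_k` are hypothetical, and the bag-of-words "embedding" is only a stand-in for a real sentence-embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: word counts (a stand-in for a sentence-embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

In a real pipeline the retrieved chunks would be passed to the language model as context for the question. The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.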
- Python 3.8 or higher (Download Python)
- Git (Download Git)
- Hugging Face API Token (Get one here)
First, verify your Python installation:
python --version
# or
python3 --version
You should see Python 3.8+ (e.g., "Python 3.9.7" or "Python 3.11.2")
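The same check can be made from inside Python, which is handy when several interpreters are installed. The following is a minimal sketch. The helper name `meets_minimum` is hypothetical, and the 3.8 threshold is taken from the prerequisites.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in the prerequisites


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```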
git clone https://github.com/aaarif796/RAGs-PDF-Reader.git
cd RAGs-PDF-Reader
Alternatively, download the repository as a ZIP file:
- Go to GitHub Repository
- Click "Code" → "Download ZIP"
- Extract the ZIP file
- Navigate to the extracted folder
Windows:
# Create virtual environment
python -m venv gen_ai
# Activate virtual environment
gen_ai\Scripts\activate
# Verify activation (you should see (gen_ai) in your prompt)
macOS/Linux:
# Create virtual environment
python3 -m venv gen_ai
# Activate virtual environment
source gen_ai/bin/activate
# Verify activation (you should see (gen_ai) in your prompt)
Conda (alternative):
# Create conda environment
conda create -n gen_ai python=3.9 -y
# Activate environment
conda activate gen_ai
# Make sure your virtual environment is activated
pip install -r requirements.txt
If you encounter issues with requirements.txt, install packages individually:
# Core packages
pip install streamlit==1.28.0
pip install langchain
pip install langchain-community
pip install langchain-chroma
pip install langchain-huggingface
pip install langchain-openai
pip install langchain-text-splitters
# Supporting packages
pip install chromadb
pip install sentence-transformers
pip install torch
pip install pypdf
pip install python-dotenv
pip install openai
Create a file named .env in the project root directory:
HUGGINGFACEHUB_API_TOKEN=your_actual_token_here
To get your Hugging Face token:
- Go to Hugging Face Settings
- Click "New token"
- Give it a name (e.g., "RAG-PDF-Reader")
- Select "Read" permissions
- Copy the generated token
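Once the token exists, the application has to read it at startup. The sketch below shows one way that lookup could work using only the standard library. It is an assumption about the flow, not the project's code: the dependency list includes python-dotenv, whose `load_dotenv()` would normally handle the file parsing. The helpers `parse_env_text` and `get_token` are hypothetical and handle only simple KEY=VALUE lines.

```python
import os

TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"
PLACEHOLDER = "your_actual_token_here"


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_token(env_text="", environ=None):
    """Return the token, preferring the process environment over .env text."""
    environ = os.environ if environ is None else environ
    token = environ.get(TOKEN_VAR) or parse_env_text(env_text).get(TOKEN_VAR)
    if not token or token == PLACEHOLDER:
        raise RuntimeError(f"{TOKEN_VAR} is missing or still set to the placeholder")
    return token
```

Failing early with a clear message when the placeholder value is still in place tends to be easier to debug than an authentication error from the first model call.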
Windows (Command Prompt):
set HUGGINGFACEHUB_API_TOKEN=your_actual_token_here
Windows (PowerShell):
$env:HUGGINGFACEHUB_API_TOKEN="your_actual_token_here"
macOS/Linux:
export HUGGINGFACEHUB_API_TOKEN="your_actual_token_here"
Test if everything is installed correctly:
python -c "import streamlit, langchain, chromadb; print('All packages installed successfully!')"
# Make sure you're in the project directory and virtual environment is activated
streamlit run app.py
Expected Output:
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://192.168.1.xxx:8501
- Open your browser and go to http://localhost:8501
- Click "Initialize Chatbot" (wait for success message)
- Upload a PDF file using the file uploader
- Click "Load PDF" (wait for processing to complete)
- Ask questions about your PDF in the text area
- Click "Ask Question" to get answers
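The order of these steps matters: the chatbot must be initialized before a PDF is loaded, and a PDF must be loaded before questions are asked. The toy model below sketches that ordering and how chat history could accumulate across questions. `ChatSession` and `echo_answer` are hypothetical names, the answer function is a stand-in for the real model call, and clearing history on a new document is an assumption rather than confirmed app behavior.

```python
class ChatSession:
    """Toy model of the UI flow: initialize, load a document, then ask."""

    def __init__(self):
        self.initialized = False
        self.document = None
        self.history = []  # list of (question, answer) pairs

    def initialize(self):
        self.initialized = True

    def load_document(self, text):
        if not self.initialized:
            raise RuntimeError("Initialize the chatbot before loading a PDF")
        self.document = text
        self.history.clear()  # assumption: a new document starts a new conversation

    def ask(self, question, answer_fn):
        if self.document is None:
            raise RuntimeError("Load a PDF before asking questions")
        # Pass a copy of the history so the answer function cannot mutate it.
        answer = answer_fn(question, self.document, list(self.history))
        self.history.append((question, answer))
        return answer


def echo_answer(question, document, history):
    """Stand-in for the real model call; reports what it was given."""
    return f"Q{len(history) + 1}: {question} ({len(document)} chars of context)"
```

A session would call `initialize()`, then `load_document(text)`, then `ask(question, echo_answer)` repeatedly; each call sees the earlier question and answer pairs.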
# Try these alternatives:
python3 --version
py --version
python3.9 --version
# Run as administrator or use:
python -m pip install --user -r requirements.txt
# Deactivate current environment
deactivate
# Remove old environment (Linux/macOS)
rm -rf gen_ai
# Remove old environment (Windows Command Prompt)
rmdir /s gen_ai
# Create new environment
python -m venv gen_ai_new
# Upgrade pip first
pip install --upgrade pip
# Install packages one by one
pip install streamlit
pip install langchain
# ... continue with others
# Install system dependencies (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install build-essential
# For macOS
xcode-select --install
# Alternative: Use SQLite backend
pip install "chromadb[sqlite]"
# Check if token is set (without printing the secret itself)
python -c "import os; print('Token found' if os.environ.get('HUGGINGFACEHUB_API_TOKEN') else 'Token not found')"
# Test token validity (whoami raises an error if the token is rejected)
python -c "import os; from huggingface_hub import HfApi; HfApi().whoami(token=os.environ['HUGGINGFACEHUB_API_TOKEN']); print('Token valid!')"
- Minimum RAM: 8GB
- Recommended RAM: 16GB+
- Storage: 2GB free space (for model downloads)
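The storage requirement can be checked before the first run, when models are downloaded. The sketch below uses only the standard library. The helper names are hypothetical, and the 2 GB figure is taken from the list above. A RAM check is left out because the standard library has no portable way to query it.

```python
import shutil

REQUIRED_FREE_GB = 2  # storage figure from the requirements list


def free_space_gb(path="."):
    """Free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_enough_space(path=".", required_gb=REQUIRED_FREE_GB):
    """Return True if the filesystem has at least `required_gb` GiB free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"Free space: {free_space_gb():.1f} GiB, enough: {has_enough_space()}")
```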
If port 8501 is already in use:
streamlit run app.py --server.port 8502
To get the latest version:
# Navigate to project directory
cd RAGs-PDF-Reader
# Pull latest changes
git pull origin main
# Update dependencies
pip install -r requirements.txt --upgrade
RAGs-PDF-Reader/
├── app.py               # Main Streamlit application
├── pdf_chatbot.py       # Core chatbot class implementation
├── requirements.txt     # Python dependencies
├── .env                 # Environment variables (create this)
├── README.md            # This file
├── chroma_db_temp/      # ChromaDB storage (auto-created)
└── gen_ai/              # Virtual environment (auto-created)
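A quick way to confirm that a checkout matches this layout is to look for the files the app is expected to need. This is a sketch only: `missing_files` is a hypothetical helper, and the file list is copied from the layout above (`.env` can be absent if the token is set as an environment variable instead).

```python
from pathlib import Path

# Files from the layout above that a set-up checkout is expected to contain.
EXPECTED_FILES = ["app.py", "pdf_chatbot.py", "requirements.txt", ".env"]


def missing_files(project_root, expected=EXPECTED_FILES):
    """Return the expected file names that are not present under project_root."""
    root = Path(project_root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("All expected files found" if not missing else f"Missing: {missing}")
```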
Windows:
- Use python instead of python3
- Use backslashes (\) in paths
- Consider using PowerShell instead of Command Prompt
- May need Visual Studio Build Tools for some packages
macOS:
- May need to install Xcode Command Line Tools
- Use python3 explicitly
- Homebrew can help with system dependencies:
brew install python
Linux (Ubuntu/Debian):
- Install system dependencies:
sudo apt-get update
sudo apt-get install python3-pip python3-venv build-essential
- Use python3 and pip3 explicitly
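The platform differences above come down to which interpreter name and which activation command to use. A setup script could choose them as sketched below. The helper names are hypothetical, and `gen_ai` is the environment name used in the setup steps.

```python
import platform


def python_command(system=None):
    """Interpreter name suggested by the platform notes."""
    system = platform.system() if system is None else system
    return "python" if system == "Windows" else "python3"


def activation_command(env_name="gen_ai", system=None):
    """Virtual-environment activation command for the given OS name."""
    system = platform.system() if system is None else system
    if system == "Windows":
        return f"{env_name}\\Scripts\\activate"
    return f"source {env_name}/bin/activate"


if __name__ == "__main__":
    print(python_command(), "-m venv gen_ai")
    print(activation_command())
```

`platform.system()` returns "Windows", "Linux", or "Darwin" (macOS), so everything other than Windows falls through to the POSIX-style commands.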
For consistent environments across all systems:
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py"]
# Build and run
docker build -t rag-pdf-reader .
docker run -p 8501:8501 -e HUGGINGFACEHUB_API_TOKEN=your_token rag-pdf-reader
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the RAG framework
- Streamlit for the web interface
- Hugging Face for model hosting and APIs
- OpenAI for the GPT-OSS models
- ChromaDB for vector storage
Made with ❤️ by aaarif796
If this project helped you, please consider giving it a ⭐!
