DocQuery is a document question-answering system that lets users upload PDF documents and ask questions about them in natural language. It retrieves the most relevant passages using sentence embeddings and a FAISS index, generates an answer with FLAN-T5, and cites the pages the answer was drawn from.
- 📚 PDF Upload & Processing - Upload multiple PDF documents
- 🤖 AI-Powered Answers - Generates answers with FLAN-T5 from the retrieved passages
- 📍 Page Citations - Answers include exact page numbers
- 🌙 Dark Mode - Toggle between light and dark themes
- 📋 Copy Answers - One-click copy to clipboard
- 📥 Export Q&A - Save conversations as text files
- 💡 Question Suggestions - AI-generated follow-up questions
- 📊 Document Comparison - Compare answers across documents
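
As an illustration of the page-citation and export features listed above, here is a minimal sketch of how a conversation could be rendered as plain text. The record layout (`question`, `answer`, `document`, `pages`) and both function names are hypothetical and are not taken from the DocQuery source.

```python
def format_citations(pages):
    """Turn page numbers into a compact, sorted, de-duplicated citation string."""
    unique = sorted(set(pages))
    if not unique:
        return "no page cited"
    label = "p." if len(unique) == 1 else "pp."
    return f"{label} {', '.join(str(p) for p in unique)}"


def export_conversation(history):
    """Render a list of Q&A records as plain text suitable for a .txt file."""
    blocks = []
    for i, item in enumerate(history, start=1):
        blocks.append(
            f"Q{i}: {item['question']}\n"
            f"A{i}: {item['answer']}\n"
            f"Source: {item['document']} ({format_citations(item['pages'])})"
        )
    return "\n\n".join(blocks) + "\n"


# Hypothetical record layout, used only for this example.
history = [
    {
        "question": "How long do refunds take?",
        "answer": "Within five business days.",
        "document": "policy.pdf",
        "pages": [4, 2, 4],
    }
]
text = export_conversation(history)
```

In a Streamlit app, a string like `text` could be offered to the user through `st.download_button`; whether DocQuery does it this way is an assumption.
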
- Frontend: Streamlit
- PDF Processing: PyPDF
- Text Chunking: LangChain
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Store: FAISS
- LLM: FLAN-T5-small
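
To show how the components above could fit together, the following sketch walks through chunking page text, embedding the chunks, and retrieving the best match along with its page number. It is deliberately dependency-free: a bag-of-words vector stands in for the Sentence Transformers embedding, and a sorted linear scan stands in for the FAISS index. All function names and parameter values are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_pages(pages, chunk_size=200, overlap=50):
    """Split each page into overlapping character windows, keeping the page number."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, len(text), step):
            piece = text[start:start + chunk_size].strip()
            if piece:
                chunks.append({"page": page_number, "text": piece})
    return chunks


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks, question, k=2):
    """Rank chunks by similarity to the question (linear scan, not a real index)."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c["text"]), query), reverse=True)
    return ranked[:k]


# One string per PDF page; in practice this text would come from the PDF parser.
pages = [
    "Invoices are due within thirty days of receipt.",
    "Refunds are issued to the original payment method within five business days.",
]
chunks = chunk_pages(pages)
top = search(chunks, "How long do refunds take?", k=1)
cited_page = top[0]["page"]
```

Because every chunk carries the page it came from, the citation falls out of retrieval: the page of the top-ranked chunk is the page to cite. Swapping in a real embedding model and vector index changes `embed` and `search` but not this bookkeeping.
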
git clone https://github.com/DishaAgarwalla/DocQuery.git
cd DocQuery
pip install -r requirements.txt
streamlit run streamlit_app.py

- Upload PDF files via sidebar
- Click "Process New Documents"
- Wait for processing to complete
- Ask questions in natural language
- Get answers with page citations
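
Between asking a question and getting a cited answer, the retrieved passages have to be packed into a prompt for the language model. The sketch below shows one plausible way to do that while keeping the prompt short, which matters for a small model. The prompt wording, the character budget, and the `build_prompt` name are assumptions rather than DocQuery's actual implementation.

```python
def build_prompt(question, retrieved, max_context_chars=1200):
    """Pack retrieved chunks, best-ranked first, into a prompt within a character budget.

    Returns the prompt and the list of page numbers that made it into the context,
    so the caller can cite exactly those pages.
    """
    parts = []
    pages = []
    used = 0
    for chunk in retrieved:
        entry = f"[page {chunk['page']}] {chunk['text']}"
        if used + len(entry) > max_context_chars:
            break  # stop at the first chunk that would exceed the budget
        parts.append(entry)
        pages.append(chunk["page"])
        used += len(entry)
    context = "\n".join(parts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return prompt, pages


# Hypothetical retrieval results: the second chunk is too long to fit.
retrieved = [
    {"page": 2, "text": "Refunds are issued within five business days."},
    {"page": 7, "text": "x" * 2000},
]
prompt, pages = build_prompt("How long do refunds take?", retrieved)
```

Returning the page list alongside the prompt means only pages that were actually shown to the model get cited.
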
- Streamlit for the amazing framework
- Hugging Face for hosting the FLAN-T5 model
- FAISS for vector similarity search
Disha Agarwalla
- GitHub: @DishaAgarwalla
- Project Link: https://github.com/DishaAgarwalla/DocQuery
Made with ❤️ by Disha Agarwalla