DocRevAI

DocRevAI is an AI-powered document analysis and question-answering system that enables users to upload PDF documents and ask questions about their content. The application extracts text from PDFs, processes the content, retrieves relevant information using TF-IDF similarity search, and generates context-aware responses using local Large Language Models (LLMs) through Ollama.

Features

PDF document ingestion
Text extraction and preprocessing
Intelligent text chunking
TF-IDF based document retrieval
Context-aware question answering
Local AI inference using Ollama
Modular and scalable architecture
Comprehensive logging and error handling

Project Workflow

PDF Upload
    ↓
Text Extraction
    ↓
Text Cleaning
    ↓
Chunk Creation
    ↓
TF-IDF Vectorization
    ↓
Similarity Search
    ↓
Context Retrieval
    ↓
Ollama LLM
    ↓
Generated Response
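
The text cleaning and chunk creation steps in the workflow above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the names clean_text and create_chunks only echo the script names in the repository, and the word-window strategy, the chunk_size and overlap defaults, and the hyphen-rejoining rule are all assumptions.

```python
import re


def clean_text(raw: str) -> str:
    """Normalize text extracted from a PDF (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "retrie-\nval" -> "retrieval".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse every run of whitespace (newlines, tabs, spaces) into one space.
    return re.sub(r"\s+", " ", text).strip()


def create_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    raw = "TF-IDF retrie-\nval ranks   chunks\nby similarity to the question."
    cleaned = clean_text(raw)
    print(cleaned)
    print(create_chunks(cleaned, chunk_size=4, overlap=1))
```

Overlapping windows are one common way to keep a sentence that straddles a chunk boundary retrievable from either side; splitting on sentences or paragraphs would be an equally reasonable choice.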

Tech Stack

Backend

Python 3.12+
Ollama
Scikit-learn
PyPDF2 / PDF Processing Libraries
Logging Module
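
Ollama serves models over a local HTTP API (port 11434 by default), so the answer-generation step can be sketched with the standard library. The prompt template, the helper names, and the "llama3" model name below are assumptions made for illustration; the project may use a different model, wording, or client library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Place the retrieved chunks ahead of the question (hypothetical template)."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, 1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    # "model" is an assumption: use whichever model was pulled with `ollama pull`.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(question: str, context_chunks: list[str], model: str = "llama3") -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_request(build_prompt(question, context_chunks), model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    chunks = ["Invoices must be paid within thirty days of receipt."]
    print(build_prompt("When are invoices due?", chunks))
    # ask_ollama(...) needs a local Ollama server, so it is not called here.
```

Setting "stream" to False asks Ollama for a single JSON object instead of a stream of partial responses, which keeps the parsing to one json.loads call.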

AI & Retrieval

TF-IDF Vectorization
Cosine Similarity Search
Local Language Models via Ollama
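
To make the retrieval step concrete, here is a dependency-free sketch of TF-IDF weighting followed by cosine similarity ranking. The project lists scikit-learn for this step; the code below is not that implementation. It is modeled on the smoothed IDF and L2 normalization that scikit-learn's TfidfVectorizer applies by default, but the scores are not guaranteed to match it exactly, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(chunks: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(chunks)
    df = Counter(term for chunk in chunks for term in set(tokenize(chunk)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalized TF-IDF vector; terms unseen at fit time are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum((w * b.get(term, 0.0) for term, w in a.items()), 0.0)


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    idf = fit_idf(chunks)
    q = tfidf_vector(question, idf)
    scored = [(cosine(q, tfidf_vector(c, idf)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "Invoices must be paid within thirty days of receipt.",
        "The warranty covers manufacturing defects for two years.",
        "Late payment of invoices incurs a five percent fee.",
    ]
    for score, chunk in top_chunks("When are invoices due for payment?", chunks):
        print(f"{score:.3f}  {chunk}")
```

Because TF-IDF matches exact tokens, a question that shares no words with a chunk scores zero against it even when the meaning is close, which is the gap the embedding-based search listed under Future Enhancements would address.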

Development Tools

uv
Git
GitHub
Jira
Pytest

Dependency Management

This project uses uv for package management and virtual environment handling, providing faster dependency resolution and installation compared to traditional pip-based workflows.

Project Structure

DocRevAI/
│
├── docrevai/
│   ├── scripts/
│   │   ├── clean_text.py
│   │   ├── create_chunks.py
│   │   ├── similarity_finder.py
│   │   ├── tf_idf.py
│   │   └── ...
│   │
│   ├── logging/
│   │   └── logger.py
│   │
│   └── ...
│
├── tests/
│
├── logs/
│
├── main.py
│
├── pyproject.toml
│
├── uv.lock
│
└── README.md

Installation

Clone Repository

git clone https://github.com/Hanan-Nawaz/DocRevAI
cd DocRevAI

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

or

pip3 install uv

Create Virtual Environment

uv venv

Activate environment:

macOS/Linux:

source .venv/bin/activate

Windows:

.venv\Scripts\activate

Install Dependencies

uv sync

Running the Project

uv run main.py

Testing

Run all tests:

uv run pytest

Run tests with coverage (requires the pytest-cov plugin):

uv run pytest --cov=docrevai

Logging

DocRevAI includes centralized logging for:

Error tracking
Debugging
System monitoring
Runtime diagnostics

Logs are stored in the project's logs/ directory.
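
A centralized logger of this kind can be sketched with Python's built-in logging module. The function name get_logger, the file name docrevai.log, and the message format are assumptions for illustration; the project's own logger.py may be organized differently.

```python
import logging
from pathlib import Path


def get_logger(name: str, log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to <log_dir>/docrevai.log and to the console."""
    logger = logging.getLogger(name)
    if logger.handlers:
        return logger  # already configured; avoid attaching duplicate handlers

    Path(log_dir).mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")

    file_handler = logging.FileHandler(Path(log_dir) / "docrevai.log", encoding="utf-8")
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    logger.setLevel(level)
    logger.propagate = False  # keep records from being emitted again by the root logger
    return logger
```

A module would typically call get_logger(__name__) once at import time, then use log.info(...) for progress messages and log.exception(...) inside an except block to record the traceback along with the message.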

Project Management

The project follows Agile development practices and uses Jira for:

Sprint planning
Task management
Issue tracking
Feature development tracking

Future Enhancements

Semantic search using embeddings
Vector databases (FAISS / ChromaDB)
Multi-document support
Web-based user interface
Conversation memory
Document summarization
Citation and source highlighting
Hybrid retrieval (TF-IDF + Embeddings)

Author

Abdul Hanan Nawaz

License

This project is intended for educational and portfolio purposes.
