A robust Python-based system to combat fraudulent reviews on e-commerce platforms by automatically scraping, classifying, flagging, and facilitating the removal of fake product reviews.
- Web Scraper: Extracts reviews from Amazon, Flipkart, and other platforms
- NLP Pipeline: Advanced text preprocessing with BERT embeddings
- ML Classifier: Ensemble models (Random Forest, XGBoost, SVM) for fake review detection
- Automated Flagging: Trust scoring, pattern detection, and suspicious behavior clustering
- API Integration: Automated deletion requests with approval workflow
- Admin Dashboard: Streamlit-based interface for review management
- Sentiment analysis with rating correlation
- IP/User behavior tracking and clustering
- Explainable AI with reason codes for predictions
- Real-time review validation API
- Comprehensive test coverage
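To make the flagging and explainability features above concrete, here is a minimal rule-based sketch of how reason codes could feed a trust score. The function name, rules, and thresholds are illustrative assumptions, not the project's actual implementation (which uses learned models):

```python
# Hypothetical sketch of rule-based reason codes feeding a trust score.
# Rules and thresholds are illustrative, not the project's real values.

def score_review(text: str, rating: int, sentiment: float) -> dict:
    """Return a trust score in [0, 1] and human-readable reason codes."""
    reasons = []

    # Very short reviews carry little signal and are easy to mass-produce.
    if len(text.split()) < 5:
        reasons.append("TOO_SHORT")

    # Excessive exclamation marks often correlate with promotional spam.
    if text.count("!") >= 3:
        reasons.append("EXCESSIVE_PUNCTUATION")

    # A high rating paired with negative sentiment (or vice versa) suggests
    # the text and the star rating were produced independently.
    if (rating >= 4 and sentiment < -0.3) or (rating <= 2 and sentiment > 0.3):
        reasons.append("RATING_SENTIMENT_MISMATCH")

    # Each triggered rule reduces trust; a real system would learn weights.
    trust = max(0.0, 1.0 - 0.3 * len(reasons))
    return {"trust_score": trust, "reasons": reasons}


print(score_review("Great!!! Best!!!", 5, -0.5))
```

The returned `reasons` list is what makes a prediction auditable: a moderator sees *why* a review was flagged, not just a probability.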
- Backend: FastAPI
- ML/NLP: scikit-learn, XGBoost, transformers (BERT)
- Web Scraping: Selenium, BeautifulSoup, Playwright
- Database: PostgreSQL with SQLAlchemy ORM
- Dashboard: Streamlit
- Deployment: Docker, Docker Compose
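A compose file for this stack typically wires the API, dashboard, and database together. The sketch below is illustrative only (service names, ports, and image tags are assumptions); see `docker/docker-compose.yml` for the real file:

```yaml
# Illustrative sketch only -- see docker/docker-compose.yml for the real file.
services:
  api:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - db
  dashboard:
    build: .
    command: streamlit run dashboard/app.py --server.port 8501
    ports:
      - "8501:8501"
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "5432:5432"
```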
- Python 3.9+
- PostgreSQL 13+
- Chrome/Chromium (for Selenium)
```bash
# Clone repository
git clone https://github.com/yourusername/fake-review-detector.git
cd fake-review-detector

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your configuration

# Initialize database
python scripts/init_db.py

# Run migrations
alembic upgrade head
```

```bash
# Start the API server
uvicorn app.main:app --reload --port 8000

# Start the Streamlit dashboard (separate terminal)
streamlit run dashboard/app.py

# Run scraper
python -m app.scraper.main --platform amazon --product-url "..."
```

```bash
# Build and run all services
docker-compose up -d

# Access services
# API: http://localhost:8000
# Dashboard: http://localhost:8501
# PostgreSQL: localhost:5432
```

The dashboard can be deployed to Streamlit Cloud with the following configuration:
- GitHub repository with your code
- Streamlit Cloud account (https://streamlit.io/cloud)
- Backend API deployed and publicly accessible
- Configure Secrets: In your Streamlit Cloud app settings, add the following secrets:

```toml
# .streamlit/secrets.toml
API_URL = "https://your-backend-api.example.com/api"
```

- Verify Requirements: Ensure `requirements.txt` includes:

```text
streamlit>=1.28.0
httpx>=0.25.0
tenacity>=8.2.0
nest-asyncio>=1.5.0
pandas>=2.1.0
plotly>=5.18.0
loguru>=0.7.0
```

- Runtime Configuration: Create `.streamlit/runtime.txt`:

```text
python-3.11.16
```

- Deploy:
  - Go to https://share.streamlit.io/
  - Click "New app"
  - Select your repository
  - Set the main file path: `dashboard/app.py`
  - Click "Deploy"
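Inside the dashboard, the `API_URL` secret can be resolved with a small helper along these lines. This is a hedged sketch: `get_api_url` is a hypothetical name, and the real `dashboard/app.py` may resolve the URL differently. In the real app the first argument would be `st.secrets`; accepting any mapping keeps the helper easy to test:

```python
import os

# Sketch: prefer Streamlit secrets, fall back to an environment variable,
# then to a local development default.
def get_api_url(secrets, default="http://localhost:8000/api"):
    if "API_URL" in secrets:
        return secrets["API_URL"].rstrip("/")
    return os.environ.get("API_URL", default).rstrip("/")


print(get_api_url({"API_URL": "https://example.com/api/"}))  # → https://example.com/api
```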
- API URL: The dashboard requires a publicly accessible backend API. Set the `API_URL` secret to your production API endpoint.
- Health Check: The dashboard performs a health check on startup. If the backend is unreachable, a banner is displayed with a retry button.
- CPU-Only Torch: The requirements are configured for CPU-only PyTorch to ensure compatibility with Streamlit Cloud's environment.
- No Browser Automation: Selenium and Playwright are disabled in the cloud requirements as they require system-level browser binaries.
- File Size Limits: CSV uploads are limited to 50 MB. Adjust `MAX_UPLOAD_SIZE_MB` in `app/utils.py` if needed.
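A size guard like the one described above can be as simple as the following sketch. The constant mirrors the documented limit, but the helper name and error message are illustrative, not the actual contents of `app/utils.py`:

```python
MAX_UPLOAD_SIZE_MB = 50  # mirrors the documented upload limit


def validate_upload_size(num_bytes: int, limit_mb: int = MAX_UPLOAD_SIZE_MB) -> None:
    """Raise ValueError if an uploaded file exceeds the configured limit."""
    limit_bytes = limit_mb * 1024 * 1024
    if num_bytes > limit_bytes:
        raise ValueError(
            f"Upload is {num_bytes / 1024 / 1024:.1f} MB; limit is {limit_mb} MB"
        )


validate_upload_size(10 * 1024 * 1024)  # 10 MB: accepted silently
```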
Backend Connection Issues:
- Verify the `API_URL` secret is set correctly
- Ensure the backend API is publicly accessible and not blocked by CORS
- Check the backend health endpoint: `https://your-api.example.com/api/admin/health`
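The startup health check with retries can be approximated by a small wrapper like this. It is a simplified stand-in for the dashboard's tenacity-based logic, with an illustrative function name; `fetch` would be, for example, an `httpx` GET against the health endpoint:

```python
import time

# Simplified retry sketch: call `fetch` up to `attempts` times before
# giving up, sleeping `delay` seconds between attempts.
def check_backend_health(fetch, attempts=3, delay=0.0):
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()  # expected to raise on connection failure
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)  # back off before retrying
    raise last_error


# Fake backend that fails twice, then recovers.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("backend unreachable")
    return {"status": "ok"}

print(check_backend_health(flaky_fetch))  # → {'status': 'ok'}
```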
Dependency Errors:
- Ensure `runtime.txt` specifies Python 3.11.16
- Verify all packages in `requirements.txt` are compatible with that Python version
- Check Streamlit Cloud build logs for specific error messages
Memory Issues:
- Disable BERT embeddings in Settings if experiencing memory errors
- Consider using the minimal requirements file: `requirements-streamlit-minimal.txt`
For detailed deployment instructions, see STREAMLIT_DEPLOYMENT.md.
```python
from app.scraper import AmazonScraper

scraper = AmazonScraper()
reviews = scraper.scrape_product("https://www.amazon.com/product/...")
```

```python
from app.classifier import FakeReviewClassifier

classifier = FakeReviewClassifier()
result = classifier.predict(review_text)
print(f"Fake probability: {result['fake_probability']}")
print(f"Reasons: {result['reasons']}")
```

```bash
# Check single review
curl -X POST "http://localhost:8000/api/reviews/check" \
  -H "Content-Type: application/json" \
  -d '{"text": "Amazing product!", "rating": 5}'

# Batch processing (multipart upload; curl sets the Content-Type header itself)
curl -X POST "http://localhost:8000/api/reviews/batch" \
  -F "file=@reviews.csv"
```

```text
fake-review-detector/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration management
│   ├── database.py          # Database connection
│   ├── models/              # SQLAlchemy models
│   ├── scraper/             # Web scraping modules
│   ├── preprocessing/       # Text preprocessing pipeline
│   ├── classifier/          # ML models and training
│   ├── flagging/            # Flagging and alert system
│   ├── api_integration/     # Platform API integration
│   └── routers/             # API endpoints
├── dashboard/
│   └── app.py               # Streamlit dashboard
├── tests/
│   ├── test_scraper.py
│   ├── test_classifier.py
│   └── test_api.py
├── scripts/
│   ├── init_db.py           # Database initialization
│   └── train_model.py       # Model training script
├── data/
│   ├── raw/                 # Scraped reviews
│   ├── processed/           # Cleaned data
│   └── models/              # Trained model artifacts
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md
```
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Random Forest | 94.2% | 93.8% | 94.5% | 94.1% |
| XGBoost | 95.1% | 94.9% | 95.3% | 95.1% |
| SVM | 92.7% | 92.1% | 93.2% | 92.6% |
| Ensemble | 96.3% | 96.1% | 96.5% | 96.3% |
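The ensemble row reflects combining the three base models; one common way to do that is soft voting over per-model probabilities. The sketch below illustrates the idea only; the function name and weights are assumptions, not the trained values:

```python
# Toy soft-voting sketch: weighted average of each model's fake-review
# probability. Weights here are illustrative only.
def ensemble_probability(probs, weights=None):
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total


# e.g. Random Forest, XGBoost, and SVM scores for one review:
p = ensemble_probability([0.91, 0.88, 0.79], weights=[1.0, 1.2, 0.8])
print(round(p, 3))  # → 0.866
```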
Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=html

# Run specific test suite
pytest tests/test_classifier.py -v
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
@Mayank-iitj
- Dataset: Amazon Review Dataset, Fake Review Corpus
- Pre-trained models: HuggingFace Transformers
- Libraries: scikit-learn, XGBoost, Selenium
Project Link: https://github.com/yourusername/fake-review-detector