Aadityavarier/internshield


InternShield 🛡️

Protecting Students from Fake Internship Offers

InternShield is a free, AI-powered tool that helps students verify the authenticity of internship and job offer letters. Upload or paste any offer letter and get an instant analysis with a confidence score, red flags, and actionable next steps.

🚨 Every year, thousands of students in India fall victim to fake internship offers. Scammers demand registration fees, collect sensitive documents, and waste students' time with non-existent positions. InternShield was built to fight back.


✨ Features

  • Multi-format input: Analyze PDFs, images (JPG/PNG), DOCX, TXT, or paste text directly
  • 8-point rule engine: Checks for suspicious email domains, fake company names, urgency tactics, implausible stipends, missing fields, and more
  • NLP language analysis: Detects fraud indicators and genuine offer patterns using keyword-weighted classification
  • Entity verification: Extracts and verifies company names, people, dates, and contacts (spaCy NER or regex fallback)
  • Enriched analysis: Optionally provide company name, website, and email for deeper verification
  • Education section: Learn how to spot fake offers with a detailed fake vs. genuine comparison
  • Privacy first: No signup required, no data stored permanently, fully anonymous
  • Session history: Track your past scans within the browser session
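
As an illustration of what a single deterministic rule might look like, the sketch below flags a letter whose contact email uses a free webmail domain. The function name, the domain set, and the return shape are all hypothetical; the project's actual checks live in backend/services/rule_engine.py and may work differently.

```python
import re

# Hypothetical set of free webmail domains. The project keeps its own data in
# backend/data/suspicious_domains.json, whose format is not shown in this README.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}

# Captures the domain part of anything that looks like an email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def check_email_domains(text: str) -> dict:
    """Flag the letter if any email address in it uses a free webmail domain."""
    domains = {domain.lower() for domain in EMAIL_RE.findall(text)}
    suspicious = sorted(domains & FREE_MAIL_DOMAINS)
    return {
        "rule": "email_domain",      # hypothetical rule identifier
        "passed": not suspicious,    # True when no free webmail domain was found
        "details": suspicious,
    }
```

A letter that asks the student to reply to a Gmail address would fail this check, while one that uses a company domain would pass.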

🏗️ Architecture

┌─────────────────────────┐     ┌─────────────────────────┐     ┌──────────────┐
│   Next.js 16 Frontend   │────▶│     FastAPI Backend     │────▶│   Supabase   │
│  (React 19, TypeScript) │◀────│      (Python 3.9+)      │◀────│  (Optional)  │
└─────────────────────────┘     └─────────────────────────┘     └──────────────┘
                                          │
                                ┌─────────┼─────────┐
                                ▼         ▼         ▼
                          Rule Engine    NLP     NER/spaCy
                          (8 rules)   (Keyword   (Entity
                           30%       Analysis)  Extraction)
                                      50%        20%

ML Pipeline

| Component | Purpose | Weight |
|---|---|---|
| Rule Engine | 8 deterministic structural checks (email domain, stipend, fake companies, missing fields, dates, grammar, urgency, greeting) | 30% |
| NLP Classifier | Keyword-weighted language pattern analysis (genuine vs fraud indicators) | 50% |
| NER Extractor | Named entity extraction & verification using spaCy or regex fallback (companies, people, dates, contacts) | 20% |
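
To make "keyword-weighted" concrete, here is a minimal sketch of one way such a classifier could score a letter. The keyword lists, the weights, and the `keyword_score` function are invented for illustration; they are not taken from backend/services/nlp_classifier.py.

```python
# Hypothetical keyword weights; the project's real lists are not shown in this README.
FRAUD_KEYWORDS = {"registration fee": 3.0, "pay now": 2.5, "limited seats": 1.5}
GENUINE_KEYWORDS = {"reporting manager": 2.0, "joining date": 1.5, "stipend": 1.0}


def keyword_score(text: str) -> float:
    """Return a 0-100 score, where higher means the language looks more genuine."""
    lowered = text.lower()
    fraud = sum(w for kw, w in FRAUD_KEYWORDS.items() if kw in lowered)
    genuine = sum(w for kw, w in GENUINE_KEYWORDS.items() if kw in lowered)
    total = fraud + genuine
    if total == 0:
        return 50.0  # no evidence either way, so stay neutral
    # Share of the matched weight that points towards a genuine offer.
    return round(100.0 * genuine / total, 1)
```

Each matched phrase contributes its weight once, so a single strong fraud phrase such as a fee demand can outweigh several mild genuine indicators.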

Scoring

| Score | Verdict |
|---|---|
| 75–100% | ✅ Likely Genuine |
| 45–74% | ⚠️ Suspicious |
| 0–44% | 🚨 Likely Fake |
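
The weights and verdict bands above can be combined as in the following sketch. It assumes each component reports a 0-100 score where higher means more genuine, and that fractional scores fall into the band below the next threshold; the function name and signature are hypothetical and are not taken from backend/services/scorer.py.

```python
# Component weights from the ML Pipeline table.
WEIGHTS = {"rule_engine": 0.30, "nlp": 0.50, "ner": 0.20}


def combine_scores(rule_engine: float, nlp: float, ner: float) -> tuple:
    """Return (confidence score, verdict) from three 0-100 component scores."""
    score = round(
        WEIGHTS["rule_engine"] * rule_engine
        + WEIGHTS["nlp"] * nlp
        + WEIGHTS["ner"] * ner,
        1,
    )
    # Verdict bands from the Scoring table.
    if score >= 75:
        verdict = "Likely Genuine"
    elif score >= 45:
        verdict = "Suspicious"
    else:
        verdict = "Likely Fake"
    return score, verdict
```

For example, component scores of 80 (rules), 60 (NLP), and 90 (NER) give 0.30 × 80 + 0.50 × 60 + 0.20 × 90 = 72, which lands in the Suspicious band.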

🚀 Quick Start

Prerequisites

  • Python 3.9+ with pip
  • Node.js 20.9+ with npm (Next.js 16 no longer supports Node.js 18)

1. Clone the Repository

git clone https://github.com/Aadityavarier/internshield.git
cd internshield

2. Start the Backend

cd backend
python -m venv venv

# Windows
venv\Scripts\activate
# Mac/Linux
# source venv/bin/activate

pip install -r requirements.txt

# Optional: install spaCy for enhanced NER (regex fallback works without it)
# pip install spacy && python -m spacy download en_core_web_sm

# Copy and fill env variables (optional; the app works without Supabase)
# Windows
copy .env.example .env
# Mac/Linux
# cp .env.example .env

# Run the server
uvicorn main:app --reload --port 8000

The backend will start at http://localhost:8000.

3. Start the Frontend

# In a new terminal, from the repository root
cd frontend
npm install
npm run dev

The frontend will start at http://localhost:3000.

4. Open the App

Navigate to http://localhost:3000 and start verifying offer letters!


📁 Project Structure

internshield/
├── backend/
│   ├── main.py                    # FastAPI app, CORS, router mounting
│   ├── requirements.txt           # Python dependencies
│   ├── .env.example               # Environment variables template
│   ├── data/
│   │   ├── known_fake_companies.json
│   │   └── suspicious_domains.json
│   ├── models/
│   │   └── schemas.py             # Pydantic request/response models
│   ├── routers/
│   │   └── analyze.py             # API endpoints (/analyze, /result, /history)
│   └── services/
│       ├── text_extractor.py      # PDF, image, DOCX, TXT extraction
│       ├── rule_engine.py         # 8 rule-based fraud checks
│       ├── nlp_classifier.py      # Keyword-weighted NLP classification
│       ├── ner_extractor.py       # spaCy NER + regex fallback
│       └── scorer.py              # Weighted ensemble scoring
├── frontend/
│   ├── package.json
│   ├── tsconfig.json
│   ├── next.config.ts
│   └── src/
│       ├── app/
│       │   ├── layout.tsx         # Root layout (Navbar + Footer)
│       │   ├── globals.css        # Design system (dark mode, glassmorphism)
│       │   ├── page.tsx           # Homepage (hero, upload, education, about)
│       │   ├── page.module.css    # Homepage styles
│       │   ├── history/           # Scan history page
│       │   └── result/[id]/       # Analysis result page
│       ├── components/
│       │   ├── Navbar.tsx         # Navigation bar
│       │   └── Footer.tsx         # Footer with links & disclaimer
│       └── lib/
│           └── api.ts             # API client + session caching
└── README.md

🔌 API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/analyze | Analyze an offer letter (file or text + optional company details) |
| GET | /api/result/{scan_id} | Get full analysis result for a scan |
| GET | /api/history/{session_id} | Get scan history for a browser session |
| GET | /api/health | Health check |

POST /api/analyze

Form Data:

  • file (optional): PDF, DOCX, image, or TXT file
  • text (optional): Plain text of the offer letter
  • session_id (required): Browser session identifier
  • company_name_input (optional): Company name from the letter
  • company_website (optional): Company website URL
  • contact_email (optional): Contact email from the letter

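The form fields above could be assembled into a text-only request as sketched below, using only the Python standard library. The helper name `build_analyze_request` is hypothetical, and the sketch assumes the endpoint accepts URL-encoded form data when no file is attached; uploading a file would need multipart/form-data instead. The response format is not documented in this README, so the sketch stops at building the request.

```python
import urllib.parse
import urllib.request

API_BASE = "http://localhost:8000"  # backend address from the Quick Start


def build_analyze_request(text, session_id, company_name=None,
                          company_website=None, contact_email=None):
    """Build (but do not send) a text-only POST request for /api/analyze."""
    fields = {"text": text, "session_id": session_id}
    optional = {
        "company_name_input": company_name,
        "company_website": company_website,
        "contact_email": contact_email,
    }
    # Only include the optional fields the caller actually supplied.
    fields.update({key: value for key, value in optional.items() if value})
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/analyze",
        data=body,
        method="POST",
        # Assumption: the backend parses URL-encoded forms for text-only input.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

With the backend running, passing the returned object to `urllib.request.urlopen` would send it. The `session_id` can be any identifier the client generates once and reuses, so that later calls to /api/history/{session_id} return the same scans.
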
💾 Database (Optional)

InternShield works fully without a database: results are cached in memory on the server and in sessionStorage on the client. For persistent storage:

  1. Create a project at supabase.com
  2. Run this SQL in the SQL editor:
CREATE TABLE scans (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  created_at TIMESTAMPTZ DEFAULT now(),
  input_type TEXT,
  extracted_text TEXT,
  confidence_score NUMERIC,
  verdict TEXT,
  dimension_scores JSONB,
  triggered_flags JSONB,
  next_steps JSONB,
  company_name TEXT,
  session_id TEXT,
  file_hash TEXT,
  extraction_method TEXT,
  processing_time_ms INT,
  model_version TEXT DEFAULT 'v1.0'
);

CREATE INDEX idx_scans_session_id ON scans(session_id);
CREATE INDEX idx_scans_file_hash ON scans(file_hash);
  3. Add your credentials to backend/.env:
SUPABASE_URL=your_project_url
SUPABASE_KEY=your_anon_key
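
When no Supabase credentials are configured, the in-memory caching mentioned at the top of this section could look roughly like the sketch below. All names here are hypothetical, and the use of SHA-256 for the `file_hash` column is an assumption; the README does not say which hash the backend uses.

```python
import hashlib
import uuid
from collections import defaultdict

# Hypothetical in-process stores; everything is lost when the server restarts.
_results = {}                  # scan_id -> analysis result
_history = defaultdict(list)   # session_id -> list of scan_ids, oldest first


def file_hash(content: bytes) -> str:
    """Fingerprint an uploaded file (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def save_result(session_id: str, result: dict) -> str:
    """Store a result under a fresh scan id and record it in the session history."""
    scan_id = str(uuid.uuid4())
    _results[scan_id] = result
    _history[session_id].append(scan_id)
    return scan_id


def get_result(scan_id: str):
    """Return the stored result, or None if the scan id is unknown."""
    return _results.get(scan_id)


def get_history(session_id: str) -> list:
    """Return all results recorded for a browser session."""
    return [_results[scan_id] for scan_id in _history.get(session_id, [])]
```

The two lookups mirror the /api/result/{scan_id} and /api/history/{session_id} endpoints; with Supabase configured, the same reads would instead hit the `scans` table and its `session_id` index.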

🛠️ Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, CSS Modules |
| Backend | FastAPI, Python 3.9+, Pydantic v2 |
| ML/NLP | Keyword-weighted NLP classifier, regex-based NER, spaCy (optional), textstat, rapidfuzz |
| OCR | Tesseract (optional), pdfplumber, python-docx, Pillow |
| Database | Supabase/PostgreSQL (optional; works without it) |
| Design | Dark mode, glassmorphism, Inter font, micro-animations |

📄 License

This project is licensed under the MIT License.


⚠️ Disclaimer

InternShield provides automated analysis and should not be treated as legal advice. Always independently verify offers through official channels. If you suspect fraud, report it at cybercrime.gov.in.
