Protecting Students from Fake Internship Offers
InternShield is a free, AI-powered tool that helps students verify the authenticity of internship and job offer letters. Upload or paste an offer letter to get an instant analysis with a confidence score, red flags, and actionable next steps.
🚨 Every year, thousands of students in India fall victim to fake internship offers. Scammers demand registration fees, collect sensitive documents, and waste students' time with non-existent positions. InternShield was built to fight back.
- Multi-format input: analyze PDFs, images (JPG/PNG), DOCX, TXT, or pasted text
- 8-point rule engine: checks for suspicious email domains, known fake company names, urgency tactics, implausible stipends, missing fields, and more
- NLP language analysis: detects fraud indicators and genuine offer patterns using keyword-weighted classification
- Entity verification: extracts and verifies company names, people, dates, and contacts (spaCy NER, with a regex fallback)
- Enriched analysis: optionally provide the company name, website, and email for deeper verification
- Education section: learn how to spot fake offers with a detailed fake vs. genuine comparison
- Privacy first: no signup required, no data stored permanently, fully anonymous
- Session history: track your past scans within the browser session
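To illustrate what a deterministic rule check can look like, here is a minimal Python sketch of three of the red flags named above (a free-mail sender domain, a fee request, and urgency tactics). The domain set, regex patterns, the check_offer function, and the flag names are all hypothetical; they are not taken from rule_engine.py or backend/data/suspicious_domains.json.

```python
import re

# Hypothetical sample data; the real project keeps its lists in
# backend/data/suspicious_domains.json, whose contents are not shown here.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}

# Hypothetical phrase patterns for fee requests and urgency tactics.
FEE_PATTERN = re.compile(r"\b(registration|security|training)\s+(fee|deposit)\b", re.I)
URGENCY_PATTERN = re.compile(r"\b(within\s+24\s+hours|immediately|last\s+chance)\b", re.I)

# Captures only the domain part of each email address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def check_offer(text: str) -> list[str]:
    """Return the names of the red-flag rules triggered by the offer text."""
    flags = []
    # Rule: the sender uses a free webmail domain instead of a corporate one.
    for domain in EMAIL_PATTERN.findall(text):
        if domain.lower() in FREE_MAIL_DOMAINS:
            flags.append("free_email_domain")
            break
    # Rule: the letter asks the candidate to pay something.
    if FEE_PATTERN.search(text):
        flags.append("fee_request")
    # Rule: the letter pressures the candidate to act fast.
    if URGENCY_PATTERN.search(text):
        flags.append("urgency")
    return flags
```

For example, "Pay the registration fee within 24 hours. Contact hr.offers@gmail.com." triggers all three flags, while a letter from a corporate address with no payment request triggers none.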
```
┌─────────────────────────┐     ┌─────────────────────────┐     ┌──────────────┐
│   Next.js 16 Frontend   │────▶│     FastAPI Backend     │────▶│   Supabase   │
│ (React 19, TypeScript)  │◀────│      (Python 3.9+)      │◀────│  (Optional)  │
└─────────────────────────┘     └────────────┬────────────┘     └──────────────┘
                                             │
                                   ┌─────────┼─────────┐
                                   ▼         ▼         ▼
                              Rule Engine   NLP    NER/spaCy
                               (8 rules) (Keyword   (Entity
                                  30%    Analysis) Extraction)
                                            50%       20%
```
| Component | Purpose | Weight |
|---|---|---|
| Rule Engine | 8 deterministic structural checks (email domain, stipend, fake companies, missing fields, dates, grammar, urgency, greeting) | 30% |
| NLP Classifier | Keyword-weighted language pattern analysis (genuine vs fraud indicators) | 50% |
| NER Extractor | Named entity extraction & verification using spaCy or regex fallback (companies, people, dates, contacts) | 20% |
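The NLP classifier is described only as keyword-weighted. Here is a minimal sketch of that idea, assuming signed weights (positive leans genuine, negative leans fraudulent) and a logistic squash to a 0-100 score. The lexicon, the weights, and the nlp_score name are hypothetical, not the contents of nlp_classifier.py.

```python
import math
import re

# Hypothetical keyword weights: positive values lean genuine,
# negative values lean fraudulent. Not the project's actual lexicon.
KEYWORD_WEIGHTS = {
    "registration fee": -3.0,
    "pay": -1.5,
    "guaranteed": -1.0,
    "urgent": -1.0,
    "reporting manager": 1.5,
    "human resources": 1.0,
    "offer letter": 0.5,
    "stipend": 0.5,
}


def nlp_score(text: str) -> float:
    """Return a 0-100 genuineness score from weighted keyword hits."""
    lowered = text.lower()
    total = 0.0
    for phrase, weight in KEYWORD_WEIGHTS.items():
        # Match whole phrases only, so "pay" does not fire inside "payroll".
        hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        total += weight * hits
    # Logistic squash: a raw sum of 0 (no evidence) maps to exactly 50.
    return 100.0 / (1.0 + math.exp(-total))
```

A letter with no matching keywords scores a neutral 50, a request to "pay the registration fee" pushes the score toward 0, and genuine-sounding phrasing pushes it toward 100.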
| Score | Verdict |
|---|---|
| 75–100% | ✅ Likely Genuine |
| 45–74% | |
| 0–44% | 🚨 Likely Fake |
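Assuming each component emits a 0-100 genuineness score, the 30/50/20 weights in the component table combine into a single score that maps onto the verdict bands above. This is a hypothetical sketch of that arithmetic, not the code in scorer.py. The middle band has no label in the table, so "Uncertain" below is a placeholder.

```python
# Weights from the component table: rules 30%, NLP 50%, NER 20%.
WEIGHTS = {"rules": 0.30, "nlp": 0.50, "ner": 0.20}


def ensemble_score(scores: dict[str, float]) -> float:
    """Combine per-component 0-100 scores into one weighted 0-100 score."""
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 1)


def verdict(score: float) -> str:
    """Map a 0-100 score onto the verdict bands in the table above."""
    if score >= 75:
        return "Likely Genuine"
    if score >= 45:
        # Placeholder: the README leaves this band's label blank.
        return "Uncertain"
    return "Likely Fake"
```

For example, component scores of 80 (rules), 70 (NLP), and 90 (NER) give 0.30 × 80 + 0.50 × 70 + 0.20 × 90 = 24 + 35 + 18 = 77, which falls in the Likely Genuine band.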
- Python 3.9+ with pip
- Node.js 18+ with npm
```bash
git clone https://github.com/Aadityavarier/internshield.git
cd internshield
```

```bash
cd backend
python -m venv venv

# Windows
venv\Scripts\activate
# Mac/Linux
# source venv/bin/activate

pip install -r requirements.txt

# Optional: install spaCy for enhanced NER (the regex fallback works without it)
# pip install spacy && python -m spacy download en_core_web_sm

# Copy the env template and fill in variables (optional; works without Supabase)
# Windows:
copy .env.example .env
# Mac/Linux:
# cp .env.example .env

# Run the server
uvicorn main:app --reload --port 8000
```

The backend will start at http://localhost:8000.
In a new terminal, from the repository root:

```bash
cd frontend
npm install
npm run dev
```

The frontend will start at http://localhost:3000.
Navigate to http://localhost:3000 and start verifying offer letters!
```
internshield/
├── backend/
│   ├── main.py                  # FastAPI app, CORS, router mounting
│   ├── requirements.txt         # Python dependencies
│   ├── .env.example             # Environment variables template
│   ├── data/
│   │   ├── known_fake_companies.json
│   │   └── suspicious_domains.json
│   ├── models/
│   │   └── schemas.py           # Pydantic request/response models
│   ├── routers/
│   │   └── analyze.py           # API endpoints (/analyze, /result, /history)
│   └── services/
│       ├── text_extractor.py    # PDF, image, DOCX, TXT extraction
│       ├── rule_engine.py       # 8 rule-based fraud checks
│       ├── nlp_classifier.py    # Keyword-weighted NLP classification
│       ├── ner_extractor.py     # spaCy NER + regex fallback
│       └── scorer.py            # Weighted ensemble scoring
├── frontend/
│   ├── package.json
│   ├── tsconfig.json
│   ├── next.config.ts
│   └── src/
│       ├── app/
│       │   ├── layout.tsx       # Root layout (Navbar + Footer)
│       │   ├── globals.css      # Design system (dark mode, glassmorphism)
│       │   ├── page.tsx         # Homepage (hero, upload, education, about)
│       │   ├── page.module.css  # Homepage styles
│       │   ├── history/         # Scan history page
│       │   └── result/[id]/     # Analysis result page
│       ├── components/
│       │   ├── Navbar.tsx       # Navigation bar
│       │   └── Footer.tsx       # Footer with links & disclaimer
│       └── lib/
│           └── api.ts           # API client + session caching
└── README.md
```
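ner_extractor.py pairs spaCy NER with a regex fallback so the backend runs without spaCy installed. Here is a minimal sketch of that pattern, assuming the en_core_web_sm model from the setup steps. The regexes (including the +91 mobile-number format), the NLP variable, and both function names are hypothetical.

```python
import re

# Try the optional spaCy model; fall back to regex if spaCy is not installed
# (ImportError) or the en_core_web_sm model is missing (spacy.load raises OSError).
try:
    import spacy

    NLP = spacy.load("en_core_web_sm")
except (ImportError, OSError):
    NLP = None

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# The last character must not be punctuation, so "site.com/jobs." drops the period.
URL_RE = re.compile(r"https?://[^\s)]*[^\s).,;:]")
# Assumed Indian mobile format: optional +91, then 10 digits starting with 6-9.
PHONE_RE = re.compile(r"(?:\+91[\s-]?)?\b[6-9]\d{9}\b")


def extract_contacts_regex(text: str) -> dict[str, list[str]]:
    """Regex fallback: pull emails, URLs, and phone numbers from the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "urls": URL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


def extract_organizations(text: str) -> list[str]:
    """Return spaCy ORG entities when the model loaded, else an empty list."""
    if NLP is None:
        return []
    return [ent.text for ent in NLP(text).ents if ent.label_ == "ORG"]
```

Loading the model once at import time keeps per-request latency low; the regex path needs no extra dependencies.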
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/analyze | Analyze an offer letter (file or text, plus optional company details) |
| GET | /api/result/{scan_id} | Get the full analysis result for a scan |
| GET | /api/history/{session_id} | Get the scan history for a browser session |
| GET | /api/health | Health check |
Form data for POST /api/analyze (provide either file or text):
- file (optional): PDF, DOCX, image, or TXT file
- text (optional): plain text of the offer letter
- session_id (required): browser session identifier
- company_name_input (optional): company name from the letter
- company_website (optional): company website URL
- contact_email (optional): contact email from the letter
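Here is a minimal Python client sketch for submitting pasted text to POST /api/analyze using only the standard library. It assumes the backend from the setup steps at http://localhost:8000 and that the endpoint parses URL-encoded form fields (Starlette's form parsing, which FastAPI uses, accepts both URL-encoded and multipart bodies, but file uploads need multipart). The function name and its field validation are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "http://localhost:8000"  # default backend address from the setup steps

# Optional form fields listed above for POST /api/analyze.
OPTIONAL_FIELDS = {"company_name_input", "company_website", "contact_email"}


def build_text_analyze_request(text: str, session_id: str, **optional: str) -> Request:
    """Build (but do not send) a POST /api/analyze request for pasted text."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    fields = {"text": text, "session_id": session_id, **optional}
    body = urlencode(fields).encode("utf-8")
    return Request(
        f"{API_BASE}/api/analyze",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

With the backend running, send the request with urllib.request.urlopen(req) and decode the body with json.load; the response shape is not documented here. Reuse one session_id (for example, a uuid.uuid4() string generated once) so GET /api/history/{session_id} can find the scan later.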
InternShield works fully without a database: results are cached in memory on the server and in sessionStorage on the client. For persistent storage:
- Create a project at supabase.com
- Run this SQL in the SQL editor:
```sql
CREATE TABLE scans (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  created_at TIMESTAMPTZ DEFAULT now(),
  input_type TEXT,
  extracted_text TEXT,
  confidence_score NUMERIC,
  verdict TEXT,
  dimension_scores JSONB,
  triggered_flags JSONB,
  next_steps JSONB,
  company_name TEXT,
  session_id TEXT,
  file_hash TEXT,
  extraction_method TEXT,
  processing_time_ms INT,
  model_version TEXT DEFAULT 'v1.0'
);
CREATE INDEX idx_scans_session_id ON scans(session_id);
CREATE INDEX idx_scans_file_hash ON scans(file_hash);
```

- Add your credentials to backend/.env:

```
SUPABASE_URL=your_project_url
SUPABASE_KEY=your_anon_key
```
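Without Supabase, results live in server memory. The sketch below shows one way such a store could back the result and history endpoints, keyed by scan id and session id. It also indexes uploads by a SHA-256 digest, mirroring the file_hash column and index in the schema above. The ScanCache class and its methods are hypothetical, and using SHA-256 for file_hash is an assumption.

```python
import hashlib
from typing import Any, Optional


class ScanCache:
    """In-memory store: results by scan id, scan ids per session, ids by file hash."""

    def __init__(self) -> None:
        self._scans: dict[str, dict[str, Any]] = {}
        self._by_session: dict[str, list[str]] = {}
        self._by_hash: dict[str, str] = {}

    @staticmethod
    def file_hash(data: bytes) -> str:
        # Hex SHA-256 of the raw upload, a candidate value for scans.file_hash.
        return hashlib.sha256(data).hexdigest()

    def put(self, scan_id: str, session_id: str, data: bytes, result: dict[str, Any]) -> None:
        self._scans[scan_id] = result
        self._by_session.setdefault(session_id, []).append(scan_id)
        self._by_hash[self.file_hash(data)] = scan_id

    def get(self, scan_id: str) -> Optional[dict[str, Any]]:
        # Backs something like GET /api/result/{scan_id}.
        return self._scans.get(scan_id)

    def history(self, session_id: str) -> list[dict[str, Any]]:
        # Backs something like GET /api/history/{session_id}, oldest first.
        return [self._scans[s] for s in self._by_session.get(session_id, [])]

    def find_by_file(self, data: bytes) -> Optional[dict[str, Any]]:
        # Re-uploading identical bytes returns the earlier result.
        scan_id = self._by_hash.get(self.file_hash(data))
        return self._scans.get(scan_id) if scan_id else None
```

Everything in this store disappears when the server restarts, which is why the Supabase table is the option for persistence.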
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, CSS Modules |
| Backend | FastAPI, Python 3.9+, Pydantic v2 |
| ML/NLP | Keyword-weighted NLP classifier, regex-based NER, spaCy (optional), textstat, rapidfuzz |
| OCR | Tesseract (optional), pdfplumber, python-docx, Pillow |
| Database | Supabase/PostgreSQL (optional; works without it) |
| Design | Dark mode, glassmorphism, Inter font, micro-animations |
This project is licensed under the MIT License.
InternShield provides automated analysis and should not be treated as legal advice. Always independently verify offers through official channels. If you suspect fraud, report it at cybercrime.gov.in.