InternHunt

AI-Powered Resume Analysis & Internship Matching Platform

Streamline your internship search using NLP, Machine Learning, and Real-Time Web Scraping.

Quick Links:
Landing Page (Vercel) • Streamlit Web App

📸 Screenshots

💡 Want to see it in action? Check out the Live Demo!

🏠 Landing Page (Vercel)

Beautiful Figma-designed landing page hosted on Vercel

🚀 Main Application Portal (Streamlit)

Core interactive app landing page on Streamlit Cloud

💼 Resume Analysis

Smart resume parsing and classification

🤖 AI Career Assistant

Powered by Google Gemini for personalized career guidance

🎓 Course Recommendations

Tailored learning paths based on your profile

🔍 Job Search

Real-time internship opportunities matching your skills (scraped from multiple platforms like Internshala, Remotive, and Jooble)

🔐 Admin Dashboard

User management and analytics with premium Plotly interactive visualizations

🎯 Key Features

📋 Complete Resume Analysis Pipeline

Stage	What Happens
1. Upload & Parse	PDF/DOCX support (50MB limit), text extraction via `pypdf` + `python-docx`
2. Skill Detection	100+ technical skills via spaCy NLP + fuzzy matching (e.g., `k8s` → `Kubernetes`)
3. Role Prediction	MLP Neural Network (TF-IDF → 128/64 hidden layers) → Top 3 roles with confidence
4. ATS Scoring	5-factor breakdown: Content (50%), Formatting (15%), Keywords (20%), Experience (10%), Readability (5%)
5. Recommendations	Live internships (Jooble, Internshala, Remotive, GitHub) + curated courses
6. AI Assistant	Gemini 1.5 Flash with full resume context for personalized career coaching

⚙️ System Architecture & Workflow

InternHunt processes resume uploads, evaluates scoring, logs user details to the database, queries external scrapers, and initializes the chatbot context in a highly structured pipeline:

Workflow Breakdown:

Extraction & Preprocessing: The uploaded document is parsed, stripped of raw formatting, and cleaned of non-ASCII symbols and excessive spacing.
Parallel Feature Extraction:
- Regex filters extract email, phone, and profile URLs.
- spaCy matcher extracts normalized skill sets.
- TF-IDF Vectorizer maps the cleaned text to 2500 terms, which are classified by the MLPClassifier to predict the candidate's career role.
ATS Assessment: Core sections are checked, and missing skills are highlighted by matching actual skills against predicted career role profiles.
Data Persistence: Candidate profiles and computed stats are logged to Neon serverless PostgreSQL for admin audit and analytics.
Recommendation & Assistant Routing: Matched courses (from Courses.py) and job listings are displayed, and a detailed profile is built and loaded into the Google Gemini system instructions so the sidebar assistant chatbot can answer resume-specific questions.

🤖 Machine Learning Model

InternHunt utilizes a custom-trained Multi-Layer Perceptron (MLP) Neural Network classifier to automatically route resumes to 25 distinct job roles.

Model Architecture:

Algorithm: Multi-Layer Perceptron Classifier (scikit-learn)
Vectorization: TF-IDF (Term Frequency-Inverse Document Frequency)
Pipeline: TfidfVectorizer (ngram_range=(1, 2), max_features=2500) → MLPClassifier (128, 64 hidden nodes)
File: resume_classifier_v3_skills_mlp.pkl (10.1 MB)
Training Data: UpdatedResumeDataSet.csv (166 unique deduplicated samples to prevent training bias)

Model Performance:

Metric	Score
Test Accuracy	85.29%
Precision	88.2% (weighted avg)
Recall	85.3% (weighted avg)
F1-Score	81.9% (weighted avg)
Cross-Validation	81.69% ± 0.85% (3-fold Stratified)

Training Configuration:

Pipeline([
    ('tfidf', TfidfVectorizer(
        max_features=2500,        # Vocabulary size limit
        ngram_range=(1, 2),       # Unigrams & bigrams
        min_df=1,
        max_df=0.95,              
        stop_words='english',     
        lowercase=True            
    )),
    ('classifier', MLPClassifier(
        hidden_layer_sizes=(128, 64),
        activation='relu',
        solver='adam',
        alpha=0.1,
        learning_rate_init=0.001,
        max_iter=1000,
        early_stopping=False,
        random_state=42
    ))
])

🌐 Job Sources & Recommendations

InternHunt combines APIs + custom scrapers for comprehensive coverage:

Source	Type	Coverage	Key Details
Internshala	HTML Scraper	India-focused internships	Keyword-based URLs, parses stipend, duration, apply links
Remotive	REST API	Global remote dev jobs	Free public endpoint, filtered by top 5 skills
Jooble	REST API	Global job search	Requires `JOOBLE_API_KEY`, POST with keywords + location
GitHub	HTML Scraper	Hiring/Internship repos	Searches repos with `topic:hiring` or `topic:internship`

💾 Database & Persistence

InternHunt supports dual persistence schemes for local development and cloud production:

Dual persistence for local dev + cloud production:

Database	Use Case	Driver
Neon PostgreSQL	Production (Streamlit Cloud)	`psycopg2` via `DATABASE_URL`
MySQL	Local development fallback	`pymysql` via env vars

Logged User Registry Schema:

CREATE TABLE IF NOT EXISTS user_data (
    ID SERIAL PRIMARY KEY,
    Name VARCHAR(500) NOT NULL,
    Email_ID VARCHAR(500) NOT NULL,
    resume_score VARCHAR(8) NOT NULL,
    Timestamp VARCHAR(50) NOT NULL,
    Page_no VARCHAR(5) NOT NULL,
    Predicted_Field TEXT NOT NULL,
    User_level TEXT NOT NULL,
    Actual_skills TEXT NOT NULL,
    Recommended_skills TEXT NOT NULL,
    Recommended_courses TEXT NOT NULL
);

🚀 Installation

Prerequisites

Python 3.9 or higher
pip package manager
Google Gemini API key (Get one here)

Steps

Clone the repository

git clone https://github.com/ShubhamSnSharma/InternHunt.git
cd InternHunt

Create virtual environment

python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Download NLTK data (Required for NLP)

python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"

Set up environment variables Create a .env file in the root directory:

# Google Gemini API
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-flash

# Neon Database (PostgreSQL)
DATABASE_URL=postgresql://user:password@host.neon.tech/dbname?sslmode=require

Run the application

streamlit run App.py

The app will open in your browser at http://localhost:8501 🎉

📁 Project Structure

InternHunt/
├── 📄 App.py                                   # Main Streamlit application entry point
├── 🎨 styles.py                                # Centralized UI styling and themes
├── 🤖 chat_service.py                          # Gemini AI chatbot service
├── 📝 resume_parser.py                         # Resume parsing & NLP analysis
├── ⚙️ config.py                                # Configuration management
├── 🛠️ utils.py                                 # Utility functions
├── 💾 database.py                              # Neon PostgreSQL database operations
├── 🌐 api_services.py                          # External API integrations (Jooble)
├── 🔍 job_scrapers.py                          # Job scraping (Internshala)
├── ⚠️ error_handler.py                         # Error handling & logging
├── 📚 Courses.py                               # Course recommendation engine
│
├── 🤖 resume_classifier_v3_skills_mlp.pkl      # Upgraded ML model (TF-IDF + MLP, 10.1 MB)
├── ⚙️ soft_skill_role_trainer.py               # Local model training script
├── 📊 UpdatedResumeDataSet.csv                 # Training dataset (166 deduplicated unique samples)
├── 📓 ResumeClassification_Model.ipynb         # Exploration model notebook
│
├── 📋 requirements.txt                         # Python dependencies
├── 📖 README.md                                # Project documentation
├── 📜 LICENSE                                  # MIT License
├── 🔒 PRIVACY.md                               # Privacy policy
├── 🔐 .env.example                             # Environment variables template
├── 🚫 .gitignore                               # Git ignore rules
│        
├── 📁 .streamlit/                              # Streamlit configuration
│   ├── config.toml                             # App configuration
│   └── secrets.toml.example                    # Secrets template
│        
├── 🔤 nevera_font/                             # Custom Nevera font files
│   ├── Nevera-Bold.ttf
│   ├── Nevera-Regular.ttf
│   └── Nevera-Light.ttf
│
├── 📂 Uploaded_Resumes/                        # User uploaded resume storage
│   └── .gitkeep                                # Preserve directory in Git
│        
└── 📁 screenshots/                             # Application screenshots for README

🧪 Development & Testing

Run Model Training (Optional)

# Retrain the classifier with new data
python soft_skill_role_trainer.py

Check Gemini Connection

python -c "from chat_service import check_gemini_health; print(check_gemini_health())"

Linting & Formatting

# If you have ruff/black configured
ruff check .
black .

🤝 Contributing

We welcome contributions! Here's how to get started:

Fork the repository
Create a feature branch — git checkout -b feature/amazing-feature
Commit changes — git commit -m 'Add amazing feature'
Push to branch — git push origin feature/amazing-feature
Open a Pull Request

Ideas for Contribution Ideas:

Add new job sources (LinkedIn, Indeed, Wellfound)
Improve resume parsing for DOCX/images
Add more ML roles or fine-tune the classifier
Enhance UI/UX with new visualizations
Write tests for core modules

👨‍💻 Author

Shubham Sharma

GitHub: @ShubhamSnSharma
Project: InternHunt

🙏 Acknowledgments

Google Gemini — Conversational AI capabilities
Streamlit — Beautiful web framework for data apps
Internshala — Internship listings for Indian students
Remotive — Free remote job API
Jooble — Global job search API
scikit-learn — Machine learning toolkit
spaCy — Industrial-strength NLP
All open-source contributors ❤️

⭐ Star this repo if you found it helpful!

Made with ❤️ by students, for students

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.streamlit		.streamlit
Uploaded_Resumes		Uploaded_Resumes
nevera_font		nevera_font
screenshots		screenshots
.env.example		.env.example
.gitignore		.gitignore
App.py		App.py
Courses.py		Courses.py
LICENSE		LICENSE
PRIVACY.md		PRIVACY.md
README.md		README.md
ResumeClassification_Model.ipynb		ResumeClassification_Model.ipynb
api_services.py		api_services.py
chat_service.py		chat_service.py
config.py		config.py
database.py		database.py
error_handler.py		error_handler.py
job_scrapers.py		job_scrapers.py
requirements.txt		requirements.txt
resume_classifier_v3_skills_mlp.pkl		resume_classifier_v3_skills_mlp.pkl
resume_parser.py		resume_parser.py
soft_skill_role_trainer.py		soft_skill_role_trainer.py
styles.py		styles.py
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

InternHunt

📸 Screenshots

🏠 Landing Page (Vercel)

🚀 Main Application Portal (Streamlit)

💼 Resume Analysis

🤖 AI Career Assistant

🎓 Course Recommendations

🔍 Job Search

🔐 Admin Dashboard

🎯 Key Features

📋 Complete Resume Analysis Pipeline

⚙️ System Architecture & Workflow

Workflow Breakdown:

🤖 Machine Learning Model

Model Architecture:

Model Performance:

Training Configuration:

🌐 Job Sources & Recommendations

💾 Database & Persistence

Logged User Registry Schema:

🚀 Installation

Prerequisites

Steps

📁 Project Structure

🧪 Development & Testing

Run Model Training (Optional)

Check Gemini Connection

Linting & Formatting

🤝 Contributing

Ideas for Contribution Ideas:

👨‍💻 Author

🙏 Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages