ShubhamSnSharma/InternHunt

InternHunt

AI-Powered Resume Analysis & Internship Matching Platform


Streamline your internship search using NLP, Machine Learning, and Real-Time Web Scraping.


Quick Links:
Landing Page (Vercel) • Streamlit Web App


📸 Screenshots

💡 Want to see it in action? Check out the Live Demo!

🏠 Landing Page (Vercel)

[Image: Landing Page] Beautiful Figma-designed landing page hosted on Vercel

🚀 Main Application Portal (Streamlit)

[Image: Main App Landing Page] Core interactive app landing page on Streamlit Cloud

💼 Resume Analysis

[Image: Resume Upload] Smart resume parsing and classification

🤖 AI Career Assistant

[Image: AI Chatbot] Powered by Google Gemini for personalized career guidance

🎓 Course Recommendations

[Image: Course Recommendations] Tailored learning paths based on your profile

🔍 Job Search

[Images: Job Search 1, Job Search 2] Real-time internship opportunities matching your skills (gathered from multiple platforms such as Internshala, Remotive, and Jooble)

🔐 Admin Dashboard

[Image: Admin Dashboard] User management and analytics with premium Plotly interactive visualizations


🎯 Key Features

πŸ“‹ Complete Resume Analysis Pipeline

Stage What Happens
1. Upload & Parse PDF/DOCX support (50MB limit), text extraction via pypdf + python-docx
2. Skill Detection 100+ technical skills via spaCy NLP + fuzzy matching (e.g., k8s β†’ Kubernetes)
3. Role Prediction MLP Neural Network (TF-IDF β†’ 128/64 hidden layers) β†’ Top 3 roles with confidence
4. ATS Scoring 5-factor breakdown: Content (50%), Formatting (15%), Keywords (20%), Experience (10%), Readability (5%)
5. Recommendations Live internships (Jooble, Internshala, Remotive, GitHub) + curated courses
6. AI Assistant Gemini 1.5 Flash with full resume context for personalized career coaching
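
The ATS score in stage 4 is described as a five-factor breakdown with fixed weights. A minimal sketch of how such a weighted score could be combined is shown here; the function name, the 0-100 scale for each factor, and the sample values are assumptions for illustration, not the project's actual scoring code.

```python
# Weights taken from the ATS Scoring row above; they sum to 1.0.
ATS_WEIGHTS = {
    "content": 0.50,
    "formatting": 0.15,
    "keywords": 0.20,
    "experience": 0.10,
    "readability": 0.05,
}


def ats_score(factor_scores: dict) -> float:
    """Combine per-factor scores (assumed to be 0-100) into one weighted score."""
    missing = set(ATS_WEIGHTS) - set(factor_scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    total = sum(ATS_WEIGHTS[name] * factor_scores[name] for name in ATS_WEIGHTS)
    return round(total, 2)


# Hypothetical per-factor scores for one resume.
example = {
    "content": 80,
    "formatting": 90,
    "keywords": 60,
    "experience": 50,
    "readability": 100,
}
print(ats_score(example))  # 75.5
```

Because Content carries half the weight, a weak content score cannot be offset by formatting or readability alone.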

⚙️ System Architecture & Workflow

InternHunt handles each resume upload as a fixed sequence of stages: it parses the document, computes the scores, logs the user's details to the database, queries the external job sources, and initializes the chatbot context:

System Workflow Diagram

Workflow Breakdown:

  1. Extraction & Preprocessing: The uploaded document is parsed, stripped of raw formatting, and cleaned of non-ASCII symbols and excessive spacing.
  2. Parallel Feature Extraction:
    • Regex filters extract email, phone, and profile URLs.
    • spaCy matcher extracts normalized skill sets.
    • TF-IDF Vectorizer maps the cleaned text to a feature vector of up to 2,500 terms, which the MLPClassifier uses to predict the candidate's career role.
  3. ATS Assessment: Core sections are checked, and missing skills are highlighted by matching actual skills against predicted career role profiles.
  4. Data Persistence: Candidate profiles and computed stats are logged to Neon serverless PostgreSQL for admin audit and analytics.
  5. Recommendation & Assistant Routing: Matched courses (from Courses.py) and job listings are displayed, and a detailed profile is built and loaded into the Google Gemini system instructions so the sidebar assistant chatbot can answer resume-specific questions.
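
Steps 1 to 3 of the breakdown above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the code in resume_parser.py: the regular expressions, the alias table, and all function names are hypothetical, and phone matching is left out because formats vary widely by country.

```python
import re

# Simplified patterns; production parsers usually need stricter rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s,;]+")

# Hypothetical alias table; the project's real skill list is not shown here.
SKILL_ALIASES = {"k8s": "Kubernetes", "js": "JavaScript"}


def clean_text(raw: str) -> str:
    """Drop non-ASCII symbols and collapse runs of whitespace."""
    ascii_only = raw.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_only).strip()


def extract_contacts(text: str) -> dict:
    """Pull email addresses and profile URLs out of cleaned text."""
    return {"emails": EMAIL_RE.findall(text), "urls": URL_RE.findall(text)}


def normalize_skills(tokens) -> set:
    """Map known aliases to canonical names; keep other tokens unchanged."""
    return {SKILL_ALIASES.get(token.lower(), token) for token in tokens}


def missing_skills(actual, role_profile) -> list:
    """Skills expected for the predicted role that the resume does not list."""
    return sorted(set(role_profile) - set(actual))


sample = "Jane  Doe \u2022 jane.doe@example.com\nhttps://github.com/janedoe"
text = clean_text(sample)
print(extract_contacts(text))
skills = normalize_skills(["k8s", "Python"])
print(missing_skills(skills, ["Python", "Docker", "Kubernetes"]))  # ['Docker']
```

The set difference in missing_skills mirrors the ATS step, where detected skills are compared against the profile of the predicted role.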

🤖 Machine Learning Model

InternHunt utilizes a custom-trained Multi-Layer Perceptron (MLP) Neural Network classifier to automatically route resumes to 25 distinct job roles.

Model Architecture:

  • Algorithm: Multi-Layer Perceptron Classifier (scikit-learn)
  • Vectorization: TF-IDF (Term Frequency-Inverse Document Frequency)
  • Pipeline: TfidfVectorizer (ngram_range=(1, 2), max_features=2500) → MLPClassifier (two hidden layers of 128 and 64 units)
  • File: resume_classifier_v3_skills_mlp.pkl (10.1 MB)
  • Training Data: UpdatedResumeDataSet.csv (166 unique deduplicated samples to prevent training bias)

Model Performance:

| Metric | Score |
| --- | --- |
| Test Accuracy | 85.29% |
| Precision | 88.2% (weighted avg) |
| Recall | 85.3% (weighted avg) |
| F1-Score | 81.9% (weighted avg) |
| Cross-Validation | 81.69% ± 0.85% (3-fold Stratified) |

Training Configuration:

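from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(
        max_features=2500,        # Vocabulary size limit
        ngram_range=(1, 2),       # Unigrams & bigrams
        min_df=1,                 # Keep terms that appear in at least one document
        max_df=0.95,              # Drop terms that appear in more than 95% of documents
        stop_words='english',     # Remove common English stop words
        lowercase=True            # Normalize case before tokenizing
    )),
    ('classifier', MLPClassifier(
        hidden_layer_sizes=(128, 64),   # Two hidden layers: 128 and 64 units
        activation='relu',
        solver='adam',
        alpha=0.1,                      # L2 regularization strength
        learning_rate_init=0.001,
        max_iter=1000,
        early_stopping=False,           # No validation split is held out during training
        random_state=42                 # Reproducible weight initialization
    ))
])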
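
The feature table mentions returning the top 3 roles with confidence. A minimal sketch of that ranking step follows, assuming the class probabilities are already available as a plain sequence. With a fitted scikit-learn pipeline they would typically come from predict_proba and the labels from classes_; the helper name, role names, and probabilities below are hypothetical.

```python
def top_k_roles(classes, probabilities, k=3):
    """Return the k most probable (role, confidence) pairs, highest first."""
    if len(classes) != len(probabilities):
        raise ValueError("classes and probabilities must have the same length")
    ranked = sorted(
        zip(classes, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(role, round(float(p), 4)) for role, p in ranked[:k]]


# Hypothetical role names and probabilities, for illustration only.
roles = ["Data Science", "Web Designing", "DevOps Engineer", "Testing"]
probs = [0.62, 0.05, 0.25, 0.08]
print(top_k_roles(roles, probs))
# [('Data Science', 0.62), ('DevOps Engineer', 0.25), ('Testing', 0.08)]
```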

🌐 Job Sources & Recommendations

InternHunt combines APIs + custom scrapers for comprehensive coverage:

Source Type Coverage Key Details
Internshala HTML Scraper India-focused internships Keyword-based URLs, parses stipend, duration, apply links
Remotive REST API Global remote dev jobs Free public endpoint, filtered by top 5 skills
Jooble REST API Global job search Requires JOOBLE_API_KEY, POST with keywords + location
GitHub HTML Scraper Hiring/Internship repos Searches repos with topic:hiring or topic:internship
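
The two REST sources above differ in request shape: Jooble takes a POST with keywords and location, while Remotive is a public GET endpoint. The sketch below only builds the requests and sends nothing. The endpoint URLs, the search query parameter, and the function names are assumptions that should be checked against each provider's documentation; they are not taken from api_services.py.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints; verify against the providers' current API docs.
JOOBLE_ENDPOINT = "https://jooble.org/api/"
REMOTIVE_ENDPOINT = "https://remotive.com/api/remote-jobs"


def build_jooble_request(keywords: str, location: str, api_key: str):
    """Prepare a JSON POST carrying keywords and location (not sent here)."""
    payload = json.dumps({"keywords": keywords, "location": location})
    return urllib.request.Request(
        JOOBLE_ENDPOINT + urllib.parse.quote(api_key, safe=""),
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_remotive_url(skills, limit: int = 5) -> str:
    """Build a search URL from the first `limit` skills (assumed query format)."""
    query = urllib.parse.urlencode({"search": " ".join(skills[:limit])})
    return f"{REMOTIVE_ENDPOINT}?{query}"


request = build_jooble_request("python intern", "Remote", "demo-key")
print(request.get_method(), request.full_url)
print(build_remotive_url(["python", "sql"]))
```

Keeping request construction separate from sending makes the payloads easy to check in tests without network access.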

💾 Database & Persistence

InternHunt supports dual persistence schemes for local development and cloud production:


| Database | Use Case | Driver |
| --- | --- | --- |
| Neon PostgreSQL | Production (Streamlit Cloud) | psycopg2 via DATABASE_URL |
| MySQL | Local development fallback | pymysql via env vars |
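
One way to read the table above is that the presence of DATABASE_URL decides which backend is used. The sketch below shows that selection logic only and opens no connection. The MYSQL_HOST and MYSQL_DATABASE variable names and their defaults are hypothetical, since the README says only "env vars".

```python
import os


def select_backend(env=None) -> dict:
    """Pick PostgreSQL when DATABASE_URL is set, otherwise fall back to MySQL."""
    env = os.environ if env is None else env
    if env.get("DATABASE_URL"):
        return {
            "backend": "postgresql",
            "driver": "psycopg2",
            "dsn": env["DATABASE_URL"],
        }
    # Hypothetical variable names for the local MySQL fallback.
    return {
        "backend": "mysql",
        "driver": "pymysql",
        "host": env.get("MYSQL_HOST", "localhost"),
        "database": env.get("MYSQL_DATABASE", "internhunt"),
    }


print(select_backend({"DATABASE_URL": "postgresql://user:password@host/dbname"}))
print(select_backend({}))
```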

Logged User Registry Schema:

CREATE TABLE IF NOT EXISTS user_data (
    ID SERIAL PRIMARY KEY,
    Name VARCHAR(500) NOT NULL,
    Email_ID VARCHAR(500) NOT NULL,
    resume_score VARCHAR(8) NOT NULL,
    Timestamp VARCHAR(50) NOT NULL,
    Page_no VARCHAR(5) NOT NULL,
    Predicted_Field TEXT NOT NULL,
    User_level TEXT NOT NULL,
    Actual_skills TEXT NOT NULL,
    Recommended_skills TEXT NOT NULL,
    Recommended_courses TEXT NOT NULL
);
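
A row for this table can be written with a parameterized INSERT. The sketch below builds the statement and its parameter tuple using %s placeholders, which both psycopg2 and pymysql accept; the helper name and the choice to stringify every value are assumptions for illustration. ID is omitted because SERIAL fills it in.

```python
# Columns from the schema above, excluding the auto-generated ID.
USER_DATA_COLUMNS = (
    "Name", "Email_ID", "resume_score", "Timestamp", "Page_no",
    "Predicted_Field", "User_level", "Actual_skills",
    "Recommended_skills", "Recommended_courses",
)


def build_insert(record: dict):
    """Return (sql, params) for inserting one user_data row."""
    missing = [col for col in USER_DATA_COLUMNS if col not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    placeholders = ", ".join(["%s"] * len(USER_DATA_COLUMNS))
    columns = ", ".join(USER_DATA_COLUMNS)
    sql = f"INSERT INTO user_data ({columns}) VALUES ({placeholders})"
    # Every non-ID column is VARCHAR or TEXT, so values are stored as strings.
    params = tuple(str(record[col]) for col in USER_DATA_COLUMNS)
    return sql, params


# Hypothetical record, for illustration only.
row = {col: "n/a" for col in USER_DATA_COLUMNS}
row.update({"Name": "Jane Doe", "resume_score": 75.5})
sql, params = build_insert(row)
print(sql)
print(params)
```

The pair would then be passed to a DB-API cursor as cursor.execute(sql, params), which keeps user-supplied values out of the SQL string.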

🚀 Installation

Prerequisites

  • Python 3.9 or higher
  • pip package manager
  • Google Gemini API key

Steps

  1. Clone the repository
git clone https://github.com/ShubhamSnSharma/InternHunt.git
cd InternHunt
  2. Create a virtual environment
python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Download NLTK data (required for NLP)
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
  5. Set up environment variables: create a .env file in the root directory.
# Google Gemini API
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-flash

# Neon Database (PostgreSQL)
DATABASE_URL=postgresql://user:password@host.neon.tech/dbname?sslmode=require
  6. Run the application
streamlit run App.py

The app will open in your browser at http://localhost:8501 🎉
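
If the app fails at startup, a missing Gemini key is a likely cause. The sketch below shows one way the Gemini settings could be validated, assuming the .env values have already been loaded into the process environment; the Settings class and load_settings function are hypothetical and are not the contents of config.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model: str


def load_settings(env=None) -> Settings:
    """Read the Gemini settings, failing early if the API key is absent."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    # Default mirrors the GEMINI_MODEL value shown in the .env example.
    model = env.get("GEMINI_MODEL", "gemini-1.5-flash")
    return Settings(gemini_api_key=api_key, gemini_model=model)


settings = load_settings({"GEMINI_API_KEY": "your_gemini_api_key_here"})
print(settings.gemini_model)  # gemini-1.5-flash
```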


📁 Project Structure

InternHunt/
├── 📄 App.py                                   # Main Streamlit application entry point
├── 🎨 styles.py                                # Centralized UI styling and themes
├── 🤖 chat_service.py                          # Gemini AI chatbot service
├── 📝 resume_parser.py                         # Resume parsing & NLP analysis
├── ⚙️ config.py                                # Configuration management
├── 🛠️ utils.py                                 # Utility functions
├── 💾 database.py                              # Neon PostgreSQL database operations
├── 🌐 api_services.py                          # External API integrations (Jooble)
├── 🔍 job_scrapers.py                          # Job scraping (Internshala)
├── ⚠️ error_handler.py                         # Error handling & logging
├── 📚 Courses.py                               # Course recommendation engine
│
├── 🤖 resume_classifier_v3_skills_mlp.pkl      # Upgraded ML model (TF-IDF + MLP, 10.1 MB)
├── ⚙️ soft_skill_role_trainer.py               # Local model training script
├── 📊 UpdatedResumeDataSet.csv                 # Training dataset (166 deduplicated unique samples)
├── 📓 ResumeClassification_Model.ipynb         # Exploration model notebook
│
├── 📋 requirements.txt                         # Python dependencies
├── 📖 README.md                                # Project documentation
├── 📜 LICENSE                                  # MIT License
├── 🔒 PRIVACY.md                               # Privacy policy
├── 🔐 .env.example                             # Environment variables template
├── 🚫 .gitignore                               # Git ignore rules
│
├── 📁 .streamlit/                              # Streamlit configuration
│   ├── config.toml                             # App configuration
│   └── secrets.toml.example                    # Secrets template
│
├── 🔤 nevera_font/                             # Custom Nevera font files
│   ├── Nevera-Bold.ttf
│   ├── Nevera-Regular.ttf
│   └── Nevera-Light.ttf
│
├── 📂 Uploaded_Resumes/                        # User uploaded resume storage
│   └── .gitkeep                                # Preserve directory in Git
│
└── 📁 screenshots/                             # Application screenshots for README

🧪 Development & Testing

Run Model Training (Optional)

# Retrain the classifier with new data
python soft_skill_role_trainer.py

Check Gemini Connection

python -c "from chat_service import check_gemini_health; print(check_gemini_health())"

Linting & Formatting

# If you have ruff/black configured
ruff check .
black .

🀝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch β€” git checkout -b feature/amazing-feature
  3. Commit changes β€” git commit -m 'Add amazing feature'
  4. Push to branch β€” git push origin feature/amazing-feature
  5. Open a Pull Request

Ideas for Contribution Ideas:

  • Add new job sources (LinkedIn, Indeed, Wellfound)
  • Improve resume parsing for DOCX/images
  • Add more ML roles or fine-tune the classifier
  • Enhance UI/UX with new visualizations
  • Write tests for core modules

👨‍💻 Author

Shubham Sharma


🙏 Acknowledgments

  • Google Gemini: Conversational AI capabilities
  • Streamlit: Beautiful web framework for data apps
  • Internshala: Internship listings for Indian students
  • Remotive: Free remote job API
  • Jooble: Global job search API
  • scikit-learn: Machine learning toolkit
  • spaCy: Industrial-strength NLP
  • All open-source contributors ❤️

⭐ Star this repo if you found it helpful!

Made with ❤️ by students, for students

About

Resume analysis and internship recommendation platform that helps students identify opportunities, skill gaps, and career growth pathways.
