AI-Powered Resume Analysis & Internship Matching Platform
Streamline your internship search using NLP, Machine Learning, and Real-Time Web Scraping.
Quick Links:
Landing Page (Vercel) • Streamlit Web App
💡 Want to see it in action? Check out the Live Demo!
Beautiful Figma-designed landing page hosted on Vercel
Core interactive app landing page on Streamlit Cloud
Smart resume parsing and classification
Powered by Google Gemini for personalized career guidance
Tailored learning paths based on your profile
Real-time internship opportunities matching your skills (collected from multiple platforms such as Internshala, Remotive, and Jooble)
User management and analytics with premium Plotly interactive visualizations
| Stage | What Happens |
|---|---|
| 1. Upload & Parse | PDF/DOCX support (50MB limit), text extraction via pypdf + python-docx |
| 2. Skill Detection | 100+ technical skills via spaCy NLP + fuzzy matching (e.g., k8s → Kubernetes) |
| 3. Role Prediction | MLP Neural Network (TF-IDF → 128/64 hidden layers) → Top 3 roles with confidence |
| 4. ATS Scoring | 5-factor breakdown: Content (50%), Formatting (15%), Keywords (20%), Experience (10%), Readability (5%) |
| 5. Recommendations | Live internships (Jooble, Internshala, Remotive, GitHub) + curated courses |
| 6. AI Assistant | Gemini 1.5 Flash with full resume context for personalized career coaching |
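
The ATS Scoring stage combines five factors with fixed weights. A minimal sketch of that weighted sum follows, assuming each factor has already been scored on a 0-100 scale; the function name `ats_score` and the input format are illustrative and not taken from the project code.

```python
# Weights copied from the ATS Scoring row of the table; they sum to 1.0.
ATS_WEIGHTS = {
    "content": 0.50,
    "formatting": 0.15,
    "keywords": 0.20,
    "experience": 0.10,
    "readability": 0.05,
}


def ats_score(factor_scores: dict[str, float]) -> float:
    """Combine per-factor scores (assumed 0-100 each) into one 0-100 score."""
    missing = set(ATS_WEIGHTS) - set(factor_scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    total = sum(ATS_WEIGHTS[name] * factor_scores[name] for name in ATS_WEIGHTS)
    return round(total, 2)


# Hypothetical factor scores for one resume.
example = {
    "content": 80,
    "formatting": 60,
    "keywords": 70,
    "experience": 50,
    "readability": 90,
}
print(ats_score(example))
```

Because Content carries half the weight, a weak content score lowers the total more than any other factor.
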
InternHunt handles each resume upload as a fixed sequence of steps: it parses the document, computes scores, logs user details to the database, queries external job sources, and initializes the chatbot context:
- Extraction & Preprocessing: The uploaded document is parsed, stripped of raw formatting, and cleaned of non-ASCII symbols and excessive spacing.
- Parallel Feature Extraction:
  - Regex filters extract email, phone, and profile URLs.
  - spaCy matcher extracts normalized skill sets.
  - TF-IDF Vectorizer maps the cleaned text to 2500 terms, which are classified by the MLPClassifier to predict the candidate's career role.
- ATS Assessment: Core sections are checked, and missing skills are highlighted by matching actual skills against predicted career role profiles.
- Data Persistence: Candidate profiles and computed stats are logged to Neon serverless PostgreSQL for admin audit and analytics.
- Recommendation & Assistant Routing: Matched courses (from Courses.py) and job listings are displayed, and a detailed profile is built and loaded into the Google Gemini system instructions so the sidebar assistant chatbot can answer resume-specific questions.
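
The preprocessing, regex extraction, and missing-skill steps above can be sketched with the standard library alone. The patterns and function names below are assumptions for illustration; the project's actual expressions in resume_parser.py may differ.

```python
import re

# Illustrative patterns, not the project's own.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")
PROFILE_RE = re.compile(
    r"(?:https?://)?(?:www\.)?(?:linkedin\.com|github\.com)/[\w\-./]+",
    re.IGNORECASE,
)


def clean_text(raw: str) -> str:
    """Drop non-ASCII characters and collapse runs of whitespace."""
    ascii_only = raw.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_only).strip()


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Pull email addresses, phone numbers, and profile URLs from cleaned text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [match.strip() for match in PHONE_RE.findall(text)],
        "profiles": PROFILE_RE.findall(text),
    }


def missing_skills(actual: set[str], role_profile: set[str]) -> set[str]:
    """Skills expected for the predicted role that the resume does not list."""
    return role_profile - actual


raw = "Jane  Doe\u2022 jane.doe@example.com\n+91 98765 43210   github.com/janedoe"
cleaned = clean_text(raw)
contacts = extract_contacts(cleaned)
print(contacts)
print(missing_skills({"python", "sql"}, {"python", "docker", "sql"}))
```
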
InternHunt uses a custom-trained Multi-Layer Perceptron (MLP) neural network classifier to assign each resume to one of 25 job roles.
- Algorithm: Multi-Layer Perceptron Classifier (scikit-learn)
- Vectorization: TF-IDF (Term Frequency-Inverse Document Frequency)
- Pipeline: TfidfVectorizer(ngram_range=(1, 2), max_features=2500) → MLPClassifier(128, 64 hidden nodes)
- File: resume_classifier_v3_skills_mlp.pkl (10.1 MB)
- Training Data: UpdatedResumeDataSet.csv (166 unique deduplicated samples to prevent training bias)
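
The Role Prediction stage reports the top 3 roles with confidence. One way to derive that ranking from class probabilities is sketched below. With a fitted scikit-learn pipeline, the inputs would typically be `pipeline.classes_` and `pipeline.predict_proba([text])[0]`; the helper `top_k_roles` and the role names are hypothetical.

```python
def top_k_roles(labels, probabilities, k=3):
    """Return the k most probable roles as (label, confidence) pairs."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    ranked = sorted(
        zip(labels, probabilities),
        key=lambda pair: pair[1],
        reverse=True,  # highest confidence first
    )
    return [(label, round(prob, 4)) for label, prob in ranked[:k]]


# Hypothetical class labels and probabilities for one resume.
roles = ["Data Science", "Web Development", "DevOps", "Testing"]
probs = [0.10, 0.60, 0.25, 0.05]
print(top_k_roles(roles, probs))
```
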
| Metric | Score |
|---|---|
| Test Accuracy | 85.29% |
| Precision | 88.2% (weighted avg) |
| Recall | 85.3% (weighted avg) |
| F1-Score | 81.9% (weighted avg) |
| Cross-Validation | 81.69% ± 0.85% (3-fold Stratified) |
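from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

Pipeline([
    ('tfidf', TfidfVectorizer(
        max_features=2500,       # Vocabulary size limit
        ngram_range=(1, 2),      # Unigrams & bigrams
        min_df=1,                # Keep terms that appear in at least one document
        max_df=0.95,             # Drop terms present in more than 95% of documents
        stop_words='english',
        lowercase=True
    )),
    ('classifier', MLPClassifier(
        hidden_layer_sizes=(128, 64),
        activation='relu',
        solver='adam',
        alpha=0.1,               # L2 regularization strength
        learning_rate_init=0.001,
        max_iter=1000,
        early_stopping=False,
        random_state=42          # Fixed seed for reproducible training
    ))
])
InternHunt combines APIs and custom scrapers for comprehensive coverage: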
| Source | Type | Coverage | Key Details |
|---|---|---|---|
| Internshala | HTML Scraper | India-focused internships | Keyword-based URLs, parses stipend, duration, apply links |
| Remotive | REST API | Global remote dev jobs | Free public endpoint, filtered by top 5 skills |
| Jooble | REST API | Global job search | Requires JOOBLE_API_KEY, POST with keywords + location |
| GitHub | HTML Scraper | Hiring/Internship repos | Searches repos with topic:hiring or topic:internship |
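
The Jooble row describes a POST carrying keywords and a location. A hedged sketch of building such a request with the standard library follows; it only constructs the request and does not send it. The base URL reflects Jooble's public REST API as commonly documented and should be treated as an assumption, as should the function name `build_jooble_request`.

```python
import json
import urllib.request

# Assumed base URL; the API key is appended as the final path segment.
JOOBLE_ENDPOINT = "https://jooble.org/api/"


def build_jooble_request(api_key, keywords, location):
    """Build (but do not send) a POST request with keywords and location."""
    payload = json.dumps(
        {"keywords": " ".join(keywords), "location": location}
    ).encode("utf-8")
    return urllib.request.Request(
        JOOBLE_ENDPOINT + api_key,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "demo-key" stands in for the value of JOOBLE_API_KEY.
request = build_jooble_request("demo-key", ["python", "sql"], "Remote")
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call; keeping construction separate makes the payload easy to inspect without network access.
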
InternHunt supports two persistence backends, one for cloud production and one for local development:
| Database | Use Case | Driver |
|---|---|---|
| Neon PostgreSQL | Production (Streamlit Cloud) | psycopg2 via DATABASE_URL |
| MySQL | Local development fallback | pymysql via env vars |
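
A minimal sketch of how the backend choice in the table could be made at startup, assuming the presence of DATABASE_URL is what selects PostgreSQL and that MySQL is used otherwise. The function `select_backend` and its return shape are hypothetical; database.py may decide differently.

```python
import os


def select_backend(env=None):
    """Pick the persistence backend from environment variables."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL", "").strip()
    if url:
        # Production path: Neon PostgreSQL reached through psycopg2.
        return {"backend": "postgresql", "driver": "psycopg2", "dsn": url}
    # Fallback path: local MySQL reached through pymysql.
    return {"backend": "mysql", "driver": "pymysql", "dsn": None}


print(select_backend({"DATABASE_URL": "postgresql://user:password@host/dbname"}))
print(select_backend({}))
```
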
CREATE TABLE IF NOT EXISTS user_data (
ID SERIAL PRIMARY KEY,
Name VARCHAR(500) NOT NULL,
Email_ID VARCHAR(500) NOT NULL,
resume_score VARCHAR(8) NOT NULL,
Timestamp VARCHAR(50) NOT NULL,
Page_no VARCHAR(5) NOT NULL,
Predicted_Field TEXT NOT NULL,
User_level TEXT NOT NULL,
Actual_skills TEXT NOT NULL,
Recommended_skills TEXT NOT NULL,
Recommended_courses TEXT NOT NULL
);
- Python 3.9 or higher
- pip package manager
- Google Gemini API key (Get one here)
- Clone the repository
git clone https://github.com/ShubhamSnSharma/InternHunt.git
cd InternHunt
- Create virtual environment
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Download NLTK data (required for NLP)
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
- Set up environment variables
Create a .env file in the root directory:
# Google Gemini API
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-flash
# Neon Database (PostgreSQL)
DATABASE_URL=postgresql://user:password@host.neon.tech/dbname?sslmode=require
- Run the application
streamlit run App.py
The app will open in your browser at http://localhost:8501
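
The variables from the .env file can be validated once at startup. The sketch below assumes they are already present in the process environment (for example, loaded by a helper such as python-dotenv); the `Settings` class and `load_settings` function are illustrative and are not taken from config.py.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model: str
    database_url: Optional[str]


def load_settings(env=None) -> Settings:
    """Read settings from the environment and fail early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the placeholder from the template as "not configured".
    if not api_key or api_key == "your_gemini_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return Settings(
        gemini_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-1.5-flash"),
        database_url=env.get("DATABASE_URL") or None,
    )


settings = load_settings({"GEMINI_API_KEY": "example-key"})
print(settings.gemini_model)
```
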
InternHunt/
├── App.py                                 # Main Streamlit application entry point
├── styles.py                              # Centralized UI styling and themes
├── chat_service.py                        # Gemini AI chatbot service
├── resume_parser.py                       # Resume parsing & NLP analysis
├── config.py                              # Configuration management
├── utils.py                               # Utility functions
├── database.py                            # Neon PostgreSQL database operations
├── api_services.py                        # External API integrations (Jooble)
├── job_scrapers.py                        # Job scraping (Internshala)
├── error_handler.py                       # Error handling & logging
├── Courses.py                             # Course recommendation engine
│
├── resume_classifier_v3_skills_mlp.pkl    # Upgraded ML model (TF-IDF + MLP, 10.1 MB)
├── soft_skill_role_trainer.py             # Local model training script
├── UpdatedResumeDataSet.csv               # Training dataset (166 deduplicated unique samples)
├── ResumeClassification_Model.ipynb       # Exploration model notebook
│
├── requirements.txt                       # Python dependencies
├── README.md                              # Project documentation
├── LICENSE                                # MIT License
├── PRIVACY.md                             # Privacy policy
├── .env.example                           # Environment variables template
├── .gitignore                             # Git ignore rules
│
├── .streamlit/                            # Streamlit configuration
│   ├── config.toml                        # App configuration
│   └── secrets.toml.example               # Secrets template
│
├── nevera_font/                           # Custom Nevera font files
│   ├── Nevera-Bold.ttf
│   ├── Nevera-Regular.ttf
│   └── Nevera-Light.ttf
│
├── Uploaded_Resumes/                      # User uploaded resume storage
│   └── .gitkeep                           # Preserve directory in Git
│
└── screenshots/                           # Application screenshots for README
# Retrain the classifier with new data
python soft_skill_role_trainer.py
python -c "from chat_service import check_gemini_health; print(check_gemini_health())"
# If you have ruff/black configured
ruff check .
black .
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
- Add new job sources (LinkedIn, Indeed, Wellfound)
- Improve resume parsing for DOCX/images
- Add more ML roles or fine-tune the classifier
- Enhance UI/UX with new visualizations
- Write tests for core modules
Shubham Sharma
- GitHub: @ShubhamSnSharma
- Project: InternHunt
- Google Gemini: Conversational AI capabilities
- Streamlit: Beautiful web framework for data apps
- Internshala: Internship listings for Indian students
- Remotive: Free remote job API
- Jooble: Global job search API
- scikit-learn: Machine learning toolkit
- spaCy: Industrial-strength NLP
- All open-source contributors ❤️
⭐ Star this repo if you found it helpful!
Made with ❤️ by students, for students
