create2000/Job-scraper

Job Scraper Platform

A Node.js/Express backend that scrapes job listings from Indeed, RemoteOK, and WeWorkRemotely and provides AI-powered resume analysis and matching. It is built for job seekers who want to find opportunities and optimize their applications.

🚀 Features

Core Features

  • ✅ Job Scraping - Automated scraping from Indeed, RemoteOK, and WeWorkRemotely
  • ✅ Resume Upload & Parsing - PDF/DOCX support with text extraction
  • ✅ AI Resume Analysis - Match resumes against job descriptions (OpenAI, Gemini, Cohere, n8n)
  • ✅ Subscription System - Free and Pro plans with credit management
  • ✅ Payment Integration - Paystack integration for Pro upgrades
  • ✅ Admin Dashboard - User management, stats, and manual scraping triggers

Phase 1 Features (NEW! ✨)

  • ✅ Apply to Jobs - Apply with or without resume analysis
  • ✅ Application Tracking - View application history and status
  • ✅ Saved Jobs - Bookmark jobs for later
  • ✅ Password Recovery - Forgot/reset password via email
  • ✅ Google OAuth - One-click login with Google
  • ✅ Email Notifications - Welcome emails, password reset, application confirmations

📖 View Phase 1 Implementation Details

πŸ› οΈ Tech Stack

  • Runtime: Node.js v20
  • Framework: Express.js
  • Database: PostgreSQL
  • Authentication: JWT + Passport.js (Google OAuth)
  • AI Services: OpenAI, Google Gemini, Cohere
  • Scraping: Playwright
  • Email: Nodemailer
  • Payment: Paystack

📦 Installation

1. Clone Repository

git clone <repository-url>
cd indeed-job-scraper

2. Install Dependencies

npm install

3. Setup Environment Variables

Copy .env.example to .env and fill in your credentials:

cp .env.example .env

Required variables:

  • Database credentials (PostgreSQL)
  • JWT secret
  • AI API keys (OpenAI, Gemini, Cohere)
  • Google OAuth credentials (for Phase 1)
  • Email configuration (for Phase 1)
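A startup check can catch a missing variable before the server fails at first use. The sketch below is one way to do that. The variable names (`DATABASE_URL`, `JWT_SECRET`, `OPENAI_API_KEY`) are placeholders chosen for illustration, not names taken from this project, so substitute the keys listed in `.env.example`.

```javascript
// Hypothetical variable names for illustration; use the keys from .env.example.
const REQUIRED_ENV = ['DATABASE_URL', 'JWT_SECRET', 'OPENAI_API_KEY'];

// Return the names in `required` that are missing or blank in `env`.
function findMissingEnv(env, required = REQUIRED_ENV) {
  return required.filter(
    (name) => !env[name] || String(env[name]).trim() === ''
  );
}

// Throw early with a readable message instead of failing later at first use.
function assertEnv(env = process.env, required = REQUIRED_ENV) {
  const missing = findMissingEnv(env, required);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(', ')}`
    );
  }
}
```

Calling `assertEnv()` once near the top of the server entry point, after the `.env` file has been loaded, would be a natural place for this check.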

4. Run Database Migrations

# Run Phase 1 migrations
node db/phase1_migrations.js

# Verify database schema
node check_db.js

5. Start Server

# Development mode
npm run dev

# Production mode
npm start

Server runs on: http://localhost:4000

📚 Documentation

API Endpoints

Authentication

POST   /auth/register              - Register new user
POST   /auth/login                 - Login with email/password
GET    /auth/profile               - Get user profile (protected)
POST   /auth/forgot-password       - Request password reset
POST   /auth/reset-password/:token - Reset password with token
POST   /auth/change-password       - Change password (protected)
GET    /auth/google                - Google OAuth login
GET    /auth/google/callback       - Google OAuth callback

Jobs

GET    /jobs                       - List all jobs (with filters)
GET    /jobs/:id                   - Get job details
POST   /jobs/:id/parse             - Parse job description
POST   /jobs/:id/scrape-detail     - Scrape job details
POST   /jobs/:id/analyze-resume    - Analyze resume for job (protected)
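The filters accepted by `GET /jobs` are not listed here, so the following is only a sketch of how a client might build the request URL. The filter names used in the usage note (`q`, `remote`, `location`) are hypothetical. Check `jobs.controller.js` for the parameters the endpoint actually reads.

```javascript
// Build a /jobs URL from a filter object, skipping empty values.
// The filter keys are supplied by the caller; none are confirmed by this README.
function buildJobsUrl(baseUrl, filters = {}) {
  const url = new URL('/jobs', baseUrl);
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined && value !== null && value !== '') {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}
```

For example, `buildJobsUrl('http://localhost:4000', { q: 'node developer', remote: true, location: '' })` returns `http://localhost:4000/jobs?q=node+developer&remote=true`. The empty `location` value is dropped.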

Applications (Phase 1)

POST   /applications/jobs/:id/apply        - Apply to job (protected)
GET    /applications                       - Get application history (protected)
GET    /applications/:id                   - Get application details (protected)
PUT    /applications/:id/withdraw          - Withdraw application (protected)
GET    /applications/jobs/:id/status       - Check application status (protected)

Saved Jobs (Phase 1)

POST   /saved-jobs/jobs/:id/save           - Save job (protected)
DELETE /saved-jobs/jobs/:id/save           - Unsave job (protected)
GET    /saved-jobs                         - Get saved jobs (protected)
GET    /saved-jobs/jobs/:id/status         - Check if saved (protected)

Resumes

GET    /resumes                    - List user resumes (protected)
POST   /resumes/upload             - Upload resume (protected)
POST   /resumes/:id/parse          - Parse resume (protected)
POST   /resumes/export             - Export resume (protected)

Admin

GET    /admin/stats                - Dashboard stats (admin only)
GET    /admin/users                - List users (admin only)
POST   /admin/users/credits        - Update user credits (admin only)
POST   /admin/scrape               - Trigger manual scrape (admin only)
GET    /admin/audit-logs           - View audit logs (admin only)

Payment

POST   /payment/initialize         - Initialize payment (protected)
GET    /payment/verify/:reference  - Verify payment

🧪 Testing

See TESTING_GUIDE_PHASE1.md for detailed testing instructions.

Quick test:

# Get authentication token
curl -X POST http://localhost:4000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"user@example.com","password":"password123"}'

# Apply to a job
curl -X POST http://localhost:4000/applications/jobs/{jobId}/apply \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{"resumeId":"resume-uuid","coverLetter":"I am interested..."}'
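The same two-step flow can be scripted in Node.js 20 with the built-in `fetch`. This is a minimal sketch that makes one assumption: the login response body is taken to contain a `token` field, which this README does not confirm. The `fetchImpl` parameter is there only so that the flow can be exercised without a running server.

```javascript
const BASE_URL = 'http://localhost:4000';

// Build fetch options for a JSON POST, adding a Bearer token when one is given.
function jsonPost(body, token) {
  const headers = { 'Content-Type': 'application/json' };
  if (token) headers.Authorization = `Bearer ${token}`;
  return { method: 'POST', headers, body: JSON.stringify(body) };
}

// Log in, then apply to a job with the returned token.
async function loginAndApply(
  { email, password, jobId, resumeId, coverLetter },
  fetchImpl = fetch
) {
  const loginRes = await fetchImpl(
    `${BASE_URL}/auth/login`,
    jsonPost({ email, password })
  );
  if (!loginRes.ok) {
    throw new Error(`Login failed with status ${loginRes.status}`);
  }
  // ASSUMPTION: the login response body has a `token` field.
  const { token } = await loginRes.json();

  const applyRes = await fetchImpl(
    `${BASE_URL}/applications/jobs/${encodeURIComponent(jobId)}/apply`,
    jsonPost({ resumeId, coverLetter }, token)
  );
  if (!applyRes.ok) {
    throw new Error(`Apply failed with status ${applyRes.status}`);
  }
  return applyRes.json();
}
```

Throwing on a non-2xx status keeps a failed login from turning into a confusing error on the apply request.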

πŸ—‚οΈ Project Structure

indeed-job-scraper/
├── db/                           # Database scripts
│   ├── phase1_migrations.js      # Phase 1 database migrations
│   ├── migrations.js             # Original migrations
│   └── saveJobs.js               # Job saving logic
├── scraper/                      # Scraping modules
│   ├── indeed.js
│   ├── remoteok.js
│   └── weworkremotely.js
├── src/
│   ├── config/
│   │   └── passport.js           # Passport OAuth config
│   ├── controllers/
│   │   ├── auth.controller.js    # Authentication + password reset
│   │   ├── jobs.controller.js
│   │   ├── resumes.controller.js
│   │   ├── applications.controller.js  # NEW: Phase 1
│   │   ├── savedJobs.controller.js     # NEW: Phase 1
│   │   ├── analysis.controller.js
│   │   ├── payment.controller.js
│   │   └── admin.controller.js
│   ├── middlewares/
│   │   └── auth.middleware.js    # JWT verification, credit checks
│   ├── routes/
│   │   ├── auth.routes.js        # Auth + OAuth routes
│   │   ├── jobs.routes.js
│   │   ├── resumes.routes.js
│   │   ├── applications.routes.js       # NEW: Phase 1
│   │   ├── savedJobs.routes.js          # NEW: Phase 1
│   │   ├── payment.routes.js
│   │   └── admin.routes.js
│   ├── services/
│   │   ├── aiAnalyzer.service.js
│   │   ├── jobParser.service.js
│   │   ├── jobDetailScraper.service.js
│   │   ├── resumeExport.service.js
│   │   └── email.service.js              # NEW: Phase 1
│   ├── app.js                    # Express app configuration
│   └── server.js                 # Server entry point
├── .env                          # Environment variables (not in repo)
├── .env.example                  # Environment template
├── package.json
└── README.md
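The tree lists `auth.middleware.js` as handling JWT verification and credit checks. As a rough illustration of what a credit check can look like in Express middleware form, here is a sketch under stated assumptions. The `req.user` shape (`plan`, `credits`), the rule that Pro users skip the check, and the error messages are all hypothetical and not taken from the project's code.

```javascript
// Express-style middleware factory: let the request through only if the
// user can afford `cost` credits.
// ASSUMPTION: an earlier JWT middleware has set req.user = { plan, credits }.
function requireCredits(cost = 1) {
  return function creditCheck(req, res, next) {
    const user = req.user;
    if (!user) {
      return res.status(401).json({ error: 'Authentication required' });
    }
    // ASSUMPTION: Pro plans are not limited by credits.
    if (user.plan === 'pro') return next();
    if ((user.credits ?? 0) < cost) {
      return res.status(402).json({ error: 'Not enough credits' });
    }
    return next();
  };
}
```

A route such as `POST /jobs/:id/analyze-resume` could then be registered with `requireCredits(1)` placed after the JWT middleware, so that the check only runs for authenticated users.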

🔄 Job Scraping

Run the scraper manually:

node runner.js

This will:

  1. Scrape jobs from Indeed, RemoteOK, and WeWorkRemotely
  2. Deduplicate results
  3. Save to PostgreSQL database
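The deduplication step above can be sketched as a keyed filter. How `runner.js` actually identifies duplicates is not described here, so the key used below (the job URL when present, otherwise title, company, and location) and the field names are assumptions for illustration only.

```javascript
// Normalize a field so differences in case or spacing do not defeat matching.
function normalize(value) {
  return String(value ?? '').trim().toLowerCase().replace(/\s+/g, ' ');
}

// ASSUMPTION: a job is identified by its URL when present,
// otherwise by title + company + location.
function jobKey(job) {
  if (job.url) return `url:${normalize(job.url)}`;
  return [
    'tcl',
    normalize(job.title),
    normalize(job.company),
    normalize(job.location),
  ].join('|');
}

// Keep the first occurrence of each key, preserving input order.
function dedupeJobs(jobs) {
  const seen = new Set();
  const unique = [];
  for (const job of jobs) {
    const key = jobKey(job);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(job);
    }
  }
  return unique;
}
```

Keeping the first occurrence means the order in which the sources are scraped decides which copy of a cross-posted job survives.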

🎯 Roadmap

✅ Phase 1 (COMPLETE)

  • Apply to jobs
  • Saved jobs
  • Password reset
  • Google OAuth
  • Email notifications

🚧 Phase 2 (Next)

  • Employer job posting
  • Employer dashboard
  • View job applicants
  • Application status updates

📋 Phase 3 (Planned)

  • Job alerts
  • Enhanced notifications
  • Profile management
  • Resume management

🎨 Phase 4 (Future)

  • Advanced search & filters
  • Job recommendations
  • Cover letter AI generator
  • Interview preparation

See FEATURE_GAP_ANALYSIS.md for complete roadmap.

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open Pull Request

📝 License

ISC

👤 Author

Your Name

🙏 Acknowledgments

  • Job boards: Indeed, RemoteOK, WeWorkRemotely
  • AI providers: OpenAI, Google Gemini, Cohere
  • Payment: Paystack
