This is a Node.js backend written in TypeScript for crawling, extracting, and monitoring online courses from the courseopera website. It scrapes course titles and URLs by rendering each page with Puppeteer, so that JavaScript-driven content is loaded, and then parsing the resulting HTML with Cheerio. It also sends email notifications and can be triggered externally through an API endpoint.
- Personal project, primarily built for automating course updates.
- Suitable for deployment on Vercel with a lightweight Chromium build (`@sparticuz/chromium`).
- Dynamic content rendering with Puppeteer (or `@sparticuz/chromium` for serverless).
- Extracts specific course titles and URLs without needing `ul` entries explicitly.
- Sends email notifications with course information via Nodemailer.
- Provides an API endpoint (`/trigger-crawl`) to trigger crawling remotely.
- Supports cross-origin requests (limited to your extension or trusted sources).
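As an illustration of the email notification feature, the sketch below turns a list of extracted courses into the fields that Nodemailer's `sendMail` accepts (`from`, `to`, `subject`, `text`). The `Course` shape, the function name `buildCourseEmail`, and the subject wording are assumptions made for this example, not the repository's actual code.

```typescript
// Hypothetical shape of one extracted course; the real project may differ.
interface Course {
  title: string;
  url: string;
}

// The subset of Nodemailer mail options used in this sketch.
interface MailMessage {
  from: string;
  to: string;
  subject: string;
  text: string;
}

// Build a plain-text notification that lists each course with its URL.
function buildCourseEmail(
  courses: Course[],
  sender: string,
  recipient: string
): MailMessage {
  const lines = courses.map(
    (course, index) => `${index + 1}. ${course.title}\n   ${course.url}`
  );
  return {
    from: sender,
    to: recipient,
    subject: `${courses.length} course(s) found`,
    text: lines.join("\n"),
  };
}
```

The returned object could then be passed to `transporter.sendMail(...)` on a transporter created with `nodemailer.createTransport`, for example with the Gmail address and app password from the environment.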
- TypeScript
- Node.js
- Puppeteer / @sparticuz/chromium (for headless Chrome)
- Cheerio (DOM parsing)
- Express.js (API server)
- Nodemailer (email notifications)
- dotenv (manage secrets)
- CORS (secure cross-origin API calls)
- Node.js >= 18
- pnpm (or npm/yarn)
- Gmail SMTP credentials (for email alerts)
Clone the repository:

    git clone https://github.com/KPorus/Auto-Trigger-course-extraction-backend.git
    cd Auto-Trigger-course-extraction-backend

Install dependencies:

    pnpm install

Configure environment variables:

    # .env file
    GMAIL=your-email@gmail.com
    APP_PASS=your-app-password
    EXTENSION_ID=your-extension-id-if-needed

Build and start the server:

    pnpm run build
    pnpm start

This starts an API server on port 4000, listening for external triggers or scheduled tasks.
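Because the server depends on these variables, it can help to validate them once at startup. The following is a minimal sketch under the assumption that `GMAIL` and `APP_PASS` are required and `EXTENSION_ID` is optional (as the "if needed" placeholder suggests). The `AppConfig` type and `loadConfig` function are hypothetical names, not part of the repository.

```typescript
// Hypothetical typed view of the variables from the .env file.
interface AppConfig {
  gmail: string;
  appPass: string;
  extensionId?: string; // assumed optional
}

// Read the variables from an env-like object and fail fast
// when a required one is missing or empty.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = ["GMAIL", "APP_PASS"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    gmail: env.GMAIL as string,
    appPass: env.APP_PASS as string,
    extensionId: env.EXTENSION_ID || undefined,
  };
}
```

After dotenv has loaded the `.env` file, this could be called as `loadConfig(process.env)`.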
- Trigger crawl manually: `POST http://localhost:4000/trigger-crawl`
- Configure your extension or scheduler to call this endpoint for automated updates.
- Monitor output logs for course extraction and email sent status.
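A scheduler or script could call the endpoint along the lines of the sketch below. The function names and the `PostFn` type are hypothetical; the HTTP client is passed in as a parameter so the sketch does not assume a particular library, and since the response body of `/trigger-crawl` is not documented here, only the status code is returned.

```typescript
// Minimal description of an HTTP client; Node 18's global fetch fits it.
type PostFn = (
  url: string,
  init: { method: "POST" }
) => Promise<{ status: number }>;

// Join the server's base URL with the documented endpoint path.
function triggerUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/trigger-crawl";
}

// Send the POST request and resolve with the HTTP status code.
async function triggerCrawl(baseUrl: string, post: PostFn): Promise<number> {
  const response = await post(triggerUrl(baseUrl), { method: "POST" });
  return response.status;
}
```

On Node.js 18 or later, `triggerCrawl("http://localhost:4000", fetch)` would issue the request with the built-in `fetch`.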
- Use `@sparticuz/chromium` for Chromium in serverless.
- Prepare `vercel.json`:

      {
        "version": 2,
        "builds": [
          { "src": "index.ts", "use": "@vercel/node" }
        ],
        "routes": [
          { "src": "/(.*)", "dest": "index.ts" }
        ]
      }

- Bundle with the regular production build process.
- Use environment variables to keep secrets out of the source code.
- Configure CORS for trusted origins/extensions.
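One way to restrict cross-origin access is an explicit allow-list check on the request's `Origin` header. The sketch below assumes the caller is a Chrome extension, whose requests carry an origin of the form `chrome-extension://<extension id>`. The function name `isAllowedOrigin` and the `trusted` parameter are hypothetical.

```typescript
// Decide whether a request origin may receive CORS headers.
// Assumption: the extension is a Chrome extension, so its origin is
// "chrome-extension://" followed by the extension ID.
function isAllowedOrigin(
  origin: string | undefined,
  extensionId?: string,
  trusted: string[] = []
): boolean {
  // Requests without an Origin header (e.g. server-side callers) get no CORS approval.
  if (!origin) return false;
  if (extensionId && origin === `chrome-extension://${extensionId}`) {
    return true;
  }
  // Otherwise only exact matches against the trusted list are accepted.
  return trusted.includes(origin);
}
```

With the `cors` package, a check like this could back the `origin` option, for example `origin: (origin, callback) => callback(null, isAllowedOrigin(origin, process.env.EXTENSION_ID))`. CORS is enforced by browsers only, so it does not replace authentication for non-browser clients.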
This repo is optimized for personal automation tasks:
- Focuses solely on courseopera.com, but can be extended.
- You can modify selectors without affecting core logic.
- Great for personal tracking or learning projects.
MIT License — feel free to fork, modify, and deploy your own version!
This repository automates course monitoring from a specific JavaScript-heavy website, ideal for personal use, educational projects, or lightweight automation. For larger-scale or production use, consider additional error handling, scheduling, and secure environment management.