This project provides a tool to scrape Thai school information from Wikipedia and serve it via an API. It is built with Bun, ElysiaJS, and Cheerio.
- Live Scraping: Fetches up-to-date school data directly from Wikipedia.
- REST API: Serves school data via high-performance ElysiaJS endpoints.
- Data Export: Script to scrape and save all school data to JSON and CSV files.
Prerequisites:

- Bun runtime installed.
Install dependencies:

```sh
bun install
```

Before running the API, you must scrape the data from Wikipedia. This will generate the necessary data files in the `dist/` directory.
```sh
bun run scrape
```

Output:
- JSON (a record-shape sketch follows this list):
  - `dist/json/schools.json`: Combined list of all schools (pretty-printed).
  - `dist/json/schools.min.json`: Combined list of all schools (minified).
  - `dist/json/provinces/[province].json`: Individual JSON file for each province.
- CSV:
  - `dist/csv/schools.csv`: Combined list of all schools.
  - `dist/csv/provinces/[province].csv`: Individual CSV file for each province.
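The README does not pin down the record schema, so the following TypeScript type is only a hypothetical sketch of what one entry in `schools.json` might look like; every field name here is an assumption, not the project's actual schema:

```ts
// Hypothetical shape of one record in dist/json/schools.json.
// Field names are illustrative assumptions, not the real schema.
interface School {
  name: string;     // school name as scraped from Wikipedia
  province: string; // Thai province the school belongs to
}

// The combined files would then hold an array of such records:
type Schools = School[];
```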
Start the development server:
```sh
bun dev
```

The server will be running at http://localhost:3000.
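To sanity-check that the server is up, you can hit it from another terminal. This is a minimal sketch using Bun's global `fetch`, assuming the default port:

```ts
// Ping the root route of the dev server (assumes the default port 3000).
const res = await fetch("http://localhost:3000/");
console.log(res.status, await res.text());
```

Save it as, say, `ping.ts` and run it with `bun ping.ts`.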
- `GET /schools`: Retrieve a list of schools.
  - Query parameters:
    - `q`: Search by school name (optional).
    - `province`: Filter by province (optional).
  - Example: `GET /schools?province=ภูเก็ต` (a client-side sketch follows this list).
- `GET /`: API information.
- `GET /openapi`: OpenAPI documentation.
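Putting the `/schools` query parameters together, here is a sketch of calling the search endpoint from TypeScript. It assumes the dev server from `bun dev` is listening on the default port, and it treats the response as opaque JSON since its exact shape is not documented here:

```ts
// Query the API for schools in a province, optionally filtered by name.
const url = new URL("http://localhost:3000/schools");
url.searchParams.set("province", "ภูเก็ต"); // filter by province
// url.searchParams.set("q", "...");       // optional: search by school name

const res = await fetch(url);
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);

const schools = await res.json(); // shape depends on the scraped data
console.log(schools);
```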
- `src/index.ts`: API server entry point.
- `src/scripts/scrape.ts`: CLI script for scraping and saving data.
- `src/services/scraper.ts`: Scraper logic using Cheerio (sketched below).
- `src/constants/provinces.ts`: List of Thai provinces.
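The scraper's internals are not reproduced here, but the general Cheerio pattern behind `src/services/scraper.ts` looks roughly like this sketch. The `table.wikitable` selector and the column layout are assumptions about how such a Wikipedia page is structured, not the project's actual code:

```ts
import * as cheerio from "cheerio";

interface School {
  name: string;
  province: string;
}

// Sketch: scrape school names for one province from a Wikipedia page.
// The selector and column layout are assumptions for illustration.
async function scrapeProvince(pageUrl: string, province: string): Promise<School[]> {
  const res = await fetch(pageUrl);
  if (!res.ok) throw new Error(`Failed to fetch ${pageUrl}: ${res.status}`);

  const $ = cheerio.load(await res.text());
  const schools: School[] = [];

  // Walk every table row; rows without <td> cells are headers and are skipped.
  $("table.wikitable tr").each((_, row) => {
    const name = $(row).find("td").first().text().trim();
    if (name) schools.push({ name, province });
  });

  return schools;
}
```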
This project uses GitHub Actions to automatically update the school data.
- The workflow runs on the 1st of every 3rd month at midnight UTC.
- It executes the scraper and commits any changes to the `dist/` directory back to the repository.
- You can also manually trigger the "Update School Data" workflow from the Actions tab.