Welcome to the Wev project. This repository contains the Bulletin app, the Scraper service, and the Supabase infrastructure.

These prerequisites cannot be automated; install them first.

- **Node.js**: v22.22.2+ (the repo pins Node 22.22.2 in `.nvmrc`; use `nvm use` before running installs or tests).
- **Python**: 3.10, 3.11, or 3.12 (Python 3.13 has no torch wheel on macOS x86_64). Python 3.11 is the safest choice on Intel Macs.
- **Docker Desktop**: required for local Supabase. Must be running before `npm run migrate`.
- **Supabase CLI**: installed automatically as a dev dependency via `npm install`.
- **Ollama** (optional): for local LLM during `*:publish` runs; see LLM / embeddings below. `make doctor` checks it.

Run `make doctor` to verify versions and venv/`.env`.
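
The version constraints above can be expressed as a small check. This is only an illustrative sketch, not the actual `make doctor` implementation; the function names (`parse_version`, `node_ok`, `python_ok`) are hypothetical, and the inputs are assumed to be the text printed by `node --version` and `python3 --version`.

```python
import re

# Values taken from the prerequisites list above.
PINNED_NODE = (22, 22, 2)                # version pinned in .nvmrc
SUPPORTED_PYTHON_MINORS = {10, 11, 12}   # 3.13 has no torch wheel on macOS x86_64


def parse_version(text: str) -> tuple:
    """Turn 'v22.22.2' or 'Python 3.11.9' into a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def node_ok(version_output: str) -> bool:
    """True when the installed Node is at least the pinned version."""
    return parse_version(version_output) >= PINNED_NODE


def python_ok(version_output: str) -> bool:
    """True for Python 3.10, 3.11, or 3.12."""
    version = parse_version(version_output)
    return len(version) >= 2 and version[0] == 3 and version[1] in SUPPORTED_PYTHON_MINORS
```

Tuple comparison handles the major/minor/patch ordering, so `node_ok("v22.23.0")` passes while `node_ok("v20.11.1")` does not.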

    nvm use                              # pick the Node version pinned in .nvmrc
    make setup                           # npm install + scraper venv + requirements (incl. -dev) + .env scaffold
    # edit .env with your secrets
    # start Docker Desktop
    npm run migrate                      # full DB reset + seed (loads the "Community Builder N" fixtures)
    npm run skills:index -- --upsert-db
    npm run skills:embeddings            # populate ESCO embeddings (needed for skills matching)
    npm run dev                          # http://localhost:3000

Local emails are intercepted by Mailpit at http://localhost:54324.

- `make setup`: full bootstrap. Runs `npm install`, creates the scraper venv, installs `requirements.txt` (editable install) and `requirements-dev.txt`, and scaffolds `.env`.
- `make setup-py` / `make setup-py-dev`: Python deps only. `setup-py-dev` adds torch / transformers / einops for local Jina v3 skill embeddings. Required for any scrape that does skills tagging or vector embeddings.
- `make setup-env`: copy `.env.example` → `.env` if missing.
- `make doctor`: verify Node/Python/Docker/Supabase versions and venv/`.env` presence.
- `make clean-py`: wipe the scraper venv (use if it gets corrupted).
- `make reset`: `clean-py`, then `setup`.

The Makefile auto-detects a torch-compatible Python (`python3.11` → `python3.12` → `python3.10` → `python3`). Override with `make setup PYTHON_BIN=python3.12`.

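
The detection order can be pictured as a first-match search over candidate interpreter names. The sketch below is an assumption about the logic, written in Python rather than Make; `pick_python` and its `which` parameter are hypothetical names, and the real Makefile may also check that torch actually installs.

```python
import shutil
from typing import Callable, Optional

# Candidate order as described above: 3.11 first, bare python3 last.
CANDIDATES = ("python3.11", "python3.12", "python3.10", "python3")


def pick_python(
    override: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the interpreter name to use, or None if none is on PATH."""
    if override:  # plays the role of `make setup PYTHON_BIN=...`
        return override
    for name in CANDIDATES:
        if which(name) is not None:
            return name
    return None
```

Passing `which` as a parameter keeps the function testable without touching the real PATH; by default it uses `shutil.which`.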
- `npm run help`: cheat sheet for scripts and flags (`npm run` alone only lists script names).
- `npm run dev`: start the Bulletin app.
- `npm run migrate`: full local DB reset & seed (uses fixture data from `supabase/src/dataset.ts`).
- `npm run migrate -- --staging`: push migrations to staging.
- `npm run migrate -- --prod`: push migrations to production.
- `npm run restore`: restore backup JSON into local Supabase.
- `npm run restore -- --staging`: restore backups into staging.
- `npm run seed` / `seed:local`: seed local DB (without full migrate reset).
- `npm run seed:staging`: seed staging (`npm run seed -- --env staging`).
- `npx supabase status`: check local Supabase services.


| Pattern | When to use | Examples |
|---|---|---|
| `family:target` | Named env/target: thin alias → `npm run family -- --env target` | `scrape:staging` → `npm run scrape -- --env staging` |
| `command` + `-- flags` | Extra options on top of a named script | `npm run scrape:prod -- --source mac` |
| `family:member` (colon) | Different tasks (not env variants) | `skills:index`, `test:e2e` |

Run `npm run help` for a cheat sheet. `npm run` alone lists script names but not flags.
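
The naming patterns in the table can be sketched as a small expansion function. This is illustrative only: `expand_script` and `KNOWN_ENVS` are hypothetical, and the sketch covers just the `--env target` form shown in the table for `local` and `staging`, since the exact flags behind other targets are not spelled out here.

```python
from typing import List, Optional

# Assumption: only these suffixes are treated as env targets in this sketch.
KNOWN_ENVS = {"local", "staging"}


def expand_script(script: str, extra_flags: Optional[List[str]] = None) -> List[str]:
    """Expand a script name plus optional flags into an npm argv list."""
    family, _, member = script.partition(":")
    argv = ["npm", "run"]
    if member in KNOWN_ENVS:
        # family:target is a thin alias for `npm run family -- --env target`
        argv += [family, "--", "--env", member]
        argv += extra_flags or []
    else:
        # family:member is its own task; flags still go after `--`
        argv.append(script)
        if extra_flags:
            argv += ["--", *extra_flags]
    return argv
```

For example, `expand_script("scrape:staging", ["--source", "cent"])` yields the same argument list as `npm run scrape -- --env staging --source cent`, while `skills:index` is passed through unchanged.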

Named targets (visible in `npm run`):

| Script | Target |
|---|---|
| `scrape:local` | Local DB |
| `scrape:staging` | Staging |
| `scrape:publish` | Prod DB, local LLMs (YES prompt) |
| `scrape:prod` | Full prod (YES prompt) |
| `scrape:list-sources` | List slugs (fast) |

Add options after `--`: `npm run scrape:prod -- --source mac`

Same via flags on `scrape`: `npm run scrape -- --env staging --source cent`

    npm run scrape:list-sources
    npm run scrape:local
    npm run scrape:prod -- --source mac
    npm run process:staging -- --limit 50

    npm run skills:index                            # ESCO API → JSON file only
    npm run skills:index -- --upsert-db             # upsert skills to local DB
    npm run skills:index -- --upsert-db --staging
    npm run skills:embeddings                       # embed skills in local DB
    npm run skills:embeddings -- --staging
    npm run skills:embeddings -- --prod             # prompts for YES

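
The YES prompt on prod-facing commands is a typed-confirmation guard. A minimal sketch of that idea follows, assuming the check is an exact, case-sensitive match on `YES`; the function name `confirm_prod` and the prompt wording are hypothetical, not taken from the runner.

```python
from typing import Callable


def confirm_prod(target: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the operator types exactly YES."""
    # Prompt text is illustrative; the real runner's wording may differ.
    answer = ask(f"About to write to {target}. Type YES to continue: ")
    return answer.strip() == "YES"
```

Requiring the full word rather than `y` makes it harder to confirm a production write by reflex.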
- `npm run test` — full test suite (Bulletin + scraper).
- `npm run verify` — lint + tsc + tests (run automatically on `git push`; bypass with `npm run push:skip`).
## LLM / embeddings (scraper)
- **Jina v3** (skill embeddings): runs **locally** when `ENV_MODE=local` (via `transformers` + `torch`; `make setup-py-dev`, ~570MB model on first use). Any other `ENV_MODE` (e.g. `prod` or unset after a prod overlay) uses the REST API path when configured.
- **Ollama** (optional): used for unified / local LLM work when `ENV_MODE=local`. Install from [ollama.com/download](https://ollama.com/download), run `ollama serve`, and pull `LOCAL_LLM_MODEL` (e.g. `ollama pull llama3.2:3b`).
- **`--publish`:** prod Supabase credentials from `.env.production`; `ENV_MODE=local` so embeddings + text LLM stay on your machine.
- **`--prod`:** loads `.env.production` over `.env` and sets `ENV_MODE=prod` (cloud keys, no local-first routing).
- **Gemini** / **Groq**: API keys as needed for `*:prod` and cloud fallbacks. Gemini free tier can be tight on volume; consider paid tier or smaller batches (`--limit` on the post-processor).
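
The routing rule for skill embeddings described above can be summarized as a small decision function. This is a sketch under assumptions: `embedding_backend` is a hypothetical name, and `JINA_API_KEY` stands in for whatever setting makes the REST path "configured".

```python
from typing import Mapping


def embedding_backend(env: Mapping[str, str]) -> str:
    """Pick where Jina v3 embeddings run, based on ENV_MODE."""
    if env.get("ENV_MODE") == "local":
        return "local"          # transformers + torch on this machine
    if env.get("JINA_API_KEY"):  # hypothetical variable name
        return "rest"           # REST API path, when configured
    return "unconfigured"
```

Note that an unset `ENV_MODE` falls through to the REST branch, matching the "any other `ENV_MODE`" wording above.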
## Notes
- In **`publish`** mode, only prod Supabase keys are applied from `.env.production`; the runner sets `ENV_MODE=local`, giving **prod DB + local LLMs/embeddings** (the typical “push from my machine” workflow). In **`prod`** mode, the full prod file is layered on `.env`, and then `ENV_MODE=prod` is set so you don’t accidentally keep `ENV_MODE=local` from `.env` when you meant a fully prod-configured run.
- Pre-push hook runs `npm run verify:fix`. To bypass: `SKIP_VERIFY=1 git push` or `npm run push:skip`.
- Avoid wrapping scraper Python in `dotenv-cli` (**first-wins**). Use `npm run scrape -- --prod` or `npm run scrape -- --publish` so env layering matches `scrape.py`.
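
The difference between the two modes comes down to how `.env.production` is layered over `.env`. The sketch below illustrates the behavior described in these notes; it is not the code in `scrape.py`. The function `layer_env` is hypothetical, and selecting "prod Supabase keys" by a `SUPABASE_` prefix is an assumption.

```python
from typing import Mapping

SUPABASE_PREFIX = "SUPABASE_"  # assumption: how prod Supabase keys are recognized


def layer_env(base: Mapping[str, str], prod: Mapping[str, str], mode: str) -> dict:
    """Combine .env (base) and .env.production (prod) for a given mode."""
    env = dict(base)
    if mode == "publish":
        # Only the Supabase credentials come from prod; everything else stays local.
        env.update({k: v for k, v in prod.items() if k.startswith(SUPABASE_PREFIX)})
        env["ENV_MODE"] = "local"
    elif mode == "prod":
        # Last-wins overlay: prod values replace anything already set in .env.
        env.update(prod)
        env["ENV_MODE"] = "prod"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env
```

A first-wins loader would do the opposite of the `prod` branch and keep values already present from `.env`, which is why the flags on `scrape` are the safer route.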
## Architectural Roadmap
- **CV Uploads**: Currently implemented as a synchronous API route with inline `worker_threads` for parsing to avoid event loop blocking. As scale increases, this should be moved to a decoupled Background Job / Microservice architecture (uploading to Supabase Storage -> async queue processing) to fully resolve potential HTTP load-balancer timeouts.
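
The decoupled shape described above (accept the upload, enqueue a job, parse later) can be sketched with an in-process queue. This is a toy illustration in Python, not a proposal for the actual stack: `parse_cv`, `handle_upload`, and the storage path are hypothetical, and a real deployment would use a durable queue rather than `asyncio.Queue`.

```python
import asyncio
import contextlib


async def parse_cv(storage_path: str) -> str:
    """Placeholder for the real (slow) CV parser."""
    await asyncio.sleep(0)
    return f"parsed:{storage_path}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Background consumer: pulls jobs and stores parse results."""
    while True:
        job_id, storage_path = await queue.get()
        try:
            results[job_id] = await parse_cv(storage_path)
        finally:
            queue.task_done()


async def handle_upload(queue: asyncio.Queue, job_id: str, storage_path: str) -> dict:
    """What the HTTP route would do: enqueue and respond immediately."""
    await queue.put((job_id, storage_path))
    return {"job_id": job_id, "status": "queued"}


async def demo() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    task = asyncio.create_task(worker(queue, results))
    await handle_upload(queue, "job-1", "cv-uploads/a.pdf")
    await queue.join()   # wait until the worker has drained the queue
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return results
```

The request handler returns as soon as the job is queued, so the HTTP response time no longer depends on how long parsing takes.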