# Wev Monorepo

Welcome to the Wev project. This repository contains the Bulletin app, the Scraper service, and the Supabase infrastructure.

## Prerequisites (manual install)

These cannot be automated; install them first.

- **Node.js**: v22.22.2+ (the repo pins Node 22.22.2 in `.nvmrc`; run `nvm use` before running installs or tests).
- **Python**: 3.10, 3.11, or 3.12 (Python 3.13 has no torch wheel on macOS x86_64). Python 3.11 is the safest choice on an Intel Mac.
- **Docker Desktop**: required for local Supabase. Must be running before `npm run migrate`.
- **Supabase CLI**: installed automatically as a dev dependency via `npm install`.
- **Ollama** (optional): for a local LLM during `*:publish` runs; see LLM / embeddings below. `make doctor` checks it.

Run make doctor to verify versions and venv/.env.
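
The version rules above can be sketched as a small check. This is only an illustration of the stated constraints (Node 22.22.2 or newer, Python 3.10 to 3.12), not the actual `make doctor` implementation; the function names are hypothetical.

```python
import re
import sys

MIN_NODE = (22, 22, 2)  # version pinned in .nvmrc, per the prerequisites
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}  # 3.13 lacks a torch wheel on macOS x86_64


def parse_version(text: str) -> tuple:
    """Turn 'v22.22.2' or '3.11.9' into a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def node_ok(version_text: str) -> bool:
    """True when the reported Node version is at least the pinned one."""
    return parse_version(version_text) >= MIN_NODE


def python_ok(version: tuple) -> bool:
    """True when (major, minor) is one of the supported Python releases."""
    return tuple(version) in SUPPORTED_PYTHON


if __name__ == "__main__":
    print("python ok:", python_ok(sys.version_info[:2]))
```

Feeding it the output of `node --version` (for example `node_ok("v22.22.2")`) gives a quick yes/no before running `make setup`.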

nvm use                     # pick the Node version pinned in .nvmrc
make setup                  # npm install + scraper venv + requirements (incl. -dev) + .env scaffold
# edit .env with your secrets
# start Docker Desktop
npm run migrate             # full DB reset + seed (loads the "Community Builder N" fixtures)
npm run skills:index -- --upsert-db
npm run skills:embeddings   # populate ESCO embeddings (needed for skills matching)
npm run dev                 # http://localhost:3000

Local emails are intercepted by Mailpit at http://localhost:54324.

## Make targets

- `make setup`: full bootstrap: `npm install` + scraper venv + `requirements.txt` (editable install) + `requirements-dev.txt` + scaffold `.env`.
- `make setup-py` / `make setup-py-dev`: Python deps only. `setup-py-dev` adds torch / transformers / einops for local Jina v3 skill embeddings. Required for any scrape that does skills tagging or vector embeddings.
- `make setup-env`: copy `.env.example` → `.env` if missing.
- `make doctor`: verify Node/Python/Docker/Supabase versions and venv/`.env` presence.
- `make clean-py`: wipe the scraper venv (use if it gets corrupted).
- `make reset`: `clean-py` then `setup`.

The Makefile auto-detects a torch-compatible Python (python3.11 → python3.12 → python3.10 → python3). Override with make setup PYTHON_BIN=python3.12.
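
The detection order can be sketched in Python as follows. This is a rough equivalent of the described behaviour, not the Makefile's actual logic: `pick_python` is a hypothetical name, and the sketch only checks that a candidate exists on `PATH`, not that its version is torch-compatible.

```python
import shutil
from typing import Callable, Optional

# Candidate order as described for the Makefile.
CANDIDATES = ["python3.11", "python3.12", "python3.10", "python3"]


def pick_python(
    override: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the override if given, else the first candidate found on PATH."""
    if override:
        return override  # mirrors `make setup PYTHON_BIN=python3.12`
    for name in CANDIDATES:
        if which(name) is not None:
            return name
    return None  # nothing usable found


if __name__ == "__main__":
    print(pick_python())
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it uses `shutil.which`.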

## Useful npm scripts

- `npm run help`: cheat sheet for scripts and flags (`npm run` alone only lists script names).
- `npm run dev`: start the Bulletin app.
- `npm run migrate`: full local DB reset & seed (uses fixture data from `supabase/src/dataset.ts`).
- `npm run migrate -- --staging`: push migrations to staging.
- `npm run migrate -- --prod`: push migrations to production.
- `npm run restore`: restore backup JSON into local Supabase.
- `npm run restore -- --staging`: restore backups into staging.
- `npm run seed` / `seed:local`: seed local DB (without full migrate reset).
- `npm run seed:staging`: seed staging (`npm run seed -- --env staging`).
- `npx supabase status`: check local Supabase services.

## Script naming

| Pattern | When to use | Examples |
| --- | --- | --- |
| `family:target` | Named env/target: thin alias → `npm run family -- --env target` | `scrape:staging` → `npm run scrape -- --env staging` |
| `command` + `--` flags | Extra options on top of a named script | `npm run scrape:prod -- --source mac` |
| `family:member` (colon) | Different tasks (not env variants) | `skills:index`, `test:e2e` |
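
As an illustration of the naming pattern, the sketch below expands a script name into the command it stands for. The `ENV_TARGETS` set and the `expand_alias` helper are assumptions made for this example; the real aliases are defined in `package.json`.

```python
from typing import List, Optional

# Hypothetical set of env targets that follow the `--env` alias pattern.
ENV_TARGETS = {"local", "staging"}


def expand_alias(script: str, extra: Optional[List[str]] = None) -> List[str]:
    """Expand `family:target` into `npm run family -- --env target`.

    Scripts whose suffix is not an env target (e.g. `skills:index`)
    are treated as distinct tasks and passed through unchanged.
    """
    family, sep, member = script.partition(":")
    args = list(extra or [])
    if sep and member in ENV_TARGETS:
        return ["npm", "run", family, "--", "--env", member, *args]
    if args:
        return ["npm", "run", script, "--", *args]
    return ["npm", "run", script]


if __name__ == "__main__":
    print(" ".join(expand_alias("scrape:staging", ["--source", "cent"])))
```

Under these assumptions, `scrape:staging` expands to the `--env staging` form, while `skills:index` and `test:e2e` stay as they are.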

Run npm run help for a cheat sheet. npm run alone lists script names but not flags.

## Scrape

Named targets (visible in npm run):

| Script | Target |
| --- | --- |
| `scrape:local` | Local DB |
| `scrape:staging` | Staging |
| `scrape:publish` | Prod DB, local LLMs (YES prompt) |
| `scrape:prod` | Full prod (YES prompt) |
| `scrape:list-sources` | List slugs (fast) |

Add options after --: npm run scrape:prod -- --source mac

Same via flags on scrape: npm run scrape -- --env staging --source cent

npm run scrape:list-sources
npm run scrape:local
npm run scrape:prod -- --source mac

npm run process:staging -- --limit 50

or: npm run process -- --env staging --limit 50

npm run skills:index                          # ESCO API → JSON file only
npm run skills:index -- --upsert-db           # upsert skills to local DB
npm run skills:index -- --upsert-db --staging
npm run skills:embeddings                     # embed skills in local DB
npm run skills:embeddings -- --staging
npm run skills:embeddings -- --prod           # prompts for YES


- `npm run test` — full test suite (Bulletin + scraper).
- `npm run verify` — lint + tsc + tests (run automatically on `git push`; bypass with `npm run push:skip`).

## LLM / embeddings (scraper)

- **Jina v3** (skill embeddings): runs **locally** when `ENV_MODE=local` (via `transformers` + `torch`; `make setup-py-dev`, ~570MB model on first use). Any other `ENV_MODE` (e.g. `prod` or unset after a prod overlay) uses the REST API path when configured.
- **Ollama** (optional): used for unified / local LLM work when `ENV_MODE=local`. Install from [ollama.com/download](https://ollama.com/download), run `ollama serve`, and pull `LOCAL_LLM_MODEL` (e.g. `ollama pull llama3.2:3b`).
- **`--publish`:** prod Supabase credentials from `.env.production`; `ENV_MODE=local` so embeddings + text LLM stay on your machine.
- **`--prod`:** loads `.env.production` over `.env` and sets `ENV_MODE=prod` (cloud keys, no local-first routing).
- **Gemini** / **Groq**: API keys as needed for `*:prod` and cloud fallbacks. Gemini free tier can be tight on volume; consider paid tier or smaller batches (`--limit` on the post-processor).
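
The Jina routing rule can be pictured as a small decision function. This is a sketch of the behaviour described above, not the scraper's code; `embedding_backend` and the `JINA_API_KEY` variable name are assumptions made for illustration.

```python
from typing import Mapping


def embedding_backend(env: Mapping[str, str]) -> str:
    """Pick where skill embeddings run, based on ENV_MODE.

    - ENV_MODE=local  -> local transformers + torch model
    - anything else   -> REST API, but only when a key is configured
    """
    if env.get("ENV_MODE") == "local":
        return "local"
    if env.get("JINA_API_KEY"):  # hypothetical variable name
        return "api"
    return "unconfigured"


if __name__ == "__main__":
    import os

    print(embedding_backend(os.environ))
```

Note that an unset `ENV_MODE` falls through to the API path, which matches the "unset after a prod overlay" case mentioned above.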

## Notes

- In **`publish`** mode, only prod Supabase keys are applied from `.env.production`; the runner sets `ENV_MODE=local` so **prod DB + local LLMs/embeddings** (typical “push from my machine” workflow). In **`prod`** mode, the full prod file is layered on `.env`, then `ENV_MODE=prod` so you don’t accidentally keep `ENV_MODE=local` from `.env` when you meant a fully prod-configured run.
- Pre-push hook runs `npm run verify:fix`. To bypass: `SKIP_VERIFY=1 git push` or `npm run push:skip`.
- Avoid wrapping scraper Python in `dotenv-cli` (**first-wins**). Use `npm run scrape -- --prod` or `npm run scrape -- --publish` so env layering matches `scrape.py`.
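
The difference between `publish`, `prod`, and a first-wins loader can be sketched with plain dictionaries. This is a simplified model of the layering described in these notes, not the code in `scrape.py`; the `SUPABASE_` prefix used to select "prod Supabase keys" and the function names are assumptions.

```python
from typing import Dict, Mapping


def layer_env(
    base: Mapping[str, str],
    prod: Mapping[str, str],
    mode: str,
    supabase_prefix: str = "SUPABASE_",  # assumed naming convention
) -> Dict[str, str]:
    """Combine `.env` (base) and `.env.production` (prod) for a run mode."""
    env = dict(base)
    if mode == "publish":
        # Only prod Supabase keys are applied; LLMs/embeddings stay local.
        env.update({k: v for k, v in prod.items() if k.startswith(supabase_prefix)})
        env["ENV_MODE"] = "local"
    elif mode == "prod":
        # The full prod file overrides `.env`, then ENV_MODE is forced to prod.
        env.update(prod)
        env["ENV_MODE"] = "prod"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env


def first_wins(base: Mapping[str, str], overlay: Mapping[str, str]) -> Dict[str, str]:
    """Model of a first-wins loader: values already set are never overridden."""
    return {**overlay, **base}


if __name__ == "__main__":
    base = {"ENV_MODE": "local", "SUPABASE_URL": "http://localhost:54321"}
    prod = {"SUPABASE_URL": "https://prod.example", "GEMINI_API_KEY": "prod-key"}
    print(layer_env(base, prod, "publish"))
    print(first_wins(base, prod))
```

With a first-wins loader the local `SUPABASE_URL` survives the prod overlay, which is the mismatch the last note warns about.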

## Architectural Roadmap

- **CV Uploads**: Currently implemented as a synchronous API route with inline `worker_threads` for parsing to avoid event loop blocking. As scale increases, this should be moved to a decoupled Background Job / Microservice architecture (uploading to Supabase Storage -> async queue processing) to fully resolve potential HTTP load-balancer timeouts.
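
To make the decoupled shape concrete, here is a language-agnostic sketch in Python: the request handler stores the file and enqueues a job, and a background worker does the parsing. Everything in it is hypothetical (the in-memory `storage` dict stands in for Supabase Storage, `parse_cv` for the real parser, and a thread for a separate worker service); it only illustrates the pattern, not a proposed implementation.

```python
import queue
import threading
from typing import Dict


def parse_cv(blob: bytes) -> dict:
    """Stand-in for the real CV parser (hypothetical)."""
    return {"bytes": len(blob)}


class UploadPipeline:
    def __init__(self) -> None:
        self.storage: Dict[str, bytes] = {}  # stands in for Supabase Storage
        self.results: Dict[str, dict] = {}
        self.jobs: "queue.Queue[str]" = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def handle_upload(self, upload_id: str, blob: bytes) -> dict:
        """What the HTTP route would do: store, enqueue, return at once."""
        self.storage[upload_id] = blob
        self.jobs.put(upload_id)
        return {"id": upload_id, "status": "queued"}

    def _run(self) -> None:
        """Background worker: parse queued uploads one at a time."""
        while True:
            upload_id = self.jobs.get()
            try:
                self.results[upload_id] = parse_cv(self.storage[upload_id])
            except Exception as exc:  # record the failure, keep the worker alive
                self.results[upload_id] = {"error": str(exc)}
            finally:
                self.jobs.task_done()

    def wait(self) -> None:
        """Block until every queued job has been processed."""
        self.jobs.join()


if __name__ == "__main__":
    pipeline = UploadPipeline()
    print(pipeline.handle_upload("cv-1", b"example"))
    pipeline.wait()
    print(pipeline.results)
```

The point of the design is that `handle_upload` returns before parsing starts, so request latency no longer depends on how long a CV takes to parse.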
