██╗ ██████╗ ██████╗██╗ ██╗███████╗
██║ ██╔═══██╗██╔════╝██║ ██║██╔════╝
██║ ██║ ██║██║ ██║ ██║███████╗
██║ ██║ ██║██║ ██║ ██║╚════██║
███████╗╚██████╔╝╚██████╗╚██████╔╝███████║
╚══════╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝
AI-powered drug target discovery — from a plain-English query to ranked variants, 3D structure, and approved drugs in seconds.
Type `EGFR non-small cell lung cancer` → get ranked driver mutations, a live 3D structure, 12 approved drugs with IC₅₀ values, and a research-grade AI report. One query. Under 60 seconds.
Locus connects five major bioinformatics databases in a single automated pipeline, ranks pathogenic variants by a multi-factor composite score grounded in clinical evidence, and generates a structured research summary using a 70B language model. It is built for computational biologists, pharmacologists, and drug discovery teams who need rapid, evidence-based variant prioritisation without writing database queries.
User query (plain English)
│
▼
┌───────────────────────────────────────────────────────────┐
│ Step 1 · Parse Query 🤖 Groq │
│ Llama 3.3 70B extracts { gene, disease, tissue } │
└────────────────────────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Step 2 · Validate Gene 🧬 Ensembl │
│ Confirms symbol exists in GRCh38; rejects typos early │
└────────────────────────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Step 3 · Fetch Variants 📋 ClinVar │
│ esearch + esummary — pathogenic + likely pathogenic │
│ Extracts rsIDs (SNPs) and HGVS (exon deletions, indels) │
│ Reads ClinVar review status → 0–4 star evidence rating │
└────────────────────────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Step 4 · Annotate — VEP Batch 🧬 Ensembl │
│ /id endpoint (rsIDs) + /hgvs endpoint (exon del, indels) │
│ → protein position, SIFT, PolyPhen, consequence, HGVS │
│ → gnomAD allele frequency from colocated variants │
└────────────────────────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Step 5 · Score Variants 🔢 Locus │
│ Composite = 0.55 × pathScore × starWeight │
│ + 0.45 × rarityScore │
│ + 0.15 if active/binding site hit (UniProt) │
└────────────────────────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Step 6 · Rank Variants 🤖 Groq │
│ Llama 3.3 70B ranks top 10 with clinical reasoning │
└────────────────────────────┬──────────────────────────────┘
│
┌────────────────┴────────────────┐
▼ ▼
┌─────────────────────────┐ ┌───────────────────────────┐
│ Step 7a · Structure │ │ Step 7b · Drugs │
│ UniProt → RCSB PDB │ │ UniProt → ChEMBL target │
│ → AlphaFold fallback │ │ → mechanisms + IC₅₀ │
└───────────┬─────────────┘ └─────────────┬─────────────┘
└────────────────┬────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Step 8 · Generate Report 🤖 Groq │
│ Structured research summary: mechanism, structure, drugs │
└───────────────────────────────────────────────────────────┘
compositeScore = min(1.0,
0.55 × pathScore × starWeight ← deleterious prediction × evidence quality
+ 0.45 × rarityScore ← population allele frequency
+ 0.15 × isFunctionalSite ← UniProt active/binding site overlap
)
| Component | Source | Range |
|---|---|---|
| `pathScore` | 1 − SIFT score, or PolyPhen score if no SIFT | 0.0 – 1.0 |
| `starWeight` | (clinvarStars + 1) / 5 | 0.20 – 1.00 |
| `rarityScore` | gnomAD AF tier | 0.05 – 1.00 |
| `isFunctionalSite` | UniProt Active site / Binding site overlap | 0 or 1 |
| Allele Frequency | Score | Interpretation |
|---|---|---|
| < 1 × 10⁻⁶ | 1.00 | Ultra-rare; almost certainly pathogenic |
| < 1 × 10⁻⁵ | 0.90 | Very rare |
| < 1 × 10⁻⁴ | 0.75 | Rare |
| < 1 × 10⁻³ | 0.50 | Low frequency |
| < 1 × 10⁻² | 0.20 | Common variant |
| ≥ 1 × 10⁻² | 0.05 | Benign frequency |
| Review Status | Stars | starWeight |
|---|---|---|
| Practice guideline | ⭐⭐⭐⭐ | 1.00 |
| Reviewed by expert panel | ⭐⭐⭐ | 0.80 |
| Multiple submitters, no conflicts | ⭐⭐ | 0.60 |
| Single submitter / conflicting | ⭐ | 0.40 |
| No assertion provided | — | 0.20 |
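
The formula and the three lookup tables above can be combined into one small function. The following is a minimal sketch, not the project's actual `scoreVariants.ts`: the `VariantEvidence` shape and its field names are hypothetical, and two behaviours are assumptions made for illustration (a variant absent from gnomAD is placed in the rarest tier, and a variant with neither SIFT nor PolyPhen gets a `pathScore` of 0).

```typescript
// Hypothetical input shape for this sketch; not the project's real types.
interface VariantEvidence {
  sift?: number;             // SIFT score in [0, 1]; lower means more deleterious
  polyphen?: number;         // PolyPhen score in [0, 1]; higher means more damaging
  clinvarStars: number;      // ClinVar review stars, 0 to 4
  gnomadAf?: number;         // gnomAD allele frequency; undefined if not observed
  isFunctionalSite: boolean; // overlaps a UniProt active or binding site
}

// Allele-frequency tiers from the rarity table.
function rarityScore(af: number): number {
  if (af < 1e-6) return 1.0;
  if (af < 1e-5) return 0.9;
  if (af < 1e-4) return 0.75;
  if (af < 1e-3) return 0.5;
  if (af < 1e-2) return 0.2;
  return 0.05;
}

// (clinvarStars + 1) / 5, so 0 stars maps to 0.20 and 4 stars to 1.00.
function starWeight(stars: number): number {
  return (stars + 1) / 5;
}

// 1 - SIFT when SIFT is present, otherwise PolyPhen.
function pathScore(v: VariantEvidence): number {
  if (v.sift !== undefined) return 1 - v.sift;
  if (v.polyphen !== undefined) return v.polyphen;
  return 0; // assumption: no predictor means no pathogenicity contribution
}

function compositeScore(v: VariantEvidence): number {
  const af = v.gnomadAf ?? 0; // assumption: unobserved variants count as ultra-rare
  const raw =
    0.55 * pathScore(v) * starWeight(v.clinvarStars) +
    0.45 * rarityScore(af) +
    (v.isFunctionalSite ? 0.15 : 0);
  return Math.min(1.0, raw); // the three weights sum to 1.15, hence the cap
}
```

For example, a variant with SIFT 0.5, two ClinVar stars, and an allele frequency of 5 × 10⁻⁴ outside any functional site scores 0.55 × 0.5 × 0.6 + 0.45 × 0.5 = 0.39.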
| API | Used for | Timeout | Retries |
|---|---|---|---|
| Ensembl REST | Gene validation, VEP /id batch, VEP /hgvs batch | 10s / 45s | 3 × |
| NCBI ClinVar | Pathogenic variant search (esearch + esummary) | 15s | 3 × |
| UniProt REST | Protein accession lookup + domain features | 8s | 3 × |
| RCSB PDB | Experimental structure search + resolution + ligands | 10s / 6s | 3 × |
| AlphaFold DB | Predicted structure for all reviewed human proteins | 10s | 3 × |
| ChEMBL | Drug mechanisms, max phase, IC₅₀ bioactivity | 10s / 6s / 5s | 3 × |
| Groq | Llama 3.3 70B — parse, rank, report (3 calls) | n/a | 1 × |
| AlphaGenome sidecar | Regulatory scoring — always offline; 2s health check | 2s | 0 × |
All fetches use `fetchWithRetry`: each retry creates a fresh `AbortSignal.timeout()`, so an expired signal from attempt 1 does not immediately fail attempt 2.
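
The per-attempt signal is the important detail, so here is a minimal sketch of how such a wrapper might look. It is not the project's `fetchWithRetry.ts`: the option names, the backoff base delay, the choice to retry only on network errors, 429, and 5xx responses, and the reading of "3 ×" as three retries after the first attempt are all assumptions. The `fetchImpl` parameter is a hypothetical hook added so the function can be exercised without network access.

```typescript
// Hypothetical options for this sketch.
interface RetryOptions {
  retries?: number;         // extra attempts after the first one
  timeoutMs?: number;       // per-attempt timeout
  baseDelayMs?: number;     // first backoff delay; doubles on each retry
  fetchImpl?: typeof fetch; // injectable for tests
}

async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  opts: RetryOptions = {},
): Promise<Response> {
  const { retries = 3, timeoutMs = 10_000, baseDelayMs = 500, fetchImpl = fetch } = opts;
  let lastError: unknown;

  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // A new timeout signal per attempt: a signal that already fired
      // stays aborted forever and would fail the next attempt instantly.
      const signal = AbortSignal.timeout(timeoutMs);
      const res = await fetchImpl(url, { ...init, signal });
      // Assumption: only 429 and 5xx responses are worth retrying.
      if (res.status !== 429 && res.status < 500) return res;
      lastError = new Error(`HTTP ${res.status} from ${url}`);
    } catch (err) {
      lastError = err; // network failure or timeout
    }
    if (attempt < retries) {
      // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

Creating the signal outside the loop would be the bug this design avoids: once its timer fires, every later attempt sharing that signal rejects immediately.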
| Layer | Technology | Version |
|---|---|---|
| Framework | Next.js App Router + Turbopack | 16 |
| Language | TypeScript (strict, zero `any`) | 5 |
| Styling | Tailwind CSS v4 | 4 |
| Auth | Auth.js (Google OAuth) | v5 beta |
| Database | Neon Postgres (serverless WebSocket) | — |
| ORM | Drizzle ORM | 0.44 |
| LLM | Groq — Llama 3.3 70B Versatile | — |
| 3D Viewer | 3Dmol.js (WebGL) | 2.5 |
| Charts | Recharts | 3.9 |
| Validation | Zod | 3.25 |
| UI Components | Radix UI + Lucide React | — |
| CI | GitHub Actions | — |
locus/
├── .github/
│ └── workflows/ci.yml # tsc + next build on every PR
│
├── app/
│ ├── (app)/
│ │ ├── analyze/[id]/page.tsx # Result page: variants, 3D, drugs, report
│ │ ├── dashboard/ # Recent analyses overview
│ │ ├── history/page.tsx # Cursor-paginated analysis history
│ │ └── guide/page.tsx # Plain-English terminology guide
│ └── api/
│ ├── analyze/route.ts # POST: start pipeline; rate-limited
│ │ └── [id]/stream/ # SSE: real-time progress
│ ├── health/route.ts # GET: dependency status
│ ├── history/route.ts # GET: cursor-paginated
│ └── account/route.ts # DELETE: GDPR erasure
│
├── lib/
│ ├── env.ts # Startup env var validation (throws on missing)
│ └── pipeline/
│ ├── fetchWithRetry.ts # Exponential backoff wrapper
│ ├── apiSchemas.ts # Zod schemas for all 5 external APIs
│ ├── parseQuery.ts # Groq: extract gene / disease / tissue
│ ├── fetchVariants.ts # ClinVar: variants + star rating + HGVS
│ ├── annotateVariants.ts # Ensembl VEP: rsID batch + HGVS batch
│ ├── scoreVariants.ts # Composite score + sidecar health check
│ ├── rankVariants.ts # Groq: rank top 10 with reasoning
│ ├── fetchStructure.ts # UniProt + RCSB PDB + AlphaFold
│ ├── fetchDrugs.ts # ChEMBL: drugs + IC₅₀
│ └── generateReport.ts # Groq: research summary
│
├── components/
│ ├── analysis/
│ │ ├── ProteinViewer.tsx # 3Dmol.js viewer with domain colour overlays
│ │ └── DrugPanel.tsx # Drug cards with IC₅₀ and mechanism
│ └── layout/
│ └── Sidebar.tsx # Responsive nav (hamburger on mobile)
│
└── next.config.ts # CSP headers, server action config
- Node.js ≥ 20
- A Neon Postgres database (free tier is sufficient)
- A Groq API key (free)
- Google OAuth credentials from Google Cloud Console
git clone https://github.com/s3ak6i-dev/Locus.git
cd Locus
npm install

Create `.env.local` in the project root:
# Database
DATABASE_URL=postgresql://user:pass@ep-xxx.neon.tech/locus?sslmode=require
# Auth
NEXTAUTH_SECRET=<run: openssl rand -hex 32>
NEXTAUTH_URL=http://localhost:3000
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-client-secret
# AI
GROQ_API_KEY=gsk_...
# Bioinformatics APIs (public, no key required)
ENSEMBL_API_BASE=https://rest.ensembl.org
CLINVAR_API_BASE=https://eutils.ncbi.nlm.nih.gov/entrez/eutils
# AlphaGenome sidecar (optional — app falls back gracefully when offline)
GENOME_SERVICE_URL=http://localhost:8000
GENOME_SERVICE_SECRET=any-local-secret

npm run db:push

npm run dev

Open http://localhost:3000, sign in with Google, and run your first analysis.
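
The project structure lists `lib/env.ts` as startup validation that throws on missing variables. A minimal sketch of that idea follows; it is not the project's actual file. Which variables count as required is an assumption here (the database, auth, and Groq entries), with the API base URLs and sidecar settings treated as optional.

```typescript
// Assumption: these six are mandatory; the others in .env.local are optional.
const REQUIRED_VARS = [
  "DATABASE_URL",
  "NEXTAUTH_SECRET",
  "NEXTAUTH_URL",
  "GOOGLE_CLIENT_ID",
  "GOOGLE_CLIENT_SECRET",
  "GROQ_API_KEY",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];

// Throws one error naming every missing or blank variable,
// otherwise returns a fully typed record of the required values.
function validateEnv(source: Record<string, string | undefined>): Record<RequiredVar, string> {
  const missing = REQUIRED_VARS.filter((name) => !source[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const env = {} as Record<RequiredVar, string>;
  for (const name of REQUIRED_VARS) {
    env[name] = source[name] as string;
  }
  return env;
}

// Typical use at module load time: const env = validateEnv(process.env);
```

Reporting all missing names at once saves a restart per variable during first-time setup.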
-- Core
analyses (
id UUID PRIMARY KEY,
user_id TEXT REFERENCES users(id),
query TEXT, -- original user query
gene TEXT, -- extracted gene symbol e.g. "EGFR"
disease TEXT,
tissue TEXT,
status TEXT, -- 'running' | 'complete' | 'error'
variant_count INTEGER,
top_targets JSONB, -- RankedVariant[]
structure_meta JSONB, -- StructureMeta
drug_data JSONB, -- Drug[]
report TEXT,
error_message TEXT,
created_at TIMESTAMP,
completed_at TIMESTAMP
)
-- 30-day TTL: variant scores keyed by (gene, variant_id)
variant_cache (
id UUID PRIMARY KEY,
gene TEXT,
variant_id TEXT, -- rsID or "hgvs_NM_..." for exon deletions
rna_delta REAL,
atac_delta REAL,
splice_delta REAL,
composite_score REAL,
cached_at TIMESTAMP,
UNIQUE (gene, variant_id)
)
-- 7-day TTL: drug data keyed by UniProt accession (not gene name)
drug_cache (
id UUID PRIMARY KEY,
gene TEXT UNIQUE, -- stores UniProt accession e.g. "P00533"
drugs JSONB, -- Drug[]
cached_at TIMESTAMP
)
-- 7-day TTL: structure metadata
structure_cache (
id UUID PRIMARY KEY,
gene TEXT UNIQUE,
pdb_id TEXT,
source TEXT, -- 'rcsb' | 'alphafold'
structure_url TEXT,
resolution REAL,
plddt REAL,
uniprot_id TEXT,
cached_at TIMESTAMP
)

POST /api/analyze

Start a new analysis pipeline (async; returns immediately with an ID).
Request
{ "query": "EGFR non-small cell lung cancer" }

Response 200
{ "analysisId": "3f4a2b1c-..." }

Errors
| Status | Reason |
|---|---|
| `400` | Query outside 5–500 characters, or invalid gene name extracted |
| `401` | Not authenticated |
| `429` | Rate limit: 10 analyses per user per 24 hours |
429 body:
{ "error": "Rate limit: 10 analyses per day", "resetsAt": "2026-06-26T09:00:00.000Z" }

GET /api/analyze/[id]/stream

Server-Sent Events stream. Polls the DB every 1 second until the analysis reaches `complete` or `error`, or 120 seconds pass. Stops immediately if the client disconnects.
| Event | Payload |
|---|---|
| `update` | `{ status, gene, disease, variantCount }` |
| `done` | `{ analysisId }` |
| `error` | `{ message }` |
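
The polling behaviour and the three event types can be sketched as an async generator that yields ready-to-send SSE frames. This is an illustration only, not the project's route handler: `AnalysisRow`, `loadRow`, and `isAborted` are hypothetical names, and what the real stream emits when the 120-second limit is reached is not documented here, so this sketch simply ends.

```typescript
type AnalysisStatus = "running" | "complete" | "error";

// Hypothetical row shape, loosely following the `analyses` table.
interface AnalysisRow {
  id: string;
  status: AnalysisStatus;
  gene: string | null;
  disease: string | null;
  variantCount: number | null;
  errorMessage: string | null;
}

// One SSE frame: an event name, a JSON data line, and a blank line.
function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

async function* pollAnalysis(
  loadRow: () => Promise<AnalysisRow>, // hypothetical DB lookup
  opts: { intervalMs?: number; maxMs?: number; isAborted?: () => boolean } = {},
): AsyncGenerator<string> {
  const { intervalMs = 1000, maxMs = 120_000, isAborted = () => false } = opts;
  const deadline = Date.now() + maxMs;

  while (Date.now() < deadline && !isAborted()) {
    const row = await loadRow();
    if (row.status === "error") {
      yield formatSseEvent("error", { message: row.errorMessage ?? "Analysis failed" });
      return;
    }
    yield formatSseEvent("update", {
      status: row.status,
      gene: row.gene,
      disease: row.disease,
      variantCount: row.variantCount,
    });
    if (row.status === "complete") {
      yield formatSseEvent("done", { analysisId: row.id });
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A route handler could pipe these strings into a `ReadableStream` response with `Content-Type: text/event-stream`, wiring `isAborted` to the request's abort signal.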
GET /api/health

Live dependency status, useful for monitoring and debugging.
Response
{
"status": "degraded",
"db": "ok",
"genome_sidecar": "offline",
"ensembl": "ok",
"chembl": "ok",
"timestamp": "2026-06-25T14:00:00.000Z"
}

GET /api/history

Cursor-based paginated history for the authenticated user. `cursor` is the ISO timestamp of the last item received.
Response
{
"analyses": [...],
"nextCursor": "2026-06-20T12:00:00.000Z",
"hasMore": true
}

DELETE /api/account

Permanently delete all analyses for the authenticated user and sign them out. Implements the GDPR right to erasure.
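
For the cursor-based history endpoint described earlier, the paging logic can be sketched over an in-memory array. This is an illustration rather than the project's query: the real route presumably filters and limits in SQL, the `limit` default is an assumption, and returning `nextCursor: null` on the last page is also assumed, since only a non-final page is shown in the example response.

```typescript
interface AnalysisSummary {
  id: string;
  createdAt: string; // ISO 8601 UTC timestamp, e.g. "2026-06-20T12:00:00.000Z"
}

interface HistoryPage {
  analyses: AnalysisSummary[];
  nextCursor: string | null;
  hasMore: boolean;
}

function paginateHistory(
  all: AnalysisSummary[],
  cursor: string | null,
  limit = 20,
): HistoryPage {
  // Newest first. Same-format UTC ISO strings sort correctly as plain strings.
  const newestFirst = [...all].sort((a, b) =>
    a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0,
  );
  // Keep only items strictly older than the last item the client received.
  const older = cursor === null ? newestFirst : newestFirst.filter((a) => a.createdAt < cursor);
  // Take one extra item to learn whether another page exists.
  const window = older.slice(0, limit + 1);
  const hasMore = window.length > limit;
  const analyses = window.slice(0, limit);
  const last = analyses[analyses.length - 1];
  return { analyses, nextCursor: hasMore && last ? last.createdAt : null, hasMore };
}
```

One caveat of a timestamp-only cursor: two analyses with identical `created_at` values that straddle a page boundary would lose the second one, so a tie-breaker such as the row ID is a common addition.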
Locus ships strict CSP headers from next.config.ts:
default-src 'self'
script-src 'self' 'unsafe-eval' 'unsafe-inline' cdn.jsdelivr.net
style-src 'self' 'unsafe-inline'
connect-src 'self'
rest.ensembl.org
eutils.ncbi.nlm.nih.gov
rest.uniprot.org
search.rcsb.org data.rcsb.org files.rcsb.org
alphafold.ebi.ac.uk www.ebi.ac.uk
img-src 'self' data: lh3.googleusercontent.com
frame-ancestors 'none'
object-src 'none'
`'unsafe-eval'` is required by 3Dmol.js WebGL shader compilation. `'unsafe-inline'` is required by Tailwind CSS v4's runtime style injection.
Locus uses a bespoke dark research-instrument aesthetic.
| Role | Hex | Used for |
|---|---|---|
| App background | `#0A0A0A` | Page background |
| Card background | `#111111` | All card surfaces |
| Hover background | `#1A1A1A` | Interactive hover states |
| Border default | `#2A2A2A` | Card and table borders |
| Border hover | `#3A3A3A` | Focus and hover borders |
| Text primary | `#F0F0F0` | Headings, values |
| Text secondary | `#888888` | Labels, descriptions |
| Text muted | `#555555` | Placeholders, icons |
| Teal | `#1D9E75` | Biology data, rank 1–3 variants |
| Blue | `#378ADD` | AI outputs, rank 4–7 variants |
| Orange | `#BA7517` | Binding sites, warnings |
| Red | `#E24B4A` | Deleterious predictions, errors |
| Purple | `#8B5CF6` | Splice variants, AlphaFold |
Fonts: Space Grotesk (headings) · Inter (body) · JetBrains Mono (IDs, scores)
Every push and pull request to main runs:
- npm ci
- npx tsc --noEmit # Type check — zero errors required
- npx next build # Full production build

To deploy on Vercel:

- Connect your GitHub repo to Vercel
- Add all environment variables from the Getting Started section in the Vercel dashboard
- Run `npm run db:push` against your production Neon database
The sidecar is a Python FastAPI service (/genome-service) that wraps the AlphaGenome model. The app detects its status with a 2-second health check at the start of every pipeline run. When offline (which is the current default), the pipeline falls back to the SIFT / PolyPhen / gnomAD formula with no user-visible degradation beyond logging.
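
The 2-second health check and fallback can be sketched as follows. This is a hedged illustration, not the project's `scoreVariants.ts`: the `/health` path on the sidecar, the function names, and the mode labels are assumptions, and `fetchImpl` is a hypothetical parameter added so the check can be tested offline.

```typescript
// Returns true only if the sidecar answers with a 2xx status within the timeout.
// Any failure (timeout, connection refused, non-2xx) is treated as "offline".
async function isSidecarOnline(
  baseUrl: string,
  fetchImpl: typeof fetch = fetch,
  timeoutMs = 2000,
): Promise<boolean> {
  try {
    // Assumption: the sidecar exposes a GET /health endpoint.
    const res = await fetchImpl(`${baseUrl}/health`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    return false;
  }
}

type ScoringMode = "alphagenome" | "sift-polyphen-gnomad";

// Decide once per pipeline run which scoring path to use.
async function chooseScoringMode(
  baseUrl: string | undefined,
  fetchImpl: typeof fetch = fetch,
): Promise<ScoringMode> {
  if (!baseUrl) return "sift-polyphen-gnomad"; // sidecar not configured
  return (await isSidecarOnline(baseUrl, fetchImpl)) ? "alphagenome" : "sift-polyphen-gnomad";
}
```

Because the check never throws, an unreachable sidecar costs at most the timeout and then the run proceeds on the fallback formula.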
- Fork the repository
- Create a feature branch: `git checkout -b feat/your-feature`
- Ensure `npm run build` passes with zero TypeScript errors
- Open a PR; CI runs `tsc` and `next build` automatically
- Google DeepMind AlphaGenome — regulatory variant scoring
- Groq — ultra-fast LLM inference (Llama 3.3 70B)
- NCBI ClinVar — clinical variant classification database
- Ensembl VEP — genomic variant annotation
- RCSB PDB — experimental protein structures
- AlphaFold DB — predicted protein structures
- ChEMBL — drug-target bioactivity database
- UniProt — protein sequence and functional annotation
MIT © Surya Krishna Bharadwaj Kunapuli