Quantify the financial impact of cybersecurity incidents on publicly traded companies.
BreachAlpha uses event study methodology (MacKinlay, 1997) to measure how breaches move stock prices.
Company: Equifax (EFX) Breach: 2017-09-07
Risk Score: 72.5/100 Severity: HIGH
CAR (-5,+30): -18.34% Volatility Spike: 2.4x
- Why
- Architecture
- Prerequisites
- Setup
- Environment Variables
- Local Development
- Testing
- API Reference
- Methodology
- Data Sources
- Deployment
- Troubleshooting
- Contributing
- License
Security teams struggle to quantify breach impact in financial terms. Board members ask "how much will this cost?" and get vague answers. BreachAlpha provides a data-driven number grounded in how the market actually prices in breach events.
The approach is borrowed from event study methodology in financial economics — the same technique regulators and academics use to measure the impact of earnings announcements, mergers, and litigation on stock prices.
breachalpha/ # Python backend (FastAPI + XGBoost)
├── server.py # FastAPI app, middleware, SPA catch-all
├── schemas.py # 30+ Pydantic request/response models
├── core/
│ ├── constants.py # Risk weights, feature columns, severity labels
│ ├── exceptions.py # Domain exceptions (decoupled from HTTPException)
│ └── http.py # Shared HTTP session, SSRF validation
├── services/
│ ├── model.py # Model loading, synthetic training, batch scoring
│ ├── scoring.py # Ticker validation, breach search, scoring pipeline
│ └── file_upload.py # Upload validation, temp file management
├── routes/
│ ├── meta.py # Health, demo, cache, data sources
│ ├── score.py # /api/score, /api/score/auto, /api/score/config
│ ├── upload.py # /api/upload, /api/upload/analyze
│ ├── explain.py # /api/explain, /api/explain/auto
│ ├── search.py # /api/search, /api/breach-search
│ ├── llm.py # /api/llm/* (optional LLM enrichment)
│ └── admin.py # /api/train, /api/data-sources/configure
├── breach_search.py # Internet breach search (Yahoo News + DuckDuckGo)
├── ticker_resolver.py # Company name → stock ticker (200+ mappings)
├── ticker_search.py # Live ticker search (Yahoo Finance + NSE India)
├── stock_loader.py # Multi-source stock data fetcher with caching
├── data_sources.py # YFinance, Alpha Vantage, NSE India, Yahoo scrape
├── feature_engine.py # Event study: AR, CAR, volatility, recovery
├── model.py # XGBoost severity classifier
├── preprocessor.py # CSV/XLSX/Excel preprocessing pipeline
├── explainability.py # Step-by-step calculation breakdown
├── llm_integration.py # LM Studio client (optional)
└── cli.py # CLI: demo, train, score
frontend/ # React + Vite + Tailwind CSS
├── src/
│ ├── App.jsx # Dashboard: 4 tabs + LLM panel
│ ├── index.css # Tailwind + terminal aesthetic
│ └── components/
│ ├── score/ # ScoreForm, RiskGauge, ProbabilityBar, FeaturesChart
│ ├── upload/ # FileUpload, DatasetPreview, BatchResults
│ ├── explain/ # ExplainabilityPanel
│ ├── llm/ # LLMAnalysisPanel
│ ├── demos/ # DemoCard
│ ├── settings/ # SettingsPanel
│ ├── layout/ # Header, TabBar, Footer
│ └── ui/ # shadcn/ui primitives
└── package.json
tests/ # 144 tests across 11 modules
- Domain exceptions decoupled from HTTP. Services raise BreachAlphaError subclasses. A global handler in server.py translates them to HTTP status codes. This keeps business logic framework-agnostic.
- Route modules are factory functions. create_score_routes(limiter) -> APIRouter: the limiter is injected, not global.
- Multi-source stock data. If Yahoo Finance fails, the system falls back through Alpha Vantage → NSE India → Yahoo scraping. Each source has a supports_ticker() gate.
- ProcessPoolExecutor for feature computation. CPU-bound work bypasses the GIL via multiprocessing, not threading.
- No TypeScript. Matches the existing codebase. No added build complexity.
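The first design decision above can be illustrated without any web framework. The following is a minimal sketch, not the project's actual handler: only BreachAlphaError is named in this README, so the subclass names, the status codes, and the helper functions are assumptions made for illustration.

```python
class BreachAlphaError(Exception):
    """Base class for domain errors; it carries no HTTP knowledge."""


class TickerNotFoundError(BreachAlphaError):  # hypothetical subclass
    pass


class InsufficientDataError(BreachAlphaError):  # hypothetical subclass
    pass


# Assumed mapping: the real status codes are not stated in this README.
STATUS_BY_ERROR = {
    TickerNotFoundError: 404,
    InsufficientDataError: 422,
}


def to_http_status(exc):
    """Translate a domain error into an HTTP status code at the API boundary."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 500  # any other domain error


def resolve_ticker(company):
    """Service-layer code raises domain errors only, never HTTP errors."""
    known = {"equifax": "EFX"}
    try:
        return known[company.lower()]
    except KeyError:
        raise TickerNotFoundError(company) from None
```

Because the service function never imports anything HTTP-related, it can be reused from the CLI or from tests, and only the boundary layer needs to know about status codes.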
- Python 3.10+
- Node.js 18+
- npm
- Optional: LM Studio running on 192.168.56.1:1234 with a chat model (e.g., qwen3.5-9b) for LLM enrichment features.
# Clone the repo
git clone https://github.com/AshayK003/BreachAlpha.git
cd BreachAlpha
# Create a virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # macOS/Linux
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
# Start the backend
uvicorn breachalpha.server:app --reload --port 8000
The API is now at http://localhost:8000. The model trains on synthetic data on first use (~2s).
cd frontend
npm install
npm run dev
Opens at http://localhost:3000. Vite proxies /api requests to localhost:8000.
| Variable | Default | Purpose |
|---|---|---|
| BREACHALPHA_ADMIN_KEY | "" (disabled) | Required for admin endpoints (/api/train, /api/data-sources/configure, DELETE /api/cache). When empty, these return 503. |
| BREACHALPHA_LLM_URL | http://192.168.56.1:1234 | LM Studio server URL for optional LLM features. |
| BREACHALPHA_CORS_ORIGINS | http://localhost:3000,http://127.0.0.1:3000 | Comma-separated allowed CORS origins. |
| ALPHA_VANTAGE_API_KEY | "" | Optional Alpha Vantage API key for stock data fallback. Free tier: 25 calls/day. |
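As a rough illustration of how these variables and defaults could be read, here is a minimal sketch. The Settings class and load_settings function are hypothetical names introduced for this example; the backend's actual configuration code may be organized differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:  # hypothetical container, not part of BreachAlpha
    admin_key: str
    llm_url: str
    cors_origins: list
    alpha_vantage_key: str

    @property
    def admin_enabled(self):
        # An empty admin key means admin endpoints stay disabled (503).
        return bool(self.admin_key)


def load_settings(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    origins = env.get(
        "BREACHALPHA_CORS_ORIGINS",
        "http://localhost:3000,http://127.0.0.1:3000",
    )
    return Settings(
        admin_key=env.get("BREACHALPHA_ADMIN_KEY", ""),
        llm_url=env.get("BREACHALPHA_LLM_URL", "http://192.168.56.1:1234"),
        # Split the comma-separated list and drop stray whitespace.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        alpha_vantage_key=env.get("ALPHA_VANTAGE_API_KEY", ""),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in a test, for example load_settings({}).admin_enabled evaluates to False.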
# Terminal 1: Backend (hot-reload)
uvicorn breachalpha.server:app --reload --port 8000
# Terminal 2: Frontend (hot-reload)
cd frontend && npm run dev
The frontend dev server at :3000 proxies API calls to :8000 via Vite config.
# Run demo with 3 famous breaches (Equifax, Capital One, Marriott)
breachalpha demo
# Score a company
breachalpha score --company Equifax
# Train on a real breach dataset
breachalpha train --data data/breaches.csv
The model trains automatically on synthetic data if no trained model exists. This takes ~2 seconds and produces a basic classifier. For better accuracy, train on real breach data via the admin endpoint or CLI.
# Run all tests
pytest
# With coverage
pytest --cov=breachalpha --cov-report=term-missing
# Run a specific test file
pytest tests/test_routes_api.py -v
# Run a specific test
pytest tests/test_routes_api.py::test_score_company -v
The test suite covers:
- Data loading, preprocessing, and feature computation
- Model training, prediction, and batch scoring
- Ticker resolution and validation
- API endpoint behavior (via httpx AsyncClient)
- File upload validation
- Security middleware (admin auth, rate limiting)
The project enforces a minimum of 60% coverage (pyproject.toml). Run with --cov-fail-under=60 in CI.
| Method | Endpoint | Rate Limit | Description |
|---|---|---|---|
| GET | /api/health | none | Health check + model status |
| POST | /api/score | 10/min | Score a single company |
| POST | /api/score/config | 10/min | Score with custom analysis config |
| POST | /api/score/auto | 5/min | Auto-search breach data and score |
| GET | /api/search | 30/min | Search stock tickers |
| GET | /api/breach-search | 10/min | Search breach incidents |
| GET | /api/demo | none | Demo with 3 famous breaches |
| POST | /api/explain | 10/min | Explainability report |
| POST | /api/explain/auto | 5/min | Auto-search + explain |
| Method | Endpoint | Rate Limit | Description |
|---|---|---|---|
| POST | /api/upload | 10/min | Upload & preprocess dataset |
| POST | /api/upload/analyze | 5/min | Upload + batch analyze |
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/llm/status | Check LLM availability |
| POST | /api/llm/analyze-dataset | AI analysis of batch results |
| POST | /api/llm/risk-summary | AI risk summary |
| POST | /api/llm/ask | Q&A about breach data |
| POST | /api/llm/enrich | Enrich records with LLM |
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/train | Train model on breach CSV |
| POST | /api/data-sources/configure | Configure data sources |
| DELETE | /api/cache | Clear stock cache |
curl -X POST http://localhost:8000/api/score \
-H "Content-Type: application/json" \
-d '{"company": "Equifax", "breach_type": "data_leak", "records_affected": 147000000, "breach_date": "2017-09-07"}'
Response:
{
"company": "Equifax",
"ticker": "EFX",
"risk_score": 72.5,
"prediction": "high",
"confidence": 0.84,
"probabilities": {"low": 0.02, "medium": 0.10, "high": 0.72, "critical": 0.16},
"features": {
"abnormal_return_day0": -0.0921,
"car_minus5_plus30": -0.1834,
"volatility_spike": 2.4,
"time_to_recovery": 45
}
}
- Abnormal Return: AR = R_stock - R_market, which isolates breach-specific impact from market movement
- Cumulative AR: CAR = Σ AR over the event window, the total breach impact over time
- Features: AR at Day 0/1/5/30, CAR windows (-1,+1) and (-5,+30), volatility spike, volume change, recovery time, breach size
- Model: XGBoost classifier trained to predict severity (Low/Medium/High/Critical)
- Risk Score: Weighted probability sum mapped to 0-100: Low(10)×P(low) + Medium(35)×P(medium) + High(65)×P(high) + Critical(95)×P(critical)
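The risk score formula can be worked through in a few lines. This is a sketch using the weights stated above; the function name, the validation step, the rounding to one decimal place, and the example probabilities are assumptions chosen for illustration.

```python
# Severity weights as listed in the formula above.
SEVERITY_WEIGHTS = {"low": 10, "medium": 35, "high": 65, "critical": 95}


def risk_score(probabilities):
    """Weighted sum of class probabilities, rounded to one decimal place."""
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    weighted = sum(SEVERITY_WEIGHTS[label] * p for label, p in probabilities.items())
    return round(weighted, 1)


# Illustrative probabilities, not taken from a real model run.
example = {"low": 0.10, "medium": 0.20, "high": 0.50, "critical": 0.20}
print(risk_score(example))  # 1.0 + 7.0 + 32.5 + 19.0 = 59.5
```

With these weights the score is bounded by 10 (all probability on Low) and 95 (all probability on Critical), so it never reaches the extremes of the 0-100 scale.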
The market aggregates all available information into stock prices. When a breach is disclosed, the price change reflects the market's assessment of the financial damage — accounting for company size, sector, market conditions, and breach specifics. This is more robust than estimating costs from headline numbers alone.
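To make the abnormal-return idea concrete, here is a minimal sketch of the market-adjusted definitions AR = R_stock - R_market and CAR = Σ AR given in the methodology list. The function names and the return values are made up for illustration; the project's feature_engine.py may compute these differently (for example with a fitted market model).

```python
def abnormal_returns(stock_returns, market_returns):
    """AR_t = R_stock,t - R_market,t for each day in the window."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    return [rs - rm for rs, rm in zip(stock_returns, market_returns)]


def cumulative_abnormal_return(ar_by_day, start, end):
    """Sum abnormal returns over event days start..end (inclusive)."""
    return sum(ar for day, ar in ar_by_day.items() if start <= day <= end)


# Event days relative to disclosure (day 0); all numbers are illustrative.
days = [-1, 0, 1]
stock = [-0.01, -0.09, -0.03]
market = [0.00, 0.01, -0.01]

ar_by_day = dict(zip(days, abnormal_returns(stock, market)))
car = cumulative_abnormal_return(ar_by_day, -1, 1)
print(round(car, 4))
```

In this toy window the stock falls 9% on day 0 while the market rises 1%, so the day-0 abnormal return is about -10%, and the (-1,+1) CAR adds the smaller moves on either side.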
| Source | Tickers | Fallback Priority | Notes |
|---|---|---|---|
| yfinance (curl_cffi) | All | 1 (primary) | Uses Chrome TLS fingerprint to bypass blocks |
| Alpha Vantage | All | 2 | Requires free API key. 25 calls/day. |
| NSE India | .NS, .BO | 3 | Direct API for Indian stocks |
| Yahoo Finance scrape | All | 4 | HTML scraping fallback |
The system automatically tries each source in priority order. Stock data is cached locally in data/stock_cache/ (24h TTL).
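The priority-order fallback can be sketched as a loop that skips sources whose supports_ticker() gate rejects the ticker and moves on when a source fails or returns nothing. StubSource and fetch_with_fallback are hypothetical names for this illustration, the prices are dummy values, and caching is left out; the real DataFetcher may behave differently.

```python
class StubSource:
    """Hypothetical stand-in for a BreachAlpha data source."""

    def __init__(self, name, fetch_fn, suffixes=None):
        self.name = name
        self._fetch_fn = fetch_fn
        self._suffixes = suffixes  # None means the source accepts all tickers

    def supports_ticker(self, ticker):
        return self._suffixes is None or ticker.upper().endswith(self._suffixes)

    def fetch(self, ticker):
        return self._fetch_fn(ticker)


def fetch_with_fallback(ticker, sources):
    """Try each source in priority order; return (source name, data)."""
    errors = []
    for source in sources:
        if not source.supports_ticker(ticker):
            continue  # gate: this source does not handle the ticker
        try:
            data = source.fetch(ticker)
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{source.name}: {exc}")
            continue
        if data:
            return source.name, data
        errors.append(f"{source.name}: empty result")
    raise LookupError(f"no source returned data for {ticker}: {errors}")


def blocked(ticker):
    raise ConnectionError("request blocked")


# Same priority order as the table above, with dummy behaviors.
sources = [
    StubSource("yfinance", blocked),
    StubSource("alpha_vantage", lambda t: []),
    StubSource("nse_india", lambda t: [100.0], suffixes=(".NS", ".BO")),
    StubSource("yahoo_scrape", lambda t: [142.7, 141.9]),
]
print(fetch_with_fallback("EFX", sources)[0])     # yahoo_scrape
print(fetch_with_fallback("TCS.NS", sources)[0])  # nse_india
```

Collecting the per-source errors before raising keeps the final failure message useful when every source in the chain has been exhausted.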
BreachAlpha maps company names to tickers using:
- A hardcoded dictionary of 200+ companies (US, India, Europe, Asia)
- Live search via Yahoo Finance and NSE India
- Indian stock suffix detection (.NS, .BO)
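The three resolution steps above might combine roughly as follows. This is a sketch under assumptions: the dictionary holds only a few sample entries, the order of the steps is a guess, and resolve and live_search are hypothetical names rather than the actual ticker_resolver.py API.

```python
# A tiny sample of the name-to-ticker dictionary (the real one has 200+ entries).
KNOWN_TICKERS = {
    "equifax": "EFX",
    "capital one": "COF",
    "marriott": "MAR",
}

INDIAN_SUFFIXES = (".NS", ".BO")


def is_indian_ticker(text):
    """Detect tickers that already carry an Indian exchange suffix."""
    return text.strip().upper().endswith(INDIAN_SUFFIXES)


def resolve(name, live_search=None):
    """Resolve a company name to a ticker, or return None if unresolved."""
    key = name.strip().lower()
    if key in KNOWN_TICKERS:       # 1. hardcoded dictionary
        return KNOWN_TICKERS[key]
    if is_indian_ticker(name):     # 2. input is already a suffixed ticker
        return name.strip().upper()
    if live_search is not None:    # 3. caller-supplied live search (assumed hook)
        return live_search(name)
    return None
```

For example, resolve("Equifax") returns "EFX" from the dictionary, while resolve("tcs.ns") is passed through as "TCS.NS" because of its suffix.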
# Build frontend
cd frontend
npm run build
cd ..
# Start backend (serves SPA from frontend/dist)
uvicorn breachalpha.server:app --host 0.0.0.0 --port 8000
When frontend/dist/ exists, the backend serves it as static files with SPA catch-all routing.
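# Stage 1: build the frontend (python:3.12-slim ships without Node.js/npm)
FROM node:20-slim AS frontend
WORKDIR /app/frontend
COPY frontend/ .
RUN npm install && npm run build
# Stage 2: Python backend that serves the built SPA
FROM python:3.12-slim
WORKDIR /app
COPY . .
COPY --from=frontend /app/frontend/dist ./frontend/dist
RUN pip install -e ".[dev]"
EXPOSE 8000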
CMD ["uvicorn", "breachalpha.server:app", "--host", "0.0.0.0", "--port", "8000"]
- Admin auth: Set BREACHALPHA_ADMIN_KEY to a strong secret. Admin endpoints return 503 without it.
- CORS: Set BREACHALPHA_CORS_ORIGINS to your production domain.
- Rate limiting: Default 120 req/min per IP. Adjust in server.py if needed.
- Stock cache: Stored in data/stock_cache/. Persist this volume across deployments to avoid re-fetching.
- Model: Trains on synthetic data if no trained model exists. Train on real data in production for better accuracy.
- LLM: Optional. Requires LM Studio running separately. The backend works fully without it.
The ticker couldn't be resolved or Yahoo Finance returned no data. Check:
- Is the ticker valid? Try curl "localhost:8000/api/search?q=COMPANY"
- Is the company in KNOWN_TICKERS (ticker_resolver.py)? Add it if missing.
- For Indian stocks, use the .NS suffix (e.g., TCS.NS).
Fewer than 30 trading days of stock data around the breach date. Common causes:
- Breach date is too recent (not enough post-breach data)
- Company is thinly traded or delisted
- Try extending start_date in Settings
BREACHALPHA_ADMIN_KEY is not set. Set it:
export BREACHALPHA_ADMIN_KEY="your-secret-key"
curl -H "X-Admin-Key: your-secret-key" -X POST http://localhost:8000/api/train ...
The LLM panel shows "Connect LM Studio" when the backend can't reach BREACHALPHA_LLM_URL. Ensure:
- LM Studio is running
- A model is loaded
- The URL is correct (http://192.168.56.1:1234 by default)
cd frontend
rm -rf node_modules
npm install
npm run build
pip install -e ".[dev]"
Make sure you're in the project root and the virtual environment is activated.
- Fork the repository
- Create a feature branch: git checkout -b feature/my-change
- Make changes and add tests
- Run pytest: all 144 tests must pass
- Run pytest --cov=breachalpha --cov-fail-under=60: coverage must not drop
- Submit a pull request
- Python: Follow existing style. No type annotations on internal helpers unless they add clarity.
- Frontend: Plain JavaScript (no TypeScript). Functional components with hooks. shadcn/ui primitives.
- Tests: Write tests for new features. Aim for behavior coverage, not line coverage.
- Commit messages: Short imperative: "add SSRF validation", "fix CSV injection", "remove dead code"
- Create a route function in the appropriate routes/*.py file
- Add request/response models to schemas.py
- Add domain exceptions to core/exceptions.py if needed
- Register the route in server.py via the factory pattern
- Write tests in tests/test_routes_api.py
- Subclass DataSource in data_sources.py
- Implement fetch(), supports_ticker(), and name
- Add it to DataFetcher.sources and FetcherConfig.sources_priority
- Write tests in a new test_data_sources.py
MIT