Architect: KAAD · Model Version: KAAD-1.1.0
rpds/
├── backend/ ← FastAPI Python backend
│ ├── config.py ← Central configuration
│ ├── logging_config.py ← Structured JSON logging (structlog)
│ ├── connectivity.py ← Cached online/offline checker
│ ├── circuit_breaker.py ← Async circuit breaker
│ ├── url_validator.py ← URL validation & normalisation
│ ├── feature_extractor.py ← Structural feature extractor
│ ├── main.py ← FastAPI app entry point
│ ├── api/
│ │ ├── routes.py ← /analyze, /health, /status
│ │ ├── schemas.py ← Pydantic v2 models
│ │ └── middleware.py ← CORS + request logging
│ ├── engines/
│ │ ├── whitelist_engine.py ← Trusted domain bypass
│ │ ├── blacklist_engine.py ← OpenPhish feed + cache
│ │ ├── cnn_engine.py ← Char-CNN inference (PyTorch CPU)
│ │ ├── tree_engine.py ← LightGBM/RandomForest
│ │ └── orchestrator.py ← KAAD Core: adaptive fusion
│ ├── training/ ← Standalone offline training scripts
│ │ ├── dataset_utils.py ← Deduplicate, balance, split
│ │ ├── train_cnn.py ← CNN training (FocalLoss + 5-fold CV)
│ │ ├── train_tree.py ← Tree model training
│ │ ├── calibrate.py ← Temperature scaling calibration
│ │ └── tune_threshold.py ← ROC threshold tuning
│ ├── models/ ← Trained model files (generated by training)
│ └── data/
│ ├── raw/ ← Place your phishing dataset CSV here
│ └── whitelists/whitelist.txt
├── frontend/ ← Next.js 14 UI
│ ├── app/
│ │ ├── layout.tsx
│ │ ├── page.tsx ← Main scan page
│ │ └── globals.css ← Cyber dark theme
│ ├── components/
│ │ ├── StatusBar.tsx ← Online/offline + KAAD branding
│ │ ├── ScanInput.tsx ← Multi-URL scanner input
│ │ ├── RiskGauge.tsx ← Animated SVG arc gauge
│ │ ├── EngineCards.tsx ← Per-engine score cards
│ │ ├── ThreatReasoningPanel.tsx ← Explainability layer
│ │ ├── SystemLog.tsx ← AI system log panel
│ │ └── ResultCard.tsx ← Full result per URL
│ └── lib/api.ts ← Typed fetch client
├── smoke_test.py ← Quick orchestrator verification
├── start_backend.bat ← Windows: install + start backend
└── start_frontend.bat ← Windows: install + start frontend
start_backend.bat
Or manually:
cd backend
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
start_frontend.bat
Or manually:
cd frontend
npm install
npm run dev
Open http://localhost:3000 in your browser.
The system works without trained models: it falls back to structural risk analysis (entropy, TLD scoring, keyword detection, and similar URL features). The CNN and Tree engines show as "N/A" until they are trained.
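To make the structural fallback concrete, here is a minimal sketch of the three signals named above. It is not the project's `feature_extractor.py`: the lookup tables, the function names, and the `keywords` field are assumptions for illustration. Only the `entropy` and `tld_risk_score` field names, and the 1.0 risk for `.tk`, come from the sample response in this README.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup tables; the project's real lists are not shown in this README.
HIGH_RISK_TLDS = {"tk": 1.0, "ml": 0.9, "ga": 0.9}
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def structural_features(url: str) -> dict:
    """Compute model-free risk signals from the URL string alone."""
    host = urlparse(url).hostname or ""
    # Take the last dot-separated label as the TLD (ignores multi-part suffixes).
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    lowered = url.lower()
    return {
        "entropy": round(shannon_entropy(url), 2),
        "tld_risk_score": HIGH_RISK_TLDS.get(tld, 0.0),
        "keywords": [k for k in SUSPICIOUS_KEYWORDS if k in lowered],
    }
```

Calling `structural_features("https://suspicious-site.tk/login")` would report a `tld_risk_score` of 1.0 and the keyword `login`, which lines up with the two threat-reasoning factors in the sample response.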
Download a phishing URL dataset:
- PhishTank — Free CSV download
- UCI Phishing URLs
Place the CSV in backend/data/raw/.
cd backend
python -m training.dataset_utils data/raw/phishing.csv url label 1
python -m training.train_cnn
# Saves: backend/models/cnn_model.pt
python -m training.train_tree
# Saves: backend/models/tree_model.pkl
python -m training.calibrate
python -m training.tune_threshold
Restart the backend server; the engines will load automatically.
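The last training step is listed in the project tree as ROC threshold tuning. The README does not say which criterion `tune_threshold.py` uses, so the following is only a sketch of one common approach: pick the lowest decision threshold whose false positive rate stays under a cap, which maximises recall under that cap. The function name and the `max_fpr` parameter are assumptions.

```python
import math
from typing import Sequence


def tune_threshold(scores: Sequence[float], labels: Sequence[int],
                   max_fpr: float = 0.01) -> float:
    """Return the lowest threshold whose false positive rate is <= max_fpr.

    A URL is flagged when score >= threshold. Labels are 1 for phishing
    and 0 for legitimate. Returns math.inf if no threshold satisfies the cap.
    """
    negatives = sum(1 for y in labels if y == 0)
    if negatives == 0:
        raise ValueError("need at least one legitimate (label 0) sample")
    best = math.inf
    # Walk candidate thresholds from strictest to loosest; the false positive
    # rate can only grow as the threshold drops, so stop at the first violation.
    for t in sorted(set(scores), reverse=True):
        false_positives = sum(
            1 for s, y in zip(scores, labels) if y == 0 and s >= t
        )
        if false_positives / negatives > max_fpr:
            break
        best = t
    return best
```

The scores passed in should be calibrated validation scores, not training scores, otherwise the chosen threshold will be optimistic.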
Base URL: http://localhost:8000
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/analyze | POST | Analyze 1–10 URLs |
| /api/v1/health | GET | System health + engine status |
| /api/v1/status | GET | Version + mode info |
| /docs | GET | Swagger UI |
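The endpoints above can be called from any HTTP client. As an illustration, here is a small standard-library Python client for the analyze endpoint. The helper names are hypothetical; the base URL, path, `urls` field, and the 1 to 10 URL limit come from this README. The response shape for several URLs is not documented here, so the parsed JSON is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"
MAX_URLS = 10  # the API accepts 1 to 10 URLs per request


def build_analyze_request(urls: list[str]) -> urllib.request.Request:
    """Build the POST request for /api/v1/analyze without sending it."""
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"expected 1 to {MAX_URLS} URLs, got {len(urls)}")
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(urls: list[str], timeout: float = 10.0):
    """Send the request and return the decoded JSON response."""
    request = build_analyze_request(urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `analyze(["https://suspicious-site.tk/login"])` sends the request; checking the batch size client-side avoids a round trip that the server would reject anyway.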
POST /api/v1/analyze
{ "urls": ["https://suspicious-site.tk/login"] }
{
"engine": "RPDS – KAAD CORE",
"architect": "KAAD",
"mode": "online",
"url": "http://suspicious-site.tk/login",
"final_verdict": "HIGH RISK",
"final_score": 0.73,
"confidence_class": "HIGH",
"engines": {
"whitelist": {"hit": false, "matched": null},
"blacklist": {"hit": false, "score": 0.0, "available": true},
"cnn": {"score": 0.0, "available": false},
"tree": {"score": 0.0, "available": false, "model_type": "none"}
},
"threat_reasoning": [
{"factor": "High-risk TLD (risk=1.0)", "impact": "TLD commonly abused for phishing campaigns"},
{"factor": "Suspicious keyword detected", "impact": "Common phishing pattern"}
],
"structural_analysis": { "entropy": 3.8, "tld_risk_score": 1.0, ... },
"threat_signature": "sha256hex...",
"model_version": "KAAD-1.1.0",
"analysis_time_ms": 1.3
}

| Rule | Implementation |
|---|---|
| Models load once | @app.on_event("startup") |
| No retraining in inference | Training scripts are standalone |
| Blacklist timeout + circuit breaker | circuit_breaker.py |
| All engines try/except wrapped | Returns degraded result, never crashes |
| Offline fallback | CNN + Tree only if no internet |
| Max 10 URLs / request | Pydantic max_length=10 |
| Max URL length 2048 | Validator + Pydantic |
| Global exception handler | 500 → JSON error (never stack trace) |
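The "timeout + circuit breaker" rule in the table above can be sketched as follows. This is a generic async circuit breaker, not the contents of `circuit_breaker.py`: the class name, parameters, and default values are assumptions. After `max_failures` consecutive failures (including timeouts) the breaker opens and rejects calls immediately until `reset_after` seconds have passed, then lets a single trial call through.

```python
import asyncio
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class AsyncCircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 timeout: float = 2.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.timeout = timeout
        self._failures = 0
        self._opened_at = None  # monotonic timestamp while open, else None

    async def call(self, func, *args):
        """Await func(*args) with a timeout, tracking consecutive failures."""
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = await asyncio.wait_for(func(*args), timeout=self.timeout)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = time.monotonic()
            raise
        self._failures = 0
        return result
```

A caller such as the blacklist engine would catch both `CircuitOpenError` and the underlying error, then return a degraded result (for example `"available": false`) so the request never crashes.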

| Metric | Target |
|---|---|
| Accuracy | > 99% |
| Precision | > 98% |
| Recall | > 97% |
| False Positive Rate | < 1% |
| ROC-AUC | > 0.995 |
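To check a trained model against the targets above, the first four metrics can be computed from confusion-matrix counts. The sketch below uses hypothetical helper names and treats phishing as the positive class; ROC-AUC is left out because it needs the raw scores, not just the counts.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the table's count-based metrics (phishing = positive class)."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
    }


def meets_targets(metrics: dict) -> bool:
    """Compare against the targets listed in the table above."""
    return (
        metrics["accuracy"] > 0.99
        and metrics["precision"] > 0.98
        and metrics["recall"] > 0.97
        and metrics["false_positive_rate"] < 0.01
    )
```

For example, with 990 true positives, 5 false positives, 995 true negatives, and 10 false negatives, accuracy is 1985/2000 = 0.9925 and the false positive rate is 5/1000 = 0.005, so all four targets are met.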