
Coaad/RPDS_KAAD


RPDS – Real-Time Phishing & Web Threat Detection System

KAAD Cyber Intelligence Core · Mini Brain #1

Architect: KAAD · Model Version: KAAD-1.1.0


🏗 Project Structure

rpds/
├── backend/                    ← FastAPI Python backend
│   ├── config.py               ← Central configuration
│   ├── logging_config.py       ← Structured JSON logging (structlog)
│   ├── connectivity.py         ← Cached online/offline checker
│   ├── circuit_breaker.py      ← Async circuit breaker
│   ├── url_validator.py        ← URL validation & normalisation
│   ├── feature_extractor.py    ← Structural feature extractor
│   ├── main.py                 ← FastAPI app entry point
│   ├── api/
│   │   ├── routes.py           ← /analyze, /health, /status
│   │   ├── schemas.py          ← Pydantic v2 models
│   │   └── middleware.py       ← CORS + request logging
│   ├── engines/
│   │   ├── whitelist_engine.py ← Trusted domain bypass
│   │   ├── blacklist_engine.py ← OpenPhish feed + cache
│   │   ├── cnn_engine.py       ← Char-CNN inference (PyTorch CPU)
│   │   ├── tree_engine.py      ← LightGBM/RandomForest
│   │   └── orchestrator.py     ← KAAD Core: adaptive fusion
│   ├── training/               ← Standalone offline training scripts
│   │   ├── dataset_utils.py    ← Deduplicate, balance, split
│   │   ├── train_cnn.py        ← CNN training (FocalLoss + 5-fold CV)
│   │   ├── train_tree.py       ← Tree model training
│   │   ├── calibrate.py        ← Temperature scaling calibration
│   │   └── tune_threshold.py   ← ROC threshold tuning
│   ├── models/                 ← Trained model files (generated by training)
│   └── data/
│       ├── raw/                ← Place your phishing dataset CSV here
│       └── whitelists/whitelist.txt
├── frontend/                   ← Next.js 14 UI
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx            ← Main scan page
│   │   └── globals.css         ← Cyber dark theme
│   ├── components/
│   │   ├── StatusBar.tsx       ← Online/offline + KAAD branding
│   │   ├── ScanInput.tsx       ← Multi-URL scanner input
│   │   ├── RiskGauge.tsx       ← Animated SVG arc gauge
│   │   ├── EngineCards.tsx     ← Per-engine score cards
│   │   ├── ThreatReasoningPanel.tsx ← Explainability layer
│   │   ├── SystemLog.tsx       ← AI system log panel
│   │   └── ResultCard.tsx      ← Full result per URL
│   └── lib/api.ts              ← Typed fetch client
├── smoke_test.py               ← Quick orchestrator verification
├── start_backend.bat           ← Windows: install + start backend
└── start_frontend.bat          ← Windows: install + start frontend
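The tree names orchestrator.py as the "adaptive fusion" core, but this README does not spell out the fusion rule. The sketch below shows one plausible flow. It assumes that a whitelist hit short-circuits to a safe score and that a blacklist hit forces maximum risk. It also assumes the remaining engine scores are averaged, with weights renormalised over whichever engines are available. ENGINE_WEIGHTS, EngineResult and fuse_scores are hypothetical names, and the weights are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights for illustration only; orchestrator.py may use
# different engines, weights or adaptive rules.
ENGINE_WEIGHTS = {"cnn": 0.45, "tree": 0.35, "structural": 0.20}


@dataclass
class EngineResult:
    score: float     # phishing probability in [0, 1]
    available: bool  # False when the model is missing or the engine failed


def fuse_scores(whitelist_hit: bool, blacklist_hit: bool,
                results: dict[str, EngineResult]) -> Optional[float]:
    """Fuse per-engine results into one risk score (illustrative rules only)."""
    if whitelist_hit:
        return 0.0  # trusted-domain bypass
    if blacklist_hit:
        return 1.0  # URL is on the phishing feed
    # Keep only engines that actually produced a score and have a weight.
    usable = {name: r for name, r in results.items()
              if r.available and name in ENGINE_WEIGHTS}
    if not usable:
        return None  # nothing to fuse; the caller reports a degraded result
    total_weight = sum(ENGINE_WEIGHTS[name] for name in usable)
    weighted = sum(ENGINE_WEIGHTS[name] * r.score for name, r in usable.items())
    return weighted / total_weight
```

Renormalising over the available engines means that a missing model file (an engine shown as "N/A") does not drag the fused score toward zero.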

🚀 Quick Start

Step 1 — Install Python deps & start backend

start_backend.bat

Or manually:

cd backend
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Step 2 — Start frontend

start_frontend.bat

Or manually:

cd frontend
npm install
npm run dev

Open http://localhost:3000 in your browser.


🤖 Training Your Own Models (Optional but recommended for full accuracy)

The system works without trained models. In that case it falls back to structural risk analysis (URL entropy, TLD risk scoring, keyword detection, etc.). The CNN and Tree engines show as "N/A" until they are trained.
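To make the structural fallback concrete, here is a sketch of three of the signals mentioned above: the Shannon entropy of the URL string, a TLD risk lookup and a keyword scan. The TLD_RISK table, the keyword list and the 0.3 default for unknown TLDs are hypothetical values, not the ones feature_extractor.py ships with. The output keys mirror the entropy and tld_risk_score fields shown in the API response below.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical risk table and keyword list, for illustration only;
# the real feature_extractor.py uses its own values.
TLD_RISK = {"tk": 1.0, "ml": 0.9, "ga": 0.9, "xyz": 0.6, "com": 0.1, "org": 0.1}
SUSPICIOUS_KEYWORDS = ("login", "verify", "account", "update", "secure", "bank")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def structural_features(url: str) -> dict:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    lowered = url.lower()
    return {
        "entropy": round(shannon_entropy(url), 2),
        # 0.3 is an assumed default risk for TLDs missing from the table.
        "tld_risk_score": TLD_RISK.get(tld, 0.3),
        "keywords": [k for k in SUSPICIOUS_KEYWORDS if k in lowered],
    }
```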

1. Get a dataset

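Download a labelled phishing URL dataset in CSV format, with one column holding the URL and one holding the label. Place it in backend/data/raw/ (step 2 assumes the file is named phishing.csv).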

2. Prepare dataset

cd backend
python -m training.dataset_utils data/raw/phishing.csv url label 1
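The arguments above appear to be the CSV path, the URL column, the label column and the label value that marks a phishing row. That reading is an assumption, since the README does not document them. dataset_utils.py is described as "Deduplicate, balance, split", and the three steps could be sketched with the standard library alone as shown below. prepare, test_frac and seed are hypothetical names, and balancing here is done by undersampling the majority class, which may differ from the real script.

```python
import random


def prepare(rows, url_col="url", label_col="label", positive="1",
            test_frac=0.2, seed=42):
    """Deduplicate by URL, undersample the majority class, then split per class.

    rows: iterable of dicts (e.g. from csv.DictReader). Returns (train, test)
    lists of (url, label) tuples, where label is 1 for phishing and 0 otherwise.
    """
    seen = set()
    by_class = {0: [], 1: []}
    for row in rows:
        url = row[url_col].strip()
        if not url or url in seen:
            continue  # drop empty and duplicate URLs
        seen.add(url)
        label = 1 if row[label_col].strip() == positive else 0
        by_class[label].append((url, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n = min(len(by_class[0]), len(by_class[1]))  # size of the minority class
    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)
        items = items[:n]  # undersample to the minority size
        cut = int(round(len(items) * (1 - test_frac)))
        train.extend(items[:cut])  # same split ratio per class (stratified)
        test.extend(items[cut:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```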

3. Train Char-CNN

python -m training.train_cnn
# Saves: backend/models/cnn_model.pt
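train_cnn.py is listed as using FocalLoss, which down-weights easy, well-classified examples so training concentrates on hard ones. Below is the standard binary focal loss, FL = -α_t (1 - p_t)^γ log(p_t), written in plain Python so it runs without PyTorch. The α = 0.25 and γ = 2 defaults are the commonly used values and are an assumption about what this project uses.

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
                      eps: float = 1e-7) -> float:
    """Focal loss for a single prediction.

    p: predicted probability of the phishing class; y: true label (1 = phishing).
    alpha/gamma defaults are common choices, assumed rather than taken from train_cnn.py.
    """
    p = min(max(p, eps), 1 - eps)             # clamp to avoid log(0)
    p_t = p if y == 1 else 1 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, **kwargs) -> float:
    """Average focal loss over a batch of predictions."""
    losses = [binary_focal_loss(p, y, **kwargs) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With γ = 0 and α = 0.5 the loss reduces to half the ordinary binary cross-entropy, which makes a quick sanity check.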

4. Train Tree model

python -m training.train_tree
# Saves: backend/models/tree_model.pkl
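Unlike the Char-CNN, a LightGBM or RandomForest model consumes a fixed-length numeric vector. The README does not list the features train_tree.py uses, so the following is a hypothetical lexical feature set (FEATURE_NAMES and lexical_features are illustrative names). One design point holds regardless of the exact features: the feature order must be identical at training time and at inference time.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical feature order; a tree model must see exactly this order
# both when it is trained and when it is used for inference.
FEATURE_NAMES = [
    "url_length", "host_length", "num_dots", "num_hyphens", "num_digits",
    "has_at_symbol", "is_https", "host_is_ip", "path_depth",
]


def _is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> list[float]:
    parts = urlsplit(url)
    host = parts.hostname or ""
    values = {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "is_https": int(parts.scheme == "https"),
        "host_is_ip": int(_is_ip(host)),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }
    return [float(values[name]) for name in FEATURE_NAMES]
```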

5. Calibrate + tune threshold

python -m training.calibrate
python -m training.tune_threshold
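calibrate.py applies temperature scaling and tune_threshold.py picks a threshold from the ROC curve. The sketch below illustrates both ideas, assuming binary logits from the CNN and a held-out validation set. Temperature scaling divides the logits by a single scalar T, chosen to minimise validation negative log-likelihood; here T is found by grid search rather than an optimiser. The threshold is then chosen as the lowest cut-off whose false positive rate stays within a budget. The 1% default budget mirrors the FPR target below, but it is an assumption about how the real script selects its threshold. All function names are hypothetical.

```python
import math


def stable_sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(logits, labels, temperature: float) -> float:
    """Mean binary negative log-likelihood after scaling logits by 1/T."""
    eps = 1e-12  # keeps log() finite when p is exactly 0 or 1
    total = 0.0
    for z, y in zip(logits, labels):
        p = stable_sigmoid(z / temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def fit_temperature(logits, labels, grid=None) -> float:
    """Return the T in `grid` with the lowest validation NLL (grid search)."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5 .. 5.0
    return min(grid, key=lambda t: nll(logits, labels, t))


def tune_threshold(probs, labels, max_fpr: float = 0.01) -> float:
    """Lowest threshold whose false positive rate is <= max_fpr.

    Lowering the threshold never reduces recall, so this maximises recall
    under the FPR budget. Returns math.inf if even the highest score fails.
    """
    negatives = sum(1 for y in labels if y == 0)
    best = math.inf
    for t in sorted(set(probs), reverse=True):
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        if negatives and fp / negatives > max_fpr:
            break  # FPR can only grow as the threshold drops
        best = t
    return best
```

After fitting, a calibrated probability is stable_sigmoid(logit / T). The threshold search is quadratic in the number of samples, which is fine for a sketch; a real implementation would sort once and sweep cumulative counts.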

Restart the backend server — engines will load automatically.


📡 API Reference

Base URL: http://localhost:8000

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/analyze | POST | Analyze 1–10 URLs |
| /api/v1/health | GET | System health + engine status |
| /api/v1/status | GET | Version + mode info |
| /docs | GET | Swagger UI |

Analyze Request

POST /api/v1/analyze
{ "urls": ["https://suspicious-site.tk/login"] }
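Here is a minimal Python client for this endpoint that uses only the standard library. Before sending, it checks the documented limits (1–10 URLs per request, URLs of at most 2048 characters), although the server enforces them anyway. build_analyze_request and analyze are hypothetical helper names, and the base URL assumes the default local setup from Quick Start.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # default local backend from Quick Start
MAX_URLS = 10                       # per-request limit from the API reference
MAX_URL_LENGTH = 2048               # limit listed under Stability Guarantees


def build_analyze_request(urls: list[str], base_url: str = BASE_URL) -> urllib.request.Request:
    """Validate the documented limits client-side and build POST /api/v1/analyze."""
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"expected 1-{MAX_URLS} URLs, got {len(urls)}")
    too_long = [u for u in urls if len(u) > MAX_URL_LENGTH]
    if too_long:
        raise ValueError(f"{len(too_long)} URL(s) exceed {MAX_URL_LENGTH} characters")
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(urls: list[str], timeout: float = 10.0) -> Any:
    """Send the request to a running backend and decode the JSON reply.

    The reply shape for multi-URL requests is not documented in this README,
    so it is returned as decoded JSON without further typing.
    """
    with urllib.request.urlopen(build_analyze_request(urls), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling analyze(["https://suspicious-site.tk/login"]) requires the backend to be running on port 8000.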

Response

{
  "engine": "RPDS – KAAD CORE",
  "architect": "KAAD",
  "mode": "online",
  "url": "https://suspicious-site.tk/login",
  "final_verdict": "HIGH RISK",
  "final_score": 0.73,
  "confidence_class": "HIGH",
  "engines": {
    "whitelist": {"hit": false, "matched": null},
    "blacklist": {"hit": false, "score": 0.0, "available": true},
    "cnn":       {"score": 0.0, "available": false},
    "tree":      {"score": 0.0, "available": false, "model_type": "none"}
  },
  "threat_reasoning": [
    {"factor": "High-risk TLD (risk=1.0)", "impact": "TLD commonly abused for phishing campaigns"},
    {"factor": "Suspicious keyword detected", "impact": "Common phishing pattern"}
  ],
  "structural_analysis": { "entropy": 3.8, "tld_risk_score": 1.0, ... },
  "threat_signature": "sha256hex...",
  "model_version": "KAAD-1.1.0",
  "analysis_time_ms": 1.3
}
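The structural_analysis object above is abbreviated with ..., so the example is not literal JSON. The sketch below shows how a client might condense a single-URL result, using the field names from the example. AnalysisSummary and summarise are hypothetical names. Treating a missing available flag as "available" is an assumption, based on the whitelist entry having no such flag.

```python
from dataclasses import dataclass


@dataclass
class AnalysisSummary:
    url: str
    verdict: str
    score: float
    mode: str
    unavailable_engines: list[str]
    reasons: list[str]


def summarise(result: dict) -> AnalysisSummary:
    """Condense one analysis result (field names taken from the example response)."""
    engines = result.get("engines", {})
    # Assumption: engines without an "available" flag (e.g. whitelist) count as available.
    missing = sorted(name for name, info in engines.items()
                     if not info.get("available", True))
    return AnalysisSummary(
        url=result["url"],
        verdict=result["final_verdict"],
        score=float(result["final_score"]),
        mode=result.get("mode", "unknown"),
        unavailable_engines=missing,
        reasons=[r["factor"] for r in result.get("threat_reasoning", [])],
    )
```

For the example response, unavailable_engines would be ["cnn", "tree"], which tells the UI that the verdict was produced without the trained models.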

🔒 Stability Guarantees

| Rule | Implementation |
|---|---|
| Models load once | @app.on_event("startup") |
| No retraining in inference | Training scripts are standalone |
| Blacklist timeout + circuit breaker | circuit_breaker.py |
| All engines try/except wrapped | Returns degraded result, never crashes |
| Offline fallback | CNN + Tree only if no internet |
| Max 10 URLs / request | Pydantic max_length=10 |
| Max URL length: 2048 characters | Validator + Pydantic |
| Global exception handler | 500 → JSON error (never stack trace) |
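circuit_breaker.py is described as an async circuit breaker guarding the blacklist lookup. The sketch below shows the pattern with asyncio; failure_threshold, reset_timeout and call_timeout are hypothetical names and values. After a run of consecutive failures the breaker opens and rejects calls immediately. Once reset_timeout has elapsed, it lets a trial call through (half-open), closing again on success and re-opening on failure. Each call is also bounded by asyncio.wait_for, matching the "timeout + circuit breaker" rule.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the protected coroutine while the circuit is open."""


class AsyncCircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 call_timeout: float = 5.0) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.call_timeout = call_timeout            # per-call timeout in seconds
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    async def call(self, coro_fn: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open: skipping call")
        try:
            result = await asyncio.wait_for(coro_fn(*args, **kwargs), self.call_timeout)
        except Exception:
            self.failures += 1
            # Open after too many failures, or re-open if a half-open trial failed.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.opened_at = None  # a successful call closes the circuit
        return result
```

In this simplified half-open state, concurrent callers are all let through together. A production breaker would usually admit a single probe.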

🎯 Target Performance (after training on quality dataset)

| Metric | Target |
|---|---|
| Accuracy | > 99% |
| Precision | > 98% |
| Recall | > 97% |
| False Positive Rate | < 1% |
| ROC-AUC | > 0.995 |
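All of these targets except ROC-AUC derive from a binary confusion matrix on a held-out test set. ROC-AUC additionally needs the raw scores, so it is left out here. The helper below checks a trained model against the table, treating phishing as the positive class; Confusion, metrics and meets_targets are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # phishing correctly flagged
    fp: int  # benign wrongly flagged
    tn: int  # benign correctly passed
    fn: int  # phishing missed


def metrics(c: Confusion) -> dict:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0,
        "recall": c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0,
        "false_positive_rate": c.fp / (c.fp + c.tn) if c.fp + c.tn else 0.0,
    }


# Thresholds copied from the target table (ROC-AUC omitted).
TARGETS = {
    "accuracy": (">", 0.99),
    "precision": (">", 0.98),
    "recall": (">", 0.97),
    "false_positive_rate": ("<", 0.01),
}


def meets_targets(m: dict) -> dict:
    """Map each metric name to True if it satisfies its target."""
    return {k: (m[k] > v if op == ">" else m[k] < v) for k, (op, v) in TARGETS.items()}
```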

About

RPDS – Real-Time Phishing & Web Threat Detection System | KAAD Cyber Intelligence Core | Hybrid Char-CNN + LightGBM + Blacklist threat detection engine
