
Chaelsoo/Multi-Model-IDS

Multi-Model IDS

Anomaly-based network intrusion detection using Isolation Forest, trained on benign traffic only, with no attack labels required. Benchmarked against supervised classifiers on the full CICIDS2017 dataset (2.8M flows, 14 attack types).


The idea

Most intrusion detection systems are signature-based: they only catch attacks they have seen before. This project uses anomaly detection instead: train exclusively on normal traffic, then flag anything that deviates. The model never sees a single attack sample during training.

Train on benign traffic only (1.5M flows)
             ↓
     Learn what "normal" looks like
             ↓
  Flag deviations as intrusions at inference
             ↓
  No signatures. No labeled attacks. Detects unknowns.
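The flow above can be sketched with scikit-learn's `IsolationForest`. This is a minimal illustration under stated assumptions, not the notebook's code: the data is synthetic (Gaussian blobs standing in for flow features), and the hyperparameters are library defaults rather than the values used in this project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT CICIDS2017): benign points near the origin,
# "attack" points shifted far away from them.
X_benign_train = rng.normal(0.0, 1.0, size=(2000, 4))
X_benign_test = rng.normal(0.0, 1.0, size=(500, 4))
X_attack_test = rng.normal(6.0, 1.0, size=(100, 4))

# Fit on benign traffic only: no labels are passed to fit().
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(X_benign_train)

# predict() returns +1 for inliers and -1 for outliers.
benign_flagged = float(np.mean(model.predict(X_benign_test) == -1))
attack_flagged = float(np.mean(model.predict(X_attack_test) == -1))

print(f"benign flagged as anomalous: {benign_flagged:.1%}")
print(f"attacks flagged as anomalous: {attack_flagged:.1%}")
```

Some benign points are always flagged too, which is the false-positive cost of learning only what "normal" looks like.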

Results

Evaluated on 933,200 test samples (749,536 benign / 183,664 attacks) across all 14 attack types in CICIDS2017: DoS Hulk, PortScan, DDoS, DoS GoldenEye, FTP-Patator, SSH-Patator, DoS Slowloris, DoS Slowhttptest, Bot, Web Attack Brute Force, Web Attack XSS, Infiltration, Web Attack SQL Injection, Heartbleed.

| Model | Type | Accuracy | Attack Recall | Training data |
|---|---|---|---|---|
| Isolation Forest | anomaly detection | 73.7% | 47% | benign only, zero attack labels |
| Logistic Regression | supervised | 92.4% | 76% | full labeled dataset |
| KNN | supervised | 99.3% | 99% | full labeled dataset |
| Decision Tree | supervised | 99.9% | 99.9% | full labeled dataset |

Isolation Forest: Anomaly Score Distribution

[Figure: Anomaly Score Distribution]

Benign traffic clusters around 0.3, while attacks spread toward higher scores. The separation is visible even though no attack labels were seen during training: IF is picking up genuine deviations from normal behavior.
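To turn a score distribution like this into alerts, one common option is to pick a cut-off from benign scores alone. The sketch below is an assumption, not something the notebook is known to do: `threshold_for_fpr` is a hypothetical helper, and the scores are synthetic values chosen only to mimic the shape described (benign near 0.3, attacks higher).

```python
import random


def threshold_for_fpr(benign_scores, target_fpr):
    """Return a score above which roughly target_fpr of benign samples fall."""
    ordered = sorted(benign_scores)
    # Index of the (1 - target_fpr) quantile, clamped to the valid range.
    k = min(len(ordered) - 1, int((1.0 - target_fpr) * len(ordered)))
    return ordered[k]


rng = random.Random(0)
# Synthetic stand-in scores (NOT taken from the notebook).
benign = [rng.gauss(0.30, 0.03) for _ in range(10_000)]
attacks = [rng.gauss(0.45, 0.08) for _ in range(2_000)]

thr = threshold_for_fpr(benign, target_fpr=0.05)
fpr = sum(s > thr for s in benign) / len(benign)
recall = sum(s > thr for s in attacks) / len(attacks)

print(f"threshold={thr:.3f}  benign flagged={fpr:.1%}  attack recall={recall:.1%}")
```

Lowering `target_fpr` moves the threshold right, trading attack recall for fewer false alarms. No attack labels are needed to choose it.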

Isolation Forest: Confusion Matrix

[Figure: Confusion Matrix]


What the numbers actually mean:

Isolation Forest achieves 47% attack recall without having seen a single attack sample during training. The supervised models need thousands of labeled attacks to reach 99%; IF needs none. The gap is the cost of zero label dependency, and the payoff is the ability to detect attack patterns that have never been seen before.

The 99%+ scores of the supervised models on CICIDS2017 are also somewhat misleading: DoS and PortScan traffic has very distinctive flow-level signatures that a decision tree can learn in a couple of splits. Real production traffic is far noisier. Because IF models normal behavior rather than known attacks, it should be more robust to novel attacks.
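As a back-of-the-envelope check, the Isolation Forest row can be unpacked using only the figures quoted in Results. The inputs are the rounded percentages from the table, so the derived values are approximate and may differ slightly from the confusion matrix plot.

```python
# Figures quoted in the Results section (rounded, so outputs are approximate).
n_benign, n_attack = 749_536, 183_664
accuracy, attack_recall = 0.737, 0.47

n_total = n_benign + n_attack

true_pos = attack_recall * n_attack          # attacks correctly flagged
true_neg = accuracy * n_total - true_pos     # benign flows correctly passed
benign_recall = true_neg / n_benign
false_pos = n_benign - true_neg              # benign flows flagged as attacks
precision = true_pos / (true_pos + false_pos)

# Accuracy of a trivial model that labels every flow as benign.
majority_baseline = n_benign / n_total

print(f"benign recall     ~ {benign_recall:.1%}")
print(f"alert precision   ~ {precision:.1%}")
print(f"majority baseline = {majority_baseline:.1%}")
```

With roughly 80% of the test set benign, accuracy alone says little here. Attack recall and the false-positive rate are the more informative numbers when comparing the models.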


Dataset

CICIDS2017, from the Canadian Institute for Cybersecurity, University of New Brunswick: 8 CSV files covering a full work week of labeled network flows.

| Label | Count |
|---|---|
| BENIGN | 2,273,097 |
| DoS Hulk | 231,073 |
| PortScan | 158,930 |
| DDoS | 128,027 |
| DoS GoldenEye | 10,293 |
| FTP-Patator | 7,938 |
| SSH-Patator | 5,897 |
| DoS Slowloris | 5,796 |
| DoS Slowhttptest | 5,499 |
| Bot | 1,966 |
| Web Attack Brute Force | 1,507 |
| Web Attack XSS | 652 |
| Infiltration | 36 |
| Web Attack SQL Injection | 21 |
| Heartbleed | 11 |

Labels were binary-encoded: BENIGN=0, any attack=1. Attack ratio: 19.68%.
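The binary encoding can be illustrated directly on the counts listed above. This sketch collapses every non-BENIGN label to 1 and recomputes the attack ratio. On the counts as listed it comes out near 19.70%, slightly above the quoted 19.68%. One plausible explanation (an assumption, not confirmed by the source) is that the quoted ratio was measured after NaN/Inf rows were dropped.

```python
# Per-label flow counts as listed in the Dataset table.
label_counts = {
    "BENIGN": 2_273_097,
    "DoS Hulk": 231_073,
    "PortScan": 158_930,
    "DDoS": 128_027,
    "DoS GoldenEye": 10_293,
    "FTP-Patator": 7_938,
    "SSH-Patator": 5_897,
    "DoS Slowloris": 5_796,
    "DoS Slowhttptest": 5_499,
    "Bot": 1_966,
    "Web Attack Brute Force": 1_507,
    "Web Attack XSS": 652,
    "Infiltration": 36,
    "Web Attack SQL Injection": 21,
    "Heartbleed": 11,
}

# Binary encoding: BENIGN -> 0, any attack -> 1.
binary_counts = {0: 0, 1: 0}
for label, count in label_counts.items():
    binary_counts[0 if label == "BENIGN" else 1] += count

attack_ratio = binary_counts[1] / sum(binary_counts.values())
print(f"benign={binary_counts[0]:,}  attack={binary_counts[1]:,}  ratio={attack_ratio:.2%}")
```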


Notebook

The full pipeline is in ml/network-intrusion-detection-system.ipynb and can be run directly on Kaggle with the dataset attached.

Pipeline:

  1. Load and merge all 8 CSVs
  2. Drop NaN and Inf rows
  3. Binary encode labels (BENIGN vs attack)
  4. MinMaxScaler on all features
  5. 67/33 stratified train/test split
  6. Train IF on benign-only subset of train data
  7. Train supervised baselines on full labeled train data
  8. Evaluate all models on the same test set
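The steps above can be sketched end to end as follows. This is an illustrative outline under assumptions, not the notebook itself: step 1 is replaced by synthetic data so the snippet runs without the CSVs, only one supervised baseline is shown, and all hyperparameters and random seeds are placeholders. It follows the listed order (scaling before splitting). Fitting the scaler on the training split only is a common alternative that avoids leaking test-set ranges.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Step 1 stand-in: synthetic flows instead of the merged CICIDS2017 CSVs.
n_benign, n_attack = 4000, 1000
X = np.vstack([
    rng.normal(0.0, 1.0, (n_benign, 5)),
    rng.normal(4.0, 1.0, (n_attack, 5)),
])
labels = np.array(["BENIGN"] * n_benign + ["DoS Hulk"] * n_attack)
X[0, 0] = np.inf  # inject bad values so step 2 has something to drop
X[1, 1] = np.nan

# Step 2: drop rows containing NaN or Inf.
keep = np.isfinite(X).all(axis=1)
X, labels = X[keep], labels[keep]

# Step 3: binary-encode labels (BENIGN=0, any attack=1).
y = (labels != "BENIGN").astype(int)

# Step 4: scale every feature to [0, 1].
X = MinMaxScaler().fit_transform(X)

# Step 5: 67/33 stratified split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

# Step 6: train Isolation Forest on the benign-only part of the training data.
iso = IsolationForest(random_state=42).fit(X_train[y_train == 0])

# Step 7: train a supervised baseline on the full labeled training data.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Step 8: evaluate both on the same test set (IF outputs -1 for anomalies).
iso_pred = (iso.predict(X_test) == -1).astype(int)
tree_pred = tree.predict(X_test)
iso_recall = recall_score(y_test, iso_pred)
tree_recall = recall_score(y_test, tree_pred)
print(f"IF attack recall: {iso_recall:.2f}  tree attack recall: {tree_recall:.2f}")
```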

Project structure

Multi_Model_IDS/
├── ml/
│   └── network-intrusion-detection-system.ipynb
├── lab/
│   ├── docker-compose.yaml     # target (nginx) + attacker (Kali) + capture (tcpdump)
│   ├── attacks.sh              # DoS, brute-force, SQLi, LFI, nmap
│   ├── normal_traffic.sh       # benign HTTP traffic
│   └── benign_anomalies.sh     # edge-case benign patterns
├── results/
│   └── plots/
│       └── if_confusion_matrix.png
├── requirements.txt
└── .gitignore

Attack lab

A Docker Compose environment for generating labeled network traffic, intended as the foundation for building a self-captured dataset.

cd lab
docker compose up -d

# Run simulated attacks from the attacker container
docker exec -it ids_attacker bash /lab/attacks.sh

# PCAP captured to data/captures/traffic.pcap
docker compose down

Three containers: ids_target (nginx), ids_attacker (Kali), ids_capture (tcpdump writing PCAP). Attack scripts simulate Slowloris DoS, login brute-force, SQL injection, LFI path traversal, and nmap scans.

