Multi-Model IDS

Anomaly-based network intrusion detection using Isolation Forest, trained on benign traffic only with no attack labels required. Benchmarked against supervised classifiers on the full CICIDS2017 dataset (2.8M flows, 14 attack types).

The idea

Most intrusion detection systems are signature-based: they only catch attacks they've seen before. This project uses anomaly detection instead: train exclusively on normal traffic and flag anything that deviates. The model never sees a single attack sample during training.

Train on benign traffic only (1.5M flows)
             ↓
     Learn what "normal" looks like
             ↓
  Flag deviations as intrusions at inference
             ↓
  No signatures. No labeled attacks. Detects unknowns.
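
The flow above can be sketched with scikit-learn's IsolationForest. This is a minimal illustration on synthetic data, not the notebook's code: the feature arrays, the hyperparameters, and the flag_intrusions helper are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-ins for scaled flow features (NOT CICIDS2017 data).
benign_train = rng.normal(loc=0.0, scale=1.0, size=(2000, 4))
benign_test = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
attack_test = rng.normal(loc=6.0, scale=1.0, size=(100, 4))

# Fit on benign flows only: no labels are passed to fit().
# Hyperparameters here are illustrative, not the notebook's settings.
model = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
model.fit(benign_train)


def flag_intrusions(fitted_model, X):
    """Hypothetical helper: map predict() output (+1 inlier, -1 outlier) to 0/1."""
    return (fitted_model.predict(X) == -1).astype(int)


attack_flags = flag_intrusions(model, attack_test)
benign_flags = flag_intrusions(model, benign_test)

print("share of attacks flagged:", attack_flags.mean())
print("share of benign flagged: ", benign_flags.mean())
```

The synthetic attacks sit far from the benign cluster, so they are much easier to flag than real attack flows; the point is only that fit() receives benign samples and nothing else.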

Results

Evaluated on 933,200 test samples (749,536 benign / 183,664 attacks) across all 14 attack types in CICIDS2017: DoS Hulk, PortScan, DDoS, DoS GoldenEye, FTP-Patator, SSH-Patator, DoS Slowloris, DoS Slowhttptest, Bot, Web Attack Brute Force, Web Attack XSS, Infiltration, Web Attack SQL Injection, Heartbleed.

Model	Type	Accuracy	Attack Recall	Training data
Isolation Forest	anomaly detection	73.7%	47%	benign only, zero attack labels
Logistic Regression	supervised	92.4%	76%	full labeled dataset
KNN	supervised	99.3%	99%	full labeled dataset
Decision Tree	supervised	99.9%	99.9%	full labeled dataset

Isolation Forest: Anomaly Score Distribution

Benign traffic clusters around 0.3. Attacks spread toward higher scores. The separation is visible even though the model saw no attack labels during training, which suggests IF is picking up genuine deviations from normal behavior.
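
A score distribution like this can be produced from a fitted model as sketched below. The sketch assumes the plotted score is the negation of scikit-learn's score_samples (which returns the opposite of the anomaly score from the original Isolation Forest paper, so negating it makes higher mean more anomalous); whether the notebook uses exactly this scale is not confirmed. The data is synthetic, and the 95th-percentile threshold is an illustrative choice.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Synthetic stand-ins for scaled flow features (NOT CICIDS2017 data).
benign_train = rng.normal(0.0, 1.0, size=(2000, 4))
benign_test = rng.normal(0.0, 1.0, size=(500, 4))
attack_test = rng.normal(5.0, 1.5, size=(200, 4))

model = IsolationForest(random_state=1).fit(benign_train)


def anomaly_score(fitted_model, X):
    """Hypothetical helper: negate score_samples so higher = more anomalous."""
    return -fitted_model.score_samples(X)


benign_scores = anomaly_score(model, benign_test)
attack_scores = anomaly_score(model, attack_test)

# Pick a threshold from benign data alone (assumed choice: 95th percentile
# of the training scores), so no attack labels are needed to set it.
threshold = np.quantile(anomaly_score(model, benign_train), 0.95)

recall_at_threshold = (attack_scores > threshold).mean()
false_alarm_rate = (benign_scores > threshold).mean()

print("median benign score:", np.median(benign_scores))
print("median attack score:", np.median(attack_scores))
print("threshold:", threshold)
print("attack recall at threshold:", recall_at_threshold)
print("benign false-alarm rate at threshold:", false_alarm_rate)
```

Moving the threshold trades attack recall against false alarms on benign traffic, which is the same trade-off the confusion matrix summarizes at a single operating point.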

Isolation Forest: Confusion Matrix

What the numbers actually mean:

Isolation Forest achieves 47% attack recall without having seen a single attack sample during training. The strongest supervised models (KNN and Decision Tree) need thousands of labeled attacks to hit 99%; IF needs none. The gap is the cost of zero label dependency, and the payoff is the ability to detect attack patterns that have never been seen before.

The 99%+ scores of KNN and Decision Tree on CICIDS2017 are also somewhat misleading: DoS and PortScan have extremely obvious network signatures that any tree can learn in a couple of splits. Real production traffic is far noisier. Because IF does not depend on attack labels, it is not tied to the attack types that happen to be in the training set, which should make it more robust to novel attacks.
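
To make the Isolation Forest row easier to read, the arithmetic below back-computes approximate confusion-matrix counts from the reported test-set sizes, accuracy, and attack recall. These derived counts are not reported figures: they come from rounded percentages, so treat them as estimates. The variable names are chosen for the example.

```python
# Figures as reported in the Results section.
n_benign, n_attack = 749_536, 183_664
n_total = n_benign + n_attack              # 933,200 test samples
accuracy, attack_recall = 0.737, 0.47      # rounded values from the table

# Back-computed estimates (approximate, because the inputs are rounded).
true_pos = attack_recall * n_attack        # attacks flagged as attacks
true_neg = accuracy * n_total - true_pos   # benign flows passed as benign
false_pos = n_benign - true_neg            # benign flows flagged as attacks
false_neg = n_attack - true_pos            # attacks missed

benign_recall = true_neg / n_benign
precision = true_pos / (true_pos + false_pos)

# Reference point: accuracy of a trivial model that labels every flow benign.
all_benign_accuracy = n_benign / n_total

print(f"attacks flagged  ~ {true_pos:,.0f}")
print(f"attacks missed   ~ {false_neg:,.0f}")
print(f"benign flagged   ~ {false_pos:,.0f}")
print(f"benign recall    ~ {benign_recall:.1%}")
print(f"attack precision ~ {precision:.1%}")
print(f"all-benign accuracy = {all_benign_accuracy:.1%}")
```

Under these assumptions roughly 80% of benign flows pass cleanly and about 37% of alerts are real attacks. Labeling every flow benign would already score about 80% accuracy on this test set, so attack recall is the more informative column when comparing models.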

Dataset

CICIDS2017, from the Canadian Institute for Cybersecurity, University of New Brunswick. 8 CSV files covering a full work week of labeled network flows.

Label	Count
BENIGN	2,273,097
DoS Hulk	231,073
PortScan	158,930
DDoS	128,027
DoS GoldenEye	10,293
FTP-Patator	7,938
SSH-Patator	5,897
DoS Slowloris	5,796
DoS Slowhttptest	5,499
Bot	1,966
Web Attack Brute Force	1,507
Web Attack XSS	652
Infiltration	36
Web Attack SQL Injection	21
Heartbleed	11

Labels were binary-encoded: BENIGN=0, any attack=1. Attack ratio: 19.68%.
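
The encoding and the ratio can be checked against the label table with a few lines of Python. The encode helper is a hypothetical name for this example. The counts in the table give about 19.70%; the stated 19.68% matches the test split (183,664 / 933,200), so it was presumably computed after NaN and Inf rows were dropped. That reading is an inference, not something the notebook confirms here.

```python
# Per-label flow counts, copied from the table above.
label_counts = {
    "BENIGN": 2_273_097,
    "DoS Hulk": 231_073,
    "PortScan": 158_930,
    "DDoS": 128_027,
    "DoS GoldenEye": 10_293,
    "FTP-Patator": 7_938,
    "SSH-Patator": 5_897,
    "DoS Slowloris": 5_796,
    "DoS Slowhttptest": 5_499,
    "Bot": 1_966,
    "Web Attack Brute Force": 1_507,
    "Web Attack XSS": 652,
    "Infiltration": 36,
    "Web Attack SQL Injection": 21,
    "Heartbleed": 11,
}


def encode(label: str) -> int:
    """Hypothetical helper: BENIGN -> 0, any attack label -> 1."""
    return 0 if label == "BENIGN" else 1


total_flows = sum(label_counts.values())
attack_flows = sum(n for label, n in label_counts.items() if encode(label) == 1)
attack_ratio = attack_flows / total_flows

# Ratio in the test split, using the counts from the Results section.
test_split_ratio = 183_664 / 933_200

print(f"total flows:  {total_flows:,}")
print(f"attack flows: {attack_flows:,}")
print(f"attack ratio from table counts: {attack_ratio:.2%}")
print(f"attack ratio in the test split: {test_split_ratio:.2%}")
```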

Notebook

The full pipeline is in ml/network-intrusion-detection-system.ipynb, runnable directly on Kaggle with the dataset attached.

Pipeline:

1. Load and merge all 8 CSVs
2. Drop NaN and Inf rows
3. Binary encode labels (BENIGN vs attack)
4. MinMaxScaler on all features
5. 67/33 stratified train/test split
6. Train IF on benign-only subset of train data
7. Train supervised baselines on full labeled train data
8. Evaluate all models on the same test set
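
The pipeline steps can be sketched end to end as below. This is not the notebook's code: it runs on a small synthetic DataFrame in place of the merged CSVs, the column names and random seeds are assumptions, and only one supervised baseline (Decision Tree) is shown. One deliberate difference: the sketch fits the scaler on the training portion after the split, whereas the list above places scaling before the split.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in for the merged CSVs (NOT CICIDS2017 data).
n_benign, n_attack = 4000, 1000
features = np.vstack([
    rng.normal(0.0, 1.0, size=(n_benign, 5)),
    rng.normal(5.0, 1.0, size=(n_attack, 5)),
])
df = pd.DataFrame(features, columns=[f"f{i}" for i in range(5)])
df["Label"] = ["BENIGN"] * n_benign + ["DoS Hulk"] * n_attack
df.loc[0, "f0"] = np.inf   # simulate the bad rows found in real exports
df.loc[1, "f1"] = np.nan

# Drop NaN and Inf rows.
df = df.replace([np.inf, -np.inf], np.nan).dropna()

# Binary encode labels: BENIGN -> 0, any attack -> 1.
y = (df["Label"] != "BENIGN").astype(int).to_numpy()
X = df.drop(columns="Label").to_numpy()

# 67/33 stratified split keeps the attack ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

# Scale to [0, 1]; fitted on the training part only (a choice of this sketch).
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Isolation Forest sees only the benign rows of the training data.
iso = IsolationForest(random_state=42).fit(X_train_s[y_train == 0])

# The supervised baseline sees the full labeled training data.
tree = DecisionTreeClassifier(random_state=42).fit(X_train_s, y_train)

# Evaluate both models on the same test set.
predictions = {
    "Isolation Forest": (iso.predict(X_test_s) == -1).astype(int),
    "Decision Tree": tree.predict(X_test_s),
}
results = {}
for name, y_pred in predictions.items():
    results[name] = (accuracy_score(y_test, y_pred), recall_score(y_test, y_pred))
    print(f"{name}: accuracy={results[name][0]:.3f}, attack recall={results[name][1]:.3f}")
```

The synthetic classes are far better separated than real traffic, so the scores printed here say nothing about CICIDS2017; the sketch only shows how the benign-only and fully labeled training sets are carved out of the same split.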

Project structure

Multi_Model_IDS/
├── ml/
│   └── network-intrusion-detection-system.ipynb
├── lab/
│   ├── docker-compose.yaml     # target (nginx) + attacker (Kali) + capture (tcpdump)
│   ├── attacks.sh              # DoS, brute-force, SQLi, LFI, nmap
│   ├── normal_traffic.sh       # benign HTTP traffic
│   └── benign_anomalies.sh     # edge-case benign patterns
├── results/
│   └── plots/
│       └── if_confusion_matrix.png
├── requirements.txt
└── .gitignore

Attack lab

A Docker Compose environment for generating labeled network traffic, intended as the foundation for building a self-captured dataset.

cd lab
docker compose up -d

# Run simulated attacks from the attacker container
docker exec -it ids_attacker bash /lab/attacks.sh

# PCAP captured to data/captures/traffic.pcap
docker compose down

Three containers: ids_target (nginx), ids_attacker (Kali), ids_capture (tcpdump writing PCAP). Attack scripts simulate Slowloris DoS, login brute-force, SQL injection, LFI path traversal, and nmap scans.
