Anomaly-based network intrusion detection using Isolation Forest, trained on benign traffic only with no attack labels required. Benchmarked against supervised classifiers on the full CICIDS2017 dataset (2.8M flows, 14 attack types).
Most intrusion detection systems (IDS) are signature-based: they only catch attacks they have seen before. This project uses anomaly detection instead: train exclusively on normal traffic, then flag anything that deviates. The model never sees a single attack sample during training.
Train on benign traffic only (1.5M flows)
↓
Learn what "normal" looks like
↓
Flag deviations as intrusions at inference
↓
No signatures. No labeled attacks. Detects unknowns.
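The flow above can be sketched with scikit-learn's `IsolationForest`. This is a minimal illustration on synthetic data, not the notebook's code: the feature values, cluster positions, and hyperparameters are assumptions chosen only to show the benign-only training pattern.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-ins for scaled flow features (NOT CICIDS2017 data).
benign_train = rng.normal(loc=0.5, scale=0.05, size=(2000, 4))
benign_test = rng.normal(loc=0.5, scale=0.05, size=(500, 4))
attack_test = rng.normal(loc=0.9, scale=0.05, size=(500, 4))

# Fit on benign flows only; no labels are passed to fit().
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(benign_train)

# predict() returns +1 for inliers and -1 for outliers.
benign_flagged = (model.predict(benign_test) == -1).mean()
attack_flagged = (model.predict(attack_test) == -1).mean()

print(f"benign flagged: {benign_flagged:.2%}")
print(f"attacks flagged: {attack_flagged:.2%}")
```

The synthetic attacks here sit far from the benign cluster, so they are much easier to separate than real attack traffic.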
Evaluated on 933,200 test samples (749,536 benign / 183,664 attacks) across all 14 attack types in CICIDS2017: DoS Hulk, PortScan, DDoS, DoS GoldenEye, FTP-Patator, SSH-Patator, DoS Slowloris, DoS Slowhttptest, Bot, Web Attack Brute Force, Web Attack XSS, Infiltration, Web Attack SQL Injection, Heartbleed.
| Model | Type | Accuracy | Attack Recall | Training data |
|---|---|---|---|---|
| Isolation Forest | anomaly detection | 73.7% | 47% | benign only, zero attack labels |
| Logistic Regression | supervised | 92.4% | 76% | full labeled dataset |
| KNN | supervised | 99.3% | 99% | full labeled dataset |
| Decision Tree | supervised | 99.9% | 99.9% | full labeled dataset |
Benign traffic clusters around an anomaly score of 0.3, while attacks spread toward higher scores. The separation is visible even though no attack labels were seen during training, which suggests Isolation Forest (IF) is picking up genuine deviations from normal behavior.
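One way to obtain per-flow scores where higher means more anomalous is to negate scikit-learn's `score_samples()`, which returns the opposite of the anomaly score defined in the original Isolation Forest paper. The sketch below uses synthetic data. Whether the notebook uses this exact convention is an assumption, and the 95th-percentile threshold is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Synthetic stand-ins for scaled flow features (NOT CICIDS2017 data).
benign = rng.normal(0.5, 0.05, size=(2000, 4))
attacks = rng.normal(0.9, 0.05, size=(300, 4))

model = IsolationForest(random_state=1).fit(benign)

# score_samples() is lower for more abnormal points, so flip the sign
# to get a score where higher means more anomalous.
benign_scores = -model.score_samples(benign)
attack_scores = -model.score_samples(attacks)

# Hypothetical threshold: the 95th percentile of benign scores.
threshold = np.quantile(benign_scores, 0.95)
flagged = attack_scores > threshold

print(f"mean benign score: {benign_scores.mean():.3f}")
print(f"mean attack score: {attack_scores.mean():.3f}")
print(f"attacks above threshold: {flagged.mean():.2%}")
```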
What the numbers actually mean:
Isolation Forest achieves 47% attack recall having never seen a single attack sample during training. The supervised models need thousands of labeled attacks to hit 99%; IF needs none. The gap is the cost of zero label dependency, and the payoff is the ability to detect attack patterns that have never been seen before.
The 99%+ scores of the supervised models on CICIDS2017 are also somewhat misleading: DoS and PortScan have extremely obvious network signatures that any tree can learn in a couple of splits. Real production traffic is far noisier. Because IF does not depend on attack labels, its approach should in principle hold up better against novel attacks.
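For reference, the Accuracy and Attack Recall columns can be computed as in the sketch below. The helper names are hypothetical, and the mapping from Isolation Forest output (-1 for outlier, +1 for inlier) to the 0/1 encoding is an assumption about how the predictions were aligned with the labels.

```python
def accuracy_and_attack_recall(y_true, y_pred):
    """Both inputs use the encoding 0 = BENIGN, 1 = attack."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def if_to_binary(raw_predictions):
    """Map Isolation Forest output (-1 outlier, +1 inlier) to 1/0."""
    return [1 if r == -1 else 0 for r in raw_predictions]


# Test-set composition quoted in this README.
n_benign, n_attack = 749_536, 183_664

# Accuracy of a trivial model that labels every flow as benign.
all_benign_accuracy = n_benign / (n_benign + n_attack)
print(f"all-benign baseline accuracy: {all_benign_accuracy:.1%}")
```

Because the test set is roughly 80% benign, a model that flags nothing would score about 80.3% accuracy while catching no attacks, which is why attack recall is reported alongside accuracy.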
CICIDS2017: Canadian Institute for Cybersecurity, University of New Brunswick. 8 CSV files covering a full work week of labeled network flows.
| Label | Count |
|---|---|
| BENIGN | 2,273,097 |
| DoS Hulk | 231,073 |
| PortScan | 158,930 |
| DDoS | 128,027 |
| DoS GoldenEye | 10,293 |
| FTP-Patator | 7,938 |
| SSH-Patator | 5,897 |
| DoS Slowloris | 5,796 |
| DoS Slowhttptest | 5,499 |
| Bot | 1,966 |
| Web Attack Brute Force | 1,507 |
| Web Attack XSS | 652 |
| Infiltration | 36 |
| Web Attack SQL Injection | 21 |
| Heartbleed | 11 |
Labels were binary-encoded: BENIGN=0, any attack=1. Attack ratio: 19.68%.
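The binary encoding can be sketched with pandas as follows. The column names `Label` and `is_attack` and the toy rows are assumptions for illustration; the actual CSV header may differ.

```python
import pandas as pd

# Toy rows standing in for the merged CSVs (column name is assumed).
df = pd.DataFrame(
    {"Label": ["BENIGN", "DoS Hulk", "BENIGN", "PortScan", "BENIGN"]}
)

# BENIGN -> 0, any attack label -> 1.
df["is_attack"] = (df["Label"] != "BENIGN").astype(int)

# The attack ratio is the mean of the binary column.
attack_ratio = df["is_attack"].mean()
print(f"attack ratio: {attack_ratio:.2%}")
```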
The full pipeline is in ml/network-intrusion-detection-system.ipynb, runnable directly on Kaggle with the dataset attached.
Pipeline:
- Load and merge all 8 CSVs
- Drop NaN and Inf rows
- Binary encode labels (BENIGN vs attack)
- MinMaxScaler on all features
- 67/33 stratified train/test split
- Train IF on benign-only subset of train data
- Train supervised baselines on full labeled train data
- Evaluate all models on the same test set
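The preprocessing steps above can be sketched as follows, on a small synthetic frame rather than the real CSVs. Column names, the random seed, and the data are assumptions; the step order follows the list (scale, then split).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
n = 1000

# Synthetic stand-in for the merged CSVs (NOT CICIDS2017 data).
df = pd.DataFrame(rng.random((n, 3)), columns=["f1", "f2", "f3"])
df["Label"] = np.where(rng.random(n) < 0.2, "Attack", "BENIGN")
df.loc[0, "f1"] = np.inf  # inject one Inf row
df.loc[1, "f2"] = np.nan  # inject one NaN row

# Drop NaN and Inf rows.
df = df.replace([np.inf, -np.inf], np.nan).dropna()

# Binary encode labels and scale all features to [0, 1].
y = (df["Label"] != "BENIGN").astype(int).to_numpy()
X = MinMaxScaler().fit_transform(df.drop(columns="Label"))

# 67/33 stratified split keeps the attack ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

# Isolation Forest trains on the benign-only subset of the train data;
# the supervised baselines would use X_train and y_train in full.
X_train_benign = X_train[y_train == 0]

print(X_train.shape, X_test.shape, X_train_benign.shape)
```

Fitting the scaler on the training split only is a common alternative when leakage from the test set is a concern.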
Multi_Model_IDS/
├── ml/
│   └── network-intrusion-detection-system.ipynb
├── lab/
│   ├── docker-compose.yaml      # target (nginx) + attacker (Kali) + capture (tcpdump)
│   ├── attacks.sh               # DoS, brute-force, SQLi, LFI, nmap
│   ├── normal_traffic.sh        # benign HTTP traffic
│   └── benign_anomalies.sh      # edge-case benign patterns
├── results/
│   └── plots/
│       └── if_confusion_matrix.png
├── requirements.txt
└── .gitignore
A Docker Compose environment for generating labeled network traffic, intended as the foundation for building a self-captured dataset.
cd lab
docker compose up -d
# Run simulated attacks from the attacker container
docker exec -it ids_attacker bash /lab/attacks.sh
# PCAP captured to data/captures/traffic.pcap
docker compose down

Three containers: ids_target (nginx), ids_attacker (Kali), ids_capture (tcpdump writing PCAP). Attack scripts simulate Slowloris DoS, login brute-force, SQL injection, LFI path traversal, and nmap scans.

