An Interpretable Hybrid 3-Tier LSTM Model for Breast Tumor Classification: Balancing Accuracy and Transparency in Clinical Decision Support
This repository contains a reproducible, modular pipeline for tumor classification on the Breast Cancer dataset using a Hybrid 3‑Tier LSTM model. The pipeline includes:
- Clean data loading and preprocessing (drop `id` and `Unnamed: 32` if present)
- Encode target (`diagnosis`: Malignant M = 1, Benign B = 0)
- Train/test split with optional stratification
- SMOTE on training set only
- Decision Tree feature importance (with figure)
- 3‑Tier LSTM (Dense → LSTM → Dense head) with input reshaped to `(samples, timesteps=1, features)`
- Full evaluation (Accuracy, Precision, Recall, F1, AUC, Confusion Matrix, ROC curve)
- Explainability: SHAP and LIME (with actual feature names)
- Reproducible configs via `config.yaml`
- One‑command run via `python main.py`
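The first steps of the list (column cleanup, target encoding, stratified split, and the reshape to one timestep) can be sketched roughly as follows. This is a minimal illustration, not the code in `src/preprocess.py`: the function name `prepare_data`, its parameters, and the defaults `test_size=0.2` and `seed=42` are assumptions made for the example. In the full pipeline, scaling and SMOTE would sit between the split and the reshape.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_data(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Hypothetical helper: clean, encode, split, and reshape for an LSTM."""
    # Drop the identifier and the empty trailing column only if they exist.
    df = df.drop(columns=["id", "Unnamed: 32"], errors="ignore")

    # Encode the target: Malignant (M) -> 1, Benign (B) -> 0.
    y = df["diagnosis"].map({"M": 1, "B": 0}).to_numpy()
    X = df.drop(columns=["diagnosis"])
    feature_names = list(X.columns)  # kept for plots and explanations

    # Stratify so both splits keep the original class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X.to_numpy(dtype="float32"), y,
        test_size=test_size, stratify=y, random_state=seed,
    )

    # (Scaling and SMOTE on the training split would go here.)

    # Recurrent layers expect (samples, timesteps, features); use timesteps=1.
    X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
    X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
    return X_train, X_test, y_train, y_test, feature_names
```

Returning `feature_names` alongside the arrays is one way to keep the original column names available after the data has been converted to NumPy.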
Note: You need Python 3.10+ and a working TensorFlow environment. GPU is recommended but not required.
# (optional) create venv
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
# Run the full pipeline
python main.py
# Alternative: conda environment
conda env create -f environment.yml
conda activate hybrid_lstm
python main.py
All outputs (models, figures, reports) will be saved under outputs/.
.
├── config.yaml
├── main.py
├── requirements.txt
├── .gitignore
├── data/
│   └── Cancer_Data.xls
├── src/
│   ├── load_data.py
│   ├── preprocess.py
│   ├── evaluate.py
│   ├── explain.py
│   ├── plots.py
│   └── models/
│       └── lstm_model.py
└── outputs/
    ├── figures/
    ├── models/
    └── reports/
- SMOTE is applied only on the training split to avoid leakage.
- The feature scaler is fitted on the training split only and then applied to both the training and test splits.
- Feature names are preserved throughout for correct plotting and explanations.
- SHAP uses a KernelExplainer on a small background sample to keep runtime reasonable on CPUs.
- LIME uses `LimeTabularExplainer` with consistent feature names and classes.
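The leakage and feature-name notes above can be illustrated with a small sketch of train-only scaling. It assumes a `StandardScaler` from scikit-learn, which this README does not confirm as the scaler actually used, and the helper name `scale_train_test` is hypothetical. SMOTE resampling would likewise touch only the training split and is not shown here.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_train_test(X_train: pd.DataFrame, X_test: pd.DataFrame):
    """Hypothetical helper: fit on train only, keep column names on both splits."""
    scaler = StandardScaler()  # assumption: the repository may use another scaler

    # fit_transform learns the mean and standard deviation from training rows only.
    train_scaled = pd.DataFrame(
        scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index
    )
    # transform reuses the training statistics; nothing is fitted on the test split.
    test_scaled = pd.DataFrame(
        scaler.transform(X_test), columns=X_test.columns, index=X_test.index
    )
    return train_scaled, test_scaled, scaler
```

Wrapping the scaled arrays back into DataFrames keeps the original column names, so later plots and SHAP or LIME explanations can refer to real feature names rather than positional indices.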
- Dataset: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
Nti, I.K., Nyarko-Boateng, O., Zaman, A. et al. Interpretable hybrid three tier LSTM model for accurate and transparent breast tumor classification in clinical decision support. Discov Health Systems 5, 2 (2026). https://doi.org/10.1007/s44250-025-00315-6