Two machine learning and deep learning projects completed for the CS909 Data Mining module of the Data Analytics MSc at the University of Warwick. The repository covers classification, regression, feature engineering, model evaluation, and deep learning architectures.
- Overview
- Projects
- Technologies Used
- Key Skills Demonstrated
- Installation
- Repository Structure
- License
This repository showcases two comprehensive data mining projects that demonstrate end-to-end machine learning workflows:
- Multi-class Classification: Fashion-MNIST image classification with advanced evaluation metrics
- Regression with Computer Vision: Protein expression prediction from tissue images using traditional ML and deep learning
Both projects emphasize rigorous model evaluation, feature engineering, hyperparameter optimization, and comparative analysis of multiple algorithmic approaches.
A comprehensive classification pipeline for the Fashion-MNIST dataset, focusing on proper evaluation metrics and model selection.
Key Highlights:
- 🎯 Multi-class classification of 10 fashion categories
- 📈 Extensive evaluation using precision, recall, F1-score, and confusion matrices
- 🔍 Comparison of multiple algorithms (SVM, Random Forest, Logistic Regression, Neural Networks)
- ⚙️ Hyperparameter tuning and cross-validation
- 📊 Detailed performance analysis and visualization
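The evaluation workflow listed above can be sketched roughly as follows. This is a minimal illustration, not the code from `solution.ipynb`: it assumes scikit-learn is installed, uses a synthetic 10-class dataset from `make_classification` in place of Fashion-MNIST, and picks logistic regression with a small grid over `C` purely as an example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for flattened images (assumption): 10 classes, like Fashion-MNIST.
X, y = make_classification(
    n_samples=1000, n_features=40, n_informative=20,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training split.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    scoring="f1_macro",  # macro-averaged F1 weights all classes equally
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test split.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")

print("Best parameters:", search.best_params_)
print(classification_report(y_test, y_pred))
```

Keeping the test split out of the grid search means the reported precision, recall, and F1 are not biased by hyperparameter selection.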
Predicting protein expression levels from biological tissue images using both traditional machine learning and deep learning approaches.
Key Highlights:
- 🧬 Regression task on protein expression data
- 🖼️ Feature extraction from tissue images (PCA, GLCM texture features)
- 🤖 Traditional ML models (MLP, SVR) and deep learning (CNNs)
- 🔄 Transfer learning with pre-trained ResNet50
- 📊 Comprehensive model comparison and performance evaluation
- Python 3.8+: Primary programming language
- NumPy & Pandas: Data manipulation and analysis
- Matplotlib & Seaborn: Data visualization
- scikit-learn: Traditional ML algorithms, preprocessing, and metrics
- XGBoost/LightGBM: Gradient boosting frameworks
- PyTorch: Neural network architectures and training
- torchvision: Pre-trained models and image transformations
- scikit-image: Image processing and feature extraction
- Jupyter Notebook: Interactive development and documentation
- Git: Version control
- ✅ Classification & Regression: Multi-class classification and regression tasks
- ✅ Feature Engineering: PCA, texture analysis (GLCM), image preprocessing
- ✅ Model Evaluation: Cross-validation, confusion matrices, ROC curves, multiple metrics
- ✅ Hyperparameter Tuning: Grid search and optimization strategies
- ✅ Deep Learning: CNN architectures, transfer learning, fine-tuning
- ✅ Computer Vision: Image-based feature extraction and prediction
- ✅ Data Visualization: Comprehensive plots for model performance and insights
- ✅ Pipeline Development: End-to-end ML workflows from data preparation to model evaluation
- Python 3.8 or higher
- pip package manager
- 2+ GB of free disk space (for datasets)
- Clone the repository:

      git clone https://github.com/Reslan-Tinawi/CS909-data-mining.git
      cd CS909-data-mining

- Create a virtual environment (recommended).

  On Windows:

      python -m venv venv
      venv\Scripts\activate

  On macOS/Linux:

      python -m venv venv
      source venv/bin/activate

- Install dependencies. Run each pair of commands from the repository root.

  For Assignment 1:

      cd assignment-1
      pip install -r requirements.txt

  For Assignment 2:

      cd assignment-2
      pip install -r requirements.txt

- Launch Jupyter:

      jupyter lab
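After installing, a quick check can confirm that the main libraries are importable before opening the notebooks. This is a small sketch using only the standard library; the `REQUIRED` mapping is an assumption based on the technologies listed in this README, not a copy of either `requirements.txt`.

```python
import importlib.util
import sys

# Package name -> import name. Assumed from the technologies listed in this
# README; the actual requirements.txt files may differ.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "torch": "torch",
    "torchvision": "torchvision",
}


def missing_packages(modules):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if sys.version_info < (3, 8):
    print("Python 3.8 or higher is required; found", sys.version.split()[0])

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

`find_spec` only locates each module without importing it, so the check stays fast even for large packages such as PyTorch.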
CS909-data-mining/
├── assignment-1/ # Fashion-MNIST Classification
│ ├── data/ # Training and test datasets
│ ├── solution.ipynb # Complete analysis and models
│ ├── requirements.txt # Python dependencies
│ └── README.md # Assignment 1 details
├── assignment-2/ # Protein Expression Prediction
│ ├── data/ # Protein expression and tissue images
│ │ └── patches_256/ # Image patches
│ ├── solution.ipynb # Complete analysis and models
│ ├── requirements.txt # Python dependencies
│ └── README.md # Assignment 2 details
├── LICENSE # MIT License
└── README.md # This file
This project is licensed under the MIT License - see the LICENSE file for details.
Course: CS909 Data Mining - Warwick Data Analytics MSc
Academic Year: 2023-2024
Author: Reslan Tinawi
For questions or collaboration opportunities, feel free to reach out!