Reslan-Tinawi/CS909-data-mining

CS909 Data Mining - Portfolio Projects

Python Jupyter scikit-learn PyTorch License: MIT

A collection of machine learning and deep learning projects completed for the CS909 Data Mining module of the Warwick Data Analytics MSc. The repository covers classification, regression, feature engineering, model evaluation, and deep learning architectures.

⚠️ Note: This repository is approximately 1.64 GiB due to image datasets.

🎯 Overview

This repository showcases two comprehensive data mining projects that demonstrate end-to-end machine learning workflows:

  1. Multi-class Classification: Fashion-MNIST image classification with advanced evaluation metrics
  2. Regression with Computer Vision: Protein expression prediction from tissue images using traditional ML and deep learning

Both projects emphasize rigorous model evaluation, feature engineering, hyperparameter optimization, and comparative analysis of multiple algorithmic approaches.

📊 Projects

Assignment 1: Fashion-MNIST Classification

A comprehensive classification pipeline for the Fashion-MNIST dataset, focusing on proper evaluation metrics and model selection.

Key Highlights:

  • 🎯 Multi-class classification of 10 fashion categories
  • 📈 Extensive evaluation using precision, recall, F1-score, and confusion matrices
  • 🔍 Comparison of multiple algorithms (SVM, Random Forest, Logistic Regression, Neural Networks)
  • ⚙️ Hyperparameter tuning and cross-validation
  • 📊 Detailed performance analysis and visualization
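
The evaluation workflow listed above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: it assumes scikit-learn's bundled 8x8 digits dataset as a small stand-in for Fashion-MNIST (both have 10 classes) and a logistic regression as the example model.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): bundled 8x8 digits, 10 classes, NOT Fashion-MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so each CV fold is scaled on its own training part.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))

# 5-fold cross-validated macro-F1, computed on the training split only.
cv_f1 = cross_val_score(model, X_train, y_train, cv=5, scoring="f1_macro")
print(f"CV macro-F1: {cv_f1.mean():.3f} +/- {cv_f1.std():.3f}")

# Refit on the full training split and evaluate once on the held-out test split.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall and F1, followed by the 10x10 confusion matrix.
report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(report)
print(cm)
```

Macro-averaged F1 weights every class equally, which tends to be a more informative summary than accuracy when some classes are harder than others.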

View Assignment 1 Details →

Assignment 2: Protein Expression Prediction

Predicting protein expression levels from biological tissue images using both traditional machine learning and deep learning approaches.

Key Highlights:

  • 🧬 Regression task on protein expression data
  • 🖼️ Feature extraction from tissue images (PCA, GLCM texture features)
  • 🤖 Traditional ML models (MLP, SVR) and deep learning (CNNs)
  • 🔄 Transfer learning with pre-trained ResNet50
  • 📊 Comprehensive model comparison and performance evaluation

View Assignment 2 Details →

🛠️ Technologies Used

Core Libraries

  • Python 3.8+: Primary programming language
  • NumPy & Pandas: Data manipulation and analysis
  • Matplotlib & Seaborn: Data visualization

Machine Learning

  • scikit-learn: Traditional ML algorithms, preprocessing, and metrics
  • XGBoost/LightGBM: Gradient boosting frameworks

Deep Learning

  • PyTorch: Neural network architectures and training
  • torchvision: Pre-trained models and image transformations
  • scikit-image: Image processing and feature extraction

Development Tools

  • Jupyter Notebook: Interactive development and documentation
  • Git: Version control

💡 Key Skills Demonstrated

  • Classification & Regression: Multi-class classification and regression tasks
  • Feature Engineering: PCA, texture analysis (GLCM), image preprocessing
  • Model Evaluation: Cross-validation, confusion matrices, ROC curves, multiple metrics
  • Hyperparameter Tuning: Grid search and optimization strategies
  • Deep Learning: CNN architectures, transfer learning, fine-tuning
  • Computer Vision: Image-based feature extraction and prediction
  • Data Visualization: Comprehensive plots for model performance and insights
  • Pipeline Development: End-to-end ML workflows from raw data to evaluated models
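
For the grid-search item above, a minimal sketch with scikit-learn's `GridSearchCV` is shown here. The dataset (bundled digits), the SVM model, and the parameter grid are assumptions chosen to keep the example small; they are not taken from the notebooks.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled digits dataset.
X, y = load_digits(return_X_y=True)

# Named steps let the grid address parameters as "<step>__<parameter>".
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
param_grid = {"svm__C": [1, 10], "svm__gamma": ["scale", 0.01]}

# Every combination (2 x 2 = 4) is scored with 3-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because the scaler is part of the pipeline, it is refitted inside every fold, so the cross-validated scores are not inflated by statistics from the validation part.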

📥 Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • 2+ GB of free disk space (for datasets)

Setup Instructions

  1. Clone the repository

    git clone https://github.com/Reslan-Tinawi/CS909-data-mining.git
    cd CS909-data-mining
  2. Create a virtual environment (recommended)

    On Windows:

    python -m venv venv
    venv\Scripts\activate

    On macOS/Linux:

    python -m venv venv
    source venv/bin/activate
  3. Install dependencies

    From the repository root, for Assignment 1:

    pip install -r assignment-1/requirements.txt

    For Assignment 2:

    pip install -r assignment-2/requirements.txt
  4. Launch Jupyter

    jupyter lab
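
After installing, a quick import check can confirm the environment is ready before opening the notebooks. The sketch below is an assumption-based helper, not part of the repository: the contents of the `requirements.txt` files are not shown here, so the module list is taken from the libraries named under "Technologies Used", and `missing_modules` is a hypothetical function name.

```python
from importlib.util import find_spec

# Import names for the libraries listed under "Technologies Used" (assumption:
# the requirements files install these). scikit-learn imports as `sklearn`
# and scikit-image as `skimage`.
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "sklearn", "skimage", "torch", "torchvision",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All set." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as PyTorch.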

📁 Repository Structure

CS909-data-mining/
├── assignment-1/              # Fashion-MNIST Classification
│   ├── data/                  # Training and test datasets
│   ├── solution.ipynb         # Complete analysis and models
│   ├── requirements.txt       # Python dependencies
│   └── README.md             # Assignment 1 details
├── assignment-2/              # Protein Expression Prediction
│   ├── data/                  # Protein expression and tissue images
│   │   └── patches_256/      # Image patches
│   ├── solution.ipynb         # Complete analysis and models
│   ├── requirements.txt       # Python dependencies
│   └── README.md             # Assignment 2 details
├── LICENSE                    # MIT License
└── README.md                 # This file

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Course: CS909 Data Mining - Warwick Data Analytics MSc
Academic Year: 2023-2024
Author: Reslan Tinawi

For questions or collaboration opportunities, feel free to reach out!
