OsherBerGit/Galaxy-Machine-Learning-Project

🌌 Galaxy Machine Learning Project

Python Streamlit scikit-learn NumPy

📖 About

Galaxy Classifier AI is a machine learning system that classifies galaxies as Spiral or Elliptical from features extracted from their images. The project includes an AdaBoost implementation written from scratch and an interactive Streamlit dashboard for live predictions and research insights.

🛠 Tech Stack

  • ML Framework: Custom AdaBoost, scikit-learn
  • Image Processing: OpenCV, scikit-image
  • Frontend: Streamlit
  • Data: Pandas, NumPy, Matplotlib, Seaborn

✨ Features

🚀 Live Prediction

  • Upload galaxy images and get instant classification
  • View extracted features (color stats, entropy, shape metrics)
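The feature families mentioned above (color stats, entropy, shape metrics) can be illustrated with a dependency-free sketch. This README does not give the project's exact definitions, and the project itself lists OpenCV and scikit-image for image processing, so the function names and formulas below are assumptions based on common conventions rather than the repository's code.

```python
import math
from collections import Counter


def channel_stats(pixels):
    """Per-channel mean and population std for a list of (r, g, b) tuples."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    stds = [
        math.sqrt(sum((p[c] - means[c]) ** 2 for p in pixels) / n)
        for c in range(3)
    ]
    return means, stds


def shannon_entropy(gray_values):
    """Shannon entropy, in bits, of the gray-level histogram."""
    n = len(gray_values)
    counts = Counter(gray_values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def circularity(area, perimeter):
    """4*pi*area / perimeter**2: 1.0 for a circle, lower for elongated shapes."""
    return 4 * math.pi * area / perimeter ** 2


# Hypothetical toy inputs, only to show the call shapes.
means, stds = channel_stats([(0, 0, 0), (2, 4, 6)])
entropy = shannon_entropy([0, 0, 255, 255])
roundness = circularity(math.pi, 2 * math.pi)  # unit circle
```

In a real pipeline the area and perimeter would come from a segmented galaxy contour, which is where a library such as OpenCV or scikit-image would typically be used.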

📊 Research Dashboard

  • Feature Analysis: Distribution plots, feature importance
  • Boosting Internals: Alpha decay, error evolution visualizations
  • Model Comparison: Benchmark against Random Forest, Gradient Boosting, SVM

⚙️ Technical Highlights

  • Manual AdaBoost: Implemented from scratch with weighted decision stumps
  • Feature Extraction: 11 features (RGB means, RGB standard deviations, entropy, area, perimeter, circularity, eccentricity)
  • Grid Search: Hyperparameter tuning with detailed logging
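To make "weighted decision stumps" concrete, here is a minimal sketch of textbook discrete AdaBoost in plain Python. It is not the code in `src/galaxy_adaboost.py`: the function names, the exhaustive threshold search, and the ±1 label encoding (for example +1 for Spiral, -1 for Elliptical) are all assumptions made for illustration.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity


def adaboost_fit(X, y, n_rounds=10):
    """Train an ensemble of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error so alpha stays finite for a perfect stump.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise weights of misclassified samples, lower the rest, renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, row))
            for wi, yi, row in zip(w, y, X)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
```

The per-round `alpha` values and weighted errors stored while training are the kind of quantities a "Boosting Internals" view could plot.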

🚀 Quick Start

  1. Clone & Install:

    git clone https://github.com/OsherBerGit/Galaxy-Machine-Learning-Project.git
    cd Galaxy-Machine-Learning-Project
    pip install -r requirements.txt
  2. Download Data: Get images_training_rev1.zip and training_solutions_rev1.csv from the Galaxy Zoo challenge on Kaggle and extract them into data/

  3. Prepare & Train:

    python src/1_prepare_data.py
    python src/2_feature_extraction.py
    python src/8_train_final_manual_model.py
  4. Run App:

    streamlit run main.py

📁 Project Structure

├── main.py                   # Streamlit dashboard
├── src/
│   ├── galaxy_adaboost.py    # Manual AdaBoost implementation
│   ├── 1_prepare_data.py     # Data preprocessing
│   ├── 2_feature_extraction.py
│   └── ...                   # Analysis scripts (3-7)
├── data/                     # Dataset & results
├── models/                   # Trained models
└── plots/                    # Generated visualizations

Data source: Galaxy Zoo Challenge on Kaggle
