
FaizarM/MLOps-Obesity-Classification


End-to-End MLOps Pipeline — Obesity Level Classification

An end-to-end Machine Learning Operations (MLOps) pipeline that takes a tabular classification problem from raw data all the way to a served, containerized, and monitored model. Built as a hands-on implementation of the full MLOps lifecycle: experimentation, experiment tracking, CI/CD, containerization, and real-time monitoring.

Problem: Predict an individual's obesity level (7 classes) from eating habits and physical condition. Model: Random Forest — ~94% accuracy / 0.94 weighted F1. The model itself is intentionally simple; the focus of this project is the operations tooling around it.


Architecture

flowchart LR
    A[Raw Dataset] --> B[Preprocessing<br/>EDA + cleaning + encoding]
    B --> C[Model Training<br/>RandomForest + GridSearchCV]
    C --> D[Experiment Tracking<br/>MLflow + DagsHub]
    C --> E[CI/CD<br/>GitHub Actions + MLflow Project]
    E --> F[Docker Image<br/>Docker Hub]
    F --> G[Model Serving<br/>REST /invocations]
    G --> H[Monitoring & Alerting<br/>Prometheus + Grafana]
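The Model Serving stage in the diagram exposes MLflow's scoring endpoint at `/invocations`. Below is a minimal client sketch that uses only the Python standard library. It assumes MLflow 2.x's `dataframe_split` JSON input format and a hypothetical URL (`localhost:5001`). The real host and port depend on how the container is run.

```python
import json
import urllib.request

# Hypothetical endpoint: adjust to wherever the model container is published.
SERVING_URL = "http://localhost:5001/invocations"


def build_payload(columns, rows):
    """Wrap tabular rows in MLflow's `dataframe_split` input format."""
    return {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}


def predict(columns, rows, url=SERVING_URL, timeout=5.0):
    """POST rows to the scoring server and return the decoded JSON response."""
    body = json.dumps(build_payload(columns, rows)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The columns must be the same preprocessed feature columns the model was trained on. MLflow 2.x scoring servers typically reply with a JSON object that holds a `predictions` list.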

Tech Stack

| Stage | Tools |
|---|---|
| Preprocessing & EDA | pandas, scikit-learn, matplotlib, seaborn |
| Experiment tracking | MLflow, DagsHub |
| CI/CD | GitHub Actions, MLflow Project, conda |
| Containerization | Docker, Docker Hub |
| Monitoring & alerting | Prometheus, Grafana |

Repository Structure

.
├── Preprocessing/      # EDA notebook + automated preprocessing script
├── Modelling/          # MLflow training, hyperparameter tuning, DagsHub logging
├── Workflow-CI/        # MLflow Project + CI workflow (build & push Docker image)
└── Monitoring/         # Prometheus exporter, inference load script, Grafana evidence

Pipeline Stages

1. Preprocessing (Preprocessing/)

Exploratory data analysis plus an automated preprocessing pipeline: missing-value handling, duplicate removal, IQR-based outlier capping, categorical encoding (binary / ordinal / one-hot), and feature scaling. The notebook documents the manual exploration. automate_*.py reproduces the same steps as a script, and a GitHub Actions workflow in the source repository runs it on every push.
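The steps above can be sketched in pandas as follows. This is a minimal illustration, not the repository's automate_*.py. The column groupings, median/mode imputation, the 1.5×IQR fence, alphabetical 0/1 coding of binary columns, and z-score scaling are all assumptions. The column names in the usage example are illustrative.

```python
import pandas as pd


def cap_outliers_iqr(df, columns, k=1.5):
    """Clip each listed column to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows."""
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def preprocess(df, numeric, binary, ordinal, nominal):
    """Deduplicate, impute, cap outliers, encode categoricals, and scale numerics."""
    df = df.drop_duplicates().reset_index(drop=True)
    # Missing values: median for numeric columns, most frequent value otherwise.
    for col in df.columns:
        fill = df[col].median() if col in numeric else df[col].mode().iloc[0]
        df[col] = df[col].fillna(fill)
    df = cap_outliers_iqr(df, numeric)
    # Binary categoricals -> 0/1 (alphabetical category order).
    for col in binary:
        categories = sorted(df[col].unique())
        df[col] = df[col].map({c: i for i, c in enumerate(categories)})
    # Ordinal categoricals -> integer ranks from an explicit, caller-supplied order.
    for col, order in ordinal.items():
        df[col] = df[col].map({c: i for i, c in enumerate(order)})
    # Nominal categoricals -> one-hot indicator columns.
    df = pd.get_dummies(df, columns=list(nominal), dtype=int)
    # Z-score scaling of numeric features (population std; guard against zero variance).
    for col in numeric:
        std = df[col].std(ddof=0)
        df[col] = (df[col] - df[col].mean()) / (std if std > 0 else 1.0)
    return df
```

This sketch computes imputation and scaling statistics on the full dataset, which leaks information into any later test split. A real pipeline would compute them on the training split only, for example with scikit-learn's `StandardScaler` fitted on the training data.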

2. Model Training & Tracking (Modelling/)

A Random Forest classifier, trained in two variants:

  • modelling.py: MLflow autolog with local tracking.
  • modelling_tuning.py: hyperparameter tuning with GridSearchCV and manual logging of metrics (accuracy, precision, recall, F1) plus extra artifacts (confusion matrix, classification report, feature importance), logged remotely to DagsHub.
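Below is a minimal sketch of the tuning variant using scikit-learn's `GridSearchCV`. It assumes a hypothetical parameter grid and an 80/20 stratified split, and neither is taken from the repository. The function returns the best parameters and the weighted test metrics as plain dicts so they can be logged manually.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_evaluate(X, y, param_grid=None, seed=42):
    """Grid-search a RandomForest; return (best_model, best_params, metrics)."""
    # Hypothetical search space; the repository's actual grid is not documented here.
    param_grid = param_grid or {"n_estimators": [100, 200], "max_depth": [None, 10]}
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        cv=3,
        scoring="f1_weighted",  # select the model on weighted F1, as reported above
    )
    search.fit(X_train, y_train)
    y_pred = search.best_estimator_.predict(X_test)
    # Support-weighted averages across classes for the multi-class problem.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    return search.best_estimator_, search.best_params_, metrics
```

Inside an `mlflow.start_run()` block, the two dicts could be passed to `mlflow.log_params` and `mlflow.log_metrics`. Calling `mlflow.set_tracking_uri` with a DagsHub repository's tracking URI sends them to DagsHub instead of the local store.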

3. CI/CD Workflow (Workflow-CI/)

An MLflow Project (environment defined in conda.yaml) driven by a GitHub Actions workflow (ci.yml). The workflow retrains the model, uploads the model artifact, builds a Docker image with mlflow models build-docker, and pushes the image to Docker Hub.
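The CI steps above map onto MLflow and Docker CLI calls. This Python sketch only assembles the commands and, optionally, runs them. Several details are assumptions: the `main` entry-point name, the `conda` env-manager default, the `model` artifact path, and any image tag you pass in. The run ID would come from the preceding training run.

```python
import shlex
import subprocess


def train_command(project_dir=".", entry_point="main", env_manager="conda"):
    """`mlflow run` for the MLflow Project; "conda" builds the env from conda.yaml."""
    return ["mlflow", "run", project_dir, "-e", entry_point, "--env-manager", env_manager]


def build_push_commands(run_id, image, artifact_path="model"):
    """Build a serving image from a logged model, then push it to a registry."""
    model_uri = f"runs:/{run_id}/{artifact_path}"
    return [
        ["mlflow", "models", "build-docker", "-m", model_uri, "-n", image],
        ["docker", "push", image],
    ]


def run_all(commands, dry_run=True):
    """Echo each command; execute it (failing fast) only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Pushing to Docker Hub also requires a prior `docker login`. In a GitHub Actions job, that login would normally use repository secrets.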

4. Monitoring & Logging (Monitoring/)

The containerized model is served and instrumented with a custom Prometheus exporter (prometheus_exporter.py) that exposes 13 metrics: request count and rate, latency, error rate, prediction distribution, and system CPU, memory, and disk usage. inference.py sends requests to the served model to generate traffic. Grafana visualizes all metrics on a dashboard with 3 alerting rules (high memory, high disk, high latency) routed to a webhook. Grafana evidence screenshots are in Monitoring/screenshots/.
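Below is a minimal sketch of request-level instrumentation with the `prometheus_client` library. The metric names are hypothetical. The sketch covers only request counts, errors, latency, and the prediction distribution, and omits the system CPU/memory/disk gauges of the actual exporter.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

REGISTRY = CollectorRegistry()

# Hypothetical metric names; exposed counters get a `_total` suffix.
REQUESTS = Counter("model_requests", "Inference requests by outcome", ["status"], registry=REGISTRY)
LATENCY = Histogram("model_request_latency_seconds", "Inference latency in seconds", registry=REGISTRY)
PREDICTIONS = Counter("model_predictions", "Predicted class counts", ["label"], registry=REGISTRY)


def instrumented_predict(predict_fn, payload):
    """Call predict_fn, recording latency, success/error counts, and predicted labels."""
    start = time.perf_counter()
    try:
        labels = predict_fn(payload)
    except Exception:
        REQUESTS.labels(status="error").inc()
        raise
    finally:
        # Latency is observed for failed calls too.
        LATENCY.observe(time.perf_counter() - start)
    REQUESTS.labels(status="success").inc()
    for label in labels:
        PREDICTIONS.labels(label=str(label)).inc()
    return labels
```

Calling `prometheus_client.start_http_server(8000, registry=REGISTRY)` would expose these metrics for Prometheus to scrape (port 8000 is an assumption). Request rate and error rate are then derived at query time, for example with `rate(model_requests_total[1m])` in PromQL, instead of being computed in the exporter.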


Results

  • Accuracy: ~0.94 · Weighted F1: ~0.94 (7-class classification)
  • Fully automated preprocessing & retraining on every push
  • Containerized model image published to Docker Hub
  • 13 live metrics monitored in Grafana with 3 active alerts
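For reference, scikit-learn's `average="weighted"` F1 averages the per-class F1 scores, weighting each class by its support. Assuming the reported weighted F1 was computed this way, for the 7 classes it is:

$$
F1_c = \frac{2\,P_c\,R_c}{P_c + R_c}, \qquad
F1_{\text{weighted}} = \sum_{c=1}^{7} \frac{n_c}{N}\,F1_c, \qquad
N = \sum_{c=1}^{7} n_c
$$

Here $P_c$ and $R_c$ are the precision and recall of class $c$, and $n_c$ is its number of true samples in the evaluation set.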

Dataset

Estimation of Obesity Levels Based On Eating Habits and Physical Condition, from the UCI Machine Learning Repository (2,111 records; 17 attributes, i.e. 16 features plus the target label; 7 target classes).


Built by Muhammad Fariz Abizar — Data Science.

About

End-to-end MLOps pipeline for obesity classification — MLflow & DagsHub tracking, CI/CD with GitHub Actions & Docker, Prometheus & Grafana monitoring.
