End-to-End MLOps Pipeline — Obesity Level Classification

An end-to-end Machine Learning Operations (MLOps) pipeline that takes a tabular classification problem from raw data all the way to a served, containerized, and monitored model. Built as a hands-on implementation of the full MLOps lifecycle: experimentation, experiment tracking, CI/CD, containerization, and real-time monitoring.

Problem: Predict an individual's obesity level (7 classes) from eating habits and physical condition.

Model: Random Forest, ~94% accuracy / 0.94 weighted F1. The model itself is intentionally simple; the focus of this project is the operations tooling around it.

Architecture

flowchart LR
    A[Raw Dataset] --> B[Preprocessing<br/>EDA + cleaning + encoding]
    B --> C[Model Training<br/>RandomForest + GridSearchCV]
    C --> D[Experiment Tracking<br/>MLflow + DagsHub]
    C --> E[CI/CD<br/>GitHub Actions + MLflow Project]
    E --> F[Docker Image<br/>Docker Hub]
    F --> G[Model Serving<br/>REST /invocations]
    G --> H[Monitoring & Alerting<br/>Prometheus + Grafana]

Tech Stack

| Stage | Tools |
| --- | --- |
| Preprocessing & EDA | pandas, scikit-learn, matplotlib, seaborn |
| Experiment tracking | MLflow, DagsHub |
| CI/CD | GitHub Actions, MLflow Project, conda |
| Containerization | Docker, Docker Hub |
| Monitoring & alerting | Prometheus, Grafana |

Repository Structure

.
├── Preprocessing/      # EDA notebook + automated preprocessing script
├── Modelling/          # MLflow training, hyperparameter tuning, DagsHub logging
├── Workflow-CI/        # MLflow Project + CI workflow (build & push Docker image)
└── Monitoring/         # Prometheus exporter, inference load script, Grafana evidence

Pipeline Stages

1. Preprocessing (`Preprocessing/`)

Exploratory data analysis and an automated preprocessing pipeline: missing-value handling, duplicate removal, IQR outlier capping, categorical encoding (binary / ordinal / one-hot), and feature scaling. The notebook documents the manual exploration; automate_*.py reproduces it programmatically and is triggered automatically on every push via GitHub Actions in the source repository.
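As a rough illustration of the IQR outlier capping step, the sketch below clips values that fall outside 1.5 × IQR of the quartiles. It uses only the Python standard library; the function name `cap_outliers_iqr`, the multiplier `k=1.5`, and the sample values are assumptions for illustration, and the project's own script may implement this differently (for example with pandas).

```python
from statistics import quantiles


def cap_outliers_iqr(values, k=1.5):
    """Clip each value to [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: the name and the default k are illustrative
    assumptions, not taken from the project's preprocessing script.
    """
    # method="inclusive" interpolates linearly between data points,
    # treating the sample minimum and maximum as the 0th and 100th percentiles.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences pass through unchanged; outliers are capped.
    return [min(max(v, lower), upper) for v in values]


# Made-up sample with one obvious outlier (100).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(cap_outliers_iqr(sample))
```

Capping keeps every record in the dataset, which matters for a small table of about 2,111 rows, whereas dropping outlier rows would shrink the training set.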

2. Model Training & Tracking (`Modelling/`)

A Random Forest classifier is trained in two variants:

modelling.py — MLflow autolog with local tracking.
modelling_tuning.py — hyperparameter tuning (GridSearchCV) with manual logging of metrics (accuracy, precision, recall, F1) plus extra artifacts (confusion matrix, classification report, feature importance), logged online to DagsHub.
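The weighted F1 logged for a multi-class model averages the per-class F1 scores, weighting each class by its number of true samples. The plain-Python sketch below shows that calculation on made-up labels; the function name `weighted_f1` and the toy data are assumptions for illustration. scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes the same quantity.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (illustrative helper)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of the samples.
        score += (n / total) * f1
    return score


# Toy two-class example; the real task has 7 classes.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support means a frequent class influences the score more than a rare one, which is why it can differ from the unweighted (macro) average when classes are imbalanced.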

3. CI/CD Workflow (`Workflow-CI/`)

An MLflow Project packaged with conda.yaml, retrained through a GitHub Actions pipeline (ci.yml) that trains the model, uploads the model artifact, and builds & pushes a Docker image to Docker Hub using mlflow models build-docker.
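The image build step can be pictured as a single CLI call. The sketch below only assembles the `mlflow models build-docker` command with its `--model-uri` and `--name` options; the helper name, the placeholder model URI, and the `latest` tag are assumptions, and how ci.yml actually invokes the command is not shown in this README.

```python
import shlex


def build_docker_command(model_uri, image_name):
    """Return the argv list for building a serving image (hypothetical helper)."""
    return [
        "mlflow", "models", "build-docker",
        "--model-uri", model_uri,  # where MLflow finds the logged model
        "--name", image_name,      # name (and optional tag) of the resulting image
    ]


# "<run_id>" is a placeholder to be replaced by the CI run's MLflow run ID;
# the ":latest" tag is an assumption.
cmd = build_docker_command("runs:/<run_id>/model", "faizarm/obesity-model:latest")
print(shlex.join(cmd))

# In a CI job the list could be passed to subprocess.run(cmd, check=True),
# followed by a separate `docker push` of the same image name.
```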

4. Monitoring & Logging (`Monitoring/`)

The containerized model is served and instrumented with a custom Prometheus exporter (prometheus_exporter.py) exposing 13 metrics — request count/rate, latency, error rate, prediction distribution, and system CPU/memory/disk. inference.py generates traffic. Grafana visualizes all metrics on a dashboard with 3 alerting rules (high memory, high disk, high latency) routed to a webhook. Evidence is in Monitoring/screenshots/.
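To make the exporter's role concrete, the sketch below renders metrics in the Prometheus text exposition format, which is what Prometheus reads when it scrapes an exporter's `/metrics` endpoint. It uses only the standard library; the metric names, label values, and the `render_metric` helper are hypothetical, and the real prometheus_exporter.py may rely on a client library instead.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family as Prometheus exposition text.

    samples is a list of (labels_dict, value) pairs. All metric names
    used below are illustrative assumptions, not the project's own.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is deterministic.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# A counter for total requests and a labeled counter for predicted classes.
body = render_metric(
    "inference_requests_total", "counter", "Total prediction requests", [({}, 3)]
)
body += render_metric(
    "predictions_total",
    "counter",
    "Predictions per class",
    [({"label": "Normal_Weight"}, 2), ({"label": "Obesity_Type_I"}, 1)],
)
print(body)
```

Counters such as these only ever increase; rates like requests per second are typically derived on the Prometheus side with a query such as `rate(...)` rather than computed in the exporter.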

Results

Accuracy: ~0.94 · Weighted F1: ~0.94 (7-class classification)
Fully automated preprocessing & retraining on every push
Containerized model image published to Docker Hub
13 live metrics monitored in Grafana with 3 active alerts

Live Resources

Experiment repo (preprocessing + automated workflow): https://github.com/FaizarM/Eksperimen_SML_Muhammad-Fariz-Abizar
CI/CD repo (MLflow Project + Docker): https://github.com/FaizarM/Workflow-CI
MLflow experiment tracking (DagsHub): https://dagshub.com/FaizarM/Eksperimen_SML_Muhammad-Fariz-Abizar
Docker image: https://hub.docker.com/r/faizarm/obesity-model

Dataset

Estimation of Obesity Levels Based On Eating Habits and Physical Condition — UCI Machine Learning Repository (2,111 records, 17 features, 7 target classes).

Built by Muhammad Fariz Abizar — Data Science.
