I created this churn prediction project to strengthen my understanding of machine learning workflows, feature engineering, and user behavior analytics. It recasts a telecom churn dataset as Netflix-style user activity, which let me practice user-retention modeling on realistic subscription data.
- Duration: 1–2 days of focused ML exploration
- Process: Transforming telecom data into Netflix-style streaming behavior
- Focus: Clean preprocessing, feature engineering, and clear documentation
- Handling missing values
- Converting text values into numeric format
- Removing unnecessary identifiers
- Renaming columns for consistent terminology
- One-hot encoding of categorical features
- Creating the synthetic engagement feature WatchHours
- Mapping Yes/No fields to binary labels
- Designing Netflix-style attributes such as:
  - MonthsSubscribed
  - MonthlySubscriptionFee
  - StreamingQuality
  - TotalAmountPaid
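
The preprocessing and feature engineering steps above could look roughly like the following pandas sketch. It is illustrative only: the original telecom column names (`customerID`, `tenure`, `MonthlyCharges`, `TotalCharges`, `InternetService`, `Churn`), the median fill for missing values, and the `WatchHours` formula are assumptions, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the telecom data; column names and values are assumed.
raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "tenure": [1, 34, 0, 45],
    "MonthlyCharges": [29.85, 56.95, 52.55, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],  # text, one blank
    "InternetService": ["DSL", "Fiber optic", "DSL", "No"],
    "Churn": ["No", "No", "No", "Yes"],
})


def preprocess(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    df = df.copy()

    # Convert text values to numeric; unparseable blanks become NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

    # Handle missing values (assumed strategy: fill with the column median).
    df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

    # Remove identifiers that carry no predictive signal.
    df = df.drop(columns=["customerID"])

    # Rename columns to Netflix-style terminology (mapping is assumed).
    df = df.rename(columns={
        "tenure": "MonthsSubscribed",
        "MonthlyCharges": "MonthlySubscriptionFee",
        "TotalCharges": "TotalAmountPaid",
        "InternetService": "StreamingQuality",
    })

    # Map the Yes/No target to binary labels.
    df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})

    # Synthetic engagement feature (hypothetical formula):
    # months subscribed times a random 5-20 hours per month.
    rng = np.random.default_rng(seed)
    hours_per_month = rng.uniform(5, 20, size=len(df))
    df["WatchHours"] = (df["MonthsSubscribed"] * hours_per_month).round(1)

    # One-hot encode the remaining categorical feature.
    return pd.get_dummies(df, columns=["StreamingQuality"], dtype=int)


clean = preprocess(raw)
print(clean.head())
```

Seeding the random generator keeps the synthetic `WatchHours` column reproducible between notebook runs.
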
- Logistic Regression model
- Train-test data splitting
- Standard scaling for numerical features
- Model evaluation using accuracy, precision, recall, and F1-score
- Confusion matrix heatmap
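
As a minimal sketch of the modeling steps listed above, the snippet below chains standard scaling and logistic regression with scikit-learn. The data comes from `make_classification` as a stand-in for the engineered churn features, and the 80/20 stratified split and `max_iter` value are assumptions rather than the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed churn features and labels.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=42
)

# Hold out 20% for testing; stratify to preserve the churn ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler is fitted on the training split only, which avoids
# leaking test-set statistics into the model.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Accuracy plus per-class precision, recall, and F1-score.
print(classification_report(y_test, y_pred, digits=3))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

If seaborn is available, `seaborn.heatmap(cm, annot=True, fmt="d")` is one way to render the confusion matrix as a heatmap.
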
- Viewing dataset samples
- Inspecting encoded features
- Structuring a clean ML pipeline
- Designing streaming-like behavioral features
- Evaluating classification performance
- Preparing a reproducible notebook
- Feature engineering skills
- Preprocessing pipelines
- Understanding evaluation metrics
- Project structure and documentation
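
To make the evaluation metrics mentioned above concrete, here is a small worked example that derives accuracy, precision, recall, and F1-score from the four cells of a binary confusion matrix. The counts are made up for illustration and are not results from this project; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Compute binary classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision: of the users predicted to churn, how many actually did.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the users who churned, how many the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (100 test users, 20 of whom churned).
metrics = classification_metrics(tn=70, fp=10, fn=5, tp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For churn, recall is often the metric to watch, since a missed churner (false negative) usually costs more than a retention offer sent to a user who would have stayed.
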
This project strengthened my ability to transform raw data into actionable insights, create realistic product-style features, and build a structured machine learning workflow. It reflects my growing confidence in data science, feature engineering, and model evaluation.
Author: Aryan Rajguru