Three ML models. One dashboard. Actionable intelligence in seconds.

| Model | Role |
|---|---|
| K-Means Clustering | Groups customers into 3 behavioral clusters for targeted marketing strategies |
| Logistic Regression | Predicts High Risk vs Low Risk, so you can act before the customer leaves |
| Linear Regression | Forecasts exact $ spend next month for revenue planning |

RAW INPUT (9 Features)
Age · Gender · Location · Tenure · Avg Monthly Spend
Last Month Spend · Num Transactions · Days Since Purchase · Support Tickets
            │
            ▼
┌───────────────────────┐
│  DATA PREPROCESSING   │
│  Label Encode Gender  │
│One-Hot Encode Location│
│    StandardScaler     │
└───────────────────────┘
            │
            ▼
┌───────────────────────┐
│          PCA          │ → Dimensionality reduction
│   (pca.pkl loaded)    │   Captures max variance
└───────────────────────┘
            │
      ┌─────┴──────┬──────────────┐
      ▼            ▼              ▼
  ┌────────┐  ┌─────────┐   ┌──────────┐
  │K-Means │  │Logistic │   │ Linear   │
  │Cluster │  │Regress. │   │ Regress. │
  │kmeans  │  │Classif. │   │ Regress. │
  │_model  │  │_Model   │   │ _Model   │
  │.pkl    │  │.pkl     │   │ .pkl     │
  └────────┘  └─────────┘   └──────────┘
      │            │              │
      ▼            ▼              ▼
 Group 0/1/2  🔴 High Risk   💵 $XXX.XX
              🟢 Low Risk    next month
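
The flow in the diagram can be sketched end to end in a few lines. This is a minimal illustration, not the project's `app.py`: the data is synthetic, the column names and `n_components=5` are assumptions, and the artifacts are fitted in memory instead of being loaded from the `pkl/` folder.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the 9 raw features (column names are assumptions).
raw = pd.DataFrame({
    "Age": rng.integers(18, 70, n),
    "Gender": rng.choice(["Female", "Male"], n),
    "Location": rng.choice(["Rural", "Suburban", "Urban"], n),
    "Tenure": rng.integers(1, 60, n),
    "Avg_Monthly_Spend": rng.uniform(20, 500, n),
    "Last_Month_Spend": rng.uniform(20, 500, n),
    "Num_Transactions": rng.integers(1, 40, n),
    "Days_Since_Purchase": rng.integers(0, 120, n),
    "Support_Tickets": rng.integers(0, 10, n),
})
# Toy targets, invented only so the two supervised models have something to fit.
churn = (raw["Days_Since_Purchase"] > 60).astype(int)
next_spend = 0.6 * raw["Avg_Monthly_Spend"] + 0.4 * raw["Last_Month_Spend"]


def encode(df, gender_encoder, columns=None):
    """Label-encode Gender and one-hot encode Location."""
    out = df.copy()
    out["Gender"] = gender_encoder.transform(out["Gender"])
    out = pd.get_dummies(out, columns=["Location"], dtype=float)
    if columns is not None:
        # A single row only yields one Location column; add the rest as zeros.
        out = out.reindex(columns=columns, fill_value=0.0)
    return out


# "Training" side: fit the preprocessing artifacts and the three models.
gender_encoder = LabelEncoder().fit(raw["Gender"])
encoded = encode(raw, gender_encoder)
feature_columns = list(encoded.columns)
scaler = StandardScaler().fit(encoded)
pca = PCA(n_components=5, random_state=0).fit(scaler.transform(encoded))
Z = pca.transform(scaler.transform(encoded))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)
clf = LogisticRegression(max_iter=1000).fit(Z, churn)
reg = LinearRegression().fit(Z, next_spend)


def predict_customer(row):
    """Encode -> scale -> PCA, then query all three models on the same vector."""
    x = encode(pd.DataFrame([row]), gender_encoder, feature_columns)
    z = pca.transform(scaler.transform(x))
    return {
        "segment": int(kmeans.predict(z)[0]),
        "churn_risk": "High Risk" if clf.predict(z)[0] == 1 else "Low Risk",
        "predicted_spend": float(reg.predict(z)[0]),
    }


result = predict_customer(raw.iloc[0].to_dict())
print(result)
```

Reindexing the single-row frame against the training columns is what keeps the one-hot layout stable at inference time. Without it, a customer from one location would produce a feature vector of the wrong width.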
1. OPEN DASHBOARD
   └── Sidebar: Choose input mode
       ├── 📋 Existing Customer → Select Customer ID
       │      Auto-populates all fields from CSV
       └── ✏️ Manual Entry → Fill in 9 feature fields manually
2. HIT PREDICT
   └── Inputs encoded → scaled → PCA transformed
       └── Passed simultaneously to all 3 pkl models
3. VIEW RESULTS
   ├── 🔵 Customer Segment → Group 0 / 1 / 2
   ├── 🔴🟢 Churn Risk → High Risk (red) or Low Risk (green)
   └── 💵 Predicted Spend → $XXX.XX next month
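
The two sidebar input modes from step 1 can be sketched as a plain function that returns the same 9-field dictionary either way. The `Customer_ID` column, the feature column names, and the `resolve_inputs` helper are all hypothetical. The Streamlit widgets themselves are left out so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical column names; the real customer_data.csv schema may differ.
FEATURES = [
    "Age", "Gender", "Location", "Tenure", "Avg_Monthly_Spend",
    "Last_Month_Spend", "Num_Transactions", "Days_Since_Purchase",
    "Support_Tickets",
]

# Two made-up rows standing in for Dataset/customer_data.csv.
CSV_TEXT = """Customer_ID,Age,Gender,Location,Tenure,Avg_Monthly_Spend,Last_Month_Spend,Num_Transactions,Days_Since_Purchase,Support_Tickets
101,34,Female,Urban,24,210.5,198.0,12,9,1
102,51,Male,Rural,6,75.0,40.0,3,64,4
"""
customers = pd.read_csv(io.StringIO(CSV_TEXT))


def resolve_inputs(mode, customers, customer_id=None, manual=None):
    """Return the 9 raw feature values for either input mode."""
    if mode == "Existing Customer":
        match = customers.loc[customers["Customer_ID"] == customer_id, FEATURES]
        if match.empty:
            raise KeyError(f"unknown customer id: {customer_id}")
        return match.iloc[0].to_dict()
    if mode == "Manual Entry":
        manual = manual or {}
        missing = [name for name in FEATURES if name not in manual]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return {name: manual[name] for name in FEATURES}
    raise ValueError(f"unknown mode: {mode}")


existing = resolve_inputs("Existing Customer", customers, customer_id=102)
print(existing)
```

Because both modes return the same shape, the Predict step does not need to know where the values came from.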
🔴 Churn Classification — Logistic Regression
- Tuned params: `C`, `penalty`, `solver`, `class_weight`
- Method: RandomizedSearchCV with 5-fold cross-validation
- Output: Binary, `High Risk (1)` or `Low Risk (0)`
- Metric: Accuracy, Precision, Recall, F1-score
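
A search over those four parameters might look like the sketch below. It runs on synthetic data from `make_classification`. The search ranges, `n_iter=20`, and `scoring="f1"` are assumptions chosen for illustration, not values taken from `Classification_Model.py`.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic, mildly imbalanced stand-in for the churn data.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# liblinear and saga both accept l1 and l2, so every sampled combination is valid.
param_distributions = {
    "C": loguniform(1e-3, 1e2),
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
    "class_weight": [None, "balanced"],
}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=5000),
    param_distributions=param_distributions,
    n_iter=20,       # assumed search budget
    cv=5,            # 5-fold cross-validation
    scoring="f1",    # assumed selection metric
    random_state=0,
)
search.fit(X_train, y_train)

# Report the four listed metrics on the held-out split.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(search.best_params_)
print(metrics)
```

Restricting `solver` to options that support both penalties avoids failed fits during the search. Solvers such as `lbfgs` reject `l1`.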
🟢 Spend Regression — Linear Regression
- Tuned params: `fit_intercept`, `positive`
- Method: RandomizedSearchCV to minimize MAE & RMSE
- Output: Continuous — predicted $ spend next month
- Metric: MAE, RMSE
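
The regression search can be sketched the same way. The data is synthetic. Using 5 folds and refitting on MAE are assumptions, and the grid has only four combinations, so `n_iter=4` covers it completely.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the spend data.
X, y = make_regression(n_samples=400, n_features=5, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

param_distributions = {
    "fit_intercept": [True, False],
    "positive": [True, False],
}

search = RandomizedSearchCV(
    LinearRegression(),
    param_distributions=param_distributions,
    n_iter=4,  # the full grid has exactly 4 combinations
    cv=5,      # assumed fold count
    scoring={
        "mae": "neg_mean_absolute_error",
        "rmse": "neg_root_mean_squared_error",
    },
    refit="mae",  # assumed: pick the final model by MAE
    random_state=0,
)
search.fit(X_train, y_train)

# Held-out error in the same units as the target.
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(search.best_params_, round(mae, 2), round(rmse, 2))
```

Both scores are tracked in `search.cv_results_` under `mean_test_mae` and `mean_test_rmse`, so either metric can be inspected after the fit.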
🔵 Customer Clustering — K-Means
- k = 3 — determined by Elbow Method (WCSS vs k plot)
- Validated with Silhouette Score
- Output: Cluster label — Group 0, 1, or 2
- Trained on PCA-reduced feature space
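
The elbow and silhouette checks can be reproduced with a short loop. This sketch uses three synthetic, well-separated blobs and an assumed 2-component PCA, so the numbers only illustrate the procedure and say nothing about the real customer data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Three well-separated synthetic groups in a 6-feature space.
centers = np.array([[-10.0] * 6, [0.0] * 6, [10.0] * 6])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

# Scale, then cluster in the PCA-reduced space.
Z = PCA(n_components=2, random_state=0).fit_transform(
    StandardScaler().fit_transform(X)
)

wcss = {}        # within-cluster sum of squares, the y-axis of the elbow plot
silhouette = {}  # mean silhouette score per k
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Z)
    wcss[k] = model.inertia_
    silhouette[k] = silhouette_score(Z, model.labels_)

best_k = max(silhouette, key=silhouette.get)
for k in wcss:
    print(k, round(wcss[k], 2), round(silhouette[k], 3))
print("best k by silhouette:", best_k)
```

WCSS keeps shrinking as k grows, so the elbow is read from where the drop flattens. The silhouette score gives a second opinion because it peaks instead of decreasing monotonically.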
Smart-Sales/
├── app.py                        # Streamlit UI + inference logic
├── Dataset/
│   ├── customer_data.csv         # Raw customer profiles
│   └── preprocessed_data.csv     # Cleaned & encoded training data
├── Models/
│   ├── Data_Preprocessing.py     # Cleaning, encoding, train/test split
│   ├── Classification_Model.py   # Trains churn logistic regression
│   ├── Regression_Model.py       # Trains spend linear regression
│   └── Unsupervised_model.py     # Trains K-Means with Elbow method
└── pkl/
    ├── scaler.pkl                # StandardScaler artifact
    ├── pca.pkl                   # PCA model artifact
    ├── gender_encoder.pkl        # LabelEncoder for Gender
    ├── Classification_Model.pkl  # Churn prediction model
    ├── Regression_Model.pkl      # Spend forecast model
    └── kmeans_model.pkl          # Customer clustering model
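
The hand-off between the training scripts and `app.py` goes through the files in `pkl/`. A minimal round trip for one artifact might look like this, assuming the standard-library `pickle` module is used (the project could equally use `joblib`). A temporary directory stands in for the real `pkl/` folder.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training matrix; the values are made up.
X_train = np.array([[25.0, 120.0], [40.0, 300.0], [58.0, 80.0]])
scaler = StandardScaler().fit(X_train)

with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "pkl" / "scaler.pkl"
    artifact_path.parent.mkdir(parents=True, exist_ok=True)

    # Training side: persist the fitted object.
    with artifact_path.open("wb") as fh:
        pickle.dump(scaler, fh)

    # App side: load it back before serving predictions.
    with artifact_path.open("rb") as fh:
        loaded_scaler = pickle.load(fh)

# The reloaded scaler should transform data exactly like the original.
same = np.allclose(loaded_scaler.transform(X_train), scaler.transform(X_train))
print(same)
```

Pickled scikit-learn objects are tied to the library version that wrote them, so regenerate the artifacts after upgrading dependencies.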
# Clone the repo
git clone https://github.com/Ronit178693/Smart-Sales-Custom.git
cd Smart-Sales-Custom
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Train models (generates .pkl files)
python Models/Data_Preprocessing.py
python Models/Classification_Model.py
python Models/Regression_Model.py
python Models/Unsupervised_model.py
# Launch the dashboard
streamlit run app.py

| Intelligence | Business Use |
|---|---|
| 🔵 Customer Segments | Tailor campaigns per cluster — stop generic blasting |
| 🔴 Churn Risk | Trigger retention offers before the customer leaves |
| 💵 Future Spend | Forecast next month's revenue with customer-level precision |
| 📊 Combined View | Prioritize high-value, low-churn-risk customers for upsells |
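
The combined view in the last row can be sketched as a simple filter over the three model outputs. The prediction values, the column names, and the median spend cutoff below are made up for illustration.

```python
import pandas as pd

# Hypothetical per-customer outputs from the three models.
predictions = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105],
    "segment": [0, 2, 1, 0, 2],
    "churn_risk": ["Low Risk", "High Risk", "Low Risk", "Low Risk", "High Risk"],
    "predicted_spend": [420.0, 510.0, 95.0, 310.0, 60.0],
})


def upsell_candidates(df, min_spend):
    """Low-churn-risk customers at or above a spend cutoff, highest spend first."""
    mask = (df["churn_risk"] == "Low Risk") & (df["predicted_spend"] >= min_spend)
    return df.loc[mask].sort_values("predicted_spend", ascending=False)


# Assumed cutoff: the median predicted spend across all customers.
threshold = predictions["predicted_spend"].median()
targets = upsell_candidates(predictions, threshold)
print(targets[["customer_id", "segment", "predicted_spend"]])
```

Flipping the mask to `High Risk` yields a retention list instead, which matches the Churn Risk row of the table.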