A customer analytics platform built on retail banking transaction data. The project covers data modelling, feature engineering, unsupervised segmentation, explainable risk scoring, supervised ML classification, and a Power BI dashboard.
financial-intelligence-platform/
├── notebooks/
│ ├── 1_data_preparation.ipynb # data loading, synthetic dimensions, export
│ ├── 2_feature_engineering_segmentation.ipynb # features, K-Means, risk scorecard
│ └── 3_risk_classification_models.ipynb # Logistic Regression + Random Forest
├── data_raw/
│ └── creditcard.csv # Kaggle fraud dataset (not included in repo)
├── data_processed/
│ ├── FactTransactions.csv
│ ├── DimCustomer.csv
│ ├── DimAccount.csv
│ ├── DimMerchant.csv
│ ├── DimDate.csv
│ ├── CustomerFeatures.csv
│ ├── CustomerMonthlyFeatures.csv
│ ├── CustomerSegments.csv
│ └── RiskAlerts.csv
├── sql/
│ └── schema.sql # table definitions for all CSVs
├── reports/
│ └── executive_summary.md # findings and recommendations
├── project.pbix # Power BI dashboard
└── requirements.txt
1. Data Preparation (Notebook 1)
Loads the Kaggle credit card dataset and builds a realistic UK retail banking data model around it (customers, accounts, merchants, and channels) using synthetic but distribution-aware data generation.
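As a rough illustration of what "synthetic but distribution-aware" generation could look like, the sketch below attaches a customer and a channel to each raw transaction using a seeded random generator. The column names `CustomerID` and `Channel`, the function `attach_synthetic_dimensions`, the log-normal activity weights, and the channel mix are all assumptions made for this example; they are not taken from the notebook.

```python
import numpy as np
import pandas as pd


def attach_synthetic_dimensions(tx: pd.DataFrame, n_customers: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Attach hypothetical CustomerID and Channel columns to raw transactions."""
    rng = np.random.default_rng(seed)
    out = tx.copy()

    # Skewed activity (assumption): a few customers transact far more often than most.
    weights = rng.lognormal(mean=0.0, sigma=1.0, size=n_customers)
    weights /= weights.sum()
    customer_ids = np.arange(1, n_customers + 1)
    out["CustomerID"] = rng.choice(customer_ids, size=len(out), p=weights)

    # Channel mix is an assumed distribution, not a figure from the project.
    channels = ["Online", "POS", "ATM"]
    out["Channel"] = rng.choice(channels, size=len(out), p=[0.5, 0.35, 0.15])
    return out


# Tiny stand-in for creditcard.csv (the real file has Time, V1..V28, Amount, Class).
raw = pd.DataFrame({
    "Time": [0, 60, 120, 180],
    "Amount": [12.5, 80.0, 3.2, 250.0],
    "Class": [0, 0, 0, 1],
})
fact = attach_synthetic_dimensions(raw, n_customers=3)
```

Fixing the seed keeps the generated dimensions reproducible across notebook runs, which matters when later notebooks read the exported CSVs.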
2. Feature Engineering & Segmentation (Notebook 2)
Computes monthly and lifetime behavioural features per customer (spend volatility, ATM usage rate, online rate, cross-border activity, night transactions, discretionary spend).
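A minimal sketch of how a few of these features might be derived with Pandas. The input columns (`CustomerID`, `Timestamp`, `Amount`, `Channel`), the output names, and the 22:00 to 06:00 night window are assumptions for illustration; the notebook's actual definitions may differ.

```python
import pandas as pd

# Toy transactions; column names are hypothetical.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 1, 2, 2],
    "Timestamp": pd.to_datetime([
        "2024-01-05 10:00", "2024-01-20 23:30", "2024-02-03 14:00",
        "2024-02-15 02:15", "2024-01-10 09:00", "2024-02-11 12:00",
    ]),
    "Amount": [100.0, 50.0, 300.0, 50.0, 40.0, 40.0],
    "Channel": ["Online", "ATM", "Online", "POS", "ATM", "ATM"],
})

tx["Month"] = tx["Timestamp"].dt.to_period("M")
hour = tx["Timestamp"].dt.hour
tx["IsNight"] = (hour >= 22) | (hour < 6)  # assumed night window
tx["IsATM"] = tx["Channel"].eq("ATM")
tx["IsOnline"] = tx["Channel"].eq("Online")

# Monthly features: total spend per customer per calendar month.
monthly = (
    tx.groupby(["CustomerID", "Month"])["Amount"]
    .sum()
    .rename("MonthlySpend")
    .reset_index()
)

# Lifetime features: mean and sample standard deviation of monthly spend.
lifetime = monthly.groupby("CustomerID")["MonthlySpend"].agg(
    AvgMonthlySpend="mean",
    SpendVolatility="std",
)

# Usage rates: share of a customer's transactions with each flag set.
rates = tx.groupby("CustomerID")[["IsATM", "IsOnline", "IsNight"]].mean()
rates.columns = ["AtmRate", "OnlineRate", "NightRate"]

features = lifetime.join(rates)
```

Here spend volatility is taken as the standard deviation of monthly totals, which is one common choice; a coefficient of variation would make it comparable across customers with very different spend levels.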
Uses K-Means clustering to group customers into 5 behavioural segments:
- Digital Spenders
- Stable Essentials
- Cash-Heavy
- High-Volatility
- Cross-Border Lifestyle
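The clustering step could be sketched as follows with scikit-learn. The feature matrix here is randomly generated stand-in data, and the scaling step and the `n_init` and `random_state` values are assumptions rather than the notebook's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for the customer feature matrix: 5 loose groups, 4 features each.
centers = rng.uniform(-5, 5, size=(5, 4))
X = np.vstack([c + rng.normal(scale=0.3, size=(40, 4)) for c in centers])

# K-Means is distance-based, so features are standardised first.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Centroids in scaled feature space, one row per cluster.
centroids = kmeans.cluster_centers_
```

K-Means only returns cluster indices (0 to 4). Names like those in the list above would typically be assigned afterwards by inspecting each centroid's dominant features.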
Builds an explainable risk scorecard with reason codes (fraud history, spend spikes, etc.) and assigns each customer a risk tier: Low / Medium / High.
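One way such a scorecard might be structured is a list of rules, each contributing points and a reason code when it fires. The rule names, point values, thresholds, and tier cut-offs below are hypothetical and chosen only to show the shape of the approach.

```python
from dataclasses import dataclass


@dataclass
class CustomerStats:
    fraud_count: int          # number of past fraud-labelled transactions
    spend_spike_ratio: float  # latest monthly spend / average monthly spend
    night_rate: float         # share of transactions made at night


# Hypothetical rules: (reason code, points, predicate).
RULES = [
    ("FRAUD_HISTORY", 50, lambda c: c.fraud_count > 0),
    ("SPEND_SPIKE", 30, lambda c: c.spend_spike_ratio >= 2.0),
    ("NIGHT_ACTIVITY", 20, lambda c: c.night_rate >= 0.3),
]


def score_customer(c: CustomerStats):
    """Return (score, tier, reason codes) for one customer."""
    fired = [(code, points) for code, points, applies in RULES if applies(c)]
    score = sum(points for _, points in fired)
    reasons = [code for code, _ in fired]
    # Assumed tier cut-offs.
    if score >= 50:
        tier = "High"
    elif score >= 20:
        tier = "Medium"
    else:
        tier = "Low"
    return score, tier, reasons


high = score_customer(CustomerStats(fraud_count=1, spend_spike_ratio=2.5, night_rate=0.1))
medium = score_customer(CustomerStats(fraud_count=0, spend_spike_ratio=1.0, night_rate=0.4))
low = score_customer(CustomerStats(fraud_count=0, spend_spike_ratio=1.0, night_rate=0.0))
```

Because every point on the score traces back to a named rule, the reason codes can be shown next to the tier, which is what makes the score explainable.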
3. ML Classification (Notebook 3)
Trains Logistic Regression and Random Forest classifiers to predict at-risk customers. Random Forest achieves ~0.93 ROC-AUC. Feature importances show that spend volatility and average monthly spend are the strongest predictors.
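A minimal sketch of the train-and-compare loop, assuming a scikit-learn workflow. It runs on synthetic data from `make_classification`, so its scores will not match the ROC-AUC reported here. The split ratio and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features and the at-risk label.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

models = {
    # Logistic regression benefits from scaled inputs; the forest does not need them.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC-AUC is computed from the predicted probability of the positive class.
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Impurity-based importances from the fitted forest, one value per feature.
importances = models["random_forest"].feature_importances_
```

Stratifying the split keeps the proportion of at-risk customers the same in the training and test sets, which helps when the positive class is small.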
4. Power BI Dashboard
Interactive dashboard with:
- Portfolio-level KPIs (total customers, risk distribution, fraud alerts)
- Segment breakdown and behaviour mapping
- Operational risk queue (Medium + High tier customers)
- Customer 360 drill-through (individual profiles with risk drivers)
- Clone the repo
- Download `creditcard.csv` from Kaggle and place it in `data_raw/`
- Install dependencies: `pip install -r requirements.txt`
- Run the notebooks in order (1 → 2 → 3)
- Open `project.pbix` in Power BI Desktop
| Area | Tools |
|---|---|
| Data processing | Python, Pandas, NumPy |
| Machine learning | Scikit-learn |
| Visualisation (code) | Matplotlib, Seaborn |
| Dashboard | Power BI |
| Schema | SQL |
Kaggle Credit Card Fraud Detection dataset: 284,807 transactions with PCA-transformed features and a fraud label. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud