Financial Risk & Customer Behaviour Intelligence

A customer analytics platform built on retail banking transaction data. The project covers data modelling, feature engineering, unsupervised segmentation, explainable risk scoring, supervised ML classification, and a Power BI dashboard.

Project Structure

financial-intelligence-platform/
├── notebooks/
│   ├── 1_data_preparation.ipynb           # data loading, synthetic dimensions, export
│   ├── 2_feature_engineering_segmentation.ipynb  # features, K-Means, risk scorecard
│   └── 3_risk_classification_models.ipynb  # Logistic Regression + Random Forest
├── data_raw/
│   └── creditcard.csv                     # Kaggle fraud dataset (not included in repo)
├── data_processed/
│   ├── FactTransactions.csv
│   ├── DimCustomer.csv
│   ├── DimAccount.csv
│   ├── DimMerchant.csv
│   ├── DimDate.csv
│   ├── CustomerFeatures.csv
│   ├── CustomerMonthlyFeatures.csv
│   ├── CustomerSegments.csv
│   └── RiskAlerts.csv
├── sql/
│   └── schema.sql                         # table definitions for all CSVs
├── reports/
│   └── executive_summary.md              # findings and recommendations
├── project.pbix                           # Power BI dashboard
└── requirements.txt

What This Project Does

1. Data Preparation (Notebook 1)

Loads the Kaggle credit card dataset and builds a realistic UK retail banking data model around it — customers, accounts, merchants, and channels — using synthetic but distribution-aware data generation.
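
As a rough illustration of the idea, the sketch below attaches synthetic customer and channel columns to raw transactions. The function name `attach_synthetic_dimensions`, the column names `CustomerID` and `Channel`, the Dirichlet activity weights, and the channel mix are all assumptions made for this example, not the notebook's actual logic.

```python
import numpy as np
import pandas as pd


def attach_synthetic_dimensions(tx: pd.DataFrame, n_customers: int = 1000,
                                seed: int = 42) -> pd.DataFrame:
    """Attach hypothetical CustomerID and Channel columns to raw transactions."""
    rng = np.random.default_rng(seed)
    # Skewed activity: a few customers transact far more often than most.
    weights = rng.dirichlet(np.full(n_customers, 0.5))
    out = tx.copy()
    out["CustomerID"] = rng.choice(
        np.arange(1, n_customers + 1), size=len(tx), p=weights
    )
    # Assumed channel mix, chosen only for illustration.
    channels = ["Online", "POS", "ATM"]
    out["Channel"] = rng.choice(channels, size=len(tx), p=[0.45, 0.40, 0.15])
    return out


# Tiny stand-in for creditcard.csv (the real file has Time, V1..V28, Amount, Class).
demo = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 5.0],
    "Amount": [9.99, 120.0, 40.0, 3.5],
    "Class": [0, 0, 1, 0],
})
fact = attach_synthetic_dimensions(demo, n_customers=3)
print(fact)
```

Fixing the seed keeps the generated dimensions reproducible between runs, which matters when later notebooks and the dashboard read the exported CSVs.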

2. Feature Engineering & Segmentation (Notebook 2)

Computes monthly and lifetime behavioural features per customer (spend volatility, ATM usage rate, online rate, cross-border activity, night transactions, discretionary spend).
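
A minimal sketch of how a few of these features could be derived with Pandas. The column names (`CustomerID`, `TransactionDate`, `Amount`, `Channel`), the output feature names, and the "before 06:00" definition of a night transaction are assumptions for illustration; the notebook may define them differently.

```python
import pandas as pd


def customer_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Aggregate transactions into per-customer lifetime features."""
    tx = tx.copy()
    tx["Month"] = tx["TransactionDate"].dt.to_period("M")
    tx["IsATM"] = tx["Channel"].eq("ATM")
    tx["IsOnline"] = tx["Channel"].eq("Online")
    tx["IsNight"] = tx["TransactionDate"].dt.hour < 6  # assumed definition of "night"

    # Monthly spend per customer, then its mean and standard deviation.
    monthly = (
        tx.groupby(["CustomerID", "Month"])["Amount"]
        .sum()
        .rename("MonthlySpend")
        .reset_index()
    )
    spend = monthly.groupby("CustomerID")["MonthlySpend"].agg(
        AvgMonthlySpend="mean", SpendVolatility="std"
    )

    # Share of transactions per customer that are ATM, online, or at night.
    rates = tx.groupby("CustomerID")[["IsATM", "IsOnline", "IsNight"]].mean()
    rates.columns = ["AtmRate", "OnlineRate", "NightRate"]

    # A customer with a single active month has no volatility estimate; use 0.
    return spend.join(rates).fillna({"SpendVolatility": 0.0})


demo = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "TransactionDate": pd.to_datetime([
        "2024-01-05 10:00", "2024-01-20 02:00",
        "2024-02-03 12:00", "2024-01-09 23:00",
    ]),
    "Amount": [100.0, 50.0, 50.0, 20.0],
    "Channel": ["POS", "ATM", "Online", "Online"],
})
feats = customer_features(demo)
print(feats)
```

Here spend volatility is taken as the sample standard deviation of monthly spend, which is one common choice among several.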

Uses K-Means clustering to group customers into 5 behavioural segments:

Digital Spenders
Stable Essentials
Cash-Heavy
High-Volatility
Cross-Border Lifestyle
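
The clustering step can be sketched roughly as follows, assuming the features are standardised before K-Means is fitted. The feature names, the random stand-in data, and the `n_init` and `random_state` settings are illustrative assumptions. Cluster numbers carry no meaning on their own, so the segment names above would be assigned after inspecting the centroids.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Random stand-in for CustomerFeatures.csv; the column names are hypothetical.
feature_cols = ["SpendVolatility", "AtmRate", "OnlineRate", "CrossBorderRate"]
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.random((200, len(feature_cols))), columns=feature_cols)

# Standardise so that no single feature dominates the distance metric.
X = StandardScaler().fit_transform(features[feature_cols])

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
features["Cluster"] = kmeans.labels_

# Centroids in standardised units: positive means above the portfolio average.
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=feature_cols)
print(centroids.round(2))
print(features["Cluster"].value_counts().sort_index())
```

A cluster whose centroid is high on `AtmRate`, for example, would be a natural candidate for the Cash-Heavy label.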

Builds an explainable risk scorecard with reason codes (fraud history, spend spikes, etc.) and assigns each customer a risk tier: Low / Medium / High.
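
One way such a scorecard could look is an additive rule table in which each rule that fires contributes points and a reason code. The rule names, point values, and tier thresholds below are hypothetical and chosen only to show the mechanism.

```python
from dataclasses import dataclass


@dataclass
class CustomerSnapshot:
    fraud_count: int
    avg_monthly_spend: float
    latest_monthly_spend: float
    cross_border_rate: float


# (reason code, points, rule). All values are illustrative assumptions.
RULES = [
    ("FRAUD_HISTORY", 40, lambda c: c.fraud_count > 0),
    ("SPEND_SPIKE", 25, lambda c: c.avg_monthly_spend > 0
        and c.latest_monthly_spend > 2 * c.avg_monthly_spend),
    ("CROSS_BORDER", 15, lambda c: c.cross_border_rate > 0.30),
]


def score_customer(c: CustomerSnapshot):
    """Return (score, tier, reason_codes) for one customer."""
    fired = [(code, pts) for code, pts, rule in RULES if rule(c)]
    score = sum(pts for _, pts in fired)
    tier = "High" if score >= 50 else "Medium" if score >= 20 else "Low"
    return score, tier, [code for code, _ in fired]


print(score_customer(CustomerSnapshot(1, 500.0, 1200.0, 0.05)))
# (65, 'High', ['FRAUD_HISTORY', 'SPEND_SPIKE'])
print(score_customer(CustomerSnapshot(0, 500.0, 520.0, 0.10)))
# (0, 'Low', [])
```

Because the score is a plain sum of the rules that fired, the reason codes fully explain every tier assignment.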

3. ML Classification (Notebook 3)

Trains Logistic Regression and Random Forest classifiers to predict at-risk customers. Random Forest achieves ~0.93 ROC-AUC. Feature importances show that spend volatility and average monthly spend are the strongest predictors.
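
The comparison could be set up along these lines with Scikit-learn. The data here is synthetic (`make_classification`), so the printed scores are unrelated to the ~0.93 reported above. The hyperparameters and the use of `class_weight="balanced"` are assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the customer feature table.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, n_redundant=0,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Logistic Regression benefits from scaled inputs; Random Forest does not need them.
    "LogisticRegression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "RandomForest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC-AUC is computed from predicted probabilities of the positive class.
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: ROC-AUC = {auc[name]:.3f}")

# Impurity-based importances from the fitted forest; they sum to 1.
importances = models["RandomForest"].feature_importances_
print(importances.round(3))
```

The stratified split keeps the at-risk share the same in the train and test sets, which makes ROC-AUC more stable when the positive class is small.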

4. Power BI Dashboard

Interactive dashboard with:

Portfolio-level KPIs (total customers, risk distribution, fraud alerts)
Segment breakdown and behaviour mapping
Operational risk queue (Medium + High tier customers)
Customer 360 drill-through (individual profiles with risk drivers)
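
The operational risk queue amounts to filtering scored customers to the Medium and High tiers and ranking them. A minimal Pandas sketch, assuming hypothetical `CustomerID`, `RiskScore`, and `RiskTier` columns:

```python
import pandas as pd

# Hypothetical scored customers; the real columns may differ.
scored = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "RiskScore": [65, 10, 25, 40],
    "RiskTier": ["High", "Low", "Medium", "Medium"],
})

# Keep Medium and High tiers, highest score first.
queue = (
    scored[scored["RiskTier"].isin(["Medium", "High"])]
    .sort_values("RiskScore", ascending=False)
    .reset_index(drop=True)
)
print(queue)
```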

How to Run

1. Clone the repo
2. Download creditcard.csv from Kaggle and place it in data_raw/
3. Install dependencies: pip install -r requirements.txt
4. Run the notebooks in order (1 → 2 → 3)
5. Open project.pbix in Power BI Desktop

Tech Stack

Area	Tools
Data processing	Python, Pandas, NumPy
Machine learning	Scikit-learn
Visualisation (code)	Matplotlib, Seaborn
Dashboard	Power BI
Schema	SQL

Dataset

Kaggle Credit Card Fraud Detection dataset — 284,807 transactions with PCA-transformed features and a fraud label. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
