nikkhillkumar/financial-intelligence-platform

Financial Risk & Customer Behaviour Intelligence

A customer analytics platform built on retail banking transaction data. The project covers data modelling, feature engineering, unsupervised segmentation, explainable risk scoring, supervised ML classification, and a Power BI dashboard.


Project Structure

financial-intelligence-platform/
├── notebooks/
│   ├── 1_data_preparation.ipynb           # data loading, synthetic dimensions, export
│   ├── 2_feature_engineering_segmentation.ipynb  # features, K-Means, risk scorecard
│   └── 3_risk_classification_models.ipynb  # Logistic Regression + Random Forest
├── data_raw/
│   └── creditcard.csv                     # Kaggle fraud dataset (not included in repo)
├── data_processed/
│   ├── FactTransactions.csv
│   ├── DimCustomer.csv
│   ├── DimAccount.csv
│   ├── DimMerchant.csv
│   ├── DimDate.csv
│   ├── CustomerFeatures.csv
│   ├── CustomerMonthlyFeatures.csv
│   ├── CustomerSegments.csv
│   └── RiskAlerts.csv
├── sql/
│   └── schema.sql                         # table definitions for all CSVs
├── reports/
│   └── executive_summary.md              # findings and recommendations
├── project.pbix                           # Power BI dashboard
└── requirements.txt

What This Project Does

1. Data Preparation (Notebook 1)

Loads the Kaggle credit card dataset and builds a realistic UK retail banking data model around it (customers, accounts, merchants, and channels). The added dimensions are synthetic but generated to follow plausible distributions.
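
As a rough illustration of attaching synthetic dimensions to the transaction table, the sketch below assigns customers and channels to a small stand-in DataFrame. It is not the notebook's code: the portfolio size, channel list, region names, and the Dirichlet-based activity skew are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
N_TXNS = 1_000

# Stand-in for creditcard.csv with only the columns this sketch touches.
transactions = pd.DataFrame({
    "Time": np.sort(rng.uniform(0, 172_800, size=N_TXNS)),  # seconds since first transaction
    "Amount": rng.lognormal(mean=3.0, sigma=1.0, size=N_TXNS).round(2),
    "Class": rng.choice([0, 1], size=N_TXNS, p=[0.99, 0.01]),
})

N_CUSTOMERS = 50                      # hypothetical portfolio size
CHANNELS = ["Online", "POS", "ATM"]   # hypothetical channel list
customer_ids = np.arange(1, N_CUSTOMERS + 1)

# Skewed activity: a few customers receive many transactions, most receive few.
activity = rng.dirichlet(np.full(N_CUSTOMERS, 0.5))
transactions["CustomerID"] = rng.choice(customer_ids, size=N_TXNS, p=activity)
transactions["Channel"] = rng.choice(CHANNELS, size=N_TXNS, p=[0.5, 0.4, 0.1])

# Customer dimension with one synthetic attribute (region names are illustrative).
dim_customer = pd.DataFrame({"CustomerID": customer_ids})
dim_customer["Region"] = rng.choice(
    ["London", "North West", "Scotland"], size=N_CUSTOMERS
)

# Fact table: add a surrogate key so each transaction is addressable.
fact_transactions = transactions.copy()
fact_transactions.insert(0, "TransactionID", np.arange(1, N_TXNS + 1))
```

Every CustomerID in the fact table exists in the dimension table, which is the property a star-schema join in Power BI relies on.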

2. Feature Engineering & Segmentation (Notebook 2)

Computes monthly and lifetime behavioural features per customer (spend volatility, ATM usage rate, online rate, cross-border activity, night transactions, discretionary spend).
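
A minimal sketch of how such features could be derived with pandas is shown below. The column names (Month, Channel, Hour, IsCrossBorder), the 22:00 to 06:00 definition of "night", and the use of the coefficient of variation for spend volatility are assumptions for illustration; the notebook's definitions may differ, and discretionary spend is omitted because it depends on merchant categories.

```python
import pandas as pd

# Tiny illustrative transaction table (values are made up).
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Month": ["2024-01", "2024-01", "2024-02", "2024-02"] * 2,
    "Amount": [20.0, 30.0, 200.0, 50.0, 40.0, 60.0, 45.0, 55.0],
    "Channel": ["Online", "ATM", "Online", "POS", "POS", "POS", "ATM", "POS"],
    "Hour": [23, 10, 2, 14, 9, 12, 18, 20],
    "IsCrossBorder": [False, False, True, False, False, False, False, False],
})

# Boolean flags whose mean is the corresponding usage rate.
tx["IsATM"] = tx["Channel"].eq("ATM")
tx["IsOnline"] = tx["Channel"].eq("Online")
tx["IsNight"] = tx["Hour"].ge(22) | tx["Hour"].lt(6)  # assumed night window

# Monthly features: one row per customer per month.
monthly = (
    tx.groupby(["CustomerID", "Month"])
      .agg(monthly_spend=("Amount", "sum"), txn_count=("Amount", "size"))
      .reset_index()
)

# Lifetime features: one row per customer.
lifetime = tx.groupby("CustomerID").agg(
    atm_rate=("IsATM", "mean"),
    online_rate=("IsOnline", "mean"),
    night_rate=("IsNight", "mean"),
    cross_border_rate=("IsCrossBorder", "mean"),
)

# Spend volatility as the coefficient of variation of monthly spend.
spend_stats = monthly.groupby("CustomerID")["monthly_spend"].agg(["mean", "std"])
lifetime["avg_monthly_spend"] = spend_stats["mean"]
lifetime["spend_volatility"] = spend_stats["std"] / spend_stats["mean"]
```

In this toy data, customer 1 spends 50 in one month and 250 in the next, so their volatility is high, while customer 2 spends 100 in both months and gets a volatility of 0.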

Uses K-Means clustering to group customers into 5 behavioural segments:

  • Digital Spenders
  • Stable Essentials
  • Cash-Heavy
  • High-Volatility
  • Cross-Border Lifestyle
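
The clustering step can be sketched as follows with scikit-learn. The feature names and synthetic data are assumptions. K-Means itself only returns numeric cluster labels, so segment names like the ones above would be assigned afterwards by inspecting each cluster's centroid.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; the notebook may use a different set.
FEATURES = ["spend_volatility", "atm_rate", "online_rate", "cross_border_rate"]

# Synthetic customers drawn around five random centres.
rng = np.random.default_rng(0)
centres = rng.uniform(0, 1, size=(5, len(FEATURES)))
X = pd.DataFrame(
    np.vstack([c + rng.normal(0, 0.03, size=(60, len(FEATURES))) for c in centres]),
    columns=FEATURES,
)

# Scale first: K-Means uses Euclidean distance, so unscaled features
# with large ranges would dominate the clustering.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=5, n_init=10, random_state=0),
)
labels = model.fit_predict(X)

# Centroids mapped back to original units, for naming the segments by hand.
scaler = model.named_steps["standardscaler"]
kmeans = model.named_steps["kmeans"]
centroid_profile = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=FEATURES
)
```

A cluster whose centroid shows a high `atm_rate`, for example, would be a natural candidate for the "Cash-Heavy" label.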

Builds an explainable risk scorecard with reason codes (fraud history, spend spikes, etc.) and assigns each customer a risk tier: Low / Medium / High.
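
One way to structure such a scorecard is as a list of additive rules, each carrying its own reason code. The rules, point values, and tier thresholds below are hypothetical and serve only to show the shape of the approach.

```python
from dataclasses import dataclass, field


@dataclass
class RiskResult:
    score: int
    tier: str
    reason_codes: list = field(default_factory=list)


# Hypothetical rules: (reason code, points, predicate on the feature dict).
RULES = [
    ("FRAUD_HISTORY", 40, lambda f: f["fraud_txn_count"] > 0),
    ("SPEND_SPIKE", 25, lambda f: f["max_monthly_spend"] > 3 * f["avg_monthly_spend"]),
    ("HIGH_NIGHT_ACTIVITY", 15, lambda f: f["night_rate"] > 0.30),
    ("HIGH_CROSS_BORDER", 15, lambda f: f["cross_border_rate"] > 0.20),
]


def score_customer(features):
    """Sum the points of every rule that fires and keep its reason code."""
    fired = [(code, points) for code, points, rule in RULES if rule(features)]
    score = sum(points for _, points in fired)
    # Assumed tier cut-offs: 50+ is High, 25 to 49 is Medium, below 25 is Low.
    tier = "High" if score >= 50 else "Medium" if score >= 25 else "Low"
    return RiskResult(score, tier, [code for code, _ in fired])


flagged = score_customer({
    "fraud_txn_count": 1,
    "max_monthly_spend": 400.0,
    "avg_monthly_spend": 100.0,
    "night_rate": 0.10,
    "cross_border_rate": 0.0,
})
quiet = score_customer({
    "fraud_txn_count": 0,
    "max_monthly_spend": 120.0,
    "avg_monthly_spend": 100.0,
    "night_rate": 0.05,
    "cross_border_rate": 0.0,
})
```

Because the score is a plain sum of fired rules, the reason codes fully explain every tier assignment, which is what makes the scorecard auditable.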

3. ML Classification (Notebook 3)

Trains Logistic Regression and Random Forest classifiers to predict at-risk customers. The Random Forest reaches a ROC-AUC of about 0.93. Feature importances rank spend volatility and average monthly spend as the strongest predictors.
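
The comparison can be sketched as below, using scikit-learn on synthetic data from `make_classification` in place of the customer feature table. The hyperparameters, class balance, and split are assumptions, so the scores this produces are unrelated to the figure reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the customer features and at-risk label.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Logistic Regression benefits from scaled inputs; tree ensembles do not need it.
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}

# ROC-AUC is computed from predicted probabilities of the positive class.
auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Impurity-based importances from the forest; they sum to 1 across features.
importances = models["random_forest"].feature_importances_
```

Stratifying the split keeps the share of at-risk customers the same in the train and test sets, which matters when the positive class is rare.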

4. Power BI Dashboard

Interactive dashboard with:

  • Portfolio-level KPIs (total customers, risk distribution, fraud alerts)
  • Segment breakdown and behaviour mapping
  • Operational risk queue (Medium + High tier customers)
  • Customer 360 drill-through (individual profiles with risk drivers)
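
For reference, the table behind an operational risk queue could be produced with a simple filter and sort before it reaches Power BI. The column names and values below are assumptions and do not necessarily match RiskAlerts.csv.

```python
import pandas as pd

# Illustrative scored customers (made-up values).
customers = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "RiskTier": ["Low", "High", "Medium", "High"],
    "RiskScore": [10, 80, 35, 65],
    "ReasonCodes": [
        "",
        "FRAUD_HISTORY;SPEND_SPIKE",
        "SPEND_SPIKE",
        "FRAUD_HISTORY;HIGH_NIGHT_ACTIVITY",
    ],
})

# Keep Medium and High tiers only, highest score first.
risk_queue = (
    customers[customers["RiskTier"].isin(["Medium", "High"])]
    .sort_values("RiskScore", ascending=False)
    .reset_index(drop=True)
)
```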

How to Run

  1. Clone the repo
  2. Download creditcard.csv from Kaggle and place it in data_raw/
  3. Install dependencies: pip install -r requirements.txt
  4. Run the notebooks in order (1 → 2 → 3)
  5. Open project.pbix in Power BI Desktop

Tech Stack

| Area | Tools |
| --- | --- |
| Data processing | Python, Pandas, NumPy |
| Machine learning | Scikit-learn |
| Visualisation (code) | Matplotlib, Seaborn |
| Dashboard | Power BI |
| Schema | SQL |

Dataset

Kaggle Credit Card Fraud Detection dataset: 284,807 transactions with PCA-transformed features and a fraud label. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
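
Before running the notebooks, it can help to confirm that the downloaded file has the expected layout: `Time`, the PCA components `V1` to `V28`, `Amount`, and the `Class` label. The helper below is a hypothetical check, not part of the repository, and is demonstrated on a two-row in-memory sample instead of the real file.

```python
import io

import pandas as pd

# Column layout of the Kaggle file: Time, V1..V28, Amount, Class.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def validate_creditcard(df):
    """Raise ValueError if the frame lacks expected columns or has a non-binary label."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not df["Class"].isin([0, 1]).all():
        raise ValueError("Class must contain only 0 and 1")
    return {"rows": len(df), "fraud_rate": float(df["Class"].mean())}


# Two-row in-memory stand-in so the sketch runs without the real download.
header = ",".join(EXPECTED_COLUMNS)
rows = [",".join(["0"] * 30 + ["0"]), ",".join(["1"] * 30 + ["1"])]
sample = pd.read_csv(io.StringIO("\n".join([header] + rows)))

summary = validate_creditcard(sample)
```

With the real file, the same function would be called on `pd.read_csv("data_raw/creditcard.csv")`, and `summary["rows"]` should then match the transaction count stated above.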

About

Financial Intelligence Platform - Data Science Project
