Jonathan Sánchez jsanchez-ds

Hi, I'm Jonathan Sánchez

Data Scientist | Marketing Analytics | Machine Learning

Industrial Engineer from Universidad de Chile (graduated with highest distinction), currently working at ClaroVTR as an Efficiencies Engineer, building end-to-end forecasting systems with XGBoost / LightGBM / Prophet ensembles for enterprise clients (~$30M CLP/month in identified savings).

Featured Projects

Project	Stack	Live
Telecom Quota Forecasting	Python, XGBoost, LightGBM, Prophet
Bank Marketing Analysis	PySpark, scikit-learn, XGBoost
Credit Choice Experiment	R, mlogit, caret
Medical Diagnosis Classification	Python, scikit-learn, imbalanced-learn
Gender Income Gap	R, fixest, glmnet, caret

Telecom Quota Forecasting

End-to-end quantile ML pipeline for monthly data-quota forecasting and per-subscriber plan optimization — a portfolio reproduction of a production system I built at work.

Quantile XGBoost targeting P90 + LightGBM with custom asymmetric loss (penalizes under-prediction 1.5x)
DTW shape clustering groups subscribers by consumption shape, not level
Tier-based ensemble — different model blends per volume tier; ~93% P90 coverage on validation
Pricing optimizer solves the small/large bag breakpoint with property-based tests
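
As a rough illustration of the asymmetric-loss idea in the list above, here is a minimal sketch of a custom objective that weights under-prediction 1.5x. The squared-error form, the function names, and the `UNDER_WEIGHT` constant are assumptions made for this example; the production loss may be defined differently. The `(y_true, y_pred) -> (grad, hess)` signature is the one LightGBM's scikit-learn API accepts for a callable `objective`.

```python
import numpy as np

# Assumption: 1.5x penalty on under-prediction, taken from the bullet above.
UNDER_WEIGHT = 1.5


def asymmetric_mse_objective(y_true, y_pred):
    """Gradient and Hessian of a squared error that penalizes under-prediction more.

    Per-row loss: w * (y_pred - y_true)**2 / 2, where w = UNDER_WEIGHT when the
    forecast is below the actual value and w = 1 otherwise.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_pred - y_true
    weight = np.where(residual < 0, UNDER_WEIGHT, 1.0)
    grad = weight * residual  # first derivative with respect to y_pred
    hess = weight             # second derivative with respect to y_pred
    return grad, hess


def coverage(y_true, y_pred):
    """Share of actuals at or below the forecast (about 0.90 is the goal for P90)."""
    return float(np.mean(np.asarray(y_true) <= np.asarray(y_pred)))
```

With LightGBM installed, this could be passed as `lightgbm.LGBMRegressor(objective=asymmetric_mse_objective)`. Note that a weighted squared error pushes forecasts upward but does not target P90 exactly, so `coverage` on a validation set is the check that matters.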

Python XGBoost LightGBM tslearn Prophet scikit-learn

Bank Marketing Campaign Analysis

Analysis of 45k calls from a Portuguese bank to predict term-deposit subscription. Includes a v2 iteration that diagnoses and fixes a SMOTE-in-CV data leakage bug.

Best model: Random Forest, ROC-AUC 0.7959 (duration excluded to avoid leakage)
Key insight: previously-contacted clients convert at 63.8% vs 9.3%, roughly 7x more likely
Interactive demo: bank-marketing-analysis-jsanchez.streamlit.app — score any client profile in real time

PySpark scikit-learn XGBoost imbalanced-learn Streamlit

Credit Choice Experiment

Discrete-choice analysis of how visual salience of credit terms in digital ads affects consumer decisions. Randomized experiment with 4 ad-design conditions.

Conditional logit + mixed logit (mlogit) with unobserved heterogeneity via random coefficients
ML comparison — CART, SVM, KNN, Random Forest via caret
Key finding: simple logits show no treatment effect, but the mixed logit reveals a significant T3 effect once heterogeneity is allowed

Read the full rendered report — no R installation required

R mlogit caret randomForest

Gender Income Gap in Small Commerce

Quantifying the gender income gap among ~5,000 small merchants in Latin America using transactional data from a digital payments platform.

Fixed-effects regression with progressive controls for hours, business category, zone and age
Regularized models — Ridge / LASSO, Backward / Forward selection
Key finding: the raw gap is about 20.7% and is partially mediated by hours and category, but a meaningful hourly-productivity gap persists

Read the full rendered report — no R installation required

R fixest glmnet caret earth randomForest

Medical Diagnosis Classification

Binary classification on the Wisconsin Breast Cancer dataset (569 records, 30 features) to detect malignant tumors.

SVM achieves 97.6% accuracy, AUC 0.99 with GridSearchCV + 5-fold CV
Class-imbalance handling: under-sampling vs. over-sampling comparison
Feature selection via correlation (30 → 16 features)
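
For readers who want to try the tuning setup described above, here is a minimal sketch of an SVM tuned with `GridSearchCV` and 5-fold CV on scikit-learn's bundled copy of the Wisconsin dataset. The parameter grid, the train/test split, and the scaling step are assumptions for illustration, so the resulting score will not necessarily match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 569 records, 30 features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled independently.
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Hypothetical grid, chosen only to keep the example small.
param_grid = {
    "svm__kernel": ["rbf", "linear"],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.01],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best params:", search.best_params_)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```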

Read the full rendered report — no installation required

Python scikit-learn imbalanced-learn seaborn

Tech Stack

Languages: Python, R, SQL
ML / Stats: scikit-learn, XGBoost, LightGBM, Statsmodels, mlogit, fixest
Data: Pandas, NumPy, PySpark, Databricks
Viz: Matplotlib, Seaborn, ggplot2, Plotly, Streamlit
Tools: Git, Jupyter, RMarkdown, VS Code, GitHub Actions

About Me

Universidad de Chile — Industrial Engineering
Experienced in discrete choice modeling, causal inference, and production ML
Based in Santiago, Chile
