Data Scientist | Marketing Analytics | Machine Learning
Industrial Engineer from Universidad de Chile (graduated with highest distinction), currently an Efficiencies Engineer at ClaroVTR, where I build end-to-end forecasting systems with XGBoost / LightGBM / Prophet ensembles for enterprise clients (~CLP $30M per month in identified savings).
| Project | Stack | Live |
|---|---|---|
| Telecom Quota Forecasting | Python, XGBoost, LightGBM, Prophet | |
| Bank Marketing Analysis | PySpark, scikit-learn, XGBoost | |
| Credit Choice Experiment | R, mlogit, caret | |
| Medical Diagnosis Classification | Python, scikit-learn, imbalanced-learn | |
| Gender Income Gap | R, fixest, glmnet, caret | |
End-to-end quantile ML pipeline for monthly data-quota forecasting and per-subscriber plan optimization — a portfolio reproduction of a production system I built at work.
- Quantile XGBoost targeting the P90, plus LightGBM with a custom asymmetric loss that penalizes under-prediction 1.5x more than over-prediction
- DTW (dynamic time warping) clustering groups subscribers by the shape of their consumption curve rather than its level
- Tier-based ensemble with a different model blend per volume tier; ~93% P90 coverage on validation (nominal target: 90%)
- Pricing optimizer finds the breakpoint between small and large data bags, verified with property-based tests
Python XGBoost LightGBM tslearn Prophet scikit-learn
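A minimal NumPy sketch of three ideas from the bullets above, under stated assumptions: an asymmetric squared-error objective with the 1.5 under-prediction weight taken from the bullet (the project's exact loss is not shown here), written in the `(y_true, y_pred) -> (grad, hess)` form that `lightgbm.LGBMRegressor` accepts as a callable `objective`; an empirical P90 coverage check; and a plain dynamic-programming DTW distance applied to z-normalized series, which is what makes the comparison depend on shape rather than level. All function names and the example series are hypothetical; tslearn offers production versions (`tslearn.metrics.dtw`, `TimeSeriesKMeans(metric="dtw")`, `TimeSeriesScalerMeanVariance`).

```python
import numpy as np

UNDER_WEIGHT = 1.5  # assumption: penalty multiplier applied when the model under-predicts


def asymmetric_objective(y_true, y_pred):
    """Gradient and Hessian of 0.5 * w * (y_true - y_pred)**2 w.r.t. y_pred,
    with w = UNDER_WEIGHT when y_pred < y_true (under-prediction), else 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    weight = np.where(residual > 0, UNDER_WEIGHT, 1.0)
    grad = -weight * residual  # first derivative w.r.t. the prediction
    hess = weight              # second derivative (piecewise constant)
    return grad, hess


def p90_coverage(y_true, p90_pred):
    """Share of observations at or below the predicted P90 (nominal value: 0.90)."""
    return float(np.mean(np.asarray(y_true) <= np.asarray(p90_pred)))


def z_normalize(series):
    """Remove level and scale so that only the shape of the series remains."""
    s = np.asarray(series, dtype=float)
    std = s.std()
    return (s - s.mean()) / std if std > 0 else s - s.mean()


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with a squared-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


if __name__ == "__main__":
    grad, hess = asymmetric_objective([10.0, 10.0], [8.0, 12.0])
    print(grad, hess)  # the under-predicted point gets 1.5x the gradient magnitude

    light_user = [1, 2, 3, 2, 1]       # GB per month, hypothetical
    heavy_user = [11, 12, 13, 12, 11]  # same shape, higher level
    print(dtw_distance(light_user, heavy_user))                            # large: levels differ
    print(dtw_distance(z_normalize(light_user), z_normalize(heavy_user)))  # close to 0: same shape
```

Clustering would then run k-means (or k-medoids) on these shape-only distances; the z-normalization step is the design choice that keeps a light and a heavy user with the same monthly pattern in the same cluster.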
Analysis of 45k calls from a Portuguese bank to predict term-deposit subscription. Includes a v2 iteration that diagnoses and fixes a data-leakage bug in how SMOTE was combined with cross-validation.
- Best model: Random Forest, ROC-AUC 0.7959 (`duration` excluded to avoid leakage, since call duration is only known once the call has ended)
- Key insight: previously contacted clients convert at 63.8% vs. 9.3%, roughly 7x more likely
- Interactive demo: bank-marketing-analysis-jsanchez.streamlit.app — score any client profile in real time
PySpark scikit-learn XGBoost imbalanced-learn Streamlit
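The leakage-safe pattern is to oversample only inside each training fold, so synthetic minority points are never built from validation-fold rows. A minimal sketch of that pattern on synthetic data, assuming NumPy and scikit-learn; `smote_like` is a hypothetical, simplified re-implementation of SMOTE's interpolation step, not the project's code. With imbalanced-learn (listed in the stack) the equivalent is placing `SMOTE` inside an `imblearn.pipeline.Pipeline` and passing that pipeline to `cross_val_score`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def smote_like(X, y, k=5, seed=0):
    """Simplified SMOTE: create points on the segment between a minority sample
    and one of its k nearest minority neighbours until classes are balanced."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - len(X_min))
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    neighbours = np.argsort(dists, axis=1)[:, 1 : k + 1]  # column 0 is the point itself
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, neighbours.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_out, y_out


def leakage_safe_cv_auc(X, y, n_splits=5, seed=0):
    """Oversample inside each training fold only; score on the untouched validation fold."""
    aucs = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        X_tr, y_tr = smote_like(X[train_idx], y[train_idx], seed=seed)
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X[val_idx])[:, 1]
        aucs.append(roc_auc_score(y[val_idx], proba))
    return float(np.mean(aucs))


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the bank data (about 11% positives).
    X, y = make_classification(n_samples=2000, weights=[0.89], random_state=0)
    print(f"leakage-safe CV ROC-AUC: {leakage_safe_cv_auc(X, y):.3f}")
```

The leaky variant would call `smote_like(X, y)` once before `skf.split`, which lets synthetic points interpolated from validation rows end up in the training folds and inflates the CV score.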
Discrete-choice analysis of how the visual salience of credit terms in digital ads affects consumer decisions, based on a randomized experiment with 4 ad-design conditions.
- Conditional logit and mixed logit (`mlogit`), with unobserved heterogeneity captured through random coefficients
- ML comparison: CART, SVM, KNN and Random Forest via `caret`
- Key finding: simple logits show no treatment effect, but the mixed logit reveals a significant T3 effect once heterogeneity is allowed for
Read the full rendered report — no R installation required
R mlogit caret randomForest
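For reference, the standard textbook forms of the two models named above, in generic notation that is an assumption here rather than the report's own: respondent $n$ chooses alternative $j$ from choice set $C_n$, and $x_{nj}$ holds the attributes of that alternative (for example, ad-design treatment indicators).

```math
\begin{aligned}
\text{Conditional logit:}\quad P_{nj} &= \frac{\exp(x_{nj}^{\top}\beta)}{\sum_{k \in C_n}\exp(x_{nk}^{\top}\beta)} \\
\text{Mixed logit:}\quad P_{nj} &= \int \frac{\exp(x_{nj}^{\top}\beta_n)}{\sum_{k \in C_n}\exp(x_{nk}^{\top}\beta_n)}\, f(\beta_n \mid \theta)\, d\beta_n \\
\text{Simulated:}\quad \check{P}_{nj} &= \frac{1}{R}\sum_{r=1}^{R} \frac{\exp(x_{nj}^{\top}\beta^{(r)})}{\sum_{k \in C_n}\exp(x_{nk}^{\top}\beta^{(r)})}, \qquad \beta^{(r)} \sim f(\beta \mid \theta)
\end{aligned}
```

The conditional logit estimates one coefficient vector $\beta$ shared by all respondents; the mixed logit lets $\beta_n$ vary across respondents according to a density $f(\beta \mid \theta)$ (often normal, with $\theta$ holding its means and standard deviations). The integral has no closed form, so it is estimated by simulated maximum likelihood using the last line, which is what `mlogit` does when random coefficients are requested through its `rpar` argument.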
Quantifying the gender income gap among ~5,000 small merchants in Latin America using transactional data from a digital payments platform.
- Fixed-effects regression, adding controls for hours, business category, zone and age step by step
- Regularization and variable selection: Ridge / LASSO, backward / forward stepwise selection
- Key finding: a raw gap of ~20.7%, partially mediated by hours and business category, yet a meaningful hourly-productivity gap persists
Read the full rendered report — no R installation required
R fixest glmnet caret earth randomForest
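A toy illustration of what "progressive controls" and "partially mediated" mean, on simulated data only: in the hypothetical data-generating process below, log income depends on hours and on a residual female effect, women are simulated to work fewer hours, and the female coefficient shrinks once hours enter the model. It uses plain NumPy least squares in place of `fixest`, omits the category and zone fixed effects, and every number in it is an assumption. Under a log-income specification (also an assumption about the report), a coefficient $\beta$ on the female indicator corresponds to a gap of $100(e^{\beta}-1)\%$.

```python
import numpy as np


def ols_coef(y, *regressors):
    """OLS with an intercept; returns the coefficient on the first regressor."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def simulate_merchants(n=5000, seed=0):
    """Hypothetical data-generating process, chosen only for illustration."""
    rng = np.random.default_rng(seed)
    female = rng.integers(0, 2, size=n).astype(float)
    hours = 45 - 6 * female + rng.normal(0, 5, size=n)  # women work fewer hours here
    log_income = 2.0 + 0.03 * hours - 0.10 * female + rng.normal(0, 0.3, size=n)
    return female, hours, log_income


if __name__ == "__main__":
    female, hours, log_income = simulate_merchants()
    b_raw = ols_coef(log_income, female)           # no controls
    b_hours = ols_coef(log_income, female, hours)  # hours added as a control
    for label, b in [("raw", b_raw), ("+ hours", b_hours)]:
        print(f"{label:8s} beta={b:+.3f}  gap={100 * (np.exp(b) - 1):+.1f}%")
```

By construction the raw coefficient is close to $-0.10 + 0.03 \times (-6) = -0.28$ and the controlled one close to $-0.10$: the hours channel accounts for part of the gap, and the remainder is the analogue of the persistent hourly-productivity gap.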
Binary classification on the Wisconsin Breast Cancer dataset (569 records, 30 features) to detect malignant tumors.
- SVM reaches 97.6% accuracy and AUC 0.99, tuned with GridSearchCV and 5-fold CV
- Class-imbalance handling: under-sampling vs. over-sampling comparison
- Feature selection via correlation (30 → 16 features)
Read the full rendered report — no installation required
Python scikit-learn imbalanced-learn seaborn
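A sketch of the correlation filter and the SVM grid search described above, using the copy of the Wisconsin diagnostic data that ships with scikit-learn (`load_breast_cancer`, 569 × 30). The 0.9 correlation threshold and the parameter grid are assumptions, so the number of retained features and the scores will not necessarily match the 16 features and 97.6% reported above. For brevity the filter is fit on the full dataset; running it inside each CV fold would be stricter.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def drop_correlated(df, threshold=0.9):
    """Drop the later column of every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # keep upper triangle only
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def tune_svm(X, y, seed=0):
    """Grid-search an RBF SVM (with feature scaling) using stratified 5-fold CV on ROC-AUC."""
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.001]}
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, grid, cv=cv, scoring="roc_auc")
    return search.fit(X, y)


if __name__ == "__main__":
    data = load_breast_cancer(as_frame=True)
    X = drop_correlated(data.data)
    search = tune_svm(X, data.target)
    print(f"{X.shape[1]} features kept, best params {search.best_params_}, "
          f"CV ROC-AUC {search.best_score_:.3f}")
```

Scaling sits inside the pipeline so that each CV fold standardizes with its own training statistics, which matters for RBF kernels because they are distance-based.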
Languages: Python R SQL
ML / Stats: scikit-learn XGBoost LightGBM Statsmodels mlogit fixest
Data: Pandas NumPy PySpark Databricks
Viz: Matplotlib Seaborn ggplot2 Plotly Streamlit
Tools: Git Jupyter RMarkdown VS Code GitHub Actions
- Universidad de Chile — Industrial Engineering
- Experienced in discrete-choice modeling, causal inference, and production ML
- Based in Santiago, Chile