jsanchez-ds/README.md

Hi, I'm Jonathan Sánchez

Data Scientist | Marketing Analytics | Machine Learning

Industrial Engineer from Universidad de Chile (graduated with highest distinction), currently working at ClaroVTR as an Efficiencies Engineer, building end-to-end forecasting systems with XGBoost / LightGBM / Prophet ensembles for enterprise clients (~$30M CLP/month in identified savings).

CV


Featured Projects

| Project | Stack | Live |
| --- | --- | --- |
| Telecom Quota Forecasting | Python, XGBoost, LightGBM, Prophet | CI |
| Bank Marketing Analysis | PySpark, scikit-learn, XGBoost | Streamlit |
| Credit Choice Experiment | R, mlogit, caret | Report |
| Medical Diagnosis Classification | Python, scikit-learn, imbalanced-learn | Report |
| Gender Income Gap | R, fixest, glmnet, caret | Report |

Telecom Quota Forecasting

End-to-end quantile ML pipeline for monthly data-quota forecasting and per-subscriber plan optimization — a portfolio reproduction of a production system I built at work.

  • Quantile XGBoost targeting P90 + LightGBM with custom asymmetric loss (penalizes under-prediction 1.5x)
  • DTW shape clustering groups subscribers by consumption shape, not level
  • Tier-based ensemble — different model blends per volume tier; ~93% P90 coverage on validation
  • Pricing optimizer solves the small/large bag breakpoint with property-based tests
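
As a rough illustration of the asymmetric-loss and P90-coverage ideas in the bullets above, here is a minimal sketch in plain Python. It is not the project's code: the weighted squared-error form, the function names, and the toy numbers are all assumptions made for this example. Only the 1.5x under-prediction penalty and the P90 target come from the description. The `(y_true, y_pred) -> (grad, hess)` shape is the one LightGBM's scikit-learn API accepts for a callable objective, where the inputs would be NumPy arrays rather than lists.

```python
from typing import List, Tuple

UNDER_PENALTY = 1.5  # from the bullet above: under-prediction costs 1.5x


def asymmetric_mse_objective(
    y_true: List[float], y_pred: List[float], under_penalty: float = UNDER_PENALTY
) -> Tuple[List[float], List[float]]:
    """Gradient and Hessian of an assumed weighted squared error.

    Per-row loss: w * (y_pred - y_true)^2, where w = under_penalty when the
    forecast is below the actual value and w = 1 otherwise.
    """
    grad, hess = [], []
    for t, p in zip(y_true, y_pred):
        w = under_penalty if p < t else 1.0
        grad.append(2.0 * w * (p - t))
        hess.append(2.0 * w)
    return grad, hess


def pinball_loss(y_true: List[float], y_pred: List[float], q: float = 0.9) -> float:
    """Mean quantile (pinball) loss; q = 0.9 corresponds to a P90 target."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def coverage(y_true: List[float], y_pred: List[float]) -> float:
    """Share of actual values that fall at or below the forecast."""
    return sum(t <= p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, invented for illustration only.
actual = [10.0, 12.0, 8.0, 15.0]
forecast = [11.0, 11.0, 9.0, 16.0]
grad, hess = asymmetric_mse_objective(actual, forecast)
print(grad, hess, coverage(actual, forecast), pinball_loss(actual, forecast))
```

A well-calibrated P90 forecast should show coverage near 0.90 on held-out data, which is the sense in which the ~93% figure above can be read.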

Python XGBoost LightGBM tslearn Prophet scikit-learn
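
To make "shape, not level" concrete for the DTW clustering mentioned above, the sketch below z-normalizes each series before computing a textbook dynamic time warping distance. It is a hand-written illustration under assumptions, not tslearn's implementation and not the project's pipeline; the helper names and the three toy series are hypothetical.

```python
import math
from typing import List


def z_normalize(series: List[float]) -> List[float]:
    """Rescale to zero mean and unit variance so only the shape remains."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Classic O(n*m) dynamic time warping with squared pointwise cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# Invented monthly usage curves: same shape at 10x the level, and a late spike.
light = [1.0, 2.0, 4.0, 2.0, 1.0]
heavy = [10.0, 20.0, 40.0, 20.0, 10.0]
spike = [1.0, 1.0, 1.0, 1.0, 5.0]

same_shape = dtw_distance(z_normalize(light), z_normalize(heavy))
other_shape = dtw_distance(z_normalize(light), z_normalize(spike))
print(same_shape, other_shape)
```

After normalization the light and heavy users are essentially indistinguishable, while the late-spike user stays far away. On the raw values the ordering flips, because level dominates the distance.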


Bank Marketing Campaign Analysis

Analysis of 45k calls from a Portuguese bank to predict term-deposit subscription. Includes a v2 iteration that diagnoses and fixes a SMOTE-in-CV data leakage bug.

  • Best model: Random Forest, ROC-AUC 0.7959 (duration excluded to avoid leakage)
  • Key insight: previously contacted clients convert at 63.8% vs 9.3%, about 7x as likely
  • Interactive demo: bank-marketing-analysis-jsanchez.streamlit.app — score any client profile in real time
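
The SMOTE-in-CV leakage mentioned above comes from resampling the full dataset before splitting, so synthetic minority rows derived from validation rows end up in training. The sketch below shows the leak-free ordering: split first, then resample only the training part of each fold. It is an illustration under assumptions, not the repository's code. Random duplication stands in for SMOTE (which interpolates new points instead), and all function names are hypothetical. With imbalanced-learn, placing the sampler inside its `Pipeline` achieves the same ordering, since samplers there are applied only during fitting.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oversample_minority(
    indices: List[int], labels: List[int], rng: random.Random
) -> List[int]:
    """Duplicate minority rows until classes balance (stand-in for SMOTE)."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(indices)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(indices) + extra


def leak_free_splits(
    labels: List[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split first, then resample the training rows of each fold only."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(labels), k, seed)
    splits = []
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((oversample_minority(train_idx, labels, rng), val_idx))
    return splits


# Invented labels: 10 positives among 100 rows.
labels = [1] * 10 + [0] * 90
for train_idx, val_idx in leak_free_splits(labels):
    assert not set(train_idx) & set(val_idx)  # no validation row reaches training
```

The validation folds keep their original class ratio, so the reported metric reflects the imbalance the model would face in practice.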

PySpark scikit-learn XGBoost imbalanced-learn Streamlit


Credit Choice Experiment

Discrete-choice analysis of how visual salience of credit terms in digital ads affects consumer decisions. Randomized experiment with 4 ad-design conditions.

  • Conditional logit + mixed logit (mlogit) with unobserved heterogeneity via random coefficients
  • ML comparison — CART, SVM, KNN, Random Forest via caret
  • Key finding: simple logits show no treatment effect, but the mixed logit reveals a significant T3 effect once heterogeneity is allowed
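
For readers unfamiliar with the two models named above, the sketch below shows how their choice probabilities differ. A conditional logit applies a softmax to utilities built from one fixed coefficient. A mixed logit averages those probabilities over draws of a random coefficient, which is how it represents unobserved heterogeneity. This is a simulation-style illustration under assumptions, not the `mlogit` estimation used in the project; the single attribute `x`, the normal distribution, and the parameter values are hypothetical.

```python
import math
import random
from typing import List


def conditional_logit_probs(utilities: List[float]) -> List[float]:
    """Softmax over alternative utilities (max-shifted for stability)."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_logit_probs(
    x: List[float],
    beta_mean: float,
    beta_sd: float,
    draws: int = 2000,
    seed: int = 0,
) -> List[float]:
    """Average conditional-logit probabilities over normal draws of beta."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(draws):
        beta = rng.gauss(beta_mean, beta_sd)
        probs = conditional_logit_probs([beta * xi for xi in x])
        acc = [a + p for a, p in zip(acc, probs)]
    return [a / draws for a in acc]


# Two alternatives; x marks a hypothetical ad attribute (0 = absent, 1 = present).
x = [0.0, 1.0]
print(conditional_logit_probs([0.5 * xi for xi in x]))
print(mixed_logit_probs(x, beta_mean=0.5, beta_sd=1.0))
```

When `beta_sd` is zero the two functions agree. With a positive spread, the averaged probabilities reflect a population in which some respondents react strongly to the attribute and others do not.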

Read the full rendered report — no R installation required

R mlogit caret randomForest


Gender Income Gap in Small Commerce

Quantifying the gender income gap among ~5,000 small merchants in Latin America using transactional data from a digital payments platform.

  • Fixed-effects regression with progressive controls for hours, business category, zone and age
  • Regularized models — Ridge / LASSO, Backward / Forward selection
  • Key finding: raw gap ~ 20.7%, partially mediated by hours and category — but a meaningful hourly-productivity gap persists
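
The role of the fixed effects and controls listed above can be illustrated with the within transformation: subtract each group's mean from the outcome and the regressor, then fit a slope on what remains. The sketch below is written in Python for consistency with the other examples on this page, although the project itself uses `fixest` in R. The data are invented, the single grouping variable and the function names are assumptions, and the numbers bear no relation to the reported 20.7% gap.

```python
from collections import defaultdict
from typing import Hashable, List


def demean_by_group(values: List[float], groups: List[Hashable]) -> List[float]:
    """Subtract each observation's group mean (the within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def slope(y: List[float], x: List[float]) -> float:
    """OLS slope of y on a single regressor x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / sum((a - mx) ** 2 for a in x)


def within_slope(y: List[float], x: List[float], groups: List[Hashable]) -> float:
    """Slope of y on x after absorbing one categorical fixed effect."""
    return slope(demean_by_group(y, groups), demean_by_group(x, groups))


# Invented toy data: log income = category effect - 0.2 * indicator.
category = ["food", "food", "food", "retail", "retail", "retail"]
indicator = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
log_income = [5.0, 4.8, 4.8, 3.0, 3.0, 2.8]

print(slope(log_income, indicator))                   # pooled, ignores category
print(within_slope(log_income, indicator, category))  # category absorbed
```

In this toy setup the pooled slope is positive because the indicator is more common in the higher-income category, while the within-category slope recovers the negative effect built into the data. Adding controls progressively, as the bullet describes, is a way to see how much of a raw gap such composition effects explain.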

Read the full rendered report — no R installation required

R fixest glmnet caret earth randomForest


Medical Diagnosis Classification

Binary classification on the Wisconsin Breast Cancer dataset (569 records, 30 features) to detect malignant tumors.

  • SVM tuned with GridSearchCV (5-fold CV) reaches 97.6% accuracy and 0.99 AUC
  • Class-imbalance handling: under-sampling vs. over-sampling comparison
  • Feature selection via correlation (30 → 16 features)
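
One common way to select features by correlation, sketched below, is a greedy pass that keeps a feature only if it is not highly correlated with any feature already kept. Whether the project filtered on pairwise correlation or on correlation with the target is not stated, so this is an assumption, as are the 0.9 threshold, the function names, and the toy columns.

```python
import math
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_correlated(
    features: Dict[str, List[float]], threshold: float = 0.9
) -> List[str]:
    """Keep features in order, skipping any with |r| >= threshold to a kept one."""
    kept: List[str] = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Invented columns: perimeter is nearly a multiple of radius, texture is not.
toy = {
    "radius": [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [6.3, 12.6, 18.8, 25.1, 31.4],
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(drop_correlated(toy))
```

The result depends on column order, since the first feature of a correlated pair is the one retained. That is worth keeping in mind when interpreting which features survive.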

Read the full rendered report — no installation required

Python scikit-learn imbalanced-learn seaborn


Tech Stack

  • Languages: Python, R, SQL
  • ML / Stats: scikit-learn, XGBoost, LightGBM, Statsmodels, mlogit, fixest
  • Data: Pandas, NumPy, PySpark, Databricks
  • Viz: Matplotlib, Seaborn, ggplot2, Plotly, Streamlit
  • Tools: Git, Jupyter, RMarkdown, VS Code, GitHub Actions


About Me

  • Universidad de Chile — Industrial Engineering
  • Experienced in discrete choice modeling, causal inference, and production ML
  • Based in Santiago, Chile

Connect

LinkedIn Email GitHub

Pinned

  1. bank-marketing-analysis (Public)

    End-to-end analysis on the UCI Bank Marketing dataset (45k calls): EDA in PySpark, Decision Tree / Random Forest / XGBoost in scikit-learn, plus a v2 branch fixing SMOTE-in-CV leakage with imblearn…

    Jupyter Notebook

  2. credit-choice-experiment (Public)

    Discrete choice modeling on a randomized credit-ad experiment: conditional logit, mixed logit with unobserved heterogeneity, and ML comparison (CART, SVM, KNN, RF) in R

  3. cv (Public)

    CV of Jonathan Sánchez Pesantes (LaTeX sources + compiled PDF)

    TeX

  4. gender-income-gap (Public)

    Quantifying the gender income gap among ~5,000 small merchants in Latin America using fixed-effects regression (fixest), Ridge/LASSO and ML (CART, MARS, KNN, Random Forest) in R