henrykang10/walmart

Walmart Inbound Case Forecasting & Labor Optimization

UMD Smith School of Business — MSBA Capstone Project
Sponsored by Walmart U.S. Operations — Workforce Management (WFM)
5-week-ahead department-level inbound case forecasting across 100 stores


Business Problem

Walmart's Workforce Management team forecasts inbound cases only at the total store level. With 2M+ associates across 4,600+ stores, and 82% of stores relying on a single aggregate forecast for labor scheduling, there is no visibility by department — despite significant differences in workload and stocking time across departments.

Business Impact of the Status Quo:

  • 8–12% labor inefficiency due to schedule misalignment
  • Overtime costs increase during high-volume weeks
  • Inconsistent associate workload reduces productivity and morale
  • Delayed shelf stocking impacts customer experience

Objective: Develop department-level inbound case forecasts for a 5-week horizon, enabling more accurate labor scheduling and improving operational efficiency across Walmart stores.


Success Criteria

| Metric | Target |
| --- | --- |
| Forecast accuracy | ≥ 15–20% MAPE improvement vs. aggregate baseline |
| Scheduling efficiency | ≥ 10% reduction in over/under-staffing |
| Labor utilization | ≥ 8% improvement (reduced overtime and idle time) |

Key Results

| Model | MAPE | Notes |
| --- | --- | --- |
| Baseline (7-day Moving Average) | 8.64% | Store-level aggregate |
| XGBoost Basic | 7.87% | Lag + calendar features |
| XGBoost + Holidays | 7.84% | Holiday flags added |
| XGBoost + Weather | 7.84% | Weather signals integrated |
| XGBoost All Features | 7.17% | Full feature set |
| Dept-Specific XGBoost + Optuna | ~5–6% | Per-department tuned models |
| Holt-Winters Exponential Smoothing | 3.08% | Best overall; 64% improvement over baseline |

Best department-level MAPEs (Holt-Winters):

| Department | MAPE |
| --- | --- |
| Media & Gaming | 2.06% |
| Cook & Dine | 2.29% |
| Do It Yourself | 2.65% |
| Home Decor | ~3% |
| Consumer Electronics | Highest variance (product launch sensitivity) |

The 3.08% overall MAPE is a 64% relative reduction in error versus the 8.64% baseline ((8.64 − 3.08) / 8.64 ≈ 0.64), well above the project target of 15–20%.
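As a minimal sketch of how the headline numbers relate, the snippet below computes MAPE and the relative improvement of one MAPE over another. The function names are hypothetical and not taken from the notebook, and skipping zero-volume days is an assumption made here to avoid division by zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: days with zero actual volume are skipped, since the
    percentage error is undefined there.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score against")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def relative_improvement(baseline_mape, model_mape):
    """Fraction by which the model's MAPE is lower than the baseline's."""
    return (baseline_mape - model_mape) / baseline_mape


# Reported figures from the results table: 8.64% baseline, 3.08% Holt-Winters.
improvement = relative_improvement(8.64, 3.08)
print(f"Relative MAPE reduction: {improvement:.1%}")
```

The improvement is relative to the baseline error, which is why a drop of 5.56 percentage points is reported as roughly 64%.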


Methodology

Data Sources

  • inbound_cases_team9.csv — daily inbound case volume per store/department (417,000+ records)
  • stores_data.xlsx — store metadata: region, market area (100 stores, 36 states)
  • trucks.csv — truck arrival data per store per day
  • Open-Meteo API — historical weather (temperature, precipitation)
  • Google Trends (pytrends) — department-specific search volume as leading indicators
  • US Holiday calendar — major holidays + retail events (Black Friday, Cyber Monday, etc.)

Departments Analyzed

Home Decor · Baby & Toddler · Consumer Electronics · Wireless · Media & Gaming · Do It Yourself · Automotive · Cook & Dine

Feature Engineering (18+ features)

  • Lag features: 1, 3, 7, 14, 21, 28-day lags
  • Rolling statistics: 3, 7, 14, 28-day rolling mean and standard deviation
  • Calendar features: day of week, month, quarter, weekend flag, month start/end
  • Holiday features: exact holiday days, ±2 day windows, department-specific holiday lift scores
  • Weather signals: temperature, precipitation, is_cold, is_hot, is_rainy
  • External signals: Google Trends for Baby & Toddler and Consumer Electronics
  • Truck arrivals: top feature by importance (0.34 importance score in XGBoost)
  • Macro signals: housing starts (Home Decor), product launch calendar (Consumer Electronics)
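The lag, rolling, calendar, and holiday features listed above could be built along these lines with pandas. This is a sketch under stated assumptions, not the notebook's code: the column names (`store`, `dept`, `date`, `cases`), the function name, and the example holiday date are all hypothetical. Rolling statistics are computed on a series shifted by one day so that a row never sees its own target value.

```python
import pandas as pd


def add_features(df, holidays=()):
    """Add lag, rolling, calendar, and holiday-window features.

    Assumes one row per store/dept/day with hypothetical columns
    'store', 'dept', 'date' (datetime64), and 'cases'.
    """
    out = df.sort_values(["store", "dept", "date"]).copy()
    grouped = out.groupby(["store", "dept"])["cases"]

    # Lag features: volume k days earlier within the same store/department.
    for k in (1, 3, 7, 14, 21, 28):
        out[f"lag_{k}"] = grouped.shift(k)

    # Rolling statistics over the previous w days (shifted to exclude today).
    for w in (3, 7, 14, 28):
        out[f"roll_mean_{w}"] = grouped.transform(
            lambda s, w=w: s.shift(1).rolling(w).mean()
        )
        out[f"roll_std_{w}"] = grouped.transform(
            lambda s, w=w: s.shift(1).rolling(w).std()
        )

    # Calendar features.
    d = out["date"].dt
    out["day_of_week"] = d.dayofweek
    out["month"] = d.month
    out["quarter"] = d.quarter
    out["is_weekend"] = (d.dayofweek >= 5).astype(int)
    out["is_month_start"] = d.is_month_start.astype(int)
    out["is_month_end"] = d.is_month_end.astype(int)

    # Holiday features: exact day plus a +/- 2 day window.
    holiday_dates = pd.to_datetime(list(holidays))
    out["is_holiday"] = out["date"].isin(holiday_dates).astype(int)
    near = pd.Series(False, index=out.index)
    for h in holiday_dates:
        near |= (out["date"] - h).abs() <= pd.Timedelta(days=2)
    out["near_holiday"] = near.astype(int)
    return out


# Synthetic example: 40 days for a single store/department.
demo = pd.DataFrame({
    "store": 1,
    "dept": "Cook & Dine",
    "date": pd.date_range("2025-01-01", periods=40),
    "cases": range(40),
})
feats = add_features(demo, holidays=["2025-01-10"])
```

The first rows of each group contain missing values for the longer lags and windows; they would typically be dropped or imputed before training.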

Models Benchmarked

  • 7-day Moving Average (baseline)
  • XGBoost (7 iterative versions with progressive feature additions)
  • Random Forest
  • XGBoost + Random Forest Ensemble
  • Prophet (Meta)
  • SARIMA (weekly seasonality)
  • Holt-Winters Exponential Smoothing ← best performer
  • Hybrid: Holt-Winters + ML residual correction (GradientBoosting)
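To illustrate the mechanics of the best-performing model, here is a textbook additive Holt-Winters recursion in plain Python. It is a sketch only: the project lists Statsmodels for its Holt-Winters implementation, and the additive form, the weekly season length of 7, the initialization heuristic, and the smoothing parameters below are assumptions chosen for illustration rather than the notebook's settings. The 35-step horizon corresponds to five weeks of daily forecasts.

```python
def holt_winters_additive(y, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast; needs at least two full seasons of data."""
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of history")

    # Initial state from the first two seasons (a simple common heuristic).
    first, second = y[:m], y[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    season = [value - level for value in first]

    # Update level, trend, and the matching seasonal slot for each observation.
    for t, observed in enumerate(y):
        s = season[t % m]
        previous_level = level
        level = alpha * (observed - s) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        season[t % m] = gamma * (observed - level) + (1 - gamma) * s

    # Project the final level and trend forward, reusing the seasonal slots.
    n = len(y)
    return [
        level + h * trend + season[(n + h - 1) % m]
        for h in range(1, horizon + 1)
    ]


# Synthetic daily volume with a stable weekly pattern and no trend.
weekly_pattern = [10, 12, 14, 16, 18, 20, 22]
history = weekly_pattern * 8
forecast = holt_winters_additive(
    history, season_length=7, alpha=0.3, beta=0.1, gamma=0.2, horizon=35
)
```

On a series this regular the forecast simply repeats the weekly pattern, which is the behavior that makes exponential smoothing effective on strongly seasonal data.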

Hyperparameter Tuning

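  • Optuna: Bayesian optimization for department-specific XGBoost models
  • RandomizedSearchCV: randomized search over sampled hyperparameter combinations (not an exhaustive grid search)
  • TimeSeriesSplit: temporal cross-validation in which each fold trains only on data that precedes its validation window, avoiding leakage from the future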
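The temporal cross-validation idea can be sketched without any library as an expanding-window split. The project uses scikit-learn's TimeSeriesSplit for this; the hand-rolled function below is a hypothetical stand-in to show the principle, and the fold count and 35-day validation window are illustrative assumptions.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Return (train_indices, validation_indices) pairs in time order.

    Each validation window covers test_size consecutive samples, and the
    matching training set is everything strictly before that window.
    """
    splits = []
    for i in range(n_splits):
        test_end = n_samples - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train = list(range(0, test_start))
        validation = list(range(test_start, test_end))
        splits.append((train, validation))
    return splits


# 200 daily observations, 3 folds, each validated on a 35-day window.
folds = expanding_window_splits(n_samples=200, n_splits=3, test_size=35)
for train, validation in folds:
    print(f"train 0..{train[-1]}  validate {validation[0]}..{validation[-1]}")
```

Because every training index is earlier than every validation index, a model is never scored on days that precede its training data, which is the leakage a shuffled K-fold split would introduce.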

Key Findings

Truck arrivals are the dominant predictor. With an XGBoost feature importance of 0.34, truck arrival data contributes more to the model's predictions than any other signal. Truck schedules act as a leading indicator for inbound case volume.

Holt-Winters outperforms complex ML models. Even with 18+ features and Optuna tuning, XGBoost did not match the simpler Holt-Winters model, which achieved the best overall MAPE (3.08%). Inbound case volume follows strong, stable seasonal patterns that exponential smoothing captures efficiently.

Department heterogeneity is significant. Consumer Electronics (driven by product launches) and Baby & Toddler (driven by seasonal cycles) behave fundamentally differently from stable departments like Cook & Dine. Department-specific models significantly outperform a single global model.

Google Trends adds measurable signal. For Consumer Electronics and Baby & Toddler, integrating search trend data as leading indicators improved forecast accuracy during high-volatility periods.

Truck arrivals and 7-day rolling averages explain 80% of demand volume — confirming that inbound forecasting is highly structured and predictable with the right features.


Technical Stack

| Layer | Tools |
| --- | --- |
| Data processing | Python, Pandas, NumPy |
| Machine learning | XGBoost, Scikit-learn, Random Forest, GradientBoosting |
| Time series | Statsmodels (Holt-Winters, SARIMA), Prophet |
| Hyperparameter tuning | Optuna, RandomizedSearchCV |
| External data APIs | Open-Meteo, Google Trends (pytrends) |
| Visualization | Matplotlib, Seaborn |
| Environment | Google Colab |

Project Structure

walmart/
├── Walmart_T9.ipynb        # Full analysis notebook (EDA → modeling → evaluation)
└── README.md

Note: Raw data files are proprietary Walmart data provided for academic use under NDA and are not included in this repository.


Project Roadmap

| Phase | Timeline | Description |
| --- | --- | --- |
| Business Understanding | Oct 25, 2025 | Define problem, objectives, success criteria |
| Data Understanding | Nov 2, 2025 | EDA, data quality assessment |
| Data Preparation | Nov 9, 2025 | Cleaning, merging, feature engineering |
| Modeling | Nov 16, 2025 | Baseline + iterative model development |
| Evaluation | Dec 8, 2025 | MAPE/Bias comparison, model selection |
| Deployment & Presentation | Dec 28, 2025 | Final presentation to Walmart WFM team |

Team

Team 9 — University of Maryland, Robert H. Smith School of Business
MS Business Analytics Capstone · Sponsored by Walmart U.S. Workforce Management

| Member | Role |
| --- | --- |
| Gurleenkaur Bhatia | Data modeling, XGBoost pipeline, Holt-Winters |
| Henry Kangten | Business framing, data preparation, feature engineering |
| Camilo Bascolo | Model evaluation, accuracy scorecard |
| Kaushik Muthamilselvan | External signal integration, Google Trends pipeline |
| Yuhyeon Seo | SARIMA, Prophet implementation, visualization |

Academic capstone project. Data provided by Walmart for educational purposes only. Not affiliated with or endorsed by Walmart Inc.
