UMD Smith School of Business — MSBA Capstone Project
Sponsored by Walmart U.S. Operations — Workforce Management (WFM)
5-week-ahead department-level inbound case forecasting across 100 stores
Walmart's Workforce Management team forecasts inbound cases only at the total store level. With 2M+ associates across 4,600+ stores, and 82% of stores relying on a single aggregate forecast for labor scheduling, there is no visibility by department — despite significant differences in workload and stocking time across departments.
Business Impact of the Status Quo:
- 8–12% labor inefficiency due to schedule misalignment
- Overtime costs increase during high-volume weeks
- Inconsistent associate workload reduces productivity and morale
- Delayed shelf stocking impacts customer experience
Objective: Develop department-level inbound case forecasts for a 5-week horizon, enabling more accurate labor scheduling and improving operational efficiency across Walmart stores.
| Metric | Target |
|---|---|
| Forecast accuracy | ≥ 15–20% MAPE improvement vs. aggregate baseline |
| Scheduling efficiency | ≥ 10% reduction in over/under-staffing |
| Labor utilization | ≥ 8% improvement (reduced overtime and idle time) |
| Model | MAPE | Notes |
|---|---|---|
| Baseline (7-day Moving Average) | 8.64% | Store-level aggregate |
| XGBoost Basic | 7.87% | Lag + calendar features |
| XGBoost + Holidays | 7.84% | Holiday flags added |
| XGBoost + Weather | 7.84% | Weather signals integrated |
| XGBoost All Features | 7.17% | Full feature set |
| Dept-Specific XGBoost + Optuna | ~5–6% | Per-department tuned models |
| Holt-Winters Exponential Smoothing | 3.08% | Best overall — 64% improvement over baseline |
Best department-level MAPEs (Holt-Winters):
| Department | MAPE |
|---|---|
| Media & Gaming | 2.06% |
| Cook & Dine | 2.29% |
| Do It Yourself | 2.65% |
| Home Decor | ~3% |
| Consumer Electronics | Highest variance (product launch sensitivity) |
The 3.08% overall MAPE is a 64% reduction in error relative to the 8.64% moving-average baseline, well above the project target of a 15–20% improvement.
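The improvement figure can be checked from the two MAPE values in the results table. The sketch below is illustrative only: the function names are hypothetical and are not taken from the project notebook, and `mape` assumes that no actual value is zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no element of `actual` is zero.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def relative_improvement(baseline_mape, model_mape):
    """Percent reduction in MAPE relative to the baseline."""
    return 100.0 * (baseline_mape - model_mape) / baseline_mape


# MAPE values reported in the results table: baseline 8.64%, Holt-Winters 3.08%.
improvement = relative_improvement(8.64, 3.08)
print(f"{improvement:.1f}% reduction in MAPE")  # 64.4% reduction in MAPE
```

The same calculation puts XGBoost All Features (7.17%) at roughly a 17% improvement over the baseline, inside the 15–20% target band.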
- `inbound_cases_team9.csv`: daily inbound case volume per store/department (417,000+ records)
- `stores_data.xlsx`: store metadata, including region and market area (100 stores, 36 states)
- `trucks.csv`: truck arrival data per store per day
- Open-Meteo API: historical weather (temperature, precipitation)
- Google Trends (pytrends): department-specific search volume as leading indicators
- US Holiday calendar: major holidays and retail events (Black Friday, Cyber Monday, etc.)
Home Decor · Baby & Toddler · Consumer Electronics · Wireless · Media & Gaming · Do It Yourself · Automotive · Cook & Dine
- Lag features: 1, 3, 7, 14, 21, 28-day lags
- Rolling statistics: 3, 7, 14, 28-day rolling mean and standard deviation
- Calendar features: day of week, month, quarter, weekend flag, month start/end
- Holiday features: exact holiday days, ±2 day windows, department-specific holiday lift scores
- Weather signals: temperature, precipitation, is_cold, is_hot, is_rainy
- External signals: Google Trends for Baby & Toddler and Consumer Electronics
- Truck arrivals: top feature by importance (0.34 importance score in XGBoost)
- Macro signals: housing starts (Home Decor), product launch calendar (Consumer Electronics)
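As a rough illustration of the lag, rolling, and calendar features listed above, the following pandas sketch builds them per store/department series. The column names (`store_id`, `department`, `date`, `cases`) and the function name are assumptions made for this example and may not match the notebook. Holiday, weather, truck, and external-signal features are omitted.

```python
import numpy as np
import pandas as pd


def add_features(df, lags=(1, 3, 7, 14, 21, 28), windows=(3, 7, 14, 28)):
    """Add lag, rolling, and calendar features to a daily case-volume frame.

    Expects hypothetical columns: store_id, department, date, cases.
    """
    df = df.sort_values(["store_id", "department", "date"]).copy()
    grouped = df.groupby(["store_id", "department"])["cases"]

    # Lag features: the value observed `lag` days earlier in the same series.
    for lag in lags:
        df[f"lag_{lag}"] = grouped.shift(lag)

    # Rolling statistics: shift by one day first so that each window
    # contains only values strictly before the current day.
    for w in windows:
        df[f"roll_mean_{w}"] = grouped.transform(
            lambda s, w=w: s.shift(1).rolling(w).mean()
        )
        df[f"roll_std_{w}"] = grouped.transform(
            lambda s, w=w: s.shift(1).rolling(w).std()
        )

    # Calendar features derived from the date column.
    dates = df["date"].dt
    df["day_of_week"] = dates.dayofweek
    df["month"] = dates.month
    df["quarter"] = dates.quarter
    df["is_weekend"] = (dates.dayofweek >= 5).astype(int)
    df["is_month_start"] = dates.is_month_start.astype(int)
    df["is_month_end"] = dates.is_month_end.astype(int)
    return df


# Synthetic single-series example (not project data).
demo = pd.DataFrame({
    "store_id": 1,
    "department": "Cook & Dine",
    "date": pd.date_range("2025-01-01", periods=40, freq="D"),
    "cases": np.arange(40, dtype=float),
})
features = add_features(demo)
```

Grouping by store and department before shifting keeps one series from leaking values into another at the series boundaries.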
- 7-day Moving Average (baseline)
- XGBoost (7 iterative versions with progressive feature additions)
- Random Forest
- XGBoost + Random Forest Ensemble
- Prophet (Meta)
- SARIMA (weekly seasonality)
- Holt-Winters Exponential Smoothing ← best performer
- Hybrid: Holt-Winters + ML residual correction (GradientBoosting)
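To show what the best-performing model does, here is a from-scratch sketch of additive Holt-Winters smoothing with a weekly season and a 35-day (5-week) horizon. The project itself used the Statsmodels implementation. The smoothing parameters, the initialisation, and the choice of an additive weekly season below are assumptions for illustration, not the fitted configuration.

```python
import numpy as np


def holt_winters_additive(y, season_length=7, alpha=0.3, beta=0.05,
                          gamma=0.2, horizon=35):
    """Additive Holt-Winters forecast (one common formulation).

    Parameter values are illustrative defaults, not tuned values.
    """
    y = np.asarray(y, dtype=float)
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of history")

    # Simple initialisation from the first two seasons.
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)  # season[i]: latest estimate for phase i

    for t in range(m, len(y)):
        phase = t % m
        prev_level = level
        # Level: deseasonalised observation blended with the prior projection.
        level = alpha * (y[t] - season[phase]) + (1 - alpha) * (prev_level + trend)
        # Trend: change in level blended with the prior trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Season: detrended observation blended with the prior seasonal term.
        season[phase] = gamma * (y[t] - level) + (1 - gamma) * season[phase]

    n = len(y)
    return np.array([
        level + h * trend + season[(n - 1 + h) % m]
        for h in range(1, horizon + 1)
    ])


# Synthetic series: flat level of 100 with a fixed weekly pattern.
weekly_pattern = np.array([10.0, -5.0, 0.0, 5.0, -10.0, 20.0, -20.0])
history = 100.0 + np.tile(weekly_pattern, 8)  # 56 days
forecast = holt_winters_additive(history)
```

On this noise-free series the forecast simply repeats the weekly pattern around a level of 100, which is the behaviour that makes exponential smoothing effective when seasonality is strong and stable.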
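- Optuna: Bayesian optimization for department-specific XGBoost models
- RandomizedSearchCV: randomized search over sampled hyperparameter combinations
- TimeSeriesSplit: temporal cross-validation in which each fold trains only on data that precedes its test window, avoiding look-ahead leakage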
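The temporal ordering that `TimeSeriesSplit` enforces can be seen in a small sketch. The series length and number of splits here are arbitrary values chosen for the example, not the settings used in the notebook.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_days = 100  # hypothetical number of daily observations, in date order
X = np.arange(n_days).reshape(-1, 1)

splitter = TimeSeriesSplit(n_splits=5)
folds = list(splitter.split(X))

for fold, (train_idx, test_idx) in enumerate(folds, start=1):
    # Every training index precedes every test index in the same fold.
    print(
        f"fold {fold}: train days {train_idx[0]}-{train_idx[-1]}, "
        f"test days {test_idx[0]}-{test_idx[-1]}"
    )
```

Each fold extends the training window forward and tests on the block that follows it, unlike shuffled k-fold, which would mix future observations into training. `TimeSeriesSplit` also accepts a `gap` argument that leaves a buffer between the training and test windows, which may be worth considering when features include lags shorter than the forecast horizon.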
Truck arrivals are the dominant predictor. With an XGBoost feature importance of 0.34, truck arrival data contributes more to the model's predictions than any other signal. Truck schedules act as a leading indicator for inbound case volume.
Holt-Winters outperforms complex ML models. Even with 18+ features and Optuna tuning, XGBoost did not match the simpler Holt-Winters model, which achieved the best overall MAPE. Inbound case volume follows strong, stable seasonal patterns that exponential smoothing captures efficiently.
Department heterogeneity is significant. Consumer Electronics (driven by product launches) and Baby & Toddler (driven by seasonal cycles) behave fundamentally differently from stable departments like Cook & Dine. Department-specific models significantly outperform a single global model.
Google Trends adds measurable signal. For Consumer Electronics and Baby & Toddler, integrating search trend data as leading indicators improved forecast accuracy during high-volatility periods.
Truck arrivals and 7-day rolling averages explain 80% of demand volume — confirming that inbound forecasting is highly structured and predictable with the right features.
| Layer | Tools |
|---|---|
| Data processing | Python, Pandas, NumPy |
| Machine learning | XGBoost, Scikit-learn, Random Forest, GradientBoosting |
| Time series | Statsmodels (Holt-Winters, SARIMA), Prophet |
| Hyperparameter tuning | Optuna, RandomizedSearchCV |
| External data APIs | Open-Meteo, Google Trends (pytrends) |
| Visualization | Matplotlib, Seaborn |
| Environment | Google Colab |
walmart/
├── Walmart_T9.ipynb # Full analysis notebook (EDA → modeling → evaluation)
└── README.md
Note: Raw data files are proprietary Walmart data provided for academic use under NDA and are not included in this repository.
| Phase | Timeline | Description |
|---|---|---|
| Business Understanding | Oct 25, 2025 | Define problem, objectives, success criteria |
| Data Understanding | Nov 2, 2025 | EDA, data quality assessment |
| Data Preparation | Nov 9, 2025 | Cleaning, merging, feature engineering |
| Modeling | Nov 16, 2025 | Baseline + iterative model development |
| Evaluation | Dec 8, 2025 | MAPE/Bias comparison, model selection |
| Deployment & Presentation | Dec 28, 2025 | Final presentation to Walmart WFM team |
Team 9 — University of Maryland, Robert H. Smith School of Business
MS Business Analytics Capstone · Sponsored by Walmart U.S. Workforce Management
| Member | Role |
|---|---|
| Gurleenkaur Bhatia | Data modeling, XGBoost pipeline, Holt-Winters |
| Henry Kangten | Business framing, data preparation, feature engineering |
| Camilo Bascolo | Model evaluation, accuracy scorecard |
| Kaushik Muthamilselvan | External signal integration, Google Trends pipeline |
| Yuhyeon Seo | SARIMA, Prophet implementation, visualization |
Academic capstone project. Data provided by Walmart for educational purposes only. Not affiliated with or endorsed by Walmart Inc.