Samar Krimi
A sales prediction for food items sold at various stores, to help the retailer understand which properties of products and outlets play crucial roles in increasing sales.
https://drive.google.com/file/d/1syH81TVrbBsdymLT_jl2JIf6IjPXtSQw/view
8523 rows and 12 columns
The distribution of Item_Outlet_Sales decreases dramatically at higher prices.
Correlations between the numeric columns:
- Perfect correlation (1.0): Item_Weight, Item_Visibility, Item_MRP, Outlet_Establishment_Year and Item_Outlet_Sales, each with itself (the diagonal).
- Moderate correlation (0.57): Item_MRP with Item_Outlet_Sales.
- Very low correlation: all remaining pairs of item features and Outlet_Establishment_Year.
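The 1.0 values follow from the definition of Pearson correlation: every column is perfectly correlated with itself, which is why they fill the diagonal of a correlation matrix. The minimal sketch below computes such a matrix with the standard library on made-up values (not the project's data); the helper names are hypothetical. With pandas the same matrix is usually obtained with `DataFrame.corr()`, whose default method is Pearson.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pearson correlation for every ordered pair of columns (name -> values)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy values, not the project's data.
columns = {
    "Item_MRP": [52.0, 110.0, 143.0, 182.0, 250.0],
    "Item_Visibility": [0.02, 0.11, 0.05, 0.09, 0.03],
    "Item_Outlet_Sales": [700.0, 2100.0, 1500.0, 3400.0, 3900.0],
}
matrix = correlation_matrix(columns)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:.2f}")
```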
The average outlet sales by product category show that Seafood and Starchy Foods are the best-selling categories: they are rich in carbohydrates, nutrients, fiber and vitamins, they are a real source of energy for the body, and they are ideal for physical and mental activity. In second place are Fruits and Vegetables and Snack Foods, which are consumed by most people, and in third place Dairy, Canned and Breads. Baking Goods and Soft Drinks are the least sold; the retailer may need to change their placement in the store, and I think they are not recommended for people with certain diseases.
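Ranking categories by average sales amounts to a group-by mean. A minimal standard-library sketch, using a few hypothetical rows with the dataset's Item_Type and Item_Outlet_Sales column names (the helper name is an assumption); in pandas this is typically `df.groupby("Item_Type")["Item_Outlet_Sales"].mean().sort_values(ascending=False)`.

```python
from collections import defaultdict


def mean_by_group(rows, group_col, value_col="Item_Outlet_Sales"):
    """Average value_col per distinct group_col value, sorted from highest to lowest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
        counts[row[group_col]] += 1
    means = {group: totals[group] / counts[group] for group in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows; the real dataset has 8523 of them.
rows = [
    {"Item_Type": "Seafood", "Item_Outlet_Sales": 3000.0},
    {"Item_Type": "Seafood", "Item_Outlet_Sales": 2600.0},
    {"Item_Type": "Starchy Foods", "Item_Outlet_Sales": 2700.0},
    {"Item_Type": "Soft Drinks", "Item_Outlet_Sales": 1800.0},
    {"Item_Type": "Soft Drinks", "Item_Outlet_Sales": 2000.0},
]
ranking = mean_by_group(rows, "Item_Type")
for item_type, mean_sales in ranking:
    print(f"{item_type}: {mean_sales:.1f}")
```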
- Trends by Line Graph
The Item_MRP trend fluctuates strongly across the years in which the outlets were established. The list price of the products increases between 2000 and 2004 and falls sharply in 2008, possibly because of the financial crisis. The Item_Outlet_Sales trend is almost stable across these years, but it falls very sharply in 1998; perhaps some products were out of stock, or a particular store had an incident that year.
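Each line graph plots a per-year average against Outlet_Establishment_Year. The sketch below, on made-up rows and years, computes those averages and locates the sharpest year-over-year drop, the kind of dip discussed above; the helper names are hypothetical, and in practice the per-year series would usually come from a pandas `groupby` and be plotted with matplotlib.

```python
from collections import defaultdict


def yearly_means(rows, value_col, year_col="Outlet_Establishment_Year"):
    """Mean of value_col per year, as (year, mean) pairs in chronological order."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[year_col]] += row[value_col]
        counts[row[year_col]] += 1
    return [(year, sums[year] / counts[year]) for year in sorted(sums)]


def sharpest_drop(series):
    """(year, change) for the largest decrease between consecutive points of the series."""
    changes = [
        (series[i][0], series[i][1] - series[i - 1][1]) for i in range(1, len(series))
    ]
    return min(changes, key=lambda change: change[1])


# Hypothetical rows, not the project's data.
rows = [
    {"Outlet_Establishment_Year": 2001, "Item_Outlet_Sales": 2300.0},
    {"Outlet_Establishment_Year": 2002, "Item_Outlet_Sales": 1100.0},
    {"Outlet_Establishment_Year": 2002, "Item_Outlet_Sales": 1300.0},
    {"Outlet_Establishment_Year": 2003, "Item_Outlet_Sales": 2250.0},
]
trend = yearly_means(rows, "Item_Outlet_Sales")
print(trend)
print(sharpest_drop(trend))
```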
Preventing data leakage
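Leakage happens when statistics computed on the test set (for example a mean used to fill missing values) influence training. A common remedy is to split first and fit every preprocessing step on the training split only. The standard-library sketch below illustrates this with a mean imputer on Item_Weight; the missing values, helper names and split fraction are assumptions for illustration. In scikit-learn the same idea is usually expressed with `train_test_split` followed by a `Pipeline` containing a `SimpleImputer`.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_mean_imputer(train_rows, col):
    """Learn the fill value from the training rows only."""
    known = [row[col] for row in train_rows if row[col] is not None]
    return sum(known) / len(known)


def apply_imputer(rows, col, fill_value):
    """Return copies of rows with missing col values replaced by fill_value."""
    return [{**row, col: fill_value if row[col] is None else row[col]} for row in rows]


# Hypothetical rows with two missing Item_Weight values.
rows = [{"Item_Weight": None if i in (3, 7) else 5.0 + i} for i in range(10)]
train, test = split_rows(rows)                            # 1. split first
fill = fit_mean_imputer(train, "Item_Weight")             # 2. fit on train only
train_filled = apply_imputer(train, "Item_Weight", fill)
test_filled = apply_imputer(test, "Item_Weight", fill)    # 3. reuse the train statistic
print(fill)
```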
Linear Regression model:
- This model performs poorly on both the training and testing sets in terms of R2, which indicates high bias (underfitting).
- In terms of RMSE, the model's predictions are off by about 1.115 thousand dollars.
Default Decision Tree model:
- This model scores perfectly on the training data in terms of R2.
- It performs poorly on the testing data, which indicates high variance (overfitting).
- In terms of RMSE, which penalizes large errors more heavily, the model is off by about 1.5 thousand dollars, worse than the Linear Regression model.
Tuned Decision Tree model:
- The best hyperparameter value is depth = 5.
- The test R2 score increases to 0.595 after tuning.
- The training (0.604) and test (0.595) scores have moved closer to each other.
- The RMSE is reduced: the model's predictions are off by roughly 1.07 thousand dollars. Tuning the decision tree reduced both overfitting and the RMSE.
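The R2 and RMSE figures above, and the underfitting/overfitting diagnosis, can be reproduced in miniature. R2 is 1 - SS_res / SS_tot, and RMSE is the square root of the mean squared error, so large errors weigh more. The sketch below uses a hypothetical one-feature regression tree as a stand-in for scikit-learn's `DecisionTreeRegressor(max_depth=...)`, sweeps the depth, and keeps the depth with the best score on a held-out validation split so that the final test set stays untouched. The data are synthetic, so the selected depth will not necessarily be 5.

```python
import random
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error; squaring makes large errors weigh more."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(xs, ys, max_depth):
    """Greedy depth-limited regression tree on a single numeric feature.

    A leaf is a float (mean target); an internal node is (threshold, left, right).
    """
    if max_depth == 0 or len(set(xs)) < 2:
        return sum(ys) / len(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        sse = _sse([y for _, y in pairs[:i]]) + _sse([y for _, y in pairs[i:]])
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    _, threshold, i = best
    left, right = pairs[:i], pairs[i:]
    return (
        threshold,
        fit_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
        fit_tree([x for x, _ in right], [y for _, y in right], max_depth - 1),
    )


def predict(tree, x):
    """Walk from the root to a leaf and return its mean."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


# Hypothetical toy data: one noisy feature standing in for Item_MRP.
rng = random.Random(0)
xs = [rng.uniform(30, 270) for _ in range(300)]
ys = [15 * x + rng.gauss(0, 600) for x in xs]
train_x, train_y = xs[:200], ys[:200]
val_x, val_y = xs[200:], ys[200:]

scores = {}
for depth in range(1, 11):
    tree = fit_tree(train_x, train_y, depth)
    train_pred = [predict(tree, x) for x in train_x]
    val_pred = [predict(tree, x) for x in val_x]
    scores[depth] = (r2_score(train_y, train_pred), r2_score(val_y, val_pred), rmse(val_y, val_pred))
    print(f"depth={depth:2d} train R2={scores[depth][0]:.3f} "
          f"val R2={scores[depth][1]:.3f} val RMSE={scores[depth][2]:.1f}")

best_depth = max(scores, key=lambda d: scores[d][1])
print("best depth on the validation split:", best_depth)
```

An unlimited-depth tree memorizes the training rows (training R2 of 1.0), which mirrors the perfect training score of the untuned tree; limiting the depth trades some training fit for better generalization.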
Item_Outlet_Sales can be improved and should be studied according to the following criteria:
- Increase the quantity of best sellers such as Seafood.
- Focus on fruits and vegetables, which are beneficial to health, through special offers.
- Reduce the quantity of unhealthy items such as Baking Goods and Soft Drinks.
- Keep items available at all stores to avoid stock shortages, especially during a health crisis or a possible war.
- Consider selling online and on social networks.
- Change the packaging and the placement of products in the supermarket.
The model itself could also be changed, for example to an improved Random Forest model, to obtain better metric results.
For the Linear Regression model, the top 3 most impactful features are:
- Outlet_Type_Supermarket Type3: increases predicted item outlet sales by 2,126.20
- Item_MRP: increases predicted item outlet sales by 965.13
- Outlet_Type_Supermarket Type1: increases predicted item outlet sales by 737.28
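In a linear model each prediction is the intercept plus the sum of coefficient times feature value, so a one-hot outlet-type column shifts the prediction by exactly its coefficient when it switches from 0 to 1. The sketch reuses the three coefficients quoted above but assumes a hypothetical intercept and drops every other feature; it also assumes one outlet type was left out as the all-zeros reference category. Whether Item_MRP was scaled is not stated, so "one unit" below may mean one standard deviation.

```python
# Coefficient values quoted above; INTERCEPT and the reduced feature set are hypothetical.
COEFFICIENTS = {
    "Outlet_Type_Supermarket Type3": 2126.20,
    "Item_MRP": 965.13,
    "Outlet_Type_Supermarket Type1": 737.28,
}
INTERCEPT = 500.0  # hypothetical


def predict_sales(features):
    """Linear model prediction: intercept + sum of coefficient * feature value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())


# Same item in a hypothetical reference outlet vs. a Supermarket Type3 outlet.
reference = {"Outlet_Type_Supermarket Type3": 0, "Item_MRP": 1.0, "Outlet_Type_Supermarket Type1": 0}
type3 = {**reference, "Outlet_Type_Supermarket Type3": 1}
pricier = {**reference, "Item_MRP": 2.0}  # Item_MRP one unit higher

print(round(predict_sales(type3) - predict_sales(reference), 2))
print(round(predict_sales(pricier) - predict_sales(reference), 2))
```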
For the Decision Tree and Random Forest models, the top 5 most important features are:
- Item_MRP
- Outlet_Type_Grocery Store
- Item_Visibility
- Outlet_Type_Supermarket Type3
- Item_Weight
Item_MRP is by far the single most important feature for predicting item outlet sales.
Outlet_Type_Grocery Store is the second most important.
Item_Visibility, Outlet_Type_Supermarket Type3 and Item_Weight are somewhat important.
Everything else is unimportant.
Comparing the top 5 most important features according to SHAP vs. my RandomForestRegressor:
- The top 5 most important features were the same according to SHAP and the Random Forest.
- However, the order was slightly different (Item_Visibility is 4th according to SHAP instead of 3rd).
The top 3 most important features and their effects were the following:
- Item_MRP: the higher the MRP, the higher the predicted sales.
- Outlet_Type_Grocery Store: being a grocery store dramatically decreased the predicted sales.
- Outlet_Type_Supermarket Type3: being a Supermarket Type3 outlet dramatically increased the predicted sales.
The top 5 most important features according to the SHAP summary plot are:
- Item_MRP
- Outlet_Type_Grocery Store
- Outlet_Type_Supermarket Type3
- Item_Visibility
- Item_Weight
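SHAP attributes each prediction to features using Shapley values: a feature's contribution is its average marginal effect over all coalitions of the other features, and the contributions add up to the prediction minus a base value. A force plot draws these per-row contributions, and a summary plot ranks features by their mean absolute contribution. The brute-force sketch below computes exact Shapley values against a single baseline (the column means) for a hypothetical three-feature toy model and made-up rows. The `shap` library's TreeExplainer computes values for tree models far more efficiently; this sketch is exponential in the number of features and only meant for toy inputs.

```python
from itertools import combinations
from math import factorial

FEATURES = ["Item_MRP", "Outlet_Type_Grocery Store", "Item_Visibility"]


def toy_model(z):
    """Hypothetical stand-in for the trained regressor (not the project's model)."""
    mrp, grocery, visibility = z
    return 15.0 * mrp - 1500.0 * grocery + 2000.0 * visibility * (1 - grocery)


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) against one baseline point.

    Features outside a coalition take the baseline value, so the values are additive:
    sum(phi) == model(x) - model(baseline), which is what a force plot draws.
    """
    n = len(x)

    def value(coalition):
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical rows: (Item_MRP, is grocery store, Item_Visibility).
rows = [(250.0, 0, 0.05), (50.0, 1, 0.15), (180.0, 0, 0.02), (120.0, 1, 0.10)]
baseline = [sum(col) / len(rows) for col in zip(*rows)]
all_phi = [shapley_values(toy_model, row, baseline) for row in rows]

# Global ranking like a SHAP summary plot: mean absolute contribution per feature.
mean_abs = {f: sum(abs(p[k]) for p in all_phi) / len(rows) for k, f in enumerate(FEATURES)}
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)
print(ranking)
```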
Use the top features from SHAP/feature importance
- Using the top features to select 2 example outlets:
- Outlet_Type_Grocery Store: a store with low sales
- Outlet_Type_Supermarket Type3: a store with high sales
- Create filters for each of the outlet types.
- Select Item_Visibility within each Outlet_Type.
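Selecting the two example outlets is a filter followed by a choice within each group. The criterion for picking a row by Item_Visibility is not spelled out, so the sketch assumes (hypothetically) the row with the highest Item_Visibility; the rows and the helper name are made up. In pandas the filter would usually be written as `df[df["Outlet_Type"] == "Grocery Store"]`.

```python
def pick_example(rows, outlet_type, rank_col="Item_Visibility"):
    """Filter rows to one Outlet_Type, then return the row with the highest rank_col.

    Taking the maximum is a hypothetical selection rule for illustration.
    """
    group = [row for row in rows if row["Outlet_Type"] == outlet_type]
    if not group:
        raise ValueError(f"no rows for outlet type {outlet_type!r}")
    return max(group, key=lambda row: row[rank_col])


# Hypothetical rows, not the project's data.
rows = [
    {"Outlet_Type": "Grocery Store", "Item_Visibility": 0.12, "Item_Outlet_Sales": 300.0},
    {"Outlet_Type": "Grocery Store", "Item_Visibility": 0.21, "Item_Outlet_Sales": 180.0},
    {"Outlet_Type": "Supermarket Type3", "Item_Visibility": 0.04, "Item_Outlet_Sales": 4200.0},
    {"Outlet_Type": "Supermarket Type3", "Item_Visibility": 0.09, "Item_Outlet_Sales": 3900.0},
]
example_1 = pick_example(rows, "Grocery Store")      # low-sales outlet type
example_2 = pick_example(rows, "Supermarket Type3")  # high-sales outlet type
print(example_1)
print(example_2)
```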
- Individual SHAP Force Plots
As we can see in the force plot above for example 1, some features increase the prediction, such as:
- Item_Visibility
- Item_Type_Soft Drinks
- Item_Weight
- Item_MRP
However, many more features push the prediction in the opposite direction, such as Outlet_Type_Grocery Store.
As we can see in the force plot above for example 2, while a few factors decrease the prediction, many more features raise it, such as:
- Item_Type_Frozen Foods
- Item_Visibility
- Item_MRP
- Outlet_Type_Grocery Store
- Outlet_Type_Supermarket Type3
- LIME tabular explanation
As we can see in the LIME explanation above for example 1, several factors contributed to increasing the predicted sales, such as:
- Item_MRP
- Item_Type_Hard Drinks
- Outlet_Type_Supermarket Type2
The other features reduced the predicted sales.
As we can see in the LIME explanation above for example 2, several factors contributed to increasing the predicted sales, such as:
- Outlet_Type_Grocery Store
- Outlet_Type_Supermarket Type3
- Item_MRP
- Item_Type_Starchy Foods
The other features reduced the predicted sales.
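LIME explains one prediction by sampling points around the instance, querying the model on them, weighting each sample by its proximity to the instance, and fitting a weighted linear model whose coefficients give the local direction of each feature. The simplified sketch below perturbs features with Gaussian noise and uses an exponential kernel; the toy model, scales and helper names are hypothetical. The `lime` package's `LimeTabularExplainer` samples using training-data statistics and by default discretizes continuous features, so its output is not identical to this sketch.

```python
import math
import random


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def lime_explain(model, instance, scale, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate to `model` around `instance`.

    Returns [intercept, w_1, ..., w_d]. Features are centred on the instance, so the
    intercept approximates model(instance) and w_j the local effect of feature j.
    """
    rng = random.Random(seed)
    d = len(instance)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]  # accumulates X^T W X
    xtwy = [0.0] * (d + 1)                          # accumulates X^T W y
    for _ in range(n_samples):
        z = [v + rng.gauss(0, s) for v, s in zip(instance, scale)]  # perturbed sample
        dist2 = sum(((zj - vj) / s) ** 2 for zj, vj, s in zip(z, instance, scale))
        weight = math.exp(-dist2 / kernel_width ** 2)  # closer samples count more
        row = [1.0] + [zj - vj for zj, vj in zip(z, instance)]
        y = model(z)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    return solve(xtwx, xtwy)


def toy_model(z):
    """Hypothetical nonlinear stand-in for the trained regressor."""
    mrp, visibility = z
    return 0.05 * mrp ** 2 + 500.0 * visibility


weights = lime_explain(toy_model, instance=[200.0, 0.05], scale=[5.0, 0.02])
print(f"local Item_MRP effect: {weights[1]:.2f}")
print(f"local Item_Visibility effect: {weights[2]:.1f}")
```

A positive weight plays the role of a bar pushing the predicted sales up in the LIME plot, and a negative weight one pushing them down.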