Main
Data Loaders
This module reads the data.
- Todo:
- Add more readers
- Add support for reading from Kaggle via its API
- class retail_sales_prediction.utils.data_loader.readDataStore(data_dir)
  Reads the data from the data store. In our case, the data store is local and in .csv format.
- read_holidays(file_name)
  Holidays data table reader.
  Args:
    file_name: holidays data table name
  Returns: dataframe
- read_items(file_name)
  Reads the data table with items information.
  Args:
    file_name: name of the items data table
  Returns: dataframe
- read_oil(file_name)
  Reads the oil prices time series.
  Args:
    file_name: oil prices file name
  Returns: dataframe
- read_stores(file_name)
  Reads the data table with stores information.
  Args:
    file_name: name of the file with stores information
  Returns: dataframe
- read_test(file_name)
  Test data reader.
  Args:
    file_name: name of the test data table
  Returns: dataframe with index on ['store_nbr', 'item_nbr', 'date']
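The readers above can be pictured with a minimal pandas sketch. This is not the source of readDataStore: the class name LocalCsvStore, the read_table helper, the onpromotion column and the sample values are all hypothetical, and the only detail taken from the documentation is that the test table is indexed on ['store_nbr', 'item_nbr', 'date'].

```python
import os
import tempfile

import pandas as pd


class LocalCsvStore:
    """Hypothetical stand-in for readDataStore: reads .csv tables from a local directory."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def read_table(self, file_name):
        # Plain reader, in the spirit of read_items / read_stores / read_oil / read_holidays.
        return pd.read_csv(os.path.join(self.data_dir, file_name))

    def read_test(self, file_name):
        # Parse the date column, then index on the three keys named in the read_test docs.
        path = os.path.join(self.data_dir, file_name)
        df = pd.read_csv(path, parse_dates=["date"])
        return df.set_index(["store_nbr", "item_nbr", "date"])


def demo():
    # Write a tiny made-up test table to a temporary directory and read it back.
    with tempfile.TemporaryDirectory() as data_dir:
        pd.DataFrame({
            "date": ["2017-07-26", "2017-07-26"],
            "store_nbr": [1, 1],
            "item_nbr": [101, 102],
            "onpromotion": [False, True],  # illustrative column, not from the docs
        }).to_csv(os.path.join(data_dir, "test.csv"), index=False)
        return LocalCsvStore(data_dir).read_test("test.csv")
```

Calling demo() returns a two-row dataframe whose index levels are store_nbr, item_nbr and date, with date parsed as a timestamp rather than left as a string.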
Data Preparation
This module prepares data for the training, validation and test periods. It contains functionality to create new features and to slice the data into the splits used to train our models.
- Todo:
- Add rolling training, validation and test class
- Can be refactored using the rolling functionality in pandas
- Convert the code to handle Spark dataframes
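As a rough illustration of the rolling training, validation and test idea in the todo list, the sketch below generates consecutive, non-overlapping date ranges that roll forward in time. The function rolling_splits and all of its parameters are hypothetical; nothing here reflects code that exists in the module.

```python
import datetime

import pandas as pd


def rolling_splits(first_train_start, train_days, val_days, test_days, step_days, n_splits):
    """Yield (train, val, test) DatetimeIndex triples that roll forward in time."""
    one_day = pd.Timedelta(days=1)
    for i in range(n_splits):
        # Each split starts step_days later than the previous one.
        start = pd.Timestamp(first_train_start) + pd.Timedelta(days=step_days * i)
        train = pd.date_range(start, periods=train_days)
        # Validation begins the day after training ends, test the day after validation ends.
        val = pd.date_range(train[-1] + one_day, periods=val_days)
        test = pd.date_range(val[-1] + one_day, periods=test_days)
        yield train, val, test


def demo():
    return list(rolling_splits(datetime.date(2017, 1, 1), train_days=28,
                               val_days=7, test_days=7, step_days=7, n_splits=3))
```

Each yielded index can be used to select rows (long format) or columns (wide format) of a dataframe, so the same generator would serve either layout.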
- class retail_sales_prediction.utils.data_preparation.FeaturePreparation(df_train, df_test, df_items, df_stores)
  This class prepares the data used to train our models. The code is mostly adapted from Kaggle notebooks and restructured into this module.
  Args:
    df_train: Training dataframe
    df_test: Test dataframe
    df_items: Items dataframe
    df_stores: Stores dataframe
- get_test_data(df_2017, promo_2017, df_2017_item, promo_2017_item, df_2017_store_class, df_2017_store_class_index, df_2017_promo_store_class, df_2017_promo_store_class_index, test_start_date=datetime.date(2017, 7, 26))
  Generates the X_test set.
  Args:
    df_2017:
    promo_2017:
    df_2017_item:
    promo_2017_item:
    df_2017_store_class:
    df_2017_store_class_index:
    df_2017_promo_store_class:
    df_2017_promo_store_class_index:
    test_start_date:
  Returns:
- get_training_data(df_2017, promo_2017, df_2017_item, promo_2017_item, df_2017_store_class, df_2017_store_class_index, df_2017_promo_store_class, df_2017_promo_store_class_index, anchor_date=datetime.date(2017, 6, 14), num_days=6)
  Creates X_train and y_train.
  Args:
    df_2017:
    promo_2017:
    df_2017_item:
    promo_2017_item:
    df_2017_store_class:
    df_2017_store_class_index:
    df_2017_promo_store_class:
    df_2017_promo_store_class_index:
    anchor_date:
    num_days:
  Returns:
- get_validation_data(df_2017, promo_2017, df_2017_item, promo_2017_item, df_2017_store_class, df_2017_store_class_index, df_2017_promo_store_class, df_2017_promo_store_class_index, val_start_date=datetime.date(2017, 7, 26))
  Generates the X_val and y_val sets.
  Args:
    df_2017:
    promo_2017:
    df_2017_item:
    promo_2017_item:
    df_2017_store_class:
    df_2017_store_class_index:
    df_2017_promo_store_class:
    df_2017_promo_store_class_index:
    val_start_date:
  Returns:
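The argument descriptions above are empty, so the following is only a sketch of how date-anchored feature and target windows are often built, not the behaviour of get_training_data. It assumes a wide dataframe with one column per calendar day, and it assumes that windows are stacked at weekly steps from an anchor date; the helper names, the mean_3 and mean_7 features, the 3-day horizon and the step_days parameter are all hypothetical.

```python
import datetime

import numpy as np
import pandas as pd


def get_timespan(df_wide, anchor, days_back, periods):
    # Select `periods` consecutive daily columns starting `days_back` days before `anchor`.
    start = anchor - datetime.timedelta(days=days_back)
    return df_wide.loc[:, pd.date_range(start, periods=periods)]


def make_window(df_wide, anchor, horizon=3):
    # Features look strictly backwards from the anchor; targets start at the anchor.
    X = pd.DataFrame({
        "mean_3": get_timespan(df_wide, anchor, 3, 3).mean(axis=1),
        "mean_7": get_timespan(df_wide, anchor, 7, 7).mean(axis=1),
    })
    y = get_timespan(df_wide, anchor, 0, horizon).to_numpy()
    return X, y


def stack_windows(df_wide, anchor_date, num_windows, step_days=7):
    # Repeat the window at shifted anchors and stack the results row-wise.
    X_parts, y_parts = [], []
    for i in range(num_windows):
        anchor = anchor_date + datetime.timedelta(days=step_days * i)
        X_i, y_i = make_window(df_wide, anchor)
        X_parts.append(X_i)
        y_parts.append(y_i)
    return pd.concat(X_parts, axis=0), np.concatenate(y_parts, axis=0)


def demo():
    # Two made-up series over 61 days: a ramp 0..60 and a constant 1.
    days = pd.date_range("2017-06-01", periods=61)
    values = np.vstack([np.arange(61), np.ones(61)])
    df_wide = pd.DataFrame(values, index=["a", "b"], columns=days)
    return stack_windows(df_wide, datetime.date(2017, 6, 14), num_windows=6)
```

Stacking several shifted windows multiplies the number of training rows without using any information from after each anchor date, which is why this pattern is common for time series features.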
Model Collection
This module is a collection of machine learning models. We start with LightGBM.
Time permitting, we can add more.
- Todo:
- Add more machine learning models, such as GBM, RF and XGBoost
- Spark-compatible GBM and LightGBM models
- Add model diagnostic plots using the SHAP library
- Feature Reduction
- Config file