Main

Data Loaders

This module reads the data.

Todo:
  • Add more readers
  • Add support for reading data from Kaggle through its API
class retail_sales_prediction.utils.data_loader.readDataStore(data_dir)

This class reads the data from the data store.

In our case, the data store is local and the files are in .csv format.
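As a rough illustration of the pattern described here, the sketch below resolves a file name against a data directory and loads it with pandas. `CsvDataStore`, `read_table`, and the sample `stores.csv` contents are hypothetical stand-ins, not the actual `readDataStore` implementation.

```python
import tempfile
from pathlib import Path

import pandas as pd


class CsvDataStore:
    """Hypothetical stand-in for readDataStore: reads .csv tables from a local directory."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)

    def read_table(self, file_name, **read_csv_kwargs):
        # Resolve the file name against the data directory, then load it.
        return pd.read_csv(self.data_dir / file_name, **read_csv_kwargs)


# Usage with made-up sample data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "stores.csv").write_text("store_nbr,city\n1,Quito\n2,Guayaquil\n")
    stores = CsvDataStore(tmp).read_table("stores.csv")
```

Keeping the directory on the object means each reader method only needs a file name, which matches the `file_name` argument of the methods listed below.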

read_holidays(file_name)

Reads the holidays data table.

Args:

file_name: holidays data table name

Returns: dataframe

read_items(file_name)

Reads the data table with item information.

Args:

file_name: name of the items data table

Returns: dataframe

read_oil(file_name)

Reads the oil price time series.

Args:

file_name: oil prices file name

Returns: dataframe

read_stores(file_name)

Reads the data table with store information.

Args:

file_name: name of the stores data table file

Returns: dataframe

read_test(file_name)

Reads the test data.

Args:

file_name: name of the test data table

Returns: dataframe indexed on ['store_nbr', 'item_nbr', 'date']
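The three-level index can be illustrated with plain pandas. This is a minimal sketch, not the body of `read_test`: the inline CSV text and its `id` and `onpromotion` columns are made-up sample data.

```python
import io

import pandas as pd

# Made-up sample rows; only the three index column names come from the docs above.
csv_text = (
    "id,date,store_nbr,item_nbr,onpromotion\n"
    "0,2017-08-16,1,96995,False\n"
    "1,2017-08-16,1,99197,True\n"
)

df_test = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"]).set_index(
    ["store_nbr", "item_nbr", "date"]
)

# A full (store, item, date) key now addresses exactly one row.
row_id = df_test.loc[(1, 99197, pd.Timestamp("2017-08-16")), "id"]
```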

read_train(file_name)

Reads the training data. The original training data starts in 2013.

Args:

file_name: name of the training data file

Returns: dataframe with unit_sales converted to log scale
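The documentation only says that unit_sales are converted into log. The sketch below shows one common way to do this, assuming `log1p` (that is, log(1 + x)) with negative values clipped to zero first. Both choices, and the helper names, are assumptions rather than confirmed behaviour of `read_train`.

```python
import numpy as np
import pandas as pd


def to_log_sales(unit_sales):
    # Assumption: negative sales (returns) are clipped to zero, then log(1 + x) is applied.
    return np.log1p(unit_sales.clip(lower=0))


def from_log_sales(log_sales):
    # Inverse of log1p, for turning predictions back into unit sales.
    return np.expm1(log_sales)


sales = pd.Series([0.0, 3.0, -2.0, 10.0])
log_sales = to_log_sales(sales)
restored = from_log_sales(log_sales)
```

Using `log1p` rather than a plain logarithm keeps zero sales finite, and `expm1` undoes it exactly.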

read_transactions(file_name)

Reads the transactions data table.

Args:

file_name: name of the transactions data table file

Returns: dataframe

Data Preparation

This module prepares data for the training, validation and test periods. It contains functionality to create new features and to slice the data into the splits used to train and evaluate our models.

Todo:
  • Add rolling training, validation and test class
  • Could be refactored using the rolling-window functionality in pandas
  • Convert the code to handle Spark dataframes
class retail_sales_prediction.utils.data_preparation.FeaturePreparation(df_train, df_test, df_items, df_stores)

This class prepares the data used to train our models. The code is mostly adapted from Kaggle notebooks and restructured into this module.

Args:

df_train: Training dataframe
df_test: Test dataframe
df_items: Items dataframe
df_stores: Stores dataframe

get_test_data(df_2017, promo_2017, df_2017_item, promo_2017_item, df_2017_store_class, df_2017_store_class_index, df_2017_promo_store_class, df_2017_promo_store_class_index, test_start_date=datetime.date(2017, 7, 26))

Generates the X_test set.

Args:

df_2017:
promo_2017:
df_2017_item:
promo_2017_item:
df_2017_store_class:
df_2017_store_class_index:
df_2017_promo_store_class:
df_2017_promo_store_class_index:
test_start_date:

Returns:

get_training_data(df_2017, promo_2017, df_2017_item, promo_2017_item, df_2017_store_class, df_2017_store_class_index, df_2017_promo_store_class, df_2017_promo_store_class_index, anchor_date=datetime.date(2017, 6, 14), num_days=6)

Creates X_train and y_train.

Args:

df_2017:
promo_2017:
df_2017_item:
promo_2017_item:
df_2017_store_class:
df_2017_store_class_index:
df_2017_promo_store_class:
df_2017_promo_store_class_index:
anchor_date:
num_days:

Returns:
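The parameter descriptions above are empty, so the following is only one plausible reading of `anchor_date` and `num_days`: the training set is built from several windows, each anchored one week after the previous one. The weekly step and the helper name `weekly_anchor_dates` are assumptions, not documented behaviour.

```python
from datetime import date, timedelta


def weekly_anchor_dates(anchor_date, num_days):
    # Assumption: num_days counts windows, and consecutive windows are 7 days apart.
    return [anchor_date + timedelta(days=7 * i) for i in range(num_days)]


anchors = weekly_anchor_dates(date(2017, 6, 14), 6)
```

Under this reading, the defaults give anchors from 2017-06-14 to 2017-07-19, and the week after the last anchor is 2017-07-26, which is the default `val_start_date`.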

get_validation_data(df_2017, promo_2017, df_2017_item, promo_2017_item, df_2017_store_class, df_2017_store_class_index, df_2017_promo_store_class, df_2017_promo_store_class_index, val_start_date=datetime.date(2017, 7, 26))

Generates the X_val and y_val sets.

Args:

df_2017:
promo_2017:
df_2017_item:
promo_2017_item:
df_2017_store_class:
df_2017_store_class_index:
df_2017_promo_store_class:
df_2017_promo_store_class_index:
val_start_date:

Returns:

pre_process_data()

Reshapes the data into a form compatible with our model.

Returns:

prepare_dataset(df, promo_df, t2017, is_train=True, name_prefix=None)

Engineers new features.

Args:

df:
promo_df:
t2017:
is_train:
name_prefix:

Returns:
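The features themselves are not documented here. As a hedged sketch of the kind of feature engineering this method could perform, the example below computes mean sales over the last few days before an anchor date, assuming a wide dataframe with one column per day. The helpers `get_timespan` and `window_features`, the window sizes, and the toy data are all hypothetical.

```python
from datetime import date

import pandas as pd


def get_timespan(wide, anchor, minus, periods):
    """Select `periods` consecutive daily columns starting `minus` days before `anchor`."""
    start = pd.Timestamp(anchor) - pd.Timedelta(days=minus)
    return wide.loc[:, pd.date_range(start, periods=periods, freq="D")]


def window_features(wide, anchor, windows=(3, 7)):
    # One feature per window: the row-wise mean over the w days before the anchor.
    features = {
        f"mean_{w}": get_timespan(wide, anchor, w, w).mean(axis=1) for w in windows
    }
    return pd.DataFrame(features)


# Toy data: two series observed daily from 2017-06-01 to 2017-06-14.
days = pd.date_range("2017-06-01", periods=14, freq="D")
wide = pd.DataFrame(
    [list(range(14)), [2] * 14], index=["item_a", "item_b"], columns=days
)
feats = window_features(wide, date(2017, 6, 15))
```

Because every window ends the day before the anchor, the features never see values from the period being predicted.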

Model Collection

This module consists of a collection of machine learning models. We start with LightGBM.

More models can be added as time permits.

Todo:
  • Add more machine learning models, such as GBM, RF and XGBoost
  • Add Spark-compatible GBM and LightGBM models
  • Add model diagnostic plots using the SHAP library
  • Feature Reduction
  • Config file
retail_sales_prediction.utils.run_model.run_model_lgbm(feature_prep, X_train, y_train, X_val, y_val, X_test, config, num_days=6)

Trains the LightGBM model.

Args:

feature_prep:
X_train:
y_train:
X_val:
y_val:
X_test:
config:
num_days:

Returns:

:param model_params:
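The structure of the `config` argument is not documented. The sketch below shows one way the model parameters could be held as a dictionary of defaults with per-run overrides. The keys are standard LightGBM parameter names, but the values, `DEFAULT_LGBM_PARAMS`, and `build_params` are illustrative assumptions and not the project's actual configuration.

```python
# Assumed defaults; the keys are real LightGBM parameters, the values are illustrative.
DEFAULT_LGBM_PARAMS = {
    "objective": "regression",
    "metric": "l2",
    "learning_rate": 0.1,
    "num_leaves": 31,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,
    "min_data_in_leaf": 100,
}


def build_params(overrides=None):
    # Copy the defaults so repeated calls never mutate the shared dictionary.
    params = dict(DEFAULT_LGBM_PARAMS)
    params.update(overrides or {})
    return params


params = build_params({"learning_rate": 0.05})
```

Keeping the defaults in one place would also make it straightforward to move them into the config file mentioned in the to-do list.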