How to “boost” your cyclical sales data forecast with LightGBM and Python
Welcome to another edition of “The Kaggle Blueprints,” where we analyze Kaggle competitions’ winning solutions for lessons we can apply to our own data science projects.
This edition will review the techniques and approaches from the “M5 Forecasting - Accuracy” competition, which ended at the end of June 2020.
The objective of the “M5 Forecasting - Accuracy” competition was to forecast the next 28 days of 42,840 hierarchical time series of sales data.
Hierarchical time series: Unlike common multivariate time series problems, hierarchical time series can be aggregated on different levels, e.g., item level, store level, and state level. In this competition, the competitors were given over 40,000 time series of 3,000 individual products from 3 different categories, sold in 10 stores across 3 states.
Cyclical: Sales data is typically cyclical, which means that the sales data is time-dependent. E.g., you will see repeating patterns, like increasing sales towards the end of the week (weekly cycle), at the beginning of a month (monthly cycle), or during the holidays (annual cycle).
Multistep: The task is to forecast the sales data 28 days into the future (28 steps).
To follow along in this article, your dataset should look something like this:
A popular approach among competitors was formulating the time series forecasting problem as a regression problem and modeling it with Machine Learning (ML) [6].
- A time series forecasting problem can be formulated as a regression problem by splitting the predictions into single steps, keeping the gap between the historical data and the prediction constant among data points.
- Instead of feeding the sequence of past values to the ML model, you can aggregate the historical data points into historical features.
Thus, the main steps to approach a hierarchical time series forecasting problem with ML are:
- Building a Simple Baseline
- Feature Engineering from Historical Data
- Modeling and Validating a Time Series Forecasting Problem with Machine Learning
As with any good ol’ ML problem, we will start by building a simple baseline. With time series forecasting problems, a good starting point is to take the value from the last timestamp as the prediction: the naive approach.
You can improve on the naive approach by referencing the last cycle if you have a cyclical time series. For example, if your time series depends on the weekday, you can take the last month, group by the weekday, and take the average [2], as sketched below.
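As a minimal sketch (assuming a dataframe df with a date column and a sales column, as in the dataset described above), this weekday-average baseline could look as follows:
import pandas as pd

# Seasonal-naive baseline: average the last 28 days per weekday
df['date'] = pd.to_datetime(df['date'])
last_month = df[df['date'] > df['date'].max() - pd.Timedelta(days=28)]

# Mean sales per weekday over the last four weeks
weekday_avg = last_month.groupby(last_month['date'].dt.dayofweek)['sales'].mean()

# Forecast the next 28 days by looking up each future date's weekday average
future_dates = pd.date_range(df['date'].max() + pd.Timedelta(days=1), periods=28)
baseline_forecast = pd.Series(weekday_avg.loc[future_dates.dayofweek].values, index=future_dates)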
In contrast to using a classical statistical approach, feature engineering is an essential step when developing an ML model. Thus, instead of feeding the historical data directly to the ML model, you will aggregate the historical data into historical features [4].
Timestamp features
A time series has at least two features: a timestamp and a value. The timestamp alone can already be used to create multiple new features.
First, you can extract features from the timestamp by simply decomposing it into its components, e.g., day, week, month, year, etc. [4].
import pandas as pd

# Convert to DateTime
df['date'] = pd.to_datetime(df['date'])

# Make some features from the date
df['day'] = df['date'].dt.day
df['week'] = df['date'].dt.isocalendar().week
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
# etc.
Second, you can create new features based on the date [1, 3]: Is it a weekday or a weekend? Is it a holiday? Is a special event happening (e.g., a sports event)?
df['dayofweek'] = df['date'].dt.dayofweek
df['weekend'] = (df['dayofweek'] >= 5)
# etc.
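Holidays and special events usually come from a separate calendar table. As a rough sketch (the events dataframe below is a hypothetical example, not part of the dataset used here), you could merge such a calendar into the main dataframe:
# Hypothetical calendar of special events
events = pd.DataFrame({
    'date': pd.to_datetime(['2023-12-25', '2024-02-11']),
    'event_name': ['Christmas', 'Super Bowl'],
})

# Flag dates on which a special event takes place
df = df.merge(events, on='date', how='left')
df['is_event'] = df['event_name'].notna()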
Aggregation features
Next, you can create new features by aggregating the historical data into statistical features like the maximum, minimum, standard deviation, and mean [1, 3, 4, 8, 10].
Because we are working with a hierarchical time series, we will group the time series by the different levels of the hierarchy (e.g., store_id).
FEATURE = 'price'
LEVEL_1 = 'store_id'
LEVEL_N = 'item_id'

# Basic aggregations
df[f'{FEATURE}_max'] = df.groupby([LEVEL_1, LEVEL_N])[FEATURE].transform('max')
df[f'{FEATURE}_min'] = df.groupby([LEVEL_1, LEVEL_N])[FEATURE].transform('min')
df[f'{FEATURE}_std'] = df.groupby([LEVEL_1, LEVEL_N])[FEATURE].transform('std')
df[f'{FEATURE}_mean'] = df.groupby([LEVEL_1, LEVEL_N])[FEATURE].transform('mean')

# Normalization (min/max scaling)
df[f'{FEATURE}_norm'] = df[FEATURE] / df[f'{FEATURE}_max']

# Some items can be inflation-dependent while others are very "stable"
df[f'{FEATURE}_nunique'] = df.groupby([LEVEL_1, LEVEL_N])[FEATURE].transform('nunique')

# Feature "momentum"
df[f'{FEATURE}_momentum'] = df[FEATURE] / df.groupby([LEVEL_1, LEVEL_N])[FEATURE].transform(lambda x: x.shift(1))
Lag features
A popular feature engineering technique for time series data is to create lag features [4, 5, 10]. To be able to use this feature on the testing data, the lag should be larger than the time gap between the training and testing data.
LEVEL = 'store_id'
TARGET = 'sales'
lag = 7

df[f"lag_{lag}"] = df.groupby(LEVEL)[TARGET].shift(lag).fillna(0)
Rolling features
Another popular feature engineering technique for time series data is to create features based on a rolling window (e.g., mean or standard deviation) [1, 3, 10].
You can apply this technique to the FEATURE directly or even to its lagged version.
window = 28

df[f"rolling_mean_{window}"] = df.groupby(LEVEL)[FEATURE].transform(lambda x: x.rolling(window).mean()).fillna(0)
Hierarchy as categorical features
When working with a hierarchical time series, you can also include the node identifiers of the different levels of the hierarchy (e.g., store_id, item_id) as categorical variables [1, 3].
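A minimal sketch (assuming the identifier columns are named store_id and item_id) is to cast them to the pandas category dtype so that LightGBM can handle them natively:
# Hierarchy identifiers as categorical features
for col in ['store_id', 'item_id']:
    df[col] = df[col].astype('category')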
Your resulting dataframe should look something like this before we feed it to the ML model:
A few differences exist between modeling and validating a regular ML problem (e.g., regression or classification) and a hierarchical time series forecasting problem with ML.
Modeling multivariate and hierarchical time series
Modeling a hierarchical time series problem is similar to modeling a multivariate one.
Modeling multivariate time series: Autoregressive and sequence-to-sequence models can usually only model one time series (a univariate time series problem) at a time. Thus, when encountering a multivariate time series problem (like hierarchical time series), you would have to build multiple forecasting models, one model for each time series.
Many competitors used LightGBM, an ML model and gradient-boosting framework, for modeling [1, 3, 5, 7, 8, 10]. With LightGBM, you can model multiple time series with a single model instead of building multiple forecasting models.
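As a sketch of what this looks like (assuming features and target are defined as in the snippets above and include the hierarchy identifiers), the categorical columns can be passed explicitly when building the LightGBM dataset:
import lightgbm as lgb

# One model for many time series: the hierarchy ids are part of the feature matrix
train_data = lgb.Dataset(
    df[features],
    label=df[target],
    categorical_feature=['store_id', 'item_id'],
)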
Since the time series data is hierarchical, many competitors grouped similar time series by hierarchy level (e.g., by store) and modeled them together [3, 8, 10].
Validating forecasting models
When validating a time series forecasting model, it is essential to keep the temporal order of the time series in mind [6]. If you used the popular KFold cross-validation strategy, you would use future data to predict past events. When forecasting, you must avoid leaking future information when making predictions about the past.
Instead, you should define several cross-validation periods and then train a model on all the data before each period [3, 8, 10], e.g., for each week (VALIDATION_PERIOD = 7) of the last month (N_FOLDS = 4).
To put everything together, you can use the following code snippet for reference:
from datetime import datetime, timedelta

import lightgbm as lgb

N_FOLDS = 4
VALIDATION_PERIOD = 7

for store_id in STORES_IDS:
    for fold in range(N_FOLDS):
        training_date = train_df['timestamp'].max() - timedelta(VALIDATION_PERIOD) * (N_FOLDS - fold)
        valid_date = training_date + timedelta(VALIDATION_PERIOD)
        print(f"\nFold {fold}:"
              f"\ntraining data from {train_df['timestamp'].min()} to {training_date}"
              f"\nvalidation data from {training_date + timedelta(1)} to {valid_date}")

        train = train_df[train_df['timestamp'] <= training_date]
        val = train_df[(train_df['timestamp'] > training_date) & (train_df['timestamp'] <= valid_date)]

        X_train = train[features]
        y_train = train[target]
        X_val = val[features]
        y_val = val[target]

        train_data = lgb.Dataset(X_train, label=y_train)
        valid_data = lgb.Dataset(X_val, label=y_val)

        estimator = lgb.train(lgb_params,
                              train_data,
                              valid_sets=[valid_data],
                              verbose_eval=100,
                              )
When evaluating a hierarchical time series forecasting model, it can make sense to create a simple dashboard [9] to analyze the model’s performance on each level.
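A full WRMSSE dashboard is beyond the scope of this article, but as a minimal sketch (assuming a validation dataframe val with the actual sales and a prediction column), you could break the error down by hierarchy level like this:
import numpy as np

# Per-level error breakdown (plain RMSE, not the competition's WRMSSE metric)
val['squared_error'] = (val['sales'] - val['prediction']) ** 2

rmse_per_store = np.sqrt(val.groupby('store_id')['squared_error'].mean())
rmse_per_item = np.sqrt(val.groupby('item_id')['squared_error'].mean())

print(rmse_per_store.sort_values(ascending=False))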
There are many more lessons to be learned from reviewing the learning resources Kagglers have created during the course of the “M5 Forecasting - Accuracy” competition. There are also many different solutions for this type of problem statement.
In this article, we focused on the general approach that was popular among many competitors: formulating the time series forecasting problem as a regression problem, engineering features from the historical data, and then applying an ML model to it.
This article uses synthetic data since the original competition dataset is only available for non-commercial use. The time series used in this article are generated from the sum of a sine wave, a linear function, and a white noise signal.
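For reference, such a synthetic series could be generated roughly as follows (the amplitudes, trend slope, and date range below are illustrative assumptions, not the exact values used in this article):
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range('2022-01-01', periods=365, freq='D')
t = np.arange(len(dates))

# Sum of a sine wave (weekly cycle), a linear trend, and white noise
sales = 10 * np.sin(2 * np.pi * t / 7) + 0.05 * t + rng.normal(0, 2, len(dates))

df = pd.DataFrame({'date': dates, 'sales': sales})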
[1] Alan Lahoud (2020). 5th place solution in Kaggle Discussions (accessed March 7th, 2023)
[2] Chris Miles (2020). Simple model: avg last 28 days grouped by weekday in Kaggle Notebooks (accessed March 6th, 2023)
[3] Eugene Tang (2020). 7th place solution in Kaggle Discussions (accessed March 7th, 2023)
[4] Konstantin Yakovlev (2020). M5 - Simple FE in Kaggle Notebooks (accessed March 7th, 2023)
[5] Konstantin Yakovlev (2020). M5 - Three shades of Dark: Darker magic in Kaggle Notebooks (accessed March 7th, 2023)
[6] LogicAI (2023). Kaggle Days Paris 2022_Jean Francois Puget_Sales forecasting and fraud detection on YouTube (accessed February 21st, 2023)
[7] Matthias (2020). 2nd place solution in Kaggle Discussions (accessed March 7th, 2023)
[8] monsaraida (2020). 4th place solution in Kaggle Discussions (accessed March 7th, 2023)
[9] Tomonori Masui (2020). M5 - WRMSSE Evaluation Dashboard in Kaggle Notebooks (accessed March 7th, 2023)
[10] Yeonjun In (2020). 1st place solution in Kaggle Discussions (accessed March 7th, 2023)