Forecasting with Tree-Based Models for Time Series

Introduction

Decision tree-based models are versatile machine learning tools, commonly used for classification and regression on structured (tabular) data. They can also forecast time series once the series has been recast through appropriate feature engineering. This article shows how to use decision trees for time series forecasting by extracting lagged features and rolling statistics from a raw series.

Building Decision Trees for Time Series Forecasting

The article uses the monthly airline passengers dataset from the sktime library to demonstrate a practical approach to time series forecasting with decision trees. The core idea is to turn the time series into a supervised learning problem: each row pairs a target value with features built only from the values that preceded it, such as lagged observations and recent trend statistics.
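As a minimal illustration of that transformation (a toy series with made-up values, not the airline data), shifting a series by one and two steps turns each remaining row into a pair of features and a target:

```python
import pandas as pd

# A tiny illustrative series (hypothetical values, not from the airline dataset)
series = pd.Series([10, 12, 13, 15, 18], name="y")

# Each row pairs the target y_t with the two previous observations
supervised = pd.DataFrame({
    "y": series,
    "lag_1": series.shift(1),  # value one step back
    "lag_2": series.shift(2),  # value two steps back
}).dropna()  # the first two rows have no complete history, so drop them

print(supervised)
```

Only three rows survive: the first two observations serve purely as history for later targets.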

Key Insights

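Lagged Features: Features holding the values 1 to k steps back let the model learn how the current value depends on past values.
Rolling Statistics: The rolling mean and standard deviation over recent periods summarize the local level and volatility of the series. Computing them on the series shifted by one step keeps the current value out of its own features, which is what prevents data leakage.
sktime Library: Provides convenient access to time series datasets for experimentation and model building.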
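The effect of the one-step shift can be seen on a toy series (illustrative values only). Without the shift, the rolling window at time t includes y_t, the very value the model is supposed to predict:

```python
import pandas as pd

# Illustrative values only
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Leaky: the window ending at t includes y_t itself
leaky_mean = series.rolling(3).mean()

# Safe: shift first, so the window ending at t covers y_{t-3}, y_{t-2}, y_{t-1}
safe_mean = series.shift(1).rolling(3).mean()

print(pd.DataFrame({"y": series, "leaky": leaky_mean, "safe": safe_mean}))
```

At the last row the leaky mean is 4.0, partly built from the target 5.0, while the safe mean is 3.0, built only from 2.0, 3.0 and 4.0.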

Working Example

import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sktime.datasets import load_airline

# Load the monthly airline passenger dataset (a pandas Series indexed by month)
y = load_airline()

# Build lagged features and rolling statistics from past values only
def make_lagged_df_with_rolling(series, lags=12, roll_window=3):
    df = pd.DataFrame({"y": series})
    # lag_k holds the value observed k months earlier
    for lag in range(1, lags + 1):
        df[f"lag_{lag}"] = df["y"].shift(lag)
    # Shift by one before rolling so each window ends at the previous month
    # and never includes the value being predicted
    past = df["y"].shift(1)
    df[f"roll_mean_{roll_window}"] = past.rolling(roll_window).mean()
    df[f"roll_std_{roll_window}"] = past.rolling(roll_window).std()
    # Early rows lack a complete history, so drop them
    return df.dropna()

# Create the feature dataframe
df_features = make_lagged_df_with_rolling(y, lags=12, roll_window=3)

# Split chronologically (no shuffling): first 80% for training, last 20% for testing
train_size = int(len(df_features) * 0.8)
train, test = df_features.iloc[:train_size], df_features.iloc[train_size:]
X_train, y_train = train.drop(columns="y"), train["y"]
X_test, y_test = test.drop(columns="y"), test["y"]

# Train a depth-limited decision tree regressor
dt_reg = DecisionTreeRegressor(max_depth=5, random_state=42)
dt_reg.fit(X_train, y_train)

# One-step-ahead predictions: each test row uses the actual observed lags
y_pred = dt_reg.predict(X_test)

# Evaluate the model with mean absolute error
print("One-step-ahead forecast on the test period")
print("MAE:", mean_absolute_error(y_test, y_pred))
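The example evaluates one-step-ahead forecasts: every test row is built from actually observed lags. Forecasting several months past the end of the data requires feeding each prediction back in as a lag. The sketch below is a hypothetical helper, `recursive_forecast`, not part of the original article; it assumes the model was trained on the columns produced by `make_lagged_df_with_rolling`, in the same order, and the `NaiveLagModel` class is a hypothetical stand-in used only so the sketch runs without a trained tree.

```python
import pandas as pd


def recursive_forecast(model, history, horizon, lags=12, roll_window=3):
    """Hypothetical helper: forecast `horizon` steps by feeding predictions back as lags.

    Assumes `model.predict` accepts a DataFrame with columns
    lag_1..lag_{lags}, roll_mean_{roll_window}, roll_std_{roll_window}.
    """
    values = list(history)  # observed values, extended with each new prediction
    if len(values) < max(lags, roll_window):
        raise ValueError("history is shorter than the required lags")
    forecasts = []
    for _ in range(horizon):
        # lag_k is the value k steps before the step being forecast
        row = {f"lag_{k}": values[-k] for k in range(1, lags + 1)}
        # Same window as shift(1).rolling(w): the last w values before the new step
        window = pd.Series(values[-roll_window:])
        row[f"roll_mean_{roll_window}"] = window.mean()
        row[f"roll_std_{roll_window}"] = window.std()  # sample std (ddof=1), as in rolling().std()
        y_next = float(model.predict(pd.DataFrame([row]))[0])
        forecasts.append(y_next)
        values.append(y_next)  # the prediction becomes lag_1 for the next step
    return forecasts


class NaiveLagModel:
    """Hypothetical stand-in with a predict() method: returns lag_1 plus a fixed step."""

    def __init__(self, step=0.0):
        self.step = step

    def predict(self, X):
        return X["lag_1"].to_numpy() + self.step


print(recursive_forecast(NaiveLagModel(step=1.0), list(range(1, 13)), horizon=3))
```

With the trained tree, a call such as `recursive_forecast(dt_reg, y.to_numpy(), horizon=12)` would produce a 12-month forecast. Errors compound as predictions replace observations, and a regression tree's predictions are averages of training targets, so it cannot forecast values above or below the range seen during training.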

Practical Applications

Demand Forecasting: Retail companies can use this approach to predict future product demand from historical sales data.
Pitfall: Letting future information into the features (data leakage), for example by computing rolling statistics that include the target value or by shuffling rows before the train/test split, leads to overly optimistic performance estimates.
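A single 80/20 split gives one estimate from one period. One way to get a more robust estimate without leakage is walk-forward (expanding-window) evaluation, where each fold trains only on data before its test block. The generator below is a hypothetical hand-rolled sketch; scikit-learn's `TimeSeriesSplit` provides similar expanding-window splits.

```python
def expanding_window_splits(n_samples, n_splits=3, test_size=12):
    """Hypothetical helper: yield (train_indices, test_indices) for walk-forward evaluation.

    Each fold trains on every row before its test block, so the model never
    sees observations from the period it is evaluated on.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for fold in range(n_splits):
        test_start = first_test_start + fold * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


for train_idx, test_idx in expanding_window_splits(60, n_splits=3, test_size=12):
    print(f"train rows 0..{train_idx[-1]}, test rows {test_idx[0]}..{test_idx[-1]}")
```

Applied to the example, each fold would fit the tree on `df_features.iloc[train_idx]` and score it on `df_features.iloc[test_idx]`, and the per-fold MAE values could then be averaged.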

References:

https://machinelearningmastery.com/forecasting-the-future-with-tree-based-models-for-time-series/
