Forecasting with Tree-Based Models for Time Series

These articles are AI-generated summaries. Please check the original sources for full details.

Introduction

Decision tree-based models are versatile machine learning tools, commonly used for classification and regression on structured (tabular) data. They can also forecast time series, but only with appropriate feature engineering: a tree treats each row independently and has no built-in notion of time order, so information about the past must be encoded as columns. This article shows how to do that by extracting lagged features and rolling statistics from the raw series.

Building Decision Trees for Time Series Forecasting

The article uses the monthly airline passenger dataset from the sktime library (loaded with load_airline) to demonstrate a practical approach to forecasting with decision trees. The core idea is to turn the time series into a supervised learning problem: each row's target is the value at month t, and its features summarize values observed before t, such as individual past values and recent trends.
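
The sliding-window framing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's code: make_supervised is a hypothetical helper, the input values are toy numbers, and the columns run oldest-first (in the article's version, lag_1 is the most recent value).

```python
import numpy as np


def make_supervised(series, n_lags):
    """Hypothetical helper: turn a 1-D series into (X, y).

    Row j of X holds the n_lags values that precede y[j], oldest first.
    """
    values = np.asarray(series, dtype=float)
    X = np.array([values[i - n_lags:i] for i in range(n_lags, len(values))])
    y = values[n_lags:]
    return X, y


# Toy values for illustration only
series = [112, 118, 132, 129, 121, 135]
X, y = make_supervised(series, n_lags=3)

# X rows: [112, 118, 132], [118, 132, 129], [132, 129, 121]
# y:      [129, 121, 135]
print(X)
print(y)
```

Each row of X becomes one independent training example for the tree; the article's make_lagged_df_with_rolling builds the same structure with pandas shift and adds rolling-statistic columns.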

Key Insights

  • Lagged Features: Creating lagged features allows the model to learn dependencies between past and present values.
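  • Rolling Statistics: A rolling mean and standard deviation over recent values capture the local level and volatility of the series. Computing them on the series shifted by one step keeps the current value out of its own features, which is what prevents data leakage.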
  • sktime Library: Provides convenient access to time series datasets for experimentation and model building.
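
The leakage protection comes from the shift(1) call, not from the rolling statistic itself. A minimal pandas sketch on a toy series (values chosen for illustration) shows the difference: without the shift, the window ending at row t includes y[t], the very value the model is asked to predict.

```python
import pandas as pd

# Toy series for illustration only
y = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Leaky: the window ending at row t includes y[t] itself, i.e. the target
leaky_mean = y.rolling(3).mean()

# Safe: shift first, so the window ending at row t covers y[t-3]..y[t-1] only
safe_mean = y.shift(1).rolling(3).mean()

print(pd.DataFrame({"y": y, "leaky": leaky_mean, "safe": safe_mean}))
```

At the last row, the leaky mean is 40.0 (it averages 30, 40 and the target 50), while the safe mean is 30.0 (it averages only 20, 30 and 40).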

Working Example

import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sktime.datasets import load_airline

# Load the monthly airline passenger dataset (a pandas Series indexed by month)
y = load_airline()

# Build lagged features and rolling statistics for each month
def make_lagged_df_with_rolling(series, lags=12, roll_window=3):
    df = pd.DataFrame({"y": series})
    # lag_k holds the value observed k months before the target
    for lag in range(1, lags + 1):
        df[f"lag_{lag}"] = df["y"].shift(lag)
    # Shift by one before rolling so each window covers only past values
    df[f"roll_mean_{roll_window}"] = df["y"].shift(1).rolling(roll_window).mean()
    df[f"roll_std_{roll_window}"] = df["y"].shift(1).rolling(roll_window).std()
    # Drop the first rows, whose lags or windows reach before the series starts
    return df.dropna()

# Create the feature DataFrame
df_features = make_lagged_df_with_rolling(y, lags=12, roll_window=3)

# Split chronologically (no shuffling): first 80% for training, last 20% for testing
train_size = int(len(df_features) * 0.8)
train, test = df_features.iloc[:train_size], df_features.iloc[train_size:]
X_train, y_train = train.drop(columns="y"), train["y"]
X_test, y_test = test.drop(columns="y"), test["y"]

# Train a decision tree regressor
dt_reg = DecisionTreeRegressor(max_depth=5, random_state=42)
dt_reg.fit(X_train, y_train)

# One-step-ahead predictions: each test row uses the actual observed lags
y_pred = dt_reg.predict(X_test)

# Evaluate the model
print("Decision tree one-step-ahead forecast")
print("MAE:", mean_absolute_error(y_test, y_pred))
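
One property worth keeping in mind with this dataset: a regression tree predicts the mean of the training targets in a leaf, so its forecasts can never rise above the highest value seen in training. The airline series trends upward, so later months tend to exceed earlier ones. The sketch below illustrates this on a synthetic trending series (an assumption, not the airline data) and shows one common workaround that is not part of the original article: predicting the change from the most recent value and adding it back.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic upward-trending monthly series with yearly seasonality (illustrative only)
t = np.arange(120, dtype=float)
series = 100.0 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)

# Sliding-window features: row j holds the 12 values before y[j], oldest first
n_lags = 12
X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
y = series[n_lags:]
split = int(len(y) * 0.8)  # chronological 80/20 split

# 1) Model the level directly: leaf values are averages of training targets,
#    so no prediction can exceed the largest target seen in training
level_model = DecisionTreeRegressor(max_depth=5, random_state=42)
level_model.fit(X[:split], y[:split])
level_pred = level_model.predict(X[split:])

# 2) Model the change from the last observed value, using lags expressed
#    relative to that value, then add the predicted change back
last_value = X[:, -1]            # y[t-1], the most recent lag
diff_target = y - last_value     # y[t] - y[t-1]
X_rel = X - last_value[:, None]  # lags relative to the most recent value
diff_model = DecisionTreeRegressor(max_depth=5, random_state=42)
diff_model.fit(X_rel[:split], diff_target[:split])
diff_pred = last_value[split:] + diff_model.predict(X_rel[split:])

print("max training target:", y[:split].max())
print("max level-model prediction:", level_pred.max())
print("level MAE:", np.abs(y[split:] - level_pred).mean())
print("difference MAE:", np.abs(y[split:] - diff_pred).mean())
```

On a trending test period the level model flattens out at roughly the training maximum, while the difference model can follow the trend because the changes it learns stay within the range seen in training.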

Practical Applications

  • Demand Forecasting: Retail companies can use this approach to predict future product demand based on historical sales data.
  • Pitfall: Letting future information into the features or the evaluation (for example, a rolling window that includes the current value, or a random rather than chronological train/test split) produces overly optimistic performance estimates.
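
A chronological split also keeps model evaluation honest. The sketch below uses scikit-learn's TimeSeriesSplit for walk-forward validation, a technique not used in the original article, on synthetic data (an assumption, so the example runs without sktime). Each fold trains only on rows that come before its test rows, and the training window grows from fold to fold.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and target, assumed to be in time order (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

tscv = TimeSeriesSplit(n_splits=4)
sizes = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Every training index precedes every test index: no future rows leak in
    assert train_idx.max() < test_idx.min()
    model = DecisionTreeRegressor(max_depth=5, random_state=42)
    model.fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    sizes.append((len(train_idx), len(test_idx)))
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)} MAE={mae:.3f}")
```

With the working example's df_features, the same loop applies if X and y are replaced by the feature columns and the y column, indexed with .iloc[train_idx] and .iloc[test_idx].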
