Estacionariedade (Stationarity): Why Historical Averages Are Dangerous for Your Projections
These articles are AI-generated summaries. Please check the original sources for full details.
Many business stakeholders instinctively project future values by averaging past data, but this approach is often flawed for time series. The core issue is Estacionariedade (Stationarity): if a metric has a strong trend, its mean (and often its variance) is not constant over time. The historical average then describes where the series has been rather than where it is heading, and forecasts built on it are inaccurate.
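A minimal numeric sketch of the problem, assuming a hypothetical metric that grows by exactly 2 units per month with no noise: the mean of the past 24 months sits well below where the trend is heading, and the mean itself changes depending on which window you average.

```python
import numpy as np

# Hypothetical monthly metric with a pure linear trend: y_t = 100 + 2t (no noise, for clarity)
t = np.arange(24)
y = 100 + 2 * t

historical_mean = y.mean()        # the "average of past data" projection
first_half_mean = y[:12].mean()   # mean of months 0-11
second_half_mean = y[12:].mean()  # mean of months 12-23: the mean is not constant
next_actual = 100 + 2 * 24        # where the trend actually puts month 24

print(f"Historical mean:   {historical_mean}")
print(f"First/second half: {first_half_mean} / {second_half_mean}")
print(f"Next actual value: {next_actual}")
print(f"Projection error:  {next_actual - historical_mean}")
```

Here the average-based projection (123) is 25 units, about 17%, below the next month's value (148), and the gap widens by another 2 units for every additional month of forecast horizon.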
Why This Matters
In business, relying on simple historical averages for forecasting can lead to significant errors, particularly in financial or e-commerce contexts. Ignoring non-stationarity produces systematically biased predictions, which can cost companies revenue or lead to poor strategic decisions. One study of forecasting accuracy reportedly found that models that failed to account for trends had error rates up to 40% higher than models that did.
Key Insights
- Augmented Dickey-Fuller (ADF) Test: A statistical test for a unit root in a time series. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value below 0.05 rejects that hypothesis and suggests the series is stationary.
- Unit Root & Stochastic Trend: Many business metrics have a "unit root": each value equals the previous value plus a random shock (y_t = y_{t-1} + ε_t), so shocks accumulate into a stochastic trend that drifts away from the historical mean instead of reverting to it.
- SARIMAX Integration (d): The ‘d’ parameter in SARIMAX models is the order of integration, the number of times the series is differenced (y_t - y_{t-1}) before the ARMA terms are fitted, so that the modeled series is stationary.
Working Example
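```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller


def teste_estacionariedade(serie, nome, alpha=0.05):
    """Run the ADF test (H0: the series has a unit root) and suggest d."""
    resultado = adfuller(serie.dropna())
    print(f"Metric: {nome}")
    print(f"ADF statistic: {resultado[0]:.4f}")
    print(f"p-value: {resultado[1]:.4f}")
    if resultado[1] < alpha:
        # H0 rejected: no evidence of a unit root
        print("Result: Stationary (d=0)")
    else:
        # H0 not rejected: difference once (d=1), then re-test
        print("Result: NOT stationary (requires d=1)")
    print("-" * 30)


# Example usage (assuming df_main is your DataFrame)
# Synthetic dummy data (assumed): ADF needs far more than 5 points, so simulate 200
rng = np.random.default_rng(42)
n = 200
df_main = pd.DataFrame({
    # Random walk with drift: has a unit root
    'ticket_medio': 100 + np.cumsum(0.5 + rng.normal(0, 2, n)),
    # Noise around a fixed level: stationary
    'sessoes': 450 + rng.normal(0, 20, n),
})

teste_estacionariedade(df_main['ticket_medio'], 'Average Ticket')
teste_estacionariedade(df_main['sessoes'], 'Sessions')
```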
Practical Applications
- E-commerce Revenue Forecasting: Online retailers can use SARIMAX with appropriate differencing to project future revenue based on historical trends, accounting for seasonality and growth.
- Pitfall: A simple moving average applied to a trending time series lags behind the trend, systematically underestimating future values when the trend is rising and overestimating them when it is falling. The result is inaccurate inventory planning and lost sales.
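The moving-average pitfall can be quantified with a short sketch, assuming two hypothetical, perfectly linear series (one rising, one falling) and a 3-period trailing average as the forecast. For a linear trend with slope b and window k, the one-step error is b(k + 1)/2, so the average always trails the trend in its direction of travel.

```python
import numpy as np


def moving_average_forecast(y, window):
    """Naive forecast: next value = mean of the last `window` observations."""
    return float(np.mean(y[-window:]))


t = np.arange(24)
up = 100 + 2 * t       # hypothetical rising trend (e.g. average ticket)
down = 900 - 20 * t    # hypothetical falling trend (e.g. sessions)

errors = {}
for name, y, slope in [("up", up, 2), ("down", down, -20)]:
    forecast = moving_average_forecast(y, window=3)
    actual_next = y[-1] + slope          # the trend continues one more step
    errors[name] = actual_next - forecast  # positive = underestimate
    print(name, forecast, actual_next, errors[name])
```

With a window of 3, the rising series is underestimated by 2 × 4 / 2 = 4 units and the falling one overestimated by 20 × 4 / 2 = 40 units; a longer window makes the lag worse, not better.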