Cracking the Code of Complex Time Series with SARIMA
Understanding the Model That Combines Seasonality and Trends
SARIMA is an evolution of the well known ARIMA, and it combines the strengths of seasonal adjustments and trend analysis to provide more accurate and insightful data interpretation.
In this article, we’ll explore the fundamentals of SARIMA, how it works, and why it’s an essential tool for anyone dealing with complex time series data using Python.
The Foundations of ARIMA
ARIMA stands for autoregressive integrated moving average. It is a popular and widely used statistical model for analyzing and forecasting time series data. The ARIMA model combines three key components: autoregression (AR), differencing (I for integrated), and moving average (MA). In further details, it is as follows:
The autoregressive part (AR) of the model specifies that the output variable depends linearly on its own previous values. It is denoted by p, which is the number of lag observations included in the model.
The integrated part (I) of the model involves differencing the data series to make it stationary. Stationarity implies that the properties of the series do not depend on the time at which the series is observed. It is denoted by d, which is the number of times the raw observations are differenced.
The moving average part (MA) specifies that the output variable depends linearly on the past error terms. It is denoted by q, which is the size of the moving average window.
Hence, the ARIMA model is simply a function of p, q, and d:
The three above parameters are determined in the data preparation step by using the autocorrelation function (ACF) and the partial autocorrelation function (PACF). How is this done? Well, first let’s define both functions:
The autocorrelation function (ACF) measures the correlation between observations of a time series separated by different time lags. It helps in identifying the strength and direction of the relationship between the observations at different lags. A gradual decline in autocorrelations (i.e., “decay”) suggests non-stationarity in the series.
Significant spikes in the ACF plot indicate strong correlations at specific lags.
The partial autocorrelation function (PACF) measures the correlation between observations of a time series separated by a certain lag, controlling for the values of the time series at shorter lags. It helps in identifying the direct effect of a lag on the series, without the influence of shorter lags. A sharp drop after a few lags suggests a cut-off point, indicating the order of the autoregressive part.
Significant spikes in the PACF plot indicate the lags that have a direct effect on the series.
The parameter d represents the number of differences needed to make the time series stationary (generally, it is 1). A stationary series has a constant mean and variance over time. To check for stationarity, we can use the ADF test.
The parameter p represents the number of lag observations included in the model. We have to look for the lag at which the PACF plot cuts off sharply (i.e., where the partial autocorrelations become insignificant). This lag suggests the order of the autoregressive part.
The parameter q represents the size of the moving average window. Look for the lag at which the ACF plot cuts off sharply (i.e., where the autocorrelations become insignificant). This lag suggests the order of the moving average part.
If you want to see more of my work, you can visit my website for the books catalogue by simply following this link:
Coding ARIMA in Python
Let’s take an example to clarify all of this. We will generate a synthetic time series, calculate the three parameters, apply the ARIMA model and forecast the future values. Use the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
# Generate synthetic data
date_range = pd.date_range(start="2020-01-01", periods=365, freq="D")
data = pd.DataFrame({
"date": date_range,
"value": np.sin(np.arange(len(date_range)) * 2 * np.pi / 30) + np.random.normal(0, 0.1, size=len(date_range))
})
data.set_index("date", inplace=True)
# Plot the data
plt.figure(figsize=(10, 6))
plt.plot(data, color = 'black')
plt.title("Synthetic Time Series Data")
plt.legend()
plt.show()
plt.grid()
Next, check for the stationarity of the above data. If the p-value is greater than 0.05, then it is not stationary. Afterwards, difference the data, to make it stationary and plot the new graph:
# Check for stationarity using the Augmented Dickey-Fuller test
result = adfuller(data['value'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:', result[4])
# If non-stationary, difference the data
data_diff = data['value'].diff().dropna()
# Plot differenced data
plt.figure(figsize=(10, 6))
plt.plot(data_diff, color = 'black')
plt.title("Differenced Time Series Data")
plt.legend()
plt.show()
plt.grid()
Plot the ACF and PACD graphs:
# Plot ACF and PACF to identify p and q
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
plot_acf(data_diff, ax=axes[0])
plot_pacf(data_diff, ax=axes[1])
plt.show()
Fit and predict using ARIMA:
# Define the ARIMA model with identified parameters (example: p=1, d=1, q=1)
model = ARIMA(data['value'], order=(1, 1, 1))
model_fit = model.fit()
# Print model summary
print(model_fit.summary())
# Forecast
forecast = model_fit.forecast(steps=30)
forecast_index = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=30, freq="D")
# Plot the forecast
plt.figure(figsize=(10, 6))
plt.plot(data['value'], color = 'black', label='Original Data')
plt.plot(forecast_index, forecast, label='Forecast', color='red')
plt.title("ARIMA Forecast")
plt.legend()
plt.show()
plt.grid()
Obviously, ARIMA (1, 1, 1) comes from the fact that we differenced the data once, and the ACF/PCF cut off sharply at 1.
Coding SARIMA in Python
SARIMA, which stands for seasonal autoregressive integrated moving average, is an advanced form of the ARIMA model. It is used for time series forecasting, especially when your data exhibits seasonality that might influence the target variable.
The seasonal order (p, d, q, s) in SARIMA is specifically designed to model and remove the seasonal patterns from the data. It assumes these patterns repeat over fixed intervals (e.g., yearly, monthly).
We will redo the previous example but include a seasonal pattern with an order of 30 days (as the data is supposed to be daily).
Fit and predict using SARIMA:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Generate synthetic data
date_range = pd.date_range(start="2020-01-01", periods=365, freq="D")
data = pd.DataFrame({
"date": date_range,
"value": np.sin(np.arange(len(date_range)) * 2 * np.pi / 30) + np.random.normal(0, 0.1, size=len(date_range))
})
data.set_index("date", inplace=True)
# Specify the SARIMA model (30-day seasonal monthly pattern)
model = SARIMAX(data['value'],
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 30))
# Fit the model
model_fit = model.fit(disp=False)
# Make a forecast for the next 50 days
forecast = model_fit.forecast(steps=50)
forecast_index = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=50, freq="D")
plt.figure(figsize=(10, 6))
plt.plot(data['value'], color = 'black', label='Original Data')
plt.plot(forecast_index, forecast, label='Forecast', color='red')
plt.title("ARIMA Forecast")
plt.legend()
plt.show()
plt.grid()
Notice how the SARIMA model succeeds in capturing the essence of the time series, and provides a better interpretation. The forecast period is 50 days forward in this example.
You can also check out my other newsletter The Weekly Market Sentiment Report that sends weekly directional views every weekend to highlight the important trading opportunities using a mix between sentiment analysis (COT report, put-call ratio, etc.) and rules-based technical analysis.
If you liked this article, do not hesitate to like and comment, to further the discussion!