5.4 Autoregressive Integrated Moving Average (ARIMA) Models

Written by the Fiveable Content Team • Last updated September 2025

ARIMA models combine autoregressive, integrated, and moving average components to forecast time series data. They capture both short-term and long-term dependencies, making them versatile for various forecasting tasks.

The model structure is denoted as ARIMA(p,d,q), where p, d, and q represent the orders of autoregressive, differencing, and moving average terms. This framework allows for flexible modeling of complex time series patterns.

ARIMA Model Structure

General Notation and Assumptions

  • ARIMA models are denoted as ARIMA(p,d,q), where:
    • p represents the order of the autoregressive term
    • d represents the degree of differencing
    • q represents the order of the moving average term
  • ARIMA models assume that future values of a time series depend on:
    • Past values of the series (autoregressive component)
    • Past forecast errors (moving average component)
  • This structure allows ARIMA models to capture both short-term and long-term dependencies in the data
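As an illustration of the ARIMA(p,d,q) structure above, here is a minimal sketch of fitting and forecasting such a model in Python with the statsmodels library (a common choice, not one prescribed by this guide); the series `y` is a synthetic placeholder rather than real data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Toy monthly series: a random walk with drift (non-stationary placeholder data)
y = pd.Series(np.cumsum(rng.normal(0.5, 1.0, 120)),
              index=pd.date_range("2015-01", periods=120, freq="MS"))

# ARIMA(1,1,1): p = 1 AR lag, d = 1 difference, q = 1 MA lag
model = ARIMA(y, order=(1, 1, 1))
res = model.fit()

print(res.summary())           # estimated coefficients, AIC/BIC, diagnostics
print(res.forecast(steps=12))  # point forecasts for the next 12 months
```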

Autoregressive and Moving Average Components

  • The autoregressive component (AR) models the relationship between:
    • An observation
    • A certain number of lagged observations
  • The moving average component (MA) models the relationship between:
    • An observation
    • A certain number of lagged forecast errors (residuals) from previous time steps
  • The orders p and q determine the number of lag terms included in the AR and MA components, respectively
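In equation form (standard textbook notation, with c and μ as constants, φ and θ as the AR and MA coefficients, and ε_t as white-noise errors), the two components can be written as:

```latex
% AR(p): the observation depends on p of its own lagged values
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t

% MA(q): the observation depends on q lagged forecast errors
y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
```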

ARIMA Model Components

Autoregressive Component

  • The autoregressive (AR) component captures the linear dependence between:
    • An observation
    • A certain number of lagged observations
  • The order p determines the number of lag terms included in the AR component
    • Example: In an ARIMA(1,0,0) model, the current observation depends on the immediately preceding observation
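A quick sketch of the AR(1) case described above, simulated with statsmodels; the coefficient value 0.7 is an arbitrary choice for illustration.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

np.random.seed(0)
# AR(1) process: y_t = 0.7 * y_{t-1} + e_t
# ArmaProcess takes lag-polynomial coefficients, so the AR term enters with a minus sign
ar1 = ArmaProcess(ar=[1, -0.7], ma=[1])
y = ar1.generate_sample(nsample=500)

# Autocorrelations of an AR(1) decay geometrically (roughly 0.7, 0.49, 0.34, ...)
print(acf(y, nlags=3))
```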

Moving Average Component

  • The moving average (MA) component captures the linear dependence between:
    • An observation
    • A certain number of lagged forecast errors
  • The order q determines the number of lag terms included in the MA component
    • Example: In an ARIMA(0,0,1) model, the current observation depends on the immediately preceding forecast error
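And a matching sketch of the MA(1) case; the coefficient 0.6 is again arbitrary. The contrast with the AR(1) sketch above is that an MA(1) autocorrelation function cuts off after lag 1 instead of decaying gradually, which is what ACF-based identification relies on.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

np.random.seed(1)
# MA(1) process: y_t = e_t + 0.6 * e_{t-1}
ma1 = ArmaProcess(ar=[1], ma=[1, 0.6])
y = ma1.generate_sample(nsample=500)

# The ACF of an MA(1) is non-zero at lag 1 and approximately zero beyond it
print(acf(y, nlags=3))
```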

Differencing Component

  • The integrated component (I), implemented through differencing, removes non-stationarity from the data by computing differences between consecutive observations
  • The order d determines the number of times the differencing operation is applied
    • Example: First-order differencing (d=1) computes the difference between each observation and its preceding observation
  • Differencing helps eliminate trends (and, when applied at the seasonal lag, seasonal patterns), making the series suitable for ARIMA modeling
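A minimal sketch of first-order differencing with pandas; the series here is a synthetic linear trend used only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
t = np.arange(100)
# Synthetic series with a linear trend plus noise (non-stationary in the mean)
y = pd.Series(2.0 * t + rng.normal(0, 1, 100))

dy = y.diff().dropna()  # first-order differencing (d = 1) removes the linear trend
print(dy.mean())        # roughly equal to the trend slope (about 2.0)
```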

Seasonal ARIMA Models

  • ARIMA models can incorporate seasonal components, denoted as SARIMA(p,d,q)(P,D,Q)_m, where:
    • P, D, and Q represent the orders of the seasonal autoregressive, differencing, and moving average terms, respectively
    • m represents the number of periods per season
  • Seasonal ARIMA models capture both non-seasonal and seasonal patterns in the data
    • Example: A SARIMA(1,1,1)(1,1,1)_12 model describes monthly data with yearly seasonality
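A hedged sketch of fitting the SARIMA(1,1,1)(1,1,1)_12 specification mentioned above using statsmodels' SARIMAX class; the monthly series is a synthetic placeholder with a built-in trend and yearly cycle.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
idx = pd.date_range("2010-01", periods=144, freq="MS")
# Toy monthly series: trend + yearly cycle + noise
y = pd.Series(0.1 * np.arange(144)
              + 5 * np.sin(2 * np.pi * np.arange(144) / 12)
              + rng.normal(0, 1, 144), index=idx)

# Non-seasonal order (p,d,q) = (1,1,1); seasonal order (P,D,Q)_m = (1,1,1)_12
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)
print(res.summary())
```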

ARIMA Models for Forecasting

Model Development Process

  • The development of an ARIMA model involves an iterative process:
    • Model identification: Determine the appropriate orders (p,d,q) based on data characteristics, such as autocorrelation function (ACF) and partial autocorrelation function (PACF) plots
    • Parameter estimation: Fit the identified model to the data using maximum likelihood estimation or other optimization techniques
    • Diagnostic checking: Assess the adequacy of the fitted model by examining residuals for independence, normality, and homoscedasticity
    • Forecasting: Use the fitted model to generate future predictions and prediction intervals
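The four steps above map naturally onto a statsmodels workflow; the sketch below uses a placeholder series and the Ljung-Box test as one common residual diagnostic (other diagnostics, such as normality checks, work similarly).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
y = pd.Series(np.cumsum(rng.normal(0, 1, 200)))  # placeholder series

# 1. Identification: inspect ACF/PACF of the differenced series to pick p and q
plot_acf(y.diff().dropna(), lags=24)
plot_pacf(y.diff().dropna(), lags=24)
plt.show()

# 2. Estimation: fit the candidate model by maximum likelihood
res = ARIMA(y, order=(1, 1, 1)).fit()

# 3. Diagnostic checking: residuals should resemble white noise (large Ljung-Box p-value)
print(acorr_ljungbox(res.resid, lags=[10], return_df=True))

# 4. Forecasting: generate predictions for the next 12 periods
print(res.forecast(steps=12))
```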

Interpreting ARIMA Models

  • Interpreting ARIMA models requires understanding:
    • The significance and magnitude of the estimated coefficients
    • The impact of differencing and seasonal components on the forecasted values
  • The coefficients of the AR and MA terms indicate the strength and direction of the relationship between the current observation and the lagged observations or forecast errors
    • Example: A positive AR coefficient suggests that a higher lagged observation is associated with a higher current observation
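As a sketch, the estimated coefficients and their significance can be read directly from a fitted statsmodels results object; the series below is again a synthetic placeholder.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
y = pd.Series(np.cumsum(rng.normal(0, 1, 200)))  # placeholder series

res = ARIMA(y, order=(1, 1, 0)).fit()
print(res.params)   # e.g. ar.L1: sign and magnitude of the lag-1 relationship
print(res.pvalues)  # p-values: whether each coefficient is statistically significant
```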

Forecasting with ARIMA Models

  • Forecasting with ARIMA models involves using the fitted model to generate future predictions
  • Prediction intervals are used to quantify the uncertainty associated with the forecasts
    • Example: A 95% prediction interval indicates the range within which the actual future value is expected to fall with a 95% probability
  • The accuracy of ARIMA forecasts depends on the quality of the model fit and the stability of the underlying data generating process
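A minimal sketch of generating point forecasts together with 95% prediction intervals; the series is once more a synthetic placeholder.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(11)
y = pd.Series(np.cumsum(rng.normal(0, 1, 200)))  # placeholder series

res = ARIMA(y, order=(1, 1, 1)).fit()
fc = res.get_forecast(steps=12)

print(fc.predicted_mean)        # point forecasts
print(fc.conf_int(alpha=0.05))  # 95% prediction intervals (lower and upper bounds)
```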

Differencing Order for ARIMA Models

Purpose of Differencing

  • Differencing is a technique used to remove non-stationarity in a time series by computing differences between consecutive observations
  • The goal of differencing is to obtain a stationary series suitable for ARIMA modeling
    • Example: If a time series exhibits a linear trend, first-order differencing can remove the trend and make the series stationary

Determining the Appropriate Order of Differencing

  • The appropriate order of differencing (d) can be determined by examining:
    • The plot of the original time series data
    • The ACF plot for signs of non-stationarity (trends or seasonal patterns)
  • Statistical tests, such as the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, can assess the stationarity of a time series
    • The ADF test takes the presence of a unit root (non-stationarity) as its null hypothesis, so a small p-value is evidence of stationarity
    • The KPSS test takes stationarity as its null hypothesis, so a small p-value is evidence of non-stationarity
  • The order of differencing is typically limited to 0, 1, or 2 to avoid over-differencing and the loss of important information in the data
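A sketch of running both tests with statsmodels on a placeholder random-walk series; because the two tests have opposite null hypotheses, their p-values are read in opposite directions.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(21)
y = np.cumsum(rng.normal(0, 1, 300))  # random walk: a non-stationary placeholder

adf_stat, adf_p, *_ = adfuller(y)
kpss_stat, kpss_p, *_ = kpss(y, regression="c", nlags="auto")

# Small ADF p-value  -> reject the unit-root null (evidence of stationarity)
# Small KPSS p-value -> reject the stationarity null (evidence of non-stationarity)
print(f"ADF p-value:  {adf_p:.3f}")
print(f"KPSS p-value: {kpss_p:.3f}")
```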

Limitations of Higher-Order Differencing

  • Higher orders of differencing (d > 2) may lead to over-differencing and the loss of important information in the data
  • Over-differencing can introduce unnecessary complexity and instability in the ARIMA model
    • Example: If a time series is already stationary, differencing it further may create an artificial pattern or introduce additional noise
  • It is essential to balance the need for achieving stationarity with the preservation of meaningful information in the data when determining the appropriate order of differencing
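To make the over-differencing point concrete, the following sketch differences a series that is already stationary (white noise); the extra difference pushes the lag-1 autocorrelation toward -0.5, an artificial pattern created purely by the unnecessary differencing.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(8)
white_noise = rng.normal(0, 1, 1000)  # already stationary: no differencing needed
over_diff = np.diff(white_noise)      # an unnecessary extra difference

print(acf(white_noise, nlags=1)[1])  # close to 0
print(acf(over_diff, nlags=1)[1])    # close to -0.5: a spurious MA-type pattern
```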