Fiveable

💻 Advanced R Programming Unit 9 Review


9.3 ARIMA and SARIMA models

Written by the Fiveable Content Team • Last updated September 2025

Time series analysis is all about predicting the future based on past patterns. ARIMA and SARIMA models are powerful tools that help us do this by breaking down complex data into simpler parts we can understand and work with.

These models look at how past values and random shocks affect current data. By understanding these relationships, we can make better predictions and gain insights into what's driving changes in our time series data.

ARIMA Model Components

Autoregressive (AR) Component

  • Models the relationship between an observation and a certain number of lagged observations
  • Current value is a linear combination of previous values plus an error term
  • Order of AR model, AR(p), represents the number of lagged observations included
  • Captures the memory or persistence in the time series (current value influenced by past values)
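
These ideas can be sketched with base R's `arima.sim()` and `arima()`; the AR(2) coefficients below are chosen purely for illustration:

```r
set.seed(42)
# Simulate 500 observations from an AR(2) process:
#   x_t = 0.6 * x_{t-1} - 0.3 * x_{t-2} + e_t
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)

# Fit an AR(2) model, i.e. ARIMA(2, 0, 0); the estimated ar1 and ar2
# coefficients should land near the true values 0.6 and -0.3
fit <- arima(x, order = c(2, 0, 0))
coef(fit)
```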

Integrated (I) Component

  • Represents the number of differencing operations applied to make a time series stationary
  • Differencing computes the differences between consecutive observations
  • Removes trends or seasonality from the time series
  • Order of integration, I(d), indicates the number of times the series needs to be differenced to achieve stationarity
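
In R, differencing is a one-liner with `diff()`; a sketch using the built-in monthly `AirPassengers` series:

```r
# Log transform stabilizes the variance before differencing
ap <- log(AirPassengers)

d1  <- diff(ap)            # d = 1: differences of consecutive observations
d12 <- diff(ap, lag = 12)  # seasonal differencing for monthly data (m = 12)

# Each difference shortens the series by the lag used
length(ap); length(d1); length(d12)
```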

Moving Average (MA) Component

  • Models the relationship between an observation and past forecast errors
  • Captures the short-term fluctuations or shocks in the time series
  • Order of MA model, MA(q), represents the number of lagged forecast errors included
  • Captures the impact of past random shocks or disturbances on the current value of the series
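
A minimal MA(1) sketch, again on simulated data with an arbitrary θ of 0.8:

```r
set.seed(1)
# Simulate x_t = e_t + 0.8 * e_{t-1}: each value carries last period's shock
x <- arima.sim(model = list(ma = 0.8), n = 500)

# Fit an MA(1) model, i.e. ARIMA(0, 0, 1); ma1 should be near 0.8
fit <- arima(x, order = c(0, 0, 1))
coef(fit)
```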

Combining AR, I, and MA Components

  • ARIMA models combine AR, I, and MA components to capture both short-term and long-term dynamics
  • Notation ARIMA(p, d, q) represents the orders of AR, I, and MA components, respectively
  • Assumes the time series is stationary after differencing
  • Assumes the residuals are uncorrelated and normally distributed
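
Putting the pieces together, base R's `arima()` fits the full model; the (1, 1, 1) order here is illustrative, not a recommendation for this series:

```r
# order = c(p, d, q): one AR term, one difference, one MA term
fit <- arima(log(AirPassengers), order = c(1, 1, 1))
fit  # prints coefficients with standard errors, sigma^2, log likelihood, AIC
```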

ARIMA Model Selection

Assessing Stationarity

  • Stationarity is a crucial assumption for ARIMA modeling
  • Stationary time series has constant mean and variance, and autocovariance that depends only on the lag, not on time
  • Visual inspection of time series plot provides initial insights into stationarity (constant level, no trends or seasonality)
  • Statistical tests (Augmented Dickey-Fuller test, Kwiatkowski-Phillips-Schmidt-Shin test) formally assess stationarity
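
Both tests are available in the `tseries` package (assumed installed); a sketch on the log airline series:

```r
library(tseries)  # provides adf.test() and kpss.test()

x <- log(AirPassengers)
adf.test(x)        # H0: unit root (non-stationary); small p-value favors stationarity
kpss.test(x)       # H0: stationary; small p-value suggests non-stationarity
adf.test(diff(x))  # the first-differenced series is typically much closer to stationary
```

Note the two tests have opposite null hypotheses, so they complement each other.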

Identifying AR and MA Orders

  • Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots identify appropriate orders of AR and MA components
  • ACF measures correlation between a time series and its lagged values
  • PACF measures correlation between a time series and its lagged values after removing the effect of intermediate lags
  • For AR(p) process, PACF plot shows significant spikes up to lag p, ACF plot decays gradually
  • For MA(q) process, ACF plot shows significant spikes up to lag q, PACF plot decays gradually
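
These signatures are easy to see on simulated data; here an AR(2), where the PACF should cut off after lag 2 while the ACF tails off:

```r
set.seed(7)
ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)

op <- par(mfrow = c(1, 2))
acf(ar2, main = "ACF: gradual decay")        # AR signature
pacf(ar2, main = "PACF: cuts off at lag 2")  # spikes at lags 1-2 only
par(op)
```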

Seasonal ARIMA (SARIMA) Models

  • SARIMA models are used when the time series exhibits seasonal patterns
  • Incorporate seasonal AR, I, and MA components to capture non-seasonal and seasonal dynamics
  • Notation SARIMA(p, d, q)(P, D, Q)[m] gives the orders of the non-seasonal and seasonal components, where m is the number of periods per season (e.g., m = 12 for monthly data)
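
`arima()` handles the seasonal part through its `seasonal` argument; the classic "airline model", SARIMA(0, 1, 1)(0, 1, 1)[12], as a sketch:

```r
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),                             # (p, d, q)
             seasonal = list(order = c(0, 1, 1), period = 12))  # (P, D, Q), m
fit  # coefficients: ma1 (theta) and sma1 (seasonal Theta)
```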

Model Selection Criteria

  • Information criteria (Akaike Information Criterion, Bayesian Information Criterion) compare and select among different ARIMA or SARIMA models
  • Lower values of AIC or BIC indicate better model fit while penalizing model complexity
  • Principle of parsimony suggests choosing the simplest model that adequately captures the dynamics of the time series
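
A small grid comparison by AIC and BIC, holding the seasonal part fixed for brevity (the `forecast` package's `auto.arima()` automates a broader search):

```r
y <- log(AirPassengers)
for (ord in list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1))) {
  fit <- arima(y, order = ord,
               seasonal = list(order = c(0, 1, 1), period = 12))
  cat(sprintf("ARIMA(%d,%d,%d)(0,1,1)[12]  AIC = %.1f  BIC = %.1f\n",
              ord[1], ord[2], ord[3], AIC(fit), BIC(fit)))
}
```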

ARIMA Model Parameter Interpretation

Maximum Likelihood Estimation (MLE)

  • Commonly used to estimate the parameters of ARIMA and SARIMA models
  • Finds parameter values that maximize the likelihood of observing the given data under the assumed model
  • Likelihood function measures the probability of observing the data given the model parameters

Interpretation of ARIMA Parameters

  • AR parameters (φ) represent weights assigned to lagged observations in the autoregressive component (capture persistence or memory)
  • MA parameters (θ) represent weights assigned to lagged forecast errors in the moving average component (capture impact of past shocks or disturbances)
  • Constant term (c) represents the mean of the differenced series

Interpretation of SARIMA Parameters

  • Seasonal AR parameters (Φ) capture the seasonal persistence or memory in the series
  • Seasonal MA parameters (Θ) capture the impact of past seasonal shocks or disturbances
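
The coefficient names that `arima()` reports map onto these symbols; a sketch with one term of each kind:

```r
fit <- arima(log(AirPassengers), order = c(1, 1, 1),
             seasonal = list(order = c(1, 1, 1), period = 12))
coef(fit)
# ar1  -> phi_1   (non-seasonal persistence)
# ma1  -> theta_1 (weight on the last shock)
# sar1 -> Phi_1   (seasonal persistence)
# sma1 -> Theta_1 (weight on the last seasonal shock)
```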

Statistical Significance of Parameters

  • Assessed using t-tests or confidence intervals
  • Significant parameters indicate the corresponding component (AR, MA, seasonal AR, or seasonal MA) is important in explaining the dynamics of the time series
  • Non-significant parameters may suggest model simplification by removing the corresponding component
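
The fitted object stores the coefficient covariance matrix in `$var.coef`, from which approximate z statistics and p-values can be computed by hand:

```r
fit <- arima(log(AirPassengers), order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
est <- coef(fit)
se  <- sqrt(diag(fit$var.coef))  # standard errors of the estimates
z   <- est / se
p   <- 2 * pnorm(-abs(z))        # approximate two-sided p-values
round(cbind(estimate = est, se = se, z = z, p.value = p), 3)
```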

ARIMA Model Diagnostics

Residual Analysis

  • Crucial to assess the adequacy of the fitted ARIMA or SARIMA model
  • Residuals should be uncorrelated, normally distributed, and have constant variance (homoscedasticity)
  • Visual inspection of residual plots (residual vs. fitted values, residual ACF and PACF plots) reveals patterns or deviations from assumptions
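
A standard four-panel residual check in base graphics (the `forecast` package's `checkresiduals()` is a packaged alternative):

```r
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
res <- residuals(fit)

op <- par(mfrow = c(2, 2))
plot(res, main = "Residuals vs. time")  # look for constant variance, no pattern
acf(res,  main = "Residual ACF")        # spikes beyond the bands: leftover structure
pacf(res, main = "Residual PACF")
qqnorm(res); qqline(res)                # points near the line: roughly normal
par(op)
```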

Ljung-Box Test

  • Assesses the presence of residual autocorrelation
  • Examines the joint significance of residual autocorrelations up to a specified lag
  • Significant Ljung-Box test suggests the presence of residual autocorrelation (model may not adequately capture the dynamics of the series)
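
In R this is `Box.test()` with `type = "Ljung-Box"`; `fitdf` subtracts the number of estimated ARMA coefficients from the degrees of freedom:

```r
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
# fitdf = 2 because two coefficients (ma1, sma1) were estimated
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 2)
lb  # a large p-value: no evidence of residual autocorrelation
```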

Addressing Residual Autocorrelation

  • Increase the orders of AR or MA components to capture the remaining autocorrelation
  • Include additional seasonal components if residual autocorrelation exhibits seasonal patterns
  • Consider using more advanced models (ARIMAX, SARIMAX) that incorporate exogenous variables to explain the remaining variation in the series
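
Base R's `arima()` already accepts exogenous regressors through its `xreg` argument, which gives a basic ARIMAX; the `ads` predictor below is hypothetical, simulated only to show the mechanics:

```r
set.seed(3)
y   <- log(AirPassengers)
ads <- ts(rnorm(length(y)), start = start(y), frequency = 12)  # hypothetical regressor

fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             xreg = ads)
coef(fit)  # ma1, sma1, plus a regression coefficient for the xreg column
```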

Reassessing Stationarity Assumptions

  • Reassess stationarity assumptions after fitting the model
  • If the differenced series used for modeling is not stationary, additional differencing may be necessary
  • Residuals of the fitted model should also be stationary (if not, model may not capture long-term dynamics adequately)

Avoiding Overfitting

  • Follow the principle of parsimony to avoid overfitting
  • Overfitted model may have a good fit to the training data but poor generalization to new data
  • Regularization techniques (adding a penalty term to the likelihood function) can prevent overfitting by shrinking parameter estimates towards zero