Intro to the Autoregressive Model
What is an Autoregressive Model?
An autoregressive (AR) model predicts future behavior based on past behavior. It is used for forecasting when values in a time series are correlated with the values that precede them. You only use past data from the same series to model the behavior, hence the name autoregressive (the Greek prefix auto- means “self”). The process is a linear regression of the current value of the series against one or more past values of the same series.
What you will learn.
- Deep understanding of the AR process.
- Effect of coefficients on AR1 series.
- Summary of AR1 process.
- How to find the best lag for your time series.
- How to model the Autoregressive series.
Understanding AR processes.
Let's create some AR1 processes. An AR1 process is correlated with its most recent lag. I created several AR1 processes of 1000 samples each, varying Φ1 to see what effect it has on the series. The AR1 process is defined by the formula below.
yt = c + Φ1 · yt-1 + εt
· Where c is some constant.
· Φ1 is the coefficient of the AR1 process.
· yt is the value at the current time index.
· yt-1 is the value at the first lag (the previous time index).
· εt is the error term, which is normally distributed (white noise).
Here is the code that I used to create the AR1 process. It is only one of many ways to create one.
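A minimal sketch of one way to generate such a series, assuming NumPy. The function name simulate_ar1, the seed, the noise scale and the start-up value are illustrative choices, not necessarily those used for the plots in this post.

```python
import numpy as np

def simulate_ar1(phi1, c=0.0, n_samples=1000, sigma=1.0, seed=0):
    """Simulate y_t = c + phi1 * y_{t-1} + eps_t with Gaussian noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(loc=0.0, scale=sigma, size=n_samples)
    y = np.zeros(n_samples)
    # Assumption: there is no value before the first sample, so it is treated as 0.
    y[0] = c + eps[0]
    for t in range(1, n_samples):
        y[t] = c + phi1 * y[t - 1] + eps[t]
    return y

# AR1 series with 1000 samples and phi1 = 0.9
y = simulate_ar1(phi1=0.9)
print(y[:5])
```

Changing phi1 in the call is enough to reproduce the different regimes discussed next.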
Understanding how different values of Φ1 affect the time series (AR1).
Case Φ1 in [0, 1]
when Φ1 = 0
The series behaves like white noise (shifted by the constant c). It has no autocorrelation with its lagged values.
when Φ1 = 0.9
There is a strong correlation with the lag-one value. The series drifts up and down around its mean in slow swings, which can look like a repeating pattern in the data.
when Φ1 = 1
The time series is a random walk with drift, so it grows linearly on average when c > 0. All we are doing is adding the constant c and white noise to the previous value.
Case Φ1 > 1
when Φ1 = 1.2
The time series grows exponentially. Each value is on average 20% larger than the previous one, hence the exponential growth.
Case Φ1 < -1
when Φ1 = -1.2
The magnitude still grows exponentially, but the series alternates between negative and positive values at each step.
Summary of AR1 process.
|Φ1| > 1: The series grows exponentially in magnitude; the time series is not stationary.
Φ1 = 1 and c > 0: The series grows linearly on average; the time series is not stationary.
|Φ1| < 1: The series varies around its mean value; the time series is stationary.
Φ1 < 0: The series tends to alternate from one step to the next, between positive and negative values when c = 0.
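The cases summarized above can be checked numerically. The sketch below is an illustration under assumptions: simulate_ar1 and lag1_autocorr are hypothetical helpers, c = 1 and the seed are arbitrary, and the explosive cases use only 100 samples to keep the numbers readable. For |Φ1| < 1 it also prints the theoretical mean of a stationary AR1 process, c / (1 - Φ1), next to the sample mean.

```python
import numpy as np

def simulate_ar1(phi1, c=0.0, n_samples=1000, seed=0):
    """Hypothetical helper: y_t = c + phi1 * y_{t-1} + eps_t, eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n_samples)
    y = np.zeros(n_samples)
    y[0] = c + eps[0]  # assumed start-up: the value before the first sample is 0
    for t in range(1, n_samples):
        y[t] = c + phi1 * y[t - 1] + eps[t]
    return y

def lag1_autocorr(y):
    """Sample correlation between the series and itself shifted by one step."""
    return np.corrcoef(y[:-1], y[1:])[0, 1]

# |phi1| < 1: the series stays around its mean c / (1 - phi1)
stationary = {}
for phi1 in (0.0, 0.9, -0.9):
    y = simulate_ar1(phi1, c=1.0)
    stationary[phi1] = (lag1_autocorr(y), y.mean(), 1.0 / (1.0 - phi1))
    print(f"phi1={phi1:+.1f}  lag-1 corr={stationary[phi1][0]:+.2f}  "
          f"sample mean={stationary[phi1][1]:.2f}  theory={stationary[phi1][2]:.2f}")

# |phi1| >= 1: the series does not settle around a fixed mean
final_values = {}
for phi1 in (1.0, 1.2, -1.2):
    y = simulate_ar1(phi1, c=1.0, n_samples=100)
    final_values[phi1] = y[-2:]
    print(f"phi1={phi1:+.1f}  last two values={final_values[phi1]}")
```

For Φ1 = -1.2 the last two values should have opposite signs, which is the step-to-step alternation described above.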
Similarly, we have the AR(p) process, where the time series depends on more than one lag. It is defined as
yt = c + Φ1 · yt-1 + Φ2 · yt-2 + … + Φp · yt-p + εt
Note that the Φ1 behavior described above does not carry over to AR(p) processes; there the relationship between the coefficients is much more complex.
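To make that "more complex relationship" concrete: the standard condition is that an AR(p) process is stationary when all roots of 1 - Φ1·z - … - Φp·z^p lie outside the unit circle. The sketch below checks this with NumPy; is_stationary is a hypothetical helper name and the coefficient values are arbitrary examples.

```python
import numpy as np

def is_stationary(phis):
    """True if 1 - phi_1 z - ... - phi_p z^p has all roots outside the unit circle."""
    phis = np.asarray(phis, dtype=float)
    # np.roots expects coefficients ordered from the highest power to the constant.
    poly = np.concatenate((-phis[::-1], [1.0]))
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.9]))        # AR1 with |phi1| < 1
print(is_stationary([1.2]))        # AR1 with phi1 > 1
print(is_stationary([0.5, 0.6]))   # AR2: both coefficients below 1
print(is_stationary([1.2, -0.5]))  # AR2: first coefficient above 1
```

The two AR2 examples show why the AR1 rules cannot be applied coefficient by coefficient: [0.5, 0.6] is not stationary even though both coefficients are below 1, while [1.2, -0.5] is stationary even though Φ1 > 1.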
Now that you have a good understanding of autoregressive processes, let's discuss how this knowledge helps when fitting an autoregressive model.
When using an AR model, the first question is whether your series is autocorrelated with its past values. The next question is how many lags to include: is the data correlated with multiple lags? Let's try to answer these.
How many lags should I take?
Domain knowledge:
For example, suppose you know from domain knowledge that the stock price of company x depends heavily on the previous day's price. Then you can train an AR1 model to forecast the future stock price.
Statistical technique:
PACF function
In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags.
Using the PACF plot, we can see at which lags the series is autocorrelated. Let's run the PACF plot on the AR1 process that we generated earlier.
There is a peak at lag zero and at lag one. The peak at lag zero only tells us that the series is perfectly correlated with itself. The peak at lag one tells us that the data is highly autocorrelated with its previous value. So, from the PACF plot, we can determine the lags at which the series is highly autocorrelated: here, lag one.
To summarise, we created an AR1 process, computed its PACF to find the lags at which the series is correlated, and concluded that it is correlated at lag one.
Autoregressive Modeling.
Let's train the AutoReg model from the statsmodels library with a lag of one on the AR1 series we created earlier and see how well the model fits the data.
From the summary statistics, we can see that all the parameters are statistically significant. y.L1 is the estimated coefficient at lag one. Its value, 0.91, is close to the Φ1 (0.9) that we used to create the data, which shows that the AR model recovered the coefficient from the data very well.
Conclusion
First, we learned what autoregressive processes are: we created our own AR processes with different coefficients and saw how changing the coefficients changes the data. Then we looked at how to find the best lag for autoregressive modeling, using domain knowledge and the PACF. Finally, we learned how to fit an AR model to a series in order to forecast future time steps.
References
https://www.statisticshowto.com/autoregressive-model/
https://machinelearningmastery.com/autoregression-models-time-series-forecasting-python/