
Lecture 4

Characterizing and Modeling Cycles

EC4304, Semester 1 AY 2024/2025

1
The Cycle Component

• Recall that it can be useful to represent the mean of a time


series as a sum of 3 components:

E(Yt+h |Ωt ) = Tt + St + Ct
I Trend
I Seasonal
I Cycle

• In this lecture, we will talk about how we think about the last
(and the most complicated) component: Cycle.
• “Cycle” is whatever persistent dynamics that remain after
accounting for trend and seasonality.

2
The Time Series Processes

• So far we dealt with simple deterministic models of trend


and seasonality.
• The cyclical component will be represented by some
stochastic time series process.
• The properties this process possesses will be crucial for
modeling and forecasting considerations.
I E.g., if the past is not connected with the future, or the
probabilistic structure keeps changing with time, we are
doomed to fail.

3
Mean and Variance Stationarity

• A time series Yt is mean stationary if

E(Yt ) = µ, ∀t.
I Counter-example: A time series with a linear trend is not mean
stationary since its mean depends on time.
• A time series Yt is variance stationary if

Var(Yt ) = σ 2 , ∀t.
I Counter-example: a series with variance that trends (increases)
with time is not variance stationary.
• Thus, we assume Ct is both mean and variance stationary.

4
Autocovariance

• Unlike in iid data, in time series we need to account for the


covariance structure of Yt with itself at different time
displacements.
• This is done through the autocovariance function.
• The k-th order autocovariance of Yt is:

γ(t, k) = cov(Yt , Yt−k )


= E{[Yt − E(Yt )][Yt−k − E(Yt−k )]}
= E[(Yt − µ)(Yt−k − µ)] (mean stationary)

• Note that γ(t, 0) = Var(Yt ) = σ 2 .

5
Autocorrelation

• To better understand the linear dependence of a series on its


past values, we normalize the autocovariance function to lie
between -1 and 1, which gives the autocorrelation function.
• The k-th order autocorrelation of Yt is:
ρ(t, k) = cov(Yt , Yt−k ) / √(Var(Yt ) Var(Yt−k ))
        = cov(Yt , Yt−k ) / Var(Yt )   (variance stationary)

6
Covariance Stationarity

• A time series Yt is covariance stationary if its mean,


variance, and the autocovariance function do NOT depend on
time. Also, its 2nd moment (variance) is finite. That is,
I E(Yt ) = µ, Var(Yt ) = σ², γ(t, k) = γ(k), and E(Yt²) < ∞.
• Also called weak stationarity, wide-sense stationarity, or
second-order stationarity.
• “Stationary” typically means covariance stationarity in books.
• We will assume Ct is covariance stationary.

7
Covariance Stationarity: Summary
If Yt is covariance stationary, then

• E(Yt ) = µ and E(Yt²) < ∞, ∀t


• Var(Yt ) = σ 2 , ∀t
• γ(t, k) = Cov(Yt , Yt−k ) = γ(k) ∀t.

And its autocovariance function has the following properties.

• γ(0) = Cov(Yt , Yt ) = Var(Yt ) = σ 2


• γ(k) = γ(−k), ∀k
• |γ(k)| ≤ γ(0), ∀k

As for the autocorrelation function,


• ρ(k) = γ(k)/γ(0), ρ(0) = 1, and −1 ≤ ρ(k) ≤ 1, ∀k
• ρ(k) = ρ(−k), ∀k
8
Cov. Stationary Process Example: US SA
Unemp. Rate 1/4
Figure 1: Seasonally Adjusted US Unemployment Rate

9
Cov. Stationary Process Example: US SA
Unemp. Rate 2/4
Figure 2: Monthly US Unemployment Rate vs. Its 4 Lags

10
Cov. Stationary Process Example: US SA
Unemp. Rate 3/4
Autocorrelations

Does this look covariance stationary?

. correlate UNRATE L(1/6).UNRATE


(obs=913)

| L. L2. L3. L4. L5. L6.


| UNRATE UNRATE UNRATE UNRATE UNRATE UNRATE UNRATE
-------+---------------------------------------------------------------
UNRATE |
--. | 1.0000
L1. | 0.9699 1.0000
L2. | 0.9377 0.9700 1.0000
L3. | 0.9080 0.9378 0.9699 1.0000
L4. | 0.8766 0.9080 0.9378 0.9699 1.0000
L5. | 0.8485 0.8767 0.9080 0.9378 0.9699 1.0000
L6. | 0.8194 0.8486 0.8768 0.9080 0.9377 0.9699 1.0000

11
Cov. Stationary Process Example: US SA
Unemp. Rate 4/4
Autocorrelation Plot (Correlogram)

12
What is “Strong” Stationarity Then?

• Yt is a strictly stationary time series if its unconditional joint


probability distribution does NOT change when shifted in time.
I Formally, the joint probability distribution of {Yt1 , Yt2 , . . . Ytn }
is identical to that of {Yt1 +k , Yt2 +k , . . . Ytn +k }, for any
{t1 , t2 , . . . tn } and any k.
• The entire probabilistic structure of a strictly stationary process
is invariant under a time displacement (very stringent condi-
tion!).
• An in-between notion: weaker than strict stationarity, but
potentially stronger than weak stationarity.
I Stationarity up to order m (roughly speaking, all joint
moments up to m exist and stay constant over time).

13
White Noise
• A (zero-mean) white noise process has zero autocorrelations:
ρ(k) = 0 for k > 0.
I For example,
Yt = εt ,   εt ∼ (0, σ²)
where εt is uncorrelated over time, i.e., Cov(εt , εs ) = 0 for any
t ≠ s.∗ We can also write: Yt ∼ WN(0, σ²)
• Since a WN process is serially uncorrelated, it is linearly unfore-
castable.
• Gaussian white noise is an important special case

εt ∼ N(0, σ²).

• Basic building block of any time series model. Good approxi-


mation for asset returns and some growth rates.

∗ Note that it is not necessarily iid.

14
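As an illustrative aside (not from the slides), here is a minimal Stata sketch of how a Gaussian white noise series like the one plotted on the next slide could be simulated and inspected; the sample size of 500 and the seed are arbitrary assumptions.

clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen y = rnormal(0, 1)     // Gaussian white noise: y_t = eps_t, eps_t ~ N(0,1)
tsline y                  // time plot; should show no systematic pattern
corrgram y, lags(10)      // sample autocorrelations should all be close to zero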
Gaussian White Noise Simulation

Figure 3: Yt = εt , εt ∼ N(0, 1).

15
WN Process Example: S&P Stock Return

16
Ergodicity 1/4
• We think of the time series expectation as an ensemble
average:
E(Yt ) = plim_{I→∞} (1/I) Σ_{i=1}^I Yt^(i) ,

where Yt^(i) is the observation at date t from sequence i.

17
Ergodicity 2/4

• In practice, we have a sample of T observations from one


sequence, so the sample average is taken over time, i.e.,
(1/T) Σ_{t=1}^T Yt^(i) .

• A process is called ergodic for the m-th moment if the time


average converges to the ensemble average as T grows large.
• This happens when ρ(k) → 0 as k → ∞.
• We need ergodicity for asymptotics (Ergodic Theorem: LLN
for time series).

18
Ergodicity 3/4

19
Ergodicity 4/4

• For a stationary Gaussian process, ergodicity for the mean and
second moment requires:

Σ_{j=0}^∞ |γ(j)| < ∞.
I This implies γ(j) must go to zero as j increases.
• Typically, requirements for weak stationarity and ergodicity
coincide, but not always.
• Implication: If Yt is ergodic, the long-horizon forecast (large
h) converges to the unconditional mean: ŶT +h|T ≈ E(Yt ).
• What’s not ergodic: seasonal and trend components. NSA
(not seasonally adjusted) series may not be ergodic.

20
Unemployment Insurance Claims: SA vs. NSA

21
Autocorrelation with Geometric Decay

• Geometric decay: ρ(k) ≈ c^k for some c < 1.


• Such smooth decline to zero suggests ergodic series.
• Long-range forecasts are close to the unconditional mean.
• This feature is commonly found in economic variables
measured in levels.

22
US SA Unemployment Rate

23
Negative Autocorrelation

• ρ(1) < 0: Yt changes direction in adjacent periods


(anti-persistence).
• Ergodic if |ρ(k)| goes to 0 as k grows.
• Occurs in some economic variables measured as changes or
differences.
• Tends to alternate: ρ(1) < 0, then ρ(k) > 0 for some k and
so on.
• Forecasts can have opposite sign from current level.

24
Change in SA Unemployment Insurance Claims

25
Autocorrelation with Slow Decay

• Power law: ρ(k) ≈ k^(−d) for some d > 0.


• Although decay to zero is very slow, it can still be ergodic.
• Not common for economic variables. A leading example in
finance: absolute returns and other volatility proxies
(especially at higher frequencies).
• This type of process is called a long memory process.
Originated in hydrology (River Nile reservoirs study, H. E.
Hurst (1951)).

26
S&P 500 Absolute Returns example

27
Estimation: Mean and Autocovariance

• The population mean µ = E(Yt ) is estimated by the sample


mean:
µ̂ = (1/T) Σ_{t=1}^T Yt
• The population autocovariance γ(k) = E[(Yt − µ)(Yt−k − µ)]
is estimated by the sample covariance:
γ̂(k) = (1/T) Σ_{t=k+1}^T (Yt − µ̂)(Yt−k − µ̂)

• Note that ergodicity ensures the LLN works.

28
Estimation: Autocorrelations

• The population autocorrelation ρ(k) = γ(k)/γ(0) is estimated


by the ratio of the sample analogues:

ρ̂(k) = γ̂(k)/γ̂(0)

• Note that estimated autocovariances and autocorrelations are


subject to sampling uncertainty. Also, estimates get much
worse as k gets large relative to T .
• Thus, we may estimate small positive or negative ρ̂(k) even when
ρ(k) is zero in the population. Hence we plot confidence intervals
to see which estimates are significantly different from zero.

29
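As a hedged illustration (not from the slides), the sample formulas above can be computed by hand in Stata and compared with corrgram; the UNRATE series and a prior tsset are assumed from the earlier example.

quietly summarize UNRATE
scalar T  = r(N)
scalar mu = r(mean)                  // sample mean (mu hat)
gen double dev  = UNRATE - mu
gen double dev2 = dev^2
gen double c1   = dev * L.dev        // cross-product for lag 1
quietly summarize dev2
scalar g0 = r(sum)/T                 // gamma hat(0), dividing by T
quietly summarize c1
scalar g1 = r(sum)/T                 // gamma hat(1); the sum runs over t = 2,...,T
display "rho(1) hat = " g1/g0
corrgram UNRATE, lags(1)             // should be very close to the manual estimate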
Confidence Bands for Autocorrelations 1/2

• If Yt is independent white noise, then


Var(ρ̂) ≈ 1/T
• One can show that ρ̂(k) ∼ N(0, 1/T ).
• 95% Confidence Interval: [−2/√T , 2/√T ]. Reported by
some packages as default.
• Bartlett’s formula: If Yt is MA(q) (i.e., ρ(k) = 0 for k > q):

Var(ρ̂(k)) ≈ (1/T) (1 + 2 Σ_{i=1}^q ρ²(i)) ,   k > q

30
Confidence Bands for Autocorrelations 2/2

• If sample autocorrelations are all within [−2/√T , 2/√T ],
then Yt is likely white noise.
• Otherwise, examine Bartlett bands as an approximation.
I It is important to note that the interval is pointwise, i.e.,
constructed for each individual ρ(k) estimate (i.e., not a joint
test).
• Stata reports Bartlett bands as a shaded region. The interpretation:
if an estimate falls outside the shaded region, then it is
significantly different from 0.

31
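A short Stata sketch (illustrative, reusing the UNRATE series from earlier) of the bands discussed above; ac draws the correlogram with the shaded Bartlett confidence region, and the simple ±2/√T band can be computed directly.

ac UNRATE, lags(40)       // correlogram with shaded (Bartlett) confidence bands
display 2/sqrt(913)       // rough +/- 2/sqrt(T) band, using T = 913 as in the correlate output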
Joint Tests for Autocorrelations 1/2
• Often we are interested in knowing whether all autocorrelations
(up to some maximum lag m, chosen in practice) are jointly zero.
• Recall that ρ̂(k) ∼ N(0, 1/T). Thus, T ρ̂²(k) ∼ χ²(1).
• Also, it can be shown that the autocorrelations at various
displacements are approximately independent of one another.
• Box-Pierce Q-statistic:

QBP = T Σ_{i=1}^m ρ̂²(i) ∼ χ²(m)

• Ljung-Box Q-statistic:

QLB = T (T + 2) Σ_{i=1}^m ρ̂²(i)/(T − i) ∼ χ²(m)
• These are the so-called Portmanteau tests.
• The distributions above are under H0 : Yt is white noise.
• Stata reports QLB when the corrgram command is executed.

32
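As a hedged sketch (not on the slides), Stata's wntestq command computes the Ljung-Box portmanteau test directly; the choice of 30 lags and the use of UNRATE here are purely illustrative.

wntestq UNRATE, lags(30)     // Ljung-Box Q test of H0: white noise, up to lag 30
corrgram UNRATE, lags(30)    // also lists Q (Ljung-Box) and its p-value lag by lag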
Joint Tests for Autocorrelations 2/2

• How to choose m? Should be “large, but not too large.”


• Diebold suggests somewhere around T^(1/2).
• These tests should be used with caution. Many known problems
with the asymptotic approximation.
• Depending on what kind of processes you think you are up
against in practice, you may consider improved versions: Lobato
et al. (2001), Pena and Rodriguez (2002), Delgado and Velasco
(2010, 2011).
I Advanced; these require your own coding.

33
Lags and the Lag Operator Notation

• Recall that we call Yt−1 the first lag, Yt−2 the second lag, etc.
• The lag operator L is a useful way to manipulate lags.
• It is defined by the relation: LYt = Yt−1 .
• Taking it to a power means iterative application:

L2 Yt = LLYt = LYt−1 = Yt−2

• In general we have: Lk Yt = Yt−k .


• Polynomial of order k: A(L) = b0 + b1 L + b2 L2 + · · · + bk Lk

34
Remarks 1: Lag Operator in Stata

Stata uses the same notation (Note that we need tsset or


tsmktim before use).

• gen x1 = L.x: Creates the first lag of variable x (first


value missing).
• gen x5 = L5.x: Creates the fifth lag of variable x (first
5 values missing).
• Can use them inline, e.g.: reg rgdp L.rgdp L2.rgdp
• Conveniently include many lags: reg rgdp
L(1/12).rgdp
• Note: Capitalization doesn’t matter, can type l.x, l2.x
etc.

35
Wold’s Theorem

Theorem 1

Let {Yt } be any mean-zero covariance stationary process, not


containing any deterministic trend or seasonality. Then, it can
be expressed as:

Yt = B(L)εt = Σ_{i=0}^∞ bi εt−i ,   εt ∼ WN(0, σ²),

where b0 = 1 and Σ_{i=0}^∞ bi² < ∞.

• In short, any (even nonlinear) stationary process can be
approximated by the general linear combination above (though a
nonlinear model may provide a better approximation).

36
Wold’s Theorem: Innovations

• We can see that stationary time series processes are constructed


as linear functions of innovations, or shocks, εt , which are
white noise.
• The εt are (1-step-ahead) forecast errors from the optimal linear
predictor that regresses Yt on a constant and all available lags:
Yt−1 , Yt−2 , . . .
• The innovations εt are often further assumed serially independent:
E(εt^k | Ωt−1 ) = E(εt^k ) for all k. In particular, this rules out
serial dependence in the conditional variance.

37
Wold Representation: Moments
• Unconditional moments (easy to derive):

E(Yt ) = 0
Var(Yt ) = E(Yt²) = σ² Σ_{i=0}^∞ bi²

• Conditional moments (assuming εt independent):

E(Yt | Ωt−1 ) = Σ_{i=1}^∞ bi εt−i ;
Var(Yt | Ωt−1 ) = E([Yt − E(Yt | Ωt−1 )]² | Ωt−1 )
              = E(εt² | Ωt−1 ) = E(εt²) = σ²,

where Ωt−1 = {εt−1 , εt−2 , . . .}
• Conditional mean moves over time in response to the
information set. This is particularly important for forecasting
(why?).
38
Modeling Cycles

• The infinite order polynomial in the Wold representation may


not look very useful (infinitely many parameters).
• However, it is possible that, at least approximately:
B(L) ≈ Θ(L)/Φ(L) = (Σ_{i=0}^q θi L^i ) / (Σ_{j=0}^p φj L^j )

• B(L) may be a rational polynomial with (p + q) parameters.


• We may hope to find an accurate approximation of the Wold
representation that is quite parsimonious.
• In this lecture, we will learn three models: Moving Average
(i.e., p = 0), Autoregressive (i.e., q = 0), and Autoregressive
Moving Average (i.e., neither p nor q is zero).

39
Moving Average (MA) Processes

• The Wold representation is an example of an MA(∞).


• MA models are linear functions of stochastic errors.
• The simplest example is MA of order 1, MA(1):

Yt = εt + θεt−1 = (1 + θL)εt ,   εt ∼ WN(0, σ²)

• The MA coefficient θ controls the degree of serial correlation.


It can be positive or negative.
• The innovations affect Yt over two periods: 1)
contemporaneous impact; 2) one-period delayed impact.

40
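A minimal simulation sketch (parameter values assumed for illustration, not from the slides) that generates an MA(1) series like the examples on the next slide and checks its autocorrelations.

clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen eps = rnormal()
gen y = eps + 0.5*L.eps    // MA(1) with theta = 0.5; the first observation is missing
corrgram y, lags(5)        // rho(1) should be near 0.5/(1+0.5^2) = 0.4, higher lags near 0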
MA(1) Examples

41
Mean and Variance of MA(1)

• The unconditional mean:

E(Yt ) = E(εt + θεt−1 )
       = E(εt ) + θE(εt−1 ) = 0

• The unconditional variance:

Var(Yt ) = Var(εt + θεt−1 )
         = Var(εt ) + Var(θεt−1 )
         = σ²(1 + θ²)

• Variance depends on θ: larger (in absolute value) coefficient


implies more variability.

42
Conditional Mean and Variance of MA(1)
• The conditional mean:
E(Yt | Ωt−1 ) = E(εt + θεt−1 | Ωt−1 )
             = E(εt | Ωt−1 ) + θE(εt−1 | Ωt−1 )
             = θεt−1
• The conditional variance:
Var(Yt | Ωt−1 ) = Var(εt + θεt−1 | Ωt−1 )
               = Var(εt | Ωt−1 ) + Var(θεt−1 | Ωt−1 )
               = σ²
• θεt−1 is the best forecast for Yt under squared loss. The
optimal forecast error then is εt .
• The conditional variance, the innovation variance, and the
1-step forecast variance are the same.
43
Autocovariance of MA(1)
• First autocovariance:
γ(1) = E(Yt Yt−1 )
     = E((εt + θεt−1 )(εt−1 + θεt−2 ))
     = E(εt εt−1 ) + θE(εt−1²) + θE(εt εt−2 ) + θ²E(εt−1 εt−2 )
     = θσ²
• Autocovariances for k > 1:
γ(k) = E(Yt Yt−k )
     = E((εt + θεt−1 )(εt−k + θεt−k−1 ))
     = E(εt εt−k ) + θE(εt−1 εt−k ) + θE(εt εt−k−1 ) + θ²E(εt−1 εt−k−1 )
     = 0
• The autocovariance function is zero for all k > 1.
44
Autocorrelation of MA(1)

• First autocorrelation:

ρ(1) = γ(1)/γ(0) = θσ²/[σ²(1 + θ²)] = θ/(1 + θ²)

• Since autocovariances for k > 1 are 0, so are the


autocorrelations.
I The process has very short memory (1 period).
• The sign of θ determines the sign of the first autocorrelation.
• ρ(1) varies within (−0.5, 0.5) because θ varies within (−1, 1).

45
Stationarity and Invertibility of MA(1)

• MA(1) is covariance stationary for any value of θ.


• Note under some conditions we could write MA(1) in terms of
the current period shock and lags of Yt , i.e., obtain its
autoregressive representation.
• This might be desirable, as it is natural to think of forecasting
Yt using its own past values, not innovations. There are also
technical reasons related to estimation.
• If we can obtain the above, the MA process is said to be
invertible.

46
Inversion of MA(1) 1/3

• Rewrite the process as: εt = Yt − θεt−1
• Lag the above one period: εt−1 = Yt−1 − θεt−2
• Combine the two: εt = Yt − θ(Yt−1 − θεt−2 )
• Do the two steps (lag and combine) again:

εt = Yt − θYt−1 + θ²(Yt−2 − θεt−3 )
   = Yt − θYt−1 + θ²Yt−2 − θ³εt−3

47
Inversion of MA(1) 2/3

• Repeat the steps in the previous slide infinitely many times:

εt = Yt − θYt−1 + θ²Yt−2 − θ³Yt−3 + θ⁴Yt−4 + . . .

• Then we can write Yt as:

Yt = εt + θYt−1 − θ²Yt−2 + θ³Yt−3 − θ⁴Yt−4 + · · ·
   = − Σ_{i=1}^∞ (−θ)^i Yt−i + εt

• This series converges (and inversion exists) if |θ| < 1.


• In lag operator notation we can write (valid if |θ| < 1):

(1 + θL)⁻¹ Yt = εt

48
Inversion of MA(1) 3/3

• Power series expansion converges for |b| < 1:

(1 − bL)⁻¹ = lim_{j→∞} (1 + bL + b²L² + b³L³ + · · · + b^j L^j )

• Here we have:

(1 + θL)⁻¹ = (1 − (−θL))⁻¹ = 1 − θL + θ²L² − θ³L³ + . . .

• Another way to state the invertibility condition: all roots of


the MA polynomial (of L) are outside the unit circle
(equivalently, inverses of roots all lie inside the unit circle).
• To see this, we solve (1 + θL) = 0 to get L = −1/θ. Then
| − 1/θ| > 1 if |θ| < 1.

49
Invertible MA Processes

• Key feature: We can express εt as a function of only present


and all the past values of the data (need all the future values
if noninvertible!).
• This fits well with the forecaster’s logic: Current values are
linked with past values even though the model is not an
explicit autoregression.
• We will thus always work with invertible MA models.
• εt associated with the invertible MA representation are
sometimes called fundamental innovations.

50
MA(q) Processes

• The Moving Average of order q (MA(q)) is given by:

Yt = εt + θ1 εt−1 + θ2 εt−2 + · · · + θq εt−q ,   εt ∼ WN(0, σ²)

• Can rewrite the equation in the lag polynomial as:

Yt = (1 + θ1 L + θ2 L² + · · · + θq L^q )εt = Θ(L)εt

• The first q autocorrelations are nonzero, those above q are


zero.
• Invertible if all the q roots of the polynomial are outside the
unit circle.

51
MA(q) Process vs. Wold Representation
• Wold:

Yt = B(L)εt = Σ_{i=0}^∞ bi εt−i ,   b0 = 1,   εt ∼ WN(0, σ²)

• MA(q):

Yt = Θ(L)εt = Σ_{i=0}^q θi εt−i ,   θ0 = 1,   εt ∼ WN(0, σ²)

• Effectively use the approximation:

B(L) ≈ Θ(L)/Φ(L) = (Σ_{i=0}^q θi L^i ) / 1 = Θ(L)
• Seems a very natural approximation to the Wold
representation, which is MA(∞).

52
Example: Quarterly Consumption Growth

53
MA(2) Model Estimation
• In Stata, we can use the arima command to estimate the
MA(2) by nonlinear optimization.

arima consgr, arima(0,0,2)

(setting optimization to BHHH)


Iteration 0: log likelihood = -891.78615
Iteration 1: log likelihood = -890.11453
Iteration 2: log likelihood = -890.02205
Iteration 3: log likelihood = -889.95543
Iteration 4: log likelihood = -889.90391
(switching optimization to BFGS)
Iteration 5: log likelihood = -889.85688
Iteration 6: log likelihood = -889.64113
Iteration 7: log likelihood = -889.61024
Iteration 8: log likelihood = -889.58151
Iteration 9: log likelihood = -889.56826
Iteration 10: log likelihood = -889.55593
Iteration 11: log likelihood = -889.5521
Iteration 12: log likelihood = -889.55201
Iteration 13: log likelihood = -889.55201

Another syntax: arima consgr, ma(1/2). Note that arima consgr, ma(2) will
suppress the first term.

54
MA(2) Estimation: Nonlinear Least Squares

54
MA(2) Model Estimation

Sample: 1947q2 thru 2024q2 Number of obs = 309


Wald chi2(2) = 48.13
Log likelihood = -889.552 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
| OPG
consgr | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
consgr |
_cons | 3.322972 .319276 10.41 0.000 2.697202 3.948741
-------------+----------------------------------------------------------------
ARMA |
ma |
L1. | -.0886977 .0172324 -5.15 0.000 -.1224725 -.0549229
L2. | .176492 .0345969 5.10 0.000 .1086833 .2443007
-------------+----------------------------------------------------------------
/sigma | 4.304923 .0539136 79.85 0.000 4.199254 4.410591
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.

55
Autoregressive (AR) Processes

• The first order autoregressive process (AR(1)) is given by:

Yt = φYt−1 + εt ,   εt ∼ WN(0, σ²)

• Can rewrite the equation in the lag operator as:

(1 − φL)Yt = εt

• The sign of φ determines whether Yt and Yt−1 are positively


or negatively correlated.

56
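A minimal simulation sketch (parameter values assumed for illustration) of an AR(1) like the examples on the next slide; replace processes observations in order, so the lagged value is already updated when each row is computed.

clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen eps = rnormal()
gen y = eps in 1                    // initialize the recursion at y_1 = eps_1 (a simplification, not the stationary start)
replace y = 0.9*L.y + eps in 2/l    // AR(1) with phi = 0.9
corrgram y, lags(10)                // sample autocorrelations should decay roughly like 0.9^k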
AR Process Examples

57
Inversion of AR(1)

• Rewrite the process as an MA(∞):

Yt = φYt−1 + εt
   = εt + φ(φYt−2 + εt−1 )
   = εt + φεt−1 + φ²εt−2 + · · · = Σ_{i=0}^∞ φ^i εt−i

• We need |φ| < 1 for this inversion to make sense, as well as


for stationarity (or can think of inverting (1 − φL) similarly to
the MA case).
• If φ = 1 then Yt = εt + εt−1 + εt−2 + . . . does not converge,
so the infinite sum is not defined.

58
Mean and Variance of AR(1)

• Can recycle computation of moments for the general linear


process.
• Compute the unconditional mean from the MA(∞) inversion:
E(Yt ) = E( Σ_{i=0}^∞ φ^i εt−i ) = Σ_{i=0}^∞ φ^i E(εt−i ) = 0

• Use the same representation to compute variance:

Var(Yt ) = Var( Σ_{i=0}^∞ φ^i εt−i ) = Σ_{i=0}^∞ φ^{2i} Var(εt−i )
         = σ²/(1 − φ²)

59
Alternative Variance Computation 1/2
• Apply the variance operator on both sides:

Var(Yt ) = Var(φYt−1 + εt )

which implies

Var(Yt ) = φ²Var(Yt−1 ) + Var(εt )

• Assuming variance stationarity, Var(Yt ) = Var(Yt−1 ) has to
hold. Solving out for Var(Yt ) then yields:

Var(Yt ) = φ²Var(Yt ) + σ²
Var(Yt ) = σ²/(1 − φ²)
• If φ = 1 then Var(Yt ) is infinite. This is inconsistent with
covariance stationarity.
60
Alternative Variance Computation 2/2

• Another insight is if φ = 1, we will have:

Var(Yt ) = Var(Yt−1 ) + σ 2 > Var(Yt−1 )

• So Var(Yt ) depends on time. This is inconsistent with


variance stationarity.
• |φ| < 1 is necessary for covariance stationarity.
• Alternatively, check the lag polynomial root condition: must
be outside the unit circle.

61
Remarks 2: Detour: Random Walk / Unit Root

• An AR(1) with φ = 1 is known as a random walk or a


unit root process:

Yt = Yt−1 + εt

• By back substitution we can get:


Yt = Y0 + Σ_{i=0}^{t−1} εt−i
• Infinite memory: the shocks have permanent effects.
Called random walk because it wanders without mean
reversion.
• The unit root term arose because the lag polynomial
(1 − L) has a root at L = 1. Note that ∆Yt = Yt − Yt−1
is white noise.
62
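A brief Stata sketch (illustrative values, not from the slides) of a pure random walk, and of the fact noted above that its first difference is white noise.

clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen eps = rnormal()
gen y = sum(eps)        // running sum: y_t = eps_1 + ... + eps_t, a random walk with Y_0 = 0
gen dy = D.y            // first difference, Delta y_t = eps_t
corrgram dy, lags(5)    // autocorrelations of the difference should be near zero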
63
Conditional Mean and Variance of AR(1)

• Conditional mean:

E(Yt |Ωt−1 ) = E(φYt−1 + εt |Ωt−1 )
            = φYt−1 + E(εt |Ωt−1 ) = φYt−1

• Conditional variance:

Var(Yt |Ωt−1 ) = Var(φYt−1 + εt |Ωt−1 ) = Var(εt |Ωt−1 ) = σ²

• Note the conditional mean adapts to new information in a


very simple way.

64
Autocovariance of AR(1) 1/2
• Take the original equation:

Yt = φYt−1 + εt

• Multiply both sides by Yt−k :

Yt Yt−k = φYt−1 Yt−k + εt Yt−k

• Now take expectations on both sides (for k ≥ 1, E(εt Yt−k ) = 0):

E(Yt Yt−k ) = E(φYt−1 Yt−k ) + E(εt Yt−k )

γ(k) = φγ(k − 1)
• This is called the Yule-Walker equation.
• We can recursively work out autocovariances: just need to
know γ(0).
65
Autocovariance of AR(1) 2/2

• Using the variance we derived earlier we note:

γ(1) = φγ(0) = φ σ²/(1 − φ²),
γ(2) = φγ(1) = φ² σ²/(1 − φ²),
...
γ(k) = φγ(k − 1) = φ^k σ²/(1 − φ²).

• Autocorrelation is found by division by γ(0):

ρ(k) = φ^k ,   k = 0, 1, 2, . . .

66
Autocorrelation of AR(1)

• Stationary AR(1) autocorrelations exhibit geometric decay


(recall example of the unemployment rate):

ρ(k) = φ^k ,   k = 0, 1, 2, . . .

• If φ is small, the autocorrelations decay rapidly towards zero.


• If φ is large (close to 1), autocorrelations decay only slowly.
• Thus, the AR(1) parameter describes the persistence in the
time series.

67
Example: AR(1) for Unemployment Rate

. reg UNRATE L.UNRATE

Source | SS df MS Number of obs = 918


-------------+---------------------------------- F(1, 916) = 14622.93
Model | 2521.49599 1 2521.49599 Prob > F = 0.0000
Residual | 157.949862 916 .172434348 R-squared = 0.9411
-------------+---------------------------------- Adj R-squared = 0.9410
Total | 2679.44585 917 2.92196931 Root MSE = .41525

------------------------------------------------------------------------------
UNRATE | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
UNRATE |
L1. | .9694779 .0080172 120.93 0.000 .9537437 .985212
|
_cons | .1747403 .0476544 3.67 0.000 .0812158 .2682648
------------------------------------------------------------------------------

• Note the estimation by OLS (more on this later) and the very high persistence of the unemployment rate.

68
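As a hedged follow-up (not on the slide): with an intercept, the fitted AR(1) is Yt = c + φYt−1 + εt, so its implied unconditional mean is c/(1 − φ). It can be recovered from the regression above with nlcom (assuming that regression is still the active estimate); with the estimates shown, this is roughly 0.175/(1 − 0.969) ≈ 5.7 percent.

nlcom _b[_cons]/(1 - _b[L.UNRATE])   // delta-method estimate and SE of the long-run mean c/(1 - phi)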
Example: AR(1) for Unemployment Rate

69
AR(p) Processes

• We can consider the p-th order autoregressive process:

Yt = φ1 Yt−1 + φ2 Yt−2 + . . . + φp Yt−p + εt ,   εt ∼ WN(0, σ²)

• Lag operator form:

Φ(L)Yt = (1 − φ1 L − φ2 L² − . . . − φp L^p )Yt = εt ,   εt ∼ WN(0, σ²)

• Stationary if all the roots of the lag polynomial are outside of


the unit circle.
I Quick check (necessary, not sufficient): Sum of AR coefficients
should be less than 1.
• Autocorrelations can display richer patterns than just gradual
decay.

70
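A hedged sketch (reusing the consumption-growth series from the MA(2) example) of fitting an AR(2) and checking the root condition above; estat aroots plots the inverse roots, which should all lie strictly inside the unit circle for stationarity.

arima consgr, arima(2,0,0)   // AR(2) fit by maximum likelihood
estat aroots                 // inverse AR roots inside the unit circle <=> roots outside it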
AR(1)/(p) Processes vs. Wold Representation
• Wold:

Yt = B(L)εt = Σ_{i=0}^∞ bi εt−i ,   b0 = 1,   εt ∼ WN(0, σ²)

• AR(1):

Yt = (1 − φL)⁻¹ εt = Σ_{i=0}^∞ φ^i εt−i ,   εt ∼ WN(0, σ²)

• AR(p) effectively uses the approximation:

B(L) ≈ Θ(L)/Φ(L) = 1/Φ(L)   (i.e., Θ(L) = 1)

• Note that AR(1) is an infinite-order MA, but with only one free
parameter.
71
ARMA(p,q) Processes

• AR and MA models are often combined to obtain a better and
more parsimonious approximation to the Wold representation,
resulting in ARMA(p,q) models.
• ARMA processes can arise naturally as sums of several AR
processes, as an AR plus an MA, or as an AR observed with
measurement error.
• Simplest example is ARMA(1,1):

Yt = φYt−1 + εt + θεt−1 ,   εt ∼ WN(0, σ²)

• Need |φ| < 1 for stationarity and |θ| < 1 for invertibility.

72
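As an illustrative sketch (not from the slides), an ARMA(1,1) can be fit with the same arima command used earlier for the MA(2), and information criteria can be used to compare the two specifications.

arima consgr, arima(1,0,1)   // ARMA(1,1): check that the fitted AR and MA coefficients lie inside (-1, 1)
estat ic                     // AIC/BIC, comparable with the MA(2) fit from before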
ARMA(p,q) Processes
• ARMA(p,q) generalization:

Yt = φ1 Yt−1 + φ2 Yt−2 + · · · + φp Yt−p + εt
     + θ1 εt−1 + θ2 εt−2 + · · · + θq εt−q

• We can write it using the lag operator,

Φ(L)Yt = Θ(L)εt

which implies

Yt = [Θ(L)/Φ(L)] εt
• Need all roots of AR polynomial outside the unit circle for
stationarity and all roots of the MA polynomial outside the
unit circle for invertibility.

73
Summary: ARMA Models

• Workhorse model class in professional forecasting.


• Information set easy to assemble: just past values of Yt !
• Provide an (often quite formidable) benchmark to evaluate
more complex models against.
• Theoretical backbone: Wold theorem.
• Caution: these are linear models, so although they are easy to
estimate, they may not be the optimal approximation if the data
come from a nonlinear process.

74
Summary

• We think of Ct as a covariance stationary and ergodic time


series process.
• Wold representation theorem tells us that we can approximate
any stationary process by a general linear process (has MA(∞)
structure).
• Thus, a parsimonious approximation of the Wold representation
is the target.
• MA, AR, and ultimately ARMA models naturally arise to achieve
this goal.

75
Before you leave...

• Diebold, Francis X., “Elements of Forecasting,” 4th edition.


Chapters 7 and 8.
• Stock, James H. and Mark W. Watson, “Introduction to
Econometrics,” 4th International Edition. Chapter 15.

76
Appendix

77
Useful Stata Commands

• lag 1

L.var

• lag m, where m is an integer

Lm.var

• lags 1 to m, where m is an integer

L(1/m).var

78
• Plot autocorrelations and confidence bands for k lags (if we
don’t specify, the default is 40).

ac varname, lag(k)

• Show the autocorrelations in numerical form, together with
Q-statistics. The lag option works the same as for ac.

corrgram varname, lag(k)

• Estimate ARMA(p,q) model by maximum likelihood (for pure


AR set q = 0, for pure MA set p = 0).

arima varname, arima(p,0,q)

79
