Lecture 1
Luca Gambetti
UAB
Contacts
Goal of the course
The main objective of the course is to provide the students with the knowledge
of a comprehensive set of tools necessary for empirical research with time series
data.
Description
This is the second part of an introductory 40-hour course in Time Series Analysis with applications in macroeconomics. This part focuses on the theory of multivariate time series models.
Contents
References
Grades
Econometric Software
GRETL, MATLAB.
1. STATIONARY VECTOR PROCESSES¹
¹ This part is partly based on the Hamilton textbook and Marco Lippi's notes.
1 Some Preliminary Definitions and Results
• Random Vector: A vector X = (X1, ..., Xn) whose components are scalar-valued random variables on a probability space.
1.1 The Lag operator
• The lag operator L maps a sequence {Xt} into a sequence {Yt} such that
Yt = LXt = Xt−1, for all t.
• If we apply L to a constant c, Lc = c.
1.2 Polynomials in the lag operator
• Lag polynomials can also be inverted. For a polynomial φ(L), we are look-
ing for the values of the coefficients αi of φ(L)−1 = α0 + α1L + α2L2 + ... such
that φ(L)−1φ(L) = 1.
Case 1: p = 1. Let φ(L) = (1 − φL) with |φ| < 1. To find the inverse, write out the product
(α0 + α1L + α2L² + ...)(1 − φL) = 1
and note that all the coefficients of the non-zero powers of L must be equal to zero. This gives
α0 = 1
−φα0 + α1 = 0 ⇒ α1 = φ
−φα1 + α2 = 0 ⇒ α2 = φ²
−φα2 + α3 = 0 ⇒ α3 = φ³
and so on. In general αk = φ^k, so (1 − φL)⁻¹ = Σ_{j=0}^∞ φ^j L^j provided that |φ| < 1.
It is easy to check this because
(1 − φL)(1 + φL + φ²L² + ... + φ^k L^k) = 1 − φ^(k+1) L^(k+1)
so
1 + φL + φ²L² + ... + φ^k L^k = (1 − φ^(k+1) L^(k+1)) / (1 − φL)
and, as k → ∞, Σ_{j=0}^k φ^j L^j → 1/(1 − φL).
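A minimal numerical check of this inversion, not part of the original notes (the value φ = 0.6 and the truncation order k = 30 are illustrative assumptions):

```python
# Multiply (1 - phi*L) by the truncated series 1 + phi*L + ... + phi^k L^k
# and verify that the product is approximately 1. Uses numpy only.
import numpy as np

phi, k = 0.6, 30                      # assumed illustrative values
inv = phi ** np.arange(k + 1)         # coefficients alpha_j = phi^j
poly = np.array([1.0, -phi])          # coefficients of (1 - phi*L)

prod = np.convolve(poly, inv)         # polynomial multiplication in L
print(prod[:4])                       # ~ [1, 0, 0, 0]
print(prod[-1])                       # leftover -phi^(k+1), negligible for |phi| < 1
```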
Case 2: p = 2. Let φ(L) = (1 − φ1L − φ2L²). To find the inverse it is useful to factor the polynomial in the following way
(1 − φ1L − φ2L²) = (1 − λ1L)(1 − λ2L)
where λ1, λ2 are the reciprocals of the roots of the left-hand side polynomial or, equivalently, the eigenvalues of the matrix
| φ1  φ2 |
| 1    0 |
Suppose |λ1|, |λ2| < 1 and λ1 ≠ λ2. We have that (1 − φ1L − φ2L²)⁻¹ = (1 − λ1L)⁻¹(1 − λ2L)⁻¹. Therefore we can use what we have seen above for the case p = 1. Using a partial fractions expansion we can write
(1 − λ1L)⁻¹(1 − λ2L)⁻¹ = (λ1 − λ2)⁻¹ [ λ1/(1 − λ1L) − λ2/(1 − λ2L) ]
  = [λ1/(λ1 − λ2)] (1 + λ1L + λ1²L² + ...) − [λ2/(λ1 − λ2)] (1 + λ2L + λ2²L² + ...)
  = (c1 + c2) + (c1λ1 + c2λ2)L + (c1λ1² + c2λ2²)L² + ...
where c1 = λ1/(λ1 − λ2), c2 = −λ2/(λ1 − λ2).
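A short numerical sketch of the p = 2 case, with assumed illustrative values φ1 = 1.1, φ2 = −0.28 (so the eigenvalues are 0.7 and 0.4): the coefficients obtained from the eigenvalue formula are compared with those from the recursion αj = φ1αj−1 + φ2αj−2 implied by φ(L)⁻¹φ(L) = 1.

```python
import numpy as np

phi1, phi2 = 1.1, -0.28               # assumed values; companion eigenvalues 0.7 and 0.4
F = np.array([[phi1, phi2],
              [1.0,  0.0]])
lam1, lam2 = np.linalg.eigvals(F)

c1 = lam1 / (lam1 - lam2)
c2 = -lam2 / (lam1 - lam2)
J = 10
alpha_eig = np.array([c1 * lam1**j + c2 * lam2**j for j in range(J)])

# recursion: alpha_0 = 1, alpha_1 = phi1, alpha_j = phi1*alpha_{j-1} + phi2*alpha_{j-2}
alpha_rec = np.zeros(J)
alpha_rec[0], alpha_rec[1] = 1.0, phi1
for j in range(2, J):
    alpha_rec[j] = phi1 * alpha_rec[j - 1] + phi2 * alpha_rec[j - 2]

print(np.allclose(alpha_eig.real, alpha_rec))   # True
```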
• Matrix polynomial in the lag operator: A(L) is a matrix polynomial in the lag operator if its elements are polynomials in the lag operator, e.g.
A(L) = | 1    L  |  =  | 1  0 |  +  | 0  1 | L
       | 0  2+L  |     | 0  2 |     | 0  1 |
We also define
A(0) = | 1  0 |
       | 0  2 |
and
A(1) = | 1  1 |
       | 0  3 |
1.3 Covariance Stationarity
Let Yt be an n-dimensional random vector, Yt′ = [Y1t, ..., Ynt]. Then Yt is covariance (weakly) stationary if E(Yt) = µ and the autocovariance matrices E[(Yt − µ)(Yt−j − µ)′] = Γj for all t and j, that is, both are independent of t and finite.
− Stationarity of each of the components of Yt does not imply stationarity of the vector Yt. Stationarity in the vector case requires that the components of the vector are stationary and costationary.
− Although γj = γ−j for a scalar process, the same is not true for a vector process. The correct relation is
Γj′ = Γ−j
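An illustrative sketch (all values assumed, not from the notes) checking the relation Γj′ = Γ−j on sample autocovariance matrices of a simulated bivariate process:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
A = np.array([[0.5, 0.3], [0.02, 0.8]])       # assumed stationary VAR(1) coefficients
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.standard_normal(2)

Ybar = Y.mean(axis=0)
def gamma(j):
    """Sample autocovariance matrix E[(Y_t - mu)(Y_{t-j} - mu)'] for integer j."""
    Z = Y - Ybar
    if j >= 0:
        return Z[j:].T @ Z[:T - j] / T
    return Z[:T + j].T @ Z[-j:] / T

print(np.allclose(gamma(1).T, gamma(-1)))     # True: Gamma_1' = Gamma_{-1}
```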
1.4 Convergence of random variables
• Convergence in distribution: Let FT denote the cumulative distribution function of xT and F the cumulative distribution of the scalar x. The sequence is said to converge in distribution, written xT →d x (or xT →L x), if for all real numbers c for which F is continuous
lim_{T→∞} FT(c) = F(c).
Example: Suppose that XT →d N(0, 1). Then XT² converges in distribution to the square of an N(0, 1), i.e. XT² →d χ²(1).
Proposition C.2L: Suppose {XT}_{T=1}^∞ and {YT}_{T=1}^∞ are sequences of n × 1 random vectors, AT is a sequence of n × n random matrices, x is an n × 1 random vector, c is a fixed n × 1 vector, and A is a fixed n × n matrix.
1. If plim XT, plim YT and plim AT exist, then
(a) plim(XT + YT) = plim XT + plim YT, plim(XT − YT) = plim XT − plim YT
(b) plim c′XT = c′ plim XT
(c) plim XT′YT = (plim XT)′(plim YT)
(d) plim AT XT = (plim AT)(plim XT)
2. If XT →d X and plim(XT − YT) = 0, then YT →d X.
3. If XT →d X and plim YT = c, then
(a) XT + YT →d X + c
(b) YT′XT →d c′X
4. If XT →d X and plim AT = A, then AT XT →d AX.
5. If XT →d X and plim AT = 0, then plim AT XT = 0.
Example: Let {XT} be a sequence of n × 1 random vectors with XT →d N(µ, Ω), and let {YT} be a sequence of n × 1 random vectors with YT →p C. Then by 3.(b), YT′XT →d N(C′µ, C′ΩC).
1.5 Limit Theorems
The Law of Large Numbers and the Central Limit Theorem are the most impor-
tant results for computing the limits of sequences of random variables.
There are many versions of the LLN and the CLT that differ in their assumptions about the dependence of the variables.
2 Some stationary processes
An n-dimensional vector white noise εt′ = [ε1t, ..., εnt] ∼ WN(0, Ω) is such that E(εt) = 0 and Γk = Ω (Ω a symmetric positive definite matrix) if k = 0 and Γk = 0 if k ≠ 0. If εt, ετ are independent for t ≠ τ the process is an independent vector white noise (i.i.d.). If in addition εt ∼ N the process is a Gaussian WN.
Important: A vector whose components are white noise is not necessarily a white noise. Example: let ut be a scalar white noise and define εt = (ut, ut−1)′. Then
E(εtεt′) = | σu²   0  |    and    E(εtεt−1′) = |  0    0 |
           |  0   σu² |                        | σu²   0 |
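A small simulation of the example above (sample size and seed are arbitrary): the components of εt = (ut, ut−1)′ are white noise, but E(εtεt−1′) ≠ 0, so εt is not a vector white noise.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
u = rng.standard_normal(T)
eps = np.column_stack([u[1:], u[:-1]])        # eps_t = (u_t, u_{t-1})'

G0 = eps.T @ eps / len(eps)                   # ~ diag(sigma_u^2, sigma_u^2)
G1 = eps[1:].T @ eps[:-1] / len(eps)          # ~ [[0, 0], [sigma_u^2, 0]]
print(np.round(G0, 2))
print(np.round(G1, 2))
```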
2.2 Vector Moving Average (VMA)
Given the n-dimensional vector white noise εt, a vector moving average of order q is defined as
Yt = µ + εt + C1εt−1 + ... + Cqεt−q
where the Cj are n × n matrices of coefficients and µ is the mean of Yt.
• The VMA(1)
Let us consider the VMA(1)
Yt = µ + εt + C1εt−1
with autocovariances
Γ0 = Ω + C1ΩC1′
Γ1 = C1Ω,   Γ−1 = ΩC1′
Γj = 0 for |j| > 1.
• The VMA(q)
Let us consider the VMA(q)
Yt = µ + εt + C1εt−1 + ... + Cqεt−q
with εt ∼ WN(0, Ω) and µ the mean of Yt. The variance of the process is given by
Γ0 = Ω + C1ΩC1′ + C2ΩC2′ + ... + CqΩCq′
with autocovariances
Γj = CjΩ + Cj+1ΩC1′ + ... + CqΩCq−j′ for j = 1, ..., q
Γj = 0 for j > q, and Γ−j = Γj′.
• The VMA(∞)
A useful process, as we will see, is the VMA(∞)
Yt = µ + Σ_{j=0}^∞ Cj εt−j     (1)
A very important result is that if the sequence {Cj} is absolutely summable (i.e. Σ_{j=0}^∞ ‖Cj‖ < ∞, where ‖Cj‖ = Σm Σn |cmn,j|, or equivalently each sequence formed by the elements of the matrix is absolutely summable), then the infinite sum above generates a well defined (mean square convergent) process (see for instance Proposition C.10L).
(a) The autocovariance between the ith variable at time t and the jth variable s periods earlier, E(Yit − µi)(Yjt−s − µj), exists and is given by the row i, column j element of
Γs = Σ_{v=0}^∞ Cs+v Ω Cv′
for s = 0, 1, 2, ....
(b) The sequence of matrices {Γs}, s = 0, 1, 2, ..., is absolutely summable.
If furthermore {εt} is an i.i.d. sequence with E|εi1t εi2t εi3t εi4t| < ∞ for i1, i2, i3, i4 = 1, 2, ..., n, then also
(c) E|Yi1t1 Yi2t2 Yi3t3 Yi4t4| < ∞ for all t1, t2, t3, t4
(d) (1/T) Σ_{t=1}^T Yit Yjt−s →p E(Yit Yjt−s), for i, j = 1, 2, ..., n and for all s
Implications:
1. Result (a) implies that the second moments of an MA(∞) with absolutely summable coefficients can be found by taking the limit of the autocovariances of an MA(q) (a numerical sketch of result (a) follows this list).
2. Result (b) ensures ergodicity for the mean
3. Result (c) says that Yt has bounded fourth moments
4. Result (d) says that Yt is ergodic for second moments
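As announced above, a numerical sketch of result (a) for an assumed VMA(2) (the coefficients C1, C2 and Ω are illustrative): the formula Γs = Σv Cs+v Ω Cv′ is compared with sample autocovariances from a long simulated path.

```python
import numpy as np

rng = np.random.default_rng(2)
C = [np.eye(2),
     np.array([[0.5, 0.2], [0.1, 0.4]]),
     np.array([[0.3, 0.0], [0.0, 0.3]])]      # C_0 = I, C_1, C_2 (assumed)
Omega = np.array([[1.0, 0.3], [0.3, 1.0]])

def gamma_theory(s, C, Omega):
    """Gamma_s = sum_{v>=0} C_{s+v} Omega C_v' (finite here since C_j = 0 for j > 2)."""
    q = len(C) - 1
    return sum(C[s + v] @ Omega @ C[v].T for v in range(q - s + 1))

# simulate Y_t = eps_t + C_1 eps_{t-1} + C_2 eps_{t-2}
T = 200_000
eps = rng.multivariate_normal(np.zeros(2), Omega, size=T + 2)
Y = eps[2:] + eps[1:-1] @ C[1].T + eps[:-2] @ C[2].T

for s in (0, 1):
    sample = Y[s:].T @ Y[:T - s] / T
    print(np.round(gamma_theory(s, C, Omega), 2), "\n", np.round(sample, 2))
```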
2.3 Invertibility and fundamentalness
• The VMA is invertible if and only if the determinant of C(L) vanishes only outside the unit circle, i.e. if det(C(z)) ≠ 0 for all |z| ≤ 1.
• The VMA is fundamental if and only if det(C(z)) ≠ 0 for all |z| < 1.
In the previous example the process is fundamental if and only if |θ| ≥ 1. In the
case |θ| = 1 the process is fundamental but noninvertible.
• Provided that |θ| > 1 the MA process can be inverted and the shock can be obtained as a combination of present and past values of Yt. In fact
| 1   −L/(θ − L) | | Y1t |   | ε1t |
| 0    1/(θ − L) | | Y2t | = | ε2t |
• Notice that for any noninvertible process with determinant that does not vanish on the unit circle there is an invertible process with identical autocovariance structure. For instance, the scalar MA(1)
yt = ut + mut−1
with |m| > 1 and Var(ut) = σu² has the same autocovariances as the invertible MA(1) with coefficient 1/m and innovation variance m²σu².
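A minimal check of this claim in the scalar case, with assumed values m = 2 and σu² = 1:

```python
import numpy as np

m, sigma2 = 2.0, 1.0

def ma1_autocov(theta, s2):
    """(gamma_0, gamma_1) of y_t = u_t + theta*u_{t-1} with Var(u_t) = s2."""
    return (1 + theta**2) * s2, theta * s2

print(ma1_autocov(m, sigma2))             # (5.0, 2.0)
print(ma1_autocov(1 / m, m**2 * sigma2))  # (5.0, 2.0) -- identical autocovariances
```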
2.4 Wold Decomposition
Yt = C(L)εt + µt     (2)
(2) represents the Wold representation of Yt, which is unique and for which the following properties hold:
(a) εt is the innovation for Yt, i.e. εt = Yt − Proj(Yt|Yt−1, Yt−2, ...).
(b) εt is white noise: Eεt = 0, Eεtετ′ = 0 for t ≠ τ, Eεtεt′ = Ω.
(c) The coefficients are square summable: Σ_{j=0}^∞ ‖Cj‖² < ∞.
(d) C0 = I.
• The result is very powerful since it holds for any covariance stationary process.
• However the theorem does not imply that (2) is the true representation of the process. For instance the process could be stationary but non-linear or non-invertible.
2.5 Other fundamental MA(∞) Representations
Yt = C(L)Rut
= D(L)ut
3 VAR: representations
• To show that this matrix lag polynomial exists and how it maps into the coefficients in C(L), note that by assumption we have the identity
(A0 + A1L + A2L² + ...)(I + C1L + C2L² + ...) = I
After distributing, the identity implies that the coefficients on each non-zero power of the lag operator must be zero, which implies the following recursive solution for the VAR coefficients:
A0 = I
A1 = −A0C1
Ak = −A0Ck − A1Ck−1 − ... − Ak−1C1
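A sketch of this recursion (C1 is an assumed, invertible VMA(1) coefficient matrix): for a VMA(1) the recursion gives Ak = (−C1)^k, which the code verifies.

```python
import numpy as np
from numpy.linalg import matrix_power

C1 = np.array([[0.5, 0.2], [0.1, 0.4]])   # assumed; eigenvalues inside the unit circle
C = [np.eye(2), C1]                        # C_0 = I, C_j = 0 for j > 1
K = 10
Cpad = C + [np.zeros((2, 2))] * K          # pad with zero matrices for j > 1

A = [np.eye(2)]
for k in range(1, K + 1):
    A.append(-sum(A[j] @ Cpad[k - j] for j in range(k)))   # A_k = -(A_0 C_k + ... + A_{k-1} C_1)

print(all(np.allclose(A[k], matrix_power(-C1, k)) for k in range(K + 1)))   # True
```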
• As noted, the VAR is of infinite order (i.e. infinite number of lags required
to fully represent joint density). In practice, the VAR is usually restricted for
estimation by truncating the lag-length.
Note: Here we are considering zero mean processes. In case the mean of Yt is not
zero we should add a constant in the VAR equations.
Yt = AYt−1 + et
• SUR representation The VAR(p) can be stacked as
Y = XΓ + u
Let γ = vec(Γ); then the VAR can be rewritten as
Yt = (In ⊗ Xt′)γ + εt
4 VAR: Stationarity
Yt = µ + AYt−1 + εt
   = µ + A(µ + AYt−2 + εt−1) + εt
   = (I + A)µ + A²Yt−2 + Aεt−1 + εt
   ...
Yt = (I + A + ... + A^(j−1))µ + A^j Yt−j + Σ_{i=0}^{j−1} A^i εt−i
3. the infinite sum Σ_{i=0}^∞ A^i εt−i exists in mean square (see e.g. Proposition C.10L);
4. (I + A + ... + A^(j−1))µ → (I − A)⁻¹µ and A^j → 0 as j goes to infinity.
Therefore if the eigenvalues are smaller than one in modulus then Yt has the following representation
Yt = (I − A)⁻¹µ + Σ_{i=0}^∞ A^i εt−i
• For a VAR(p) the stability condition also requires that all the eigenvalues of A (the AR matrix of the companion form of Yt) are smaller than one in modulus or, equivalently, that all the roots of the determinantal polynomial are larger than one in modulus. Therefore we have that a VAR(p) is called stable if
det(I − A1z − A2z² − ... − Apz^p) ≠ 0 for |z| ≤ 1.
• Notice that the converse is not true. An unstable process can be stationary.
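A sketch of the stability check for a VAR(2) with assumed coefficient matrices: build the companion matrix and verify that all of its eigenvalues are smaller than one in modulus.

```python
import numpy as np

A1 = np.array([[0.5, 0.3], [0.02, 0.8]])   # assumed illustrative coefficients
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
n, p = 2, 2

# companion matrix of the VAR(2): [[A1, A2], [I, 0]]
top = np.hstack([A1, A2])
bottom = np.hstack([np.eye(n * (p - 1)), np.zeros((n * (p - 1), n))])
Acomp = np.vstack([top, bottom])

eig = np.linalg.eigvals(Acomp)
print(np.abs(eig))                         # all moduli < 1 -> stable
print(np.all(np.abs(eig) < 1))
```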
4.2 Back to the Wold representation
Yt = (I − AL)⁻¹et
   = Σ_{j=0}^∞ A^j et−j
   = C(L)et
Example: A stationary VAR(1)
| Y1t |   | 0.5   0.3 | | Y1t−1 |   | ε1t |
| Y2t | = | 0.02  0.8 | | Y2t−1 | + | ε2t |
E(εtεt′) = Ω = | 1    0.3 |,   λ = | 0.81 |
               | 0.3  0.1 |        | 0.48 |
Let us consider the companion form of a stationary (zero mean for simplicity)
VAR(p) defined earlier
Yt = AYt−1 + et (5)
The variance of Yt is given by
Σ = E(YtYt′) = AΣA′ + Ω     (6)
A closed form solution to (6) can be obtained in terms of the vec operator. Let A, B, C be matrices such that the product ABC exists. A property of the vec operator is that
vec(ABC) = (C′ ⊗ A)vec(B)
Applying the vec operator to both sides of (6) we have
vec(Σ) = (A ⊗ A)vec(Σ) + vec(Ω)
If we define 𝒜 = (A ⊗ A) then we have
vec(Σ) = (I − 𝒜)⁻¹vec(Ω)
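A sketch of this closed form solution for a VAR(1) with assumed illustrative A and Ω: solve for vec(Σ) and verify the fixed point Σ = AΣA′ + Ω.

```python
import numpy as np

A = np.array([[0.5, 0.3], [0.02, 0.8]])      # assumed values
Omega = np.array([[1.0, 0.3], [0.3, 1.0]])
n = A.shape[0]

AA = np.kron(A, A)
vecSigma = np.linalg.solve(np.eye(n * n) - AA, Omega.reshape(-1, order="F"))
Sigma = vecSigma.reshape(n, n, order="F")    # vec stacks columns, hence order="F"

print(Sigma)
print(np.allclose(Sigma, A @ Sigma @ A.T + Omega))   # True
```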
The jth autocovariance of Yt (denoted Γj) can be found by postmultiplying (5) by Yt−j′ and taking expectations:
E(YtYt−j′) = A E(Yt−1Yt−j′) + E(etYt−j′)
Thus, for j ≥ 1,
Γj = AΓj−1
or
Γj = A^j Γ0
The variance Σ and the jth autocovariance Γj of the original series Yt are given by the first n rows and columns of Σ and Γj respectively.
6 VAR: specification
• Specification of the VAR is key for empirical analysis. We have to decide about
the following:
1. Number of lags p.
2. Which variables.
3. Type of transformations.
6.1 Number of lags
As in the univariate case, care must be taken to account for all systematic dy-
namics in multivariate models. In VAR models, this is usually done by choosing
a sufficient number of lags to ensure that the residuals in each of the equations
are white noise.
• AIC: Akaike information criterion. Choose the p that minimizes
AIC(p) = T ln |Ω̂| + 2(n²p)
• AIC overestimates the true order with positive probability and underestimates the true order with zero probability.
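A sketch of AIC-based lag selection on simulated data (the coefficients, sample size, and the simple equation-by-equation OLS estimator are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 2, 500
A1, A2 = np.array([[0.4, 0.2], [0.1, 0.3]]), np.array([[0.2, 0.0], [0.0, 0.2]])
Y = np.zeros((T, n))
for t in range(2, T):
    Y[t] = A1 @ Y[t - 1] + A2 @ Y[t - 2] + rng.standard_normal(n)   # true order p = 2

def aic(p, Y):
    Teff = T - p
    X = np.hstack([Y[p - j - 1:T - j - 1] for j in range(p)])        # lags 1..p
    X = np.hstack([np.ones((Teff, 1)), X])                           # constant
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)                    # OLS
    U = Y[p:] - X @ B
    Omega_hat = U.T @ U / Teff
    return Teff * np.log(np.linalg.det(Omega_hat)) + 2 * n**2 * p

print({p: round(aic(p, Y), 1) for p in range(1, 6)})                 # typically minimized at p = 2
```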
6.2 Type of variables
VAR models are small-scale models, so usually 2 to 8 variables are used.
6.3 Type of transformations
• Problem: many economic time series display a trend over time and are clearly nonstationary (the mean is not constant).
• Trend-stationary series
Yt = µ + bt + εt, εt ∼ W N.
• Difference-stationary series
Yt = µ + Yt−1 + εt, εt ∼ W N.
Figure 1: blue: log(GDP), green: log(CPI)
• These series can be thought of as generated by some nonstationary process. Here are some examples.
Example: Trend stationary
| Y1t |   |  0   |     | 0.5   0.3 | | Y1t−1 |   | ε1t |
| Y2t | = | 0.01 | t + | 0.02  0.8 | | Y2t−1 | + | ε2t |
Ω = | 1    0.3 |,   λ = | 0.81 |
    | 0.3  0.1 |        | 0.48 |
So:
• Dickey-Fuller test: In 1979, Dickey and Fuller proposed the following test for stationarity.
1. Estimate with OLS the following equation
∆xt = b + γxt−1 + εt
Under the null of a unit root, γ = 0 and
xt = b + xt−1 + εt
Under the alternative of stationarity, γ < 0 and
xt = b + axt−1 + εt
with a = 1 + γ < 1.
• An alternative is to specify the equation augmented by a deterministic trend
∆xt = b + γxt−1 + ct + εt
With this specification, under the alternative the process is stationary with a deterministic linear trend.
• In the augmented version of the test, lags of ∆xt are included to account for serial correlation in the residuals:
A(L)∆xt = b + γxt−1 + εt
or
A(L)∆xt = b + γxt−1 + ct + εt
• If the test statistic is smaller than the (negative) critical value, then the null hypothesis of a unit root is rejected.
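A sketch of the Dickey-Fuller regression on a simulated random walk with drift (the series, sample size, and seed are assumptions; the resulting t-statistic must be compared with Dickey-Fuller critical values, not standard normal ones):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
x = np.cumsum(0.1 + rng.standard_normal(T))   # random walk with drift (null is true)

dx = np.diff(x)                                # delta x_t
X = np.column_stack([np.ones(T - 1), x[:-1]])  # constant and x_{t-1}
beta, *_ = np.linalg.lstsq(X, dx, rcond=None)  # OLS of dx on (1, x_{t-1})
resid = dx - X @ beta
s2 = resid @ resid / (T - 1 - 2)
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_gamma = beta[1] / se
print(t_gamma)     # compare with DF critical values (about -2.86 at 5% with a constant)
```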
• Transformations I: first differences
Let ∆ = 1 − L be the first difference filter, i.e. a filter such that ∆Yt = Yt − Yt−1, and let us consider the simple case of a random walk with drift
Yt = µ + Yt−1 + εt
Taking first differences yields
∆Yt = µ + εt
which is stationary. However, if the process is trend stationary,
Yt = µ + δt + εt
then
∆Yt = δ + ∆εt
which is a stationary process but is not invertible because it contains a unit root in the MA part.
• log(GDP) and log(CPI) in first differences
• Transformations II: removing deterministic trends
Removing a deterministic trend (linear or quadratic) from a trend stationary variable is fine. However this is not enough if the process is a unit root with drift. To see this consider again the process
Yt = µ + Yt−1 + εt
Substituting backward, Yt = Y0 + µt + Σ_{i=1}^t εi: removing the deterministic trend µt still leaves the stochastic trend Σ_{i=1}^t εi, which is nonstationary.
• log(GDP) and log(CPI) linearly detrended
• Transformations of trending variables: Hodrick-Prescott filter
The filter separates the trend from the cyclical component of a scalar time series.
Suppose yt = gt + ct, where gt is the trend component and ct is the cycle. The
trend is obtained by solving the following minimization problem
min_{ {gt} } Σ_{t=1}^T ct² + λ Σ_{t=2}^{T−1} [(gt+1 − gt) − (gt − gt−1)]²
The parameter λ is a positive number (usually λ = 1600 for quarterly data) which penalizes variability in the growth (trend) component, while the first term penalizes the cyclical component. The larger λ, the smoother the trend component.
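A sketch of a direct implementation of this minimization (a small-T dense-matrix version; the simulated series is an arbitrary illustration): the first order conditions give g = (I + λK′K)⁻¹y, where K is the (T−2) × T second-difference matrix.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    T = len(y)
    K = np.zeros((T - 2, T))
    for t in range(T - 2):
        K[t, t:t + 3] = [1.0, -2.0, 1.0]      # (g_{t+2} - g_{t+1}) - (g_{t+1} - g_t)
    trend = np.linalg.solve(np.eye(T) + lam * K.T @ K, y)
    return trend, y - trend                   # trend and cycle

# usage on a simulated series
rng = np.random.default_rng(5)
y = np.cumsum(0.05 + 0.1 * rng.standard_normal(200)) + rng.standard_normal(200)
g, c = hp_filter(y, lam=1600.0)
```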