
RS – Lecture 12

Lecture 12
Heteroscedasticity

Heteroscedasticity
• Assumption (A3) is violated in a particular way: ε has unequal
variances, but εi and εj are still not correlated with each other. Some
observations (lower variance) are more informative than others
(higher variance).

[Figure: conditional densities f(y|x) around the regression line E(y|x) = β0 + β1x at x1, x2, x3; the spread of y increases with x.]


Heteroscedasticity
• Now, we have the CLM regression with hetero- (different) scedastic
(variance) disturbances.
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3’) Var[εi] = σ² ωi, ωi > 0. (CLM ⇒ ωi = 1, for all i.)
(A4) X has full column rank – rank(X) = k, where T ≥ k.

• Popular normalization: (1/T) Σi ωi = 1. (A scaling, absorbed into σ².)

• A characterization of the heteroscedasticity: Well-defined estimators and
methods for testing hypotheses will be obtainable if the heteroscedasticity
is “well behaved,” in the sense that
ωi / Σi ωi → 0 as T → ∞  –i.e., no single observation becomes dominant.
(1/T) Σi ωi → some stable constant. (Not a plim!)

GR Model and Testing

• Implications for conventional OLS and hypothesis testing:
1. b is still unbiased.
2. Consistent? We need the more general proof. Not difficult.
3. If plim b = β, then plim s² = σ² (with the normalization).
4. Under the usual assumptions, we have asymptotic normality.

• Two main problems with OLS estimation under heteroscedasticity:
(1) The usual standard errors are not correct. (They are biased!)
(2) OLS is not BLUE.

• Since the standard errors are biased, we cannot use the usual
t-statistics, F-statistics, or LM statistics for drawing inferences. This
is a serious issue.


Heteroscedasticity: Inference Based on OLS


• Q: But, what happens if we still use s²(X′X)⁻¹?
A: It depends on X′ΩX – X′X. If they are nearly the same, the OLS
covariance matrix will give OK inferences.

But, when will X′ΩX – X′X be nearly the same? The answer is based
on a property of weighted averages. Suppose ωi is randomly drawn
from a distribution with E[ωi] = 1. Then,
(1/T) Σi ωi xi² →p E[x²]  – just like (1/T) Σi xi².

• Remark: For heteroscedasticity to be a significant issue for
estimation and inference by OLS, the weights ωi must be correlated with
xi and/or xi². The higher the correlation, the more important
heteroscedasticity becomes (b is more inefficient).

Finding Heteroscedasticity
• There are several theoretical reasons why the ωi may be related to x
and/or xi²:
1. Following error-learning models, as people learn, their errors of
behavior become smaller over time. Then, σi² is expected to decrease.
2. As data collecting techniques improve, σi² is likely to decrease.
Companies with sophisticated data processing techniques are likely to
commit fewer errors in forecasting customers’ orders.
3. As incomes grow, people have more discretionary income and, thus,
more choice about how to spend their income. Hence, σi² is likely to
increase with income.
4. Similarly, companies with larger profits are expected to show
greater variability in their dividend/buyback policies than companies
with lower profits.


Finding Heteroscedasticity
• Heteroscedasticity can also be the result of model misspecification.
• It can arise as a result of the presence of outliers (either very small or
very large). The inclusion/exclusion of an outlier, especially if T is
small, can affect the results of regressions.
• Violations of (A1) –i.e., the model is not correctly specified– can
produce heteroscedasticity, due to omitted variables from the model.
• Skewness in the distribution of one or more regressors included in
the model can induce heteroscedasticity. Examples are economic
variables such as income, wealth, and education.
• David Hendry notes that heteroscedasticity can also arise because of:
– (1) incorrect data transformation (e.g., ratio or first difference
transformations).
– (2) incorrect functional form (e.g., linear vs log-linear models).

Finding Heteroscedasticity
• Heteroscedasticity is usually modeled using one of the following
specifications:
- H1: σt² is a function of past εt² and past σt² (GARCH model).
- H2: σt² increases monotonically with one (or several) exogenous
variable(s) (x1, . . . , xT).
- H3: σt² increases monotonically with E(yt).
- H4: σt² is the same within p subsets of the data but differs across the
subsets (grouped heteroscedasticity). This specification allows for structural
breaks.

• These are the usual alternative hypotheses in heteroscedasticity tests.


Finding Heteroscedasticity
• Visual test
A plot of residuals against the dependent variable (or another variable)
will often produce a fan shape.

[Figure: residual plot showing a fan shape – the dispersion of the residuals increases along the horizontal axis.]

Testing for Heteroscedasticity


• Usual strategy when heteroscedasticity is suspected: Use OLS along
with the White estimator. This will give us consistent inferences.

• Q: Why do we want to test for heteroscedasticity?


A: OLS is no longer efficient. There is an estimator with lower
asymptotic variance (the GLS/FGLS estimator).

• We want to test: H0: E(ε²|x1, x2,…, xk) = E(ε²) = σ²

• The key is whether E[εi²] = σi² is related to x and/or xi². Suppose
we suspect a particular independent variable, say X1, is driving ωi.

• Then, a simple test: Check the RSS for large values of X1, and the
RSS for small values of X1. This is the Goldfeld-Quandt test.


Testing for Heteroscedasticity


• The Goldfeld-Quandt test
- Step 1. Arrange the data from small to large values of the
independent variable suspected of causing heteroscedasticity, Xj.

- Step 2. Run two separate regressions, one for small values of Xj and
one for large values of Xj, omitting d middle observations (≈ 20%).
Get the RSS for each regression: RSS1 for small values of Xj and
RSS2 for large Xj’s.

- Step 3. Calculate the F ratio:
GQ = RSS2/RSS1 ~ Fdf,df, with df = [(T – d) – 2(k+1)]/2, assuming (A5) holds.

If (A5) does not hold, we rely on asymptotic theory. Then,
GQ is asymptotically χ².

Testing for Heteroscedasticity


• The Goldfeld-Quandt test
Note: When we suspect more than one variable is driving the ωi’s,
this test is not very useful.

• But, the GQ test is a popular test for structural breaks (two
regimes) in variance. For these tests, we rewrite Step 3 to allow for a
different sample size in the sub-samples 1 and 2.

- Step 3. Calculate the F-test ratio:
GQ = [RSS2/(T2 – k)]/[RSS1/(T1 – k)]
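
A minimal R sketch of these steps, assuming y and x are already sorted by the suspect variable (the function name gq_stat and the 20% trimming fraction are illustrative, not from the lecture):

gq_stat <- function(y, x, frac_drop = 0.20) {
  T <- length(y)
  d  <- floor(frac_drop * T)          # middle observations to omit
  T1 <- floor((T - d) / 2)            # "small x" sub-sample size
  T2 <- T - T1 - d                    # "large x" sub-sample size
  k  <- 2                             # parameters in y ~ x (intercept + slope)
  rss1 <- sum(resid(lm(y[1:T1] ~ x[1:T1]))^2)
  rss2 <- sum(resid(lm(y[(T1 + d + 1):T] ~ x[(T1 + d + 1):T]))^2)
  (rss2 / (T2 - k)) / (rss1 / (T1 - k))   # compare to F(T2 - k, T1 - k)
}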


Testing for Heteroscedasticity: GQ Test


Example: We test if the 3-factor FF model for IBM and GE returns
shows heteroscedasticity with a GQ test, using gqtest in package lmtest.
• IBM returns
> library(lmtest)
> gqtest(ibm_x ~ Mkt_RF + SMB + HML, fraction = .20)
Goldfeld-Quandt test

data: ibm_x ~ Mkt_RF + SMB + HML


GQ = 1.1006, df1 = 224, df2 = 223, p-value = 0.2371 ⇒ cannot reject H0 at 5% level.
alternative hypothesis: variance increases from segment 1 to 2

• GE returns
gqtest(ge_x ~ Mkt_RF + SMB + HML, fraction = .20)
Goldfeld-Quandt test

data: ge_x ~ Mkt_RF + SMB + HML


GQ = 2.744, df1 = 281, df2 = 281, p-value < 2.2e-16 ⇒ reject H0 at 5% level.
alternative hypothesis: variance increases from segment 1 to 2

Testing for Heteroscedasticity: LR Test


• The Likelihood Ratio Test
Let’s define the likelihood function, assuming normality, for a
general case, where we have g different variances:

ln L = –(T/2) ln(2π) – ½ Σi=1..g Ti ln(σi²) – Σi=1..g [1/(2σi²)] (yi – Xiβ)′(yi – Xiβ)

We have two models:
(R) Restricted under H0: σi² = σ². From this model, we calculate ln LR:
ln LR = –(T/2) [ln(2π) + 1] – (T/2) ln(σ̂²)
(U) Unrestricted. From this model, we calculate the log likelihood:
ln LU = –(T/2) [ln(2π) + 1] – ½ Σi=1..g Ti ln(σ̂i²);  σ̂i² = (1/Ti) (yi – Xib)′(yi – Xib)

7
RS – Lecture 12

Testing for Heteroscedasticity: LR Test


• Now, we can estimate the Likelihood Ratio (LR) test:

LR = 2 (ln LU – ln LR) = T ln(σ̂²) – Σi=1..g Ti ln(σ̂i²) →d χ²(g–1)

Under the usual regularity conditions, LR is approximated by a χ²(g–1).

• Using specific functions for σi², this test has been used by
Rutemiller and Bowers (1968) and in Harvey’s (1976) groupwise
heteroscedasticity paper.

Testing for Heteroscedasticity


• Score LM tests
• We want to develop tests of H0: E(ε²|x1, x2,…, xk) = σ² against an
H1 with a general functional form.

• Recall the central issue is whether E[εi²] = σi² is related to x
and/or xi². Then, a simple strategy is to use OLS residuals to estimate
the disturbances and look for relationships between ei² and xi and/or xi².

• Suppose that the relationship between ε² and X is linear:
ε² = Xα + v
Then, we test: H0: α = 0 against H1: α ≠ 0.

• We can base the test on how the squared OLS residuals ei² correlate
with X.


Testing for Heteroscedasticity


• Popular heteroscedasticity LM tests:
- Breusch and Pagan (1979)’s LM test (BP).
- White (1980)’s general test.

• Both tests are based on OLS residuals. That is, calculated under H0:
No heteroscedasticity.

• The BP test is an LM test, based on the score of the log likelihood
function, calculated under normality. It is a general test designed to
detect any linear form of heteroskedasticity.

• The White test is an asymptotic Wald-type test; normality is not
needed. It allows for nonlinearities by using squares and
cross-products of all the x’s in the auxiliary regression.

Testing for Heteroscedasticity: BP Test


• Let’s start with a general form of heteroscedasticity:
σi² = h(α0 + zi,1 α1 + .... + zi,m αm) = h(zi′α)
• We want to test: H0: E(εi²|z1, z2,…, zk) = h(zi′α) = σ²
or H0: α1 = α2 = ... = αm = 0 (m restrictions)
• Assume normality. That is, the log likelihood function is:
log L = constant – ½ Σ log σi² – ½ Σ εi²/σi²
Then, construct an LM test:
LM = S(θR)′ I(θR)⁻¹ S(θR),  θ = (β, α)
S(θ) = ∂log L/∂θ′ = [Σi xi εi σi⁻² ; –½ Σi σi⁻² (∂h/∂α) zi + ½ Σi σi⁻⁴ εi² (∂h/∂α) zi]
I(θ) = E[–∂²log L/∂θ∂θ′]

• We have block diagonality, so we can rewrite the LM test under H0:
LM = S(α0, 0)′ [I22 – I21 I11⁻¹ I21′]⁻¹ S(α0, 0)


Testing for Heteroscedasticity: BP Test


• We have block diagonality, so we can rewrite the LM test under H0:
LM = S(α0, 0)′ [I22 – I21 I11⁻¹ I21′]⁻¹ S(α0, 0)
S(α0, 0) = –½ Σi (∂h/∂α|H0) zi σR⁻² + ½ Σi σR⁻⁴ ei² (∂h/∂α|H0) zi
= ½ σR⁻² (∂h/∂α|H0) Σi zi (ei²/σR² – 1)
= ½ σR⁻² (∂h/∂α|H0) Σi zi ωi,  ωi = ei²/σR² – 1 = gi – 1
I22(α0, 0) = E[–∂²log L/∂α∂α′] = ½ [σR⁻² (∂h/∂α|H0)]² Σi zi zi′
I21(α0, 0) = 0
σR² = (1/T) Σi ei² (MLE of σ² under H0).
Then,
LM = ½ (Σi zi ωi)′ [Σi zi zi′]⁻¹ (Σi zi ωi) = ½ W′Z (Z′Z)⁻¹ Z′W ~ χ²m,
where W is the vector of the ωi’s.

Note: Recall R² = [y′X (X′X)⁻¹ X′y – T ȳ²]/[y′y – T ȳ²] = ESS/TSS.
Also note that under H0: E[ωi] = 0, E[ωi²] = 1.

Testing for Heteroscedasticity: BP Test


• LM = ½ W′Z (Z′Z)⁻¹ Z′W = ½ ESS
ESS = Explained SS in the regression of ωi (= ei²/σR² – 1) against zi.

• Under the usual regularity conditions, and under H0,
√T (α̂ML – α) →d N(0, 2σ⁴ (Z′Z/T)⁻¹)
Then,
LM-BP = (2 σR⁴)⁻¹ ESSe →d χ²m
ESSe = ESS in the regression of ei² (= gi σR²) against zi.
Since σR⁴ →p σ⁴, LM-BP →d χ²m.

Note: Recall R² = [y′X (X′X)⁻¹ X′y – T ȳ²]/[y′y – T ȳ²].

Under H0: E[ωi] = 0, E[ωi²] = 1, the LM test is equivalent to a T R² test.
(Think of ȳ ≈ 0 & y′y/T = 1 above.)


Testing for Heteroscedasticity: BP Test


• Variations:
(1) Glejser (1969) test. Use absolute values instead of ei² to estimate
the varying second moment. Following our previous example:
|ei| = α0 + zi,1 α1 + .... + zi,m αm + vi

(2) Harvey-Godfrey (1978) test. Use ln(ei²). Then, the implied model
for σi² is an exponential model:
ln(ei²) = α0 + zi,1 α1 + .... + zi,m αm + vi

Note: The implied model is σi² = exp{α0 + zi,1 α1 + .... + zi,m αm + vi}.

Testing for Heteroscedasticity: BP Test


• Variations:
(3) Koenker’s (1981) studentized LM test. A usual problem with the
LM statistic is that it crucially depends on the assumption that ε is
normal. Koenker (1981) proposed studentizing the LM-BP statistic:
LM-S = (2 σR⁴) LM-BP/[Σ (ei² – σR²)²/T] →d χ²m

The studentized version of the test is asymptotically equivalent to a
T*R² test, where R² is calculated from a regression of ei²/σR² on the
variables Z. (Omitting σR² from the denominator is OK.)


Testing for Heteroscedasticity: BP Test


• We have the following steps:
- Step 1. Run OLS on the DGP:
y = Xβ + ε.  – Keep ei and compute σ̂R² = RSS/T.

- Step 2. (Auxiliary Regression). Run the regression of ei² on the m
explanatory variables, z. In our example,
ei² = α0 + zi,1 α1 + .... + zi,m αm + vi  – Keep R².
- Step 3. Use the R² from Step 2, say Re². Calculate
LM = T * Re² →d χ²m.

Testing for Heteroscedasticity: Example – IBM


Example: We suspect that squared Mkt_RF (x1) –a measure of the
overall market’s variance– drives heteroscedasticity. We do a
studentized LM-BP test for IBM in the 3-factor FF model:

fit_ibm_ff3 <- lm(ibm_x ~ Mkt_RF + SMB + HML) # Step 1 – OLS in DGP (3-factor FF model)
e_ibm <- fit_ibm_ff3$residuals # Step 1 – keep residuals
e_ibm2 <- e_ibm^2 # Step 1 – squared residuals
Mkt_RF2 <- Mkt_RF^2
fit_BP <- lm(e_ibm2 ~ Mkt_RF2) # Step 2 – Auxiliary regression
Re_2 <- summary(fit_BP)$r.squared # Step 2 – keep R^2
LM_BP_test <- Re_2 * T # Step 3 – Compute LM-BP test: R^2 * T
> LM_BP_test
[1] 0.25038
> p_val <- 1 - pchisq(LM_BP_test, df = 1) # p-value of LM_test
> p_val
[1] 0.6168019

LM-BP Test: 0.25038 ⇒ cannot reject H0 at 5% level (χ²[1],.05 ≈ 3.84),
with a p-value = .6168.


Testing for Heteroscedasticity: Example – IBM


Example (continuation): The bptest in the lmtest package performs a
studentized LM-BP test using the same variables as in the model
(Mkt, SMB and HML). For IBM in the 3-factor FF model:

> bptest(ibm_x ~ Mkt_RF + SMB + HML) # bptest uses the model's regressors as the z variables

studentized Breusch-Pagan test

data: ibm_x ~ Mkt_RF + SMB + HML
BP = 4.1385, df = 3, p-value = 0.2469

LM-BP Test: 4.1385 ⇒ cannot reject H0 at 5% level (χ²[3],.05 ≈ 7.815),
with a p-value = 0.2469.

Note: Heteroscedasticity in financial time series is very common. In
general, it is driven by squared market returns or squared past errors.

Testing for Heteroscedasticity: Example – DIS


Example: We suspect that squared market returns drive
heteroscedasticity. We do an LM-BP (studentized) test for Disney:
lr_dis <- log(x_dis[-1]/x_dis[-T]) # Log returns for DIS
dis_x <- lr_dis - RF # Disney excess returns
fit_dis_ff3 <- lm(dis_x ~ Mkt_RF + SMB + HML) # Step 1 – OLS in DGP (3-factor FF model)
e_dis <- fit_dis_ff3$residuals # Step 1 – keep residuals
e_dis2 <- e_dis^2 # Step 1 – squared residuals
fit_BP <- lm(e_dis2 ~ Mkt_RF2) # Step 2 – Auxiliary regression
Re_e2 <- summary(fit_BP)$r.squared # Step 2 – Keep R^2 from Auxiliary reg
LM_BP_test <- Re_e2 * T # Step 3 – Compute LM Test: R^2 * T
> LM_BP_test
[1] 14.15224
> p_val <- 1 - pchisq(LM_BP_test, df = 1) # p-value of LM_test
> p_val
[1] 0.0001685967

LM-BP Test: 14.15 ⇒ reject H0 at 5% level (χ²[1],.05 ≈ 3.84), with a
p-value = .0001.


Testing for Heteroscedasticity: Example – DIS


Example (continuation): We do the same test, but with SMB
squared, for Disney:
SMB2 <- SMB^2
fit_BP <- lm(e_dis2 ~ SMB2)
Re_e2 <- summary(fit_BP)$r.squared
LM_BP_test <- Re_e2 * T
> LM_BP_test
[1] 7.564692
> p_val <- 1 - pchisq(LM_BP_test, df = 1) # p-value of LM_test
> p_val
[1] 0.005952284

LM-BP Test: 7.56 ⇒ reject H0 at 5% level (χ²[1],.05 ≈ 3.84), with a
p-value = .006.


Testing for Heteroscedasticity: White Test


• Based on the difference between the OLS and the true OLS variances:
σ² (X′ΩX – X′X) = X′ΣX – σ² X′X = Σi (E[εi²] – σ²) xi xi′
• Empirical counterpart: (1/T) Σi (ei² – s²) xi xi′
• We can express each element of this k(k+1)/2 matrix as:
(1/T) Σi (ei² – s²) ψi,  ψi: Kolmogorov-Gabor polynomial
ψi = (ψ1i, ψ2i, ..., ψmi)′,  ψli = ψqi ψpi, p ≥ q, p, q = 1, 2, ..., k,
l = 1, 2, ..., m,  m = k(k+1)/2
• White heteroscedasticity test:
W = [(1/T) Σi (ei² – s²) ψi]′ DT⁻¹ [(1/T) Σi (ei² – s²) ψi] →d χ²m
where
DT = Var[(1/T) Σi (ei² – s²) ψi]
Note: W is asymptotically equivalent to a T R² test, where R² is
calculated from a regression of ei² on the ψi’s.


Testing for Heteroscedasticity: White Test


• Usual calculation of the White test
– Step 1. Run OLS on the DGP:
y = Xβ + ε.  – Keep ei.

– Step 2. (Auxiliary Regression). Regress ei² on all the explanatory
variables (Xj), their squares (Xj²), and all their cross products. For
example, when the model contains k = 2 explanatory variables, the
test is based on:
ei² = β0 + β1 x1,i + β2 x2,i + β3 x1,i² + β4 x2,i² + β5 x1,i x2,i + vi
Let m be the number of regressors in the auxiliary regression. Keep
its R², say Re².

– Step 3. Compute the LM statistic:
LM = T * Re² →d χ²m.

Testing for Heteroscedasticity: White Test


Example: White Test for the 3-factor FF model residuals for IBM:
HML2 <- HML^2
Mkt_HML <- Mkt_RF*HML
Mkt_SMB <- Mkt_RF*SMB
SMB_HML <- SMB*HML
xx2 <- cbind(Mkt_RF2, SMB2, HML2, Mkt_HML, Mkt_SMB, SMB_HML)
fit_ibm_W <- lm(e_ibm2 ~ xx2) # Not including the original variables is OK
r2_e2 <- summary(fit_ibm_W)$r.squared # Keep R^2 from Auxiliary regression
> r2_e2
[1] 0.0166492
lm_t <- T * r2_e2 # Compute LM test: R^2 * sample size (T)
> lm_t
[1] 10.93483
df_lm <- ncol(xx2)
qchisq(.95, df = df_lm)

LM-White Test: 10.93 ⇒ cannot reject H0 at 5% level (χ²[6],.05 ≈ 12.59).


Testing for Heteroscedasticity: White Test


Example (continuation): Now, we do a White Test for the 3-factor
FF model for DIS and GE returns.
• For DIS, we get:
fit_dis_W <- lm(e_dis2 ~ xx2)
Re_2W <- summary(fit_dis_W)$r.squared
LM_W_test <- Re_2W * T
> LM_W_test
[1] 25.00148 ⇒ reject H0 at 5% level (χ²[6],.05 ≈ 12.59).
> qchisq(.95, df = df_lm)
[1] 12.59159
> p_val <- 1 - pchisq(LM_W_test, df = 6) # p-value of LM_test
> p_val
[1] 0.0003412389

• For GE, we get:
LM-White Test: 20.15 (p-value = 0.0026) ⇒ reject H0 at 5% level.

Testing for Heteroscedasticity: White Test


Example: We do a White Test for the residuals in the encompassing
(IFE + PPP) model for changes in the USD/GBP (T = 363):
fit_gbp <- lm(lr_usdgbp ~ inf_dif + int_dif)
e_gbp <- fit_gbp$residuals
e_gbp2 <- e_gbp^2
int_dif2 <- int_dif^2
inf_dif2 <- inf_dif^2
int_inf_dif <- int_dif*inf_dif

fit_W <- lm(e_gbp2 ~ int_dif2 + inf_dif2 + int_inf_dif)
Re_e2W <- summary(fit_W)$r.squared
LM_W_test <- Re_e2W * T
p_val <- 1 - pchisq(LM_W_test, df = 3) # p-value of LM_test

> LM_W_test
[1] 15.46692
> p_val
[1] 0.001458139 ⇒ reject H0 at 5% level


Testing for Heteroscedasticity: Remarks


• Drawbacks of the Breusch-Pagan test:
- It has been shown to be sensitive to violations of the normality
assumption.
- Three other popular LM tests –the Glejser test, the Harvey-Godfrey
test, and the Park test– are also sensitive to such violations.

• Drawbacks of the White test:
- If a model has several regressors, the test can consume a lot of df’s.
- In cases where the White test statistic is statistically significant, the
cause may be model specification errors rather than heteroscedasticity.
- It is general. It does not give us a clue about how to model
heteroscedasticity to do FGLS. The BP test points us in a direction.

Testing for Heteroscedasticity: Remarks


• Drawbacks of the White test (continuation):
- In simulations, it does not perform well relative to others, especially
for time-varying heteroscedasticity, typical of financial time series.
- The White test does not depend on normality; but Koenker’s test
is also not very sensitive to normality. In simulations, Koenker’s test
seems to have more power – see Lyon and Tsai (1996) for a Monte
Carlo study of the heteroscedasticity tests presented here.

17
RS – Lecture 12

Testing for Heteroscedasticity: Remarks


• General problems with heteroscedasticity tests:
- The tests rely on the first four assumptions of the CLM being true.
- In particular, (A2) violations. That is, if the zero conditional mean
assumption fails, then a test for heteroskedasticity may reject the null
hypothesis even if Var(y|X) is constant.
- This is true if our functional form is specified incorrectly (omitted
variables, or specifying a log instead of a level). Recall David Hendry’s
comment.

• Knowing the true source (functional form) of heteroscedasticity
may be difficult. A practical solution is to avoid modeling
heteroscedasticity altogether and use OLS along with White
heteroskedasticity-robust standard errors.

Estimation: WLS form of GLS


• While it is always possible to estimate robust standard errors for
OLS estimates, if we know the specific form of the heteroskedasticity,
we can obtain more efficient estimates than OLS: GLS.

• GLS basic idea: Efficient estimation through transforming the
model into one that has homoskedastic errors – called WLS.

• Suppose the heteroskedasticity can be modeled as:
Var(ε|x) = σ² h(x)

• The key is to figure out what h(x) looks like. Suppose that we know
hi. For example, hi(x) = xi². (Make sure hi is always positive.)

• Then, use 1/√(xi²) to transform the model.

18
RS – Lecture 12

Estimation: WLS form of GLS


• Suppose that we know hi(x) = xi². Then, use 1/√(xi²) to transform
the model:
Var(εi/√hi | x) = σ²

• Thus, if we divide our whole equation by √hi, we get a (transformed)
model where the error is homoskedastic.

• When the weights must be estimated, we have a two-step GLS
estimation:
- Step 1: Use OLS, then use the residuals to estimate the weights.
- Step 2: Weighted least squares using the estimated weights.

• Greene has a proof, based on our asymptotic theory, for the
asymptotic equivalence of the second step to true GLS.
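
A minimal WLS sketch in R, on simulated data and assuming h(x) = x² is known (the lm() weights argument expects 1/hi):

# WLS when h_i = x_i^2 is known (simulated data for illustration)
set.seed(42)
T <- 500
x <- runif(T, 1, 10)
y <- 1 + 0.5 * x + rnorm(T, sd = sqrt(2) * x)   # Var(eps|x) = 2 * x^2
fit_ols <- lm(y ~ x)                     # unbiased, but inefficient
fit_wls <- lm(y ~ x, weights = 1 / x^2)  # transformed model: homoskedastic errors
summary(fit_wls)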

Estimation: FGLS
• More typical is the situation where we do not know the form of the
heteroskedasticity. In this case, we need to estimate h(xi).

• Typically, we start by assuming a fairly flexible model, such as:
Var(ε|x) = σ² h(x) = σ² exp(Xδ)  – make sure Var(εi|x) > 0.

But, we don’t know δ; it must be estimated. By our assumptions:
ε² = σ² exp(Xδ) v, with E(v|X) = 1.
Then, if E(v) = 1:
ln(ε²) = α0 + Xδ + u  (*)
where E(u) = 0 and u is independent of X.

We know that e is an estimate of ε, so we can estimate (*) by OLS.

19
RS – Lecture 12

Estimation: FGLS
• Now, an estimate of h is obtained as ĥ = exp(ĝ), and the inverse of
this is our weight. Now, we can do GLS as usual.

• Summary of FGLS
(1) Run the original OLS model, save the residuals, e. Get ln(e2).
(2) Regress ln(e2) on all of the independent variables. Get fitted
values, ĝ.
(3) Do WLS using 1/sqrt[exp(ĝ)] as the weight.
(4) Iterate to gain efficiency.

• Remark: We are using WLS just for efficiency –OLS is still unbiased
and consistent. Sandwich estimator gives us consistent inferences.
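
A hedged R sketch of steps (1)-(3), reusing the simulated DGP from the WLS sketch above:

# FGLS: estimate h(x) = exp(g(x)) from the OLS residuals, then do WLS
set.seed(42)
T <- 500
x <- runif(T, 1, 10)
y <- 1 + 0.5 * x + rnorm(T, sd = sqrt(2) * x)
e <- resid(lm(y ~ x))                            # (1) OLS residuals
g_hat <- fitted(lm(log(e^2) ~ x))                # (2) regress ln(e^2) on the regressors
fit_fgls <- lm(y ~ x, weights = 1 / exp(g_hat))  # (3) WLS with weight 1/h_hat
summary(fit_fgls)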

Estimation: MLE
• ML estimates all the parameters simultaneously. To construct the
likelihood, we assume a distribution for ε. Under normality (A5):

ln L = –(T/2) ln(2π) – ½ Σi=1..T ln(σi²) – ½ Σi=1..T (yi – xi′β)²/σi²

• Suppose σi² = exp(α0 + zi,1 α1 + .... + zi,m αm) = exp(zi′α)

• Then, the first derivatives of the log likelihood wrt θ = (β, α) are:

∂ln L/∂β = Σi xi εi/σi² = X′Σ⁻¹ε

∂ln L/∂α = –½ Σi (1/σi²) exp(zi′α) zi + ½ Σi (εi²/σi⁴) exp(zi′α) zi
         = ½ Σi zi (εi²/σi² – 1)

• Then, we get the f.o.c. We get a non-linear system of equations.


Estimation: MLE
• We take second derivatives to calculate the information matrix:

–∂²ln L/∂β∂β′ = Σi xi xi′/σi² = X′Σ⁻¹X

–∂²ln L/∂β∂α′ = ½ Σi xi zi′ εi/σi²

–∂²ln L/∂α∂α′ = ½ Σi zi zi′ εi²/σi²

• Then,
I(θ) = E[–∂²ln L/∂θ∂θ′] = [ X′Σ⁻¹X    0
                            0         ½ Z′Z ]

• We can estimate the model using Newton’s method:
θj+1 = θj – Hj⁻¹ gj,  gj = ∂log Lj/∂θ′

Estimation: MLE
• We estimate the model using Newton’s method:
θj+1 = θj – Hj⁻¹ gj,  gj = ∂log Lj/∂θ′

Since Hj is block diagonal:
βj+1 = βj – (X′Σj⁻¹X)⁻¹ X′Σj⁻¹ εj
αj+1 = αj – (½ Z′Z)⁻¹ [½ Σi zi (εi²/σi² – 1)] = αj – (Z′Z)⁻¹ Z′v,
where
vi = (εi²/σi² – 1)
Convergence will be achieved when gj = ∂log Lj/∂θ′ is close to zero.

• We have an iterative algorithm ⇒ Iterative FGLS = MLE!


Heteroscedasticity: Log Transformations


• A log transformation of the data, can eliminte (or reduce) a certain
type of heteroskedasticity.
- Assume - t = E[Zt]
- Var[Zt] = δ t2 (Variance proportional to the
squared mean)
• We log transformed the data: log(Zt). Then, we use the delta
method to approximate the variance of the transformed variable.
Recall: Var[f(X)] using delta method:
Var[ f ( X )]  f ' ( ) 2 Var[ X ]

• Then, the variance of log(Zt) is roughly constant:


Var [log( Z t )]  (1 /  t ) 2 Var [ Z t ]  
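
A quick simulation check of this result (the DGP below, with sd proportional to the mean, is an assumed illustration):

# Var[Z_t] = (0.2 mu_t)^2, so Var[log(Z_t)] should be roughly 0.04 everywhere
set.seed(7)
mu <- seq(10, 100, length.out = 10000)
Z  <- rnorm(10000, mean = mu, sd = 0.2 * mu)
var(log(Z)[mu < 30])   # ~ 0.04
var(log(Z)[mu > 80])   # ~ 0.04: roughly constant, as the delta method predicts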

ARCH Models
• Until the early 1980s, econometrics had focused almost solely on
modeling the conditional means of series:
yt = E[yt|It] + εt,  εt ~ D(0, σ²)
Suppose we have an AR(1) process:
yt = α + β yt-1 + εt.
Then, the conditional mean, conditioning on the information set at
time t, It, is:
Et[yt+1|It] = α + β yt

• Recall the distinction between conditional moments and
unconditional ones. The unconditional mean and variance are:
E[yt] = α/(1 – β) = constant
Var[yt] = σ²/(1 – β²) = constant

The conditional mean is time varying; the unconditional mean is not!


ARCH Models
• Similar idea for the variance.

Unconditional variance:
Var[yt] = E[(yt – E[yt])²] = σ²/(1 – β²)

Conditional variance:
Vart-1[yt] = Et-1[(yt – Et-1[yt])²] = Et-1[εt²]

Remark: Conditional moments are time varying; unconditional


moments are not!

ARCH Models
• The unconditional variance measures the overall uncertainty. In the
AR(1) example, the information available at time t, It, plays no role:
Var[yt] = σ²/(1 – β²)

• The conditional variance, Var[yt|It], is a better measure of
uncertainty at time t. It is a function of information at time t, It.
[Figure: a simulated series yt with its constant unconditional mean and variance bands; the conditional variance changes over time.]


ARCH Models: Stylized Facts of Asset Returns


- Thick tails - Mandelbrot (1963): leptokurtic (thicker than Normal)

- Volatility clustering - Mandelbrot (1963): “large changes tend to be


followed by large changes of either sign.”

- Leverage Effects – Black (1976), Christie (1982): Tendency for changes


in stock prices to be negatively correlated with changes in volatility.

- Non-trading Effects, Weekend Effects – Fama (1965), French and Roll
(1986): When a market is closed, information accumulates at a
different rate than when it is open – for example, the weekend effect,
where stock price volatility on Monday is not three times the volatility
on Friday.

ARCH Models: Stylized Facts of Asset Returns


- Expected events – Cornell (1978), Patell and Wolfson (1979), etc.:
Volatility is high at regular times, such as news announcements or
other expected events, or even at certain times of day – for example,
markets are less volatile in the early afternoon.

- Volatility and serial correlation – LeBaron (1992): Inverse relationship


between the two.

- Co-movements in volatility – Ramchand and Susmel (1998): Volatility is


positively correlated across markets/assets.

• We need a model that accommodates all these facts.


ARCH Models: Stylized Facts of Asset Returns


• Easy to check leptokurtosis (Stylized Fact #1)
Figure: Descriptive Statistics and Distribution for Monthly S&P 500 Returns

Statistic            Value
Mean (%)             0.626332   (p-value: 0.0004)
Standard Dev (%)     4.37721
Skewness             -0.43764
Excess Kurtosis      2.29395
Jarque-Bera          145.72     (p-value: <0.0001)
AR(1)                0.0258     (p-value: 0.5249)

ARCH Models: Stylized Facts of Asset Returns

• Easy to check Volatility Clustering (Stylized Fact #2)


ARCH Models: Engle (1982)


• We start with assumptions (A1) to (A5), but with a specific (A3’):
Yt = γ′Xt + εt,  εt ~ N(0, σt²)
σt² = Vart-1(εt) = Et-1(εt²) = ω + Σi=1..q αi εt-i² = ω + α(L) εt²

Define νt = εt² – σt². Then,
εt² = ω + α(L) εt² + νt

• This is an AR(q) model for the squared innovations. That is, we have an
ARCH model: Auto-Regressive Conditional Heteroskedasticity.

This model estimates the unobservable (latent) variance.

Note: Since we are dealing with a variance, we usually impose
ω > 0 and αi > 0 for all i.

ARCH Models: Engle (1982)


• The unconditional variance is determined by:
σ² = E[σt²] = ω + Σi=1..q αi E[εt-i²] = ω + Σi=1..q αi σ²

That is,
σ² = ω / (1 – Σi αi)

To obtain a positive σ², we impose another restriction: (1 – Σi αi) > 0.

• Example: ARCH(1)
Yt = γ′Xt + εt,  εt ~ N(0, σt²)
σt² = ω + α1 εt-1²
• We need to impose the restrictions: α1 > 0 & 1 – α1 > 0.
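
A short R simulation of this ARCH(1) model (the parameter values are assumed for illustration and satisfy both restrictions):

set.seed(99)
T <- 1000
omega <- 0.1; alpha1 <- 0.6           # omega > 0, 0 < alpha1 < 1
eps <- numeric(T); sig2 <- numeric(T)
sig2[1] <- omega / (1 - alpha1)       # start at the unconditional variance
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:T) {
  sig2[t] <- omega + alpha1 * eps[t - 1]^2
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}
plot.ts(eps)   # volatility clustering: calm and turbulent stretches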


ARCH Models: Engle (1982)


• Even though the errors may be serially uncorrelated, they are not
independent: there will be volatility clustering and fat tails. Let’s
define the standardized errors:
zt = εt/σt
• They have conditional mean zero and a time-invariant conditional
variance equal to 1. That is, zt ~ D(0,1). Suppose zt follows a
N(0,1), with a finite fourth moment (use Jensen’s inequality). Then:

E(εt⁴) = E(zt⁴) E(σt⁴) ≥ E(zt⁴) E(σt²)² = 3 E(εt²)²
⇒ κ(εt) = E(εt⁴)/E(εt²)² ≥ 3.

• For an ARCH(1), the 4th moment is:
κ(εt) = 3 (1 – α1²)/(1 – 3α1²), if 3α1² < 1.

ARCH Models: Engle (1982)


• A more convenient, but less intuitive, presentation of the ARCH(1)
model:
εt = σt υt
where υt is i.i.d. with mean 0 and Var[υt] = 1. Since υt is i.i.d.:
Et-1[εt²] = Et-1[σt² υt²] = σt² Et-1[υt²] = σt² = ω + α1 εt-1²

• It turns out that σt² is a very persistent process. Such a process can
be captured with an ARCH(q), where q is large. This is not efficient.

• In practice, q is often large. A more parsimonious representation is
the Generalized ARCH model, or GARCH(q, p):
σt² = ω + Σi=1..q αi εt-i² + Σj=1..p βj σt-j²
    = ω + α(L) εt² + β(L) σt²


GARCH: Bollerslev (1986)


• A more parsimonious representation is the GARCH(q, p):
σt² = ω + Σi=1..q αi εt-i² + Σj=1..p βj σt-j²

which is an ARMA(max(p,q), p) model for the squared innovations.

• Popular GARCH model: the GARCH(1,1):
σt² = ω + α1 εt-1² + β1 σt-1²
with an unconditional variance: Var[εt] = σ² = ω/(1 – α1 – β1).

⇒ Restrictions: ω > 0, α1 > 0, β1 > 0; (1 – α1 – β1) > 0.

• Technical details: This is covariance stationary if all the roots of
α(L) + β(L) = 1
lie outside the unit circle. For the GARCH(1,1), this amounts to
α1 + β1 < 1.
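
For intuition, a hedged R sketch simulating a covariance-stationary GARCH(1,1) (the parameter values are assumed and satisfy α1 + β1 < 1):

set.seed(11)
T <- 5000
omega <- 0.05; alpha1 <- 0.10; beta1 <- 0.85
eps <- numeric(T); sig2 <- numeric(T)
sig2[1] <- omega / (1 - alpha1 - beta1)   # unconditional variance as start value
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:T) {
  sig2[t] <- omega + alpha1 * eps[t - 1]^2 + beta1 * sig2[t - 1]
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}
c(var(eps), omega / (1 - alpha1 - beta1))  # sample variance ~ unconditional variance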

GARCH: Bollerslev (1986)


• Bollerslev (1986) showed that if 3α1² + 2α1β1 + β1² < 1, the
second and 4th (unconditional) moments of εt exist:

E[εt²] = ω/(1 – α1 – β1)

E[εt⁴] = 3ω² (1 + α1 + β1) / [(1 – α1 – β1)(1 – β1² – 2α1β1 – 3α1²)],
if (1 – β1² – 2α1β1 – 3α1²) > 0


GARCH-X
• In the GARCH-X model, exogenous variables are added to the
conditional variance equation.

Consider the GARCH(1,1)-X model:
σt² = ω + α1 εt-1² + β1 σt-1² + λ f(Xt-1, θ)

where f(Xt, θ) is strictly positive for all t. Usually, Xt is an observed
economic variable or indicator, say a liquidity index, and f(.) is a
non-linear transformation, which is non-negative.

Examples: Glosten et al. (1993) and Engle and Patton (2001) use 3-mo
T-bill rates for modeling stock return volatility. Hagiwara and
Herce (1999) use interest rate differentials between countries to
model FX return volatility. The US Congressional Budget Office uses
inflation in an ARCH(1) model for interest rate spreads.

IGARCH
• Recall the technical detail: The standard GARCH model:
σt² = ω + α(L) εt² + β(L) σt²
is covariance stationary if α(1) + β(1) < 1.

• But strict stationarity does not require such a stringent restriction
(that is, that the unconditional variance does not depend on t).

• In the GARCH(1,1) model, if α1 + β1 = 1, we have the Integrated
GARCH (IGARCH) model.

• In the IGARCH model, the autoregressive polynomial in the ARMA
representation has a unit root: a shock to the conditional variance is
“persistent.”


IGARCH
• Variance forecasts are generated with: Et[σt+j²] = σt² + j ω

• That is, today’s variance remains important for future forecasts of all
horizons.

• Nelson (1990) establishes that, as this satisfies the requirement for
strict stationarity, it is a well defined model.

• In practice, it is often found that α1 + β1 is close to 1.

• It is often argued that IGARCH is a product of omitted variables;
for example, structural breaks. See Lamoreux and Lastrapes (1989),
Hamilton and Susmel (1994), & Mikosch and Starica (2004).

• Shepard and Sheppard (2010) argue for a GARCH-X explanation.

GARCH: Variations – GARCH-in-mean


• The time-varying variance affects mean returns:
Mean equation: yt = Xt′γ + δ σt² + εt,  εt ~ N(0, σt²)
Variance equation: σt² = ω + α1 εt-1² + β1 σt-1²

• We have a dynamic mean-variance relation. It describes a specific
form of the risk-return trade-off.

• Finance intuition says that δ has to be positive and significant.
However, in empirical work, it does not work well: δ is not significant
or negative.


GARCH: Variations – Asymmetric GJR


• GJR-GARCH model – Glosten, Jagannathan & Runkle (JF, 1993):

σt² = ω + Σi (αi εt-i² + γi εt-i² * It-i) + Σj βj σt-j²

where It-i = 1 if εt-i < 0;
           = 0 otherwise.

• Using the indicator variable It-i, this model captures sign
(asymmetric) effects in volatility: Negative news (εt-i < 0) increases
the conditional volatility (leverage effect).

• The GARCH(1,1) version:
σt² = ω + α1 εt-1² + γ1 εt-1² It-1 + β1 σt-1²
where It-1 = 1 if εt-1 < 0;
           = 0 otherwise.

GARCH: Variations – Asymmetric GJR


• The GARCH(1,1) version:
σt² = ω + α1 εt-1² + γ1 εt-1² It-1 + β1 σt-1²

When εt-1 < 0 ⇒ σt² = ω + (α1 + γ1) εt-1² + β1 σt-1²
     εt-1 > 0 ⇒ σt² = ω + α1 εt-1² + β1 σt-1²

• This is a very popular variation of the GARCH models. The
leverage effect is significant.

• There is another variation, the Exponential GARCH, or EGARCH,
that also captures the asymmetric effect of negative news on the
conditional variance.


GARCH: Variations – EGARCH


• EGARCH model – Nelson (Econometrica, 1991).
It models an exponential function for the time-varying variance:

log(σt²) = ω + Σi=1..q αi (zt-i + γ (|zt-i| – E|zt-i|)) + Σj=1..p βj log(σt-j²)

where zt is a standardized i.i.d. D(0, 1) innovation.

• By design, the variance follows an exponential function. Thus, no
non-negativity restrictions on the parameters are imposed.

• Negative news (zt-i < 0) increases σt² (leverage effect).

Note: Nelson provides formulas for the unconditional moments
under the GED. But, under leptokurtic distributions such as the
Student-t, the unconditional variance does not exist. (Intuition: we
have an exponential formulation; with a large shock, it can explode.)

GARCH: Variations – NARCH


• Non-linear ARCH (NARCH) model – Higgins and Bera (1992) and
Hentschel (1995).

These models apply a Box-Cox-type transformation to the
conditional variance:

σtγ = ω + α |εt-1 – κ|γ + β σt-1γ

Special case: γ = 2 (standard GARCH model).

Note: The variance depends on both the size and the sign of the
lagged errors (through κ), which helps to capture leverage-type
(asymmetric) effects.


GARCH: Variations – TARCH


• Threshold ARCH (TARCH) – Rabemananjara & Zakoian (1993)
Large events –i.e., large errors– have a different effect from small
events. We use 2 indicator variables, I(εt-1 > κ) & I(εt-1 ≤ κ): one
for “large events,” εt-1 > κ, & one for “small events,” εt-1 ≤ κ:

σt² = ω + [α1 I(εt-1 > κ) + α2 I(εt-1 ≤ κ)] εt-1² + β1 σt-1²

There are two variances:
σt² = ω + α1 εt-1² + β1 σt-1², if εt-1 > κ
σt² = ω + α2 εt-1² + β1 σt-1², if εt-1 ≤ κ

• We can modify the model in many ways. For example, we can allow
for the asymmetric effects of negative news.

GARCH: Variations – SWARCH


• Switching ARCH (SWARCH) – Hamilton and Susmel (JE, 1994).

Intuition: σt² depends on the state of the economy – the regime. It’s based on
Hamilton’s (1989) time series models with changes of regime:

σt² = ω{st,st-1} + Σi=1..q αi,{st,st-1} εt-i²

The key is to select a parsimonious representation:

σt²/γ{st} = ω + Σi=1..q αi εt-i²/γ{st-i}

For a SWARCH(1) with 2 states (1 and 2), we have 4 possible σt²:

σt² = γ1 (ω + α1 εt-1²/γ1), st = 1, st-1 = 1
σt² = γ1 (ω + α1 εt-1²/γ2), st = 1, st-1 = 2
σt² = γ2 (ω + α1 εt-1²/γ1), st = 2, st-1 = 1
σt² = γ2 (ω + α1 εt-1²/γ2), st = 2, st-1 = 2


GARCH: Forecasting and Persistence


• Consider the forecast in a GARCH(1,1) model:
σt+1² = ω + α1 εt² + β1 σt² = ω + σt² (α1 zt² + β1)   (using εt² = σt² zt²)

Taking expectations at time t:
Et[σt+1²] = ω + σt² (α1 + β1)
Then, by repeated substitutions:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j

As j → ∞, the forecast reverts to the unconditional variance:
ω/(1 – α1 – β1).

• When α1 + β1 = 1, today’s volatility affects future forecasts forever:
Et[σt+j²] = σt² + j ω
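
A small R helper implementing the j-step-ahead formula above (the function name and inputs are illustrative, not from the lecture):

garch11_forecast <- function(omega, alpha1, beta1, sig2_t, j) {
  pers <- alpha1 + beta1
  omega * sum(pers^(0:(j - 1))) + sig2_t * pers^j   # E_t[sigma^2_{t+j}]
}
# With the CHF/USD estimates used later in the lecture:
garch11_forecast(0.00012, 0.19003, 0.71007, 0.00367, j = 1)   # ~ 0.003423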

ARCH Estimation: MLE


• All of these models can be estimated by maximum likelihood. First,
we need to construct the sample likelihood.

• Since we are dealing with dependent variables, we use the
conditioning trick to get the joint distribution:
f(y1, y2, …, yT; θ) = f(y1|x1; θ) f(y2|y1, x2, x1; θ) f(y3|y2, y1, x3, x2, x1; θ) ...
                      ... f(yT|yT-1, …, y1, xT, …, x1; θ)
Taking logs:
L = log f(y1, y2, ..., yT; θ) = log f(y1|x1; θ) + log f(y2|y1, x2, x1; θ)
    + … + log f(yT|yT-1, …, y1, xT, …, x1; θ)
  = Σt log f(yt|Yt-1, Xt; θ)

• We maximize this function with respect to the k mean parameters
(γ) and the m variance parameters (ω, α, β).


ARCH Estimation: MLE


• Note that δL/δγ = 0 (k f.o.c.’s) will give us GLS.

• Denote δL/δθ = S(yt, θ) = 0  – S(.) = Score vector.

- We have a ((k+m) x (k+m)) system. But, it is a non-linear system. We
will need to use numerical optimization. Gauss-Newton or BHHH
(which approximates H by the sum of outer products of the S(yt, θ)’s)
can be easily implemented.

- Given the AR structure, we will need to make assumptions about σ0
(and ε0, ε1, ..., εp if we assume an AR(p) process for the mean).

- Alternatively, we can take σ0 (and ε0, ε1, ..., εp) as parameters to be
estimated (this can be computationally more intensive and estimation
can lose power).

ARCH Estimation: MLE


• If the conditional density is well specified and θ0 belongs to Ω, then

T^(1/2) (θ̂ – θ0) → N(0, A0⁻¹),  where A0 = –T⁻¹ Σt=1..T E[∂St(yt, θ0)/∂θ′]

• Under the correct specification assumption, A0 = B0, where

B0 = T⁻¹ Σt=1..T E[St(yt, θ0) St(yt, θ0)′]

We estimate A0 and B0 by replacing θ0 by its estimated MLE value.

• The estimator B0 has a computational advantage over A0: only first
derivatives are needed. But A0 = B0 only if the distribution is
correctly specified. This is very difficult to know in practice.

• Common practice in empirical studies: Assume the necessary
regularity conditions are satisfied.


ARCH Estimation: MLE – ARCH(1)


Example: ARCH(1) model.
Mean equation: yt = xt′γ + εt,  εt ~ N(0, σt²)
Variance equation: σt² = ω + α1 εt-1²

We write the pdf for the normal distribution, with εt = yt – xt′γ:
f(εt|γ, ω, α1) = (2π σt²)^(-1/2) exp{–εt²/(2σt²)}

We form the likelihood L (the joint pdf):
L = Πt (2π σt²)^(-1/2) exp{–εt²/(2σt²)} = (2π)^(-T/2) Πt σt⁻¹ exp{–εt²/(2σt²)}

We take logs to form the log likelihood, 𝐿 = log L:
𝐿 = Σt log f = –(T/2) log(2π) – ½ Σt log(σt²) – ½ Σt εt²/σt²

Then, we maximize 𝐿 with respect to θ = (γ, ω, α1).
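
A hedged R sketch of this log likelihood, written as a negative log likelihood so it can be minimized with optim(); y and X (including a column of ones) are assumed inputs:

neg_loglik_arch1 <- function(theta, y, X) {
  k <- ncol(X)
  gamma  <- theta[1:k]
  omega  <- abs(theta[k + 1]); alpha1 <- abs(theta[k + 2])  # keep variance positive
  e <- as.vector(y - X %*% gamma)
  T <- length(y)
  sig2 <- c(var(e), omega + alpha1 * e[1:(T - 1)]^2)  # sigma_1^2 set to a start value
  0.5 * sum(log(2 * pi) + log(sig2) + e^2 / sig2)     # -log L
}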

ARCH Estimation: MLE – ARCH(1)


Example (continuation): ARCH(1) model.
𝐿 = –(T/2) log(2π) – ½ Σt log(ω + α1 εt-1²) – ½ Σt εt²/(ω + α1 εt-1²)

Taking derivatives with respect to θ = (ω, α1, γ), where γ is a vector of k
mean parameters:

∂𝐿/∂ω = –½ Σt 1/(ω + α1 εt-1²) + ½ Σt εt²/(ω + α1 εt-1²)²

∂𝐿/∂α1 = –½ Σt εt-1²/(ω + α1 εt-1²) + ½ Σt εt² εt-1²/(ω + α1 εt-1²)²

∂𝐿/∂γ = Σt xt εt/σt²  (k x 1 vector of derivatives)


ARCH Estimation: MLE


• Then, we set the f.o.c.: δ𝐿/δθ = 0.

• We have a (k+2) system. It is a non-linear system. The system is
solved using numerical optimization (usually, Newton-Raphson).

• In R, the function optim does numerical optimization.

• Take the last f.o.c., the k x 1 vector ∂𝐿/∂γ = 0:
∂𝐿/∂γ = Σt xt′ εt/σ̂t² = Σt xt′ (yt – xt′γ)/σ̂t² = 0
⇒ Σt [xt′ yt/σ̂t² – xt′ xt γ/σ̂t²] = 0

• The last equation shows that MLE is GLS for the mean parameters,
γ: each observation is weighted by the inverse of σ̂t².
ARCH Estimation: MLE


• In general, we have a ((k+m) x (k+m)) system: k mean parameters and
m variance parameters. But, it is a non-linear system. We use
numerical optimization.

- Given the AR structure, we will need to make assumptions about σ0
(and ε0, ε1, ..., εp if we assume an AR(p) process for the mean).

- Alternatively, we can take σ0 (and ε0, ε1, ..., εp) as parameters to be
estimated (this can be computationally more intensive and estimation
can lose power).


ARCH Estimation: MLE – Example (in R)


• Log likelihood of the AR(1)-GARCH(1,1) Model:
log_lik_garch11 <- function(theta, data) {
mu <- theta[1]; rho1 <- theta[2]; alpha0 <- abs(theta[3]); alpha1 <- abs(theta[4]); beta1 <- abs(theta[5])
chk0 <- (1 - alpha1 - beta1)
r <- ts(data)
n <- length(r)

u <- vector(length=n); u <- ts(u)
for (t in 2:n)
{u[t] = r[t] - mu - rho1*r[t-1]} # this setup allows for ARMA in mean

h <- vector(length=n); h <- ts(h)
h[1] = alpha0/chk0 # set initial value for h[t] series
if (chk0==0) {h[1]=.000001} # check to avoid dividing by 0
for (t in 2:n)
{h[t] = abs(alpha0 + alpha1*(u[t-1]^2) + beta1*h[t-1])
if (h[t]==0) {h[t]=.00001} } # check to avoid log(0)

return(-sum(- 0.5*log(abs(h[2:n])) - 0.5*(u[2:n]^2)/abs(h[2:n]))) # negative log likelihood (ignoring constants), for minimization
}

ARCH Estimation: MLE – Example (in R)


Example 1: GARCH(1,1) model for changes in the CHF/USD. We will
use the R function optim (nlm can also be used) to minimize the negative log likelihood:
PPP_da <- read.csv("http://www.bauer.uh.edu/rsusmel/4397/ppp_2020_m.csv", head=TRUE, sep=",")
x_chf <- PPP_da$CHF_USD # CHF/USD 1971-2020 monthly data
T <- length(x_chf)
z <- log(x_chf[-1]/x_chf[-T])
theta0 = c(-0.002, 0.026, 0.001, 0.19, 0.71) # initial values
ml_2 <- optim(theta0, log_lik_garch11, data=z, method="BFGS", hessian=TRUE)

logL_g11 <- log_lik_garch11(ml_2$par, z) # value of (negative) log likelihood
logL_g11

ml_2$par # estimated parameters

I_Var_m2 <- ml_2$hessian
eigen(I_Var_m2) # check if Hessian is pd
sqrt(diag(solve(I_Var_m2))) # parameters SE
chf_usd <- ts(z, frequency=12, start=c(1971,1))
plot.ts(chf_usd) # time series plot of data


ARCH Estimation: MLE – Example (in R)


Example 1 (continuation):
> logL_g11 # Negative log likelihood value
[1] -1745.197

> ml_2$par # Extract parameters from ml_2
[1] -0.0021051742 0.0260003610 0.00012375 0.1900276519 0.7100718082
> I_Var_m2 <- ml_2$hessian # Extract Hessian (matrix of 2nd derivatives)

> eigen(I_Var_m2) # Check if Hessian is pd to invert
eigen() decomposition
$values # Eigenvalues: if positive => Hessian is pd
[1] 1.687400e+08 6.954454e+05 7.200084e+03 5.120984e+02 2.537958e+02

$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] 4.265907e-05 9.999960e-01 -0.0011397586 0.0018331957 -0.0018541203
[2,] -3.333961e-06 -2.188159e-03 -0.0010048203 0.9769058449 -0.2136566699
[3,] 9.999998e-01 -4.223001e-05 -0.0003544245 0.0001291633 0.0005770707
[4,] -3.599974e-06 -1.702277e-03 -0.8603563865 -0.1097470278 -0.4977344477
[5,] -6.893837e-04 6.416141e-04 -0.5096905472 0.1833226197 0.8405994743

ARCH Estimation: MLE – Example (in R)


Example 1 (continuation):
> sqrt(diag(solve(I_Var_m2))) # Invert Hessian: parameter variances on diagonal
[1] 1.203690e-03 4.419049e-02 7.749756e-05 5.014454e-02 3.955411e-02

> t_stats <- ml_2$par/sqrt(diag(solve(I_Var_m2)))
> t_stats
[1] -1.7489333 0.5883701 1.5967743 3.7895984 17.9519078


ARCH Estimation: MLE – Example (in R)


Example 1 (continuation): Summary for CHF/USD changes:
ef,t = log(St) – log(St-1) = a0 + a1 ef,t-1 + εt,  εt|It-1 ~ N(0, σt²)
σt² = ω + α1 εt-1² + β1 σt-1²

• T: 562 (January 1971 – July 2020, monthly).

The estimated model for ef,t is given by:
ef,t = -0.00211 + 0.02600 ef,t-1
       (.0012)    (0.044)
σt² = 0.00012 + 0.19003 εt-1² + 0.71007 σt-1²
      (0.00096)* (0.050)*       (0.040)*

Unconditional σ² = 0.00012/(1 – 0.19003 – 0.71007) = 0.001201201
Log likelihood: 1745.197

Note: α1 + β1 = .90 < 1. (Persistent.)

ARCH Estimation: MLE – Example (in R)

[Figure: time series plot of the monthly CHF/USD log changes, 1971-2020 (plot.ts(chf_usd) from the previous slide).]


ARCH Estimation: MLE – Example (in R)


Example 2: Using Robert Shiller’s monthly data set for the S&P 500
(1871:Jan – 2020:Aug, T = 1,795), we estimate an AR(1)-GARCH(1,1)
model:
rt = log(Pt) – log(Pt-1) = a0 + a1 rt-1 + εt,  εt|It-1 ~ N(0, σt²)
σt² = ω + α1 εt-1² + β1 σt-1²

The estimated model for rt is given by:
rt = 0.338 + 0.278 rt-1
     (.08)*  (0.025)*
σt² = 0.756 + 0.126 εt-1² + 0.826 σt-1²
      (0.151)* (0.017)*     (0.021)*

Unconditional σ² = 0.756/(1 – 0.126 – 0.826) = 15.4630
Log likelihood: 4795.08

Note: α1 + β1 = .952 < 1. (Very persistent.)

ARCH Estimation: MLE – Example (in R)


Example 2 (continuation): Below, we plot the time-varying variance. Certain
events are clearly different; for example, the 1930s Great Depression, with a
peak variance of 282 (18 times the unconditional variance!). The covid-19
volatility is similar to that of the 2008-2009 financial crisis recession:


ARCH Estimation: MLE – Regularity Conditions

Note: The appeal of MLE is the optimal properties of the resulting
estimators under ideal conditions.

• Crowder (1976) gives one set of sufficient regularity conditions for
the MLE in models with dependent observations to be consistent
and asymptotically normally distributed.

• Verifying these regularity conditions is very difficult for general
ARCH models – proofs for special cases like the GARCH(1,1) exist.

Example: For the GARCH(1,1) model: if E[ln(α1 zt² + β1)] < 0, the
model is strictly stationary and ergodic. See Nelson (1990) &
Lumsdaine (1996).
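
This condition is easy to check by Monte Carlo; a short sketch for the CHF/USD estimates reported above (α1 = 0.19, β1 = 0.71):

set.seed(1)
z <- rnorm(1e6)
mean(log(0.19 * z^2 + 0.71))   # negative => strictly stationary and ergodic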

ARCH Estimation: MLE – Regularity Conditions


• Block-diagonality
In many applications of ARCH, the parameters can be partitioned
into mean parameters, θ1, and variance parameters, θ2.

Then, δμt(θ)/δθ2 = 0 and, although δσt(θ)/δθ1 ≠ 0, the Information
matrix is block-diagonal (under general symmetric distributions for zt
and for particular ARCH specifications).

Not a bad result:
- Regression can be consistently done with OLS.
- Asymptotically efficient estimates for the ARCH parameters can be
obtained on the basis of the OLS residuals.


ARCH Estimation: MLE – Remarks


• But, block diagonality cannot buy everything:
- Conventional OLS standard errors could be terrible.

- When testing for serial correlation, in the presence of ARCH, the
conventional Bartlett s.e. –T^(-1/2)– could seriously underestimate the
true standard errors.

ARCH Estimation: QMLE


• The assumption of conditional normality is difficult to justify in
many empirical applications. But, it is convenient.

• The MLE based on the normal density may be given a quasi-maximum
likelihood (QMLE) interpretation.

• If the conditional mean and variance functions are correctly
specified, the normal quasi-score evaluated at θ0 has a martingale
difference property:
E[S(yt, θ0)] = 0

Since this equation holds for any value of the true parameters, the
QMLE, say θQMLE, is Fisher-consistent –i.e., E[S(yT, yT-1, …, y1; θ)] = 0
for any θ ∈ Ω.


ARCH Estimation: QMLE


• The asymptotic distribution for the QMLE takes the form:
T^(1/2) (θ̂QMLE – θ0) → N(0, A0⁻¹ B0 A0⁻¹)

The covariance matrix (A0⁻¹ B0 A0⁻¹) is called “robust”: robust to
departures from normality.

• Bollerslev and Wooldridge (1992) study the finite sample distribution
of the QMLE and the Wald statistics based on the robust covariance
matrix estimator:

- For symmetric departures from conditional normality, the QMLE is
generally close to the exact MLE.

- For non-symmetric conditional distributions, both the asymptotic and
the finite sample loss in efficiency may be large.

ARCH Estimation: Non-Normality


• The basic GARCH model allows a certain amount of leptokurtosis.
It is often insufficient to explain real world data.

Solution: Assume a distribution other than the normal, which helps to
allow for the fat tails in the distribution.

• t Distribution – Bollerslev (1987)
The t distribution has a degrees of freedom parameter which allows
greater kurtosis. The t log likelihood contribution is:
lt = ln[Γ(0.5(v+1)) Γ(0.5v)⁻¹ (v – 2)^(-1/2) (1 + zt² (v – 2)⁻¹)^(-(v+1)/2)] – 0.5 ln(σt²)
where Γ is the gamma function and v is the degrees of freedom.
As v → ∞, this tends to the normal distribution.

• GED Distribution – Nelson (1991)


ARCH Estimation: GMM


• Suppose we have an ARCH(q). We need moment conditions:
(1) E[m1] = E[xt′ (yt – xt′γ)] = 0
(2) E[m2] = E[εt-i² (εt² – σt²)] = 0, i = 1, ..., q
(3) E[m3] = E[εt² – ω/(1 – α1 – ... – αq)] = 0

Note: (1) refers to the conditional mean, (2) refers to the conditional
variance, and (3) to the unconditional mean.

• GMM objective function:
Q(X, y; θ) = Ê[m(θ; X, y)]′ W Ê[m(θ; X, y)]
where
Ê[m(θ; X, y)] = [Ê[m1]′ Ê[m2]′ Ê[m3]′]′

ARCH Estimation: GMM


• γ has k free parameters; α has q free parameters. Then, we have r =
k + q + 1 parameters. Note that m(θ; X, y) has r = k + q + 1 equations.

Dimensions: Q is 1x1; E[m(θ; X, y)] is rx1; W is rxr.

• The problem is over-identified: more equations than parameters, so we
cannot solve E[m(θ; X, y)] = 0 exactly.

• Choose a weighting matrix W for the objective function and minimize
using numerical optimization.

• Optimal weighting matrix: W = {E[m(θ; X, y)] E[m(θ; X, y)]′}⁻¹.
Var(θ̂) = (1/T) [D W⁻¹ D′]⁻¹,
where D = δE[m(θ; X, y)]/δθ′ – expressions evaluated at θGMM.


ARCH Estimation: Testing


• Standard BP test, with the auxiliary regression given by:
et² = α0 + α1 et-1² + .... + αq et-q² + vt

H0: α1 = α2 = ... = αq = 0 (No ARCH). It is not possible to do a
GARCH test, since we would be using the same lagged squared residuals.

Then, the LM test is (T – q) * R² →d χ²q – Engle’s (1982) test.

• In ARCH models, testing proceeds as usual: LR, Wald, and LM tests.

Reliable inference from the LM, Wald and LR test statistics
generally does require moderately large sample sizes of at least two
hundred or more observations.

ARCH Estimation: Testing


• Issues:
- Non-negativity constraints must be imposed. θ0 is often on the
boundary of Ω. (Two-sided tests may be conservative.)
- Lack of identification of certain parameters under H0 creates a
singularity of the Information matrix under H0. For example, under
H0: α1 = 0 (No ARCH), in the GARCH(1,1), ω and β1 are not jointly
identified. See Davies (1977).

• Ignoring ARCH
- Suppose yt has an AR structure: yt = γ0 + γ1 yt-1 + εt.
Hamilton (2008) finds that an OLS t-test with no correction for ARCH
spuriously rejects H0: γ1 = 0 with arbitrarily high probability for
sufficiently large T. White’s (1980) SE help. NW SE help less.

46
RS – Lecture 12

ARCH Estimation: Testing


Figure (from Hamilton (2008)): Fraction of samples in which the OLS t-test
leads to rejection of H0: γ1 = 0, as a function of T, for regressions
with Gaussian errors (solid line) and Student’s t errors (dashed line).
Note: H0 is actually true and the test has a nominal size of 5%.

Testing for Heteroscedasticity: ARCH


• ARCH Test for the 3-factor FF model for IBM returns (T = 320),
with one lag:
IBMRet – rf = β0 + β1 (MktRet – rf) + β2 SMB + β3 HML + ε

> b <- solve(t(x)%*% x)%*% t(x)%*%y # OLS regression
> e <- y - x%*%b
> e2 <- e^2
> xx1 <- e2[1:(T-1)]
> fit2 <- lm(e2[2:T] ~ xx1)
> r2_e2 <- summary(fit2)$r.squared
> r2_e2
[1] 0.2656472
> lm_t <- (T-1)*r2_e2
> lm_t
[1] 84.74147

LM-ARCH Test: 84.74 ⇒ reject H0 at 5% level (χ²[1],.05 ≈ 3.84), the
usual result for financial time series.


GARCH: Forecasting and Persistence (Again)


• Consider the forecast in a GARCH(1,1) model:
σt+1² = ω + α1 εt² + β1 σt² = ω + σt² (α1 zt² + β1)   (using εt² = σt² zt²)

Taking expectations at time t:
Et[σt+1²] = ω + σt² (α1 + β1)
Then, by repeated substitutions:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j
As j → ∞, the forecast reverts to the unconditional variance:
ω/(1 – α1 – β1).

• When α1 + β1 = 1, today’s volatility affects future forecasts forever:
Et[σt+j²] = σt² + j ω

GARCH: Forecasting and Persistence


Example 1: We want to forecast next month’s (September 2020)
variance for CHF/USD changes. Recall we estimated σt²:
σt² = 0.00012 + 0.19003 εt-1² + 0.71007 σt-1²
getting σ²_{Aug 2020} = 0.003672220 (⇒ σ_{Aug 2020} = sqrt(0.00367) = 6.1%).

We base the σ²_{Sep 2020} forecast on:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j

Then, α1 + β1 = 0.190 + 0.710 = 0.900

E_{Aug 2020}[σ²_{Sep 2020}] = 0.00012 + 0.00367 * 0.9 = 0.003423

We also forecast the variance three months ahead:
E_{Aug 2020}[σ²_{t+3}] = 0.00012 * {1 + 0.9 + 0.9²} + 0.00367 * 0.9³
                        = 0.00300063


GARCH: Forecasting and Persistence


Example 1 (continuation):
We forecast the volatility for March 2021:
E_{Aug 2020}[σ²_{Mar 2021}] = 0.00012 * {1 + 0.9 + 0.9² + … + 0.9⁵} +
                              + 0.00367 * 0.9⁶ = 0.002512659

Remark: We observe that as the forecast horizon increases (j → ∞),
the forecast reverts to the unconditional variance:
ω/(1 – α1 – β1) = 0.00012/(1 – 0.9) = 0.0012
⇒ σ = sqrt(0.0012) = 0.0346 (3.46%, close to the sample SD = 3.36%)

GARCH: Forecasting and Persistence


Example 2: In August 2020, we forecast December’s variance
for S&P 500 changes. Recall we estimated σt²:
σt² = 0.756 + 0.125 εt-1² + 0.826 σt-1²
getting σ²_{Aug 2020} = 43.037841.

We base the σ²_{Dec 2020} forecast on:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j

Then, since α1 + β1 = 0.952:
E_{Aug 2020}[σ²_{Dec 2020}] = 0.756 * {1 + 0.952 + 0.952² + 0.952³} +
                              + 43.037841 * 0.952⁴ = 38.02797

A lower variance is forecasted for the end of the year, but still far from
the unconditional variance of 15.4.


ARCH: Which Model to Use


• Questions
1) Lots of ARCH models. Which one to use?
2) Choice of p and q. How many lags to use?

• Hansen and Lunde (2004) compared lots of ARCH models:


- It turns out that the GARCH(1, 1) is a great starting model.
- Add a leverage effect for financial series and it’s even better.
- A t-distribution is also a good addition.
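
In practice, this starting model can be fit in a few lines; a hedged sketch using the rugarch package (not part of the lecture's own code), where z is an assumed returns series:

library(rugarch)
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   mean.model = list(armaOrder = c(1, 0)),
                   distribution.model = "norm")
fit <- ugarchfit(spec, data = z)   # AR(1)-GARCH(1,1) with normal errors
coef(fit)                          # mu, ar1, omega, alpha1, beta1

Setting model = "gjrGARCH" adds the leverage effect, and distribution.model = "std" uses t errors, matching the two additions suggested above.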

RV Models: Intuition
• The idea of realized volatility is to estimate the latent (unobserved)
variance using the realized data, without any modeling. Recall the
definition of the sample variance:
s² = (1/(T – 1)) Σi=1..T (xi – x̄)²

• Suppose we want to calculate the daily variance for stock returns. We
know how to compute it: we use daily information, for T days, and
apply the above definition.

• Alternatively, we use hourly data for the whole day (with k hours).
Since hourly returns are very small, ignoring x̄ seems OK. We use rt,i
as the ith hourly return on day t. Then, we add the rt,i² over the day:
Variancet = Σi=1..k rt,i²


RV Models: Intuition
• In more general terms, we use higher frequency data to estimate a
lower frequency variance:
RVt = Σi=1..k rt,i²
where rt,i is the realized return in (higher frequency) interval i of the
(lower frequency) period t. We estimate the t-frequency variance, using k
i-intervals. If we have daily returns and we want to estimate the monthly
variance, then k is equal to the number of days in the month.

• It can be shown that, as the interval i becomes smaller (i → 0),
RVt → Return Variation over [t – 1, t].

That is, with an increasing number of observations, we get an accurate
measure of the latent variance.
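A minimal R sketch of this computation, on simulated data (the daily returns and the 21-trading-days-per-month grouping are assumed for illustration):

set.seed(3)
r_d <- rnorm(252, sd = 0.01)              # a year of daily returns (simulated)
month_id <- rep(1:12, each = 21)          # month label for each day
rv_m   <- tapply(r_d^2, month_id, sum)    # RV_t: sum of squared daily returns in month t
rvol_m <- sqrt(rv_m)                      # realized volatility per month
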

RV Models: High Frequency


• Note that RV is a model-free measure of variation –i.e., no need for
ARCH-family specifications. The measure is called realized variance (RV).
The square root of the realized variance is the realized volatility (RVol,
RealVol):
RVol = sqrt(RV)

• Given the previous theoretical result, RV is commonly used with
intra-daily data, called high frequency (HF) data.

• It led to a revolution in the field of volatility, creating new models
and new ways of thinking about volatility and how to model it.

• We usually associate realized volatility with an observable proxy of the
unobserved volatility.


RV Models: High Frequency – Tick Data


• The theory behind realized variation measures dictates that the
sampling frequency, or k in the RVt formula above, goes to ∞. Then, we
should use the highest frequency available, say, millisecond-to-millisecond returns.

• Intra-daily data applications are the most common. But, when using
intra-daily data, RV calculations are affected by microstructure effects: bid-
ask bounce, infrequent trading, calendar effects, etc. The rt,i no longer
look uncorrelated.

Example: The bid-ask bounce induces serial correlation in intra-day


returns, which biases RVt.

• As the sampling frequency increases, the “noise” (microstructure


effects) becomes more dominant and swallows the “signal” (true
volatility).

RV Models: High Frequency – Tick Data


• In practice, sampling a typical stock price every few seconds can
overestimate the true volatility by a factor of two or more.

• The usual solutions:


(1) Filter data using an ARMA model to get rid of the autocorrelations
and/or dummy variables to get rid of calendar effects.

Then, use the filtered data to compute RVt (a sketch of this filtering appears after this list).

(2) Sample at frequencies where the impact of microstructure effects is


minimized and/or eliminated.

We follow solution (2).
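
Although we follow (2), here is a minimal sketch of the filtering in (1), with a simulated stand-in for intra-day returns (the MA(1) choice is ours, aimed at the 1st-order autocorrelation the bid-ask bounce induces):

set.seed(7)
ret <- rnorm(1000, sd = 1e-4)               # stand-in for intra-day returns
fit_ma1 <- arima(ret, order = c(0, 0, 1))   # MA(1) filter for the autocorrelation
ret_filt <- as.numeric(residuals(fit_ma1))  # filtered returns
rv_filt <- sum(ret_filt^2)                  # RVt from the filtered returns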




RV Models: High Frequency – Practice


• In intra-daily RV estimation, it is common to use 10’ intervals, which
have good properties. However, some estimations use 1’ intervals.

• Some studies suggest using an optimal frequency, defined as the
frequency that minimizes the MSE of the RV estimator.

• Hansen and Lunde (2006) find that for very liquid assets, such as the
S&P 500 index, a 5’ sampling frequency provides a reasonable choice.
Thus, to calculate daily RV, we need to add 78 five-minute intervals.


RV Models: High Frequency – TAQ


Example: Based on TAQ (Trade and Quote) NYSE data, we use 5’
realized returns to calculate 30’ variances –i.e., we use six 5’ intervals.
Then, the 30’ variance, or RVt=30-min, is:
RVt = Σj=1,...,6 r2t,j ,   t = 1, 2, . . . ., T
where rt,j is the 5’ return during the jth 5’ interval of half-hour t. Then, we
calculate 30’ variances for the whole day –i.e., we calculate 13
variances, since the trading day goes from 9:30 AM to 4:00 PM.

The Realized Volatility, RVol, is:


RVolt = sqrt(RVt)
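
A minimal R sketch of this 30’ calculation, with a simulated stand-in for one day of 5’ log returns (all names are ours):

set.seed(2)
r5 <- rnorm(78, sd = 5e-4)             # stand-in: 78 five-minute returns in a day
half_hour <- rep(1:13, each = 6)       # label each 5’ return with its half-hour
rv_30 <- tapply(r5^2, half_hour, sum)  # thirteen 30’ realized variances
rvol_30 <- sqrt(rv_30)                 # thirteen 30’ realized volatilities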



RV Models: High Frequency – TAQ


Example: Below, we show the first transactions of the SPY TAQ
(Trade and Quote) data (tick-by-tick trade data) on January 2, 2014:
SYMBOL DATE TIME PRICE SIZE
SPY 20140102 9:30:00 183.98 500
SPY 20140102 9:30:00 183.98 500
SPY 20140102 9:30:00 183.98 200
SPY 20140102 9:30:00 183.98 500
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.98 800
SPY 20140102 9:30:00 183.98 100
SPY 20140102 9:30:00 183.98 100
SPY 20140102 9:30:00 183.97 200
SPY 20140102 9:30:00 183.98 100
SPY 20140102 9:30:00 183.97 200
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.97 100
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.98 2600
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.97 400

RV Models: High Frequency – TAQ


Example: Below, we show the first quotes of the AAPL TAQ (Trade
and Quote) data (tick-by-tick quote data) on January 2, 2014, starting at 4:00 AM:
SYMBOL DATE TIME BID OFR BIDSIZ OFRSIZ MODE EX
AAPL 20140102 4:00:00 455.39 0 1 0 12 T
AAPL 20140102 4:00:00 553.5 558 2 2 12 P
AAPL 20140102 4:00:01 455.39 561.02 1 2 12 T
AAPL 20140102 4:00:45 552.1 558 1 2 12 P
AAPL 20140102 4:00:51 552.1 558.4 1 2 12 P
AAPL 20140102 4:00:51 552.1 558.8 1 2 12 P
AAPL 20140102 4:00:51 552.1 559 1 1 12 P
AAPL 20140102 4:01:14 553 559 1 1 12 P
AAPL 20140102 4:01:30 553.01 561.02 1 2 12 T
AAPL 20140102 4:01:43 553.01 559 1 1 12 T
AAPL 20140102 4:01:44 553.05 559 1 1 12 P
AAPL 20140102 4:01:49 455.39 559 1 1 12 T
AAPL 20140102 4:01:49 553.61 559 1 1 12 T
AAPL 20140102 4:02:02 553.05 559 1 2 12 P
AAPL 20140102 4:02:04 455.39 559 1 1 12 T
AAPL 20140102 4:02:04 548.28 559 1 1 12 T
AAPL 20140102 4:02:33 553.05 558.83 1 2 12 P
AAPL 20140102 4:02:33 555.17 558.83 2 2 12 P
AAPL 20140102 4:03:50 555.2 558.83 5 2 12 P


RV Models: High Frequency – TAQ


Example (continuation): We read SPY trade data for 2014:Jan.
> HF_da <- read.csv("c:/Financial Econometrics/SPY_2014.csv", head=TRUE, sep=",")
> summary(HF_da)
SYMBOL DATE TIME PRICE SIZE G127
SPY:6800865 Min. :20140102 9:30:00 : 21436 Min. :176.6 Min. : 1 Min. :0
1st Qu.:20140110 16:00:00: 11352 1st Qu.:178.9 1st Qu.: 100 1st Qu.:0
Median :20140121 9:30:01 : 5922 Median :182.6 Median : 100 Median :0
Mean :20140119 15:59:59: 4090 Mean :181.4 Mean : 337 Mean :0
3rd Qu.:20140128 15:59:55: 3198 3rd Qu.:183.5 3rd Qu.: 300 3rd Qu.:0
Max. :20140131 15:50:00: 2916 Max. :189.2 Max. :4715350 Max. :0
(Other) :6751951
CORR COND EX
Min. :0.0e+00 @ :3351783 T :1649158
1st Qu.:0.0e+00 F :2888182 P :1335135
Median :0.0e+00 : 524409 Z :1182126
Mean :1.9e-04 O : 18057 D :1062382
3rd Qu.:0.0e+00 4 : 9098 K : 437900
Max. :1.2e+01 6 : 8142 J : 356539
(Other): 1194 (Other): 777625

RV Models: High Frequency – TAQ


Example (continuation): Using the SPY trade data, we calculate a
daily realized volatility from 5’ returns for the first 4 trading days in
2014 (2014:01:02 - 2014:01:07). Originally, we have T = 1,048,570 observations.

HF_da <- read.csv("http://www.bauer.uh.edu//rsusmel//4397//SPY_2014.csv",


head=TRUE, sep=",")
summary(HF_da)
pt <- as.POSIXct(paste(HF_da$DATE, HF_da$TIME), format="%Y%m%d %H:%M:%S")
library(xts)
hf_1 <- xts(x=HF_da, order.by = pt) # Define a specific time series data set
# pt pastes together DATE and Time.
spy_p <- as.numeric(hf_1$PRICE) # Read price data as numeric

T <- length(spy_p)
spy_ret <- log(spy_p[-1]/spy_p[-T])
plot(spy_ret, type="l", ylab="Return", main="Tick by Tick Return (2014:01:02 - 2014:01:07)")
mean(spy_ret)
sd(spy_ret)


RV Models: High Frequency – TAQ


Example (continuation): We plot the tick-by-tick data.

Very noisy data, with lots of “jumps”:


Mean tick-by-tick return: -3.7365e-09
Tick-by-tick SD: 6.3163e-05

RV Models: High Frequency – TAQ


Example (continuation): For the whole month of January 2014:

> mean(spy_ret)
[1] -4.796933e-09
> sd(spy_ret)
[1] 7.804991e-05


RV Models: High Frequency – TAQ


Example (continuation): We plot the autocorrelogram for the TAQ
SPY data:

> acf_spy_raw <- acf(spy_ret)
> acf_spy_raw

Autocorrelations of series ‘spy_ret’, by lag

0 1 2 3 4 5 6 7 8 9 10
1.000 -0.469 -0.013 -0.010 0.014 -0.008 0.000 -0.002 -0.001 0.000 0.000

Note: We have only one significant autocorrelation, the 1st-order
autocorrelation: -0.469.

RV Models: High Frequency – TAQ


Example (continuation): We aggregate the tick-by-tick data in 5’
intervals using the function aggregateTrades in the R package
highfrequency. It needs as an input an xts object (hf_1, for us).

library(highfrequency)
spy_5 <- aggregateTrades(
hf_1,
on = "minutes", # you can use also seconds, days, weeks, etc.
k = 5, # number of units for "on"
marketOpen = "09:30:00",
marketClose = "16:00:00",
tz = "GMT"
)

spy_5_p <- as.numeric(spy_5$PRICE)


T <- length(spy_5_p)
spy_5_ret <- log(spy_5_p[-1]/spy_5_p[-T])
plot(spy_5_ret, type="l", ylab="Return", main="5-minute Return (2014:01:02 - 2014:01:07)")
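
To get one RVol per trading day from the 5’ returns above (as reported below), a minimal sketch (the day grouping through the xts index is our addition):

day_id <- format(index(spy_5)[-1], "%Y-%m-%d")        # day label for each 5’ return
rvol_daily <- sqrt(tapply(spy_5_ret^2, day_id, sum))  # one RVol per trading day
rvol_daily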


RV Models: High Frequency – TAQ


Example (continuation): We plot the 5-minute return data.
Smoother, easier to read.

RVolt=2014:01:02 = 0.0053344
RVolt=2014:01:03 = 0.0043888
RVolt=2014:01:06 = 0.0059836
RVolt=2014:01:07 = 0.0052772

RV Models: High Frequency – TAQ


Example (continuation): We plot the autocorrelogram for the 5’
TAQ SPY data:

> acf_spy_5 <- acf(spy_5_ret, main = "5-minute SPY Data: January 2014")
> acf_spy_5
Autocorrelations of series ‘spy_ret’, by lag

0 1 2 3 4 5 6 7 8 9 10
1.000 -0.105 -0.024 -0.104 0.018 0.147 0.016 -0.024 -0.088 0.048 0.037

Note: We have a negative 1st-order autocorrelation: -0.105, though not
significant. However, the autocorrelation of order 5 is significant.


RV Models: High Frequency – TAQ


Example (continuation): We plot the 10-minute return data.
Smoothing increases.

RVolt=2014:01:02 = 0.005478294
RVolt=2014:01:03 = 0.004256046
RVolt=2014:01:06 = 0.006190508
RVolt=2014:01:07 = 0.005145601
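
The 10’ series comes from the same pipeline, changing only k (a sketch under the same assumptions as the 5’ code above):

spy_10 <- aggregateTrades(hf_1, on = "minutes", k = 10,
                          marketOpen = "09:30:00", marketClose = "16:00:00", tz = "GMT")
spy_10_p <- as.numeric(spy_10$PRICE)
spy_10_ret <- log(spy_10_p[-1] / spy_10_p[-length(spy_10_p)])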

RV Models: High Frequency – TAQ


Example (continuation): We plot the autocorrelogram for the 10’
TAQ SPY data:

Note: Now, none of the autocorrelations is significant. The 10-minute
returns look uncorrelated.



RV Models: R Script
Example: R script to compute realized volatility
MSCI_da <- read.csv("http://www.bauer.uh.edu/rsusmel/4397/MSCI_daily.csv", head=TRUE, sep=",")
x_us <- MSCI_da$USA
T <- length(x_us)
us_r <- log(x_us[-1]/x_us[-T])

x <- us_r # US log returns from MSCI USA Index


T <- length(x)
rvs <- NULL # create vector to fill with RV
i <- 1
k <- 21 # k: observations per period
while (i < T-k) {
s2 <- sum(x[i:(i+k-1)]^2) # realized variance over k non-overlapping observations
i <- k + i
rvs <- rbind(rvs,s2)
}
rvol <- sqrt(rvs) # realized volatility
mean(rvol) # mean
sd(rvol) # standard deviation

RV Models: Monthly RV From Daily Data


Example: Using daily data we calculate 1-mo Realized Volatility
(k=21 days) for log returns for the MSCI (1970: Jan – 2020: Oct).

> mean(rvol) # average monthly Rvol in the sample


[1] 0.04326531  ← very close to monthly S&P volatility: 4.49%
> sd(rvol) # standard deviation of monthly Rvol in the sample
[1] 0.02592653  ← dividing by sqrt(T) we get the SE = 0.001 (very small)


RV Models: Log Rules


• The log approximation rules for the variance and SD are used to
change frequencies for RV and RVol. For example, suppose we calculate
RV at frequency j, RVt=j. If we are interested in the J-period RVt=J,
where one J-period contains J periods of frequency j, then the J-period
variance can be calculated as:
RVt=J = J * RVt=j

The RVolt=j is the square root of RVt=j.

RV Models: Log Rules


Example: Using 10’ data, we calculated the daily realized variance,
RVt=daily. Then, the annual variance can be calculated as
RVt=annual = 260 * RVt=daily
where 260 is the number of trading days in the year. The annualized
RVol is the square root of RVt=annual:
RVOLt=annual = sqrt(260) * RVOLt=daily

We can use time series models –say, an ARIMA model– for RVt to
forecast daily volatility, as sketched below.
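
A minimal sketch of this idea, with a simulated stand-in for a daily RV series (we model log RV to keep forecasts positive; the bias from exponentiating the log forecast is ignored):

set.seed(3)
h <- stats::filter(rnorm(500, 0, 0.3), 0.9, method = "recursive")  # persistent log RV
rv_daily <- exp(as.numeric(h) - 9)                  # stand-in daily RV series
fit_rv <- arima(log(rv_daily), order = c(1, 0, 1))  # ARMA(1,1) for log RV
rv_fcast <- exp(predict(fit_rv, n.ahead = 1)$pred)  # 1-day-ahead RV forecast
sqrt(rv_fcast)                                      # implied RVol forecast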


RV Models: Quarterly RV From Daily Data


Example: Using daily data we calculate 3-mo Realized Volatility
(k=66 days) for log returns for the MSCI (1970: March – 2020: Oct).

> mean(rvol) # average quarterly Rvol in the sample

[1] 0.07725361  ← log approximation: sqrt(3) * 0.04326 = 0.07493 (close!)
> sd(rvol) # standard deviation of quarterly Rvol in the sample
[1] 0.02592653

RV Models: Properties
• Under some conditions (bounded kurtosis and autocorrelation of
squared returns less than 1), RVt is consistent and m.s. convergent.
• Realized volatility is a measure (a statistic): it has a distribution.
• The distribution of RV is non-normal (as expected): it tends to be
skewed right and leptokurtic. The distribution of log(RV), however, is
approximately normal.
• Daily returns standardized by RV measures are nearly Gaussian.
• RV is highly persistent.
• The key problem is the choice of sampling frequency (or number of
observations per day).

62
RS – Lecture 12

RV Models: Properties

• More on the choice of sampling frequency:

— Bandi and Russell (2003) propose a data-based method for
choosing the frequency that minimizes the MSE of the measurement
error.
— Simulations and empirical examples suggest optimal sampling is
around 1-3 minutes for equity returns.

RV Models: Variation
• Another method: AR model for volatility:
|εt| = ω + β |εt-1| + υt

The εt are estimated from a first-step procedure –i.e., a regression.

Asymmetric/leverage effects can also be introduced.

OLS estimation is possible. Make sure that the fitted variance estimates
are positive (see the sketch below).
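
A minimal sketch of the OLS step, with a simulated stand-in for the first-step residuals (all names are ours):

set.seed(4)
e <- rnorm(500)                      # stand-in for first-step residuals
abs_e <- abs(e)
n <- length(abs_e)
fit_ar <- lm(abs_e[-1] ~ abs_e[-n])  # |εt| = ω + β |εt-1| + υt, by OLS
vol_fit <- fitted(fit_ar)            # fitted volatility path
all(vol_fit > 0)                     # check that the estimates are positive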


Other Models: Parkinson’s (1980) Estimator


• Parkinson’s (1980) estimator:
s2 = Σt [ln(Ht) – ln(Lt)]2 / (4 ln(2) T),

where Ht is the highest price and Lt is the lowest price on day t.

• There is an RV counterpart, using HF data: the Realized Range (RR):

RRt = Σj [100 * (ln(Ht,j) – ln(Lt,j))]2 / (4 ln(2)),

where Ht,j and Lt,j are the highest and lowest prices in the jth interval.

• These “range” estimators are simple to compute and very efficient.

Reference: Christensen and Podolskij (2005).
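
A minimal R sketch of Parkinson’s estimator, with simulated stand-ins for the daily highs and lows (all names are ours):

set.seed(5)
lo <- 100 * exp(cumsum(rnorm(250, 0, 0.01)))               # stand-in daily lows
hi <- lo * exp(abs(rnorm(250, 0, 0.01)))                   # stand-in daily highs
Tn <- length(hi)
s2_park <- sum((log(hi) - log(lo))^2) / (4 * log(2) * Tn)  # Parkinson variance
sqrt(s2_park)                                              # Parkinson volatility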

Stochastic volatility (SV/SVOL) models


• Now, instead of a known volatility at time t, as in ARCH models, we
allow for a stochastic shock, υt, to σt:
σt = ω + β σt-1 + υt,   υt ~ N(0, συ2)
Or, using logs:
log(σt) = ω + β log(σt-1) + υt,   υt ~ N(0, συ2)
• The difference with ARCH models: The shocks that govern the
volatility are not necessarily εt’s.

• Usually, the standard model centers log volatility around ω:


log(σt) = ω + β (log(σt-1) – ω) + υt

Then,
E[log(σt)] = ω
Var[log(σt)] = κ2 = συ2/(1 – β2)
⟹ Unconditional distribution: log(σt) ~ N(ω, κ2)
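
A minimal simulation sketch of this SVOL model (the parameter values are illustrative, not estimates from the lecture):

set.seed(6)
Tn <- 10000
omega <- -4.5; beta <- 0.95; sig_u <- 0.2  # illustrative SVOL parameters
h <- numeric(Tn); h[1] <- omega            # h[t] = log(σt)
for (t in 2:Tn)
  h[t] <- omega + beta * (h[t - 1] - omega) + rnorm(1, 0, sig_u)
r <- exp(h) * rnorm(Tn)                    # returns: rt = σt * zt
c(mean(h), var(h))                         # ≈ ω and κ2 = συ2/(1 – β2)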


Stochastic volatility (SV/SVOL) models


• Like ARCH models, SV models produce returns with kurtosis > 3
(and, also, positive autocorrelations between squared excess returns):
Var[rt] = E[(rt – E[rt])2] = E[σt2 zt2] = E[σt2] E[zt2]
= E[σt2] = exp(2ω + 2κ2) (property of the log-normal)

kurt[rt] = E[(rt – E[rt])4] / {E[(rt – E[rt])2]}2

= E[σt4] E[zt4] / {(E[σt2])2 (E[zt2])2}
= 3 exp(4ω + 8κ2) / exp(4ω + 4κ2) = 3 exp(4κ2) > 3!

• We have 3 SVOL parameters to estimate: φ = (ω, β, συ).

• Estimation:
- GMM: Using moments, like the sample variance and kurtosis of
returns. Complicated –see Andersen and Sørensen (1996).
- Bayesian: Using MCMC methods (mainly, Gibbs sampling). Modern
approach.

Stochastic volatility (SV/SVOL) models


• The Bayesian approach takes advantage of the idea of a hierarchical
structure:
- f(y|ht) (distribution of the data given the volatilities)
- f(ht|φ) (distribution of the volatilities given the parameters)
- f(φ) (distribution of the parameters)

Algorithm: MCMC (JPR, 1994).


Augment the parameter space to include ht.
Using a proper prior for f(ht, φ), MCMC methods provide inference
about the joint posterior f(ht, φ|y). We’ll go over this topic in Lecture 17.

Classic references: Jacquier, E., N. Polson, and P. Rossi (1994), “Bayesian
analysis of stochastic volatility models,” Journal of Business and Economic
Statistics (estimation); Heston, S.L. (1993), “A closed-form solution for
options with stochastic volatility with applications to bond and currency
options,” Review of Financial Studies (theory).

