
RS – Lecture 12

Lecture 12
Heteroscedasticity

Heteroscedasticity
• Assumption (A3) is violated in a particular way: ε has unequal
variances, but εi and εj are still not correlated with each other. Some
observations (lower variance) are more informative than others
(higher variance).

[Figure: conditional densities f(y|x) around the regression line E(y|x) = β0 + β1x at x1, x2, x3; the spread of y increases with x.]


Heteroscedasticity
• Now, we have the CLM regression with hetero- (different) scedastic
(variance) disturbances.
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3’) Var[εi] = σ² ωi, ωi > 0. (CLM ⇒ ωi = 1, for all i.)
(A4) X has full column rank – rank(X) = k, where T ≥ k.

• Popular normalization: (1/T) Σi ωi = 1. (A scaling, absorbed into σ².)

• A characterization of the heteroscedasticity: Well-defined estimators and
methods for testing hypotheses will be obtainable if the heteroscedasticity
is “well behaved,” in the sense that
ωi / Σi ωi → 0 as T → ∞  –i.e., no single observation becomes dominant.
(1/T) Σi ωi → some stable constant. (Not a plim!)

GR Model and Testing

• Implications for conventional OLS and hypothesis testing:
1. b is still unbiased.
2. Consistent? We need the more general proof. Not difficult.
3. If plim b = β, then plim s² = σ² (with the normalization).
4. Under the usual assumptions, we have asymptotic normality.

• Two main problems with OLS estimation under heteroscedasticity:
(1) The usual standard errors are not correct. (They are biased!)
(2) OLS is not BLUE.

• Since the standard errors are biased, we cannot use the usual
t-statistics, F-statistics, or LM statistics for drawing inferences. This
is a serious issue.


Heteroscedasticity: Inference Based on OLS


• Q: But, what happens if we still use s²(X′X)⁻¹?
A: It depends on X′ΩX – X′X. If they are nearly the same, the OLS
covariance matrix will give OK inferences.

But, when will X′ΩX – X′X be nearly the same? The answer is based
on a property of weighted averages. Suppose ωi is randomly drawn
from a distribution with E[ωi] = 1. Then,
(1/T) Σi ωi xi² →p E[x²]  – just like (1/T) Σi xi².

• Remark: For heteroscedasticity to be a significant issue for
estimation and inference by OLS, the weights ωi must be correlated with
xi and/or xi². The higher the correlation, the more important
heteroscedasticity becomes (b is more inefficient).

Finding Heteroscedasticity
• There are several theoretical reasons why the ωi may be related to x
and/or xi²:
1. Following error-learning models, as people learn, their errors of
behavior become smaller over time. Then, σi² is expected to decrease.
2. As data collecting techniques improve, σi² is likely to decrease.
Companies with sophisticated data processing techniques are likely to
commit fewer errors in forecasting customers’ orders.
3. As incomes grow, people have more discretionary income and, thus,
more choice about how to spend their income. Hence, σi² is likely to
increase with income.
4. Similarly, companies with larger profits are expected to show
greater variability in their dividend/buyback policies than companies
with lower profits.


Finding Heteroscedasticity
• Heteroscedasticity can also be the result of model misspecification.
• It can arise as a result of the presence of outliers (either very small or
very large). The inclusion/exclusion of an outlier, especially if T is
small, can affect the results of regressions.
• Violations of (A1) –i.e., the model is not correctly specified– can
produce heteroscedasticity, due to omitted variables from the model.
• Skewness in the distribution of one or more regressors included in
the model can induce heteroscedasticity. Examples are economic
variables such as income, wealth, and education.
• David Hendry notes that heteroscedasticity can also arise because of:
– (1) incorrect data transformation (e.g., ratio or first difference
transformations).
– (2) incorrect functional form (e.g., linear vs log-linear models).

Finding Heteroscedasticity
• Heteroscedasticity is usually modeled using one of the following
specifications:
- H1: σt² is a function of past εt² and past σt² (GARCH model).
- H2: σt² increases monotonically with one (or several) exogenous
variable(s) (x1, . . . , xT).
- H3: σt² increases monotonically with E(yt).
- H4: σt² is the same within p subsets of the data but differs across the
subsets (grouped heteroscedasticity). This specification allows for structural
breaks.

• These are the usual alternative hypotheses in heteroscedasticity tests.


Finding Heteroscedasticity
• Visual test
A plot of residuals against the dependent variable (or another variable)
will often produce a fan shape.

[Figure: residual plot showing a fan shape – the dispersion of the residuals increases along the horizontal axis.]

Testing for Heteroscedasticity


• Usual strategy when heteroscedasticity is suspected: Use OLS along
with the White estimator. This will give us consistent inferences.

• Q: Why do we want to test for heteroscedasticity?


A: OLS is no longer efficient. There is an estimator with lower
asymptotic variance (the GLS/FGLS estimator).

• We want to test: H0: E(ε²|x1, x2,…, xk) = E(ε²) = σ²

• The key is whether E[εi²] = σi² is related to x and/or xi². Suppose
we suspect a particular independent variable, say X1, is driving ωi.

• Then, a simple test: Check the RSS for large values of X1, and the
RSS for small values of X1. This is the Goldfeld-Quandt test.


Testing for Heteroscedasticity


• The Goldfeld-Quandt test
- Step 1. Arrange the data from small to large values of the
independent variable suspected of causing heteroscedasticity, Xj.

- Step 2. Run two separate regressions, one for small values of Xj and
one for large values of Xj, omitting d middle observations (≈ 20%).
Get the RSS for each regression: RSS1 for small values of Xj and
RSS2 for large Xj’s.

- Step 3. Calculate the F ratio:
GQ = RSS2/RSS1 ~ Fdf,df, with df = [(T – d) – 2(k+1)]/2, assuming (A5) holds.

If (A5) does not hold, we rely on asymptotic theory. Then,
GQ is asymptotically χ².

Testing for Heteroscedasticity


• The Goldfeld-Quandt test
Note: When we suspect more than one variable is driving the ωi’s,
this test is not very useful.

• But, the GQ test is a popular test for structural breaks (two
regimes) in variance. For these tests, we rewrite Step 3 to allow for a
different sample size in the sub-samples 1 and 2.

- Step 3. Calculate the F-test ratio:
GQ = [RSS2/(T2 – k)]/[RSS1/(T1 – k)]
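
A minimal R sketch of these steps, assuming y and x are already sorted by the suspect variable (the function name gq_stat and the 20% trimming fraction are illustrative, not from the lecture):

gq_stat <- function(y, x, frac_drop = 0.20) {
  T <- length(y)
  d  <- floor(frac_drop * T)          # middle observations to omit
  T1 <- floor((T - d) / 2)            # "small x" sub-sample size
  T2 <- T - T1 - d                    # "large x" sub-sample size
  k  <- 2                             # parameters in y ~ x (intercept + slope)
  rss1 <- sum(resid(lm(y[1:T1] ~ x[1:T1]))^2)
  rss2 <- sum(resid(lm(y[(T1 + d + 1):T] ~ x[(T1 + d + 1):T]))^2)
  (rss2 / (T2 - k)) / (rss1 / (T1 - k))   # compare to F(T2 - k, T1 - k)
}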


Testing for Heteroscedasticity: GQ Test


Example: We test if the 3-factor FF model for IBM and GE returns
shows heteroscedasticity with a GQ test, using gqtest in package lmtest.
• IBM returns
> library(lmtest)
> gqtest(ibm_x ~ Mkt_RF + SMB + HML, fraction = .20)
Goldfeld-Quandt test

data: ibm_x ~ Mkt_RF + SMB + HML


GQ = 1.1006, df1 = 224, df2 = 223, p-value = 0.2371 ⇒ cannot reject H0 at 5% level.
alternative hypothesis: variance increases from segment 1 to 2

• GE returns
gqtest(ge_x ~ Mkt_RF + SMB + HML, fraction = .20)
Goldfeld-Quandt test

data: ge_x ~ Mkt_RF + SMB + HML


GQ = 2.744, df1 = 281, df2 = 281, p-value < 2.2e-16 ⇒ reject H0 at 5% level.
alternative hypothesis: variance increases from segment 1 to 2

Testing for Heteroscedasticity: LR Test


• The Likelihood Ratio Test
Let’s define the likelihood function, assuming normality, for a
general case, where we have g different variances:

ln L = –(T/2) ln(2π) – ½ Σi=1..g Ti ln(σi²) – Σi=1..g [1/(2σi²)] (yi – Xiβ)′(yi – Xiβ)

We have two models:
(R) Restricted under H0: σi² = σ². From this model, we calculate ln LR:
ln LR = –(T/2) [ln(2π) + 1] – (T/2) ln(σ̂²)
(U) Unrestricted. From this model, we calculate the log likelihood:
ln LU = –(T/2) [ln(2π) + 1] – ½ Σi=1..g Ti ln(σ̂i²);  σ̂i² = (1/Ti) (yi – Xib)′(yi – Xib)

7
RS – Lecture 12

Testing for Heteroscedasticity: LR Test


• Now, we can estimate the Likelihood Ratio (LR) test:

LR = 2 (ln LU – ln LR) = T ln(σ̂²) – Σi=1..g Ti ln(σ̂i²) →d χ²(g–1)

Under the usual regularity conditions, LR is approximated by a χ²(g–1).

• Using specific functions for σi², this test has been used by
Rutemiller and Bowers (1968) and in Harvey’s (1976) groupwise
heteroscedasticity paper.

Testing for Heteroscedasticity


• Score LM tests
• We want to develop tests of H0: E(ε²|x1, x2,…, xk) = σ² against an
H1 with a general functional form.

• Recall the central issue is whether E[εi²] = σi² is related to x
and/or xi². Then, a simple strategy is to use OLS residuals to estimate
the disturbances and look for relationships between ei² and xi and/or xi².

• Suppose that the relationship between ε² and X is linear:
ε² = Xα + v
Then, we test: H0: α = 0 against H1: α ≠ 0.

• We can base the test on how the squared OLS residuals ei² correlate
with X.


Testing for Heteroscedasticity


• Popular heteroscedasticity LM tests:
- Breusch and Pagan (1979)’s LM test (BP).
- White (1980)’s general test.

• Both tests are based on OLS residuals. That is, calculated under H0:
No heteroscedasticity.

• The BP test is an LM test, based on the score of the log likelihood
function, calculated under normality. It is a general test designed to
detect any linear form of heteroskedasticity.

• The White test is an asymptotic Wald-type test; normality is not
needed. It allows for nonlinearities by using squares and
cross-products of all the x’s in the auxiliary regression.

Testing for Heteroscedasticity: BP Test


• Let’s start with a general form of heteroscedasticity:
σi² = h(α0 + zi,1 α1 + .... + zi,m αm) = h(zi′α)
• We want to test: H0: E(εi²|z1, z2,…, zk) = h(zi′α) = σ²
or H0: α1 = α2 = ... = αm = 0 (m restrictions)
• Assume normality. That is, the log likelihood function is:
log L = constant – ½ Σ log σi² – ½ Σ εi²/σi²
Then, construct an LM test:
LM = S(θR)′ I(θR)⁻¹ S(θR),  θ = (β, α)
S(θ) = ∂log L/∂θ′ = [Σi xi εi σi⁻² ; –½ Σi σi⁻² (∂h/∂α) zi + ½ Σi σi⁻⁴ εi² (∂h/∂α) zi]
I(θ) = E[–∂²log L/∂θ∂θ′]

• We have block diagonality, so we can rewrite the LM test under H0:
LM = S(α0, 0)′ [I22 – I21 I11⁻¹ I21′]⁻¹ S(α0, 0)


Testing for Heteroscedasticity: BP Test


• We have block diagonality, so we can rewrite the LM test under H0:
LM = S(α0, 0)′ [I22 – I21 I11⁻¹ I21′]⁻¹ S(α0, 0)
S(α0, 0) = –½ Σi (∂h/∂α|H0) zi σR⁻² + ½ Σi σR⁻⁴ ei² (∂h/∂α|H0) zi
= ½ σR⁻² (∂h/∂α|H0) Σi zi (ei²/σR² – 1)
= ½ σR⁻² (∂h/∂α|H0) Σi zi ωi,  ωi = ei²/σR² – 1 = gi – 1
I22(α0, 0) = E[–∂²log L/∂α∂α′] = ½ [σR⁻² (∂h/∂α|H0)]² Σi zi zi′
I21(α0, 0) = 0
σR² = (1/T) Σi ei² (MLE of σ² under H0).
Then,
LM = ½ (Σi zi ωi)′ [Σi zi zi′]⁻¹ (Σi zi ωi) = ½ W′Z (Z′Z)⁻¹ Z′W ~ χ²m,
where W is the vector of the ωi’s.

Note: Recall R² = [y′X (X′X)⁻¹ X′y – T ȳ²]/[y′y – T ȳ²] = ESS/TSS.
Also note that under H0: E[ωi] = 0, E[ωi²] = 1.

Testing for Heteroscedasticity: BP Test


• LM = ½ W′Z (Z′Z)⁻¹ Z′W = ½ ESS
ESS = Explained SS in the regression of ωi (= ei²/σR² – 1) against zi.

• Under the usual regularity conditions, and under H0,
√T (α̂ML – α) →d N(0, 2σ⁴ (Z′Z/T)⁻¹)
Then,
LM-BP = (2 σR⁴)⁻¹ ESSe →d χ²m
ESSe = ESS in the regression of ei² (= gi σR²) against zi.
Since σR⁴ →p σ⁴, LM-BP →d χ²m.

Note: Recall R² = [y′X (X′X)⁻¹ X′y – T ȳ²]/[y′y – T ȳ²].

Under H0: E[ωi] = 0, E[ωi²] = 1, the LM test is equivalent to a T R² test.
(Think of ȳ ≈ 0 & y′y/T = 1 above.)


Testing for Heteroscedasticity: BP Test


• Variations:
(1) Glejser (1969) test. Use absolute values instead of ei² to estimate
the varying second moment. Following our previous example:
|ei| = α0 + zi,1 α1 + .... + zi,m αm + vi

(2) Harvey-Godfrey (1978) test. Use ln(ei²). Then, the implied model
for σi² is an exponential model:
ln(ei²) = α0 + zi,1 α1 + .... + zi,m αm + vi

Note: The implied model is σi² = exp{α0 + zi,1 α1 + .... + zi,m αm + vi}.

Testing for Heteroscedasticity: BP Test


• Variations:
(3) Koenker’s (1981) studentized LM test. A usual problem with the
LM statistic is that it crucially depends on the assumption that ε is
normal. Koenker (1981) proposed studentizing the LM-BP statistic:
LM-S = (2 σR⁴) LM-BP/[Σ (ei² – σR²)²/T] →d χ²m

The studentized version of the test is asymptotically equivalent to a
T*R² test, where R² is calculated from a regression of ei²/σR² on the
variables Z. (Omitting σR² from the denominator is OK.)


Testing for Heteroscedasticity: BP Test


• We have the following steps:
- Step 1. Run OLS on the DGP:
y = Xβ + ε.  – Keep ei and compute σ̂R² = RSS/T.

- Step 2. (Auxiliary Regression). Run the regression of ei² on the m
explanatory variables, z. In our example,
ei² = α0 + zi,1 α1 + .... + zi,m αm + vi  – Keep R².
- Step 3. Use the R² from Step 2, say Re². Calculate
LM = T * Re² →d χ²m.

Testing for Heteroscedasticity: Example – IBM


Example: We suspect that squared Mkt_RF (x1) –a measure of the
overall market’s variance– drives heteroscedasticity. We do a
studentized LM-BP test for IBM in the 3-factor FF model:

fit_ibm_ff3 <- lm(ibm_x ~ Mkt_RF + SMB + HML) # Step 1 – OLS in DGP (3-factor FF model)
e_ibm <- fit_ibm_ff3$residuals # Step 1 – keep residuals
e_ibm2 <- e_ibm^2 # Step 1 – squared residuals
Mkt_RF2 <- Mkt_RF^2
fit_BP <- lm(e_ibm2 ~ Mkt_RF2) # Step 2 – Auxiliary regression
Re_2 <- summary(fit_BP)$r.squared # Step 2 – keep R^2
LM_BP_test <- Re_2 * T # Step 3 – Compute LM-BP test: R^2 * T
> LM_BP_test
[1] 0.25038
> p_val <- 1 - pchisq(LM_BP_test, df = 1) # p-value of LM_test
> p_val
[1] 0.6168019

LM-BP Test: 0.25038 ⇒ cannot reject H0 at 5% level (χ²[1],.05 ≈ 3.84),
with a p-value = .6168.


Testing for Heteroscedasticity: Example – IBM


Example (continuation): The bptest in the lmtest package performs a
studentized LM-BP test using the same variables as in the model
(Mkt, SMB and HML). For IBM in the 3-factor FF model:

> bptest(ibm_x ~ Mkt_RF + SMB + HML) # bptest uses the model's regressors as the z variables

studentized Breusch-Pagan test

data: ibm_x ~ Mkt_RF + SMB + HML
BP = 4.1385, df = 3, p-value = 0.2469

LM-BP Test: 4.1385 ⇒ cannot reject H0 at 5% level (χ²[3],.05 ≈ 7.815),
with a p-value = 0.2469.

Note: Heteroscedasticity in financial time series is very common. In
general, it is driven by squared market returns or squared past errors.

Testing for Heteroscedasticity: Example – DIS


Example: We suspect that squared market returns drive
heteroscedasticity. We do an LM-BP (studentized) test for Disney:
lr_dis <- log(x_dis[-1]/x_dis[-T]) # Log returns for DIS
dis_x <- lr_dis - RF # Disney excess returns
fit_dis_ff3 <- lm(dis_x ~ Mkt_RF + SMB + HML) # Step 1 – OLS in DGP (3-factor FF model)
e_dis <- fit_dis_ff3$residuals # Step 1 – keep residuals
e_dis2 <- e_dis^2 # Step 1 – squared residuals
fit_BP <- lm(e_dis2 ~ Mkt_RF2) # Step 2 – Auxiliary regression
Re_e2 <- summary(fit_BP)$r.squared # Step 2 – Keep R^2 from Auxiliary reg
LM_BP_test <- Re_e2 * T # Step 3 – Compute LM Test: R^2 * T
> LM_BP_test
[1] 14.15224
> p_val <- 1 - pchisq(LM_BP_test, df = 1) # p-value of LM_test
> p_val
[1] 0.0001685967

LM-BP Test: 14.15 ⇒ reject H0 at 5% level (χ²[1],.05 ≈ 3.84), with a
p-value = .0001.


Testing for Heteroscedasticity: Example – DIS


Example (continuation): We do the same test, but with SMB
squared, for Disney:
SMB2 <- SMB^2
fit_BP <- lm(e_dis2 ~ SMB2)
Re_e2 <- summary(fit_BP)$r.squared
LM_BP_test <- Re_e2 * T
> LM_BP_test
[1] 7.564692
> p_val <- 1 - pchisq(LM_BP_test, df = 1) # p-value of LM_test
> p_val
[1] 0.005952284

LM-BP Test: 7.56 ⇒ reject H0 at 5% level (χ²[1],.05 ≈ 3.84), with a
p-value = .006.


Testing for Heteroscedasticity: White Test


• Based on the difference between the OLS and the true OLS variances:
σ² (X′ΩX – X′X) = X′ΣX – σ² X′X = Σi (E[εi²] – σ²) xi xi′
• Empirical counterpart: (1/T) Σi (ei² – s²) xi xi′
• We can express each element of this k(k+1)/2 matrix as:
(1/T) Σi (ei² – s²) ψi,  ψi: Kolmogorov-Gabor polynomial
ψi = (ψ1i, ψ2i, ..., ψmi)′,  ψli = ψqi ψpi, p ≥ q, p, q = 1, 2, ..., k,
l = 1, 2, ..., m,  m = k(k+1)/2
• White heteroscedasticity test:
W = [(1/T) Σi (ei² – s²) ψi]′ DT⁻¹ [(1/T) Σi (ei² – s²) ψi] →d χ²m
where
DT = Var[(1/T) Σi (ei² – s²) ψi]
Note: W is asymptotically equivalent to a T R² test, where R² is
calculated from a regression of ei² on the ψi’s.


Testing for Heteroscedasticity: White Test


• Usual calculation of the White test
– Step 1. Run OLS on the DGP:
y = Xβ + ε.  – Keep ei.

– Step 2. (Auxiliary Regression). Regress ei² on all the explanatory
variables (Xj), their squares (Xj²), and all their cross products. For
example, when the model contains k = 2 explanatory variables, the
test is based on:
ei² = β0 + β1 x1,i + β2 x2,i + β3 x1,i² + β4 x2,i² + β5 x1,i x2,i + vi
Let m be the number of regressors in the auxiliary regression. Keep
its R², say Re².

– Step 3. Compute the LM statistic:
LM = T * Re² →d χ²m.

Testing for Heteroscedasticity: White Test


Example: White Test for the 3-factor FF model residuals for IBM:
HML2 <- HML^2
Mkt_HML <- Mkt_RF*HML
Mkt_SMB <- Mkt_RF*SMB
SMB_HML <- SMB*HML
xx2 <- cbind(Mkt_RF2, SMB2, HML2, Mkt_HML, Mkt_SMB, SMB_HML)
fit_ibm_W <- lm(e_ibm2 ~ xx2) # Not including the original variables is OK
r2_e2 <- summary(fit_ibm_W)$r.squared # Keep R^2 from Auxiliary regression
> r2_e2
[1] 0.0166492
lm_t <- T * r2_e2 # Compute LM test: R^2 * sample size (T)
> lm_t
[1] 10.93483
df_lm <- ncol(xx2)
qchisq(.95, df = df_lm)

LM-White Test: 10.93 ⇒ cannot reject H0 at 5% level (χ²[6],.05 ≈ 12.59).


Testing for Heteroscedasticity: White Test


Example (continuation): Now, we do a White Test for the 3-factor
FF model for DIS and GE returns.
• For DIS, we get:
fit_dis_W <- lm(e_dis2 ~ xx2)
Re_2W <- summary(fit_dis_W)$r.squared
LM_W_test <- Re_2W * T
> LM_W_test
[1] 25.00148 ⇒ reject H0 at 5% level (χ²[6],.05 ≈ 12.59).
> qchisq(.95, df = df_lm)
[1] 12.59159
> p_val <- 1 - pchisq(LM_W_test, df = 6) # p-value of LM_test
> p_val
[1] 0.0003412389

• For GE, we get:
LM-White Test: 20.15 (p-value = 0.0026) ⇒ reject H0 at 5% level.

Testing for Heteroscedasticity: White Test


Example: We do a White Test for the residuals in the encompassing
(IFE + PPP) model for changes in the USD/GBP (T = 363):
fit_gbp <- lm(lr_usdgbp ~ inf_dif + int_dif)
e_gbp <- fit_gbp$residuals
e_gbp2 <- e_gbp^2
int_dif2 <- int_dif^2
inf_dif2 <- inf_dif^2
int_inf_dif <- int_dif*inf_dif

fit_W <- lm(e_gbp2 ~ int_dif2 + inf_dif2 + int_inf_dif)
Re_e2W <- summary(fit_W)$r.squared
LM_W_test <- Re_e2W * T
p_val <- 1 - pchisq(LM_W_test, df = 3) # p-value of LM_test

> LM_W_test
[1] 15.46692
> p_val
[1] 0.001458139 ⇒ reject H0 at 5% level


Testing for Heteroscedasticity: Remarks


• Drawbacks of the Breusch-Pagan test:
- It has been shown to be sensitive to violations of the normality
assumption.
- Three other popular LM tests –the Glejser test, the Harvey-Godfrey
test, and the Park test– are also sensitive to such violations.

• Drawbacks of the White test:
- If a model has several regressors, the test can consume a lot of df’s.
- In cases where the White test statistic is statistically significant, the
cause may be model specification errors rather than heteroscedasticity.
- It is general. It does not give us a clue about how to model
heteroscedasticity to do FGLS. The BP test points us in a direction.

Testing for Heteroscedasticity: Remarks


• Drawbacks of the White test (continuation):
- In simulations, it does not perform well relative to others, especially
for time-varying heteroscedasticity, typical of financial time series.
- The White test does not depend on normality; but Koenker’s test
is also not very sensitive to normality. In simulations, Koenker’s test
seems to have more power – see Lyon and Tsai (1996) for a Monte
Carlo study of the heteroscedasticity tests presented here.

17
RS – Lecture 12

Testing for Heteroscedasticity: Remarks


• General problems with heteroscedasticity tests:
- The tests rely on the first four assumptions of the CLM being true.
- In particular, (A2) violations. That is, if the zero conditional mean
assumption fails, then a test for heteroskedasticity may reject the null
hypothesis even if Var(y|X) is constant.
- This is true if our functional form is specified incorrectly (omitted
variables, or specifying a log instead of a level). Recall David Hendry’s
comment.

• Knowing the true source (functional form) of heteroscedasticity
may be difficult. A practical solution is to avoid modeling
heteroscedasticity altogether and use OLS along with White
heteroskedasticity-robust standard errors.

Estimation: WLS form of GLS


• While it is always possible to estimate robust standard errors for
OLS estimates, if we know the specific form of the heteroskedasticity,
we can obtain more efficient estimates than OLS: GLS.

• GLS basic idea: Efficient estimation through transforming the
model into one that has homoskedastic errors – called WLS.

• Suppose the heteroskedasticity can be modeled as:
Var(ε|x) = σ² h(x)

• The key is to figure out what h(x) looks like. Suppose that we know
hi. For example, hi(x) = xi². (Make sure hi is always positive.)

• Then, use 1/√(xi²) to transform the model.

18
RS – Lecture 12

Estimation: WLS form of GLS


• Suppose that we know hi(x) = xi². Then, use 1/√(xi²) to transform
the model:
Var(εi/√hi | x) = σ²

• Thus, if we divide our whole equation by √hi, we get a (transformed)
model where the error is homoskedastic.

• When the weights must be estimated, we have a two-step GLS
estimation:
- Step 1: Use OLS, then use the residuals to estimate the weights.
- Step 2: Weighted least squares using the estimated weights.

• Greene has a proof, based on our asymptotic theory, for the
asymptotic equivalence of the second step to true GLS.
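
A minimal WLS sketch in R, on simulated data and assuming h(x) = x² is known (the lm() weights argument expects 1/hi):

# WLS when h_i = x_i^2 is known (simulated data for illustration)
set.seed(42)
T <- 500
x <- runif(T, 1, 10)
y <- 1 + 0.5 * x + rnorm(T, sd = sqrt(2) * x)   # Var(eps|x) = 2 * x^2
fit_ols <- lm(y ~ x)                     # unbiased, but inefficient
fit_wls <- lm(y ~ x, weights = 1 / x^2)  # transformed model: homoskedastic errors
summary(fit_wls)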

Estimation: FGLS
• More typical is the situation where we do not know the form of the
heteroskedasticity. In this case, we need to estimate h(xi).

• Typically, we start by assuming a fairly flexible model, such as:
Var(ε|x) = σ² h(x) = σ² exp(Xδ)  – make sure Var(εi|x) > 0.

But, we don’t know δ; it must be estimated. By our assumptions:
ε² = σ² exp(Xδ) v, with E(v|X) = 1.
Then, if E(v) = 1:
ln(ε²) = α0 + Xδ + u  (*)
where E(u) = 0 and u is independent of X.

We know that e is an estimate of ε, so we can estimate (*) by OLS.

19
RS – Lecture 12

Estimation: FGLS
• Now, an estimate of h is obtained as ĥ = exp(ĝ), and the inverse of
this is our weight. Now, we can do GLS as usual.

• Summary of FGLS
(1) Run the original OLS model, save the residuals, e. Get ln(e2).
(2) Regress ln(e2) on all of the independent variables. Get fitted
values, ĝ.
(3) Do WLS using 1/sqrt[exp(ĝ)] as the weight.
(4) Iterate to gain efficiency.

• Remark: We are using WLS just for efficiency –OLS is still unbiased
and consistent. Sandwich estimator gives us consistent inferences.
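
A hedged R sketch of steps (1)-(3), reusing the simulated DGP from the WLS sketch above:

# FGLS: estimate h(x) = exp(g(x)) from the OLS residuals, then do WLS
set.seed(42)
T <- 500
x <- runif(T, 1, 10)
y <- 1 + 0.5 * x + rnorm(T, sd = sqrt(2) * x)
e <- resid(lm(y ~ x))                            # (1) OLS residuals
g_hat <- fitted(lm(log(e^2) ~ x))                # (2) regress ln(e^2) on the regressors
fit_fgls <- lm(y ~ x, weights = 1 / exp(g_hat))  # (3) WLS with weight 1/h_hat
summary(fit_fgls)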

Estimation: MLE
• ML estimates all the parameters simultaneously. To construct the
likelihood, we assume a distribution for ε. Under normality (A5):

ln L = –(T/2) ln(2π) – ½ Σi=1..T ln(σi²) – ½ Σi=1..T (yi – xi′β)²/σi²

• Suppose σi² = exp(α0 + zi,1 α1 + .... + zi,m αm) = exp(zi′α)

• Then, the first derivatives of the log likelihood wrt θ = (β, α) are:

∂ln L/∂β = Σi xi εi/σi² = X′Σ⁻¹ε

∂ln L/∂α = –½ Σi (1/σi²) exp(zi′α) zi + ½ Σi (εi²/σi⁴) exp(zi′α) zi
         = ½ Σi zi (εi²/σi² – 1)

• Then, we get the f.o.c. We get a non-linear system of equations.


Estimation: MLE
• We take second derivatives to calculate the information matrix:

–∂²ln L/∂β∂β′ = Σi xi xi′/σi² = X′Σ⁻¹X

–∂²ln L/∂β∂α′ = ½ Σi xi zi′ εi/σi²

–∂²ln L/∂α∂α′ = ½ Σi zi zi′ εi²/σi²

• Then,
I(θ) = E[–∂²ln L/∂θ∂θ′] = [ X′Σ⁻¹X    0
                            0         ½ Z′Z ]

• We can estimate the model using Newton’s method:
θj+1 = θj – Hj⁻¹ gj,  gj = ∂log Lj/∂θ′

Estimation: MLE
• We estimate the model using Newton’s method:
θj+1 = θj – Hj⁻¹ gj,  gj = ∂log Lj/∂θ′

Since Hj is block diagonal:
βj+1 = βj – (X′Σj⁻¹X)⁻¹ X′Σj⁻¹ εj
αj+1 = αj – (½ Z′Z)⁻¹ [½ Σi zi (εi²/σi² – 1)] = αj – (Z′Z)⁻¹ Z′v,
where
vi = (εi²/σi² – 1)
Convergence will be achieved when gj = ∂log Lj/∂θ′ is close to zero.

• We have an iterative algorithm ⇒ Iterative FGLS = MLE!


Heteroscedasticity: Log Transformations


• A log transformation of the data, can eliminte (or reduce) a certain
type of heteroskedasticity.
- Assume - t = E[Zt]
- Var[Zt] = δ t2 (Variance proportional to the
squared mean)
• We log transformed the data: log(Zt). Then, we use the delta
method to approximate the variance of the transformed variable.
Recall: Var[f(X)] using delta method:
Var[ f ( X )]  f ' ( ) 2 Var[ X ]

• Then, the variance of log(Zt) is roughly constant:


Var [log( Z t )]  (1 /  t ) 2 Var [ Z t ]  
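
A quick simulation check of this result (the DGP below, with sd proportional to the mean, is an assumed illustration):

# Var[Z_t] = (0.2 mu_t)^2, so Var[log(Z_t)] should be roughly 0.04 everywhere
set.seed(7)
mu <- seq(10, 100, length.out = 10000)
Z  <- rnorm(10000, mean = mu, sd = 0.2 * mu)
var(log(Z)[mu < 30])   # ~ 0.04
var(log(Z)[mu > 80])   # ~ 0.04: roughly constant, as the delta method predicts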

ARCH Models
• Until the early 1980s, econometrics had focused almost solely on
modeling the conditional means of series:
yt = E[yt|It] + εt,  εt ~ D(0, σ²)
Suppose we have an AR(1) process:
yt = α + β yt-1 + εt.
Then, the conditional mean, conditioning on the information set at
time t, It, is:
Et[yt+1|It] = α + β yt

• Recall the distinction between conditional moments and
unconditional ones. The unconditional mean and variance are:
E[yt] = α/(1 – β) = constant
Var[yt] = σ²/(1 – β²) = constant

The conditional mean is time varying; the unconditional mean is not!


ARCH Models
• Similar idea for the variance.

Unconditional variance:
Var[yt] = E[(yt – E[yt])²] = σ²/(1 – β²)

Conditional variance:
Vart-1[yt] = Et-1[(yt – Et-1[yt])²] = Et-1[εt²]

Remark: Conditional moments are time varying; unconditional


moments are not!

ARCH Models
• The unconditional variance measures the overall uncertainty. In the
AR(1) example, the information available at time t, It, plays no role:
Var[yt] = σ²/(1 – β²)

• The conditional variance, Var[yt|It], is a better measure of
uncertainty at time t. It is a function of information at time t, It.
[Figure: a simulated series yt with its constant unconditional mean and variance bands; the conditional variance changes over time.]


ARCH Models: Stylized Facts of Asset Returns


- Thick tails - Mandelbrot (1963): leptokurtic (thicker than Normal)

- Volatility clustering - Mandelbrot (1963): “large changes tend to be


followed by large changes of either sign.”

- Leverage Effects – Black (1976), Christie (1982): Tendency for changes


in stock prices to be negatively correlated with changes in volatility.

- Non-trading Effects, Weekend Effects – Fama (1965), French and Roll
(1986): When a market is closed, information accumulates at a
different rate than when it is open – for example, the weekend effect,
where stock price volatility on Monday is not three times the volatility
on Friday.

ARCH Models: Stylized Facts of Asset Returns


- Expected events – Cornell (1978), Patell and Wolfson (1979), etc.:
Volatility is high at regular times, such as news announcements or
other expected events, or even at certain times of day – for example,
markets are less volatile in the early afternoon.

- Volatility and serial correlation – LeBaron (1992): Inverse relationship


between the two.

- Co-movements in volatility – Ramchand and Susmel (1998): Volatility is


positively correlated across markets/assets.

• We need a model that accommodates all these facts.


ARCH Models: Stylized Facts of Asset Returns


• Easy to check leptokurtosis (Stylized Fact #1)
Figure: Descriptive Statistics and Distribution for Monthly S&P 500 Returns

Statistic            Value
Mean (%)             0.626332   (p-value: 0.0004)
Standard Dev (%)     4.37721
Skewness             -0.43764
Excess Kurtosis      2.29395
Jarque-Bera          145.72     (p-value: <0.0001)
AR(1)                0.0258     (p-value: 0.5249)

ARCH Models: Stylized Facts of Asset Returns

• Easy to check Volatility Clustering (Stylized Fact #2)


ARCH Models: Engle (1982)


• We start with assumptions (A1) to (A5), but with a specific (A3’):
Yt = γ′Xt + εt,  εt ~ N(0, σt²)
σt² = Vart-1(εt) = Et-1(εt²) = ω + Σi=1..q αi εt-i² = ω + α(L) εt²

Define νt = εt² – σt². Then,
εt² = ω + α(L) εt² + νt

• This is an AR(q) model for the squared innovations. That is, we have an
ARCH model: Auto-Regressive Conditional Heteroskedasticity.

This model estimates the unobservable (latent) variance.

Note: Since we are dealing with a variance, we usually impose
ω > 0 and αi > 0 for all i.

ARCH Models: Engle (1982)


• The unconditional variance is determined by:
σ² = E[σt²] = ω + Σi=1..q αi E[εt-i²] = ω + Σi=1..q αi σ²

That is,
σ² = ω / (1 – Σi αi)

To obtain a positive σ², we impose another restriction: (1 – Σi αi) > 0.

• Example: ARCH(1)
Yt = γ′Xt + εt,  εt ~ N(0, σt²)
σt² = ω + α1 εt-1²
• We need to impose the restrictions: α1 > 0 & 1 – α1 > 0.
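
A short R simulation of this ARCH(1) model (the parameter values are assumed for illustration and satisfy both restrictions):

set.seed(99)
T <- 1000
omega <- 0.1; alpha1 <- 0.6           # omega > 0, 0 < alpha1 < 1
eps <- numeric(T); sig2 <- numeric(T)
sig2[1] <- omega / (1 - alpha1)       # start at the unconditional variance
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:T) {
  sig2[t] <- omega + alpha1 * eps[t - 1]^2
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}
plot.ts(eps)   # volatility clustering: calm and turbulent stretches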


ARCH Models: Engle (1982)


• Even though the errors may be serially uncorrelated, they are not
independent: there will be volatility clustering and fat tails. Let’s
define the standardized errors:
zt = εt/σt
• They have conditional mean zero and a time-invariant conditional
variance equal to 1. That is, zt ~ D(0,1). Suppose zt follows a
N(0,1), with a finite fourth moment (use Jensen’s inequality). Then:

E(εt⁴) = E(zt⁴) E(σt⁴) ≥ E(zt⁴) E(σt²)² = 3 E(εt²)²
⇒ κ(εt) = E(εt⁴)/E(εt²)² ≥ 3.

• For an ARCH(1), the 4th moment is:
κ(εt) = 3 (1 – α1²)/(1 – 3α1²), if 3α1² < 1.

ARCH Models: Engle (1982)


• A more convenient, but less intuitive, presentation of the ARCH(1)
model:
εt = σt υt
where υt is i.i.d. with mean 0 and Var[υt] = 1. Since υt is i.i.d.:
Et-1[εt²] = Et-1[σt² υt²] = σt² Et-1[υt²] = σt² = ω + α1 εt-1²

• It turns out that σt² is a very persistent process. Such a process can
be captured with an ARCH(q), where q is large. This is not efficient.

• In practice, q is often large. A more parsimonious representation is
the Generalized ARCH model, or GARCH(q, p):
σt² = ω + Σi=1..q αi εt-i² + Σj=1..p βj σt-j²
    = ω + α(L) εt² + β(L) σt²


GARCH: Bollerslev (1986)


• A more parsimonious representation is the GARCH(q, p):
σt² = ω + Σi=1..q αi εt-i² + Σj=1..p βj σt-j²

which is an ARMA(max(p,q), p) model for the squared innovations.

• Popular GARCH model: the GARCH(1,1):
σt² = ω + α1 εt-1² + β1 σt-1²
with an unconditional variance: Var[εt] = σ² = ω/(1 – α1 – β1).

⇒ Restrictions: ω > 0, α1 > 0, β1 > 0; (1 – α1 – β1) > 0.

• Technical details: This is covariance stationary if all the roots of
α(L) + β(L) = 1
lie outside the unit circle. For the GARCH(1,1), this amounts to
α1 + β1 < 1.
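
For intuition, a hedged R sketch simulating a covariance-stationary GARCH(1,1) (the parameter values are assumed and satisfy α1 + β1 < 1):

set.seed(11)
T <- 5000
omega <- 0.05; alpha1 <- 0.10; beta1 <- 0.85
eps <- numeric(T); sig2 <- numeric(T)
sig2[1] <- omega / (1 - alpha1 - beta1)   # unconditional variance as start value
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:T) {
  sig2[t] <- omega + alpha1 * eps[t - 1]^2 + beta1 * sig2[t - 1]
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}
c(var(eps), omega / (1 - alpha1 - beta1))  # sample variance ~ unconditional variance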

GARCH: Bollerslev (1986)


• Bollerslev (1986) showed that if 3α1² + 2α1β1 + β1² < 1, the
second and 4th (unconditional) moments of εt exist:

E[εt²] = ω/(1 – α1 – β1)

E[εt⁴] = 3ω² (1 + α1 + β1) / [(1 – α1 – β1)(1 – β1² – 2α1β1 – 3α1²)],
if (1 – β1² – 2α1β1 – 3α1²) > 0


GARCH-X
• In the GARCH-X model, exogenous variables are added to the
conditional variance equation.

Consider the GARCH(1,1)-X model:
σt² = ω + α1 εt-1² + β1 σt-1² + λ f(Xt-1, θ)

where f(Xt, θ) is strictly positive for all t. Usually, Xt is an observed
economic variable or indicator, say a liquidity index, and f(.) is a
non-linear transformation, which is non-negative.

Examples: Glosten et al. (1993) and Engle and Patton (2001) use 3-mo
T-bill rates for modeling stock return volatility. Hagiwara and
Herce (1999) use interest rate differentials between countries to
model FX return volatility. The US Congressional Budget Office uses
inflation in an ARCH(1) model for interest rate spreads.

IGARCH
• Recall the technical detail: The standard GARCH model:
σt² = ω + α(L) εt² + β(L) σt²
is covariance stationary if α(1) + β(1) < 1.

• But strict stationarity does not require such a stringent restriction
(that is, that the unconditional variance does not depend on t).

• In the GARCH(1,1) model, if α1 + β1 = 1, we have the Integrated
GARCH (IGARCH) model.

• In the IGARCH model, the autoregressive polynomial in the ARMA
representation has a unit root: a shock to the conditional variance is
“persistent.”


IGARCH
• Variance forecasts are generated with: Et[σt+j²] = σt² + j ω

• That is, today’s variance remains important for future forecasts of all
horizons.

• Nelson (1990) establishes that, as this satisfies the requirement for
strict stationarity, it is a well defined model.

• In practice, it is often found that α1 + β1 is close to 1.

• It is often argued that IGARCH is a product of omitted variables;
for example, structural breaks. See Lamoreux and Lastrapes (1989),
Hamilton and Susmel (1994), & Mikosch and Starica (2004).

• Shepard and Sheppard (2010) argue for a GARCH-X explanation.

GARCH: Variations – GARCH-in-mean


• The time-varying variance affects mean returns:
Mean equation: yt = Xt′γ + δ σt² + εt,  εt ~ N(0, σt²)
Variance equation: σt² = ω + α1 εt-1² + β1 σt-1²

• We have a dynamic mean-variance relation. It describes a specific
form of the risk-return trade-off.

• Finance intuition says that δ has to be positive and significant.
However, in empirical work, it does not work well: δ is not significant
or negative.


GARCH: Variations – Asymmetric GJR


• GJR-GARCH model – Glosten, Jagannathan & Runkle (JF, 1993):

σt² = ω + Σi (αi εt-i² + γi εt-i² * It-i) + Σj βj σt-j²

where It-i = 1 if εt-i < 0;
           = 0 otherwise.

• Using the indicator variable It-i, this model captures sign
(asymmetric) effects in volatility: Negative news (εt-i < 0) increases
the conditional volatility (leverage effect).

• The GARCH(1,1) version:
σt² = ω + α1 εt-1² + γ1 εt-1² It-1 + β1 σt-1²
where It-1 = 1 if εt-1 < 0;
           = 0 otherwise.

GARCH: Variations – Asymmetric GJR


• The GARCH(1,1) version:
σt² = ω + α1 εt-1² + γ1 εt-1² It-1 + β1 σt-1²

When εt-1 < 0 ⇒ σt² = ω + (α1 + γ1) εt-1² + β1 σt-1²
     εt-1 > 0 ⇒ σt² = ω + α1 εt-1² + β1 σt-1²

• This is a very popular variation of the GARCH models. The
leverage effect is significant.

• There is another variation, the Exponential GARCH, or EGARCH,
that also captures the asymmetric effect of negative news on the
conditional variance.


GARCH: Variations – EGARCH


• EGARCH model – Nelson (Econometrica, 1991).
It models an exponential function for the time-varying variance:

log(σt²) = ω + Σi=1..q αi (zt-i + γ (|zt-i| – E|zt-i|)) + Σj=1..p βj log(σt-j²)

where zt is a standardized i.i.d. D(0, 1) innovation.

• By design, the variance follows an exponential function. Thus, no
non-negativity restrictions on the parameters are imposed.

• Negative news (zt-i < 0) increases σt² (leverage effect).

Note: Nelson provides formulas for the unconditional moments
under the GED. But, under leptokurtic distributions such as the
Student-t, the unconditional variance does not exist. (Intuition: we
have an exponential formulation; with a large shock, it can explode.)

GARCH: Variations – NARCH


• Non-linear ARCH (NARCH) model – Higgins and Bera (1992) and
Hentschel (1995).

These models apply a Box-Cox-type transformation to the
conditional variance:

σtγ = ω + α |εt-1 – κ|γ + β σt-1γ

Special case: γ = 2 (standard GARCH model).

Note: The variance depends on both the size and the sign of the
lagged errors (through κ), which helps to capture leverage-type
(asymmetric) effects.


GARCH: Variations – TARCH


• Threshold ARCH (TARCH) – Rabemananjara & Zakoian (1993)
Large events –i.e., large errors– have a different effect from small
events. We use 2 indicator variables, I(εt-1 > κ) & I(εt-1 ≤ κ): one
for “large events,” εt-1 > κ, & one for “small events,” εt-1 ≤ κ:

σt² = ω + [α1 I(εt-1 > κ) + α2 I(εt-1 ≤ κ)] εt-1² + β1 σt-1²

There are two variances:
σt² = ω + α1 εt-1² + β1 σt-1², if εt-1 > κ
σt² = ω + α2 εt-1² + β1 σt-1², if εt-1 ≤ κ

• We can modify the model in many ways. For example, we can allow
for the asymmetric effects of negative news.

GARCH: Variations – SWARCH


• Switching ARCH (SWARCH) – Hamilton and Susmel (JE, 1994).

Intuition: σt² depends on the state of the economy – the regime. It’s based on
Hamilton’s (1989) time series models with changes of regime:

σt² = ω{st,st-1} + Σi=1..q αi,{st,st-1} εt-i²

The key is to select a parsimonious representation:

σt²/γ{st} = ω + Σi=1..q αi εt-i²/γ{st-i}

For a SWARCH(1) with 2 states (1 and 2), we have 4 possible σt²:

σt² = γ1 (ω + α1 εt-1²/γ1), st = 1, st-1 = 1
σt² = γ1 (ω + α1 εt-1²/γ2), st = 1, st-1 = 2
σt² = γ2 (ω + α1 εt-1²/γ1), st = 2, st-1 = 1
σt² = γ2 (ω + α1 εt-1²/γ2), st = 2, st-1 = 2


GARCH: Forecasting and Persistence


• Consider the forecast in a GARCH(1,1) model:
σt+1² = ω + α1 εt² + β1 σt² = ω + σt² (α1 zt² + β1)   (using εt² = σt² zt²)

Taking expectations at time t:
Et[σt+1²] = ω + σt² (α1 + β1)
Then, by repeated substitutions:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j

As j → ∞, the forecast reverts to the unconditional variance:
ω/(1 – α1 – β1).

• When α1 + β1 = 1, today’s volatility affects future forecasts forever:
Et[σt+j²] = σt² + j ω
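
A small R helper implementing the j-step-ahead formula above (the function name and inputs are illustrative, not from the lecture):

garch11_forecast <- function(omega, alpha1, beta1, sig2_t, j) {
  pers <- alpha1 + beta1
  omega * sum(pers^(0:(j - 1))) + sig2_t * pers^j   # E_t[sigma^2_{t+j}]
}
# With the CHF/USD estimates used later in the lecture:
garch11_forecast(0.00012, 0.19003, 0.71007, 0.00367, j = 1)   # ~ 0.003423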

ARCH Estimation: MLE


• All of these models can be estimated by maximum likelihood. First,
we need to construct the sample likelihood.

• Since we are dealing with dependent variables, we use the
conditioning trick to get the joint distribution:
f(y1, y2, …, yT; θ) = f(y1|x1; θ) f(y2|y1, x2, x1; θ) f(y3|y2, y1, x3, x2, x1; θ) ...
                      ... f(yT|yT-1, …, y1, xT, …, x1; θ)
Taking logs:
L = log f(y1, y2, ..., yT; θ) = log f(y1|x1; θ) + log f(y2|y1, x2, x1; θ)
    + … + log f(yT|yT-1, …, y1, xT, …, x1; θ)
  = Σt log f(yt|Yt-1, Xt; θ)

• We maximize this function with respect to the k mean parameters
(γ) and the m variance parameters (ω, α, β).


ARCH Estimation: MLE


• Note that δL/δγ = 0 (k f.o.c.’s) will give us GLS.

• Denote δL/δθ = S(yt, θ) = 0  – S(.) = Score vector.

- We have a ((k+m) x (k+m)) system. But, it is a non-linear system. We
will need to use numerical optimization. Gauss-Newton or BHHH
(which approximates H by the sum of outer products of the S(yt, θ)’s)
can be easily implemented.

- Given the AR structure, we will need to make assumptions about σ0
(and ε0, ε1, ..., εp if we assume an AR(p) process for the mean).

- Alternatively, we can take σ0 (and ε0, ε1, ..., εp) as parameters to be
estimated (this can be computationally more intensive and estimation
can lose power).

ARCH Estimation: MLE


• If the conditional density is well specified and θ0 belongs to Ω, then

T^(1/2) (θ̂ – θ0) → N(0, A0⁻¹),  where A0 = –T⁻¹ Σt=1..T E[∂St(yt, θ0)/∂θ′]

• Under the correct specification assumption, A0 = B0, where

B0 = T⁻¹ Σt=1..T E[St(yt, θ0) St(yt, θ0)′]

We estimate A0 and B0 by replacing θ0 by its estimated MLE value.

• The estimator B0 has a computational advantage over A0: only first
derivatives are needed. But A0 = B0 only if the distribution is
correctly specified. This is very difficult to know in practice.

• Common practice in empirical studies: Assume the necessary
regularity conditions are satisfied.


ARCH Estimation: MLE – ARCH(1)


Example: ARCH(1) model.
Mean equation: yt = xt′γ + εt,  εt ~ N(0, σt²)
Variance equation: σt² = ω + α1 εt-1²

We write the pdf for the normal distribution, with εt = yt – xt′γ:
f(εt|γ, ω, α1) = (2π σt²)^(-1/2) exp{–εt²/(2σt²)}

We form the likelihood L (the joint pdf):
L = Πt (2π σt²)^(-1/2) exp{–εt²/(2σt²)} = (2π)^(-T/2) Πt σt⁻¹ exp{–εt²/(2σt²)}

We take logs to form the log likelihood, 𝐿 = log L:
𝐿 = Σt log f = –(T/2) log(2π) – ½ Σt log(σt²) – ½ Σt εt²/σt²

Then, we maximize 𝐿 with respect to θ = (γ, ω, α1).
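
A hedged R sketch of this log likelihood, written as a negative log likelihood so it can be minimized with optim(); y and X (including a column of ones) are assumed inputs:

neg_loglik_arch1 <- function(theta, y, X) {
  k <- ncol(X)
  gamma  <- theta[1:k]
  omega  <- abs(theta[k + 1]); alpha1 <- abs(theta[k + 2])  # keep variance positive
  e <- as.vector(y - X %*% gamma)
  T <- length(y)
  sig2 <- c(var(e), omega + alpha1 * e[1:(T - 1)]^2)  # sigma_1^2 set to a start value
  0.5 * sum(log(2 * pi) + log(sig2) + e^2 / sig2)     # -log L
}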

ARCH Estimation: MLE – ARCH(1)


Example (continuation): ARCH(1) model.
𝐿 = –(T/2) log(2π) – ½ Σt log(ω + α1 εt-1²) – ½ Σt εt²/(ω + α1 εt-1²)

Taking derivatives with respect to θ = (ω, α1, γ), where γ is a vector of k
mean parameters:

∂𝐿/∂ω = –½ Σt 1/(ω + α1 εt-1²) + ½ Σt εt²/(ω + α1 εt-1²)²

∂𝐿/∂α1 = –½ Σt εt-1²/(ω + α1 εt-1²) + ½ Σt εt² εt-1²/(ω + α1 εt-1²)²

∂𝐿/∂γ = Σt xt εt/σt²  (k x 1 vector of derivatives)


ARCH Estimation: MLE


• Then, we set the f.o.c.: δ𝐿/δθ = 0.

• We have a (k+2) system. It is a non-linear system. The system is
solved using numerical optimization (usually, Newton-Raphson).

• In R, the function optim does numerical optimization.

• Take the last f.o.c., the k x 1 vector ∂𝐿/∂γ = 0:
∂𝐿/∂γ = Σt xt′ εt/σ̂t² = Σt xt′ (yt – xt′γ)/σ̂t² = 0
⇒ Σt [xt′ yt/σ̂t² – xt′ xt γ/σ̂t²] = 0

• The last equation shows that MLE is GLS for the mean parameters,
γ: each observation is weighted by the inverse of σ̂t².
ARCH Estimation: MLE


• In general, we have a ((k+m) x (k+m)) system: k mean parameters and
m variance parameters. But, it is a non-linear system. We use
numerical optimization.

- Given the AR structure, we will need to make assumptions about σ0
(and ε0, ε1, ..., εp if we assume an AR(p) process for the mean).

- Alternatively, we can take σ0 (and ε0, ε1, ..., εp) as parameters to be
estimated (this can be computationally more intensive and estimation
can lose power).


ARCH Estimation: MLE – Example (in R)


• Log likelihood of the AR(1)-GARCH(1,1) Model:
log_lik_garch11 <- function(theta, data) {
mu <- theta[1]; rho1 <- theta[2]; alpha0 <- abs(theta[3]); alpha1 <- abs(theta[4]); beta1 <- abs(theta[5])
chk0 <- (1 - alpha1 - beta1)
r <- ts(data)
n <- length(r)

u <- vector(length=n); u <- ts(u)
for (t in 2:n)
{u[t] = r[t] - mu - rho1*r[t-1]} # this setup allows for ARMA in mean

h <- vector(length=n); h <- ts(h)
h[1] = alpha0/chk0 # set initial value for h[t] series
if (chk0==0) {h[1]=.000001} # check to avoid dividing by 0
for (t in 2:n)
{h[t] = abs(alpha0 + alpha1*(u[t-1]^2) + beta1*h[t-1])
if (h[t]==0) {h[t]=.00001} } # check to avoid log(0)

return(-sum(- 0.5*log(abs(h[2:n])) - 0.5*(u[2:n]^2)/abs(h[2:n]))) # negative log likelihood (ignoring constants), for minimization
}

ARCH Estimation: MLE – Example (in R)


Example 1: GARCH(1,1) model for changes in the CHF/USD. We will
use the R function optim (nlm can also be used) to minimize the negative log likelihood:
PPP_da <- read.csv("http://www.bauer.uh.edu/rsusmel/4397/ppp_2020_m.csv", head=TRUE, sep=",")
x_chf <- PPP_da$CHF_USD # CHF/USD 1971-2020 monthly data
T <- length(x_chf)
z <- log(x_chf[-1]/x_chf[-T])
theta0 = c(-0.002, 0.026, 0.001, 0.19, 0.71) # initial values
ml_2 <- optim(theta0, log_lik_garch11, data=z, method="BFGS", hessian=TRUE)

logL_g11 <- log_lik_garch11(ml_2$par, z) # value of (negative) log likelihood
logL_g11

ml_2$par # estimated parameters

I_Var_m2 <- ml_2$hessian
eigen(I_Var_m2) # check if Hessian is pd
sqrt(diag(solve(I_Var_m2))) # parameters SE
chf_usd <- ts(z, frequency=12, start=c(1971,1))
plot.ts(chf_usd) # time series plot of data


ARCH Estimation: MLE – Example (in R)


Example 1 (continuation):
> logL_g11 # Negative log likelihood value
[1] -1745.197

> ml_2$par # Extract parameters from ml_2
[1] -0.0021051742 0.0260003610 0.00012375 0.1900276519 0.7100718082
> I_Var_m2 <- ml_2$hessian # Extract Hessian (matrix of 2nd derivatives)

> eigen(I_Var_m2) # Check if Hessian is pd to invert
eigen() decomposition
$values # Eigenvalues: if positive => Hessian is pd
[1] 1.687400e+08 6.954454e+05 7.200084e+03 5.120984e+02 2.537958e+02

$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] 4.265907e-05 9.999960e-01 -0.0011397586 0.0018331957 -0.0018541203
[2,] -3.333961e-06 -2.188159e-03 -0.0010048203 0.9769058449 -0.2136566699
[3,] 9.999998e-01 -4.223001e-05 -0.0003544245 0.0001291633 0.0005770707
[4,] -3.599974e-06 -1.702277e-03 -0.8603563865 -0.1097470278 -0.4977344477
[5,] -6.893837e-04 6.416141e-04 -0.5096905472 0.1833226197 0.8405994743

ARCH Estimation: MLE – Example (in R)


Example 1 (continuation):
> sqrt(diag(solve(I_Var_m2))) # Invert Hessian: parameter variances on diagonal
[1] 1.203690e-03 4.419049e-02 7.749756e-05 5.014454e-02 3.955411e-02

> t_stats <- ml_2$par/sqrt(diag(solve(I_Var_m2)))
> t_stats
[1] -1.7489333 0.5883701 1.5967743 3.7895984 17.9519078


ARCH Estimation: MLE – Example (in R)


Example 1 (continuation): Summary for CHF/USD changes:
ef,t = log(St) – log(St-1) = a0 + a1 ef,t-1 + εt,  εt|It-1 ~ N(0, σt²)
σt² = ω + α1 εt-1² + β1 σt-1²

• T: 562 (January 1971 – July 2020, monthly).

The estimated model for ef,t is given by:
ef,t = -0.00211 + 0.02600 ef,t-1
       (.0012)    (0.044)
σt² = 0.00012 + 0.19003 εt-1² + 0.71007 σt-1²
      (0.00096)* (0.050)*       (0.040)*

Unconditional σ² = 0.00012/(1 – 0.19003 – 0.71007) = 0.001201201
Log likelihood: 1745.197

Note: α1 + β1 = .90 < 1. (Persistent.)

ARCH Estimation: MLE – Example (in R)

[Figure: time series plot of the monthly CHF/USD log changes, 1971-2020 (plot.ts(chf_usd) from the previous slide).]


ARCH Estimation: MLE – Example (in R)


Example 2: Using Robert Shiller’s monthly data set for the S&P 500
(1871:Jan – 2020:Aug, T = 1,795), we estimate an AR(1)-GARCH(1,1)
model:
rt = log(Pt) – log(Pt-1) = a0 + a1 rt-1 + εt,  εt|It-1 ~ N(0, σt²)
σt² = ω + α1 εt-1² + β1 σt-1²

The estimated model for rt is given by:
rt = 0.338 + 0.278 rt-1
     (.08)*  (0.025)*
σt² = 0.756 + 0.126 εt-1² + 0.826 σt-1²
      (0.151)* (0.017)*     (0.021)*

Unconditional σ² = 0.756/(1 – 0.126 – 0.826) = 15.4630
Log likelihood: 4795.08

Note: α1 + β1 = .952 < 1. (Very persistent.)

ARCH Estimation: MLE – Example (in R)


Example 2 (continuation): Below, we plot the time-varying variance. Certain
events are clearly different; for example, the 1930s Great Depression, with a
peak variance of 282 (18 times the unconditional variance!). The covid-19
volatility is similar to that of the 2008-2009 financial crisis recession:


ARCH Estimation: MLE – Regularity Conditions

Note: The appeal of MLE is the optimal properties of the resulting
estimators under ideal conditions.

• Crowder (1976) gives one set of sufficient regularity conditions for
the MLE in models with dependent observations to be consistent
and asymptotically normally distributed.

• Verifying these regularity conditions is very difficult for general
ARCH models – proofs for special cases like the GARCH(1,1) exist.

Example: For the GARCH(1,1) model: if E[ln(α1 zt² + β1)] < 0, the
model is strictly stationary and ergodic. See Nelson (1990) &
Lumsdaine (1996).
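
This condition is easy to check by Monte Carlo; a short sketch for the CHF/USD estimates reported above (α1 = 0.19, β1 = 0.71):

set.seed(1)
z <- rnorm(1e6)
mean(log(0.19 * z^2 + 0.71))   # negative => strictly stationary and ergodic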

ARCH Estimation: MLE – Regularity Conditions


• Block-diagonality
In many applications of ARCH, the parameters can be partitioned
into mean parameters, θ1, and variance parameters, θ2.

Then, δμt(θ)/δθ2 = 0 and, although δσt(θ)/δθ1 ≠ 0, the Information
matrix is block-diagonal (under general symmetric distributions for zt
and for particular ARCH specifications).

Not a bad result:
- Regression can be consistently done with OLS.
- Asymptotically efficient estimates for the ARCH parameters can be
obtained on the basis of the OLS residuals.


ARCH Estimation: MLE – Remarks


• But, block diagonality cannot buy everything:
- Conventional OLS standard errors could be terrible.

- When testing for serial correlation, in the presence of ARCH, the
conventional Bartlett s.e. –T^(-1/2)– could seriously underestimate the
true standard errors.

ARCH Estimation: QMLE


• The assumption of conditional normality is difficult to justify in
many empirical applications. But, it is convenient.

• The MLE based on the normal density may be given a quasi-maximum
likelihood (QMLE) interpretation.

• If the conditional mean and variance functions are correctly
specified, the normal quasi-score evaluated at θ0 has a martingale
difference property:
E[S(yt, θ0)] = 0

Since this equation holds for any value of the true parameters, the
QMLE, say θQMLE, is Fisher-consistent –i.e., E[S(yT, yT-1, …, y1; θ)] = 0
for any θ ∈ Ω.


ARCH Estimation: QMLE


• The asymptotic distribution for the QMLE takes the form:
T^(1/2) (θ̂QMLE – θ0) → N(0, A0⁻¹ B0 A0⁻¹)

The covariance matrix (A0⁻¹ B0 A0⁻¹) is called “robust”: robust to
departures from normality.

• Bollerslev and Wooldridge (1992) study the finite sample distribution
of the QMLE and the Wald statistics based on the robust covariance
matrix estimator:

- For symmetric departures from conditional normality, the QMLE is
generally close to the exact MLE.

- For non-symmetric conditional distributions, both the asymptotic and
the finite sample loss in efficiency may be large.

ARCH Estimation: Non-Normality


• The basic GARCH model allows a certain amount of leptokurtosis.
It is often insufficient to explain real world data.

Solution: Assume a distribution other than the normal, which helps to
allow for the fat tails in the distribution.

• t Distribution – Bollerslev (1987)
The t distribution has a degrees of freedom parameter which allows
greater kurtosis. The t log likelihood contribution is:
lt = ln[Γ(0.5(v+1)) Γ(0.5v)⁻¹ (v – 2)^(-1/2) (1 + zt² (v – 2)⁻¹)^(-(v+1)/2)] – 0.5 ln(σt²)
where Γ is the gamma function and v is the degrees of freedom.
As v → ∞, this tends to the normal distribution.

• GED Distribution – Nelson (1991)


ARCH Estimation: GMM


• Suppose we have an ARCH(q). We need moment conditions:
(1) E[m1] = E[xt′ (yt – xt′γ)] = 0
(2) E[m2] = E[εt-i² (εt² – σt²)] = 0, i = 1, ..., q
(3) E[m3] = E[εt² – ω/(1 – α1 – ... – αq)] = 0

Note: (1) refers to the conditional mean, (2) refers to the conditional
variance, and (3) to the unconditional mean.

• GMM objective function:
Q(X, y; θ) = Ê[m(θ; X, y)]′ W Ê[m(θ; X, y)]
where
Ê[m(θ; X, y)] = [Ê[m1]′ Ê[m2]′ Ê[m3]′]′

ARCH Estimation: GMM


• γ has k free parameters; α has q free parameters. Then, we have r =
k + q + 1 parameters. Note that m(θ; X, y) has r = k + q + 1 equations.

Dimensions: Q is 1x1; E[m(θ; X, y)] is rx1; W is rxr.

• The problem is over-identified: more equations than parameters, so we
cannot solve E[m(θ; X, y)] = 0 exactly.

• Choose a weighting matrix W for the objective function and minimize
using numerical optimization.

• Optimal weighting matrix: W = {E[m(θ; X, y)] E[m(θ; X, y)]′}⁻¹.
Var(θ̂) = (1/T) [D W⁻¹ D′]⁻¹,
where D = δE[m(θ; X, y)]/δθ′ – expressions evaluated at θGMM.


ARCH Estimation: Testing


• Standard BP test, with the auxiliary regression given by:
et² = α0 + α1 et-1² + .... + αq et-q² + vt

H0: α1 = α2 = ... = αq = 0 (No ARCH). It is not possible to do a
GARCH test, since we would be using the same lagged squared residuals.

Then, the LM test is (T – q) * R² →d χ²q – Engle’s (1982) test.

• In ARCH models, testing proceeds as usual: LR, Wald, and LM tests.

Reliable inference from the LM, Wald and LR test statistics
generally does require moderately large sample sizes of at least two
hundred or more observations.

ARCH Estimation: Testing


• Issues:
- Non-negativity constraints must be imposed. θ0 is often on the
boundary of Ω. (Two-sided tests may be conservative.)
- Lack of identification of certain parameters under H0 creates a
singularity of the Information matrix under H0. For example, under
H0: α1 = 0 (No ARCH), in the GARCH(1,1), ω and β1 are not jointly
identified. See Davies (1977).

• Ignoring ARCH
- Suppose yt has an AR structure: yt = γ0 + γ1 yt-1 + εt.
Hamilton (2008) finds that an OLS t-test with no correction for ARCH
spuriously rejects H0: γ1 = 0 with arbitrarily high probability for
sufficiently large T. White’s (1980) SE help. NW SE help less.

46
RS – Lecture 12

ARCH Estimation: Testing


Figure (from Hamilton (2008)): Fraction of samples in which the OLS t-test
leads to rejection of H0: γ1 = 0, as a function of T, for regressions
with Gaussian errors (solid line) and Student’s t errors (dashed line).
Note: H0 is actually true and the test has a nominal size of 5%.

Testing for Heteroscedasticity: ARCH


• ARCH Test for the 3-factor FF model for IBM returns (T = 320),
with one lag:
IBMRet – rf = β0 + β1 (MktRet – rf) + β2 SMB + β3 HML + ε

> b <- solve(t(x)%*% x)%*% t(x)%*%y # OLS regression
> e <- y - x%*%b
> e2 <- e^2
> xx1 <- e2[1:(T-1)]
> fit2 <- lm(e2[2:T] ~ xx1)
> r2_e2 <- summary(fit2)$r.squared
> r2_e2
[1] 0.2656472
> lm_t <- (T-1)*r2_e2
> lm_t
[1] 84.74147

LM-ARCH Test: 84.74 ⇒ reject H0 at 5% level (χ²[1],.05 ≈ 3.84), the
usual result for financial time series.


GARCH: Forecasting and Persistence (Again)


• Consider the forecast in a GARCH(1,1) model:
σt+1² = ω + α1 εt² + β1 σt² = ω + σt² (α1 zt² + β1)   (using εt² = σt² zt²)

Taking expectations at time t:
Et[σt+1²] = ω + σt² (α1 + β1)
Then, by repeated substitutions:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j
As j → ∞, the forecast reverts to the unconditional variance:
ω/(1 – α1 – β1).

• When α1 + β1 = 1, today’s volatility affects future forecasts forever:
Et[σt+j²] = σt² + j ω

GARCH: Forecasting and Persistence


Example 1: We want to forecast next month’s (September 2020)
variance for CHF/USD changes. Recall we estimated σt²:
σt² = 0.00012 + 0.19003 εt-1² + 0.71007 σt-1²
getting σ²_{Aug 2020} = 0.003672220 (⇒ σ_{Aug 2020} = sqrt(0.00367) = 6.1%).

We base the σ²_{Sep 2020} forecast on:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j

Then, α1 + β1 = 0.190 + 0.710 = 0.900

E_{Aug 2020}[σ²_{Sep 2020}] = 0.00012 + 0.00367 * 0.9 = 0.003423

We also forecast the variance three months ahead:
E_{Aug 2020}[σ²_{t+3}] = 0.00012 * {1 + 0.9 + 0.9²} + 0.00367 * 0.9³
                        = 0.00300063


GARCH: Forecasting and Persistence


Example 1 (continuation):
We forecast the volatility for March 2021:
E_{Aug 2020}[σ²_{Mar 2021}] = 0.00012 * {1 + 0.9 + 0.9² + … + 0.9⁵} +
                              + 0.00367 * 0.9⁶ = 0.002512659

Remark: We observe that as the forecast horizon increases (j → ∞),
the forecast reverts to the unconditional variance:
ω/(1 – α1 – β1) = 0.00012/(1 – 0.9) = 0.0012
⇒ σ = sqrt(0.0012) = 0.0346 (3.46%, close to the sample SD = 3.36%)

GARCH: Forecasting and Persistence


Example 2: In August 2020, we forecast December’s variance
for S&P 500 changes. Recall we estimated σt²:
σt² = 0.756 + 0.125 εt-1² + 0.826 σt-1²
getting σ²_{Aug 2020} = 43.037841.

We base the σ²_{Dec 2020} forecast on:
Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + σt² (α1 + β1)^j

Then, since α1 + β1 = 0.952:
E_{Aug 2020}[σ²_{Dec 2020}] = 0.756 * {1 + 0.952 + 0.952² + 0.952³} +
                              + 43.037841 * 0.952⁴ = 38.02797

A lower variance is forecasted for the end of the year, but still far from
the unconditional variance of 15.4.


ARCH: Which Model to Use


• Questions
1) Lots of ARCH models. Which one to use?
2) Choice of p and q. How many lags to use?

• Hansen and Lunde (2004) compared lots of ARCH models:


- It turns out that the GARCH(1, 1) is a great starting model.
- Add a leverage effect for financial series and it’s even better.
- A t-distribution is also a good addition.
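
In practice, this starting model can be fit in a few lines; a hedged sketch using the rugarch package (not part of the lecture's own code), where z is an assumed returns series:

library(rugarch)
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   mean.model = list(armaOrder = c(1, 0)),
                   distribution.model = "norm")
fit <- ugarchfit(spec, data = z)   # AR(1)-GARCH(1,1) with normal errors
coef(fit)                          # mu, ar1, omega, alpha1, beta1

Setting model = "gjrGARCH" adds the leverage effect, and distribution.model = "std" uses t errors, matching the two additions suggested above.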

RV Models: Intuition
• The idea of realized volatility is to estimate the latent (unobserved)
variance using the realized data, without any modeling. Recall the
definition of the sample variance:
s² = (1/(T – 1)) Σi=1..T (xi – x̄)²

• Suppose we want to calculate the daily variance for stock returns. We
know how to compute it: we use daily information, for T days, and
apply the above definition.

• Alternatively, we use hourly data for the whole day (with k hours).
Since hourly returns are very small, ignoring x̄ seems OK. We use rt,i
as the ith hourly return on day t. Then, we add the rt,i² over the day:
Variancet = Σi=1..k rt,i²


RV Models: Intuition
• In more general terms, we use higher frequency data to estimate a
lower frequency variance:
RVt = Σi=1..k rt,i²
where rt,i is the realized return in (higher frequency) interval i of the
(lower frequency) period t. We estimate the t-frequency variance, using k
i-intervals. If we have daily returns and we want to estimate the monthly
variance, then k is equal to the number of days in the month.

• It can be shown that, as the interval i becomes smaller (i → 0),
RVt → Return Variation over [t – 1, t].

That is, with an increasing number of observations, we get an accurate
measure of the latent variance.
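A minimal R sketch of this computation, on simulated data (the daily returns and the 21-trading-days-per-month grouping are assumed for illustration):

set.seed(3)
r_d <- rnorm(252, sd = 0.01)              # a year of daily returns (simulated)
month_id <- rep(1:12, each = 21)          # month label for each day
rv_m   <- tapply(r_d^2, month_id, sum)    # RV_t: sum of squared daily returns in month t
rvol_m <- sqrt(rv_m)                      # realized volatility per month
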

RV Models: High Frequency


• Note that RV is a model-free measure of variation –i.e., no need for
ARCH-family specifications. The measure is called realized variance (RV).
The square root of the realized variance is the realized volatility (RVol,
RealVol):
RVol = sqrt(RV)

• Given the previous theoretical result, RV is commonly used with
intra-daily data, called high frequency (HF) data.

• It led to a revolution in the field of volatility, creating new models
and new ways of thinking about volatility and how to model it.

• We usually associate realized volatility with an observable proxy of the
unobserved volatility.


RV Models: High Frequency – Tick Data


• The theory behind realized variation measures dictates that the
sampling frequency, or k in the RVt formula above, goes to ∞. Then, we
should use the highest frequency available, say, millisecond-to-millisecond returns.

• Intra-daily data applications are the most common. But, when using
intra-daily data, RV calculations are affected by microstructure effects: bid-
ask bounce, infrequent trading, calendar effects, etc. The rt,i no longer
look uncorrelated.

Example: The bid-ask bounce induces serial correlation in intra-day


returns, which biases RVt.

• As the sampling frequency increases, the “noise” (microstructure


effects) becomes more dominant and swallows the “signal” (true
volatility).

RV Models: High Frequency – Tick Data


• In practice, sampling a typical stock price every few seconds can
overestimate the true volatility by a factor of two or more.

• The usual solutions:


(1) Filter data using an ARMA model to get rid of the autocorrelations
and/or dummy variables to get rid of calendar effects.

Then, use the filtered data to compute RVt (a sketch of this filtering appears after this list).

(2) Sample at frequencies where the impact of microstructure effects is


minimized and/or eliminated.

We follow solution (2).
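
Although we follow (2), here is a minimal sketch of the filtering in (1), with a simulated stand-in for intra-day returns (the MA(1) choice is ours, aimed at the 1st-order autocorrelation the bid-ask bounce induces):

set.seed(7)
ret <- rnorm(1000, sd = 1e-4)               # stand-in for intra-day returns
fit_ma1 <- arima(ret, order = c(0, 0, 1))   # MA(1) filter for the autocorrelation
ret_filt <- as.numeric(residuals(fit_ma1))  # filtered returns
rv_filt <- sum(ret_filt^2)                  # RVt from the filtered returns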




RV Models: High Frequency – Practice


• In intra-daily RV estimation, it is common to use 10’ intervals, which
have good properties. However, some estimations use 1’ intervals.

• Some studies suggest using an optimal frequency, defined as the
frequency that minimizes the MSE of the RV estimator.

• Hansen and Lunde (2006) find that for very liquid assets, such as the
S&P 500 index, a 5’ sampling frequency provides a reasonable choice.
Thus, to calculate daily RV, we need to add 78 five-minute intervals.


RV Models: High Frequency – TAQ


Example: Based on TAQ (Trade and Quote) NYSE data, we use 5’
realized returns to calculate 30’ variances –i.e., we use six 5’ intervals.
Then, the 30’ variance, or RVt=30-min, is:
RVt = Σj=1,...,6 r2t,j ,   t = 1, 2, . . . ., T
where rt,j is the 5’ return during the jth 5’ interval of half-hour t. Then, we
calculate 30’ variances for the whole day –i.e., we calculate 13
variances, since the trading day goes from 9:30 AM to 4:00 PM.

The Realized Volatility, RVol, is:


RVolt = sqrt(RVt)
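
A minimal R sketch of this 30’ calculation, with a simulated stand-in for one day of 5’ log returns (all names are ours):

set.seed(2)
r5 <- rnorm(78, sd = 5e-4)             # stand-in: 78 five-minute returns in a day
half_hour <- rep(1:13, each = 6)       # label each 5’ return with its half-hour
rv_30 <- tapply(r5^2, half_hour, sum)  # thirteen 30’ realized variances
rvol_30 <- sqrt(rv_30)                 # thirteen 30’ realized volatilities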



RV Models: High Frequency – TAQ


Example: Below, we show the first transactions of the SPY TAQ
(Trade and Quote) data (tick-by-tick trade data) on January 2, 2014:
SYMBOL DATE TIME PRICE SIZE
SPY 20140102 9:30:00 183.98 500
SPY 20140102 9:30:00 183.98 500
SPY 20140102 9:30:00 183.98 200
SPY 20140102 9:30:00 183.98 500
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.98 800
SPY 20140102 9:30:00 183.98 100
SPY 20140102 9:30:00 183.98 100
SPY 20140102 9:30:00 183.97 200
SPY 20140102 9:30:00 183.98 100
SPY 20140102 9:30:00 183.97 200
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.97 100
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.98 2600
SPY 20140102 9:30:00 183.98 1000
SPY 20140102 9:30:00 183.97 400

RV Models: High Frequency – TAQ


Example: Below, we show the first quotes of the AAPL TAQ (Trade
and Quote) data (tick-by-tick quote data) on January 2, 2014, starting at 4:00 AM:
SYMBOL DATE TIME BID OFR BIDSIZ OFRSIZ MODE EX
AAPL 20140102 4:00:00 455.39 0 1 0 12 T
AAPL 20140102 4:00:00 553.5 558 2 2 12 P
AAPL 20140102 4:00:01 455.39 561.02 1 2 12 T
AAPL 20140102 4:00:45 552.1 558 1 2 12 P
AAPL 20140102 4:00:51 552.1 558.4 1 2 12 P
AAPL 20140102 4:00:51 552.1 558.8 1 2 12 P
AAPL 20140102 4:00:51 552.1 559 1 1 12 P
AAPL 20140102 4:01:14 553 559 1 1 12 P
AAPL 20140102 4:01:30 553.01 561.02 1 2 12 T
AAPL 20140102 4:01:43 553.01 559 1 1 12 T
AAPL 20140102 4:01:44 553.05 559 1 1 12 P
AAPL 20140102 4:01:49 455.39 559 1 1 12 T
AAPL 20140102 4:01:49 553.61 559 1 1 12 T
AAPL 20140102 4:02:02 553.05 559 1 2 12 P
AAPL 20140102 4:02:04 455.39 559 1 1 12 T
AAPL 20140102 4:02:04 548.28 559 1 1 12 T
AAPL 20140102 4:02:33 553.05 558.83 1 2 12 P
AAPL 20140102 4:02:33 555.17 558.83 2 2 12 P
AAPL 20140102 4:03:50 555.2 558.83 5 2 12 P


RV Models: High Frequency – TAQ


Example (continuation): We read SPY trade data for 2014:Jan.
> HF_da <- read.csv("c:/Financial Econometrics/SPY_2014.csv", head=TRUE, sep=",")
> summary(HF_da)
SYMBOL DATE TIME PRICE SIZE G127
SPY:6800865 Min. :20140102 9:30:00 : 21436 Min. :176.6 Min. : 1 Min. :0
1st Qu.:20140110 16:00:00: 11352 1st Qu.:178.9 1st Qu.: 100 1st Qu.:0
Median :20140121 9:30:01 : 5922 Median :182.6 Median : 100 Median :0
Mean :20140119 15:59:59: 4090 Mean :181.4 Mean : 337 Mean :0
3rd Qu.:20140128 15:59:55: 3198 3rd Qu.:183.5 3rd Qu.: 300 3rd Qu.:0
Max. :20140131 15:50:00: 2916 Max. :189.2 Max. :4715350 Max. :0
(Other) :6751951
CORR COND EX
Min. :0.0e+00 @ :3351783 T :1649158
1st Qu.:0.0e+00 F :2888182 P :1335135
Median :0.0e+00 : 524409 Z :1182126
Mean :1.9e-04 O : 18057 D :1062382
3rd Qu.:0.0e+00 4 : 9098 K : 437900
Max. :1.2e+01 6 : 8142 J : 356539
(Other): 1194 (Other): 777625

RV Models: High Frequency – TAQ


Example (continuation): Using the SPY trade data, we calculate a
daily realized volatility from 5’ returns for the first 4 trading days in
2014 (2014:01:02 - 2014:01:07). Originally, we have T = 1,048,570 observations.

HF_da <- read.csv("http://www.bauer.uh.edu//rsusmel//4397//SPY_2014.csv",


head=TRUE, sep=",")
summary(HF_da)
pt <- as.POSIXct(paste(HF_da$DATE, HF_da$TIME), format="%Y%m%d %H:%M:%S")
library(xts)
hf_1 <- xts(x=HF_da, order.by = pt) # Define a specific time series data set
# pt pastes together DATE and Time.
spy_p <- as.numeric(hf_1$PRICE) # Read price data as numeric

T <- length(spy_p)
spy_ret <- log(spy_p[-1]/spy_p[-T])
plot(spy_ret, type="l", ylab="Return", main="Tick by Tick Return (2014:01:02 - 2014:01:07)")
mean(spy_ret)
sd(spy_ret)


RV Models: High Frequency – TAQ


Example (continuation): We plot the tick-by-tick data.

Very noisy data, with lots of “jumps”:


Mean tick-by-tick return: -3.7365e-09
Tick-by-tick SD: 6.3163e-05

RV Models: High Frequency – TAQ


Example (continuation): For the whole month of January 2014:

> mean(spy_ret)
[1] -4.796933e-09
> sd(spy_ret)
[1] 7.804991e-05


RV Models: High Frequency – TAQ


Example (continuation): We plot the autocorrelogram for the TAQ
SPY data:

> acf_spy_raw <- acf(spy_ret)
> acf_spy_raw

Autocorrelations of series ‘spy_ret’, by lag

0 1 2 3 4 5 6 7 8 9 10
1.000 -0.469 -0.013 -0.010 0.014 -0.008 0.000 -0.002 -0.001 0.000 0.000

Note: We have only one significant autocorrelation, the 1st-order
autocorrelation: -0.469.

RV Models: High Frequency – TAQ


Example (continuation): We aggregate the tick-by-tick data in 5’
intervals using the function aggregateTrades in the R package
highfrequency. It needs as an input an xts object (hf_1, for us).

library(highfrequency)
spy_5 <- aggregateTrades(
hf_1,
on = "minutes", # you can use also seconds, days, weeks, etc.
k = 5, # number of units for "on"
marketOpen = "09:30:00",
marketClose = "16:00:00",
tz = "GMT"
)

spy_5_p <- as.numeric(spy_5$PRICE)


T <- length(spy_5_p)
spy_5_ret <- log(spy_5_p[-1]/spy_5_p[-T])
plot(spy_5_ret, type="l", ylab="Return", main="5-minute Return (2014:01:02 - 2014:01:07)")
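
To get one RVol per trading day from the 5’ returns above (as reported below), a minimal sketch (the day grouping through the xts index is our addition):

day_id <- format(index(spy_5)[-1], "%Y-%m-%d")        # day label for each 5’ return
rvol_daily <- sqrt(tapply(spy_5_ret^2, day_id, sum))  # one RVol per trading day
rvol_daily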


RV Models: High Frequency – TAQ


Example (continuation): We plot the 5-minute return data.
Smoother, easier to read.

RVolt=2014:01:02 = 0.0053344
RVolt=2014:01:03 = 0.0043888
RVolt=2014:01:06 = 0.0059836
RVolt=2014:01:07 = 0.0052772

RV Models: High Frequency – TAQ


Example (continuation): We plot the autocorrelogram for the 5’
TAQ SPY data:

> acf_spy_5 <- acf(spy_5_ret, main = "5-minute SPY Data: January 2014")
> acf_spy_5
Autocorrelations of series ‘spy_ret’, by lag

0 1 2 3 4 5 6 7 8 9 10
1.000 -0.105 -0.024 -0.104 0.018 0.147 0.016 -0.024 -0.088 0.048 0.037

Note: We have a negative 1st-order autocorrelation: -0.105, though not
significant. However, the autocorrelation of order 5 is significant.


RV Models: High Frequency – TAQ


Example (continuation): We plot the 10-minute return data.
Smoothing increases.

RVolt=2014:01:02 = 0.005478294
RVolt=2014:01:03 = 0.004256046
RVolt=2014:01:06 = 0.006190508
RVolt=2014:01:07 = 0.005145601
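
The 10’ series comes from the same pipeline, changing only k (a sketch under the same assumptions as the 5’ code above):

spy_10 <- aggregateTrades(hf_1, on = "minutes", k = 10,
                          marketOpen = "09:30:00", marketClose = "16:00:00", tz = "GMT")
spy_10_p <- as.numeric(spy_10$PRICE)
spy_10_ret <- log(spy_10_p[-1] / spy_10_p[-length(spy_10_p)])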

RV Models: High Frequency – TAQ


Example (continuation): We plot the autocorrelogram for the 10’
TAQ SPY data:

Note: Now, none of the autocorrelations is significant. The 10-minute
returns look uncorrelated.



RV Models: R Script
Example: R script to compute realized volatility
MSCI_da <- read.csv("http://www.bauer.uh.edu/rsusmel/4397/MSCI_daily.csv", head=TRUE, sep=",")
x_us <- MSCI_da$USA
T <- length(x_us)
us_r <- log(x_us[-1]/x_us[-T])

x <- us_r # US log returns from MSCI USA Index


T <- length(x)
rvs <- NULL # create vector to fill with RV
i <- 1
k <- 21 # k: observations per period
while (i < T-k) {
s2 <- sum(x[i:(i+k-1)]^2) # realized variance over k non-overlapping observations
i <- k + i
rvs <- rbind(rvs,s2)
}
rvol <- sqrt(rvs) # realized volatility
mean(rvol) # mean
sd(rvol) # standard deviation

RV Models: Monthly RV From Daily Data


Example: Using daily data we calculate 1-mo Realized Volatility
(k=21 days) for log returns for the MSCI (1970: Jan – 2020: Oct).

> mean(rvol) # average monthly Rvol in the sample


[1] 0.04326531  ← very close to monthly S&P volatility: 4.49%
> sd(rvol) # standard deviation of monthly Rvol in the sample
[1] 0.02592653  ← dividing by sqrt(T) we get the SE = 0.001 (very small)


RV Models: Log Rules


• The log approximation rules for the variance and SD are used to
change frequencies for RV and RVol. For example, suppose we calculate
RV at frequency j, RVt=j. If we are interested in the J-period RVt=J,
where one J-period contains J periods of frequency j, then the J-period
variance can be calculated as:
RVt=J = J * RVt=j

The RVolt=j is the square root of RVt=j.

RV Models: Log Rules


Example: Using 10’ data, we calculated the daily realized variance,
RVt=daily. Then, the annual variance can be calculated as
RVt=annual = 260 * RVt=daily
where 260 is the number of trading days in the year. The annualized
RVol is the square root of RVt=annual:
RVOLt=annual = sqrt(260) * RVOLt=daily

We can use time series models –say, an ARIMA model– for RVt to
forecast daily volatility, as sketched below.
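
A minimal sketch of this idea, with a simulated stand-in for a daily RV series (we model log RV to keep forecasts positive; the bias from exponentiating the log forecast is ignored):

set.seed(3)
h <- stats::filter(rnorm(500, 0, 0.3), 0.9, method = "recursive")  # persistent log RV
rv_daily <- exp(as.numeric(h) - 9)                  # stand-in daily RV series
fit_rv <- arima(log(rv_daily), order = c(1, 0, 1))  # ARMA(1,1) for log RV
rv_fcast <- exp(predict(fit_rv, n.ahead = 1)$pred)  # 1-day-ahead RV forecast
sqrt(rv_fcast)                                      # implied RVol forecast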


RV Models: Quarterly RV From Daily Data


Example: Using daily data we calculate 3-mo Realized Volatility
(k=66 days) for log returns for the MSCI (1970: March – 2020: Oct).

> mean(rvol) # average quarterly Rvol in the sample

[1] 0.07725361  ← log approximation: sqrt(3) * 0.04326 = 0.07493 (close!)
> sd(rvol) # standard deviation of quarterly Rvol in the sample
[1] 0.02592653

RV Models: Properties
• Under some conditions (bounded kurtosis and autocorrelation of
squared returns less than 1), RVt is consistent and m.s. convergent.
• Realized volatility is a measure (a statistic): it has a distribution.
• The distribution of RV is non-normal (as expected): it tends to be
skewed right and leptokurtic. The distribution of log(RV), however, is
approximately normal.
• Daily returns standardized by RV measures are nearly Gaussian.
• RV is highly persistent.
• The key problem is the choice of sampling frequency (or number of
observations per day).

62
RS – Lecture 12

RV Models: Properties

• More on the choice of sampling frequency:

— Bandi and Russell (2003) propose a data-based method for
choosing the frequency that minimizes the MSE of the measurement
error.
— Simulations and empirical examples suggest optimal sampling is
around 1-3 minutes for equity returns.

RV Models: Variation
• Another method: AR model for volatility:
|εt| = ω + β |εt-1| + υt

The εt are estimated from a first-step procedure –i.e., a regression.

Asymmetric/leverage effects can also be introduced.

OLS estimation is possible. Make sure that the fitted variance estimates
are positive (see the sketch below).
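
A minimal sketch of the OLS step, with a simulated stand-in for the first-step residuals (all names are ours):

set.seed(4)
e <- rnorm(500)                      # stand-in for first-step residuals
abs_e <- abs(e)
n <- length(abs_e)
fit_ar <- lm(abs_e[-1] ~ abs_e[-n])  # |εt| = ω + β |εt-1| + υt, by OLS
vol_fit <- fitted(fit_ar)            # fitted volatility path
all(vol_fit > 0)                     # check that the estimates are positive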


Other Models: Parkinson’s (1980) Estimator


• Parkinson’s (1980) estimator:
s2 = Σt [ln(Ht) – ln(Lt)]2 / (4 ln(2) T),

where Ht is the highest price and Lt is the lowest price on day t.

• There is an RV counterpart, using HF data: the Realized Range (RR):

RRt = Σj [100 * (ln(Ht,j) – ln(Lt,j))]2 / (4 ln(2)),

where Ht,j and Lt,j are the highest and lowest prices in the jth interval.

• These “range” estimators are simple to compute and very efficient.

Reference: Christensen and Podolskij (2005).
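
A minimal R sketch of Parkinson’s estimator, with simulated stand-ins for the daily highs and lows (all names are ours):

set.seed(5)
lo <- 100 * exp(cumsum(rnorm(250, 0, 0.01)))               # stand-in daily lows
hi <- lo * exp(abs(rnorm(250, 0, 0.01)))                   # stand-in daily highs
Tn <- length(hi)
s2_park <- sum((log(hi) - log(lo))^2) / (4 * log(2) * Tn)  # Parkinson variance
sqrt(s2_park)                                              # Parkinson volatility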

Stochastic volatility (SV/SVOL) models


• Now, instead of a known volatility at time t, as in ARCH models, we
allow for a stochastic shock, υt, to σt:
σt = ω + β σt-1 + υt,   υt ~ N(0, συ2)
Or, using logs:
log(σt) = ω + β log(σt-1) + υt,   υt ~ N(0, συ2)
• The difference with ARCH models: The shocks that govern the
volatility are not necessarily εt’s.

• Usually, the standard model centers log volatility around ω:


log(σt) = ω + β (log(σt-1) – ω) + υt

Then,
E[log(σt)] = ω
Var[log(σt)] = κ2 = συ2/(1 – β2)
⟹ Unconditional distribution: log(σt) ~ N(ω, κ2)
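
A minimal simulation sketch of this SVOL model (the parameter values are illustrative, not estimates from the lecture):

set.seed(6)
Tn <- 10000
omega <- -4.5; beta <- 0.95; sig_u <- 0.2  # illustrative SVOL parameters
h <- numeric(Tn); h[1] <- omega            # h[t] = log(σt)
for (t in 2:Tn)
  h[t] <- omega + beta * (h[t - 1] - omega) + rnorm(1, 0, sig_u)
r <- exp(h) * rnorm(Tn)                    # returns: rt = σt * zt
c(mean(h), var(h))                         # ≈ ω and κ2 = συ2/(1 – β2)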


Stochastic volatility (SV/SVOL) models


• Like ARCH models, SV models produce returns with kurtosis > 3
(and, also, positive autocorrelations between squared excess returns):
Var[rt] = E[(rt – E[rt])2] = E[σt2 zt2] = E[σt2] E[zt2]
= E[σt2] = exp(2ω + 2κ2) (property of the log-normal)

kurt[rt] = E[(rt – E[rt])4] / {E[(rt – E[rt])2]}2

= E[σt4] E[zt4] / {(E[σt2])2 (E[zt2])2}
= 3 exp(4ω + 8κ2) / exp(4ω + 4κ2) = 3 exp(4κ2) > 3!

• We have 3 SVOL parameters to estimate: φ = (ω, β, συ).

• Estimation:
- GMM: Using moments, like the sample variance and kurtosis of
returns. Complicated –see Andersen and Sørensen (1996).
- Bayesian: Using MCMC methods (mainly, Gibbs sampling). Modern
approach.

Stochastic volatility (SV/SVOL) models


• The Bayesian approach takes advantage of the idea of a hierarchical
structure:
- f(y|ht) (distribution of the data given the volatilities)
- f(ht|φ) (distribution of the volatilities given the parameters)
- f(φ) (distribution of the parameters)

Algorithm: MCMC (JPR, 1994).


Augment the parameter space to include ht.
Using a proper prior for f(ht, φ), MCMC methods provide inference
about the joint posterior f(ht, φ|y). We’ll go over this topic in Lecture 17.

Classic references: Jacquier, E., N. Polson, and P. Rossi (1994), “Bayesian
analysis of stochastic volatility models,” Journal of Business and Economic
Statistics (estimation); Heston, S.L. (1993), “A closed-form solution for
options with stochastic volatility with applications to bond and currency
options,” Review of Financial Studies (theory).

