Lecture 12
Heteroscedasticity
• Assumption (A3) is violated in a particular way: εi has unequal
variances, but εi and εj are still uncorrelated with each other. Some
observations (lower variance) are more informative than others
(higher variance).
[Figure: conditional densities f(y|x) around the regression line E(y|x) = b0 + b1x, with dispersion increasing from x1 to x3.]
RS – Lecture 12
Heteroscedasticity
• Now, we have the CLM regression with hetero- (different) scedastic
(variance) disturbances.
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3’) Var[εi] = σ² ωi, ωi > 0. (CLM: ωi = 1, for all i.)
(A4) X has full column rank – rank(X) = k, where T ≥ k.
• Since the standard errors are biased, we cannot use the usual t-
statistics, F-statistics, or LM statistics for drawing inferences. This
is a serious issue.
• But, when will (1/T) X′X and (1/T) X′ΩX be nearly the same? The
answer is based on a property of weighted averages. Suppose ωi is
randomly drawn from a distribution with E[ωi] = 1. Then,
(1/T) Σi ωi xi² →p E[x²] – just like (1/T) Σi xi².
Finding Heteroscedasticity
• There are several theoretical reasons why the σi² may be related to xi
and/or xi²:
1. Following the error-learning models, as people learn, their errors of
behavior become smaller over time. Then, σ2i is expected to decrease.
2. As data collecting techniques improve, σ2i is likely to decrease.
Companies with sophisticated data processing techniques are likely to
commit fewer errors in forecasting customers’ orders.
3. As incomes grow, people have more discretionary income and, thus,
more choice about how to spend their income. Hence, σ2i is likely to
increase with income.
4. Similarly, companies with larger profits are expected to show
greater variability in their dividend/buyback policies than companies
with lower profits.
Finding Heteroscedasticity
• Heteroscedasticity can also be the result of model misspecification.
• It can arise as a result of the presence of outliers (either very small or
very large). The inclusion/exclusion of an outlier, especially if T is
small, can affect the results of regressions.
• Violations of (A1) – that the model is correctly specified – can
produce heteroscedasticity, due to omitted variables from the model.
• Skewness in the distribution of one or more regressors included in
the model can induce heteroscedasticity. Examples are economic
variables such as income, wealth, and education.
• David Hendry notes that heteroscedasticity can also arise because of
– (1) incorrect data transformation (e.g., ratio or first difference
transformations).
– (2) incorrect functional form (e.g., linear vs log–linear models).
Finding Heteroscedasticity
• Heteroscedasticity is usually modeled using one the following
specifications:
- H1 : σt2 is a function of past εt2 and past σt2 (GARCH model).
- H2 : σt2 increases monotonically with one (or several) exogenous
variable(s) (x1, . . . , xT).
- H3 : σt2 increases monotonically with E(yt).
- H4 : σt2 is the same within p subsets of the data but differs across the
subsets (grouped heteroscedasticity). This specification allows for structural
breaks.
Finding Heteroscedasticity
• Visual test
A plot of residuals against the dependent variable (or another
variable) will often produce a fan shape.
• Then, a simple test: Check the RSS for large values of X1, and the
RSS for small values of X1. This is the Goldfeld-Quandt test.
- Step 2. Run two separate regressions, one for small values of Xj and
one for large values of Xj, omitting d middle observations (≈ 20%).
Get the RSS for each regression: RSS1 for small values of Xj and
RSS2 for large Xj’s.
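These steps can be sketched with simulated data (a minimal Python illustration; the DGP and all names are hypothetical, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 200
x = np.sort(rng.uniform(1, 10, T))      # sort observations by the suspect regressor X_j
eps = rng.normal(0, x)                  # error s.d. grows with x -> heteroscedastic
y = 1.0 + 2.0 * x + eps

def rss(x_sub, y_sub):
    """Residual sum of squares and degrees of freedom from OLS of y on [1, x]."""
    X = np.column_stack([np.ones_like(x_sub), x_sub])
    beta = np.linalg.lstsq(X, y_sub, rcond=None)[0]
    e = y_sub - X @ beta
    return e @ e, len(y_sub) - X.shape[1]

d = int(0.20 * T)                       # omit ~20% of middle observations
n1 = (T - d) // 2
rss1, df1 = rss(x[:n1], y[:n1])         # small values of X_j
rss2, df2 = rss(x[-n1:], y[-n1:])       # large values of X_j
GQ = (rss2 / df2) / (rss1 / df1)        # F-statistic; large GQ -> reject homoscedasticity
```

Under H0 (homoscedasticity), GQ follows an F(df2, df1) distribution; here the fan-shaped errors make GQ much larger than 1.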
• GE returns
library(lmtest) # gqtest is in the lmtest package
gqtest(ge_x ~ Mkt_RF + SMB + HML, fraction = .20)
Goldfeld-Quandt test
• Using specific functions for σi², this test has been used by
Rutemiller and Bowers (1968) and in Harvey’s (1976) groupwise
heteroscedasticity paper.
• We can base the test on how the squared OLS residuals ei² correlate
with X.
• Both tests are based on OLS residuals. That is, calculated under H0:
No heteroscedasticity.
(2) Harvey-Godfrey (1978) test. Use ln(ei²). Then, the implied model
for σi² is an exponential model.
ln(ei²) = α0 + zi,1 α1 + .... + zi,m αm + vi
Note: Implied model for σi²: σi² = exp{α0 + zi,1 α1 + .... + zi,m αm + vi}.
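The auxiliary regression behind this test can be sketched as follows (Python, simulated data; the variance driver z and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 500
z = rng.normal(size=T)                  # variance driver z_i
sigma2 = np.exp(0.5 + 1.0 * z)          # exponential variance model
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(sigma2))

# Step 1: OLS residuals from the original model
X = np.column_stack([np.ones(T), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: regress ln(e^2) on z; under H0, LM = T * R^2 ~ chi-squared(m)
Z = np.column_stack([np.ones(T), z])
g = np.log(e**2)
ghat = Z @ np.linalg.lstsq(Z, g, rcond=None)[0]
R2 = 1 - ((g - ghat)**2).sum() / ((g - g.mean())**2).sum()
LM = T * R2                             # compare with chi-squared(1) critical value 3.84
```

With heteroscedasticity this strong, LM far exceeds the 5% critical value, so H0 is rejected.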
fit_ibm_ff3 <- lm(ibm_x ~ Mkt_RF + SMB + HML) # Step 1 – OLS in DGP (3-factor FF model)
e_ibm <- fit_ibm_ff3$residuals # Step 1 – keep residuals
e_ibm2 <- e_ibm^2 # Step 1 – squared residuals
Mkt_RF2 <- Mkt_RF^2
fit_BP <- lm(e_ibm2 ~ Mkt_RF2) # Step 2 – auxiliary regression
Re_2 <- summary(fit_BP)$r.squared # Step 2 – keep R^2
T <- length(e_ibm)
LM_BP_test <- Re_2 * T # Step 3 – compute LM-BP test: R^2 * T
> LM_BP_test
[1] 0.25038
> p_val <- 1 - pchisq(LM_BP_test, df = 1) # p-value of LM test
> p_val
[1] 0.6168019 ⇒ cannot reject H0 at 5% level
> bptest(ibm_x ~ Mkt_RF + SMB + HML) # bptest only allows H1: f(xi) = model variables
> LM_W_test
[1] 15.46692
> p_val
[1] 0.001458139 ⇒ reject H0 at 5% level
• The key is to figure out what h(x) looks like. Suppose that we know
hi. For example, hi(x) = xi². (Make sure hi is always positive.)
Estimation: FGLS
• More typical is the situation where we do not know the form of the
heteroscedasticity. In this case, we need to estimate h(xi).
Estimation: FGLS
• Now, an estimate of h is obtained as ĥ = exp(ĝ), and the inverse of
this is our weight. Now, we can do GLS as usual.
• Summary of FGLS
(1) Run the original OLS model, save the residuals, e. Get ln(e²).
(2) Regress ln(e²) on all of the independent variables. Get fitted
values, ĝ.
(3) Do WLS using 1/sqrt[exp(ĝ)] as the weight.
(4) Iterate to gain efficiency.
• Remark: We are using WLS just for efficiency –OLS is still unbiased
and consistent. Sandwich estimator gives us consistent inferences.
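Steps (1)-(3) above can be sketched with simulated data (Python; the DGP and names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
x = rng.uniform(1, 5, T)
sigma2 = np.exp(0.2 + 0.8 * x)                 # true skedastic function (unknown in practice)
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(sigma2))
X = np.column_stack([np.ones(T), x])

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# (1) OLS, save residuals, get ln(e^2)
e = y - X @ ols(X, y)
# (2) regress ln(e^2) on the regressors, get fitted values g_hat
g_hat = X @ ols(X, np.log(e**2))
# (3) WLS with weight 1/sqrt(exp(g_hat))
w = 1.0 / np.sqrt(np.exp(g_hat))
b_fgls = ols(X * w[:, None], y * w)            # FGLS estimates [intercept, slope]
```

Step (4) would re-estimate g_hat from the FGLS residuals and repeat until convergence.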
Estimation: MLE
• ML estimates all the parameters simultaneously. To construct the
likelihood, we assume a distribution for ε. Under normality (A5):
ln L = –(T/2) ln(2π) – (1/2) Σi ln σi² – (1/2) Σi (yi – xi′β)²/σi²
• Then, the first derivatives of the log likelihood wrt θ=(β,α) are:
∂ln L/∂β = Σi xi εi/σi² = X′Ω⁻¹ε
∂ln L/∂α = –(1/2) Σi (1/σi²) exp(zi′α) zi + (1/2) Σi (εi²/σi⁴) exp(zi′α) zi
= (1/2) Σi zi (εi²/σi² – 1)   (using σi² = exp(zi′α))
Estimation: MLE
• We take second derivatives to calculate the information matrix:
∂²ln L/∂β∂β′ = –Σi xi xi′/σi² = –X′Ω⁻¹X
∂²ln L/∂β∂α′ = –Σi xi zi′ εi/σi²
∂²ln L/∂α∂α′ = –(1/2) Σi zi zi′ εi²/σi²
• Then,
I(θ) = –E[∂²ln L/∂θ∂θ′] = [ X′Ω⁻¹X  0 ; 0  (1/2) Z′Z ]
Estimation: MLE
• We estimate the model using Newton’s method:
θj+1 = θj – Hj-1 gj gj = ∂log Lj/∂θ’
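The Newton update θj+1 = θj – Hj⁻¹ gj can be illustrated on a simpler likelihood (Python sketch: the MLE of the variance of an i.i.d. normal sample with known mean; the data and starting value are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0, 2.0, 5000)        # i.i.d. N(0, sigma^2), true sigma^2 = 4

def grad_hess(s2, x):
    """Score and Hessian of the normal log-likelihood wrt sigma^2 (mean known = 0)."""
    T = len(x)
    g = -T / (2 * s2) + (x @ x) / (2 * s2**2)
    H = T / (2 * s2**2) - (x @ x) / s2**3
    return g, H

s2 = 1.0                             # starting value theta_0
for _ in range(50):
    g, H = grad_hess(s2, x)
    s2 = s2 - g / H                  # Newton step: theta - H^{-1} g
    if abs(g) < 1e-10:
        break
```

The iterations converge to the analytic MLE, the sample mean of x², illustrating why Newton's method is the workhorse for the heteroscedastic likelihood above.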
ARCH Models
• Until the early 1980s econometrics had focused almost solely on
modeling the conditional means of series:
yt = E[yt| It] + εt, εt ~ D(0, σ²)
Suppose we have an AR(1) process:
yt = α + β yt-1 + εt.
Then, the conditional mean, conditioning on information set at time
t, It , is:
Et[yt+1| It] = α + β yt
ARCH Models
• Similar idea for the variance.
Unconditional variance:
Var[yt] = E[(yt – E[yt])²] = σ²/(1 – β²)
Conditional variance:
Vart-1[yt] = Et-1[(yt – Et-1[yt])²] = Et-1[εt²]
ARCH Models
• The unconditional variance measures the overall uncertainty. In the
AR(1) example, the information available at time t, It , plays no role:
Var[yt] = σ²/(1 – β²).
[Figure: a time series with a constant unconditional mean and constant variance bands over time.]
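The unconditional variance formula Var[yt] = σ²/(1 – β²) is easy to verify by simulation (Python sketch; parameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
T, alpha, beta, sigma = 200_000, 0.5, 0.8, 1.0
eps = rng.normal(0, sigma, T)           # i.i.d. shocks
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha + beta * y[t-1] + eps[t]   # AR(1): y_t = alpha + beta*y_{t-1} + eps_t

var_theory = sigma**2 / (1 - beta**2)   # Var[y_t] = sigma^2/(1 - beta^2)
var_sample = y[1000:].var()             # sample variance after a burn-in
```

With β = 0.8 the theoretical value is 1/0.36 ≈ 2.78, and the simulated variance matches it closely.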
Statistic
Mean (%)           0.626332   (p-value: 0.0004)
Standard Dev (%)   4.37721
Skewness          -0.43764
Excess Kurtosis    2.29395
Jarque-Bera      145.72       (p-value: <0.0001)
AR(1)              0.0258     (p-value: 0.5249)
• Define νt = εt² – σt². Then, we can write the ARCH(q) model as an
AR(q) model for εt²:
εt² = α0 + α(L) εt-1² + νt
That is,
σt² = α0 + Σi=1,…,q αi εt-i²
• Example: ARCH(1)
yt = Xt β + εt,   εt ~ N(0, σt²)
σt² = α0 + α1 εt-1²
• We need to impose restrictions: α0 > 0, α1 > 0 & 1 – α1 > 0.
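A minimal ARCH(1) simulation (Python sketch; parameter values are made up, with the restrictions above imposed):

```python
import numpy as np

rng = np.random.default_rng(5)
T, a0, a1 = 100_000, 0.2, 0.3           # restrictions: a0 > 0, 0 < a1 < 1
z = rng.normal(size=T)                  # i.i.d. N(0,1) innovations
eps = np.zeros(T)
sig2 = np.full(T, a0 / (1 - a1))        # start at the unconditional variance
for t in range(1, T):
    sig2[t] = a0 + a1 * eps[t-1]**2     # sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2
    eps[t] = np.sqrt(sig2[t]) * z[t]

var_theory = a0 / (1 - a1)              # unconditional variance of eps_t
kurt = np.mean(eps**4) / np.mean(eps**2)**2   # > 3: ARCH generates fat tails
```

The simulated series has the theoretical unconditional variance α0/(1 – α1) and kurtosis above 3, even though the conditional distribution is normal.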
• It turns out that σt2 is a very persistent process. Such a process can
be captured with an ARCH(q), where q is large. This is not efficient.
GARCH-X
• In the GARCH-X model, exogenous variables are added to the
conditional variance equation.
Examples: Glosten et al. (1993) and Engle and Patton (2001) use 3-
mo T-bill rates for modeling stock return volatility. Hagiwara and
Herce (1999) use interest rate differentials between countries to
model FX return volatility. The US congressional budget office uses
inflation in an ARCH(1) model for interest rate spreads.
IGARCH
• Recall the technical detail: The standard GARCH model:
σt² = ω + α(L) εt-1² + β(L) σt-1²
IGARCH
• Variance forecasts are generated with: Et[σt+j²] = σt² + j ω
• That is, today’s variance remains important for future forecasts of all
horizons.
σt² = ω + α εt-1² + γ εt-1² ∗ I[εt-1 < 0] + β σt-1²
σt² = ω + α |εt-1 – κ|² + β σt-1²
Note: The variance depends on both the size and the sign of the
lagged error, which helps to capture leverage-type (asymmetric) effects.
σt² = ω + α1 I[εt-1 ≤ κ] εt-1² + α2 I[εt-1 > κ] εt-1² + β σt-1²
That is,
σt² = ω + α1 εt-1² + β σt-1²,  if εt-1 ≤ κ
σt² = ω + α2 εt-1² + β σt-1²,  if εt-1 > κ
• We can modify the model in many ways. For example, we can allow
for the asymmetric effects of negative news.
Intuition: σt2 depends on the state of the economy – the regime. It’s based on
Hamilton’s (1989) time series models with changes of regime:
σt² = ω_st + Σi=1,…,q α_{i,st,st-1} εt-i²
The key is to select a parsimonious representation:
σt² = g_st (ω + Σi=1,…,q αi εt-i²/g_{st-i})
log f(yt |Yt-1, Xt; θ)
• The last equation shows that MLE is GLS for the mean parameters,
γ: each observation is weighted by the inverse of σt².
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] 4.265907e-05 9.999960e-01 -0.0011397586 0.0018331957 -0.0018541203
[2,] -3.333961e-06 -2.188159e-03 -0.0010048203 0.9769058449 -0.2136566699
[3,] 9.999998e-01 -4.223001e-05 -0.0003544245 0.0001291633 0.0005770707
[4,] -3.599974e-06 -1.702277e-03 -0.8603563865 -0.1097470278 -0.4977344477
[5,] -6.893837e-04 6.416141e-04 -0.5096905472 0.1833226197 0.8405994743
Since this equation holds for any value of the true parameters, the
QMLE, say θQMLE, is Fisher-consistent – i.e., E[S(yT, yT-1, …, y1; θ)] = 0
for any θ ∈ Ω.
Note: (1) refers to the conditional mean, (2) refers to the conditional
variance, and (3) to the unconditional mean.
• Ignoring ARCH
- You suspect yt has an AR structure: yt = γ0 + γ1 yt-1 + εt
Hamilton (2008) finds that OLS t-tests with no correction for ARCH
spuriously reject H0: γ1 = 0 with arbitrarily high probability for
sufficiently large T. White’s (1980) SEs help. NW SEs help less.
RV Models: Intuition
• The idea of realized volatility is to estimate the latent (unobserved)
variance using the realized data, without any modeling. Recall the
definition of sample variance:
s² = [1/(T – 1)] Σi=1,…,T (xi – x̄)²
• Suppose we want to calculate the daily variance for stock returns. We
know how to compute it: we use daily information, for T days, and
apply the above definition.
• Alternatively, we use hourly data for the whole day (with k hours).
Since hourly returns are very small, ignoring x̄ seems OK. We use rt,i
as the ith hourly return on day t. Then, we add rt,i² over the day:
Variancet = Σi=1,…,k rt,i²
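Summing squared intraday returns into a daily variance can be sketched as follows (Python, simulated intraday returns; the DGP is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
k = 78                                  # e.g., 78 five-minute intervals in a trading day
true_daily_var = 0.0001                 # daily return variance (1% daily volatility)
r_intraday = rng.normal(0, np.sqrt(true_daily_var / k), k)   # k intraday returns

RV = (r_intraday**2).sum()              # realized variance for the day
RVol = np.sqrt(RV)                      # realized volatility
```

No model is estimated: RV is just the sum of squared high-frequency returns, and it is close to the daily variance that generated them.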
RV Models: Intuition
• In more general terms, we use higher frequency data to estimate a
lower frequency variance:
RVt = Σi rt,i²
• Intra-daily data applications are the most common. But, when using
intra-daily data, RV calculations are affected by microstructure effects: bid-
ask bounce, infrequent trading, calendar effects, etc. rt,i does not look
uncorrelated.
• Hansen and Lunde (2006) find that for very liquid assets, such as the
S&P 500 index, a 5-minute sampling frequency provides a reasonable
choice. Thus, to calculate daily RV, we need to add 78 five-minute
intervals (6.5 trading hours × 12 intervals per hour).
T <- length(spy_p)
spy_ret <- log(spy_p[-1]/spy_p[-T])
plot(spy_ret, type="l", ylab="Return", main="Tick by Tick Return (2014:01:02 - 2014:01:07)")
mean(spy_ret)
sd(spy_ret)
> mean(spy_ret)
[1] -4.796933e-09
> sd(spy_ret)
[1] 7.804991e-05
> acf_spy_raw
0 1 2 3 4 5 6 7 8 9 10
1.000 -0.469 -0.013 -0.010 0.014 -0.008 0.000 -0.002 -0.001 0.000 0.000
library(highfrequency)
spy_5 <- aggregateTrades(
hf_1,
on = "minutes", # you can use also seconds, days, weeks, etc.
k = 5, # number of units for “on”
marketOpen = "09:30:00",
marketClose = "16:00:00",
tz = "GMT"
)
RVolt=2014:01:02 = 0.0053344
RVolt=2014:01:03 = 0.0043888
RVolt=2014:01:04 = 0.0059836
RVolt=2014:01:05 = 0.0052772
> acf_spy_5 <- acf(spy_5_ret, main = "5-minute SPY Data: January 2014")
> acf_spy_5
Autocorrelations of series ‘spy_5_ret’, by lag
0 1 2 3 4 5 6 7 8 9 10
1.000 -0.105 -0.024 -0.104 0.018 0.147 0.016 -0.024 -0.088 0.048 0.037
RVolt=2014:01:02 = 0.005478294
RVolt=2014:01:03 = 0.004256046
RVolt=2014:01:04 = 0.006190508
RVolt=2014:01:05 = 0.005145601
RV Models: R Script
Example: R script to compute realized volatility
MSCI_da <- read.csv("http://www.bauer.uh.edu/rsusmel/4397/MSCI_daily.csv", head=TRUE, sep=",")
x_us <- MSCI_da$USA
T <- length(x_us)
us_r <- log(x_us[-1]/x_us[-T])
We can use time series models –say, an ARIMA model- for RVt to
forecast daily volatility.
RV Models: Properties
• Under some conditions (bounded kurtosis and autocorrelation of
squared returns less than 1), RVt is consistent and m.s. convergent.
• Realized volatility is an estimate; as such, it has a sampling distribution.
• For returns, the distribution of RV is non-normal (as expected). It
tends to be skewed right and leptokurtic. For log returns, the
distribution is approximately normal.
• Daily returns standardized by RV measures are nearly Gaussian.
• RV is highly persistent.
• The key problem is the choice of sampling frequency (or number of
observations per day).
RV Models - Variation
• Another method: AR model for volatility:
|εt| = β |εt-1| + υt
OLS estimation possible. Make sure that the variance estimates are
positive.
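This AR specification can be estimated by OLS as sketched here (Python; a simulated positive series stands in for |εt|, and a constant is included for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
T = 50_000
a = np.zeros(T)                         # proxy for |eps_t|, kept positive
a[0] = 0.5
for t in range(1, T):
    # persistent positive process: |eps_t| = 0.1 + 0.8*|eps_{t-1}| + half-normal shock
    a[t] = 0.1 + 0.8 * a[t-1] + abs(rng.normal(0, 0.1))

# OLS of |eps_t| on a constant and |eps_{t-1}|
y, x = a[1:], a[:-1]
X = np.column_stack([np.ones(T - 1), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]   # [intercept, slope beta]
```

The slope estimate recovers the persistence parameter (0.8 here), and the fitted values are positive by construction, as the volatility interpretation requires.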
where Ht,j and Lt,j are the highest and lowest price in the jth interval.
Then,
E[log(σt)] = ω
Var[log(σt)] = κ² = συ²/(1 – β²)
Unconditional distribution: log(σt) ~ N(ω, κ²)
• Estimation:
- GMM: Using moments, like the sample variance and kurtosis of
returns. Complicated – see Andersen and Sorensen (1996).
- Bayesian: Using MCMC methods (mainly, Gibbs sampling). Modern
approach.