Chapter 4
Model specification
• Model specification consists of two choices
– The set of variables that we include in a model.
– The functional form of the relationship we
specify.
• The model specification determines the
question we answer.
• Can’t get the right answer if we ask the
wrong question.
Model misspecification
• There are four basic types of model
misspecification
– Inclusion of an irrelevant variable
– Exclusion of a relevant variable
– Measurement error
– Erroneous functional form for the relationship
Irrelevant variables
• (1) $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
• (2) $y_i = \beta_1 x_{1i} + \varepsilon_i$
• Estimating (1) while (2) is the true model does not affect $b_1$ if
– $\beta_2 = 0$, or
– $x_1'x_2 = 0$, i.e. the two variables are uncorrelated (orthogonal).
• But estimating (1) does inflate the variance of $b_1$.
Irrelevant variables
• The variance of $b_1$ increases for two reasons
– Addition of an irrelevant variable reduces the degrees of freedom, which enter the estimator of the variance of $b_1$.
– If $\beta_2 = 0$ but $x_1'x_2 \neq 0$, the independent variation in $x_1$ is reduced and the variance of $b_1$ increases.
• Therefore, putting in too many variables changes the test outcomes. BE CAREFUL.
Omitted variables
• (1) $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
• (2) $y_i = \beta_1 x_{1i} + \varepsilon_i$
• Estimating (2) while (1) is the true model biases $b_1$: the omitted-variable bias equals $\beta_2$ times the coefficient from regressing $x_2$ on $x_1$.
Omitted variables
• The bias is zero if the omitted variable is either:
– uncorrelated with the other explanatory variables, or
– not a determinant of the dependent variable ($\beta_2 = 0$).
• From this, a criterion for including a variable is:
– the variable is correlated with the other explanatory variables, and
– the variable is a cause of the dependent variable (see the sketch below).
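A minimal Stata sketch comparing the two specifications (the variable names y, x1, x2 are hypothetical):

    * Hypothetical data with outcome y and regressors x1, x2
    regress y x1 x2        // specification (1): x2 included
    regress y x1           // specification (2): x2 omitted
    correlate x1 x2        // if x1 and x2 are correlated and x2 matters for y,
                           // the coefficient on x1 differs between the two runs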
Functional misspecification
• Misspecified linearity
• What if we assume a model is linear when it is actually not?
• Two types:
– Non-linear in explanatory variables;
– Non-linear in parameters.
Non-linear in variables
Non-linear in parameters
• Consider the model:
– $y = g(x, \beta) = \beta_1 + \beta_2 x^{\beta_3}$
or
– $y = g(x, \beta) = \beta_1 x_1^{\beta_2} x_2^{\beta_3}$
Non-linear in parameters
• $y = g(x, \beta) = \beta_1 + \beta_2 x^{\beta_3}$
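Such a model cannot be estimated by OLS; as a sketch, Stata's nl command fits it by non-linear least squares (y and x are hypothetical variable names, and the starting values are illustrative):

    * Non-linear least squares for y = b1 + b2*x^b3 (hypothetical y, x)
    nl (y = {b1} + {b2}*x^{b3}), initial(b1 1 b2 1 b3 1)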
Ramsey’s RESET test
• Estimate the model $y_i = \beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$ by OLS and save the fitted values $\hat{y}_i$.
• Run the auxiliary regression $y_i = \beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \delta_2 \hat{y}_i^2 + \delta_3 \hat{y}_i^3 + u_i$.
• Test $H_0: \delta_2 = \delta_3 = 0$ (no functional misspecification) with an F-test.
Ramsey’s RESET test
• NOTE:
– the regression in Ramsey’s RESET test is an
auxiliary regression
– the estimates are used for testing only, not for
economic inferences
• WARNING: rejecting $H_0$ does not always indicate non-linearity; it may also indicate an omitted variable.
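In Stata, Ramsey's RESET test is available after regress as a postestimation command; a minimal sketch with hypothetical y, x1, x2:

    * Ramsey RESET test after OLS (hypothetical y, x1, x2)
    regress y x1 x2
    estat ovtest          // uses powers of the fitted values
    estat ovtest, rhs     // variant using powers of the regressors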
Heteroskedasticity
• In OLS we assume homoskedasticity: $\text{var}(\varepsilon_i) = \sigma^2$
• Which means that
– The variance of the error term is constant for all observations.
– All error terms are drawn from a distribution with mean zero and variance $\sigma^2$.
• Equivalently: the variance of $\varepsilon_i$ does not depend on $i$.
Heteroskedasticity
• Now assume: $\text{var}(\varepsilon_i) = \sigma_i^2$
• Or in words:
– The variance of the error term depends on $i$.
– The variance of the error term is not constant over all observations.
Heteroskedasticity
• In the case of homoskedastic errors the error terms are still stochastic, but their spread around zero is the same for every observation; under heteroskedasticity the spread varies across observations.
Heteroskedasticity
• Why worry about heteroskedasticity?
– OLS is still unbiased and consistent.
– However, the standard errors of OLS are biased in the case of heteroskedastic error terms.
– Therefore, the t, F and LM (Lagrange multiplier, e.g. for error autocorrelation) tests cannot be used for drawing inferences.
Heteroskedasticity
• If there is no prior assumption on the form of the
heteroskedasticity, but there is a suggestion that it is related
to the explanatory variables, it is possible to correct the
standard errors for heteroskedasticity.
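In Stata this correction is done with heteroskedasticity-robust (White) standard errors; a minimal sketch with hypothetical y, x1, x2:

    * OLS with heteroskedasticity-robust (White) standard errors
    regress y x1 x2, vce(robust)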
Testing for heteroskedasticity (detecting heteroskedasticity)
• The Breusch-Pagan test
– The BP-test starts from the assumption that heteroskedasticity is multiplicative.
– Before correcting for it, this assumption is first tested.
– In Stata, we can follow the steps in the sketch below.
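A minimal Stata sketch of the Breusch-Pagan test (hypothetical y, x1, x2); estat hettest is the built-in version, and the manual auxiliary regression is shown for comparison (the names ehat, ehat2 are introduced here for illustration):

    * Breusch-Pagan test after OLS (hypothetical y, x1, x2)
    regress y x1 x2
    estat hettest, rhs        // BP/Cook-Weisberg test using the regressors

    * Manual version: regress the squared residuals on the regressors
    predict ehat, residuals
    generate ehat2 = ehat^2
    regress ehat2 x1 x2
    test x1 x2                // joint test: no heteroskedasticity under H0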
Testing for heteroskedasticity
• The White test
– The White-test starts from the assumption that
heteroskedasticity is multiplicative.
– However, it assumes a more flexible form than the
BP-test.
– The auxiliary regression in the White test is a quadratic form in the explanatory variables (levels, squares and cross-products).
Testing for heteroskedasticity
• The White test
– In the case of two explanatory variables the
auxiliary regression is:
$e_i^2 = \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \gamma_3 x_{1i}^2 + \gamma_4 x_{2i}^2 + \gamma_5 x_{1i} x_{2i} + \nu_i$
– If the gammas are all zero there is no heteroskedasticity.
– So the White test is a test of the joint significance of the gammas (an F-test, or equivalently $nR^2 \sim \chi^2$).
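A minimal Stata sketch of White's test (hypothetical y, x1, x2), using the built-in postestimation command:

    * White's test for heteroskedasticity after OLS (hypothetical y, x1, x2)
    regress y x1 x2
    estat imtest, white       // reports the nR^2 chi-square version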
Heteroskedasticity
• Recapitulation
– If the number of observations is large it is possible
to approximate the form of heteroskedasticity using
the estimated OLS residuals.
– If an exponential form is assumed, the estimation of that form can also be used to test for heteroskedasticity; this is the Breusch-Pagan test.
– If a quadratic form in the explanatory variables is assumed, the estimation of that form can also be used to test for heteroskedasticity; this is the White test.
Heteroskedasticity
• Recapitulation
– If in either the BP-test or the White test homoskedasticity of the error terms is rejected in favor of heteroskedasticity, the auxiliary regression can be used in feasible GLS (FGLS).
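A minimal FGLS sketch in Stata, assuming a multiplicative (exponential) form of heteroskedasticity (hypothetical y, x1, x2; the names ehat, lnehat2, lnvarhat, varhat are introduced here for illustration):

    * FGLS assuming var(e_i) = exp(a0 + a1*x1 + a2*x2)
    regress y x1 x2
    predict ehat, residuals
    generate lnehat2 = ln(ehat^2)
    regress lnehat2 x1 x2                 // auxiliary regression for the variance
    predict lnvarhat, xb
    generate varhat = exp(lnvarhat)
    regress y x1 x2 [aweight = 1/varhat]  // weighted (FGLS) estimation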
Ex: college graduates earnings
• $Earnings_i = \beta_0 + \beta_1 Male_i + \varepsilon_i$
Ex: college graduates earnings
• $Earnings_i = \beta_0 + \varepsilon_i$ (women)
• $Earnings_i = \beta_0 + \beta_1 + \varepsilon_i$ (men)
Ex: trade taxes
• Suppose we have estimated the log-linear model
of the share of trade taxes (import and export)
in total government revenues (y) on a constant,
the share of imports and exports in GNP (x2)
and GNP/capita (x3) for 41 countries.
Ex: trade taxes
• White’s heteroskedasticity test:
– Regress the squared residuals on a constant, x2, x3, x2², x3² and x2·x3.
– The R² of this auxiliary regression is 0.1148.
– Test statistic: nR² = 41 × 0.1148 = 4.7068.
– The 5 percent critical chi-square value for 5 df is 11.0705, the 10 percent critical value is 9.2363, and the 25 percent critical value is 6.62568.
– The test statistic lies below all of these, so homoskedasticity is not rejected: there is no evidence of heteroskedasticity.
Autocorrelation
• Ex: agricultural production
• The Cobweb Phenomenon:
– Supply of farm products is described by:
$Supply_t = \beta_0 + \beta_1 P_{t-1} + \varepsilon_t$
– Plantings of crops this year are influenced by the price of the previous year.
– If the price was high, production is high, which causes the price to decrease and production to fall the next year.
– Clearly the error terms are not random, but repelling (i.e. negative autocorrelation).
Durbin-Watson test
• Durbin-Watson d test decision rules

Null hypothesis                           Decision       If
No positive autocorrelation               Reject         0 < d < dL
No positive autocorrelation               No decision    dL ≤ d ≤ dU
No negative autocorrelation               Reject         4 − dL < d < 4
No negative autocorrelation               No decision    4 − dU ≤ d ≤ 4 − dL
No autocorrelation, positive or negative  Do not reject  dU < d < 4 − dU
Ex: wages and productivity
• Suppose we regress average wage per hour on
average output per hour for the years 1959-1998
• The DW-statistic is d=0.1229.
• The lower and upper bounds of the DW statistic for 40 observations and one explanatory variable are 1.44 and 1.54.
• d lies below the lower bound.
• Therefore, we conclude there is positive autocorrelation.
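A minimal Stata sketch for this example (the variable names wage, output, year are hypothetical):

    * Durbin-Watson d statistic after a time-series OLS regression
    tsset year                 // declare the time variable
    regress wage output
    estat dwatson              // reports the Durbin-Watson d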
Ex: wages and productivity
• What does the positive autocorrelation mean?
– Technically it means that the error term and the lagged
error term are positively correlated.
– In practice it means that when wages are growing
faster in one year than would be predicted by
productivity, they are expected to grow faster in the
next year as well, and vice versa.
– The wage change of this year is influencing that of
next year.
Durbin-Watson test
• The disadvantage of the Durbin-Watson test
is the indecisive zone.
• This zone narrows as the number of observations increases.
• With 4 regressors and 20 observations, the 5
percent lower and upper d-values are 0.894
and 1.828.
• With 75 observations this narrows down to
1.515 and 1.739.
Ex: wages and productivity
• LM-test
– Regress average wage per hour on average output
per hour and the lag of the residual.
– For the lag of the residual we find a value of
0.6385 with a p-value of 0.00271.
– What do we conclude?
– What does the value of 0.6385 mean?
– Can we interpret the parameter for average
output?
Ex: wages and productivity
• LM-test:
• What do we conclude?
– There is positive autocorrelation.
• What does the value of 0.6385 mean?
– 63.85% of the residual at time t-1 is transferred
to time t.
• Can we interpret the parameter for average output?
– No, this is an auxiliary regression.
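A minimal Stata sketch of this LM approach (hypothetical wage, output, year; the residual name uhat is introduced for illustration), with the built-in Breusch-Godfrey test shown alongside the manual lagged-residual regression from the slides:

    * LM test for first-order autocorrelation (hypothetical wage, output)
    tsset year
    regress wage output
    estat bgodfrey, lags(1)    // built-in Breusch-Godfrey LM test

    * Manual version: add the lagged residual to the regression
    predict uhat, residuals
    regress wage output L.uhat // coefficient on L.uhat estimates rho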
Multicollinearity
Perfect multicollinearity
• Under perfect multicollinearity one explanatory variable is an exact linear function of the others, and the OLS coefficients cannot be estimated, because
$\text{var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum x_{1i}^2 (1 - r_{12}^2)} \to \infty$ as $r_{12}^2 \to 1$, so that
$S.E.(\hat\beta_1) = S.E.(\hat\beta_2) = \infty$
Multicollinearity
Special case of perfect multicollinearity
• If an explanatory variable X1 is related to the dependent variable Y by definition, then X1 will mask the effects of any other explanatory variables X2, ..., Xk (perfect collinearity between Y and X1).
• We need to drop such a dominant variable (in our case, X1) from the regression.
• Example: a study of shoe factory output. Regressing the number of shoes produced on capital, labour, and shoe leather input leads to a very high R² and insignificance of capital and labour. Why? Because the amount of shoe leather entering the factory is proportional to the number of shoes by definition.
Multicollinearity
Consequences of multicollinearity
1. Estimates remain unbiased.
2. Standard errors of the estimates increase, so the t-statistics $t_k = \dfrac{\hat\beta_k - \beta_{k,H_0}}{S.E.(\hat\beta_k)}$ will be very low.
3. Estimates are very sensitive to changes in specification:
– OLS is forced to emphasize small differences between collinear variables to distinguish between the effects they have.
– A small change in specification can change those estimates.
Multicollinearity
Example: gasoline consumption across US states
• Dropping UHM makes REG highly significant, because $S.E.(\hat\beta_{REG})$ dropped:
$\hat{C}_i = 551.7 - 53.6\,TAX_i + 0.186\,REG_i$
(t-statistics: −3.18 and 15.88)
• Before, OLS put great importance on small differences between REG and UHM to explain movement in C.
• We could have dropped REG instead; the two are statistically very similar.
Multicollinearity
Detection of multicollinearity
• 1. Using the variance inflation factor (for continuous IVs)
• 2. High R² but few significant t-ratios
• 3. High pairwise correlations among regressors
Multicollinearity
High variance inflation factors (VIFs)
• To detect multicollinearity, run the auxiliary regression
$X_1 = \alpha_1 + \alpha_2 X_2 + \dots + \alpha_k X_k + v$
• $VIF(\hat\beta_i) = \dfrac{1}{1 - R_i^2}$, where $R_i^2$ is the coefficient of determination of the auxiliary regression above.
• Higher VIFs mean more severe multicollinearity.
• Rule of thumb: $VIF(\hat\beta) > 5$ indicates severe multicollinearity. Some researchers use the tolerance, $TOL \equiv 1/VIF$.
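A minimal Stata sketch (hypothetical y, x1, x2, x3):

    * Variance inflation factors after OLS (hypothetical y, x1, x2, x3)
    regress y x1 x2 x3
    estat vif                  // reports VIF and 1/VIF (tolerance)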
Multicollinearity
Remedies: Do nothing
Multicollinearity
Remedies: Drop redundant variable
Multicollinearity
Remedies: Transform variables
• Use this when all variables are theoretically important.
• 1. Combine variables
– Any linear combination will do; the simplest is summation.
– New variable: $X_{3i} = X_{1i} + X_{2i}$. Then estimate the model with $X_3$ in place of $X_1$ and $X_2$ (see the sketch below).
– Be aware of which question you can answer using the estimation (a study of capital stock is not the same as a study of changes in capital stock, i.e. investment).
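A minimal Stata sketch of this remedy (hypothetical y, x1, x2):

    * Combine two collinear regressors into one (hypothetical x1, x2)
    generate x3 = x1 + x2
    regress y x3               // estimate with the combined variable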
Multicollinearity
Remedies: Increase sample size
Multicollinearity
Leaving multicollinearity unadjusted is
often best
Multicollinearity
Correlation r: interpretation (another way of detecting multicollinearity)
• A positive r indicates a positive linear association between x and y (or between two variables); a negative r indicates a negative linear relationship.
• r always lies between −1 and +1.
• The strength increases as r moves away from zero toward either −1 or +1.
• The extreme values +1 and −1 indicate a perfect linear relationship (the points lie exactly along a straight line).
• Graded interpretation: |r| of 0.1-0.3 = weak; 0.4-0.7 = moderate; 0.8-1.0 = strong correlation.
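A minimal Stata sketch for inspecting pairwise correlations among regressors (hypothetical x1, x2, x3):

    * Pairwise correlations among regressors (hypothetical x1, x2, x3)
    pwcorr x1 x2 x3, sig       // correlations with significance levels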
Normality test
• The normality test helps to determine how
likely it is for a random variable underlying the
data set to be normally distributed.
• There are several normality tests, such as
– the Skewness-Kurtosis test,
– the Jarque-Bera test,
– the Shapiro-Wilk test,
– the Kolmogorov-Smirnov test, and
– the Chen-Shapiro test.
• For this course we will focus on two tests, the Skewness-Kurtosis and Jarque-Bera tests, because they are simple and popular.
Skewness-Kurtosis test for normality
Consider a hypothetical dataset that contains the following variables:
• Gross Domestic Product (GDP) (dependent variable)
• Gross Fixed Capital Formation (GFC) (independent variable)
• Private Final Consumption Expenditure (PFC) (independent variable)
• ‘sktest’ shows the number of observations (84 here) and the probability of skewness, which is 0.8035, implying that skewness is asymptotically normally distributed (p-value of skewness > 0.05).
• Similarly, Pr(Kurtosis) indicates that kurtosis is also asymptotically normally distributed (p-value of kurtosis > 0.05).
• Finally, Prob>chi2 is 0.1426, which is greater than 0.05, so the joint test is not significant at the 5% level.
• Consequently, the null hypothesis of normality cannot be rejected.
• Therefore, according to the Skewness-Kurtosis test for normality, the residuals are normally distributed.
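A minimal Stata sketch for this step, assuming the regression of GDP on GFC and PFC described above (the residual variable name resid is introduced here for illustration):

    * Skewness-Kurtosis test on the OLS residuals
    regress GDP GFC PFC
    predict resid, residuals
    sktest resid               // H0: the residuals are normally distributed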
Jarque-Bera test for normality
• The other test of normality is the Jarque-Bera test.
• In order to perform this test, use the command ‘jb resid’ in the command prompt.
• If the p-value is greater than the chosen significance level (e.g. 0.05), the null hypothesis of normality cannot be rejected, and the residuals are normally distributed.
• Here the reported p-value (Prob > chi2) is 0.1211, which is greater than 0.05. Therefore, the null hypothesis cannot be rejected.
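A minimal sketch following the slide's ‘jb resid’ command (jb is a user-written Stata command and may need to be installed separately; resid is the residual variable generated above):

    * Jarque-Bera test on the OLS residuals
    * ('jb' is a user-written command; install it first if it is not available)
    jb resid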
Normality through histogram
• A histogram plot also indicates the normality of residuals: a bell-shaped curve shows that the series is normally distributed.
• To generate the histogram plot, follow this procedure:
– Go to ‘Graphics’ in the main menu bar.
– Select ‘Histogram’.
– Choose the main variable (the residual) and choose ‘Density’ under the Y-axis section; click on ‘OK’.
– Click on ‘Add normal density plot’.
– Finally, click on ‘OK’ to generate the histogram plot showing the distribution of the residuals.
• The histogram shows a bell-shaped distribution of the residuals, confirming the normality test results from the two tests above.
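The same plot can be produced from the command line; a minimal sketch using the residual variable generated above:

    * Histogram of the residuals with a normal density overlay
    histogram resid, normal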