
Chapter 4

Violations of the Assumptions of the Classical Model

1
Model specification
• Model specification consists of two choices
– The set of variables that we include in a model.
– The functional form of the relationship we
specify.
• The model specification determines the
question we answer.
• Can’t get the right answer if we ask the
wrong question.

2
Model misspecification
• There are four basic types of model
misspecification
– Inclusion of an irrelevant variable
– Exclusion of a relevant variable
– Measurement error
– Erroneous functional form for the relationship

3
Irrelevant variables
• (1) y_i = β1 x1i + β2 x2i + ε_i
• (2) y_i = β1 x1i + ε_i
• Estimating (1) while (2) is the true model does not affect b1 if
– β2 = 0, or
– x1'x2 = 0, i.e. the two variables are orthogonal (uncorrelated).
• But estimating (1) does inflate the variance of b1.

4
Irrelevant variables
• The variance of b1 increases for two reasons:
– Addition of an irrelevant variable reduces the degrees of freedom, which enter the estimator for the variance of b1.
– If β2 = 0 but x1'x2 ≠ 0, the independent variation in x1 (the part not explained by x2) is reduced and the variance of b1 is increased, as the formula below shows.
• Therefore, putting in too many variables changes the test outcomes, so be careful.
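For reference, the standard textbook variance expression makes the second point explicit (R1² below denotes the R-squared from regressing x1 on the other included regressors; this notation is not used elsewhere in these slides):

Var(b1) = σ² / [ (1 − R1²) · Σ_i (x1i − x̄1)² ]

Including an x2 that is correlated with x1 raises R1², shrinks the denominator and therefore inflates Var(b1); if x1'x2 = 0, then R1² = 0 and the variance is unchanged.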

5
Omitted variables
• (1) y_i = β1 x1i + β2 x2i + ε_i
• (2) y_i = β1 x1i + ε_i
• Estimating (2) while (1) is the true model does affect b1.
• In this case the error term in (2) does not satisfy all the Gauss-Markov assumptions:
• It is replaced by e_i = β2 x2i + ε_i.
6
Omitted variable
• Eei   E 2 x2i   i   0
• We require it to be equal to zero to get an
unbiased estimate of 1

• Thus by excluding x2 we obtain a biased


estimate of 1
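For the two-regressor model above (written without an intercept, as on these slides), the size of the bias has a standard closed form, not derived on the slides:

E[b1] = β1 + β2 · (x1'x2) / (x1'x1)

So the bias vanishes when β2 = 0 (x2 is irrelevant) or when x1'x2 = 0 (x2 is orthogonal to the included regressor), which is exactly the criterion on the next slide.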

7
Omitted variable
• It is clear that the bias is zero if the excluded variable is:
– Uncorrelated with the other explanatory variables, or
– Uncorrelated with the dependent variable.
• From this, a criterion for including a variable is:
– The variable is correlated with the other explanatory variables, and
– The variable is a cause of the dependent variable.
8
Functional misspecification
• Misspecified linearity
• What if we assume a model to be linear when it is actually not?

• Two types:
– Non-linear in explanatory variables;
– Non-linear in parameters.

9
Non-linear in variables

• wage_i = β0 + β1 age_i + β2 age_i² + β3 male_i + β4 (age_i × male_i) + ε_i

• This equation is linear in parameters.


• Therefore, we can apply OLS.

10
Non-linear in parameters
• Consider the model:
– y = g(x, β) = β1 + β2 x^β3
Or:
– y = g(x, β) = β1 x1^β2 x2^β3
• Solution to the latter: take logarithms,
– ln y = ln β1 + β2 ln x1 + β3 ln x2
– which is linear in parameters (e.g. Cobb-Douglas); see the Stata sketch below.
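A minimal Stata sketch of this log-linearisation, assuming a dataset in memory with (strictly positive) variables named y, x1 and x2; these variable names are placeholders, not from the slides:

    * generate logs of the dependent and explanatory variables
    generate lny  = ln(y)
    generate lnx1 = ln(x1)
    generate lnx2 = ln(x2)
    * OLS on the log-linear (Cobb-Douglas) form
    regress lny lnx1 lnx2

The estimated coefficients on lnx1 and lnx2 are then the elasticities β2 and β3.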

11
Non-linear in parameters
• y = g(x, β) = β1 + β2 x^β3
• There is no way to make this model linear in parameters.

• Ramsey’s RESET test
• A test for the functional form:
• H0: the linear model is correct.
• Intuition: if the linear model is correct, powers of the predicted values of the dependent variable should not add to the explanation of the dependent variable.

12
Ramsey’s RESET test

• Ramsey’s RESET test:
– Estimate the model by OLS and obtain the fitted values ŷ_i.
– Run an auxiliary regression of y_i on the original regressors and powers of the fitted values, e.g. y_i = x_i'β + α2 ŷ_i² + α3 ŷ_i³ + v_i.
– F-test for the joint significance of the alphas.
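In Stata a quick way to run this test after an OLS regression (a sketch, with placeholder variable names y, x1, x2):

    * fit the linear model
    regress y x1 x2
    * Ramsey RESET test using powers of the fitted values
    estat ovtest

A small p-value leads to rejection of H0 (the linear specification).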

13
Ramsey’s RESET test
• NOTE:
– the regression in Ramsey’s RESET test is an
auxiliary regression
– the estimates are used for testing only, not for
economic inferences
• WARNING: rejecting H0 does not always indicate
non-linearity, it might also indicate an omitted variable.

14
Heteroskedasticity
• In OLS we assume homoskedasticity: Var(ε_i) = σ²
• Which means that
– The variance of the error term is constant for all observations.
– All error terms are drawn from a distribution with mean zero and variance σ².
• Which is equivalent to: the variance of ε_i does not depend on i.

15
Heteroskedasticity
• Now assume: Var(ε_i) = σ_i²
• Or in words:
– The variance of the error term depends on i.
– The variance of the error term is not constant over all observations.

16
Heteroskedasticity
• In the case of homoskedastic errors the error terms are purely random (no systematic pattern).
• In the case of heteroskedastic errors there is a systematic pattern in the variance of the errors.

17
Heteroskedasticity
• Why worry about heteroskedasticity?
– OLS is still unbiased and consistent.
– However, the usual OLS standard errors are biased in the case of heteroskedastic error terms.
– Therefore, the t, F and LM (Lagrange multiplier, e.g. for error autocorrelation) tests cannot be used for drawing inferences.

18
Heteroskedasticity
• If there is no prior assumption on the form of the
heteroskedasticity, but there is a suggestion that it is related
to the explanatory variables, it is possible to correct the
standard errors for heteroskedasticity.

• This is only possible in the case of many observations.

19
Heteroskedasticity

• Remedies if heteroskedasticity is present:
• White’s heteroskedasticity-robust standard errors, which can be used in hypothesis testing (see the Stata sketch below).
• GLS (or FGLS).
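A minimal sketch of the robust-standard-error remedy in Stata (placeholder variable names):

    * OLS with White's heteroskedasticity-robust standard errors
    regress y x1 x2, vce(robust)

The coefficient estimates are identical to ordinary OLS; only the standard errors (and hence the t and F statistics) change.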

20
Testing for heteroskedasticity (detecting heteroskedasticity)
• The Breusch-Pagan test
– The BP-test starts from the assumption that heteroskedasticity is multiplicative.
– Before correcting for it, this is first tested.
– In Stata, we can follow the steps shown below.
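A sketch of the Breusch-Pagan test in Stata after an OLS regression (placeholder variable names; `estat hettest` implements the Breusch-Pagan/Cook-Weisberg version):

    * fit the model by OLS
    regress y x1 x2
    * Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
    estat hettest

Under H0 (homoskedasticity) the test statistic is chi-squared distributed; a small p-value indicates heteroskedasticity.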

21
Testing for heteroskedasticity
• The White test
– The White-test starts from the assumption that
heteroskedasticity is multiplicative.
– However, it assumes a more flexible form than the
BP-test.
– The auxiliary regression in the White test is quadratic in the explanatory variables.

22
Testing for heteroskedasticity
• The White test
– In the case of two explanatory variables the auxiliary regression is:
e_i² = γ0 + γ1 x1i + γ2 x2i + γ3 x1i² + γ4 x2i² + γ5 x1i x2i + v_i
– If the gammas are all zero there is no heteroskedasticity.
– So the White test is an F-test on the joint significance of the gammas.
– If the gammas are significantly different from zero, the estimated auxiliary regression can be used in FGLS.
– In Stata the test can be run with the command shown below.
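A sketch of White's test in Stata after OLS (placeholder variable names):

    * fit the model by OLS
    regress y x1 x2
    * White's test via the information matrix test
    estat imtest, white

The reported White statistic is n·R² from the quadratic auxiliary regression above, compared with a chi-squared critical value.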
23
Heteroskedasticity
• Recapitulation
– If there is a suggestion of heteroskedasticity and
the number of observations is large, the estimated
standard errors of the estimated parameters can
be corrected using White’s heteroskedastic robust
standard errors.

24
Heteroskedasticity
• Recapitulation
– If the number of observations is large it is possible
to approximate the form of heteroskedasticity using
the estimated OLS residuals.
– If an exponential form is assumed, the estimation of the form can also be used to test for heteroskedasticity; this is the Breusch-Pagan test.
– If a quadratic form of the explanatory variables is assumed, the estimation of the form can also be used to test for heteroskedasticity; this is the White test.

25
Heteroskedasticity
Recapitulation
–If in either the BP-test or the White test homoskedasticity
of the error terms is rejected in favor of heteroskedasticity,
the auxiliary regression can be used in FGLS.

26
Ex: college graduates earnings
• Earnings_i = β0 + β1 Male_i + ε_i

• The homoskedasticity assumption says that the variance of the error does not depend on the regressors, in this case Male.
• In other words, the variance of the error term is the same for men and women.

27
Ex: college graduates earnings
• Earnings_i = β0 + ε_i (women)
• Earnings_i = β0 + β1 + ε_i (men)

• Homoskedasticity says that the variance of the error term does not depend on Male.
• This is equivalent to saying: the variance of earnings is the same for men and women.

28
Ex: trade taxes
• Suppose we have estimated the log-linear model
of the share of trade taxes (import and export)
in total government revenues (y) on a constant,
the share of imports and exports in GNP (x2)
and GNP/capita (x3) for 41 countries.

• How do we perform White’s heteroskedasticity


test?

29
Ex: trade taxes
• White’s heteroskedasticity test:
– Regress the squared residuals on a constant, x2, x3, x2², x3² and x2·x3.
– R² is 0.1148.
– Test statistic: nR² = 41 × 0.1148 = 4.7068.
– The 5 percent critical chi-square value for 5 df is 11.0705, the 10 percent critical value is 9.2363, and the 25 percent critical value is 6.62568.
– Since 4.7068 lies below all of these, we do not reject homoskedasticity: there is no evidence of heteroskedasticity.

30
Autocorrelation
• Ex: agricultural production
• The Cobweb Phenomenon:
– Supply of farm products is described by:
Supply_t = β0 + β1 P_{t−1} + ε_t
– Plantings of crops this year are influenced by the price of the previous year.
– If last year's price was high, this year's production is high, which pushes this year's price down and next year's production down.
– Clearly the error terms are not random, but tend to alternate in sign (i.e. negative autocorrelation).
31
Durbin-Watson test
• Durbin-Watson d test decision rules

Null hypothesis                            Decision        If
No positive autocorrelation                Reject          0 < d < dL
No positive autocorrelation                No decision     dL ≤ d ≤ dU
No negative autocorrelation                Reject          4 − dL < d < 4
No negative autocorrelation                No decision     4 − dU ≤ d ≤ 4 − dL
No autocorrelation, positive or negative   Do not reject   dU < d < 4 − dU

32
Ex: wages and productivity
• Suppose we regress average wage per hour on
average output per hour for the years 1959-1998
• The DW-statistic is d = 0.1229.
• The lower and upper bounds of the DW statistic for 40 observations and one explanatory variable are dL = 1.44 and dU = 1.54.
• d lies below the lower bound.
• Therefore, we conclude positive autocorrelation.
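In Stata, the Durbin-Watson statistic for such a time-series regression can be obtained as follows (a sketch; wage, output and year are placeholder variable names):

    * declare the data as a yearly time series
    tsset year
    * regress average wage per hour on average output per hour
    regress wage output
    * Durbin-Watson d statistic
    estat dwatson

The reported d is then compared with the dL and dU bounds as in the decision table above.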

33
Ex: wages and productivity
• What does the positive autocorrelation mean
– Technically it means that the error term and the lagged
error term are positively correlated.
– In practice it means that when wages are growing
faster in one year than would be predicted by
productivity, they are expected to grow faster in the
next year as well, and vice versa.
– The wage change of this year is influencing that of
next year.

34
Durbin-Watson test
• The disadvantage of the Durbin-Watson test
is the indecisive zone.
• This zone narrows down with the number of
observations.
• With 4 regressors and 20 observations, the 5
percent lower and upper d-values are 0.894
and 1.828.
• With 75 observations this narrows down to
1.515 and 1.739.
35
Ex: wages and productivity
• LM-test
– Regress average wage per hour on average output
per hour and the lag of the residual.
– For the lag of the residual we find a value of
0.6385 with a p-value of 0.00271.
– What do we conclude?
– What does the value of 0.6385 mean?
– Can we interpret the parameter for average
output?

36
Ex: wages and productivity
• LM-test:
• What do we conclude?
– There is positive autocorrelation.
• What does the value of 0.6385 mean?
– 63.85% of the residual at time t-1 is transferred
to time t.
• Can we interpret the parameter for average output?
– No, this is an auxiliary regression.
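A closely related way to run an LM test for autocorrelation in Stata is the Breusch-Godfrey command (a sketch; this is a standard alternative to the manual auxiliary regression described on the slide, with placeholder variable names):

    * after tsset and the OLS regression of wage on output
    regress wage output
    * Breusch-Godfrey LM test with one lag of the residual
    estat bgodfrey, lags(1)

A small p-value again points to (first-order) autocorrelation.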

37
Multicollinearity

• Explanatory variables are linearly dependent.
• OLS estimates become inaccurate (or are not defined).
• With perfect multicollinearity, it is impossible to estimate a coefficient.
– Ex.: A = 2 + 3B, and both A and B are used as independent variables in a regression.
– Whenever there is a change in A, there is a change in B.
– The OLS estimator of βA and βB will be unable to distinguish the effects of A and B on the dependent variable.
– This is an example of perfect multicollinearity.

Multicollinearity
Perfect multicollinearity

• "Perfect" means that changes in one explanatory variable are completely explained by changes in another variable.
• Example: real interest rate (r_t) = nominal interest rate (i_t) − inflation (π_t). Assume inflation is constant, so r_t = i_t − π. Estimate:
Money_t = α + β1 r_t + β2 i_t + ε_t
– Can we estimate β1 and β2?
– No! Try it: β̂1 and β̂2 are indeterminate, because S.E.(β̂1) = S.E.(β̂2) = ∞.
• We cannot hold r_t constant when moving i_t, because r_t and i_t move together.

Multicollinearity
Special case of perfect multicollinearity

• If an explanatory variable X1 is related to the dependent variable Y by definition, then X1 will mask the effects of any other explanatory variables X2, ..., Xk.
• This is perfect collinearity between Y and X1.
• We need to drop such a dominant variable (in our case, X1) from the regression.
• Example: study of shoe factory output. Regressing the number of shoes produced on Capital, Labour, and Shoe leather input leads to a very high R² and insignificance of Capital and Labour. Why?
• Because the amount of shoe leather entering the factory is proportional to the number of shoes by definition.

Multicollinearity
Consequences of multicollinearity
1 Estimates remain unbiased
2 Standard errors of estimates increase
– t-statistics will be very low: t_k = (β̂_k − β_{k,H0}) / S.E.(β̂_k)
3 Estimates are very sensitive to changes in specification
– OLS is forced to emphasize small differences between collinear variables to distinguish between the effects they have
– A small change in specification can change those estimated effects
– Therefore, adding or deleting variables in the presence of multicollinearity causes large changes to the β̂'s
4 Overall fit (and level of significance) of the regression does not change, even though t-stats are low
5 The fit of coefficients that are not multicollinear is unaffected

Multicollinearity
Example: Gasoline consumption across US states
• Dropping UHM makes REG highly significant, because S.E.(β̂_REG) dropped:
Ĉ_i = 551.7 − 53.6 TAX_i + 0.186 REG_i
           t = (−3.18)      (15.88)
• Before, OLS put great importance on small differences between REG and UHM to explain movement in C.
• We could have dropped REG instead; the two variables are statistically very similar.

Multicollinearity
Detection of multicollinearity

How much multicollinearity is there in an equation?


Sample matters
The trick: find variables that are theoretically important but not
collinear
Because it is a matter of degree, there are no formal testing
rules for multicollinearity

Multicollinearity
• 1. Using the variance inflation factor (for continuous IVs)
• 2. High R² but few significant t-ratios
• 3. High pairwise correlations among regressors

44
High variance inflation factors (VIFs)
To detect multicollinearity in:

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + ε

1 Regress each Xi on all other explanatory variables, e.g.:

X1 = α1 + α2 X2 + ... + αk Xk + v

2 Compute VIF(β̂_i) = 1 / (1 − R_i²), where R_i² is the coefficient of determination of the above regression
Higher VIFs mean more severe multicollinearity
Rule of thumb: VIF(β̂_i) > 5 indicates severe multicollinearity
Some researchers use the tolerance, TOL ≡ 1/VIF
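In Stata the VIFs (and the tolerances 1/VIF) for the most recent regression are reported directly (a sketch with placeholder variable names):

    * fit the full model
    regress y x1 x2 x3
    * variance inflation factor for each regressor
    estat vif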
Multicollinearity
Remedies: Do nothing

• Every remedy for multicollinearity has a drawback.
– Variables can remain significant if there is incomplete multicollinearity.
– Look for a remedy only when variables are insignificant.
– Dropping a variable (remedy 2) can create omitted variable bias.
Multicollinearity
Remedies: Drop a redundant variable

• Use this if the multicollinearity is caused by several variables measuring the same thing (see the gas consumption example).
• Which one to drop? Let theory guide you.

Multicollinearity
Remedies: Transform variables
Use this when all variables are theoretically important
1 Combine variables
– Any linear combination. Simplest: summation
– New variable: X3i = X1i + X2i. Then estimate the model with X3 in place of X1 and X2.
– Any non-linear change in the definitions of the variables changes the degree of multicollinearity
2 Take first differences of the time-series variables
– In economics and finance, multicollinearity is often the result of a time trend in the time-series data, so this is a suitable remedy
– However, differencing changes the functional form and so alters the question we can answer using the estimation (a study of capital stock is not the same as a study of changes in capital stock (= investment))

Multicollinearity
Remedies: Increase sample size

Increases estimation accuracy. Not always possible

Multicollinearity
Leaving multicollinearity unadjusted is often best

• Unless multicollinearity causes a problem in the regression, do not fix it.
• Dropping a theoretically important collinear variable causes omitted variable bias.
– Therefore, the coefficient on the remaining collinear variable will absorb most of the effect of the dropped variable.

Multicollinearity
Example
Estimate:

Ŝ_t = 3080 − 75000 P_t + 4.23 A_t − 1.04 B_t
            t =  (−3.00)     (3.99)    (−2.04)

You find out that r(A_t, B_t) = 0.974 because the competing firms match each other’s advertising expense.
Since A_t and B_t are collinear, you drop B_t and re-estimate:

Ŝ_t = 2586 − 78000 P_t + 0.52 A_t
            t =  (−3.25)     (0.12)

Multicollinearity
Correlation r – interpretation (another way to detect multicollinearity)
• A positive r indicates a positive linear association between two variables, and a negative r indicates a negative linear relationship.
• r always lies between −1 and +1.
• The strength increases as r moves away from zero toward either −1 or +1.
• The extreme values +1 and −1 indicate a perfect linear relationship (points lie exactly along a straight line).
• Graded interpretation: |r| of 0.1–0.3 = weak; 0.4–0.7 = moderate; 0.8–1.0 = strong correlation.
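In Stata, pairwise correlations among the regressors can be inspected directly (a sketch with placeholder variable names):

    * correlation matrix of the explanatory variables
    correlate x1 x2 x3
    * or, with significance levels for each pairwise correlation
    pwcorr x1 x2 x3, sig

High pairwise correlations (close to ±1) are a warning sign of multicollinearity.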
Normality test
• The normality test helps to determine how
likely it is for a random variable underlying the
data set to be normally distributed.
• There are several normality tests, such as:
– the Skewness-Kurtosis test,
– the Jarque-Bera test,
– the Shapiro-Wilk test,
– the Kolmogorov-Smirnov test, and
– the Chen-Shapiro test.
• For this course we will focus on two tests, Skewness-Kurtosis and Jarque-Bera, because they are simple and popular.

57
Skewness Kurtosis test for normality

• Skewness is a measure of the asymmetry of the


probability distribution of a random variable about
its mean.
• It represents the amount and direction of skew.
• On the other hand, Kurtosis represents the height
and sharpness of the central peak relative to that of
a standard bell curve.
• The figure below shows the results obtained after
performing the Skewness and Kurtosis test for
normality in STATA.

58
Consider a hypothetical dataset which contains the following variables:
• Gross Domestic Product (GDP) (dependent variable).
• Gross Fixed Capital Formation (GFC) (independent variable).
• Private Final Consumption Expenditure (PFC) (independent variable).
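A sketch of how these residual-normality checks are typically run in Stata for this setup (GDP, GFC and PFC are the variable names given above; 'resid' is just a chosen name for the residual series):

    * estimate the model and store the residuals
    regress GDP GFC PFC
    predict resid, residuals
    * Skewness-Kurtosis test for normality of the residuals
    sktest resid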

59
• ‘sktest’ shows the number of observations (84 here) and the probability of skewness, which is 0.8035, implying that skewness is asymptotically normally distributed (p-value of skewness > 0.05).
• Similarly, Pr(Kurtosis) indicates that kurtosis is also asymptotically normally distributed (p-value of kurtosis > 0.05).
• Finally, the joint test gives Prob > chi2 = 0.1426, which is greater than 0.05, so the joint test is not significant at the 5% level.
• Consequently, the null hypothesis of normality cannot be rejected.
• Therefore, according to the Skewness-Kurtosis test for normality, the residuals are normally distributed.

60
Jarque-Bera test for normality
• The other test of normality is the Jarque-Bera test.
• In order to perform this test, use the command ‘jb resid’ in the command prompt.

• If the p-value is greater than 0.05, the null hypothesis of normality cannot be rejected.
• In that case the residuals are treated as normally distributed.
• As per the above figure, Prob > chi(2) is 0.1211, which is greater than 0.05. Therefore, the null hypothesis cannot be rejected.
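For reference, the statistic behind this test is the standard Jarque-Bera formula (not printed on the slides), based on the sample skewness S and kurtosis K of the residuals:

JB = (n/6) · [ S² + (K − 3)² / 4 ]

Under the null hypothesis of normality, JB is asymptotically chi-squared distributed with 2 degrees of freedom, which is why the output is reported as a chi(2) test.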

61
Normality through a histogram
• A histogram plot also indicates the normality of residuals.
• A bell-shaped curve shows the normal distribution of the series.
• In order to generate the histogram plot, follow the procedure below:
• Go to ‘Graphics’ in the main bar.
• Select ‘histogram’.
• Then choose the main variable (residual) and choose ‘Density’ under the Y-axis section. Click on ‘OK’.
• Click on ‘Add normal density plot’.
• Finally, click on ‘OK’ to generate the histogram plot showing the normality distribution of the residuals (as indicated in the figure below).

• It shows a bell-shaped distribution of the residuals, confirming the normality test results from the two tests above.
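The same plot can be produced with a single command instead of the menus (a sketch; 'resid' is the residual variable created earlier):

    * histogram of the residuals with a normal density overlaid
    histogram resid, normal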

62
