Basic Econometrics in Transportation
Amir Samimi
An important assumption in the CLRM is that E(u_i^2) = σ^2.
This is the assumption of equal (homo) spread (scedasticity).
Example: higher-income families on average save more than lower-income families, but there is also more variability in their savings.
1. As people learn, their errors of behavior become smaller over time.
As the number of hours of typing practice increases, the average number of typing errors as well as their variance decreases.
2. As incomes grow, people have more choices about the
disposition of their income.
Rich people have more choices about their savings behavior.
4. Heteroscedasticity can arise when there are outliers.
An outlier is an observation that is much different from the other observations in the sample.
5. Heteroscedasticity arises when the model is not correctly specified.
Very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model.
6. Skewness in the distribution of a regressor is another source.
The distribution of income and wealth in most societies is uneven, with the bulk of the income and wealth being owned by a few at the top.
7. Other sources of heteroscedasticity:
Incorrect data transformation (ratio or first difference transformations).
Incorrect functional form (linear versus log-linear models).
Heteroscedasticity is likely to be more common in cross-sectional than in time series data.
In cross-sectional data, one usually deals with members of a population at a given point in time. These members may be of different sizes, income, etc.
In time series data, the variables tend to be of similar orders of magnitude because one generally collects the data for the same entity over a period of time.
OLS estimators and their variances under heteroscedasticity
Is the OLS estimator still BLUE when we drop only the homoscedasticity assumption?
We can easily prove that it is still linear and unbiased.
We can also show that it is a consistent estimator.
It is no longer best, and its variance is no longer given by the usual homoscedastic formula.
What is BLUE in the presence of heteroscedasticity?
Ideally, we would like to give less weight to the observations coming from populations with greater variability.
Consider: Y_i = β1 + β2 X_i + u_i = β1 X_0i + β2 X_i + u_i, where X_0i = 1 for each i.
Assume the heteroscedastic variances σ_i^2 are known and divide the model through by σ_i.
The variance of the transformed disturbance term u_i / σ_i is now homoscedastic (equal to 1).
Apply OLS to the transformed model and get BLUE estimators.
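The transformation above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a single regressor, standard deviations σ_i that are truly known, and made-up data. The function names `ols_simple` and `wls_known_sigma` are hypothetical and not part of the lecture.

```python
def ols_simple(x, y):
    """OLS for Y_i = b1 + b2*X_i + u_i; returns (b1, b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def wls_known_sigma(x, y, sigma):
    """WLS for Y_i = b1 + b2*X_i + u_i when sd(u_i) = sigma_i is known."""
    # Divide every term, including the intercept column X_0i = 1, by sigma_i.
    y_s = [yi / si for yi, si in zip(y, sigma)]
    x0_s = [1.0 / si for si in sigma]
    x1_s = [xi / si for xi, si in zip(x, sigma)]
    # OLS through the origin on the two transformed regressors:
    # solve the 2x2 normal equations by Cramer's rule.
    s00 = sum(a * a for a in x0_s)
    s01 = sum(a * b for a, b in zip(x0_s, x1_s))
    s11 = sum(b * b for b in x1_s)
    s0y = sum(a * c for a, c in zip(x0_s, y_s))
    s1y = sum(b * c for b, c in zip(x1_s, y_s))
    det = s00 * s11 - s01 * s01
    b1 = (s0y * s11 - s01 * s1y) / det
    b2 = (s00 * s1y - s01 * s0y) / det
    return b1, b2


# Made-up illustrative data: the spread of Y grows with X.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sigma = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # assumed known sd of u_i
y = [5.3, 7.6, 11.9, 12.8, 18.9, 17.6]

print("OLS (b1, b2):", ols_simple(x, y))
print("WLS (b1, b2):", wls_known_sigma(x, y, sigma))
```

Observations with a large σ_i are scaled down before the fit, so they pull less on the estimated line. With equal σ_i the two functions return the same estimates.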
There are no hard-and-fast rules for detecting heteroscedasticity, only a few rules of thumb.
This is inevitable because σ_i^2 can be known only if we have the entire Y population corresponding to the chosen X's.
More often than not, there is only one sample Y value corresponding to a particular value of X, and there is no way one can know σ_i^2 from just one Y observation.
Thus, heteroscedasticity may be a matter of intuition, educated guesswork, or prior empirical experience.
Most of the detection methods are based on examination of OLS residuals.
Those are the ones we observe, and not u_i. We hope they are good estimates.
This hope may be fulfilled if the sample size is fairly large.
Nature of the Problem
The nature of the problem may suggest whether heteroscedasticity is likely to be encountered.
Residual variance around the regression of consumption on income increases with income.
Graphical Method
Estimated u_i^2 are plotted against estimated Y_i.
Is the estimated mean value of Y systematically related to the squared residual?
a) no systematic pattern, perhaps no heteroscedasticity.
b-e) definite pattern, perhaps no homoscedasticity.
Using such knowledge, one may transform the data to alleviate the problem.
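As a rough illustration of the graphical method, the sketch below computes the pairs (estimated Y_i, squared residual) that one would plot and prints a crude text "plot". It assumes a single regressor and uses made-up data. The function name `fitted_and_squared_residuals` is hypothetical; a real analysis would use a plotting library.

```python
def fitted_and_squared_residuals(x, y):
    """Return (fitted Y_i, squared residual) pairs from a simple OLS fit, sorted by fitted value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    pairs = []
    for xi, yi in zip(x, y):
        y_hat = b1 + b2 * xi
        pairs.append((y_hat, (yi - y_hat) ** 2))
    return sorted(pairs)


# Made-up data: deviations from the line 1 + 2X alternate in sign and grow with X.
x = [float(i) for i in range(1, 11)]
y = [1.0 + 2.0 * xi + (0.3 * xi if i % 2 == 0 else -0.3 * xi)
     for i, xi in enumerate(x)]

pairs = fitted_and_squared_residuals(x, y)
for y_hat, u2 in pairs:
    # One '*' per 0.5 units of squared residual.
    print(f"{y_hat:7.2f}  {u2:6.2f}  " + "*" * int(u2 / 0.5))
```

If the bars tend to lengthen as the fitted value rises, the squared residuals are systematically related to estimated Y, which corresponds to the "definite pattern" case.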
Spearman's Rank Correlation Test
Fit the regression to the data on Y and X and estimate the residuals.
Rank both the absolute values of the residuals and X_i (or estimated Y_i) and compute Spearman's rank correlation coefficient:
r_s = 1 − 6 Σ d_i^2 / [n(n^2 − 1)]
• d_i = difference in the ranks for the ith observation.
Assuming that the population rank correlation coefficient is zero and n > 8, the significance of the sample r_s can be tested by the t test, with df = n − 2:
t = r_s √(n − 2) / √(1 − r_s^2)
If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity.
Goldfeld-Quandt Test
Rank the observations according to X_i values.
Omit c central observations, and divide the remaining observations into two groups, each of (n − c) / 2 observations.
Fit separate OLS regressions to the first and last sets of observations, and obtain the residual sums of squares RSS1 and RSS2.
Compute the ratio
λ = (RSS2 / df) / (RSS1 / df), where df = (n − c) / 2 − k and k is the number of parameters estimated in each regression.
If u_i are assumed to be normally distributed, and if the assumption of homoscedasticity is valid, then it can be shown that λ follows the F distribution with df degrees of freedom in both the numerator and the denominator.
The ability of the test to detect heteroscedasticity depends on how c is chosen.
Goldfeld and Quandt suggest that c = 8 if n = 30, c = 16 if n = 60.
Judge et al. note that c = 4 if n = 30 and c = 10 if n is about 60.
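The two procedures above can be sketched as follows for a single regressor. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, ties in the ranking are not averaged, the data are made up, and the resulting statistics still have to be compared with t and F critical values from tables.

```python
import math


def ols_residuals(x, y):
    """Residuals from the simple OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b2 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b1 = y_bar - b2 * x_bar
    return [b - (b1 + b2 * a) for a, b in zip(x, y)]


def ranks(values):
    """Ranks 1..n in ascending order (ties are not averaged in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_correlation(a, b):
    """Spearman's r_s = 1 - 6*sum(d_i^2) / (n*(n^2 - 1))."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman_test(x, y):
    """Return (r_s, t) for the rank correlation between |residual| and X."""
    abs_res = [abs(u) for u in ols_residuals(x, y)]
    r_s = rank_correlation(abs_res, x)
    n = len(x)
    t = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)  # undefined if |r_s| = 1
    return r_s, t


def goldfeld_quandt(x, y, c):
    """Return lambda = RSS2 / RSS1 after omitting c central observations."""
    data = sorted(zip(x, y))            # order the observations by X
    m = (len(data) - c) // 2            # size of each group
    low, high = data[:m], data[-m:]
    rss1 = sum(u ** 2 for u in ols_residuals(*zip(*low)))
    rss2 = sum(u ** 2 for u in ols_residuals(*zip(*high)))
    return rss2 / rss1                  # equal df in both groups, so df cancels


# Made-up data: errors alternate in sign and grow in size with X.
x = [float(i) for i in range(1, 31)]
y = [1.0 + 2.0 * xi + (0.3 * xi if int(xi) % 2 else -0.3 * xi) for xi in x]

r_s, t = spearman_test(x, y)
lam = goldfeld_quandt(x, y, c=4)
print(f"Spearman r_s = {r_s:.3f}, t = {t:.2f} (df = {len(x) - 2})")
print(f"Goldfeld-Quandt lambda = {lam:.2f} (df = {(len(x) - 4) // 2 - 2} in each group)")
```

The choice c = 4 for n = 30 follows the Judge et al. suggestion quoted above.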
Breusch–Pagan–Godfrey Test
Success of the GQ test depends on c and on the X with which the observations are ordered.
Estimate Y_i = β1 + β2 X_2i + ··· + βk X_ki + u_i by OLS and obtain the residuals û_i.
Obtain σ̃^2 = Σ û_i^2 / n (ML estimator of σ^2).
Construct variables p_i defined as p_i = û_i^2 / σ̃^2.
Regress p_i on the Z's as p_i = α1 + α2 Z_2i + ··· + αm Z_mi + v_i
o σ_i^2 is assumed to be a linear function of the Z's.
o Some or all of the X's can serve as Z's.
Obtain the ESS (explained sum of squares) and define Θ = 0.5 ESS.
Assuming u_i are normally distributed, one can show that if there is homoscedasticity and if the sample size n increases indefinitely, then Θ ∼ χ^2 with m − 1 df.
BPG test is an asymptotic, or large-sample, test.
White's General Heteroscedasticity Test
Does not rely on the normality assumption and is easy to implement.
Estimate Y_i = β1 + β2 X_2i + β3 X_3i + u_i and obtain the residuals û_i.
Run the following auxiliary regression:
û_i^2 = α1 + α2 X_2i + α3 X_3i + α4 X_2i^2 + α5 X_3i^2 + α6 X_2i X_3i + v_i
Higher powers of regressors can also be introduced.
Under the null hypothesis (homoscedasticity), if the sample size n increases indefinitely, it can be shown that nR^2 ∼ χ^2 (df = number of regressors in the auxiliary regression, excluding the constant).
If the chi-square value exceeds the critical value, the conclusion is that there is heteroscedasticity.
If it does not, there is no heteroscedasticity, which is to say that α2 = α3 = α4 = α5 = α6 = 0.
It has been argued that if cross-product terms are present, then it is a test of heteroscedasticity and specification bias.
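Both statistics above can be sketched in plain Python. This is an illustration under stated assumptions only: the helper names (`solve`, `ols`, `bpg_statistic`, `white_statistic`) are hypothetical, the data are made up, and the normal equations are solved by simple Gauss-Jordan elimination rather than a numerically robust library routine.

```python
def solve(a, b):
    """Solve a * beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """OLS of y on a constant plus the given columns; returns (residuals, ESS, R2)."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ess = sum((f - y_bar) ** 2 for f in fitted)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return [yi - f for yi, f in zip(y, fitted)], ess, ess / tss


def bpg_statistic(x_cols, y, z_cols):
    """Theta = ESS / 2 from regressing p_i = u_i^2 / sigma~^2 on the Z's."""
    resid, _, _ = ols(x_cols, y)
    sigma2 = sum(u * u for u in resid) / len(y)   # ML estimator of sigma^2
    p = [u * u / sigma2 for u in resid]
    _, ess, _ = ols(z_cols, p)
    return ess / 2.0


def white_statistic(x2, x3, y):
    """n * R^2 from the auxiliary regression with squares and the cross product."""
    resid, _, _ = ols([x2, x3], y)
    u2 = [u * u for u in resid]
    aux = [x2, x3, [a * a for a in x2], [b * b for b in x3],
           [a * b for a, b in zip(x2, x3)]]
    _, _, r2 = ols(aux, u2)
    return len(y) * r2


# Made-up data: the error size grows with X2.
n = 30
x2 = [float(i) for i in range(1, n + 1)]
x3 = [float((7 * i) % 11 + 1) for i in range(1, n + 1)]
y = [1.0 + 2.0 * a + 0.5 * b + (0.3 * a if i % 2 else -0.3 * a)
     for i, (a, b) in enumerate(zip(x2, x3))]

theta = bpg_statistic([x2, x3], y, [x2])   # Z = X2 only, so m - 1 = 1 df
nr2 = white_statistic(x2, x3, y)           # 5 auxiliary regressors, so 5 df
print(f"BPG Theta = {theta:.2f}  (5% chi-square critical value, 1 df: 3.841)")
print(f"White nR2 = {nr2:.2f}  (5% chi-square critical value, 5 df: 11.070)")
```

Using only X2 as the Z variable in the BPG step is a choice made for this example; the slide allows some or all of the X's to serve as Z's.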
Heteroscedasticity does not destroy unbiasedness and consistency.
But OLS estimators are no longer efficient, not even asymptotically.
When σ_i^2 is known:
The most straightforward method of correcting heteroscedasticity is by means of weighted least squares.
WLS method provides BLUE estimators.
Y = per capita expenditure on public schools by state in 1979
Income = per capita income by state in 1979
On the basis of the usual OLS standard errors, both the regressors are statistically significant at the 5 percent level, whereas on the basis of White estimators they are not.
Since robust standard errors are now available in established regression packages, it is recommended to report them.
WHITE option can be used to compare the output with regular OLS output as a check for heteroscedasticity.
Apart from being a large-sample procedure, one drawback of the White procedure is that the estimators thus obtained may not be so efficient as those obtained by methods that transform data to reflect specific types of heteroscedasticity.
We may consider several assumptions about the pattern of heteroscedasticity.
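To illustrate the comparison between regular OLS output and White's robust standard errors mentioned above, the sketch below computes both for the slope of a one-regressor model. It assumes the basic form of White's estimator without any small-sample correction. The function name `slope_standard_errors` and the data are made up and are not the school expenditure example.

```python
import math


def slope_standard_errors(x, y):
    """Return (b2, conventional se, White robust se) for Y_i = b1 + b2*X_i + u_i."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]                 # deviations of X from its mean
    sxx = sum(d * d for d in dx)
    b2 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Conventional formula: sigma_hat^2 / sum(dx^2), with sigma_hat^2 = RSS / (n - 2).
    se_ols = math.sqrt(sum(u * u for u in resid) / (n - 2) / sxx)
    # White's formula: sum(dx^2 * u_hat^2) / (sum(dx^2))^2.
    se_white = math.sqrt(sum(d * d * u * u for d, u in zip(dx, resid))) / sxx
    return b2, se_ols, se_white


# Made-up data: the error size grows with X.
x = [float(i) for i in range(1, 31)]
y = [1.0 + 2.0 * xi + (0.3 * xi if int(xi) % 2 else -0.3 * xi) for xi in x]

b2, se_ols, se_white = slope_standard_errors(x, y)
print(f"b2 = {b2:.3f}, OLS se = {se_ols:.4f}, White se = {se_white:.4f}")
```

A noticeable gap between the two standard errors is an informal signal that the homoscedasticity assumption may not hold.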
Assumption 3: if ,
Assumption 4:
A log transformation such as ln Y_i = β1 + β2 ln X_i + u_i very often reduces heteroscedasticity.
Assignment weight factor = 0.5