Econometrics Module
Distance Education
Mekelle, Tigray
January 2022
Outline
Introduction: Review of Econometrics (High attention)
Simple Linear Regression Analysis (High attention)
Multiple Linear Regression Analysis (High attention)
Violations of OLS Assumptions (High attention)
Simultaneous Equation Model and Instrumental Variables
Limited Dependent Variable Analysis (High attention)
Time Series Analysis (High attention)
Panel Data
Model Selection (High attention)
Examples (High attention)
Introduction: Review of Econometrics
• Research in economics, finance, management, marketing, and many other disciplines is becoming
increasingly quantitative.
• It involves estimation of different parameters or functions and quantification of different qualitative
information and hypotheses.
• These are achieved with the aid of econometric tools and techniques
Definition of Econometrics
What is Econometrics?
• The term econometrics is formed from two Greek words:
• oikonomia, which means ‘economy’, and
• metron, which means ‘measure’.
• Simply put, econometrics means economic measurement.
• Thus, econometrics is defined as a science which deals with the measurement of economic relationships.
• It is a combination of;
• Economic theory,
• Mathematical economics and
• Statistics
• Thus, econometrics may be defined as a subject matter in which the tools of economic theory, mathematics,
and statistical inference are applied to the analysis of economic phenomena.
• Economics provides the economic theory
• Mathematics expresses what economic theory states verbally in mathematical form (a model)
• Statistics provides the methods for estimating the coefficients of economic relationships
• But, Econometrics is completely different from any one of these disciplines.
Econometrics Vs. Mathematical Economics
• Mathematical economics states economic theory in terms of mathematical symbols.
There is no essential difference between mathematical economics and economic theory:
• Both state the same relationships, but while economic theory uses verbal exposition, mathematical economics
expresses the relationships in terms of mathematical symbols.
• Both express economic relationships in an exact or deterministic form.
• Neither mathematical economics nor economic theory allows for random elements which might affect the
relationship and make it stochastic.
• Econometrics does not assume exact or deterministic relationships; it accounts for random disturbances.
Econometrics Vs. Statistics
• Econometrics differs from economic statistics. Economic statistics is mainly a descriptive aspect of
economics.
• Economic statistics does not provide explanations of the development of the various variables.
• Economic statistics does not provide measurements of the coefficients of economic relationships.
• Econometric methods are adjusted so that they become appropriate for the measurement of economic relationships
which are stochastic.
Economic Models Vs. Econometric Models
• Economic Model is an organized set of relationships that describes the functioning of an economic entity
under a set of simplifying assumptions.
• Econometric models contain a random element which is ignored by mathematical economic models.
Example: Economic theory postulates that demand for a good depends on its price, on the prices of other related
commodities, on consumers’ income and on tastes.
• Economic model (exact relationship): Q = f(P, P0, Y, t)
• Econometric model (with a random disturbance): Q = f(P, P0, Y, t) + u
(where P is the good’s own price, P0 the prices of related commodities, Y consumers’ income, t tastes and u the random disturbance)
Methodology of Econometrics
• Starting with the postulated theoretical relationships among economic variables, econometric research or
inquiry generally proceeds along the following lines/stages.
• Specification of the model
• Estimation of the model
• Evaluation of the estimates
• Evaluation of the forecasting power of the estimated model
Specification of the model
• In this step the econometrician has to express the relationships between economic variables in mathematical
form. This involves determining:
1. The dependent and independent (explanatory) variables which will be included in the model.
Dependent variable = f (independent variables)
2. A priori theoretical expectations about the size and sign of the parameters of the function.
3. The mathematical form of the model (number of equations, specific functional form of the equations (linear or
non-linear), etc.)
• The most common errors of specification are:
a) Omissions of some important variables from the function.
b) The omissions of some equations (for example, in simultaneous equations model).
c) The mistaken mathematical form of the functions.
Estimation of the model
• This stage includes;
• Gathering of the data on the variables included in the model.
• Examination of the identification conditions of the function (especially for simultaneous
equations models).
• Checking for existence of any problems in the data such as aggregations problems.
• Examination of the degree of correlation between the explanatory variables (i.e. examination
of the problem of multicollinearity).
• Choice of the appropriate econometric technique for estimation, i.e. deciding on a specific econometric method to be applied
in estimation, such as OLS, logit, probit, fixed effects, random effects and so on.
Evaluation of the estimates
• This stage consists of deciding whether the estimates of the parameters are theoretically meaningful and
statistically satisfactory.
Evaluation criteria involves:
• Economic a priori criteria: refer to the size and sign of the parameters
• Statistical criteria (first-order tests): evaluate the statistical reliability of the estimates of the parameters
• Econometric criteria (second-order tests): Econometric criteria aim at the detection of the violation or
validity of the assumptions of the various econometric techniques.
Evaluation of the forecasting power of the model
• The model may be economically meaningful and statistically and econometrically correct; yet it may not be
suitable for forecasting due to various factors (reasons).
• Therefore, this stage involves the investigation of the stability of the estimates and their sensitivity to
changes in the size of the sample.
• Consequently, we must establish whether the estimated function performs adequately outside the sample of
data, i.e. we must test the extra-sample (out-of-sample) performance of the model.
Goals of Econometrics
• Three main goals of Econometrics are identified:
– Analysis i.e. testing economic theory
– Policy making i.e. Obtaining numerical estimates of the coefficients of economic relationships for
policy simulations.
– Forecasting i.e. using the numerical estimates of the coefficients in order to forecast the future values
of economic magnitudes.
The Sources, Types and Nature of Data
• There are several types of data with which economic/econometric analysis can be done.
– Cross-sectional data: collected from different parties or entities at a given point in time.
– Time series data: consists of observations on a variable or several variables over time.
– Panel or longitudinal data: consists of a time series for each cross-sectional member in the data set.
The basic linear regression model describing the relationship between Y and X is given by:

Yi = β0 + β1Xi + εi ,   i = 1, 2, ..., n   (1.1)

Equation (1.1) represents the classical linear regression model with a single regressor. Y is the dependent
(explained) variable and X is the independent (explanatory) variable. The constants β0 and β1 are the
parameters or coefficients of the model: β0 is the intercept and β1 is the slope. The slope β1 is the change in
Y associated with a one-unit change in X. The intercept β0 is the value of the regression line (Y) when
X = 0, the point at which the regression line intersects the Y axis. The random component of the model
(referred to more commonly as the error term), which represents the other factors (variables) not included
in the model but that determine the dependent variable (Y), is represented by ε. The subscript i indexes the
unit of observation. The coefficients describe both the direction and magnitude of the relationship between
the dependent variable, Y, and the independent variable, X.
Table: income–consumption data for eight individuals

Individual (i):          1     2     3     4     5     6     7     8
Consumption (Birr), Y:   4     3     3.5   2     3     3.5   2.5   2.5
Income (Birr), X:        21    15    15    9     12    18    6     12
Figure 1.1: Representation of income–consumption data by different lines
As you can see, many straight lines can be drawn (chosen) to fit the points, say in this case L1, L2
and L3. But, if we want to represent (fit) all the points by a single line, which line fits best to the
scattered points? There exists a procedure to get the line of best fit involving deviations and
squared deviations. The line of best fit is the line that minimizes the sum of squared deviations of
the points of the graph from the points of the straight line. The deviations are measured by the
vertical distance between the straight line and the scattered points of the graph. The process of
fitting the line that fits best the scattered points using the principle of minimum sum of squared
deviations is called the Ordinary Least Squares (OLS).
Figure 1.2: Deviations of the scattered points (from Line 3)
The coefficients β0 and β1 are unknown, but we can use the Least Squares procedure to estimate
them. Figure 1.3 shows how the parameters of the linear regression model are estimated using the
Least Squares method. In order to do this, we must use the data to estimate the unknown
slope (β1) and intercept (β0) of the regression line.
Figure 1.3: Constructing the regression line using Least Squares (the scatter of observed points around the fitted line Ŷ = β̂0 + β̂1X, with the deviation Yi − Ŷi shown for a typical observation Xi)
To estimate the coefficients through minimizing the sum of squared deviations, we go through
the following process, where Ŷi = β̂0 + β̂1Xi represents the equation of the fitted straight line with
intercept β̂0 and slope β̂1. In this notation, Yi is the actual value of Y for observation i and Xi is the
corresponding value of X for that observation, while n is the number of observations. Ŷi, called the
fitted (predicted or estimated) value, is the value of Y on the straight line associated with observation
Xi. The deviation (residual) is calculated by subtracting the fitted value (Ŷi) from the actual value (Yi).
Thus, for each observation on X there is a corresponding deviation of the fitted value from the actual
value of Y. Our objective is to minimize the sum of squared deviations, and we can use elementary
calculus to derive the coefficient estimates: we take partial derivatives with respect to β̂0 and β̂1, set
each first-order condition equal to zero, and solve the resulting simultaneous equations for the
coefficients.
Minimize  Σ (Yi − Ŷi)² = Σ (Yi − β̂0 − β̂1Xi)²   (1.2)

∂/∂β̂0 Σ (Yi − β̂0 − β̂1Xi)² = −2 Σ (Yi − β̂0 − β̂1Xi) = 0   (1.3)

∂/∂β̂1 Σ (Yi − β̂0 − β̂1Xi)² = −2 Σ (Yi − β̂0 − β̂1Xi) Xi = 0   (1.4)

(all sums running over i = 1, ..., n)
By rearranging equations (1.3) and (1.4) we obtain a pair of simultaneous equations, called
Normal Equations, given below.
Σ Yi = n β̂0 + β̂1 Σ Xi

Σ XiYi = β̂0 Σ Xi + β̂1 Σ Xi²

Now we can solve for β̂0 and β̂1 simultaneously (for example, by multiplying the first normal equation by
Σ Xi and the second by n and subtracting), which gives

β̂1 = [n Σ XiYi − Σ Xi Σ Yi] / [n Σ Xi² − (Σ Xi)²]

β̂0 = (Σ Yi − β̂1 Σ Xi) / n = Ȳ − β̂1X̄
Having obtained the coefficients of the model ( b 0 and b1 ), they need to be incorporated into the
model and the standard errors of the respective coefficients have to be estimated as well and
fitted within the model as follows:
To calculate the standard errors of each coefficient, we first need the estimated
population variance (σ̂²)1. The estimate of the population variance is calculated as follows:

σ̂² = Σ êi² / (n − 2) = Σ (Yi − β̂0 − β̂1Xi)² / (n − 2)   (1.5)

The standard errors of the estimated coefficients are then

s.e.(β̂1) = sqrt[ σ̂² / Σ(Xi − X̄)² ]    and    s.e.(β̂0) = sqrt[ σ̂² ΣXi² / (n Σ(Xi − X̄)²) ]
Notice that the standard errors of the coefficients (s.e.(β̂1) and s.e.(β̂0)) measure the dispersion of the
estimates about their means. How do they differ from s = √σ̂²? Remember that s is the standard
error of the regression, which measures the dispersion of the error term around the
regression line.
The formulae will be less complicated if we write the Least Square estimates in terms of
variables that are expressed as deviations from their respective sample means. Hence, we
transform the data to deviations form by expressing each observation of X and Y in terms of
deviations from their respective means. The deviations form is represented by lower-case letters:

xi = Xi − X̄   and   yi = Yi − Ȳ
Summing the population regression function, Yi = b 0 + b1 X i + e i over all n observations and
dividing it by n , we find that
1 Notice that the square root of σ̂² (that is, s) is what Stata reports as the Root MSE in its regression output.
Example 1.1: manual estimation of coefficients using OLS

Obs (i)    Yi     Xi     Yi − Ȳ    Xi − X̄    Xi²     XiYi
1          4      21      1         7.5      441      84
2          3      15      0         1.5      225      45
3          3.5    15      0.5       1.5      225      52.5
4          2       9     -1        -4.5       81      18
5          3      12      0        -1.5      144      36
6          3.5    18      0.5       4.5      324      63
7          2.5     6     -0.5      -7.5       36      15
8          2.5    12     -0.5      -1.5      144      30
Sum        24    108      0         0       1620     343.5

(with Ȳ = 24/8 = 3 and X̄ = 108/8 = 13.5)

1. β̂1 = [n ΣXiYi − ΣXi ΣYi] / [n ΣXi² − (ΣXi)²]
      = [8(343.5) − (24)(108)] / [8(1620) − (108)²]
      = (2748 − 2592) / (12960 − 11664) = 156/1296 ≈ 0.12

2. β̂0 = (ΣYi − β̂1 ΣXi) / n = (24 − 0.12 × 108) / 8 = (24 − 12.96) / 8 = 1.38

So the fitted (estimated) regression line of the income–consumption data is written as Ŷ = 1.38 + 0.12X, with the
standard errors of the estimated coefficients reported in parentheses beneath them (the slope’s standard error is
0.026). That is how a fitted regression line is normally presented.
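The same estimates can be checked in Stata. The following minimal sketch simply types in the eight observations from the table above and runs OLS; the reported values should match the manual calculation up to rounding:

clear
input Y X
4 21
3 15
3.5 15
2 9
3 12
3.5 18
2.5 6
2.5 12
end
regress Y X
* the slope on X should be about 0.12 and the constant about 1.38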
Continuing from before Example 1.1 (summing the population regression function over all n observations and dividing by n):

(1/n) Σ Yi = β0 + β1 (1/n) Σ Xi + (1/n) Σ εi   ⇒   Ȳ = β0 + β1X̄ + ε̄

Subtracting this averaged equation from the population regression function and combining like terms gives

Yi − Ȳ = β1(Xi − X̄) + (εi − ε̄)   ⇒   yi = β1xi + (εi − ε̄)   (1.6)
Equation (1.6) represents the regression function in deviations form; notice that the intercept,
β0, drops out. From equation (1.6), the estimated slope of the regression line is given by

β̂1 = Σ xiyi / Σ xi²
If you calculate the value of b1 in Example 1.1 using this formula, the result would be exactly the
same.
1. The model is linear in parameters, without regard to the linearity of the dependent and
independent variables. The relationship between X and Y is linear.
2. The error term, ε, is a random variable (taking real values):
i. With zero mean: the expected value of the error term is zero given any value of the
explanatory variable, X.
E(εi) = 0   ⇒   E(εi | Xi) = 0   for all i
2 Assumption is a statement that is assumed (considered to be true) and taken for granted without
proof and from which a conclusion can be drawn.
Thus, observing a high or a low value of X does not imply a high or a low value of ε. This
effectively means X and ε are uncorrelated. The implication is that changes in X are not
associated with changes in ε in any particular direction. Hence, the associated changes in Y can
be attributed to the effect of X. This assumption allows us to interpret the estimated coefficients
as reflecting causal effects of X on Y.
ii. With constant variance (homoskedastic or homoscedastic distribution)
Var(εi) = E[εi − E(εi)]² = E(εi²) = σ²
3. The error terms from any two observations are uncorrelated with each other, which implies
there is no autocorrelation (no serial correlation). When the observations are drawn
sequentially over time (time series data), we say that there is no serial correlation or no
autocorrelation. When the observations are cross sectional (survey data) we say that we have
no spatial correlation.
Cov(εi, εj) = 0   for all i, j, i ≠ j
4. The error term ( e ) is independent of the independent variables, X ’s. It follows that there is
no correlation between the random variable ( e ) and the explanatory variable ( X ). If two
variables are unrelated, then their covariance is zero.
Cov(εi, Xi) = 0   for all i
5. The variance of the independent variable X must be non-zero.
Var( X i ) > 0
This is a crucial requirement. To identify the effect of X on Y , it must be that we observe
situations with different values of X . In the absence of such variability, there is no information
about the effect of X on Y . It means that the values of the independent variable ( X ) should not
be constant.
6. The error term has a normal distribution with mean zero and constant variance.
εi ~ N(0, σ²)
If conditions 1–6 hold, OLS gives the Best Linear Unbiased Estimator (BLUE). We have an
unbiased estimator when, over repeated samples, the estimator on average gives us the true population
parameter. We have an efficient (best) estimator when we have an unbiased estimator that yields the
smallest variance for the coefficients, the β̂'s. We have a consistent estimator when the OLS estimate
approaches the true population parameter as the sample size grows.
A good starting point for this section is to notice that the Least Square estimates result from a
specific sample of observations of dependent and independent variables. The coefficients
produced from Least Squares are based on a single sample. It follows that the estimates may vary
from sample to sample. Remember also that the estimates of Least Square ( bˆ 0 and bˆ1 ) refer not
only to regression estimates from a specific sample but are also used to make inferences about
the population from which the sample is drawn (i.e. the estimator or formula which is also used
to compute the estimates from many different samples).
a) Unbiasedness
We want our estimator to be unbiased. Remember that there actually exist true values of the
coefficients for population regression function, which of course we do not know about. These
reflect the true underlying relationship between Y and X . We want to use a technique to
estimate these true coefficients. Our results will only be approximations to reality. An unbiased
estimator is such that the average of the estimates, across an infinite set of different samples of
the same size n, is equal to the TRUE value. This means that on average the estimator β̂ is
correct, even though any single estimate of β̂ for a particular sample of data may not equal β.
Mathematically, an unbiased estimator satisfies

E(β̂) = β   ⇔   E(β̂ − β) = 0

In other words, the average or expected value of β̂ is equal to the true value of β.
Table 1.2: illustration of an unbiased estimator
+--------------------------------------------------+
| Samples                        b̂0          b̂1      |
|--------------------------------------------------|
| Sample 1 1.21851 1.584188 |
| Sample 2 .82502 2.5564 |
| Sample 3 1.375252 1.32566 |
| Sample 4 .9216356 2.106887 |
| Sample 5 1.056685 2.11987 |
|--------------------------------------------------|
| Sample 6 1.048275 1.818525 |
| Sample 7 .9140797 1.657301 |
| Sample 8 .7885023 2.957194 |
| Sample 9 .658188 2.293599 |
| Sample 10 1.085249 2.345555 |
|--------------------------------------------------|
| Average across 10 samples .9891397 2.076518 |
| Average across 500 samples .9899374 2.004986 |
+--------------------------------------------------+
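Table 1.2 can be mimicked with a small simulation. The sketch below repeatedly draws samples, estimates the coefficients by OLS and averages them; the averages should be close to the true values, which is what unbiasedness means. The true values β0 = 1 and β1 = 2, the sample size of 50 and the 500 replications are arbitrary choices for illustration.

capture program drop onedraw
program define onedraw, rclass
    drop _all
    set obs 50
    gen x = 10*runiform()
    gen y = 1 + 2*x + rnormal()     // true model with b0 = 1, b1 = 2
    regress y x
    return scalar b0 = _b[_cons]
    return scalar b1 = _b[x]
end
simulate b0=r(b0) b1=r(b1), reps(500) nodots: onedraw
summarize b0 b1                      // the means should be close to 1 and 2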
An estimator is efficient if within the set of assumptions that we make, it provides the most
precise estimates in the sense that the variance is the lowest possible in the class of estimators we
are considering. How do we choose between the OLS estimator and any other unbiased
estimator? Our criterion is efficiency.
Var(β̂) ≤ Var(β̃)   for any other linear unbiased estimator β̃
The variance of an estimator is an inverse measure of its statistical precision, i.e., of its
dispersion or spread around its mean. The smaller the variance of an estimator, the more
statistically precise it is. A minimum variance estimator is therefore the statistically most precise
estimator of an unknown population parameter, although it may be biased or unbiased. Among
all the linear unbiased estimators, which one has the smallest variance? It is OLS. Thus,
efficiency which includes unbiasedness and minimum variance characteristics is also another
desirable property of an estimator.
iii. Consistency
An estimator is said to be consistent if β̂ approaches its true value β as the sample size gets
larger and larger (approaches infinity). More formally, β̂ is a consistent estimator of β if the
probability limit of β̂ is β. Given assumptions 1–6, the Ordinary Least Squares (OLS) estimator
is the Best Linear Unbiased Estimator (BLUE). This means that the OLS estimator is the
most efficient (least-variance) estimator in the class of linear unbiased estimators. This is known
as the Gauss-Markov Theorem.
The t-test analyzes the significance of each coefficient. Apart from the t-test, we can also test whether the
overall model is good, in the sense that the independent variables jointly explain the variation in the dependent
variable (stated otherwise, whether the coefficients of the model are jointly equal to zero). The
F-test enables us to perform this kind of test:

F(K−1, n−K) = Explained variance / Unexplained variance = [ESS/(K−1)] / [RSS/(n−K)] = [R²/(K−1)] / [(1−R²)/(n−K)]

which, in the two-variable model, reduces to β̂1² Σxi² / σ̂².

The multiple linear regression model is given by:

Yi = β0 + β1X1i + β2X2i + β3X3i + ... + βkXki + εi   (1.8)
where Y is the dependent variable, the X ' s are the independent variables and e is the error term.
This model is an extension of the two-variable model and the mechanics of the method of least
squares through minimizing the sum of squared deviations works in the same way. The multiple
regression model has to satisfy additional assumptions, in addition to the 6 assumptions specified for the
two-variable model. These additional assumptions, and the problems that arise when they fail, are discussed below.
1.6.1 Multicollinearity
One of the assumptions of the multiple regression models is that there is no exact linear
relationship between any of the independent variables in the model. If such exact linear
relationship exists, we say that the independent variables are perfectly collinear. Explanatory
variables are rarely uncorrelated with each other suggesting that multicollinearity is only a matter
of degree. As an illustration, let us assume that a dependent variable Y (say, grade point average)
is explained by the following independent variables:
Yi = b 0 + b1 X 1i + b 2 X 2i + b3 X 3i + e i
X 1 = family income, thousands of birr per year
X 2 = average hours of study per day
X 3 = average hours of study per week
It is easy to see that variables X 2 and X 3 are perfectly collinear because X 3 =7 X 2 for each
observation unit. When such an exact relationship exists among independent variables, it will be
impossible to calculate the least square estimates. The term multicollinearity is broadly used to
include the case of (very) high collinearity among two or more independent variables. In
practice, we are often faced with the more difficult problem of having independent variables with
a high degree of multicollinearity. It is unusual for there to be an exact relationship among the
explanatory variables in a regression. When this occurs, it is typically because there is a logical
error in the specification. In practical applications in general, multicollinearity is said to occur
when two or more independent variables are highly (but not perfectly) correlated with each
other. If there is no perfect collinearity, estimation of the coefficients is possible, but the
interpretation of the coefficients would be very difficult. A given change in one of the highly
correlated variables is likely to be accompanied by a change in the other independent variable in a
predictably similar fashion. It is then difficult to attribute the change in the dependent variable to either of the
highly correlated variables, hence the difficulty of interpreting the coefficients of
these variables. That is why it is difficult to separate the effect of each independent variable.
When multicollinearity is present, the estimator remains unbiased but the estimated variances of the
coefficients (the β̂'s) are very large. This implies that the reliance we can place on any one of them will
be small. Large variances mean large standard errors, and the confidence intervals tend to be much
wider, leading to the acceptance of the null hypothesis when it is in fact false. Large standard
errors can result in very small values of computed t - ratio resulting in statistically insignificant
coefficients when tested individually. Despite the small t - ratios and statistically insignificant
coefficients, R 2 measuring goodness of fit of the model can be (very) high. Another effect is that
in the presence of multicollinearity, OLS estimators and their standard errors can be sensitive to
small change (even the slightest change) in the data.
Indications of multicollinearity
An estimated model with high standard errors and low t - ratios could be an indicative of
multicollinearity, but it could alternatively suggest that the underlying model is poorly specified.
How can we tell (test) the presence of multicollinearity?
1. A relatively high R 2 in a regression model with few significant t - ratios is an indicator
of multicollinearity. In fact, it is possible that the F - statistic for the regression model is
highly significant and none of the individual coefficients is significant.
2. Relatively high pair-wise correlations between two or more of the independent variables
may indicate multicollinearity. Testing the presence of multicollinearity using solely pair-
wise correlations should be made carefully.
3. Formal tests such as VIF (Variance Inflation Factor) and TOL (Tolerance): although these
tests are not universally accepted, they may help in identifying the presence of
multicollinearity. VIF shows how much the variance of an estimator is inflated by the presence
of multicollinearity. The rule of thumb is that if VIF is larger than 10, multicollinearity is
a serious problem (the variable is said to be highly collinear); a check along these lines is sketched below.
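As a sketch of such a check, using Stata's built-in auto dataset (where weight, length and displacement are strongly correlated), the VIFs can be inspected right after the regression:

sysuse auto, clear
regress mpg weight length displacement
estat vif        // VIF values above 10 would flag seriously collinear regressors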
As partly remedial measures, increasing the sample size may reduce the problem of
multicollinearity as covariance of two independent variables is inversely related to the sample
size. Dropping one of the collinear variables may also reduce the problem; however we have to
be careful not to drop an important variable because it can result in specification error or biased
coefficients. Including a new variable may also reduce the problem of multicollinearity.
Combining the two collinear variables (whose unit of measurement is the same) into one index
can also reduce the problem of multicollinearity.
The problem of omitted variable bias arises as a result of excluding important and relevant
variables from a model. Omitting an independent variable which has an impact on the dependent
variable and is correlated with the included independent variables leads to omitted variable bias.
Omitting a relevant variable thus results in biased estimates. We may be forced to omit a relevant
variable for the following reasons: ignorance, lack of data about the specific variable, incomplete
observations, multicollinearity problems and etc.
Now suppose that we omit a variable that actually belongs in the true (or population) model. This
is often called the problem of excluding a relevant variable or under-specifying the model (or,
misspecification). Imagine that the true relationship between Y and X ’s is
Yi = β0 + β1X1i + β2X2i + εi

But suppose one estimates the following model instead (i.e., the relevant independent variable X2 is excluded
from the model):

Yi = β0 + β1X1i + εi*   (1.9)

The variable X2 as a result goes into the error term (random component) of the model, where
εi* = β2X2i + εi, and when this is so

E(εi*) = E(β2X2i + εi) = E(β2X2i) + E(εi) = β2X2i ≠ 0
It follows that excluding a relevant variable X 2 results in biased estimates for the included
variable, X 1 (i.e. b1 is biased).
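A quick illustration of the direction of the problem, again on Stata's built-in auto data (treating the regression with both variables as the "true" specification purely for the purpose of the example): dropping a regressor that is correlated with the included one visibly shifts the included coefficient.

sysuse auto, clear
regress price weight foreign      // both regressors included
regress price weight              // foreign omitted: the coefficient on weight changes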
Goodness of fit measures and hypothesis testing ( R 2 , testing each b using t - test and
F - test for joint testing) are just ways of making sure if our model reliably predicts the variation
in the dependent variable. The results from these post-estimation techniques are meaningful if
and only if we estimated the model with the right functional form. So, a model that includes the
appropriate independent variables may be mis-specified because the model may not reflect the
algebraic form of the relationship between the dependent and independent variables. As
indicated earlier, theory is often silent over whether a model should be estimated in level terms,
as a log-linear structure, as a polynomial in one or more of the independent variables, or in
logarithms.
For instance, suppose that the true model specifies a nonlinear relationship between Y and X ' s –
such as a polynomial relationship–and we omit the squared term. Doing so would be mis-
specifying the functional form. Likewise, if the true model expresses a constant-elasticity
relationship, the model fitted based on logarithms of Y and X could render conclusions different
from those of a model fitted in level terms of the variables. Thus, in a misspecification of the
functional form, we have all the appropriate variables at hand and we only have to choose the
appropriate algebraic form in which they enter the regression function. Ramsey’s regression
specification error test (RESET), implemented by Stata’s estat ovtest3, can provide useful
information about the problem of functional misspecification and omitted variable bias for linear
models. Ramsey’s RESET runs an augmented regression that includes the original regressors,
powers of the predicted values from the original regression and powers of the original regressors.
Usually, the powers of the predicted values are 2 or 3. Using an F-test, we then test the null
hypothesis of no misspecification (H0: α2 = α3 = ... = 0, where the α’s are the coefficients on the
added powers). If the null hypothesis of no misspecification holds, our model has no omitted
variables and no misspecification of functional form.
Consider the following Stata output and the subsequent RESET test. Only the powers of the
predicted values are used in the Stata output below. Moreover, Stata’s default use of 3 power
levels was used.
. regress wage hours iq kww educ exper age famsize meduc
3
In Stata version 8 and below, this command is only ovtest.
Source | SS df MS Number of obs = 857
-------------+------------------------------ F( 8, 848) = 25.87
Model | 27852695.7 8 3481586.96 Prob > F = 0.0000
Residual | 114113060 848 134567.288 R-squared = 0.1962
-------------+------------------------------ Adj R-squared = 0.1886
Total | 141965756 856 165847.846 Root MSE = 366.83
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hours | -3.390489 1.757339 -1.93 0.054 -6.839733 .0587553
iq | 3.907971 1.046046 3.74 0.000 1.854829 5.961114
kww | 7.105818 2.12599 3.34 0.001 2.932998 11.27864
educ | 43.57976 7.97551 5.46 0.000 27.9257 59.23381
exper | 12.54908 4.003661 3.13 0.002 4.690833 20.40733
age | 7.069505 5.356417 1.32 0.187 -3.443885 17.58289
famsize | -2.150576 5.995267 -0.36 0.720 -13.91788 9.616727
meduc | 10.93596 4.917083 2.22 0.026 1.28488 20.58704
_cons | -612.0657 193.4389 -3.16 0.002 -991.7408 -232.3906
------------------------------------------------------------------------------
For the above linear regression function of the wage variable and independent variables (level–
level form), the following RESET test is produced.
estat ovtest
The F - test from Stata’s RESET output shows the null hypothesis of no misspecification is
rejected ( F * = 3.04 is greater than Fc = 2.64 ). Notice that the above test incorporates the powers
of the predicted values only, which is Stata’s default. However, the powers of the regressors can
also be used. The RESET test including the powers of the original regressors is presented as
follows:
estat ovtest,rhs
Ramsey RESET test using powers of the independent variables
Ho: model has no omitted variables
F(24, 824) = 1.56
Prob > F = 0.0419
We can reject RESET’s null hypothesis of no omitted variables for the model. Both of these
tests therefore indicate misspecification (omitted variable bias or functional-form misspecification).
Let us run the regression again including an interaction variable
expeduc (= educ*exper), created using Stata’s generate command. In the Stata output below,
the interaction variable (expeduc) is evidently significant, so a model excluding that term can
be considered mis-specified, which is also in line with the RESET test given below.
regress wage hours iq kww educ exper age famsize meduc expeduc
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hours | -3.275315 1.752191 -1.87 0.062 -6.714461 .1638299
iq | 3.861757 1.042794 3.70 0.000 1.814994 5.908521
kww | 6.529982 2.130969 3.06 0.002 2.347383 10.71258
educ | 2.517139 17.90253 0.14 0.888 -32.62138 37.65566
exper | -40.03575 20.92583 -1.91 0.056 -81.10832 1.036823
age | 9.190347 5.402864 1.70 0.089 -1.414225 19.79492
famsize | -1.675904 5.97861 -0.28 0.779 -13.41053 10.05872
meduc | 10.62493 4.902567 2.17 0.030 1.002327 20.24754
expeduc | 3.979945 1.55473 2.56 0.011 .9283689 7.031521
_cons | -101.9723 277.2744 -0.37 0.713 -646.1978 442.2532
------------------------------------------------------------------------------
estat ovtest
The calculated F-value (which is equal to 1.97) is less than the F-critical (2.65) which means that
we cannot reject the hypothesis of no misspecification (you can alternatively check the F–test p-
value). Notice that in this particular model specification, though the interaction term is
significant, the variable educ has become insignificant whereas the exper variable is only
significant at 10%.
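Since the wage data used above are not distributed with this module, the same workflow can be sketched on Stata's built-in auto data; the interaction variable wtdisp below is purely illustrative.

sysuse auto, clear
regress mpg weight displacement
estat ovtest                        // RESET with powers of the fitted values (Stata default)
estat ovtest, rhs                   // RESET with powers of the regressors
gen wtdisp = weight*displacement    // hypothetical interaction term
regress mpg weight displacement wtdisp
estat ovtest                        // re-check after re-specifying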
1.6.4 Heteroskedasticity
There are often instances in econometric modeling where the assumption of homoskedastic
variance of the error term is unreasonable (the constant variance assumption of the error terms
fails). As an illustration, consider a cross section of firms in one industry. Error terms associated
with large firms may have larger variances than the error terms of the smaller firms in the
industry. Another example is the cross section of household income and expenditure. It seems
reasonable to observe that low-income households would spend rather steadily, while the
spending pattern of high-income households would be more volatile. This suggests that in a model
where expenditure is the dependent variable, the error variance associated with high-income
households would be greater than the error variance of low-income households. This
phenomenon of non-constant variance of the error terms is known as heteroskedasticity: a
systematic pattern of the error terms in which the variances of the error terms are not constant.
The case of heteroskedasticity therefore means that the variance of the error term can vary
among observation units (individuals, households, firms, farms, etc.):

Var(εi) = σi²

Heteroskedasticity in particular can also occur if the variance of the errors is a function of
explanatory variables:

Var(εi) = σi² = f(Xi)
In figure 1.4, the presence or absence of heteroskedasticity is demonstrated. Panel (a) shows the
case of homoskedasticity (constant variance). Panels (b), (c) and (d) illustrate cases of
heteroskedasticity. In panel (b), the variance (σi²) is observed to be high in the middle of the
data. In panel (c), the variance of the error term is observed to be higher for low values of the X
variable (i.e. income). Panel (d) shows the case of error variance increasing with the variable X
(the variance of the error term increases with income).
But why does heteroskedasticity occur? Some of the reasons why the variances of the error
terms may not be constant include:
(1) Error-learning models: as people learn, their errors of behavior become smaller, in which
case the variance of the error terms (σi²) is expected to decrease.
(2) As data collection techniques improve, the error variance (σi²) is also expected to decrease.
(3) The presence of outliers may also result in heteroskedasticity.
(4) Heteroskedasticity may also arise because some important variables are omitted
from the model or the wrong functional form has been used.
Figure 1.4: Patterns of the error terms plotted against income. Panel (a): homoskedastic pattern of errors; panels (b), (c) and (d): heteroskedastic errors.
Under heteroskedasticity, the usual OLS formulae give biased estimators of the true variances (standard errors) of the
estimated parameters (the formulae used to estimate the standard errors of the coefficients are no longer correct). In that
case, the usual t-tests and F-tests will be misleading for drawing inferences (conclusions).
When heteroskedasticity is present, Ordinary Least Squares estimation also places more weight on the
observations with large error variances than on those with smaller variances.
Panel (a) in figure 1.5 indicates homoskedastic error terms, as there is no systematic pattern
between the residuals (êi) and the independent variable (X). Panels (b)–(f) indicate systematic
relationships between the residuals (êi) and X or Ŷ, which suggest the presence of
heteroskedasticity in the data. Visual inspection of the data and of such plots can therefore
tell whether a systematic pattern is present. If there is a constant spread of the residuals
across all values of X, it is an indication of homoskedastic error variance. It is therefore always
a good idea to inspect the data before going to the more formal tests for the presence of
heteroskedasticity. Some of the formal tests for heteroskedasticity are the White test, the Goldfeld-
Quandt test and the Breusch-Pagan test.
Figure 1.5: Detecting heteroskedasticity using plots of the residuals (êi) against the explanatory variable Xi (panels (a)–(e)) and against the fitted values Ŷi (panel (f))
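A sketch of both the informal and the formal checks, using Stata's built-in auto data (the variable choices are arbitrary):

sysuse auto, clear
regress mpg weight foreign
rvfplot                 // residual-versus-fitted plot for visual inspection
estat hettest           // Breusch-Pagan test (based on fitted values by default)
estat hettest, rhs      // Breusch-Pagan test using the right-hand-side variables
estat imtest, white     // White's general test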
The Breusch-Pagan test
If the regression model we have is given by

Yi = β0 + β1X1i + β2X2i + β3X3i + εi
Assume that the error variance σi² is described as a function of Zi variables (where some or all
of the X's can serve as Z's), given by

σi² = f(α0 + α1Z1i + α2Z2i + ... + αkZki)

in which case σi² is a linear function of the Z's. The test proceeds as follows:

1. Estimate the original model by OLS and obtain the residuals, êi.
2. Obtain the estimate of the error variance, σ̂² = Σêi²/N (see footnote 4).
3. Compute the normalized residuals squared, êi²/σ̂².
4. Run the regression of the normalized squared residuals on the Z variables (the auxiliary
regression):

êi²/σ̂² = α0 + α1Z1 + α2Z2 + α3Z3 + ... + αkZk + ui

5. Obtain the test statistic, which is equal to ½ × ESS from this auxiliary regression.
6. Assuming the error terms (εi) are normally distributed, one can show that if there is
homoskedasticity the test statistic is asymptotically distributed as Chi-square with
K − 1 degrees of freedom.
4 Remember that this formula for the estimated residual variance is the maximum likelihood
estimator. The estimated residual variance using OLS is σ̂² = Σêi²/(N − K).
7. Given the following hypotheses

H0: σi² = σ²  (homoskedastic)
H1: H0 is not true  (heteroskedastic)

the null hypothesis of homoskedasticity is rejected if the test statistic (½ × ESS) is larger than the
Chi-square critical value (χ²(1−α, K−1)).
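The steps above can be carried out by hand in Stata. The sketch below uses the auto data, with the regressors themselves serving as the Z variables, and computes the ½ × ESS statistic directly:

sysuse auto, clear
regress mpg weight foreign
predict e, residuals
gen e2 = e^2
quietly summarize e2
gen g = e2/r(mean)              // e_i^2 / sigma-hat^2, with sigma-hat^2 = sum(e^2)/N
regress g weight foreign        // auxiliary regression on the Z's
display "BP statistic = " e(mss)/2   // half the explained sum of squares; compare with a chi-square critical value (2 df here)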
Goldfeld-Quandt test for heteroskedasticity
This method of testing for heteroskedasticity is applicable if one assumes that the heteroskedastic
variance is positively related to one of the explanatory variables in the regression model. Assume
that we are considering the two-variable model:
Yi = β0 + β1Xi + εi   (1.10)

Suppose the variance σi² is positively related to Xi as

σi² = σ²Xi²
The Goldfeld-Quandt test procedure involves the calculation of two least-squares regression
lines, one using data thought to be associated with low variance errors and the other using data
thought to be associated with high variance errors. If the residual variances associated with each
regression line are approximately equal, the homoskedastic assumption should not be rejected,
but if the residual variance increases substantially, it is possible to reject the null hypothesis. The
test is carried out in the following step-wise manner:
1. Order or rank the data (observations) according to the values of Xi, the variable that is
thought to be related to the error variance (begin with the lowest X value).
2. Omit the middle C observations (see footnote 5), where C is usually about one-fifth of the total sample size
(N). Next, divide the remaining (N − C) observations into two groups, each of which has
(N − C)/2 observations.

5 Notice that C must be small enough to ensure that sufficient degrees of freedom are available to
allow for the proper estimation of each separate regression.
3. Fit (estimate) separate regressions, the first for the portion of the data associated with low
values of X (indicated by subscript 1) and the second associated with high values of X
(indicated by subscript 2).
4. Calculate the Residual Sum Squares ( RSS ) associated with each regression: RSS1 ,
associated with low X ' s , and RSS 2 , associated with high X ' s .
5. The test statistic is λ = (RSS2/df) / (RSS1/df), where the degrees of freedom of each regression are
df = (N − C − 2K)/2 and K is the number of coefficients to be estimated in one of the
regressions. Assuming the errors are normally distributed, the test statistic follows an
F-distribution with (N − C − 2K)/2 degrees of freedom in both the numerator and the
denominator.
6. Given the following hypotheses

H0: σi² = σ²  (homoskedastic)
H1: H0 is not true  (heteroskedastic)

the null hypothesis of homoskedasticity is rejected at the chosen level of significance if the
calculated test statistic is larger than the critical value of the F-distribution.
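A manual sketch of the Goldfeld-Quandt procedure on the auto data (N = 74; ordering by weight and omitting the middle 14 observations so that each group keeps 30):

sysuse auto, clear
sort weight                      // step 1: rank by the variable thought to drive the variance
regress mpg weight in 1/30       // low-X group
scalar rss1 = e(rss)
regress mpg weight in 45/74      // high-X group (middle observations 31-44 omitted)
scalar rss2 = e(rss)
display "lambda = " rss2/rss1    // compare with the F critical value with (28, 28) df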
When the error variances are known, Weighted Least Squares provides an illustration of how
heteroskedasticity can be corrected. Unlike classical OLS, which treats all observations equally,
Weighted Least Squares assigns different weights to the observations using the known variances.
The weighted least squares estimation procedure (which can be derived from the maximum
likelihood function) can be illustrated using the two-variable model.
Yi = β0 + β1Xi + εi

If the heteroskedastic variances of the error terms are known and are given by

Var(εi) = σi²

then dividing each term of the equation by σi (i.e., weighting the original data) gives

Yi/σi = β0(1/σi) + β1(Xi/σi) + εi/σi

This equation can be written in the usual regression-model format as

Yi* = β0* + β1*Xi* + εi*   (1.11)
What is the purpose of transforming the original model (i.e. dividing it by σi)? To see this,
notice the following feature of the transformed error term, εi*:

Var(εi*) = E(εi*²) = E(εi/σi)² = (1/σi²)E(εi²) = (1/σi²)σi² = 1    (since σi² is known and E(εi²) = σi²)

The variance of the transformed disturbance term εi* is now constant, i.e. homoskedastic. Since
we are still retaining the other assumptions of the classical model, the finding that εi* is
homoskedastic suggests that if we apply OLS to the transformed model it will produce estimates
that are BLUE. In short, the estimated β0* and β1* are now BLUE, and not the OLS estimators
β0 and β1.
the OLS estimators b 0 and b1 . This procedure of transforming the original variables in such a
way that the transformed variables satisfy the assumptions of the classical model and then
applying OLS to them is known as the method of generalized least squares (GLS). In short, GLS
is OLS on the transformed variables that satisfy the standard least-squares assumptions. The
estimator thus obtained is known as GLS (WLS) estimator, and it is this estimator that is BLUE.
The appropriate estimators of β0* and β1* are then obtained by minimizing the residual sum of squares of the
transformed model:

Σ(êi*)² = Σ(Yi* − β̂0*  − β̂1*Xi*)²

Minimize  Σ[(Yi − β̂0 − β̂1Xi)/σi]²   ⇔   Minimize  Σ(Yi* − β̂0* − β̂1*Xi*)²   (1.12)

Partially differentiating with respect to the two coefficients and solving gives

β̂1* = [Σwi Σ(wiXiYi) − Σ(wiXi) Σ(wiYi)] / [Σwi Σ(wiXi²) − (Σ(wiXi))²],   where wi = 1/σi²   (1.13)
Since equation (1.12) minimizes a weighted Residual Sum Squares, it is appropriately known as
weighted least squares (WLS), and the estimators thus obtained and given in equation (1.13) are
known as WLS estimators. But WLS is just a special case of the more general estimating
technique, GLS. In the context of heteroskedasticity, one can treat the two terms WLS and GLS
interchangeably.
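In Stata, WLS can be run with analytic weights, which are treated as inversely proportional to the error variance. The sketch below assumes, purely for illustration, that the error variance is proportional to weight, so each observation is weighted by 1/weight:

sysuse auto, clear
gen w = 1/weight                         // assumed weights: Var(e_i) taken proportional to weight
regress price weight foreign [aweight = w]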
If there is an indication (for example from visual inspection of the plot of the residuals against
the X's) that the variance of the error terms (εi) is proportional to the square of the explanatory
variable X, one may transform the original model by dividing it through by Xi:

Yi/Xi = β0/Xi + β1(Xi/Xi) + εi/Xi   ⇔   Yi/Xi = β0(1/Xi) + β1 + ui   (1.14)

where ui is the transformed error term, equal to εi/Xi, and it is now easy to verify that

Var(ui) = E(ui²) = E(εi/Xi)² = (1/Xi²)E(εi²) = (1/Xi²)σ²Xi² = σ²    (since here E(εi²) = σ²Xi²)

Hence the variance of ui (= εi/Xi) is now homoskedastic, and one may proceed to apply OLS to
the transformed equation, regressing Yi/Xi on 1/Xi. Notice that in the transformed regression the
intercept term β1 is the slope coefficient of the original equation and the slope coefficient β0 is
the intercept term of the original model.

If it is believed that the variance of the error term (εi) is proportional to Xi rather than to Xi², then the original
model can instead be transformed by dividing it through by √Xi.
Apart from transforming the model to obtain homoskedastic error terms the following can also
be used as remedies for heteroskedasticity.
2. Heteroskedasticity-consistent standard errors (White standard errors)
Where re-specification will not solve the problem, use robust heteroskedasticity-consistent
(White) standard errors.
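In Stata this simply means adding the vce(robust) option, which leaves the coefficient estimates unchanged and replaces the conventional standard errors with White's heteroskedasticity-consistent ones (auto data used only for illustration):

sysuse auto, clear
regress price weight foreign                 // conventional OLS standard errors
regress price weight foreign, vce(robust)    // White (robust) standard errors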
1.6.8 Autocorrelation
The classical regression model includes one important assumption about the independence of the
error terms from observation to observation. This assumption is that the error terms from any two
observations are uncorrelated with each other, which implies that there is no autocorrelation (no
serial correlation):

Cov(εi, εj) = 0   ⇒   E(εiεj) = 0   for all i, j, i ≠ j
If this assumption is violated, the errors in one time period are correlated with their own values
in other periods and there is the problem of autocorrelation–sometimes referred to as serial
correlation. All time-series variables can exhibit autocorrelation, with the values in a given time
period depending on the values of the same series in previous periods. The assumption that errors
corresponding to different observations are uncorrelated often breaks down in time-series data.
When error terms from different (usually adjacent) time periods are correlated with each other,
we say that the error terms are serially-correlated or auto-correlated. Autocorrelation occurs in
time-series studies when errors associated with observations in a given time period carry over
into future time periods. Autocorrelation can be positive as well as negative, although most
economic time series generally exhibit positive autocorrelation. This is because most of them
either move upward or downward over extended time periods; and they do not exhibit constant
up-and-down movements. Incorrect functional form, omitted variables and an inadequate
dynamic specification of the model may all lead to the problem of autocorrelation.
When the error term in one period is related to the error term in the preceding period alone, we say that the
εt's follow a first-order autoregressive scheme, AR(1).
That is, in its simplest form, the errors in one period are related to those in the previous period by
a simple first-order autoregressive process. Thus, the error term in the simplest classical
regression model

Yt = β0 + β1Xt + εt   (1.16)

is assumed to depend upon its predecessor as follows:

εt = ρεt−1 + vt
When autocorrelation is present, the usual OLS standard errors are no longer correct, so the
t-values will also be affected. If there is positive autocorrelation, the standard errors will be
underestimated and the t-values will be biased upwards. The variance of the error term will also
be underestimated under positive autocorrelation, so that the R-squared will be exaggerated. Overall,
OLS is no longer BLUE. The F-test formula will also be incorrect. Forecasts based on the OLS
regression model will be inefficient (they will have larger variances than those from some other
techniques). We cannot make reliable inferences using the computed standard errors.
1.6.10 Detecting and testing for autocorrelation
A good starting point to detect autocorrelation is to visually inspect the plot between residuals
( e t ) and lagged residuals ( e t-1 ). The existence of systematic pattern between them indicates the
presence of autocorrelation.
(Plots: residuals et on the vertical axis against lagged residuals et−1 on the horizontal axis, with and without a systematic pattern.)
Based on the impression obtained from the graphical analysis of residuals and lagged residuals,
one can proceed to the formal test in order to make sure that autocorrelation in fact exists in a
particular regression model. There are some tests readily available in order to test for
autocorrelation.
Asymptotic tests
The OLS residuals provide useful information about the possible presence of autocorrelation in
the equation’s error term. A very good starting point in this case is to consider the regression of
the OLS residual ( e t ) upon its lag ( e t-1 ). This regression may be done with or without an
intercept, which might lead to marginally different results. This auxiliary regression not only
produces an estimate for the first-order autocorrelation coefficient but also routinely provides a
standard error for the estimate. In the absence of lagged dependent variables, the corresponding
t-ratio is asymptotically valid. In fact, the resulting test statistic can be shown to be
asymptotically equal to

t ≈ √T · ρ̂
which provides an alternative way of computing the test statistic. Consequently, at the 5%
significance level we reject the null hypothesis of no autocorrelation against a two-sided
alternative if |t| > 1.96. If the alternative is positive autocorrelation (ρ > 0), which is often
expected a priori, the null hypothesis is rejected at the 5% level if t > 1.64.
The Breusch-Godfrey test
This test for error autocorrelation is based on an auxiliary regression involving the residuals from
the original regression ( e t ), regressed on a set of lagged residuals, e t -s (up to order S ) and all
the variables which were used in the initial regression. Essentially, we are testing that the
coefficients of the lagged residuals in the auxiliary regression are all zero.
This alternative test is based on the R² of the auxiliary regression (including the intercept term).
If we take the R² of this regression and multiply it by the effective number of observations,
T − K, we obtain a test statistic that, under the null hypothesis, has a Chi-squared (χ²)
distribution with S degrees of freedom (the number of lagged residuals included). An R² close to zero
in this auxiliary regression implies that the lagged residuals are not explaining the current residuals, and a
simple way to test ρ = 0 is by computing the test statistic (T − K)R². If the test statistic is larger than the
Chi-squared critical value (χ²(α, S)), we reject the null hypothesis of no autocorrelation (ρ = 0). If the model
of interest includes a lagged dependent variable (or other explanatory variables that are
correlated with lagged error terms), the above tests are still appropriate provided that the
regressors Xt are included in the auxiliary regression.
Given the original model, the Breusch-Godfrey test for higher-order autoregressive processes
proceeds as follows.
1. Estimate the original model and obtain residuals
2. Run the regression of the residuals on original independent variables and lagged residuals
of order S , AR ( S )
êt = Xt′α + ρ1êt−1 + ρ2êt−2 + ... + ρSêt−S + vt
3. Obtain the R² from this auxiliary regression and calculate the test statistic (T − K)R².
4. If the test statistic is larger than the Chi-square critical value χ²(α, S), reject the null
hypothesis of no autocorrelation.
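A sketch of the test on simulated data with AR(1) errors (ρ = 0.7 and the other numbers are arbitrary choices); Stata's estat bgodfrey implements the auxiliary-regression version of the test after a time-series regression:

clear
set obs 100
set seed 123
gen t = _n
tsset t
gen x = rnormal()
gen v = rnormal()
gen e = v
replace e = 0.7*e[_n-1] + v if _n > 1    // build AR(1) errors observation by observation
gen y = 1 + 2*x + e
regress y x
estat bgodfrey, lags(1)                  // Breusch-Godfrey test against AR(1)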
The Durbin-Watson test
A popular test for first order autocorrelation is the Durbin-Watson test, which has a known small
sample distribution under some restrictive conditions. Some of these restrictions are the
regression model includes a constant; only applies for first order autocorrelation, and regression
does not include a lagged dependent variable. The Durbin-Watson test involves the calculation of
a test statistic based on the residuals from the Ordinary Least Squares regression. This test
statistic is famously known as Durbin-Watson, dw statistic.
dw = Σ(t=2 to T) (êt − êt−1)² / Σ(t=1 to T) êt²
Notice that the numerator cannot include a difference of the first observation in the sample since
no earlier observation is available. When successive values of ê t are close to each other, the dw
statistic will be low, suggesting the presence of positive autocorrelation. The dw statistic lies in
the range of 0 and 4, with a value near 2 indicating no first order autocorrelation. By making
several approximations, it is possible to show that dw = 2(1 − ρ̂). Thus, when there is no
autocorrelation (ρ = 0), the dw statistic will be close to 2. Positive autocorrelation is associated
with dw values below 2 and negative autocorrelation is associated with dw values above 2.
For hypothesis testing, there are upper (dU) and lower (dL) limits for the critical values of the
dw test statistic. These critical values dL and dU depend on the number of observations (T)
and the number of independent variables (K).
Decision zones for the dw statistic (which runs from 0 to 4): reject H0 (positive autocorrelation) for dw below dL; zone of indecision (inconclusive region) between dL and dU; accept H0 (no autocorrelation) between dU and 4 − dU; zone of indecision (inconclusive region) between 4 − dU and 4 − dL; reject H0 (negative autocorrelation) for dw above 4 − dL.
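On the same kind of simulated AR(1) data as in the Breusch-Godfrey sketch above, the dw statistic can be obtained directly after the regression; here it should fall well below 2, pointing to positive autocorrelation:

clear
set obs 100
set seed 123
gen t = _n
tsset t
gen x = rnormal()
gen v = rnormal()
gen e = v
replace e = 0.7*e[_n-1] + v if _n > 1
gen y = 1 + 2*x + e
regress y x
estat dwatson          // Durbin-Watson d statistic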
1. If it is pure autocorrelation, one can use appropriate transformation of the original model so
that in the transformed model we do not have the problem of (pure) autocorrelation. As in the
case of heteroskedasticity, we will have to use some type of generalized least-square (GLS)
method.
2. In large samples, we can use the Newey–West method to obtain standard errors of OLS
estimators that are corrected for autocorrelation. This method is actually an extension of White’s
heteroskedasticity-consistent standard errors method.
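A sketch of the Newey-West correction on the same kind of simulated AR(1) data (the lag length of 2 is an arbitrary choice): the point estimates are the OLS ones, only the standard errors change.

clear
set obs 100
set seed 123
gen t = _n
tsset t
gen x = rnormal()
gen v = rnormal()
gen e = v
replace e = 0.7*e[_n-1] + v if _n > 1
gen y = 1 + 2*x + e
newey y x, lag(2)      // OLS coefficients with HAC (Newey-West) standard errors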
The Generalized Least Square (GLS) estimator
Limiting ourselves to the first-order autoregressive process, we will discuss how to correct for
autocorrelation in this section. GLS works by transforming the original model in such a way
that the transformed model fulfils the usual OLS assumptions (and therefore produces efficient estimates).
Consider the regression model given by:
Yt = β0 + β1Xt + εt   (1.19)

whose first-order autoregressive process, AR(1), is given by

εt = ρεt−1 + vt
The GLS estimator works differently in two ways: when the coefficient of autocorrelation ( r ) is
known and when not known.
When ρ is known
If the coefficient of first-order autocorrelation is known, the problem of autocorrelation can be
easily solved. If equation (1.19) holds at time t, it also holds at time (t − 1). Hence,

Yt−1 = β0 + β1Xt−1 + εt−1   (1.20)

Multiplying equation (1.20) by ρ on both sides, we get

ρYt−1 = ρβ0 + ρβ1Xt−1 + ρεt−1

Subtracting this from equation (1.19) gives

(Yt − ρYt−1) = (1 − ρ)β0 + β1(Xt − ρXt−1) + (εt − ρεt−1)   (1.21)

We can express equation (1.21) as

Yt* = β0* + β1*Xt* + εt*   (1.22)

where Yt* = (Yt − ρYt−1), β0* = (1 − ρ)β0, Xt* = (Xt − ρXt−1), β1* = β1 and εt* = (εt − ρεt−1) = vt.
Since the error term in (1.22) satisfies the usual OLS assumptions, we can
apply OLS to the * *
transformed variables Y and X and obtain estimators with all the optimum
properties, namely,
BLUE. In effect, running (1.22) is tantamount to using generalized least squares
(GLS) –recall
that GLS is nothing but OLS applied to the transformed model that
satisfies the classical
assumptions.
When ρ is not known, a common practical shortcut is the first-difference method, which assumes ρ = 1. Setting ρ = 1 in (1.21) gives
Yt - Yt-1 = β1(Xt - Xt-1) + (εt - εt-1)      (1.23)
Since the error term in (1.23) is free from first-order autocorrelation, to run the regression (1.23) all one has to do is form the first differences of both the dependent variable and the regressor(s) and run the regression on these first differences. The first-difference transformation may be appropriate if the coefficient of autocorrelation is very high, say in excess of 0.8, or the Durbin-Watson dw is quite low. An interesting feature of the first-difference model (1.23) is that there is no intercept in it. Hence, to estimate (1.23), you have to use the regression-through-the-origin routine (that is, suppress the intercept term), which is now available in most software packages.
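In practice these estimators are available as built-in routines. The sketch below again assumes tsset data and hypothetical variables y and x:

    tsset year
    regress D.y D.x, noconstant   // first-difference model (1.23), regression through the origin
    prais y x, corc               // Cochrane-Orcutt feasible GLS: rho is estimated from the data
                                  // and the quasi-difference transformation applied automatically

The prais command without the corc option uses the Prais-Winsten transformation, which retains the first observation.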
Model Selection
The choice of model usually depends on the nature of the dependent variable in the study. For instance, if the dependent variable has two outcomes, the choices can be represented by a binary model, whereas if the dependent variable is continuous, other models such as linear regression are used, depending on its detailed nature (Gujarati, 2004). Furthermore, for categorical dependent variables the choice of model depends on the nature of the response. For instance, where the outcome variable has more than two unordered values, the multinomial logit or probit model is appropriate, while the ordinal logit or probit model is applied where there is a clear natural ranking from low to high among the outcomes but the distance between adjacent categories is unknown. A sketch of how these cases map to estimation commands is given below.
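As a hedged sketch (the variable names wage, employed, choice, rating, x1 and x2 are hypothetical), the mapping of these cases to Stata estimation commands is:

    regress wage x1 x2      // continuous dependent variable: linear regression
    logit employed x1 x2    // binary (two-outcome) dependent variable
    mlogit choice x1 x2     // unordered categorical outcome with more than two values
    ologit rating x1 x2     // ordered categorical outcome, e.g. a Likert-scale response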
Example: Determinants of higher education institutions in promoting students' entrepreneurship across disciplines: evidence from Dire Dawa, Haramaya and Adama University.
[Figure: conceptual framework. The university environment factors (curriculum, delivery method, assessment method, university commitment, learning facilities, entrepreneurship course and field of study) are linked to the dependent variable, students' entrepreneurship promotion.]
In this study the dependent variable (students' entrepreneurship promotion) was measured using a five-point Likert scale (highly promoted, promoted, undecided, low promoted and very low promoted). As the dependent variable contains five ordered responses, an ordinal model was used to examine the relationship between the independent variables and the dependent variable and to reach a conclusion. According to Gujarati (2004), an ordinal model may be a logit or a probit. Because it is easy to apply and is expressed in terms of probabilities, the logit model is preferred; thus, in this study the ordinal logit model was applied.
When modeling these types of outcomes, numerical values are assigned to the outcomes, but the values are ordinal and reflect only the ranking of the outcomes. That is, we might assign the dependent variable the value 1 for "highly promoted", 2 for "promoted", 3 for "undecided", 4 for "low promoted" and 5 for "very low promoted".
Consider the generic latent-variable regression function given by:
Yi* = Xi'β + εi
If the latent variable Yi* denotes a natural ordering among the possible outcomes, then the observed dependent variable can be assumed to follow a data-generating process of the following type:
Yi = 1 if Yi* ≤ μ1; Yi = 2 if μ1 < Yi* ≤ μ2; Yi = 3 if μ2 < Yi* ≤ μ3; Yi = 4 if μ3 < Yi* ≤ μ4; Yi = 5 if Yi* > μ4,
where Yi is the observed score for the dependent variable, given numerical values as follows: 1 for "highly promoted", 2 for "promoted", 3 for "undecided", 4 for "low promoted" and 5 for "very low promoted"; Yi* is the unobservable value of the dependent variable; Xi is a vector of variables that explain the variation in the observed dependent variable; β is a vector of coefficients; μ1, ..., μ4 are the threshold parameters to be estimated along with β; and εi is a disturbance term (logistically distributed in the ordinal logit specification). These threshold parameters, which usually must be estimated, determine how the values of Yi* are translated into the five possible values of Yi.
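Under these assumptions the model can be fitted by ordinal logit. A sketch for the present example, with illustrative stand-in names for the study's regressors:

    ologit promotion i.curriculum i.delivery i.assessment i.commitment ///
           i.facilities i.entre_course i.field

In the output, the slope coefficients correspond to β, and the cut-points reported as /cut1 to /cut4 are the estimated threshold parameters μ1 to μ4.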
1.5. Test of the Model
To make the regression results of the model ready for discussion and to obtain reliable output, different tests were run. These tests are mainly intended to check whether the proportional odds assumption and the other classical linear regression model (CLRM) assumptions are fulfilled when the dependent variable is regressed on the independent variables. Each test, its decision rule and its implications are discussed as follows.
1. Cross-tabulation of the categorical variables
Before running the model, a cross-tabulation of the response variable with the categorical variables was made to see whether any cells are empty or extremely small. In this study all independent variables are categorical. A cross-tabulation of the response variable with each of these categorical variables was made, and none of the cells is too small or empty (see appendix A), so the model could be run.
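A sketch of this check in Stata, using the illustrative variable names introduced above:

    tabulate promotion curriculum      // response variable against one categorical regressor
    tabulate promotion delivery        // repeated for each categorical regressor

Any cell with a zero or very small frequency signals that the corresponding category may need to be merged or dropped before estimation.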
2. Proportional odds assumption test
One of the assumptions underlying ordinal logistic regression is that the relationship between each pair of
outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that
describe the relationship between, say, the lowest versus all higher categories of the response variable are the
same as those that describe the relationship between the next lowest category and all higher categories, etc.
This is called the proportional odds assumption or the parallel regression assumption. Because the
relationship between all pairs of groups is the same, there is only one set of coefficients (only one model). If
this was not the case, we would need different models to describe the relationship between each pair of
outcome groups. Thus, we need to test the proportional odds assumption, and there are two tests that can be used to do so: the omodel test and the Brant test.
In this study the first method, omodel, was applied. To apply it, one first needs to download the user-written command omodel (type findit omodel). Accordingly, omodel was downloaded from the internet and run in Stata. As a rule of thumb, the proportional odds assumption is fulfilled if the result of the test is insignificant; that is, Prob > chi2 should be insignificant at the 1, 5 or 10 percent significance level. In line with this, the omodel result of this study is insignificant (i.e. 0.2700), so the proportional odds assumption is fulfilled (see appendix D).
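A sketch of this test, again with illustrative variable names (omodel and brant are user-written commands that must first be installed, e.g. via findit):

    omodel logit promotion curriculum delivery assessment commitment
    ologit promotion curriculum delivery assessment commitment
    brant                              // Brant test of the parallel regression assumption

In both cases an insignificant chi-square p-value means the proportional odds assumption is not rejected.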
3. Testing the overall model fitness
The p-value of the overall test (reported as Prob > chi2 for the ordered logit model) is used to determine the overall significance of the model. In other words, it describes the reliability of the group of independent variables in predicting the dependent variable. If this p-value is less than 5 percent, the group of independent variables has a statistically significant relationship with the dependent variable, i.e. it reliably predicts the dependent variable; whereas if the p-value is more than 5 percent, we conclude that the group of independent variables does not show a statistically significant relationship with the dependent variable and does not reliably predict it (Gujarati, 2004). Since the p-value of the group of independent variables in this model is 0.000, which is less than 5 percent, it is possible to conclude that they reliably predict the dependent variable. Hence, the requirement for fitness of the model is safely fulfilled (see appendix B or C).
4. Test of Model Specification Error
Model specification error can occur when one or more relevant variables are omitted from the model or one
or more irrelevant variables are included in the model. If relevant variables are omitted from the model, the
common variance they share with included variables may be wrongly attributed to those variables, and the
error term is inflated. On the other hand, if irrelevant variables are included in the model, the common
variance they share with included variables may be wrongly attributed to them. Model specification errors
can substantially affect the estimates of the regression coefficients (Gujarati, 2004). In this study, to detect whether there is a model specification error, both the linktest and the ovtest were used.
Linktest: used to detect the inclusion of one or more irrelevant variables in the model. In the linktest two new variables are created: the variable of prediction, _hat, and the variable of squared prediction, _hatsq. The model is then refitted using these two variables. If the p-value of _hatsq is insignificant at the 10 percent level, the model is specified correctly, whereas if the p-value of _hatsq is significant, there is a model specification error (Gujarati, 2004). In this study, as the p-value of _hatsq is insignificant (i.e. 0.151), there is no model specification error (see appendix E).
Ovtest: used to detect whether one or more relevant variables are omitted. Its decision rule is: if the Prob > F result is significant, the Ramsey RESET null hypothesis (Ho: the model has no omitted variables) is rejected, indicating a model specification error (omitted variables); otherwise the null hypothesis is not rejected, indicating no specification error. In this study, as the Prob > F of the model is insignificant (0.2100), the Ramsey RESET null hypothesis is not rejected, which indicates that there is no model misspecification error (see appendix F).
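A sketch of these checks (variable names illustrative; note that linktest can be run directly after ologit, while ovtest is defined after regress, so an auxiliary OLS fit is shown for it):

    ologit promotion curriculum delivery assessment commitment
    linktest                           // _hatsq should be insignificant
    regress promotion curriculum delivery assessment commitment
    estat ovtest                       // Ramsey RESET; Ho: model has no omitted variables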
5. Test of multicollinearity
An important assumption of the multiple regression model is that the independent variables are not perfectly collinear. Multicollinearity is the existence of a "perfect", or exact, linear relationship among some or all of the explanatory variables of a regression model (Gujarati, 2004). In this paper, the VIF (variance inflation factor) was utilized to detect whether there is a collinearity problem. As a rule of thumb, a variable whose VIF value is greater than 10, or whose 1/VIF value is less than 0.1, indicates a possible multicollinearity problem. In this study there is no serious collinearity among the explanatory variables, because all VIF values are below 5.03 and all 1/VIF values are greater than 0.199 (see appendix G).
In addition, a pairwise correlation matrix of the selected variables was employed to check whether a multicollinearity problem exists in the model, using the correlation command (pwcorr). If the correlation between any pair of variables is greater than or equal to 0.8, or less than or equal to -0.8, the result would indicate a serious problem of near-perfect positive or negative correlation. This study tested the model for such a correlation problem using the pairwise correlation (pwcorr) test, and the result showed that the pairwise correlations of all variables lie well within these bounds (i.e. between -0.075 and 0.73) (see appendix H). Thus, based on this result, there is no intolerable problem of correlation.
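A sketch of both collinearity checks (variable names illustrative; vif is defined after regress, so an auxiliary OLS fit is used):

    regress promotion curriculum delivery assessment commitment
    estat vif                          // rule of thumb: VIF > 10 or 1/VIF < 0.1 signals a problem
    pwcorr curriculum delivery assessment commitment, sig   // pairwise correlations with p-values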
6. Test of Heteroscedasticity
Another assumption of the CLRM is the homogeneity of the variance of the residuals. If the model is well fitted, there should be no pattern in the residuals plotted against the fitted values. If the variance of the residuals is non-constant, the residual variance is said to be "heteroscedastic". There are graphical and non-graphical methods of detecting a heteroscedasticity problem (Gujarati, 2004). In this study, hettest was used to check whether there is a heteroscedasticity problem. The Breusch-Pagan/Cook-Weisberg test shows that the null hypothesis (Ho: constant variance) could not be rejected, because the test returned a p-value of 0.5857 (58.57 percent), which is greater than the 1, 5 and 10 percent significance levels. Thus, the result indicates that the error terms have equal variance. Therefore, there is no serious problem of heteroscedasticity and the model is well fitted (see appendix I).
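A sketch of this test (variable names illustrative; hettest is defined after regress, so it is shown on an auxiliary OLS fit):

    regress promotion curriculum delivery assessment commitment
    estat hettest                      // Breusch-Pagan/Cook-Weisberg; Ho: constant variance

A large p-value, as in this study, means the null of constant variance is not rejected.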