
Chapter Three

Regression analysis: Further details


3.1 Multiple regression analysis
• We studied the two-variable model extensively in the
previous unit.
• But in economics it is rarely the case that a variable is
affected by only one explanatory variable.
• For example, the demand for a commodity depends on the
price of the commodity itself, the prices of competing or
complementary goods, the income of consumers, the number of
consumers in the market, etc.
• Hence the two-variable model is often inadequate in
practical work.
Cont.…
• Therefore, we need to discuss multiple regression models.
• Multiple linear regression is concerned with the
relationship between a dependent variable (Y) and two or
more explanatory variables (X1, X2, …, Xn).
Why Do We Need Multiple Regression?
1. One motivation for multiple regression is the omitted
variable bias in simple regression analysis.
• Omitted variable bias is the primary drawback of simple
regression; multiple regression allows us to explicitly control
for many other factors that simultaneously affect the dependent
variable.
Cont.…
Example: wages vs. education
• Imagine we want to measure the (causal) effect of an
additional year of education on a person's wage.
• If we use the model wage = β0 + β1educ + u and interpret
β1 as the ceteris paribus effect of educ on wage, we have to
assume that educ and u are uncorrelated.
• Now consider a different model: wage = β0 + β1educ + β2exper
+ u, where exper is a person's working experience (in years).
• Since this equation contains experience explicitly, we can
measure the effect of education on wage, holding experience
fixed, as the sketch below illustrates.
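
A minimal Python sketch of this comparison on simulated data (the data-generating process, coefficient values, and variable names below are illustrative assumptions, not results from the text):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
exper = rng.uniform(0, 30, n)
educ = 12 + 0.1 * exper + rng.normal(0, 2, n)  # educ correlated with exper
wage = 5 + 0.8 * educ + 0.3 * exper + rng.normal(0, 1, n)

# Simple regression: exper is omitted, so its effect loads onto educ
simple = sm.OLS(wage, sm.add_constant(educ)).fit()
# Multiple regression: controlling for exper removes the omitted variable bias
multiple = sm.OLS(wage, sm.add_constant(np.column_stack([educ, exper]))).fit()

print(simple.params)    # slope on educ is biased away from the true 0.8
print(multiple.params)  # slope on educ is close to the true 0.8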
Cont..

2. Multiple regression analysis is also useful for generalizing functional
relationships between variables.
Simple Regression vs. Multiple Regression
• Most of the properties of the simple regression model extend directly
to the multiple regression case.
• We derived many of the formulas for the simple regression model;
with multiple variables, however, the formulas become cumbersome when
there are more than two explanatory variables.
• As far as the interpretation of the model is concerned, there is an
important new fact: the coefficient βj captures the effect of the j-th
explanatory variable, holding all the remaining explanatory variables
fixed.
Estimation
• As in simple regression, the estimates are obtained by
minimizing the sum of squared residuals; similarly to before,
we can define:
Population regression model: Yi = β0 + β1X1i + β2X2i + … + βkXki + ui
Sample regression model: Yi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki + ûi
Fitted values of Y: Ŷi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki
Residuals: ûi = Yi − Ŷi
Estimation con…
When the number of explanatory variables = 2:
Population regression model: Yi = β0 + β1X1i + β2X2i + ui
Sample regression model: Yi = β̂0 + β̂1X1i + β̂2X2i + ûi
Fitted values of Y: Ŷi = β̂0 + β̂1X1i + β̂2X2i
Residuals: ûi = Yi − Ŷi
Estimation con…
• Squaring and summing both sides gives the residual sum of
squares (RSS):
RSS = Σûi² = Σ(Yi − Ŷi)²
Or
RSS = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²
• Now, using the concept of partial derivatives, one can minimize
RSS with respect to β̂0, β̂1 and β̂2, set each derivative equal to
zero, and solve for β̂0, β̂1 and β̂2.
• From the resulting normal equations we can derive the formulas
for β̂0, β̂1 and β̂2.
• The normal equations are obtained based on the CLRM
assumptions.
• The method is also known as the method of moments.
Estimation con…
Population assumption          Sample counterpart
E(ui) = 0                      (1/n) Σûi = 0
E(X1iui) = 0                   (1/n) ΣX1iûi = 0
E(X2iui) = 0                   (1/n) ΣX2iûi = 0
Estimation con…
• Then we can derive the following normal equations on the
basis of the above assumptions:
ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i ………………………eq(1)
ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i ……..eq(2)
ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ……...eq(3)
• Dividing eq(1) by n and substituting into eq(2) and eq(3), we get
β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2
Estimation con…
• Now convert the level form into deviation form, where lowercase
letters denote deviations from sample means (yi = Yi − Ȳ,
x1i = X1i − X̄1, x2i = X2i − X̄2):
β̂1 = (Σx1iyi · Σx2i² − Σx2iyi · Σx1ix2i) / (Σx1i² · Σx2i² − (Σx1ix2i)²)
β̂2 = (Σx2iyi · Σx1i² − Σx1iyi · Σx1ix2i) / (Σx1i² · Σx2i² − (Σx1ix2i)²)
• By using the matrix or substitution method, we get the values of
β̂1 and β̂2; a numerical sketch follows.
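A short NumPy sketch of these deviation-form formulas on made-up data (the true coefficient values are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(size=n)

# Deviations from the sample means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Deviation-form OLS formulas for the two-regressor model
den = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1 = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / den
b2 = ((x2 @ y) * (x1 @ x1) - (x1 @ y) * (x1 @ x2)) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
print(b0, b1, b2)  # close to the true values 1.0, 2.0, -0.5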
Estimation con…
Global hypothesis test (F and R²)
• We used the t-test to test single hypotheses, i.e., hypotheses
involving only one coefficient. But what if we want to test
more than one coefficient simultaneously? We do this using the
F-test.
• The F-test is used to test the overall significance of a model:
F = (ESS/(k − 1)) / (RSS/(n − k))
Or, using R-squared:
F = (R²/(k − 1)) / ((1 − R²)/(n − k))
Decision: If F > Fα(k−1, n−k), reject H0; otherwise you may
accept H0 (Fcal > Ftab),
where Fα(k−1, n−k) is the critical F value at the α level of
significance, with (k − 1) numerator df and (n − k) denominator df.
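
A minimal sketch of the R²-based F-test, assuming hypothetical values for R², n, and k:

from scipy.stats import f

R2, n, k, alpha = 0.85, 10, 2, 0.05  # hypothetical values
F_cal = (R2 / (k - 1)) / ((1 - R2) / (n - k))
F_tab = f.ppf(1 - alpha, k - 1, n - k)  # critical value F_alpha(k-1, n-k)
print(F_cal, F_tab, F_cal > F_tab)  # reject H0 if F_cal > F_tab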
Selection of models

• One of the assumptions of the classical linear regression
model (CLRM) is that the regression model used in the
analysis is "correctly" specified: if the model is not
"correctly" specified, we encounter the problem of model
specification error or model specification bias.
Basic questions related to model selection
 What are the criteria for choosing a model for empirical
analysis?
 What types of model specification errors is one likely to
encounter in practice?
 What are the consequences of specification errors?
Cont..
 How does one detect specification errors? In other words,
what are some of the diagnostic tools that one can use?
 Having detected specification errors, what remedies can one
adopt?
Model Selection Criteria
A model chosen for empirical analysis should satisfy the following
criteria:
• Be data admissible; that is, predictions made from the model
must be logically possible.
• Be consistent with theory; that is, it must make good
economic sense.
Cont…
• Exhibit parameter constancy; that is, the values of the
parameters should be stable. Otherwise, forecasting will
be difficult.
• Exhibit data coherency; that is, the residuals estimated
from the model must be purely random (technically, white
noise).
• Be encompassing; that is, the model should encompass
or include all the rival models in the sense that it is
capable of explaining their results.
• In short, other models cannot be an improvement over the
chosen model.
Types of Specification Errors

• In developing an empirical model, one is likely to commit one
or more of the following specification errors:
i. Omission of a relevant variable(s)
ii. Inclusion of an unnecessary variable(s)
iii. Adopting the wrong functional form
iv. Errors of measurement
Consequences of Model Specification Errors

Omitting a Relevant Variable
• If the left-out, or omitted, variable is correlated with the
included variable (i.e., the correlation coefficient between the
two variables is nonzero), the estimators are biased as well as
inconsistent.
• Even if the two variables are not correlated, the intercept
parameter is biased, although the slope parameter is now
unbiased.
• The disturbance variance is incorrectly estimated.
• In consequence, the usual confidence interval and hypothesis-
testing procedures are likely to give misleading conclusions
about the statistical significance of the estimated parameters.
Con…
• There is an asymmetry between the two types of specification biases.
• If we include an irrelevant variable in the model, the model still
gives us unbiased and consistent estimates of the coefficients
of the true model, the error variance is correctly estimated, and
the conventional hypothesis-testing methods are still valid.
• The only penalty we pay for the inclusion of the superfluous
variable is that the estimated variances of the coefficients are
larger, and as a result our probability inferences about the
parameters are less precise.
Functional Forms of Regression Models
• The following are commonly used regression models that may be
nonlinear in the variables but are linear in the parameters, or
that can be made so by suitable transformations of the variables
(a brief estimation sketch follows the list):
1. Linear model: Y = β1 + β2X
2. Log-log (double-log) model: lnY = β1 + β2 lnX
3. Semi-log model (lin-log or log-lin): Y = β1 + β2 lnX and lnY
= β1 + β2X
4. Reciprocal model: Y = β1 + β2(1/X)
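
Each of these forms can be estimated by OLS after transforming the variables; a brief sketch of the double-log case on simulated data (the elasticity 1.63 is reused here purely as an illustrative assumption):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.uniform(1, 10, 100)
Y = np.exp(1.0 + 1.63 * np.log(X) + rng.normal(0, 0.1, 100))  # double-log DGP

# Double-log model: regress lnY on lnX; the slope is the elasticity of Y w.r.t. X
fit = sm.OLS(np.log(Y), sm.add_constant(np.log(X))).fit()
print(fit.params[1])  # approximately 1.63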
Example
Double-log model: lnY = β1 + β2 lnX
• Interpretation: if total personal expenditure goes up by 1 percent, on
average, the expenditure on durable goods goes up by about 1.63 percent.
Lin-log model: Y = β1 + β2 lnX
• Interpretation: an increase in total expenditure of 1 percent, on
average, leads to about a 2.57 birr increase in the expenditure on food.
Cont..
Log-lin model: ln(EXS) = β1 + β2t
where EXS is expenditure on services and t is time,
measured in quarters.
Interpretation: expenditures on services increased at the
(quarterly) rate of 0.705 percent.
Relaxing the CLRM basic assumptions
Multicollinearity problem
• An assumption of the classical linear regression model
(CLRM) is that there is no high multicollinearity among
the regressors included in the regression model.
• Multicollinearity means the existence of a "perfect" (exact)
or inexact linear relationship among some or all of the
explanatory variables of a regression model.
Cont..
Sources of multicollinearity
• The data collection method employed
• Model specification.
• Overdetermined model
Consequences of Multicollinearity
• The OLS estimators are still BLUE.
• OLS estimators have large variances and covariances.
• Because of the large variances of the estimators, which
mean large standard errors, the confidence intervals tend
to be much wider, leading to the acceptance of the "zero null
hypothesis".
Cont..
• The computed t-ratios will be very small, leading one or
more of the coefficients to appear statistically insignificant
when tested individually.
• Yet R-squared, the overall measure of goodness of fit, can be
very high.
Remedial measures of multicollinearity
• Combining cross-sectional and time-series data
• Dropping a variable(s), at the risk of specification bias
• Transformation of variables
Tests to check the existence of multicollinearity
• Variance inflation factor (VIF) (see the sketch below)
• Correlation matrix
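
A minimal VIF check using statsmodels' variance_inflation_factor (the simulated data below are illustrative assumptions):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # nearly collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# A VIF above about 10 is a common rule-of-thumb signal of multicollinearity
for j in (1, 2):
    print(variance_inflation_factor(X, j))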
Heteroscedasticity
Sources
• Model specification problems
• Data collection problems
• The presence of outliers
Consequences
• The variance of the error term is under- or overestimated.
• The OLS estimators are no longer BLUE (they remain unbiased
but are not efficient).
• Confidence intervals and t-ratios are also affected.
• Hypothesis testing is misleading.
Cont..
Tests to check the existence of Heteroscedasticity
• Goldfeld–Quandt Test
• Breusch–Pagan–Godfrey Test (see the sketch below)
• White's Test
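
A minimal Breusch–Pagan test using statsmodels (simulated heteroscedastic data; all values are illustrative assumptions):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(scale=x, size=200)  # error variance grows with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Returns (LM statistic, LM p-value, F statistic, F p-value)
lm, lm_p, fstat, f_p = het_breuschpagan(resid, X)
print(lm_p)  # small p-value -> reject homoscedasticity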
Regression on Dummy Variables
• There are four types of variables that one generally encounters in
empirical analysis: ratio scale, interval scale, ordinal scale,
and nominal scale.
• Regression models may involve not only ratio scale variables
but also nominal scale variables. Such variables are also known
as indicator variables, categorical variables, qualitative
variables, or dummy variables.
The nature of dummy variable
• In regression analysis the dependent variable is frequently
influenced not only by ratio scale variables (e.g., income,
output, prices, costs, height, temperature) but also by variables
that are essentially qualitative, or nominal scale, in nature,
such as sex, race, color, religion, nationality, geographical
region and political party affiliation.
Cont..
• One way we could “quantify” such attributes is by
constructing artificial variables that take on values of 1 or 0, 1
indicating the presence (or possession) of that attribute and 0
indicating the absence of that attribute.
• Variables that assume such 0 and 1 values are called dummy
variables.
• Dummy variables can be used in regression models just as
easily as quantitative variables.
• A regression model may contain explanatory variables that are
exclusively dummy, or qualitative, in nature.
Cont…
Given: Yi = β1 + β2Di + ui
where Y = annual salary of a college professor
Di = 1 if male college professor, = 0 otherwise (i.e., female professor)
• The above regression model may enable us to find out whether
sex makes any difference in a college professor's salary,
assuming, of course, that all other variables such as age,
degree attained, and years of experience are held constant.
• Assuming that the disturbances satisfy the usual assumptions
of the classical linear regression model, we obtain:
• Mean salary of female college professor: E(Yi | Di = 0) = β1
• Mean salary of male college professor: E(Yi | Di = 1) = β1 + β2
Cont…

• The intercept term β1 gives the mean salary of female college
professors, and the slope coefficient β2 tells by how much the
mean salary of a male college professor differs from the mean
salary of his female counterpart.
• How do we test whether there is sex discrimination or not?
Test H0: β2 = 0 against H1: β2 ≠ 0 with the usual t-test.
Example
Ŷi = 18,000 + 3,280Di
        (0.32)   (0.44)
t = (57.74) (7.439)    R² = 0.8737
Based on this result, the estimated mean salary of a female college
professor is 18,000 birr and that of a male professor is
18,000 + 3,280 = 21,280 birr.
Cont.….
Regression on one quantitative variable and one qualitative
variable with two classes:
Yi = β1 + β2Di + β3Xi + ui
where Y = annual salary of a college professor
X = years of teaching experience
Di = 1 if male college professor, = 0 otherwise (i.e., female professor)
Cont…
Mean salary of female college professor:
E(Yi | Xi, Di = 0) = β1 + β3Xi
Mean salary of male college professor:
E(Yi | Xi, Di = 1) = (β1 + β2) + β3Xi
• The level of the male professor's mean salary differs from
that of the female professor's mean salary (by β2), but the rate
of change in the mean annual salary with years of experience is
the same (β3) for both sexes.
Cont.…
Graphically, the two salary functions are parallel lines with
common slope β3; they differ only in intercept (β1 for females,
β1 + β2 for males).
Regression on one quantitative variable and two
qualitative variables:
Yi = β1 + β2D2i + β3D3i + β4Xi + ui
where Y = annual salary of a college professor
X = years of teaching experience
D2i = 1 if male college professor, = 0 otherwise (i.e., female professor)
D3i = 1 if white, = 0 otherwise
Exercises: From the above expression obtain the following
(a Python sketch of this model follows the list):
1. Mean salary for black female professor
2. Mean salary for black male professor
3. Mean salary for white female professor
4. Mean salary for white male professor
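
A short sketch of this dummy-variable regression in Python (salary figures, sample size, and the data-generating process are illustrative assumptions, not data from the text):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
male = rng.integers(0, 2, n)    # D2: 1 if male
white = rng.integers(0, 2, n)   # D3: 1 if white
exper = rng.uniform(0, 30, n)   # X: years of experience
salary = 18000 + 3000 * male + 1500 * white + 500 * exper + rng.normal(0, 1000, n)

X = sm.add_constant(np.column_stack([male, white, exper]))
fit = sm.OLS(salary, X).fit()
print(fit.params)  # estimates of beta1..beta4

# e.g., mean salary of a white male professor with x years of experience:
# (beta1 + beta2 + beta3) + beta4 * x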
Cont.…
 Regression on one quantitative variable and one
qualitative variable with more than two classes.
• Suppose we consider three mutually exclusive levels of
education: less than high school, high school, and college.
• If a qualitative variable has 'm' categories, introduce only
'm − 1' dummy variables, following the rule that the number of
dummies be one less than the number of categories of the variable.
Yi = β1 + β2D2i + β3D3i + β4Xi + ui
where Yi = annual expenditure on health care
Xi = annual income
D2i = 1 if high school education, = 0 otherwise
D3i = 1 if college education, = 0 otherwise
• Compute the mean health care expenditure functions for the
three levels of education (a sketch of the m − 1 dummies rule follows).
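
The m − 1 dummies rule is easy to apply in pandas; a minimal sketch (column names and category labels are illustrative assumptions):

import pandas as pd

df = pd.DataFrame({
    "educ": ["less_hs", "high_school", "college", "high_school", "college"],
    "income": [20, 35, 50, 40, 60],
})

# Fix the category order so "less_hs" is the base (omitted) category
df["educ"] = pd.Categorical(df["educ"], categories=["less_hs", "high_school", "college"])

# drop_first=True keeps m - 1 = 2 dummies and omits the base category
dummies = pd.get_dummies(df["educ"], drop_first=True)
X = pd.concat([df["income"], dummies], axis=1)
print(X)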
