OLS Assumptions

The document discusses the assumptions of the classical linear regression model: the error terms have zero mean, constant variance (homoscedasticity), no autocorrelation, and a normal distribution; the regressors are not collinear; the model is linear in parameters; and the series are stationary. It then presents regression output and discusses multicollinearity, heteroscedasticity, and autocorrelation in turn.


Assumptions

■ Zero mean value of ui, or E(ui | X2i, X3i) = 0 for each i
■ Homoscedasticity, or var(ui) = σ²
■ No multicollinearity among the regressors
■ No autocorrelation (serial correlation) between the error terms
■ Normality of the error terms
■ Linearity in parameters
■ Stationarity of the series
Dependent Variable: PAT
Method: Least Squares
Date: 04/23/19  Time: 23:27
Sample: 1 99
Included observations: 99

Variable             Coefficient   Std. Error   t-Statistic   Prob.
C                      1371.198     588.9139     2.328351     0.0220
DEBT_EQUITY_RATIO       118.7550    196.0949     0.605600     0.5462
NET_SALES                 0.054053    0.007966    6.785659     0.0000

R-squared             0.325295    Mean dependent var      3383.463
Adjusted R-squared    0.311238    S.D. dependent var      6036.003
S.E. of regression    5009.379    Akaike info criterion   19.90585
Sum squared resid     2.41E+09    Schwarz criterion       19.98449
Log likelihood       -982.3394    Hannan-Quinn criter.    19.93766
F-statistic           23.14217    Durbin-Watson stat      1.675624
Prob(F-statistic)     0.000000
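For readers working in Python rather than EViews, a minimal sketch of an equivalent estimation (the file and column names are assumptions for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file with columns PAT, DEBT_EQUITY_RATIO, NET_SALES.
df = pd.read_csv("pat_data.csv")

# OLS of profit after tax on the debt-equity ratio and net sales,
# mirroring the EViews output above.
model = smf.ols("PAT ~ DEBT_EQUITY_RATIO + NET_SALES", data=df).fit()
print(model.summary())  # coefficients, t-statistics, R-squared, Durbin-Watson d
```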
MULTICOLLINEARITY

DEFINITION

■ Multicollinearity occurs when your model includes independent variables that are correlated not only with your dependent variable but also with each other.

TYPES

Y = β1 + β2X1 + β3X2 + β4X3 + µ

• Perfect collinearity
• Imperfect collinearity

■ Multicollinearity, as we have defined it, refers only to linear relationships among the X variables. It does not rule out nonlinear relationships among them. For example, consider the following regression model:

Y = β1 + β2X1 + β3X1² + β4X1³ + µ
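A small sketch with made-up data illustrating the point: the powers of X1 are related nonlinearly but not through an exact linear relationship, so the model remains estimable even though their pairwise correlations are high:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=100)

# Design matrix with a constant, X1, X1^2, X1^3.
X = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Full column rank: no exact linear dependence among the columns.
print(np.linalg.matrix_rank(X))  # 4

# Yet the pairwise correlations among X1, X1^2, X1^3 are very high.
print(np.corrcoef(np.vstack([x, x**2, x**3])))
```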
SOURCES

■ The data collection method employed, for example, sampling over a limited range of the values taken by the regressors in the population.
■ Constraints on the model or in the population being sampled. For example, in the regression of electricity consumption on income (X2) and house size (X3) there is a physical constraint in the population in that families with higher incomes generally have larger homes than families with lower incomes.
■ In time series data, the regressors included in the model share a common trend, that is, they all increase or decrease over time.
CONSEQUENCES

■ Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.
■ Because of consequence 1, the confidence intervals tend to be much wider. Hence, the probability of accepting a false hypothesis (i.e., a type II error) increases.
■ Also because of consequence 1, the t ratio of one or more coefficients tends to be statistically insignificant.
■ The OLS estimators and their standard errors can be sensitive to small changes in the data.
DETECTION

■ High R² but few significant t ratios.
■ High pair-wise correlations among regressors.
■ Auxiliary regressions.
■ Variance Inflation Factor (VIF); see the sketch below.
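A minimal sketch of the VIF computation in Python with statsmodels (the data file and column names are the hypothetical ones from the earlier sketch):

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.read_csv("pat_data.csv")  # hypothetical file from the example above
X = add_constant(df[["DEBT_EQUITY_RATIO", "NET_SALES"]])

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared of the auxiliary
# regression of regressor j on all the other regressors.
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)  # values above roughly 10 are a common warning sign
```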
REMEDIAL MEASURES

■ Dropping a variable(s).
■ Transformation of variables, for example first differencing (see the sketch after this list):

Yt = β1 + β2X2t + β3X3t + ut
Yt−1 = β1 + β2X2,t−1 + β3X3,t−1 + ut−1
Yt − Yt−1 = β2(X2t − X2,t−1) + β3(X3t − X3,t−1) + vt, where vt = ut − ut−1

■ Additional or new data.
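A minimal sketch of the first-difference transformation with pandas (the file and column names are illustrative; assumes time-ordered data):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical time-ordered data with columns Y, X2, X3.
df = pd.read_csv("timeseries.csv")

# First differences remove a common trend shared by the regressors.
d = df[["Y", "X2", "X3"]].diff().dropna()

# The differenced model has no intercept (beta1 cancels), hence "- 1".
model = smf.ols("Y ~ X2 + X3 - 1", data=d).fit()
print(model.summary())
```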


HETEROSCEDASTICITY
Meaning

▰ The data should be homoscedastic, which implies that the error terms have identical variances for all observations at a given value of X; that is, they are equally spread around the conditional mean of Y.
Meaning

▰ Homoscedasticity can also be explained by saying that the Y populations corresponding to different X values have the same variance around the respective X values. The following graphs illustrate this:
Graphical representation of heteroscedasticity

• The following graph shows heteroscedastic data, where the conditional variance of Y varies with a change in the X variable.

• We see that the variance is least when X = X1. When X = X2 or Xn, the variance tends to increase. This raises a question about the reliability of the data.
Reasons for heteroscedasticity

1. The regression model we choose might not suit the data: there might be an incorrect data transformation or an incorrect functional form.
2. We might omit a variable that is in fact important, violating the correct-specification assumption of the CLRM.
3. Heteroscedasticity can also arise from the presence of outliers.
4. Skewness in the distribution of one or more regressors included in the model may also lead to heteroscedasticity. Variables like education, income, and wealth generally have uneven distributions and give heteroscedastic data in most cases.
Consequences

1. Heteroscedasticity does not alter the unbiasedness and consistency properties of OLS estimators.
2. But OLS estimators are no longer minimum-variance, or efficient. That is, they are not best linear unbiased estimators (BLUE); they are simply linear unbiased estimators (LUE).
3. As a result, t and F tests based on the standard assumptions of the CLRM may not be reliable, resulting in erroneous conclusions regarding the statistical significance of the estimated regression coefficients.
Detection: Graphical

▰ Plot the residuals (or squared residuals) against the fitted values of Y or against a regressor; a systematic change in their spread suggests heteroscedasticity.
Detection: Mathematical

▰ Park test
▰ Glejser test
▰ Spearman's rank correlation test
▰ Goldfeld-Quandt test
▰ Breusch-Pagan-Godfrey test
▰ White's general heteroscedasticity test
▰ Koenker-Bassett test
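A minimal sketch of the Breusch-Pagan-Godfrey and White tests with statsmodels (assumes a fitted OLS result named model, as in the earlier sketches):

```python
import statsmodels.stats.diagnostic as dg

# Both tests regress (functions of) the squared residuals on the regressors;
# the null hypothesis is homoscedasticity.
bp_lm, bp_pvalue, _, _ = dg.het_breuschpagan(model.resid, model.model.exog)
w_lm, w_pvalue, _, _ = dg.het_white(model.resid, model.model.exog)

print(f"Breusch-Pagan: LM = {bp_lm:.3f}, p = {bp_pvalue:.4f}")
print(f"White:         LM = {w_lm:.3f}, p = {w_pvalue:.4f}")
# A small p-value (say, below 0.05) rejects homoscedasticity.
```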
An Example
Factors that determine the abortion rate across the 50 states in the USA:
▰ State = name of the state (50 US states).
▰ ABR=Abortion rate, number of abortions per thousand women
aged 15–44 in 1992.
▰ Religion = the percent of a state’s population that is Catholic,
Southern Baptist, Evangelical, or Mormon.
▰ Price = the average price charged in 1993 in non-hospital facilities
for an abortion at 10 weeks with local anesthesia (weighted by the
number of abortions performed in 1992).

An Example
▰ State = name of the state (50 US states).
▰ Laws = a variable that takes the value of 1 if a state enforces a law
that restricts a minor’s access to abortion, 0 otherwise.
▰ Funds = a variable that takes the value of 1 if state funds are
available for use to pay for an abortion under most circumstances,
0 otherwise.
▰ Educ = the percent of a state’s population that is 25 years or older
with a high school degree (or equivalent), 1990.
▰ Income = disposable income per capita, 1992.
▰ Picket = the percentage of respondents that reported experiencing
picketing with physical contact or blocking of patients.

The Econometric Model

Abortioni = β1 + β2Reli + β3Pricei + β4Lawsi + β5Fundsi + β6Educi + β7Incomei + β8Picketi + ui

where i = 1, 2, …, 50.

We would expect ABR to be negatively related to religion, price, laws, picket, and education, and positively related to funds and income. We assume the error term satisfies the standard classical assumptions, including homoscedasticity.

Of course, we will do a post-estimation analysis to see whether this assumption holds in the present case.
AUTOCORRELATION

Introduction

Autocorrelation is a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals.

■ E(ui uj) ≠ 0 for i ≠ j
■ Arises chiefly in time series data
■ ρ (rho) is the coefficient of autocorrelation
■ ut = ρut−1 + εt
■ First-order autocorrelation
Patterns of positive and negative autocorrelation
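The patterns can be reproduced with a small simulation (a sketch; the ρ values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

def ar1(rho, n=200):
    """Simulate u(t) = rho * u(t-1) + eps(t)."""
    u = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u

fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(ar1(0.9))    # positive autocorrelation: long, smooth swings
axes[0].set_title("rho = 0.9 (positive autocorrelation)")
axes[1].plot(ar1(-0.9))   # negative autocorrelation: rapid sign flips
axes[1].set_title("rho = -0.9 (negative autocorrelation)")
plt.tight_layout()
plt.show()
```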
Causes

■ Inertia
■ Specification bias
■ Cobweb phenomenon
■ Lags or autoregression
■ Manipulation of data
■ Nonstationarity
■ Data transformation
Consequences

■ The residual variance is underestimated
■ R² is overestimated
■ F and t test results are misleading

Overall, the results are unreliable.
Tests to detect

■ Graphical method
■ Durbin-Watson d test
■ Runs test
■ Breusch-Godfrey (BG) test
Graphical method

1. A lag-1 plot: a plot of the residuals et versus their lagged values et−1

■ Vertical axis: et for all t
■ Horizontal axis: et−1 for all t

2. Time sequence plot

■ Vertical axis: residuals
■ Horizontal axis: time
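A sketch of both plots for the residuals of a fitted model (assumes a statsmodels result named model, as in the earlier sketches):

```python
import matplotlib.pyplot as plt

e = model.resid.values  # residual series

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(e[:-1], e[1:])   # lag-1 plot: e(t) against e(t-1)
ax1.set_xlabel("e(t-1)")
ax1.set_ylabel("e(t)")
ax2.plot(e)                  # time sequence plot of the residuals
ax2.set_xlabel("time")
ax2.set_ylabel("residual")
plt.tight_layout()
plt.show()
```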
Durbin-Watson d test

■ Null hypothesis: ρ = 0
  Alternate hypothesis: ρ ≠ 0
■ Formula:

  d = Σ(et − et−1)² / Σet²,  numerator summed over t = 2, …, n and denominator over t = 1, …, n

■ Interpretation of test results: since d ≈ 2(1 − ρ̂), a value near 2 suggests no first-order autocorrelation, a value near 0 suggests positive autocorrelation, and a value near 4 suggests negative autocorrelation; compare d with the tabulated critical values dL and dU.
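A sketch with statsmodels (the same statistic appears in the regression summary; assumes a fitted result named model):

```python
from statsmodels.stats.stattools import durbin_watson

d = durbin_watson(model.resid)
print(f"Durbin-Watson d = {d:.4f}")  # near 2: no first-order autocorrelation
```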


Runs test

■ Null hypothesis: the residuals are random (ρ = 0)
  Alternate hypothesis: ρ ≠ 0
■ Formula:

  E(R) = 2N1N2 / N + 1
  var(R) = 2N1N2(2N1N2 − N) / (N²(N − 1))

  N = N1 + N2 = total number of observations
  N1 = number of + symbols
  N2 = number of − symbols
  R = number of runs

■ Construction of a 95% confidence interval:

  E(R) ± 1.96 SE(R)

■ Interpretation of results:

  If R, the number of runs, lies within the confidence interval, we do not reject the null hypothesis. However, if R lies outside the confidence interval, we reject the null hypothesis.
Example

(---------) (+++++++++++++++++++++) (----------)
R = 3

Confidence interval: 95%

R = 3 lies outside the confidence interval. Therefore, we reject the null hypothesis and conclude that the residuals exhibit autocorrelation.
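A pure-Python sketch implementing the formulas above (not a library routine; apply it to a residual series such as model.resid):

```python
import numpy as np

def runs_test(resid, z_crit=1.96):
    """Runs test for randomness of residual signs, using the formulas above."""
    signs = np.sign(resid)
    signs = signs[signs != 0]                 # drop exact zeros
    n1 = int(np.sum(signs > 0))               # number of + residuals
    n2 = int(np.sum(signs < 0))               # number of - residuals
    n = n1 + n2
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))  # a run ends at each sign change
    mean_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    half_width = z_crit * np.sqrt(var_r)
    lo, hi = mean_r - half_width, mean_r + half_width
    reject = not (lo <= runs <= hi)
    return runs, (lo, hi), reject

# Example: runs, ci, reject = runs_test(model.resid.values)
```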
Remedial measures

■ Explore the possibility of mis-specification of the model
■ Use generalized least squares (GLS) to transform the original model
■ Use the Newey-West procedure to obtain corrected (HAC) standard errors; see the sketch below
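A sketch of the Newey-West correction with statsmodels, reusing the hypothetical data from the earlier sketches (the lag length is an illustrative choice):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pat_data.csv")  # hypothetical file from the example above

# Same OLS coefficients, but heteroscedasticity-and-autocorrelation-
# consistent (HAC) Newey-West standard errors.
model_hac = smf.ols("PAT ~ DEBT_EQUITY_RATIO + NET_SALES", data=df).fit(
    cov_type="HAC", cov_kwds={"maxlags": 4}  # illustrative lag length
)
print(model_hac.summary())
```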
Steps in EViews

■ Estimate the regression equation
■ Note the Durbin-Watson d statistic reported in the output window
■ Compare it with the critical d values
■ Open the residuals graph under 'View' to cross-check your result with the graphical method
THANK YOU!
