Multiple Linear Regression
Generalising the Simple Model to
Multiple Linear Regression
• Now we write
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt} + u_t, \quad t = 1, 2, \dots, T$$
• Where is $x_1$? It is the constant term. In fact the constant term is usually represented by a
column of ones of length T:
$$x_1 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$$
$\beta_1$ is the coefficient attached to the constant term (which we called $\alpha$ before).
Different Ways of Expressing
the Multiple Linear Regression Model
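• In matrix form, the model is written
$$y = X\beta + u$$
where $y$ is $T \times 1$
$X$ is $T \times k$
$\beta$ is $k \times 1$
$u$ is $T \times 1$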
Inside the Matrices of the
Multiple Linear Regression Model
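• For example, with $k = 2$ (a constant and one other regressor):
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{pmatrix}$$
with dimensions $T \times 1$, $T \times 2$, $2 \times 1$ and $T \times 1$ respectively.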
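The matrix layout above can be sketched in a few lines of Python. This is only an illustration: the data values, the coefficient vector `beta` and the zero disturbances are assumptions made up for the example, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data: T = 5 observations on a single regressor x2.
x2 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
T = x2.shape[0]

# The first column of X is the constant term x1, a column of ones of length T.
X = np.column_stack([np.ones(T), x2])   # T x 2

beta = np.array([1.0, 0.5])             # assumed (beta_1, beta_2), 2 x 1
u = np.zeros(T)                         # disturbances set to zero for clarity, T x 1

# y = X beta + u, one row per observation t.
y = X @ beta + u                        # T x 1

print(X.shape, beta.shape, u.shape, y.shape)
print(y)
```

Each row of `X` corresponds to one observation, so the product `X @ beta` reproduces the equation-by-equation form of the model.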
• To calculate the coefficients, just multiply the matrix $(X'X)^{-1}$ by the vector $X'y$ to obtain
$$\hat{\beta} = (X'X)^{-1} X'y$$
• To calculate the standard errors, we need an estimate of $\sigma^2$:
$$s^2 = \frac{RSS}{T-k} = \frac{10.96}{15-3} = 0.91$$
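The two steps above (coefficients from $(X'X)^{-1}X'y$, then an estimate of the error variance from RSS/(T − k)) can be sketched as follows. The function name `ols` and the simulated data are assumptions for illustration; the standard errors use the usual OLS result that the coefficient variances are the diagonal of $s^2 (X'X)^{-1}$. Only the final lines use the slide's own figures (RSS = 10.96, T = 15, k = 3).

```python
import numpy as np

def ols(X, y):
    """Return (beta_hat, s2, standard_errors) for the model y = X beta + u."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)          # (X'X)^{-1} X'y
    resid = y - X @ beta_hat
    rss = float(resid @ resid)              # residual sum of squares
    s2 = rss / (T - k)                      # estimate of sigma^2
    se = np.sqrt(s2 * np.diag(XtX_inv))     # coefficient standard errors
    return beta_hat, s2, se

# Simulated (not real) data with T = 15 observations and k = 3 parameters.
rng = np.random.default_rng(0)
T = 15
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

beta_hat, s2, se = ols(X, y)
print(beta_hat, s2, se)

# The variance estimate from the slide's figures.
s2_slide = 10.96 / (15 - 3)
print(round(s2_slide, 2))
```

Inverting $X'X$ explicitly mirrors the formula on the slide; in practice a least-squares solver such as `np.linalg.lstsq` is usually preferred for numerical stability.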
Calculating Parameter and Standard Error Estimates
for Multiple Regression Models: An Example (cont’d)
• The unrestricted regression is the one in which the coefficients are freely
determined by the data, as we have done before.
• Example
The general regression is
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t \qquad (1)$$
• We want to test the restriction that $\beta_3 + \beta_4 = 1$ (we have some hypothesis
from theory which suggests that this would be an interesting hypothesis to
study). The unrestricted regression is (1) above, but what is the restricted
regression?
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t \quad \text{s.t.} \quad \beta_3 + \beta_4 = 1$$
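One way to impose the restriction is by substitution: the restriction implies that the coefficient on $x_{4t}$ equals one minus the coefficient on $x_{3t}$, so the model can be rewritten as $(y_t - x_{4t}) = \beta_1 + \beta_2 x_{2t} + \beta_3 (x_{3t} - x_{4t}) + u_t$, which can be estimated by OLS on the transformed variables. The sketch below does this on simulated data; the coefficient values, sample size and helper name `rss` are assumptions for illustration only.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return float(resid @ resid)

# Simulated data in which the restriction beta_3 + beta_4 = 1 holds (0.3 + 0.7).
rng = np.random.default_rng(42)
T = 100
x2, x3, x4 = rng.normal(size=(3, T))
u = rng.normal(size=T)
y = 0.5 + 1.5 * x2 + 0.3 * x3 + 0.7 * x4 + u

ones = np.ones(T)

# Unrestricted regression (1): all four coefficients freely estimated.
X_u = np.column_stack([ones, x2, x3, x4])
urss = rss(X_u, y)

# Restricted regression: substitute beta_4 = 1 - beta_3 and regress
# (y - x4) on a constant, x2 and (x3 - x4).
X_r = np.column_stack([ones, x2, x3 - x4])
rrss = rss(X_r, y - x4)

print(urss, rrss)
```

Because the restricted model is a special case of the unrestricted one, the restricted residual sum of squares can never be smaller than the unrestricted one.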
$$\text{test statistic} = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}$$
where URSS = RSS from unrestricted regression
RRSS = RSS from restricted regression
m = number of restrictions
T = number of observations
k = number of regressors in the unrestricted regression, including a constant
(or the total number of parameters to be estimated).
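The formula translates directly into a small helper. The function name and the input values in the example call are hypothetical, chosen only to show the arithmetic.

```python
def f_test_statistic(rrss, urss, T, k, m):
    """F-test statistic: (RRSS - URSS) / URSS * (T - k) / m."""
    if urss <= 0:
        raise ValueError("URSS must be positive")
    if T <= k:
        raise ValueError("need more observations than parameters (T > k)")
    if m <= 0:
        raise ValueError("number of restrictions m must be positive")
    return (rrss - urss) / urss * (T - k) / m

# Hypothetical inputs: RRSS = 120, URSS = 100, T = 50, k = 4, m = 2.
f_stat = f_test_statistic(rrss=120.0, urss=100.0, T=50, k=4, m=2)
print(round(f_stat, 2))
```

The resulting value would then be compared with a critical value from the F-distribution with (m, T − k) degrees of freedom.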
The F-Distribution
• Examples :
H0: hypothesis                                        No. of restrictions, m
$\beta_1 + \beta_2 = 2$                               1
$\beta_2 = 1$ and $\beta_3 = -1$                      2
$\beta_2 = 0$, $\beta_3 = 0$ and $\beta_4 = 0$        3
• Note the form of the alternative hypothesis for all tests when more than one
restriction is involved: $H_1$: $\beta_2 \neq 0$, or $\beta_3 \neq 0$ or $\beta_4 \neq 0$
What we Cannot Test with Either an F or a t-test
• We cannot use this framework to test hypotheses which are not linear
or which are multiplicative,
e.g. $H_0$: $\beta_2 \beta_3 = 2$ or $H_0$: $\beta_2^2 = 1$ cannot be tested.
The Relationship between the t and the F-Distributions
• Any hypothesis which could be tested with a t-test could have been
tested using an F-test, but not the other way around.
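For a single restriction the two tests are tied together exactly: the F-test statistic equals the square of the t-ratio. The sketch below checks this numerically on simulated data; the data-generating values and the hypothesised value 0.5 are assumptions for illustration.

```python
import numpy as np

# Simulated data: y regressed on a constant, x2 and x3.
rng = np.random.default_rng(1)
T = 60
x2, x3 = rng.normal(size=(2, T))
y = 1.0 + 0.4 * x2 - 0.2 * x3 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2, x3])
k = X.shape[1]

# Unrestricted OLS estimates, URSS and standard errors.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
urss = float(resid @ resid)
s2 = urss / (T - k)
se = np.sqrt(s2 * np.diag(XtX_inv))

# t-test of the single restriction H0: beta_2 = 0.5.
b_star = 0.5
t_stat = (beta_hat[1] - b_star) / se[1]

# F-test of the same restriction: impose beta_2 = 0.5 by regressing
# (y - 0.5 * x2) on a constant and x3, then compare RRSS with URSS.
X_r = np.column_stack([np.ones(T), x3])
y_r = y - b_star * x2
b_r, *_ = np.linalg.lstsq(X_r, y_r, rcond=None)
resid_r = y_r - X_r @ b_r
rrss = float(resid_r @ resid_r)
f_stat = (rrss - urss) / urss * (T - k) / 1   # m = 1 restriction

print(t_stat ** 2, f_stat)
```

The reverse does not hold: a joint hypothesis with two or more restrictions has no single t-ratio to square, which is why such hypotheses need the F-test.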
• Solution:
Unit sensitivity implies $H_0$: $\beta_2 = 1$ and $\beta_3 = 1$. The unrestricted regression is the one
in the question. The restricted regression is $(y_t - x_{2t} - x_{3t}) = \beta_1 + \beta_4 x_{4t} + u_t$ or, letting
$z_t = y_t - x_{2t} - x_{3t}$, the restricted regression is $z_t = \beta_1 + \beta_4 x_{4t} + u_t$.
In the F-test formula, T = 144, k = 4, m = 2, RRSS = 436.1, URSS = 397.2.
F-test statistic = (436.1 − 397.2)/397.2 × (144 − 4)/2 = 6.86.
Conclusion: Reject H0.
Data Mining
• If data mining occurs, the true significance level will be greater than
the nominal significance level.
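A rough way to see the size of the effect: if n separate tests are each run at a nominal level of 5%, and we assume (as a simplification that will not hold exactly in practice) that the tests are independent and every null hypothesis is true, the chance of at least one spurious rejection is 1 − (1 − 0.05)^n. The function name below is hypothetical.

```python
def prob_at_least_one_false_rejection(n_tests, alpha=0.05):
    """Probability of at least one false rejection across n independent tests,
    each run at nominal significance level alpha, when every null is true."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1.0 - (1.0 - alpha) ** n_tests

for n in (1, 5, 20):
    print(n, round(prob_at_least_one_false_rejection(n), 3))
```

Under these assumptions the probability is about 23% with 5 tests and about 64% with 20 tests, well above the nominal 5%.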
Goodness of Fit Statistics
• We would like some measure of how well our regression model actually fits
the data.
• We have goodness of fit statistics to test this: i.e. how well the sample
regression function (srf) fits the data.
• The most common goodness of fit statistic is known as $R^2$. One way to define
$R^2$ is to say that it is the square of the correlation coefficient between $y$ and $\hat{y}$.
• For another explanation, recall that what we are interested in doing is
explaining the variability of $y$ about its mean value, $\bar{y}$, i.e. the total sum of
squares, TSS:
$$TSS = \sum_t (y_t - \bar{y})^2$$
• We can split the TSS into two parts, the part which we have explained (known
as the explained sum of squares, ESS) and the part which we did not explain
using the model (the RSS).
Defining R2
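• That is, TSS = ESS + RSS:
$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2$$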
• Our goodness of fit statistic is
$$R^2 = \frac{ESS}{TSS}$$
• But since TSS = ESS + RSS, we can also write
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$$
[Figure: plots of $y_t$ against $x_t$]
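The definitions of $R^2$ given above (the squared correlation between $y$ and the fitted values, ESS/TSS, and the equivalent form 1 − RSS/TSS that follows from TSS = ESS + RSS) can be checked against each other numerically. The sketch uses simulated data with assumed coefficients and a regression that includes a constant, which is what makes the three versions coincide.

```python
import numpy as np

# Simulated data: y regressed on a constant and one regressor.
rng = np.random.default_rng(7)
T = 50
x2 = rng.normal(size=T)
y = 2.0 + 1.5 * x2 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat          # fitted values
u_hat = y - y_hat             # residuals

tss = float(np.sum((y - y.mean()) ** 2))       # total sum of squares
ess = float(np.sum((y_hat - y.mean()) ** 2))   # explained sum of squares
rss = float(np.sum(u_hat ** 2))                # residual sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1.0 - rss / tss
r2_from_corr = float(np.corrcoef(y, y_hat)[0, 1] ** 2)

print(r2_from_ess, r2_from_rss, r2_from_corr)
```

If the constant were dropped from the regression, the decomposition TSS = ESS + RSS would no longer be guaranteed and the three numbers could differ.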
Problems with R2 as a Goodness of Fit Measure
3. $R^2$ quite often takes on values of 0.9 or higher for time series regressions.
Adjusted R2
$$\bar{R}^2 = 1 - \left[\frac{T-1}{T-k}\,(1 - R^2)\right]$$
• So if we add an extra regressor, k increases and unless $R^2$ increases by
a more than offsetting amount, $\bar{R}^2$ will actually fall.
• There are still problems with the criterion: it is a “soft” rule.
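The trade-off can be made concrete with a short calculation. The function name and the figures used (T = 30, and $R^2$ rising from 0.500 to 0.505 when a fourth parameter is added) are hypothetical, chosen only to show a case where the adjusted measure falls.

```python
def adjusted_r2(r2, T, k):
    """Adjusted R^2: 1 - (T - 1) / (T - k) * (1 - R^2)."""
    if T <= k:
        raise ValueError("need more observations than parameters (T > k)")
    return 1.0 - (T - 1) / (T - k) * (1.0 - r2)

# Hypothetical case: adding one regressor raises R^2 only slightly.
before = adjusted_r2(r2=0.500, T=30, k=3)
after = adjusted_r2(r2=0.505, T=30, k=4)

print(round(before, 4), round(after, 4))
```

Here the small gain in $R^2$ does not offset the loss of a degree of freedom, so the adjusted value is lower after the regressor is added.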
A Regression Example:
Hedonic House Pricing Models
• Hedonic models are used to value real assets, especially housing, and view the asset as
representing a bundle of characteristics.
• Des Rosiers and Thériault (1996) consider the effect of various amenities on rental values
for buildings and apartments in 5 sub-markets in the Quebec area of Canada.
• The rental value in Canadian Dollars per month (the dependent variable) is a function of
9 to 14 variables (depending on the area under consideration). The paper employs 1990
data, and for the Quebec City region, there are 13,378 observations, and the 12
explanatory variables are:
LnAGE - log of the apparent age of the property
NBROOMS - number of bedrooms
AREABYRM - area per room (in square metres)
ELEVATOR - a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT - a dummy variable = 1 if the unit is located in a basement; 0 otherwise
Hedonic House Pricing Models:
Variable Definitions
• All of the hypothesis tests conducted thus far have been in the context
of “nested” models.