QBA Chapter-4 Regression-Models (1)
Regression Models
Learning Objectives
After completing this chapter, students will be able to:
4-3
Chapter Outline
4.1 Introduction
4.2 Scatter Diagrams
4.3 Simple Linear Regression
4.4 Measuring the Fit of the Regression Model
4.5 Using Computer Software for Regression
4.6 Assumptions of the Regression Model
4-4
Introduction
◼ Regression analysis is a very valuable
tool for a manager.
◼ Regression can be used to:
◼ Understand the relationship between
variables.
◼ Predict the value of one variable based on
another variable.
◼ Simple linear regression models have
only two variables.
◼ Multiple regression models have more than
one independent variable.
4-6
Introduction
◼ The variable to be predicted is called
the dependent variable.
◼ This is sometimes called the response
variable.
◼ The value of this variable depends on
the value of the independent variable.
◼ This is sometimes called the explanatory
or predictor variable.
Dependent variable = Independent variable + Independent variable
4-7
Scatter Diagram
4-8
Triple A Construction
◼ Triple A Construction renovates old homes.
◼ Managers have found that the dollar volume of
renovation work is dependent on the area
payroll.
TRIPLE A'S SALES     LOCAL PAYROLL
($100,000s)          ($100,000,000s)
6                    3
8                    4
9                    6
5                    4
4.5                  2
9.5                  5
Table 4.1
4-9
Triple A Construction
Scatter Diagram of Triple A Construction Company Data
Figure 4.1
4-10
Simple Linear Regression
◼ Regression models are used to test if there is a
relationship between variables.
◼ There is some random error that cannot be
predicted.
$Y = \beta_0 + \beta_1 X + \varepsilon$
where
Y = dependent variable (response)
X = independent variable (predictor or explanatory)
$\beta_0$ = intercept (value of Y when X = 0)
$\beta_1$ = slope of the regression line
$\varepsilon$ = random error
4-11
Simple Linear Regression
$\hat{Y} = b_0 + b_1 X$
where
$\hat{Y}$ = predicted value of Y
$b_0$ = estimate of $\beta_0$, based on sample results
$b_1$ = estimate of $\beta_1$, based on sample results
4-12
Triple A Construction
Y = Sales
X = Area payroll
The line chosen in Figure 4.1 is the one that
minimizes the sum of the squared errors.
4-13
Triple A Construction
For the simple linear regression model, the values
of the intercept and slope can be calculated using
the formulas below.
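$\hat{Y} = b_0 + b_1 X$
$\bar{X} = \frac{\sum X}{n}$ = average (mean) of X values
$\bar{Y} = \frac{\sum Y}{n}$ = average (mean) of Y values
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$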
4-14
Triple A Construction
Regression calculations for Triple A Construction
Y      X     (X – X̄)²          (X – X̄)(Y – Ȳ)
6      3     (3 – 4)² = 1      (3 – 4)(6 – 7) = 1
8      4     (4 – 4)² = 0      (4 – 4)(8 – 7) = 0
9      6     (6 – 4)² = 4      (6 – 4)(9 – 7) = 4
5      4     (4 – 4)² = 0      (4 – 4)(5 – 7) = 0
4.5    2     (2 – 4)² = 4      (2 – 4)(4.5 – 7) = 5
9.5    5     (5 – 4)² = 1      (5 – 4)(9.5 – 7) = 2.5
ΣY = 42   ΣX = 24   Σ(X – X̄)² = 10   Σ(X – X̄)(Y – Ȳ) = 12.5
Ȳ = 42/6 = 7   X̄ = 24/6 = 4
Table 4.2
4-15
Triple A Construction
Regression calculations
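$\bar{X} = \frac{\sum X}{6} = \frac{24}{6} = 4$
$\bar{Y} = \frac{\sum Y}{6} = \frac{42}{6} = 7$
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{12.5}{10} = 1.25$
$b_0 = \bar{Y} - b_1 \bar{X} = 7 - (1.25)(4) = 2$
Therefore $\hat{Y} = 2 + 1.25X$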
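The hand calculation above can be checked with a few lines of code. This is a minimal sketch in plain Python using the Table 4.1 data; the function name `fit_simple_regression` and the variable names are illustrative choices, not part of the chapter.

```python
# Triple A Construction data from Table 4.1
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, in $100,000s
payroll = [3, 4, 6, 4, 2, 5]     # X, in $100,000,000s


def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_simple_regression(payroll, sales)
print(b0, b1)  # 2.0 1.25
```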
4-16
Triple A Construction
Using the regression equation
sales = 2 + 1.25(payroll)
If the payroll next year is $600 million:
$\hat{Y} = 2 + 1.25(6) = 9.5$
That is, predicted sales of $950,000.
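As a small illustration of using the fitted line for prediction, the sketch below evaluates it at a payroll of 6 (that is, $600 million). The helper name `predict_sales` is hypothetical; the coefficients 2 and 1.25 are the ones computed for Triple A Construction.

```python
def predict_sales(payroll, b0=2.0, b1=1.25):
    """Predicted sales (in $100,000s) for a payroll given in $100,000,000s."""
    return b0 + b1 * payroll


y_hat = predict_sales(6)      # payroll of $600 million
dollars = y_hat * 100_000     # convert from $100,000s to dollars
print(y_hat, dollars)         # 9.5 950000.0
```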
4-17
Measuring the Fit
of the Regression Model
◼ Regression models can be developed
for any variables X and Y.
◼ How do we know the model is actually
helpful in predicting Y based on X?
◼ We could just take the average error, but
the positive and negative errors would
cancel each other out.
◼ Three measures of variability are:
◼ SST – Total variability about the mean.
◼ SSE – Variability about the regression line.
◼ SSR – Total variability that is explained by
the model.
4-18
Measuring the Fit
of the Regression Model
◼ Sum of the squares total:
$SST = \sum (Y - \bar{Y})^2$
◼ An important relationship:
SST = SSR + SSE
4-19
Measuring the Fit
of the Regression Model
Sum of Squares for Triple A Construction
Y    X    (Y – Ȳ)²         Ŷ                     (Y – Ŷ)²    (Ŷ – Ȳ)²
6    3    (6 – 7)² = 1     2 + 1.25(3) = 5.75    0.0625      1.563
4-20
Table 4.3
Measuring the Fit
of the Regression Model
◼ Sum of the squares total
$SST = \sum (Y - \bar{Y})^2$
◼ Sum of the squared error
$SSE = \sum e^2 = \sum (Y - \hat{Y})^2$
For Triple A Construction
SST = 22.5
SSE = 6.875
SSR = 15.625
◼ An important relationship
SST = SSR + SSE
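The three sums of squares can be reproduced from the data. The sketch below assumes the fitted line Ŷ = 2 + 1.25X from the earlier slides and takes SSR as the sum of squared deviations of the predictions from the mean of Y; variable names are illustrative.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
payroll = [3, 4, 6, 4, 2, 5]     # X
b0, b1 = 2.0, 1.25               # fitted intercept and slope

y_bar = sum(sales) / len(sales)
predicted = [b0 + b1 * x for x in payroll]

# Total variability about the mean
sst = sum((y - y_bar) ** 2 for y in sales)
# Variability about the regression line
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))
# Variability explained by the model
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)

print(sst, sse, ssr)  # 22.5 6.875 15.625
```

The printed values satisfy SST = SSR + SSE, as stated on the slide.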
4-21
Measuring the Fit
of the Regression Model
Deviations from the Regression Line and from the Mean
Figure 4.2
4-22
Coefficient of Determination
◼ The proportion of the variability in Y explained by
the regression equation is called the coefficient
of determination.
◼ The coefficient of determination is r².
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
◼ For Triple A Construction:
$r^2 = \frac{15.625}{22.5} = 0.6944$
◼ About 69% of the variability in Y is explained by
the equation based on payroll (X).
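A quick numeric check of both forms of the coefficient of determination, using the Triple A Construction sums of squares quoted in this chapter (a sketch; the variable names are illustrative):

```python
sst, ssr, sse = 22.5, 15.625, 6.875

r2_from_ssr = ssr / sst        # explained share of variability
r2_from_sse = 1 - sse / sst    # one minus the unexplained share

print(round(r2_from_ssr, 4), round(r2_from_sse, 4))  # 0.6944 0.6944
```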
4-23
Correlation Coefficient
◼ The correlation coefficient is an expression of the
strength of the linear relationship.
◼ It will always be between +1 and –1.
◼ The correlation coefficient is r.
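$r = \pm\sqrt{r^2}$ (the sign of r matches the sign of the slope $b_1$)
For Triple A Construction, where the slope is positive:
$r = \sqrt{0.6944} = 0.8333$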
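The correlation coefficient can also be computed directly from the data rather than from r². The sketch below uses the standard Pearson formula, which is not shown on the slide, and the function name `correlation` is illustrative.

```python
import math

sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
payroll = [3, 4, 6, 4, 2, 5]     # X


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(payroll, sales)
print(round(r, 4))  # 0.8333
```

Squaring this value gives back the coefficient of determination, about 0.6944.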
4-24
Four Values of the Correlation Coefficient
Figure 4.3: scatter plots of Y against X for four cases
(a) Perfect positive correlation: r = +1
(b) Positive correlation: 0 < r < 1
(c) No correlation: r = 0
(d) Perfect negative correlation: r = –1
4-25
Using Computer Software
for Regression
Accessing the Regression Option in Excel 2010
Program 4.1A
4-26
Using Computer Software
for Regression
Data Input for Regression in Excel
Program 4.1B
4-27
Using Computer Software
for Regression
Excel Output for the Triple A Construction Example
Program 4.1C
4-28
Assumptions of the Regression Model
4-29
Residual Plots
Pattern of Errors Indicating Randomness
Figure 4.4A (residual plot; vertical axis: Error)
4-30
Residual Plots
Nonconstant Error Variance
Figure 4.4B (residual plot; axes: Error against X)
4-31
Residual Plots
Errors Indicate Relationship Is Not Linear
Figure 4.4C (residual plot; vertical axis: Error)
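The errors shown in a residual plot are the differences between the observed and predicted values. As a sketch, the residuals for the Triple A Construction data can be computed as follows, assuming the fitted line Ŷ = 2 + 1.25X; plotting them against X would give a plot of the kind shown in Figures 4.4A to 4.4C.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
payroll = [3, 4, 6, 4, 2, 5]     # X
b0, b1 = 2.0, 1.25               # fitted intercept and slope

# Residual (error) for each observation: e = Y - Y-hat
residuals = [y - (b0 + b1 * x) for x, y in zip(payroll, sales)]

for x, e in zip(payroll, residuals):
    print(x, e)

# For a least-squares line with an intercept, the residuals sum to zero
print(sum(residuals))
```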
4-32
Estimating the Variance
$s^2 = MSE = \frac{SSE}{n - k - 1}$
where
n = number of observations in the sample
k = number of independent variables
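Applied to Triple A Construction (SSE = 6.875, n = 6 observations, k = 1 independent variable), the variance estimate works out as in the sketch below. The function name is illustrative, and the square root of MSE is included only as the usual standard-deviation companion to the variance.

```python
import math


def mean_squared_error(sse, n, k):
    """Estimate of the error variance: MSE = SSE / (n - k - 1)."""
    return sse / (n - k - 1)


mse = mean_squared_error(sse=6.875, n=6, k=1)
s = math.sqrt(mse)   # standard deviation of the regression

print(mse)           # 1.71875
print(round(s, 3))   # 1.311
```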
4-33
Estimating the Variance
4-34
Testing the Model for Significance
4-35
Testing the Model for Significance
4-37
Testing the Model for Significance
4-39
Steps in a Hypothesis Test
4. Make a decision using one of the following
methods:
a) Reject the null hypothesis if the test statistic is
greater than the F-value from the table in Appendix D.
Otherwise, do not reject the null hypothesis:
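Reject if $F_{calculated} > F_{\alpha, df_1, df_2}$
$df_1 = k$
$df_2 = n - k - 1$
Figure 4.5: F distribution with α = 0.05, showing the critical value F = 7.71 and the calculated F = 9.09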
4-43
Analysis of Variance (ANOVA) Table
             DF           SS     MS                       F          SIGNIFICANCE
Regression   k            SSR    MSR = SSR/k              MSR/MSE    P(F > MSR/MSE)
Residual     n - k - 1    SSE    MSE = SSE/(n - k - 1)
Total        n - 1        SST
4-44
Table 4.4
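The ANOVA quantities in Table 4.4 can be filled in for Triple A Construction with a short calculation. This sketch uses the sums of squares from earlier in the chapter and the critical value 7.71 shown in Figure 4.5; the variable names are illustrative.

```python
n, k = 6, 1                  # observations, independent variables
ssr, sse = 15.625, 6.875     # sums of squares for Triple A Construction

df_regression = k
df_residual = n - k - 1

msr = ssr / df_regression    # mean square regression
mse = sse / df_residual      # mean squared error
f_calculated = msr / mse

f_critical = 7.71            # F for alpha = 0.05 with df1 = 1, df2 = 4
reject_null = f_calculated > f_critical

print(round(f_calculated, 4), reject_null)  # 9.0909 True
```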
ANOVA for Triple A Construction
Program 4.1C
(partial)
P(F > 9.0909) = 0.0394
Because this probability is less than 0.05, we reject
the null hypothesis of no linear relationship and
conclude there is a linear relationship between X
and Y.
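Excel reports the significance level directly, but it can also be approximated without statistical software. The sketch below is one possible approach, not the method Excel uses: it relies on the identity P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), where I_x is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. It assumes df2 is at least 2; the function name `f_upper_tail` is hypothetical.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=2000):
    """Approximate P(F > f_value) for an F distribution with (df1, df2) degrees of freedom.

    Assumes df2 >= 2 so the integrand is finite at 0; steps must be even.
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)
    # Complete beta function B(a, b)
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Simpson's rule on [0, x]
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (total * h / 3) / beta


p_value = f_upper_tail(9.0909, df1=1, df2=4)
print(round(p_value, 4))
```

For the Triple A Construction example the result should agree with the 0.0394 reported in the Excel output.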
4-45
Multiple Regression Analysis
◼ Multiple regression models are
extensions to the simple linear model
and allow the creation of models with
more than one independent variable.
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
where
Y = dependent variable (response variable)
$X_i$ = ith independent variable (predictor or explanatory
variable)
$\beta_0$ = intercept (value of Y when all $X_i$ = 0)
$\beta_i$ = coefficient of the ith independent variable
k = number of independent variables
$\varepsilon$ = random error
4-46
Multiple Regression Analysis
To estimate these values, a sample is taken and the
following equation is developed:
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
where
$\hat{Y}$ = predicted value of Y
$b_0$ = sample intercept (and is an estimate of $\beta_0$)
$b_i$ = sample coefficient of the ith variable (and is
an estimate of $\beta_i$)
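To illustrate how the sample coefficients might be computed, the sketch below solves the least-squares normal equations with plain Python. Both function names are hypothetical, and the data are made up for the example (generated from Y = 1 + 2X1 + 3X2 with no error term) so that the expected coefficients are known; in practice, software such as Excel does this work.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """rows: list of [x1, ..., xk] values; returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]         # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: Y = 1 + 2*X1 + 3*X2 exactly
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefficients])  # approximately [1.0, 2.0, 3.0]
```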
4-47
Jenny Wilson Realty
Jenny Wilson wants to develop a model to determine
the suggested listing price for houses based on the
size and age of the house.
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
where
Ŷ = predicted value of dependent variable (selling
price)
b0 = Y intercept
X1 and X2 = value of the two independent variables (square
footage and age) respectively
b1 and b2 = slopes for X1 and X2 respectively
Program 4.2A
4-50
Jenny Wilson Realty
Output for the Jenny Wilson Realty Multiple
Regression Example
4-51
Program 4.2B
Evaluating Multiple Regression Models
4-52
Evaluating Multiple Regression Models
4-53
Jenny Wilson Realty
◼ The model is statistically significant
◼ The p-value for the F-test is 0.002.
◼ r² = 0.6719, so the model explains about 67% of
the variation in selling price (Y).
◼ But the F-test is for the entire model and we can’t
tell if one or both of the independent variables are
significant.
◼ By calculating the p-value of each variable, we can
assess the significance of the individual variables.
◼ Since the p-values for X1 (square footage) and X2
(age) are both less than the significance level of
0.05, both null hypotheses can be rejected.
4-54
Binary or Dummy Variables
4-55
Jenny Wilson Realty
◼ Jenny believes a better model can be developed if
she includes information about the condition of
the property.
X3 = 1 if house is in excellent condition
= 0 otherwise
X4 = 1 if house is in mint condition
= 0 otherwise
◼ Two dummy variables are used to describe the
three categories of condition.
◼ No variable is needed for “good” condition since
if both X3 and X4 = 0, the house must be in good
condition.
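The coding scheme for the condition variable can be written as a small function. This is a sketch; the function name `encode_condition` and the lowercase category labels are illustrative choices.

```python
def encode_condition(condition):
    """Return (X3, X4) dummy values for 'good', 'excellent', or 'mint' condition."""
    if condition not in ("good", "excellent", "mint"):
        raise ValueError(f"unknown condition: {condition!r}")
    x3 = 1 if condition == "excellent" else 0
    x4 = 1 if condition == "mint" else 0
    return x3, x4


for label in ("good", "excellent", "mint"):
    print(label, encode_condition(label))
# good (0, 0)
# excellent (1, 0)
# mint (0, 1)
```

"Good" acts as the baseline category, so the coefficients of X3 and X4 measure the change in price relative to a house in good condition.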
4-56
Jenny Wilson Realty
Input Screen for the Jenny Wilson Realty Example
with Dummy Variables
4-57
Program 4.3A
Jenny Wilson Realty
Output for the Jenny Wilson Realty Example with
Dummy Variables
4-58
Program 4.3B
Model Building
4-59
Model Building
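◼ The formula for r²:
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
◼ The formula for adjusted r²:
$\text{Adjusted } r^2 = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)}$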
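Adjusted r² is commonly defined as 1 − [SSE/(n − k − 1)] / [SST/(n − 1)], which penalizes adding independent variables that do not reduce SSE enough. The sketch below applies that definition to the Triple A Construction values from earlier in the chapter purely as an illustration; the function name is hypothetical.

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted r-squared for n observations and k independent variables."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Triple A Construction: SSE = 6.875, SST = 22.5, n = 6, k = 1
adj_r2 = adjusted_r_squared(sse=6.875, sst=22.5, n=6, k=1)
plain_r2 = 1 - 6.875 / 22.5

print(round(plain_r2, 4), round(adj_r2, 4))  # 0.6944 0.6181
```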
4-61
Nonlinear Regression
Figure: scatter plots contrasting a linear relationship with a nonlinear relationship
4-62
Colonel Motors
◼ Engineers at Colonel Motors want to use
regression analysis to improve fuel efficiency.
◼ They have been asked to study the impact of
weight on miles per gallon (MPG).
Figure 4.6A
4-64
Colonel Motors
Excel Output for Linear Regression Model with
MPG Data
Figure 4.6B
4-66
Colonel Motors
◼ The nonlinear model is a quadratic model.
◼ The easiest way to work with this model is to
develop a new variable.
$X_2 = (\text{weight})^2$
◼ This gives us a model that can be solved with
linear regression software:
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
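The variable transformation is simple to carry out before running the regression. In the sketch below the weights are made-up values for illustration only (the Colonel Motors data are not reproduced here), and `quadratic_features` is a hypothetical helper name. Each row holds X1 = weight and X2 = weight squared, ready to pass to any multiple regression routine.

```python
def quadratic_features(weight):
    """Return (X1, X2) = (weight, weight squared) for the quadratic model."""
    return weight, weight ** 2


# Hypothetical vehicle weights, for illustration only
weights = [2.0, 2.5, 3.0, 4.0]
design_rows = [quadratic_features(w) for w in weights]

for row in design_rows:
    print(row)
# (2.0, 4.0)
# (2.5, 6.25)
# (3.0, 9.0)
# (4.0, 16.0)
```

The model stays linear in the coefficients b0, b1, and b2, which is why ordinary linear regression software can estimate it.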
4-67
Colonel Motors
Program 4.5
A better model, with a smaller significance value (p-value)
for the F-test and a larger adjusted r² value
4-68
Cautions and Pitfalls