Solution_chapter 14
1. a.
c. Many different straight lines can be drawn to provide a linear approximation of the
line that “best” represents the relationship according to the least squares criterion.
d.
\(b_0 = \bar{y} - b_1\bar{x} = 8 - (2.6)(3) = 0.2\)
e.
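The intercept computation in part d of problem 1, b0 = ȳ − b1·x̄, can be checked numerically. The sketch below is only an illustration: the five (x, y) pairs are an assumption, chosen because they reproduce x̄ = 3, ȳ = 8, and b1 = 2.6, and they are not quoted from the solution. The function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Assumed illustrative data (not taken from the solution text)
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]
b0, b1 = least_squares(x, y)
print(f"b1 = {b1:.1f}, b0 = {b0:.1f}")
```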
© 2019 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
2. a.
c. Many different straight lines can be drawn to provide a linear approximation of the
line that “best” represents the relationship according to the least squares criterion.
d.
e.
3. a.
b.
c.
4. a.
b. There appears to be a positive linear relationship between the percentage of women
working in the five companies (x) and the percentage of management jobs held by
c. Many different straight lines can be drawn to provide a linear approximation of the
line that “best” represents the relationship according to the least squares criterion.
d.
e.
5. a.
b. There appears to be a negative relationship between line speed (feet per minute) and
c. Let x = line speed (feet per minute) and y = number of defective parts.
d.
6. a.
number of passing yards per attempt and y = the percentage of games won by the
team.
c.
d. The slope of the estimated regression line is approximately 17.2. So, for every
increase of one yard in the average number of passing yards per attempt, the percentage of
e. With an average number of passing yards per attempt of 6.2, the predicted percentage
of games won is \(\hat{y} = -70.391 + 17.175(6.2) \approx 36\%\). With a record of seven wins and
nine losses, the percentage of Kansas City Chiefs wins is 43.8 or approximately 44%.
Considering the small data size, the prediction made using the estimated regression
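The comparison in part e can be reproduced directly from the estimated regression equation. This is a minimal sketch using only the coefficients and the win-loss record quoted above; the function name `predict_win_pct` is hypothetical.

```python
def predict_win_pct(yards_per_attempt, b0=-70.391, b1=17.175):
    """Predicted percentage of games won from the estimated regression equation."""
    return b0 + b1 * yards_per_attempt

predicted = predict_win_pct(6.2)
actual = 7 / (7 + 9) * 100  # seven wins and nine losses
print(f"predicted = {predicted:.1f}%, actual = {actual:.1f}%")
print(f"difference = {actual - predicted:.1f} percentage points")
```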
7. a.
[Scatter diagram: Annual Sales ($1000s) versus Years of Experience]
b. Let x = years of experience and y = annual sales ($1000s)
\(\hat{y} = 80 + 4x\)
8. a.
[Scatter diagram: Satisfaction versus Speed of Execution]
c.
d. The slope of the estimated regression line is approximately .9077. So, a one unit
increase in the speed of execution rating will increase the overall satisfaction rating
by approximately .9 points.
e. The average speed of execution rating for the other brokerage firms is 3.4. Using this
as the new value of x for Zecco.com, we can use the estimated regression equation
Thus, an estimate of the overall satisfaction rating when x = 3.4 is approximately 3.3.
9. a.
[Scatter diagram: Landscaping Expenditures ($000) versus Home Value ($000)]
b. The scatter diagram indicates a positive linear relationship between x = Home Value
($ thousands) and y = landscaping expenditures ($ thousands).
c.
y = .0214 x + 6.3091
e. or $18,614
10. a.
[Scatter diagram: Price ($) versus Age (years)]
b. The scatter diagram indicates a positive linear relationship between x = age of wine
and y = price of a 750-ml bottle of wine. In other words, the price of the wine
c.
d. The slope of the estimated regression line is approximately 6.95. So, for every
11. a.
[Scatter diagram: Overall Score versus Price ($)]
b. The scatter diagram indicates a positive linear relationship between x = price ($) and
y = overall score.
c.
d. The slope of .0212 means that spending an additional $100 in price will increase the
overall score by approximately two points.
12. a.
[Scatter diagram: % Return Coca-Cola on the vertical axis]
percentage return of the S&P 500 and y = percentage return for Coca-Cola.
c.
d. A 1-percent increase in the percentage return of the S&P 500 will result in a .529
e. The beta of .529 for Coca-Cola differs somewhat from the beta of .82 reported by
Yahoo Finance. This is likely the result of differences in the period over which the
data were collected and the amount of data used to calculate the beta. Note: Yahoo
13. a.
[Scatter diagram: Reasonable Amount of Itemized Deductions ($1000s) on the vertical axis]
b. \(\hat{y} = 4.68 + 0.16x\)
c. \(\hat{y} = 4.68 + 0.16x = 4.68 + 0.16(52.5) = 13.08\), or approximately $13,080. The agent's request for
14. a.
[Scatter diagram: Number of Days Absent versus Distance to Work (miles)]
The scatter diagram indicates a negative linear relationship between x = distance to work
b.
six days.
15. a. The estimated regression equation and the mean for the dependent variable are:
The sum of squares resulting from error and the total sum of squares are:
Thus, SSR = SST – SSE = 80 – 12.4 = 67.6
The least squares line provided a good fit; 84.5% of the variability in y has been
c.
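The sums of squares in problem 15 lead straight to the coefficient of determination. The sketch below uses only the SSE and SST values quoted above; the function name `fit_summary` is hypothetical.

```python
def fit_summary(sse, sst):
    """Return (SSR, r-squared) given the error and total sums of squares."""
    ssr = sst - sse   # sum of squares due to regression
    r2 = ssr / sst    # proportion of variability in y explained by the line
    return ssr, r2

ssr, r2 = fit_summary(sse=12.4, sst=80.0)
print(f"SSR = {ssr:.1f}, r^2 = {r2:.3f}")
```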
16. a. The estimated regression equation and the mean for the dependent variable are:
The sum of squares due to error and the total sum of squares are
The least squares line provided an excellent fit; 87.6% of the variability in y has been
c.
Note: the sign for r is negative because the slope of the estimated regression equation is
negative.
(b1 = –3)
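The sign rule in the note can be written as a one-line computation: the sample correlation coefficient is the square root of r² carrying the sign of the slope. This sketch uses the 87.6% figure and b1 = –3 from problem 16; the function name is hypothetical.

```python
import math

def correlation_from_r2(r2, b1):
    """Sample correlation coefficient: sqrt(r^2) with the sign of the slope b1."""
    return math.copysign(math.sqrt(r2), b1)

r = correlation_from_r2(0.876, -3)
print(f"r = {r:.3f}")
```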
17. The estimated regression equation and the mean for the dependent variable are:
The sum of squares due to error and the total sum of squares are:
We see that 54.7% of the variability in y has been explained by the least squares line.
18. a.
b.
c.
19. a. The estimated regression equation and the mean for the dependent variable are:
\(\hat{y} = 80 + 4x\)   \(\bar{y} = 108\)
The sum of squares resulting from error and the total sum of squares are:
We see that 93% of the variability in y has been explained by the least squares line.
c.
20. a.
SSR = SST – SSE = 52,120,800 – 7,102,922.54 = 45,017,877
c.
Thus, an estimate of the price for a bike that weighs 15 pounds is $6,989.
21. a.
\(\hat{y} = 1246.67 + 7.6x\)
b. $7.60
c. The sum of squares resulting from error and the total sum of squares are:
We see that 95.87% of the variability in y has been explained by the estimated regression
equation.
b. The estimated regression equation provided an especially good fit; approximately
90% of the variability in the dependent variable was explained by the linear
c.
c.
d.
Using t table (3 degrees of freedom), area in tail is between .01 and .025.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F   p-value
Total 80.0 4
b.
c.
d.
Using t table (3 degrees of freedom), area in tail is less than .01; p-value is less
than .02.
than .025.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F   p-value
Total 1850 4
b.
Using t table (3 degrees of freedom), area in tail is between .05 and .10.
related.
related.
Using t table (4 degrees of freedom), area in tail is between .005 and .01.
c.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square
Total 1,800 5
27. a.
The scatter diagram suggests a positive linear relationship between the two
variables.
c. SSE = SST = = 14,706,900,000
Thus,
28. The sum of squares due to error and the total sum of squares are:
Thus,
We can use either the t test or F test to determine whether speed of execution and overall
Using t table (9 degrees of freedom), the area in the tail is less than .005; the p-
Because we can reject H0: β1 = 0, we conclude that speed of execution and overall
Using the F table (1 degree of freedom numerator and 9 denominator), the p-value
Because we can reject H0: β1 = 0, we conclude that speed of execution and overall
Total 3.5800 10
29. SSE = 233,333.33   SST = 5,648,333.33
Thus,
Total 5,648,333.33 5
Using the F table (1 degree of freedom numerator and 4 denominator), the p-value
Because p-value ≤ α, we reject H0: β1 = 0. Production volume and total cost are
related.
Thus,
= 56.655
Using t table (4 degrees of freedom), area in tail is less than .005.
Using the F table (1 degree of freedom numerator and 8 denominator), the p-value
32. a. s = 2.033
\(10.6 \pm 3.182(1.11) = 10.6 \pm 3.53\)
or 7.07 to 14.13.
c.
d.
or 3.22 to 17.98.
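The two intervals in problem 32 differ only in the extra "1 +" under the square root of the prediction-interval standard error. The sketch below recomputes both intervals. The data set and the value x* = 4 are assumptions, chosen because they reproduce s = 2.033 and a point estimate of 10.6; they are not quoted from the solution. The critical value 3.182 is the t value used above. Because the sketch carries unrounded intermediate values, the confidence limits differ from the printed ones in the second decimal place.

```python
import math

def interval_estimates(x, y, x_star, t_crit):
    """Return s, y-hat at x_star, the confidence interval, and the prediction interval."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Standard error of the estimate: s = sqrt(SSE / (n - 2))
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_star = b0 + b1 * x_star
    # Term shared by both standard errors: 1/n + (x* - x_bar)^2 / Sxx
    h = 1 / n + (x_star - x_bar) ** 2 / sxx
    ci_margin = t_crit * s * math.sqrt(h)      # mean value of y at x*
    pi_margin = t_crit * s * math.sqrt(1 + h)  # individual value of y at x*
    ci = (y_star - ci_margin, y_star + ci_margin)
    pi = (y_star - pi_margin, y_star + pi_margin)
    return s, y_star, ci, pi

# Assumed illustrative data (not taken from the solution text)
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]
s, y_star, ci, pi = interval_estimates(x, y, x_star=4, t_crit=3.182)
print(f"s = {s:.3f}, y-hat = {y_star:.1f}")
print(f"confidence interval: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"prediction interval: {pi[0]:.2f} to {pi[1]:.2f}")
```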
33. a. s = 8.7560
b.
or 30.07 to 57.93.
c.
d.
\(44 \pm 3.182(9.7895) = 44 \pm 31.15\)
or 12.85 to 75.15.
34. s = 6.5141
or 8.65 to 28.15.
or -4.50 to 41.30.
The two intervals are different because there is more variability associated with
35. a.
b.
or 21.32 to 40.28.
c.
or 6.47 to 55.13.
d. As expected, the prediction interval is much wider than the confidence interval. This
is because it is more difficult to predict the waiting time for an individual customer
arriving with three people in line than it is to estimate the mean waiting time for a
36. a.
b.
c. As expected, the prediction interval is much wider than the confidence interval. This
is because it is more difficult to predict annual sales for one new salesperson with
nine years of experience than it is to estimate the mean annual sales for all
salespersons with nine years of experience.
37. a.
\(s^2 = 1.88\)   \(s = 1.37\)
b. = 1.47
d. Any deductions exceeding the $16,860 upper limit could suggest an audit.
b.
or $3,815.10 to $6,278.24.
c. Based on one month, $6,000 is not out of line because $3,815.10 to $6,278.24 is the
prediction interval. However, a sequence of five to seven months with consistently
or $94.84 to $124.08.
c.
or $110.69 to $188.85.
40. a. 9
b. \(\hat{y} = 20.0 + 7.21x\)
c. 1.3626
Using Excel, the p-value corresponding to F = 28.00 is .0011.
b.
Using the t table (8 degrees of freedom), the area in the tail is less
than .005.
b. 30
c. F = MSR / MSE = 6828.6/82.1 = 83.17
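The F statistic in part c is the ratio of the two mean squares. As a quick check, the sketch below recomputes it and also takes its square root, because in simple linear regression the F statistic equals the square of the t statistic for the slope. The function name is hypothetical.

```python
import math

def f_statistic(msr, mse):
    """F = MSR / MSE for the test of a significant relationship."""
    return msr / mse

f = f_statistic(msr=6828.6, mse=82.1)
t_magnitude = math.sqrt(f)  # |t| for the slope, since F = t^2 with one predictor
print(f"F = {f:.2f}, |t| = {t_magnitude:.2f}")
```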
43. a.
[Scatter diagram: Total Batch Time versus Quantity]
\(\hat{y} = 0.7631(\text{Quantity}) + 117.5419\)
The intercept is the estimate for the setup time (117.5 minutes), and the slope (.7631
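Reading the intercept as a fixed setup time and the slope as additional time per unit gives a simple way to use the fitted equation. In this sketch the per-unit reading of the slope is an assumption (the sentence above is cut off), and the batch size of 300 is a hypothetical input, not a value from the problem.

```python
def batch_time(quantity, setup_minutes=117.5419, minutes_per_unit=0.7631):
    """Estimated total batch time: fixed setup time plus time per unit produced."""
    return setup_minutes + minutes_per_unit * quantity

setup_only = batch_time(0)      # the intercept: time with nothing produced
example = batch_time(300)       # hypothetical batch size
print(f"setup = {setup_only:.1f} minutes, batch of 300 = {example:.1f} minutes")
```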
[Scatter diagram: Price ($) versus Weight (oz)]
b. There appears to be a negative linear relationship between the two variables. The
Analysis of Variance
Total 17 597626
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
R Large residual
45. a.
b. The residuals are 3.48, –2.47, –4.83, –1.6, and 5.22.
c.
[Residual plot: Residuals on the vertical axis]
With only five observations, it is difficult to determine if the assumptions are satisfied.
However, the plot does suggest curvature in the residuals that would indicate that the
error term assumptions are not satisfied. The scatter diagram for these data also indicates
d.
e. The standardized residual plot has the same shape as the original residual plot. The
curvature observed indicates that the assumptions regarding the error term may not be
satisfied.
46. a.
b.
[Residual plot: Residuals on the vertical axis]
The assumption that the variance is the same for all values of x is
Because p-value = .05, we conclude that the two variables are related.
c.
[Residual plot: Residuals versus Predicted Values]
d. The residual plot leads us to question the assumption of a linear relationship between
x and y. Even though the relationship is significant at the .05 level of significance, it
48. a.
[Residual plot: Residuals on the vertical axis]
49. a. A portion of the output follows.
Regression Statistics
Multiple R 0.8092
R Square 0.6549
Observations 61.0000
ANOVA
df SS MS F Significance F
Total 60 9,142.6756
b.
[Residual plot: Residual versus Density]
c. The residual plot leads us to question the assumption of constant variance of the error
terms. The plot indicates that the absolute value of the residuals is larger for larger
The first data point (y= 145, x=135) has a large standardized residual.
b.
The standardized residual plot indicates that the observation x = 135, y = 145 may be an
c. The scatter diagram follows.
[Scatter diagram: y on the vertical axis]
The scatter diagram also indicates that the observation x = 135, y = 145 may be an
outlier; the implication is that for simple linear regression an outlier can be identified by
Analysis of Variance
Total 7 101.500
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
y = 13.00 + 0.425 x
R Large residual
X Unusual X
The standardized residuals are: –1.00, –.40, .01, –.48, .25, .65, 2.00, –2.16.
The last two observations in the data set appear to be outliers because the
standardized residuals for these observations are 2.00 and –2.16, respectively.
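The outlier screen described above amounts to comparing each standardized residual with a cutoff. In the sketch below the cutoff of 2 in absolute value is an assumption, inferred from the fact that 2.00 and –2.16 are the values singled out; the function name is hypothetical.

```python
def flag_large_residuals(std_residuals, cutoff=2.0):
    """Return (observation number, value) pairs whose magnitude meets the cutoff."""
    return [(i, z) for i, z in enumerate(std_residuals, start=1) if abs(z) >= cutoff]

# Standardized residuals as listed in the solution
std_residuals = [-1.00, -0.40, 0.01, -0.48, 0.25, 0.65, 2.00, -2.16]
flagged = flag_large_residuals(std_residuals)
print(flagged)
```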
Here we identify an observation as having high leverage if hi > 6/n; for these data, 6/n =
6/8 = .75. Because the leverage for the observation x = 22, y = 19 is .76, we would
an influential observation.
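The leverage rule above can be checked with the usual simple-regression formula h_i = 1/n + (x_i − x̄)²/Σ(x_i − x̄)². The x values in this sketch are an assumption, chosen because they reproduce a leverage of about .76 at x = 22 with n = 8; they are not quoted from the solution. The function name is hypothetical.

```python
def leverages(x):
    """Leverage of each observation in a simple linear regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Assumed illustrative x values (not taken from the solution text)
x = [4, 5, 7, 8, 10, 12, 12, 22]
h = leverages(x)
threshold = 6 / len(x)  # high-leverage rule used above: h_i > 6/n
high_leverage = [(xi, round(hi, 2)) for xi, hi in zip(x, h) if hi > threshold]
print(f"threshold = {threshold:.2f}, high leverage: {high_leverage}")
```

Only the x values enter the leverage calculation, which is why a point can have high leverage regardless of its y value.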
c.
[Scatter diagram: y on the vertical axis]
observation.
52. a.
[Scatter diagram: Program Expenses ($) versus Fund-raising Expenses (%)]
The scatter diagram does indicate potential influential observations. For example, the
22.2% fund-raising expense for the American Cancer Society and the 16.9% fund-raising
expense for the St. Jude Children’s Research Hospital look like they may each have a
large influence on the slope of the estimated regression line. And with a fund-raising
expense of 2.6%, the percentage spent on programs and services by the Smithsonian
Institution (73.7%) seems to be somewhat lower than would be expected; thus, this
Analysis of Variance
Total 9 855.2
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Fits and Diagnostics for Unusual Observations
R Large residual
X Unusual X
c. The slope of the estimated regression equation is –0.917. Thus, for every 1% increase
in the amount spent on fund-raising, the percentage spent on program expenses will
decrease by .917%; in other words, just a little less than 1%. The negative slope and
d. The output in part b indicates that there are two unusual observations:
standardized residual.
low side compared to most of the other supersized charities, the percentage spent
on program expenses appears to be much lower than one would expect. It appears
that the Smithsonian’s administrative expenses are too high. But thinking about
the expenses of running a large museum like the Smithsonian, the percentage
costs for a museum are higher than for some other types of organizations. The
especially large value of fund-raising expenses for the American Cancer Society
suggests that this observation has a large influence on the estimated regression
equation. The following output shows the results if this observation is deleted
The y-intercept has changed slightly, but the slope has changed from –.917
to –1.00.
53. a.
[Scatter diagram: Shopping Time (Minutes) versus Arrival Time (Minutes before 6:00 p.m.)]
b. There appears to be a positive relationship between the two variables, but observation
c.
d. In part c, we have highlighted observation 32, which has a standardized residual less
e. Looking at the scatter diagram in part a, observation 32 probably will have a lot of
drop the observation from the data set and fit a new estimated regression equation.
The output we obtained follows.
Note that the slope of the estimated regression equation is now .2829 as compared to a
value of .2543 when this observation is included. Thus, we see that this observation has
an impact on the value of the slope of the fitted line and hence we would say that it is an
influential observation. Also note that the R2 value has increased from .65 to .75,
indicating that observation 32 was an outlier. If possible, Kroger should analyze the
dissimilar from the other observations (or if this observation suffers from data recording
error), then removing it from the analysis may be reasonable. Otherwise, Kroger should
be wary of removing the observation (and thereby inflating the accuracy of the model)
but instead investigate the possibility of additional data collection to reduce its influence.
54. a.
[Scatter diagram: Revenue ($ millions) on the horizontal axis]
The scatter diagram does indicate potential outliers or influential observations (or both).
For example, the New York Yankees have both the highest revenue and value and appear
to be an influential observation. The Los Angeles Dodgers have the second highest value
Regression Statistics
Multiple R 0.9062
R Square 0.8211
Observations 30
ANOVA
df SS MS F Significance F
Total 29 4296009.367
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Thus, the estimated regression equation that can be used to predict the team’s value given the value of annual
c. The standardized residual for the Los Angeles Dodgers is 4.7, so this observation should be treated as an outlier. To determine if the New
York Yankees point is an influential observation, we can remove the observation and compute a new estimated regression
equation. The results show that the estimated regression equation is \(\hat{y}\) = –449.061 + 5.2122 revenue. The following two
scatter diagrams illustrate the small change in the estimated regression equation after removing the observation for the New
York Yankees. These diagrams show that the effect of the New York Yankees observation on the regression results is not
that dramatic.
Scatter Diagram Including the New York Yankees Observation
55. No. Regression or correlation analysis can never prove that two variables are causally
related.
56. The estimate of a mean value is an estimate of the average of all y values associated with
the same x. The estimate of an individual y value is an estimate of only one of the y
57. The purpose of testing whether β1 = 0 is to determine whether or not there is a significant
relationship between x and y. However, rejecting β1 = 0 does not necessarily imply a good
58. a.
[Scatter diagram: S&P 500 versus DJIA]
Analysis of Variance
Total 14 23346
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
c. Using the F test, the p-value corresponding to F = 239.89 is .000. Because the p-value
d. With R-Sq = 94.9%, the estimated regression equation provided an excellent fit.
e.
f. The DJIA is not that far beyond the range of the data. With the excellent fit provided
by the estimated regression equation, we should not be too concerned about using the
59. a.
[Scatter diagram]
The scatter diagram suggests that there is a linear relationship between size and
approximately $171,166.
e. The estimated regression equation should provide a good estimate because r2 = 0.897.
f. This estimated equation might not work well for other cities. Housing markets are
also driven by other factors that influence demand for housing such as job market and
quality-of-life factors. For example, because of the existence of high-tech jobs and its
proximity to the ocean, Seattle, Washington, has houses that are quite different from
60. a.
The scatter diagram indicates a positive linear relationship between the two variables.
Online universities with higher retention rates tend to have higher graduation rates.
Analysis of Variance
Total 28 2725.3
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
R Large residual
X Unusual X
d. The estimated regression equation is able to explain 44.9% of the variability in the
graduation rate based on the linear relationship with the retention rate. It is not a great
fit, but the fit is reasonably good given the type of data.
standardized residual. With a retention rate of 51% it does appear that the graduation
rate of 25% is low compared to the results for other online universities. The president
of South University should be concerned after looking at the data. Using the
estimated regression equation, we estimate that the graduation rate at South
x value gives it large influence. With a retention rate of only 4%, the president of the
Analysis of Variance
Total 9 1004.5
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Usage 30
Fit SE Fit 95% CI 95% PI
H0: β1 = 0.
Analysis of Variance
Total 5 34.000
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Speed 50
b. Because the p-value corresponding to F = 11.33 is .028 < α = .05, the relationship is
significant.
c. \(r^2\) = .739; a good fit. The least squares line explained 73.9% of the variability in the
number of defects.
d. Using the output in part a, the 95% confidence interval is 12.294 to 17.2712.
63. a.
[Scatter diagram: Days versus Distance]
There appears to be a negative linear relationship between distance to work and
Analysis of Variance
Total 9 46.000
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Distance 5
c. Because the p-value corresponding to F = 19.67 is .002 < α = .05, we reject H0: β1 = 0.
There is a significant relationship between the number of days absent and the
distance to work.
e. The 95% confidence interval is 5.19502 to 7.5586 or approximately 5.2 to 7.6 days.
Analysis of Variance
Total 9 357650
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Age 4
b. Because the p-value corresponding to F = 54.75 is .000 < α = .05, we reject H0: β1 =
Analysis of Variance
Total 9 3702.5
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Hours 95
b. Because the p-value corresponding to F = 57.42 is .000 < α = .05, we reject H0: β1 =
c. 84.65 points
Analysis of Variance
Total 9 107.04
Model Summary
S R-sq R-sq(adj)
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Regression Equation
b. Because the p-value = 0.029 is less than α = .05, the relationship is significant.
c. r2 = .470. The least squares line does not provide an especially good fit.
Analysis of Variance
Total 19 1.0020
Model Summary
S R-sq R-sq(adj)
Coefficients
Adjusted_Gross Income 0.000039 0.000017 2.23 0.038 1.00
Regression Equation
Variable Setting
b. Because the p-value = 0.038 is less than α = .05, the relationship is significant.
c. r2 = .217. The least squares line does not provide a good fit.
68. a.
[Scatter diagram: Price ($1000s) versus Miles (1000s)]
b. There appears to be a negative relationship between the two variables that can be
approximated by a straight line. An argument could also be made that the relationship
is perhaps curvilinear because at some point a car has so many miles that its value
c. The output follows.
Analysis of Variance
Total 18 87.547
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
e. r² = .5387; a reasonably good fit considering that the condition of the car is also an
f. The slope of the estimated regression equation is –.0588. Thus, a one-unit increase in
the value of x coincides with a decrease in the value of y equal to .0588. Because the
data were recorded in thousands, every additional 1,000 miles on the car’s odometer
g. The predicted price for a 2007 Camry with 60,000 miles is ŷ = 16.47 – .0588(60) =
12.942 or $12,942. Because of other factors such as condition and whether the seller
is a private party or a dealer, this is probably not the price you would offer for the car.
But it should be a good starting point in figuring out what to offer the seller.
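The arithmetic in part g can be written as a small function. This is a minimal sketch using the intercept and slope quoted in part g; the constant and function names are illustrative, and both variables are measured in thousands as in the exercise.

```python
# Coefficients quoted in part g (price in $1000s, miles in 1000s).
INTERCEPT = 16.47
SLOPE = -0.0588

def predicted_price_thousands(miles_thousands):
    """Point estimate of price ($1000s) from the estimated regression equation."""
    return INTERCEPT + SLOPE * miles_thousands

price = predicted_price_thousands(60)   # 60,000 miles
print(f"${price * 1000:,.0f}")          # $12,942
```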
Case Solutions
From the descriptive statistics we see that six companies had a higher mean monthly return
than the market (as measured by the S&P 500): Exxon Mobil, Caterpillar, McDonald’s,
Sandisk, Qualcomm, and Procter & Gamble. Microsoft and Johnson & Johnson had lower
Using the standard deviation as a measure of volatility, Sandisk was the most volatile
stock with a standard deviation of .1954. The stocks of Johnson & Johnson and P & G
exhibit less volatility than the other individual stocks, but all of the individual stocks are
more volatile than the market as a whole. The diversification embodied in the S&P 500
b. The estimated regression equation relating each of the individual stocks to the S&P 500 is
The betas (slope of estimated regression equation) for the individual stocks can be obtained
Company Beta
Microsoft .458
Caterpillar 1.493
McDonald’s 1.503
Sandisk 2.600
Qualcomm 1.414
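Each beta in the table is the least squares slope from regressing a stock's monthly return on the S&P 500 return. The following sketch shows that calculation; the return series are hypothetical values made up for illustration (the case data are not reproduced here), and the function name is an assumption.

```python
def least_squares_slope(x, y):
    """Slope b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical monthly market returns, not taken from the case data.
market = [0.02, -0.01, 0.03, 0.00, -0.04]
# Hypothetical stock constructed to move 2.6 times as much as the market.
stock = [2.6 * r for r in market]

beta = least_squares_slope(market, stock)
print(round(beta, 3))  # 2.6
```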
The beta for the market as a whole is 1. So any stock with a beta greater than 1 will move up
faster than the market when the market goes up. Any stock with a beta less than 1 will not go
We would expect Sandisk, with a beta of 2.6, to benefit most from an up market.
Johnson & Johnson, with a beta of .009, is least affected by the market. A falling
market cannot be expected to exert much downward pressure on shares of
c. The r² values seem to indicate that from 0% to 33.8% of the variability of the returns in
MIN MAX Q1 Q3
The following scatter diagram suggests a linear relationship between these two variables:
[Scatter diagram: Fatal Accidents versus Percent Under 21]
Analysis of Variance
Total 41 47.028
Model Summary
Coefficients
Constant -1.597 0.372 -4.30 0.000
Regression Equation
R Large residual
There is a significant relationship between the two variables. Two observations are identified as
having a large standardized residual and should be treated as possible outliers; the following
standardized residual plot does not indicate any other problems with the residuals.
Conclusion: The number of fatal accidents per 1000 licenses appears to be linearly related
to the percentage of licensed drivers under the age of 21; that is, the higher the percentage of
Mode 200 12 5 66
Range 320 6 3 24
Minimum 80 10 4 42
Maximum 400 16 7 66
Count 28 28 28 28
Score Price ($) Megapixels
0.000
0.969 0.481
With a sample correlation coefficient of .683, price appears to be the best predictor of the
overall score.
[Scatter diagram: Score versus Price ($)]
There appears to be a positive relationship between the price of the camera and the
overall score. But, observation 17, a Nikon camera with a price of $400, appears to be an
observation that will have a significant impact when we fit a linear model to these data. It
may be worth considering restricting the analysis to cameras that have a price of less than
$400. Another possible explanation for what we observe here is that the underlying
relationship may not be linear. In other words, the somewhat curvilinear trend in the data
The number of megapixels does not appear to have much effect on the overall
score, but note that as the number of megapixels increases from 10 to 14, the overall score
appears to trend downward. This seems counterintuitive because higher-megapixel
cameras are generally considered to have better picture quality. But the overall score
for the 16-megapixel cameras does increase somewhat.
[Scatter diagram: Score versus Megapixels]
There may be a modest increase in overall score for cameras that weigh more. Also note
the large variability in the score for cameras with a weight of 5 ounces and cameras with
a weight of 7 ounces. The pattern in the data may also be an indication that the effect of
[Scatter diagram: Score versus Weight (oz.)]
Conclusion: The variable that appears to be the best predictor of overall score is
Analysis of Variance
Total 27 1210.4
Model Summary
Coefficients
Regression Equation
Obs Score Fit Resid Std Resid
R Large residual
X Unusual X
With a p-value = .000, price is a significant factor in predicting the overall score. The estimated
regression equation explains 46.7% of the variability in the overall score. Note two unusual
observations: observations 17 and 28. Observation 17 is flagged as having large leverage
and is therefore considered an influential observation. To confirm this conclusion, the
following regression output shows the results after removing observation 17 from the data.
Analysis of Variance
Total 26 1203.2
Model Summary
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Regression Equation
R Large residual
X Unusual X
Note that the slope of the estimated line without observation 17 is .0724 as compared to the slope
of .0552 with observation 17. And the fit has also improved.
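The slope shifts this much because a point far from the mean of x carries high leverage. The sketch below computes leverage in simple linear regression, h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The price list is hypothetical (it is not the camera data set), and the 6/n cutoff is a common rule of thumb that should be treated as an assumption here.

```python
def leverages(x):
    """Leverage of each observation in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

# Hypothetical prices ($) for illustration only; the last value sits far from the rest.
prices = [80, 100, 120, 150, 180, 200, 200, 220, 250, 400]

h = leverages(prices)
cutoff = 6 / len(prices)  # assumed rule of thumb for "unusual X"
flagged = [i + 1 for i, h_i in enumerate(h) if h_i > cutoff]
print(flagged)  # [10]
```

Only the $400 observation exceeds the cutoff in this made-up sample, which mirrors how a single high-priced camera can pull the fitted line toward itself.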
provides a better fit? No. But if we are interested in only exploring the relationship between price
and overall score for cameras that cost less than $400, then removing observation 17 from the
4. Using only the data for the Canon cameras, the scatter diagram using the price of the
There does appear to be a relationship between the price of the camera and the overall
score. But, the relationship appears to be curvilinear. However, using simple linear
[Scatter diagram: Score versus Price ($), Canon cameras]
Model Summary
Total 12 455.69
S R-sq R-sq(adj) R-sq(pred)
Coefficients
Regression Equation
R Large residual
The estimated regression equation is significant and explains 68.39% of the variability in
the overall score using the price of the camera, but the curvilinear relationship we
observed in the scatter diagram is still a concern. If we are willing to consider only
cameras with a price of $200 or less, then a linear relationship may serve as
an approximation. For instance, the following regression output shows the results using
Analysis of Variance
Price ($) 1 261.97 261.974 26.62 0.001
Total 10 350.55
Model Summary
Coefficients
Regression Equation
The fit has improved slightly, but the question of whether the underlying relationship
is better described by a curvilinear model cannot be resolved using the methods
introduced in this chapter.
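The size of the improvement can be recovered from the two sums of squares that survive in the Analysis of Variance output above (Price ($) 261.974 and Total 350.55). The sketch assumes n = 11, inferred from the Total row showing 10 degrees of freedom; the function name is illustrative.

```python
def anova_summary(ss_regression, ss_total, n):
    """r-squared and F statistic for a simple linear regression ANOVA table."""
    ss_error = ss_total - ss_regression
    ms_error = ss_error / (n - 2)        # error df = n - 2
    return {
        "r_sq": ss_regression / ss_total,
        "f_value": ss_regression / ms_error,  # regression df = 1, so MSR = SSR
    }

# Sums of squares from the output; n = 11 is an assumption based on Total df = 10.
result = anova_summary(261.974, 350.55, 11)
print(round(result["r_sq"], 4), round(result["f_value"], 2))  # 0.7473 26.62
```

The F value reproduces the 26.62 shown in the output, and r² of about 74.7% compares with 68.39% for the full set of Canon cameras.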
Price ($) Cost/Mile Road- Predicted Value
Score
Count 20 20 20 20 20
Analysis of Variance
Regression 1 0.2428 0.24276 8.78 0.008
Total 19 0.7407
Model Summary
Coefficients
Regression Equation
R Large residual
Analysis of Variance
Regression 1 0.38012 0.38012 18.97 0.000
Total 19 0.74072
Model Summary
Coefficients
Regression Equation
R Large residual
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Total 19 0.7407
Model Summary
Coefficients
Regression Equation
X Unusual X
Analysis of Variance
Total 19 0.74072
Model Summary
Coefficients
Regression Equation
R Large residual
6. Although Consumer Reports did not include price as one of the components of value
score, the regression output shown in part 2 shows a significant statistical relationship
between price ($) and value score (p-value = .008). Reviewing the regression output in
parts 3–5 indicates that cost/mile is the best single predictor of value score (R-Sq =
51.3%). To further investigate the relationship among these variables, we really need to
1. Descriptive statistics for the dependent and independent variables and a scatter plot of
these two variables follow. The mean population over the 151 zip codes is 15,738.2 and
the mean number of season pass holders is 128.3. Over all zip codes, the maximum
population is 62,303 and the minimum is 1,227. The maximum number of season pass
There appears to be a positive linear relationship between the population of a zip code
and the number of season pass holders.
[Scatter diagram: Members versus Population]
Analysis of Variance
Model Summary
Coefficients
Regression Equation
40 357.0 170.8 186.2 2.13 R
R Large residual
X Unusual X
3. Significant relationship: p-value = .000 < α = .05. We must check the model assumptions
for the validity of this inference. There are several unusual observations and observations
4. We see that 63.74% of the variability in y has been explained by the estimated regression
5. A plot of the residuals against the predicted number of season pass holders is shown
Because the variance does not appear to be constant, testing hypotheses and calculating
interval estimates from this model may not be appropriate. A curvilinear regression
6. Because the model provides a reasonable fit (r² = .637), it could be used to guide the
marketing campaign. The following scatter diagram shows the estimated regression line
and the data. Any data point below the estimated regression line represents a zip code with
fewer observed season pass holders than the estimated regression equation predicts (the
point on the estimated regression line). These zip codes are good targets for the direct
mail campaign.
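The targeting rule in part 6 amounts to selecting zip codes with negative residuals. The sketch below uses the one row that survives in the output above (observation 40: observed 357.0, fit 170.8) plus one hypothetical zip code; the function names and the hypothetical values are assumptions for illustration.

```python
def residual(observed, fitted):
    """Residual = observed value minus the value on the estimated regression line."""
    return observed - fitted

def is_campaign_target(observed, fitted):
    """A zip code is a target when it falls below the estimated regression line."""
    return residual(observed, fitted) < 0

# Observation 40 from the output: well above the line, so not a target.
print(round(residual(357.0, 170.8), 1))   # 186.2
print(is_campaign_target(357.0, 170.8))   # False

# Hypothetical zip code with 100 pass holders where the line predicts 170.8.
print(is_campaign_target(100.0, 170.8))   # True
```

Sorting the zip codes by residual, most negative first, would give one possible priority order for the mailing.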
7. Other data of interest for independent variables might include the distance of the zip code
from the park, the average household income of the zip code, and the average number of
[Scatter diagram with estimated regression line; horizontal axis: Population; R² = 0.6374]