
Solution_chapter 14

The document discusses various regression analyses, highlighting positive and negative linear relationships between different variables such as years of experience and annual sales, as well as the percentage of women in management roles. It includes calculations for regression equations, slopes, and predictions based on those equations, demonstrating the application of least squares criteria. Additionally, it provides insights into the variability explained by the regression models and the significance of these relationships.

Uploaded by

nancysinghal95

Regression

1. a.

b. There appears to be a positive linear relationship between x and y.

c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part d we will determine the equation of a straight line that “best” represents the relationship according to the least squares criterion.

d.

b0 = ȳ − b1x̄ = 8 − (2.6)(3) = 0.2

e.

© 2019 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
2. a.

b. There appears to be a negative linear relationship between x and y.

c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part d we will determine the equation of a straight line that “best” represents the relationship according to the least squares criterion.

d.

e.

3. a.

b.

c.

4. a.

b. There appears to be a positive linear relationship between the percentage of women working in the five companies (x) and the percentage of management jobs held by women in that company (y).

c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part d we will determine the equation of a straight line that “best” represents the relationship according to the least squares criterion.

d.

e.

5. a.

b. There appears to be a negative relationship between line speed (feet per minute) and the number of defective parts.

c. Let x = line speed (feet per minute) and y = number of defective parts.

d.

6. a.

b. The scatter diagram indicates a positive linear relationship between x = average number of passing yards per attempt and y = the percentage of games won by the team.

c.

d. The slope of the estimated regression line is approximately 17.2. So, for every increase of one yard in the average number of passing yards per attempt, the percentage of games won by the team increases by 17.2 percentage points.

e. With an average number of passing yards per attempt of 6.2, the predicted percentage of games won is ŷ = –70.391 + 17.175(6.2) = 36%. With a record of seven wins and nine losses, the percentage of Kansas City Chiefs wins is 43.8, or approximately 44%. Considering the small data size, the prediction made using the estimated regression equation is not too bad.
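
The arithmetic in part e can be checked with a short sketch. The coefficients and the record of seven wins and nine losses come from the solution above; the function name `predict_win_pct` is a hypothetical helper introduced only for illustration.

```python
def predict_win_pct(yards_per_attempt, b0=-70.391, b1=17.175):
    """Predicted percentage of games won from the estimated regression line."""
    return b0 + b1 * yards_per_attempt

predicted = predict_win_pct(6.2)      # about 36.1
actual = 7 / (7 + 9) * 100            # seven wins, nine losses
gap = actual - predicted              # how far the prediction is from the record

print(round(predicted, 1), round(actual, 1), round(gap, 1))
```

The gap of roughly eight percentage points is what the solution describes as "not too bad" for such a small data set.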

7. a.

[Scatter diagram: Annual Sales ($1000s) versus Years of Experience]

b. Let x = years of experience and y = annual sales ($1000s)

b0 = ȳ − b1x̄ = 108 − (4)(7) = 80

ŷ = 80 + 4x

c. ŷ = 80 + 4x = 80 + 4(9) = 116, or $116,000
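
As a quick check of parts b and c, the intercept and the prediction can be reproduced from the summary values used above (slope 4, mean experience 7, mean sales 108). The function names are hypothetical, and the sketch assumes the slope has already been computed.

```python
def intercept(y_bar, b1, x_bar):
    """Least squares intercept: b0 = y_bar - b1 * x_bar."""
    return y_bar - b1 * x_bar

b1 = 4.0                                        # slope from the solution
b0 = intercept(y_bar=108.0, b1=b1, x_bar=7.0)   # 80.0

def predict(x):
    """Estimated annual sales ($1000s) for x years of experience."""
    return b0 + b1 * x

print(b0, predict(9))   # 80.0 116.0
```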

8. a.

[Scatter diagram: Satisfaction versus Speed of Execution]

b. The scatter diagram indicates a positive linear relationship between x = speed of execution rating and y = overall satisfaction rating for electronic trades.

c.

d. The slope of the estimated regression line is approximately .9077. So, a one unit increase in the speed of execution rating will increase the overall satisfaction rating by approximately .9 points.

e. The average speed of execution rating for the other brokerage firms is 3.4. Using this as the new value of x for Zecco.com, we can use the estimated regression equation developed in part c to estimate the overall satisfaction rating corresponding to x = 3.4. Thus, an estimate of the overall satisfaction rating when x = 3.4 is approximately 3.3.

9. a.

[Scatter diagram: Landscaping Expenditures ($000) versus Home Value ($000)]

b. The scatter diagram indicates a positive linear relationship between x = home value ($ thousands) and y = landscaping expenditures ($ thousands).

c.

y = .0214 x + 6.3091

d. For every additional $1,000 in home value, $21.40 is spent on landscaping.

e. or $18,614

10. a.

[Scatter diagram: Price ($) versus Age (years)]

b. The scatter diagram indicates a positive linear relationship between x = age of wine and y = price of a 750-ml bottle of wine. In other words, the price of the wine increases with age.

c.

d. The slope of the estimated regression line is approximately 6.95. So, for every additional year of age, the price of the wine increases by $6.95.

11. a.

[Scatter diagram: Overall Score versus Price ($)]

b. The scatter diagram indicates a positive linear relationship between x = price ($) and y = overall score.

c.

d. The slope of .0212 means that spending an additional $100 in price will increase the overall score by approximately two points.

e. A prediction of the overall score is

12. a.

[Scatter diagram: % Return Coca-Cola versus % Return S&P 500]

b. The scatter diagram indicates a somewhat positive linear relationship between x = percentage return of the S&P 500 and y = percentage return for Coca-Cola.

c.

d. A 1-percent increase in the percentage return of the S&P 500 will result in a .529 increase in the percentage return for Coca-Cola.

e. The beta of .529 for Coca-Cola differs somewhat from the beta of .82 reported by Yahoo Finance. This is likely the result of differences in the period over which the data were collected and the amount of data used to calculate the beta. Note: Yahoo uses the last five years of monthly returns to calculate beta.

13. a.

[Scatter diagram: Reasonable Amount of Itemized Deductions ($1000s) versus Adjusted Gross Income ($1000s)]

b. Let x = adjusted gross income and y = reasonable amount of itemized deductions.

y 4.68  016
. x

y 4.68  016
. x 4.68  016
. (52.5) 13.08
c. or approximately $13,080. The agent's request for

an audit appears to be justified.

© 2019 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
14. a.

[Scatter diagram: Number of Days Absent versus Distance to Work (miles)]

The scatter diagram indicates a negative linear relationship between x = distance to work and y = number of days absent.

b.

c. A prediction of the number of days absent is or approximately

six days.

15. a. The estimated regression equation and the mean for the dependent variable are:

yi 0.2  2.6 xi y 8

The sum of squares resulting from error and the total sum of squares are:

SSE  ( yi  yi ) 2 12.40 SST  ( yi  y ) 2 80

© 2019 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Thus, SSR = SST – SSE = 80 – 12.4 = 67.6

b. r2 = SSR/SST = 67.6/80 = .845

The least squares line provided a good fit; 84.5% of the variability in y has been explained by the least squares line.
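
The same decomposition recurs in exercises 15 through 22, so a small sketch may help. It assumes only the SST and SSE values quoted above; `r_squared` is a hypothetical helper name.

```python
def r_squared(sst, sse):
    """Return (SSR, r^2), where SSR = SST - SSE and r^2 = SSR / SST."""
    ssr = sst - sse
    return ssr, ssr / sst

ssr, r2 = r_squared(sst=80.0, sse=12.4)
print(round(ssr, 1), round(r2, 3))   # 67.6 0.845
```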

c.

16. a. The estimated regression equation and the mean for the dependent variable are:

The sum of squares due to error and the total sum of squares are

Thus, SSR = SST – SSE = 1850 – 230 = 1620

b. r2 = SSR/SST = 1620/1850 = .876

The least squares line provided an excellent fit; 87.6% of the variability in y has been explained by the estimated regression equation.

c.

Note: the sign for r is negative because the slope of the estimated regression equation is negative (b1 = –3).
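
The sign rule in the note can be sketched as follows: the sample correlation coefficient takes the sign of the slope and the magnitude of the square root of r2. The helper name `correlation` is hypothetical; the inputs are the SSR, SST, and slope from exercise 16.

```python
import math

def correlation(r2, slope):
    """Sample correlation: sqrt(r^2) carrying the sign of the slope b1."""
    return math.copysign(math.sqrt(r2), slope)

r = correlation(1620 / 1850, slope=-3)
print(round(r, 3))   # -0.936
```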

17. The estimated regression equation and the mean for the dependent variable are:

The sum of squares due to error and the total sum of squares are:

Thus, SSR = SST – SSE = 281.2 – 127.3 = 153.9

r2 = SSR/SST = 153.9/281.2 = .547

We see that 54.7% of the variability in y has been explained by the least squares line.

18. a.

SSR = SST – SSE = 1800 – 287.624 = 1512.376

b.

c.

19. a. The estimated regression equation and the mean for the dependent variable are:

ŷ = 80 + 4x    ȳ = 108

The sum of squares resulting from error and the total sum of squares are:

Thus, SSR = SST – SSE = 2442 – 170 = 2272

b. r2 = SSR/SST = 2272/2442 = .93

We see that 93% of the variability in y has been explained by the least squares line.

c.

20. a.

b. SST = 52,120,800 SSE = 7,102,922.54

SSR = SST – SSE = 52,120,800 – 7,102,922.54 = 45,017,877

r2 = SSR/SST = 45,017,877/52,120,800 = .864

The estimated regression equation provided a good fit.

c.

Thus, an estimate of the price for a bike that weighs 15 pounds is $6,989.

21. a.

y 1246.67  7.6 x

b. $7.60

c. The sum of squares resulting from error and the total sum of squares are:

Thus, SSR = SST – SSE = 5,648,333.33 – 233,333.33 = 5,415,000

r2 = SSR/SST = 5,415,000/5,648,333.33 = .9587

We see that 95.87% of the variability in y has been explained by the estimated regression equation.

d. ŷ = 1246.67 + 7.6x = 1246.67 + 7.6(500) = $5,046.67

22. a. SSE = 1043.03

SSR = SST – SSE = 10,568 – 1043.03 = 9524.97

b. The estimated regression equation provided an especially good fit; approximately 90% of the variability in the dependent variable was explained by the linear relationship between the two variables.

c.

This reflects a strong linear relationship between the two variables.

23. a. s2 = MSE = SSE / (n – 2) = 12.4 / 3 = 4.133

b. s = √MSE = √4.133 = 2.033

c.

d.

Using t table (3 degrees of freedom), area in tail is between .01 and .025.

p-value is between .02 and .05.

Using Excel the p-value corresponding to t = 4.04 is .0272.

Because p-value ≤ α, we reject H0: β1 = 0.

e. MSR = SSR / 1 = 67.6

F = MSR / MSE = 67.6 / 4.133 = 16.36

Using F table (1 degree of freedom numerator and 3 denominator), p-value is between .025 and .05.

Using Excel, the p-value corresponding to F = 16.36 is .0272.

Because p-value ≤ α, we reject H0: β1 = 0.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            67.6             1                    67.6          16.36   .0272
Error                 12.4             3                    4.133
Total                 80.0             4
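
The entries of this ANOVA table can be reproduced from SSE, SSR, and the error degrees of freedom. The sketch below is an illustration, not the method used in the solution (which reads the p-value from Excel): it relies on the fact that, in simple linear regression, F equals the square of the t statistic, so the F-test p-value equals the two-tailed t p-value. The helpers `t_pdf` and `two_tailed_p` are hypothetical names, and the p-value is approximated by numerical integration using only the standard library.

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

sse, ssr, df_error = 12.4, 67.6, 3     # values from exercises 15 and 23
mse = sse / df_error                   # s^2
s = math.sqrt(mse)                     # standard error of the estimate
msr = ssr / 1                          # one independent variable
f_stat = msr / mse
p_value = two_tailed_p(math.sqrt(f_stat), df_error)

print(round(mse, 3), round(s, 3), round(f_stat, 2), round(p_value, 4))
```

Because the sketch carries MSE at full precision instead of the rounded 4.133, the F statistic comes out a few thousandths below the 16.36 shown in the table.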

24. a. s2 = MSE = SSE/(n – 2) = 230/3 = 76.6667

b.

c.

d.

Using t table (3 degrees of freedom), area in tail is less than .01; p-value is less than .02.

Using Excel, the p-value corresponding to t = –4.59 is .0193.

Because p-value ≤ α, we reject H0: β1 = 0.

e. MSR = SSR/1 = 1,620

F = MSR/MSE = 1,620/76.6667 = 21.13

Using F table (1 degree of freedom numerator and 3 denominator), p-value is less than .025.

Using Excel, the p-value corresponding to F = 21.13 is .0193.

Because p-value ≤ α, we reject H0: β1 = 0.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            1620             1                    1620          21.13   .0193
Error                 230              3                    76.6667
Total                 1850             4

25. a. s2 = MSE = SSE/(n – 2) = 127.3/3 = 42.4333

b.

Using t table (3 degrees of freedom), area in tail is between .05 and .10.

p-value is between .10 and .20.

Using Excel, the p-value corresponding to t = 1.90 is .1530.

Because p-value > α, we cannot reject H0: β1 = 0; x and y do not appear to be related.

c. MSR = SSR/1 = 153.9 /1 = 153.9

F = MSR/MSE = 153.9/42.4333 = 3.63

Using F table (1 degree of freedom numerator and 3 denominator), p-value is greater than .10.

Using Excel, the p-value corresponding to F = 3.63 is .1530.

Because p-value > α, we cannot reject H0: β1 = 0; x and y do not appear to be related.

26. a. In the statement of exercise 18, ŷ = 23.194 + .318x.

In solving exercise 18, we found SSE = 287.624.

Using t table (4 degrees of freedom), area in tail is between .005 and .01.

p-value is between .01 and .02.

Using Excel, the p-value corresponding to t = 4.58 is .010.

Because p-value ≤ α, we reject H0: β1 = 0; there is a significant relationship between price and overall score.

b. In exercise 18, we found SSR = 1,512.376

MSR = SSR/1 = 1,512.376/1 = 1,512.376

F = MSR/MSE = 1,512.376/71.906 = 21.03

Using F table (1 degree of freedom numerator and 4 denominator), p-value is between .01 and .025.

Using Excel, the p-value corresponding to F = 21.03 is .010.

Because p-value ≤ α, we reject H0: β1 = 0.

c.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            1,512.376        1                    1,512.376     21.03   .010
Error                 287.624          4                    71.906
Total                 1,800            5

27. a.

The scatter diagram suggests a positive linear relationship between the two

variables.

b. Let x = GPA and y = annual salary ($).

c. SSE = Σ(yi − ŷi)² = 1,047,747,129    SST = Σ(yi − ȳ)² = 14,706,900,000

Thus,

SSR = SST – SSE = 14,706,900,000 –1,047,747,129 = 13,659,152,871

MSR = SSR/1 = 13,659,152,871

MSE = SSE/(n – 2) = 1,047,747,129 /8 = 130,968,391.2

F = MSR / MSE = 13,659,152,871/130,968,391.2 = 104.2935

Using F table (1 degree of freedom numerator and 8 denominator), p-value is less than .01.

Because p-value ≤ α, we reject H0: β1 = 0.

Salary and GPA are related.

28. The sum of squares due to error and the total sum of squares are:

Thus,

SSR = SST - SSE = 3.5800 – 1.4379 = 2.1421

s2 = MSE = SSE / (n - 2) = 1.4379 / 9 = .1598

We can use either the t test or F test to determine whether speed of execution and overall

satisfaction are related.

We will first illustrate the use of the t test.

Using t table (9 degrees of freedom), the area in the tail is less than .005; the p-value is less than .01.

Using Excel, the p-value corresponding to t = 3.66 is .005.

Because p-value ≤ α, we reject H0: β1 = 0.

Because we can reject H0: β1 = 0, we conclude that speed of execution and overall satisfaction are related.

Next we illustrate the use of the F test.

MSR = SSR / 1 = 2.1421

F = MSR / MSE = 2.1421 / .1598 = 13.4

Using the F table (1 degree of freedom numerator and 9 denominator), the p-value is less than .01.

Using Excel, the p-value corresponding to F = 13.4 is .005.

Because p-value ≤ α, we reject H0: β1 = 0.

Because we can reject H0: β1 = 0, we conclude that speed of execution and overall satisfaction are related.

The ANOVA table follows.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F      p-value
Regression            2.1421           1                    2.1421        13.4   .005
Error                 1.4379           9                    .1598
Total                 3.5800           10

29. SSE = 233,333.33    SST = Σ(yi − ȳ)² = 5,648,333.33

Thus,

SSR = SST – SSE = 5,648,333.33 –233,333.33 = 5,415,000

MSE = SSE/(n – 2) = 233,333.33/(6 – 2) = 58,333.33

MSR = SSR/1 = 5,415,000

F = MSR / MSE = 5,415,000 / 58,333.33 = 92.83

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            5,415,000.00     1                    5,415,000     92.83   .0006
Error                 233,333.33       4                    58,333.33
Total                 5,648,333.33     5

Using the F table (1 degree of freedom numerator and 4 denominator), the p-value is less than .01.

Using Excel, the p-value corresponding to F = 92.83 is .0006.

Because p-value ≤ α, we reject H0: β1 = 0. Production volume and total cost are related.

30. SSE = 1043.03    SST = Σ(yi − ȳ)² = 10,568

Thus,

SSR = SST – SSE = 10,568 – 1043.03 = 9524.97

s2 = MSE = SSE/(n-2) = 1043.03/4 = 260.7575

= 56.655

Using t table (4 degrees of freedom), area in tail is less than .005.

p-value is less than .01.

Using Excel, the p-value corresponding to t = 6.045 is .004.

Because p-value ≤ α, we reject H0: β1 = 0.

There is a significant relationship between cars in service and annual revenue.

31. SST = 52,120,800 SSE = 7,102,922.54

SSR = SST – SSE = 52,120,800 – 7,102,922.54 = 45,017,877

MSR = SSR/1 = 45,017,877

MSE = SSE/(n – 2) = 7,102,922.54/8 = 887,865.3

F = MSR / MSE = 45,017,877/887,865.3 = 50.7

Using the F table (1 degree of freedom numerator and 8 denominator), the p-value is less than .01.

Using Excel, the p-value corresponding to F = 50.7 is .000.

Because p-value ≤ α, we reject H0: β1 = 0.

Weight and price are related.

32. a. s = 2.033

b. ŷ* = .2 + 2.6x* = .2 + 2.6(4) = 10.6

10.6 ± 3.182(1.11) = 10.6 ± 3.53

or 7.07 to 14.13.

c.

d.

10.6 ± 3.182(2.32) = 10.6 ± 7.38

or 3.22 to 17.98.
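
The two intervals in exercise 32 differ only in the standard deviation that multiplies the t value. A minimal sketch, assuming the values quoted above (s = 2.033, a standard deviation of 1.11 for the estimated mean, and t = 3.182 for 3 degrees of freedom at 95% confidence), shows how the prediction standard deviation combines the two sources of variability. The names `interval`, `s_yhat`, and `s_pred` are introduced here for illustration.

```python
import math

def interval(center, t_crit, std_dev):
    """Return (lower, upper) for center ± t_crit * std_dev."""
    margin = t_crit * std_dev
    return center - margin, center + margin

y_hat, t_crit = 10.6, 3.182
s, s_yhat = 2.033, 1.11

# An individual prediction adds the error variance s^2 to the variance of the mean estimate.
s_pred = math.sqrt(s ** 2 + s_yhat ** 2)

conf_int = interval(y_hat, t_crit, s_yhat)   # interval for the mean value of y
pred_int = interval(y_hat, t_crit, s_pred)   # interval for an individual value of y

print([round(v, 2) for v in conf_int])
print([round(v, 2) for v in pred_int])
```

The prediction limits can differ from the printed 3.22 and 17.98 by about 0.01 because the solution rounds the prediction standard deviation to 2.32 before multiplying.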

33. a. s = 8.7560

b.

44 ± 3.182(4.3780) = 44 ± 13.93

or 30.07 to 57.93.

c.

d.

44 ± 3.182(9.7895) = 44 ± 31.15

or 12.85 to 75.15.

34. s = 6.5141

18.40 ± 3.182(3.0627) = 18.40 ± 9.75

or 8.65 to 28.15.

18.40 ± 3.182(7.1982) = 18.40 ± 22.90

or -4.50 to 41.30.

The two intervals are different because there is more variability associated with predicting an individual value than there is with estimating a mean value.

35. a.

b.

30.8 ± 2.306(4.11) = 30.8 ± 9.48

or 21.32 to 40.28.

c.

30.8 ± 2.306(10.55) = 30.8 ± 24.33

or 6.47 to 55.13.

d. As expected, the prediction interval is much wider than the confidence interval. This is because it is more difficult to predict the waiting time for an individual customer arriving with three people in line than it is to estimate the mean waiting time for a customer arriving with three people in line.

36. a.

116 ± 2.306(1.6503) = 116 ± 3.8056

or 112.19 to 119.81 ($112,190 to $119,810).

b.

116 ± 2.306(4.8963) = 116 ± 11.2909

or 104.71 to 127.29 ($104,710 to $127,290).

c. As expected, the prediction interval is much wider than the confidence interval. This is because it is more difficult to predict annual sales for one new salesperson with nine years of experience than it is to estimate the mean annual sales for all salespersons with nine years of experience.

37. a.

s2 = 1.88 s = 1.37

ŷ* = 4.68 + 0.16x* = 4.68 + 0.16(52.5) = 13.08

13.08 ± 2.571(.52) = 13.08 ± 1.34

or 11.74 to 14.42 or $11,740 to $14,420.

b. spred = 1.47

13.08 ± 2.571(1.47) = 13.08 ± 3.78

or 9.30 to 16.86 or $9,300 to $16,860.

c. Yes, $20,400 is much larger than anticipated.

d. Any deductions exceeding the $16,860 upper limit could suggest an audit.
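
The audit rule in parts c and d amounts to checking whether a reported deduction falls outside the prediction interval. A minimal sketch, using the values from part b (all amounts in $1000s); the function name `outside_prediction_interval` is hypothetical.

```python
def outside_prediction_interval(value, y_hat, t_crit, s_pred):
    """Return (flag, (lower, upper)); flag is True when value lies outside the interval."""
    margin = t_crit * s_pred
    lower, upper = y_hat - margin, y_hat + margin
    return (value < lower or value > upper), (lower, upper)

# Reported deductions of $20,400 against the interval from part b.
flag, (lower, upper) = outside_prediction_interval(20.4, y_hat=13.08, t_crit=2.571, s_pred=1.47)
print(flag, round(lower, 2), round(upper, 2))   # True 9.3 16.86
```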

38. a. ŷ* = 1,246.67 + 7.6(500) = $5,046.67

b.

s2 = MSE = 58,333.33 s = 241.52

5,046.67 ± 4.604(267.50) = 5,046.67 ± 1231.57

or $3,815.10 to $6,278.24.

c. Based on one month, $6,000 is not out of line because $3,815.10 to $6,278.24 is the prediction interval. However, a sequence of five to seven months with consistently high costs should cause concern.

39. a. With = 89,

b. s2 = MSE = SSE/(n – 2) = 1,541.4/7 = 220.2

109.46 ± 2.365(6.1819) = 109.46 ± 14.6202

or $94.84 to $124.08.

c.

149.77 ± 2.365(16.525) = 149.77 ± 39.08

or $110.69 to $188.85.

40. a. 9

b. ŷ = 20.0 + 7.21x

c. 1.3626

d. SSE = SST – SSR = 51,984.1 – 41,587.3 = 10,396.8

MSE = 10,396.8/7 = 1,485.3

F = MSR / MSE = 41,587.3 /1,485.3 = 28.00

Using F table (1 degree of freedom numerator and 7 denominator), p-value is less than .01.

Using Excel, the p-value corresponding to F = 28.00 is .0011.

Because p-value ≤ α = .05, we reject H0: β1 = 0.

Selling price is related to annual gross rents.

e. ŷ = 20.0 + 7.21(50) = 380.5 or $380,500

41. a. ŷ = 6.1092 + .8951x

b.

Using the t table (8 degrees of freedom), the area in the tail is less than .005.

p-value is less than .01.

Using Excel, the p-value corresponding to t = 6.01 is .0003.

Because p-value ≤ α = .05, we reject H0: β1 = 0.

Maintenance expense is related to usage.

c. ŷ = 6.1092 + .8951(25) = 28.49 or $28.49 per month.

42. a. ŷ = 80.0 + 50.0x

b. 30
c. F = MSR / MSE = 6828.6/82.1 = 83.17

Using the F table (1 degree of freedom numerator and 28 denominator), the p-value is less than .01.

Using Excel, the p-value corresponding to F = 83.17 is .000.

Because p-value < α = .05, we reject H0: β1 = 0.

Annual sales is related to the number of salespersons.

d. ŷ = 80 + 50(12) = 680 or $680,000

43. a.

[Scatter diagram: Total Batch Time versus Quantity]

b. There appears to be a positive linear relationship between the two variables.

c. The Excel output follows:

ŷ = .7631(Quantity) + 117.5419

The intercept is the estimate for the setup time (117.5 minutes), and the slope (.7631 minutes) is the production time per unit.

d. Significant relationship: p-value = 0.000 < α = .05.

e. r2 = .6689; a good fit.

44. a. Scatter diagram:

[Scatter diagram: Price ($) versus Weight (oz)]

b. There appears to be a negative linear relationship between the two variables. The heavier helmets tend to be less expensive.

c. The output follows.

Analysis of Variance

Source       DF   Adj SS   Adj MS   F-Value   P-Value
Regression    1   462761   462761   54.90     0.000
Error        16   134865   8429
Total        17   597626

Model Summary

S         R-sq     R-sq(adj)
91.8098   77.43%   76.02%

Coefficients

Term       Coef     SE Coef   T-Value   P-Value   VIF
Constant   2044     226       9.03      0.000
Weight     -28.35   3.83      -7.41     0.000     1.00

Regression Equation

Price = 2044 - 28.35 Weight

Fits and Diagnostics for Unusual Observations

Obs   Price   Fit     Resid   Std Resid
7     900.0   655.2   244.8   3.03 R

R  Large residual

d. Significant relationship: p-value = .000 < α = .05.

e. r2 = 0.774; a good fit.

45. a.

b. The residuals are 3.48, –2.47, –4.83, –1.6, and 5.22.

c.

[Residual plot: Residuals on the vertical axis]

With only five observations, it is difficult to determine if the assumptions are satisfied. However, the plot does suggest curvature in the residuals that would indicate that the error term assumptions are not satisfied. The scatter diagram for these data also indicates that the underlying relationship between x and y may be curvilinear.

d.

The standardized residuals are 1.32, –.59, –1.11, –.40, 1.49.

e. The standardized residual plot has the same shape as the original residual plot. The curvature observed indicates that the assumptions regarding the error term may not be satisfied.

46. a.

b.

[Residual plot: Residuals on the vertical axis]

The assumption that the variance is the same for all values of x is questionable. The variance appears to increase for larger values of x.

47. a. Let x = advertising expenditures and y = revenue.

b. SST = 1002 SSE = 310.28 SSR = 691.72

MSR = SSR / 1 = 691.72

MSE = SSE / (n – 2) = 310.28/ 5 = 62.0554

F = MSR / MSE = 691.72/ 62.0554= 11.15

Using the F table (1 degree of freedom numerator and 5 denominator), the p-value is between .01 and .025.

Using Excel, the p-value corresponding to F = 11.15 is .0206.

Because p-value ≤ α = .05, we conclude that the two variables are related.

c.

[Residual plot: Residuals versus Predicted Values]

d. The residual plot leads us to question the assumption of a linear relationship between x and y. Even though the relationship is significant at the .05 level of significance, it would be extremely dangerous to extrapolate beyond the range of the data.

48. a.

[Residual plot: Residuals on the vertical axis]

b. The assumptions concerning the error term appear reasonable.

49. a. A portion of the output follows.

Regression Statistics

Multiple R           0.8092
R Square             0.6549
Adjusted R Square    0.6490
Standard Error       7.3131
Observations         61.0000

ANOVA

             df   SS           MS           F          Significance F
Regression    1   5,987.2570   5,987.2570   111.9497   2.97466E-15
Residual     59   3,155.4186   53.4817
Total        60   9,142.6756

             Coefficients   Standard Error   t Stat    P-value
Intercept    47.0160        1.4216           33.0729   8.78773E-40
Density      0.0024         0.0002           10.5806   2.97466E-15

ŷ = 47.016 + .0024 (Density)

b.

[Residual plot: Residual versus Density]

c. The residual plot leads us to question the assumption of constant variance of the error terms. The plot indicates that the absolute value of the residuals is larger for larger values of the independent variable (see panel B in Figure 14.16).

50. a. The output follows:

The standardized residuals are calculated as follows (as in Table 14.8):

The first data point (x = 135, y = 145) has a large standardized residual.

b.

The standardized residual plot indicates that the observation x = 135, y = 145 may be an outlier; note that this observation has a standardized residual of 2.11.

c. The scatter diagram follows.

[Scatter diagram: y versus x]

The scatter diagram also indicates that the observation x = 135, y = 145 may be an outlier; the implication is that for simple linear regression an outlier can be identified by looking at the scatter diagram.

51. a. The output follows.

Analysis of Variance

Source       DF   Adj SS    Adj MS   F-Value   P-Value
Regression    1   40.779    40.779   4.03      0.091
Error         6   60.721    10.120
Total         7   101.500

Model Summary

S         R-sq     R-sq(adj)
3.18123   40.18%   30.21%

Coefficients

Term       Coef    SE Coef   T-Value   P-Value   VIF
Constant   13.00   2.40      5.43      0.002
x          0.425   0.212     2.01      0.091     1.00

Regression Equation

y = 13.00 + 0.425 x

Fits and Diagnostics for Unusual Observations

Obs   y       Fit     Resid   Std Resid
7     24.00   18.10   5.90    2.00 R
8     19.00   22.35   -3.35   -2.16 R X

R  Large residual
X  Unusual X

The standardized residuals are: –1.00, –.40, .01, –.48, .25, .65, 2.00, –2.16.

The last two observations in the data set appear to be outliers because the standardized residuals for these observations are 2.00 and –2.16, respectively.

b. Using statistical software package, we obtained the following leverage values:

.28, .24, .16, .14, .13, .14, .14, .76

Here we identify an observation as having high leverage if hi > 6/n; for these data, 6/n =

6/8 = .75. Because the leverage for the observation x = 22, y = 19 is .76, we would

identify observation 8 as a high leverage point. Thus, we conclude that observation 8 is

an influential observation.
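
The leverage values and standardized residuals quoted above can be reproduced with a short computation. The Python sketch below is illustrative only. The x and y values are not listed in this section, so the data in the code are an assumption, chosen because they reproduce the reported output (b0 = 13.00, b1 = 0.425, S = 3.18123). It uses the simple linear regression formulas h_i = 1/n + (x_i - x̄)^2 / Σ(x_j - x̄)^2 for leverage and e_i / (s·sqrt(1 - h_i)) for the standardized residual.

```python
import math

# Assumed data (not listed in this section); chosen to be consistent with
# the reported output: b0 = 13.00, b1 = 0.425, S = 3.18123.
x = [4, 5, 7, 8, 10, 12, 12, 22]
y = [12, 14, 16, 15, 18, 20, 24, 19]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals and the standard error of the estimate
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))

# Leverage and standardized residual of each observation
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Rules used in the solution: h_i > 6/n, and |standardized residual| > 2
cutoff = 6 / n
high_leverage = [i + 1 for i, h in enumerate(leverage) if h > cutoff]
large_resid = [i + 1 for i, r in enumerate(std_resid) if abs(r) > 2]

print("leverage:", [round(h, 3) for h in leverage])
print("standardized residuals:", [round(r, 2) for r in std_resid])
print("high leverage observations:", high_leverage)
print("large residual observations:", large_resid)
```

Under these assumed data, only observation 8 exceeds the 6/n cutoff, while observations 7 and 8 are the ones with standardized residuals beyond ±2, which matches the R and X flags in the output.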

c.

[Scatter diagram: y versus x]

The scatter diagram indicates that the observation x = 22, y = 19 is an influential

observation.

52. a.

[Scatter diagram: Program Expenses (%) versus Fund-raising Expenses (%)]

The scatter diagram does indicate potential influential observations. For example, the

22.2% fund-raising expense for the American Cancer Society and the 16.9% fund-raising

expense for the St. Jude Children’s Research Hospital look like they may each have a

large influence on the slope of the estimated regression line. And with a fund-raising

expense of 2.6%, the percentage spent on programs and services by the Smithsonian

Institution (73.7%) seems to be somewhat lower than would be expected; thus, this

observation may need to be considered a possible outlier.

b. A portion of the output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 408.4 408.35 7.31 0.027

Error 8 446.9 55.86

Total 9 855.2

Model Summary

S R-sq R-sq(adj)

7.47387 47.75% 41.22%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 90.98 3.18 28.64 0.000

Fundraising Expenses (%) -0.917 0.339 -2.70 0.027 1.00

Regression Equation

Program Expenses (%) = 90.98 - 0.917 Fundraising Expenses (%)

Fits and Diagnostics for Unusual Observations

Obs Program Expenses(%) Fit Resid Std Resid

3 73.70 88.60 -14.90 -2.13 R

5 71.60 70.62 0.98 0.21 X

R Large residual

X Unusual X

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large leverage.

c. The slope of the estimated regression equation is –0.917. Thus, for every 1% increase

in the amount spent on fund-raising, the percentage spent on program expenses will

decrease by .917%; in other words, just a little less than 1%. The negative slope and

value seem to make sense in the context of this problem situation.

d. The output in part b indicates that there are two unusual observations:

• Observation 3 (Smithsonian Institution) is an outlier because it has a large

standardized residual.

• Observation 5 (American Cancer Society) is an influential observation because it

has high leverage.

Although fund-raising expenses for the Smithsonian Institution are on the

low side compared to most of the other supersized charities, the percentage spent

on program expenses appears to be much lower than one would expect. It appears

that the Smithsonian’s administrative expenses are too high. But thinking about

the expenses of running a large museum like the Smithsonian, the percentage

spent on administrative expenses may not be unreasonable; in general, operating

costs for a museum are higher than for some other types of organizations. The

especially large value of fund-raising expenses for the American Cancer Society

suggests that this observation has a large influence on the estimated regression

equation. The following output shows the results if this observation is deleted

from the original data.

The regression equation is:

Program Expenses (%) = 91.3 - 1.00 Fundraising Expenses (%)

Predictor Coef SE Coef T P

Constant 91.256 3.654 24.98 0.000

Fundraising Expenses (%) -1.0026 0.5590 -1.79 0.116

S = 7.96708 R-Sq = 31.5% R-Sq(adj) = 21.7%

The y-intercept has changed slightly, but the slope has changed from –.917

to –1.00.

53. a.

[Scatter diagram: Shopping Time (Minutes) versus Arrival Time (Minutes before 6:00 p.m.)]

b. There appears to be a positive relationship between the two variables, but observation

32 (111, 24) appears to be an outlier.

c.

The regression equation is: y = 14.2765 + .2543x

d. In part c, we have highlighted observation 32, which has a standardized residual less

than –2, indicating it is an outlier.

e. Looking at the scatter diagram in part a, observation 32 probably will have a lot of

influence on the estimated regression equation. To investigate this, we can simply

drop the observation from the data set and fit a new estimated regression equation.

The output we obtained follows.

Note that the slope of the estimated regression equation is now .2829 as compared to a

value of .2543 when this observation is included. Thus, we see that this observation has

an impact on the value of the slope of the fitted line and hence we would say that it is an

influential observation. Also note that the R2 value has increased from .65 to .75,

indicating that observation 32 was an outlier. If possible, Kroger should analyze the

circumstances behind observation 32 and determine if it corresponds to a situation with

unusual circumstances. If observation 32 corresponds to a situation that is indeed

dissimilar from the other observations (or if this observation suffers from data recording

error), then removing it from the analysis may be reasonable. Otherwise, Kroger should

be wary of removing the observation (and thereby inflating the accuracy of the model)

but instead investigate the possibility of additional data collection to reduce its influence.

54. a.

[Scatter diagram: Value ($ millions) versus Revenue ($ millions)]

The scatter diagram does indicate potential outliers or influential observations (or both).

For example, the New York Yankees have both the highest revenue and value and appear

to be an influential observation. The Los Angeles Dodgers have the second highest value

and appear to be an outlier.

b. A portion of the output follows:

Regression Statistics

Multiple R 0.9062

R Square 0.8211

Adjusted R Square 0.8148

Standard Error 165.6581

Observations 30

ANOVA

df SS MS F Significance F

Regression 1 3527616.598 3527616.6 128.5453 5.616E-12

Residual 28 768392.7687 27442.599

Total 29 4296009.367

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept –601.4814 122.4288 –4.9129 3.519E-05 –852.2655 –350.6973

Revenue ($ millions) 5.9271 0.5228 11.3378 5.616E-12 4.8562 6.9979

Thus, the estimated regression equation that can be used to predict the team’s value given the value of annual

revenue is ŷ = –601.4814 + 5.9271 revenue.

c. The standardized residual for the Los Angeles Dodgers is 4.7, so this observation should be treated as an outlier. To determine if the New

York Yankees point is an influential observation, we can remove the observation and compute a new estimated regression

equation. The results show that the estimated regression equation is ŷ = –449.061 + 5.2122 revenue. The following two

scatter diagrams illustrate the small change in the estimated regression equation after removing the observation for the New

York Yankees. These diagrams show that the effect of the New York Yankees observation on the regression results is not

that dramatic.

Scatter Diagram Including the New York Yankees Observation

Scatter Diagram Excluding the New York Yankees Observation

55. No. Regression or correlation analysis can never prove that two variables are causally

related.

56. The estimate of a mean value is an estimate of the average of all y values associated with

the same x. The estimate of an individual y value is an estimate of only one of the y

values associated with a particular x.

57. The purpose of testing whether β1 = 0 is to determine whether or not there is a significant

relationship between x and y. However, rejecting H0: β1 = 0 does not necessarily imply a good

fit. For example, if H0: β1 = 0 is rejected and r2 is low, there is a statistically significant

relationship between x and y but the fit is not very good.

58. a.

[Scatter diagram: S&P 500 versus DJIA]

b. A portion of the output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 22146 22145.6 239.89 0.000

Error 13 1200 92.3

Total 14 23346

Model Summary

S R-sq R-sq(adj)

9.60811 94.86% 94.46%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant -669 131 -5.12 0.000

DJIA 0.1573 0.0102 15.49 0.000 1.00

Regression Equation

S&P = -669 + 0.1573 DJIA

c. Using the F test, the p-value corresponding to F = 239.89 is .000. Because the p-value

< α = .05, we reject H0: β1 = 0; there is a significant relationship.

d. With R-Sq = 94.9%, the estimated regression equation provided an excellent fit.
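
The F statistic, R-sq, and S in the part b output all follow from the sums of squares in the Analysis of Variance table. As a rough cross-check, the sketch below recomputes them from the rounded table entries; the variable names are illustrative, and small differences from the printed values are expected because the table entries are themselves rounded.

```python
# Values copied from the Analysis of Variance table in part b.
ssr, df_regression = 22146.0, 1   # regression sum of squares
sse, df_error = 1200.0, 13        # error sum of squares
sst = ssr + sse                   # total sum of squares

msr = ssr / df_regression         # mean square regression
mse = sse / df_error              # mean square error

f_value = msr / mse               # F = MSR / MSE
r_sq = ssr / sst                  # coefficient of determination
s = mse ** 0.5                    # standard error of the estimate

print(f"F = {f_value:.2f}")
print(f"R-sq = {r_sq:.2%}")
print(f"S = {s:.3f}")
```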

e.

f. The DJIA is not that far beyond the range of the data. With the excellent fit provided

by the estimated regression equation, we should not be too concerned about using the

estimated regression equation to predict the S&P 500.

59. a.

[Scatter diagram: Selling Price ($1,000s) versus Size (1,000s sq. ft.)]

The scatter diagram suggests that there is a linear relationship between size and

selling price and that selling price increases as size increases.

b. The output appears as follows.

The estimated regression equation is: ŷ = –59.016 + 115.091x.

c. Significant relationship: p-value = .000 < α = .05.

d. ŷ = –59.016 + 115.091(size in 1,000s of square feet) = –59.016 + 115.091(2.0) = 171.166 or

approximately $171,166.

e. The estimated regression equation should provide a good estimate because r2 = 0.897.

f. This estimated equation might not work well for other cities. Housing markets are

also driven by other factors that influence demand for housing such as job market and

quality-of-life factors. For example, because of the existence of high-tech jobs and its

proximity to the ocean, Seattle, Washington, has houses that are quite different from

those in Winston-Salem, North Carolina.

60. a.

The scatter diagram indicates a positive linear relationship between the two variables.

Online universities with higher retention rates tend to have higher graduation rates.

b. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 1224.3 1224.29 22.02 0.000

Error 27 1501.0 55.59

Total 28 2725.3

Model Summary

S R-sq R-sq(adj)

7.45610 44.92% 42.88%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 25.42 3.75 6.79 0.000

RR(%) 0.2845 0.0606 4.69 0.000 1.00

Regression Equation

GR(%) = 25.42 + 0.2845 RR(%)

Fits and Diagnostics for Unusual Observations

Obs GR(%) Fit Resid Std Resid

2 25.00 39.93 -14.93 -2.04 R

3 28.00 26.56 1.44 0.22 X

R Large residual

X Unusual X

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large leverage.

c. Because the p-value = .000 < α =.05, the relationship is significant.

d. The estimated regression equation is able to explain 44.9% of the variability in the

graduation rate based on the linear relationship with the retention rate. It is not a great

fit, but the fit is reasonably good given the type of data.

e. In the output in part b, South University is identified as an observation with a large

standardized residual. With a retention rate of 51% it does appear that the graduation

rate of 25% is low compared to the results for other online universities. The president

of South University should be concerned after looking at the data. Using the

estimated regression equation, we estimate that the graduation rate at South

University should be 25.4 + .285(51) = 40%.

f. In the output in part b, the University of Phoenix is identified as an observation whose

x value gives it large leverage. With a retention rate of only 4%, the president of the

University of Phoenix should be concerned after looking at the data.

61. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 860.1 860.05 47.62 0.000

Error 8 144.5 18.06

Total 9 1004.5

Model Summary

S R-sq R-sq(adj)

4.24962 85.62% 83.82%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 10.53 3.74 2.81 0.023

Usage 0.953 0.138 6.90 0.000 1.00

Regression Equation

Expense = 10.53 + 0.953 Usage

Variable Setting

Usage 30

Fit SE Fit 95% CI 95% PI

39.1312 1.49251 (35.6894, 42.5729) (28.7447, 49.5176)

a. ŷ = 10.53 + .953 Usage

b. Because the p-value corresponding to F = 47.62 is .000 < α = .05, we reject

H0: β1 = 0.

c. The 95% prediction interval is 28.74 to 49.52 or $2,874 to $4,952.

d. Yes, since the expected expense is ŷ = 10.53 + .953(30) = 39.12 or $3,912.
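
The 95% CI and 95% PI in the output can be rebuilt from the Fit, SE Fit, and S values. The sketch below is a hedged illustration: the critical value 2.306 is taken from a t table for n - 2 = 8 degrees of freedom rather than from the output, and the variable names are hypothetical. The prediction interval uses the standard deviation sqrt(S^2 + SE Fit^2), which is why it is wider than the confidence interval.

```python
import math

# Values copied from the output for Usage = 30.
fit = 39.1312      # Fit
se_fit = 1.49251   # SE Fit (standard error of the estimated mean)
s = 4.24962        # S from the Model Summary

# Assumption: t(.025) with 8 degrees of freedom, read from a t table.
t_crit = 2.306

# Confidence interval for the mean value of y at the given x
ci_margin = t_crit * se_fit
ci = (fit - ci_margin, fit + ci_margin)

# Prediction interval for an individual value of y at the given x
s_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi_margin = t_crit * s_pred
pi = (fit - pi_margin, fit + pi_margin)

print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"95% PI: ({pi[0]:.4f}, {pi[1]:.4f})")
```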

62. a. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 25.130 25.130 11.33 0.028

Error 4 8.870 2.217

Total 5 34.000

Model Summary

S R-sq R-sq(adj)

1.48909 73.91% 67.39%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 22.17 1.65 13.42 0.000

Speed -0.1478 0.0439 -3.37 0.028 1.00

Regression Equation

Defects = 22.17 - 0.1478 Speed

Variable Setting

Speed 50

Fit SE Fit 95% CI 95% PI

14.7826 0.896327 (12.2940, 17.2712) (9.95703, 19.6082)

b. Because the p-value corresponding to F = 11.33 is .028 < α = .05, the relationship is

significant.

c. r2 = .739; a good fit. The least squares line explained 73.9% of the variability in the

number of defects.

d. Using the output in part a, the 95% confidence interval is 12.294 to 17.2712.

63. a.

[Scatter diagram: Days versus Distance]

There appears to be a negative linear relationship between distance to work and

number of days absent.

b. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 32.699 32.699 19.67 0.002

Error 8 13.301 1.663

Total 9 46.000

Model Summary

S R-sq R-sq(adj)

1.28941 71.09% 67.47%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 8.098 0.809 10.01 0.000

Distance -0.3442 0.0776 -4.43 0.002 1.00

Regression Equation

Days = 8.098 - 0.3442 Distance

Variable Setting

Distance 5

Fit SE Fit 95% CI 95% PI

6.37681 0.512485 (5.19502, 7.55860) (3.17717, 9.57646)

c. Because the p-value corresponding to F = 19.67 is .002 < α = .05, we reject H0: β1 = 0.

There is a significant relationship between the number of days absent and the

distance to work.

d. r2 = .711. The estimated regression equation explained 71.1% of the variability in y;

this is a reasonably good fit.

e. The 95% confidence interval is 5.19502 to 7.5586 or approximately 5.2 to 7.6 days.

64. a. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 312050 312050 54.75 0.000

Error 8 45600 5700

Total 9 357650

Model Summary

S R-sq R-sq(adj)

75.4983 87.25% 85.66%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 220.0 58.5 3.76 0.006

Age 131.7 17.8 7.40 0.000 1.00

Regression Equation

Cost = 220.0 + 131.7 Age

Variable Setting

Age 4

Fit SE Fit 95% CI 95% PI

746.667 29.7769 (678.001, 815.332) (559.515, 933.818)

b. Because the p-value corresponding to F = 54.75 is .000 < α = .05, we reject H0: β1 = 0.

Maintenance cost and age of bus are related.

c. r2 = .873. The least squares line provided a good fit.

d. The 95% prediction interval is 559.515 to 933.818 or $559.52 to $933.82

65. a. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 3249.7 3249.72 57.42 0.000

Error 8 452.8 56.60

Total 9 3702.5

Model Summary

S R-sq R-sq(adj)

7.52312 87.77% 86.24%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 5.85 7.97 0.73 0.484

Hours 0.830 0.109 7.58 0.000 1.00

Regression Equation

Points = 5.85 + 0.830 Hours

Variable Setting

Hours 95

Fit SE Fit 95% CI 95% PI

84.6533 3.66780 (76.1953, 93.1112) (65.3529, 103.954)

b. Because the p-value corresponding to F = 57.42 is .000 < α = .05, we reject H0: β1 = 0.

Total points earned is related to the hours spent studying.

c. 84.65 points

d. The 95% prediction interval is 65.3529 to 103.954

66. a. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 50.26 50.255 7.08 0.029

Error 8 56.78 7.098

Total 9 107.04

Model Summary

S R-sq R-sq(adj)

2.66413 46.95% 40.32%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 0.275 0.900 0.31 0.768

S&P 500 0.950 0.357 2.66 0.029 1.00

Regression Equation

Horizon = 0.275 + 0.950 S&P 500

The market beta for Horizon is b1 = .95

b. Because the p-value = 0.029 is less than α = .05, the relationship is significant.

c. r2 = .470. The least squares line does not provide an especially good fit.

d. Xerox has higher risk with a market beta of 1.22.

67. a. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 0.2175 0.21749 4.99 0.038

Error 18 0.7845 0.04358

Total 19 1.0020

Model Summary

S R-sq R-sq(adj)

0.208768 21.71% 17.36%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant -0.471 0.584 -0.81 0.431

Adjusted_Gross Income 0.000039 0.000017 2.23 0.038 1.00

Regression Equation

Percent_Audited = -0.471 + 0.000039 Adjusted_Gross Income

Variable Setting

Adjusted_Gross Income 35000

Fit SE Fit 95% CI 95% PI

0.882770 0.0523186 (0.772853, 0.992687) (0.430602, 1.33494)

b. Because the p-value = 0.038 is less than α = .05, the relationship is significant.

c. r2 = .217. The least squares line does not provide a good fit.

d. The 95% confidence interval is .772853 to .992687.

68. a.

[Scatter diagram: Price ($1000s) versus Miles (1000s)]

b. There appears to be a negative relationship between the two variables that can be

approximated by a straight line. An argument could also be made that the relationship

is perhaps curvilinear because at some point a car has so many miles that its value

becomes very small.

c. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 47.158 47.158 19.85 0.000

Error 17 40.389 2.376

Total 18 87.547

Model Summary

S R-sq R-sq(adj)

1.54138 53.87% 51.15%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 16.470 0.949 17.36 0.000

Miles (1000s) -0.0588 0.0132 -4.46 0.000 1.00

Regression Equation

Price ($1000s) = 16.470 - 0.0588 Miles (1000s)

d. Significant relationship: p-value = 0.000 < α = .05.

e. r2 = .5387; a reasonably good fit considering that the condition of the car is also an

important factor in what the price is.

f. The slope of the estimated regression equation is –.0588. Thus, a one-unit increase in

the value of x coincides with a decrease in the value of y equal to .0588. Because the

data were recorded in thousands, every additional 1,000 miles on the car’s odometer

will result in a $58.80 decrease in the predicted price.

g. The predicted price for a 2007 Camry with 60,000 miles is ŷ = 16.47 – .0588(60) =

12.942 or $12,942. Because of other factors such as condition and whether the seller

is a private party or a dealer, this is probably not the price you would offer for the car.

But it should be a good starting point in figuring out what to offer the seller.
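
Because both variables are recorded in thousands, it is easy to misread the units of the slope and of the prediction. The following sketch, using the coefficients reported in the part c output (16.470 and –0.0588), converts everything to dollars and miles; the function and constant names are hypothetical and only meant to make the unit handling explicit.

```python
# Coefficients copied from the regression equation in part c.
B0 = 16.470    # intercept, in $1000s
B1 = -0.0588   # slope, in $1000s per 1000 miles


def predicted_price_dollars(miles: float) -> float:
    """Predicted price in dollars for a car with the given odometer miles."""
    miles_thousands = miles / 1000          # the model expects miles in 1000s
    price_thousands = B0 + B1 * miles_thousands
    return price_thousands * 1000           # convert $1000s back to dollars


# Change in predicted price (in dollars) per additional 1,000 miles
change_per_1000_miles = B1 * 1000

print(f"Predicted price at 60,000 miles: ${predicted_price_dollars(60_000):,.0f}")
print(f"Change per additional 1,000 miles: ${change_per_1000_miles:,.2f}")
```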

Case Solutions

Case Problem 1 Measuring Stock Market Risk

a. Selected descriptive statistics follow:

Variable N Mean StDev Minimum Median Maximum

Microsoft 36 0.00503 0.04537 -0.08201 0.00400 0.08883

Exxon Mobil 36 0.01664 0.05534 -0.11646 0.01279 0.23217

Caterpillar 36 0.03010 0.06860 -0.10060 0.04080 0.21850

Johnson & Johnson 36 0.00530 0.03487 -0.05917 -0.00148 0.10334

McDonald’s 36 0.02450 0.06810 -0.11440 0.03700 0.18260

Sandisk 36 0.06930 0.19540 -0.28330 0.07410 0.50170

Qualcomm 36 0.02840 0.08620 -0.12170 0.03870 0.21060

Procter & Gamble 36 0.01059 0.03707 -0.05365 0.01333 0.08783

S&P 500 36 0.01010 0.02633 -0.03429 0.01034 0.08104

From the descriptive statistics we see that six companies had a higher mean monthly return

than the market (as measured by the S&P 500): Exxon Mobil, Caterpillar, McDonald’s,

Sandisk, Qualcomm, and Procter & Gamble. Microsoft and Johnson & Johnson had lower

mean monthly returns.

Using the standard deviation as a measure of volatility, Sandisk was the most volatile

stock with a standard deviation of .1954. The stocks of Johnson & Johnson and P & G

exhibit less volatility than the other individual stocks, but all of the individual stocks are

more volatile than the market as a whole. The diversification embodied in the S&P 500

reduces its volatility.

b. The estimated regression equation relating each of the individual stocks to the S&P 500 is

shown below. The value of R-Sq for each equation is also shown.

Microsoft = 0.00040 + 0.458 S&P 500 R-Sq = 7.1%

Exxon Mobil = 0.00926 + 0.731 S&P 500 R-Sq = 12.1%

Caterpillar = 0.015000 + 1.493 S&P 500 R-Sq = 32.9%

Johnson & Johnson = 0.00521 + 0.009 S&P 500 R-Sq = 0.0%

McDonald’s = 0.00930 + 1.503 S&P 500 R-Sq = 33.8%

Sandisk = 0.04300 + 2.600 S&P 500 R-Sq = 12.3%

Qualcomm = 0.01410 + 1.414 S&P 500 R-Sq = 18.7%

Procter & Gamble = 0.00548 + 0.507 S&P 500 R-Sq = 12.9%

The betas (slope of estimated regression equation) for the individual stocks can be obtained

from the regression output.

Company Beta

Microsoft .458

Exxon Mobil .731

Caterpillar 1.493

Johnson & Johnson .009

McDonald’s 1.503

Sandisk 2.600

Qualcomm 1.414

Procter & Gamble .507

The beta for the market as a whole is 1. So any stock with a beta greater than 1 will move up

faster than the market when the market goes up. Any stock with a beta less than 1 will not go

down as fast as the market in periods where the market declines.

We would expect Sandisk, with a beta of 2.6, to benefit most from an up market.

Johnson & Johnson, with a beta of .009 is least affected by the market. The effect of the

market going down cannot be expected to exert much downward pressure on shares of

Johnson & Johnson.

c. The R-Sq values seem to indicate that from 0% to 33.8% of the variability of the returns in

these individual stocks is explained by the return for the market.
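
The intercepts and R-Sq values in part b are tied to the descriptive statistics in part a: in simple linear regression the fitted line passes through the point of means, so b0 = ȳ - b1·x̄, and R-Sq = (b1·s_x / s_y)^2. The sketch below is a consistency check under that reasoning, not part of the original solution; the helper name implied_fit is hypothetical, and small discrepancies from the printed values are expected because the inputs are rounded.

```python
# S&P 500 mean and standard deviation of monthly returns (part a).
SP500_MEAN, SP500_SD = 0.01010, 0.02633

# (mean, standard deviation, beta) for each stock, copied from parts a and b.
stocks = {
    "Microsoft": (0.00503, 0.04537, 0.458),
    "Exxon Mobil": (0.01664, 0.05534, 0.731),
    "Caterpillar": (0.03010, 0.06860, 1.493),
    "Johnson & Johnson": (0.00530, 0.03487, 0.009),
    "McDonald's": (0.02450, 0.06810, 1.503),
    "Sandisk": (0.06930, 0.19540, 2.600),
    "Qualcomm": (0.02840, 0.08620, 1.414),
    "Procter & Gamble": (0.01059, 0.03707, 0.507),
}


def implied_fit(mean: float, sd: float, beta: float) -> tuple[float, float]:
    """Return the intercept and R-Sq implied by the summary statistics."""
    intercept = mean - beta * SP500_MEAN   # line passes through the means
    r_sq = (beta * SP500_SD / sd) ** 2     # squared sample correlation
    return intercept, r_sq


for name, (mean, sd, beta) in stocks.items():
    intercept, r_sq = implied_fit(mean, sd, beta)
    print(f"{name:18s} intercept = {intercept:.5f}  R-Sq = {r_sq:.1%}")
```

A low R-Sq alongside a large beta, as for Sandisk, reflects a stock whose own volatility is large relative to the part of its movement explained by the market.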

Case Problem 2 U.S. Department of Transportation

Descriptive statistics for these data are shown below:

N MEAN MEDIAN TRMEAN STDEV SEMEAN

PERCENT 42 12.262 12.000 12.184 3.132 0.483

FATAL 42 1.922 1.881 1.906 1.071 0.165

MIN MAX Q1 Q3

PERCENT 8.000 18.000 9.000 15.000

FATAL 0.039 4.100 0.992 2.824

The following scatter diagram suggests a linear relationship between these two variables:

[Scatter diagram: Fatal Accidents versus Percent Under 21]

We have the following regression analysis output:

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 33.134 33.1344 95.40 0.000

Percent_Under 21 1 33.134 33.1344 95.40 0.000

Error 40 13.893 0.3473

Lack-of-Fit 9 2.189 0.2432 0.64 0.751

Pure Error 31 11.704 0.3776

Total 41 47.028

Model Summary

S R-sq R-sq(adj) R-sq(pred)

0.589350 70.46% 69.72% 67.64%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant -1.597 0.372 -4.30 0.000

Percent_Under 21 0.2871 0.0294 9.77 0.000 1.00

Regression Equation

Fatal Accidents_per 1000 = -1.597 + 0.2871 Percent_Under 21

Fits and Diagnostics for Unusual Observations

Obs Fatal Accidents_per 1000 Fit Resid Std Resid

15 0.039 1.273 -1.234 -2.13 R

23 2.190 0.699 1.491 2.62 R

R Large residual

There is a significant relationship between the two variables. Two observations are identified as

having a large standardized residual and should be treated as possible outliers; the following

standardized residual plot does not indicate any other problems with the residuals.

Conclusion: The number of fatal accidents per 1000 licenses appears to be linearly related

to the percentage of licensed drivers under the age of 21; that is, the higher the percentage of

drivers under 21, the larger the number of fatal accidents per 1000 licenses.
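
The significance result above can be cross-checked from the printed output. In simple linear regression the t statistic for the slope is the coefficient divided by its standard error, and its square equals the F statistic (up to rounding of the printed values). A minimal sketch, with illustrative variable names:

```python
# Values copied from the Coefficients and Analysis of Variance tables.
coef, se_coef = 0.2871, 0.0294   # slope and its standard error
msr, mse = 33.1344, 0.3473       # mean squares for regression and error

t_value = coef / se_coef         # t statistic for H0: slope = 0
f_from_anova = msr / mse         # F statistic from the ANOVA table
f_from_t = t_value ** 2          # with one predictor, F equals t squared

print(f"t = {t_value:.2f}")
print(f"F from ANOVA = {f_from_anova:.2f}")
print(f"t squared = {f_from_t:.2f}")
```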

Case Problem 3 Selecting a Point and Shoot Digital Camera

1. Descriptive statistics for the data set follow.

Price ($) Megapixels Weight (oz.) Score

Mean 175.36 12.86 5.82 56.36

Standard Error 15.65 0.35 0.19 1.27

Median 160 12 6 56.5

Mode 200 12 5 66

Standard Deviation 82.80 1.84 0.98 6.70

Sample Variance 6855.42 3.39 0.97 44.83

Kurtosis 0.66 -0.63 –1.19 –0.62

Skewness 1.06 0.23 –0.12 –0.43

Range 320 6 3 24

Minimum 80 10 4 42

Maximum 400 16 7 66

Sum 4910 360 163 1578

Count 28 28 28 28

The sample correlation coefficients for this data set follow.

Score Price ($) Megapixels

Price ($) 0.683

0.000

Megapixels -0.008 0.139

0.969 0.481

Weight (oz.) 0.286 0.349 -0.199

0.141 0.069 0.310

With a sample correlation coefficient of .683, price appears to be the best predictor of the

overall score.

2. Scatter diagrams for the data follow.

There appears to be a positive relationship between the price of the camera and the

overall score. But, observation 17, a Nikon camera with a price of $400, appears to be an

observation that will have a significant impact when we fit a linear model to these data. It

may be worth considering restricting the analysis to cameras that have a price of less than

$400. Another possible explanation for what we observe here is that the underlying

relationship may not be linear. In other words, the somewhat curvilinear trend in the data

may be due to diminishing returns.

[Scatter diagram: Score versus Price ($)]

The number of megapixels does not appear to have much effect on the overall
score, but note that as the number of megapixels increases from 10 to 14, the overall score
appears to have a downward trend; that is, the overall score is decreasing. This seems to
be counterintuitive in that generally speaking, higher megapixel cameras are usually
considered to have better picture quality. But, the overall score for the 16 megapixel
cameras does increase somewhat.

[Scatter diagram: Score versus Megapixels]

There may be a modest increase in overall score for cameras that weigh more. Also note

the large variability in the score for cameras with a weight of 5 ounces and cameras with

a weight of 7 ounces. The pattern in the data may also be an indication that the effect of

weight may also involve some curvilinear effect.

[Scatter diagram: Score versus Weight (oz.)]

Conclusion: The variable that appears to be the best predictor of overall score is

the price of the camera.

3. A portion of the output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 565.0 565.00 22.76 0.000

Price ($) 1 565.0 565.00 22.76 0.000

Error 26 645.4 24.82

Lack-of-Fit 13 293.4 22.57 0.83 0.626

Pure Error 13 352.0 27.08

Total 27 1210.4

Model Summary

S R-sq R-sq(adj) R-sq(pred)

4.98238 46.68% 44.63% 33.06%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 46.67 2.24 20.85 0.000

Price ($) 0.0552 0.0116 4.77 0.000 1.00

Regression Equation

Score = 46.67 + 0.0552 Price ($)

Fits and Diagnostics for Unusual Observations

Obs Score Fit Resid Std Resid

17 59.00 68.77 -9.77 -2.36 R X

28 42.00 53.85 -11.85 -2.44 R

R Large residual

X Unusual X
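
The summary figures in the output above follow directly from the sums of squares. As a minimal sketch (the variable names are illustrative and not part of the Minitab output), R-sq, the F-value, and S can be recomputed from the Regression, Error, and Total rows. Small differences in the last digits reflect rounding in the printed table.

```python
import math

# Values copied from the Analysis of Variance table for Score vs. Price ($).
ss_regression = 565.0
ss_error = 645.4
ss_total = 1210.4
df_regression = 1
df_error = 26

r_sq = ss_regression / ss_total                 # coefficient of determination
ms_regression = ss_regression / df_regression   # mean square due to regression
ms_error = ss_error / df_error                  # mean square error
f_value = ms_regression / ms_error              # F statistic for the regression
s = math.sqrt(ms_error)                         # standard error of the estimate

print(f"R-sq = {r_sq:.2%}, F = {f_value:.2f}, S = {s:.3f}")
```

With only one independent variable, the F-value is also approximately the square of the T-Value reported for Price ($) in the Coefficients table.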

With a p-value = .000, price is a significant factor in predicting the overall score. The estimated regression equation explains 46.7% of the variability in the overall score. Note the two unusual observations: observations 17 and 28. Observation 17 is also flagged as having an unusual x value (high leverage) and thus is considered an influential observation. To confirm this conclusion, the following regression output shows the results after removing observation 17 from the data.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 695.7 695.72 34.27 0.000

Price ($) 1 695.7 695.72 34.27 0.000

Error 25 507.5 20.30

Lack-of-Fit 12 155.5 12.96 0.48 0.894

Pure Error 13 352.0 27.08

Total 26 1203.2

Model Summary

S R-sq R-sq(adj) R-sq(pred)

4.50538 57.82% 56.14% 52.46%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 44.17 2.24 19.72 0.000

Price ($) 0.0724 0.0124 5.85 0.000 1.00

Regression Equation

Score = 44.17 + 0.0724 Price ($)

Fits and Diagnostics for Unusual Observations

Obs Score Fit Resid Std Resid

1 66.00 68.06 -2.06 -0.52 X

27 42.00 53.58 -11.58 -2.63 R

R Large residual

X Unusual X

Note that the slope of the estimated line without observation 17 is .0724 as compared to the slope

of .0552 with observation 17. And the fit has also improved.
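
To see what the change in slope means in practice, the two estimated regression equations can be evaluated at the same price. The sketch below uses the coefficients from the two outputs; the $200 price is a hypothetical value chosen only for illustration, and the function name is not from the source.

```python
def predict_score(price, intercept, slope):
    """Return the predicted overall score for a given price in dollars."""
    return intercept + slope * price

# Coefficients taken from the two regression outputs above.
WITH_OBS_17 = {"intercept": 46.67, "slope": 0.0552}
WITHOUT_OBS_17 = {"intercept": 44.17, "slope": 0.0724}

price = 200  # hypothetical price, not a value taken from the data set
score_with = predict_score(price, **WITH_OBS_17)
score_without = predict_score(price, **WITHOUT_OBS_17)

print(f"With obs 17: {score_with:.2f}; without obs 17: {score_without:.2f}")
```

Because the two lines have different slopes, the gap between their predictions changes with price, which is one way to see how much a single high-leverage observation can move the fitted line.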

Are we justified in simply discarding observation 17 just because it is influential and

provides a better fit? No. But if we are interested in only exploring the relationship between price

and overall score for cameras that cost less than $400, then removing observation 17 from the

data set is acceptable.

4. Using only the data for the Canon cameras, the scatter diagram using the price of the camera as the independent variable follows.

[Scatter diagram, Canon cameras only: Score (vertical axis, 30 to 70) versus Price ($) (horizontal axis, 50 to 350)]

There does appear to be a relationship between the price of the camera and the overall score, but the relationship appears to be curvilinear. Nevertheless, using simple linear regression for these data we obtain the following output.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 311.66 311.660 23.80 0.000

Price ($) 1 311.66 311.660 23.80 0.000

Error 11 144.03 13.094

Lack-of-Fit 6 104.78 17.464 2.22 0.199

Pure Error 5 39.25 7.850

Total 12 455.69

Model Summary

S R-sq R-sq(adj) R-sq(pred)

3.61854 68.39% 65.52% 53.13%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 47.29 2.57 18.38 0.000

Price ($) 0.0665 0.0136 4.88 0.000 1.00

Regression Equation

Score = 47.29 + 0.0665 Price ($)

Fits and Diagnostics for Unusual Observations

Obs Score Fit Resid Std Resid

13 46.00 53.27 -7.27 -2.21 R

R Large residual

The estimated regression equation is significant and explains 68.39% of the variability in the overall score using the price of the camera, but the curvilinear relationship we observed in the scatter diagram is still a concern. If we are willing to consider only cameras with a price of $200 or less, then a linear relationship may serve as an approximation. For instance, the following regression output shows the results using only Canon cameras with a price of $200 or less.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 261.97 261.974 26.62 0.001

Price ($) 1 261.97 261.974 26.62 0.001

Error 9 88.57 9.841

Lack-of-Fit 4 49.32 12.330 1.57 0.313

Pure Error 5 39.25 7.850

Total 10 350.55

Model Summary

S R-sq R-sq(adj) R-sq(pred)

3.13708 74.73% 71.93% 61.75%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 41.81 3.21 13.02 0.000

Price ($) 0.1068 0.0207 5.16 0.001 1.00

Regression Equation

Score = 41.81 + 0.1068 Price ($)

The fit has improved slightly, but the question of whether the underlying relationship may be better described by a curvilinear model cannot be resolved using the methods introduced in this chapter.
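
The Lack-of-Fit rows in the two Canon outputs give a rough check on the linearity concern. As a sketch, the lack-of-fit F-value is the lack-of-fit mean square divided by the pure-error mean square. The helper name below is illustrative, and the p-values (0.199 and 0.313) are read from the output rather than recomputed.

```python
def lack_of_fit_f(ss_lack_of_fit, df_lack_of_fit, ss_pure_error, df_pure_error):
    """Lack-of-fit F statistic: MS(lack of fit) / MS(pure error)."""
    ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
    ms_pure_error = ss_pure_error / df_pure_error
    return ms_lack_of_fit / ms_pure_error

# All Canon cameras (Lack-of-Fit and Pure Error rows of the first output).
f_all = lack_of_fit_f(104.78, 6, 39.25, 5)

# Canon cameras priced at $200 or less (second output).
f_200_or_less = lack_of_fit_f(49.32, 4, 39.25, 5)

print(f"All Canon: F = {f_all:.2f}; $200 or less: F = {f_200_or_less:.2f}")
```

Neither ratio is significant at the .05 level according to the printed p-values. With only 5 pure-error degrees of freedom, though, such a test has limited power, so it does not settle the question of curvature.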

Case Problem 4 Finding the Best Car Value

1. Descriptive statistics follow.

                     Price ($)      Cost/Mile   Road-Test Score   Predicted Reliability   Value Score

Mean                 26886.20       0.642       80.45             3.75                    1.46
Standard Error       754.51         0.01        2.21              0.14                    0.04
Median               28067.50       0.665       82                4                       1.43
Mode                 #N/A           0.67        81                4                       1.73
Standard Deviation   3374.284152    0.06        9.90              0.64                    0.20
Sample Variance      11385793.54    0.00        98.05             0.41                    0.04
Kurtosis             -1.41          -1.58       2.58              -0.44                   -0.64
Skewness             -0.23          -0.04       -1.41             0.25                    -0.18
Range                10560          0.18        41                2                       0.7
Minimum              21800          0.56        52                3                       1.05
Maximum              32360          0.74        93                5                       1.75
Sum                  537724         12.84       1609              75                      29.16
Count                20             20          20                20                      20
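
Several rows of the descriptive statistics are tied together by simple formulas. The sketch below, using only the Price ($) column, recomputes the mean, standard error, sample variance, and range from the other entries; the variable names are illustrative.

```python
import math

# Entries copied from the Price ($) column of the descriptive statistics.
total = 537724             # Sum
count = 20                 # Count
std_dev = 3374.284152      # Standard Deviation
minimum, maximum = 21800, 32360

mean = total / count                      # Mean = Sum / Count
std_error = std_dev / math.sqrt(count)    # Standard Error = s / sqrt(n)
variance = std_dev ** 2                   # Sample Variance = s squared
value_range = maximum - minimum           # Range = Maximum - Minimum

print(f"mean = {mean:.2f}, standard error = {std_error:.2f}")
print(f"sample variance = {variance:.2f}, range = {value_range}")
```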

2. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 0.2428 0.24276 8.78 0.008

Price ($) 1 0.2428 0.24276 8.78 0.008

Error 18 0.4980 0.02766

Total 19 0.7407

Model Summary

S R-sq R-sq(adj) R-sq(pred)

0.166326 32.77% 29.04% 12.70%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 2.359 0.306 7.70 0.000

Price ($) -0.000033 0.000011 -2.96 0.008 1.00

Regression Equation

Value Score = 2.359 - 0.000033 Price ($)

Fits and Diagnostics for Unusual Observations

Obs Value Score Fit Resid Std Resid

4 1.7000 1.2746 0.4254 2.84 R

20 1.0500 1.3874 -0.3374 -2.10 R

R Large residual

3. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 0.38012 0.38012 18.97 0.000

Cost/Mile 1 0.38012 0.38012 18.97 0.000

Error 18 0.36060 0.02003

Lack-of-Fit 10 0.26498 0.02650 2.22 0.136

Pure Error 8 0.09563 0.01195

Total 19 0.74072

Model Summary

S R-sq R-sq(adj) R-sq(pred)

0.141540 51.32% 48.61% 41.96%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 2.942 0.342 8.60 0.000

Cost/Mile -2.312 0.531 -4.36 0.000 1.00

Regression Equation

Value Score = 2.942 - 2.312 Cost/Mile

Fits and Diagnostics for Unusual Observations

Obs Value Score Fit Resid Std Resid

20 1.0500 1.3933 -0.3433 -2.50 R

R Large residual

4. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 0.1255 0.12546 3.67 0.071

Road-Test Score 1 0.1255 0.12546 3.67 0.071

Error 18 0.6153 0.03418

Lack-of-Fit 13 0.3243 0.02494 0.43 0.899

Pure Error 5 0.2910 0.05820

Total 19 0.7407

Model Summary

S R-sq R-sq(adj) R-sq(pred)

0.184882 16.94% 12.32% 0.00%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 0.798 0.347 2.30 0.034

Road-Test Score 0.00821 0.00428 1.92 0.071 1.00

Regression Equation

Value Score = 0.798 + 0.00821 Road-Test Score

Fits and Diagnostics for Unusual Observations

Obs Value Score Fit Resid Std Resid

19 1.2000 1.2245 -0.0245 -0.18 X

X Unusual X

5. The output follows.

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 0.09105 0.09105 2.52 0.130

Predicted Reliability 1 0.09105 0.09105 2.52 0.130

Error 18 0.64967 0.03609

Lack-of-Fit 1 0.08289 0.08289 2.49 0.133

Pure Error 17 0.56679 0.03334

Total 19 0.74072

Model Summary

S R-sq R-sq(adj) R-sq(pred)

0.189982 12.29% 7.42% 0.00%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant 1.052 0.259 4.05 0.001

Predicted Reliability 0.1084 0.0682 1.59 0.130 1.00

Regression Equation

Value Score = 1.052 + 0.1084 Predicted Reliability

Fits and Diagnostics for Unusual Observations

Obs Value Score Fit Resid Std Resid

19 1.2000 1.5935 -0.3935 -2.39 R

R Large residual

6. Although Consumer Reports did not include price as one of the components of value

score, the regression output shown in part 2 shows a significant statistical relationship

between price ($) and value score (p-value = .008). Reviewing the regression output in

parts 3–5 indicates that cost/mile is the best single predictor of value score (R-Sq =

51.3%). To further investigate the relationship among these variables, we really need to

use multiple regression analysis.
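
The comparison across the four simple regressions can be laid out side by side. The following sketch collects the R-sq and slope p-value from the outputs in parts 2 through 5 and picks out the best single predictor. The .05 level of significance is an assumption made here for illustration, and the dictionary layout is not from the source.

```python
# R-sq (%) and slope p-value copied from the outputs in parts 2-5.
models = {
    "Price ($)": {"r_sq": 32.77, "p_value": 0.008},
    "Cost/Mile": {"r_sq": 51.32, "p_value": 0.000},
    "Road-Test Score": {"r_sq": 16.94, "p_value": 0.071},
    "Predicted Reliability": {"r_sq": 12.29, "p_value": 0.130},
}
ALPHA = 0.05  # assumed level of significance

# Predictor whose simple regression explains the most variability in value score.
best = max(models, key=lambda name: models[name]["r_sq"])

# Predictors with a statistically significant relationship at the assumed level.
significant = [name for name, m in models.items() if m["p_value"] < ALPHA]

print(f"Best single predictor: {best}")
print(f"Significant at alpha = {ALPHA}: {significant}")
```

This only ranks one-variable models. It says nothing about how the predictors work together, which is why multiple regression is the natural next step.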

Case Problem 5 Buckeye Creek Amusement Park

1. Descriptive statistics for the dependent and independent variables and a scatter plot of

these two variables follow. The mean population over the 151 zip codes is 15,738.2 and

the mean number of season pass holders is 128.3. Over all zip codes, the maximum

population is 62,303 and the minimum is 1,227. The maximum number of season pass

holders is 657 and the minimum is 5.

[Scatter diagram: Members (vertical axis, 0 to 1400) versus Population (horizontal axis, 0 to 70,000)]

There appears to be a positive linear relationship between the population of a zip code and the number of season pass holders.

2. The regression tool output follows:

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value

Regression 1 2020777 2020777 261.95 0.000

Population 1 2020777 2020777 261.95 0.000

Error 149 1149418 7714

Total 150 3170195

Model Summary

S R-sq R-sq(adj) R-sq(pred)

87.8306 63.74% 63.50% 62.52%

Coefficients

Term Coef SE Coef T-Value P-Value VIF

Constant -16.3 11.4 -1.42 0.157

Population 0.009183 0.000567 16.19 0.000 1.00

Regression Equation

Season Pass Holders = -16.3 + 0.009183 Population

Fits and Diagnostics for Unusual Observations

Obs Season Pass Holders Fit Resid Std Resid

25 424.0 151.2 272.8 3.12 R

31 564.0 376.2 187.8 2.18 R

40 357.0 170.8 186.2 2.13 R

42 380.0 193.8 186.2 2.13 R

46 423.0 220.3 202.7 2.32 R

50 303.0 119.2 183.8 2.10 R

52 307.0 389.4 -82.4 -0.96 X

53 377.0 167.5 209.5 2.39 R

54 448.0 225.4 222.6 2.55 R

60 440.0 446.8 -6.8 -0.08 X

63 561.0 408.6 152.4 1.78 X

68 496.0 555.9 -59.9 -0.72 X

71 286.0 458.8 -172.8 -2.03 R X

75 657.0 469.3 187.7 2.21 R X

77 549.0 453.8 95.2 1.12 X

89 322.0 439.4 -117.4 -1.37 X

107 93.0 277.1 -184.1 -2.11 R

120 55.0 276.9 -221.9 -2.55 R

135 39.0 229.3 -190.3 -2.18 R

141 39.0 218.2 -179.2 -2.05 R

147 20.0 207.7 -187.7 -2.15 R

R Large residual

X Unusual X

The estimated regression equation is ŷ = -16.3 + .009183x, where x = population.
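
As a quick check on the estimated regression equation, a least squares line passes through the point of means, so evaluating it at the mean population should give roughly the mean number of season pass holders. A minimal sketch, using the rounded coefficients from the output (the function name is illustrative):

```python
def predict_pass_holders(population):
    """Estimated number of season pass holders for a zip code population."""
    return -16.3 + 0.009183 * population

mean_population = 15738.2    # mean population reported in part 1
mean_pass_holders = 128.3    # mean season pass holders reported in part 1

estimate = predict_pass_holders(mean_population)
print(f"Estimate at mean population: {estimate:.1f} "
      f"(sample mean: {mean_pass_holders})")
```

The small gap between the two numbers comes from rounding the coefficients and the means.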

3. Significant relationship: p-value = .000 < α = .05. We must check the model assumptions

for the validity of this inference. There are several unusual observations and observations

with large residuals.

4. We see that 63.74% of the variability in y has been explained by the estimated regression equation, a reasonably good fit.

5. A plot of the residuals against the predicted number of season pass holders is shown

below. The regression model assumption of constant variance appears to be violated

(larger predicted values exhibit greater variability).

Because the variance does not appear to be constant, testing hypotheses and calculating

interval estimates from this model may not be appropriate. A curvilinear regression

model or multiple regression model should be considered.

6. Because the model provides a reasonable fit (r² = .637), it could be used to guide the marketing campaign. The following scatter diagram shows the estimated regression line and the data. Any data point below the estimated regression line represents a zip code with fewer observed season pass holders than the estimated regression equation predicts (the point on the estimated regression line). These zip codes are good targets for the direct mail campaign.
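
The targeting rule amounts to looking for negative residuals. The sketch below applies it to a handful of rows copied from the "Fits and Diagnostics for Unusual Observations" table in part 2. Only those listed rows are available here, so this is an illustration rather than the full analysis, which would use all 151 zip codes.

```python
# (observation, observed season pass holders, fitted value), copied from a few
# rows of the unusual-observations table in part 2.
rows = [
    (25, 424.0, 151.2),
    (52, 307.0, 389.4),
    (71, 286.0, 458.8),
    (75, 657.0, 469.3),
    (107, 93.0, 277.1),
    (147, 20.0, 207.7),
]

# Residual = observed - fitted; a negative residual puts the point below the line.
residuals = {obs: observed - fit for obs, observed, fit in rows}

# Zip codes below the line, largest shortfall first.
targets = sorted(
    (obs for obs, resid in residuals.items() if resid < 0),
    key=lambda obs: residuals[obs],
)

for obs in targets:
    print(f"Observation {obs}: residual = {residuals[obs]:.1f}")
```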

7. Other data of interest for independent variables might include the distance of the zip code

from the park, the average household income of the zip code, and the average number of

children per household in the zip code.

[Scatter diagram with estimated regression line: Season Pass Holders (vertical axis, 0 to 700) versus Population (horizontal axis, 0 to 70,000); fitted line f(x) = 0.00918294959602868x − 16.258366949827, R² = 0.637429873240696]
