Chapter 3B
and Economics
Anderson, Sweeney, Williams
Slides by John Loucks, St. Edward’s University
© 2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 3, Part B
Descriptive Statistics: Numerical Measures
Measures of Distribution Shape, Relative Location, and Detecting Outliers
Exploratory Data Analysis
Measures of Association Between Two Variables
The Weighted Mean and Working with Grouped Data
Measures of Distribution Shape, Relative Location, and Detecting Outliers
Distribution Shape
z-Scores
Chebyshev’s Theorem
Empirical Rule
Detecting Outliers
Distribution Shape: Skewness
An important measure of the shape of a
distribution is called skewness.
The formula for the skewness of sample data is
\[ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3 \]
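The skewness formula can be evaluated directly from a list of values. The following is a minimal Python sketch; the function name `sample_skewness` and both small data sets are assumptions made for illustration and are not taken from the slides.

```python
import math

def sample_skewness(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x_i - mean) / s)^3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(data) / n
    # Sample standard deviation (divisor n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# Hypothetical data, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 10]
print(sample_skewness(symmetric))     # zero up to rounding: the data are symmetric
print(sample_skewness(right_tailed))  # positive: one large value stretches the right tail
```

A value near zero suggests a roughly symmetric sample, a negative value a longer left tail, and a positive value a longer right tail.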
Distribution Shape: Skewness
Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; Skewness = 0]
Distribution Shape: Skewness
Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Figure: relative frequency histogram of a distribution moderately skewed left; Skewness = -.31]
Distribution Shape: Skewness
Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Figure: relative frequency histogram of a distribution moderately skewed right; Skewness = .31]
Distribution Shape: Skewness
Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Figure: relative frequency histogram of a distribution highly skewed right; Skewness = 1.25]
Distribution Shape: Skewness
Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a college town. The monthly rent prices for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Distribution Shape: Skewness
Example: Apartment Rents
[Figure: relative frequency histogram of the apartment rents]
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value x_i is from the mean.
\[ z_i = \frac{x_i - \bar{x}}{s} \]
Excel’s STANDARDIZE function can be used to compute the z-score.
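The z-score calculation can also be sketched in a few lines of Python. This is only an illustration: the function name `z_scores` and the data values are assumptions, and the sample standard deviation (divisor n - 1) is used for s.

```python
import statistics

def z_scores(data):
    """Return z_i = (x_i - mean) / s for each value in data."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    return [(x - mean) / s for x in data]

# Hypothetical data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
for x, z in zip(values, z_scores(values)):
    print(f"x = {x}, z = {z:.2f}")
```

Values below the mean get negative z-scores, values above it get positive ones, and the z-scores of a sample always average to zero.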
Chebyshev’s Theorem
At least (1 - 1/z^2) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev’s theorem requires z > 1, but z need not be an integer.
Chebyshev’s Theorem
At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.
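The percentages quoted for Chebyshev’s theorem follow from evaluating 1 - 1/z^2. A minimal Python sketch, in which the function name `chebyshev_lower_bound` is an assumption made for illustration:

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

# z need not be an integer, so 1.5 is included alongside 2, 3, and 4.
for z in (1.5, 2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.1%} of the values")
```

For z = 2, 3, and 4 the bound is 0.75, about 0.889, and 0.9375, which round to the 75%, 89%, and 94% figures.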
Empirical Rule
The empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution, which is covered in Chapter 6.
Empirical Rule
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
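The empirical rule percentages can be checked against the normal distribution itself. The sketch below uses the standard identity P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal variable Z; the function name `normal_within` is an assumption made for illustration.

```python
import math

def normal_within(k):
    """P(|Z| <= k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {normal_within(k):.4%}")
```

The computed values agree with the percentages above to within about 0.01 percentage point. Differences of that size are what one would expect if the quoted figures were read from a rounded normal table, although that is an assumption about their origin.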
Empirical Rule
[Figure: normal curve over x centered at μ, with 68.26% of values within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
Detecting Outliers
Exploratory Data Analysis
Exploratory data analysis procedures enable us to use simple arithmetic and easy-to-draw pictures to summarize data.
We simply sort the data values into ascending order and identify the five-number summary and then construct a box plot.
Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
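The five-number summary can be sketched in Python using the apartment rents listed earlier. Quartile conventions differ between textbooks and software, so the percentile rule below is an assumption: compute i = (p/100)n, round up if i is not an integer, and otherwise average the values in positions i and i + 1. It was chosen because it reproduces the quartiles reported for this example (Q1 = 445, Q2 = 475, Q3 = 525). The function names are hypothetical.

```python
def percentile(sorted_data, p):
    """p-th percentile (integer p, 0 < p < 100) using the index i = (p/100) * n."""
    n = len(sorted_data)
    q, r = divmod(p * n, 100)  # i = q + r/100
    if r:
        # i is not an integer: round up to 1-based position q + 1.
        return sorted_data[q]
    # i is an integer: average the values in 1-based positions i and i + 1.
    return (sorted_data[q - 1] + sorted_data[q]) / 2

def five_number_summary(data):
    """Smallest value, first quartile, median, third quartile, largest value."""
    s = sorted(data)
    return s[0], percentile(s, 25), percentile(s, 50), percentile(s, 75), s[-1]

# Monthly rents for the 70 sampled apartments, as listed in the example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
print(five_number_summary(rents))  # (425, 445, 475.0, 525, 615)
```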
Box Plot
A box plot is a graphical summary of data that is based on a five-number summary.
A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
Box plots provide another way to identify outliers.
Box Plot
Example: Apartment Rents
• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).
[Figure: box plot on an axis running from 400 to 625 in steps of 25; Q1 = 445, Q2 = 475, Q3 = 525]
Box Plot
Example: Apartment Rents
• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325, where IQR = Q3 - Q1 = 525 - 445 = 80
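The limit calculation can be sketched as follows. Only the lower limit is worked out in the text; the sketch also computes an upper limit the same distance above Q3, which is the usual box plot convention and is an assumption here. The function name `box_plot_limits` is hypothetical.

```python
def box_plot_limits(q1, q3):
    """Lower and upper limits located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Quartiles from the apartment rents example.
lower, upper = box_plot_limits(445, 525)
print(lower, upper)  # 325.0 645.0
```

Data values outside these limits would be flagged as outliers. The rents range from 425 to 615, so none falls outside the limits under this convention.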
Box Plot
Example: Apartment Rents
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
[Figure: box plot on an axis running from 400 to 625 in steps of 25; smallest value inside limits = 425, largest value inside limits = 615]
Measures of Association Between Two Variables
Often a manager or decision maker is interested in the relationship between two variables.
Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.
Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
Covariance
The covariance is computed as follows:
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples} \]
\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations} \]
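The two covariance formulas differ only in the divisor. A minimal Python sketch, in which the function names and the data are assumptions made for illustration:

```python
def _sum_of_cross_products(xs, ys):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def sample_covariance(xs, ys):
    """Sample covariance: divide by n - 1."""
    if len(xs) < 2:
        raise ValueError("need at least two paired observations")
    return _sum_of_cross_products(xs, ys) / (len(xs) - 1)

def population_covariance(xs, ys):
    """Population covariance: divide by N."""
    return _sum_of_cross_products(xs, ys) / len(xs)

# Hypothetical data in which y falls as x rises.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
print(sample_covariance(xs, ys))      # -5.0
print(population_covariance(xs, ys))  # -4.0
```

The negative result reflects the negative relationship in this made-up data set.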
Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
Correlation Coefficient
The correlation coefficient is computed as follows:
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} \]
\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations} \]
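The sample correlation coefficient can be sketched by dividing the sample covariance by the product of the two sample standard deviations. The function name `sample_correlation` and the data are assumptions made for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation coefficient r_xy = s_xy / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
print(sample_correlation(xs, [2, 4, 5, 4, 5]))   # moderately strong positive
print(sample_correlation(xs, [10, 8, 6, 4, 2]))  # perfectly linear, negative
```

Unlike the covariance, the result does not depend on the units of measurement of x and y.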
Correlation Coefficient
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
The closer the correlation is to zero, the weaker the relationship.
Covariance and Correlation Coefficient
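• Sample Covariance
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08 \]
• Sample Correlation Coefficient
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631 \]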
The Weighted Mean and Working with Grouped Data
Weighted Mean
Mean for Grouped Data
Variance for Grouped Data
Standard Deviation for Grouped Data
Weighted Mean
\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]
where:
x_i = value of observation i
w_i = weight for observation i
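A weighted mean can be sketched in a few lines of Python. The function name `weighted_mean` and the numbers are assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical scores in which the second counts twice as much as the others.
print(weighted_mean([80, 90, 70], [1, 2, 1]))  # 82.5
```

When every weight is equal, the result reduces to the ordinary mean.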
Grouped Data
Mean for Grouped Data
Sample Data
\[ \bar{x} = \frac{\sum f_i M_i}{n} \]
Population Data
\[ \mu = \frac{\sum f_i M_i}{N} \]
where:
f_i = frequency of class i
M_i = midpoint of class i
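The grouped-data mean is a weighted mean of the class midpoints, with the class frequencies as weights. The sketch below applies it to the class frequencies and midpoints from the apartment rents frequency table that appears later in this section; the function name `grouped_mean` is an assumption made for illustration.

```python
def grouped_mean(frequencies, midpoints):
    """Mean for grouped data: sum(f_i * M_i) / sum(f_i)."""
    if len(frequencies) != len(midpoints):
        raise ValueError("frequencies and midpoints must have the same length")
    n = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Class frequencies and midpoints for the apartment rents (classes 420-439 to 600-619).
frequencies = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]
print(round(grouped_mean(frequencies, midpoints), 2))  # 493.21
```

Because each observation is replaced by its class midpoint, the result is an approximation of the mean of the ungrouped data.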
Sample Mean for Grouped Data
Example: Apartment Rents
Variance for Grouped Data
For sample data
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]
For population data
\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]
Sample Variance for Grouped Data
Example: Apartment Rents
Rent ($)   f_i    M_i    M_i - x̄   (M_i - x̄)^2   f_i(M_i - x̄)^2
420-439      8   429.5     -63.7       4058.96          32471.71
440-459     17   449.5     -43.7       1910.56          32479.59
460-479     12   469.5     -23.7        562.16           6745.97
480-499      8   489.5      -3.7         13.76            110.11
500-519      7   509.5      16.3        265.36           1857.55
520-539      4   529.5      36.3       1316.96           5267.86
540-559      2   549.5      56.3       3168.56           6337.13
560-579      4   569.5      76.3       5820.16          23280.66
580-599      2   589.5      96.3       9271.76          18543.53
600-619      6   609.5     116.3      13523.36          81140.18
Total       70                                          208234.29
Sample Variance for Grouped Data
Example: Apartment Rents
• Sample Variance
s^2 = 208,234.29/(70 - 1) = 3,017.89
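The sample variance for the grouped apartment rents can be reproduced from the class frequencies and midpoints in the table. This is a sketch under the grouped-data formula above; the function name `grouped_sample_variance` is an assumption made for illustration. The standard deviation is the square root of the variance.

```python
import math

def grouped_sample_variance(frequencies, midpoints):
    """Sample variance for grouped data: sum(f_i * (M_i - mean)^2) / (n - 1)."""
    if len(frequencies) != len(midpoints):
        raise ValueError("frequencies and midpoints must have the same length")
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    squared_dev = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return squared_dev / (n - 1)

# Class frequencies and midpoints from the apartment rents table.
frequencies = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]

variance = grouped_sample_variance(frequencies, midpoints)
print(round(variance, 2))             # 3017.89
print(round(math.sqrt(variance), 2))  # sample standard deviation for grouped data
```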
End of Chapter 3, Part B