Probability and Sampling Distributions
For continuous variables, which can take on any value within a range
The probability distribution is a smooth curve
The area under the curve over an interval represents the probability that a value
falls in that interval
Bell curve
Gaussian distribution
[Figure: bell-shaped normal curve; vertical axis: probability, 0.00 to 0.45; horizontal axis: -3.0 to 3.0]
Height and weight
IQs and many other test scores
Measurement errors
Sample statistics
Result of processes when values are affected
by a large number of small random effects
Height
IQ
Measurement error
Drawing a sample from a population
Normal distributions can have different means, which shift the curve to the left or
right
They can have different standard deviations, which make the curve
steeper or flatter
[Figure: three normal curves with Std dev=0.5, Std dev=1, and Std dev=2; vertical axis: 0.00 to 0.90; horizontal axis: -3.0 to 3.0]
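Formula for the normal distribution
$$ y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(X-\mu)^2 / (2\sigma^2)} $$
where $\mu$ is the mean and $\sigma$ is the standard deviation
This implies that a normal distribution is fully
specified by its mean and standard
deviation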
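As a rough illustration of the formula for the normal distribution, the sketch below evaluates the height of the curve at the mean for three standard deviations. The function name `normal_pdf` and the variable names are illustrative choices, not part of the original slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Height of the curve at the mean for three standard deviations.
peak_standard = normal_pdf(0.0, sigma=1.0)
peak_narrow = normal_pdf(0.0, sigma=0.5)   # smaller sd: taller, steeper curve
peak_wide = normal_pdf(0.0, sigma=2.0)     # larger sd: lower, flatter curve

print(peak_standard, peak_narrow, peak_wide)
```

The peak heights come out near 0.40, 0.80, and 0.20, which is consistent with a smaller standard deviation giving a steeper curve and a larger one giving a flatter curve.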
Symmetrical
Continuous
Extends to infinity, never reaching zero
Total area under curve is 1.0
About 34 percent of area falls between mean
and one standard deviation above
So about 68 percent is within +/- one
standard deviation of mean
About 95 percent is within +/- two standard
deviations of mean
About 99.7 percent is within +/- three
standard deviations of mean
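The percentages above can be checked numerically. This is a minimal sketch that assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper `area_within` is a name introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_one = area_within(1)     # about 68 percent
within_two = area_within(2)     # about 95 percent
within_three = area_within(3)   # about 99.7 percent

print(round(within_one, 4), round(within_two, 4), round(within_three, 4))
```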
The standard normal distribution is the normal distribution with a mean of zero and a
standard deviation of one
Since the mean and standard deviation define any
normal distribution, the
standard normal distribution can be used for
any normally distributed variable by
converting its mean to zero and its standard
deviation to one, using z scores
z score is the number of standard deviations
a value falls from the mean
Converts a value from any normally
distributed variable to a value for the
standard normal distribution
$$ z = \frac{y - \mu}{\sigma} $$
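A minimal sketch of the z score conversion follows. The example values (mean 100, standard deviation 15, observed value 130) are hypothetical and chosen only to make the arithmetic easy to follow; the function names are illustrative.

```python
def z_score(y, mu, sigma):
    """Number of standard deviations that y falls from the mean mu."""
    return (y - mu) / sigma

def from_z(z, mu, sigma):
    """Convert a z score back to the original scale of the variable."""
    return mu + z * sigma

# Hypothetical variable with mean 100 and standard deviation 15.
z = z_score(130, mu=100, sigma=15)
y = from_z(z, mu=100, sigma=15)

print(z, y)
```

Here the value 130 lies two standard deviations above the mean, and converting that z score back recovers the original value.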
Standard normal distribution is a probability
distribution for normally-distributed variables
with mean of zero and standard deviation of
one
Area under curve between two values
corresponds to probability of value falling
between those values
Tables give the areas under the standard normal
distribution
[Figure: standard normal curve with z = 1.55 marked; vertical axis: 0.00 to 0.45; horizontal axis: -3.0 to 3.0]
Area in the upper tail beyond z
         Second Decimal Place of z
  z    ...   0.04     0.05     0.06    ...
 ...   ...   ...      ...      ...     ...
 1.4   ...   0.0749   0.0735   0.0722  ...
 1.5   ...   0.0618   0.0606   0.0594  ...
 1.6   ...   0.0505   0.0495   0.0485  ...
 ...   ...   ...      ...      ...     ...
[Figure: standard normal curve; area in the upper tail above z = 1.55]
P=.0606
[Figure: standard normal curve; area in both tails, below z = -1.55 and above z = 1.55]
P=.0606+.0606
P=.1212
[Figure: standard normal curve; area between the mean and z = 1.55]
P=.5-.0606=.4394
[Figure: standard normal curve; area below z = 1.55]
P=1-.0606=.9394
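The four areas for z = 1.55 can also be computed directly instead of being read from a table. This is a sketch that assumes `statistics.NormalDist` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
z = 1.55

upper_tail = 1.0 - std_normal.cdf(z)   # area above z
both_tails = 2.0 * upper_tail          # area below -z plus area above z
mean_to_z = 0.5 - upper_tail           # area between the mean and z
below_z = 1.0 - upper_tail             # area below z

print(round(upper_tail, 4), round(both_tails, 4),
      round(mean_to_z, 4), round(below_z, 4))
```

The computed values agree with the table-based results to about four decimal places. Every other area is derived from the single upper-tail value, which mirrors how the table is used.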
Value   Frequency
20      0
25      3
30      5
35      6
40      8
45      13
50      5
55      6
60      3
65      1
More    0
[Figure: Histogram; horizontal axis: Value (20 to 65 in steps of 5, then More); vertical axis: Frequency (0 to 14)]
$\mu = 40.68$, $\sigma = 9.88$
$$ z = \frac{45 - 40.68}{9.88} = .4372 $$
[Figure: standard normal curve; area in the upper tail above z = .4372]
P=.3300
$$ z = \frac{35 - 40.68}{9.88} = -.5749 $$
$$ z = \frac{45 - 40.68}{9.88} = .4372 $$
[Figure: standard normal curve; area between z = -.5749 and z = .4372]
Area between the mean and z = .4372: P=.5-.3300=.1700
Area between z = -.5749 and the mean: P=.5-.2843=.2157
Total area: P=.2157+.1700=.3857
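The same probability can be sketched in code using the mean 40.68 and standard deviation 9.88 from the example. It assumes `statistics.NormalDist` from the Python standard library, and the variable names are illustrative. The result differs slightly from the table-based .3857 because the table rounds z to two decimal places.

```python
from statistics import NormalDist

mu, sigma = 40.68, 9.88
dist = NormalDist(mu=mu, sigma=sigma)

# z scores for the two boundaries of the interval.
z_low = (35 - mu) / sigma
z_high = (45 - mu) / sigma

# Probability that a value falls between 35 and 45.
p_between = dist.cdf(45) - dist.cdf(35)

print(round(z_low, 4), round(z_high, 4), round(p_between, 4))
```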
Have probability for some unknown z value
Find probability for tail
Look up probability in body of table
Read off z value from row and column
headings
Convert z value to value for variable
$$ z = \frac{y - \mu}{\sigma} $$
$$ y = \mu + z\sigma $$
Probability above
value is 0.20
z score for
probability of 0.20 is
0.84
Use z score to
compute value
[Figure: standard normal curve; upper tail with Area=0.20 above z=0.84]
$$ y = (0.84)(9.88) + 40.68 = 48.98 $$
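The reverse lookup, from a tail probability to a value of the variable, can be sketched as follows. It assumes `statistics.NormalDist.inv_cdf` from the Python standard library (3.8 or later) in place of reading the table backwards; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 40.68, 9.88
upper_tail = 0.20   # probability above the unknown value

# inv_cdf takes the area BELOW the value, so convert the upper-tail area first.
z = NormalDist().inv_cdf(1.0 - upper_tail)

# Convert the z score to a value of the variable.
y = mu + z * sigma

print(round(z, 2), round(y, 2))
```

The z score here is not rounded to two decimals before use, so the computed value can differ from the table-based 48.98 in the second decimal place.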
When the SAT was first developed, the
scoring was standardized so that the scores
had a mean of 500 and a standard deviation
of 100
[Figure: normal curve of SAT scores; horizontal axis: 200 to 800; P=.75 below and P=.25 above the 75th percentile]
Area under the curve in the tail above the 75th percentile is 0.25
Look up 0.25 in the body of the table to find the z score,
which is 0.67
Convert the z score to a test score
$$ y = \mu + z\sigma = 500 + (0.67)(100) = 567 $$
[Figure: normal curve of SAT scores; horizontal axis: 200 to 800; area of 0.30 in the lower tail]
Area under the curve in the tail below the 30th percentile
is 0.30
Look up 0.30 in the body of the table to find the z score,
which is 0.52; the value lies below the mean, so z = -0.52
Convert the z score to a test score
$$ y = \mu + z\sigma = 500 + (-0.52)(100) = 448 $$
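Both SAT percentile lookups follow the same pattern, which can be sketched as a small helper. The function name `percentile_score` is hypothetical, and the sketch assumes `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def percentile_score(percentile, mu=500.0, sigma=100.0):
    """Score below which the given percentage of a normal distribution falls."""
    z = NormalDist().inv_cdf(percentile / 100.0)   # negative below the mean
    return mu + z * sigma

score_75 = percentile_score(75)
score_30 = percentile_score(30)

print(round(score_75), round(score_30))
```

Because `inv_cdf` works from the area below the value, the sign of z for percentiles under 50 comes out negative automatically, so no separate sign adjustment is needed.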
Draw samples from a large population (or any
population with replacement)
Calculate means for each sample
Means will, of course, vary
Then we can look at the distribution of the
sample means: the sampling distribution
$$ \bar{y}_1, \bar{y}_2, \bar{y}_3, \ldots $$
2500 random samples of 75 selected, with
replacement, from actual data from survey of
library patrons
For each sample, calculated the mean of the variable
SPEND, "How much time did you spend in the
library?"
Create histogram showing distribution of
sample means
[Figure: histogram, "Sampling distribution of the mean SPEND"; N=75, Samples=2500, Mean=41.34, SE=4.58; horizontal axis: sample mean (13 to 79); vertical axis: Frequency (0 to 450)]
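The sampling experiment can be imitated in a few lines. The library survey data is not available here, so this sketch draws from a hypothetical, synthetic, skewed population with a mean of roughly 40. Only the sample size of 75 and the 2500 samples follow the example. The function and variable names are illustrative.

```python
import random
import statistics

def sample_means(population, n, num_samples, seed=0):
    """Means of num_samples random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(num_samples)]

# Hypothetical population (NOT the survey data): 1000 skewed values, mean near 40.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 40.0) for _ in range(1000)]

means = sample_means(population, n=75, num_samples=2500)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(means)
se_observed = statistics.pstdev(means)   # spread of the sample means
se_theory = pop_sd / 75 ** 0.5           # population sd divided by sqrt(n)

print(round(pop_mean, 2), round(mean_of_means, 2))
print(round(se_observed, 2), round(se_theory, 2))
```

Under these assumptions the mean of the sample means should land close to the population mean, and the spread of the sample means should be close to the population standard deviation divided by the square root of the sample size.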
Would expect mean of sample means to be
close to mean of population
For our example, mean of population = 41.38
Mean of sample means = 41.34
$$ z = \frac{\bar{y} - \mu}{\sigma_{\bar{y}}} $$
$$ P(-1.112 < z) = 1 - 0.134 = 0.866 $$
[Figure: standard normal curve; area above z = -1.112]
Suppose you took a large number of samples of
size 50 from the population of library users
What proportion of the sample means
would be more than 10 minutes below or
above the population mean?
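$$ \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{40.018}{\sqrt{50}} = 5.659 $$
$$ z = \frac{\bar{y} - \mu}{\sigma_{\bar{y}}} = \frac{31.29 - 41.29}{5.659} = -1.767 $$
$$ z = \frac{\bar{y} - \mu}{\sigma_{\bar{y}}} = \frac{51.29 - 41.29}{5.659} = 1.767 $$
$$ P(z < -1.767 \text{ or } z > 1.767) = 2(0.0384) = 0.0768 $$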
[Figure: standard normal curve; area in both tails, below z = -1.767 and above z = 1.767]
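The standard error and two-tailed probability from this example can be sketched with the figures given (standard deviation 40.018, sample size 50, a margin of 10 minutes). It assumes `statistics.NormalDist` from the Python standard library, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = 40.018   # population standard deviation from the example
n = 50           # sample size
margin = 10.0    # minutes below or above the population mean

# Standard error of the mean.
se = sigma / math.sqrt(n)

# The z score of a sample mean exactly `margin` from the population mean.
z = margin / se

# The two tails are symmetric, so double the upper-tail area.
p_outside = 2.0 * (1.0 - NormalDist().cdf(z))

print(round(se, 3), round(z, 3), round(p_outside, 4))
```

The result is close to the table-based 0.0768. The small difference comes from the table rounding z to two decimal places.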