CH 07

The document discusses sampling distributions and their properties. It defines key terms like population, sample, parameter, and statistic. The document also explains that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases based on the central limit theorem, even if the population is not normally distributed. This allows sample means to be used to estimate population parameters and conduct hypothesis tests.

Uploaded by

sukhmandhawan88
Copyright
© © All Rights Reserved

CHAPTER 7

SAMPLING DISTRIBUTIONS

PARAMETERS AND STATISTICS

➢ Population (data): the collection of all the values of a variable for all the members of the population.
➢ Sample: a subset of measurements actually collected, at random, in the course of an investigation.
➢ Parameter: a numerical value associated with a population. It is considered fixed and unchanging.
Ex: population mean $\mu$, population standard deviation $\sigma$, and population proportion $p$
➢ Statistic: a function of the random sample.
Ex: sample mean $\bar{X}$, sample standard deviation $s$, and sample proportion $\hat{p}$
Note:
▪ A statistic refers to a sample quantity
▪ A parameter refers to a population quantity
▪ A statistic is a random variable
▪ In practice, parameters are unknown constants
Ex:
(1) X = the in-city gas mileage of a certain car; $\mu_X$ and $\sigma_X$ are unknown
(2) Y = the commuting distance of workers in a large city from their home to their principal place of work; $\mu_Y$ and $\sigma_Y$ are unknown
(3) W = the lifetime of light bulbs made by a certain manufacturer; $\mu_W$ and $\sigma_W$ are unknown
SAMPLING DISTRIBUTION
➢ Sampling distribution: the probability distribution of a statistic.
➢ One of the major objectives of statistics is to use information from a sample to approximate information about the population.
➢ To estimate characteristics of the population, we take a random sample from it. We then use the sample data to compute sample statistics, such as the sample mean, sample standard deviation, and sample proportion, and use these statistics to estimate the population parameters.
➢ The sampling distribution of sample means is the distribution of the sample means obtained when we repeatedly draw samples of the same size from the same population.
➢ Statistical inference about population parameters is of prime importance in most practical studies.
➢ The central limit theorem (CLT) forms a theoretical base for statistics. It is used in estimating population parameters and in hypothesis testing.
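MEAN AND STANDARD DEVIATION OF $\bar{X}$
Theorem 1:
• Given a population with mean $\mu$ and standard deviation $\sigma$
• Take a sample of size n: $X_1, X_2, \ldots, X_n$
• Consider the statistic (sample mean):
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Then
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
The formula $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ applies when $n/N \le .05$, where N = population size.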
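Theorem 1 can be checked exactly on a tiny population. The sketch below is only an illustration: the four population values are hypothetical, and it assumes sampling with replacement, so that every ordered sample of size n is equally likely. It enumerates all such samples and compares the mean and standard deviation of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
from itertools import product
from math import sqrt


def mean_sd(values):
    """Mean and standard deviation, dividing by the count (population style)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, sqrt(var)


# Hypothetical tiny population, chosen only for illustration.
population = [2, 4, 6, 8]
n = 2  # sample size

mu, sigma = mean_sd(population)

# Every ordered sample of size n drawn with replacement is equally likely,
# so the list of their means is the exact sampling distribution of X-bar.
sample_means = [sum(s) / n for s in product(population, repeat=n)]
mu_xbar, sigma_xbar = mean_sd(sample_means)

print("mu =", mu, " mean of sample means =", mu_xbar)
print("sigma / sqrt(n) =", sigma / sqrt(n), " sd of sample means =", sigma_xbar)
```

The two printed pairs agree up to floating-point rounding, which is what the theorem states for independent draws.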
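Theorem 2: $\bar{X}$ is normal when sampling from a normal population. If the normal population has mean $\mu$ and standard deviation $\sigma$, then
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
The formula $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ applies when $n/N \le .05$, where N = population size.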
[Figure: Sampling distributions of $\bar{X}$ when the population is normal]
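Theorem 3 (CLT): Whatever the population, the distribution of $\bar{X}$ is approximately normal when n is large ($n \ge 30$).
More specifically: in random sampling from an arbitrary population with mean $\mu$ and standard deviation $\sigma$, when $n \ge 30$ the distribution of $\bar{X}$ is approximately normal with
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
That is, the distribution of $\bar{X}$ is approximately $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, where the second argument is the standard deviation.
Note: If the population is normal, then $\bar{X}$ is exactly normally distributed for all n, small or large.
The formula $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ applies when $n/N \le .05$, where N = population size.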
[Figure: Sampling distributions of $\bar{X}$ when the population is not normal]
Ex: To get an intuitive understanding of the most important consequence of the central limit theorem: as the sample size increases, the sampling distribution of sample means approaches a normal distribution.
Let's look at the last four digits of the social insurance numbers (SIN) of 50 different students. See the table below.
Group SIN digits mean Group SIN digits mean
1 1 8 6 4 4.75 26 7 3 1 1 3.00
2 5 3 3 6 4.25 27 9 1 1 3 3.50
3 9 8 8 8 8.25 28 8 6 5 9 7.00
4 5 1 2 5 3.25 29 5 6 4 1 4.00
5 9 3 3 5 5.00 30 9 3 9 5 6.50
6 4 2 6 2 3.50 31 6 0 7 3 4.00
7 7 7 1 6 5.25 32 8 2 9 6 6.25
8 9 1 5 4 4.75 33 0 2 8 6 4.00
9 5 3 3 9 5.00 34 2 0 9 7 4.50
10 7 8 4 1 5.00 35 5 8 9 0 5.50
11 0 5 6 1 3.00 36 6 5 4 9 6.00
12 9 8 2 2 5.25 37 4 8 7 6 6.25
13 6 1 5 7 4.75 38 7 1 2 0 2.50
14 8 1 3 0 3.00 39 2 9 5 0 4.00
15 5 9 6 9 7.25 40 8 3 2 2 3.75
16 6 2 3 4 3.75 41 2 7 1 6 4.00
17 7 4 0 7 4.50 42 6 7 7 1 5.25
18 5 7 5 6 5.75 43 2 3 3 9 4.25
19 4 1 5 7 4.25 44 2 4 7 5 4.50
20 1 2 0 6 2.25 45 5 4 3 7 4.75
21 4 0 2 8 3.50 46 0 4 3 8 3.75
22 3 1 2 5 2.75 47 2 5 8 6 5.25
23 0 3 4 0 1.75 48 7 1 3 4 3.75
24 1 5 1 0 1.75 49 8 3 7 0 4.50
25 9 7 4 0 5.00 50 5 6 6 7 6.00
(a) If we combine the four digits from each student into one big collection of 200 numbers, we get an approximately uniform distribution, with the graph shown below. The mean of the distribution is 4.5 and the standard deviation is 2.8.

[Figure: histogram of the 200 digits; horizontal axis: digit (0 to 9); vertical axis: frequency (0 to 30)]
(b) Now look at sample means. We have 50 sample
means. Even though the original collection of data has
an approximately uniform distribution, the sample
means have a distribution that is approximately
normal. It is a truly fascinating and intriguing
phenomenon in statistics.

[Figure: histogram of the 50 sample means; horizontal axis: sample mean (0 to 9); vertical axis: frequency (0 to 18)]
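The same experiment can be imitated by simulation. The following is a minimal sketch, not the slide's actual data: it assumes each digit is drawn independently and uniformly from 0 to 9 with a pseudo-random generator, and the function names and the seed are illustrative choices. It forms groups of four digits, computes each group mean, and counts how many means fall in each unit-wide bin.

```python
import random
from statistics import mean, pstdev


def simulate_digit_means(n_groups, group_size=4, seed=0):
    """Means of n_groups groups of uniformly random digits 0..9."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(0, 9) for _ in range(group_size)) / group_size
        for _ in range(n_groups)
    ]


def bin_counts(values, lo=0, hi=9):
    """Count values per unit-wide bin [k, k+1); the top value hi goes in the last bin."""
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[min(int(v), hi) - lo] += 1
    return counts


means = simulate_digit_means(50)
print("mean of sample means:", round(mean(means), 2))
print("sd of sample means:  ", round(pstdev(means), 2))

# Crude text histogram: one '#' per sample mean in the bin.
for k, c in enumerate(bin_counts(means)):
    print(f"{k}: {'#' * c}")
```

With more groups the histogram piles up more clearly around 4.5, while the individual digits stay roughly uniform.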
SHAPE OF SAMPLING DISTRIBUTION OF $\bar{X}$

In a recent SAT, the mean score for all examinees was 1020. Assume that the distribution of SAT scores of all examinees is normal with a mean of 1020 and a standard deviation of 153. Let $\bar{X}$ be the mean SAT score of a random sample of certain examinees. Calculate the mean and standard deviation of $\bar{X}$ and describe the shape of its sampling distribution when the sample size is
(a) 16 (b) 50 (c) 1000
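Solution:
Let $\mu$ and $\sigma$ be the mean and standard deviation of the SAT scores of all examinees, and let $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$ be the mean and standard deviation of the sampling distribution of $\bar{X}$, respectively.
$$\mu = 1020 \quad \text{and} \quad \sigma = 153$$
Because the population is normal, $\bar{X}$ is normally distributed for every sample size (Theorem 2).
(a) n = 16
$$\mu_{\bar{X}} = \mu = 1020, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{16}} = 38.250, \qquad \bar{X} \sim N(1020,\ 38.250)$$
(b) n = 50
$$\mu_{\bar{X}} = \mu = 1020, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{50}} = 21.637, \qquad \bar{X} \sim N(1020,\ 21.637)$$
(c) n = 1000
$$\mu_{\bar{X}} = \mu = 1020, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{1000}} = 4.838, \qquad \bar{X} \sim N(1020,\ 4.838)$$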
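The three cases differ only in n, so the arithmetic can be sketched in a few lines. The function name is an illustrative choice; the numbers 1020 and 153 come from the problem statement.

```python
from math import sqrt

MU, SIGMA = 1020, 153  # population mean and standard deviation from the example


def sampling_distribution_of_mean(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for sample size n."""
    return mu, sigma / sqrt(n)


for n in (16, 50, 1000):
    m, sd = sampling_distribution_of_mean(MU, SIGMA, n)
    print(f"n = {n:4d}: mean = {m}, standard deviation = {sd:.3f}")
```

The mean stays at 1020 while the standard deviation shrinks in proportion to $1/\sqrt{n}$.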
APPLICATION OF SAMPLING DISTRIBUTION OF $\bar{X}$
Example 7-5 (Application Example)

Assume that the weights of all packages of a certain brand of cookies are normally distributed with a mean of 32 ounces and a standard deviation of 0.3 ounce. Find the probability that the mean weight, $\bar{x}$, of a random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 ounces.
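Example 7-5: Solution
Note: The population is normal, so $\bar{X}$ is normal even though the sample size is small (n = 20 < 30).
$$\mu_{\bar{X}} = \mu = 32 \text{ ounces}$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.3}{\sqrt{20}} = 0.06708204 \text{ ounces}$$
$$\begin{aligned}
P(31.8 < \bar{X} < 31.9) &= P(31.8 - \mu_{\bar{X}} < \bar{X} - \mu_{\bar{X}} < 31.9 - \mu_{\bar{X}}) \\
&= P\left(\frac{31.8 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{31.9 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) \\
&= P\left(\frac{31.8 - 32}{0.06708204} < Z < \frac{31.9 - 32}{0.06708204}\right) \\
&= P(-2.98 < Z < -1.49) \\
&= 0.0681 - 0.0014 \\
&= 0.0667
\end{aligned}$$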
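The table lookup above can be cross-checked in software. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name is an illustrative choice. Because it does not round z to two decimals, its result may differ from the table-based 0.0667 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu, sigma, n, lo, hi):
    """P(lo < X-bar < hi) when X-bar is normal with mean mu and sd sigma / sqrt(n)."""
    xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return xbar.cdf(hi) - xbar.cdf(lo)


# Example 7-5: mu = 32 oz, sigma = 0.3 oz, n = 20 packages.
p = prob_mean_between(32, 0.3, 20, 31.8, 31.9)
print(round(p, 4))
```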
Example 7-6 (Application Example)
According to Moebs Services Inc., an individual checking account at major U.S. banks costs the banks between $350 and $450 per year. Suppose that the current average cost of all checking accounts at major U.S. banks is $400 per year with a standard deviation of $30. Let $\bar{x}$ be the current average annual cost of a random sample of 225 individual checking accounts at major banks in America.
(a) What is the probability that the average annual cost of the checking accounts in this sample is within $4 of the population mean?
(b) What is the probability that the average annual cost of the checking accounts in this sample is less than the population mean by $2.7 or more?
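Example 7-6: Solution
Note: The population is not stated to be normal, but the sample size is large (n = 225 ≥ 30), so by the central limit theorem $\bar{X}$ is approximately normal.
$$\mu_{\bar{X}} = \mu = \$400, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{225}} = \$2.00$$
(a)
$$\begin{aligned}
P(396 < \bar{X} < 404) &= P(396 - \mu_{\bar{X}} < \bar{X} - \mu_{\bar{X}} < 404 - \mu_{\bar{X}}) \\
&= P\left(\frac{396 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{404 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) \\
&= P\left(\frac{396 - 400}{2} < Z < \frac{404 - 400}{2}\right) \\
&= P(-2 < Z < 2) \\
&= 0.9772 - 0.0228 = 0.9544
\end{aligned}$$
(b)
$$\begin{aligned}
P(\bar{X} < 400 - 2.7) &= P(\bar{X} < 397.3) \\
&= P(\bar{X} - \mu_{\bar{X}} < 397.3 - \mu_{\bar{X}}) \\
&= P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{397.3 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) \\
&= P\left(Z < \frac{397.3 - 400}{2}\right) \\
&= P(Z < -1.35) \\
&= 0.0885
\end{aligned}$$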
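Both parts translate a verbal condition into an interval for $\bar{X}$: "within $4 of the mean" becomes 396 to 404, and "less than the mean by $2.7 or more" becomes at most 397.3. A short sketch of that translation, again assuming the normal approximation and using the standard library's `NormalDist` (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 30, 225

# Approximate sampling distribution of the sample mean (CLT, n >= 30).
xbar = NormalDist(mu, sigma / sqrt(n))

# (a) within $4 of the population mean: 396 < X-bar < 404
p_within_4 = xbar.cdf(mu + 4) - xbar.cdf(mu - 4)

# (b) below the population mean by $2.7 or more: X-bar < 397.3
p_below_by_2_7 = xbar.cdf(mu - 2.7)

print(round(p_within_4, 4), round(p_below_by_2_7, 4))
```

Small differences from the table-based answers come from the four-decimal rounding of the z table.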
POPULATION AND SAMPLE PROPORTIONS

The population and sample proportions, denoted by $p$ and $\hat{p}$, respectively, are calculated as

$$p = \frac{X}{N}, \qquad \hat{p} = \frac{x}{n}$$

where
◼ N = total number of elements in the population
◼ n = total number of elements in the sample
◼ X = number of elements in the population that possess a specific characteristic
◼ x = number of elements in the sample that possess a specific characteristic
Example 7-7

Suppose a total of 789,654 families live in a city and 563,282 of them own homes. A sample of 240 families is selected from this city, and 158 of them own homes. Find the proportion of families who own homes in the population and in the sample.
Example 7-7: Solution

$$p = \frac{X}{N} = \frac{563282}{789654} = 0.71$$

$$\hat{p} = \frac{x}{n} = \frac{158}{240} = 0.66$$
Sampling Distribution of the Sample Proportion p̂



Example 7-8

Boe Consultant Associates has five employees. Table 7.6 gives the names of these five employees and information concerning their knowledge of statistics.

Table 7.6 Information on the Five Employees of Boe Consultant Associates

◼ If we define the population proportion, p, as the proportion of employees who know statistics, then
p = 3 / 5 = .60
◼ This is a constant. It will not change.
◼ Now, suppose we draw all possible samples of three employees each and compute the proportion of employees, for each sample, who know statistics.

$$\text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = 10$$

Table 7.7 All Possible Samples of Size 3 and the Value of $\hat{p}$ for Each Sample
Table 7.8 Frequency and Relative Frequency Distribution of $\hat{p}$ when the Sample Size Is 3
Table 7.9 Sampling Distribution of $\hat{p}$ when the Sample Size is 3
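The enumeration behind these tables can be sketched in code. The employee labels E1 to E5 below are hypothetical placeholders, since only the count (3 of 5 know statistics) is used here. The sketch lists all samples of size 3, computes $\hat{p}$ for each, and tabulates the resulting sampling distribution with exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical labels: 1 = knows statistics, 0 = does not (3 of 5 know).
employees = {"E1": 1, "E2": 1, "E3": 1, "E4": 0, "E5": 0}

# All possible samples of three employees (order does not matter).
samples = list(combinations(employees, 3))

# Sample proportion for each sample.
p_hats = [Fraction(sum(employees[e] for e in s), 3) for s in samples]

# Sampling distribution: each value of p-hat with its probability.
counts = Counter(p_hats)
distribution = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}

# Mean of the sampling distribution.
mean_p_hat = sum(p * prob for p, prob in distribution.items())

print("number of samples:", len(samples))
for p, prob in distribution.items():
    print(f"p-hat = {p}: probability {prob}")
print("mean of p-hat:", mean_p_hat)
```

The mean of $\hat{p}$ over all samples equals the population proportion p = 3/5.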
Mean and Standard Deviation of $\hat{p}$

Mean of the Sample Proportion

The mean of the sample proportion, $\hat{p}$, is denoted by $\mu_{\hat{p}}$ and is equal to the population proportion, p. Thus,

$$\mu_{\hat{p}} = p$$

The standard deviation of the sample proportion, $\hat{p}$, is denoted by $\sigma_{\hat{p}}$ and is given by the formula

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

where p is the population proportion, q = 1 – p, and n is the sample size. This formula is used when n/N ≤ .05, where N is the population size.
Shape of the Sampling Distribution of $\hat{p}$

According to the central limit theorem, when the sample size is sufficiently large, the sampling distribution of $\hat{p}$ is approximately normal with

$$\text{Mean} = \mu_{\hat{p}} = p, \qquad \text{sd} = \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

In the case of a proportion, the sample size is considered to be sufficiently large if np and nq are both greater than 5, that is, if
np > 5 and nq > 5
Application of the Sampling Distribution of $\hat{p}$

According to the BBMG Conscious Consumer Report, 51% of the adults surveyed said that they are willing to pay more for products with social and environmental benefits despite the current tough economic times (USA TODAY, June 8, 2009). Suppose this result is true for the current population of adult Americans. Let $\hat{p}$ be the proportion in a random sample of 1050 adult Americans who will hold the said opinion. Find the probability that the value of $\hat{p}$ is between 53% and 55%.
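Solution:
Given information: n = 1050, p = 0.51, and q = 1 − p = 0.49
Then the mean and standard deviation of $\hat{p}$ are:
$$\mu_{\hat{p}} = p = 0.51, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.51 \times 0.49}{1050}} = 0.01542725$$
Values of np and nq are:
np = 1050 × 0.51 = 535.5 > 5
nq = 1050 × 0.49 = 514.5 > 5
The central limit theorem implies that the sampling distribution of $\hat{p}$ is approximately normal.
$$\begin{aligned}
P(0.53 < \hat{p} < 0.55) &= P\left(\frac{0.53 - \mu_{\hat{p}}}{\sigma_{\hat{p}}} < \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} < \frac{0.55 - \mu_{\hat{p}}}{\sigma_{\hat{p}}}\right) \\
&= P\left(\frac{0.53 - 0.51}{0.01542725} < Z < \frac{0.55 - 0.51}{0.01542725}\right) \\
&= P(1.30 < Z < 2.59) \\
&= 0.9952 - 0.9032 = 0.0920
\end{aligned}$$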
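The steps of this solution (compute $\sigma_{\hat{p}}$, check np > 5 and nq > 5, then use the normal approximation) can be sketched as a small helper. The function name is an illustrative choice, and `statistics.NormalDist` is used in place of the z table, so the result can differ from 0.0920 in the third or fourth decimal because z is not rounded.

```python
from math import sqrt
from statistics import NormalDist


def p_hat_distribution(p, n):
    """Approximate normal sampling distribution of p-hat, if np > 5 and nq > 5."""
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("np and nq must both exceed 5 for the normal approximation")
    return NormalDist(mu=p, sigma=sqrt(p * q / n))


# BBMG example: p = 0.51, n = 1050, P(0.53 < p-hat < 0.55).
dist = p_hat_distribution(0.51, 1050)
prob = dist.cdf(0.55) - dist.cdf(0.53)

print("sd of p-hat:", round(dist.stdev, 8))
print("probability:", round(prob, 4))
```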
Example 7-11 (Application example)

Maureen Webster, who is running for mayor in a large city, claims that she is favored by 53% of all eligible voters of that city. Assume that this claim is true. What is the probability that in a random sample of 400 registered voters taken from this city, less than 49% will favor Maureen Webster?
Example 7-11: Solution

n = 400, p = .53, and q = 1 – p = 1 – .53 = .47

$$\mu_{\hat{p}} = p = 0.53 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.53 \times 0.47}{400}} = 0.02495496$$

Values of np and nq are:
np = 400 × 0.53 = 212 > 5
nq = 400 × 0.47 = 188 > 5
The central limit theorem implies that the sampling distribution of $\hat{p}$ is approximately normal.
$$\begin{aligned}
P(\hat{p} < 0.49) &= P\left(\frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} < \frac{0.49 - \mu_{\hat{p}}}{\sigma_{\hat{p}}}\right) \\
&= P\left(Z < \frac{0.49 - 0.53}{0.02495496}\right) \\
&= P(Z < -1.60) = 0.0548
\end{aligned}$$
Hence, the probability that less than 49% of the voters in a random sample of 400 will favor Maureen Webster is .0548.
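As a rough sanity check on the normal approximation, the same probability can be compared with an exact binomial sum. This is not the method used in the solution above; it is a cross-check that assumes each sampled voter independently favors the candidate with probability 0.53, so that the number of supporters is binomial. The variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.53

# Normal approximation used in the solution: P(p-hat < 0.49).
normal_approx = NormalDist(p, sqrt(p * (1 - p) / n)).cdf(0.49)

# p-hat < 0.49 means fewer than 0.49 * 400 = 196 supporters, i.e. at most 195.
max_count = 195

# Exact binomial probability P(X <= 195) under the independence assumption.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(max_count + 1))

print("normal approximation:", round(normal_approx, 4))
print("exact binomial:      ", round(exact, 4))
```

The two values are close but not identical; the approximation above applies no continuity correction.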
