09 Sampling Distribution
Muhammad Naeem Sandhu, Assistant Professor, Department of Mathematics, University of Engineering and Technology, Lahore
09 : Sampling Techniques and Sampling Distribution (2)
where \( \sqrt{\frac{N-n}{N-1}} \) is called the ‘finite population multiplier’ or ‘the finite population correction
factor’. If the sampling fraction \( \frac{n}{N} \) is less than 0.05, then the finite population multiplier need not be
used. For a large N this factor, of course, approaches 1 and hence can be ignored. The usual rule of
thumb is to consider N large enough if it is at least 20 times larger than n.
Case (ii) when the sampling is with replacement from infinite population:
\( \mu_{\bar{x}} = \mu \) and \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
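The effect of the finite population multiplier can be checked numerically. The following is only a sketch in Python: the function names, the sample size n = 50 and the population sizes are assumptions chosen for illustration and do not come from the notes. It shows the multiplier moving towards 1 as the sampling fraction n/N shrinks.

```python
import math


def finite_population_multiplier(N, n):
    """Return sqrt((N - n) / (N - 1)) for population size N and sample size n."""
    return math.sqrt((N - n) / (N - 1))


def standard_error(sigma, n, N=None):
    """Standard error of the mean; the multiplier is applied only when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= finite_population_multiplier(N, n)
    return se


# Hypothetical values: a sample of n = 50 from populations of increasing size.
n = 50
for N in (100, 1000, 10000):
    fraction = n / N
    multiplier = finite_population_multiplier(N, n)
    print(f"N={N:>6}  n/N={fraction:.3f}  multiplier={multiplier:.4f}")
```

Calling `standard_error(sigma, n)` without `N` corresponds to Case (ii), the infinite population.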
(8) An element of a sample is called a sampling unit. A complete list of all possible sampling
units is called a sampling frame.
(9) A numerical value computed from the population is called a parameter, for
example the population mean \( \mu \) and the population standard deviation \( \sigma \).
(10) A numerical value computed from a sample is called a statistic. It varies from
sample to sample from the same population. For example the sample mean \( \bar{X} \) and the sample
standard deviation S.
(11) The difference between a statistic and the corresponding parameter, arising because only a
sample is observed, is called sampling error. It can be reduced by increasing the sample size to a
sufficient level.
sampling error = \( \bar{X} - \mu \)
(12) Non-sampling errors are those which arise due to a defective sampling frame or
information not being provided correctly. For example, income, sales, production, age etc.
are not quoted correctly in most cases.
(13) Bias is a cumulative component of error which arises due to defective selection of the
sample or negligence of the investigator. Errors due to bias increase with an increase in
the size of the sample.
(14) A population in which every sampling unit has similar characteristics and has an equal
chance of selection in the sample is called a homogeneous population.
Definition (Sampling)
Sampling techniques are used to estimate population parameters on the basis of
sample measures called statistics. The quantities usually inferred are the mean, variance, standard
deviation, skewness, kurtosis etc. That is why we discuss sampling distributions here, as an
application of these inferences.
Sampling methods
(1) Probability Sampling
When each unit in the population has a known non-zero (not necessarily equal) probability of
being included in the sample, the sampling is said to be probability sampling. It is also
called random sampling, e.g. simple random sampling, stratified sampling, systematic
sampling, cluster sampling etc.
(2) Non-probability Sampling
Non-probability sampling is a process in which personal judgment determines which
units of the population are selected for the sample. It is also called non-random or
judgment sampling.
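Three of the probability sampling methods named above can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed procedure: the sampling frame of 100 unit labels, the two strata, the seed and the function names are all hypothetical.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n distinct units; every subset of size n is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


rng = random.Random(1)                 # fixed seed so the sketch is repeatable
frame = list(range(1, 101))            # hypothetical sampling frame of 100 unit labels
strata = {"low": frame[:50], "high": frame[50:]}

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(strata, 5, rng)

print("simple random:", srs)
print("systematic:   ", systematic)
print("stratified:   ", stratified)
```

In each case the inclusion probability of every unit is known and non-zero, which is what distinguishes these methods from judgment sampling.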
Types of Sampling
Random or Probability Sampling
Non-random or Judgment Sampling
In probability sampling or random sampling, all the items in the population have a chance
of being chosen in the sample. In judgment sampling, personal knowledge and opinion are used
to identify the items from the population that are to be included in the sample. Sometimes a
judgment sample is used as a pilot or trial sample to decide how to take a random sample later.
Rigorous statistical analysis can be done only with probability samples.
What is the Sampling Distribution of \( \bar{X} \)?
In the following section, we will discuss the sampling distributions of \( \bar{X} \) and \( S^2 \) as a
mechanism from which we will be able to make inferences on the parameters \( \mu \) and \( \sigma^2 \).
In many situations it is reasonable to assume that the population from which we are
selecting a simple random sample has a normal, or nearly normal, distribution. When the
population has a normal distribution, the sampling distribution of \( \bar{X} \) is normally distributed for any
sample size.
When the population from which we are selecting a simple random sample does not have
a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling
distribution of \( \bar{X} \).
Explanation of Central Limit Theorem
Suppose we draw samples from a normally distributed population with mean 100 and a
standard deviation of 25. We draw samples of 5 items each and calculate their mean.
[Figure: relationship between the population distribution and the sampling distribution of the mean
for a normal population]
Suppose we increase our sample size from 5 to 20. This would increase the effect of
averaging in each sample, and we would expect even less dispersion among the sample means.
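The shrinking dispersion can be seen in a small simulation. The sketch below uses the population stated in the text (mean 100, standard deviation 25) and the two sample sizes 5 and 20. The number of repeated samples and the random seed are assumptions made only so that the illustration is repeatable.

```python
import math
import random
import statistics

MU, SIGMA = 100, 25          # population mean and standard deviation from the text
REPLICATES = 20000           # assumed number of repeated samples


def simulate_sample_means(n, rng):
    """Draw REPLICATES samples of size n and return the list of sample means."""
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(REPLICATES)
    ]


rng = random.Random(0)       # fixed seed for repeatability
results = {}
for n in (5, 20):
    means = simulate_sample_means(n, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(
        f"n={n:>2}  mean of means={results[n][0]:.2f}  "
        f"sd of means={results[n][1]:.2f}  sigma/sqrt(n)={SIGMA / math.sqrt(n):.2f}"
    )
```

The simulated standard deviation of the sample means should sit close to \( \sigma/\sqrt{n} \) for both sample sizes, and be smaller for n = 20 than for n = 5.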
Examples (1)
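Consider the data concerning the experience of five motorcycle owners with the life of their tires.
Owner                    Carl   Debbie   Elizabeth   Frank   George   Total
Tire life (in months)       3        3           7       9       14      36
Because only five people are involved, the population is too small to be approximated by a
normal distribution. We will take all of the possible samples of the owners in groups of three.
Compute the sample mean \( \bar{X} \) of each sample, list them, and compute the mean of the sampling
distribution, \( \mu_{\bar{x}} \).
Solution
Calculation of the sample mean of tire life with n = 3 is given below:
Sample (owners)               Sample data   Sample mean
Carl, Debbie, Elizabeth       3, 3, 7        4.33
Carl, Debbie, Frank           3, 3, 9        5.00
Carl, Debbie, George          3, 3, 14       6.67
Carl, Elizabeth, Frank        3, 7, 9        6.33
Carl, Elizabeth, George       3, 7, 14       8.00
Carl, Frank, George           3, 9, 14       8.67
Debbie, Elizabeth, Frank      3, 7, 9        6.33
Debbie, Elizabeth, George     3, 7, 14       8.00
Debbie, Frank, George         3, 9, 14       8.67
Elizabeth, Frank, George      7, 9, 14      10.00
Total of sample means                       72.00
\( \mu_{\bar{x}} = \frac{72}{10} = 7.2 \) months and \( \mu = \frac{36}{5} = 7.2 \) months.
Calculations show that even though the population is not normal, the mean of the sampling distribution,
\( \mu_{\bar{x}} \), is still equal to the population mean \( \mu \).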
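The enumeration in this example is small enough to reproduce by machine. The following Python sketch lists every sample of three owners, computes each sample mean, and compares the mean of those sample means with the population mean. The variable names are illustrative only.

```python
from itertools import combinations
from statistics import fmean

# Tire life in months for the five owners, as given in the example.
tire_life = {"Carl": 3, "Debbie": 3, "Elizabeth": 7, "Frank": 9, "George": 14}

population_mean = fmean(tire_life.values())

sample_means = []
for owners in combinations(tire_life, 3):      # all samples of 3 owners out of 5
    mean = fmean(tire_life[o] for o in owners)
    sample_means.append(mean)
    print(", ".join(owners), f"mean = {mean:.2f}")

mean_of_sample_means = fmean(sample_means)
print(f"number of samples    = {len(sample_means)}")
print(f"population mean      = {population_mean:.2f}")
print(f"mean of sample means = {mean_of_sample_means:.2f}")
```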
More about Central Limit Theorem
In the following figures, we observe that the distribution of the population is not normal,
whereas the sampling distribution of the mean looks a little like the bell shape.
As the sample size is increased, the sampling distribution of the mean looks more like the bell
shape of the normal distribution.
Now we state the central limit theorem, which supports the arguments cited above.
Central Limit Theorem
The central limit theorem (CLT) states that, given certain conditions, the mean of a
sufficiently large number of independent random variables, each with a well-defined mean and
well-defined variance, will be approximately normally distributed.
The central limit theorem explains:
the mean of the sampling distribution of the mean will equal the population mean
the sampling distribution of the mean approaches normal as the
sample size increases
the relationship between the shape of the population distribution and the shape
of the sampling distribution of the mean.
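The approach to normality can be illustrated with a population that is clearly not normal. In the sketch below the population is exponential with mean 1, which is an assumption chosen only because it is strongly right-skewed. The sample sizes, the number of repeated samples and the seed are likewise hypothetical. Skewness close to 0 is used here as a rough indicator of a symmetric, bell-like shape.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: the average cubed standardised deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


rng = random.Random(42)      # fixed seed for repeatability
REPLICATES = 5000            # assumed number of repeated samples

skew_by_n = {}
mean_by_n = {}
for n in (2, 10, 40):
    # Each sample mean is the average of n draws from the exponential population.
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(REPLICATES)
    ]
    mean_by_n[n] = statistics.fmean(means)
    skew_by_n[n] = skewness(means)
    print(f"n={n:>2}  mean of means={mean_by_n[n]:.3f}  skewness={skew_by_n[n]:.3f}")
```

The mean of the sample means stays near the population mean for every n, while the skewness of the sampling distribution falls as n grows.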
The sample mean \( \bar{x} \) is the best estimator of the population mean \( \mu \). It is unbiased, consistent, the
most efficient estimator, and, as long as the sample is sufficiently large, its sampling distribution
can be approximated by the normal distribution, as the central limit theorem says.
Definition (Point Estimate)
A point estimate of a population parameter is a single numerical value of a sample statistic.
(1) Point Estimates
The sample mean \( \bar{x} \) is a point estimate of the population mean \( \mu \).
\[ E(\bar{x}) = \mu \]
i.e. \( \bar{x} \) is an unbiased estimate of the population mean \( \mu \).
\( s^2 \) is a point estimate of the population variance \( \sigma^2 \):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \]
i.e. an unbiased estimate of the population variance.
\( s^2 \) is also a point estimate of the population variance \( \sigma^2 \) when computed as
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n} \]
i.e. a biased estimate of the population variance.
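The difference between the two divisors can be verified by brute force on a tiny population. The sketch below is an illustration under assumptions: the population (the six faces of a die) and the sample size n = 2, drawn with replacement, are hypothetical choices that keep the enumeration to 36 equally likely samples.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [1, 2, 3, 4, 5, 6]      # hypothetical population (faces of a die)
n = 2                                # sample size, drawn with replacement

mu = fmean(population)               # population mean
sigma2 = pvariance(population)       # population variance (divisor N)

samples = list(product(population, repeat=n))    # all 36 equally likely samples

mean_of_xbar = fmean(fmean(s) for s in samples)
mean_s2_unbiased = fmean(variance(s) for s in samples)   # divisor n - 1
mean_s2_biased = fmean(pvariance(s) for s in samples)    # divisor n

print(f"mu = {mu:.4f}, average of sample means = {mean_of_xbar:.4f}")
print(f"sigma^2 = {sigma2:.4f}")
print(f"average s^2 with divisor n-1 = {mean_s2_unbiased:.4f}")
print(f"average s^2 with divisor n   = {mean_s2_biased:.4f}")
```

Averaged over all samples, the divisor n - 1 recovers \( \sigma^2 \), while the divisor n gives \( \frac{n-1}{n}\sigma^2 \), which is the sense in which it is biased.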
Examples (2)
A bank calculates that its individual savings accounts are normally distributed with a mean
of $2000 and a standard deviation of $600. If the bank takes random samples of 100 accounts,
what is the probability that the sample mean will lie between $1900 and $2050?
Solution
First we calculate the standard error of the mean:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{(for infinite population)} \]
\[ = \frac{600}{\sqrt{100}} \]
= $ 60
To determine the probability that the sample mean will lie between $1900 and $2050, we find the
corresponding values \( z_1 \) and \( z_2 \) using
\[ Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]
This formula converts any normal random variable to a standard normal random variable.
For \( \bar{X} \) = $ 1900: \( z_1 = \frac{1900 - 2000}{60} = -1.67 \)
For \( \bar{X} \) = $ 2050: \( z_2 = \frac{2050 - 2000}{60} = 0.83 \)
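The step that remains is the area under the standard normal curve between \( z_1 \) and \( z_2 \). The sketch below computes it with Python's `statistics.NormalDist` instead of a printed table, so it uses the unrounded z values and may differ from a table answer in the third decimal place. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 600, 100               # values given in the example
standard_error = sigma / sqrt(n)            # 600 / 10 = 60

z1 = (1900 - mu) / standard_error
z2 = (2050 - mu) / standard_error

std_normal = NormalDist()                   # mean 0, standard deviation 1
probability = std_normal.cdf(z2) - std_normal.cdf(z1)

print(f"z1 = {z1:.2f}, z2 = {z2:.2f}")
print(f"P(1900 < sample mean < 2050) = {probability:.4f}")
```

The result is roughly 0.75, i.e. about three samples in four would have a mean in this range.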
audits 50 randomly selected accounts, what is the probability that the sample average monthly
balance is
(i) below $ 100
(ii) between $ 100 and $ 130.
(Page 321, AIOU SC 6.5)
Solution
(a) What is the standard error of the mean?
(b) What is \( P[363 < \bar{x} < 366] \)?
(c) What would your answer to part (a) be if we sample with replacement?
Statistics for Management, 7th Ed, by Richard Levin and David Rubin Prob. 6.40 p-327
Examples (8)
Given a population of size N = 80 with a mean of 8.2 and standard deviation of 3.2. What
is the probability that a sample of 25 will have a mean between 21 and 23.5?
Statistics for Management, 7th Ed, by Richard Levin and David Rubin Prob. 6.41 p-327
Examples (9)
For a population of size N = 80 with a mean of 8.2 and standard deviation of 2.1, find the
S.E. of the mean for the following sample sizes: (a) n = 16, (b) n = 25, (c) n = 49
Statistics for Management, 7th Ed, by Richard Levin and David Rubin Prob. 6.42 p-327
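A possible way to work Examples (9) is sketched below in Python. It assumes sampling without replacement, so the finite population multiplier \( \sqrt{\frac{N-n}{N-1}} \) is applied. The problem statement does not say this explicitly, although all three sampling fractions exceed 0.05. The function name is illustrative.

```python
from math import sqrt

N, sigma = 80, 2.1          # population size and standard deviation from Examples (9)


def standard_error_finite(sigma, n, N):
    """sigma / sqrt(n) multiplied by the finite population multiplier."""
    return (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))


standard_errors = {n: standard_error_finite(sigma, n, N) for n in (16, 25, 49)}
for n, se in standard_errors.items():
    print(f"n = {n:>2}: n/N = {n / N:.3f}, S.E. = {se:.4f}")
```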
Examples (10)
Data on pull-off force (pounds) for connectors used in an automobile engine application
are as follows: (Douglas Montgomery Ch 7 page 228)
79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7,
75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8.
(a) Calculate a point estimate of the mean pull-off force of all connectors in
the population. State which estimator you used and why.
(b) Calculate a point estimate of the pull-off force value that separates the
weakest 50% of the connectors in the population from the strongest 50%.
(c) Calculate point estimates of the population variance and the population
standard deviation.
(d) Calculate the standard error of the point estimate found in part (a). Provide
an interpretation of the standard error.
(e) Calculate a point estimate of the proportion of all connectors in the
population whose pull-off force is less than 73 pounds.
Solution
a) The average of the 26 observations provided can be used as an estimator of the
mean pull force since we know it is unbiased. This value is 75.615 pounds.
b) The median of the sample can be used as an estimate of the point that divides the
population into a “weak” and “strong” half. This estimate is 75.2 pounds.
c) Our estimate of the population variance is the sample variance, 2.738 square
pounds. Similarly, our estimate of the population standard deviation is the sample
standard deviation, 1.655 pounds.
d) The standard error of the mean pull force, estimated from the data provided, is
0.325 pounds. This value is the standard deviation, not of the pull force, but of the
sample mean pull force.
e) Only one connector in the sample has a pull force measurement under 73 pounds.
Our point estimate for the proportion requested is then 1/26 = 0.0385.
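The point estimates in this example can be recomputed directly from the 26 observations listed in the problem. The Python sketch below is one way to do so with the standard library, and the variable names are illustrative. Running it is a quick check on the figures quoted in the solution.

```python
from math import sqrt
from statistics import mean, median, stdev, variance

# Pull-off force (pounds), copied from the problem statement.
pull_off_force = [
    79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7,
    75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8,
]

n = len(pull_off_force)
point_mean = mean(pull_off_force)                  # part (a)
point_median = median(pull_off_force)              # part (b)
point_variance = variance(pull_off_force)          # part (c), divisor n - 1
point_sd = stdev(pull_off_force)                   # part (c)
standard_error = point_sd / sqrt(n)                # part (d)
proportion_below_73 = sum(x < 73 for x in pull_off_force) / n   # part (e)

print(f"n = {n}")
print(f"(a) mean            = {point_mean:.3f}")
print(f"(b) median          = {point_median:.3f}")
print(f"(c) variance, sd    = {point_variance:.3f}, {point_sd:.3f}")
print(f"(d) standard error  = {standard_error:.3f}")
print(f"(e) proportion < 73 = {proportion_below_73:.4f}")
```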
Examples (11)
Data on oxide thickness of semiconductors are as follows:
425, 431, 416, 419, 421, 436, 418, 410, 431, 433, 423, 426,
410, 435, 436, 428, 411, 426, 409, 437, 422, 428, 413, 416.
(a) Calculate a point estimate of the mean oxide thickness for all wafers in the
population.
(b) Calculate a point estimate of the standard deviation of oxide thickness for all
wafers in the population.
(c) Calculate the standard error of the point estimate from part (a).
(d) Calculate a point estimate of the proportion of wafers in the population that have
oxide thickness greater than 430 angstrom.
(Douglas Montgomery Ch 7 page 228)
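The notes give no solution for Examples (11). A possible approach, following the same pattern as Examples (10), is sketched below in Python. The choice of the sample mean, the sample standard deviation (divisor n - 1) and the sample proportion as estimators is an assumption, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Oxide thickness (angstrom), copied from the problem statement.
oxide_thickness = [
    425, 431, 416, 419, 421, 436, 418, 410, 431, 433, 423, 426,
    410, 435, 436, 428, 411, 426, 409, 437, 422, 428, 413, 416,
]

n = len(oxide_thickness)
thickness_mean = mean(oxide_thickness)                    # part (a)
thickness_sd = stdev(oxide_thickness)                     # part (b), divisor n - 1
thickness_se = thickness_sd / sqrt(n)                     # part (c)
proportion_above_430 = sum(x > 430 for x in oxide_thickness) / n   # part (d)

print(f"n = {n}")
print(f"(a) mean             = {thickness_mean:.2f}")
print(f"(b) sd               = {thickness_sd:.2f}")
print(f"(c) standard error   = {thickness_se:.2f}")
print(f"(d) proportion > 430 = {proportion_above_430:.4f}")
```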
Practice Problems
(1) If X ~ N (80, 25) Find
a. a point that has 14% area below it
b. a point that has 85.31% area above it
c. a point that has 30.5 % area above it
d. two points symmetrical to mean containing 92% area between them
(2) If X ~ N(24, 16) Find
a. lower and upper quartiles
b. 37th percentile
c. median
d. mode
(3) In a sample of 36 observations taken without replacement from a normal distribution with mean
98.5 and standard deviation 16.5
(i) what is \( P[85 < \bar{x} < 100] \)
(ii) Find the corresponding probability given a sample of 36.
is standard normal without considering how small the sample size is.
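The probability that Z falls in the interval \( (-Z_{\alpha/2},\ Z_{\alpha/2}) \) is \( 1-\alpha \), and the corresponding
interval is:
\[ -Z_{\alpha/2} \le Z \le Z_{\alpha/2} \]
\[ -Z_{\alpha/2} \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2} \]
\[ -Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \bar{x}-\mu \le Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
\[ -\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
\[ \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
i.e.
\[ \bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]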
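The final interval translates directly into a short computation. The sketch below is an illustration only: the function name and the inputs (a sample mean of 50, a known \( \sigma \) of 10 and n = 25) are hypothetical, and \( Z_{\alpha/2} \) is taken from Python's `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits for mu when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # Z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical inputs: sample mean 50, known sigma 10, sample size 25.
lower, upper = z_confidence_interval(xbar=50, sigma=10, n=25, confidence=0.95)
print(f"95% confidence interval for mu: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level widens the interval, because \( Z_{\alpha/2} \) grows as \( \alpha \) shrinks.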
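The statistic \( t = \frac{\bar{x}-\mu}{s/\sqrt{n}} \) is used,
and the corresponding interval is:
\[ -t_{\alpha/2}(\nu) \le t \le t_{\alpha/2}(\nu) \]
\[ -t_{\alpha/2}(\nu) \le \frac{\bar{x}-\mu}{s/\sqrt{n}} \le t_{\alpha/2}(\nu) \]
\[ -t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \le \bar{x}-\mu \le t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \]
\[ -\bar{x} - t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \le -\mu \le -\bar{x} + t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \]
\[ \bar{x} + t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \ge \mu \ge \bar{x} - t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \]
i.e.
\[ \bar{x} - t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}(\nu)\,\frac{s}{\sqrt{n}} \]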
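The t interval can be computed in the same way once the critical value has been read from a t table. Python's standard library has no t quantile function, so in the sketch below the critical value is passed in by hand. The inputs are hypothetical: a sample mean of 50, s = 10, n = 27, and t = 2.056, which is the tabulated \( t_{0.025} \) for 26 degrees of freedom.

```python
from math import sqrt


def t_confidence_interval(xbar, s, n, t_critical):
    """Return (lower, upper) limits for mu using a t value read from a table."""
    margin = t_critical * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical inputs; t_critical = 2.056 is the table value for 95% and 26 d.f.
t_lower, t_upper = t_confidence_interval(xbar=50, s=10, n=27, t_critical=2.056)
print(f"95% confidence interval for mu: ({t_lower:.3f}, {t_upper:.3f})")
```

Only the critical value and the use of s in place of \( \sigma \) differ from the Z interval. The algebra of the interval is identical.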
had an average of 39.2 miles per week and a sample standard deviation of 3.2 miles per week.
Construct a 95% confidence interval for the population mean.
Examples (21)
Given the following sample sizes and t values used to construct confidence intervals, find
the corresponding confidence level:
(i). n = 27, t = 2.056
(ii). n = 5, t = 2.132
Examples (22)
For the sample size 10 and confidence level 99%, find the appropriate t value for
constructing confidence intervals. Given the sample size 18 and t value t = 2.898 used to
construct confidence intervals, find the corresponding confidence level.
Examples (23)
A population consists of 5 numbers 2, 3, 6, 8 and 9. Consider all possible samples of size 3 that
can be drawn with replacement from this population. Find (a) the mean of the population, (b) the standard
deviation of the population, (c) the mean of the sampling distribution of means and (d) the standard
deviation of the sampling distribution.
Using software Minitab, this question may be solved.
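As an alternative to Minitab, the same enumeration can be sketched in Python. The code below is only one possible approach: it lists all \( 5^3 = 125 \) ordered samples of size 3 drawn with replacement and computes the four quantities asked for, using the population (divisor N) form of the standard deviation throughout.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [2, 3, 6, 8, 9]
n = 3                                                    # sample size, with replacement

population_mean = fmean(population)                      # part (a)
population_sd = pstdev(population)                       # part (b)

# Every ordered sample of size n drawn with replacement: 5**3 = 125 samples.
sample_means = [fmean(s) for s in product(population, repeat=n)]

mean_of_sample_means = fmean(sample_means)               # part (c)
sd_of_sample_means = pstdev(sample_means)                # part (d)

print(f"(a) population mean      = {population_mean:.4f}")
print(f"(b) population sd        = {population_sd:.4f}")
print(f"(c) mean of sample means = {mean_of_sample_means:.4f}")
print(f"(d) sd of sample means   = {sd_of_sample_means:.4f}")
print(f"    sigma / sqrt(n)      = {population_sd / sqrt(n):.4f}")
```

The last two printed lines should agree, which is Case (ii): sampling with replacement gives \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \) with no finite population multiplier.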