

Point Estimation of Parameters

1 Introduction
In the field of statistical inference, the concern is on making decisions/drawing
conclusions about populations. Statistical inference methods use information
from samples of the underlying populations for drawing conclusions. This
chapter begins the study of statistical methods that are employed for inference
and in decision making.
Statistical inference involves two crucial areas: parameter estimation and
hypothesis testing. As an example of a parameter estimation
problem, suppose that a structural engineer is analyzing the tensile strength of a
component used in an automobile chassis. Since variability in tensile strength is
naturally present between the individual components because of differences in
raw material batches, manufacturing processes, and measurement procedures
(for example), the engineer is interested in estimating the mean tensile strength
of the components. In practice, the engineer will use sample data to compute
a number that is in some sense a reasonable value (or guess) of the true mean.
This number is called a point estimate. We will see that it is possible to establish
the precision of the estimate.
Suppose that we are interested in obtaining a point estimate of a population
parameter. We know that before the data is collected, the observations are
considered to be random variables, say X1, X2, ..., Xn. Therefore, any function
of the observations, that is, any statistic, is also a random variable. For example, the
sample mean X̄ and the sample variance S² are statistics and they are also random
variables. Since a statistic is a random variable, it has a probability distribution.
We call the probability distribution of a statistic a sampling distribution.
When handling statistical inference problems, it is convenient to have a
general symbol to represent the parameter of interest. We normally use the Greek
symbol θ (theta) to represent the parameter. The objective of point estimation is
to select a single number, based on sample data, that is the most plausible value
for θ. A numerical value of a sample statistic is used as the point estimate.
Suppose X is a random variable with probability distribution f(x), charac-
terized by the unknown parameter θ. Suppose further that X1, X2, ..., Xn is a
random sample of size n from X; then the statistic Θ̂ = h(X1, X2, ..., Xn) is called a
point estimator of θ. It should be noted that Θ̂ is a random variable since it is
a function of random variables. From a given sample, Θ̂ takes on a particular
numerical value θ̂, called the point estimate of θ.
Definition: A point estimate of some population parameter θ is a single
numerical value θ̂ of a statistic Θ̂. The statistic Θ̂ is called the point estimator.
For example, suppose that X is a random variable that is normally distributed
with an unknown mean µ. The sample mean is a point estimator of the unknown
population mean µ. That is, µ̂ = X̄. Having obtained the sample, the numerical
value x̄ is the point estimate of µ. Suppose that x1 = 30, x2 = 35, x3 = 34 and
x4 = 36; the point estimate of µ is:

x̄ = (30 + 35 + 34 + 36)/4 = 33.75.

Similarly, if the population variance σ² is also unknown, a point estimator
for σ² is the sample variance S². The sample variance is given as:

S² = (1/(n−1)) Σ_{i=1}^{n} (xi − x̄)²

Estimation problems occur often in engineering. Engineers often need to
estimate:

• The mean µ of a single population.

• The variance σ² (or standard deviation σ) of a single population.

• The proportion p of items in a population that belong to a class of interest.

• The difference in means of two populations, µ1 − µ2.

• The difference in two population proportions, p1 − p2.

Reasonable point estimates of these parameters are as follows (a short computational sketch follows this list):

• For µ, the estimate is µ̂ = x̄, the sample mean.

• For σ², the estimate is σ̂² = s², the sample variance.

• For p, the estimate is p̂ = x/n, the sample proportion, where x is the number
of items in a random sample of size n that belong to the class of interest.

• For µ1 − µ2, the estimate is µ̂1 − µ̂2 = x̄1 − x̄2, the difference between the
sample means of two independent random samples.

• For p1 − p2, the estimate is p̂1 − p̂2, the difference between two sample
proportions computed from two independent random samples.
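
As an added illustration (not part of the original notes), the following Python sketch computes the first three of these estimates with NumPy. It reuses the four measurements from the example above; the success count for the proportion is a made-up assumption:

import numpy as np

sample = np.array([30.0, 35.0, 34.0, 36.0])  # measurements from the example above

mu_hat = sample.mean()               # point estimate of the mean: 33.75
var_hat = sample.var(ddof=1)         # sample variance s^2 (divisor n - 1)

x, n = 12, 50                        # hypothetical: 12 items of interest in 50
p_hat = x / n                        # sample proportion: 0.24

print(mu_hat, var_hat, p_hat)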

Remark: There may be several different choices for the point estimator of a
parameter. For example, if we wish to estimate the mean of a population, we
might consider the sample mean, the sample median, or perhaps the average of
the smallest and largest observations in the sample (the midrange) as point estimators.
In order to decide which point estimator of a particular parameter is the best
one to use, we need to examine their statistical properties and develop some
criteria for comparing estimators.

2 General concepts of point estimation


2.1 Unbiased Estimators
We say that Θ̂ is an unbiased estimator of θ if the expected value of Θ̂ is equal
to θ. This is equivalent to saying that the mean of the probability distribution
of Θ̂ (or the mean of the sampling distribution of Θ̂) is equal to θ.
Definition: The point estimator Θ̂ is an unbiased estimator for the parameter
θ if:
E(Θ̂) = θ.
If the estimator is not unbiased, then the difference E(Θ̂) − θ is called the bias of
the estimator Θ̂.
Example: Suppose that X is a random variable with mean µ and variance σ².
Let X1, X2, ..., Xn be a random sample of size n from a population represented
by X. Show that the sample mean X̄ and the sample variance S² are unbiased
estimators of µ and σ², respectively.

Solution

We shall show that S² is an unbiased estimator of σ² and leave it as an exercise
for the reader to show that X̄ is an unbiased estimator for µ.

E(S²) = E[ (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)² ] = (1/(n−1)) E[ Σ_{i=1}^{n} (Xi² − 2XiX̄ + X̄²) ]

It follows that:

E(S²) = (1/(n−1)) E[ Σ_{i=1}^{n} Xi² − nX̄² ] = (1/(n−1)) [ Σ_{i=1}^{n} E(Xi²) − nE(X̄²) ]

Recall that for the sample X1, X2, ..., Xn from a population represented by X
whose mean is µ and variance is σ², it is true that:

Var(Xi) = E(Xi²) − (E(Xi))²

It follows that:

σ² = E(Xi²) − µ²

Therefore,

E(Xi²) = µ² + σ²     (1)

We shall see the sampling distribution of X̄ in Section 5; for now we can use
the results. For a population represented by X with mean µ and variance σ² and
a sample X1, X2, ..., Xn, then:

E(X̄) = µ and Var(X̄) = σ²/n

It follows that:

Var(X̄) = E(X̄²) − (E(X̄))²

Therefore,

σ²/n = E(X̄²) − µ²

This gives:

E(X̄²) = σ²/n + µ²     (2)

Substituting equations (1) and (2) into the last expression for E(S²) gives:

E(S²) = (1/(n−1)) [ Σ_{i=1}^{n} (µ² + σ²) − n(µ² + σ²/n) ] = (1/(n−1)) (nµ² + nσ² − nµ² − σ²)

After simplifying, one obtains:

E(S²) = σ²

Therefore, the sample variance S² is an unbiased estimator of the population
variance σ².

Exercise: Show that X̄ = (1/n) Σ_{i=1}^{n} Xi is an unbiased estimator for µ.
Remark: We shall not rely on the property of unbiasedness alone to
select an estimator; we also look at other concepts, such as the variance of a point estimator.
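
A small simulation sketch (an addition to these notes, assuming NumPy is available) illustrating the result just proved: averaging S² over many samples from a population with σ² = 4 should come out close to 4, while the divisor-n version falls short:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)     # S^2 with divisor n - 1
s2_n = samples.var(axis=1, ddof=0)   # divisor n, for comparison

print(s2.mean())    # ~ 4.0 = sigma^2, consistent with E(S^2) = sigma^2
print(s2_n.mean())  # ~ 3.2 = (n-1)*sigma^2/n, visibly biased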

2.2 Variance of a point estimator


Suppose that Θ̂1 and Θ̂2 are two unbiased estimators of θ. However, the variances
of the two distributions of Θ̂1 and Θ̂2 may be different, as illustrated in Figure 1.
Since Θ̂1 has a smaller variance than Θ̂2 , the estimator Θ̂1 is more likely to give
an estimate close to the true value θ. Therefore, when selecting among unbiased
estimators for a given parameter, choose the one with minimum variance.
Definition: If we consider all unbiased estimators of θ, the one with the
smallest variance is called the minimum variance unbiased estimator (MVUE).
Remark: The MVUE is most likely among all unbiased estimators to produce
an estimate θ̂ that is close to the true value of θ.
Theorem: If X1 , X2 , ..., Xn is a random sample of size n from a normal
distribution with mean µ and variance σ2 , the sample mean X̄ is the MVUE for
µ.
In situations in which we do not know whether an MVUE exists, we could
still use a minimum variance principle to choose among competing estimators.
Suppose, for example, we wish to estimate the mean of a population (not
necessarily a normal population). We have a random sample of n observations
X1, X2, ..., Xn and we wish to compare two possible estimators for µ: the sample
mean X̄ and a single observation from the sample, say Xi. Observe that both
X̄ and Xi are unbiased estimators of µ. For X̄, we have Var(X̄) = σ²/n, and the
variance of any single observation is Var(Xi) = σ². Since Var(X̄) < Var(Xi) for sample
sizes n ≥ 2, we can conclude that X̄ is a better estimator of µ than a single
observation Xi.

Figure 1: Sampling distributions of two unbiased estimators Θ̂1 and Θ̂2.

2.3 Standard error of a point estimator


When the numerical value or point estimate of a parameter is reported, it is
usually desirable to give some idea of the precision of estimation. The measure
of precision usually employed is the standard error of the estimator that has
been used.
Definition: The standard error of an estimator Θ̂ is its standard deviation,
given by σΘ̂ = √Var(Θ̂). If the standard error involves unknown parameters
that can be estimated, substitution of those estimates into σΘ̂ results in an estimated
standard error, denoted by σ̂Θ̂.
Note: Sometimes the estimated standard error is denoted by sΘ̂ or se(Θ̂).
Suppose we are sampling from a normal distribution with mean µ and
variance σ². The distribution of X̄ is normal with mean µ and variance σ²/n, so the
standard error of X̄ is:

σX̄ = σ/√n.

If we did not know σ but substituted the sample standard deviation S into
the above formula, the estimated standard error of X̄ would be:

σ̂X̄ = S/√n.

Note: When the estimator follows a normal distribution, as in the above
situation, we can be reasonably confident that the true value of the parameter
lies within two standard errors of the estimate. Since many point estimators are
normally distributed (or approximately so) for large n, this is a very useful result.
Even in cases in which the point estimator is not normally distributed, we can
state that, so long as the estimator is unbiased, the estimate of the parameter
will deviate from the true value by as much as four standard errors at most 6
percent of the time. Thus a very conservative statement is that the true value of
the parameter differs from the point estimate by at most four standard errors.

Example: An article in the Journal of Heat Transfer (Trans. ASME, Sec. C,
96, 1974, p. 59) described a new method of measuring the thermal conductivity
of Armco iron. Using a temperature of 100°F and a power input of 550 watts,
the following 10 measurements of thermal conductivity (in Btu/hr·ft·°F) were
obtained:

41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.04

A point estimate of the mean thermal conductivity at 100°F and 550 watts is
the sample mean, given as:

x̄ = 41.924 Btu/hr·ft·°F

The standard error of the sample mean is σX̄ = σ/√n, and since σ is unknown,
we replace it with the sample standard deviation S = 0.284 to obtain the
estimated standard error of X̄ as:

σ̂X̄ = S/√n = 0.284/√10 = 0.0898.

Note: The standard error is about 0.2 percent of the sample mean, implying
that we have obtained a relatively precise point estimate of thermal conductivity.
If we can assume that thermal conductivity is normally distributed, 2 times the
standard error is 2σ̂X̄ = 2(0.0898) = 0.1796, and we are highly confident that
the true mean thermal conductivity is within the interval 41.924 ± 0.1796, or
between 41.744 and 42.104.
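
The computation above is easy to reproduce; the sketch below (added for illustration, assuming NumPy) uses the ten quoted measurements:

import numpy as np

x = np.array([41.60, 41.48, 42.34, 41.95, 41.86,
              42.18, 41.72, 42.26, 41.81, 42.04])

x_bar = x.mean()              # point estimate of the mean: 41.924
s = x.std(ddof=1)             # sample standard deviation: ~0.284
se = s / np.sqrt(len(x))      # estimated standard error: ~0.0898

print(x_bar, s, se)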

2.4 Mean square error of an estimator


Sometimes it is necessary to use a biased estimator. In such cases, the mean
square error of the estimator can be important. The mean square error of an
estimator Θ̂ is the expected squared difference between Θ̂ and θ.
Definition: The mean square error of an estimator Θ̂ of the parameter θ is
defined as MSE(Θ̂) = E(Θ̂ − θ)².
The mean square error can be written as follows:

MSE(Θ̂) = E[Θ̂ − E(Θ̂)]² + [θ − E(Θ̂)]² = Var(Θ̂) + (bias)².

This means that the mean square error of Θ̂ is equal to the variance of
the estimator plus the squared bias. If Θ̂ is an unbiased estimator of θ, the
mean square error of Θ̂ is equal to the variance of Θ̂. The mean square error
is an important criterion for comparing two estimators. Let Θ̂1 and Θ̂2 be two
estimators of the parameter θ, and let MSE(Θ̂1) and MSE(Θ̂2) be the mean square
errors of Θ̂1 and Θ̂2. Then the relative efficiency of Θ̂2 to Θ̂1 is defined as:

MSE(Θ̂1) / MSE(Θ̂2)

If this relative efficiency is less than 1, we would conclude that Θ̂1 is a more
efficient estimator of θ than Θ̂2; that is, it has a smaller mean square error.
Remark: Sometimes we find that biased estimators are preferable to unbiased
estimators because they have smaller mean square error. That is, we may be able
to reduce the variance of the estimator considerably by introducing a relatively
small amount of bias. As long as the reduction in variance is greater than the
squared bias, an improved estimator from a mean square error viewpoint will
result. For example, Figure 2 shows the probability distribution of a biased
estimator Θ̂1 that has a smaller variance than the unbiased estimator Θ̂2 . An
estimate based on Θ̂1 would more likely be close to the true value of θ than
would an estimate based on Θ̂2 . An estimator Θ̂ that has a mean square error
that is less than or equal to the mean square error of any other estimator, for all
values of the parameter, is called an optimal estimator of θ. Optimal estimators
rarely exist.

Figure 2: A biased estimator Θ̂1 that has smaller variance than the unbiased
estimator Θ̂2 .
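
A simulation sketch (added, assuming NumPy) of the remark above: for normal data, the divisor-n variance estimator is biased yet has a smaller mean square error than the unbiased S², so trading a little bias for less variance pays off here:

import numpy as np

rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 200_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

s2_unbiased = samples.var(axis=1, ddof=1)   # divisor n - 1, unbiased
s2_biased = samples.var(axis=1, ddof=0)     # divisor n, biased

mse_unbiased = np.mean((s2_unbiased - sigma2) ** 2)
mse_biased = np.mean((s2_biased - sigma2) ** 2)
print(mse_unbiased, mse_biased)   # the biased estimator has the smaller MSE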

Exercise
1. Suppose we have a random sample of size 2n from a population denoted
by X with E(X) = µ and Var(X) = σ². Let

X̄1 = (1/(2n)) Σ_{i=1}^{2n} Xi and X̄2 = (1/n) Σ_{i=1}^{n} Xi

be two estimators of µ. Which is the better estimator of µ? Justify your
choice.

2. Let X1, X2, ..., X7 denote a random sample from a population having a
mean µ and variance σ². Consider the following estimators of µ:

Θ̂1 = (X1 + X2 + ... + X7)/7 and Θ̂2 = (2X1 − X6 + X4)/2
(a) Is either estimator unbiased?
(b) Which estimator is better and in what sense?

3. Suppose that Θ̂1 and Θ̂2 are unbiased estimators of the parameter θ. We
know that Var(Θ̂1 ) = 10 and Var(Θ̂2 ) = 4. Which estimator is better and in
what sense is it better?
4. Calculate the relative efficiency of the two estimators given in exercise 2.
5. Calculate the relative efficiency of the two estimators given in exercise 3.

6. Suppose that Θ̂1 and Θ̂2 are estimators of the parameter θ. If we know that
E(Θ̂1 ) = θ, E(Θ̂2 ) = θ/2, Var(Θ̂1 ) = 10 and Var(Θ̂2 ) = 4. Which estimator
is better and in which sense?
7. Suppose that Θ̂1, Θ̂2 and Θ̂3 are estimators of θ. Suppose we know
that E(Θ̂1) = E(Θ̂2) = θ, E(Θ̂3) ≠ θ, Var(Θ̂1) = 12, Var(Θ̂2) = 10 and
E(Θ̂3 − θ)² = 6. Compare these three estimators. Which do you prefer?
Justify your choice.
8. Let three random samples of sizes n1 = 20, n2 = 10 and n3 = 8 be taken
from a population with mean µ and variance σ². Let S1², S2² and S3² be the
sample variances. Show that

S² = (20S1² + 10S2² + 8S3²)/38

is an unbiased estimator of σ².

9. (a) Show that (1/n) Σ_{i=1}^{n} (Xi − X̄)² is a biased estimator of σ².
(b) Find the amount of bias in the estimator.
(c) What happens to the bias as the sample size n increases?
10. Let X1, X2, ..., Xn be a random sample of size n from a population with
mean µ and variance σ².
(a) Show that X̄² is a biased estimator for µ².
(b) Find the amount of bias in this estimator.
(c) What happens to the bias as the sample size n increases?
11. Let X1, X2, ..., Xn be a random sample of size n from a normal distribution
with mean µ and variance σ². Let Xmin and Xmax be the smallest and largest
observations in the sample.

(a) Is (Xmin + Xmax )/2 an unbiased estimate for µ?


(b) What is the standard error of this estimate?
(c) Would this estimate be preferable to the sample mean X̄?

12. Suppose that X is the number of observed "successes" in a sample of n
observations, where p is the probability of success on each observation.

(a) Show that P̂ = X/n is an unbiased estimator of p.
(b) Show that the standard error of P̂ is √(p(1 − p)/n). How would you
estimate the standard error?

13. X̄1 and S1² are the sample mean and sample variance from a population with
mean µ1 and variance σ1². Similarly, X̄2 and S2² are the sample mean and
sample variance from a second independent population with mean µ2 and
variance σ2². The sample sizes are n1 and n2, respectively.

(a) Show that X̄1 − X̄2 is an unbiased estimator of µ1 − µ2 .


(b) Find the standard error of X̄1 − X̄2 . How would you estimate the
standard error?

14. Suppose that both populations in exercise 13 have the same variance
σ1² = σ2² = σ². Show that

Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)

is an unbiased estimator of σ².

3 Methods of point estimation


The definitions of unbiasedness and other properties of estimators do not provide
any guidance about how good estimators can be obtained. In this section, we
discuss two methods for obtaining point estimators: the method of moments
and the method of maximum likelihood. Maximum likelihood estimates are
generally preferable to moment estimators because they have better efficiency
properties. However, moment estimators are sometimes easier to compute. Both
methods can be used to obtain unbiased point estimators.

3.1 Method of Moments


The general idea behind the method of moments is to equate population
moments, which are defined in terms of expected values, to the corresponding
sample moments. The population moments will be functions of the unknown
parameters. Then these equations are solved to yield estimators of the unknown
parameters.

Definition: Let X1, X2, ..., Xn be a random sample from the probability
distribution f(x), where f(x) can be a discrete probability mass function or
a continuous probability density. The kth population moment (or distribution
moment) is E(X^k), k = 1, 2, .... The corresponding kth sample moment is
(1/n) Σ_{i=1}^{n} Xi^k, k = 1, 2, ....

To illustrate, the first population moment is E(X) = µ, and the first sample
moment is (1/n) Σ_{i=1}^{n} Xi = X̄. Thus, by equating the population and sample
moments, we find that µ̂ = X̄. That is, the sample mean is the moment estimator
of the population mean. In the general case, the population moments will be
functions of the unknown parameters of the distribution, say θ1, θ2, ..., θm.
Definition: Let X1 , X2 , ..., Xn be a random sample from either a probability
mass function or probability density function with m unknown parameters
θ1 , θ2 , ..., θm . The moment estimators Θ̂1 , Θ̂2 , ..., Θ̂m are found by equating the
first m population moments to the first m sample moments and solving the
resulting equations for unknown parameters.
Example: Suppose that X1, X2, ..., Xn is a random sample from an exponential
distribution with parameter λ. There is only one parameter to estimate, so we
must equate E(X) to X̄. For the exponential distribution, E(X) = 1/λ. Therefore E(X) = X̄
results in 1/λ = X̄, so λ̂ = 1/X̄ is the moment estimator of λ.

As an illustration of the above example, suppose that the time to failure of
an electronic module used in an automobile engine controller is tested at an
elevated temperature to accelerate the failure mechanism. The time to failure
is exponentially distributed. Eight units are randomly selected and tested,
resulting in the following failure times (in hours):
x1 = 11.96, x2 = 5.03, x3 = 67.40, x4 = 16.07, x5 = 31.50, x6 = 7.73, x7 = 11.10 and
x8 = 22.38. Because x̄ = 21.65, the moment estimate of λ is λ̂ = 1/x̄ = 1/21.65 = 0.0462.
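
A quick check of this estimate in Python (an added sketch, assuming NumPy):

import numpy as np

t = np.array([11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38])
x_bar = t.mean()          # ~21.65 hours
lam_hat = 1.0 / x_bar     # moment estimate of lambda: ~0.0462
print(x_bar, lam_hat)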
Example: Suppose that X1, X2, ..., Xn is a random sample from a normal
distribution with parameters µ and σ². For the normal distribution, E(X) = µ
and E(X²) = µ² + σ². Equating E(X) to X̄ and E(X²) to (1/n) Σ_{i=1}^{n} Xi² gives:

µ = X̄,  µ² + σ² = (1/n) Σ_{i=1}^{n} Xi²

Solving these equations using:

Σ_{i=1}^{n} (Xi − X̄)² = Σ_{i=1}^{n} Xi² − nX̄²

results in the moment estimators:

µ̂ = X̄,  σ̂² = (1/n) Σ_{i=1}^{n} (Xi − X̄)²

Remark: The moment estimator of σ² is not an unbiased estimator.


Example: Suppose that X1, X2, ..., Xn is a random sample from a gamma
distribution with parameters r and λ. For the gamma distribution, E(X) = r/λ
and E(X²) = r(r + 1)/λ². The moment estimators are found by solving:

r/λ = X̄,  r(r + 1)/λ² = (1/n) Σ_{i=1}^{n} Xi²

The resulting estimators are:

r̂ = X̄² / [(1/n) Σ_{i=1}^{n} Xi² − X̄²],  λ̂ = X̄ / [(1/n) Σ_{i=1}^{n} Xi² − X̄²]

For illustration using data, consider the data given in the first example of
subsection 3.1. For this data, x̄ = 21.65 and Σ_{i=1}^{8} xi² = 6639.40, so the moment
estimates are:

r̂ = (21.65)² / [(1/8)(6639.40) − (21.65)²] = 1.3,  λ̂ = 21.65 / [(1/8)(6639.40) − (21.65)²] = 0.0599

Remark: When r = 1, the gamma distribution reduces to an exponential
distribution. Because r̂ only slightly exceeds unity, it is quite possible that either the
gamma or the exponential distribution would provide a reasonable model for
the data.
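
The gamma moment estimators can be checked numerically as well (an added sketch, assuming NumPy; small rounding differences from the hand computation are expected):

import numpy as np

t = np.array([11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38])
m1 = t.mean()                  # first sample moment
m2 = (t ** 2).mean()           # second sample moment

r_hat = m1 ** 2 / (m2 - m1 ** 2)    # ~1.3
lam_hat = m1 / (m2 - m1 ** 2)       # ~0.06
print(r_hat, lam_hat)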

3.2 Method of Maximum Likelihood


One of the best methods of obtaining a point estimator of a parameter is the
method of maximum likelihood. This technique was developed in the 1920s by
a famous British statistician, Sir R. A. Fisher. As the name implies, the estimator
will be the value of the parameter that maximizes the likelihood function.
Definition: Suppose that X is a random variable with probability distribution
f (x; θ), where θ is a single unknown parameter. Let x1 , x2 , ..., xn be observed
values in a random sample of size n. Then the likelihood function of the sample
is:
L(θ) = f (x1 ; θ) · f (x2 ; θ) · ... · f (xn ; θ)
Remark: The likelihood function is now a function of only the unknown
parameter θ. The maximum likelihood estimator of θ is the value of θ that
maximizes the likelihood function L(θ).
In the case of a discrete random variable, the interpretation of the likelihood
function is clear. The likelihood function of the sample, L(θ), is just the probability

P(X1 = x1, X2 = x2, ..., Xn = xn).

That is, L(θ) is just the probability of obtaining the sample values x1, x2, ..., xn.
Therefore, in the discrete case, the maximum likelihood estimator is an estimator
that maximizes the probability of occurrence of the sample values.
Example: Let X be a Bernoulli random variable. The probability mass function
is:

f(x; p) = p^x (1 − p)^(1−x) for x = 0, 1, and f(x; p) = 0 elsewhere,

where p is the parameter to be estimated. The likelihood function of a
random sample of size n is:

L(p) = p^{x1}(1 − p)^{1−x1} · p^{x2}(1 − p)^{1−x2} · · · p^{xn}(1 − p)^{1−xn}

It follows that:

L(p) = Π_{i=1}^{n} p^{xi}(1 − p)^{1−xi} = p^{Σ_{i=1}^{n} xi} (1 − p)^{n − Σ_{i=1}^{n} xi}

We observe that if p̂ maximizes L(p), p̂ also maximizes ln(L(p)). Therefore,
we obtain:

ln(L(p)) = (Σ_{i=1}^{n} xi) ln(p) + (n − Σ_{i=1}^{n} xi) ln(1 − p)

Upon differentiation with respect to p, we obtain:

d ln(L(p))/dp = (Σ_{i=1}^{n} xi)/p − (n − Σ_{i=1}^{n} xi)/(1 − p)

Equating the above to zero and solving for p yields:

p̂ = (1/n) Σ_{i=1}^{n} xi.

Therefore, the maximum likelihood estimator of p is:

p̂ = (1/n) Σ_{i=1}^{n} Xi.

For illustration purposes, suppose that this estimator was applied to the
following scenario: n items are selected at random from a production line, and
each item is judged as either defective (in which case we set xi = 1) or non-
defective (in which case we set xi = 0). Then Σ_{i=1}^{n} xi is the number of defective units
in the sample, and p̂ is the sample proportion defective. The parameter p is the
population proportion defective, and it seems intuitively quite reasonable to
use p̂ as an estimate of p.
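
A minimal sketch of this estimator (added; the 0/1 inspection outcomes below are hypothetical):

import numpy as np

outcomes = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])  # 1 = defective (assumed data)
p_hat = outcomes.mean()    # MLE: sample proportion = 3/10 = 0.3
print(p_hat)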
Remark: Although the interpretation of the likelihood function given above
is confined to the discrete random variable case, the method of maximum
likelihood can easily be extended to a continuous distribution. We now give
two examples of maximum likelihood estimation for continuous distributions.
Example: Let X be normally distributed with unknown µ and known variance
σ². The likelihood function of a random sample of size n, say X1, X2, ..., Xn, is:

L(µ) = Π_{i=1}^{n} (1/(σ√(2π))) e^{−(xi−µ)²/(2σ²)} = (2πσ²)^{−n/2} e^{−(1/(2σ²)) Σ_{i=1}^{n} (xi−µ)²}

Upon taking natural logarithms, we obtain:

ln(L(µ)) = −(n/2) ln(2πσ²) − (2σ²)^{−1} Σ_{i=1}^{n} (xi − µ)²

After differentiating with respect to µ, one obtains:

d ln(L(µ))/dµ = (σ²)^{−1} Σ_{i=1}^{n} (xi − µ)

Equating this last result to zero and solving for µ yields:

µ̂ = (Σ_{i=1}^{n} Xi)/n = X̄
Remark: The sample mean is the maximum likelihood estimator of µ. This
is identical to the moment estimator.
Example: Let X be exponentially distributed with parameter λ. The likeli-
hood function of a random sample of size n, say X1, X2, ..., Xn, is:

L(λ) = Π_{i=1}^{n} λe^{−λxi} = λ^n e^{−λ Σ_{i=1}^{n} xi}

The log likelihood is:

ln(L(λ)) = n ln(λ) − λ Σ_{i=1}^{n} xi

Upon differentiating with respect to λ, one obtains:

d ln(L(λ))/dλ = n/λ − Σ_{i=1}^{n} xi

and upon equating this last result to zero, we obtain:

λ̂ = n / Σ_{i=1}^{n} xi = 1/X̄

Thus the maximum likelihood estimator of λ is the reciprocal of the sample
mean. Observe that this is the same as the moment estimator.
Remark: The method of maximum likelihood can be used in situations
where there are several unknown parameters, say θ1 , θ2 , ..., θk to estimate. In
such cases, the likelihood function is a function of the k unknown parameters
θ1 , θ2 , ..., θk and the maximum likelihood estimators Θ̂1 , Θ̂2 , ..., Θ̂k would be
found by equating the k partial derivatives ∂L(θ1 , θ2 , ..., θk )/∂θi , i = 1, 2, ..., k to
zero and solving the resulting system of equations.
Example: Let X be normally distributed with mean µ and variance σ², where
both µ and σ² are unknown. The likelihood function for a random sample of
size n is:

L(µ, σ²) = Π_{i=1}^{n} (1/(σ√(2π))) e^{−(xi−µ)²/(2σ²)} = (2πσ²)^{−n/2} e^{−(1/(2σ²)) Σ_{i=1}^{n} (xi−µ)²}

Upon taking natural logarithms, we obtain:

ln(L(µ, σ²)) = −(n/2) ln(2πσ²) − (2σ²)^{−1} Σ_{i=1}^{n} (xi − µ)²

After differentiating with respect to µ and σ² and equating to zero, one
obtains:

∂ ln(L(µ, σ²))/∂µ = (1/σ²) Σ_{i=1}^{n} (xi − µ) = 0

∂ ln(L(µ, σ²))/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (xi − µ)² = 0

The solutions to the above equations yield the maximum likelihood estima-
tors, given as:

µ̂ = X̄ and σ̂² = (1/n) Σ_{i=1}^{n} (Xi − X̄)².

Once again, the maximum likelihood estimators are equal to the moment
estimators.
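
When closed forms are not available, the likelihood can be maximized numerically. The sketch below (an addition to the notes, assuming NumPy and SciPy) minimizes the negative of the normal log-likelihood derived above on synthetic data and recovers µ̂ = x̄ and σ̂² = (1/n) Σ(xi − x̄)²:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=200)    # synthetic sample (assumed parameters)

def neg_log_lik(params):
    # -ln L(mu, sigma^2); we optimize log(sigma^2) so the variance stays positive
    mu, log_s2 = params
    s2 = np.exp(log_s2)
    return 0.5 * len(x) * np.log(2 * np.pi * s2) + np.sum((x - mu) ** 2) / (2 * s2)

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, s2_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, s2_hat)    # close to x.mean() and x.var(ddof=0)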

Properties of the Maximum Likelihood Estimator


The method of maximum likelihood is often the estimation method that math-
ematical statisticians prefer, because it is usually easy to use and produces
estimators with good statistical properties. We summarize these properties as
follows.
Under very general and not restrictive conditions, when the sample size n is
large and if Θ̂ is the maximum likelihood estimator of the parameter θ,
(1) Θ̂ is an approximately unbiased estimator for θ, i.e., E(Θ̂) ≈ θ.
(2) The variance of Θ̂ is nearly as small as the variance that could be obtained
with any other estimator.
(3) Θ̂ has an approximate normal distribution.
Properties 1 and 2 essentially state that the maximum likelihood estimator
is approximately an MVUE. This is a very desirable result and, coupled with
the fact that it is fairly easy to obtain in many situations and has an asymptotic
normal distribution (the "asymptotic" means "when n is large"), explains why
the maximum likelihood estimation technique is widely used. To use maximum
likelihood estimation, remember that the distribution of the population must be
either known or assumed.
To illustrate the "large-sample" or asymptotic nature of the above properties,
consider the maximum likelihood estimator for σ2 , the variance of the normal
distribution done in one of the previous examples. It is easy to show that:

n−1 2
E(σˆ2 ) = σ
n
The bias is:

n−1 2 −σ2
E(σˆ2 ) − σ2 = σ − σ2 =
n n
ˆ
Because the bias is negative, σ tends to underestimate the true variance
2

σ2 . Note that the bias approaches zero as n increases. Therefore, σˆ2 is an


asymptotically unbiased estimator for σ2 .
Complications in Using Maximum Likelihood Estimation: While the
method of maximum likelihood is an excellent technique, sometimes com-
plications arise in its use. For example, it is not always easy to maximize the
likelihood function because the equation(s) obtained from dL(θ)/dθ = 0 may be
difficult to solve. Furthermore, it may not always be possible to use calculus
methods directly to determine the maximum of L(θ).

Exercise
1. Consider the Poisson distribution

f(x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, ...

Find the maximum likelihood estimator of λ, based on a random sample
of size n.
2. Consider the shifted exponential distribution

f(x) = λe^{−λ(x−θ)},  x ≥ θ

When θ = 0, this density reduces to the usual exponential distribution.
When θ > 0, there is positive probability only to the right of θ.
(a) Find the maximum likelihood estimator of λ and θ, based on a
random sample of size n.
(b) Describe a practical situation in which one would suspect that the
shifted exponential is a plausible model.
3. Let X be a random variable with the following probability distribution:

f(x) = (θ + 1)x^θ for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere.

Find the maximum likelihood estimator of θ, based on a random sample
of size n.
4. Consider the probability distribution given in Exercise 3. Find the moment
estimator of θ.
5. Let X1 , X2 , ..., Xn be uniformly distributed on the interval 0 to a. Show
that the moment estimator of a is â = 2X̄. Is this an unbiased estimator?
Discuss the reasonableness of this estimator.
6. Consider the probability density function:

f(x) = (1/θ²) x e^{−x/θ},  0 ≤ x < ∞,  0 < θ < ∞.

Find the maximum likelihood estimator for θ.
7. Consider the probability density function:

f(x) = c(1 + θx),  −1 ≤ x ≤ 1.

(a) Find the value of the constant c.


(b) Find both the moment and maximum likelihood estimators for θ.

Sampling is the process of selecting units from a population of interest. By


studying the sample, we may fairly generalize our results back to the population
from which the sample was taken. For example, we might state that the cost of
building a house in Kampala city is between 70M and 120M, based on estimates of
5 contractors selected at random from 50 city contractors. There are different
types of sampling techniques such as simple random sampling, stratified random
sampling, cluster sampling and systematic sampling.

4 Sampling distributions
Consider a population with mean µ and standard deviation σ. If we choose
a sample from this population, we need to study the characteristics of its statistics.

Definition 1 A parameter is any value that describes the characteristics of a population.


For instance, population mean and population variance are all parameters.

Definition 2 A statistic is any value that describes the characteristics of a sample. For
instance, the sample mean, X̄, and the sample variance S2 .

Definition 3 The probability distribution of a statistic is called the sampling distribu-


tion of that statistic.

4.1 Degrees of Freedom


Consider a sample of size n = 4 containing the following data points: x1 =
10, x2 = 12, x3 = 16 and x4 unknown. Given that the sample mean x̄ = 14, and given the
values of three data points and the sample mean, the value of the fourth data
point can be determined: x4 = 4(14) − (10 + 12 + 16) = 18.

If only two data points and the sample mean are known, the values of the
remaining two data points cannot be uniquely determined.

The number of degrees of freedom is equal to the total number of measurements


(these are not always raw data points), less the total number of restrictions on
the measurements. A restriction is a quantity computed from the measurements.

The sample mean is a restriction on the sample measurements, so after calculating
the sample mean there are only (n − 1) degrees of freedom remaining with which
to calculate the sample variance. The sample variance is based on only (n − 1)
free data points: s² = Σ(x − x̄)²/(n − 1).

Example 1 A company manager has a total budget of 150, 000 to be completely allocated
to four different projects. How many degrees of freedom does the manager have?
x1 + x2 + x3 + x4 = 150, 000. A fourth project’s budget can be determined from the total
budget and the individual budgets of the other three. For example, if x1 = 40, 000, x2 =
30, 000, x3 = 50, 000, then x4 = 150, 000 − 40, 000 − 30, 000 − 50, 000 = 30, 000. So
there are (n − 1) = 3 degrees of freedom.

5 Sampling Distributions of the Mean


Suppose that a random sample of size n is taken from a normal population
with mean µ and variance σ². Then each observation Xi, i = 1, 2, . . . , n, of the random
sample will have the same normal distribution as the population being sampled,
and the mean of these observations,

X̄ = (1/n)(X1 + X2 + ... + Xn) = (1/n) Σ_{i=1}^{n} Xi,

has a normal distribution with mean µX̄ = µ and variance σ²X̄ = σ²/n. Indeed, it is
straightforward to prove that the mean of X̄ is µ and the variance of X̄ is σ²X̄ = σ²/n.
Thus,

µX̄ = E(X̄) = E[(1/n) Σ_{i=1}^{n} Xi] = (1/n) Σ_{i=1}^{n} E(Xi) = (1/n) Σ_{i=1}^{n} µ = (1/n)·nµ = µ

and

Var(X̄) = σ²X̄ = Var[(1/n) Σ_{i=1}^{n} Xi] = (1/n²) Σ_{i=1}^{n} Var(Xi) = (1/n²) Σ_{i=1}^{n} σ² = (1/n²)·nσ² = σ²/n

Definition 1 The standard deviation of the mean X̄ of a sample of size n,
given by σ/√n, is called the standard error of the sample mean.

Let us consider a population of size four consisting of the values 0, 1, 2 and 3. Clearly
the four observations making up the population are values of a random variable
X having the probability distribution:

f(x) = 1/4, for x = 0, 1, 2, 3

The population mean is µ = E(X) = Σ_{x=0}^{3} x f(x) = (0 + 1 + 2 + 3)/4 = 3/2 and

Var(X) = E(X²) − [E(X)]² = Σ_{x=0}^{3} x² f(x) − 9/4 = (1 + 4 + 9)/4 − 9/4 = 5/4.

Suppose that we take subsets of the population of size 2. If these subsets
of the population are chosen in such a way that each element in the population
has an equal chance of being selected and each subset is equally likely to be
selected, a selected subset is called a random sample.

If we enumerate all samples of size two without replacement from a popula-
tion of four objects, and for each sample we compute its mean, the following
table shows the possible samples and their computed sample means.

No.  Sample  Sample mean x̄     No.  Sample  Sample mean x̄
1    (0,1)   0.5               7    (2,0)   1.0
2    (0,2)   1.0               8    (2,1)   1.5
3    (0,3)   1.5               9    (2,3)   2.5
4    (1,0)   0.5               10   (3,0)   1.5
5    (1,2)   1.5               11   (3,1)   2.0
6    (1,3)   2.0               12   (3,2)   2.5

Observe from the table above that the sample mean varies from sample to
sample. Thus, the sample mean is a random variable, denoted by X̄. The value
of the random variable X̄ for a given sample is denoted by x̄. The sample mean,
like the sample median or sample variance is a statistic. A statistic is a random
variable that depends only on the observed sample but not on the population
of interest. Since a statistic is a random variable, it must have a probability
distribution.

The table below shows the sampling distribution of the sample mean X̄.

x̄      0.5  1.0  1.5  2.0  2.5
P(x̄)   1/6  1/6  1/3  1/6  1/6

The mean of the sample mean is E(X̄) = Σ x̄ f(x̄) = 0.5 × 1/6 + 1 × 1/6 + 1.5 × 2/6 +
2 × 1/6 + 2.5 × 1/6 = 3/2. Note that E(X̄) (the mean of the sample mean) is equal to the
population mean, and this is generally true. Thus E(X̄) = µ.

5.1 Variance
Var(X̄) = E(X̄²) − [E(X̄)]². Here E(X̄²) = 0.5² × 1/6 + 1² × 1/6 + 1.5² × 2/6 + 2² × 1/6 + 2.5² × 1/6 = 16/6.
Therefore, Var(X̄) = 16/6 − 9/4 = 5/12.
It can be shown that Var(X̄) = (σ²/n) · (N − n)/(N − 1), where σ² is the population variance, N the
population size, and n is the sample size.

The standard deviation of the sample mean is given by (σ/√n) √((N − n)/(N − 1)). This is called
the standard error of the sample mean X̄ when sampling without replacement.
If we sample with replacement, the variance of the sample mean is given by σ²/n.
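
These results can be verified by brute-force enumeration (an added sketch in plain Python): list all ordered samples of size 2 drawn without replacement from {0, 1, 2, 3} and compute the mean and variance of the 12 sample means:

from itertools import permutations
from statistics import mean

population = [0, 1, 2, 3]
means = [mean(s) for s in permutations(population, 2)]  # 12 ordered samples

e_xbar = mean(means)                                  # 1.5 = population mean
var_xbar = mean(m ** 2 for m in means) - e_xbar ** 2  # 5/12, matching the formula
print(e_xbar, var_xbar)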

Exercise
From the previous example, list all samples of size two drawn with replacement
and verify that E(X̄) = µ and Var(X̄) = σ²/n, where µ is the population mean and
σ² is the population variance.

Note
Recall that (N − n)/(N − 1) is called the finite population correction, which is approximately
equal to 1 if n, the sample size, is small compared to the population size
N. Thus if N ≫ n, the variance of the distribution of the sample mean X̄ when
sampling without replacement can be approximated by σ²/n, which is the same as
the variance of the distribution of X̄ when sampling with replacement.



5.2 Sampling Error of X̄

Var(X̄) = σ²/n and s.e.(X̄) = σ/√n. Differences arising from taking a sample rather
than the whole population are referred to as sampling errors. The standard
error of the sample mean depends on the population variance σ² and the sample
size. Sampling errors are smaller if you sample from a population with
small variance and if the sample size is large.

The Central Limit Theorem I

For a general population with mean µ and variance σ², the distribution of the
sample mean will be approximately normal with mean µX̄ = µ and standard
deviation σX̄ = σ/√n; that is, X̄ ∼ N(µ, σ²/n).

The theorem states that if all possible random samples of size n are drawn
with replacement from a finite population of size N with mean µ and standard
deviation σ, then for n sufficiently large, the sampling distribution of the mean
X̄ will be approximately normally distributed with mean µX̄ = µ and standard
deviation σX̄ = σ/√n. Hence z = (x̄ − µ)/(σ/√n) is a value of a standard normal variable Z.

Thus we can calculate, for any population, the approximate probability P(a <
X̄ < b) using the normal distribution. The normal approximation above will be
good if n ≥ 30 for any finite population. If n < 30, the approximation is good
only if the population is not too different from the normal population. If the
population is known to be normal, the sampling distribution of X̄ will follow a
normal distribution exactly, no matter how small the size of the samples.
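
A simulation sketch of the theorem (added, assuming NumPy): sample means of size n = 36 from a decidedly non-normal uniform population have mean close to µ and variance close to σ²/n:

import numpy as np

rng = np.random.default_rng(3)
n, reps = 36, 100_000
samples = rng.uniform(0.0, 12.0, size=(reps, n))  # population: mu = 6, sigma^2 = 12

x_bars = samples.mean(axis=1)
print(x_bars.mean(), x_bars.var())   # ~6 and ~12/36 = 0.333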
Example 2 A soft drink machine is set so that the amount of drink dispensed is a
random variable with mean 200 ml and a standard deviation of 15 ml. What is the
probability that the average amount dispensed in a random sample of size 36 is at least
204 ml?
Solution
n = 36, µ = 200, σ = 15. By the CLT, the distribution of X̄ is approximately normal
with mean µX̄ = 200 and standard deviation σX̄ = 15/√36 = 2.5.
Therefore P(X̄ ≥ 204) = P((X̄ − 200)/2.5 ≥ (204 − 200)/2.5) = P(Z ≥ 1.6) = 0.0548.
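
The same probability can be obtained directly (an added check, assuming SciPy):

import math
from scipy.stats import norm

se = 15 / math.sqrt(36)                  # standard error: 2.5
p = norm.sf(204, loc=200, scale=se)      # survival function: P(X_bar >= 204)
print(p)                                 # ~0.0548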
Example 3 Given the population 1, 1, 1, 3, 4, 5, 6, 6, 6, and 7. Find the probability that
a random sample of size 36, selected with replacement, will yield a sample mean greater
than 3.8 but less than 4.5 if the mean is measured to the nearest tenth.
Solution
The probability distribution of our population may be written as:

x 1 3 4 5 6 7
P(X = x) 0.3 0.1 0.1 0.1 0.3 0.1

The mean is given by Σ x p(x) = 1(0.3) + 3(0.1) + 4(0.1) + 5(0.1) + 6(0.3) + 7(0.1) = 4.
The variance is given by Σ x² p(x) − (Σ x p(x))² = 1²(0.3) + 3²(0.1) + 4²(0.1) +
5²(0.1) + 6²(0.3) + 7²(0.1) − 16 = 5. The sampling distribution of X̄ may be
approximated by the normal distribution with mean µX̄ = µ = 4 and variance
σ²X̄ = σ²/n = 5/36. Thus, the standard deviation is σX̄ = 0.373. Therefore, since the mean
is measured to the nearest tenth, the probability that X̄ is greater than 3.8 and less
than 4.5 is the area under the normal curve between x̄1 = 3.85 and x̄2 = 4.45.
The z values that correspond to x̄1 = 3.85 and x̄2 = 4.45 are z1 = (3.85 − 4)/0.373 = −0.40
and z2 = (4.45 − 4)/0.373 = 1.21. Therefore, P(3.8 < X̄ < 4.5) ≈ P(−0.40 < Z < 1.21) = P(Z <
1.21) − P(Z < −0.40) = 0.8869 − 0.3446 = 0.5423.

6 Sampling Distribution of the Differences of Means


Suppose that we have two populations, the first having mean µ1 and
variance σ1². Suppose further that the mean and variance of the second
population are µ2 and σ2², respectively. Let the variable X̄1 represent the means
of random samples of size n1 drawn from the first population and the variable X̄2
represent the means of random samples of size n2 drawn from the second population,
independent of the samples from the first population. The distribution of the
differences x̄1 − x̄2 between the two sets of independent sample means is known
as the sampling distribution of the statistic X̄1 − X̄2.

Example 4 Let the first population consist of 3, 5 and 7, while the second population
comprises 0, 4 and 8. Derive the sampling distribution of X̄1 − X̄2.

Solution

It follows that the population means and variances are:

µ1 = (3 + 5 + 7)/3 = 5
σ1² = [(3 − 5)² + (5 − 5)² + (7 − 5)²]/3 = 8/3 = 2.6667
µ2 = (0 + 4 + 8)/3 = 4
σ2² = [(0 − 4)² + (4 − 4)² + (8 − 4)²]/3 = 32/3 = 10.6667

We then draw all possible samples of size n1 = 2 with replacement from the first
population, and for each sample the mean x̄1 is computed. Similarly, for the
second population, samples of size n2 = 2 are drawn with replacement and the
corresponding x̄2 for each sample is computed.

The possible samples from the two populations with their means are given in
Table 1. There are 81 possible differences (x̄1 − x̄2 ) and these are shown in Table
2. The frequency distribution of X̄1 − X̄2 is given in Table 3 with corresponding
probability histogram shown in Figure 3.

It can be observed from the probability histogram in Figure 3 that the sam-
pling distribution of X̄1 − X̄2 may be approximated by a normal curve. This
approximation improves as n1 and n2 increase.
The sampling distributions for X̄1 and X̄2 are shown in Table 4, and from
this table we obtained the quantities:

µX̄1 = 5, σ²X̄1 = 1.3333, µX̄2 = 4, σ²X̄2 = 5.3333.

From Table 3, we obtained µX̄1−X̄2 = 1 and σ²X̄1−X̄2 = 6.6667.
From the computations done, we observe that:

µX̄1−X̄2 = µX̄1 − µX̄2 = µ1 − µ2 and σ²X̄1−X̄2 = σ²X̄1 + σ²X̄2 = σ1²/n1 + σ2²/n2

Remark
For the sampling distribution of X̄1 − X̄2, it is enough to stop your computation
at Table 3.

Figure 3: Probability histogram of X̄1 -X̄2 with replacement.

The Central Limit Theorem II


If independent samples of sizes n1 and n2 are drawn at random from two
populations, discrete or continuous, with means µ1 and µ2 and variances σ1²
and σ2², respectively, then the sampling distribution of the differences of means
X̄1 − X̄2 is approximately normally distributed with mean µX̄1−X̄2 = µ1 − µ2 and

Table 1: Means of Random Samples with Replacement from Two Finite Popula-
tions.
From Population 1 From Population 2
No. Sample x̄1 No. Sample x̄2
1 3, 3 3 1 0, 0 0
2 3, 5 4 2 0, 4 2
3 3, 7 5 3 0, 8 4
4 5, 3 4 4 4, 0 2
5 5, 5 5 5 4, 4 4
6 5, 7 6 6 4, 8 6
7 7, 3 5 7 8, 0 4
8 7, 5 6 8 8, 4 6
9 7, 7 7 9 8, 8 8

variance σ²X̄1−X̄2 = σ1²/n1 + σ2²/n2. Hence

Z = [(X̄1 − X̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)

is the value of the standard normal random variable Z.


Example 5 A sample of size n1 = 5 is drawn from a population that is normally
distributed with mean µ1 = 50 and variance σ1² = 9, and the sample mean X̄1 is recorded.
A second random sample of size n2 = 4 is drawn, independent of the first, from a different
population that is normally distributed with mean µ2 = 40 and variance σ2² = 4, and
the sample mean X̄2 is recorded. Find P(X̄1 − X̄2 < 8.2).

Solution

From the sampling distribution of X̄1 − X̄2, we have µX̄1−X̄2 = µ1 − µ2 =
50 − 40 = 10 and σ²X̄1−X̄2 = σ1²/n1 + σ2²/n2 = 9/5 + 4/4 = 2.8. Therefore, corresponding to the
value X̄1 − X̄2 = 8.2, we have

z = [(X̄1 − X̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2) = (8.2 − 10)/√2.8 = −1.08

Thus, P(X̄1 − X̄2 < 8.2) = P(Z < −1.08) = 0.1401
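
A quick check with SciPy (an added sketch; the small difference from 0.1401 comes from rounding z to −1.08 in the hand computation):

import math
from scipy.stats import norm

mu_diff = 50 - 40                        # mean of X1_bar - X2_bar
sd_diff = math.sqrt(9 / 5 + 4 / 4)       # sqrt(2.8)
print(norm.cdf(8.2, loc=mu_diff, scale=sd_diff))  # ~0.141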

Example 6 Phones from manufacturer A have a mean guarantee period of 6.5 years and
standard deviation of 0.9 years, while those of manufacturer B have a mean guarantee

Table 2: Differences of Independent Means (x̄1 − x̄2).

x̄2 \ x̄1   3   4   5   4   5   6   5   6   7
0          3   4   5   4   5   6   5   6   7
2          1   2   3   2   3   4   3   4   5
4         −1   0   1   0   1   2   1   2   3
2          1   2   3   2   3   4   3   4   5
4         −1   0   1   0   1   2   1   2   3
6         −3  −2  −1  −2  −1   0  −1   0   1
4         −1   0   1   0   1   2   1   2   3
6         −3  −2  −1  −2  −1   0  −1   0   1
8         −5  −4  −3  −4  −3  −2  −3  −2  −1

Table 3: Sampling Distribution of X̄1 − X̄2 with Replacement.

x̄1 − x̄2   f    f(x̄1 − x̄2)      x̄1 − x̄2   f    f(x̄1 − x̄2)
−5         1    1/81             2          10   10/81
−4         2    2/81             3          10   10/81
−3         5    5/81             4           6    6/81
−2         6    6/81             5           5    5/81
−1        10   10/81             6           2    2/81
 0        10   10/81             7           1    1/81
 1        13   13/81             Σf = 81    Σf(x̄1 − x̄2) = 1

period of 6.0 years and a standard deviation of 0.8 years. What is the probability that a
random sample of 36 phones from manufacturer A will have a mean guarantee period
that is at least one year more than the mean guarantee period of a sample of 49 phones
from manufacturer B?

Solution

From the sampling distribution of X̄1 − X̄2, we have µX̄1−X̄2 = µ1 − µ2 = 6.5 − 6.0 =
0.5 and σX̄1−X̄2 = √(σ1²/n1 + σ2²/n2) = √(0.81/36 + 0.64/49) = 0.189. Since X̄1 − X̄2 = 1, then

z = [(X̄1 − X̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2) = (1.0 − 0.5)/0.189 = 2.65

Thus, P(X̄1 − X̄2 > 1.0) = P(Z > 2.65) = 1 − P(Z < 2.65) = 1 − 0.9960 = 0.0040

Remark
When sampling with replacement, the mean of the sample mean is µX̄ = µ and
the variance of the sample mean is σ²X̄ = σ²/n. When sampling without replacement,
the mean of the sample mean is µX̄ = µ and the standard deviation of the sample
mean is σX̄ = (σ/√n) √((N − n)/(N − 1)), where N is the population size and n is the
sample size. We define the factor √((N − n)/(N − 1)) as the finite population correction factor.

Table 4: Sampling Distribution of X̄1 and X̄2.

X̄1 = x̄1   3    4    5    6    7
f(x̄1)     1/9  2/9  1/3  2/9  1/9
X̄2 = x̄2   0    2    4    6    8
f(x̄2)     1/9  2/9  1/3  2/9  1/9

The Student t-distribution


The CLT tells us that if X ∼ N(µ, σ²) then Z = (X̄ − µ)/(σ/√n) is approximately normal with
mean 0 and variance 1. In most cases, though, the variance of the population
from which we select our sample is unknown. When the population variance is
unknown, we use the sample variance S² as the best estimate for the population
variance. There are two cases:

(i) The first case is when n ≥ 30. For n ≥ 30, the statistic (X̄ − µ)/(S/√n) is approximately
normal with mean 0 and variance 1. It is important to recall that for a
sample of size n, the sample variance is given by S² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)².

(ii) For samples of size n < 30, the values of S² fluctuate considerably from
sample to sample, and as such, by estimating σ² with S², the values of Z are
no longer normal. This shortcoming is resolved by using a distribution
called the t-distribution.

Definition 2 If X̄ and S² are the mean and variance, respectively, of
a random sample of size n taken from a population that is normally
distributed with mean µ and unknown variance σ², then

t = (X̄ − µ)/(S/√n)

is the value of a random variable T having the t-distribution with
ν = n − 1 degrees of freedom.

The probability that a random sample produces a t value falling between any
two specified values is equal to the area under the curve of the t-distribution
between the two ordinates corresponding to the specified values.

Figure: The t-distribution is symmetric about zero; equal tail areas α lie to the left of t1−α = −tα and to the right of tα.

Example 7 For example, the t value with 10 degrees of freedom leaving an area of
0.025 to the right is t0.025 = 2.228. Since the t-distribution is symmetric about a mean of
zero, we have t1−α = −tα; that is, the t value leaving an area of 1 − α to the right (and
therefore an area of α to the left) is equal to the negative of the t value that leaves an area of α in
the right tail of the distribution; see the figure above. For a t-distribution with 10 degrees
of freedom we have t0.975 = −t0.025 = −2.228. This means that the t value of a random
sample of size 11 selected from a normal population will fall between −2.228 and 2.228
with probability 0.95.

Example 8 Find k such that P(k < T < −1.761) = 0.045, for a random sample of size
15 selected from a normal distribution.

Solution

Note from the tables that t0.05 = 1.761 when ν = 14. Therefore, −t0.05 = −1.761.
Since k is to the left of −t0.05 = −1.761, let k = −tα. Then, 0.045 = 0.05 − α, so
α = 0.005. Therefore, with ν = 14 and from the tables, k = −t0.005 = −2.977.
Therefore, P(−2.977 < T < −1.761) = 0.045.

Example 9 A manufacturer of batteries claims that they will last on average 500 hours.
To maintain this average, he tests 25 batteries each month. If the computed t value falls
between −t0.05 and t0.05, he is satisfied with his claim. What conclusion should he draw
from a sample that has a mean x̄ = 518 and a standard deviation s = 40 hours, if the
distribution of the battery times is approximately normal?

Solution

From the tables, t0.05 = 1.711 for ν = 24. Therefore, the manufacturer is
satisfied with his claim if a sample of 25 batteries yields a t value between −1.711
and 1.711. If µ = 500, then

t = (518 − 500)/(40/√25) = 2.25,

a value above 1.711. Now, P(T > 2.25) = 0.02. If µ > 500, the value of t computed
from the sample would be more reasonable. Therefore, the manufacturer is
likely to conclude that his batteries are a better product than he thought.
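
The critical value and tail probability in this example can be checked with SciPy (an added sketch; the text rounds P(T > 2.25) to 0.02):

import math
from scipy.stats import t

nu = 24
t_crit = t.ppf(0.95, df=nu)                    # t_{0.05} ~ 1.711
t_stat = (518 - 500) / (40 / math.sqrt(25))    # 2.25
p = t.sf(t_stat, df=nu)                        # P(T > 2.25) ~ 0.017
print(t_crit, t_stat, p)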

The Chi-square distribution


The probability for a random variable having a chi-square distribution is equal
to the area under a chi-square curve.

Definition 3 If S² is the variance of a random sample of size n taken from a
normal population having variance σ², then (n − 1)S²/σ² is a value of the random
variable χ² having the chi-square distribution with ν = n − 1 degrees of
freedom.

The probability that a random sample produces a χ² value greater than some
specified value is equal to the area under the curve to the right of this value.
Thus, χ²α represents the χ² value above which we find an area α.

Figure: A chi-square curve showing the area α to the right of χ²α.

Example 10 The value of χ²0.05 for ν = 7 is 14.067, which leaves an area of 0.05 to the right
of that χ² value. Due to lack of symmetry, values for χ²1−α are also provided in the tables.
For instance, χ²0.95 = 2.167 for ν = 7.
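
These chi-square values can be checked with SciPy (an added sketch):

from scipy.stats import chi2

nu = 7
print(chi2.ppf(0.95, df=nu))   # chi^2_{0.05} = 14.067 (area 0.05 to the right)
print(chi2.ppf(0.05, df=nu))   # chi^2_{0.95} = 2.167  (area 0.95 to the right)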

Exercise
1. Random samples of size 2 are drawn, with replacement, from the finite
population consisting of 2, 4 and 6.

(a) Assuming that all possible samples are equally likely to occur, con-
struct the sampling distribution of X̄.
(b) Draw a probability histogram for the sampling distribution of X̄.
(c) Verify that µX̄ = µ and σ²X̄ = σ²/n.

2. Random samples of size 2 are drawn without replacement from a finite
population comprising 1, 2, 3 and 4.

(a) Assuming that all possible samples are equally likely to occur, con-
struct the sampling distribution of X̄.
(b) Verify that µX̄ = µ and σX̄ = (σ/√n) √((N − n)/(N − 1)).

3. Marks of students in one of the tests for Elements of Probability and
Statistics had a mean of 78% and a standard deviation of 5.6%. Assuming
that the population is finite, how is the standard error of the mean changed
when the sample size is:

(a) increased from 49 to 225.
(b) decreased from 900 to 64.

4. If the standard error of the mean for the sampling distribution of random
samples of size 64 from a large or infinite population is 1.8, how large
must the size of the sample become if the standard error is reduced to 0.8?

5. Find the value of the finite population correction factor when:

(a) n = 4 and N = 16.


(b) n = 30 and N = 1000.
(c) n = 50 and N = 20000.
(d) n = 100 and N = 1000000.

6. If all possible samples of size 25 are drawn from a normal population with
mean equal to 60 and a standard deviation of 5.5, what is the probability
that a sample mean X̄ will fall in the interval from µX̄ − 2.5σX̄ to µX̄ + 1.5σX̄ ?

7. All possible samples of size 5 are drawn from a population without
replacement. The mean and standard deviation of the population are equal to
20 and 2.5, respectively. Given that N = 50 for the population, what is the
probability that a sample mean X̄ will fall in the interval from µX̄ + 0.5σX̄
to µX̄ + 3.5σX̄?

8. A soft-drink machine is regulated so that the amount of soda dispensed in
the 300 millilitre bottles averages 299 millilitres with a standard deviation
of 10 millilitres. Regularly, the machine is inspected by taking a random
sample of 100 packed bottles and computing their average soda content.
If the mean of the 100 bottles is a value within the interval µX̄ ± 2.75σX̄,
If the mean of the 100 bottles is a value within the interval µX̄ ± 2.75σX̄ ,
the machine is thought to be operating well and therefore no action is
required. In one of quality assurance tests, an official found the mean
of 100 packed bottles to be x̄ = 300.5 millilitres and concluded that the
machine needed no action to be taken. Was this a reasonable decision?

9. The marks of 500 students in EPS examination done last year was approxi-
mately normally distributed with a mean of 72% and a standard deviation
of 8%. If 80 random samples of size 16 are drawn from this population
and the means recorded, determine:

(a) the mean and standard error of the sampling distribution of X̄.
(b) the number of sample means that fall between 68% and 76%.
(c) the probability that the sample means fall below 62%.

10. The random variable X representing the number of children in a household
has the following probability distribution:

x        0     1    2    3    4    5
P(X=x)   0.15  0.1  0.3  0.2  0.1  0.15

(a) Find the mean µ and the variance σ² of X.
(b) Find the mean µX̄ and variance σ²X̄ of the mean X̄ for random samples
of 36 households.
(c) Find the probability that the average number of children in 36
households will be less than 3.

11. Let X̄1 represent the mean of a sample of size n1 = 2, selected with
replacement from the finite population −1, 0, 1 and 2. Similarly, let X̄2
represent the mean of a sample of size n2 = 2, selected with replacement
from the finite population −1 and 2.

(a) Assuming that all the 64 possible differences x̄1 − x̄2 are equally likely
to occur, construct the sampling distribution of X̄1 − X̄2.
(b) Construct a probability histogram for the sampling distribution of
X̄1 − X̄2.
(c) Verify that µX̄1−X̄2 = µ1 − µ2 and σ²X̄1−X̄2 = σ1²/n1 + σ2²/n2.

12. A random sample of size 25 is taken from a normally distributed population
having a mean of 100 and a variance of 25. A second random sample of

having a mean of 100 and a variance of 25. A second random sample of
size 36 is taken from a different normal population having a mean of 90
and a variance of 9. Find the probability that the sample mean computed
from the first sample will exceed the sample mean from the second random
sample by at least 8 but less than 11.

13. A market researcher studying the length of time shoppers spend in two
shopping malls observes a sample of 64 shoppers in each mall. The mean
time spent by shoppers in mall A is 50 minutes, while the mean time for
sample of shoppers in mall B is 46 minutes. What is the probability of
observing mean sample difference (X̄A − X̄B ) this large or larger if there is
no difference in the true mean time spent by shoppers in the two malls
and the standard deviation is 13 minutes for both populations?

14. An EPS test was given to two groups of students, i.e., group 1 and group
2. The scores for group 1 were normally distributed with mean and
variance of 60 and 100, respectively. Scores for group 2 were also normally
distributed with a mean of 50 and a variance of 121. A random sample of
10 students is selected from group 1 and an independent random sample of
size 11 is selected from group 2. What is the probability that the difference
between sample means (x̄1 − x̄2 ) is:

(a) greater than 16.


(b) between 6 and 18. (See the sketch below.)
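
A sketch for Question 14, assuming scipy; here x̄1 − x̄2 is normal with mean
10 and variance 100/10 + 121/11 = 21:

    from scipy.stats import norm

    mu_d, sd_d = 60 - 50, 21 ** 0.5
    print(1 - norm.cdf(16, mu_d, sd_d))                          # (a) ~0.095
    print(norm.cdf(18, mu_d, sd_d) - norm.cdf(6, mu_d, sd_d))    # (b) ~0.77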

15. The mean score for an aptitude test done by students intending to study law
at a certain university was 560 with a standard deviation of 55. What is
the probability that two groups of students selected at random, consisting
of 40 and 55 students, respectively, will differ in their mean scores by:

(a) more than 25 points.


(b) any amount between 15 and 35. (See the sketch below.)
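
For Question 15 the two group means have the same expectation, so their
difference is centred at 0 with standard deviation 55·sqrt(1/40 + 1/55), and
"differ by" is read in absolute value. A sketch assuming scipy:

    from scipy.stats import norm

    sd_d = 55 * (1/40 + 1/55) ** 0.5          # about 11.4
    print(2 * (1 - norm.cdf(25 / sd_d)))                          # (a) ~0.029
    print(2 * (norm.cdf(35 / sd_d) - norm.cdf(15 / sd_d)))        # (b) ~0.19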

16. Work out the following:

(a) t0.05 when v = 11.


(b) −t0.025 when v = 28.
(c) t0.95 when v = 5.
(d) −t0.95 when v = 16.
(e) −t0.995 when v = 2. (A numerical check follows.)
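
Values such as these can be checked against Table A.4 numerically. A sketch
assuming scipy, where tα denotes the value leaving area α in the upper tail,
so tα = t.ppf(1 − α, v) and t1−α = −tα:

    from scipy.stats import t

    print(t.ppf(1 - 0.05, 11))      # (a)  t_0.05,  v = 11: ~1.796
    print(-t.ppf(1 - 0.025, 28))    # (b) -t_0.025, v = 28: ~-2.048
    print(t.ppf(1 - 0.95, 5))       # (c)  t_0.95,  v = 5:  ~-2.015
    print(-t.ppf(1 - 0.95, 16))     # (d) -t_0.95,  v = 16: ~1.746
    print(-t.ppf(1 - 0.995, 2))     # (e) -t_0.995, v = 2:  ~9.925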

17. Evaluate the following probabilities:

(a) P(T < 1.812) when v = 10.


(b) P(−1.383 < T < 2.821) when v = 9.
(c) P(−2.447 < T < −1.440) when v = 6.
(d) P(1.325 < T < 2.845) when v = 20.
(e) P(T > −1.782) when v = 12.

18. Evaluate the following probabilities:

(a) P(−t0.01 < T < t0.01 ).


(b) P(T > −t0.005 ).
(c) P(t0.025 < T < t0.01 ).
(d) P(−t0.01 < T < −t0.1 ).
(e) P(−t0.05 < T < t0.01 ).

19. Given a random sample of size 16 from a normal distribution, find k such
that:

(a) P(−1.341 < T < k) = 0.89.


(b) P(2.131 < T < k) = 0.015.
(c) P(−2.947 < T < −k) = 0.095.
(d) P(−k < T < k) = 0.95.
(e) P(T > k) = 0.01. (A computational sketch for Questions 17 to 19 follows.)
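
A sketch for Questions 17 to 19, assuming scipy: probabilities come from the
t cumulative distribution function, and the k values from its inverse. For
Question 19 a sample of size 16 gives v = 15 degrees of freedom.

    from scipy.stats import t

    # Question 17
    print(t.cdf(1.812, 10))                      # (a) ~0.95
    print(t.cdf(2.821, 9) - t.cdf(-1.383, 9))    # (b) ~0.89
    # Question 19, v = 15
    print(t.ppf(0.975, 15))        # (d) k with P(-k < T < k) = 0.95: ~2.131
    print(t.ppf(1 - 0.01, 15))     # (e) k with P(T > k) = 0.01: ~2.602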

20. A researcher claims that the average age for first year students offering civil
engineering in a certain university is 19.0 years. To verify this claim, he
samples 20 students and computes both the sample mean and the sample
standard deviation and finally computes the t statistic. He concludes
that the claim is true if the computed statistic lies between −t0.05 and t0.05 .
Given that the sample of 20 first-year students consists of the following ages:
18.0, 18.4, 20.0, 19.2, 21.0, 18.3, 18.9, 19.4, 18.0, 18.5, 20.2, 19.2, 18.0, 19.0,
21.0, 19.5, 18.0, 20.4, 17.9, 22.0.
What conclusion can this researcher make?
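
A sketch of the researcher's computation in Question 20, assuming Python with
numpy and scipy:

    import numpy as np
    from scipy.stats import t

    ages = np.array([18.0, 18.4, 20.0, 19.2, 21.0, 18.3, 18.9, 19.4, 18.0, 18.5,
                     20.2, 19.2, 18.0, 19.0, 21.0, 19.5, 18.0, 20.4, 17.9, 22.0])
    n = len(ages)
    t_stat = (ages.mean() - 19.0) / (ages.std(ddof=1) / np.sqrt(n))  # ~0.92
    crit = t.ppf(1 - 0.05, n - 1)          # t_0.05 with v = 19: ~1.729
    print(-crit < t_stat < crit)           # True, so the claim is accepted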

Table A.4 Critical Values of the t-Distribution
(tα is the value leaving an area of α in the upper tail of the t-density.)

α
v 0.40 0.30 0.20 0.15 0.10 0.05 0.025
1 0.325 0.727 1.376 1.963 3.078 6.314 12.706
2 0.289 0.617 1.061 1.386 1.886 2.920 4.303
3 0.277 0.584 0.978 1.250 1.638 2.353 3.182
4 0.271 0.569 0.941 1.190 1.533 2.132 2.776
5 0.267 0.559 0.920 1.156 1.476 2.015 2.571
6 0.265 0.553 0.906 1.134 1.440 1.943 2.447
7 0.263 0.549 0.896 1.119 1.415 1.895 2.365
8 0.262 0.546 0.889 1.108 1.397 1.860 2.306
9 0.261 0.543 0.883 1.100 1.383 1.833 2.262
10 0.260 0.542 0.879 1.093 1.372 1.812 2.228
11 0.260 0.540 0.876 1.088 1.363 1.796 2.201
12 0.259 0.539 0.873 1.083 1.356 1.782 2.179
13 0.259 0.538 0.870 1.079 1.350 1.771 2.160
14 0.258 0.537 0.868 1.076 1.345 1.761 2.145
15 0.258 0.536 0.866 1.074 1.341 1.753 2.131
16 0.258 0.535 0.865 1.071 1.337 1.746 2.120
17 0.257 0.534 0.863 1.069 1.333 1.740 2.110
18 0.257 0.534 0.862 1.067 1.330 1.734 2.101
19 0.257 0.533 0.861 1.066 1.328 1.729 2.093
20 0.257 0.533 0.860 1.064 1.325 1.725 2.086
21 0.257 0.532 0.859 1.063 1.323 1.721 2.080
22 0.256 0.532 0.858 1.061 1.321 1.717 2.074
23 0.256 0.532 0.858 1.060 1.319 1.714 2.069
24 0.256 0.531 0.857 1.059 1.318 1.711 2.064
25 0.256 0.531 0.856 1.058 1.316 1.708 2.060
26 0.256 0.531 0.856 1.058 1.315 1.706 2.056
27 0.256 0.531 0.855 1.057 1.314 1.703 2.052
28 0.256 0.530 0.855 1.056 1.313 1.701 2.048
29 0.256 0.530 0.854 1.055 1.311 1.699 2.045
30 0.256 0.530 0.854 1.055 1.310 1.697 2.042
40 0.255 0.529 0.851 1.050 1.303 1.684 2.021
60 0.254 0.527 0.848 1.045 1.296 1.671 2.000
120 0.254 0.526 0.845 1.041 1.289 1.658 1.980
∞ 0.253 0.524 0.842 1.036 1.282 1.645 1.960

Table A.4 (continued) Critical Values of the t-Distribution


α
v 0.02 0.015 0.01 0.0075 0.005 0.0025 0.0005
1 15.894 21.205 31.821 42.433 63.656 127.321 636.578
2 4.849 5.643 6.965 8.073 9.925 14.089 31.600
3 3.482 3.896 4.541 5.047 5.841 7.453 12.924
4 2.999 3.298 3.747 4.088 4.604 5.598 8.610
5 2.757 3.003 3.365 3.634 4.032 4.773 6.869
6 2.612 2.829 3.143 3.372 3.707 4.317 5.959
7 2.517 2.715 2.998 3.203 3.499 4.029 5.408
8 2.449 2.634 2.896 3.085 3.355 3.833 5.041
9 2.398 2.574 2.821 2.998 3.250 3.690 4.781
10 2.359 2.527 2.764 2.932 3.169 3.581 4.587
11 2.328 2.491 2.718 2.879 3.106 3.497 4.437
12 2.303 2.461 2.681 2.836 3.055 3.428 4.318
13 2.282 2.436 2.650 2.801 3.012 3.372 4.221
14 2.264 2.415 2.624 2.771 2.977 3.326 4.140
15 2.249 2.397 2.602 2.746 2.947 3.286 4.073
16 2.235 2.382 2.583 2.724 2.921 3.252 4.015
17 2.224 2.368 2.567 2.706 2.898 3.222 3.965
18 2.214 2.356 2.552 2.689 2.878 3.197 3.922
19 2.205 2.346 2.539 2.674 2.861 3.174 3.883
20 2.197 2.336 2.528 2.661 2.845 3.153 3.850
21 2.189 2.328 2.518 2.649 2.831 3.135 3.819
22 2.183 2.320 2.508 2.639 2.819 3.119 3.792
23 2.177 2.313 2.500 2.629 2.807 3.104 3.768
24 2.172 2.307 2.492 2.620 2.797 3.091 3.745
25 2.167 2.301 2.485 2.612 2.787 3.078 3.725
26 2.162 2.296 2.479 2.605 2.779 3.067 3.707
27 2.158 2.291 2.473 2.598 2.771 3.057 3.689
28 2.154 2.286 2.467 2.592 2.763 3.047 3.674
29 2.150 2.282 2.462 2.586 2.756 3.038 3.660
30 2.147 2.278 2.457 2.581 2.750 3.030 3.646
40 2.123 2.250 2.423 2.542 2.704 2.971 3.551
60 2.099 2.223 2.390 2.504 2.660 2.915 3.460
120 2.076 2.196 2.358 2.468 2.617 2.860 3.373
∞ 2.054 2.170 2.326 2.432 2.576 2.807 3.290
Table A.5 Critical Values of the Chi-Squared Distribution
(χ²α is the value leaving an area of α in the upper tail of the chi-squared density.)

α
v 0.995 0.99 0.98 0.975 0.95 0.90 0.80 0.75 0.70 0.50
1 0.0000393 0.000157 0.000628 0.000982 0.00393 0.0158 0.0642 0.102 0.148 0.455
2 0.0100 0.0201 0.0404 0.0506 0.103 0.211 0.446 0.575 0.713 1.386
3 0.0717 0.115 0.185 0.216 0.352 0.584 1.005 1.213 1.424 2.366
4 0.207 0.297 0.429 0.484 0.711 1.064 1.649 1.923 2.195 3.357
5 0.412 0.554 0.752 0.831 1.145 1.610 2.343 2.675 3.000 4.351
6 0.676 0.872 1.134 1.237 1.635 2.204 3.070 3.455 3.828 5.348
7 0.989 1.239 1.564 1.690 2.167 2.833 3.822 4.255 4.671 6.346
8 1.344 1.647 2.032 2.180 2.733 3.490 4.594 5.071 5.527 7.344
9 1.735 2.088 2.532 2.700 3.325 4.168 5.380 5.899 6.393 8.343
10 2.156 2.558 3.059 3.247 3.940 4.865 6.179 6.737 7.267 9.342
11 2.603 3.053 3.609 3.816 4.575 5.578 6.989 7.584 8.148 10.341
12 3.074 3.571 4.178 4.404 5.226 6.304 7.807 8.438 9.034 11.340
13 3.565 4.107 4.765 5.009 5.892 7.041 8.634 9.299 9.926 12.340
14 4.075 4.660 5.368 5.629 6.571 7.790 9.467 10.165 10.821 13.339
15 4.601 5.229 5.985 6.262 7.261 8.547 10.307 11.037 11.721 14.339
16 5.142 5.812 6.614 6.908 7.962 9.312 11.152 11.912 12.624 15.338
17 5.697 6.408 7.255 7.564 8.672 10.085 12.002 12.792 13.531 16.338
18 6.265 7.015 7.906 8.231 9.390 10.865 12.857 13.675 14.440 17.338
19 6.844 7.633 8.567 8.907 10.117 11.651 13.716 14.562 15.352 18.338
20 7.434 8.260 9.237 9.591 10.851 12.443 14.578 15.452 16.266 19.337
21 8.034 8.897 9.915 10.283 11.591 13.240 15.445 16.344 17.182 20.337
22 8.643 9.542 10.600 10.982 12.338 14.041 16.314 17.240 18.101 21.337
23 9.260 10.196 11.293 11.689 13.091 14.848 17.187 18.137 19.021 22.337
24 9.886 10.856 11.992 12.401 13.848 15.659 18.062 19.037 19.943 23.337
25 10.520 11.524 12.697 13.120 14.611 16.473 18.940 19.939 20.867 24.337
26 11.160 12.198 13.409 13.844 15.379 17.292 19.820 20.843 21.792 25.336
27 11.808 12.878 14.125 14.573 16.151 18.114 20.703 21.749 22.719 26.336
28 12.461 13.565 14.847 15.308 16.928 18.939 21.588 22.657 23.647 27.336
29 13.121 14.256 15.574 16.047 17.708 19.768 22.475 23.567 24.577 28.336
30 13.787 14.953 16.306 16.791 18.493 20.599 23.364 24.478 25.508 29.336
40 20.707 22.164 23.838 24.433 26.509 29.051 32.345 33.66 34.872 39.335
50 27.991 29.707 31.664 32.357 34.764 37.689 41.449 42.942 44.313 49.335
60 35.534 37.485 39.699 40.482 43.188 46.459 50.641 52.294 53.809 59.335

Table A.5 (continued) Critical Values of the Chi-Squared Distribution


α
v 0.30 0.25 0.20 0.10 0.05 0.025 0.02 0.01 0.005 0.001
1 1.074 1.323 1.642 2.706 3.841 5.024 5.412 6.635 7.879 10.8
2 2.408 2.773 3.219 4.605 5.991 7.378 7.824 9.210 10.597 13.8
3 3.665 4.108 4.642 6.251 7.815 9.348 9.837 11.345 12.838 16.2
4 4.878 5.385 5.989 7.779 9.488 11.143 11.668 13.277 14.860 18.4
5 6.064 6.626 7.289 9.236 11.070 12.832 13.388 15.086 16.750 20.5
6 7.231 7.841 8.558 10.645 12.592 14.449 15.033 16.812 18.548 22.4
7 8.383 9.037 9.803 12.017 14.067 16.013 16.622 18.475 20.278 24.3
8 9.524 10.219 11.030 13.362 15.507 17.535 18.168 20.090 21.955 26.1
9 10.656 11.389 12.242 14.684 16.919 19.023 19.679 21.666 23.589 27.8
10 11.781 12.549 13.442 15.987 18.307 20.483 21.161 23.209 25.188 29.5
11 12.899 13.701 14.631 17.275 19.675 21.920 22.618 24.725 26.757 31.2
12 14.011 14.845 15.812 18.549 21.026 23.337 24.054 26.217 28.300 32.9
13 15.119 15.984 16.985 19.812 22.362 24.736 25.471 27.688 29.819 34.5
14 16.222 17.117 18.151 21.064 23.685 26.119 26.873 29.141 31.319 36.1
15 17.322 18.245 19.311 22.307 24.996 27.488 28.259 30.578 32.801 37.6
16 18.418 19.369 20.465 23.542 26.296 28.845 29.633 32.000 34.267 39.2
17 19.511 20.489 21.615 24.769 27.587 30.191 30.995 33.409 35.718 40.7
18 20.601 21.605 22.760 25.989 28.869 31.526 32.346 34.805 37.156 42.3
19 21.689 22.718 23.900 27.204 30.144 32.852 33.687 36.191 38.582 43.8
20 22.775 23.828 25.038 28.412 31.410 34.170 35.020 37.566 39.997 45.3
21 23.858 24.935 26.171 29.615 32.671 35.479 36.343 38.932 41.401 46.7
22 24.939 26.039 27.301 30.813 33.924 36.781 37.659 40.289 42.796 48.2
23 26.018 27.141 28.429 32.007 35.172 38.076 38.968 41.638 44.181 49.7
24 27.096 28.241 29.553 33.196 36.415 39.364 40.270 42.980 45.558 51.1
25 28.172 29.339 30.675 34.382 37.652 40.646 41.566 44.314 46.928 52.6
26 29.246 30.435 31.795 35.563 38.885 41.923 42.856 45.642 48.290 54.0
27 30.319 31.528 32.912 36.741 40.113 43.195 44.140 46.963 49.645 55.4
28 31.391 32.620 34.027 37.916 41.337 44.461 45.419 48.278 50.994 56.8
29 32.461 33.711 35.139 39.087 42.557 45.722 46.693 49.588 52.335 58.3
30 33.530 34.800 36.250 40.256 43.773 46.979 47.962 50.892 53.672 59.7
40 44.165 45.616 47.269 51.805 55.758 59.342 60.436 63.691 66.766 73.4
50 54.723 56.334 58.164 63.167 67.505 71.420 72.613 76.154 79.490 86.6
60 65.226 66.981 68.972 74.397 79.082 83.298 84.58 88.379 91.952 99.6
