Chapter 8
Chapter Goals
After completing this chapter, you are expected to:
• Distinguish between a point estimate and an interval
estimate.
• Construct and interpret a confidence interval estimate
for a single population mean.
• Describe what a statistical hypothesis is.
• Know what Type I and Type II errors are.
• Test a hypothesis about a single population mean.
• Conduct a hypothesis test of association between
variables.
Statistical Inference
• Statistical inference deals with making inferences (under
uncertainty) about a population based on the information
provided by a sample.
Descriptive Statistics               | Statistical Inference
Without generalization               | Generalize on the basis of sample information
Describe data with summary measures  | Estimate population parameters
Display data with graphical aids     | Test hypotheses about values of parameters
Statistical Inference . . .
• Statistical inference is divided into two major classes:
estimation and hypothesis testing.
• Parameter estimation: finding optimal estimates of the
unknown parameter on the basis of sample information.
• Hypothesis testing: deals with testing whether a prior
assumption of the value of a parameter of a model is
supported by empirical data.
Estimation
• A population parameter is a numerical measure of a
summary characteristic of a population (e.g., μ, σ).
• A sample statistic is a numerical measure of a summary
characteristic of a sample (e.g., x̄, s).
• An estimator of a population parameter is a sample
statistic used to estimate the population parameter.
• An estimate is a particular numerical value of a sample
statistic (an estimator is a function; an estimate is a
value of that function).
• A point estimate is a single value used as an estimate of
a population parameter.
• An interval estimate is a range of values intended to
contain the value of a population parameter.
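The distinction between an estimator and an estimate can be sketched in Python. The function name `sample_mean` and the data values are illustrative assumptions, not taken from the chapter.

```python
# Hypothetical sample of five observations (illustrative values only).
sample = [2.7, 3.1, 2.9, 3.4, 2.9]

def sample_mean(observations):
    """Estimator: a rule (function) that maps any sample to a number."""
    return sum(observations) / len(observations)

# Estimate: the particular value the estimator takes for this sample.
point_estimate = sample_mean(sample)
print(f"Point estimate of the population mean: {point_estimate:.2f}")
```

Calling `sample_mean` on a different sample would give a different estimate, while the estimator itself stays the same.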
Point and interval estimation of the mean
• We use the sample mean (X̄) as a point estimator of
the population mean (μ).
• A good estimator of a population parameter should
possess the following properties.
– Unbiasedness
– Efficiency
– Consistency
– Sufficiency
• A point estimator is unbiased if its expected value is
equal to the population parameter.
• Examples:
– The sample mean is an unbiased estimator of μ.
– The sample variance (with divisor n − 1) is an unbiased
estimator of σ².
– The sample proportion is an unbiased estimator of P.
Unbiasedness
θ̂1 is an unbiased estimator of θ; θ̂2 is biased.
[Figure: sampling distributions of θ̂1, centered at θ, and θ̂2, centered away from θ.]
• Let θ̂ be an estimator of θ.
• The bias in θ̂ is defined as the difference between its
mean and θ:
Bias(θ̂) = E(θ̂) − θ
• The bias of an unbiased estimator is 0.
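The definition of bias can be checked by simulation. The sketch below is only an illustration: the normal population, its variance, the sample size, and the number of repetitions are all assumed values. It compares the sample variance with divisor n − 1 against the version with divisor n.

```python
import random

random.seed(1)            # fixed seed so the run is repeatable
SIGMA2 = 4.0              # true population variance (assumed for the simulation)
N, REPS = 5, 20000        # sample size and number of simulated samples

def variance(xs, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

total_unbiased = total_biased = 0.0
for _ in range(REPS):
    xs = [random.gauss(0.0, SIGMA2 ** 0.5) for _ in range(N)]
    total_unbiased += variance(xs, N - 1)   # usual sample variance
    total_biased += variance(xs, N)         # divides by n instead

# Bias = (average of the estimates) - (true parameter value)
bias_unbiased = total_unbiased / REPS - SIGMA2
bias_biased = total_biased / REPS - SIGMA2
print(f"bias with divisor n-1: {bias_unbiased:+.3f}")
print(f"bias with divisor n:   {bias_biased:+.3f}  (theory: {-SIGMA2 / N:+.3f})")
```

The first bias should be close to 0, while the second should be close to −σ²/n, which is why the divisor n − 1 is used.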
Most efficient estimator
• Suppose that there are several unbiased estimators of θ.
• The most efficient estimator, or the minimum variance
unbiased estimator, of θ is the unbiased estimator with the
smallest variance.
• Let θ̂1 and θ̂2 be two unbiased estimators of θ, based on
the same number of sample observations. Then,
– θ̂1 is said to be more efficient than θ̂2 if Var(θ̂1) < Var(θ̂2).
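As an illustration of relative efficiency, the following sketch compares two estimators of the mean of a normal population, the sample mean and the sample median, both of which are unbiased in that setting. The population values, sample size, and number of repetitions are assumptions made for the simulation.

```python
import random
import statistics

random.seed(2)
N, REPS = 25, 5000   # sample size and number of simulated samples (assumed)

means, medians = [], []
for _ in range(REPS):
    xs = [random.gauss(10.0, 1.0) for _ in range(N)]
    means.append(statistics.fmean(xs))
    medians.append(statistics.median(xs))

# Variance of each estimator across the simulated samples
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
print(f"Var(sample mean)   ~ {var_mean:.4f}")
print(f"Var(sample median) ~ {var_median:.4f}")
print("more efficient:", "mean" if var_mean < var_median else "median")
```

For normal data the sample mean has the smaller variance, so it is the more efficient of the two.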
Consistency
• Let θ̂ be an estimator of θ.
• θ̂ is a consistent estimator of θ if the difference between the
expected value of θ̂ and θ decreases as the sample size
increases.
• More precisely, an estimator is consistent if its values approach
the parameter (in the probability sense) as the sample size increases.
• Consistency is desired when unbiased estimators cannot be
obtained.
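Consistency can be illustrated with a small simulation: as n grows, the sample mean falls within a fixed distance of μ more and more often. The population values, the tolerance, and the sample sizes below are assumed for illustration only.

```python
import random
import statistics

random.seed(3)
MU, SIGMA, REPS = 50.0, 10.0, 1000   # assumed population values
TOLERANCE = 1.0                      # "close" means within 1 unit of MU

def share_close(n):
    """Proportion of simulated samples of size n whose mean is within TOLERANCE of MU."""
    hits = 0
    for _ in range(REPS):
        xbar = statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
        if abs(xbar - MU) < TOLERANCE:
            hits += 1
    return hits / REPS

shares = {n: share_close(n) for n in (10, 100, 1000)}
for n, share in shares.items():
    print(f"n = {n:4d}: P(|x-bar - mu| < {TOLERANCE}) ~ {share:.3f}")
```

The printed proportions should increase toward 1 as the sample size increases.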
Sufficiency
• An estimator is sufficient if it contains all the information in the
data about the parameter it estimates.
• For example, a sample median is also an estimator of the
population mean, but it is not sufficient as it only uses the
middle observation of the ordered sample.
Interval estimation of population mean
• An interval estimate is defined by two numbers, between which a
population parameter is said to lie.
• A confidence interval is a calculated interval which contains an
unknown parameter value with a prescribed level of confidence.
• If Pr(a < θ < b) = 1 − α, then the interval from a to b is called a
100(1 − α)% confidence interval for θ.
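The meaning of the confidence level can be sketched by simulation. The example assumes a normal population with known σ and the usual interval x̄ ± z·σ/√n for a mean; the population values, sample size, and number of repetitions are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean

random.seed(4)
MU, SIGMA = 100.0, 15.0        # assumed population mean and (known) std. deviation
N, REPS, ALPHA = 30, 4000, 0.05

z = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96 for a 95% interval
half_width = z * SIGMA / N ** 0.5

covered = 0
for _ in range(REPS):
    xbar = fmean(random.gauss(MU, SIGMA) for _ in range(N))
    a, b = xbar - half_width, xbar + half_width   # interval endpoints
    if a < MU < b:
        covered += 1

coverage = covered / REPS
print(f"share of intervals containing mu: {coverage:.3f} (target {1 - ALPHA:.2f})")
```

Roughly 95% of the simulated intervals should contain μ, which is what the 100(1 − α)% statement describes.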
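Finding the reliability factor, Zα/2
• Consider a 95% confidence interval:
1 − α = 0.95, so α = 0.05 and α/2 = 0.025 in each tail.
[Figure: standard normal curve with central area 1 − α = 0.95 and area α/2 = 0.025 in each tail.]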
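The reliability factor can be looked up in a standard normal table or computed. A minimal sketch using Python's standard library follows; the function name `reliability_factor` is a hypothetical helper introduced here.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """Return z_{alpha/2}: the standard normal value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {reliability_factor(level):.3f}")
```

For the 95% case this gives the familiar value of about 1.96.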
H0: μ = 2.95        H0: X̄ = 2.95
Hypothesis testing . . .
• Begin with the assumption that the null hypothesis is
TRUE.
Example
Formulate appropriate null and alternative hypotheses for
testing the demographer's theory that the mean number of
children born to urban women is less than the mean
number of children born to rural women.
– H0: The mean number of children born to urban
women is greater than or equal to the mean number
of children born to rural women.
– H1: The mean number of children born to urban
women is less than the mean number of children
born to rural women.
Exercise
For many years, cigarette advertisements have been
required to carry the following statement: "Cigarette
smoking is dangerous to your health." However, this warning
is often placed in inconspicuous corners of the
advertisements and printed in small type. Consequently,
a researcher believes that over 85% of those who read
cigarette advertisements fail to see the warning. Specify
the null and alternative hypotheses that would be used
in testing the researcher's theory.
Hypothesis testing . . .
The decision of whether to reject the null hypothesis is
based on the value of a test statistic.
• A test statistic is a standardized value that is calculated
from sample data during a hypothesis test.
• The rejection region (also called the critical region) is the
range of values of a sample statistic that will lead to rejection
of the null hypothesis.
• For example, in testing whether the population mean grade point
of students at AAU is μ = 2.95, a logical choice of test
statistic for μ is X̄, and the rejection region contains the
values of X̄ that would lead us to believe that H1 is true,
i.e., μ ≠ 2.95, μ > 2.95, or μ < 2.95.
Hypothesis Testing Process
• Claim: the population mean grade point is 2.95
(null hypothesis H0: μ = 2.95).
• Select a random sample from the population of students.
• Suppose the sample mean grade point is 2.80: X̄ = 2.80.
• Is X̄ = 2.80 likely if μ = 2.95?
• If not likely, REJECT the null hypothesis.
Level of Significance (α)
• The maximum acceptable probability of rejecting a true
null hypothesis.
• It defines the unlikely values of the sample statistic if the
null hypothesis is true.
• It defines the rejection region of the sampling distribution.
• Typical values of α are 0.01, 0.05, or 0.10
• The level of significance is selected by the researcher
before the data are analyzed.
• A 5% level of significance means that, when the null
hypothesis is true, about 5 out of 100 samples would lead us
to reject it wrongly, while about 95 out of 100 would not.
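The interpretation of α as the probability of rejecting a true null hypothesis can be sketched by simulation. The example below assumes a two-tail z test with known σ; the value 2.95 is the grade-point example from the text, while σ, n, and the number of repetitions are assumed values.

```python
import random
from statistics import NormalDist, fmean

random.seed(5)
MU0, SIGMA, N = 2.95, 0.5, 40   # H0 value from the text; SIGMA and N are assumed
ALPHA, REPS = 0.05, 4000

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-tail critical value

rejections = 0
for _ in range(REPS):
    # Draw the sample from a population in which H0 is actually true.
    xbar = fmean(random.gauss(MU0, SIGMA) for _ in range(N))
    z = (xbar - MU0) / (SIGMA / N ** 0.5)
    if abs(z) > z_crit:
        rejections += 1          # rejecting a true H0 is a Type I error

type_one_rate = rejections / REPS
print(f"share of true null hypotheses rejected: {type_one_rate:.3f}")
```

The printed share should be close to the chosen α of 0.05.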
Level of Significance
and the Rejection Region
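(The rejection region is the shaded tail area of the sampling distribution.)
• Two-tail test: H0: μ = 3, H1: μ ≠ 3; rejection region of area α/2 in each tail.
• Upper-tail test: H0: μ ≤ 3, H1: μ > 3; rejection region of area α in the upper tail.
• Lower-tail test: H0: μ ≥ 3, H1: μ < 3; rejection region of area α in the lower tail.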
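The three kinds of rejection region can be expressed as a decision function. This is a sketch that assumes a standardized (z) test statistic; the function name `rejects_h0` and the tail labels are hypothetical choices made for the example.

```python
from statistics import NormalDist

def rejects_h0(z, alpha, tail):
    """Return True if the standardized statistic z falls in the rejection region.

    tail is 'two', 'upper', or 'lower' (labels chosen for this sketch).
    """
    std_normal = NormalDist()
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return z < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

# A statistic of z = 1.80 at alpha = 0.05 under each kind of test:
for tail in ("two", "upper", "lower"):
    print(tail, rejects_h0(1.80, 0.05, tail))
```

The same statistic can lead to rejection in an upper-tail test but not in a two-tail test, because the two-tail test splits α between both tails.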
Possible decisions in hypothesis testing
Hypothesis testing about population mean . . .
• Solution
• Since σ is unknown, the decision rule is:
Reject H0 if t = (x̄ − μ0) / (s/√n) > t(n−1, α/2)
or if t = (x̄ − μ0) / (s/√n) < −t(n−1, α/2)
t = (25 − 28) / (5.6/√50) = −3.79 and t(49, 0.025) = 2.0096
• Reject H0 since t < −t(49, 0.025), i.e., there is sufficient
evidence that the population mean is different from 28.
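The arithmetic in this solution can be reproduced with a few lines of Python. The numbers are those quoted in the solution; the critical value is taken from the text rather than computed, since the standard library has no t distribution.

```python
import math

x_bar, mu_0 = 25.0, 28.0     # sample mean and hypothesized mean from the solution
s, n = 5.6, 50               # sample standard deviation and sample size
t_critical = 2.0096          # t(49, 0.025), as quoted in the solution

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
reject_h0 = abs(t_stat) > t_critical   # two-tail rule: t > t_crit or t < -t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Checking `abs(t_stat)` against the critical value is equivalent to the two one-sided comparisons in the decision rule.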
Test of association
• When we have counts from categorical/qualitative
variables, we arrange them in cross tabulations or
contingency tables.
• The possible values of one variable determine the rows of
the table, and the possible values of the other determine
the columns of the table.
• Assume r categories for attribute A and c categories
for attribute B
– There are (r x c) possible cross-classifications.
r x c Contingency Table
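[Table: r x c contingency table with the categories of attribute A as rows and the categories of attribute B as columns.]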
Test for Association
• Consider n observations tabulated in an r x c
contingency table.
• Denote by Oij the number of observations in the cell
belonging to the ith row and the jth column.
• The null hypothesis is
H0: No association exists between the two attributes in the
population.
Contingency Table Example
Gender vs. opinion on abortion
Opinion on abortion: pro-abortion vs. against-abortion
Gender: Male vs. Female
H0: There is no association between opinion on abortion
and gender.
H1: There is an association between opinion on abortion and
gender.
Assumptions for a Chi-squared Test of Association
• Well-defined categorical variables
• Representative sample
• Independent random sampling
• Large number condition: all expected values
should be > 5.
Example
A survey of clients' satisfaction with the facilities and
management of three sporting facilities was conducted on
a random sample of 60 clients. The results are summarized
in the following contingency table:
Chi-squared Test of Association
              Sporting Facility
Satisfied?    A     B     C     Total
Yes           17    14    11    42
No             5     6     7    18
Total         22    20    18    60
• Is there evidence of different satisfaction levels in the
three facilities?
Chi-squared Test of Association . . .
H0: There is no association between satisfaction
and sporting facility.
H1: There is an association between satisfaction and
sporting facility.
• Expected cell frequencies:
Eij = (Ri × Cj) / n = (ith row total)(jth column total) / (total sample size)
E11 = (42)(22)/60 = 15.4        E12 = (42)(20)/60 = 14
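The expected frequency formula can be applied to every cell at once. The sketch below uses the observed counts from the sporting-facility table; the variable names are illustrative.

```python
observed = [
    [17, 14, 11],   # Yes: facilities A, B, C
    [5, 6, 7],      # No:  facilities A, B, C
]

row_totals = [sum(row) for row in observed]            # R_i
col_totals = [sum(col) for col in zip(*observed)]      # C_j
n = sum(row_totals)                                    # total sample size

# E_ij = R_i * C_j / n for every cell
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("Yes", "No"), expected):
    print(label, [round(e, 1) for e in row])
```

The first two entries of the "Yes" row reproduce E11 = 15.4 and E12 = 14, and every expected value exceeds 5, so the large number condition holds for this table.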
Observed vs. Expected Frequencies
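              Sporting Facility
Satisfied?    A            B            C            Total
Yes           17 (15.4)    14 (14.0)    11 (12.6)    42
No             5 (6.6)      6 (6.0)      7 (5.4)     18
Total         22           20           18           60
(Expected frequencies Eij = RiCj/n are shown in parentheses.)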
Chi-squared Test of Association
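A minimal sketch of the test for the sporting-facility table, assuming the usual Pearson statistic χ² = Σ (Oij − Eij)² / Eij with (r − 1)(c − 1) degrees of freedom. The 5% significance level is an assumption made for this example, and the critical value 5.991 is the standard chi-squared table entry for 2 degrees of freedom.

```python
import math

observed = [[17, 14, 11], [5, 6, 7]]          # rows: Yes/No; columns: A, B, C
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e_ij = r * c / n                       # expected frequency
        chi_sq += (observed[i][j] - e_ij) ** 2 / e_ij

df = (len(row_totals) - 1) * (len(col_totals) - 1)
CRITICAL_5PCT_DF2 = 5.991                      # chi-squared table value, df = 2, alpha = 0.05
p_value = math.exp(-chi_sq / 2)                # exact upper-tail area when df = 2 only

print(f"chi-squared = {chi_sq:.3f}, df = {df}, p = {p_value:.3f}")
print("reject H0" if chi_sq > CRITICAL_5PCT_DF2 else "do not reject H0")
```

Under these assumptions the statistic is well below the critical value, so the data give no evidence of an association between satisfaction and sporting facility. The closed-form p-value used here is valid only for 2 degrees of freedom; other table sizes need a chi-squared table or a statistics library.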