ch11 HypothesisTesting
Hypothesis Testing
1. Introduction
In the previous chapter, we learned how to create confidence intervals based on a point estimate from a
sample. We can think of a confidence interval as a “reasonable” range of values for the population
parameter. We accept that the sample point estimate (and so the confidence
interval) is most likely imperfect, but that it may be “good enough” for our purposes. Recall George
Box’s maxim that “All models are wrong, but some are useful.”
Now suppose that someone claims that a population parameter (e. g., population mean) has a particular
value. If we had the whole population to hand, we could calculate the true parameter value and so prove
or disprove the claim. But in general, we only have a sample and can only calculate the corresponding
sample statistic (e.g., sample mean). Almost certainly it will differ somewhat from the true population
parameter, so we cannot reject the claim just because the claimed population parameter is not exactly
equal to the sample statistic. However, if the sample statistic obtained from the sample is extreme
compared to the claimed population parameter, we might be suspicious of the claim and decide that we
could not believe the claim to be true.
But how big a difference between claimed population parameter and sample statistic should we regard
as “extreme”, meaning we reject the claim? This is the reason why in this chapter, we begin to study
hypothesis testing, which is the second major part of inferential statistics.
Hypothesis testing is not about mathematically proving or disproving a claim. Rather, it is about
“proving beyond reasonable doubt”, as in a court trial, where we choose the level of “reasonable doubt”,
which we call the significance level.
Definition 11.1. A hypothesis is a claim or assumption about a population parameter: a claim that
has not yet been verified.
Hence, the null hypothesis is always about a population parameter, not about a sample statistic!
An example of a null hypothesis is the following about the population mean µ: H0 : µ = µ0 . Here, µ0 is
the claimed value and we want to test if µ0 is a reasonable value for µ.
The null hypothesis refers to the status quo or what the person conducting the study thinks is the current
situation. It always contains the = sign (one of =, ≥ or ≤) and it may be either rejected or not rejected.
128 MIS10090
The starting point (the working assumption) is that the null hypothesis is true: we only reject the null
hypothesis provided we find enough evidence from a sample of data.
Note 11.3. The only way we could prove the null hypothesis to be true is to calculate the true value of the
population parameter: this would mean we need access to the full population to make this computation.
Thus, since we are limited to examining a sample from the population, the null hypothesis is never
proven!
Also, we do not use the word “accepted”; we always say “not rejected” as there is a subtle difference
here: what we mean is that either we find enough evidence to reject the null hypothesis, or we do not
find enough evidence (think about the possible difference between someone being found not guilty in
court, because there was not enough evidence to convict, and actually being innocent).
The null hypothesis generally makes claims such as “the population mean is not different from a stated
value µ0” or “the population mean is not less than a particular value µ0”, and so on. For example:
(1) H0 : µ = µ0; “the population mean is not different from µ0”;
(2) H0 : µ ≥ µ0; “the population mean is no less than µ0” or “the population mean is at least µ0”;
and
(3) H0 : µ ≤ µ0; “the population mean is no more than µ0” or “the population mean is at most µ0”.
Identifying the null hypothesis correctly depends on carefully reading the problem statement for terms
like “is equal to” or “not different from” (which indicate a two-tailed test: see below) versus terms like
“is at least/most” or “is no less/more than” or “has a lower/upper limit of” or “does not exceed / fall
beneath” (which indicate a one-tailed test: see below).
2.1.2. The Alternative Hypothesis.
Definition 11.4. The Alternative Hypothesis, H1 , is the opposite of the null hypothesis.
The alternative hypothesis challenges the status quo and is generally the hypothesis that the researcher
is trying to prove. It never contains the = sign: it does not have =, ≥ or ≤, but rather ≠, > or <. As
the alternative hypothesis H1 is the opposite of the null hypothesis H0 , it may or may not be proven.
The alternative hypothesis is accepted if and only if the null hypothesis is rejected: that is, if the sample
data give enough evidence that the null hypothesis is false.
2.2. Types of Hypothesis Test. In many cases, the alternative hypothesis focuses on a particular
direction. For example, in a battery testing application, it is not very informative to know that battery
life is not 4 hours, as we do not know if it is more than or less than 4. Better to test if it is less than 4
hours, or greater than 4 hours, depending on your aim.
The way in which the alternative hypothesis H1 is set defines three possible types of hypothesis test:
(1) H1 : µ ≠ µ0 ;
(2) H1 : µ < µ0 ; and
(3) H1 : µ > µ0 .
The first type of test is called two-tailed, the second type is called (lower) one-tailed or left-tailed and
the third type is called (upper) one-tailed or right-tailed.
Example 11.5. The manufacturer of a laptop battery claims that on average the battery lasts for 4
hours before needing to be recharged.
• The manufacturer’s claim can be written as: H0 : µ = 4 hours;
• If the claim is not true, then the opposite (the alternative) must be true: H1 : µ ≠ 4 hours.
This is an example of a two-tailed test. As mentioned above, it may not be very useful but serves us as
an example. If the claim was “the battery lasts for at least 4 hours”, then it would be a one-tailed test:
H0 would be µ ≥ 4, giving H1 : µ < 4, so this would be a left-tailed test.
Example 11.6 (Lower one-tail Example). A health food snack label claims that the snack provides at
least 11.9mg of iron.
Data Analysis for Decision Makers 129
• H0 : µ ≥ 11.9 mg
• H1 : µ < 11.9 mg.
This is an example of a lower-tail test (or left-tailed test). Notice that “at least” tells us this is a one-tailed
test.
Example 11.7 (Upper one-tail Example). A pizza delivery company claims that their pizzas are delivered
within 30 minutes.
• H0 : µ ≤ 30 minutes
• H1 : µ > 30 minutes.
This is an example of an upper-tail test (or right-tailed test). Notice that “within 30 min”, i.e., “at most
30 min”, tells us this is a one-tailed test.
2.3.1. Critical value. We know that there will be random differences from sample to sample (sampling
error), so we have to allow for this variability. Just because the sample disagrees a little with the null
hypothesis does not mean the null hypothesis is wrong. We need a substantial threshold of evidence
from the sample before we can reject the null hypothesis.
Let’s try to get some intuition about this. Suppose we have a population of people and their age
distribution is approximately normal. It is claimed (H0 ) that the mean age is 50 years old. Take a
sample of ages from the population. Then:
• If the sample mean is “close” to the stated population mean of 50 (e. g., x̄ = 48), then because
of sample “randomness”, the claim H0 should probably not be rejected.
• If the sample mean is “far” from the stated population mean of 50 (e. g., x̄ = 20), then the null
hypothesis H0 should probably be rejected, as it is “very unlikely” to happen.
• But how far is “too far” — so that we reject H0 ?
• How far is considered “not too far” or “still OK” to not reject H0 ?
The critical value is the threshold or borderline beyond which the sample mean is considered “too far”,
so that we have to reject H0 . We indirectly control the critical value by choosing a level of significance.
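2.4. Level of Significance. Our basic approach in hypothesis testing is to compute a “test statistic”
from the sample data and either:
• compare the test statistic with the critical value(s) (the critical value approach); or
• compare the p-value of the test statistic with the level of significance (the p-value approach).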
Both approaches will give the same answer: the first compares scores while the second compares areas
(probabilities).
Definition 11.8. The level of significance, α, defines the unlikely values of the test statistic if the null
hypothesis is true.
We choose α, just as we choose confidence levels for confidence intervals (recall that we often wrote our
confidence level as 1 − α). In both confidence intervals and hypothesis testing, α plays a similar role: it
is the probability that we are willing to accept of “being wrong”. It is very important to stress that α
is selected by the researcher at the beginning, so that the critical value(s) of the test can be obtained
from the statistical tables.
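As a rough illustration of how the choice of α fixes the critical value(s), the following Python sketch looks them up with the standard library instead of statistical tables. The function name `critical_z` and the tail labels "lower", "upper" and "two" are assumptions made for this example, and the sketch only covers a test statistic that follows the standard normal distribution.

```python
from statistics import NormalDist


def critical_z(alpha, tail):
    """Return the critical z-value(s) for significance level alpha.

    tail is "lower", "upper" or "two" (hypothetical labels for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "lower":
        return (z.inv_cdf(alpha),)
    if tail == "upper":
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":
        # alpha is split equally between the two tails
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


for alpha in (0.10, 0.05, 0.01):
    print(alpha, [round(c, 2) for c in critical_z(alpha, "two")])
```

A smaller α pushes the critical values further into the tails, so stronger evidence is needed before H0 is rejected.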
So how do we test a claim (the null hypothesis) or look for evidence in support of the alternative? Like
the prosecution team in a court case, if we can find enough evidence (“beyond reasonable doubt”) to
prove the alternative is true, then we have shown that the claim must be false! Based on the theory of
sampling distributions we saw earlier, we choose a level of significance, α, we take a sample from the
population, calculate a “test statistic” from the sample, and ask if this is a reasonable result (according
to α) to obtain if the claim H0 is true.
Definition 11.9. The test statistic is the statistic computed from the data which we use to decide the
outcome of the hypothesis test.
The particular form of the test statistic depends on the hypothesis we are testing (more later).
The six steps in hypothesis testing are:
(1) State the null hypothesis, H0 , and alternative hypothesis, H1 ;
(2) Choose the level of significance, α;
(3) Determine the appropriate statistical technique and the test statistic to use;
(4) Find the critical value(s) or the p-value (see below);
(5) Make the statistical decision: Reject H0 or Do Not Reject H0 ; and
(6) Express (interpret) the decision in the context of the problem.
Steps 1–3 and 6 are the same in both the critical value and p-value approaches; only steps 4 and 5 vary
according to the approach.
Notice that for a one-tailed test, the rejection region is on the side that the inequality in H1
points to, e.g., left for < and right for >.
Nonrejection region: the set of values of the test statistic that do not lead us to reject the null
hypothesis.
Figure 11.1. Rejection regions (shown in red): (left) lower one-tailed; (centre) upper
one-tailed; and (right) two-tailed. The total area of the rejection region (which may be
one or both tails) is always the significance level α, and it is split into two parts for a
two-tailed test.
Based on the significance level of the hypothesis test, one or more cut-off point(s), also called critical
value(s), are identified.
Definition 11.10. The critical value(s) are the value(s) of the test statistic that separate the rejection
and nonrejection regions; these values are part of the rejection region.
The criterion for deciding whether to reject the null hypothesis is based on a comparison between two
values: the value of the test statistic and the critical value(s). We get a picture as shown in Figure 11.2.
Figure 11.2. Critical value approach: (left) lower one-tailed; (centre) upper one-tailed;
and (right) two-tailed. Notice that the total area of the rejection region (the blue shaded
area, which may be one or both tails) is always α, and it is split into two parts for a
two-tailed test.
3.2. The p-Value Approach. Unlike the critical value approach, the p-value approach does not
compare values but areas (that is, probabilities). The criterion for deciding whether to reject the null
hypothesis is based on the likelihood of the value obtained for the test statistic if the null hypothesis is
true, and the specified significance level of the hypothesis test.
Definition 11.11. The p-value is defined as the probability of obtaining a test statistic at least as extreme
as the observed sample value, given H0 is true.
We compare the p-value with α, the level of significance. If the p-value is less than α, we reject the
claim, H0; otherwise, we do not.
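The comparison of areas can be sketched in Python for a test statistic that follows the standard normal distribution. The function names `p_value_z` and `decide`, the tail labels, and the test statistic of 2.1 are all made up for illustration.

```python
from statistics import NormalDist


def p_value_z(z_t, tail):
    """p-value of a z test statistic; tail is "lower", "upper" or "two"."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "lower":
        return cdf(z_t)                   # area to the left of z_t
    if tail == "upper":
        return 1 - cdf(z_t)               # area to the right of z_t
    if tail == "two":
        return 2 * (1 - cdf(abs(z_t)))    # both tails, by symmetry
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def decide(p_value, alpha):
    """Decision rule: reject H0 only if the p-value is less than alpha."""
    return "Reject H0" if p_value < alpha else "Do not reject H0"


p = p_value_z(2.1, "upper")  # 2.1 is a hypothetical test statistic
print(round(p, 4), decide(p, 0.05))
```

The same statistic can lead to different decisions for different tail types, because a two-tailed p-value is twice the one-tailed area.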
For the rest of this chapter, we continue on from the general comments on hypothesis testing made above:
we investigate hypothesis testing for the most basic population parameters and which tests to use when.
We start with the case of hypothesis testing for the population mean µ, when the population standard
deviation is known. We’ll see below that we need a different approach when the population standard
deviation is not known.
As with confidence intervals, there are general assumptions we need for hypothesis testing for population
mean µ, whether or not the population standard deviation is known:
• the level of measurement is either interval or ratio; and
• either
– the population is normally distributed (often a reasonable assumption); or
– the sample size is large enough, n ≥ 30. Then the central limit theorem (CLT) guarantees
that the sampling distribution of the mean will be approximately normal.
4.1. Critical value approach. Using the critical value approach, we need to standardise the sample
statistic x̄ to a z-score. This is called the test statistic value and can be obtained using the formula:
\[ z_t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}. \]
Then, we need to determine the critical values of z for a specified level of significance α from the
statistical tables or Excel. The critical values of z cut the normal distribution into regions of rejection
and nonrejection. Moreover, the critical values of z are part of the rejection region(s).
The decision rule is simple:
• if the test statistic falls in the rejection region, reject H0 ;
• otherwise, do not reject H0 .
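The decision rule above can be sketched as a small Python function. The name `z_test_critical`, its arguments and the data in the usage line are hypothetical; the critical values are treated as part of the rejection region, as stated above.

```python
from math import sqrt
from statistics import NormalDist


def z_test_critical(x_bar, mu0, sigma, n, alpha, tail):
    """Critical value approach for the mean when sigma is known.

    Returns (z_t, z_c, reject). For a two-tailed test z_c is the positive
    critical value; the critical values are then -z_c and +z_c.
    """
    z_t = (x_bar - mu0) / (sigma / sqrt(n))  # standardise the sample mean
    std_normal = NormalDist()
    if tail == "lower":
        z_c = std_normal.inv_cdf(alpha)
        reject = z_t <= z_c
    elif tail == "upper":
        z_c = std_normal.inv_cdf(1 - alpha)
        reject = z_t >= z_c
    elif tail == "two":
        z_c = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_t) >= z_c
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    return z_t, z_c, reject


# Made-up data, for illustration only
z_t, z_c, reject = z_test_critical(
    x_bar=10.3, mu0=10, sigma=1.2, n=36, alpha=0.05, tail="two"
)
print(round(z_t, 2), round(z_c, 2), reject)
```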
Example 11.12. In Example 11.5 on battery life, assume that battery duration is normally distributed
and the population standard deviation is known to be 12 minutes (that is, 12/60 = 0.2 hours).
[Figure: distribution of Z centred at 0, with the nonrejection region and the rejection region labelled.]
4.2. The p-value approach. For the p-value approach, the first three steps are the same as for
the critical value approach.
Example 11.14. In the battery life example above, assume that battery duration is normally distributed
and the population standard deviation is known to be 12 minutes (that is, 12/60 = 0.2 hours).
Steps 1, 2 and 3 are exactly as in the critical value approach: we get α = 0.1 = 10%, zt = 1.58.
Step 4: Because this is a two-tailed test, we have to sum P(z < −1.58) and P(z > 1.58) which, by
symmetry about 0, are equal. From Table 4 of NCST we have that P(z < 1.58) = 0.9429 so we get
P(z > 1.58) + P(z < −1.58) = 2 × (1 − 0.9429) = 0.1142.
Thus, the p-value = 0.1142.
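Step 5: We get the picture as shown in Figure 11.5. Since the p-value 0.1142 is greater than α = 0.1, we
do not reject H0.
[Figure 11.5: distribution of Z centred at 0; each tail of the rejection region has area 0.05, and each tail
beyond the test statistic has area 0.0571.]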
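The two-tailed p-value in this example can be checked with a short Python sketch that replaces the table lookup by the standard library's normal distribution. Only the values zt = 1.58 and α = 0.1 come from the example; the variable names are assumptions.

```python
from statistics import NormalDist

z_t = 1.58     # magnitude of the test statistic quoted in the example
alpha = 0.10   # level of significance quoted in the example

upper_tail = 1 - NormalDist().cdf(z_t)  # P(z > 1.58)
p_value = 2 * upper_tail                # add the equal lower tail, by symmetry

print(round(upper_tail, 4), round(p_value, 4))
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Because the table value is rounded to four decimal places, the computed p-value may differ from 0.1142 in the last digit; the decision is unaffected.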
Example 11.15. The Student Union wants to give budgetary advice to incoming first year students.
Based on historical data, the Student Union assumes that the average amount students spend on lunch
is €5. However, the Student Union would like to investigate if the spending has now increased. Assume
the population standard deviation, σ, is known to be €0.80. Suppose we take a sample of 100 students,
and calculate the sample mean to be €4.84.
Step 1: We state the null and alternative hypotheses:
• H0 : µ ≤ €5.00
• H1 : µ > €5.00 (right-tail test)
Step 4: We find the critical value and determine the region of rejection. As before, for a one-tailed test
with α = 0.05, we have zc = 1.65.
Step 5: We compare the test statistic to the critical value and since it is in the region of nonrejection,
we fail to reject the null hypothesis, H0.
Step 6: Interpretation: We have failed to reject the null hypothesis; the claim is not proven to be true
but we have insufficient evidence to prove that the claim is false. It is reasonable for the Student Union
to advise incoming first years that they should budget on spending at most €5.00 on lunch per day.
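Steps 2 and 3 of this example are not shown above. As a hedged sketch, the test statistic can be recomputed in Python from the figures given in the example (µ0 = 5.00, σ = 0.80, n = 100, x̄ = 4.84) and compared with the critical value zc = 1.65 quoted in Step 4; the variable names are assumptions.

```python
from math import sqrt

mu0 = 5.00     # hypothesised mean spend in euro, from H0
sigma = 0.80   # known population standard deviation
n = 100        # sample size
x_bar = 4.84   # sample mean
z_c = 1.65     # critical value quoted in Step 4 (upper tail, alpha = 0.05)

std_error = sigma / sqrt(n)        # standard error of the mean
z_t = (x_bar - mu0) / std_error    # z test statistic
reject = z_t >= z_c                # upper-tailed test: reject for large z_t

print(round(z_t, 2), reject)
```

The sample mean lies below the hypothesised value, so the test statistic is negative and cannot fall in an upper-tail rejection region.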
Exercise 11.16. Use the p-value approach to test the Student Union claim that students spend at
most €5.00 on lunch per day. Verify your results using the Excel template on Brightspace.
Exercise 11.17. The National Consumer Agency (NCA) have raised concerns that the amount of water
in the 500ml bottles of water produced by the Quinn Bottle Company is not actually 500ml. Taking a
sample of 50 bottles yields a sample mean of 497.5 ml. The population standard deviation is known to
be 10ml. Assist the NCA in looking for evidence that the mean amount of water per bottle is not 500ml
at a 0.1 level of significance. Is your decision the same at a 0.05 level of significance?
If the population standard deviation is unknown, a Student t-distribution needs to be used to obtain the
critical value. The new sample statistic x̄ has to be transformed into a t-test statistic with n − 1 degrees
of freedom. This is called the test statistic value and can be obtained using the formula:
\[ t_t = \frac{\bar{x} - \mu}{s / \sqrt{n}}. \]
Example 11.18. A tourism website claims the average cost of a room in a 4-star hotel in Dublin per night
is €85. Assume the hotel price distribution is normal. A random sample of 25 such hotels results in a
sample mean of €84.5 and a sample standard deviation of €13.62. It seems possible hotel prices have
decreased: a change in the status quo. We wish to test if there has been a decrease in room rates at a 1%
level of significance: the alternative hypothesis will be a change in the status quo, i.e., a decrease. The
null hypothesis will be that hotel prices have not decreased (i.e., stayed the same or even increased).
Step 1: We set up the null and alternative pair of hypotheses:
• H0 : µ ≥ €85
• H1 : µ < €85
Step 2: We wish to test the claim at a 1% level of significance.
Step 3: This is a (lower) one-tailed hypothesis test for the mean but we only have s, the sample standard
deviation: so a t-test must be used.
We calculate the t-test statistic to be
\[ t_t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{84.5 - 85}{13.62 / \sqrt{25}} = -0.18. \]
Step 4: We look in NCST Table 9 (p. 44) for the p-value of this t-test statistic for n − 1 = 25 − 1 = 24
df (degrees of freedom). We can only look up t-values to one decimal place so we must round our t-test
statistic of −0.18 to −0.2. The tables only give p-values for positive values of t but by symmetry we can
just subtract from 1. The p-value of t = −0.2 with 24 degrees of freedom is 1 − 0.5784 = 0.4216.
Step 5: We compare the p-value of 0.4216 to the level of significance α = 0.01 and since the p-value
0.4216 > 0.01, in this case we decide not to reject the null hypothesis.
Step 6: We have not found sufficient evidence to disprove the claim, so the website can continue to make
this claim.
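For readers without the tables, the t-test in this example can be approximated in Python. The standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule; the helper names `t_pdf` and `t_cdf` and the number of integration steps are assumptions, not part of the course material.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


x_bar, mu0, s, n = 84.5, 85, 13.62, 25  # figures from the example
alpha = 0.01

t_t = (x_bar - mu0) / (s / sqrt(n))     # t test statistic
p_value = t_cdf(t_t, n - 1)             # lower-tailed test: P(T <= t_t)

print(round(t_t, 2), round(p_value, 4))
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The computed p-value differs slightly from the table-based 0.4216 because the table lookup rounds the statistic to −0.2; the decision is the same.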
Exercise 11.19. Use the critical value approach to perform the same hypothesis testing on “hotel room
rates in Dublin”.
Exercise 11.20. The Health Service Executive (HSE) claims that Irish women weigh at most 58kg.
Taking a sample of 10 women yields a sample mean of 59kg and a sample standard deviation of 5kg.
Assist the HSE in testing their claim at a 0.1 level of significance. Is your decision the same at a 0.05
level of significance?
Figure 11.6. Flowchart for Hypothesis Testing for population mean µ: when to use z,
when to use t, and when neither can be used
If the variables involved in the testing are categorical with possible binary outcomes:
• “success” (possesses a certain characteristic) or
• “failure” (does not possess that characteristic),
then a z-test for the proportion is needed.
This condition needs to be verified together with other binomial requirements:
• the (random) sample data has been gathered based on independent trials, and
• the probability associated with the possible outcome “success” is fixed.
The proportion of the population in the “success” category is denoted by p, and the sample proportion
in the “success” category is denoted by ps. When both np and n(1 − p) are at least 5, ps can be
approximated by a normal distribution with mean p and standard deviation equal to the standard error
of the proportion¹
\[ \sqrt{\frac{p(1 - p)}{n}}. \]
Hence, the sampling distribution of ps is approximately normal, so the test statistic is a z-value:
\[ z_t = \frac{p_s - p}{\sqrt{\frac{p(1 - p)}{n}}}. \]
¹ Recall that we covered the standard error of the proportion at the end of §4.2 of Chapter 9 and the start of §4 of
Chapter 10.
Example 11.21. The UCD facilities staff consider it possible that more than 15% of students are left
handed and are looking for evidence in support of their belief at a 0.1 level of significance. They survey
100 students and find that 20 out of the sample are left handed: thus, ps = 20/100 = 0.2.
Step 1: We state the null and alternative hypotheses as follows:
• H0 : p ≤ 0.15
• H1 : p > 0.15
Step 2: The level of significance is given to us as 0.1.
Step 3: The claim is about a population proportion. We have n = 100 and p = 0.15 so np = 15 and
n(1 − p) = 85. Since both are at least 5, we can use a z-test. From the data given this standardises to a
test z-score of 1.4:
\[ z_t = \frac{p_s - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.2 - 0.15}{\sqrt{\frac{0.15(0.85)}{100}}} = \frac{0.05}{0.0357} \approx 1.4. \]
Step 4: The critical value of z for an upper tail test with a 0.1 level of significance is 1.28.
Step 5: Since the test statistic of 1.4 falls into the upper tail region of rejection, we reject the claim that
at most 15% of students are left handed.
Step 6: We interpret this as saying we have found sufficient evidence, at a 0.1 level of significance, to
conclude that more than 15% of students are left handed.
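The calculation in this example can be reproduced with a short Python sketch. The figures (p = 0.15, n = 100, 20 left-handed students, α = 0.1) are from the example; the variable names are assumptions, and the critical value is computed from the standard library rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.15        # hypothesised population proportion, from H0
n = 100          # sample size
p_s = 20 / n     # sample proportion of left-handed students
alpha = 0.10

# Normal approximation requirement: np and n(1 - p) both at least 5
if n * p0 < 5 or n * (1 - p0) < 5:
    raise ValueError("normal approximation is not appropriate")

std_error = sqrt(p0 * (1 - p0) / n)    # standard error of the proportion
z_t = (p_s - p0) / std_error           # z test statistic
z_c = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
reject = z_t >= z_c

print(round(z_t, 2), round(z_c, 2), reject)
```

Note that the standard error uses the hypothesised proportion p from H0, not the sample proportion ps, because the test statistic is computed under the assumption that H0 is true.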
Check the Brightspace folder for an Excel template to perform hypothesis testing for the proportion.
Exercise 11.22. Based on previous student elections, it is expected that 53% will vote for the male
candidate. However, a poll of 800 voters finds that 51% plan to vote for the male candidate in the
forthcoming student elections. Hence, it could be that the male candidate may obtain fewer votes than
expected. Test this with a level of significance of 5%.
The types of hypothesis testing we have looked at so far (either for one population mean or the population
proportion) are so-called parametric.
The term parametric means that these tests require assumptions on the population distribution (i. e.,
normality or approximate normality) in order to be performed.
Conversely, nonparametric tests do not require such conditions.
We summarise the assumptions underpinning the parametric tests we’ve seen: the z-test for one
population mean, the t-test for one population mean and the z-test for the population proportion.
z-test (µ)                     t-test (µ)                     z-test (p)
simple random sample           simple random sample           binomial conditions
normal population or sample    normal population or sample    np > 5 and n(1 − p) > 5
large enough (CLT)             large enough (CLT)             (approximately normal)
σ known                        σ unknown
7.1. The Goodness-of-Fit test. We are going to look at one type of nonparametric test which
goes under the name of goodness-of-fit test. This type of test is applied to categorical data, that is,
grouped into categories. Before introducing the hypotheses that are compared in this case, we need to
introduce a new distribution, called the χ² (chi-square) distribution.
Figure 11.7 displays different chi-square distributions according to degrees of freedom (df). In this case
of goodness-of-fit applications, the df are not linked to sample size but to the number of categories the
data are grouped into.
Features of the chi-square distribution are:
• values are never negative
• there is a family of distributions dependent upon degrees of freedom based on number of data
categories (e.g., df = k − 1 where k is the number of categories)
• the shape depends on the number of categories
• positive skewness (however, the higher the degrees of freedom, the more closely the chi-square
distribution approximates the normal)
The main idea of the goodness-of-fit test is to try and compare observed (O) distribution values with
expected (E) distribution values.
Example 11.23. A small airline company is interested in investigating if the number of extraordinary
maintenance interventions on the planes of its fleet varies according to seasonality. In the previous year,
during the n = 4 quarters, the company reported 3 additional maintenance services in spring, 7 in
summer, 6 in autumn and 12 in winter.
3, 7, 6 and 12 are the so-called observed values given that the company has been able to observe and
count such interventions over the past year.
Now, why does this circumstance call for a goodness-of-fit test?
Example 11.24. Step 1: What are the null and alternative hypotheses in this case?
How do we find the critical value for a chi-square distribution? We use the NCST or Excel; however,
we also need the value of degrees of freedom. In this case where we have only one variable and four
categories, the degrees of freedom are based on the number of categories by setting df = k − 1 where k
is the number of categories. Hence, df = k − 1 is 3 in our example.
Step 4: In the case of the NCST, you can check Table 8. We have a level of significance equal to 0.01
and degrees of freedom equal to 3. Hence, the critical value we are looking for is 11.34. You can obtain
the same value by using the Excel function CHISQ.INV.RT (See Excel for lecture solutions).
Step 5: We compare the test statistic with the critical value and we notice the former does not fall within
the rejection region identified by the latter so we decide not to reject the null hypothesis.
The test statistic χ²_t is equal to 6.
How do we find the p-value for a chi-square distribution? We use the NCST or Excel; however,
we also need the value of degrees of freedom.
Step 4: In the case of the NCST, you can check Table 7. We have a test statistic equal to 6 and degrees
of freedom equal to 3. Hence, the p-value we are looking for is 1 − 0.8884 = 0.1116. You can obtain the
same value by using the Excel function CHISQ.DIST.RT (See Excel for lecture solutions).
Step 5: We compare the p-value with the level of significance so we decide not to reject the null hypothesis
given that 0.1116 > 0.01.
Step 6: We have not found sufficient evidence to disprove the claim that seasonality has no role in
extraordinary maintenance interventions for the flight company fleet.
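The expected values for this example are not shown above. The following Python sketch assumes that, under H0, the 28 interventions are spread evenly over the four seasons (7 per season), an assumption that reproduces the test statistic of 6 quoted above. The helper `chi2_cdf` is a hypothetical standard-library substitute for the tables or Excel, based on the series expansion of the regularised lower incomplete gamma function.

```python
from math import exp, gamma


def chi2_cdf(x, df, max_terms=1000):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0
    total = 1.0
    for k in range(1, max_terms):
        term *= y / (a + k)       # next term of the series
        total += term
        if term < 1e-15 * total:  # stop once terms are negligible
            break
    return total * y ** a * exp(-y) / gamma(a + 1)


observed = [3, 7, 6, 12]                          # spring, summer, autumn, winter
expected = [sum(observed) / len(observed)] * 4    # assumed: no seasonal effect
alpha = 0.01

chi2_t = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                            # k - 1 categories
p_value = 1 - chi2_cdf(chi2_t, df)                # right-tail area

print(round(chi2_t, 2), df, round(p_value, 4))
print("Reject H0" if p_value < alpha else "Do not reject H0")
```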
Exercise 11.25. A company that manufactures kitchen glasses would like to know, with a 5% level
of significance, whether the observed number of faulty (e. g., cracked) glasses may be attributed to the
type of glass used. Here are the data collected for the number of faulty pieces per glass type: 2 cracked
glasses for glass type 1, 9 cracked glasses for glass type 2, and 19 cracked glasses for glass type 3.
7.2. Contingency tables and hypothesis testing. A contingency table allows us to summarise
two factors at the nominal level of measurement at the same time.
In the example related to the extraordinary maintenance interventions, apart from season another factor
(or variable) to be considered could have been the type of flight (i. e., short-haul or long-haul). We can
then extend the application of the chi-square statistic to evaluate if two variables of such a type are
independent.
How do things change under these circumstances? Let’s consider a revised version of the flight company
example.
Example 11.26. Step 1: What are the null and alternative hypotheses in this case?
• H0 : There does not exist any link between type of flight and seasonality when it comes to
extraordinary maintenance interventions
• H1 : There does exist a link between type of flight and seasonality when it comes to extraordinary
maintenance interventions
What about the expected values in this case? In order to compute the expected value for each element
[flight type, season], we can use the formula
\[ E = \frac{(\text{total by row}) \times (\text{total by column})}{\text{overall total}}. \]
We summarise expected values in a table.
Expected values   Spring   Summer   Autumn   Winter   Total
Short-haul        3        4        6        7        20
Long-haul         3        4        6        7        20
Total             6        8        12       14       40
Step 3: We calculate the chi-square (χ²-test) statistic as:
\[ \chi^2_t = \sum \frac{(O - E)^2}{E}. \]
Hence, in our case,
How do we find the critical value for a chi-square distribution? We use the NCST or Excel; however, we
also need the value of degrees of freedom. In this case where we have two variables (i.e., flight type and
seasonality), the degrees of freedom are computed as
df = (number of rows − 1) × (number of columns − 1).
Hence, df = (2 − 1) × (4 − 1) = 3 in our example.
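The expected values and the degrees of freedom for this example can be recomputed from the row and column totals with a short Python sketch. The dictionary layout and variable names are assumptions; the Winter column total is taken as 14, the value consistent with the Winter entries (7 + 7) and the overall total of 40.

```python
row_totals = {"Short-haul": 20, "Long-haul": 20}
col_totals = {"Spring": 6, "Summer": 8, "Autumn": 12, "Winter": 14}
overall_total = sum(row_totals.values())

# E = (total by row) x (total by column) / overall total, for every cell
expected = {
    (flight, season): r * c / overall_total
    for flight, r in row_totals.items()
    for season, c in col_totals.items()
}

# df = (number of rows - 1) x (number of columns - 1)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

for (flight, season), e in expected.items():
    print(flight, season, e)
print("df =", df)
```

Because both flight types have the same row total here, each season's expected count is simply half of its column total.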
Step 4: In the case of the NCST, you can check Table 8. We have a level of significance equal to 0.01
and degrees of freedom equal to 3. Hence, the critical value we are looking for is 11.34. You can obtain
the same value by using the Excel function CHISQ.INV.RT (See Excel for lecture solutions).
Step 5: We compare the test statistic with the critical value and we notice the former does not fall within
the rejection region identified by the latter so we decide not to reject the null hypothesis.
The chi-square test statistic χ²_t is equal to 0.95.
How do we find the p-value for a chi-square distribution? We use the NCST or Excel; however,
we also need the value of degrees of freedom.
Step 4: In the case of the NCST, you can check Table 7. We have a test statistic equal to 0.95 (rounded
to 1 for the table lookup) and degrees
of freedom equal to 3. Hence, the p-value we are looking for is 1 − 0.1987 = 0.8013. You can obtain a
similar value (due to approximation) by using the Excel function CHISQ.DIST.RT (See Excel for lecture
solutions).
Step 5: We compare the p-value with the level of significance so we decide not to reject the null hypothesis
given that 0.8013 > 0.01.
Step 6: We have not found sufficient evidence to disprove the claim that seasonality and flight type are
independent when it comes to extraordinary maintenance interventions for the flight company fleet.
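The small gap between the table-based and the Excel p-value can be explored with a Python sketch. It assumes the table value 0.1987 corresponds to a statistic rounded to 1, which the computation below is consistent with. The helper `chi2_cdf` is a hypothetical standard-library substitute for the tables, based on the series expansion of the regularised lower incomplete gamma function.

```python
from math import exp, gamma


def chi2_cdf(x, df, max_terms=1000):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0
    total = 1.0
    for k in range(1, max_terms):
        term *= y / (a + k)       # next term of the series
        total += term
        if term < 1e-15 * total:  # stop once terms are negligible
            break
    return total * y ** a * exp(-y) / gamma(a + 1)


df = 3
alpha = 0.01
chi2_t = 0.95                              # test statistic from the example

p_exact = 1 - chi2_cdf(chi2_t, df)         # unrounded statistic
p_rounded = 1 - chi2_cdf(1.0, df)          # statistic rounded to 1 (assumed)

print(round(p_exact, 4), round(p_rounded, 4))
print("Reject H0" if p_exact < alpha else "Do not reject H0")
```

Both p-values are far above α = 0.01, so the rounding does not change the decision.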
Exercise 11.27. A company that manufactures kitchen glasses would like to know, with a 5% level of
significance, whether there is a relation between glass type and environmental circumstances when it
comes to number of faulty (e. g., cracked) glasses. Here are the data collected for the number of faulty
pieces under temperature, pressure or stress circumstances, respectively: 3, 4, and 3 cracked glasses for
glass type 1; 5, 10, and 5 cracked glasses for glass type 2; and 7, 1, and 12 cracked glasses for glass type
3.
We can also use a χ² test to test if a population standard deviation σ is equal to a particular value σ0.
The χ² test statistic follows a chi-square distribution with n − 1 degrees of freedom. It is a right-skewed
distribution dependent on the number of degrees of freedom and is used to draw conclusions about the
population variability.
Assume the population follows a normal distribution. The test statistic is given by:
\[ \chi^2_t = \frac{(n - 1)s^2}{\sigma_0^2} \]
where n is the sample size, s is the sample standard deviation and σ0 is the given (claimed) population
standard deviation.
The test statistic is then compared to the critical values of the χ² distribution (NCST Table 7 or Excel
CHISQ.INV) for n − 1 degrees of freedom and the level of significance α we are asked to use.
The χ² test can be used for either a one-sided test or a two-sided test:
• a two-sided test is against the alternative hypothesis that the population standard deviation is ≠
a specified value σ0;
• a one-sided test only tests in one direction (either > or < depending on the null hypothesis).
The problem we are given tells us whether we need to use a two-sided or one-sided test. If we simply
want to test for inequality, we use a two-sided test. But sometimes, e.g., if we are testing a new process,
we may only be interested in whether its variability exceeds the variability σ0 of the current process;
then we would use a one-sided test with H0 : σ ≤ σ0 and H1 : σ > σ0.
Since the population variance is just σ², this approach can also be used to test claims about the population
variance.
Example 11.28. A pizza delivery service claims that the standard deviation on delivery times is 15
minutes. The sample standard deviation over a sample size of 25 is 17 minutes. Test the service’s claim
at a 0.05 level of significance.
Step 1: We are asked to test the claim of equality to the claimed time 15 minutes, so this is a 2-sided
test:
• H0 : σ = 15
• H1 : σ ≠ 15.
Step 2: We are told to use a 0.05 level of significance.
Step 3: With a sample size of 25, n − 1 = 24 and s = 17 minutes. We compute the test statistic
\[ \chi^2_t = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(24)17^2}{15^2} = 30.83. \]
Step 4: With α = 0.05 and 24 df, the critical values of χ² are:
• lower: \( \chi^2_{1-\alpha/2,\,n-1} \approx 12.4 \) (from Excel CHISQ.INV(0.025,24)),
• upper: \( \chi^2_{\alpha/2,\,n-1} \approx 39.4 \) (from Excel CHISQ.INV(0.975,24)).
Step 5: Since the test statistic falls within the nonrejection region (i.e., between the critical values), we do
not reject the null hypothesis.
Step 6: We have insufficient evidence to disprove the pizza delivery company’s claim.
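As a hedged check of this example, the following Python sketch recomputes the test statistic and confirms that the quoted critical values 12.4 and 39.4 each cut off a tail area of about α/2 = 0.025. The helper `chi2_cdf` is a hypothetical standard-library substitute for Excel's chi-square functions, based on the series expansion of the regularised lower incomplete gamma function.

```python
from math import exp, gamma


def chi2_cdf(x, df, max_terms=1000):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0
    total = 1.0
    for k in range(1, max_terms):
        term *= y / (a + k)       # next term of the series
        total += term
        if term < 1e-15 * total:  # stop once terms are negligible
            break
    return total * y ** a * exp(-y) / gamma(a + 1)


n, s, sigma0 = 25, 17, 15          # figures from the example
lower_c, upper_c = 12.4, 39.4      # critical values quoted in Step 4

chi2_t = (n - 1) * s ** 2 / sigma0 ** 2
# Two-sided test: reject if the statistic is at or beyond either critical value
reject = chi2_t <= lower_c or chi2_t >= upper_c

lower_area = chi2_cdf(lower_c, n - 1)      # area below the lower critical value
upper_area = 1 - chi2_cdf(upper_c, n - 1)  # area above the upper critical value

print(round(chi2_t, 2), reject)
print(round(lower_area, 3), round(upper_area, 3))
```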
9. Chapter Summary
Check that you can use the critical value and p-value approaches to hypothesis testing for the population mean
(standard deviation known or unknown) and for the population proportion.
Confirm that you know how to check a claim about the population standard deviation (or variance).
Confirm that you can interpret the results of hypothesis testing and you can explain the possible errors
that can occur.