Sampling Theory
Hypothesis testing
Note
1) A simple hypothesis specifies the population distribution completely (for example, H0 : μ = 15 for a normal population with known σ), whereas a composite hypothesis does not specify it completely (for example, H1 : μ < 15 or H1 : μ ≠ 15). Separately, a test whose alternative hypothesis is non-directional (≠) is called a two-tailed, two-sided or non-directional test, while a test whose alternative is directional (< or >) is called a one-tailed, one-sided or directional test.
2) Parametric tests such as Z-tests (standard normal tests), t-tests and ANOVA are used on data consisting of actual measurements, i.e. quantitative data, whereas non-parametric tests such as chi-square tests are used on categorical data, i.e. they simply compare frequencies of occurrence. These tests are particularly useful when one wishes to test a theory about qualitative observations, e.g. the type of company or the quality of a product.
3) The level of significance (α) is the probability of rejecting a true null hypothesis, i.e. of committing a Type I error:
$$\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ is true})$$
The critical region is the set of values of the test statistic for which the null hypothesis is rejected.
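The critical values quoted in the examples of this section (1.96, 2.33 and 2.58) are read from standard normal tables. As a minimal sketch, they can also be computed with NormalDist from Python's standard statistics module (Python 3.8+); the helper name z_critical is our own, not a library function.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_tailed: bool) -> float:
    """Return the positive critical value z_t for a standard normal test.

    For a two-tailed test, alpha is split equally between both tails;
    for a one-tailed test, all of alpha sits in a single tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf returns the z such that P(Z <= z) = 1 - tail_area
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: one-tailed {z_critical(alpha, False):.3f}, "
          f"two-tailed {z_critical(alpha, True):.3f}")
```

Running it prints 1.645 and 1.960 for α = 0.05, and 2.326 and 2.576 for α = 0.01; tables usually round the last two to 2.33 and 2.58.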
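Difference between means:
$$s_{\bar{x}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad z = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}}}$$
Difference between proportions:
$$s_p = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}, \qquad z = \frac{p_1 - p_2}{s_p}$$
where p is the pooled sample proportion and q = 1 − p.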
Example
The management of a local health club claims that its members lose on the average 15 pounds or
more within the first 3 months after joining the club. To check this claim, a consumer agency took
a random sample of 45 members of this health club and found that they lost an average of 13.8
pounds within the first 3 months of membership, with a standard deviation of 4.2 pounds.
(a) Find the p-value for this test.
(b) Based on the p-value in (a), would you reject the null hypothesis at α = 0.01?
Solution
(a) Let μ be the true mean weight loss, in pounds, within the first 3 months.
Then we have to test the hypotheses
H0 : μ = 15 versus Ha : μ < 15
Here n = 45, x̄ = 13.8 and s = 4.2. Because n = 45 > 30, we can use the normal
approximation.
Hence, the test statistic is
$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{13.8 - 15}{4.2/\sqrt{45}} = -1.9166 \approx -1.92$$
and the p-value is
$$P(Z < -1.92) = 0.0274$$
Thus, we can use an α as small as 0.0274 and still reject H0.
(b) No. Because the p-value = 0.0274 is greater than α = 0.01, one cannot reject H0.
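The calculation in (a) and the decision in (b) can be checked with a short sketch using only Python's standard library (statistics.NormalDist for the normal CDF). It keeps z unrounded, so the p-value comes out close to 0.0276 rather than the table value 0.0274 obtained from z ≈ −1.92; the decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist

# Health-club example: H0: mu = 15 versus Ha: mu < 15 (lower-tailed)
n, x_bar, s, mu0 = 45, 13.8, 4.2, 15

se = s / sqrt(n)                 # standard error of the mean, s / sqrt(n)
z = (x_bar - mu0) / se           # test statistic, about -1.9166
p_value = NormalDist().cdf(z)    # lower-tail area P(Z < z)

alpha = 0.01
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```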
Example
It is claimed that sports-car owners drive on the average 18,000 miles per year. A consumer firm
believes that the average mileage is probably lower. To check, the consumer firm obtained
information from 40 randomly selected sports-car owners that resulted in a sample mean of
17,463 miles with a sample standard deviation of 1348 miles. What can we conclude about this
claim? Use α = 0.01.
Solution
Let μ be the true population mean. We can formulate the hypotheses as
H0 : μ = 18,000 versus H1 : μ < 18,000.
The observed test statistic (for n ≥ 30) is
$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{17463 - 18000}{1348/\sqrt{40}} = -2.52$$
The rejection region is {z < −z0.01} = {z < −2.33}.
Decision: Because z = −2.52 is less than −2.33, the null hypothesis is rejected at α = 0.01. There
is sufficient evidence to conclude that the mean mileage of sports cars is less than 18,000 miles
per year.
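A minimal sketch of the same lower-tailed test, again assuming only Python's standard library. Instead of a table lookup, the critical value is taken from NormalDist().inv_cdf(α), which gives the point with area α to its left.

```python
from math import sqrt
from statistics import NormalDist

# Sports-car example: H0: mu = 18000 versus H1: mu < 18000 (lower-tailed)
n, x_bar, s, mu0, alpha = 40, 17463, 1348, 18000, 0.01

z = (x_bar - mu0) / (s / sqrt(n))      # test statistic, about -2.52
z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value, about -2.33

reject = z < z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

This prints z = -2.52, critical value = -2.33, reject H0: True, matching the decision above.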
Example
A soda manufacturer uses machines that dispense soda into bottles at an average rate of 1196
litres per minute. Management acquired a new machine, which was tested on a sample of 32 hourly
production runs and produced an average of 1234 litres per minute with a standard deviation of
101. The production manager is convinced that the new machine
(i) does not affect production (test at the 5% level of significance);
(ii) is better than the old machine (test at the 1% level of significance).
Solution
$$S_{\bar{x}} = \frac{\text{standard deviation}}{\sqrt{\text{sample size}}} = \frac{101}{\sqrt{32}} = 17.8544$$
$$Z_c = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{1234 - 1196}{17.8544} = 2.1283$$
(i) H0 : μ = 1196 versus H1 : μ ≠ 1196 (two-tailed), where μ is the mean output of the new machine
Zc = 2.1283
and Zt = 1.96
Since the calculated value is greater than the tabulated value, we reject H0.
This implies that the new machine affects production.
(ii) H0 : μ = 1196 versus H1 : μ > 1196 (one-tailed)
Zc = 2.1283
and Zt = 2.33
Since the calculated value is less than the tabulated value, we fail to reject H0.
This implies that, at the 1% level, there is no significant evidence that the new machine is better than the old one.
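Both parts use the same statistic Zc and differ only in the critical value, which is where the two-tailed and one-tailed tests part ways. A minimal standard-library sketch of both decisions:

```python
from math import sqrt
from statistics import NormalDist

# Soda-machine example: old mean 1196, new machine sampled over 32 runs
n, x_bar, s, mu0 = 32, 1234, 101, 1196

se = s / sqrt(n)          # standard error, about 17.8544
z_c = (x_bar - mu0) / se  # calculated value, about 2.1283

# (i) two-tailed test at 5%: H1: mu != 1196
z_t_i = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96
reject_i = abs(z_c) > z_t_i

# (ii) upper-tailed test at 1%: H1: mu > 1196
z_t_ii = NormalDist().inv_cdf(1 - 0.01)      # about 2.33
reject_ii = z_c > z_t_ii

print(f"Z_c = {z_c:.4f}")
print(f"(i)  reject H0 at 5% (two-tailed): {reject_i}")
print(f"(ii) reject H0 at 1% (one-tailed): {reject_ii}")
```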
Example
In a frequently traveled stretch of the I-75 highway, where the posted speed is 70 mph, it is
thought that people travel at an average of at least 75 mph. To check this claim, the following
radar measurements of the speeds (in mph) are obtained for 10 vehicles traveling on this stretch
of the interstate highway: 66, 74, 79, 80, 69, 77, 78, 65, 79 and 81.
Do the data provide sufficient evidence to indicate that the mean speed at which people travel on
this stretch of highway is at least 75 mph? Test the appropriate hypothesis using α = 0.01. Draw
Solution
We need to test H0 : μ = 75 vs. Ha : μ > 75
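The worked solution stops after stating the hypotheses. Because n = 10 is small and σ is unknown, a one-sample t-test with n − 1 = 9 degrees of freedom is the usual choice; this sketch assumes the speeds are roughly normally distributed. Python's standard library has no t distribution, so the critical value t0.01,9 = 2.821 is hard-coded from a standard t-table.

```python
from math import sqrt
from statistics import mean, stdev

# Radar speeds (mph) for the 10 vehicles
speeds = [66, 74, 79, 80, 69, 77, 78, 65, 79, 81]
mu0 = 75        # H0: mu = 75 versus Ha: mu > 75
t_crit = 2.821  # t_{0.01, 9} from a t-table (upper tail, 9 df)

n = len(speeds)
x_bar = mean(speeds)                 # sample mean, 74.8
s = stdev(speeds)                    # sample standard deviation (n - 1 divisor)
t = (x_bar - mu0) / (s / sqrt(n))    # one-sample t statistic with n - 1 df

print(f"x_bar = {x_bar:.1f}, s = {s:.4f}, t = {t:.4f}")
print("reject H0" if t > t_crit else "do not reject H0")
```

The sample mean of 74.8 mph is already below 75, so t is slightly negative (about −0.11) and H0 is not rejected: the data give no evidence that the mean speed exceeds 75 mph.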
Example
Current statistics show that 12% of the population is believed to be infected with the COVID-19
virus. In a random sample of 80 persons screened for the virus while in transit, 9 tested positive for
the virus. At the 5% level of significance, is the claim of the current statistics supported?
Solution
H0 : p = 0.12 versus H1 : p ≠ 0.12, where p is the true proportion of the population that is infected
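The solution stops at the hypotheses. The sketch below carries it through with the standard one-sample proportion z-test, assuming the normal approximation is adequate (n·p0 = 9.6 and n(1 − p0) = 70.4 are both at least 5) and using the hypothesised p0 = 0.12 in the standard error.

```python
from math import sqrt
from statistics import NormalDist

# COVID-19 screening example: H0: p = 0.12 versus H1: p != 0.12
n, positives, p0, alpha = 80, 9, 0.12, 0.05

p_hat = positives / n                 # sample proportion, 0.1125
se = sqrt(p0 * (1 - p0) / n)          # standard error under H0
z = (p_hat - p0) / se                 # test statistic
z_t = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.96

print(f"p_hat = {p_hat:.4f}, z = {z:.4f}")
print("reject H0" if abs(z) > z_t else "do not reject H0")
```

Here |z| ≈ 0.21 is well below 1.96, so at the 5% level the sample is consistent with the 12% figure.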
Example
Research has shown that, before the introduction of universal free primary education, 60 out of 85
children aged below 10 years were going to school. After the introduction of the scheme, 37 out of
53 pupils were found to be going to school. Is the scheme effective at 99% confidence?
Solution
H0 : p2 = p1 versus H1 : p2 ≠ p1
Let p1 = proportion before the introduction of the scheme = 60/85 = 0.7059
and p2 = proportion after the introduction of the scheme = 37/53 = 0.69811
Let p = pooled proportion of those who go to school = (60 + 37)/(85 + 53) = 97/138 = 0.7029,
which gives q = 1 − p = 1 − 0.7029 = 0.2971.
The standard error of the difference in proportions is
$$s_p = S_{p_1 - p_2} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}$$
Therefore
$$s_p = \sqrt{\frac{0.7029 \times 0.2971}{85} + \frac{0.7029 \times 0.2971}{53}} = 0.07998$$
$$Z_c = \frac{p_2 - p_1}{s_p} = \frac{0.69811 - 0.7059}{0.07998} = -0.0974$$
and Zt = 2.58
Since |Zc| = 0.0974 is less than the tabulated value, we fail to reject H0.
This implies that the scheme has not been shown to be effective, i.e. there is no significant difference between the two proportions.
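A minimal standard-library sketch of the pooled two-proportion z-test used above. It works from the raw counts at full precision, so Zc comes out near −0.0971 rather than the −0.0974 obtained from the rounded hand values; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

# School-attendance example: before (1) and after (2) the scheme
x1, n1 = 60, 85
x2, n2 = 37, 53
alpha = 0.01

p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)                  # pooled proportion, 97/138
q = 1 - p
se = sqrt(p * q / n1 + p * q / n2)         # standard error of p2 - p1
z_c = (p2 - p1) / se
z_t = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 2.58

print(f"se = {se:.5f}, Z_c = {z_c:.4f}")
print("reject H0" if abs(z_c) > z_t else "do not reject H0")
```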
Example
The following information was obtained from the accounts department of a sugar processing plant,
in relation to the average salary paid to workers in two departments of the plant.
Department                Production    Human resource
Sample size               45            32
Average salary ($)        950           1000
Standard deviation ($)    102           85
At 99% confidence, determine whether there is a significant difference between the average pay in
the two departments.
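Solution
H0 : μ1 = μ2 versus H1 : μ1 ≠ μ2
The standard error of the difference between the two means is
$$s_{\bar{x}} = S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Therefore
$$s_{\bar{x}} = \sqrt{\frac{102^2}{45} + \frac{85^2}{32}} = 21.3771$$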
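The worked solution stops at the standard error. A sketch that completes it with the same two-sample z-test, assuming the normal approximation is acceptable (n2 = 32 is only just above 30, so this is borderline):

```python
from math import sqrt
from statistics import NormalDist

# Salary example: production (1) versus human resource (2)
n1, mean1, s1 = 45, 950, 102
n2, mean2, s2 = 32, 1000, 85
alpha = 0.01

se = sqrt(s1**2 / n1 + s2**2 / n2)         # standard error, about 21.3771
z_c = (mean1 - mean2) / se                 # calculated value
z_t = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 2.58

print(f"se = {se:.4f}, Z_c = {z_c:.4f}")
print("reject H0" if abs(z_c) > z_t else "do not reject H0")
```

Here |Zc| ≈ 2.34 is below 2.58, so at 99% confidence the difference in average pay between the two departments is not significant.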
t-statistical test