Chapter 4: Analysis of Variance (ANOVA)
Introduction
Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two
or more population (or treatment) means by examining the variances of samples that are taken.
ANOVA allows one to determine whether the differences between the samples are simply due to
random error (sampling errors) or whether there are systematic treatment effects that cause the
mean in one group to differ from the mean in another.
Most of the time ANOVA is used to compare three or more means; however, when the means of two
samples are compared using ANOVA, the result is equivalent to a t-test for the means of two
independent samples.
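To illustrate this equivalence, here is a minimal sketch (assuming NumPy and SciPy are available; the two samples are made-up illustrative values): for two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic, and the two tests give the same p-value.

```python
import numpy as np
from scipy import stats

# Two illustrative samples (hypothetical data)
group_a = np.array([23.0, 25.5, 21.0, 24.5, 26.0])
group_b = np.array([20.0, 22.5, 19.5, 21.0, 23.0])

# One-way ANOVA with two groups
f_stat, f_p = stats.f_oneway(group_a, group_b)

# Pooled-variance (independent-samples) t-test
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=True)

print(f"F = {f_stat:.4f}, t^2 = {t_stat**2:.4f}")        # F equals t squared
print(f"p-values: ANOVA {f_p:.4f}, t-test {t_p:.4f}")     # identical p-values
```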
One-way analysis of variance compares the means of several populations by partitioning the total
sum of squares. It assesses how much of the overall variation in the data is attributable to
differences between the group means and compares this with the amount attributable to differences
between individuals within the same group. The technique therefore requires partitioning the total
variance into its component parts.
The F-distribution
The F-distribution is obtained as the ratio of two independent chi-square variables, each divided by
its degrees of freedom. Assume that $X \sim \chi^2_m$ and $Y \sim \chi^2_n$ independently. Then
$$F = \frac{X/m}{Y/n} \sim F_{m,n},$$
where $F_{m,n}$ denotes the F-distribution with $m$ degrees of freedom in the numerator and $n$ degrees of freedom in the denominator.
Theorem: Let $x_1, x_2, \ldots, x_{n_1}$ be a random sample from a normal population with variance $\sigma_1^2$ and
$y_1, y_2, \ldots, y_{n_2}$ be a random sample from a normal population with variance $\sigma_2^2$, and let $S_1^2$ and $S_2^2$
denote the two sample variances. Then the random variable
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F-distribution with $n_1 - 1$ degrees of freedom in the numerator and $n_2 - 1$ degrees of freedom in the denominator.
Note: The F-distribution is non-negative and non-symmetric.
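A small simulation sketch of this construction (assuming NumPy and SciPy are available): the ratio of two independent chi-square variables, each divided by its degrees of freedom, has quantiles matching the theoretical $F_{m,n}$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n = 4, 12          # numerator and denominator degrees of freedom

# Simulate F = (X/m) / (Y/n) with X ~ chi-square(m), Y ~ chi-square(n)
x = stats.chi2.rvs(df=m, size=100_000, random_state=rng)
y = stats.chi2.rvs(df=n, size=100_000, random_state=rng)
f_sim = (x / m) / (y / n)

# Compare simulated quantiles with the theoretical F(m, n) quantiles
for q in (0.50, 0.90, 0.95):
    print(f"q={q}: simulated {np.quantile(f_sim, q):.3f}, "
          f"theoretical {stats.f.ppf(q, m, n):.3f}")
```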
One-Way ANOVA
Assumptions
1. Independent samples are taken using random sampling.
2. For each population, the variable under consideration is normally distributed.
3. The standard deviations of the variable under consideration are the same for all the
populations [assumption of equal variance (homogeneity of variance)].
The null hypothesis is that the several population means are mutually equal. The assumption
underlying the use of the analysis of variance is that the several sample means were obtained
from normally distributed populations having the same variance.
Suppose there are $k$ groups to be compared and that $n$ observations are obtained from each of the $k$
groups. Let $y_{ij}$ denote the $j$th ($j = 1, 2, \ldots, n$) observation from the $i$th group ($i = 1, 2, \ldots, k$).
Notations

                        Groups
Observation    1              2              ...    k
1              $y_{11}$       $y_{21}$       ...    $y_{k1}$
2              $y_{12}$       $y_{22}$       ...    $y_{k2}$
...            ...            ...            ...    ...
n              $y_{1n}$       $y_{2n}$       ...    $y_{kn}$
Total          $y_{1.}$       $y_{2.}$       ...    $y_{k.}$       $y_{..}$
Mean           $\bar{y}_{1.}$   $\bar{y}_{2.}$   ...    $\bar{y}_{k.}$   $\bar{y}_{..}$
where $y_{i.} = \sum_{j=1}^{n} y_{ij}$ is the sum of all observations in the $i$th group,
$\bar{y}_{i.} = \dfrac{y_{i.}}{n}$ is the mean of the $i$th group,
$y_{..} = \sum_{i=1}^{k} y_{i.} = \sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}$ is the grand total, and
$\bar{y}_{..} = \dfrac{y_{..}}{N}$ is the grand mean, where $N = nk$ is the total number of observations.
Model: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$,  $i = 1, 2, \ldots, k$;  $j = 1, 2, \ldots, n$
where $\mu$ = grand mean,
$\alpha_i$ = the $i$th treatment effect,
$\varepsilon_{ij}$ = random error component, with $\varepsilon_{ij} \sim N(0, \sigma^2)$.
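The model can be made concrete with a short simulation (a sketch only; the values chosen for $\mu$, the treatment effects $\alpha_i$, and $\sigma$ are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)

mu = 50.0                             # grand mean (illustrative value)
alpha = np.array([0.0, 3.0, -2.0])    # treatment effects for k = 3 groups
sigma = 4.0                           # common error standard deviation
n = 5                                 # observations per group

# y_ij = mu + alpha_i + eps_ij, with eps_ij ~ N(0, sigma^2)
k = len(alpha)
eps = rng.normal(0.0, sigma, size=(k, n))
y = mu + alpha[:, None] + eps         # row i holds the n observations of group i

print(y.round(2))
print("group means:", y.mean(axis=1).round(2))
```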
Hypotheses: The null hypothesis for an ANOVA always assumes that the population means are equal.
Since the null hypothesis assumes all the means are equal, we reject it if even one mean differs from
the others. The null and alternative hypotheses are therefore:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1: \mu_i \neq \mu_j$ for at least one pair $(i, j)$
Sums of squares
1. Total sum of squares (SST): the total variation in the data.
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}$$
2. Sum of squares between groups (SSB), also called the treatment sum of squares: the variation in
the data between the different samples (treatments).
$$SSB = \sum_{i=1}^{k} \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N}$$
3. Sum of squares within groups, or error sum of squares (SSE): the variation in the data within
each individual treatment.
$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$
or equivalently $SSE = SST - SSB$.
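The three sums of squares can be computed directly from these definitions. The sketch below (assuming the data are arranged as a $k \times n$ NumPy array with one row per group, and using made-up numbers) also checks the identity $SST = SSB + SSE$:

```python
import numpy as np

def sums_of_squares(y):
    """y: k x n array, row i holding the n observations of group i (equal group sizes)."""
    k, n = y.shape
    grand_mean = y.mean()
    group_means = y.mean(axis=1)

    sst = ((y - grand_mean) ** 2).sum()                    # total variation
    ssb = n * ((group_means - grand_mean) ** 2).sum()      # between-group variation
    sse = ((y - group_means[:, None]) ** 2).sum()          # within-group variation
    return sst, ssb, sse

# Arbitrary illustrative data: 3 groups of 5 observations
y = np.array([[18., 20., 22., 19., 21.],
              [25., 27., 24., 26., 28.],
              [20., 22., 21., 23., 19.]])
sst, ssb, sse = sums_of_squares(y)
print(f"SST={sst:.2f}  SSB={ssb:.2f}  SSE={sse:.2f}  SSB+SSE={ssb + sse:.2f}")
```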
The next step in an ANOVA is to compute the "average" sources of variation in the data using SSB
and SSE.
Mean squares
1. Mean square between groups (MSB): the "average between variation" ($k$ is the number of groups,
i.e., the number of columns in the data table).
$$MSB = \frac{SSB}{k-1}$$
2. Error mean square (MSE): the "average within variation".
$$MSE = \frac{SSE}{N-k}$$
For a one-way ANOVA the test statistic is the ratio of MSB to MSE, that is, the ratio of the
"average between variation" to the "average within variation." This ratio is known to follow an
F-distribution. Hence,
$$F_{cal} = \frac{MSB}{MSE}$$
The intuition here is relatively straightforward. If the average between variation rises
relative to the average within variation, the F statistic will rise and so will our chance of
rejecting the null hypothesis.
Obtain the Critical Value
To find the critical value from an F distribution you must know the numerator (MSTR)
and denominator (MSE) degrees of freedom, along with the significance level. FCV has
df1 and df2 degrees of freedom, where df1 is the numerator degrees of freedom equal to
K-1 and df2 is the denominator degrees of freedom equal to N-K.
Decision Rule: Reject the null hypothesis if $F_{cal}$ (observed value) > $F_{CV}$ (critical value).
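Putting the steps together, the following sketch carries out the whole one-way ANOVA test for equal group sizes (illustrative data only; the critical value comes from scipy.stats.f, and scipy.stats.f_oneway is used purely as a cross-check):

```python
import numpy as np
from scipy import stats

def one_way_anova(y, alpha=0.05):
    """One-way ANOVA for a k x n array y (equal group sizes)."""
    k, n = y.shape
    N = k * n
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    ssb = n * ((group_means - grand_mean) ** 2).sum()   # between-group sum of squares
    sse = ((y - group_means[:, None]) ** 2).sum()       # error sum of squares
    msb = ssb / (k - 1)                                 # mean square between groups
    mse = sse / (N - k)                                 # error mean square
    f_cal = msb / mse
    f_crit = stats.f.ppf(1 - alpha, k - 1, N - k)       # critical value F(alpha; k-1, N-k)
    return f_cal, f_crit, f_cal > f_crit

# Arbitrary illustrative data: 3 groups of 5 observations
y = np.array([[18., 20., 22., 19., 21.],
              [25., 27., 24., 26., 28.],
              [20., 22., 21., 23., 19.]])
f_cal, f_crit, reject = one_way_anova(y)
print(f"F_cal = {f_cal:.3f}, critical value = {f_crit:.3f}, reject H0: {reject}")

# Cross-check against SciPy's built-in one-way ANOVA
print(stats.f_oneway(*y))
```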
1. Three methods each produced five scores, with group totals of 400, 425, and 375 (grand total
1200). Use the analysis of variance procedure to test whether there is a difference between the
mean scores, at α = 0.05.
Solution:
Step1: State the null and alternative hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1: \mu_i \neq \mu_j$ for at least one pair $(i, j)$
$\alpha = 0.05$
Step2: Rejection region: Reject $H_0$ if $F_{cal} > F_{\alpha,\,k-1,\,N-k} = F_{0.05,\,2,\,12} = 3.88$
Step3: Compute the sums of squares:
$$\frac{y_{..}^2}{N} = \frac{(1200)^2}{15} = 96{,}000$$
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N} = 96{,}698 - 96{,}000 = 698 \quad \text{with } N - 1 = 14 \text{ degrees of freedom}$$
$$SSB = \sum_{i=1}^{k} \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N} = \frac{(400)^2}{5} + \frac{(425)^2}{5} + \frac{(375)^2}{5} - 96{,}000 = 250 \quad \text{with } k - 1 = 2 \text{ degrees of freedom}$$
$$SSE = SST - SSB = 698 - 250 = 448 \quad \text{with } N - k = 12 \text{ degrees of freedom}$$
Step4: Compute the mean squares: $MSB = \dfrac{SSB}{k-1} = \dfrac{250}{2} = 125$ and $MSE = \dfrac{SSE}{N-k} = \dfrac{448}{12} = 37.33$
Step5: Compute the test statistic: $F_{cal} = \dfrac{MSB}{MSE} = \dfrac{125}{37.33} = 3.35$
Step6: Decision: Do not reject $H_0$, since $F_{cal} = 3.35 < 3.88$
Step7: Conclusion: At α = 0.05, the three methods yield the same mean score.
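As an arithmetic check on this example (a short sketch using only the sums of squares obtained above):

```python
from scipy import stats

k, N = 3, 15
ssb, sse = 250.0, 448.0
msb = ssb / (k - 1)                         # mean square between groups
mse = sse / (N - k)                         # error mean square
f_cal = msb / mse                           # about 3.35
f_crit = stats.f.ppf(0.95, k - 1, N - k)    # critical value F(0.05; 2, 12)
print(f"F_cal = {f_cal:.2f}, F_crit = {f_crit:.2f}, reject H0: {f_cal > f_crit}")
```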
2. A firm wishes to compare three programs for training workers to perform a certain manual
task. Twenty-four new employees are randomly assigned to the training programs, with 8 in
each program. At the end of the training period, a test is conducted to see how quickly
trainees can perform the task. The resulting ANOVA table is given below:

Source of Variation    Degrees of freedom    Sum of squares    Mean square    F-ratio
Between groups         2                     8.43              4.215          $F_{cal} = 23.03$
Error                  21                    3.84              0.183
Total                  23                    12.27

At α = 0.05, test the hypothesis that there is a difference in the mean time of performing the task.
Solution:
Step1: State the null and alternative hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1: \mu_i \neq \mu_j$ for at least one pair $(i, j)$
$\alpha = 0.05$
Step2: Rejection region: Reject $H_0$ if $F_{cal} > F_{\alpha,\,k-1,\,N-k} = F_{0.05,\,2,\,21} = 3.47$
Step3: Find the test statistic: $F_{cal} = \dfrac{MSB}{MSE} = \dfrac{4.215}{0.183} = 23.03$
Step4: Decision: Reject $H_0$, since $F_{cal} = 23.03 > 3.47$
Step5: Conclusion: At α = 0.05, there is a difference in the mean time of performing the task.
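The reported F-ratio, critical value, and decision can be verified from the ANOVA table alone (a short sketch; the p-value is obtained from the F survival function):

```python
from scipy import stats

dfn, dfd = 2, 21
f_cal = 4.215 / 0.183                    # MSB / MSE from the ANOVA table, about 23.03
f_crit = stats.f.ppf(0.95, dfn, dfd)     # about 3.47
p_value = stats.f.sf(f_cal, dfn, dfd)    # P(F > f_cal)
print(f"F_cal = {f_cal:.2f}, F_crit = {f_crit:.2f}, p = {p_value:.5f}")
```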
Exercise: Given the following ANOVA table:

Source of Variation    Degrees of freedom    Sum of squares    Mean square    F-ratio
Between groups         2                     -                 -              $F_{cal} = 7.04$
Error                  -                     -                 32
Total                  11                    -

a. Complete the ANOVA table.
b. At α = 0.05, test whether there is a difference between the means.
3. A researcher wishes to try three different techniques to lower the blood pressure of
individuals diagnosed with high blood pressure. The subjects are randomly assigned to
three groups; the first group takes medication, the second group exercises, and the third
group follows a special diet. After four weeks, the reduction in each person’s blood
pressure is recorded. At α = 0.05, test the claim that there is no difference among the
means. The data are shown.
Scheffé Test
To conduct the Scheffé test, you must compare the means two at a time, using all possible
combinations of means. For example, if there are three means, the following comparisons must be
done: $\bar{X}_1$ versus $\bar{X}_2$, $\bar{X}_1$ versus $\bar{X}_3$, and $\bar{X}_2$ versus $\bar{X}_3$. For each pair the test value is
$$F_s = \frac{(\bar{X}_i - \bar{X}_j)^2}{MSE\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$$
To find the critical value for the Scheffé test, multiply the critical value for the F test
by $k - 1$; that is, $F' = (k - 1) \cdot F_{\alpha,\,k-1,\,N-k}$. A pair of means differs significantly when $F_s$ exceeds $F'$.
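A sketch of the Scheffé pairwise comparisons based on the test value above (the group means, sample sizes, and MSE used here are arbitrary placeholders, not taken from the examples in this chapter):

```python
from itertools import combinations
from scipy import stats

# Arbitrary placeholder summary statistics
means = [12.0, 15.5, 11.0]     # group sample means
sizes = [6, 6, 6]              # group sample sizes
mse = 4.2                      # error mean square from the ANOVA
k = len(means)
N = sum(sizes)

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, k - 1, N - k)
scheffe_crit = (k - 1) * f_crit            # critical value for the Scheffe test

for i, j in combinations(range(k), 2):
    f_s = (means[i] - means[j]) ** 2 / (mse * (1 / sizes[i] + 1 / sizes[j]))
    sig = "significant" if f_s > scheffe_crit else "not significant"
    print(f"group {i + 1} vs group {j + 1}: F_s = {f_s:.2f} "
          f"({sig}, critical value {scheffe_crit:.2f})")
```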
Example: Using the Scheffé test, test each pair of means in Example 12–1 to see whether a specific
difference exists, at α=0.05.
Solution
Tukey Test
The Tukey test can also be used after the analysis of variance has been completed to make pairwise
comparisons between means when the groups have the same sample size.
The symbol for the test value in the Tukey test is $q$:
$$q = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MSE/n}}$$
where $n$ is the common group size. When the absolute value of $q$ is greater than the critical value
for the Tukey test, there is a significant difference between the two means being compared.
Example: Using the Tukey test, test each pair of means in Example 12–1 to see whether a specific
difference exists, at α=0.05.
The Scheffé test is the most general, and it can be used when the samples are of different sizes.
Furthermore, the Scheffé test can be used to make comparisons such as the average of $\bar{X}_1$ and $\bar{X}_2$
compared with $\bar{X}_3$.
However, the Tukey test is more powerful than the Scheffé test for making pairwise comparisons
of the means. A rule of thumb for pairwise comparisons is to use the Tukey test when the
samples are equal in size and the Scheffé test when the sample sizes differ.
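Finally, a sketch of the Tukey pairwise comparisons for equal group sizes, using the test value $q$ defined above and a critical value from the studentized range distribution (scipy.stats.studentized_range, available in SciPy 1.7 and later); the summary numbers are arbitrary placeholders:

```python
import math
from itertools import combinations
from scipy import stats

# Arbitrary placeholder summary statistics
means = [48.2, 54.6, 50.1]     # group sample means
n = 8                          # common group size (required for this form of the test)
mse = 6.5                      # error mean square from the ANOVA
k = len(means)
dfe = k * (n - 1)              # error degrees of freedom, N - k

alpha = 0.05
q_crit = stats.studentized_range.ppf(1 - alpha, k, dfe)   # Tukey critical value

for i, j in combinations(range(k), 2):
    q = abs(means[i] - means[j]) / math.sqrt(mse / n)
    sig = "significant" if q > q_crit else "not significant"
    print(f"group {i + 1} vs group {j + 1}: |q| = {q:.2f} "
          f"({sig}, critical value {q_crit:.2f})")
```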