
Chapter 4: Analysis of Variance

Introduction
Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two
or more population (or treatment) means by examining the variances of the samples that are taken.
ANOVA allows one to determine whether the differences between the samples are simply due to
random error (sampling error) or whether there are systematic treatment effects that cause the
mean in one group to differ from the mean in another.
Most of the time ANOVA is used to compare the equality of three or more means; however,
when the means from two samples are compared using ANOVA, it is equivalent to using a t-test
to compare the means of independent samples.
ANOVA compares the means of several populations by partitioning the total sum of squares into
component parts. One-way analysis of variance assesses how much of the overall variation in the
data is attributable to differences between the group means, and compares this with the amount
attributable to differences between individuals in the same group. In other words, the technique
isolates the sources of variability by partitioning the total variance into its components.
The F-distribution
The F-distribution is obtained from the ratio of two independent chi-square variables, each
divided by its degrees of freedom. Assume that X ~ χ²(m) and Y ~ χ²(n) independently. Then

    F = (X/m) / (Y/n) ~ F(m, n)

where F(m, n) denotes the F-distribution with m degrees of freedom in the numerator and n
degrees of freedom in the denominator.

Theorem: Let x1, x2, ..., xn1 be a random sample from a normal population with variance σ1²,
let y1, y2, ..., yn2 be a random sample from a normal population with variance σ2², and let S1²
and S2² denote the two sample variances. Then the random variable

    F = (S1²/σ1²) / (S2²/σ2²)

has an F-distribution with n1 − 1 degrees of freedom in the numerator and n2 − 1 degrees of
freedom in the denominator.
Note: The F-distribution is non-negative and skewed to the right (non-symmetric).
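The ratio definition above can be checked by simulation. This is an illustrative sketch, not part of the chapter: it uses the fact that a chi-square variable with k degrees of freedom is a gamma(k/2, scale 2) variable, and that an F(m, n) variable has theoretical mean n/(n − 2) when n > 2.

```python
import random

def f_sample(m, n, rng):
    """Draw one F(m, n) variate as (X/m)/(Y/n), where X ~ chi-square(m)
    and Y ~ chi-square(n). chi-square(k) equals gamma(shape=k/2, scale=2)."""
    x = rng.gammavariate(m / 2, 2)  # X ~ chi-square(m)
    y = rng.gammavariate(n / 2, 2)  # Y ~ chi-square(n)
    return (x / m) / (y / n)

rng = random.Random(42)
m, n = 5, 10
draws = [f_sample(m, n, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
print(round(mean, 2))  # theoretical mean is n/(n - 2) = 1.25
```

The sample mean of the simulated draws lands close to 1.25, and every draw is non-negative, matching the note above.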
One-Way ANOVA
Assumptions
1. Independent samples are taken using random sampling.
2. For each population, the variable under consideration is normally distributed.
3. The standard deviations of the variable under consideration are the same for all the
populations [assumption of equal variance (homogeneity of variance)].
The null hypothesis is that the several population means are mutually equal. The assumption
underlying the use of the analysis of variance is that the several sample means were obtained
from normally distributed populations having the same variance.
Suppose there are k groups to be compared. Let n observations be obtained from each of the k
groups, and let yij denote the jth (j = 1, 2, ..., n) observation from the ith group (i = 1, 2, ..., k).
Notations

                         Groups
Observation    1      2      . . .    k
1              y11    y21    . . .    yk1
2              y12    y22    . . .    yk2
.              .      .      . . .    .
.              .      .      . . .    .
n              y1n    y2n    . . .    ykn
Total          y1.    y2.    . . .    yk.      y..
Mean           ȳ1.    ȳ2.    . . .    ȳk.      ȳ..

where
yi. = sum of all observations for the ith group,
ȳi. = yi./n = mean of the ith group,
y.. = grand total = Σᵢ yi. = Σᵢ Σⱼ yij (i = 1, ..., k; j = 1, ..., n),
ȳ.. = y../N = grand mean, where N = total number of observations = nk.
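The notation above can be made concrete with a short sketch. The data values here are hypothetical, chosen only to illustrate the group totals yi., group means ȳi., grand total y.., and grand mean ȳ..:

```python
# Hypothetical data: k = 3 groups, n = 4 observations per group.
data = [
    [5, 7, 6, 8],   # group 1
    [9, 8, 10, 9],  # group 2
    [4, 6, 5, 5],   # group 3
]

k = len(data)
n = len(data[0])
N = n * k                                        # total number of observations

group_totals = [sum(group) for group in data]    # y_i. for each group
group_means = [t / n for t in group_totals]      # ybar_i. = y_i. / n
grand_total = sum(group_totals)                  # y.. = sum of all observations
grand_mean = grand_total / N                     # ybar.. = y.. / N

print(group_totals, grand_total, round(grand_mean, 2))
```

For these numbers the group totals are 26, 36, and 20, so the grand total is 82 and the grand mean is 82/12 ≈ 6.83.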
Model:  yij = μ + αi + εij,    i = 1, 2, ..., k;  j = 1, 2, ..., n

where
μ = grand mean,
αi = the ith treatment effect,
εij = random error component, with εij ~ N(0, σ²).
Hypotheses: The null hypothesis for an ANOVA always assumes the population means are
equal. Since the null hypothesis assumes all the means are equal, we reject it if even one mean
differs from the others. We may therefore write the null and alternative hypotheses as:

H0: μ1 = μ2 = ... = μk
H1: μi ≠ μj for at least one pair (i, j)
Calculate the appropriate test statistic

The test statistic in ANOVA is the ratio of the between-group to the within-group variation in
the data. It follows an F-distribution.
Calculation of sums of squares and mean squares
1. Total sum of squares (SST): the total variation in the data. It is the sum of the between-group
   and within-group variation.

       SST = Σᵢ Σⱼ yij² − y..²/N

2. Sum of squares between groups (SSB), also called the treatment sum of squares: the variation
   in the data between the different samples (or treatments).

       SSB = Σᵢ yi.²/n − y..²/N

3. Sum of squares within groups, or error sum of squares (SSE): the variation in the data within
   each individual treatment.

       SSE = Σᵢ Σⱼ (yij − ȳi.)²,  or equivalently  SSE = SST − SSB

The next step in an ANOVA is to compute the "average" sources of variation in the data using
SSB and SSE.
Mean squares
1. Mean square between groups (MSB): the "average between variation" (k is the number of
   columns in the data table, i.e., the number of groups).

       MSB = SSB / (k − 1)

2. Error mean square (MSE): the "average within variation."

       MSE = SSE / (N − k)

For a one-way ANOVA the test statistic is the ratio of MSB to MSE. This is the ratio of the
"average between variation" to the "average within variation," and it is known to follow an
F-distribution. Hence,

       Fcal = MSB / MSE

 The intuition here is relatively straightforward: if the average between variation rises
relative to the average within variation, the F statistic rises, and so does our chance of
rejecting the null hypothesis.
Obtain the Critical Value

 To find the critical value from an F-distribution you must know the numerator (MSB)
and denominator (MSE) degrees of freedom, along with the significance level. FCV has
df1 and df2 degrees of freedom, where df1 = k − 1 is the numerator degrees of freedom
and df2 = N − k is the denominator degrees of freedom.

 Decision Rule: Reject the null hypothesis if F (observed value) > FCV (critical value).

The Analysis of Variance Table (ANOVA table)

Source of Variation    Degrees of freedom    Sum of squares    Mean square    F-ratio
Between groups         k − 1                 SSB               MSB            Fcal = MSB/MSE
Error                  N − k                 SSE               MSE
Total                  N − 1                 SST

Rejection Rule: Reject H0 if Fcal > F(α; k − 1, N − k).
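The calculations summarized in the table can be sketched in code. The function name one_way_anova and the equal-group-size layout are assumptions made for illustration; the formulas are exactly the shortcut formulas for SSB, SSE, MSB, MSE, and Fcal given above:

```python
def one_way_anova(groups):
    """One-way ANOVA for k groups of equal size n (a list of lists).
    Returns (SSB, SSE, SST, MSB, MSE, Fcal)."""
    k = len(groups)
    n = len(groups[0])
    N = n * k
    grand_total = sum(sum(g) for g in groups)           # y..
    cf = grand_total ** 2 / N                           # correction factor y..^2 / N
    sst = sum(y ** 2 for g in groups for y in g) - cf   # total sum of squares
    ssb = sum(sum(g) ** 2 for g in groups) / n - cf     # between-groups sum of squares
    sse = sst - ssb                                     # error sum of squares
    msb = ssb / (k - 1)                                 # mean square between
    mse = sse / (N - k)                                 # mean square error
    return ssb, sse, sst, msb, mse, msb / mse

# Hypothetical data: three groups of four observations each.
groups = [[3, 5, 4, 4], [8, 9, 7, 8], [5, 6, 6, 5]]
ssb, sse, sst, msb, mse, f = one_way_anova(groups)
print(round(f, 2))  # prints 29.4
```

The large F here reflects group means (4, 8, 5.5) that are far apart relative to the small spread within each group.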

Examples
1. Fifteen trainees in a technical program are randomly assigned to three different types of
instructional approaches. The achievement test scores at the end of the program are
given below.
Instructional methods
A1 A2 A3
86 90 82
79 76 68
81 88 73
70 82 71
84 89 81
Total 400 425 375
Mean 80 85 75

Use the analysis of variance procedure to test whether there is a difference between the means. Use α = 0.05.
Solution:
Step 1: State the null and alternative hypotheses.
H0: μ1 = μ2 = μ3
H1: μi ≠ μj for at least one pair (i, j)
α = 0.05
Step 2: Rejection region: Reject H0 if Fcal > F(α; k − 1, N − k) = F(0.05; 2, 12) = 3.88

Step 3: Find the sums of squares.

    Σᵢ Σⱼ yij² = 86² + 79² + ... + 81² = 96,698

    y..²/N = (1200)²/15 = 96,000

    SST = Σᵢ Σⱼ yij² − y..²/N = 96,698 − 96,000 = 698, with N − 1 = 14 degrees of freedom

    SSB = Σᵢ yi.²/n − y..²/N = (400² + 425² + 375²)/5 − 96,000 = 96,250 − 96,000 = 250,
    with k − 1 = 2 degrees of freedom

    SSE = SST − SSB = 698 − 250 = 448

Step 4: Find the mean squares.

    MSB = SSB/(k − 1) = 250/2 = 125  and  MSE = SSE/(N − k) = 448/12 = 37.33

    Fcal = MSB/MSE = 125/37.33 = 3.35
Step 5: ANOVA table

Source of Variation    Degrees of freedom    Sum of squares    Mean square    F-ratio
Between groups         2                     250               125            Fcal = 3.35
Error                  12                    448               37.33
Total                  14                    698

Step 6: Decision: Do not reject H0, since Fcal = 3.35 < 3.88.

Step 7: Conclusion: At α = 0.05, there is no evidence of a difference among the mean scores of the three methods.
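The arithmetic in Steps 3 to 5 can be checked with a short script. This is an illustrative sketch that simply plugs the Example 1 scores into the shortcut formulas:

```python
# Test scores from Example 1, one list per instructional method.
groups = [
    [86, 79, 81, 70, 84],  # A1, total 400
    [90, 76, 88, 82, 89],  # A2, total 425
    [82, 68, 73, 71, 81],  # A3, total 375
]

k, n = len(groups), len(groups[0])
N = k * n
grand_total = sum(sum(g) for g in groups)            # y.. = 1200
cf = grand_total ** 2 / N                            # y..^2 / N = 96,000
sst = sum(y ** 2 for g in groups for y in g) - cf    # 96,698 - 96,000 = 698
ssb = sum(sum(g) ** 2 for g in groups) / n - cf      # 96,250 - 96,000 = 250
sse = sst - ssb                                      # 448
msb, mse = ssb / (k - 1), sse / (N - k)              # 125 and 37.33
f = msb / mse
print(round(f, 2))  # 3.35, below the critical value 3.88
```

Since 3.35 < 3.88, the script reproduces the decision in Step 6: do not reject H0.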
2. A firm wishes to compare three programs for training workers to perform a certain manual
task. Twenty-four new employees are randomly assigned to the training programs, with 8 in
each program. At the end of the training period, a test is conducted to see how quickly
trainees can perform the task. The resulting ANOVA table is given below:

Source of Variation    Degrees of freedom    Sum of squares    Mean square    F-ratio
Between groups         2                     8.43              4.215          Fcal = 23.03
Error                  21                    3.84              0.183
Total                  23                    12.27
At α = 0.05, test the hypothesis that there is a difference in the mean time of performing the task.
Solution:
Step 1: State the null and alternative hypotheses.
H0: μ1 = μ2 = μ3
H1: μi ≠ μj for at least one pair (i, j)
α = 0.05
Step 2: Rejection region: Reject H0 if Fcal > F(α; k − 1, N − k) = F(0.05; 2, 21) = 3.47

Step 3: Find the test statistic: Fcal = MSB/MSE = 4.215/0.183 = 23.03
Step 4: Decision: Reject H0, since Fcal = 23.03 > 3.47.

Step5: Conclusion: At α=0.05, there is a difference in the mean time of performing the task.
Exercise: Given the following ANOVA table:

Source of Variation    Degrees of freedom    Sum of squares    Mean square    F-ratio
Between groups         2                     -                 -              Fcal = 7.04
Error                  -                     -                 32
Total                  11                    -

a. Complete the ANOVA table.
b. At α = 0.05, test whether there is a difference between the means.
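The missing entries can be recovered from the identities built into the ANOVA table: the degrees of freedom add up, each sum of squares equals its degrees of freedom times its mean square, and Fcal = MSB/MSE. The sketch below works the exercise backwards from the entries that are given:

```python
# Given entries from the exercise's ANOVA table.
df_between = 2
df_total = 11
mse = 32.0
f_cal = 7.04

df_error = df_total - df_between    # 11 - 2 = 9
sse = df_error * mse                # SSE = df_error * MSE = 288
msb = f_cal * mse                   # from Fcal = MSB/MSE: MSB = 225.28
ssb = df_between * msb              # SSB = (k - 1) * MSB = 450.56
sst = ssb + sse                     # SST = SSB + SSE = 738.56

print(df_error, sse, msb, ssb, sst)
```

With the table completed, part (b) compares Fcal = 7.04 against the tabulated F(0.05; 2, 9) critical value in the usual way.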
3. A researcher wishes to try three different techniques to lower the blood pressure of
individuals diagnosed with high blood pressure. The subjects are randomly assigned to
three groups: the first group takes medication, the second group exercises, and the third
group follows a special diet. After four weeks, the reduction in each person's blood
pressure is recorded. At α = 0.05, test the claim that there is no difference among the
means. The data are shown.
Determining Which Mean(s) Is/Are Different? OR Multiple Comparisons
If you fail to reject the null hypothesis in an ANOVA, then you are done: you know, with some
level of confidence, that the treatment means are statistically equal. However, if you reject the
null, then you must conduct a separate test to determine which mean(s) is/are different. There are
several techniques for testing the differences between means; the most common are the least
significant difference (LSD) test, the Tukey test, and the Scheffé test.
Fisher’s LSD (Least Significant Difference)
Fisher's LSD is a method for comparing treatment group means after the ANOVA null hypothesis of
equal means has been rejected using the ANOVA F-test. If the F-test fails to reject the null hypothesis,
this procedure should not be used.
ANOVA F-test
Let's assume we have 4 treatment groups A, B, C, and D, and that the ANOVA F-test on their data
exceeds the F critical value, so we reject the null hypothesis. At this point we are interested in
doing pairwise comparisons of the means. That is, we want to test hypotheses of the sort
H0: μA = μB, H0: μA = μC, H0: μA = μD, and so on for each pair.
The LSD method for testing the hypothesis H0: μi = μj proceeds as follows: compute

    LSD = t(α/2; N − k) × √(MSE × (1/ni + 1/nj))

and declare μi and μj significantly different whenever |ȳi. − ȳj.| > LSD.
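The LSD comparison can be sketched with hypothetical numbers. The group means, the MSE, and the table value t(0.025; 15) = 2.131 below are assumptions chosen only for illustration:

```python
import math
from itertools import combinations

# Hypothetical one-way ANOVA results: k = 3 groups, n = 6 observations each,
# so the error degrees of freedom are N - k = 18 - 3 = 15.
means = {"A": 10.0, "B": 12.0, "C": 15.1}   # group sample means
n = 6
mse = 4.5                                    # error mean square from the ANOVA table
t_crit = 2.131                               # t(0.025; 15), two-tailed table value

# LSD = t * sqrt(MSE * (1/ni + 1/nj)); here every group shares the same n.
lsd = t_crit * math.sqrt(mse * (1 / n + 1 / n))

significant = []
for a, b in combinations(means, 2):
    if abs(means[a] - means[b]) > lsd:       # mean difference exceeds the LSD threshold
        significant.append((a, b))

print(round(lsd, 2), significant)
```

With these numbers LSD ≈ 2.61, so A vs C and B vs C are declared significantly different while A vs B (a difference of 2.0) is not.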
Exercise : Suppose the National Transportation Safety Board (NTSB) wants to examine the
safety of compact cars, midsize cars, and full-size cars. It collects a sample of three for each of
the treatments (cars types). Using the hypothetical data provided below, test whether the mean
pressure applied to the driver’s head during a crash test is equal for each types of car. Use α =
5%. Conduct the least significant difference?

Scheffé Test
 To conduct the Scheffé test, you must compare the means two at a time, using all possible
combinations of means. For example, if there are three means, the comparisons ȳ1 vs ȳ2,
ȳ1 vs ȳ3, and ȳ2 vs ȳ3 must be done.
 To find the critical value for the Scheffé test, multiply the critical value for the F-test
by k − 1.
Example: Using the Scheffé test, test each pair of means in Example 12–1 to see whether a specific
difference exists, at α = 0.05.
Solution
Tukey Test
The Tukey test can also be used after the analysis of variance has been completed to make pairwise
comparisons between means when the groups have the same sample size.
The symbol for the test value in the Tukey test is q.

When the absolute value of q is greater than the critical value for the Tukey test, there is a significant
difference between the two means being compared.
Example: Using the Tukey test, test each pair of means in Example 12–1 to see whether a specific
difference exists, at α = 0.05.

 The Scheffé test is the most general, and it can be used when the samples are of different sizes.
Furthermore, the Scheffé test can be used to make more complex comparisons, such as the
average of two group means compared with a third mean.
 However, the Tukey test is more powerful than the Scheffé test for making pairwise comparisons
of the means. A rule of thumb for pairwise comparisons is to use the Tukey test when the
samples are equal in size and the Scheffé test when the samples differ in size.
