Title: One-way Anova: Inferences about More than Two Population Means
1. The greatest blessing in life is in giving and not taking.
1-Way ANOVA
2. One-Way Analysis of Variance
Y = DEPENDENT VARIABLE (yield; response variable; quality indicator)
X = INDEPENDENT VARIABLE (a possibly influential FACTOR)
3. OBJECTIVE: To determine the impact of X on Y
Mathematical Model: $Y = f(X, \varepsilon)$, where $\varepsilon$ = (impact of) all factors other than X
Ex: Y = Battery Life (hours)
X = Brand of Battery
$\varepsilon$ = Many other factors (possibly, some we are unaware of)
4. Completely Randomized Design (CRD)
- Goal: to study the effect of Factor X
- The same number of observations is taken randomly and independently from the individuals at each level of Factor X, i.e., $n_1 = n_2 = \cdots = n_c$ (c levels)
5. Example: Y = LIFETIME (HOURS)
X = BRAND
3 replications per level
6. Analysis of Variance
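7. Statistical Model
C levels of BRAND, R observations for each level
Data: $Y_{11}, Y_{12}, \ldots, Y_{1R}$ (level 1); $Y_{21}, \ldots$ (level 2); ...; $Y_{C1}, \ldots, Y_{CR}$ (level C), with generic entry $Y_{ij}$
$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, \ldots, C$, $j = 1, \ldots, R$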
8. Where
$\mu$ = OVERALL AVERAGE
i = index for FACTOR (Brand) LEVEL
j = index for replication
$\tau_i$ = Differential effect associated with the ith level of X (Brand i) = $\mu_i - \mu$
$\varepsilon_{ij}$ = noise or error due to other factors associated with the (i,j)th data value
$\mu_i$ = AVERAGE associated with the ith level of X (brand i)
$\mu$ = AVERAGE of the $\mu_i$'s
9. $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
By definition, $\sum_{i=1}^{C} \tau_i = 0$
The experiment produces R x C $Y_{ij}$ data values.
The analysis produces estimates of $\mu, \tau_1, \tau_2, \ldots, \tau_C$. (We can then get estimates of the $\varepsilon_{ij}$ by subtraction.)
10. Let $\bar{Y}_1$, $\bar{Y}_2$, etc., be level means
$\bar{Y} = \sum_{i=1}^{C} \bar{Y}_i / C$ = GRAND MEAN (assuming the same number of data points in each column) (otherwise, $\bar{Y}$ = mean of all the data)
11. MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
$\bar{Y}$ estimates $\mu$
$\bar{Y}_i - \bar{Y}$ estimates $\tau_i$ ($= \mu_i - \mu$) (for all i)
These estimates are based on Gauss' (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE
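The estimates above can be sketched in a few lines of Python. This is only an illustration: it uses the three brands whose raw lifetimes appear in the deck's SSW slide (brands 1, 2 and 8), so the grand mean printed here belongs to that three-brand subset and is not the deck's eight-brand value of 5.8. The function name `estimate_effects` is hypothetical.

```python
from statistics import mean

# Lifetimes (hours) for the three brands whose raw data are listed in the deck.
# The other five brands are not listed, so this is a subset only.
lifetimes = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    8: [9.0, 7.4, 5.8],
}

def estimate_effects(data):
    """Return (grand_mean, level_means, effects) for equal-sized groups."""
    level_means = {brand: mean(values) for brand, values in data.items()}
    # Averaging the level means is valid because every level has the same R.
    grand_mean = mean(list(level_means.values()))
    effects = {brand: m - grand_mean for brand, m in level_means.items()}
    return grand_mean, level_means, effects

grand_mean, level_means, effects = estimate_effects(lifetimes)
for brand in lifetimes:
    print(brand, round(level_means[brand], 2), round(effects[brand], 2))
```

The estimated effects always sum to zero, mirroring the constraint that the $\tau_i$ sum to zero.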
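12. MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{Y} + (\bar{Y}_i - \bar{Y}) + \hat{\varepsilon}_{ij}$,
it follows that our estimate of $\varepsilon_{ij}$ is
(2) $\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_i$, called the residual.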
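13. Then, $Y_{ij} = \bar{Y} + (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i)$, or,
(3) $(Y_{ij} - \bar{Y}) = (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i)$
$(Y_{ij} - \bar{Y})$: TOTAL VARIABILITY in Y
$(\bar{Y}_i - \bar{Y})$: Variability in Y associated with X
$(Y_{ij} - \bar{Y}_i)$: Variability in Y associated with all other factors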
14. If you square both sides of (3), and double sum both sides (over i and j), you get, after some unpleasant algebra, but lots of terms which cancel:
$\sum_{i=1}^{C} \sum_{j=1}^{R} (Y_{ij} - \bar{Y})^2 = R \sum_{i=1}^{C} (\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{C} \sum_{j=1}^{R} (Y_{ij} - \bar{Y}_i)^2$
TSS = SSB + SSW
TSS = TOTAL SUM OF SQUARES
SSB = SUM OF SQUARES BETWEEN SAMPLES
SSW (SSE) = SUM OF SQUARES WITHIN SAMPLES
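The cancellation can be seen by expanding the square once. The sketch below writes $\bar{Y}_i$ for the level means and $\bar{Y}$ for the grand mean, with R observations per level; the only fact it uses is that deviations from a level's own mean sum to zero.

```latex
\begin{align*}
\sum_{i=1}^{C}\sum_{j=1}^{R}(Y_{ij}-\bar{Y})^2
 &= \sum_{i=1}^{C}\sum_{j=1}^{R}\bigl[(\bar{Y}_i-\bar{Y})+(Y_{ij}-\bar{Y}_i)\bigr]^2 \\
 &= R\sum_{i=1}^{C}(\bar{Y}_i-\bar{Y})^2
  + \sum_{i=1}^{C}\sum_{j=1}^{R}(Y_{ij}-\bar{Y}_i)^2
  + 2\sum_{i=1}^{C}(\bar{Y}_i-\bar{Y})\sum_{j=1}^{R}(Y_{ij}-\bar{Y}_i) \\
 &= \mathrm{SSB} + \mathrm{SSW} + 0,
 \qquad\text{since } \sum_{j=1}^{R}(Y_{ij}-\bar{Y}_i)=0 \text{ for every } i.
\end{align*}
```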
15. ANOVA TABLE
SOURCE OF VARIABILITY             SSQ    DF          Mean square (M.S.)
Between samples (due to brand)    SSB    C - 1       MSB = SSB / (C - 1)
Within samples (due to error)     SSW    (R - 1)C    MSW = SSW / ((R - 1)C)
TOTAL                             TSS    RC - 1
16. Example: Y = LIFETIME (HOURS)
X = BRAND
3 replications per level
$SSB = 3\,[(2.6 - 5.8)^2 + (4.6 - 5.8)^2 + \cdots + (7.4 - 5.8)^2] = 3\,(23.04) = 69.12$
17. SSW = ?
Brand 1: (1.8 - 2.6)^2 = .64, (5.0 - 2.6)^2 = 5.76, (1.0 - 2.6)^2 = 2.56; sum = 8.96
Brand 2: (4.2 - 4.6)^2 = .16, (5.4 - 4.6)^2 = .64, (4.2 - 4.6)^2 = .16; sum = .96
...
Brand 8: (9.0 - 7.4)^2 = 2.56, (7.4 - 7.4)^2 = 0, (5.8 - 7.4)^2 = 2.56; sum = 5.12
Total of (8.96 + .96 + ... + 5.12): SSW = 46.72
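The per-brand contributions to SSW can be checked directly. The snippet below is a small sketch covering only the three brands whose raw data the deck lists; the helper name `within_ss` is hypothetical, and the full SSW of 46.72 needs all eight brands.

```python
def within_ss(values):
    """Sum of squared deviations of one level's observations from its own mean."""
    level_mean = sum(values) / len(values)
    return sum((y - level_mean) ** 2 for y in values)

# Raw lifetimes (hours) for the three brands shown on the slide.
shown_brands = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    8: [9.0, 7.4, 5.8],
}

contributions = {brand: round(within_ss(v), 2) for brand, v in shown_brands.items()}
print(contributions)  # {1: 8.96, 2: 0.96, 8: 5.12}
```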
18. ANOVA TABLE
Source of Variability    SSQ       df                  M.S.
BRAND                    69.12     7 (= 8 - 1)         9.87
ERROR                    46.72     16 (= 2 (8))        2.92
TOTAL                    115.84    23 (= (3 x 8) - 1)
19. We can show:
$E(MSB) = \sigma^2 + V_{COL}$, where $V_{COL} = \frac{R}{C-1} \sum_{i} (\mu_i - \mu)^2$ is a MEASURE OF DIFFERENCES AMONG LEVEL MEANS
$E(MSW) = \sigma^2$
(Assuming $Y_{ij}$ follows $N(\mu_i, \sigma^2)$ and they are independent)
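A quick simulation can make these two expectations concrete. The sketch below is an illustration under assumed settings that are not from the deck (three levels, R = 3, sigma = 1, 4000 simulated experiments, made-up level means); the function names are hypothetical. It averages MSB and MSW over many simulated experiments and compares the average MSB with $\sigma^2 + V_{COL}$.

```python
import random

def mean_squares(groups):
    """Return (MSB, MSW) for equal-sized groups of observations."""
    c, r = len(groups), len(groups[0])
    level_means = [sum(g) / r for g in groups]
    grand_mean = sum(level_means) / c
    ssb = r * sum((m - grand_mean) ** 2 for m in level_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)
    return ssb / (c - 1), ssw / ((r - 1) * c)

def average_mean_squares(level_mus, sigma, r, reps, seed):
    """Average MSB and MSW over `reps` simulated normal experiments."""
    rng = random.Random(seed)
    total_msb = total_msw = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(r)] for mu in level_mus]
        msb, msw = mean_squares(groups)
        total_msb += msb
        total_msw += msw
    return total_msb / reps, total_msw / reps

def v_col(level_mus, r):
    """V_COL = R / (C - 1) * sum of squared deviations of the level means."""
    c = len(level_mus)
    mu_bar = sum(level_mus) / c
    return r * sum((mu - mu_bar) ** 2 for mu in level_mus) / (c - 1)

# Assumed scenarios: equal level means (H0 true) and unequal level means.
for mus in ([5.0, 5.0, 5.0], [4.0, 5.0, 6.0]):
    avg_msb, avg_msw = average_mean_squares(mus, sigma=1.0, r=3, reps=4000, seed=1)
    print(mus, "avg MSB:", round(avg_msb, 2),
          "sigma^2 + V_COL:", 1.0 + v_col(mus, 3),
          "avg MSW:", round(avg_msw, 2))
```

With equal level means both averages should sit near $\sigma^2$; with unequal means only the average MSB is inflated, which is what makes the ratio MSB/MSW informative.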
20. $E(MSB_C) = \sigma^2 + V_{COL}$, $E(MSW) = \sigma^2$
This suggests that:
if $MSB_C / MSW > 1$, there's some evidence of non-zero $V_{COL}$, or that level of X affects Y;
if $MSB_C / MSW < 1$, there's no evidence that $V_{COL} > 0$, or that level of X affects Y.
21. With
H0: Level of X has no impact on Y
H1: Level of X does have impact on Y,
we need $MSB_C / MSW \gg 1$ to reject H0.
22. More Formally,
H0: $\tau_1 = \tau_2 = \cdots = \tau_C = 0$
H1: not all $\tau_i = 0$
OR (All level means are equal)
H0: $\mu_1 = \mu_2 = \cdots = \mu_C$
H1: not all $\mu_i$ are EQUAL
23. The distribution of $F_{calc} = MSB / MSW$ is the F-distribution with (C-1, (R-1)C) degrees of freedom, assuming H0 true.
(Figure: F density with upper-tail area $\alpha$ to the right of C = Table Value.)
24. In our problem: ANOVA TABLE
Source of Variability    SSQ      df    M.S.    Fcalc
BRAND                    69.12    7     9.87    3.38 (= 9.87 / 2.92)
ERROR                    46.72    16    2.92
25. F table: table 8
$\alpha$ = .05, (7, 16 DF)
C = 2.66 < 3.38 = Fcalc
26. Hence, at $\alpha$ = .05, Reject H0. (i.e., Conclude that level of BRAND does have an impact on battery lifetime.)
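The test just carried out reduces to a few divisions and one comparison. The sketch below reuses the sums of squares, degrees of freedom and the table value 2.66 quoted in the deck; the function name `f_statistic` and the constant name are hypothetical.

```python
def f_statistic(ssb, df_between, ssw, df_within):
    """Return (MSB, MSW, Fcalc) from the sums of squares and their df."""
    msb = ssb / df_between
    msw = ssw / df_within
    return msb, msw, msb / msw

# Values from the battery ANOVA table: C = 8 brands, R = 3 replications.
msb, msw, f_calc = f_statistic(ssb=69.12, df_between=7, ssw=46.72, df_within=16)

F_TABLE_05_7_16 = 2.66  # critical value quoted in the deck for alpha = .05, (7, 16) df
reject_h0 = f_calc > F_TABLE_05_7_16

print(round(msb, 2), round(msw, 2), round(f_calc, 2), reject_h0)  # 9.87 2.92 3.38 True
```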
27. MINITAB INPUT
life    brand
1.8     1
5.0     1
1.0     1
4.2     2
5.4     2
4.2     2
.       .
.       .
.       .
9.0     8
7.4     8
5.8     8
28. ONE FACTOR ANOVA (MINITAB)
MINITAB: STAT >> ANOVA >> ONE-WAY
Analysis of Variance for life
Source    DF    SS        MS      F       P
brand     7     69.12     9.87    3.38    0.021
Error     16    46.72     2.92
Total     23    115.84
Estimate of the common variance $\sigma^2$: MS Error = 2.92
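30. Assumptions
MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
1.) The $\varepsilon_{ij}$ are indep. random variables (check: Run order plot)
2.) Each $\varepsilon_{ij}$ is Normally Distributed, $E(\varepsilon_{ij}) = 0$ for all i, j (check: Normality plot & test)
3.) $\sigma^2(\varepsilon_{ij})$ = constant for all i, j (check: Residual plot & test)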
31. Diagnosis: Normality
- The points on the normality plot must more or less follow a line to claim that the residuals are normally distributed.
- There are statistical tests to verify it scientifically.
- The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.
Normal probability plot & normality test of residuals
32. Minitab: Stat >> Basic Statistics >> Normality Test
33. Diagnosis: Constant Variances
- The points on the residual plot must be more or less within a horizontal band to claim constant variances.
- There are statistical tests to verify it scientifically.
- The ANOVA method we learn here is not sensitive to the constant variances assumption. That is, slightly different variances within groups will not change our conclusions much.
Tests and Residual plot: fitted values vs. residuals
34. Minitab: Stat >> Anova >> One-way
35. Minitab: Stat >> Anova >> Test for Equal Variances
36. Diagnosis: Randomness/Independence
- The run order plot must show no systematic patterns to claim randomness.
- There are statistical tests to verify it scientifically.
- The ANOVA method is sensitive to the randomness assumption. That is, even a small degree of dependence between data points can change our conclusions a lot.
Run order plot: order vs. residuals
37. Minitab: Stat >> Anova >> One-way
38. KRUSKAL-WALLIS TEST
(Non-Parametric Alternative)
H0: The probability distributions are identical for each level of the factor
H1: Not all the distributions are the same
39. BATTERY LIFETIME (hours) (each column rank ordered, for simplicity)
Brand    A       B       C
         32      32      28
         30      32      21
         30      26      15
         29      26      15
         26      22      14
         23      20      14
         20      19      14
         19      16      11
         18      14      9
         12      14      8
Mean     23.9    22.1    14.9    (here, irrelevant!!)
40. H0: no difference in distribution among the three brands with respect to battery lifetime
H1: At least one of the 3 brands differs in distribution from the others with respect to lifetime
41. Ranks in ( )
Brand    A            B            C
         32 (29)      32 (29)      28 (24)
         30 (26.5)    32 (29)      21 (18)
         30 (26.5)    26 (22)      15 (10.5)
         29 (25)      26 (22)      15 (10.5)
         26 (22)      22 (19)      14 (7)
         23 (20)      20 (16.5)    14 (7)
         20 (16.5)    19 (14.5)    14 (7)
         19 (14.5)    16 (12)      11 (3)
         18 (13)      14 (7)       9 (2)
         12 (4)       14 (7)       8 (1)
T1 = 197    T2 = 178    T3 = 90
n1 = 10     n2 = 10     n3 = 10
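The ranks in parentheses are midranks: all 30 lifetimes are pooled, ranked from smallest to largest, and tied values share the average of the ranks they occupy. A minimal sketch of that procedure follows, using the brand A, B, C lifetimes from the deck; the function name `midranks` is hypothetical.

```python
def midranks(values):
    """Rank values from smallest (rank 1) upward, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in order[start:end + 1]:
            ranks[k] = average_rank
        start = end + 1
    return ranks

brands = {
    "A": [32, 30, 30, 29, 26, 23, 20, 19, 18, 12],
    "B": [32, 32, 26, 26, 22, 20, 19, 16, 14, 14],
    "C": [28, 21, 15, 15, 14, 14, 14, 11, 9, 8],
}

pooled = [y for values in brands.values() for y in values]
ranks = midranks(pooled)

# Split the pooled ranks back into brands and add them up.
rank_sums = {}
position = 0
for name, values in brands.items():
    rank_sums[name] = sum(ranks[position:position + len(values)])
    position += len(values)

print(rank_sums)  # {'A': 197.0, 'B': 178.0, 'C': 90.0}
```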
42. TEST STATISTIC
$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{T_j^2}{n_j} - 3(N+1)$
$n_j$ = number of data values in column j
$N = \sum_{j=1}^{K} n_j$
K = number of Columns (levels)
$T_j$ = SUM OF RANKS OF DATA IN COL j when all DATA COMBINED
(There is a slight adjustment in the formula as a function of the number of ties in rank.)
43. $H = \frac{12}{30\,(31)} \left[ \frac{197^2}{10} + \frac{178^2}{10} + \frac{90^2}{10} \right] - 3\,(31) = 8.41$
(with adjustment for ties, we get 8.46)
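The arithmetic for H can be reproduced from the rank sums and sample sizes alone. The sketch below implements the unadjusted formula only (no tie correction); the function name `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Unadjusted Kruskal-Wallis H from per-column rank sums T_j and sizes n_j."""
    n_total = sum(sizes)
    weighted = sum(t ** 2 / n for t, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Rank sums and sample sizes for brands A, B, C from the deck.
h = kruskal_wallis_h(rank_sums=[197, 178, 90], sizes=[10, 10, 10])
print(round(h, 2))  # 8.41
```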
44. What do we do with H?
We can show that, under H0, H is well approximated by a $\chi^2$ distribution with df = K - 1.
Here, df = 2, and at $\alpha$ = .05, the critical value = 5.99.
H = 8.41 > 5.99: Reject H0; conclude that the lifetime distribution is NOT the same for all 3 BRANDS.
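For df = 2 the chi-square tail has a closed form, because a chi-square variable with 2 degrees of freedom is exponential with mean 2: its upper-tail probability at x is exp(-x/2). The sketch below uses that fact to recover the critical value and an approximate p-value for the unadjusted H; it applies only to df = 2, and the function names are hypothetical.

```python
import math

def chi2_df2_critical(alpha):
    """Upper-tail critical value of chi-square with 2 df: solve exp(-x/2) = alpha."""
    return -2 * math.log(alpha)

def chi2_df2_p_value(h):
    """Upper-tail probability P(chi-square with 2 df > h)."""
    return math.exp(-h / 2)

critical = chi2_df2_critical(0.05)
p_value = chi2_df2_p_value(8.41)  # unadjusted H from the deck

print(round(critical, 2), round(p_value, 3))  # 5.99 0.015
```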
45. Minitab: Stat >> Nonparametrics >> Kruskal-Wallis
Kruskal-Wallis Test: life versus brand
Kruskal-Wallis Test on life
brand      N     Median    AveRank    Z
1          3     1.800     4.5        -2.09
2          3     4.200     7.8        -1.22
3          3     4.600     11.8       -0.17
4          3     7.000     16.5       1.05
5          3     6.600     13.3       0.22
6          3     4.200     7.8        -1.22
7          3     7.800     20.0       1.96
8          3     7.400     18.2       1.48
Overall    24              12.5
H = 12.78  DF = 7  P = 0.078
H = 13.01  DF = 7  P = 0.072 (adjusted for ties)