One-way ANOVA: Inferences about More than Two Population Means

Author: kfan. Last modified by: mi8357. Created: 11/7/2006 12:18:06 AM.

The greatest blessing in life is in giving and not taking.

1-Way ANOVA

One-Way Analysis of Variance
Y = DEPENDENT VARIABLE (yield): the response variable, a quality indicator
X = INDEPENDENT VARIABLE: a possibly influential FACTOR

OBJECTIVE: To determine the impact of X on Y.
Mathematical model: $Y = f(X, \varepsilon)$, where $\varepsilon$ = (impact of) all factors other than X.
Example: Y = battery life (hours), X = brand of battery, $\varepsilon$ = many other factors (possibly some we are unaware of).

Completely Randomized Design (CRD)
  • Goal: to study the effect of Factor X.
  • The same number of observations is taken randomly and
    independently from the individuals at each level
    of Factor X,
  • i.e., $n_1 = n_2 = \cdots = n_c$ (c levels).

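As a minimal sketch of the randomization step in a CRD, assuming the randomization is applied to the order in which the C × R runs are made, the hypothetical helper `crd_run_order` below lists every level R times and shuffles the list. The 8 levels × 3 replications mirror the battery example; the seed is arbitrary.

```python
import random


def crd_run_order(levels, reps, seed=None):
    """Return a random run order in which each factor level appears `reps` times."""
    rng = random.Random(seed)
    runs = [level for level in levels for _ in range(reps)]  # balanced: n1 = ... = nc
    rng.shuffle(runs)  # random permutation of all C * R runs
    return runs


# 8 brands, 3 batteries per brand, tested in a random order.
order = crd_run_order(levels=range(1, 9), reps=3, seed=1)
print(order)
```

The point is only that a battery's position in the testing sequence is decided by chance, not by its brand.
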
Example: Y = LIFETIME (HOURS), X = BRAND, 3 replications per level
[Table: battery lifetime (hours) for each brand]

Analysis of Variance

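Statistical Model
C levels of BRAND, R observations for each level:

Level 1: $Y_{11}, Y_{12}, \ldots, Y_{1R}$
Level 2: $Y_{21}, Y_{22}, \ldots, Y_{2R}$
...
Level C: $Y_{C1}, Y_{C2}, \ldots, Y_{CR}$

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \ldots, C, \quad j = 1, \ldots, R$$
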
where
  • $\mu$ = OVERALL AVERAGE
  • i = index for FACTOR (brand) LEVEL
  • j = index for replication
  • $\tau_i$ = differential effect associated with the ith level of X (brand i) = $\mu_i - \mu$
  • $\varepsilon_{ij}$ = noise or error due to other factors associated with the (i,j)th data value
  • $\mu_i$ = AVERAGE associated with the ith level of X (brand i)
  • $\mu$ = AVERAGE of the $\mu_i$'s

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$
By definition, $\sum_{i=1}^{C} \tau_i = 0$.
The experiment produces R × C data values $Y_{ij}$.
The analysis produces estimates of $\mu, \tau_1, \tau_2, \ldots, \tau_C$. (We can then get estimates of the $\varepsilon_{ij}$ by subtraction.)

Let $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_C$ be the level means, and
$$\bar{Y} = \frac{1}{C}\sum_{i=1}^{C} \bar{Y}_i = \text{GRAND MEAN}$$
(assuming the same number of data points in each column; otherwise, $\bar{Y}$ = mean of all the data).

MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
  • $\bar{Y}$ estimates $\mu$.
  • $\bar{Y}_i - \bar{Y}$ estimates $\tau_i$ $(= \mu_i - \mu)$, for all i.
These estimates are based on Gauss (1796), the PRINCIPLE OF LEAST SQUARES, and on COMMON SENSE.

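MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
If you insert the estimates into the MODEL,
$$Y_{ij} = \bar{Y} + (\bar{Y}_i - \bar{Y}) + \hat{\varepsilon}_{ij}, \qquad (1)$$
it follows that our estimate of $\varepsilon_{ij}$ is
$$\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_i, \qquad (2)$$
called the residual.
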
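The estimates can be computed directly. This sketch uses only brands 1, 2, and 8, whose lifetimes appear in the MINITAB input for this example; because brands 3 to 7 are not listed, the grand mean $\bar{Y}$ = 5.8 is taken from the SSB calculation rather than recomputed.

```python
from statistics import mean

# Lifetimes (hours) for the brands listed in full in this presentation.
data = {1: [1.8, 5.0, 1.0], 2: [4.2, 5.4, 4.2], 8: [9.0, 7.4, 5.8]}
grand_mean = 5.8  # Y-bar over all 8 brands, as used in the SSB example

level_means = {b: mean(ys) for b, ys in data.items()}          # Y-bar_i
tau_hat = {b: m - grand_mean for b, m in level_means.items()}   # estimate of tau_i
residuals = {                                                   # Y_ij - Y-bar_i
    b: [y - level_means[b] for y in ys] for b, ys in data.items()
}

for b in data:
    print(f"brand {b}: mean {level_means[b]:.2f}, tau-hat {tau_hat[b]:+.2f}, "
          f"residuals {[round(e, 2) for e in residuals[b]]}")
```

Each brand's residuals sum to zero; this is why terms cancel when (3) is squared and summed.
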
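Then $Y_{ij} = \bar{Y} + (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i)$, or
$$(Y_{ij} - \bar{Y}) = (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i) \qquad (3)$$
  • $Y_{ij} - \bar{Y}$: TOTAL VARIABILITY in Y
  • $\bar{Y}_i - \bar{Y}$: variability in Y associated with X
  • $Y_{ij} - \bar{Y}_i$: variability in Y associated with all other factors
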
If you square both sides of (3) and double sum both sides (over i and j), you get, after some unpleasant algebra in which the cross-product terms cancel (because $\sum_{j=1}^{R} (Y_{ij} - \bar{Y}_i) = 0$ for every i),
$$\sum_{i=1}^{C}\sum_{j=1}^{R} (Y_{ij} - \bar{Y})^2 = R\sum_{i=1}^{C} (\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{C}\sum_{j=1}^{R} (Y_{ij} - \bar{Y}_i)^2$$
TSS = SSB + SSW, where
  • TSS = TOTAL SUM OF SQUARES
  • SSB = SUM OF SQUARES BETWEEN SAMPLES
  • SSW (SSE) = SUM OF SQUARES WITHIN SAMPLES

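Because TSS = SSB + SSW is an algebraic identity, it should hold for any balanced data set. The sketch below (the function name `sums_of_squares` is illustrative) computes the three sums of squares from their definitions and checks the identity on arbitrary random numbers with C = 8 levels and R = 3.

```python
import random
from statistics import mean


def sums_of_squares(groups):
    """Return (TSS, SSB, SSW) for a balanced one-way layout (C lists of R values)."""
    R = len(groups[0])
    if any(len(g) != R for g in groups):
        raise ValueError("this sketch assumes R observations at every level")
    all_y = [y for g in groups for y in g]
    grand = mean(all_y)                      # Y-bar
    level_means = [mean(g) for g in groups]  # Y-bar_i
    tss = sum((y - grand) ** 2 for y in all_y)
    ssb = R * sum((m - grand) ** 2 for m in level_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)
    return tss, ssb, ssw


# The identity is algebraic, so arbitrary numbers will do (C = 8, R = 3).
rng = random.Random(0)
groups = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(8)]
tss, ssb, ssw = sums_of_squares(groups)
print(f"TSS = {tss:.6f}, SSB + SSW = {ssb + ssw:.6f}")
```
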
ANOVA TABLE
Source of variability             SSQ   DF         Mean square (M.S.)
Between samples (due to brand)    SSB   C - 1      MSB = SSB/(C - 1)
Within samples (due to error)     SSW   (R - 1)C   MSW = SSW/[(R - 1)C]
TOTAL                             TSS   RC - 1

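Example: Y = LIFETIME (HOURS), X = BRAND, 3 replications per level
$$SSB = 3\left[(2.6 - 5.8)^2 + (4.6 - 5.8)^2 + \cdots + (7.4 - 5.8)^2\right] = 3(23.04) = 69.12$$
(2.6, 4.6, ..., 7.4 are the eight brand means; 5.8 is the grand mean.)
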
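SSW, brand by brand (the brands in between are handled the same way):
Brand 1: $(1.8 - 2.6)^2 + (5.0 - 2.6)^2 + (1.0 - 2.6)^2 = .64 + 5.76 + 2.56 = 8.96$
Brand 2: $(4.2 - 4.6)^2 + (5.4 - 4.6)^2 + (4.2 - 4.6)^2 = .16 + .64 + .16 = .96$
...
Brand 8: $(9.0 - 7.4)^2 + (7.4 - 7.4)^2 + (5.8 - 7.4)^2 = 2.56 + 0 + 2.56 = 5.12$
Total of $(8.96 + .96 + \cdots + 5.12)$ = SSW = 46.72
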
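The brand-by-brand arithmetic for SSW can be reproduced in a few lines. Only brands 1, 2, and 8 are listed in full here, so the sketch computes just their contributions; the remaining brands would be added the same way to reach 46.72.

```python
from statistics import mean

# Lifetimes (hours) for the brands whose data are shown.
shown = {1: [1.8, 5.0, 1.0], 2: [4.2, 5.4, 4.2], 8: [9.0, 7.4, 5.8]}

within = {}
for brand, ys in shown.items():
    ybar = mean(ys)                                   # brand mean Y-bar_i
    within[brand] = sum((y - ybar) ** 2 for y in ys)  # sum_j (Y_ij - Y-bar_i)^2
    print(f"brand {brand}: mean {ybar:.1f}, within-brand SS {within[brand]:.2f}")
```
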
ANOVA TABLE
Source of variability   SSQ      df                 M.S.
BRAND                    69.12    7  (= 8 - 1)      9.87
ERROR                    46.72   16  (= 2 × 8)      2.92
TOTAL                   115.84   23  (= 3 × 8 - 1)

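The derived columns of the table follow mechanically from SSB, SSW, C, and R. This sketch (`anova_table` is a hypothetical helper, not a Minitab function) fills in the degrees of freedom and mean squares, and also returns the ratio MSB/MSW, which is the F statistic used to test H0.

```python
def anova_table(ssb, ssw, C, R):
    """Assemble a one-way ANOVA table for C levels with R observations each."""
    df_b, df_w = C - 1, (R - 1) * C
    msb, msw = ssb / df_b, ssw / df_w
    return {
        "between": (ssb, df_b, msb),    # SSQ, df, M.S.
        "within": (ssw, df_w, msw),
        "total": (ssb + ssw, C * R - 1),
        "F": msb / msw,
    }


table = anova_table(ssb=69.12, ssw=46.72, C=8, R=3)
for source in ("between", "within", "total"):
    print(source, table[source])
print("F =", round(table["F"], 2))
```
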
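We can show that
$$E(MSB) = \sigma^2 + V_{COL}, \qquad V_{COL} = \frac{R\sum_{i=1}^{C} (\mu_i - \mu)^2}{C - 1},$$
where $V_{COL}$ is a MEASURE OF DIFFERENCES AMONG LEVEL MEANS, and
$$E(MSW) = \sigma^2$$
(assuming the $Y_{ij}$ follow $N(\mu_i, \sigma^2)$ and are independent).
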
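The expected mean squares can be checked by simulation. This sketch draws many data sets from $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ with normal errors, using hypothetical values (μ = 10, σ = 1, C = 8, R = 3, and effects $\tau_i$ that sum to zero), and compares the average MSB and MSW with $\sigma^2 + V_{COL}$ and $\sigma^2$.

```python
import random


def mean_squares(groups):
    """Return (MSB, MSW) for a balanced one-way layout (C lists of R values)."""
    C, R = len(groups), len(groups[0])
    level_means = [sum(g) / R for g in groups]
    grand = sum(level_means) / C  # equals the mean of all data when balanced
    ssb = R * sum((m - grand) ** 2 for m in level_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)
    return ssb / (C - 1), ssw / ((R - 1) * C)


def average_mean_squares(tau, R, sigma, reps, seed=0):
    """Average MSB and MSW over `reps` simulated data sets."""
    rng = random.Random(seed)
    mu = 10.0  # hypothetical overall mean; it cancels out of both mean squares
    total_b = total_w = 0.0
    for _ in range(reps):
        groups = [[mu + t + rng.gauss(0.0, sigma) for _ in range(R)] for t in tau]
        msb, msw = mean_squares(groups)
        total_b += msb
        total_w += msw
    return total_b / reps, total_w / reps


C, R, sigma = 8, 3, 1.0
tau = [-1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical effects, summing to 0
v_col = R * sum(t ** 2 for t in tau) / (C - 1)   # uses mu_i - mu = tau_i
avg_msb, avg_msw = average_mean_squares(tau, R, sigma, reps=20000)
print(f"MSB: theory {sigma**2 + v_col:.3f}, simulated {avg_msb:.3f}")
print(f"MSW: theory {sigma**2:.3f}, simulated {avg_msw:.3f}")
```

Setting every $\tau_i$ to 0 makes $V_{COL} = 0$, and both averages then settle near $\sigma^2$, which is the intuition behind comparing MSB with MSW.
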
$E(MSB) = \sigma^2 + V_{COL}$ and $E(MSW) = \sigma^2$. This suggests that
  • if MSB/MSW > 1, there is some evidence of a non-zero $V_{COL}$, i.e., that the level of X affects Y;
  • if MSB/MSW < 1, there is no evidence that $V_{COL} > 0$, or that the level of X affects Y.

With H0: level of X has no impact on Y, and H1: level of X does have an impact on Y,
we need MSB/MSW >> 1 to reject H0.

More formally,
H0: $\tau_1 = \tau_2 = \cdots = \tau_C = 0$; H1: not all $\tau_i = 0$
OR (all level means are equal)
H0: $\mu_1 = \mu_2 = \cdots = \mu_C$; H1: not all $\mu_i$ are EQUAL

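Assuming H0 is true, the distribution of
$$F_{calc} = \frac{MSB}{MSW}$$
is the F distribution with $(C - 1, (R - 1)C)$ degrees of freedom. The table (critical) value cuts off an upper-tail area of $\alpha$; reject H0 if $F_{calc}$ exceeds it.
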
In our problem, ANOVA TABLE
Source of variability   SSQ     df   M.S.   Fcalc
BRAND                   69.12    7   9.87   3.38
ERROR                   46.72   16   2.92
($F_{calc} = 9.87/2.92 = 3.38$)

From the F table (Table 8), with $\alpha = .05$ and (7, 16) df, the critical value is 2.66 < $F_{calc}$ = 3.38.

Hence, at $\alpha = .05$, reject H0 (i.e., conclude that the level of BRAND does have an impact on battery lifetime).

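The table value and Minitab's P-value can be reproduced without statistical tables. This sketch relies on the identity $P(F \le f) = I_x(d_1/2, d_2/2)$ with $x = d_1 f/(d_1 f + d_2)$, where the regularized incomplete beta function $I_x(a, b)$ reduces to a finite sum when b is a whole number. The sketch is therefore valid only when the error df $d_2$ is even (here $d_2 = 16$); the helper names are illustrative.

```python
def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2); requires even d2."""
    if d2 % 2:
        raise ValueError("this sketch requires an even error df (d2)")
    if f <= 0:
        return 1.0
    a, b = d1 / 2, d2 // 2
    x = d1 * f / (d1 * f + d2)
    coef, total = 1.0, 0.0  # coef = Gamma(a + j) / (Gamma(a) * j!)
    for j in range(b):
        total += coef * (1 - x) ** j
        coef *= (a + j) / (j + 1)
    return 1.0 - x ** a * total  # 1 - I_x(a, b)


def f_critical(alpha, d1, d2):
    """Table value with P(F > value) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f_sf(mid, d1, d2) > alpha:
            lo = mid  # tail still too large: the critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2


f_calc = 9.87 / 2.92
print(f"critical = {f_critical(0.05, 7, 16):.2f}, Fcalc = {f_calc:.2f}, "
      f"p = {f_sf(f_calc, 7, 16):.3f}")
```
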
MINITAB INPUT
life   brand
1.8    1
5.0    1
1.0    1
4.2    2
5.4    2
4.2    2
.      .
.      .
.      .
9.0    8
7.4    8
5.8    8

ONE FACTOR ANOVA (MINITAB)
MINITAB: Stat >> ANOVA >> One-Way
Analysis of Variance for life
Source   DF   SS       MS     F      P
brand     7    69.12   9.87   3.38   0.021
Error    16    46.72   2.92
Total    23   115.84
The Error MS (2.92) is the estimate of the common variance $\sigma^2$.

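Assumptions
MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
1.) The $\varepsilon_{ij}$ are independent random variables.
2.) Each $\varepsilon_{ij}$ is normally distributed, with $E(\varepsilon_{ij}) = 0$ for all i, j.
3.) $\sigma^2(\varepsilon_{ij})$ is constant for all i, j.
Checks: run order plot (independence), normality plot and test (normality), residual plot and test (constant variance).
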
Diagnosis: Normality
  • The points on the normality plot should more or
    less follow a straight line before we claim the
    residuals are normally distributed.
  • There are statistical tests to verify this formally.
  • The ANOVA method we learn here is not sensitive
    to the normality assumption. That is, a mild
    departure from the normal distribution will not
    change our conclusions much.

[Figure: normal probability plot and normality test of the residuals]

Minitab: Stat >> Basic Statistics >> Normality Test

Diagnosis: Constant Variances
  • The points on the residual plot should lie more or
    less within a horizontal band before we claim constant
    variances.
  • There are statistical tests to verify this formally.
  • The ANOVA method we learn here is not sensitive
    to the constant-variance assumption. That is,
    slightly different variances within groups will
    not change our conclusions much.

[Figure: tests and residual plot (fitted values vs. residuals)]

Minitab: Stat >> ANOVA >> One-way

Minitab: Stat >> ANOVA >> Test for Equal Variances

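As an illustration of an equal-variance test (not a reproduction of Minitab's output, whose statistics may differ), this sketch computes a median-based Levene statistic, also called the Brown-Forsythe statistic: the one-way ANOVA F ratio computed on the absolute deviations $|Y_{ij} - \text{median}_i|$. Large values, judged against F(C - 1, N - C), suggest unequal variances. Only brands 1, 2, and 8 are used because the other brands' data are not listed.

```python
from statistics import mean, median, variance


def brown_forsythe(groups):
    """One-way ANOVA F ratio on absolute deviations from each group's median."""
    devs = [[abs(y - median(g)) for y in g] for g in groups]
    C = len(devs)
    N = sum(len(d) for d in devs)
    grand = mean(v for d in devs for v in d)
    ssb = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    ssw = sum((v - mean(d)) ** 2 for d in devs for v in d)
    return (ssb / (C - 1)) / (ssw / (N - C))


groups = [[1.8, 5.0, 1.0], [4.2, 5.4, 4.2], [9.0, 7.4, 5.8]]  # brands 1, 2, 8
print([round(variance(g), 2) for g in groups])  # sample variance of each brand
print(round(brown_forsythe(groups), 3))
```
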
Diagnosis: Randomness/Independence
  • The run order plot should show no systematic
    patterns before we claim randomness.
  • There are statistical tests to verify this formally.
  • The ANOVA method is sensitive to the randomness
    assumption. That is, even a small amount of dependence
    between data points can change our conclusions a
    lot.

[Figure: run order plot (order vs. residuals)]

Minitab: Stat >> ANOVA >> One-way

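A run order plot can be backed by a simple numerical check. This sketch counts runs of same-sign residuals in time order (a Wald-Wolfowitz runs test with the usual normal approximation, which is rough for short sequences): too few runs suggest drift or clustering, too many suggest alternation. The residual sequence in the example is hypothetical, since the run order of the battery tests is not given.

```python
import math


def runs_about_zero(residuals):
    """Return (number of runs, z) for the signs of residuals taken in run order.

    Zero residuals are dropped before counting.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n1 = sum(signs)          # positive residuals
    n2 = len(signs) - n1     # negative residuals
    if n1 == 0 or n2 == 0:
        raise ValueError("need both positive and negative residuals")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, (runs - expected) / math.sqrt(var)


# Hypothetical residuals in run order that drift upward over time.
drift = [-2.0, -1.5, -1.2, -0.8, -0.3, 0.4, 0.9, 1.1, 1.6, 2.2]
print(runs_about_zero(drift))
```
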
KRUSKAL-WALLIS TEST
(Non-Parametric Alternative)
H0: The probability distributions are identical for each level of the factor.
H1: Not all the distributions are the same.

BATTERY LIFETIME (hours) (each column rank ordered, for simplicity)
Brand    A      B      C
         32     32     28
         30     32     21
         30     26     15
         29     26     15
         26     22     14
         23     20     14
         20     19     14
         19     16     11
         18     14      9
         12     14      8
Mean     23.9   22.1   14.9   (here, irrelevant!)

H0: no difference in distribution among the three brands with respect to battery lifetime
H1: at least one of the 3 brands differs in distribution from the others with respect to lifetime

Ranks in ( )
Brand    A           B           C
         32 (29)     32 (29)     28 (24)
         30 (26.5)   32 (29)     21 (18)
         30 (26.5)   26 (22)     15 (10.5)
         29 (25)     26 (22)     15 (10.5)
         26 (22)     22 (19)     14 (7)
         23 (20)     20 (16.5)   14 (7)
         20 (16.5)   19 (14.5)   14 (7)
         19 (14.5)   16 (12)     11 (3)
         18 (13)     14 (7)       9 (2)
         12 (4)      14 (7)       8 (1)
T1 = 197    T2 = 178    T3 = 90
n1 = 10     n2 = 10     n3 = 10

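The ranks in parentheses are midranks: tied values share the average of the positions they occupy (for example, the five 14s occupy positions 5 to 9, so each gets rank 7). This sketch, with a hypothetical helper `midranks`, pools the three brands, ranks all 30 values, and sums the ranks by brand.

```python
def midranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the block of tied values
        avg = (i + j) / 2 + 1           # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


brands = {
    "A": [32, 30, 30, 29, 26, 23, 20, 19, 18, 12],
    "B": [32, 32, 26, 26, 22, 20, 19, 16, 14, 14],
    "C": [28, 21, 15, 15, 14, 14, 14, 11, 9, 8],
}
pooled = [y for ys in brands.values() for y in ys]  # all 30 values combined
ranks = midranks(pooled)
T, start = {}, 0
for name, ys in brands.items():
    T[name] = sum(ranks[start:start + len(ys)])     # rank sum T_j for this brand
    start += len(ys)
print(T)
```
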
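TEST STATISTIC
$$H = \frac{12}{N(N+1)}\sum_{j=1}^{K}\frac{T_j^2}{n_j} - 3(N+1)$$
where
  • $n_j$ = number of data values in column j, and $N = \sum_{j=1}^{K} n_j$
  • K = number of columns (levels)
  • $T_j$ = SUM OF RANKS OF THE DATA IN COLUMN j when all data are combined
(There is a slight adjustment to the formula that depends on the number of ties in rank: H is divided by $1 - \sum (t^3 - t)/(N^3 - N)$, where the sum runs over the groups of tied values and t is the size of each group.)
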
$$H = \frac{12}{30(31)}\left[\frac{197^2}{10} + \frac{178^2}{10} + \frac{90^2}{10}\right] - 3(31) = 8.41$$
(with the adjustment for ties, we get about 8.47)

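The arithmetic, including the tie adjustment, can be checked as follows. The sketch assumes the standard tie correction, dividing H by $1 - \sum(t^3 - t)/(N^3 - N)$ over groups of t tied values; `kruskal_wallis_h` is an illustrative name.

```python
from collections import Counter


def kruskal_wallis_h(rank_sums, sizes, pooled_values):
    """Return (H, tie-adjusted H) from rank sums T_j and group sizes n_j."""
    N = sum(sizes)
    h = 12 / (N * (N + 1)) * sum(t * t / n for t, n in zip(rank_sums, sizes)) - 3 * (N + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled_values).values())  # singletons add 0
    correction = 1 - ties / (N ** 3 - N)
    return h, h / correction


pooled = [32, 30, 30, 29, 26, 23, 20, 19, 18, 12,   # brand A
          32, 32, 26, 26, 22, 20, 19, 16, 14, 14,   # brand B
          28, 21, 15, 15, 14, 14, 14, 11, 9, 8]     # brand C
h, h_adj = kruskal_wallis_h([197, 178, 90], [10, 10, 10], pooled)
print(f"H = {h:.2f}, tie-adjusted H = {h_adj:.2f}")
```
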
What do we do with H?
We can show that, under H0, H is well approximated by a $\chi^2$ distribution with df = K - 1.
Here, df = 2, and at $\alpha = .05$ the critical value is 5.99 < H = 8.41.
Reject H0: conclude that the lifetime distribution is NOT the same for all 3 BRANDS.

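With df = 2 the $\chi^2$ upper tail has the closed form $e^{-x/2}$, so the critical value is $-2\ln(0.05) \approx 5.99$. The sketch below generalizes to any whole-number df by starting from the exact df = 1 or df = 2 tail and adding two degrees of freedom at a time; `chi2_sf` is an illustrative name, not a library function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with whole-number df."""
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1   # exact df = 1 tail
    else:
        q, k = math.exp(-x / 2), 2              # exact df = 2 tail
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


# K = 3 brands, so df = K - 1 = 2.
critical = -2 * math.log(0.05)  # solves exp(-x/2) = 0.05
print(f"critical value = {critical:.2f}, p-value for H = 8.41: {chi2_sf(8.41, 2):.4f}")
```
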
Minitab: Stat >> Nonparametrics >> Kruskal-Wallis
Kruskal-Wallis Test: life versus brand
Kruskal-Wallis Test on life
brand     N   Median   AveRank      Z
1         3    1.800       4.5   -2.09
2         3    4.200       7.8   -1.22
3         3    4.600      11.8   -0.17
4         3    7.000      16.5    1.05
5         3    6.600      13.3    0.22
6         3    4.200       7.8   -1.22
7         3    7.800      20.0    1.96
8         3    7.400      18.2    1.48
Overall  24               12.5
H = 12.78  DF = 7  P = 0.078
H = 13.01  DF = 7  P = 0.072  (adjusted for ties)
Here P > .05, so at $\alpha = .05$ this test does not reject H0 for the 8-brand battery data.

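Two columns of this output can be reproduced from the others as a check on the formulas. Under H0, the average rank of a group of $n_j$ values has mean (N + 1)/2 and variance $(N+1)(N/n_j - 1)/12$, which gives the Z column, and the P values are $\chi^2$ upper-tail areas with DF = 7. The sketch repeats the illustrative `chi2_sf` helper so that it runs on its own; because AveRank is printed to one decimal, recomputed Z values can differ from Minitab's in the second decimal.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with whole-number df."""
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1   # exact df = 1 tail
    else:
        q, k = math.exp(-x / 2), 2              # exact df = 2 tail
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


N, n = 24, 3  # 8 brands x 3 batteries
ave_rank = {1: 4.5, 2: 7.8, 3: 11.8, 4: 16.5, 5: 13.3, 6: 7.8, 7: 20.0, 8: 18.2}
overall = (N + 1) / 2                        # 12.5, the "Overall" average rank
se = math.sqrt((N + 1) * (N / n - 1) / 12)   # SD of a group's average rank under H0
z = {b: (r - overall) / se for b, r in ave_rank.items()}
print({b: round(v, 2) for b, v in z.items()})

df = len(ave_rank) - 1                       # K - 1 = 7
for h in (12.78, 13.01):
    print(f"H = {h}: P = {chi2_sf(h, df):.3f}")
```
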