Always be mindful of the kindness and not the faults of others

Provided by: kfan
1
Always be mindful of the kindness and not the
faults of others.
2
One-way ANOVA: Inferences about More than Two
Population Means
  • Model and test for one-way ANOVA
  • Assumption checking
  • Nonparametric alternative

3
Analysis of Variance: One Factor Designs
Y = DEPENDENT VARIABLE (yield) (response variable) (quality indicator)
X = INDEPENDENT VARIABLE (A possibly influential FACTOR)
4
OBJECTIVE: To determine the impact of X on Y
Mathematical Model: $Y = f(x, \epsilon)$, where $\epsilon$ = (impact of) all factors other than X
Ex: Y = Battery Life (hours)
    X = Brand of Battery
    $\epsilon$ = Many other factors (possibly, some we're unaware of)
5
Statistical Model
(Brand is, of course, represented as categorical)

LEVEL OF BRAND
         1      2     ...     C
  1     Y11    Y12    ...    Y1C
  2     Y21    ...
  ...          Yij
  R     YR1    ...           YRC

$Y_{ij} = \mu + \tau_j + \epsilon_{ij}$,  $i = 1, \ldots, R$;  $j = 1, \ldots, C$
6
Where
$\mu$ = OVERALL AVERAGE
j = index for FACTOR (Brand) LEVEL
i = index for replication
$\tau_j$ = Differential effect (response) associated with jth level of X
and $\epsilon_{ij}$ = noise or error associated with the (particular) (i,j)th data value.
Let $\mu_j$ = AVERAGE associated with jth level of X.
Then $\tau_j = \mu_j - \mu$, and $\mu$ = AVERAGE of the $\mu_j$.

7
$Y_{ij} = \mu + \tau_j + \epsilon_{ij}$
By definition, $\sum_{j=1}^{C} \tau_j = 0$
The experiment produces R x C $Y_{ij}$ data values.
The analysis produces estimates of
$\mu, \tau_1, \tau_2, \ldots, \tau_C$. (We can then get estimates
of the $\epsilon_{ij}$ by subtraction).
8
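         1      2      3     ...     C
  1     Y11    Y12           ...    Y1C
  2     Y21
  ...
  R     YR1                  ...    YRC

Column means: $\bar{Y}_{\cdot 1}, \bar{Y}_{\cdot 2}, \ldots, \bar{Y}_{\cdot C}$ (in general, $\bar{Y}_{\cdot j}$)
$\bar{Y}_{\cdot 1}$, $\bar{Y}_{\cdot 2}$, etc., are Column Means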
9
$\bar{Y}_{\cdot\cdot} = \sum_{j=1}^{C} \bar{Y}_{\cdot j} / C$ = GRAND MEAN (assuming same
number of data points in each column) (otherwise, $\bar{Y}_{\cdot\cdot}$ =
mean of all the data)
10
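MODEL: $Y_{ij} = \mu + \tau_j + \epsilon_{ij}$
$\bar{Y}_{\cdot\cdot}$ estimates $\mu$
$\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}$ estimates $\tau_j$ $(= \mu_j - \mu)$
(for all j)
These estimates are based on Gauss' (1796)
PRINCIPLE OF LEAST SQUARES and on COMMON SENSE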
11
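MODEL: $Y_{ij} = \mu + \tau_j + \epsilon_{ij}$
If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + \hat{\epsilon}_{ij}$,
it follows that our estimate of $\epsilon_{ij}$ is
(2) $\hat{\epsilon}_{ij} = Y_{ij} - \bar{Y}_{\cdot j}$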
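As a small illustration of these estimates, the sketch below applies them to the three brands whose raw values appear in the MINITAB input later in this transcript (brands 1, 2 and 8). It is only a toy: with three of the eight columns, the grand mean here is not the 5.8 of the full battery example, and the names `data`, `col_means`, `grand_mean`, `tau_hat` and `residuals` are illustrative choices.

```python
# Least-squares estimates for the one-factor model Y_ij = mu + tau_j + eps_ij.
# Data: only brands 1, 2 and 8 of the battery example (the other five
# columns are not listed in the transcript), so this is a 3-column subset.
data = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    8: [9.0, 7.4, 5.8],
}

# Column means estimate mu_j.
col_means = {j: sum(ys) / len(ys) for j, ys in data.items()}

# With equal replication, the grand mean is the average of the column means.
grand_mean = sum(col_means.values()) / len(col_means)

# Estimated differential effects: (column mean) - (grand mean).
tau_hat = {j: m - grand_mean for j, m in col_means.items()}

# Estimated errors (residuals): each value minus its own column mean.
residuals = {j: [y - col_means[j] for y in ys] for j, ys in data.items()}

for j in data:
    print(f"brand {j}: mean = {col_means[j]:.2f}, tau_hat = {tau_hat[j]:+.2f}")
print(f"sum of tau_hat = {sum(tau_hat.values()):.10f}")
```

The estimated effects sum to zero (up to floating-point rounding), mirroring the constraint that the differential effects sum to zero.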
12
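Then, $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$ or,
$(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$   (3)
TOTAL VARIABILITY in Y
  = Variability in Y associated with X
  + Variability in Y associated with all other factors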
13
If you square both sides of (3), and double sum
both sides (over i and j), you get, after some
unpleasant algebra, but lots of terms which
cancel:
$\sum_{j=1}^{C}\sum_{i=1}^{R} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = R \sum_{j=1}^{C} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{C}\sum_{i=1}^{R} (Y_{ij} - \bar{Y}_{\cdot j})^2$
TSS = SSBC + SSW
TSS = TOTAL SUM OF SQUARES
SSBC = SUM OF SQUARES BETWEEN COLUMNS
SSW (SSE) = SUM OF SQUARES WITHIN COLUMNS
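The "unpleasant algebra" comes down to one observation: the cross term vanishes because deviations from a column mean sum to zero within that column. A sketch of the step, writing $\bar{Y}_{\cdot j}$ for the mean of column j and $\bar{Y}_{\cdot\cdot}$ for the grand mean:

```latex
\begin{aligned}
\sum_{j=1}^{C}\sum_{i=1}^{R}(Y_{ij}-\bar{Y}_{\cdot\cdot})^2
 &= \sum_{j=1}^{C}\sum_{i=1}^{R}
    \bigl[(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot}) + (Y_{ij}-\bar{Y}_{\cdot j})\bigr]^2 \\
 &= R\sum_{j=1}^{C}(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})^2
  + \sum_{j=1}^{C}\sum_{i=1}^{R}(Y_{ij}-\bar{Y}_{\cdot j})^2
  + 2\sum_{j=1}^{C}(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})
     \sum_{i=1}^{R}(Y_{ij}-\bar{Y}_{\cdot j}), \\
\sum_{i=1}^{R}(Y_{ij}-\bar{Y}_{\cdot j})
 &= R\,\bar{Y}_{\cdot j} - R\,\bar{Y}_{\cdot j} = 0
 \quad\text{for every } j .
\end{aligned}
```

The factor R in the between-columns term appears because $(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})^2$ does not depend on i and is therefore added R times.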
14
ANOVA TABLE
SOURCE OF VARIABILITY             SSQ     DF          Mean square (M.S.)
Between Columns (due to brand)    SSBC    C - 1       MSBC = SSBC / (C - 1)
Within Columns (due to error)     SSW     (R - 1) C   MSW = SSW / ((R - 1) C)
TOTAL                             TSS     RC - 1
15
Example: Y = LIFETIME (HOURS)
BRAND: 3 replications per level
$SSBC = 3\,[(2.6 - 5.8)^2 + (4.6 - 5.8)^2 + \cdots + (7.4 - 5.8)^2] = 3\,(23.04) = 69.12$
16
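SSW
Brand 1: $(1.8 - 2.6)^2 = .64$, $(5.0 - 2.6)^2 = 5.76$, $(1.0 - 2.6)^2 = 2.56$; column total 8.96
Brand 2: $(4.2 - 4.6)^2 = .16$, $(5.4 - 4.6)^2 = .64$, $(4.2 - 4.6)^2 = .16$; column total .96
...
Brand 8: $(9.0 - 7.4)^2 = 2.56$, $(7.4 - 7.4)^2 = 0$, $(5.8 - 7.4)^2 = 2.56$; column total 5.12
Total of (8.96 + .96 + ... + 5.12): SSW = 46.72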
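The column totals above can be reproduced with a few lines of Python. This is a minimal sketch that covers only the three brands whose raw values appear in the transcript (brands 1, 2 and 8); the helper name `within_ss` is illustrative.

```python
# Within-column sums of squares for the three brands whose raw data
# appear in the transcript (brands 1, 2 and 8 of the battery example).
data = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    8: [9.0, 7.4, 5.8],
}

def within_ss(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)

ss_by_brand = {j: within_ss(ys) for j, ys in data.items()}

for j, ss in ss_by_brand.items():
    print(f"brand {j}: within-column SS = {ss:.2f}")
```

These three columns contribute 8.96, 0.96 and 5.12; the five brands not listed account for the rest of the 46.72.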
17
ANOVA TABLE
Source of Variability    SSQ       df                  M.S.
BRAND                    69.12     7 = 8 - 1           9.87
ERROR                    46.72     16 = 2 (8)          2.92
TOTAL                    115.84    23 = (3 x 8) - 1
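The table entries follow from the two sums of squares and the layout (R = 3 replications, C = 8 brands). A short sketch of that arithmetic, with illustrative variable names:

```python
# Rebuild the ANOVA table entries from the sums of squares on the slides.
R, C = 3, 8                   # replications per brand, number of brands
ssbc, ssw = 69.12, 46.72      # between-column and within-column sums of squares

df_between = C - 1            # 7
df_within = (R - 1) * C       # 16
df_total = R * C - 1          # 23

msbc = ssbc / df_between      # mean square between columns
msw = ssw / df_within         # mean square within columns
tss = ssbc + ssw              # total sum of squares

print(f"BRAND  SSQ={ssbc:7.2f}  df={df_between:2d}  MS={msbc:.2f}")
print(f"ERROR  SSQ={ssw:7.2f}  df={df_within:2d}  MS={msw:.2f}")
print(f"TOTAL  SSQ={tss:7.2f}  df={df_total:2d}")
```

Note that the degrees of freedom add up (7 + 16 = 23) in the same way the sums of squares do.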
18
We can show:
$E(MSBC) = \sigma^2 + V_{COL}$, where
$V_{COL} = \frac{R}{C-1} \sum_{j} (\mu_j - \mu)^2$ is a MEASURE OF DIFFERENCES AMONG COLUMN MEANS
$E(MSW) = \sigma^2$
(Assuming each $Y_{ij}$ has (constant) standard
deviation, $\sigma$) (More about assumptions, Later)
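These two expectations can be checked by simulation. The sketch below is not part of the battery example: the column means, the error standard deviation, the layout and the number of runs are all made-up values, chosen only to show the average mean squares settling near $\sigma^2 + V_{COL}$ and $\sigma^2$.

```python
import random

# Monte Carlo check of E(MSBC) = sigma^2 + V_COL and E(MSW) = sigma^2.
# The column means, sigma, R and the number of runs are hypothetical values
# chosen only for illustration; they are not from the battery example.
random.seed(1)
mu = [0.0, 1.0, 2.0, 3.0]    # hypothetical true column means mu_j
sigma = 1.0                  # common standard deviation of the errors
R, C = 5, len(mu)
runs = 4000

mu_bar = sum(mu) / C
v_col = R * sum((m - mu_bar) ** 2 for m in mu) / (C - 1)

sum_msbc = sum_msw = 0.0
for _ in range(runs):
    # One simulated experiment: R normal observations per column.
    cols = [[random.gauss(m, sigma) for _ in range(R)] for m in mu]
    means = [sum(col) / R for col in cols]
    grand = sum(means) / C
    ssbc = R * sum((m - grand) ** 2 for m in means)
    ssw = sum((y - m) ** 2 for col, m in zip(cols, means) for y in col)
    sum_msbc += ssbc / (C - 1)
    sum_msw += ssw / ((R - 1) * C)

avg_msbc = sum_msbc / runs
avg_msw = sum_msw / runs
print(f"average MSBC = {avg_msbc:.2f}  (sigma^2 + V_COL = {sigma**2 + v_col:.2f})")
print(f"average MSW  = {avg_msw:.2f}  (sigma^2 = {sigma**2:.2f})")
```

Setting all entries of `mu` equal makes `v_col` zero, in which case both averages should hover around the same value.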
19
$E(MSBC) = \sigma^2 + V_{COL}$
$E(MSW) = \sigma^2$
This suggests that
if $MSBC / MSW > 1$: There's some evidence of non-zero $V_{COL}$, or level
of X affects Y
if $MSBC / MSW < 1$: No evidence that $V_{COL} > 0$, or that level of X
affects Y
20
With
$H_0$: Level of X has no impact on Y
$H_1$: Level of X does have impact on Y,
we need $MSBC / MSW \gg 1$ to reject $H_0$.
21
More Formally,
$H_0$: $\tau_1 = \tau_2 = \cdots = \tau_C = 0$
$H_1$: not all $\tau_j = 0$
OR
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_C$ (All column means are equal)
$H_1$: not all $\mu_j$ are EQUAL
22
The distribution of $F_{calc} = MSBC / MSW$ is
the F - distribution with (C-1, (R-1)C)
degrees of freedom, assuming $H_0$ true.
C = Table Value (for upper-tail area $\alpha$)
23
In our problem: ANOVA TABLE
Source of Variability    SSQ      df    M.S.    Fcalc
BRAND                    69.12    7     9.87    3.38
ERROR                    46.72    16    2.92
$F_{calc} = 9.87 / 2.92 = 3.38$
24
F table: table 8
$\alpha = .05$
$C = 2.66 < 3.38$
(7, 16 DF)
25
Hence, at $\alpha = .05$, Reject $H_0$. (i.e., Conclude
that level of BRAND does have an impact on
battery lifetime.)
26
MINITAB INPUT
  • life brand
  • 1.8 1
  • 5.0 1
  • 1.0 1
  • 4.2 2
  • 5.4 2
  • 4.2 2
  • . .
  • . .
  • . .
  • 9.0 8
  • 7.4 8
  • 5.8 8

27
ONE FACTOR ANOVA (MINITAB)
MINITAB: STAT >> ANOVA >> ONE-WAY
Analysis of Variance for life
Source    DF       SS      MS      F       P
brand      7    69.12    9.87   3.38   0.021
Error     16    46.72    2.92
Total     23   115.84
Estimate of the common variance $\sigma^2$: MS Error = 2.92
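The P value in this output is the probability, when $H_0$ is true, of an F ratio at least as large as 3.38. A rough way to see where it comes from is to simulate the null distribution of the ratio for the same layout (8 brands, 3 replications). The sketch below assumes independent normal errors with a common variance; the seed and the number of runs are arbitrary, and the simulated tail areas are only approximations to the table value and to Minitab's P.

```python
import random

# Monte Carlo look at the null distribution of F = MSBC / MSW for the
# battery layout (C = 8 brands, R = 3 replications), assuming H0 is true
# and the errors are independent normal with a common variance.
random.seed(2)
R, C = 3, 8
runs = 10000

def f_ratio(cols):
    """F statistic for a balanced one-way layout given as a list of columns."""
    r, c = len(cols[0]), len(cols)
    means = [sum(col) / r for col in cols]
    grand = sum(means) / c
    msbc = r * sum((m - grand) ** 2 for m in means) / (c - 1)
    msw = sum((y - m) ** 2 for col, m in zip(cols, means) for y in col) / ((r - 1) * c)
    return msbc / msw

# Under H0 every column has the same mean, so all values are N(0, 1) draws.
f_values = [
    f_ratio([[random.gauss(0.0, 1.0) for _ in range(R)] for _ in range(C)])
    for _ in range(runs)
]

tail_at_table_value = sum(f > 2.66 for f in f_values) / runs
tail_at_f_calc = sum(f > 3.38 for f in f_values) / runs
print(f"P(F > 2.66) ~ {tail_at_table_value:.3f}")
print(f"P(F > 3.38) ~ {tail_at_f_calc:.3f}")
```

The first fraction should land near $\alpha = .05$ and the second near the reported P of 0.021, up to simulation noise.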
28
(No Transcript)
29
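Assumptions
MODEL: $Y_{ij} = \mu + \tau_j + \epsilon_{ij}$
1.) the $\epsilon_{ij}$ are indep. random variables (check: Run order plot)
2.) Each $\epsilon_{ij}$ is Normally Distributed, $E(\epsilon_{ij}) = 0$ for all i, j (check: Normality plot / test)
3.) $\sigma^2(\epsilon_{ij})$ = constant for all i, j (check: Residual plot / test)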
30
Diagnosis: Normality
  • The points on the normality plot must more or
    less follow a straight line to claim the errors
    are normally distributed.
  • There are statistical tests to verify it
    formally.
  • The ANOVA method we learn here is not sensitive
    to the normality assumption. That is, a mild
    departure from the normal distribution will not
    change our conclusions much.

Normality plot: normal scores vs. residuals
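The coordinates of such a plot can be computed directly. The sketch below uses the residuals of the three brands whose raw data appear in the transcript (brands 1, 2 and 8), so it is a partial illustration rather than the plot for the full battery data. The plotting-position formula used for the normal scores is an assumption; software packages use slightly different conventions.

```python
from statistics import NormalDist

# Coordinates for a normality plot: normal scores vs. sorted residuals.
# Residuals are for brands 1, 2 and 8 only (value minus its brand mean);
# the other five brands' raw data are not listed in the transcript.
residuals = [-0.8, 2.4, -1.6, -0.4, 0.8, -0.4, 1.6, 0.0, -1.6]

n = len(residuals)
ordered = sorted(residuals)

# Normal scores via the plotting positions (i - 3/8) / (n + 1/4); this
# particular formula is an assumption, and other conventions exist.
scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

for score, res in zip(scores, ordered):
    print(f"{score:6.2f}  {res:5.1f}")
print(f"correlation of the plotted points = {correlation(scores, ordered):.3f}")
```

A correlation close to 1 corresponds to points that lie close to a straight line.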
31
Minitab: Stat >> Basic Statistics >> Normality Test
32
Diagnosis: Constant Variances
  • The points on the residual plot must be more or
    less within a horizontal band to claim constant
    variances.
  • There are statistical tests to verify it
    formally.
  • The ANOVA method we learn here is not sensitive
    to the constant variances assumption. That is,
    slightly different variances within groups will
    not change our conclusions much.

Residual plot: fitted values vs. residuals
33
From Battery data
34
Minitab: Stat >> ANOVA >> Test for Equal Variances
35
Diagnosis: Randomness/Independence
  • The run order plot must show no systematic
    patterns to claim randomness.
  • There are statistical tests to verify it
    formally.
  • The ANOVA method is sensitive to the
    independence assumption. That is, even a small
    degree of dependence between data points can
    change our conclusions a lot.

Run order plot: order vs. residuals
36
From Battery data
37
KRUSKAL - WALLIS TEST
(Non - Parametric Alternative)
$H_0$: The probability distributions are
identical for each level of the factor
$H_1$: Not all the distributions are the same
38
BATTERY LIFETIME (hours) (each column rank
ordered, for simplicity)
Brand    A       B       C
         32      32      28
         30      32      21
         30      26      15
         29      26      15
         26      22      14
         23      20      14
         20      19      14
         19      16      11
         18      14      9
         12      14      8
Mean     23.9    22.1    14.9    (here, irrelevant!!)
39
$H_0$: no difference in distribution among the
three brands with respect to battery lifetime
$H_1$: At least one of the 3 brands differs in
distribution from the others with respect to
lifetime
40
Ranks in ( )
Brand    A            B            C
         32 (29)      32 (29)      28 (24)
         30 (26.5)    32 (29)      21 (18)
         30 (26.5)    26 (22)      15 (10.5)
         29 (25)      26 (22)      15 (10.5)
         26 (22)      22 (19)      14 (7)
         23 (20)      20 (16.5)    14 (7)
         20 (16.5)    19 (14.5)    14 (7)
         19 (14.5)    16 (12)      11 (3)
         18 (13)      14 (7)       9 (2)
         12 (4)       14 (7)       8 (1)
$T_1 = 197$, $T_2 = 178$, $T_3 = 90$
$n_1 = 10$, $n_2 = 10$, $n_3 = 10$
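The ranking step can be sketched in Python: pool all 30 lifetimes, give tied values the average of the positions they occupy, and add up the ranks within each brand. The helper name `average_ranks` is illustrative.

```python
# Rank all 30 lifetimes together (ties share the average rank), then
# sum the ranks within each brand to get the rank sums T_j.
data = {
    "A": [32, 30, 30, 29, 26, 23, 20, 19, 18, 12],
    "B": [32, 32, 26, 26, 22, 20, 19, 16, 14, 14],
    "C": [28, 21, 15, 15, 14, 14, 14, 11, 9, 8],
}

def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}

pooled = [y for ys in data.values() for y in ys]
rank_of = average_ranks(pooled)

rank_sums = {brand: sum(rank_of[y] for y in ys) for brand, ys in data.items()}
print(rank_sums)   # {'A': 197.0, 'B': 178.0, 'C': 90.0}
```

For example, the five values equal to 14 occupy positions 5 through 9 in the pooled ordering, so each receives rank 7.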
41
TEST STATISTIC
$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{T_j^2}{n_j} - 3(N+1)$
$n_j$ = number of data values in column j
$N = \sum_{j=1}^{K} n_j$
K = Columns (levels)
$T_j$ = SUM OF RANKS OF DATA ON COL j When all DATA COMBINED
(There is a slight adjustment in the formula as a function
of the number of ties in rank.)
42
$H = \frac{12}{30\,(31)} \left[ \frac{197^2}{10} + \frac{178^2}{10} + \frac{90^2}{10} \right] - 3\,(31) = 8.41$
(with adjustment for ties, we get 8.46)
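The same arithmetic in Python, as a minimal sketch. The slides mention a tie adjustment without giving its formula; the version below uses the standard correction (dividing H by one minus the sum of $t^3 - t$ over tied groups, divided by $N^3 - N$), which is an assumption about what was intended, so the adjusted value it prints may differ from the slide's figure in the last digit.

```python
# Kruskal-Wallis H from the rank sums, plus the usual correction for ties.
rank_sums = {"A": 197.0, "B": 178.0, "C": 90.0}   # T_j from the slides
sizes = {"A": 10, "B": 10, "C": 10}               # n_j
N = sum(sizes.values())

h = 12.0 / (N * (N + 1)) * sum(rank_sums[b] ** 2 / sizes[b] for b in sizes) - 3 * (N + 1)

# Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N), where t is the
# size of each group of tied values in the pooled data. The slides do not
# state the formula; this is the standard one, used here as an assumption.
pooled = [32, 30, 30, 29, 26, 23, 20, 19, 18, 12,
          32, 32, 26, 26, 22, 20, 19, 16, 14, 14,
          28, 21, 15, 15, 14, 14, 14, 11, 9, 8]
tie_sizes = [pooled.count(v) for v in set(pooled)]
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (N ** 3 - N)
h_adjusted = h / correction

print(f"H = {h:.2f}")
print(f"H adjusted for ties = {h_adjusted:.2f}")
```

Values that occur only once contribute nothing to the correction, since $t^3 - t = 0$ when t = 1.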
43
What do we do with H?
We can show that, under $H_0$, H is well
approximated by a $\chi^2$ distribution with
df = K - 1.
Here, df = 2, and at $\alpha = .05$, the critical value
= 5.99
Since H = 8.41 exceeds 5.99:
Reject $H_0$; conclude that lifetime is NOT distributed
the same for all 3 BRANDS
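For df = 2 the chi-square upper tail has a simple closed form, $P(X > x) = e^{-x/2}$, so both the critical value and an approximate p-value can be obtained without tables. A small sketch using the unadjusted H from the slides (the closed form applies only to df = 2):

```python
import math

# For df = 2 the chi-square upper tail has the closed form
# P(X > x) = exp(-x / 2), so no statistical tables are needed here.
alpha = 0.05
h = 8.41                                  # H from the slides (unadjusted)

critical_value = -2 * math.log(alpha)     # solves exp(-x / 2) = alpha
p_value = math.exp(-h / 2)                # approximate P(H >= 8.41) under H0

print(f"critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.4f}")
print("reject H0" if h > critical_value else "do not reject H0")
```

The critical value rounds to the 5.99 quoted above, and the p-value is well below .05, which agrees with the decision to reject.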
44
Minitab: Stat >> Nonparametrics >> Kruskal-Wallis
  • Kruskal-Wallis Test: life versus brand
  • Kruskal-Wallis Test on life
  • brand N Median AveRank Z
  • 1 3 1.800 4.5 -2.09
  • 2 3 4.200 7.8 -1.22
  • 3 3 4.600 11.8 -0.17
  • 4 3 7.000 16.5 1.05
  • 5 3 6.600 13.3 0.22
  • 6 3 4.200 7.8 -1.22
  • 7 3 7.800 20.0 1.96
  • 8 3 7.400 18.2 1.48
  • Overall 24 12.5
  • H = 12.78 DF = 7 P = 0.078
  • H = 13.01 DF = 7 P = 0.072 (adjusted for ties)