Title: Statistics


1
Analysis of Variance
ANOVA
2
Basics
  • Idea
  • Total variation is partitioned into 2 components
  • Each component has its specific source of
    variation
  • Purposes
  • Estimate and test hypothesis about population
    means
  • Extension of previous concepts
  • Variables
  • Treatment (txt) -> interventions
  • Response: outcome of interest
  • Extraneous: may have an effect on the outcome (but
    are not the focus)
  • Change in txt -> change in the outcome (on average)?

3
Completely Randomized Design
  • Suppose there are 5 populations under analysis
  • 5C2 = 10 possible pairwise comparisons
  • We would like
  • More effective computations
  • Control the likelihood of obtaining a false
    conclusion
  • P(Ha | Ho) = 0.05 -> P(failing to reject Ho | Ho) = 0.95
  • If the 10 tests were independent: (0.95)^10 = 0.5987
    -> P(at least one Ha | Ho) = 0.4013!
  • (In real data, tests on the same data are rarely
    independent)
  • Solution: One-Way ANOVA
  • Only one source of variation (FACTOR) is
    assessed
  • Extension of the t-test for independent
    samples
  • Experimental design: txt assigned completely at
    random
  • Number of subjects in each txt does not need to be
    equal
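
The error-rate arithmetic on this slide can be reproduced in a few lines. The following is a minimal Python sketch, assuming independent tests (which, as the slide notes, real data rarely satisfy); the function name `familywise_error` is made up for illustration.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Return (number of pairwise tests, P(at least one false rejection)).

    Assumes the pairwise tests are independent, which is an idealization.
    """
    m = comb(k, 2)                 # number of pairwise comparisons among k populations
    p_no_error = (1 - alpha) ** m  # every test correctly fails to reject Ho
    return m, 1 - p_no_error

m, fwer = familywise_error(5)
print(m, round(fwer, 4))  # 10 0.4013
```

With 5 populations the chance of at least one false conclusion is already about 40%, which is the motivation for a single overall test.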

4
Data Setup
5
Fixed-Effects Model
  • μ: GRAND MEAN, mean of all k population means
  • τj: TREATMENT EFFECT, difference between the mean of
    the jth population and the grand mean
  • eij: ERROR TERM, difference between an individual
    measurement and its population mean
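
The model equation itself did not survive in the transcript. The form below is a reconstruction from the three definitions on this slide, not a copy of the original formula; it assumes x_ij denotes the ith observation in the jth treatment group and μ_j the mean of the jth population.

```latex
x_{ij} = \mu + \tau_j + e_{ij},
\qquad \tau_j = \mu_j - \mu,
\qquad e_{ij} = x_{ij} - \mu_j
```

Substituting the last two definitions into the first gives x_ij = x_ij, so the three terms account for each observation exactly.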
6
Assumptions
  • k sets of observed data: k independent random
    samples from the respective populations
  • Each population is N(μj, σj²)
  • Each population has the same variance σ²
  • τj are unknown constants, Στj = 0
  • eij are:
  • Normally and independently distributed
  • Mean 0 (mean of xij is μj)
  • Variance σ²

7
Hypothesis
Ho: μ1 = μ2 = ... = μk vs. Ha: not all μj are equal
OR
Ho: τj = 0, j = 1, 2, ..., k vs. Ha: not all τj = 0
Graphs: Ho true vs. Ho not true (equal variances, normal distribution)
Graphs adapted from Aczel, A. (1998), Complete
Business Statistics, McGraw-Hill/Irwin, Mass.,
4th ed. (CD-ROM)
8
Test Statistic
Total Sum of Squares = Sum of Squares Among
Groups + Sum of Squares Within Groups
SST = SSA + SSW
MSA = SSA / (k-1)
MSW = SSW / (N-k)
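
To make the partition concrete, here is a small Python sketch that computes the three sums of squares and the two mean squares for a completely randomized design. The data in `toy` are hypothetical, the function name `one_way_anova` is illustrative, and the degrees of freedom k-1 and N-k are the ones used for the F distribution on the next slide.

```python
def one_way_anova(groups):
    """Partition total variation for a list of groups (each a list of numbers)."""
    values = [x for g in groups for x in g]
    N, k = len(values), len(groups)
    grand_mean = sum(values) / N
    means = [sum(g) / len(g) for g in groups]

    # Total, among-groups and within-groups sums of squares
    sst = sum((x - grand_mean) ** 2 for x in values)
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msa = ssa / (k - 1)   # mean square among groups
    msw = ssw / (N - k)   # mean square within groups
    return {"SST": sst, "SSA": ssa, "SSW": ssw, "MSA": msa, "MSW": msw}

# Hypothetical data; group sizes do not need to be equal
toy = [[4, 5, 6], [7, 8, 9, 10], [1, 2, 3]]
result = one_way_anova(toy)
print(result["SST"], result["SSA"], result["SSW"])  # 82.5 73.5 9.0
```

For these numbers SSA + SSW = 73.5 + 9.0 = 82.5 = SST, as the partition requires.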
9
Test Statistic
VR = MSA/MSW
VR ~ F(k-1, N-k)
10
Decision rule
Reject Ho if VR ≥ F(k-1, N-k), the critical value at level α
Graph adapted from Aczel, A. (1998), Complete
Business Statistics, McGraw-Hill/Irwin, Mass.,
4th ed. (CD-ROM)
11
Further Analysis
Data -> ANOVA
Do Not Reject H0 -> Stop
Reject H0 -> Further Analysis:
  • Confidence Intervals for Population Means
  • Tukey Pairwise Comparisons Test
Source: Aczel, A. (1998), Complete Business
Statistics, McGraw-Hill/Irwin, Mass., 4th ed.
(CD-ROM)
12
Tukey's HSD
Objective: test for significant differences
between individual pairs of means
HSD: honestly significant difference
Scenario 1: all sample sizes equal
  • q: studentized range statistic, i.e. range of
    treatment means from an ANOVA standardized by
    sqrt(MSE/n)
  • k: number of means
  • N: total number of observations
Scenario 2: sample sizes not equal (limited to
α ≤ 0.05)
  • nj: the smallest of the two sample sizes
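
The HSD formulas were images on the original slide and are missing from the transcript. The block below gives the usual textbook form as a reconstruction from the symbols defined above, not as the slide's exact notation. MSE is taken to be the error (within-groups) mean square, called MSW earlier in the deck, and q is read from a studentized range table with k and N-k degrees of freedom.

```latex
\begin{aligned}
\text{Scenario 1:}\quad & \mathrm{HSD} = q_{\alpha,\,k,\,N-k}\,\sqrt{\frac{\mathrm{MSE}}{n}} \\
\text{Scenario 2:}\quad & \mathrm{HSD}^{*} = q_{\alpha,\,k,\,N-k}\,\sqrt{\frac{\mathrm{MSE}}{n_j}}
\end{aligned}
```

Two population means are declared different when the absolute difference between their sample means exceeds HSD (or HSD* when the sample sizes differ).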
13
Example
Daniel's exercise 8.2.3: density of binding
sites for H-imipramine in blood platelets in
seasonal (SAD = 1) and nonseasonal (NonSAD = 2)
depressed patients and in healthy individuals
(Control = 3). Is there a difference in population
means? α = 0.05
Source             DF   Sum of Squares    Mean Square   F Value   Pr > F
Model               2       739071.958     369535.979     13.39   <.0001
Error              28       772982.816      27606.529
Corrected Total    30      1512054.774
The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for val
NOTE: This test controls the Type I experimentwise error rate.
Comparisons significant at the 0.05 level are indicated by ***.

   txt         Difference       Simultaneous 95%
Comparison    Between Means     Confidence Limits
  3 - 1           362.06         166.84    557.28   ***
  3 - 2           430.63         208.60    652.65   ***
  1 - 3          -362.06        -557.28   -166.84   ***
  1 - 2            68.57        -107.70    244.83
  2 - 3          -430.63        -652.65   -208.60   ***
  2 - 1           -68.57        -244.83    107.70
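
In the Tukey output above, a comparison is significant at the 0.05 level exactly when its simultaneous 95% confidence interval excludes zero. The short Python sketch below applies that rule to the six intervals as printed; the helper name `excludes_zero` is made up for illustration.

```python
# (comparison, difference between means, lower limit, upper limit) from the Tukey output
tukey_rows = [
    ("3 - 1", 362.06, 166.84, 557.28),
    ("3 - 2", 430.63, 208.60, 652.65),
    ("1 - 3", -362.06, -557.28, -166.84),
    ("1 - 2", 68.57, -107.70, 244.83),
    ("2 - 3", -430.63, -652.65, -208.60),
    ("2 - 1", -68.57, -244.83, 107.70),
]

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

significant = [name for name, _, lo, hi in tukey_rows if excludes_zero(lo, hi)]
print(significant)  # ['3 - 1', '3 - 2', '1 - 3', '2 - 3']
```

Read this way, the control group (3) differs from both patient groups (1 and 2), while the two patient groups do not differ significantly from each other.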
data ex1;
  input txt val;
  /* only some of the 31 observations appear in the transcript */
  cards;
1 634
1 585
2 771
2 546
3 1067
3 1176
;
proc anova;
  class txt;
  model val = txt;
  means txt / tukey;
run;
14
Randomized Complete Block Design
  • Motivation: to deal with non-homogeneous groups ->
    subdivide experimental units into homogeneous
    groups -> BLOCKS
  • Design: number of experimental units in a block =
    number (or multiple) of studied treatments
  • Treatments are assigned at random to experimental
    units within each block
  • Each txt appears in every block and each block has
    every txt
  • Objective: by blocking -> isolate and remove from
    the error term the variation attributable to blocks
  • Examples of blocks: breed of animal, age,
    laboratory, day, etc.
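
The random assignment described above can be sketched in Python as follows. This is only an illustration under the simplest case (one experimental unit per treatment in each block); the block and treatment labels and the function name `randomize_rcbd` are hypothetical.

```python
import random

def randomize_rcbd(blocks, treatments, seed=None):
    """Return {block: treatments in random order}, one unit per treatment."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)      # independent randomization within each block
        layout[block] = order   # position in the list = experimental unit
    return layout

layout = randomize_rcbd(["block1", "block2", "block3"],
                        ["T1", "T2", "T3", "T4"], seed=1)
for block, order in layout.items():
    print(block, order)
```

Because every block receives a full shuffled copy of the treatment list, each treatment appears in every block and each block contains every treatment.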
15
Data Structure
                 Treatment
Blocks   1     2     3     ...   k     Total   Mean
1        x11   x12   x13   ...   x1k   T1.     x1.
2        x21   x22   x23   ...   x2k   T2.     x2.
3        x31   x32   x33   ...   x3k   T3.     x3.
.        .     .     .           .     .       .
.        .     .     .           .     .       .
n        xn1   xn2   xn3   ...   xnk   Tn.     xn.
Total    T.1   T.2   T.3   ...   T.k   T..
Mean     x.1   x.2   x.3   ...   x.k           x..
16
Fixed-Effects Model
xij = μ + βi + τj + eij
  • μ: CONSTANT
  • βi: BLOCK EFFECT
  • τj: TREATMENT EFFECT
  • eij: ERROR TERM, sources of variation other than
    treatment and block

17
Two-Way ANOVA
Two-way ANOVA: each observation is analyzed under two
criteria, the block and the treatment group to
which it belongs
Assumptions:
1. Each xij is a random independent sample (of size 1)
   from one of kn populations
2. Each of the kn populations is N(μij, σ²)
   -> eij are independent, N(0, σ²)
3. Block and treatment effects are additive, i.e. there
   is no interaction between treatments and blocks:
   block-treatment effect = block effect + treatment effect
   Its violation -> misleading results. Concern when the
   largest mean is more than 50% greater than the smallest
When assumptions hold -> τj and βi are
fixed constants -> fixed-effects model
18
Two-Way ANOVA
Hypotheses: Ho: τj = 0, j = 1, 2, ..., k vs. Ha: not
all τj = 0
No test of the block effect because (1) primary
interest is in the treatment effect and (2) blocks
are obtained non-randomly
SST = SSBl + SSTr + SSE
df: kn-1 = (n-1) + (k-1) + (n-1)(k-1)
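
The partition SST = SSBl + SSTr + SSE can be checked numerically. Below is a minimal Python sketch for a design with one observation per block-treatment cell; the data in `toy` are hypothetical, the function name `rcbd_anova` is illustrative, and rows are assumed to be blocks and columns treatments, as in the data structure slide.

```python
def rcbd_anova(x):
    """x[i][j] is the observation in block i (row) under treatment j (column)."""
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    block_means = [sum(row) / k for row in x]
    trt_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]

    sst = sum((x[i][j] - grand) ** 2 for i in range(n) for j in range(k))
    ssbl = k * sum((m - grand) ** 2 for m in block_means)
    sstr = n * sum((m - grand) ** 2 for m in trt_means)
    # Residual after removing block and treatment effects
    sse = sum((x[i][j] - block_means[i] - trt_means[j] + grand) ** 2
              for i in range(n) for j in range(k))

    mstr = sstr / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return {"SST": sst, "SSBl": ssbl, "SSTr": sstr, "SSE": sse, "VR": mstr / mse}

toy = [[5, 7, 9], [6, 9, 9], [10, 11, 15]]  # hypothetical: 3 blocks x 3 treatments
out = rcbd_anova(toy)
print(out["SST"], out["SSBl"], out["SSTr"], out["SSE"], out["VR"])
```

For these numbers SSBl = 42, SSTr = 24 and SSE = 4 add up to SST = 70, and the variance ratio is MSTr/MSE = 12/1 = 12.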
19
Two-Way ANOVA
Statistical decision: when Ho is true,
VR = MSTr/MSE ~ F(k-1, (n-1)(k-1)); if VR ≥ F,
reject Ho
20
Example
Daniel's exercise 8.3.4: influence of time of
day on length of home visits by the nursing
staff. Individual differences among nurses might
be large -> blocking factor. Do the data provide
sufficient evidence to indicate a difference in
length of home visit among different times of the
day? α = 0.05
Source             DF   Sum of Squares    Mean Square   F Value   Pr > F
Model               6      655.8750000    109.3125000     30.68   <.0001
Error               9       32.0625000      3.5625000
Corrected Total    15      687.9375000

Source   DF      Anova SS    Mean Square   F Value   Pr > F
time      3   124.6875000     41.5625000     11.67   0.0019
nurse     3   531.1875000    177.0625000     49.70   <.0001
data ex2;
  input nurse $ time $ stay;
  /* only some of the 16 observations appear in the transcript */
  cards;
A EM 27
A LM 28
D EA 20
D LA 14
;
proc anova;
  class time nurse;
  model stay = time nurse;
  means time / tukey;
run;
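
The numbers in the ANOVA output for this example can be cross-checked against the two-way partition and its degrees of freedom. The Python sketch below uses only values printed on the slide; the variable names are illustrative.

```python
from math import isclose

# Values copied from the ANOVA output for this example
ss = {"time": 124.6875, "nurse": 531.1875, "error": 32.0625, "total": 687.9375}
df = {"time": 3, "nurse": 3, "error": 9, "total": 15}

ms_error = ss["error"] / df["error"]
f_time = (ss["time"] / df["time"]) / ms_error     # treatment (time of day)
f_nurse = (ss["nurse"] / df["nurse"]) / ms_error  # block (nurse)

# SST = SSTr + SSBl + SSE, and the degrees of freedom add up the same way
ss_adds_up = isclose(ss["time"] + ss["nurse"] + ss["error"], ss["total"])
df_adds_up = df["time"] + df["nurse"] + df["error"] == df["total"]

print(f"{ms_error:.4f} {f_time:.2f} {f_nurse:.2f}")  # 3.5625 11.67 49.70
print(ss_adds_up, df_adds_up)                        # True True
```

The recomputed F values match the printed 11.67 for time and 49.70 for nurse, and 15 = 3 + 3 + 9 corresponds to kn-1 = (k-1) + (n-1) + (n-1)(k-1) with k = 4 times of day and n = 4 nurses.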