Title: Two and more factors in analysis of variance
1Two and more factors in analysis of variance
- Factorial and nested designs
2Factorial design
- Each level of the first factor is combined with
each level of the second one. With two levels in
each factor: 2 factors -> 4 combinations
- 3 factors -> 8 combinations
- Generally, the number of combinations is the product of
the numbers of levels of all factors
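The product rule above can be checked with a short sketch. The factor and level names below are illustrative assumptions, loosely based on the mowing / fertilization / dominant-removal example; they are not taken from a real data set.

```python
from itertools import product
from math import prod

# Hypothetical factors, each with two levels (names are assumptions).
factors = {
    "mowing": ["non-mown", "mown"],
    "fertilization": ["non-fertilized", "fertilized"],
    "dominant": ["kept", "removed"],
}

# Every level of each factor combined with every level of the others.
combinations = list(product(*factors.values()))

# Number of combinations = product of the numbers of levels.
n_expected = prod(len(levels) for levels in factors.values())

for combo in combinations:
    print(combo)
print(len(combinations), n_expected)
```

With three two-level factors both counts come out as 8, matching the slide.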
3Mowing, fertilization, removal of the dominant
Usually each combination is replicated several times
4Factorial designs in the field - factors: shape and
pattern
5Another possibility - nested design
factor A (locality)
factor C (plant)
single observations
Plant 1 from the first locality has nothing in
common with plant 1 from any other locality.
6Factorial design
7Proportional design
- The same proportion of replications of each
factor at each level of the other factor: in the contingency
table of numbers of replications, $\chi^2$ equals zero,
i.e. the factors are completely independent
- In the ideal case there is the same number of observations
in all combinations, but a proportional design is
enough
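8Formula for expected frequency in contingency
table
$\hat{f}_{ij} = \frac{R_i \, C_j}{n}$, where $R_i$ is the total of row i, $C_j$ is the total of column j, and n is the grand total.
So, for example, for the non-fertilized, non-mown combination: (total of non-fertilized) x (total of non-mown) / (grand total).
I.e. if each level of the first factor has the same
proportional representation at each level of the second
factor, then we consider the factors independent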
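A minimal sketch of this check, assuming the table holds the numbers of replications per combination: expected counts are row total times column total divided by the grand total, and the chi-square statistic is zero exactly when the design is proportional. The three tables are made-up examples, and the function names are illustrative.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square of observed vs. expected numbers of replications."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical numbers of replications (rows: mowing, columns: fertilization).
balanced = [[5, 5], [5, 5]]
proportional = [[2, 4], [4, 8]]
non_proportional = [[2, 8], [8, 2]]

for name, table in [("balanced", balanced),
                    ("proportional", proportional),
                    ("non-proportional", non_proportional)]:
    print(name, chi_square(table))
```

The balanced and proportional tables give a chi-square of zero; the non-proportional one does not, so its factors are not independent in the sense used here.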
9When factors are independent, and design is
balanced
Balanced design: weights of rats
10When factors are independent, and design is
proportional
Proportional design: weights of rats
11When factors are dependent, i.e. design is neither
balanced nor proportional
Non-proportional design: weights of rats
According to the marginal means, it seems that listening
to music can affect the weight of rats. (There are
methods that can partly cope with this, e.g. LS
means, but the power of the test is lowered for both
factors.)
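To illustrate why marginal means mislead in a non-proportional design, here is a small sketch. All numbers are invented, and the second factor (sex) is an assumption because the original table is not available: weight depends only on sex, music has no effect at all, yet the raw marginal means for music differ because the cells are filled unevenly. LS means are taken here simply as unweighted means of the cell means.

```python
# Hypothetical true cell means in grams: sex matters, music does not.
cell_mean = {
    ("music", "male"): 300.0, ("music", "female"): 200.0,
    ("silence", "male"): 300.0, ("silence", "female"): 200.0,
}
# Hypothetical, non-proportional numbers of rats per combination.
n_rats = {
    ("music", "male"): 8, ("music", "female"): 2,
    ("silence", "male"): 2, ("silence", "female"): 8,
}
SEXES = ("male", "female")


def marginal_mean(treatment):
    """Raw marginal mean: cell means weighted by the numbers of rats."""
    total = sum(cell_mean[(treatment, s)] * n_rats[(treatment, s)] for s in SEXES)
    count = sum(n_rats[(treatment, s)] for s in SEXES)
    return total / count


def ls_mean(treatment):
    """LS mean in this simple case: unweighted mean of the cell means."""
    return sum(cell_mean[(treatment, s)] for s in SEXES) / len(SEXES)


for treatment in ("music", "silence"):
    print(treatment, marginal_mean(treatment), ls_mean(treatment))
```

The marginal means differ by 60 g purely because of the unequal sex ratio, while the LS means are identical.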
12Statistica can compute anything, but
- If I have a proportional design, the result should
always be the same.
- Two-way ANOVA can be computed even for a
non-proportional design. The default there (Type III
sum of squares, orthogonal) is all right, but I
can, depending on the experimental situation,
choose another type myself (perhaps Type I,
sequential), and I should know what each one means
(and why the results differ).
13Model of two-way ANOVA
$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$
Two factors (mown and fertilized): index i is the
level of the first factor (non-mown, mown), index
j is the level of the second one, k is the replication
within the group; the response is e.g. number of species.
$\mu$: grand mean
$\alpha_i$: effect of mowing
$\beta_j$: effect of fertilization
$\gamma_{ij}$: interaction
$\varepsilon_{ijk}$: error variability
Parameterisation is usually such that $\alpha$, $\beta$, and
$\gamma$ are centred around zero (then $\mu$ is really the
mean of everything).
14Three null hypotheses
- $\alpha_i = 0$ for all i: mowing has no effect
- $\beta_j = 0$ for all j: fertilization has no effect
- $\gamma_{ij} = 0$ for all combinations of i and j: there is no
interaction between mowing and fertilization
- Null interaction means that the main effects are
purely additive
15Null interaction
The effect of each factor is independent of the
level of the other factor. ATTENTION: this means
additivity
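Additivity can be made concrete with a small sketch. Under pure additivity each cell mean equals row mean + column mean - grand mean, and the interaction is whatever is left over. The cell means (numbers of species) are invented for illustration, and the function name is an assumption.

```python
def interaction_deviations(cell_means):
    """Deviation of each cell mean from its additive expectation
    (row mean + column mean - grand mean); equal cell sizes assumed."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(col) / n_rows for col in zip(*cell_means)]
    grand = sum(row_means) / n_rows
    return [
        [cell_means[i][j] - (row_means[i] + col_means[j] - grand)
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Rows: non-mown, mown; columns: non-fertilized, fertilized (hypothetical).
additive = [[10.0, 6.0], [14.0, 10.0]]      # mowing +4, fertilization -4
non_additive = [[10.0, 6.0], [14.0, 4.0]]   # fertilization hurts more when mown

print(interaction_deviations(additive))
print(interaction_deviations(non_additive))
```

In the first table every deviation is zero (null interaction); in the second the deviations are plus or minus 1.5, which is the interaction.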
16Interaction is deviation from additivity
e.g.
17Can be seen well in graphs (interaction plot)
Do not forget to stress that connecting the means
isn't an interpolation here; we just want to
visualize the interaction by means of the (non-)
parallelism of the lines
18Can be seen well in diagram (interaction plot)
When I report a result, it isn't enough to
write that the interaction is significant; one
needs to say why (where the deviation from
additivity is).
19Null hypotheses of main effects - averaged over
all levels of the second factor
- $\alpha_i = 0$ for all i: mowing has no effect
(on average over all levels of fertilization)
- $\beta_j = 0$ for all j: fertilization has no effect
(on average over all levels of mowing)
20You have to use your head when interpreting results!!!
(and look at the diagram)
Administer two medicines separately and
together (factorial design): if the main effects are
non-significant, it doesn't mean the medicines are
ineffective. Their effects may just cancel
when applied together.
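The medicine example can be written out numerically. The response values below are invented: each medicine alone improves the score by 10, but together they cancel. The main effect of medicine A, averaged over both levels of B, is then zero even though A clearly works when given alone.

```python
# Hypothetical mean responses, keyed by (A given, B given).
response = {
    (False, False): 0.0,   # neither medicine
    (True, False): 10.0,   # A alone
    (False, True): 10.0,   # B alone
    (True, True): 0.0,     # both: effects cancel
}


def main_effect_a(resp):
    """Effect of A averaged over both levels of B."""
    with_a = (resp[(True, False)] + resp[(True, True)]) / 2
    without_a = (resp[(False, False)] + resp[(False, True)]) / 2
    return with_a - without_a


def simple_effect_a(resp, b_given):
    """Effect of A at one fixed level of B."""
    return resp[(True, b_given)] - resp[(False, b_given)]


print(main_effect_a(response))             # averaged effect of A
print(simple_effect_a(response, False))    # effect of A without B
print(simple_effect_a(response, True))     # effect of A with B
```

The averaged effect is 0, while the effect of A is +10 without B and -10 with B, which is why the interaction plot has to be inspected.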
21Again, the total variability, expressed
by $SS_{TOT}$, can be partitioned
$SS_{TOT} = SS_A + SS_B + SS_{AB} + SS_{error}$
- $SS_{TOT}$: sum of squared deviations of the observations from the grand mean
- $SS_A$: sum of squared deviations of the marginal means of the factor A
groups from the grand mean, weighted by the number of
observations ($SS_B$ analogously)
- $SS_{AB}$ (interaction): weighted sum of
squared deviations of the combination means from
the means expected without interaction, i.e. under pure additivity
- $SS_A + SS_B + SS_{AB}$: explained by the model
- $SS_{error}$ (residual): error variability
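A minimal sketch of this partition for a balanced design, in plain Python. The data (numbers of species in three replicate plots per combination) are invented, and the function name and dictionary layout are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_ss(data):
    """data: {(level_a, level_b): [replicates]}; balanced design assumed."""
    a_levels = sorted({i for i, _ in data})
    b_levels = sorted({j for _, j in data})
    all_obs = [x for obs in data.values() for x in obs]
    grand = mean(all_obs)
    obs_a = {i: [x for j in b_levels for x in data[(i, j)]] for i in a_levels}
    obs_b = {j: [x for i in a_levels for x in data[(i, j)]] for j in b_levels}
    # Marginal means vs. grand mean, weighted by numbers of observations.
    ss_a = sum(len(o) * (mean(o) - grand) ** 2 for o in obs_a.values())
    ss_b = sum(len(o) * (mean(o) - grand) ** 2 for o in obs_b.values())
    # Cell means vs. means expected under pure additivity.
    ss_ab = sum(
        len(obs) * (mean(obs) - (mean(obs_a[i]) + mean(obs_b[j]) - grand)) ** 2
        for (i, j), obs in data.items()
    )
    # Observations vs. their own cell mean.
    ss_error = sum((x - mean(obs)) ** 2 for obs in data.values() for x in obs)
    ss_tot = sum((x - grand) ** 2 for x in all_obs)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "error": ss_error, "TOT": ss_tot}


# Hypothetical numbers of species; keys are (mowing, fertilization).
species = {
    ("non-mown", "non-fert"): [10, 12, 11],
    ("non-mown", "fert"): [6, 7, 8],
    ("mown", "non-fert"): [14, 15, 16],
    ("mown", "fert"): [9, 8, 10],
}
ss = two_way_ss(species)
print(ss)
print(ss["A"] + ss["B"] + ss["AB"] + ss["error"], ss["TOT"])
```

For these data the four components (27, 75, 3 and 8) add up to the total of 113. The exact additivity of the components relies on the design being balanced.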
22Example: mown, fertilized, number of species as
response
Test of the null hypothesis that the mean number of
species is zero everywhere
23a, b are the numbers of levels of factors A and B, n is
the total number of observations in all groups. It holds that
$DF_A = a - 1$, $DF_B = b - 1$, $DF_{AB} = (a-1)(b-1)$, $DF_{TOT} = n - 1$, and
$DF_{error} = DF_{TOT} - DF_A - DF_B - DF_{AB}$. It holds again that
the ratio $MS = SS/DF$ is an estimate of the overall
variance if the null hypothesis is true
24If all the effects are fixed
Test: $F_{effect} = MS_{effect} / MS_{error}$
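The degrees of freedom, mean squares and F ratios for the all-fixed-effects case can be sketched as follows. The sums of squares passed in are hypothetical values for a 2 x 2 design with 12 observations, and the function name is an assumption.

```python
def fixed_effects_anova(ss, a, b, n):
    """DF, MS and F for a two-way ANOVA with all effects fixed.

    ss: sums of squares with keys "A", "B", "AB", "error"
    a, b: numbers of levels of factors A and B
    n: total number of observations
    """
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    df["error"] = (n - 1) - df["A"] - df["B"] - df["AB"]
    ms = {source: ss[source] / df[source] for source in df}
    # With fixed effects, every effect is tested against MS_error.
    f = {source: ms[source] / ms["error"] for source in ("A", "B", "AB")}
    return df, ms, f


# Hypothetical sums of squares.
example_ss = {"A": 27.0, "B": 75.0, "AB": 3.0, "error": 8.0}
df, ms, f = fixed_effects_anova(example_ss, a=2, b=2, n=12)
print(df)
print(ms)
print(f)
```

Turning the F ratios into p-values would additionally need the F distribution (for instance from a statistics library), which is left out of this sketch.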
25Problem: what is in the denominator depends on
which factor has a fixed effect and which
factor has a random effect (especially important
if one of the factors is experimental, and thus
of our major interest, and the other is
locality). Important for experimental design
planning!
26I, the experimenter, am the one who decides which
model I will use
- Classic factorial ANOVA
- ANOVA without interactions (also Main effects
ANOVA): non-additivity is part of the random
variability; this makes it possible to work with data
with one observation for each factor combination
(better to avoid it, though)
27Experimental design
C RANDOMIZED BLOCKS
WRONG
LATIN SQUARE
Pseudoreplications
28Completely randomized blocks
- I test by two-way analysis of variance without
replication (error variability is the deviation from
additivity, i.e. the interaction between block and
treatment)
- It can give a more powerful test if the blocks explain
something, i.e. help to control variability.
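As a rough illustration of both points, the sketch below analyses a made-up randomized block layout (one observation per block and treatment). The residual is computed as the deviation from the additive block + treatment fit, and the resulting F is compared with the F obtained when blocks are ignored. Function and key names are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def block_anova(table):
    """table[i][j]: the single observation in block i under treatment j."""
    b, a = len(table), len(table[0])
    grand = mean([x for row in table for x in row])
    block_means = [mean(row) for row in table]
    treat_means = [mean(col) for col in zip(*table)]
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = a * sum((m - grand) ** 2 for m in block_means)
    # Error = deviations from additivity (block x treatment interaction).
    ss_resid = sum(
        (table[i][j] - block_means[i] - treat_means[j] + grand) ** 2
        for i in range(b) for j in range(a)
    )
    ms_treat = ss_treat / (a - 1)
    f_blocks = ms_treat / (ss_resid / ((a - 1) * (b - 1)))
    # Ignoring blocks: their variability ends up in the error term.
    f_oneway = ms_treat / ((ss_block + ss_resid) / (a * b - a))
    return {"ss_treat": ss_treat, "ss_block": ss_block, "ss_resid": ss_resid,
            "F_blocks": f_blocks, "F_oneway": f_oneway}


# Hypothetical data: 4 blocks (rows) x 3 treatments (columns).
plots = [
    [10, 12, 14],
    [20, 22, 25],
    [30, 31, 34],
    [40, 43, 44],
]
result = block_anova(plots)
print(result)
```

Because the blocks differ strongly in these invented data, the F ratio for treatments is far larger with blocks in the model than without them.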
29Multiple comparisons
Similar to one-way analysis of variance. If I do
it on the interaction, I compare all
factorial combinations with each other; if I do
it on a main effect, I compare the additive effects of
single levels. I am the one who decides what will be
compared.
30Friedman test - nonparametric ANOVA for
completely randomized blocks
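Based on ranking the values within each block
$\chi^2_r = \frac{12}{b\,a\,(a+1)} \sum_{i=1}^{a} R_i^2 - 3\,b\,(a+1)$
where a is the number of levels of the factor studied, b
is the number of blocks, and $R_i$ is the sum of ranks for
level i of the factor studied.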
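A sketch of the Friedman statistic in its usual textbook form, 12 / (b a (a + 1)) times the sum of squared rank sums, minus 3 b (a + 1), with a, b and R_i as defined above. Values are ranked within each block, tied values get average ranks, and no tie correction is applied to the statistic. The data are invented.

```python
def ranks_within_block(values):
    """Ranks 1..a within one block; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    return ranks


def friedman_statistic(table):
    """table[block][level]; returns the statistic without tie correction."""
    b, a = len(table), len(table[0])
    ranked = [ranks_within_block(row) for row in table]
    rank_sums = [sum(col) for col in zip(*ranked)]  # R_i for each level
    return 12 / (b * a * (a + 1)) * sum(r ** 2 for r in rank_sums) - 3 * b * (a + 1)


# Hypothetical data: 4 blocks (rows) x 3 levels of the studied factor.
blocks = [
    [10, 12, 14],
    [20, 25, 22],
    [30, 31, 34],
    [43, 40, 44],
]
print(friedman_statistic(blocks))
```

For these data the rank sums are 5, 8 and 11, which gives a statistic of 4.5. For larger designs it is usually compared with a chi-square distribution with a - 1 degrees of freedom.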
31Two-factorial experiment: I compare daisy and
sunflower and their response to the level of
nutrients (response is height of plant)
Three null hypotheses:
1. The heights of daisies and sunflowers do not differ (as sometimes
happens, we are testing a totally unrealistic null
hypothesis; we obviously did not need to test this one)
2. Height of plants is independent of the
level of nutrients
3. The effect of the level of
nutrients is the same for both species
32We have a problem
- Data are positively skewed (the least important
problem)
- There is marked heterogeneity of variances
(CV could be constant, i.e. SD depends linearly
on the mean)
- The classic interaction tests additivity: thus, if
fertilization elongates daisies from 10 to 20 cm,
sunflowers should be elongated from 100 to 110
cm. From a biological point of view, this is by no
means the same effect on both species.
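33Additive effect
$X_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$
Multiplicative effect
$X_{ijk} = \mu \cdot \alpha_i \cdot \beta_j \cdot \varepsilon_{ijk}$
Every value is multiplied by the error, thus SD is
linearly dependent on the mean. $\varepsilon_{ijk}$ has a lognormal
distribution centered around 1.
After log-transformation
$\log X_{ijk} = \log\mu + \log\alpha_i + \log\beta_j + \log\varepsilon_{ijk}$
the multiplicative effect is changed to an additive one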
34Logarithmic transformation
- Changes a lognormal distribution to a normal one
- If SD was linearly dependent on the mean, it leads to
homogeneity of variances
- Changes multiplicative effects to additive ones
- ATTENTION: it does all of this simultaneously;
I cannot ask for just one of these
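The change from multiplicative to additive effects can be seen in a tiny numerical sketch. The heights are invented and assume that fertilization doubles the height of both species (daisy 10 to 20 cm, sunflower 100 to 200 cm).

```python
import math

# Hypothetical mean heights in cm; fertilization doubles both species.
heights = {
    ("daisy", "control"): 10.0, ("daisy", "fertilized"): 20.0,
    ("sunflower", "control"): 100.0, ("sunflower", "fertilized"): 200.0,
}


def fertilization_effect(species, transform=lambda x: x):
    """Difference fertilized - control on the chosen scale."""
    return (transform(heights[(species, "fertilized")])
            - transform(heights[(species, "control")]))


# Raw scale: the effects differ, so an additive model sees an interaction.
raw_daisy = fertilization_effect("daisy")
raw_sunflower = fertilization_effect("sunflower")

# Log scale: both effects equal log(2), i.e. the effect is additive.
log_daisy = fertilization_effect("daisy", math.log)
log_sunflower = fertilization_effect("sunflower", math.log)

print(raw_daisy, raw_sunflower)
print(log_daisy, log_sunflower)
```

On the raw scale the differences are 10 and 100 cm; after the log transformation both are log(2), so the interaction disappears.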
35Many biological data contain zeroes
- The often-used transformation $X' = \log(X + 1)$ has
similar properties, but not exactly the same,
especially if there are low X values.
The change from multiplicativity to additivity
can be particularly inaccurate!!!
- Sometimes $X' = \log(bX + a)$ is used, where a and b
are constants (but the change from multiplicativity
to additivity is never achieved)
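How far log(X + 1) is from making a multiplicative effect additive can be checked numerically. In this sketch the treatment always doubles the value (an invented example); on a true log scale the effect would be log(2) everywhere, but with log(X + 1) it depends on how small X is.

```python
import math


def log1_effect(control, treated):
    """Treatment effect after the log(X + 1) transformation."""
    return math.log(treated + 1) - math.log(control + 1)


# The same doubling applied at low, medium and high values (hypothetical).
small = log1_effect(1, 2)
medium = log1_effect(10, 20)
large = log1_effect(1000, 2000)

print(small, medium, large, math.log(2))
```

The effect grows from about 0.41 at low values towards log(2), about 0.69, at high values, so the doubling does not become a constant additive shift when X is small.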
36Other transformations used
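- For Poisson distribution (numbers of randomly
placed individuals): typically the square-root transformation, e.g. $X' = \sqrt{X}$
- For percentages (p as a number between 0 and 1): typically the arcsine (angular) transformation, $X' = \arcsin\sqrt{p}$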
37Nested design
We measure the length of corolla tubes
factor A (locality)
factor C (plant)
single observations
Plant 1 from the first locality has nothing in
common with plant 1 from any other locality.
38The top factor in the hierarchy can have either a
fixed or a random effect
- Factors lower in the hierarchy almost
always have a random effect (it is possible to
compute it with a fixed one too, but that is a very
unusual case)
- In the analysis of sums of squares, we sum the squared
differences between each observation (or mean) and its
nearest relevant mean one level up in the hierarchy.
- If the hierarchically lower effects are random, then
we test every effect against the nearest
hierarchically lower effect
39Test of the null hypothesis that mean tube length is
zero
Null hypothesis on lower hierarchical levels:
plants do not differ in the mean length of their
tubes within a locality
$F_{locality} = MS_{locality} / MS_{plant}$
$F_{plant} = MS_{plant} / MS_{error} = 2.15 / 2.24 = 0.96$
Ideal when the model is balanced. Statistica
computes it even if it isn't, but then the results are various
approximations.
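A minimal sketch of the nested computation for a balanced design: each level is compared with its nearest higher mean, locality is tested against plants within localities, and plants against the error. The tube lengths are invented and the function name is an assumption; this does not reproduce the table from the slide.

```python
def mean(values):
    return sum(values) / len(values)


def nested_anova(data):
    """data: {locality: {plant: [measurements]}}; balanced design assumed."""
    all_obs = [x for plants in data.values() for obs in plants.values() for x in obs]
    grand = mean(all_obs)
    ss_loc = ss_plant = ss_error = 0.0
    for plants in data.values():
        loc_obs = [x for obs in plants.values() for x in obs]
        loc_mean = mean(loc_obs)
        # Locality mean vs. grand mean.
        ss_loc += len(loc_obs) * (loc_mean - grand) ** 2
        for obs in plants.values():
            plant_mean = mean(obs)
            # Plant mean vs. the mean of its own locality.
            ss_plant += len(obs) * (plant_mean - loc_mean) ** 2
            # Single observation vs. the mean of its own plant.
            ss_error += sum((x - plant_mean) ** 2 for x in obs)
    n_loc = len(data)
    n_plants = sum(len(plants) for plants in data.values())
    ms_loc = ss_loc / (n_loc - 1)
    ms_plant = ss_plant / (n_plants - n_loc)
    ms_error = ss_error / (len(all_obs) - n_plants)
    return {"MS_locality": ms_loc, "MS_plant": ms_plant, "MS_error": ms_error,
            "F_locality": ms_loc / ms_plant, "F_plant": ms_plant / ms_error}


# Hypothetical corolla tube lengths: 2 localities x 2 plants x 2 flowers.
tubes = {
    "locality 1": {"plant 1": [10, 12], "plant 2": [11, 13]},
    "locality 2": {"plant 1": [15, 17], "plant 2": [18, 20]},
}
nested = nested_anova(tubes)
print(nested)
```

Note that "plant 1" appears under both localities only as a label; the two plants are unrelated, which is exactly what distinguishes this from a factorial design.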
40Most frequent use
- Analysis of variability among individual hierarchical
levels, e.g. in taxonomy
- Often I am interested mainly (only) in the
hierarchically higher factor; everything else is
just for increasing the power of the test.
- E.g. I can have just 6 pounds, three pastured and
three non-pastured (I am not able to have more).
In each of them I lay out 10 squares for biomass
sampling, and I do three analytical determinations
from every square. Analysis of variability can
help me to plan the optimal sampling design.
41Mind mixed samples
- I can save work, but they must be
independently replicated!
These aren't independent observations
42More complicated models of ANOVA
- Factorial and nested designs can be combined in
different ways, with some factors having a
fixed effect and some a random one
43Split plot (main plots and split plots - two
error levels)
6 plots (3 calcite, 3 granite), 3 types of
impacts in each plot
44ANOVA - Repeated measures
- I have some experimental design and I follow the
state of individual objects in time, e.g. growing
plants, etc.
45Replicated BACI - repeated measures