A Bestiary of Experimental and Sampling Designs

Slides: 68

Provided by: Biol58

1
A Bestiary of Experimental and Sampling Designs
2
REMINDERS

The goal of experimental design is to minimize the potential sources of confusion (Hurlbert 1984):
Temporal (and spatial) variability
Procedural effects
Experimenter bias
Experimenter-generated variability (random error)
Inherent variability among experimental units
Non-demonic intrusion
"...it is the elementary principles of experimental design, not advanced or esoteric ones, which are most frequently and severely violated by ecologists..."

3
The design of an experiment

The details of:
Replication
Randomization
Independence
Are these always obvious in biological research? Are they system-dependent?

4
We cannot draw blood from a stone

Even the most sophisticated analysis CANNOT
rescue a poor design!!

5
Categorical variables

They are classified into two or more unique categories:
Sex (male, female)
Trophic status (producer, herbivore, carnivore)
Habitat type (shade, sun)
Species

6
Continuous variables

They are measured on a continuous numerical scale
(real or integer values)
Size
Species richness
Habitat coverage
Population density
NOTE: Discrete random variables such as counts are still considered continuous variables because they represent a numerical scale and not a category.

7
Dependent and independent variables

The assignment of dependent and independent
variables implies a hypothesis of cause and
effect that you are trying to test.
The dependent variable is the response variable
The independent variable is the predictor
variable

8
Ordinate (vertical y-axis)
Abscissa (horizontal x-axis)
By convention, independent variables are plotted on the x-axis and dependent variables on the y-axis. In this example we are implying that lambda (population growth) depends on, or is directly affected by, time since fire.
9
Four classes of experimental design
                                Independent (predictor) variable
Dependent (response) variable   Continuous            Categorical
Continuous                      Regression            ANOVA
Categorical                     Logistic regression   Tabular
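
As a quick reference, the four classes above can be written as a lookup keyed by the type of the dependent and independent variable. This is only an illustrative sketch in Python; the names DESIGN_CLASSES and design_class are hypothetical and do not come from the slides.

```python
# Lookup of the four design classes, keyed by
# (type of dependent variable, type of independent variable).
# Names here are hypothetical, chosen for illustration only.
DESIGN_CLASSES = {
    ("continuous", "continuous"): "Regression",
    ("continuous", "categorical"): "ANOVA",
    ("categorical", "continuous"): "Logistic regression",
    ("categorical", "categorical"): "Tabular",
}


def design_class(dependent, independent):
    """Return the design class for a pair of variable types."""
    return DESIGN_CLASSES[(dependent.lower(), independent.lower())]


example = design_class("Continuous", "Categorical")
print(example)
```

A continuous response with a categorical predictor maps to ANOVA, matching the top-right cell of the table.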
10
The Analysis of Covariance (ANCOVA)

It is used when there are two independent
variables, one of which is categorical and one of
which is continuous (the covariate)
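
A minimal sketch of the idea, assuming a common slope for the covariate in every group (a standard ANCOVA assumption) and made-up data: estimate the pooled within-group slope, then compare treatment means after adjusting each one to the grand mean of the covariate. The function name, the treatment labels, and all numbers are hypothetical.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """groups maps a treatment name to a list of (covariate, response) pairs.

    Returns (pooled within-group slope, dict of covariate-adjusted means).
    Assumes the slope of response on covariate is the same in every group.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = 0.0  # pooled within-group sum of cross-products
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    group_means = {}
    for name, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        group_means[name] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    adjusted = {
        name: my - slope * (mx - grand_x)
        for name, (mx, my) in group_means.items()
    }
    return slope, adjusted


# Hypothetical data: (covariate, response) pairs for two treatments.
data = {
    "Watered": [(1, 3), (2, 5), (3, 7)],
    "Not watered": [(2, 2), (3, 4), (4, 6)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

In this made-up example the raw means differ by 1, but after adjusting for the covariate the difference is 3, which is the kind of shift the covariate is included to reveal.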

11
Four classes of experimental design
                     Independent variable
Dependent variable   Continuous            Categorical
Continuous           Regression            ANOVA
Categorical          Logistic regression   Tabular
12
Regression designs

Single-factor regression
Multiple regression

13
Single-factor regression

Collect data on a set of independent replicates.
For each replicate, measure both the predictor and the response variables.
e.g., Hypothesis: seed density (the predictor variable) is responsible for rodent density (the response variable).

14
Plot   Seeds   Rodents/m²
1      50      3.2
2      12      11.7
.      .       .
n      300     5.3
(Rows are plots; columns are variables.)
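
A minimal sketch of a single-factor regression fitted by ordinary least squares, assuming one (predictor, response) pair per plot. The data below are invented for illustration and are not the values from the slide; fit_line is a hypothetical helper name.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: one value per plot (replicate).
seeds = [10, 20, 30, 40, 50]           # predictor
rodents = [1.2, 1.9, 3.2, 3.8, 5.1]    # response (rodents per square metre)
slope, intercept = fit_line(seeds, rodents)
print(slope, intercept)
```

Each plot contributes exactly one point, so the replicates are the plots, not repeated measurements within a plot.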
15
Single-factor regression

You assume that the predictor variable is a causal variable: changes in the value of the predictor would cause a change in the value of the response.
This is very different from a study in which you would examine the correlation (statistical covariation) between two variables.

16
In regression (Model I)

You are assuming that the value of the
independent variable is known exactly and is not
subject to measurement error

17
Assumptions and caveats

Adequate replication.
Independence of the data.
Ensure that the range of values sampled for the
predictor variable is large enough to capture the
full range of responses by the response variable.
Ensure that the distribution of predictor values
is approximately uniform within the sample range.

18
[Figure: two designs, panels A and B]
What is different between these two designs?
Would the conclusions be different?
19
[Figure: two designs, panels A and B]
What is different between these two designs?
Would the conclusions be different?
20
Multiple regression

Two or more continuous predictor variables are
measured for each replicate, along with the
single response variable

21
Assumptions and caveats

Adequate replication.
Independence of the data.
Ensure that the range of values sampled for the
predictor variables is large enough to capture
the full range of responses by the response
variable.
Ensure that the distribution of predictor values
is approximately uniform within the sample range.

These are the same assumptions as for the single-factor regression, BUT additionally...
22
Multiple regression

Ideally, the different predictor variables should be independent of one another; however, in reality, many predictor variables are correlated (e.g., height and weight).
This collinearity makes it difficult to estimate regression parameters accurately and to tease apart how much variation in the response variable is associated with each of the predictor variables.
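
One way to screen for collinearity before fitting is to look at the correlation between predictors. The sketch below, on invented height and weight values, computes Pearson's r and the variance inflation factor for the two-predictor case, where VIF = 1 / (1 - r^2). The helper names are hypothetical, and no particular VIF threshold is claimed by the slides.

```python
from math import sqrt
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values measured on the same five replicates.
height = [150, 160, 170, 180, 190]
weight = [52, 60, 69, 77, 90]
r = pearson_r(height, weight)
vif = vif_two_predictors(height, weight)
print(r, vif)
```

A VIF far above 1 means the slope estimates for the two predictors have strongly inflated variances, which is the estimation problem described above.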

23
Multiple regression

As always, replication becomes increasingly important as we add more predictor variables to the analysis.
In many cases it is easier to collect additional predictor variables on the same replicates than to obtain additional independent replicates.
Avoid the temptation to measure everything that you can just because it is possible.
Think about measuring variables that are meaningful for your study system!

24
Multiple regression

It is a mistake to think that a model selection
algorithm can reliably identify the correct set
of predictor variables...

25
Four classes of experimental design
                     Independent variable
Dependent variable   Continuous            Categorical
Continuous           Regression            ANOVA
Categorical          Logistic regression   Tabular
26
ANOVA designs

Analysis of Variance
Treatments: the different categories of the predictor variables.
Replicates: each of the observations made.

27
ANOVA designs

Single-factor designs
Randomized block designs
Nested designs
Multifactor designs
Split-plot designs
Repeated measurements designs
BACI designs (before-after-control-impact)

28
Single-factor designs

It is one of the simplest, but most powerful,
experimental designs.
Can readily accommodate studies in which the
number of replicates per treatment is not
identical (unequal sample size).

29
Single-factor designs

In a single-factor design, each of the treatments represents variation in a single predictor variable or factor.
Each value of the factor that represents a particular treatment is called a treatment level.

30
Id   Treatment     Replicate   Number of flowers
1    Watered       1           9
2    Not watered   1           4
.    .             .           .
11   Watered       6           10
12   Not watered   6           2
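
A minimal sketch of the F-ratio for a single-factor design, written so that unequal sample sizes per treatment are handled directly. The flower counts below are invented (the slide elides most rows), and one_way_anova_f is a hypothetical helper name.

```python
from statistics import mean


def one_way_anova_f(groups):
    """groups maps a treatment level to its list of observations.

    Returns (F, df_between, df_within). Sample sizes may differ.
    """
    all_obs = [v for obs in groups.values() for v in obs]
    grand = mean(all_obs)
    ss_between = sum(
        len(obs) * (mean(obs) - grand) ** 2 for obs in groups.values()
    )
    ss_within = sum(
        (v - mean(obs)) ** 2 for obs in groups.values() for v in obs
    )
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical flower counts with unequal replication.
flowers = {
    "Watered": [9, 10, 8, 11],
    "Not watered": [4, 2, 3],
}
f_ratio, df_between, df_within = one_way_anova_f(flowers)
print(f_ratio, df_between, df_within)
```

The F-ratio compares the variation among treatment means with the variation among replicates within treatments; a p-value would come from the F distribution with the two degrees of freedom returned.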
31
Good news, bad news

This design does not explicitly accommodate environmental heterogeneity, so we need to sample the entire array of background conditions.
This means the results can potentially be generalized across all environments, BUT if the background noise is much stronger than the signal of the treatments, the experiment may have low power, and the analysis may not reveal treatment differences unless there are many replicates.

32
Randomized block designs

An effective way to incorporate environmental
heterogeneity into a design.
A block is a delineated area or time period
within which environmental conditions are
relatively homogeneous.
Blocks can be placed randomly or systematically
in the study area, but should be arranged so that
the environmental conditions are more similar
within blocks than between them.

33
Randomized block designs
[Figure: examples of valid blocking and invalid blocking]
34
Randomized block designs

Once blocks are established, replicates will
still be assigned randomly to treatments, but a
single replicate from each of the treatments is
assigned to each block.

35
Id   Treatment     Block   Number of flowers
1    Watered       1       9
2    Not watered   1       4
.    .             .       .
11   Watered       6       10
12   Not watered   6       2
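
The assignment rule for a randomized block design (one replicate of each treatment per block, placed at random within the block) can be sketched as below. The function name and the seed argument are assumptions for illustration; the number of blocks and the treatment labels follow the table.

```python
import random


def assign_within_blocks(n_blocks, treatments, seed=None):
    """Randomly order the treatments within each block.

    Returns a dict mapping block number to a list of treatments, where
    position i in the list is the treatment given to plot i of that block.
    Every block receives each treatment exactly once.
    """
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # randomization happens separately in each block
        layout[block] = order
    return layout


layout = assign_within_blocks(6, ["Watered", "Not watered"], seed=1)
for block, order in layout.items():
    print(block, order)
```

Randomizing separately inside every block keeps the treatments balanced across blocks while still leaving the position within a block to chance.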
36
Caveats

Blocks should have enough room to accommodate a
single replicate of each of the treatments, and
enough spacing between replicates to ensure their
independence.
The blocks themselves also have to be far enough
apart from each other to ensure independence of
replicates among blocks.

37
Advantages

It can be used to control for environmental
gradients and patchy habitats.
It is useful when your replication is constrained
by space or time.
It can be adapted for a matched-pair layout.

38
Disadvantages

If the sample size is small and the block effect
weak, the randomized block design is less
powerful than the simple one-way layout.
If blocks are too small, you may introduce
non-independence by physically crowding the
treatments together (e.g., nectar-removal and
control plots on p. 152 of Gotelli & Ellison).
If any of the replicates are lost, the data from
the block cannot be used unless the missing
values can be estimated indirectly.

39
Disadvantages

It assumes that there is no interaction between
the blocks and the treatments.
BUT, replication within blocks will indeed tease
apart main effects, block effects, and the
interaction between blocks and treatments. It
will also address the problem of missing data
from within a block.

40
Nested designs

It is any design in which there is subsampling within each of the replicates.
In this design the subsamples are not independent of one another (if we analyze them assuming independence, it is an example of pseudoreplication).
The rationale of this design is to increase the precision with which we estimate the response of each replicate.

41
Id   Treatment     Subsample   Replicate   Number of flowers
1    Watered       1           1           9
2    Watered       2           1           4
3    Watered       3           1           7
.    .             .           .           .
19   Not watered   1           7           16
20   Not watered   2           7           10
21   Not watered   3           7           2
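
One simple way to avoid treating subsamples as independent is to average them within each replicate and carry only the replicate means forward. This is a sketch of that step, not the full nested ANOVA; it uses only the six rows visible in the table, and replicate_means is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(records):
    """records is an iterable of (treatment, replicate, value) tuples.

    Returns a dict mapping each treatment to a list with one mean per
    replicate, so that each replicate contributes a single number.
    """
    by_replicate = defaultdict(list)
    for treatment, replicate, value in records:
        by_replicate[(treatment, replicate)].append(value)
    means = defaultdict(list)
    for (treatment, _replicate), values in sorted(by_replicate.items()):
        means[treatment].append(mean(values))
    return dict(means)


# The rows shown in the table (the elided rows are not reconstructed).
records = [
    ("Watered", 1, 9), ("Watered", 1, 4), ("Watered", 1, 7),
    ("Not watered", 7, 16), ("Not watered", 7, 10), ("Not watered", 7, 2),
]
rep_means = replicate_means(records)
print(rep_means)
```

After this step the sample size per treatment is the number of replicates, not the number of subsamples.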
42
Advantages

Subsampling increases the precision of the
estimate for each replicate in the design.
It allows us to test two hypotheses:
First: Is there variation among treatments?
Second: Is there variation among replicates within treatments?
It can be extended to a hierarchical sampling design.

43
Disadvantages

They are often analyzed incorrectly!
It is difficult or even impossible to analyze
properly if the sample sizes are not equal.
It often represents a case of misplaced sampling
effort.
Subsampling is not a solution to inadequate
replication

44
Randomized block designs

Strictly speaking, the randomized block and the
nested ANOVA are two-factor designs, but the
second factor (i.e., the blocks or subsamples) is
included only to control for sampling variation
and is not of primary interest.

45
Multifactor designs

In a multifactor design, the treatments cover two
(or more) different factors, and each factor is
applied in combination in different treatments.
In a multifactor design, there are different
levels of the treatment for each factor.

46
Multifactor designs

Why not just run two separate experiments?
Efficiency. It is often more cost effective to
run a single experiment than to run two separate
experiments.
A multifactor design allows you to test for both
main effects and for interaction effects.

47
Multifactor designs

The main effects are the additive effects of each level of one treatment averaged over all levels of the other treatment.
The interaction effects represent unique responses to particular treatment combinations that cannot be predicted simply from knowing the main effects.
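
These definitions can be made concrete with a table of cell means. The sketch below assumes a balanced two-factor design and invented cell means: each main effect is a level's average over the other factor minus the grand mean, and the interaction is whatever remains in a cell after the grand mean and both main effects are removed. The function name and all numbers are hypothetical.

```python
from statistics import mean


def two_factor_effects(cell_means):
    """cell_means maps (level of factor A, level of factor B) to a cell mean.

    Returns (grand mean, main effects of A, main effects of B, interactions).
    Assumes every combination of levels is present (fully crossed).
    """
    a_levels = sorted({a for a, _ in cell_means})
    b_levels = sorted({b for _, b in cell_means})
    grand = mean(cell_means.values())
    main_a = {
        a: mean(cell_means[(a, b)] for b in b_levels) - grand
        for a in a_levels
    }
    main_b = {
        b: mean(cell_means[(a, b)] for a in a_levels) - grand
        for b in b_levels
    }
    interaction = {
        (a, b): cell_means[(a, b)] - grand - main_a[a] - main_b[b]
        for a in a_levels
        for b in b_levels
    }
    return grand, main_a, main_b, interaction


# Hypothetical cell means for substrate x predator treatment.
cells = {
    ("Granite", "Unmanipulated"): 10,
    ("Granite", "Predator exclusion"): 20,
    ("Slate", "Unmanipulated"): 30,
    ("Slate", "Predator exclusion"): 60,
}
grand, main_a, main_b, interaction = two_factor_effects(cells)
print(grand, main_a, main_b, interaction)
```

If every interaction term were zero, each cell mean would equal the grand mean plus the two main effects, which is the purely additive case.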

48
Interactions
[Figure: graphs of a response (vertical axis 0 to 60) by quarter (1st Qtr to 4th Qtr) for two directions, West and North]
Which of these graphs show interactions between direction (west or north) and quarter (1st to 4th)?
49
Orthogonal

The key element of a proper multifactorial design is that the treatments are fully crossed, or orthogonal: every treatment level of the first factor must be represented with every treatment level of the second factor, and so on.
If some of the treatment combinations are missing, we end up with a confounded design.

50
Two-factor design
                     Substrate treatment
Predator treatment   Granite   Slate   Cement
Unmanipulated
Cage control
Predator exclusion
Predator intrusion
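
A small sketch of how the fully crossed requirement can be checked for this substrate by predator layout: enumerate every combination of levels and report any that the planned design lacks. The factor levels come from the table; the function name and the example of a missing combination are hypothetical.

```python
from itertools import product


def missing_combinations(factor_levels, observed):
    """Return the treatment combinations absent from `observed`.

    factor_levels is a list of lists, one list of levels per factor.
    An empty result means the design is fully crossed (orthogonal).
    """
    required = set(product(*factor_levels))
    return sorted(required - set(observed))


substrates = ["Granite", "Slate", "Cement"]
predators = [
    "Unmanipulated",
    "Cage control",
    "Predator exclusion",
    "Predator intrusion",
]
all_combos = list(product(substrates, predators))  # 3 x 4 = 12 combinations

# Hypothetical plan in which one combination was left out.
planned = [c for c in all_combos if c != ("Cement", "Predator intrusion")]
missing = missing_combinations([substrates, predators], planned)
print(len(all_combos), missing)
```

With three substrates and four predator treatments there are twelve combinations, each of which still needs its own replicates.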
51
Advantages

The key advantage is the ability to tease apart
main effects and interactions between factors.
The interaction measures the extent to which
different treatment combinations act additively,
synergistically, or antagonistically.

52
Disadvantages

The number of treatment combinations can quickly
become too large for adequate replication!
It does not account for spatial heterogeneity.
This can be handled by a simple randomized block design, in which each block contains exactly one replicate of each treatment combination.
It may not be possible to establish all
orthogonal treatment combinations.

53
Split-plot designs

It is an extension of the randomized block design to two treatments.
What distinguishes a split-plot design from a randomized block design is that a second treatment factor is also applied, this time at the level of the entire plot.

54
Split plot design
                            Substrate treatment (the subplot factor)
Predator treatment          Granite   Slate   Cement
(the whole-plot factor)
Unmanipulated
Control
Predator exclusion
Predator intrusion
55
Advantages

The chief advantage is the efficient use of
blocks for the application of two treatments.
This is a simple layout that controls for
environmental heterogeneity.

56
Disadvantages

As with nested designs, a very common mistake is for investigators to analyze a split-plot design as a two-factor ANOVA.

57
Repeated measurements designs

It is used whenever multiple observations on the
same replicate are collected at different times
(it can be thought of as a split-plot in which a
single replicate serves as a block, and the
subplot factor is time).

58
Repeated measurements designs

The between-subjects factor corresponds to the
whole-plot factor.
The within-subjects factor corresponds to the
different times.
The multiple observations on a single individual are not independent of one another. Why do you think this is?

59
Advantages

Efficiency.
It allows each replicate to serve as its own
block or control.
It allows us to test for interactions between
treatments and time.

60
Circularity

Both the randomized block and the repeated measures designs make a special assumption of circularity for the within-subjects factor.
It means that the variance of the difference between any two treatment levels in the subplots is always the same, i.e., the variance of the difference between t1 and t2 is the same as between t2 and t3, etc.

61
For a repeated measures design, it means that the variance of the difference between observations at any pair of times is the same.
This assumption is unlikely to be met in biological systems because of their temporal memory!
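
An informal way to look at the circularity assumption is to compute, for every pair of sampling times, the variance of the within-individual differences and see whether the values are similar. This is a descriptive sketch on invented data, not a formal test; difference_variances is a hypothetical helper name.

```python
from itertools import combinations
from statistics import variance


def difference_variances(data):
    """data maps a time label to a list of observations.

    Each list must hold the same individuals in the same order.
    Returns the sample variance of the differences for every pair of times.
    """
    result = {}
    for t1, t2 in combinations(data, 2):
        diffs = [later - earlier for earlier, later in zip(data[t1], data[t2])]
        result[(t1, t2)] = variance(diffs)
    return result


# Hypothetical measurements on four individuals at three times.
measurements = {
    "t1": [1.0, 2.0, 3.0, 4.0],
    "t2": [2.0, 4.0, 3.0, 6.0],
    "t3": [3.0, 3.0, 6.0, 7.0],
}
diff_var = difference_variances(measurements)
for pair, value in diff_var.items():
    print(pair, round(value, 3))
```

In this made-up example the t2 to t3 differences are far more variable than the other two pairs, which is the kind of pattern that would cast doubt on circularity.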
62
Disadvantages

In many cases the assumption of circularity is
unlikely to be met for repeated measures.
The best way to meet the circularity assumption
is to use evenly spaced sampling times along with
knowledge of the natural history of your
organisms to select the appropriate sampling
interval.

63
Alternatives

Set up enough replicates so that a different set is sampled at each time period. With this design, time can be treated as a simple factor in a two-factor analysis of variance.
Use the repeated measures layout, but collapse the correlated repeated measures into a single response variable for each individual, and then use a simple one-factor analysis of variance; e.g., instead of height at age 0 and height at age 1, use growth.
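
The second alternative can be sketched as a simple data transformation: each individual's two heights are replaced by one growth value, and the growth values are grouped by treatment, ready for a one-factor analysis. The records, treatment labels, and function name below are hypothetical.

```python
def growth_by_treatment(records):
    """records is an iterable of (treatment, height_age0, height_age1).

    Returns a dict mapping each treatment to a list of growth values,
    one per individual, so each individual contributes a single number.
    """
    growth = {}
    for treatment, height_age0, height_age1 in records:
        growth.setdefault(treatment, []).append(height_age1 - height_age0)
    return growth


# Hypothetical heights for four individuals measured at two ages.
records = [
    ("Watered", 10.0, 14.5),
    ("Watered", 9.0, 12.0),
    ("Not watered", 10.5, 11.5),
    ("Not watered", 9.5, 11.0),
]
growth = growth_by_treatment(records)
print(growth)
```

Because the two correlated measurements are collapsed into one response per individual, the circularity assumption no longer arises.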

64
Think outside the ANOVA Box

Many ecological experiments test a continuous predictor at only a few values so that they can be shoehorned into an ANOVA design.
One alternative: an experimental regression design!

65
Four classes of experimental design
                     Independent variable
Dependent variable   Continuous            Categorical
Continuous           Regression            ANOVA
Categorical          Logistic regression   Tabular
66
Tabular designs