Experimental Design - The basics (PowerPoint presentation)

About This Presentation
Title: Experimental Design - The basics

Slides: 48
Provided by: richardp80

Transcript and Presenter's Notes


1
Experimental Design - The basics
  • Richard Preziosi

2
How to formulate hypotheses
  • Where do you start?
  • What is a hypothesis?
  • Stating a hypothesis
  • Generating predictions
  • Statistical hypotheses (different!)
  • Only after completing this process will you be
    able to decide what data to collect

3
Hypotheses: Where do you start?
  • Start by stating your research question
  • E.g. why are male and female humans different
    sizes?
  • Your question may easily produce more than one
    hypothesis; that's fine.

4
Hypothesis
  • A hypothesis is a clear statement articulating a
    plausible candidate explanation for observations
  • It should be constructed in such a way as to
    allow gathering of data that can be used to
    either refute or support the candidate explanation

5
Stating a Hypothesis
  • Phrase your hypothesis as a possible answer to
    your research question.
  • E.g. male and female sizes differ because males
    grow faster than females

6
Generating predictions
  • These are the testable statements that follow
    logically from your hypothesis
  • E.g. males have a faster growth rate than
    females

7
Statistical hypotheses
  • Predictions should lead you to testable
    statistical hypotheses
  • Note that the hypothesis formally tested in
    statistics is the one in which nothing differs
    (the null hypothesis)
  • A clearly stated null hypothesis will generally
    lead you to the correct statistical test
  • E.g. There is no difference in the growth rate
    of males and females

8
Question → Hypothesis → Predictions → Statistical (null) hypothesis

9
Pitfalls of generating predictions
  • Weak tests
  • Indirect measures
  • Non-useful outcomes
  • Your tests must satisfy the devil's advocate
    (e.g. reviewers or examiners)

10
Weak test
  • Consider the hypothesis "Students enjoy the
    course in radiation training more than the
    workshop in experimental design"
  • Prediction: "Students will get better grades in
    radiation training than in experimental design"
  • This is a weak test (prediction) because other
    explanations are equally likely AND because we
    have used an indirect measure (grades as a
    measure of enjoyment)

11
Non-useful outcomes
  • These are hypotheses that may well prove
    interesting if true but are uninformative if false

12
Satisfying skeptics
  • Reviewers will look for logical flaws in your
    experiments. You do not want to finish your paper
    with:
  • "My results indicate that mechanism A determines
    apoptosis rates. Although mechanism B could also
    produce the same response, I believe that
    mechanism A is the important one."
  • This will earn you a review of the form:
  • "This study provides no clear evidence to
    distinguish between mechanisms A and B. The
    authors need to redesign their study and start
    again. Recommendation: reject this manuscript."

13
Pilot Studies and Preliminary Data
  • May be observational or mini-experiments
  • Ensures sensible questions
  • Can you observe the phenomenon?
  • Practice and validate techniques
  • Minimize training (learning) effects on the data
  • Recognize logistical constraints
  • Standardization across observers
  • Allows tuning of design and statistics
  • Assessment of sample sizes (power)
  • Test run of statistical analysis

14
Experimental Manipulation vs. Natural Variation
  • In manipulative studies you change an aspect of
    the system and measure effects on traits of
    interest (the majority of lab studies and
    agricultural studies)
  • In correlational studies you measure associations
    between traits of interest (often assuming one is
    influencing the other) (many environmental and
    most human studies)
  • Consider the hypothesis "Long tail streamers seen
    in many species of birds have evolved to make
    males more attractive to females"

15
Correlational study using Natural Variation
  • In the bird tail length example we could:
  • Measure the tails of males at the beginning of
    the breeding season
  • Observe the number of matings each male has
  • Do statistics to determine if there is a
    relationship between tail length and number of
    matings
  • Results showing a relationship would support our
    hypothesis
  • Results not showing a relationship would go
    against our hypothesis

16
Manipulative study
  • In the bird tail length example we could have 4
    groups of birds
  • Results showing that males with artificially long
    tails had more mates support our hypothesis
  • Results showing that males with reduced tails had
    fewer mates also support our hypothesis
  • A comparison of group 1 males with the
    unmanipulated males acts as a control comparison

17
Arguments for correlational studies
  • Often less work (but larger sample sizes usually
    needed)
  • Deals with real levels of biological variation
    (manipulations may take things outside naturally
    occurring limits)
  • Requires less handling of organisms (important if
    there are constraints like stress to animals or
    endangered species)
  • Manipulative studies may produce unintended
    effects (e.g. altered flight ability in the bird
    example, or epistatic effects in knockouts)
  • Manipulation may not be possible
  • May provide a baseline for manipulative
    experiments

18
Arguments for manipulative studies (really,
against correlational studies)
  • Third variables
  • Reverse causation
  • These can be BIG problems if they occur

19
Third Variables
  • Third variables occur when there is an apparent
    link between A and B but in fact there is no
    direct link or mechanism. Instead both A and B
    depend on C, the third variable.
  • This means that patterns in correlational studies
    are just that: correlations.
  • Remember, correlation does not imply causation

20
Third Variables - an example
  • In the bird tail length example, let's say that we
    do see a correlation between tail length and
    number of mates
  • Suppose that females are actually attracted to
    territories not males, but that males on better
    territories can grow larger tails
  • The third variable here is territory quality and
    it drives both tail length and number of mates
    and produces an apparent relationship.

21
Third Variables - Two famous examples
  • Fisher suggested that the link between smoking
    and cancer was correlational not causative and
    that another factor, perhaps stress, led people
    both to smoke and develop cancer.
  • Fewer women postgrads marry than women in the
    population as a whole. This relationship is
    presumably due to some other correlated factor
    (a third variable)

22
Reverse causation
  • This occurs when a correlation between A and B is
    taken to mean that A causes B, when in fact B
    causes A
  • In some cases this can be ruled out based on
    other data or common sense
  • In the bird example it is unlikely that the
    number of mates for a male has any effect on tail
    length measured at the start of the mating
    season.

23
Reverse causation - a famous example
  • There is a correlation between the number of
    storks nesting in chimneys and the number of
    children in a house (old data from Holland)
  • Although storks bringing babies makes a nice
    story the causation is likely reversed
  • Larger families tend to live in larger houses
    with more chimneys, and hence more opportunities
    for storks to nest.

24
Variation, replication and sampling
  • Variation among individuals
  • Replication and the experimental unit
  • Pseudoreplication

25
Variation among individuals
  • Variation among individuals is a given for most
    biological systems
  • In any experiment we are concerned with variation
    in the Response or Dependent Variable
  • Variation in the response variable can be divided
    into
  • Variation explained by experimental factors
    (independent variables, IVs)
  • Variation not explained by experimental factors
    (a.k.a. error variation, random variation, noise)
  • In most studies we are interested in reducing
    noise and, hopefully, increasing explained
    variation
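The split into explained and error variation can be made concrete with sums of squares, as in a one-way ANOVA. This Python sketch uses invented response values for three hypothetical treatment groups; it shows that total variation equals variation explained by the treatment factor plus error variation, and reports the explained proportion.

```python
import statistics

# Invented response values for three hypothetical treatment groups.
groups = {
    "control": [4.1, 3.8, 4.5, 4.0],
    "low":     [5.0, 5.4, 4.8, 5.1],
    "high":    [6.2, 5.9, 6.5, 6.1],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = statistics.fmean(all_values)

# Total variation: every observation versus the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
# Explained variation: each group mean versus the grand mean, weighted by group size.
ss_explained = sum(len(ys) * (statistics.fmean(ys) - grand_mean) ** 2
                   for ys in groups.values())
# Error (noise) variation: each observation versus its own group mean.
ss_error = sum((y - statistics.fmean(ys)) ** 2
               for ys in groups.values() for y in ys)

print(f"total = {ss_total:.3f}, explained = {ss_explained:.3f}, error = {ss_error:.3f}")
print(f"proportion explained = {ss_explained / ss_total:.2f}")
```

Reducing noise shrinks `ss_error`, which makes the same treatment effect easier to detect.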

26
Variation among individuals
  • Single measurements from each treatment do not
    allow us to distinguish between noise and effect
  • Make sure you have a sufficient number of
    individuals that experience the same manipulation
  • These individuals that receive the same
    manipulation are called replicates
  • What is the experimental unit?

27
Pseudoreplication
  • This occurs when there is confusion between
    treatments, replicates and blocks.
  • Consider an experiment comparing the effect of a
    toxicant on fish behaviour.
  • Let's say the toxicant is prepared in a batch and
    drip fed into the treatment tanks (water is drip
    fed into the control tanks)
  • Are the replicates:
  • Each fish in a tank?
  • Each tank?
  • Each set of tanks on a common drip?
  • Each batch of toxicant?
  • Don't expect a simple answer; the answer is in
    the biology, not in the statistics
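Deciding the experimental unit is a biological judgement; the code only handles the bookkeeping once that decision is made. This Python sketch assumes (hypothetically) that the tank is the unit: 3 tanks per treatment, 10 fish per tank, and a shared tank effect that makes fish in the same tank non-independent. Treating every fish as a replicate gives n = 30 per group; averaging within tanks gives the honest n = 3.

```python
import random
import statistics

rng = random.Random(2)

def simulate_tank(treatment_effect, n_fish=10):
    """Hypothetical tank: one shared tank effect plus individual fish noise."""
    tank_effect = rng.gauss(0, 1.0)  # shared by every fish in this tank
    return [treatment_effect + tank_effect + rng.gauss(0, 0.5) for _ in range(n_fish)]

tanks = {
    "control":  [simulate_tank(0.0) for _ in range(3)],
    "toxicant": [simulate_tank(0.0) for _ in range(3)],  # no true effect simulated
}

# Pseudoreplicated analysis: each fish counted as an independent replicate.
fish_level = {t: [y for tank in ts for y in tank] for t, ts in tanks.items()}
# Tank as the experimental unit: one mean per tank.
tank_level = {t: [statistics.fmean(tank) for tank in ts] for t, ts in tanks.items()}

for label, data in (("fish as replicates", fish_level), ("tanks as replicates", tank_level)):
    print(label, {t: len(values) for t, values in data.items()})
```

If fish were analysed as 30 independent replicates, the shared tank effects would be mistaken for treatment differences far more often than the nominal error rate suggests.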

28
Common sources of Pseudoreplication
  • Shared enclosures
  • Common environments
  • Relatedness
  • Pseudoreplicated stimulus
  • Non-independence of group behaviour
  • Pseudoreplicated measurements over time
  • Species comparisons
  • Sometimes pseudoreplication is unavoidable

29
Random sampling
  • Proper random sampling means that each individual
    has an equal chance of being allocated to each
    treatment group
  • The problem with non-random treatment of samples
    is that any bias in assignment of individuals or
    systematic pattern to errors may bias your
    results
  • True random samples almost always require the use
    of computers or random number tables
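A computer-generated random allocation can be as short as a shuffle. In this Python sketch (the function name `randomly_assign` and the plant labels are hypothetical) the individuals are shuffled and then dealt out to treatments in turn, so every individual has the same chance of landing in each group and the group sizes stay balanced. Recording the seed lets the allocation be reproduced and audited.

```python
import random

def randomly_assign(individuals, treatments, seed=None):
    """Shuffle, then deal individuals out to treatments in turn (balanced groups)."""
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

plants = [f"plant_{i:02d}" for i in range(1, 13)]
allocation = randomly_assign(plants, ["control", "bacteria"], seed=42)
print(allocation)
```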

30
Random assignment and treatment
  • Random means not only random assignment but also
    random treatment
  • Lets say that you are examining the effect of
    rhizosphere bacteria on plant growth.
  • Not only should each plant have an equal
    opportunity of being assigned to the bacterial or
    non-bacterial (control) group; all other aspects
    of the process should be randomized as well.
  • Plants should be planted in equivalent compost
    (possibly in random order)
  • Plants should be randomly allocated to growth
    chambers and perhaps positions in chambers
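Randomizing the rest of the process can use the same tools. This Python sketch assumes a hypothetical set-up of 12 plants, 2 growth chambers and 6 positions per chamber. It randomly allocates treatments, randomizes the planting order, and gives each plant a random chamber and position, so that neither potting time nor location is confounded with treatment.

```python
import random

rng = random.Random(7)  # record the seed so the layout can be reproduced

# Random allocation of 12 hypothetical plants to two treatments (6 each).
labels = ["control", "bacteria"] * 6
rng.shuffle(labels)
treatment = {f"plant_{i:02d}": label for i, label in enumerate(labels, start=1)}

# Random planting order.
planting_order = list(treatment)
rng.shuffle(planting_order)

# Random chamber and position for each plant (2 chambers x 6 positions).
slots = [(chamber, position) for chamber in ("A", "B") for position in range(1, 7)]
rng.shuffle(slots)
layout = dict(zip(planting_order, slots))

for plant in planting_order:
    chamber, position = layout[plant]
    print(plant, treatment[plant], "chamber", chamber, "position", position)
```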

31
Haphazard sampling
  • Haphazard does not mean Random
  • A haphazard sample is based on personal
    assignment by the experimenter in a fashion that
    they believe is random
  • Often severely biased even if the experimenter is
    consciously trying to take a random sample
  • Consider trying to randomly select mice from a
    bucket or randomly pipetting out aliquots of a
    cell culture
  • True random samples usually involve setting up
    experimental units BEFORE assigning treatments
  • BUT this is not always possible; use common sense
    (or blind assignment)

32
Self selection
  • This is a real problem with survey or poll data
  • The subset of a population that responds to
    surveys is rarely a random sample and thus may
    bias your results
  • By all means use surveys to inform your research
    BUT be very suspicious of anything but general
    conclusions

33
Pitfalls of Random Sampling
  • Make sure that the randomization procedure you
    use does what you intend
  • Randomise the order of collecting data, so that
    learning effects are not confounded with treatment
  • Random samples vs. representative samples - don't
    let computers do your thinking for you
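One way to check that a randomization procedure does what you intend is to run it many times and look at the outcome. This Python sketch compares two hypothetical procedures for 20 individuals: an independent coin flip per individual (random, but group sizes often differ) and a shuffled list of equal labels (random and always balanced).

```python
import random

rng = random.Random(3)

def coin_flip_assignment(n):
    """Each individual independently gets a label: random, but sizes can differ."""
    return [rng.choice(["control", "treatment"]) for _ in range(n)]

def balanced_assignment(n):
    """Exactly n // 2 of each label, then shuffled: random AND balanced (n even)."""
    labels = ["control", "treatment"] * (n // 2)
    rng.shuffle(labels)
    return labels

def share_unbalanced(procedure, n=20, runs=1000):
    """Fraction of simulated runs in which the two groups end up different sizes."""
    return sum(procedure(n).count("control") != n // 2 for _ in range(runs)) / runs

print("coin flip:", share_unbalanced(coin_flip_assignment))
print("balanced: ", share_unbalanced(balanced_assignment))
```

Neither procedure is wrong in itself. The point is to know which one you are using before the data arrive.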

34
Sample size - how many replicates
  • Too few replicates can be a disaster - too many
    can be a crime!
  • Always use educated guesswork - i.e. look at
    similar experiments by previous workers and
    determine what worked.
  • Pay attention to differences between the studies
  • Formal power analysis - do it if possible!
  • Requires that you have some guess of variation
    among replicates
  • Requires that you have an idea of how big a
    treatment effect you can expect (or require)
  • Requires that you know what statistical test you
    will use
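As an illustration of the three requirements above, here is a minimal Python sketch for one common case: a two-sided, two-sample comparison of means. It uses the normal approximation n per group = 2 (z₁₋α/₂ + z₁₋β)² (σ/δ)², where σ is the guessed SD among replicates and δ is the treatment effect you need to detect. The function name `n_per_group` is hypothetical. Exact t-test calculations (for example R's `power.t.test`) give a slightly larger n, so treat this as a lower bound.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.8):
    """Approximate replicates per group, two-sided two-sample comparison of means.
    Normal approximation: slightly underestimates the exact t-test answer."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the significance level
    z_beta = z(power)           # quantile corresponding to the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Guessed SD of 2 units; effects of interest of 2 and 1 units.
print(n_per_group(sd=2.0, difference=2.0))
print(n_per_group(sd=2.0, difference=1.0))
```

Halving the detectable effect roughly quadruples the required sample size, which is why a realistic guess at effect size matters so much.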

35
Sample size - Resource Equation Model
  • Can be used for complex studies or when variation
    among individuals is unknown
  • Only appropriate for quantitative data
  • Gives conservative estimates of sample size, so is
    more appropriate for:
  • Large effect size (e.g. lab rather than clinical)
  • Testing for significant effects rather than
    estimating parameters
  • E = N - T - B, where
  • N is the total number of individuals minus 1
  • T is the number of treatments minus 1
  • B is the number of blocks minus 1
  • E is the error degrees of freedom and should be
    between 10 and 20
  • In some cases E should be larger (see Festing et
    al.)
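The resource equation is simple arithmetic. This Python sketch (the function name `error_df` is hypothetical) computes E = N - T - B from raw counts and checks it against the 10 to 20 guideline, using a hypothetical study with 4 treatments.

```python
def error_df(total_individuals, treatments, blocks=1):
    """Resource equation E = N - T - B, where each term is a count minus 1."""
    N = total_individuals - 1
    T = treatments - 1
    B = blocks - 1
    return N - T - B

# Hypothetical designs with 4 treatments.
for total, blocks in ((12, 1), (20, 1), (24, 2)):
    E = error_df(total, treatments=4, blocks=blocks)
    verdict = "within 10-20" if 10 <= E <= 20 else "outside 10-20"
    print(f"{total} individuals, {blocks} block(s): E = {E} ({verdict})")
```

With 3 animals per group (12 in total), E is only 8, which suggests too few animals; 5 per group (20 in total) gives E = 16.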

36
Sample size optimization (Festing et al.)

37
Controls
  • This is the reference against which the results
    of an experimental manipulation can be compared
  • Thus your control group should be identical to
    your treatment group in everything except the
    treatment itself
  • Simple concept, common mistake
  • If the predictions and statistical hypotheses
    have been constructed well then the control group
    will be obvious
  • Lack of a control group makes an experiment
    pointless

38
Types of Controls
  • Negative control - unmanipulated
  • Procedural (sham) control - manipulated but not
    treated (vehicle control, sham procedure control).
    Strictly, a positive control is a treatment known
    to produce the response
  • Concurrent control - run at the same time as the
    treatment group
  • Historic control - based on previous data (be
    certain that individuals are identical except for
    the treatment)

39
Blind Procedures
  • Designed to remove unconscious observer bias (and
    the perception that it might taint results)
  • Particularly useful when response variables are
    measured in a subjective way
  • Blind procedure - the person measuring does not
    know what treatment has been applied
  • Double blind - neither the subject nor the person
    measuring knows the treatment assigned
    (human studies)
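One simple way to run a blind procedure is to have a third party replace sample names with random codes. This Python sketch uses hypothetical samples and treatments. The measurer only ever sees the codes, and the key is used to unblind after all measurements are recorded.

```python
import random

rng = random.Random(5)

# Hypothetical samples and their true treatments: seen only by a third party.
true_treatment = {"s1": "control", "s2": "drug", "s3": "control", "s4": "drug"}

# The third party gives each sample a random, uninformative three-digit code.
codes = rng.sample(range(100, 1000), k=len(true_treatment))
code_key = {f"S{code}": sample for code, sample in zip(codes, true_treatment)}

# The person measuring receives only the codes (sorted, so order reveals nothing).
blinded_ids = sorted(code_key)
scores = {code: None for code in blinded_ids}  # filled in during measurement

def unblind(scores):
    """Used only after every score is recorded: map codes back to samples."""
    return {code_key[c]: (true_treatment[code_key[c]], s) for c, s in scores.items()}

print(blinded_ids)
```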

40
When controls are not needed (or allowed)
  • In medical or veterinary studies, controls may be
    an ethical issue; historical controls can be used,
    but give careful consideration to criticisms
  • When sets of treatments are being compared (e.g.
    effect of two drugs on rat behaviour)

41
Factorial experiments
  • 2 group comparison (t-test) design
  • Treatment and control compared
  • 1 factor design
  • Control and several levels of treatment compared
  • 2 factor design
  • More than one treatment considered simultaneously
  • Allows estimation of both main effects AND the
    interaction between them
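For a 2 × 2 factorial design, main effects and the interaction can be read directly from the cell means. This Python sketch uses invented cell means for two hypothetical factors, food (standard/enriched) and strain (A/B). It defines the interaction as a difference of differences; some texts halve this quantity, so check the convention your software uses.

```python
# Invented cell means from a hypothetical 2 x 2 factorial (food x strain).
means = {
    ("standard", "A"): 10.0,
    ("standard", "B"): 12.0,
    ("enriched", "A"): 14.0,
    ("enriched", "B"): 20.0,
}

# Main effect of food: average enriched-minus-standard difference across strains.
food_effect = ((means["enriched", "A"] - means["standard", "A"]) +
               (means["enriched", "B"] - means["standard", "B"])) / 2
# Main effect of strain: average B-minus-A difference across food levels.
strain_effect = ((means["standard", "B"] - means["standard", "A"]) +
                 (means["enriched", "B"] - means["enriched", "A"])) / 2
# Interaction: does the food effect depend on strain? (difference of differences)
interaction = ((means["enriched", "B"] - means["standard", "B"]) -
               (means["enriched", "A"] - means["standard", "A"]))

print(f"food = {food_effect}, strain = {strain_effect}, interaction = {interaction}")
```

A non-zero interaction here means enriched food helps strain B more than strain A, which two separate one-factor experiments could not reveal.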

42
Main effects and interactions
      Food   Strain   Interaction
      X      -        -
      X      X        -
      X      X        X
      X      X        X
      X      X        X

43
Main effects and interactions

44
Completely randomized designs Vs. Blocking
  • Completely Randomized designs are usually simple
  • Completely Randomized designs assume small
    among-individual variation
  • If among individual variation can be attributed
    to a known factor then you can BLOCK by that
    factor, reduce error variation and increase your
    signal to noise ratio (clearer results)
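In a randomized complete block design, each block gets every treatment once, and treatments are randomized separately within each block. This Python sketch assumes a hypothetical greenhouse with 4 benches as blocks and 3 treatments. The function name `randomized_block_layout` is illustrative.

```python
import random

def randomized_block_layout(blocks, treatments, seed=None):
    """Each block receives every treatment once, in an independently shuffled order."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout

layout = randomized_block_layout(
    ["bench_1", "bench_2", "bench_3", "bench_4"], ["control", "low", "high"], seed=11
)
for bench, order in layout.items():
    print(bench, order)
```

Because every bench contains every treatment, bench-to-bench differences can be removed as a block effect instead of inflating the error variation.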

45
Advantages of blocking

46
Advantages of blocking
  • Blocking is commonly used to remove effects of
  • Space
  • Time
  • Individual characters that can be ranked
  • Continuous characters that affect among-individual
    variation can be used as covariates to remove
    their effects and improve signal to noise ratio

47
The most common design errors
  • Ad hoc designs
  • Inappropriate control/treatment groups
  • Sample sizes too large or too small
  • Failure to use blocking
  • Lab animal studies: failure to use isogenic
    strains when G×E interactions are unimportant