Bayesian Statistics: A Biologist's Interpretation

Author: PC Manager. Last modified by: PC Manager. Created: 4/21/2003 2:01:13 AM. Presentation format: On-screen Show.

Slides: 20
Provided by: PCM90
1
Bayesian Statistics: A Biologist's Interpretation
Marguerite Pelletier, URI Natural Resources
Science / U.S. EPA
2
How have Bayesian Methods been used?
  • Federal allocation of money: Bayesian analysis
    of population characteristics such as poverty
    in small geographic areas
  • Microsoft Windows Office Assistant: Bayesian
    artificial intelligence algorithm
  • It has been suggested that Bayesian statistics
    be used in environmental science because it
    addresses questions about the probability of
    events occurring, which allows better
    decision-making

3
Bayesian Statistics vs. Frequentist Statistics
  • Frequentist (Traditional) Statistics
      • Assumes a fixed, true value for the parameter of
        interest (e.g., mean, std dev)
      • Expected value = average value obtained by
        random sampling repeated ad infinitum
      • Can only reject the null hypothesis (Ho), not
        support the alternative hypothesis (Ha);
        p-values indicate statistical rareness
      • Large sample sizes make rejection of Ho more
        likely
      • Confidence intervals show confidence about the
        value of the parameter, not how likely that
        value is in real life

4
Bayesian Statistics vs. Frequentist Statistics,
cont.
  • Bayesian Statistics
      • Assumes the parameter of interest (e.g., mean,
        std dev) is variable and based on the data
      • Can test the probability of the alternative
        hypothesis (Ha) or hypotheses given the data
        (which is what most scientists really care
        about)
      • Generates a probability for any hypothesis being
        true
      • Sample sizes are taken into account: a large
        sample size alone won't cause acceptance of the
        hypothesis
      • Creates credible intervals rather than
        confidence intervals: tells how likely the
        answer is in the real world
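
To make the credible-interval idea concrete, here is a minimal Python sketch. It assumes a hypothetical survey in which a species is detected at 12 of 40 sites and a flat Beta(1, 1) prior; neither the data nor the prior come from the presentation. With a Beta prior and binomial data the posterior is again a Beta distribution, so the interval can be read directly from posterior draws.

```python
import numpy as np

# Hypothetical data (not from the presentation): detections at 12 of 40 sites.
detections, sites = 12, 40

# Flat Beta(1, 1) prior on the detection probability (an assumption).
prior_a, prior_b = 1.0, 1.0

# Conjugate update: Beta prior + binomial data gives a Beta posterior.
post_a = prior_a + detections
post_b = prior_b + (sites - detections)

# Draw from the posterior and take the central 95% credible interval.
rng = np.random.default_rng(42)
samples = rng.beta(post_a, post_b, size=100_000)
lower, upper = np.percentile(samples, [2.5, 97.5])
posterior_mean = post_a / (post_a + post_b)

print(f"posterior mean = {posterior_mean:.3f}")
print(f"95% credible interval = ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval is a direct probability statement: given the data and the prior, the parameter lies between the two bounds with 95% probability.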

5
How do Bayesian Statistics Work?
$\text{Posterior probability} = \dfrac{\text{Fisher's likelihood function} \times \text{Prior probability}}{\text{Expected likelihood function}}$
  • Likelihood function: given data with a known (or
    predicted) distribution (e.g., Normal, Poisson), a
    likelihood function (probability distribution) can
    be calculated
  • Prior probability: based on existing data or a
    subjective indication of what the investigator
    believes to be true
  • Expected likelihood function: marginal distribution
    of the data given the hyperparameter; takes sample
    size into account
Bayes' Rule: $\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$
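
The pieces of Bayes' rule can be sketched numerically on a grid. The example below is an illustration only: the five counts, the Poisson likelihood, and the flat prior over the grid are all assumptions, not values from the presentation. The normalizing sum plays the role of the expected likelihood function.

```python
import numpy as np

# Hypothetical counts (e.g., individuals per plot); not from the presentation.
counts = np.array([3, 5, 4, 6, 2])

# Grid of candidate values for the Poisson rate.
lam = np.linspace(0.01, 15.0, 1500)

# Flat prior over the grid (an assumption), normalized to sum to 1.
prior = np.full(lam.shape, 1.0 / lam.size)

# Poisson log-likelihood; terms that do not depend on lam are dropped
# because they cancel when the posterior is normalized.
log_lik = counts.sum() * np.log(lam) - counts.size * lam
likelihood = np.exp(log_lik - log_lik.max())

# Bayes' rule: posterior is proportional to likelihood times prior.
unnormalized = likelihood * prior
normalizer = unnormalized.sum()      # stands in for the expected likelihood
posterior = unnormalized / normalizer

posterior_mode = lam[np.argmax(posterior)]
print(f"posterior mode of the rate = {posterior_mode:.2f}")
```

With a flat prior the posterior mode coincides with the maximum-likelihood estimate (the sample mean); an informative prior would pull it toward the prior's centre.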
6
Problems with Bayesian Statistics
  • Computationally intense (integration of complex
    functions). However, better computers and the
    development of Markov Chain Monte Carlo methods
    have made the techniques more accessible
  • Not directly applicable to many complex
    statistical analyses. Can be used for certain
    regression techniques and to generate a posterior
    distribution given a prior. Attempts to use it in
    clustering have been unsuccessful
  • Not readily available in most common
    statistical software (SPSS, SAS)
  • Not applicable to very rare events: priors
    dominate the function, so the posterior
    doesn't change, which implies that further study
    is not needed/useful

7
So When are Bayesian Statistics Useful?
  • When limited data are available: formalizes the
    use of Best Professional Judgment (Case
    Study 1)
  • When Bayesian algorithms have been developed
    for a statistic, e.g., regression (Case
    Study 2)
  • After using more traditional statistical
    methods, to develop a probability
    distribution (Case Study 3)
  • When the answer is a single number rather than
    a complex function (e.g., a simple calculation,
    not a complex multivariate analysis)

8
Case Study 1: Development of a Bayesian
Probability Network in the Neuse River Estuary,
N.C.
(Borsuk ME, Stow CA, Reckhow KH 2003. An
integrated approach to TMDL development for the
Neuse River estuary using a Bayesian probability
network. Journal of Water Resources Planning and
Management, accepted.)
9
Summary of Project
  • Neuse River estuary impaired due to nitrogen
    (eutrophication problems), requiring a Total
    Maximum Daily Load (TMDL) to be developed
  • For development of a TMDL, links must be
    developed between pollutant load (N) and
    water quality impairment
  • Because of the range of endpoints and the need
    to determine probability of impact, a
    Bayesian Network was developed
  • Data for the model came from routine water
    quality monitoring and from elicited judgment
    of scientific experts

10
[Figure: Bayesian Network diagram. Nodes: River N, River Flow, Algal Density, Pfiesteria Abundance, Carbon Production, Water Temperature, Sediment Oxygen Demand, Oxygen Concentration, Duration of Stratification, Shellfish Abundance, Days of Hypoxia, Frequency of Cross-Channel Winds, Frequency of Fish Kills, Fish Population Health. Legend: System variable; Node or Submodel; Association.]
11
Use of Bayesian Network (focus on Fish Kills)
  • Fish kills depend on low bottom D.O., cross-channel
    winds (force bottom water and fish to shores), and
    fish health (influences susceptibility)
  • Two expert fisheries biologists were asked about the
    likelihood of a fish kill given certain
    conditions (various wind/hypoxia/fish health
    scenarios)
  • All probabilistic relationships (including fish
    kill info) were incorporated into the Bayesian
    network
  • Nitrogen reduction scenarios of 0, 15, 30, 45
    and 60% (relative to the 1991-1995 baseline)
    were assessed using Latin Hypercube sampling
  • As N inputs decreased, mean chl and exceedance
    frequency were also reduced
  • Fish kills don't change substantially with N
    reduction: fish kills are relatively rare, and the
    effect of reduced C production is damped out
    further along the causal chain
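
How a network node turns elicited conditional probabilities into a marginal probability can be sketched in a few lines of Python. Everything numeric below is hypothetical and chosen only for illustration; the sketch also simplifies the study's design by using two independent parent nodes (hypoxia and cross-channel winds), omitting fish health, and replacing Latin Hypercube sampling with exact enumeration.

```python
from itertools import product

# HYPOTHETICAL conditional probability table: P(fish kill | hypoxia, wind).
# These numbers are illustrative only, not the experts' elicited values.
p_kill = {
    (True, True): 0.30,
    (True, False): 0.05,
    (False, True): 0.02,
    (False, False): 0.01,
}


def prob_fish_kill(p_hypoxia, p_wind):
    """Marginal P(fish kill), treating hypoxia and wind as independent parents."""
    total = 0.0
    for hypoxia, wind in product([True, False], repeat=2):
        p_h = p_hypoxia if hypoxia else 1.0 - p_hypoxia
        p_w = p_wind if wind else 1.0 - p_wind
        total += p_kill[(hypoxia, wind)] * p_h * p_w
    return total


# Two hypothetical scenarios: hypoxia becomes less frequent after N reduction.
baseline = prob_fish_kill(p_hypoxia=0.40, p_wind=0.20)
reduced = prob_fish_kill(p_hypoxia=0.30, p_wind=0.20)
print(f"baseline: {baseline:.4f}, after reduction: {reduced:.4f}")
```

Even in this toy version, a sizeable change in the upstream node produces only a small change in a rare downstream event, which is the damping effect the last bullet describes.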

12
Case Study 2: Assessing Spatial Population
Viability Models using Bayesian Statistics
(Mac Nally R, Fleishman E, Fay JP, Murphy DD
2003. Modeling butterfly species richness using
mesoscale environmental variables: model
construction and validation for the mountain
ranges in the Great Basin of western North
America. Biological Conservation 110: 21-31.)
13
Summary of Project
  • Species richness depends on local environmental
    variables
  • Over large scales these variables are hard to
    collect
  • This study: 14 environmental variables from
    GIS and remote sensing were used to predict
    butterfly species richness
  • Poisson regression was used to develop appropriate
    models from the 28 variables (each IV and IV$^2$);
    the Schwarz Information Criterion was used for
    selection
  • Appropriate variables were then used in a Bayesian
    Poisson model
  • Model output was validated against additional field
    data
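
The selection step can be illustrated with the Schwarz Information Criterion, $k \ln n - 2 \ln L$, where lower values are preferred. The candidate models, log-likelihoods, and site count in this Python sketch are hypothetical and do not come from the study.

```python
import math


def schwarz_criterion(log_likelihood, n_params, n_obs):
    """Schwarz (Bayesian) Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# HYPOTHETICAL candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "elevation": (-120.0, 2),
    "elevation + elevation^2": (-112.0, 3),
    "elevation + elevation^2 + slope": (-111.5, 4),
}
n_sites = 50  # hypothetical number of survey sites

scores = {
    name: schwarz_criterion(log_lik, k, n_sites)
    for name, (log_lik, k) in candidates.items()
}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

The penalty term grows with the number of parameters, so the third model's slightly higher likelihood does not justify its extra variable in this made-up comparison.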

14
Bayesian Poisson Regression
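$\log \lambda_i = \alpha + \sum_k \beta_k X_{ik} + \varepsilon, \qquad Y_i \sim \mathrm{Poisson}(\lambda_i)$
where $\lambda_i$ = mean (unobservable, true) spp
richness at site $i$; $\alpha$, $\beta_k$ = regression
coefficients (non-informative priors); $\varepsilon$ = model
error; $Y_i$ = observed spp richness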
  • Markov Chain Monte Carlo algorithm: 1000-iteration
    burn-in, then 3000 iterations to
    generate parameter estimates and mean spp
    richness estimates
  • New model run using validation data and the
    regression-coefficient distribution from the 1st
    model
  • Model worked well for the same mountain range, but
    not for a new range
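
A Bayesian Poisson regression of this general shape can be sketched with a random-walk Metropolis sampler. This is not the authors' implementation: the data are simulated, the model error term is omitted for brevity, the wide Normal priors stand in for "non-informative" priors, and the proposal step size is a hand-tuned assumption. Only the 1000-iteration burn-in and 3000 retained iterations follow the slide.

```python
import numpy as np


def log_posterior(theta, X, y, prior_sd=10.0):
    """Poisson log-likelihood (up to a constant) plus wide Normal log-priors."""
    eta = X @ theta                      # linear predictor = log of the mean
    log_lik = np.sum(y * eta - np.exp(eta))
    log_prior = -0.5 * np.sum((theta / prior_sd) ** 2)
    return log_lik + log_prior


def metropolis(X, y, n_burn=1000, n_keep=3000, step=0.05, seed=0):
    """Random-walk Metropolis sampler; returns the retained draws."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    current = log_posterior(theta, X, y)
    draws = np.empty((n_keep, X.shape[1]))
    for i in range(n_burn + n_keep):
        proposal = theta + step * rng.normal(size=theta.size)
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        if i >= n_burn:
            draws[i - n_burn] = theta
    return draws


# Simulated data (hypothetical): one standardized environmental variable.
rng = np.random.default_rng(1)
n_sites = 200
x = rng.normal(size=n_sites)
X = np.column_stack([np.ones(n_sites), x])   # intercept column + predictor
true_coefs = np.array([2.0, 0.5])
y = rng.poisson(np.exp(X @ true_coefs))

draws = metropolis(X, y)
post_mean = draws.mean(axis=0)
print("posterior means (intercept, slope):", np.round(post_mean, 2))
```

The retained draws approximate the joint posterior of the coefficients, so predictions for new sites can be made by pushing each draw through the same log-linear equation, which mirrors the validation step in the second bullet.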

15
Case Study 3: Assessing Spatial Population
Viability Models using Bayesian Statistics
(McCarthy MA, Lindenmayer DB, Possingham HP 2001.
Assessing spatial PVA models of arboreal
marsupials using significance tests and Bayesian
statistics. Biological Conservation 98: 191-200.)
16
Summary of Project
  • Population Viability Analysis is used in
    Conservation Biology to assess potential for
    species extinction
  • Many models are based on limited data and assessed
    via significance tests or Bayesian methods
  • Metapopulation models (for 4 arboreal
    marsupials) were developed
  • 2 competing null models were also developed
      • No effect of fragmentation
      • No dispersal between patches
  • Models were compared using likelihood and
    Bayesian methods
17
Model Comparison
  • Predicted presence in patches was compared to
    observed presence using logistic
    regression: $\ln\frac{o}{1-o} = \alpha + \beta \ln\frac{p}{1-p}$,
    where $o$ = observed presence, $p$ = predicted
    presence, and $\alpha$, $\beta$ = regression coefficients
  • Significant differences between predicted and
    observed if $\alpha$ is significantly different from 0
    or $\beta$ is significantly different from 1
  • Models compared using log-likelihood: models
    with higher log-likelihood values (closer to
    0) more closely match the data
  • Bayesian posterior probabilities used to
    compare models: higher probabilities more
    closely match the data. Prior: all 3 models equally
    plausible. Probability of model = likelihood of
    model / sum of all likelihoods
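
The posterior model probability in the last bullet can be computed directly from log-likelihoods. In this Python sketch the three log-likelihood values are hypothetical, and the function name `posterior_model_probs` is introduced here for illustration; with equal priors the calculation reduces to each model's likelihood divided by the sum of all likelihoods.

```python
import numpy as np


def posterior_model_probs(log_likelihoods, priors=None):
    """Posterior probability of each model from its log-likelihood and prior."""
    log_l = np.asarray(log_likelihoods, dtype=float)
    if priors is None:
        # Equal prior plausibility for every model, as on the slide.
        priors = np.full(log_l.shape, 1.0 / log_l.size)
    log_w = log_l + np.log(priors)
    log_w -= log_w.max()                 # guard against underflow in exp()
    weights = np.exp(log_w)
    return weights / weights.sum()


# HYPOTHETICAL log-likelihoods for the three competing models.
models = ["full", "no fragmentation", "no dispersal"]
log_liks = [-40.0, -43.0, -41.0]

probs = posterior_model_probs(log_liks)
for name, p in zip(models, probs):
    print(f"{name}: {p:.3f}")
```

Subtracting the largest log-weight before exponentiating leaves the ratios unchanged but keeps the arithmetic stable when log-likelihoods are large and negative.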

18
Conclusions
  • Comparison with actual data
      • Full model best for greater glider,
        yellow-bellied glider
      • No fragmentation model best for mountain
        brushtail possum, ringtail possum (but
        predicted values ½ of observed values)
  • Log-likelihood values
      • Confirmed no fragmentation model best for 2
        possum spp
      • Confirmed full model best for the greater
        glider
      • Yellow-bellied glider equally represented by
        full model and no dispersal model
  • Bayesian statistics confirmed log-likelihood
    results
  • Authors indicated that significance tests are
    useful to assess model accuracy; Bayesian
    methods are useful for comparing models but
    computationally intense
