Bayesian Methods and WinBUGS Rich Evans Iowa State University College of Veterinary Medicine, Produc - PowerPoint PPT Presentation

1 / 141
About This Presentation
Title:

Bayesian Methods and WinBUGS Rich Evans Iowa State University College of Veterinary Medicine, Produc

Description:

See the BUGS project web page for the technique. 8/20/09. Bayesian methods ... asks for the swine farm sizes (total number of hogs and pigs on June 1, 1995) ... – PowerPoint PPT presentation

Number of Views:558
Avg rating:3.0/5.0
Slides: 142
Provided by: guth7
Category:

less

Transcript and Presenter's Notes

Title: Bayesian Methods and WinBUGS Rich Evans Iowa State University College of Veterinary Medicine, Produc


1
Bayesian Methods and WinBUGSRich EvansIowa
State UniversityCollege of Veterinary
Medicine,Production Animal Medicine
2
Agenda
  • Basics
  • Run a simple program
  • Advanced
  • Use more controls
  • Additional programming
  • Convergence
  • Hierarchical models

3
Bayesian inference Why do it?
  • Efron, B.  (1986), Why isn't everyone a
    Bayesian? (C/R p5-11 p330-331)'', The American
    Statistician, 40 , 1-5
  • Before MCMC
  • Theoretical reasons
  • Philosophical reasons
  • Easier for some problems
  • Another tool in the statistical toolbox

4
Motivating example Estimating Prevalence
  • Estimate the prevalence of M. Hyo. in a swine
    population.
  • Use a Bayesian version of the Discordant
    Resolution method (Alonzo and Pepe, 1999).
  • 63 swine were tested with TWEEN 20 and the DAKO
    ELISA tests.
  • Subjects with discordant results were retested
    using the IDEXX.

5
Example
Test positive1
6
Example Issues and fix-ups
  • There arent good confidence intervals for
    prevalence using the DR method
  • Easy with the Bayesian method
  • Uncertainty about the test results is not
    reflected in the inference
  • Want to model uncertainty about the resolution
  • 2 of 3 values agreeing is not conclusive

7
Model
  • Each subject has a disease-positive probability
  • The 2 or 3 binary outcomes are sampled with this
    probability
  • The subject disease probabilities come from a
    common distribution, and their mean is the
    disease prevalence.
  • The prevalence has a beta prior distribution, and
    those parameters have hyper-priors

8
missing included in the analysis
9
Example
Posterior distribution may be intractable
Markov chain sampling (e.g., using WinBUGS)
Monte Carlo estimates
Inference, e.g.,
10
WinBUGS results
11
Prior parameters for prev
12
DR method prev 0.73
13
(No Transcript)
14
Bayesian inference for m
Prior information (dist) for m
Data, sampling distribution
turn the crank (definition of conditional dist,
joint/marg)
Posterior distribution
Posterior expected values and variances (The
dist. of a rv contains all possible info about
the rv)
15
Model parts
Data
WinBUGS
Markov chains (like data from posterior)
CODA or BOA
Inference or prediction
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
WinBUGS resources
  • The BUGS project
  • http//www.mrc-bsu.cam.ac.uk/bugs/
  • CODA or BOA
  • Mailing list
  • Helpful hints and tricks
  • Obtaining WinBUGS
  • Get the key to unlock the student version

20
WinBUGS Development web-site
  • The site is designed to facilitate the
    distribution of various (free) extensions of
    WinBUGS,
  • allow users to implement their own specialized
    functions by 'hard-wiring' them into the software
    via compiled Pascal code
  • The home-page for the new web-site
    ishttp//homepages.tesco.net/creeping_death/

21
WinBUGS
  • Model specification language rather than a
    programming language

22
WinBUGS
  • Need only the data and model
  • MCMC diagnostics
  • Trace plots
  • Autocorrelation
  • B-G-R plots
  • It will output in CODA format
  • Output text files
  • Use .ind and .out extensions (not .txt)

23
WinBUGS easy example
  • Estimate an experiment mean ?
  • We have some information from a prior experiment
  • In order to make the inference more precise, we
    will include information from the prior
    experiment in the inference for ?

24
Example
Sampling distribution
parameter of interest
Prior distribution
The mean and standard error from a prior
experiment
Prior distribution with non-informative
parameters
25
Distributions in WinBUGS
  • Provides many distributions
  • It is possible to use arbitrary distributions
  • See the BUGS project web page for the technique

26
(No Transcript)
27
model
body
Data and initial values
28
model
Sampling distribution
Prior distributions
Standard deviation of y
29
model
is is distributed as
Distributions have ds
less than dash is assignment
30
model
Loops are S-Plus syntax. Indexing can be nested
(yxi) and multivariate (yi,j)
Data and initial values in S-Plus format
Not all parameters need user specified inits
31
1 2 3
4 5
32
Running an example
  • Model -gt specification tool
  • Load model
  • Load data
  • Compile
  • Load inits

33
Running an example
  • Model -gt Update tool
  • Burn in
  • Inference -gt samples
  • Choose parameters to monitor
  • Update tool to continue iterating

34
Running an example
  • Run a basic program
  • Burn in, thinning, over relax
  • Monitoring and clearing variables
  • AutoC button
  • Stop and start a program during a run

35
Running an example
  • Run with multiple chains
  • Eyeball convergence
  • Auto corr
  • Trace plots
  • Quantiles
  • GR convergence

36
End Part I
37
Advanced WinBUGS
38
Editing Graphics
  • Edit -gt object properties
  • Title
  • Axis bounds
  • Fonts
  • margins
  • Smoothing

39
Editing text
  • Attributes menu
  • Text menu

40
Scripts
  • Set of Scripting commands that circumvent the
    button interface
  • Files (.odc or .txt)
  • Script file (commands)
  • WinBUGS program
  • Data file(s)
  • Initial value file(s)
  • Can run in the background

41
Scripts
  • See the WinBUGS help file for the list of
    commands
  • Run the Normal example
  • Use forward / slashes in file names!!!
  • check('c/Projects/talks/DavisISVEE/Normalexample.
    odc')

42
Indexing
  • Similar to Splus
  • Yi,j,k
  • Yzi
  • Useful for ragged arrays
  • Vector for the response (Y) and a corresponding
    group identification vector (z)

z has discrete values
43
Indexing ragged array
44
Indexing ragged arrayWinBUGS
For (i in 1N) Yzi
dnorm(muzi,tauzi)
45
Censoring
  • Restrict variables
  • Y dnorm(a,s)I(0,3)
  • Y dgamma(a,s)I(,X)

Parameter
46
Output analysis
  • Convergence (in a coming lecture)
  • Diagnostics and summaries
  • Data analysis and inference
  • Histograms
  • Auto and cross corrrelation
  • Summary statistics and probability intervals

47
Output analysis
Histogram, adjustable smoothing
(easily cut and pasted into Excel, which is
useful when investigating many similar parameters)
48
Output analysis ranks
  • Ranking institutions, surgeons, farms, regions is
    relatively easy in WinBUGS
  • The idea is to assign ranks at each iteration of
    the MC generation and then do Monte Carlo on the
    ranks

49
Example Swine farm size by state
  • NAHMS is a United States Department of
    Agriculture sponsored program to collect, analyze
    and disseminate information about animal
    (primarily production animal) health, management,
    and productivity.
  • The swine surveys are limited to swine farms and
    to states that produce the majority of swine. In
    1995, 418 swine farms from 16 states (these
    states accounted for 91 of the total US hog and
    pig production in 1995) were surveyed with
    questionnaires.
  • The number of farms surveyed in each state is
    roughly proportional to the number of swine farms
    in the state. One survey question asks for the
    swine farm sizes (total number of hogs and pigs
    on June 1, 1995).

50
The model
51
NAHMS data and ranks
52
WinBUGS data input
  • Splus array syntax
  • But indices are in a different order
  • Rectangular format
  • Uses separate files
  • Can mix rectangular and Splus formats

53
Splus array notation
WinBUGS reads data into an array by filling the
right-most index first (columns) S-Plus fills
the left-most index first (rows).
1 1 - 1 1 2 - 2 1 3 - 3 2 1 - 4 2 2 - 5 2 3 - 6
1 1 - 1 2 1 - 2 1 2 - 3 2 2 - 4 1 3 - 5 2 3 - 6
c(1,2,3,4,5,6) 2 x 3 matrix
54
Splus notation
Y structure(.Data c(1,2,3,4,6),.Dim c(2,3) )
Splus (rows first) 1 3 5 2 4 6
WinBUGS (columns first)
1 2 3 4 5 6
55
Splus notation
  • Use Splus to export data to a text file
  • dput(Y,mydata.txt)
  • Cut and paste into WinBUGS document
  • Edit the .Dim statement

56
Notation example
Y 1 2 3 4 5 6
dput(Y,mydata.txt)
list(Ystructure(.Datac(1,4,2,5,3,6),.Dim(2,3)))
list(Ystructure(.Datac(1,4,2,5,3,6),.Dim(3,2)))
Y 1 4 2 5 3 6
Y (no fix) 1 4 2 5 3 6
Write WB code reflecting new shape, 3x2
57
Additions/corrections to WinBUGS 1.4 manual
  • Model specification/Formatting of data When
    importing matrix data from Splus version 6 or
    above using the dput' command, you will need to
    replace Splus's nrow' and ncol' by a .Dim'
    statement. The ordering of the dimensions remains
    as in the manual. The procedure for arrays
    remains as in the manual, except that Splus's
    inverted commas and .Dimnames' statement should
    be removed.

58
Rectangular format
Put data in a separate .odc file Click on the
file when loading data
First index is always empty
Age 47 23 30 19
Y,1 Y,2 Y,3 1 2 3
4 5 6
59
WinBUGS error messages
  • Traps
  • MCMC cant update
  • Syntax errors
  • Incorrect indexing
  • Compile errors
  • Wrong bracketing
  • Data and index dont match

60
WinBUGS error messages
  • Most are given in lower left corner
  • Traps
  • Not enough information to update (prior too vague
  • Numerical instability
  • Sampling impossible values (e.g. outside of a
    uniform dist.
  • Click update button and try one more time

61
(No Transcript)
62
Multiple chains
  • Uses different initial values in different list
    statements
  • Indicate the number of chains in the model
    specification dialog box

63
Exporting results
  • Cut and paste graph and output windows
  • May not be presentation quality
  • Can paste summary tables into Excel
  • Export chains to Splus using CODA button

64
Coda button
  • Generates two windows with data that can be saved
    as text files and then imported
  • directly using CODA
  • Into Splus
  • Index window
  • Indicates the position of the variates in the
  • Variate window
  • One long column

65
(No Transcript)
66
Coda button
  • Important to have the correct extensions
  • WB will save them as .odc but
  • CODA needs .ind and .out
  • Splus needs .txt
  • Reshape the column after importing to Splus so
    that the columns correspond to parameters

67
Using WinBUGS examples
  • Many examples
  • Pick one similar to your problem
  • Use that code as a basis

68
Model Assessment
  • Carlin and Louis (2000)
  • Conditional predictive ordinate method
  • Leave-one-out cross validation approach
  • Easy in WinBUGS because missing values are easy
    to accommodate in WinBUGS
  • Generate the posterior predictive distribution of
    the left-out data point
  • Is the observed data consistent with the
    predictive distribution?

69
Model Selection
  • Bayes Factor
  • Ratio of marginal likelihoods
  • Can be done in WinBUGS
  • Model index becomes a new parameter

70
MCMC and Convergence
71
Bayesian Example
  • Want to estimate the mean ? of an experiment
  • We have some information from a previous
    experiment
  • In order to make the inference more precise, we
    will include information from the prior
    experiment in the estimate of ?

72
Bayesian Example
Sampling distribution
parameter of interest
Prior distribution
The mean and standard error from a prior
experiment
Prior distribution with non-informative
parameters
73
Example (Bayes theorem)turn the crank
Suppose one of these integrals is not
analytically tractable
Posterior distribution of the parameter of
interest
74
Example
  • We need to do numerical integration to evaluate
    the integrals
  • Many methods, e.g., importance sampling
  • Need one that can be used with many parameters
    (for modern problems)
  • A sample from f would give an estimate of f

75
Example
Intractable
Markov chain sampling (e.g., using WinBugs)
Monte Carlo estimates
Inference, e.g.,
76
Monte Carlo estimates
  • Use the (chain) sample variance unless there is
    autocorrelation
  • Geweke has a fix up
  • Use percentiles for probability intervals
  • 2.5 and 97.5
  • But check for multimodal
  • Histograms for densities

77
Monte Carlo
PI boundaries may not be symmetric!
78
Examples of Markov chain methods
  • There are many MCMC algorithms
  • Gibbs sampler
  • Metropolis algorithm
  • Reversible jump MCMC
  • MCMCMC
  • Grid based MCMC
  • Grouped, Collapsed
  • Gilks Wild algorithm

79
Markov chain methods
  • Dont need to know these methods to do Bayesian
    inference
  • WinBUGS uses several of them in a black box

80
History Generating Markov chain with the Gibbs
sampler
For each parameter, find the complete conditional
and then sample iteratively to generate a Markov
chain
81
Example Gibbs sampler
  • Many parameters mean many complete conditionals
  • Some help using group sampling (e.g., mean vector
    of a MVN)

82
Example Gibbs sampler
?(3)
?(4)
?(2)
?(0)
?(1)
? 2(2)
? 2(3)
? 2(4)
?2(0)
? 2(1)
This chain is an approximately random sample from
83
Example Gibbs sampler
  • Need find the closed form for each conditional
    and then be able to sample them.
  • For many problems most complete conditionals are
    tractable (e.g., can find the normalizing
    constant). Otherwise use the Gilks - Wild
    algorithm (or other numerical methods) for sub
    sampling

84
Example Gibbs sampler
  • After the chains are generated check convergence.
  • Then Monte Carlo methods are used for inference.
  • Model diagnostics (required for any modeling)
  • The above three steps are also required when
    using WinBUGS

85
Convergence of Markov Chains
86
Convergence
Operationally, effective convergence of Markov
chain simulation has been reached when inferences
for quantities of interest do not depend on the
starting point of the simulations (Brooks and
Gelman, 1998)
87
Contours of the joint target distribution
? 2(0)
? 2(1)
? 2 axis
? (1)
? (2)
? axis
?(0)
88
Convergence
  • Want to sample under the mode(s) and tails of the
    target distribution, in the correct proportion
  • Use burn in (initial run to reach the target
    distribution)
  • Use remaining variates (with thinning) for
    inference. Thinning is the term for keeping
    every kth variate for inference
  • This can all be done in WinBUGS

89
Convergence
  • Slow convergence due to autocorrelation among
    variates
  • Poor choice of initial values (in flat tails of
    dist.) may cause slow or no convergence
  • Multiple modes
  • Very flat posterior distribution will cause chain
    to wander
  • Lack of parameter identifiability (Eberly
    Carlin 1999)

90
Convergence identifiable parametersFrom C L
  • Ironically, the most common source of MCMC
    convergence difficulties is a result of the
    methodologies own power. The MCMC approach is so
    generally applicable and easy to use that the
    class of candidate models for a given data set
    now appears limited only by the users
    imagination. However, with this generality has
    come the temptation to fit models so large that
    their parameters are unidentified, or nearly so.

91
Unidentified
With flat priors on the µs, only the sum is
identified by the data
The Gibbs sampler will not reveal this problem,
as the complete conditionals are both normal
distributions (C L, 2000)
92
Identifiability
93
Screening example
(1-sp)
94
P converges, but the others do not
sensitivity
Apparent prev
True prev
specificity
95
Convergence Tests
  • Cowles, and Carlin (1996), Markov chain Monte
    Carlo convergence diagnostics A comparative
    review'', JASA
  • Brooks - Gelman - Rubin test (multiple chains)
  • Gewekes test
  • Thick felt tip pen test
  • Many others
  • Should also check autocorrelation

96
Trace Plot for µ - 2000 iterations
Look for patterns, or increasing/decreasing
97
Thick felt tip pen test
98
No ConvergenceThe Markov chain may be wandering
around in the tails or have a flat distribution.
Lack of noise indicates possibly large
autocorrelation
99
Autocorrelation lag
100
Autcorrelation
101
Small autocorrelation
102
Autocorrelation (over relaxation)
Note the small cyclic pattern - probably doesnt
matter
103
Mild autocorrelation may be removed by selecting
the thinning interval. However it is best to
avoid discarding iterations
There are Monte Carlo estimates of variation that
account for autocorrelation
104
Gewekes test
  • Compares the first part of a chain with the last
    part in modified two group t-test
  • Recommends first 10 and last 50 of the chain
  • Standard errors are based on spectral density
    (time series) estimates

105
Gewekes test
106
Brooks-Gelman-Rubin test (in WB)
  • Uses multiple chains for the same parameter with
    different starting points selected from an
    overdispersed guess of the target distribution
  • Use 10 (?), more if multimodal (Cowles and
    Carlin, 1996)
  • WinBUGS is slower with more chains

107
Brooks - Gelman - Rubin test
  • ANOVA compares pooled chain variance to within
    chain variance
  • Are chains far apart compared to the internal
    variability?

108
B-G-R convergence
Between variance is larger than expected relative
to within variance at the beginning of the plot.
Looks OK here
109
Brooks - Gelman - Rubin test
  • Initially pooled chain variance is greater than
    within chain if initial values are overdispersed

110
GR diag button in WinBUGS
  • R length pooled interval of a parameter / mean
    length of within seq. interval
  • The width of the central 80 (empirical) interval
    of the pooled runs is green,
  • the average width of the 80 intervals within the
    individual runs is blue
  • ratio R ( pooled/within ) is red

111
GR in WinBUGS
  • for plotting purposes the pooled and within
    interval widths are normalized to have an overall
    maximum of one.
  • For convergence
  • Need convergence of R to 1,
  • and stability for pooled and within

112
BGR plot
113
GR Plot and output
Control click to get data dump
114
Multimodal
iteration
115
Means of a Mixture model with non- mixed data
116
Assessing convergence with the posterior tails
  • Markov chain methods sample from the posterior
    distribution
  • Tail values are sampled less often
  • If tail probabilities have not converged, then
    the chain doesnt represent the posterior
    distribution

117
Stats
Quantiles (2.5, 50, 97.5)
Look for convergence in the tails (tail quantiles)
118
Hierarchical models
119
Hierarchical models
  • Chapter 5 in Bayesian Data Analysis, Gelman,
    Carlin, Stern and Rubin
  • multiple parameters that can be regarded as
    related or connected in some way by the structure
    of the problem, implying that a joint probability
    model should reflect the dependence among them.

120
Example 1 (Gelman)
  • Let pj be the survival probability of cardiac
    patients at hospital j, j1,,K
  • It may be reasonable to expect that the estimates
    of pj, which represent a sample of hospitals,
    should be related to each other.
  • That is, the pj come from a common distribution

121
Example 2 (cluster effect)
  • The randomization of an experiment occurs at a
    different level than the observational units.
  • Randomize sows to treatment and control
  • Observe piglet mortality
  • But the piglets were not randomized
  • Correlated within sow
  • The sow level parameters (within treatment) are
    related

122
H-M
Sow population
Random sample of 4 sows
treatment
Control
Litter 1
Litter 4
Litter 2
Litter 3
0 1 0 0 0 1 1 1
0 1 1 1
123
H-M
Hyperpriors
The sow probabilities come from a common
distribution. Think of the sow probs. as data,
and use the data to update the parameters
Piglet j from sow i has probability pig of
survival (g is group)
124
H-M
  • Note that this model is also called a mixed
    effects model.
  • The treatment group is the fixed effect
  • The sow effect is random
  • Test the group effect while accounting for the
    within sow correlation

125
H-M inference for group effect
  • Recall that inference is through posterior
    distributions
  • R1 lt- a1/(a1 ß1)
  • R2 lt- a2/(a2 ß2)
  • Generate Markov chains for R1- R2
  • Pr(R1- R2gt0)

126
H-M hyperpriors
  • May be able to elicit values from experts
  • This may be less clear if the hierarchy is large
  • Empirical Bayes
  • Use the data to get the hyperparameters
  • but then fix the PIs
  • Non-informative
  • But there may be little info in the data about
    the hyperparameters
  • Some parameters may have difficulty converging
  • Is that a problem?

127
Caveat
  • It is sometimes difficult to determine if the
    second stage (or higher) is correct, that is, are
    the sow parameters really related
  • For example, if the sows came from different
    suppliers then the genetics may be different.

128
Caveat example
  • NAHMS example

129
Caveat example
  • Are the parameters for North Carolina related to
    the parameters for the other states?
  • If not
  • model separately
  • If uncertain
  • use robust methods

130
H-M A little theory
  • Normal models
  • Normal likelihood
  • Normal prior on the means
  • for(i in 1k)
  • ybari dnorm(mui,tau)
  • mui dnorm(theta,omega
  • theta dnorm(alpha,beta)
  • Assume precisions are known

131
H-M A little theory
  • In many cases, primary interest is in the mui
    or a function of the mui
  • Hospitals in the Gelman example
  • States in the NAHMS example
  • Small area estimation

132
If theta is known (not hierarchical)
  • The posterior mean of mui is a weighted sum of
    ybari and theta.
  • The weights functions of the variances

133
H-M A little theoryGiven the prior parameters
134
H-M A little theory
  • However, if the hyperprior is non-informative and
    theta is integrated

A weighted average of all the ybari. This pools
the data from the other sources
135
Example Wes
136
References and Resources
137
References
  • Chen, Shao, Ibrahim (2000). Monte Carlo methods
    in Bayesian Computation. Springer
  • Casella George (1992). Explaining the Gibbs
    sampler. The American Statistician, p. 167
  • Smith Roberts (1993). Bayesian computation via
    the Gibbs sampler and Related Markov chain
    methods. JRSS-B, p. 3

138
References
  • Tanner (1996 3rd edition). Tools for Statistical
    Inference Methods for the Exploration of
    Posterior Distributions and Likelihood Functions.
    Springer
  • Gilks, Richardson, Spiegelhalter (eds.)(1996).
    Markov Chain Monte Carlo in Practice. Chapman and
    Hall

139
References
  • Cowles, Mary Kathryn , and Carlin, Bradley P.
     (1996), Markov chain Monte Carlo convergence
    diagnostics A comparative review'', JASA, 91 ,
    883-904
  • Eberly, Lynn E. , and Carlin, Bradley P.  (2000),
    Identifiability and convergence issues for
    Markov chain Monte Carlo fitting of spatial
    models'', Statistics in Medicine, 19 (17-) ,
    2279-2294
  • . Brooks, Stephen P. , and Gelman, Andrew
     (1998), General methods for monitoring
    convergence of iterative simulations'', Journal
    of Computational and Graphical Statistics, 7 ,
    434-455

140
References
  • Carlin, Bradley P. , and Louis, Thomas A.
     (2000), Bayes and Empirical Bayes methods for
    data analysis'', Chapman Hall Ltd (London New
    York)
  • Gelman, A. , Carlin, J. B. , Stern, H. S. , and
    Rubin, D. B.  (1995), Bayesian data analysis'',
    Chapman Hall Ltd (London New York)

141
WinBUGS Resources
  • The BUGS project
  • http//www.mrc-bsu.cam.ac.uk/bugs/
  • Classic BUGS and WinBUGS
  • Get the key to unlock the student version
  • CODA or BOA
  • Mailing list
  • Helpful hints and tricks
Write a Comment
User Comments (0)
About PowerShow.com