Title: Bayesian Methods and WinBUGS Rich Evans Iowa State University College of Veterinary Medicine, Produc
1Bayesian Methods and WinBUGSRich EvansIowa
State UniversityCollege of Veterinary
Medicine,Production Animal Medicine
2Agenda
- Basics
- Run a simple program
- Advanced
- Use more controls
- Additional programming
- Convergence
- Hierarchical models
3Bayesian inference Why do it?
- Efron, B. (1986), Why isn't everyone a
Bayesian? (C/R p5-11 p330-331)'', The American
Statistician, 40 , 1-5 - Before MCMC
- Theoretical reasons
- Philosophical reasons
- Easier for some problems
- Another tool in the statistical toolbox
4Motivating example Estimating Prevalence
- Estimate the prevalence of M. Hyo. in a swine
population. - Use a Bayesian version of the Discordant
Resolution method (Alonzo and Pepe, 1999). - 63 swine were tested with TWEEN 20 and the DAKO
ELISA tests. - Subjects with discordant results were retested
using the IDEXX.
5Example
Test positive1
6Example Issues and fix-ups
- There arent good confidence intervals for
prevalence using the DR method - Easy with the Bayesian method
- Uncertainty about the test results is not
reflected in the inference - Want to model uncertainty about the resolution
- 2 of 3 values agreeing is not conclusive
7Model
- Each subject has a disease-positive probability
- The 2 or 3 binary outcomes are sampled with this
probability - The subject disease probabilities come from a
common distribution, and their mean is the
disease prevalence. - The prevalence has a beta prior distribution, and
those parameters have hyper-priors
8missing included in the analysis
9Example
Posterior distribution may be intractable
Markov chain sampling (e.g., using WinBUGS)
Monte Carlo estimates
Inference, e.g.,
10WinBUGS results
11Prior parameters for prev
12DR method prev 0.73
13(No Transcript)
14Bayesian inference for m
Prior information (dist) for m
Data, sampling distribution
turn the crank (definition of conditional dist,
joint/marg)
Posterior distribution
Posterior expected values and variances (The
dist. of a rv contains all possible info about
the rv)
15Model parts
Data
WinBUGS
Markov chains (like data from posterior)
CODA or BOA
Inference or prediction
16(No Transcript)
17(No Transcript)
18(No Transcript)
19WinBUGS resources
- The BUGS project
- http//www.mrc-bsu.cam.ac.uk/bugs/
- CODA or BOA
- Mailing list
- Helpful hints and tricks
- Obtaining WinBUGS
- Get the key to unlock the student version
20WinBUGS Development web-site
- The site is designed to facilitate the
distribution of various (free) extensions of
WinBUGS, - allow users to implement their own specialized
functions by 'hard-wiring' them into the software
via compiled Pascal code - The home-page for the new web-site
ishttp//homepages.tesco.net/creeping_death/
21WinBUGS
- Model specification language rather than a
programming language
22WinBUGS
- Need only the data and model
- MCMC diagnostics
- Trace plots
- Autocorrelation
- B-G-R plots
- It will output in CODA format
- Output text files
- Use .ind and .out extensions (not .txt)
23WinBUGS easy example
- Estimate an experiment mean ?
- We have some information from a prior experiment
- In order to make the inference more precise, we
will include information from the prior
experiment in the inference for ?
24Example
Sampling distribution
parameter of interest
Prior distribution
The mean and standard error from a prior
experiment
Prior distribution with non-informative
parameters
25Distributions in WinBUGS
- Provides many distributions
- It is possible to use arbitrary distributions
- See the BUGS project web page for the technique
26(No Transcript)
27model
body
Data and initial values
28model
Sampling distribution
Prior distributions
Standard deviation of y
29model
is is distributed as
Distributions have ds
less than dash is assignment
30model
Loops are S-Plus syntax. Indexing can be nested
(yxi) and multivariate (yi,j)
Data and initial values in S-Plus format
Not all parameters need user specified inits
311 2 3
4 5
32Running an example
- Model -gt specification tool
- Load model
- Load data
- Compile
- Load inits
33Running an example
- Model -gt Update tool
- Burn in
- Inference -gt samples
- Choose parameters to monitor
- Update tool to continue iterating
34Running an example
- Run a basic program
- Burn in, thinning, over relax
- Monitoring and clearing variables
- AutoC button
- Stop and start a program during a run
35Running an example
- Run with multiple chains
- Eyeball convergence
- Auto corr
- Trace plots
- Quantiles
- GR convergence
36End Part I
37Advanced WinBUGS
38Editing Graphics
- Edit -gt object properties
- Title
- Axis bounds
- Fonts
- margins
- Smoothing
39Editing text
- Attributes menu
- Text menu
40Scripts
- Set of Scripting commands that circumvent the
button interface - Files (.odc or .txt)
- Script file (commands)
- WinBUGS program
- Data file(s)
- Initial value file(s)
- Can run in the background
41Scripts
- See the WinBUGS help file for the list of
commands - Run the Normal example
- Use forward / slashes in file names!!!
- check('c/Projects/talks/DavisISVEE/Normalexample.
odc')
42Indexing
- Similar to Splus
- Yi,j,k
- Yzi
- Useful for ragged arrays
- Vector for the response (Y) and a corresponding
group identification vector (z)
z has discrete values
43Indexing ragged array
44Indexing ragged arrayWinBUGS
For (i in 1N) Yzi
dnorm(muzi,tauzi)
45Censoring
- Restrict variables
- Y dnorm(a,s)I(0,3)
- Y dgamma(a,s)I(,X)
Parameter
46Output analysis
- Convergence (in a coming lecture)
- Diagnostics and summaries
- Data analysis and inference
- Histograms
- Auto and cross corrrelation
- Summary statistics and probability intervals
47Output analysis
Histogram, adjustable smoothing
(easily cut and pasted into Excel, which is
useful when investigating many similar parameters)
48Output analysis ranks
- Ranking institutions, surgeons, farms, regions is
relatively easy in WinBUGS - The idea is to assign ranks at each iteration of
the MC generation and then do Monte Carlo on the
ranks
49Example Swine farm size by state
- NAHMS is a United States Department of
Agriculture sponsored program to collect, analyze
and disseminate information about animal
(primarily production animal) health, management,
and productivity. - The swine surveys are limited to swine farms and
to states that produce the majority of swine. In
1995, 418 swine farms from 16 states (these
states accounted for 91 of the total US hog and
pig production in 1995) were surveyed with
questionnaires. - The number of farms surveyed in each state is
roughly proportional to the number of swine farms
in the state. One survey question asks for the
swine farm sizes (total number of hogs and pigs
on June 1, 1995).
50The model
51NAHMS data and ranks
52WinBUGS data input
- Splus array syntax
- But indices are in a different order
- Rectangular format
- Uses separate files
- Can mix rectangular and Splus formats
53Splus array notation
WinBUGS reads data into an array by filling the
right-most index first (columns) S-Plus fills
the left-most index first (rows).
1 1 - 1 1 2 - 2 1 3 - 3 2 1 - 4 2 2 - 5 2 3 - 6
1 1 - 1 2 1 - 2 1 2 - 3 2 2 - 4 1 3 - 5 2 3 - 6
c(1,2,3,4,5,6) 2 x 3 matrix
54Splus notation
Y structure(.Data c(1,2,3,4,6),.Dim c(2,3) )
Splus (rows first) 1 3 5 2 4 6
WinBUGS (columns first)
1 2 3 4 5 6
55Splus notation
- Use Splus to export data to a text file
- dput(Y,mydata.txt)
- Cut and paste into WinBUGS document
- Edit the .Dim statement
56Notation example
Y 1 2 3 4 5 6
dput(Y,mydata.txt)
list(Ystructure(.Datac(1,4,2,5,3,6),.Dim(2,3)))
list(Ystructure(.Datac(1,4,2,5,3,6),.Dim(3,2)))
Y 1 4 2 5 3 6
Y (no fix) 1 4 2 5 3 6
Write WB code reflecting new shape, 3x2
57Additions/corrections to WinBUGS 1.4 manual
- Model specification/Formatting of data When
importing matrix data from Splus version 6 or
above using the dput' command, you will need to
replace Splus's nrow' and ncol' by a .Dim'
statement. The ordering of the dimensions remains
as in the manual. The procedure for arrays
remains as in the manual, except that Splus's
inverted commas and .Dimnames' statement should
be removed.
58Rectangular format
Put data in a separate .odc file Click on the
file when loading data
First index is always empty
Age 47 23 30 19
Y,1 Y,2 Y,3 1 2 3
4 5 6
59WinBUGS error messages
- Traps
- MCMC cant update
- Syntax errors
- Incorrect indexing
- Compile errors
- Wrong bracketing
- Data and index dont match
60WinBUGS error messages
- Most are given in lower left corner
- Traps
- Not enough information to update (prior too vague
- Numerical instability
- Sampling impossible values (e.g. outside of a
uniform dist. - Click update button and try one more time
61(No Transcript)
62Multiple chains
- Uses different initial values in different list
statements - Indicate the number of chains in the model
specification dialog box
63Exporting results
- Cut and paste graph and output windows
- May not be presentation quality
- Can paste summary tables into Excel
- Export chains to Splus using CODA button
64Coda button
- Generates two windows with data that can be saved
as text files and then imported - directly using CODA
- Into Splus
- Index window
- Indicates the position of the variates in the
- Variate window
- One long column
65(No Transcript)
66Coda button
- Important to have the correct extensions
- WB will save them as .odc but
- CODA needs .ind and .out
- Splus needs .txt
- Reshape the column after importing to Splus so
that the columns correspond to parameters
67Using WinBUGS examples
- Many examples
- Pick one similar to your problem
- Use that code as a basis
68Model Assessment
- Carlin and Louis (2000)
- Conditional predictive ordinate method
- Leave-one-out cross validation approach
- Easy in WinBUGS because missing values are easy
to accommodate in WinBUGS - Generate the posterior predictive distribution of
the left-out data point - Is the observed data consistent with the
predictive distribution?
69Model Selection
- Bayes Factor
- Ratio of marginal likelihoods
- Can be done in WinBUGS
- Model index becomes a new parameter
70MCMC and Convergence
71Bayesian Example
- Want to estimate the mean ? of an experiment
- We have some information from a previous
experiment - In order to make the inference more precise, we
will include information from the prior
experiment in the estimate of ?
72Bayesian Example
Sampling distribution
parameter of interest
Prior distribution
The mean and standard error from a prior
experiment
Prior distribution with non-informative
parameters
73Example (Bayes theorem)turn the crank
Suppose one of these integrals is not
analytically tractable
Posterior distribution of the parameter of
interest
74Example
- We need to do numerical integration to evaluate
the integrals - Many methods, e.g., importance sampling
- Need one that can be used with many parameters
(for modern problems) - A sample from f would give an estimate of f
75Example
Intractable
Markov chain sampling (e.g., using WinBugs)
Monte Carlo estimates
Inference, e.g.,
76Monte Carlo estimates
- Use the (chain) sample variance unless there is
autocorrelation - Geweke has a fix up
- Use percentiles for probability intervals
- 2.5 and 97.5
- But check for multimodal
- Histograms for densities
77Monte Carlo
PI boundaries may not be symmetric!
78Examples of Markov chain methods
- There are many MCMC algorithms
- Gibbs sampler
- Metropolis algorithm
- Reversible jump MCMC
- MCMCMC
- Grid based MCMC
- Grouped, Collapsed
- Gilks Wild algorithm
79Markov chain methods
- Dont need to know these methods to do Bayesian
inference - WinBUGS uses several of them in a black box
80History Generating Markov chain with the Gibbs
sampler
For each parameter, find the complete conditional
and then sample iteratively to generate a Markov
chain
81Example Gibbs sampler
- Many parameters mean many complete conditionals
- Some help using group sampling (e.g., mean vector
of a MVN)
82Example Gibbs sampler
?(3)
?(4)
?(2)
?(0)
?(1)
? 2(2)
? 2(3)
? 2(4)
?2(0)
? 2(1)
This chain is an approximately random sample from
83Example Gibbs sampler
- Need find the closed form for each conditional
and then be able to sample them. - For many problems most complete conditionals are
tractable (e.g., can find the normalizing
constant). Otherwise use the Gilks - Wild
algorithm (or other numerical methods) for sub
sampling
84Example Gibbs sampler
- After the chains are generated check convergence.
- Then Monte Carlo methods are used for inference.
- Model diagnostics (required for any modeling)
- The above three steps are also required when
using WinBUGS
85Convergence of Markov Chains
86Convergence
Operationally, effective convergence of Markov
chain simulation has been reached when inferences
for quantities of interest do not depend on the
starting point of the simulations (Brooks and
Gelman, 1998)
87Contours of the joint target distribution
? 2(0)
? 2(1)
? 2 axis
? (1)
? (2)
? axis
?(0)
88Convergence
- Want to sample under the mode(s) and tails of the
target distribution, in the correct proportion - Use burn in (initial run to reach the target
distribution) - Use remaining variates (with thinning) for
inference. Thinning is the term for keeping
every kth variate for inference - This can all be done in WinBUGS
89Convergence
- Slow convergence due to autocorrelation among
variates - Poor choice of initial values (in flat tails of
dist.) may cause slow or no convergence - Multiple modes
- Very flat posterior distribution will cause chain
to wander - Lack of parameter identifiability (Eberly
Carlin 1999)
90Convergence identifiable parametersFrom C L
- Ironically, the most common source of MCMC
convergence difficulties is a result of the
methodologies own power. The MCMC approach is so
generally applicable and easy to use that the
class of candidate models for a given data set
now appears limited only by the users
imagination. However, with this generality has
come the temptation to fit models so large that
their parameters are unidentified, or nearly so.
91Unidentified
With flat priors on the µs, only the sum is
identified by the data
The Gibbs sampler will not reveal this problem,
as the complete conditionals are both normal
distributions (C L, 2000)
92Identifiability
93Screening example
(1-sp)
94P converges, but the others do not
sensitivity
Apparent prev
True prev
specificity
95Convergence Tests
- Cowles, and Carlin (1996), Markov chain Monte
Carlo convergence diagnostics A comparative
review'', JASA - Brooks - Gelman - Rubin test (multiple chains)
- Gewekes test
- Thick felt tip pen test
- Many others
- Should also check autocorrelation
96Trace Plot for µ - 2000 iterations
Look for patterns, or increasing/decreasing
97Thick felt tip pen test
98No ConvergenceThe Markov chain may be wandering
around in the tails or have a flat distribution.
Lack of noise indicates possibly large
autocorrelation
99Autocorrelation lag
100Autcorrelation
101Small autocorrelation
102Autocorrelation (over relaxation)
Note the small cyclic pattern - probably doesnt
matter
103Mild autocorrelation may be removed by selecting
the thinning interval. However it is best to
avoid discarding iterations
There are Monte Carlo estimates of variation that
account for autocorrelation
104Gewekes test
- Compares the first part of a chain with the last
part in modified two group t-test - Recommends first 10 and last 50 of the chain
- Standard errors are based on spectral density
(time series) estimates
105Gewekes test
106Brooks-Gelman-Rubin test (in WB)
- Uses multiple chains for the same parameter with
different starting points selected from an
overdispersed guess of the target distribution - Use 10 (?), more if multimodal (Cowles and
Carlin, 1996) - WinBUGS is slower with more chains
107Brooks - Gelman - Rubin test
- ANOVA compares pooled chain variance to within
chain variance - Are chains far apart compared to the internal
variability?
108B-G-R convergence
Between variance is larger than expected relative
to within variance at the beginning of the plot.
Looks OK here
109Brooks - Gelman - Rubin test
- Initially pooled chain variance is greater than
within chain if initial values are overdispersed
110GR diag button in WinBUGS
- R length pooled interval of a parameter / mean
length of within seq. interval - The width of the central 80 (empirical) interval
of the pooled runs is green, - the average width of the 80 intervals within the
individual runs is blue - ratio R ( pooled/within ) is red
111GR in WinBUGS
- for plotting purposes the pooled and within
interval widths are normalized to have an overall
maximum of one. - For convergence
- Need convergence of R to 1,
- and stability for pooled and within
112BGR plot
113GR Plot and output
Control click to get data dump
114Multimodal
iteration
115Means of a Mixture model with non- mixed data
116Assessing convergence with the posterior tails
- Markov chain methods sample from the posterior
distribution - Tail values are sampled less often
- If tail probabilities have not converged, then
the chain doesnt represent the posterior
distribution
117Stats
Quantiles (2.5, 50, 97.5)
Look for convergence in the tails (tail quantiles)
118Hierarchical models
119Hierarchical models
- Chapter 5 in Bayesian Data Analysis, Gelman,
Carlin, Stern and Rubin - multiple parameters that can be regarded as
related or connected in some way by the structure
of the problem, implying that a joint probability
model should reflect the dependence among them.
120Example 1 (Gelman)
- Let pj be the survival probability of cardiac
patients at hospital j, j1,,K - It may be reasonable to expect that the estimates
of pj, which represent a sample of hospitals,
should be related to each other. - That is, the pj come from a common distribution
121Example 2 (cluster effect)
- The randomization of an experiment occurs at a
different level than the observational units. - Randomize sows to treatment and control
- Observe piglet mortality
- But the piglets were not randomized
- Correlated within sow
- The sow level parameters (within treatment) are
related
122H-M
Sow population
Random sample of 4 sows
treatment
Control
Litter 1
Litter 4
Litter 2
Litter 3
0 1 0 0 0 1 1 1
0 1 1 1
123H-M
Hyperpriors
The sow probabilities come from a common
distribution. Think of the sow probs. as data,
and use the data to update the parameters
Piglet j from sow i has probability pig of
survival (g is group)
124H-M
- Note that this model is also called a mixed
effects model. - The treatment group is the fixed effect
- The sow effect is random
- Test the group effect while accounting for the
within sow correlation
125H-M inference for group effect
- Recall that inference is through posterior
distributions - R1 lt- a1/(a1 ß1)
- R2 lt- a2/(a2 ß2)
- Generate Markov chains for R1- R2
- Pr(R1- R2gt0)
126H-M hyperpriors
- May be able to elicit values from experts
- This may be less clear if the hierarchy is large
- Empirical Bayes
- Use the data to get the hyperparameters
- but then fix the PIs
- Non-informative
- But there may be little info in the data about
the hyperparameters - Some parameters may have difficulty converging
- Is that a problem?
127Caveat
- It is sometimes difficult to determine if the
second stage (or higher) is correct, that is, are
the sow parameters really related - For example, if the sows came from different
suppliers then the genetics may be different.
128Caveat example
129Caveat example
- Are the parameters for North Carolina related to
the parameters for the other states? - If not
- model separately
- If uncertain
- use robust methods
130H-M A little theory
- Normal models
- Normal likelihood
- Normal prior on the means
- for(i in 1k)
- ybari dnorm(mui,tau)
- mui dnorm(theta,omega
- theta dnorm(alpha,beta)
- Assume precisions are known
131H-M A little theory
- In many cases, primary interest is in the mui
or a function of the mui - Hospitals in the Gelman example
- States in the NAHMS example
- Small area estimation
132If theta is known (not hierarchical)
- The posterior mean of mui is a weighted sum of
ybari and theta. - The weights functions of the variances
133H-M A little theoryGiven the prior parameters
134H-M A little theory
- However, if the hyperprior is non-informative and
theta is integrated
A weighted average of all the ybari. This pools
the data from the other sources
135Example Wes
136References and Resources
137References
- Chen, Shao, Ibrahim (2000). Monte Carlo methods
in Bayesian Computation. Springer - Casella George (1992). Explaining the Gibbs
sampler. The American Statistician, p. 167 - Smith Roberts (1993). Bayesian computation via
the Gibbs sampler and Related Markov chain
methods. JRSS-B, p. 3
138References
- Tanner (1996 3rd edition). Tools for Statistical
Inference Methods for the Exploration of
Posterior Distributions and Likelihood Functions.
Springer - Gilks, Richardson, Spiegelhalter (eds.)(1996).
Markov Chain Monte Carlo in Practice. Chapman and
Hall
139References
- Cowles, Mary Kathryn , and Carlin, Bradley P.
(1996), Markov chain Monte Carlo convergence
diagnostics A comparative review'', JASA, 91 ,
883-904 - Eberly, Lynn E. , and Carlin, Bradley P. (2000),
Identifiability and convergence issues for
Markov chain Monte Carlo fitting of spatial
models'', Statistics in Medicine, 19 (17-) ,
2279-2294 - . Brooks, Stephen P. , and Gelman, Andrew
(1998), General methods for monitoring
convergence of iterative simulations'', Journal
of Computational and Graphical Statistics, 7 ,
434-455
140References
- Carlin, Bradley P. , and Louis, Thomas A.
(2000), Bayes and Empirical Bayes methods for
data analysis'', Chapman Hall Ltd (London New
York) - Gelman, A. , Carlin, J. B. , Stern, H. S. , and
Rubin, D. B. (1995), Bayesian data analysis'',
Chapman Hall Ltd (London New York)
141WinBUGS Resources
- The BUGS project
- http//www.mrc-bsu.cam.ac.uk/bugs/
- Classic BUGS and WinBUGS
- Get the key to unlock the student version
- CODA or BOA
- Mailing list
- Helpful hints and tricks