Bayesian Methods and WinBUGS Rich Evans Iowa State University College of Veterinary Medicine, Produc

About This Presentation

Title:

Bayesian Methods and WinBUGS Rich Evans Iowa State University College of Veterinary Medicine, Produc

Description:

See the BUGS project web page for the technique. 8/20/09. Bayesian methods ... asks for the swine farm sizes (total number of hogs and pigs on June 1, 1995) ... – PowerPoint PPT presentation

Number of Views:558

Avg rating:3.0/5.0

Slides: 142

Provided by: guth7

Category:

more less

Transcript and Presenter's Notes

Title: Bayesian Methods and WinBUGS Rich Evans Iowa State University College of Veterinary Medicine, Produc

1
Bayesian Methods and WinBUGSRich EvansIowa
State UniversityCollege of Veterinary
Medicine,Production Animal Medicine
2
Agenda

Basics
Run a simple program
Advanced
Use more controls
Additional programming
Convergence
Hierarchical models

3
Bayesian inference Why do it?

Efron, B. (1986), Why isn't everyone a
Bayesian? (C/R p5-11 p330-331)'', The American
Statistician, 40 , 1-5
Before MCMC
Theoretical reasons
Philosophical reasons
Easier for some problems
Another tool in the statistical toolbox

4
Motivating example Estimating Prevalence

Estimate the prevalence of M. Hyo. in a swine
population.
Use a Bayesian version of the Discordant
Resolution method (Alonzo and Pepe, 1999).
63 swine were tested with TWEEN 20 and the DAKO
ELISA tests.
Subjects with discordant results were retested
using the IDEXX.

5
Example
Test positive1
6
Example Issues and fix-ups

There arent good confidence intervals for
prevalence using the DR method
Easy with the Bayesian method
Uncertainty about the test results is not
reflected in the inference
Want to model uncertainty about the resolution
2 of 3 values agreeing is not conclusive

7
Model

Each subject has a disease-positive probability
The 2 or 3 binary outcomes are sampled with this
probability
The subject disease probabilities come from a
common distribution, and their mean is the
disease prevalence.
The prevalence has a beta prior distribution, and
those parameters have hyper-priors

8
missing included in the analysis
9
Example
Posterior distribution may be intractable
Markov chain sampling (e.g., using WinBUGS)
Monte Carlo estimates
Inference, e.g.,
10
WinBUGS results
11
Prior parameters for prev
12
DR method prev 0.73
13
(No Transcript)
14
Bayesian inference for m
Prior information (dist) for m
Data, sampling distribution
turn the crank (definition of conditional dist,
joint/marg)
Posterior distribution
Posterior expected values and variances (The
dist. of a rv contains all possible info about
the rv)
15
Model parts
Data
WinBUGS
Markov chains (like data from posterior)
CODA or BOA
Inference or prediction
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
WinBUGS resources

The BUGS project
http//www.mrc-bsu.cam.ac.uk/bugs/
CODA or BOA
Mailing list
Helpful hints and tricks
Obtaining WinBUGS
Get the key to unlock the student version

20
WinBUGS Development web-site

The site is designed to facilitate the
distribution of various (free) extensions of
WinBUGS,
allow users to implement their own specialized
functions by 'hard-wiring' them into the software
via compiled Pascal code
The home-page for the new web-site
ishttp//homepages.tesco.net/creeping_death/

21
WinBUGS

Model specification language rather than a
programming language

22
WinBUGS

Need only the data and model
MCMC diagnostics
Trace plots
Autocorrelation
B-G-R plots
It will output in CODA format
Output text files
Use .ind and .out extensions (not .txt)

23
WinBUGS easy example

Estimate an experiment mean ?
We have some information from a prior experiment
In order to make the inference more precise, we
will include information from the prior
experiment in the inference for ?

24
Example
Sampling distribution
parameter of interest
Prior distribution
The mean and standard error from a prior
experiment
Prior distribution with non-informative
parameters
25
Distributions in WinBUGS

Provides many distributions
It is possible to use arbitrary distributions
See the BUGS project web page for the technique

26
(No Transcript)
27
model
body
Data and initial values
28
model
Sampling distribution
Prior distributions
Standard deviation of y
29
model
is is distributed as
Distributions have ds
less than dash is assignment
30
model
Loops are S-Plus syntax. Indexing can be nested
(yxi) and multivariate (yi,j)
Data and initial values in S-Plus format
Not all parameters need user specified inits
31
1 2 3
4 5
32
Running an example

Model -gt specification tool
Load model
Load data
Compile
Load inits

33
Running an example

Model -gt Update tool
Burn in
Inference -gt samples
Choose parameters to monitor
Update tool to continue iterating

34
Running an example

Run a basic program
Burn in, thinning, over relax
Monitoring and clearing variables
AutoC button
Stop and start a program during a run

35
Running an example

Run with multiple chains
Eyeball convergence
Auto corr
Trace plots
Quantiles
GR convergence

36
End Part I
37
Advanced WinBUGS
38
Editing Graphics

Edit -gt object properties
Title
Axis bounds
Fonts
margins
Smoothing

39
Editing text

Attributes menu
Text menu

40
Scripts

Set of Scripting commands that circumvent the
button interface
Files (.odc or .txt)
Script file (commands)
WinBUGS program
Data file(s)
Initial value file(s)
Can run in the background

41
Scripts

See the WinBUGS help file for the list of
commands
Run the Normal example
Use forward / slashes in file names!!!
check('c/Projects/talks/DavisISVEE/Normalexample.
odc')

42
Indexing

Similar to Splus
Yi,j,k
Yzi
Useful for ragged arrays
Vector for the response (Y) and a corresponding
group identification vector (z)

z has discrete values
43
Indexing ragged array
44
Indexing ragged arrayWinBUGS
For (i in 1N) Yzi
dnorm(muzi,tauzi)
45
Censoring

Restrict variables
Y dnorm(a,s)I(0,3)
Y dgamma(a,s)I(,X)

Parameter
46
Output analysis

Convergence (in a coming lecture)
Diagnostics and summaries
Data analysis and inference
Histograms
Auto and cross corrrelation
Summary statistics and probability intervals

47
Output analysis
Histogram, adjustable smoothing
(easily cut and pasted into Excel, which is
useful when investigating many similar parameters)
48
Output analysis ranks

Ranking institutions, surgeons, farms, regions is
relatively easy in WinBUGS
The idea is to assign ranks at each iteration of
the MC generation and then do Monte Carlo on the
ranks

49
Example Swine farm size by state

NAHMS is a United States Department of
Agriculture sponsored program to collect, analyze
and disseminate information about animal
(primarily production animal) health, management,
and productivity.
The swine surveys are limited to swine farms and
to states that produce the majority of swine. In
1995, 418 swine farms from 16 states (these
states accounted for 91 of the total US hog and
pig production in 1995) were surveyed with
questionnaires.
The number of farms surveyed in each state is
roughly proportional to the number of swine farms
in the state. One survey question asks for the
swine farm sizes (total number of hogs and pigs
on June 1, 1995).

50
The model
51
NAHMS data and ranks
52
WinBUGS data input

Splus array syntax
But indices are in a different order
Rectangular format
Uses separate files
Can mix rectangular and Splus formats

53
Splus array notation
WinBUGS reads data into an array by filling the
right-most index first (columns) S-Plus fills
the left-most index first (rows).
1 1 - 1 1 2 - 2 1 3 - 3 2 1 - 4 2 2 - 5 2 3 - 6
1 1 - 1 2 1 - 2 1 2 - 3 2 2 - 4 1 3 - 5 2 3 - 6
c(1,2,3,4,5,6) 2 x 3 matrix
54
Splus notation
Y structure(.Data c(1,2,3,4,6),.Dim c(2,3) )
Splus (rows first) 1 3 5 2 4 6
WinBUGS (columns first)
1 2 3 4 5 6
55
Splus notation

Use Splus to export data to a text file
dput(Y,mydata.txt)
Cut and paste into WinBUGS document
Edit the .Dim statement

56
Notation example
Y 1 2 3 4 5 6
dput(Y,mydata.txt)
list(Ystructure(.Datac(1,4,2,5,3,6),.Dim(2,3)))
list(Ystructure(.Datac(1,4,2,5,3,6),.Dim(3,2)))
Y 1 4 2 5 3 6
Y (no fix) 1 4 2 5 3 6
Write WB code reflecting new shape, 3x2
57
Additions/corrections to WinBUGS 1.4 manual

Model specification/Formatting of data When
importing matrix data from Splus version 6 or
above using the dput' command, you will need to
replace Splus's nrow' and ncol' by a .Dim'
statement. The ordering of the dimensions remains
as in the manual. The procedure for arrays
remains as in the manual, except that Splus's
inverted commas and .Dimnames' statement should
be removed.

58
Rectangular format
Put data in a separate .odc file Click on the
file when loading data
First index is always empty
Age 47 23 30 19
Y,1 Y,2 Y,3 1 2 3
4 5 6
59
WinBUGS error messages

Traps
MCMC cant update
Syntax errors
Incorrect indexing
Compile errors
Wrong bracketing
Data and index dont match

60
WinBUGS error messages

Most are given in lower left corner
Traps
Not enough information to update (prior too vague
Numerical instability
Sampling impossible values (e.g. outside of a
uniform dist.
Click update button and try one more time

61
(No Transcript)
62
Multiple chains

Uses different initial values in different list
statements
Indicate the number of chains in the model
specification dialog box

63
Exporting results

Cut and paste graph and output windows
May not be presentation quality
Can paste summary tables into Excel
Export chains to Splus using CODA button

64
Coda button

Generates two windows with data that can be saved
as text files and then imported
directly using CODA
Into Splus
Index window
Indicates the position of the variates in the
Variate window
One long column

65
(No Transcript)
66
Coda button

Important to have the correct extensions
WB will save them as .odc but
CODA needs .ind and .out
Splus needs .txt
Reshape the column after importing to Splus so
that the columns correspond to parameters

67
Using WinBUGS examples

Many examples
Pick one similar to your problem
Use that code as a basis

68
Model Assessment

Carlin and Louis (2000)
Conditional predictive ordinate method
Leave-one-out cross validation approach
Easy in WinBUGS because missing values are easy
to accommodate in WinBUGS
Generate the posterior predictive distribution of
the left-out data point
Is the observed data consistent with the
predictive distribution?

69
Model Selection

Bayes Factor
Ratio of marginal likelihoods
Can be done in WinBUGS
Model index becomes a new parameter

70
MCMC and Convergence
71
Bayesian Example

Want to estimate the mean ? of an experiment
We have some information from a previous
experiment
In order to make the inference more precise, we
will include information from the prior
experiment in the estimate of ?

72
Bayesian Example
Sampling distribution
parameter of interest
Prior distribution
The mean and standard error from a prior
experiment
Prior distribution with non-informative
parameters
73
Example (Bayes theorem)turn the crank
Suppose one of these integrals is not
analytically tractable
Posterior distribution of the parameter of
interest
74
Example

We need to do numerical integration to evaluate
the integrals
Many methods, e.g., importance sampling
Need one that can be used with many parameters
(for modern problems)
A sample from f would give an estimate of f

75
Example
Intractable
Markov chain sampling (e.g., using WinBugs)
Monte Carlo estimates
Inference, e.g.,
76
Monte Carlo estimates

Use the (chain) sample variance unless there is
autocorrelation
Geweke has a fix up
Use percentiles for probability intervals
2.5 and 97.5
But check for multimodal
Histograms for densities

77
Monte Carlo
PI boundaries may not be symmetric!
78
Examples of Markov chain methods

There are many MCMC algorithms
Gibbs sampler
Metropolis algorithm
Reversible jump MCMC
MCMCMC
Grid based MCMC
Grouped, Collapsed
Gilks Wild algorithm

79
Markov chain methods

Dont need to know these methods to do Bayesian
inference
WinBUGS uses several of them in a black box

80
History Generating Markov chain with the Gibbs
sampler
For each parameter, find the complete conditional
and then sample iteratively to generate a Markov
chain
81
Example Gibbs sampler

Many parameters mean many complete conditionals
Some help using group sampling (e.g., mean vector
of a MVN)

82
Example Gibbs sampler
?(3)
?(4)
?(2)
?(0)
?(1)
? 2(2)
? 2(3)
? 2(4)
?2(0)
? 2(1)
This chain is an approximately random sample from
83
Example Gibbs sampler

Need find the closed form for each conditional
and then be able to sample them.
For many problems most complete conditionals are
tractable (e.g., can find the normalizing
constant). Otherwise use the Gilks - Wild
algorithm (or other numerical methods) for sub
sampling

84
Example Gibbs sampler

After the chains are generated check convergence.
Then Monte Carlo methods are used for inference.
Model diagnostics (required for any modeling)
The above three steps are also required when
using WinBUGS

85
Convergence of Markov Chains
86
Convergence
Operationally, effective convergence of Markov
chain simulation has been reached when inferences
for quantities of interest do not depend on the
starting point of the simulations (Brooks and
Gelman, 1998)
87
Contours of the joint target distribution
? 2(0)
? 2(1)
? 2 axis
? (1)
? (2)
? axis
?(0)
88
Convergence

Want to sample under the mode(s) and tails of the
target distribution, in the correct proportion
Use burn in (initial run to reach the target
distribution)
Use remaining variates (with thinning) for
inference. Thinning is the term for keeping
every kth variate for inference
This can all be done in WinBUGS

89
Convergence

Slow convergence due to autocorrelation among
variates
Poor choice of initial values (in flat tails of
dist.) may cause slow or no convergence
Multiple modes
Very flat posterior distribution will cause chain
to wander
Lack of parameter identifiability (Eberly
Carlin 1999)

90
Convergence identifiable parametersFrom C L

Ironically, the most common source of MCMC
convergence difficulties is a result of the
methodologies own power. The MCMC approach is so
generally applicable and easy to use that the
class of candidate models for a given data set
now appears limited only by the users
imagination. However, with this generality has
come the temptation to fit models so large that
their parameters are unidentified, or nearly so.

91
Unidentified
With flat priors on the µs, only the sum is
identified by the data
The Gibbs sampler will not reveal this problem,
as the complete conditionals are both normal
distributions (C L, 2000)
92
Identifiability
93
Screening example
(1-sp)
94
P converges, but the others do not
sensitivity
Apparent prev
True prev
specificity
95
Convergence Tests

Cowles, and Carlin (1996), Markov chain Monte
Carlo convergence diagnostics A comparative
review'', JASA
Brooks - Gelman - Rubin test (multiple chains)
Gewekes test
Thick felt tip pen test
Many others
Should also check autocorrelation

96
Trace Plot for µ - 2000 iterations
Look for patterns, or increasing/decreasing
97
Thick felt tip pen test
98
No ConvergenceThe Markov chain may be wandering
around in the tails or have a flat distribution.
Lack of noise indicates possibly large
autocorrelation
99
Autocorrelation lag
100
Autcorrelation
101
Small autocorrelation
102
Autocorrelation (over relaxation)
Note the small cyclic pattern - probably doesnt
matter
103
Mild autocorrelation may be removed by selecting
the thinning interval. However it is best to
avoid discarding iterations
There are Monte Carlo estimates of variation that
account for autocorrelation
104
Gewekes test

Compares the first part of a chain with the last
part in modified two group t-test
Recommends first 10 and last 50 of the chain
Standard errors are based on spectral density
(time series) estimates

105
Gewekes test
106
Brooks-Gelman-Rubin test (in WB)

Uses multiple chains for the same parameter with
different starting points selected from an
overdispersed guess of the target distribution
Use 10 (?), more if multimodal (Cowles and
Carlin, 1996)
WinBUGS is slower with more chains

107
Brooks - Gelman - Rubin test

ANOVA compares pooled chain variance to within
chain variance
Are chains far apart compared to the internal
variability?

108
B-G-R convergence
Between variance is larger than expected relative
to within variance at the beginning of the plot.
Looks OK here
109
Brooks - Gelman - Rubin test

Initially pooled chain variance is greater than
within chain if initial values are overdispersed

110
GR diag button in WinBUGS

R length pooled interval of a parameter / mean
length of within seq. interval
The width of the central 80 (empirical) interval
of the pooled runs is green,
the average width of the 80 intervals within the
individual runs is blue
ratio R ( pooled/within ) is red

111
GR in WinBUGS

for plotting purposes the pooled and within
interval widths are normalized to have an overall
maximum of one.
For convergence
Need convergence of R to 1,
and stability for pooled and within

112
BGR plot
113
GR Plot and output
Control click to get data dump
114
Multimodal
iteration
115
Means of a Mixture model with non- mixed data
116
Assessing convergence with the posterior tails

Markov chain methods sample from the posterior
distribution
Tail values are sampled less often
If tail probabilities have not converged, then
the chain doesnt represent the posterior
distribution

117
Stats
Quantiles (2.5, 50, 97.5)
Look for convergence in the tails (tail quantiles)
118
Hierarchical models
119
Hierarchical models

Chapter 5 in Bayesian Data Analysis, Gelman,
Carlin, Stern and Rubin
multiple parameters that can be regarded as
related or connected in some way by the structure
of the problem, implying that a joint probability
model should reflect the dependence among them.

120
Example 1 (Gelman)

Let pj be the survival probability of cardiac
patients at hospital j, j1,,K
It may be reasonable to expect that the estimates
of pj, which represent a sample of hospitals,
should be related to each other.
That is, the pj come from a common distribution

121
Example 2 (cluster effect)

The randomization of an experiment occurs at a
different level than the observational units.
Randomize sows to treatment and control
Observe piglet mortality
But the piglets were not randomized
Correlated within sow
The sow level parameters (within treatment) are
related

122
H-M
Sow population
Random sample of 4 sows
treatment
Control
Litter 1
Litter 4
Litter 2
Litter 3
0 1 0 0 0 1 1 1
0 1 1 1
123
H-M
Hyperpriors
The sow probabilities come from a common
distribution. Think of the sow probs. as data,
and use the data to update the parameters
Piglet j from sow i has probability pig of
survival (g is group)
124
H-M

Note that this model is also called a mixed
effects model.
The treatment group is the fixed effect
The sow effect is random
Test the group effect while accounting for the
within sow correlation

125
H-M inference for group effect

Recall that inference is through posterior
distributions
R1 lt- a1/(a1 ß1)
R2 lt- a2/(a2 ß2)
Generate Markov chains for R1- R2
Pr(R1- R2gt0)

126
H-M hyperpriors

May be able to elicit values from experts
This may be less clear if the hierarchy is large
Empirical Bayes
Use the data to get the hyperparameters
but then fix the PIs
Non-informative
But there may be little info in the data about
the hyperparameters
Some parameters may have difficulty converging
Is that a problem?

127
Caveat

It is sometimes difficult to determine if the
second stage (or higher) is correct, that is, are
the sow parameters really related
For example, if the sows came from different
suppliers then the genetics may be different.

128
Caveat example

NAHMS example

129
Caveat example

Are the parameters for North Carolina related to
the parameters for the other states?
If not
model separately
If uncertain
use robust methods

130
H-M A little theory

Normal models
Normal likelihood
Normal prior on the means
for(i in 1k)
ybari dnorm(mui,tau)
mui dnorm(theta,omega
theta dnorm(alpha,beta)
Assume precisions are known

131
H-M A little theory

In many cases, primary interest is in the mui
or a function of the mui
Hospitals in the Gelman example
States in the NAHMS example
Small area estimation

132
If theta is known (not hierarchical)

The posterior mean of mui is a weighted sum of
ybari and theta.
The weights functions of the variances

133
H-M A little theoryGiven the prior parameters
134
H-M A little theory

However, if the hyperprior is non-informative and
theta is integrated

A weighted average of all the ybari. This pools
the data from the other sources
135
Example Wes
136
References and Resources
137
References

Chen, Shao, Ibrahim (2000). Monte Carlo methods
in Bayesian Computation. Springer
Casella George (1992). Explaining the Gibbs
sampler. The American Statistician, p. 167
Smith Roberts (1993). Bayesian computation via
the Gibbs sampler and Related Markov chain
methods. JRSS-B, p. 3

138
References

Tanner (1996 3rd edition). Tools for Statistical
Inference Methods for the Exploration of
Posterior Distributions and Likelihood Functions.
Springer
Gilks, Richardson, Spiegelhalter (eds.)(1996).
Markov Chain Monte Carlo in Practice. Chapman and
Hall

139
References

Cowles, Mary Kathryn , and Carlin, Bradley P.
(1996), Markov chain Monte Carlo convergence
diagnostics A comparative review'', JASA, 91 ,
883-904
Eberly, Lynn E. , and Carlin, Bradley P. (2000),
Identifiability and convergence issues for
Markov chain Monte Carlo fitting of spatial
models'', Statistics in Medicine, 19 (17-) ,
2279-2294
. Brooks, Stephen P. , and Gelman, Andrew
(1998), General methods for monitoring
convergence of iterative simulations'', Journal
of Computational and Graphical Statistics, 7 ,
434-455

140
References

Carlin, Bradley P. , and Louis, Thomas A.
(2000), Bayes and Empirical Bayes methods for
data analysis'', Chapman Hall Ltd (London New
York)
Gelman, A. , Carlin, J. B. , Stern, H. S. , and
Rubin, D. B. (1995), Bayesian data analysis'',
Chapman Hall Ltd (London New York)

141
WinBUGS Resources