Markov Chain Monte Carlo (MCMC) Methods in Bayesian Estimation

Transcript and Presenter's Notes
1
Markov Chain Monte Carlo (MCMC) Methods in
Bayesian Estimation
  • Robert J. Mislevy
  • University of Maryland
  • March 3, 2003

2
Topics
  • A generic full Bayesian model for measurement
    models
  • Basic idea of MCMC
  • Properties of MCMC
  • Metropolis sampling
  • Metropolis-Hastings sampling

3
A full Bayesian model: A generic measurement model
  • X_ij: Response of Person i to Item j
  • θ_i: Parameter(s) of Person i
  • β_j: Parameter(s) of Item j
  • η: Parameter(s) for the distribution of the θs
  • τ: Parameter(s) for the distribution of the βs
  • Note: Exchangeability is assumed here for the θs and for the βs, i.e., all are modeled with the same prior. Later we'll incorporate additional information about people and/or items.

4
A full Bayesian model: The recursive expression of the model
  • The measurement model: item response given person and item parameters (the full factorization is reconstructed below)
  • Distributions for person parameters
  • Distributions for item parameters
  • Distribution for parameter(s) of the distributions for item parameters
  • Distribution for parameter(s) of the distributions for person parameters
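The factored joint distribution shown on the original slide is not reproduced in this transcript; a reconstruction consistent with the notation on the previous slide (labels as in the list above) is:

```latex
p(X, \theta, \beta, \eta, \tau)
  = \prod_{i}\prod_{j} p(X_{ij} \mid \theta_i, \beta_j)  % measurement model
    \times \prod_{i} p(\theta_i \mid \eta)               % person parameters
    \times \prod_{j} p(\beta_j \mid \tau)                % item parameters
    \times p(\tau)                                       % hyperprior for item-parameter distribution
    \times p(\eta)                                       % hyperprior for person-parameter distribution
```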
5
A full Bayesian model: The usual MSBNx diagram
  • Addresses just one person
  • Includes all responses for that person
  • Item parameters implicit, in the conditional
    probabilities for item responses
  • θ population distribution structure implicit, in the prior distribution for this examinee

6
A full Bayesian model: A BUGS diagram
[BUGS diagram: nodes β_j, p_ij, θ_i, τ, η, X_ij, with plates over Items j and Persons i]
  • Addresses all responses, all people, and all
    items
  • Plates for people and items
  • Item parameters explicit
  • θ population distribution structure explicit

7
A full Bayesian model: Bayes' theorem
Observe particular data x for all people and all items. We want to make inferences about the θs and βs, now conditional on x, via Bayes' theorem (see the reconstruction below). The normalizing constant is nasty.
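The slide's formula is not shown in the transcript; applying Bayes' theorem to the factored model above gives the following reconstruction, where the denominator is the normalizing constant:

```latex
p(\theta, \beta, \eta, \tau \mid x)
  = \frac{\prod_{i,j} p(x_{ij} \mid \theta_i, \beta_j)
          \,\prod_i p(\theta_i \mid \eta)
          \,\prod_j p(\beta_j \mid \tau)\, p(\eta)\, p(\tau)}
         {\displaystyle \int \prod_{i,j} p(x_{ij} \mid \theta_i, \beta_j)
          \,\prod_i p(\theta_i \mid \eta)
          \,\prod_j p(\beta_j \mid \tau)\, p(\eta)\, p(\tau)
          \; d\theta\, d\beta\, d\eta\, d\tau}
```

The integral in the denominator runs over all person, item, and higher-level parameters, which is why it is nasty to evaluate directly.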
8
A full Bayesian model: Bayes' theorem
  • Two strategies for drawing inferences without having to evaluate the normalizing constant:
  • Modal estimation. E.g., BILOG. At any point in the posterior for β, one can calculate the value of the likelihood and its derivative. This tells you whether the point is a maximum, and if not, what direction to step in that might get you to a higher value of the posterior.
  • Simulation-based approximation. E.g., BUGS. Devise a chain for sampling from the full conditionals (see next slide). After the chain becomes stationary, a draw for a given variable in a given cycle has the same distribution as a draw from that variable's marginal posterior. Approximate the distributions' summary statistics from many such draws.

9
Markov Chain Monte Carlo Estimation: The special case of Gibbs Sampling
  • Draw values from the full conditional distributions:
  • Start with a possible value for each variable in cycle 0.
  • In cycle t+1, draw each variable in turn from its full conditional, given the data and the most recent draws of all the other variables (the slide's listing of these draws is not reproduced here; see the sketch below).
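As a minimal, self-contained illustration of the idea (a toy bivariate-normal target, not the measurement model; all names are illustrative, not from the presentation), here is a Gibbs sampler in Python:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_cycles=5000, seed=0):
    """Gibbs sampler for (z1, z2) ~ bivariate normal, means 0, variances 1, correlation rho.

    Each full conditional is normal: z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically for z2.
    """
    rng = np.random.default_rng(seed)
    z1, z2 = 0.0, 0.0                    # cycle-0 starting values
    draws = np.empty((n_cycles, 2))
    sd = np.sqrt(1.0 - rho ** 2)
    for t in range(n_cycles):
        # Draw each variable from its full conditional, given the most recent value of the other.
        z1 = rng.normal(rho * z2, sd)
        z2 = rng.normal(rho * z1, sd)
        draws[t] = (z1, z2)
    return draws

samples = gibbs_bivariate_normal(rho=0.9)
print(samples[1000:].mean(axis=0))       # summaries computed after discarding burn-in cycles
```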

10
Markov Chain Monte Carlo Estimation: Generalizations of Gibbs Sampling
  • Don't need to go in the same order every cycle
  • Don't need to hit every variable in every cycle
  • Can sample in blocks of parameters (e.g., the three item parameters of each item in the 3PL IRT model)
  • Don't need to sample from the exact full conditional; can do, for example, Metropolis or Metropolis-Hastings approximations within cycles

11
Properties of MCMC (1)
  • Draws in cycle t+1 depend on values in cycle t, but given them, not on previous cycles: the Markov property of no memory.
  • Dependence on previous values introduces autocorrelation across cycles. Its strength depends on the problem structure and the amount of data.
  • Under regularity conditions (e.g., the chain can cover the space, or get from any point to any other point), dependence on starting values is forgotten after a sufficiently long run. Hence:
  • "Burn-in" cycles are left out of summary calculations.
  • Run multiple chains from different, over-dispersed starting values, to see if they look like they're sampled from the same stationary distribution.
  • Gelman-Rubin convergence diagnostics in BUGS (like ANOVAs); see the sketch below.
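As a rough illustration of the ANOVA-like idea behind the Gelman-Rubin diagnostic, the following sketch computes a potential scale reduction factor (R-hat) from several chains. This is a simplified textbook version, not BUGS's exact implementation, and the names are illustrative.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (m, n) -- m chains, each with n post-burn-in draws.
    Compares between-chain variance (B) to within-chain variance (W), like an ANOVA.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)              # values near 1 suggest convergence

# Example: chains that really are draws from the same distribution give R-hat close to 1.
rng = np.random.default_rng(1)
print(gelman_rubin_rhat(rng.normal(size=(3, 2000))))
```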

12
Properties of MCMC (2)
  • An example of a violation of the regularity conditions:
  • A Heywood case in a factor analysis run. A prior was needed on the factor loadings that bounded them away from 1 and -1.

13
Properties of MCMC (3)
  • "Mixing" refers to how much the draws for a given parameter can move around the space each cycle. More autocorrelation goes along with poorer mixing.
  • Better mixing means the same number of cycles provides more information about the posterior, the ceiling being independent draws from the posterior. Worse mixing means more cycles are needed for (a) burn-in and (b) a given level of precision for statistics of the posteriors.

[Trace plots: relatively bad mixing vs. relatively good mixing]
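One way to see mixing quantitatively is the lag-k autocorrelation of a chain. The sketch below (illustrative, not from the presentation) computes it for a single parameter's draws; values near 0 at small lags indicate good mixing, values near 1 indicate poor mixing.

```python
import numpy as np

def autocorrelation(draws, lag):
    """Sample autocorrelation of an MCMC chain at a given lag (lag >= 1).

    High autocorrelation at moderate lags indicates poor mixing: successive
    draws carry little new information about the posterior.
    """
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(2)
good = rng.normal(size=5000)                     # independent draws: the ceiling
bad = np.cumsum(rng.normal(size=5000)) * 0.05    # highly autocorrelated random walk
print(autocorrelation(good, 1), autocorrelation(bad, 1))
```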
14
Metropolis and Metropolis-Hastings sampling
within Gibbs (1)
  • In straight Gibbs sampling, you draw from the full conditional posterior for each parameter each cycle.
  • Great when they are in a familiar form to sample from, but sometimes they aren't.
  • Metropolis and Metropolis-Hastings (MH) are alternatives that can be used within Gibbs sampling, when the full conditional can be computed, but can't be sampled from directly.

15
Metropolis and Metropolis-Hastings sampling
within Gibbs (2)
  • Basic idea: Draw from a different distribution that you can both compute AND sample from: the proposal distribution. Draws from the proposal distribution are either accepted, or they are rejected and the value of this variable in the next cycle of the Gibbs sampler remains the same.
  • Almost any proposal distribution will work, as long as it is defined over the right range.
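A minimal sketch of one such accept/reject step for a single variable z, assuming only that its full conditional can be evaluated up to a constant. The standard-normal target and all names are illustrative, not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_full_conditional(z):
    """Unnormalized log full conditional for z (illustrative: standard normal)."""
    return -0.5 * z ** 2

def metropolis_step(z_t, proposal_sd=1.0):
    """One Metropolis step with a symmetric normal proposal centered at z_t."""
    y = rng.normal(z_t, proposal_sd)                  # draw from the proposal distribution
    log_ratio = log_full_conditional(y) - log_full_conditional(z_t)
    if np.log(rng.uniform()) < log_ratio:             # accept with probability min(1, ratio)
        return y                                      # accepted: y becomes z_{t+1}
    return z_t                                        # rejected: the value stays the same

# Within a Gibbs cycle, this step would stand in for an exact draw from the full conditional.
chain, z = [], 0.0
for _ in range(5000):
    z = metropolis_step(z)
    chain.append(z)
print(np.mean(chain), np.std(chain))
```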

16
Metropolis and Metropolis-Hastings sampling
within Gibbs (3)
  • Popular choice: a normal distribution, with mean at the variable's previous value and some sd, which could be determined empirically.
  • Best mixing occurs when about 30-40% of the proposals are accepted. Mixing is worse when too many or too few are accepted.
  • What BUGS is doing when it says "adapting" is trying Metropolis with trial values of the sd, seeing how many proposals are accepted, then widening or narrowing the proposal distribution (see the sketch below).
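A rough, self-contained sketch of that adaptation idea (not BUGS's actual algorithm): run short batches with a trial sd, check the acceptance rate, and widen or narrow the normal proposal toward roughly 30-40% acceptance. All names and the standard-normal target are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def adapt_proposal_sd(log_target, z0, sd=1.0, n_batches=20, batch_size=100, goal=0.35):
    """Crude adaptation: adjust the normal proposal's sd toward ~30-40% acceptance."""
    z = z0
    for _ in range(n_batches):
        accepted = 0
        for _ in range(batch_size):
            y = rng.normal(z, sd)                             # proposal centered at the current value
            if np.log(rng.uniform()) < log_target(y) - log_target(z):
                z, accepted = y, accepted + 1                 # accept the proposal
        rate = accepted / batch_size
        sd *= 1.1 if rate > goal else 0.9                     # too many accepts: widen; too few: narrow
    return sd

# Illustrative target: a standard-normal full conditional (log density up to a constant).
print(adapt_proposal_sd(lambda z: -0.5 * z ** 2, z0=0.0))
```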

17
Metropolis sampling (1)
  • z: a variable in the posterior we are interested in.
  • z_t: its value in cycle t of a Gibbs sampler.
  • p(z | ...): the full conditional for z, which includes the data and the most recent draws for all other variables.
  • q(y | z_t): the proposal distribution, which we note may depend on z_t (for example, N(z_t, 1)).
  • y: a draw from the proposal distribution. Then:

18
Metropolis sampling (2)
  • The Metropolis algorithm holds when the proposal distribution is symmetric, i.e., q(y | z_t) = q(z_t | y).
  • (E.g., this is the case when the proposal distribution is normal with a specified sd and mean given by the previous value.)
  • Then accept y as z_{t+1} with probability:
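The acceptance probability on the original slide is not reproduced in the transcript; the standard Metropolis form, in the notation of slide 17, is:

```latex
\alpha = \min\!\left(1,\; \frac{p(y \mid \cdot)}{p(z_t \mid \cdot)}\right)
```

where p(· | ·) denotes the full conditional for z.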

19
Metropolis sampling (3)
  • Proposal distribution: normal with mean at the previous cycle's value

20
Metropolis sampling (4)
  • Accept this y as z_{t+1} with probability 1

21
Metropolis sampling (5)
  • Accept this y as z_{t+1} with probability 0.75

22
Metropolis-Hastings sampling
  • Extension of Metropolis sampling, in which the proposal distribution need not be symmetric, i.e., q(y | z_t) need not equal q(z_t | y).
  • Now accept y as z_{t+1} with probability given by the ratio below.
  • Simplifies to Metropolis when symmetry holds.
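The slide's formula is not reproduced in the transcript; the standard Metropolis-Hastings acceptance probability, with proposal density q, is:

```latex
\alpha = \min\!\left(1,\;
  \frac{p(y \mid \cdot)\, q(z_t \mid y)}{p(z_t \mid \cdot)\, q(y \mid z_t)}\right)
```

When q is symmetric, the q terms cancel and this reduces to the Metropolis rule.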