1
Inference on Relational Models Using Markov Chain
Monte Carlo
  • Brian Milch
  • Massachusetts Institute of Technology
  • UAI Tutorial
  • July 19, 2007

2
Example 1: Bibliographies
3
Example 2: Aircraft Tracking
[Figure: radar blips observed at successive time steps t1, t2, t3]
4
Inference on Relational Structures
[Figure: alternative relational structures linking researchers (Russell, Norvig, Roberts, Shakespeare, Seuss, ...), their papers, and the observed citation strings, each structure annotated with its probability (ranging from about 2.3 × 10^-12 down to 5.0 × 10^-20)]
5
Markov Chain Monte Carlo (MCMC)
  • Markov chain s1, s2, ... over worlds where
    the evidence E is true
  • Approximate P(Q | E) as the fraction of s1, s2, ...
    that satisfy the query Q

[Figure: space of possible worlds, with the query event Q nested inside the evidence event E]
6
Outline
  • Probabilistic models for relational structures
  • Modeling the number of objects
  • Three mistakes that are easy to make
  • Markov chain Monte Carlo (MCMC)
  • Gibbs sampling
  • Metropolis-Hastings
  • MCMC over events
  • Case studies
  • Citation matching
  • Multi-target tracking

7
Simple Example: Clustering
[Figure: observed bird wingspans plotted along an axis from 10 to 100 cm]
8
Simple Bayesian Mixture Model
  • Number of latent objects is known to be k
  • For each latent object i, have parameter μi
  • For each data point j, have object selector Cj and
    observable value Xj

9
BN for Mixture Model

[Figure: Bayesian network with parameter nodes μ1, μ2, ..., μk, selector nodes C1, C2, ..., Cn, and observation nodes X1, X2, ..., Xn; each Xj has its selector Cj and all the μi as parents]
10
Context-Specific Dependencies

[Figure: the same network with selector values filled in (e.g., C1 = 2, C2 = 1, C3 = 2); given Cj = i, Xj depends only on μi]
11
Extensions to Mixture Model
  • Random number of latent objects k, with a
    distribution p(k) such as:
  • Uniform({1, ..., 100})
  • Geometric(0.1)
  • Poisson(10)
    (these last two are unbounded!)
  • Random distribution θ for selecting objects:
    p(θ | k) = Dirichlet(α1, ..., αk)
    (a Dirichlet distribution over probability vectors)
  • Still symmetric: each αi = α/k
    (a generative sketch of this model follows below)
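A minimal generative sketch of this extended mixture in Python (hypothetical names; the Gaussian observation noise and the Uniform(10, 100) prior on the species means are assumptions for illustration, not stated on the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_mixture(n_obs, alpha=1.0, obs_sd=3.0):
        # Random number of latent species; 1 is added so k is never 0 (an assumption).
        k = 1 + rng.poisson(10)
        # Symmetric Dirichlet over the selection distribution theta: each alpha_i = alpha / k.
        theta = rng.dirichlet(np.full(k, alpha / k))
        # Per-species parameter mu_i (mean wingspan in cm) -- assumed Uniform(10, 100) prior.
        mu = rng.uniform(10.0, 100.0, size=k)
        # Selector C_j for each data point, then the observed value X_j.
        c = rng.choice(k, size=n_obs, p=theta)
        x = rng.normal(mu[c], obs_sd)
        return k, theta, mu, c, x

    k, theta, mu, c, x = sample_mixture(20)
    print(k, np.round(x, 1))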
12
Existence versus Observation
  • A latent object can exist even if no observations
    correspond to it
  • A bird species may not have been observed yet
  • An aircraft may fly over without yielding any blips
  • Two questions:
  • How many objects correspond to observations?
  • How many objects are there in total?
  • Observed 3 species, each 100 times: probably no
    more exist
  • Observed 200 species, each 1 or 2 times: probably
    more exist

13
Expecting Additional Objects
[Figure: r observed species so far; will we ever observe more later?]

  • P(ever observe a new species | seen r so far) is
    bounded by P(k > r)
  • So as the number of species observed → ∞, the
    probability of ever seeing more → 0
  • What if we don't want this?

14
Dirichlet Process Mixtures
  • Set k = ∞; let θ be an infinite-dimensional
    probability vector with a stick-breaking prior
    (sketched below)
  • Another view: define the prior directly on partitions
    of data points, allowing an unbounded number of
    blocks
  • Drawback: can't ask about the number of unobserved
    latent objects (always infinite)

[Ferguson 1983; Sethuraman 1994; tutorials: Jordan 2005, Sudderth 2006]
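A short sketch of the stick-breaking construction behind that prior (truncated to a finite number of weights purely for illustration; the real θ is infinite-dimensional):

    import numpy as np

    rng = np.random.default_rng(0)

    def stick_breaking(concentration, truncation):
        # Break off a Beta(1, concentration) fraction of the remaining stick at each
        # step; the lengths of the pieces are the mixing weights theta_1, theta_2, ...
        betas = rng.beta(1.0, concentration, size=truncation)
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
        return betas * remaining

    theta = stick_breaking(concentration=1.0, truncation=50)
    print(theta[:5], theta.sum())  # the sum is just under 1; the unkept tail holds the rest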
15
Outline
  • Probabilistic models for relational structures
  • Modeling the number of objects
  • Three mistakes that are easy to make
  • Markov chain Monte Carlo (MCMC)
  • Gibbs sampling
  • Metropolis-Hastings
  • MCMC over events
  • Case studies
  • Citation matching
  • Multi-target tracking

16
Mistake 1: Ignoring Interchangeability
  • Which birds are in species S1?
  • Latent object indices are interchangeable
  • Posterior on the selector variable CB1 is uniform
  • Posterior on μS1 has a peak for each cluster of
    birds
  • What we really care about is the partition of observations
  • A partition with r blocks corresponds to k! /
    (k−r)! instantiations of the Cj variables

[Figure: partition {B1, B3}, {B2}, {B4, B5} of the observed birds, together with some of the selector instantiations (CB1, ..., CB5) that induce it: (1, 2, 1, 3, 3), (1, 2, 1, 4, 4), (1, 4, 1, 3, 3), (2, 1, 2, 3, 3), ...]
17
Ignoring Interchangeability, Cont'd
  • Say k = 4. What's the prior probability that B1, B3
    are in one species and B2 is in another?
  • Multiply the probabilities for CB1, CB2, CB3:
    (1/4) × (1/4) × (1/4)?
  • Not enough! The partition {B1, B3}, {B2}
    corresponds to 12 instantiations of the Cs
  • A partition with r blocks corresponds to kPr
    instantiations (see the numeric check below)
(S1, S2, S1), (S1, S3, S1), (S1, S4, S1), (S2, S1, S2), (S2, S3, S2), (S2, S4, S2),
(S3, S1, S3), (S3, S2, S3), (S3, S4, S3), (S4, S1, S4), (S4, S2, S4), (S4, S3, S4)
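A small numeric check of this counting argument (a sketch; it assumes each selector Cj is uniform over the k species, as on the slides):

    from math import perm

    def partition_prior(block_sizes, k):
        # Prior probability of a partition with r blocks over n observations:
        # (number of selector instantiations inducing it) x (1/k)^n = kPr / k^n.
        n = sum(block_sizes)
        r = len(block_sizes)
        return perm(k, r) / k ** n

    # Partition {B1, B3}, {B2} with k = 4:
    print(partition_prior([2, 1], k=4))  # 12 / 64 = 0.1875
    print((1 / 4) ** 3)                  # 0.015625 -- the "not enough" single-instantiation answer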
18
Mistake 2: Underestimating the Bayesian Ockham's
Razor Effect
  • Say k = 4. Are B1 and B2 in the same species?
  • Maximum-likelihood estimation would yield one
    species with μ = 50 and another with μ = 52
  • But the Bayesian model trades off likelihood against
    the prior probability of getting those μ values
[Figure: two observations, XB1 = 50 and XB2 = 52, on the wingspan axis (10-100 cm)]
19
Bayesian Ockham's Razor

[Figure: XB1 = 50 and XB2 = 52 on the wingspan axis (10-100 cm)]

H1: Partition is {B1, B2}
    → 1.3 × 10^-4
H2: Partition is {B1}, {B2}
    → 7.5 × 10^-5

Don't use more latent objects than necessary to
explain your data
[MacKay 1992]
20
Mistake 3: Comparing Densities Across Dimensions

[Figure: XB1 = 50 and XB2 = 52 on the wingspan axis (10-100 cm)]

H1: Partition is {B1, B2}, μ = 51
    → density 1.5 × 10^-5
H2: Partition is {B1}, {B2}, μB1 = 50, μB2 = 52
    → density 4.8 × 10^-7

H1 wins by a greater margin
21
What If We Change the Units?
[Figure: XB1 = 0.50 and XB2 = 0.52 on the wingspan axis (0.1-1.0 m)]

H1: Partition is {B1, B2}, μ = 0.51
    → density 15
H2: Partition is {B1}, {B2}, μB1 = 0.50, μB2 = 0.52
    → density 48

Now H2 wins by a landslide
22
Lesson: Comparing Densities Across Dimensions
  • Densities don't behave like probabilities (e.g.,
    they can be greater than 1)
  • Heights of density peaks in spaces of different
    dimension are not comparable
  • Work-arounds:
  • Find the most likely partition first, then the most
    likely parameters given that partition
  • Find the region in parameter space where most of the
    posterior probability mass lies

23
Outline
  • Probabilistic models for relational structures
  • Modeling the number of objects
  • Three mistakes that are easy to make
  • Markov chain Monte Carlo (MCMC)
  • Gibbs sampling
  • Metropolis-Hastings
  • MCMC over events
  • Case studies
  • Citation matching
  • Multi-target tracking

24
Why Not Exact Inference?
  • The number of possible partitions is superexponential
    in n
  • Variable elimination?
  • Summing out μi couples all the Cj's
  • Summing out Cj couples all the μi's

25
Markov Chain Monte Carlo (MCMC)
  • Start in an arbitrary state (possible world) s1
    satisfying the evidence E
  • Sample s2, s3, ... according to a transition kernel
    T(si, si+1), yielding a Markov chain
  • Approximate p(Q | E) by the fraction of s1, s2, ..., sL
    that are in Q (see the sketch below)

[Figure: space of possible worlds, with the query event Q nested inside the evidence event E]
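A minimal sketch of this estimator in Python (hypothetical interface: transition(s) draws the next state from a kernel whose stationary distribution is p(s | E), and in_query(s) tests membership in Q):

    def mcmc_estimate(initial_state, transition, in_query, num_samples):
        # Run the chain from a state satisfying E and report the fraction
        # of visited states that fall in the query event Q.
        state = initial_state
        hits = 0
        for _ in range(num_samples):
            state = transition(state)
            hits += in_query(state)
        return hits / num_samples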
26
Why a Markov Chain?
  • Why use a Markov chain rather than sampling
    independently?
  • Stochastic local search for high-probability s
  • Once we find such an s, explore around it

27
Convergence
  • A stationary distribution π is one such that
    π(s′) = Σs π(s) T(s, s′) for all s′
  • If the chain is ergodic (it can get anywhere from
    anywhere, and it's aperiodic), then:
  • It has a unique stationary distribution π
  • The fraction of s1, s2, ..., sL in Q converges to
    π(Q) as L → ∞
  • We'll design T so that π(s) = p(s | E)
28
Gibbs Sampling
  • Order the non-evidence variables V1, V2, ..., Vm
  • Given state s, sample from T as follows
    (see the sketch below):
  • Let s′ = s
  • For i = 1 to m:
  • Sample vi′ from p(Vi | s′−i)
    (the conditional for Vi given the other variables in s′)
  • Let s′ = (s′−i, Vi = vi′)
  • Return s′
  • Theorem: the stationary distribution is p(s | E)

[Geman & Geman 1984]
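A sketch of one systematic-scan Gibbs sweep matching the pseudocode above (hypothetical interface: sample_conditional(v, state) draws from p(Vi | s′−i)):

    def gibbs_sweep(state, variables, sample_conditional):
        # One Gibbs transition: resample each non-evidence variable in turn,
        # conditioning on the current values of all the others.
        new_state = dict(state)
        for v in variables:
            new_state[v] = sample_conditional(v, new_state)
        return new_state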
29
Gibbs on Bayesian Network
  • The conditional for V depends only on the factors that
    contain V
  • So condition on V's Markov blanket mb(V):
    parents, children, and co-parents

[Figure: BN fragment highlighting a node V and its Markov blanket]
30
Gibbs on Bayesian Mixture Model
  • Given the current state s:
  • Resample each μi given its prior and {Xj : Cj = i
    in s} (its context-specific Markov blanket)
  • Resample each Cj given Xj and μ1, ..., μk

[Neal 2000]
31
Sampling Given Markov Blanket
  • If V is discrete, just iterate over its values,
    normalize, and sample from the discrete distribution
  • If V is continuous:
  • Simple if the child distributions are conjugate to
    V's prior: the posterior has the same form as the
    prior with different parameters (see the sketch below)
  • In general, even sampling from p(v | s−V) can be
    hard

See the BUGS software: http://www.mrc-bsu.cam.ac.uk/bugs
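For the wingspan example, a sketch of the conjugate case: with a Normal prior on μi and Normal observations of known variance, the Markov-blanket conditional is again Normal. The prior mean/variance and observation variance below are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def resample_mu(assigned_x, prior_mean=55.0, prior_var=400.0, obs_var=9.0):
        # Standard Normal-Normal update: precision-weighted combination of the
        # prior mean and the observations currently assigned to this species.
        n = len(assigned_x)
        post_var = 1.0 / (1.0 / prior_var + n / obs_var)
        post_mean = post_var * (prior_mean / prior_var + np.sum(assigned_x) / obs_var)
        return rng.normal(post_mean, np.sqrt(post_var))

    print(resample_mu(np.array([50.0, 52.0])))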
32
Convergence Can Be Slow
[Figure: current state with μ1 = 20 and μ2 = 90; species 2 is far away
from the data, which should form two clusters along the wingspan axis (10-100 cm)]

  • The Cj's won't change until μ2 is in the right area
  • μ2 does an unguided random walk as long as no
    observations are associated with it
  • Especially bad in high dimensions

33
Outline
  • Probabilistic models for relational structures
  • Modeling the number of objects
  • Three mistakes that are easy to make
  • Markov chain Monte Carlo (MCMC)
  • Gibbs sampling
  • Metropolis-Hastings
  • MCMC over events
  • Case studies
  • Citation matching
  • Multi-target tracking

34
Metropolis-Hastings
[Metropolis et al. 1953; Hastings 1970]
  • Define T(si, si+1) as follows (see the sketch below):
  • Sample s′ from a proposal distribution q(s′ | s)
  • Compute the acceptance probability
    α = min(1, [p(s′ | E) / p(s | E)] × [q(s | s′) / q(s′ | s)])
    (relative posterior probabilities × backward/forward
    proposal probabilities)
  • With probability α, let si+1 = s′;
    else let si+1 = si

One can show that p(s | E) is the stationary distribution
for T
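A sketch of one Metropolis-Hastings transition (hypothetical interface: log_p returns the unnormalized log of p(s | E), and propose returns a proposed state along with the forward and backward log proposal probabilities):

    import math
    import random

    def mh_step(state, log_p, propose):
        # Acceptance probability: min(1, [p(s')/p(s)] * [q(s | s') / q(s' | s)]),
        # computed in log space for numerical stability.
        proposed, log_q_fwd, log_q_back = propose(state)
        log_alpha = (log_p(proposed) - log_p(state)) + (log_q_back - log_q_fwd)
        if math.log(random.random()) < min(0.0, log_alpha):
            return proposed  # accept
        return state         # reject: the chain stays where it is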
35
Metropolis-Hastings
  • Benefits:
  • The proposal distribution can propose big steps
    involving several variables
  • Only need to compute the ratio p(s′ | E) / p(s | E),
    ignoring normalization factors
  • Don't need to sample from conditional distributions
  • Limitations:
  • Proposals must be reversible, else q(s | s′) = 0
  • Need to be able to compute q(s | s′) / q(s′ | s)

36
Split-Merge Proposals
  • Choose two observations i, j (see the sketch below)
  • If Ci = Cj = c, then split cluster c:
  • Get an unused latent object c′
  • For each observation m such that Cm = c, change
    Cm to c′ with probability 0.5
  • Propose new values for μc, μc′
  • Else merge the clusters Ci and Cj:
  • For each m such that Cm = Cj, set Cm = Ci
  • Propose a new value for the merged cluster's parameter

[Jain & Neal 2004]
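A sketch of the proposal's assignment change in Python (a simplified illustration of the recipe above; C is a dict from observations to cluster labels, and the caller still proposes new parameters, computes the proposal ratio, and applies the Metropolis-Hastings test):

    import random

    def split_merge_assignments(C, i, j, unused_cluster):
        new_C = dict(C)
        if C[i] == C[j]:
            # Split: every member of the shared cluster moves to the fresh
            # cluster independently with probability 0.5.
            c = C[i]
            for m, cm in C.items():
                if cm == c and random.random() < 0.5:
                    new_C[m] = unused_cluster
        else:
            # Merge: absorb j's cluster into i's cluster.
            ci, cj = C[i], C[j]
            for m, cm in C.items():
                if cm == cj:
                    new_C[m] = ci
        return new_C

    print(split_merge_assignments({'B1': 'S1', 'B2': 'S1', 'B3': 'S1', 'B4': 'S2'},
                                  i='B1', j='B2', unused_cluster='S3'))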
37
Split-Merge Example
[Figure: state with μ1 = 20 and μ2 = 90; after the split, μ2 is resampled
to 27, near the two split-off birds on the wingspan axis (10-100 cm)]

  • Split two birds off from species 1
  • Resample μ2 to match these two birds
  • The move is likely to be accepted

38
Mixtures of Kernels
  • If T1, ..., Tm all have stationary distribution π,
    then so does any mixture Σi λi Ti (λi ≥ 0, Σi λi = 1)
  • Example: a mixture of split-merge and Gibbs moves
    (see the sketch below)
  • Point: faster convergence

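A sketch of a mixture kernel (each element of kernels maps a state to a next state and is assumed to leave p(s | E) invariant):

    import random

    def mixed_kernel(state, kernels, weights):
        # Pick one component kernel at random and apply it; if every component
        # has the target as its stationary distribution, so does the mixture.
        kernel = random.choices(kernels, weights=weights, k=1)[0]
        return kernel(state)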
39
Outline
  • Probabilistic models for relational structures
  • Modeling the number of objects
  • Three mistakes that are easy to make
  • Markov chain Monte Carlo (MCMC)
  • Gibbs sampling
  • Metropolis-Hastings
  • MCMC over events
  • Case studies
  • Citation matching
  • Multi-target tracking

40
MCMC States in Split-Merge
  • States are not complete instantiations!
  • No parameters for unobserved species
  • States are partial instantiations of random
    variables
  • Each state corresponds to an event: the set of
    outcomes satisfying its description

{k = 12, CB1 = S2, CB2 = S8, μS2 = 31, μS8 = 84}
41
MCMC over Events
[Milch & Russell 2006]
  • Markov chain over events σ, with stationary
    distribution proportional to p(σ)
  • Theorem: the fraction of visited events in Q
    converges to p(Q | E) if
  • each σ is either a subset of Q or disjoint from Q, and
  • the events form a partition of E

[Figure: evidence event E partitioned into events, some of which lie inside the query event Q]
42
Computing Probabilities of Events
  • The engine needs to compute p(σ′) / p(σn) efficiently
    (without summations)
  • Use instantiations that include all active
    parents of the variables they instantiate
  • Then the probability is just a product of CPD entries
    (see the sketch below)

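A sketch of that product in log space (hypothetical representation: instantiation maps variables to values and includes all of their active parents, and cpd(var, value, instantiation) returns the CPD entry p(var = value | active parents)):

    import math

    def log_prob_of_event(instantiation, cpd):
        # Because the instantiation is closed under active parents, its probability
        # is just the product of CPD entries -- no summations over missing variables.
        return sum(math.log(cpd(var, value, instantiation))
                   for var, value in instantiation.items())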
43
States That Are Even More Abstract
  • A typical partial instantiation:
    {k = 12, CB1 = S2, CB2 = S8, μS2 = 31, μS8 = 84}
  • It specifies particular species labels, even though
    species are interchangeable
  • Instead, let states be abstract partial instantiations:
    ∃ x ∃ y ≠ x {k = 12, CB1 = x, CB2 = y, μx = 31, μy = 84}
  • See [Milch & Russell 2006] for conditions under
    which we can compute the probabilities of such events
44
Outline
  • Probabilistic models for relational structures
  • Modeling the number of objects
  • Three mistakes that are easy to make
  • Markov chain Monte Carlo (MCMC)
  • Gibbs sampling
  • Metropolis-Hastings
  • MCMC over events
  • Case studies
  • Citation matching
  • Multi-target tracking

45
Representative Applications
  • Tracking cars with cameras [Pasula et al. 1999]
  • Segmentation in computer vision [Tu & Zhu 2002]
  • Citation matching [Pasula et al. 2003]
  • Multi-target tracking with radar [Oh et al. 2004]
46
Citation Matching Model
[Pasula et al. 2003; Milch & Russell 2006]

Researcher ~ NumResearchersPrior()
Name(r) ~ NamePrior()
Paper ~ NumPapersPrior()
FirstAuthor(p) ~ Uniform(Researcher r)
Title(p) ~ TitlePrior()
PubCited(c) ~ Uniform(Paper p)
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)))
47
Citation Matching
  • Elaboration of the generative model shown earlier
  • Parameter estimation:
  • Priors for names, titles, citation formats
    learned offline from labeled data
  • String corruption parameters learned with Monte
    Carlo EM
  • Inference:
  • MCMC with split-merge proposals
  • Guided by canopies of similar citations
  • Accuracy stabilizes after 20 minutes

[Pasula et al., NIPS 2002]
48
Citation Matching Results
Four data sets of 300-500 citations, referring
to 150-300 papers
49
Cross-Citation Disambiguation
Wauchope, K. Eucalyptus: Integrating Natural
Language Input with a Graphical User Interface.
NRL Report NRL/FR/5510-94-9711 (1994).
Is "Eucalyptus" part of the title, or is the
author named K. Eucalyptus Wauchope?
50
Preliminary Experiments Information Extraction
  • P(citation text | title, author names) modeled
    with a simple HMM
  • For each paper, recover the title, author surnames,
    and given names
  • Fraction whose attributes are recovered perfectly
    in the last MCMC state:
  • among papers with one citation: 36.1%
  • among papers with multiple citations: 62.6%

Can use inferred knowledge for disambiguation
51
Multi-Object Tracking
[Figure: detections over several time steps, including an unobserved object and a false detection]
52
State Estimation for Aircraft
Aircraft ~ NumAircraftPrior()
State(a, t)
    if t = 0 then ~ InitState()
    else ~ StateTransition(State(a, Pred(t)))
Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t))
Blip(Time = t) ~ NumFalseAlarmsPrior()
ApparentPos(r)
    if (Source(r) = null) then ~ FalseAlarmDistrib()
    else ~ ObsCPD(State(Source(r), Time(r)))
53
Aircraft Entering and Exiting
Aircraft(EntryTime = t) ~ NumAircraftPrior()
Exits(a, t)
    if InFlight(a, t) then ~ Bernoulli(0.1)
InFlight(a, t)
    if t < EntryTime(a) then = false
    elseif t = EntryTime(a) then = true
    else = (InFlight(a, Pred(t)) & !Exits(a, Pred(t)))
State(a, t)
    if t = EntryTime(a) then ~ InitState()
    elseif InFlight(a, t) then ~ StateTransition(State(a, Pred(t)))
Blip(Source = a, Time = t)
    if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t))

(plus the last two statements from the previous slide)
54
MCMC for Aircraft Tracking
  • Uses the generative model from the previous slide
    (although not with BLOG syntax)
  • Examples of Metropolis-Hastings proposals

[Figures by Songhwai Oh; Oh et al., CDC 2004]
55
Aircraft Tracking Results
[Figures: estimation error and running time as track density increases]

MCMC has the smallest error, and it hardly degrades at all
as the tracks get dense
MCMC is nearly as fast as the greedy algorithm and
much faster than MHT

[Oh et al., CDC 2004; figures by Songhwai Oh]
56
Toward General-Purpose Inference
  • Currently, each new application requires new code
    for:
  • Proposing moves
  • Representing MCMC states
  • Computing acceptance probabilities
  • Goal:
  • User specifies the model and the proposal distribution
  • General-purpose code does the rest

57
General MCMC Engine
[Milch & Russell 2006]

[Diagram: architecture of the general-purpose MCMC engine]
  • Model (in a declarative language): defines p(s)
  • Custom proposal distribution (Java class): proposes
    an MCMC state s′ given sn; computes the ratio
    q(sn | s′) / q(s′ | sn)
  • General-purpose engine (Java code): computes the
    acceptance probability based on the model; sets sn+1;
    handles arbitrary proposals efficiently using
    context-specific structure
  • MCMC states are partial worlds
58
Summary
  • Models for relational structures go beyond
    standard probabilistic inference settings
  • MCMC provides a feasible path for inference
  • Open problems
  • More general inference
  • Adaptive MCMC
  • Integrating discriminative methods

59
References
  • Blei, D. M. and Jordan, M. I. (2005) Variational inference for Dirichlet process mixtures. J. Bayesian Analysis 1(1):121-144.
  • Casella, G. and Robert, C. P. (1996) Rao-Blackwellisation of sampling schemes. Biometrika 83(1):81-94.
  • Ferguson, T. S. (1983) Bayesian density estimation by mixtures of normal distributions. In Rizvi, M. H. et al., eds., Recent Advances in Statistics: Papers in Honor of Herman Chernoff on His Sixtieth Birthday. Academic Press, New York, pages 287-302.
  • Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence 6:721-741.
  • Gilks, W. R., Thomas, A. and Spiegelhalter, D. J. (1994) A language and program for complex Bayesian modelling. The Statistician 43(1):169-177.
  • Gilks, W. R., Richardson, S., and Spiegelhalter, D. J., eds. (1996) Markov Chain Monte Carlo in Practice. Chapman and Hall.
  • Green, P. J. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82(4):711-732.

60
References
  • Hastings, W. K. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109.
  • Jain, S. and Neal, R. M. (2004) A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. J. Computational and Graphical Statistics 13(1):158-182.
  • Jordan, M. I. (2005) Dirichlet processes, Chinese restaurant processes, and all that. Tutorial at the NIPS Conference, available at http://www.cs.berkeley.edu/jordan/nips-tutorial05.ps
  • MacKay, D. J. C. (1992) Bayesian interpolation. Neural Computation 4(3):414-447.
  • MacEachern, S. N. (1994) Estimating normal means with a conjugate style Dirichlet process prior. Communications in Statistics: Simulation and Computation 23:727-741.
  • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953) Equations of state calculations by fast computing machines. J. Chemical Physics 21:1087-1092.
  • Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005) BLOG: Probabilistic models with unknown objects. In Proc. 19th Int'l Joint Conf. on AI, pages 1352-1359.
  • Milch, B. and Russell, S. (2006) General-purpose MCMC inference over relational structures. In Proc. 22nd Conf. on Uncertainty in AI, pages 349-358.

61
References
  • Neal, R. M. (2000) Markov chain sampling methods for Dirichlet process mixture models. J. Computational and Graphical Statistics 9:249-265.
  • Oh, S., Russell, S. and Sastry, S. (2004) Markov chain Monte Carlo data association for general multi-target tracking problems. In Proc. 43rd IEEE Conf. on Decision and Control, pages 734-742.
  • Pasula, H., Russell, S. J., Ostland, M., and Ritov, Y. (1999) Tracking many objects with many sensors. In Proc. 16th Int'l Joint Conf. on AI, pages 1160-1171.
  • Pasula, H., Marthi, B., Milch, B., Russell, S., and Shpitser, I. (2003) Identity uncertainty and citation matching. In Advances in Neural Information Processing Systems 15, MIT Press, pages 1401-1408.
  • Richardson, S. and Green, P. J. (1997) On Bayesian analysis of mixtures with an unknown number of components. J. Royal Statistical Society B 59:731-792.
  • Sethuraman, J. (1994) A constructive definition of Dirichlet priors. Statistica Sinica 4:639-650.
  • Sudderth, E. (2006) Graphical models for visual object recognition and tracking. Ph.D. thesis, Dept. of EECS, Massachusetts Institute of Technology, Cambridge, MA.
  • Tu, Z. and Zhu, S.-C. (2002) Image segmentation by data-driven Markov chain Monte Carlo. IEEE Trans. Pattern Analysis and Machine Intelligence 24(5):657-673.