Exact and approximate inference in probabilistic graphical models - PowerPoint PPT Presentation

1
Exact and approximate inference in probabilistic
graphical models
  • Kevin Murphy, MIT CSAIL / UBC CS & Stats

www.ai.mit.edu/murphyk/AAAI04
AAAI 2004 tutorial
2
Recommended reading
  • Cowell, Dawid, Lauritzen, Spiegelhalter,
    Probabilistic Networks and Expert Systems, 1999
  • Jensen, Bayesian Networks and Decision Graphs,
    2001
  • Jordan, Probabilistic Graphical Models (due 2005)
  • Koller & Friedman, Bayes Nets and Beyond (due
    2005)
  • Learning in Graphical Models, edited by M. Jordan

3
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic

4
2 reasons for approximate inference
  • High treewidth: e.g., n x n grids and other
    loopy graphs (even with discrete or Gaussian
    nodes)
  • Low treewidth BUT non-linear/non-Gaussian: e.g.,
    chains and trees (no loops), such as non-linear
    dynamical systems, or (Bayesian) parameter
    estimation

[Figure: a chain-structured model with hidden nodes X1, X2, X3 and observations Y1, Y2, Y3, next to a loopy graph]
5
Complexity of approximate inference
  • Approximating P(Xq|Xe) to within a constant
    factor for all discrete BNs is NP-hard [Dagum93].
  • In practice, many models exhibit weak coupling,
    so we may safely ignore certain dependencies.
  • Computing P(Xq|Xe) for all polytrees with
    discrete and Gaussian nodes is NP-hard [Lerner01].
  • In practice, some of the modes of the posterior
    will have negligible mass.
6
2 objective functions
  • Approximate the true posterior P(h|v) by Q(h)
  • Variational: globally optimize all terms wrt a
    simpler Q; minimizes D(Q||P), so Q>0 implies P>0
    (zero-forcing)
  • Expectation propagation (EP): sequentially
    optimize each term; minimizes D(P||Q), so P>0
    implies Q>0 (zero-avoiding)
7
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Variational
  • Loopy belief propagation
  • Expectation propagation
  • Graph cuts
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic

8
Free energy
  • Variational goal: minimize D(Q||P) wrt Q, where Q
    has a simpler form than P
  • P(h,v) is easier to evaluate than P(h|v), so we
    work with the free energy, which involves only
    the joint
  • The free energy is an upper bound on the negative
    log-likelihood
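The free-energy formulas on this slide were lost in transcription; a standard reconstruction (h hidden, v visible, Q the variational posterior) is:

```latex
F(Q) \;=\; \sum_h Q(h)\,\log\frac{Q(h)}{P(h,v)}
     \;=\; D\!\left(Q(h)\,\middle\|\,P(h\mid v)\right) \;-\; \log P(v)
     \;\ge\; -\log P(v),
```

with equality iff Q(h) = P(h|v), since the KL divergence is nonnegative.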

9
Point estimation
  • Use a point-mass Q(h) concentrated on a single
    assignment h*
  • Minimize the energy -log P(h,v) by coordinate
    descent
  • Iterative Conditional Modes (ICM):
  • On each iteration, for each hi, set hi to its
    most probable value given the factors in the
    Markov blanket of hi
  • Example: K-means clustering
  • Ignores uncertainty in P(h|v), P(θ|v)
  • Tends to get stuck in local minima
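To make the update concrete, here is a minimal ICM sketch for denoising a binary image under an Ising-style MRF; the energy form and the values of `beta` and `eta` are illustrative assumptions, not from the slides:

```python
def icm_denoise(y, beta=2.0, eta=1.0, n_iters=5):
    """Iterative Conditional Modes for a binary Ising MRF.
    y: 2D list of observed pixels in {-1,+1}.
    Energy: E(x) = -eta*sum_i x_i*y_i - beta*sum_{ij} x_i*x_j."""
    H, W = len(y), len(y[0])
    x = [row[:] for row in y]          # initialize at the observation
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                # only factors in the Markov blanket of x_ij matter
                nb = sum(x[a][b] for a, b in ((i-1,j),(i+1,j),(i,j-1),(i,j+1))
                         if 0 <= a < H and 0 <= b < W)
                field = eta * y[i][j] + beta * nb
                x[i][j] = 1 if field >= 0 else -1   # conditional mode
    return x
```

A single flipped pixel surrounded by agreeing neighbors is restored, since the pairwise terms outvote the unary term.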
10
Expectation Maximization (EM)
  • Point estimates for parameters (ML or MAP), full
    posterior for hidden vars
  • E-step: minimize F(Q,P) wrt Q(h); this is exact
    inference, yielding the expected complete-data
    log-likelihood
  • M-step: minimize F(Q,P) wrt Q(θ), incorporating
    any parameter prior
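A minimal worked example of the E and M steps, using a two-component 1D Gaussian mixture with unit variances and equal weights; the model and the initialization are assumptions for illustration:

```python
import math

def em_gmm_1d(data, n_iters=50):
    """EM for a 2-component 1D Gaussian mixture with unit variance.
    E-step: exact posterior responsibilities Q(h).
    M-step: ML point estimates of the means."""
    mu = [min(data), max(data)]        # crude initialization
    for _ in range(n_iters):
        # E-step: r[i][k] = Q(h_i = k)
        r = []
        for x in data:
            w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
            z = sum(w)
            r.append([wk / z for wk in w])
        # M-step: maximize the expected complete-data log-likelihood
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
    return mu
```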
11
EM tricks of the trade
  • Generalized EM [Neal98]
  • Partial M-step: reduce F(Q,P) wrt Q(θ), e.g., by
    a gradient method
  • Partial E-step: reduce F(Q,P) wrt Q(h) using
    approximate inference
  • Avoiding local optima
  • Deterministic annealing [Rose98]
  • Data resampling [Elidan02]
  • Speedup tricks
  • Combine with conjugate gradient [Salakhutdinov03]
  • Online/incremental updates [Bauer97, Neal98]
12
Variational Bayes (VB)
Ghahramani00,Beal02
  • Use a factorized posterior Q(h,θ) = Q(h)Q(θ)
  • For exponential family models with conjugate
    priors, this results in a generalized version of
    EM
  • E-step: modified inference to take into account
    uncertainty in the parameters
  • M-step: optimize Q(θ) using expected sufficient
    statistics
  • Variational Message Passing automates this,
    assuming a fully factorized (mean field) Q
    [Winn04]

See variational-Bayes.org
13
Variational inference for discrete state models
with high treewidth
  • We assume the parameters are fixed.
  • We assume Q(h) has a simple form, so we can
    easily find the minimizing Q
  • Mean field: fully factorized Q
  • Structured variational: e.g., approximate a grid
    MRF by a product of chains [Xing04]
14
Variational inference for MRFs
  • Probability is exp(-energy): P(x) = exp(-E(x))/Z
  • Free energy = average energy - entropy

15
Mean field for MRFs
  • Fully factorized approximation Q(x) = Πi Qi(xi)
  • Normalization constraint: Σxi Qi(xi) = 1
  • The average energy and the entropy both decompose
    over the factors Qi
  • Local minima satisfy fixed-point equations:
    update each Qi given the mean field of its
    neighbors
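The fixed-point update can be sketched concretely for an Ising grid; the model P(x) ∝ exp(J Σ_{ij} x_i x_j + Σ_i h_i x_i) and the tanh form of the update are standard, but the parameter names here are assumptions for the example:

```python
import math

def mean_field_ising(J, h, H, W, n_iters=50):
    """Mean-field (fully factorized) approximation for an Ising grid,
    x_i in {-1,+1}. Returns m[i][j], the mean of x_ij under Q."""
    m = [[0.0] * W for _ in range(H)]   # mean parameters of Q
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                # fixed point: m_i = tanh(h_i + J * sum of neighbor means)
                nb = sum(m[a][b] for a, b in ((i-1,j),(i+1,j),(i,j-1),(i,j+1))
                         if 0 <= a < H and 0 <= b < W)
                m[i][j] = math.tanh(h[i][j] + J * nb)
    return m
```

With a positive external field, all site means settle at positive values, as expected.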

16
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Variational
  • Loopy belief propagation
  • Expectation propagation
  • Graph cuts
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic

17
BP vs mean field for MRFs
  • Mean field updates use only the mean of each
    neighbor
  • BP updates pass messages: every node i sends a
    different message to each neighbor j
  • Empirically, BP is much better than MF (e.g., MF
    is not exact even for trees) [Weiss01]
  • BP is (attempting to) minimize the Bethe free
    energy [Yedidia01]
18
Bethe free energy
  • If the graph is a tree, the following free energy
    is exact
  • Constraints: normalization, and marginalization
    of the edge beliefs onto the node beliefs
  • Average energy: sum of pairwise and single-node
    terms
  • Entropy: edge entropies, minus overcounted node
    entropies weighted by d_i = number of neighbors
    of node i
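The Bethe free energy equations lost in transcription take the standard form (b_i and b_{ij} are the node and edge beliefs):

```latex
F_{\mathrm{Bethe}}
 = \sum_{(ij)}\sum_{x_i,x_j} b_{ij}(x_i,x_j)\,E_{ij}(x_i,x_j)
 + \sum_i \sum_{x_i} b_i(x_i)\,E_i(x_i)
 + \sum_{(ij)}\sum_{x_i,x_j} b_{ij}(x_i,x_j)\,\ln b_{ij}(x_i,x_j)
 - \sum_i (d_i - 1)\sum_{x_i} b_i(x_i)\,\ln b_i(x_i),
```

subject to the marginalization constraints Σ_{x_j} b_{ij}(x_i,x_j) = b_i(x_i) and normalization Σ_{x_i} b_i(x_i) = 1.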
19
BP minimizes Bethe free energy
Yedidia01
  • Theorem [Yedidia, Freeman, Weiss]: fixed points
    of BP are stationary points of the Bethe free
    energy
  • BP may not converge; other algorithms can
    directly minimize F_Bethe, but are slower
  • If BP does not converge, it often means F_Bethe
    is a poor approximation

20
Kikuchi free energy
  • Cluster groups of nodes together into regions
  • Define an energy and a free energy per region
  • The Kikuchi free energy sums the regional free
    energies, weighted by counting numbers so that no
    node is counted more than once
21
Counting numbers
[Figure: a 2x3 grid, nodes numbered 1 2 3 (top row) and 4 5 6 (bottom row)]
Bethe region graph: large regions are the edges
12, 23, 14, 25, 36, 45, 56; small regions are the
single nodes 1, 2, 3, 4, 5, 6, with counting numbers
C = (-1, -2, -1, -1, -2, -1).
Kikuchi region graph: large regions 1245 and 2356;
intersection region 25, with counting number
C = 1 - (1 + 1) = -1.
F_Kikuchi is exact if the region graph contains 2
levels (regions and intersections) and has no
cycles: this is equivalent to the junction tree!
22
Generalized BP
[Figure: a 3x3 grid, nodes numbered 1 2 3 / 4 5 6 / 7 8 9]
Large regions: 1245, 2356, 4578, 5689;
intersections: 25, 45, 56, 58; innermost region: 5.
  • F_Kikuchi is no longer exact, but is more
    accurate than F_Bethe
  • Generalized BP can be used to minimize F_Kikuchi
  • This method of choosing regions is called the
    cluster variational method [Welling04]
  • In the limit, we recover the junction tree
    algorithm.
23
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Variational
  • Loopy belief propagation
  • Expectation propagation
  • Graph cuts
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic

24
Expectation Propagation (EP)
Minka01
  • EP = iterated assumed density filtering (ADF)
  • ADF = recursive Bayesian estimation interleaved
    with a projection step
  • Examples of ADF:
  • Extended Kalman filtering
  • Moment matching (weak marginalization)
  • Boyen-Koller algorithm
  • Some online learning algorithms

25
Assumed Density Filtering (ADF)
  • Recursive Bayesian estimation (sequential
    updating of the posterior) for a model with
    hidden x and observations Y1, ..., Yn
  • If p(yi|x) is not conjugate to p(x), then
    p(x|y1:i) may not be tractably representable
  • So project the posterior back to a representable
    family (for exponential families, projection
    becomes moment matching)
  • And repeat: update, project, update, project, ...
26
Expectation Propagation
  • ADF is sensitive to the order of updates.
  • ADF approximates each posterior myopically.
  • EP iteratively re-approximates each term.
  • Exact posterior: intractable. ADF: simple,
    non-iterative, inaccurate. EP: simple, iterative,
    accurate.
After Ghahramani
27
Expectation Propagation
  • Input: the model factors fi
  • Initialize the approximate factors and q(x)
  • Repeat
  • For i = 0..N:
  • Deletion: remove approximate factor i from q(x)
  • Projection: multiply in the true factor fi and
    project back to the family (moment matching)
  • Inclusion: update the stored approximate factor
  • Until convergence
  • Output q(x)

After Ghahramani
28
BP is a special case of EP
  • BP assumes a fully factorized q(x)
  • At each iteration, for each factor fi, for each
    node Xk, the KL projection matches moments
    (computes marginals by absorbing messages from
    the neighbors)

[Figure: factor fi connected to nodes Xn1, Xn2, Xk; factor fj also connected to Xk]
29
TreeEP
Minka03
  • TreeEP assumes q(x) is represented by a tree
    (regardless of the true model topology).
  • We can use the junction tree algorithm to do the
    moment matching at each iteration.
  • Faster and more accurate than loopy BP.
  • Faster than, and comparably accurate to, GBP.

30
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Variational
  • Loopy belief propagation
  • Expectation propagation
  • Graph cuts
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic

31
MPE in MRFs
  • MAP estimation = energy minimization
  • Simplifications:
  • Only pairwise potentials (Eijk = 0, etc.)
  • Special form for the potentials
  • Binary variables xi ∈ {0,1}

32
Kinds of potential
  • Metric potentials satisfy identity, symmetry, and
    the triangle inequality; semi-metric potentials
    drop the triangle inequality
  • Piecewise constant, e.g., the Potts model
    (metric)
  • Piecewise smooth, e.g., truncated linear (metric)
    or truncated quadratic (semi-metric)
  • Discontinuity-preserving potentials avoid
    oversmoothing

33
GraphCuts
Kolmogorov04
  • Thm: we can find argmin E(x) for binary variables
    and pairwise potentials in at most O(N^3) time
    using a maxflow/mincut algorithm, iff the
    potentials are submodular, i.e.,
    E(0,0) + E(1,1) <= E(0,1) + E(1,0)
  • Metric potentials (e.g., Potts) are always
    submodular.
  • Thm: the general case (e.g., non-binary or
    non-submodular) is NP-hard.
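A from-scratch sketch of the reduction: binary MAP with Potts (submodular) potentials as an s-t min cut, solved with Edmonds-Karp maxflow. The graph construction follows the standard reduction; the example model and parameter names are assumptions:

```python
from collections import defaultdict, deque

def min_cut_labels(unary, pairs, lam):
    """MAP for a binary pairwise MRF with Potts potentials via mincut.
    unary[i] = (cost of x_i=0, cost of x_i=1); pairs = (i, j) edges,
    each paying lam when x_i != x_j (submodular)."""
    S, T = 's', 't'
    cap = defaultdict(lambda: defaultdict(float))   # residual capacities
    for i, (c0, c1) in enumerate(unary):
        cap[S][i] += c1   # s->i cut iff i lands on the sink side (x_i = 1)
        cap[i][T] += c0   # i->t cut iff i lands on the source side (x_i = 0)
    for i, j in pairs:
        cap[i][j] += lam  # cut iff i, j land on different sides
        cap[j][i] += lam

    def bfs_augmenting_path():
        parent = {S: None}
        q = deque([S])
        while q:
            u = q.popleft()
            if u == T:
                break
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if T not in parent:
            return None
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        return path

    while True:                      # Edmonds-Karp: shortest augmenting paths
        path = bfs_augmenting_path()
        if path is None:
            break
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:            # push flow, update residual graph
            cap[u][v] -= flow
            cap[v][u] += flow

    # nodes still reachable from s in the residual graph get label 0
    reach, q = {S}, deque([S])
    while q:
        u = q.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in reach:
                reach.add(v)
                q.append(v)
    return [0 if i in reach else 1 for i in range(len(unary))]
```

On a 3-node chain, the middle node keeps its preferred label only when its unary cost outweighs the two Potts penalties it incurs.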
34
Finding strong local minimum
  • For the non-binary case, we can find the optimum
    wrt some large space of moves by iteratively
    solving binary subproblems.
  • α-expansion: any pixel can change its label to α
  • α-β swap: any α can switch to β and vice versa

Picture from Zabih
35
Finding strong local minimum
  • Start with an arbitrary assignment f
  • done := false
  • While not done:
  • done := true
  • For each label α:
  • Find f' = argmin E over all α-expansions of f
    (a binary subproblem!)
  • If E(f') < E(f), then done := false; f := f'
36
Properties of the 2 algorithms
  • α-expansion
  • Requires V to be submodular (e.g., metric)
  • O(L) binary subproblems per cycle
  • Within a factor 2c(V) of the optimum
  • c = 1 for the Potts model
  • α-β swap
  • Requires V only to be a semi-metric
  • O(L^2) binary subproblems per cycle
  • No comparable theoretical guarantee, but works
    well in practice

37
Summary of inference methods for pairwise MRFs
  • Marginals
  • Mean field
  • Loopy/ generalized BP (sum-product)
  • EP
  • Gibbs sampling
  • Swendsen-Wang
  • MPE/ Viterbi
  • Iterative conditional modes (ICM)
  • Loopy/generalized BP (max-product)
  • Graph cuts
  • Simulated annealing

See Boykov01, Weiss01 and Tappen03 for some
empirical comparisons
38
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic

39
Monte Carlo (sampling) methods
  • Goal: estimate expectations E_P[f(X)]
  • e.g., posterior marginals or predictive
    probabilities
  • Draw N independent samples x^r ~ P and average:
    E_P[f] ≈ (1/N) Σr f(x^r)
  • Accuracy is independent of the dimensionality
    of X
  • But it is hard to draw (independent) samples
    from P
40
Importance Sampling
  • We cannot sample from P directly, so we sample
    from a proposal Q(x) and reweight each sample by
    w(x) = P(x)/Q(x)
  • Requires Q(x) > 0 wherever P(x) > 0
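A minimal self-normalized importance sampler; the target/proposal pair in the usage below (N(2,1) target, N(0,2) proposal) is an assumption for illustration:

```python
import math
import random

def importance_sample_mean(log_p, sample_q, log_q, n=10000, seed=0):
    """Self-normalized importance sampling estimate of E_P[x].
    Samples x^r ~ Q, weights w^r = P(x^r)/Q(x^r) via unnormalized
    log densities, and returns the weighted average."""
    rng = random.Random(seed)
    xs = [sample_q(rng) for _ in range(n)]
    logw = [log_p(x) - log_q(x) for x in xs]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]   # stabilized weights
    z = sum(w)
    return sum(wi * xi for wi, xi in zip(w, xs)) / z
```

With a proposal wider than the target (so Q > 0 wherever P > 0), the estimate of the target mean is accurate.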
41
Importance Sampling for BNs (likelihood weighting)
  • Input: CPDs P(Xi|Xπi), evidence xE
  • Output: weighted samples (x^r, w^r)
  • For each sample r:
  • w^r := 1
  • For each node i in topological order:
  • If Xi is observed, then x_i^r := x_i^E and
    w^r := w^r * P(Xi = x_i^E | Xπi = x_πi^r)
  • Else sample x_i^r ~ P(Xi | x_πi^r)
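The algorithm above on a tiny Rain/Sprinkler/GrassWet network; the CPD numbers are assumptions for the example, not from the slides:

```python
import random

def likelihood_weighting(n=20000, seed=0):
    """Likelihood weighting to estimate P(Rain=1 | GrassWet=1) in a
    tiny BN: Rain -> GrassWet <- Sprinkler (illustrative CPDs).
    Unobserved nodes are sampled from their CPDs in topological
    order; the weight is the likelihood of the observed evidence."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        w = 1.0
        rain = rng.random() < 0.2          # P(Rain=1) = 0.2
        sprinkler = rng.random() < 0.4     # P(Sprinkler=1) = 0.4
        # GrassWet is observed to be 1: multiply in its likelihood
        p_wet = {(0, 0): 0.0, (0, 1): 0.8,
                 (1, 0): 0.9, (1, 1): 0.99}[(rain, sprinkler)]
        w *= p_wet
        num += w * rain
        den += w
    return num / den
```

For these CPDs the exact answer is 0.1872 / 0.4432 ≈ 0.4224, which the weighted estimate approaches.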

42
Drawbacks of importance sampling
  • Samples are drawn given upstream evidence only,
    then weighted by downstream evidence.
  • Evidence reversal (modifying the model so that
    all observed nodes become parents) helps, but can
    be expensive.
  • Does not scale to high-dimensional spaces, even
    if Q is similar to P, since the variance of the
    weights becomes too high.

43
Sequential importance sampling (particle
filtering)
Arulampalam02,Doucet01
  • Apply importance sampling sequentially to a
    (nonlinear, non-Gaussian) dynamical system.
  • Resample particles with probability proportional
    to their weights w_t
  • Unlikely hypotheses get replaced
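The propagate/weight/resample loop can be sketched as a bootstrap particle filter; the 1D linear-Gaussian model below is an assumption chosen so the behavior is easy to check:

```python
import math
import random

def particle_filter(obs, n_particles=500, seed=0):
    """Bootstrap particle filter for an illustrative 1D model:
    x_t = 0.9*x_{t-1} + N(0,1) process noise; y_t = x_t + N(0,1).
    Propagate, weight by the likelihood, then resample."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n_particles)]
    means = []
    for y in obs:
        # propagate through the (in general nonlinear) dynamics
        xs = [0.9 * x + rng.gauss(0, 1) for x in xs]
        # weight by the observation likelihood
        w = [math.exp(-0.5 * (y - x) ** 2) for x in xs]
        z = sum(w)
        means.append(sum(wi * xi for wi, xi in zip(w, xs)) / z)
        # resample: unlikely hypotheses get replaced
        xs = rng.choices(xs, weights=w, k=n_particles)
    return means
```

With a constant observation of 5.0, the filtered mean settles near the Kalman steady state (about 4.7 for this model).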

44
Markov Chain Monte Carlo (MCMC)
Neal93,Mackay98
  • Draw dependent samples x^t from a chain with
    transition kernel T(x'|x), s.t.
  • P(x) is the stationary distribution
  • The chain is ergodic (every state can reach the
    stationary states)
  • If T satisfies detailed balance, then P is the
    stationary distribution
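The detailed balance condition referred to above, and why it implies stationarity:

```latex
P(x)\,T(x' \mid x) \;=\; P(x')\,T(x \mid x')
\quad\Longrightarrow\quad
\sum_x P(x)\,T(x' \mid x) \;=\; P(x') \sum_x T(x \mid x') \;=\; P(x'),
```

so applying T to samples from P leaves the distribution P unchanged.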

45
Metropolis Hastings
  • Sample a proposal x' ~ Q(x'|x_{t-1})
  • Accept the new state with probability
    min(1, [P(x') Q(x_{t-1}|x')] / [P(x_{t-1}) Q(x'|x_{t-1})])
  • Satisfies detailed balance
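A minimal random-walk Metropolis sketch (symmetric Q, so the Q terms cancel in the acceptance ratio); the Gaussian target in the usage below is an assumption for illustration:

```python
import math
import random

def metropolis_hastings(log_p, x0, step=1.0, n=10000, seed=0):
    """Random-walk Metropolis sampler.
    log_p is an unnormalized log density; returns dependent samples."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n):
        x_new = x + rng.gauss(0, step)     # proposal x' ~ Q(.|x)
        lp_new = log_p(x_new)
        # accept with probability min(1, P(x')/P(x))
        if math.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        samples.append(x)                   # dependent samples
    return samples
```

After a burn-in period, the sample mean matches the target mean.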

46
Gibbs sampling
  • Metropolis method where Q is built from the full
    conditionals P(Xi|X_{-i}).
  • Acceptance rate is 1.
  • For a graphical model, we only need to condition
    on the Markov blanket.

See the BUGS software
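A Gibbs sampler for an Ising grid makes the Markov-blanket point concrete: each full conditional depends only on the four grid neighbors. The model P(x) ∝ exp(J Σ_{ij} x_i x_j) is an assumption for the example:

```python
import math
import random

def gibbs_ising(J, H, W, n_sweeps=200, seed=0):
    """Gibbs sampling for an Ising grid, x in {-1,+1}. Each site is
    resampled from its full conditional, which depends only on its
    Markov blanket (the grid neighbors)."""
    rng = random.Random(seed)
    x = [[rng.choice([-1, 1]) for _ in range(W)] for _ in range(H)]
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                nb = sum(x[a][b] for a, b in ((i-1,j),(i+1,j),(i,j-1),(i,j+1))
                         if 0 <= a < H and 0 <= b < W)
                # P(x_ij = +1 | blanket) = sigmoid(2 * J * nb)
                p1 = 1.0 / (1.0 + math.exp(-2.0 * J * nb))
                x[i][j] = 1 if rng.random() < p1 else -1
    return x
```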
47
Difficulties with MCMC
  • May take a long time to mix (converge to the
    stationary distribution).
  • Hard to know when the chain has mixed.
  • Simple proposals exhibit random-walk behavior.
    Remedies:
  • Hybrid Monte Carlo (use gradient information)
  • Swendsen-Wang (large moves for the Ising model)
  • Heuristic proposals

48
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic

49
Comparison of deterministic and stochastic methods
  • Deterministic: fast but inaccurate
  • Stochastic: slow but accurate, and can handle an
    arbitrary hypothesis space
  • Combine the best of both worlds (hybrid):
  • Use smart deterministic proposals
  • Integrate out some of the states, sample the rest
    (Rao-Blackwellization)
  • Non-parametric BP (particle filtering for graphs)

50
Examples of deterministic proposals
  • State estimation: unscented particle filter
    [Merwe00]
  • Machine learning: variational MCMC [deFreitas01]
  • Computer vision: data-driven MCMC [Tu02]
51
Example of Rao-Blackwellized particle filters
  • Conditioned on the discrete switching nodes, the
    remaining system is linear-Gaussian and can be
    integrated out using the Kalman filter.
  • Each particle contains a sample value s_{1:t}^r
    and the mean/covariance of P(Xt | y_{1:t},
    s_{1:t}^r)

52
Outline
  • Introduction
  • Exact inference
  • Approximate inference
  • Deterministic
  • Stochastic (sampling)
  • Hybrid deterministic/ stochastic
  • Summary

53
Summary of inference methods
            Chain (online)         Low treewidth            High treewidth
Discrete    BP = forwards;         VarElim, Jtree,          Loopy BP, mean field,
            Boyen-Koller (ADF),    recursive conditioning   structured variational,
            beam search                                     EP, graph cuts, Gibbs
Gaussian    BP = Kalman filter     Jtree = sparse           Loopy BP, Gibbs
                                   linear algebra
Other       EKF, UKF, moment       EP, EM, VB, NBP,         EP, variational EM, VB,
            matching (ADF),        Gibbs                    NBP, Gibbs
            particle filter
BP = belief propagation, EP = expectation
propagation, ADF = assumed density filtering, EKF =
extended Kalman filter, UKF = unscented Kalman
filter, VarElim = variable elimination, Jtree =
junction tree, EM = expectation maximization, VB =
variational Bayes, NBP = non-parametric BP