Loopy Belief Propagation - PowerPoint PPT Presentation

Provided by: bpot
Learn more at: https://www.cnbc.cmu.edu
Transcript and Presenter's Notes



1
Loopy Belief Propagation
  • a summary

2
What is inference?
  • Given
  • Observed variables Y
  • Hidden variables X
  • Some model of P(X,Y)
  • We want to make some analysis of P(X|Y)
  • Estimate the marginal P(S) for S ⊆ X
  • Minimal Mean Squared Error configuration (MMSE)
  • This is just E[X|Y]
  • Maximum A-Posteriori configuration (MAP)
  • N most likely configurations
  • Minimum Variance (MVUE)
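
The MAP and MMSE answers above need not agree. A hypothetical example (numbers ours, not from the slides): on a small discrete posterior P(x|y), the most probable state and the posterior mean differ.

```python
# Made-up posterior P(x|y) over integer states; the MAP configuration is the
# single most probable state, while the MMSE estimate is the posterior mean.
posterior = {0: 0.40, 3: 0.35, 4: 0.25}

map_x = max(posterior, key=posterior.get)           # argmax of P(x|y)
mmse_x = sum(x * p for x, p in posterior.items())   # E[x|y], minimizes E[(x-a)^2]

print(map_x, mmse_x)   # MAP picks 0, yet the posterior mean is 2.05
```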

3
Representing Structure in P(X,Y)
  • Often, P(X,Y) ∝ ∏_k f_k(X_Ck), where X_Ck ⊆ X ∪ Y

Markov Random Field
Bayes Net
Factor Graph
P(X) = f1(x1,x2,x3) f2(x3,x4) f3(x3,x5) / Z
P(X) = P(x1) P(x2) P(x3|x1,x2) P(x4|x3) P(x5|x3)
P(X) = f1(x1,x2,x3) f2(x3,x4) f3(x3,x5)
f4(x1) f5(x2) / Z
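
A minimal sketch (our own encoding, not the slide's) of the MRF example P(X) = f1(x1,x2,x3) f2(x3,x4) f3(x3,x5) / Z over binary variables, storing each factor as a (scope, function) pair and finding Z by enumeration; the factor values here are invented for illustration.

```python
# Each factor is (scope, function); Z normalizes the product of factors.
import itertools

factors = [
    (("x1", "x2", "x3"), lambda x1, x2, x3: 1.0 + x1 + x2 + 2 * x3),
    (("x3", "x4"),       lambda x3, x4: 1.0 + 3 * x3 * x4),
    (("x3", "x5"),       lambda x3, x5: 2.0 - x3 * x5),
]
variables = ["x1", "x2", "x3", "x4", "x5"]

def unnormalized(assign):
    # Product of all factor values at this joint assignment.
    out = 1.0
    for scope, f in factors:
        out *= f(*(assign[v] for v in scope))
    return out

assignments = [dict(zip(variables, vals))
               for vals in itertools.product([0, 1], repeat=5)]
Z = sum(unnormalized(a) for a in assignments)
total = sum(unnormalized(a) / Z for a in assignments)  # == 1 by construction
```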
4
Sum-Product Algorithm, aka belief update
Quickly computes every single-variable marginal
P(xn) from a tree graph.
Suppose the factor graph is a tree. For the tree
to the left, we have
P(X) ∝ f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
Then marginalization (for example, computing P(x1))
can be sped up by exploiting the factorization:
P(x1) ∝ Σ_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
      = Σ_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (Σ_{x5} f3(x3,x5)) (Σ_{x6} f4(x4,x6))
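
The distributive-law speedup can be checked numerically on the slide's example tree. A sketch with random positive factor tables (values ours): the brute-force sum over x2..x6 and the version that pushes Σ_{x5} and Σ_{x6} inside the product agree exactly.

```python
# Verify: sum over all variables == factored sum with inner sums pushed in.
import itertools, random

random.seed(0)
M = 2  # binary variables
f1 = {k: random.random() for k in itertools.product(range(M), repeat=2)}
f2 = {k: random.random() for k in itertools.product(range(M), repeat=3)}
f3 = {k: random.random() for k in itertools.product(range(M), repeat=2)}
f4 = {k: random.random() for k in itertools.product(range(M), repeat=2)}

def brute(x1):
    # Naive marginalization: M^5 terms.
    return sum(f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]
               for x2, x3, x4, x5, x6 in itertools.product(range(M), repeat=5))

def factored(x1):
    # Push the sums over x5 and x6 inside: only M^3 outer terms remain.
    g3 = {x3: sum(f3[x3, x5] for x5 in range(M)) for x3 in range(M)}
    g4 = {x4: sum(f4[x4, x6] for x6 in range(M)) for x4 in range(M)}
    return sum(f1[x1, x2] * f2[x2, x3, x4] * g3[x3] * g4[x4]
               for x2, x3, x4 in itertools.product(range(M), repeat=3))
```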
5
Message Passing for Sum-Product
We can compute every marginal P(xn) quickly using
a system of message passing.
Message from variable node n to factor node m:
  v_{n→m}(xn) = ∏_{i ∈ N(n) \ m} μ_{i→n}(xn)
Message from factor node m to variable node n:
  μ_{m→n}(xn) = Σ_{x_{N(m) \ n}} f_m(x_{N(m)}) ∏_{i ∈ N(m) \ n} v_{i→m}(xi)
Marginal of xn:
  P(xn) ∝ ∏_{m ∈ N(n)} μ_{m→n}(xn)
Each node n can pass a message to neighbor m only
once it has received a message from all other
adjacent nodes. Intuitively, each message from n
to m represents P(xm | Sn), where Sn is the set
of all children of node n.
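
The two message equations can be sketched on the smallest interesting tree, a chain x1 --fa-- x2 --fb-- x3 (names and factor values ours). Leaf variables send the all-ones message; the marginal of x2 is the normalized product of its incoming factor messages, and it matches brute-force marginalization.

```python
# Sum-product on a 3-variable chain, checked against brute force.
import itertools, random

random.seed(1)
M = 2
fa = {k: random.random() for k in itertools.product(range(M), repeat=2)}  # fa(x1,x2)
fb = {k: random.random() for k in itertools.product(range(M), repeat=2)}  # fb(x2,x3)

# Variable -> factor messages from the leaves (empty product = all ones).
v1_a = [1.0] * M
v3_b = [1.0] * M
# Factor -> variable messages into x2, per the sum-product rule.
mu_a_2 = [sum(fa[x1, x2] * v1_a[x1] for x1 in range(M)) for x2 in range(M)]
mu_b_2 = [sum(fb[x2, x3] * v3_b[x3] for x3 in range(M)) for x2 in range(M)]

# Marginal of x2: normalized product of incoming messages.
p2 = [mu_a_2[x2] * mu_b_2[x2] for x2 in range(M)]
Z = sum(p2)
p2 = [p / Z for p in p2]

# Brute-force reference marginal.
joint = {(x1, x2, x3): fa[x1, x2] * fb[x2, x3]
         for x1, x2, x3 in itertools.product(range(M), repeat=3)}
Zb = sum(joint.values())
brute_p2 = [sum(v for (a, b, c), v in joint.items() if b == x2) / Zb
            for x2 in range(M)]
```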
6
Max-Product Algorithm, aka belief revision
Quickly computes the Maximum A-Posteriori
configuration of a tree graph.
Instead of summing P(X), we take the maximum to
get the maximal (instead of the marginal):
M(x1) = max_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
      = max_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (max_{x5} f3(x3,x5)) (max_{x6} f4(x4,x6))
Use the same message passing system to compute
the maximal of each variable.
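
The sum-to-max swap can be sketched on the same kind of chain x1 --fa-- x2 --fb-- x3 (factor values ours): replacing Σ with max in the messages yields the maximal of x2, which matches a brute-force max over the joint.

```python
# Max-product on a 3-variable chain: same messages, max instead of sum.
import itertools, random

random.seed(2)
M = 2
fa = {k: random.random() for k in itertools.product(range(M), repeat=2)}
fb = {k: random.random() for k in itertools.product(range(M), repeat=2)}

# Factor -> variable max-messages into x2.
mu_a_2 = [max(fa[x1, x2] for x1 in range(M)) for x2 in range(M)]
mu_b_2 = [max(fb[x2, x3] for x3 in range(M)) for x2 in range(M)]
maximal_2 = [mu_a_2[x2] * mu_b_2[x2] for x2 in range(M)]

# Brute-force maximal of x2 for comparison.
brute = {x2: max(fa[x1, x2] * fb[x2, x3]
                 for x1, x3 in itertools.product(range(M), repeat=2))
         for x2 in range(M)}
```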
7
Computational Cost of Max-Product and Sum-Product
  • Each message is of size M, where M is the number
    of states in the random variable.
  • usually pretty small
  • Each variable → factor node message requires
    (N-2)·M multiplies, where N is the number of
    neighbors of the variable node.
  • that's tiny
  • Each factor → variable node message requires
    summation over N-1 variables, each of size M.
    Total computation per message is O(N·M^N).
  • not bad, as long as there aren't any hub-like
    nodes.

8
What if the graph is not a tree?
  • Several alternative methods
  • Gibbs sampling
  • Expectation Maximization
  • Variational methods
  • Elimination Algorithm
  • Junction-Tree algorithm
  • Loopy Belief Propagation

9
Elimination Algorithm: Inferring P(x1)
10
Loopy Belief Propagation
  • Just apply BP rules in spite of loops
  • In each iteration, each node sends all messages
    in parallel
  • Seems to work for some applications

Decoding Turbo Codes
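
The "just apply BP rules in spite of loops" recipe can be sketched on the smallest loopy graph, a 3-cycle of binary variables with pairwise factors (a pairwise MRF, so messages pass variable-to-variable; all factor values are ours). Every message is updated in parallel each iteration, exactly as the slide describes, and on this example the messages settle down.

```python
# Loopy BP on a 3-cycle with random positive pairwise factors.
import itertools, math, random

random.seed(3)
M = 2
edges = [(0, 1), (1, 2), (2, 0)]
f = {e: {k: random.random() + 0.5
         for k in itertools.product(range(M), repeat=2)}
     for e in edges}

def fval(i, j, xi, xj):
    # Look up the pairwise factor regardless of edge orientation.
    return f[(i, j)][xi, xj] if (i, j) in f else f[(j, i)][xj, xi]

# m[(i, j)][xj]: message from variable i to variable j (both directions).
m = {(i, j): [1.0 / M] * M for a, b in edges for (i, j) in ((a, b), (b, a))}

delta = 1.0
for _ in range(200):
    new = {}
    for (i, j) in m:
        others = [k for (k, l) in m if l == i and k != j]  # nbrs of i except j
        msg = [sum(fval(i, j, xi, xj)
                   * math.prod(m[(k, i)][xi] for k in others)
                   for xi in range(M))
               for xj in range(M)]
        s = sum(msg)
        new[(i, j)] = [v / s for v in msg]
    delta = max(abs(a - b) for e in m for a, b in zip(m[e], new[e]))
    m = new
    if delta < 1e-12:
        break

# Beliefs: normalized product of all incoming messages at each variable.
b = {}
for i in range(3):
    raw = [math.prod(m[(k, i)][xi] for (k, l) in m if l == i)
           for xi in range(M)]
    s = sum(raw)
    b[i] = [v / s for v in raw]
```

Because of the loop, these beliefs are only approximations of the true marginals, which is the subject of the next slides.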
11
Trouble with LBP
  • May not converge
  • A variety of tricks can help
  • Cycling error: old information is mistaken as
    new
  • Convergence error: unlike in a tree, neighbors
    need not be independent. However, LBP treats them
    as if they were.

Bolt, van der Gaag, "On the convergence error in
loopy propagation" (2004).
12
Good news about MAP in LBP
  • For a single loop, MAP values are correct
  • Although the maximals are not
  • If LBP converges, the resulting MAP configuration
    has higher probability than any other
    configuration in the Single Loops and Trees
    (SLT) Neighborhood

Example: SLT neighborhoods on a grid
Weiss, Freeman, "On the optimality of solutions
of the max-product belief propagation algorithm
in arbitrary graphs" (2001)
13
MMSE in LBP
  • If P(X) is jointly Gaussian and LBP converges,
    the means of its marginals are correct (the
    variances, in general, are not).
  • For pairwise-connected Markov random fields, if
    LBP converges, its marginals will minimize Bethe
    free energy.

Weiss, Freeman, "Correctness of Belief
Propagation in Gaussian Graphical Models of
Arbitrary Topology" (2001)
Yedidia, Freeman, Weiss, "Bethe free energy,
Kikuchi approximations, and belief propagation
algorithms" (2001)
14
Free Energy
Suppose we were able to compute the marginals of
a probability distribution b(X) that closely
approximated P(X|Y). We would want b(X) to
resemble P(X|Y) as much as possible. The total
free energy F of b(X) is the Kullback-Leibler
divergence between b(X) and P(X|Y). However, F
is difficult to compute. Also, the b(X) we are
working with is often ill-defined.
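
The KL divergence just mentioned can be written out. A standard decomposition (our derivation, following the usual free-energy convention rather than anything on this slide): with P(X|Y) = (1/Z)∏_k f_k(X_Ck) and energy E(X) = −Σ_k ln f_k(X_Ck),

```latex
F(b) = D\bigl(b \,\|\, P(\cdot \mid Y)\bigr)
     = \sum_X b(X)\,\ln\frac{b(X)}{P(X\mid Y)}
     = \underbrace{\sum_X b(X)\,E(X)}_{U(b)\ \text{(average energy)}}
       \;-\; \underbrace{\Bigl(-\sum_X b(X)\,\ln b(X)\Bigr)}_{S(b)\ \text{(entropy)}}
       \;+\; \ln Z
```

Since ln Z is a constant, minimizing U(b) − S(b) over b minimizes the KL divergence without ever computing Z, which is exactly why free-energy approximations are useful here.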
15
Kikuchi Free Energy
We can approximate the total free energy using
Kikuchi free energy.
  • Select a set of clusters of nodes of a factor
    graph
  • All nodes must be in at least one cluster
  • For each factor node in a cluster, all adjacent
    variable nodes must also be included.
  • For each cluster of variables S_i, compute the
    total energy. Sum them together.
  • F_b(S_i) is the KL-divergence between b(S_i) and
    the marginal P(S_i|Y)
  • Now we have double-counted the intersections
    between sets S_i. Subtract the free energy of the
    intersections. Repeat.

Bethe free energy is Kikuchi free energy starting
with all clusters of size 2.
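
For concreteness, the Bethe case can be written in the standard Yedidia-Freeman-Weiss form (notation ours, not the slide's): with factor beliefs b_a, variable beliefs b_i, and d_i the number of factors touching variable i,

```latex
F_{\text{Bethe}}
  = \sum_a \sum_{x_a} b_a(x_a)\,\ln\frac{b_a(x_a)}{f_a(x_a)}
  \;-\; \sum_i (d_i - 1) \sum_{x_i} b_i(x_i)\,\ln b_i(x_i)
```

The (d_i − 1) term is the "subtract the intersections" step above: each variable sits in d_i clusters, so its single-variable entropy has been counted d_i − 1 times too many.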
16
More Advanced Algorithms: Greater accuracy, at a
price
  • Generalized Belief Propagation algorithms have
    been developed to minimize Kikuchi free energy
    (Yedidia, Freeman, Weiss, 2004)
  • The junction-tree algorithm is a special case
  • Alan Yuille (2000) has devised a message passing
    algorithm that minimizes Bethe free energy and is
    guaranteed to converge.
  • Other groups are working on fast, robust Bethe
    minimization (Pretti, Pelizzola, 2003).