Title: Loopy Belief Propagation
1. Loopy Belief Propagation
2. What is inference?
- Given
  - Observed variables Y
  - Hidden variables X
  - Some model of P(X,Y)
- We want to make some analysis of P(X|Y)
  - Estimate the marginal P(S) for S ⊆ X
  - Minimum Mean Squared Error configuration (MMSE)
    - This is just E[X|Y]
  - Maximum A-Posteriori configuration (MAP)
  - N most likely configurations
  - Minimum Variance (MVUE)
3. Representing Structure in P(X,Y)
- Often, P(X,Y) ∝ ∏_k f_k(X_{C_k}), where X_{C_k} ⊆ X ∪ Y

Markov Random Field:
  P(X) = f1(x1,x2,x3) f2(x3,x4) f3(x3,x5) / Z
Bayes Net:
  P(X) = P(x1) P(x2) P(x3|x1,x2) P(x4|x3) P(x5|x3)
Factor Graph:
  P(X) = f1(x1,x2,x3) f2(x3,x4) f3(x3,x5) f4(x1) f5(x2) / Z
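As a concrete illustration of the factor-graph form above, here is a minimal sketch that evaluates P(X) = f1(x1,x2,x3) f2(x3,x4) f3(x3,x5) f4(x1) f5(x2) / Z over binary variables. The factor tables are hypothetical random values; Z is computed by brute-force enumeration, which is only feasible because the example is tiny.

```python
# Sketch: evaluating an unnormalized factor-graph distribution by enumeration.
# The factor values are hypothetical (random), chosen only for illustration.
import itertools
import random

random.seed(0)

def make_factor(nvars):
    """Random nonnegative table over nvars binary variables."""
    return {bits: random.uniform(0.1, 1.0)
            for bits in itertools.product((0, 1), repeat=nvars)}

f1 = make_factor(3)  # f1(x1,x2,x3)
f2 = make_factor(2)  # f2(x3,x4)
f3 = make_factor(2)  # f3(x3,x5)
f4 = make_factor(1)  # f4(x1)
f5 = make_factor(1)  # f5(x2)

def unnorm(x1, x2, x3, x4, x5):
    """Product of all factors: the numerator of P(X)."""
    return (f1[(x1, x2, x3)] * f2[(x3, x4)] * f3[(x3, x5)]
            * f4[(x1,)] * f5[(x2,)])

# Partition function Z sums the factor product over every joint configuration.
Z = sum(unnorm(*xs) for xs in itertools.product((0, 1), repeat=5))

def P(*xs):
    return unnorm(*xs) / Z

# Sanity check: the normalized probabilities sum to 1.
total = sum(P(*xs) for xs in itertools.product((0, 1), repeat=5))
```

The same enumeration costs O(M^N) for N variables of M states each, which is exactly why the sum-product algorithm on the following slides matters.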
4. Sum-Product Algorithm (aka belief update)
Quickly computes every single-variable marginal P(xn) from a tree graph.

Suppose the factor graph is a tree. For the tree to the left, we have
  P(X) ∝ f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
Then marginalization (for example, computing P(x1)) can be sped up by exploiting the factorization:
  P(x1) ∝ Σ_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
        = Σ_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (Σ_{x5} f3(x3,x5)) (Σ_{x6} f4(x4,x6))
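The speed-up above can be checked numerically. This sketch fills the four factors of the slide's tree with hypothetical random values, then computes the unnormalized P(x1) both by brute-force summation over x2..x6 and by pushing the sums over x5 and x6 inside; the two agree exactly.

```python
# Sketch: brute-force vs. factored marginalization for the tree
# P(X) ∝ f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6).
# Factor values are hypothetical (random), for illustration only.
import itertools
import random

random.seed(1)
V = (0, 1)  # binary states

f1 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}
f2 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=3)}
f3 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}
f4 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}

def joint(x1, x2, x3, x4, x5, x6):
    return f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]

# Brute force: sum over all 2^5 settings of x2..x6 for each x1.
brute = {x1: sum(joint(x1, *rest) for rest in itertools.product(V, repeat=5))
         for x1 in V}

# Factored: the sums over x5 and x6 are pushed inside, as on the slide.
g3 = {x3: sum(f3[x3, x5] for x5 in V) for x3 in V}  # Σ_{x5} f3(x3,x5)
g4 = {x4: sum(f4[x4, x6] for x6 in V) for x4 in V}  # Σ_{x6} f4(x4,x6)
fact = {x1: sum(f1[x1, x2] * f2[x2, x3, x4] * g3[x3] * g4[x4]
                for x2 in V for x3 in V for x4 in V)
        for x1 in V}
```

The factored version touches far fewer terms; on larger trees the gap between the two grows exponentially.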
5. Message Passing for Sum-Product
We can compute every marginal P(xn) quickly using a system of message passing.

Message from variable node n to factor node m:
  ν_{n,m}(x_n) = ∏_{i ∈ N(n) \ m} μ_{i,n}(x_n)
Message from factor node m to variable node n:
  μ_{m,n}(x_n) = Σ_{x_{N(m) \ n}} f_m(x_{N(m)}) ∏_{i ∈ N(m) \ n} ν_{i,m}(x_i)
Marginal:
  P(x_n) ∝ ∏_{m ∈ N(n)} μ_{m,n}(x_n)

Each node n can pass a message to neighbor m only once it has received a message from all other adjacent nodes. Intuitively, each message from n to m represents P(x_m | S_n), where S_n is the set of all children of node n.
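The message schedule above can be traced by hand on the slide-4 tree. The sketch below (with the same hypothetical random factors) sends messages inward toward x1: leaf variables send all-ones messages, each factor sums out its other variables, and the resulting P(x1) matches brute-force marginalization.

```python
# Sketch: sum-product messages flowing toward x1 on the tree
# P(X) ∝ f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6).
# Factor values are hypothetical (random), for illustration only.
import itertools
import random

random.seed(2)
V = (0, 1)
f1 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}
f2 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=3)}
f3 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}
f4 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}

ones = {x: 1.0 for x in V}  # message from a leaf variable node

# Factor -> variable messages, computed in dependency order toward x1.
mu_f3_x3 = {x3: sum(f3[x3, x5] * ones[x5] for x5 in V) for x3 in V}
mu_f4_x4 = {x4: sum(f4[x4, x6] * ones[x6] for x6 in V) for x4 in V}
# x3 and x4 each have only one other neighbor (f2), so their variable->factor
# messages to f2 are just the incoming factor messages above.
mu_f2_x2 = {x2: sum(f2[x2, x3, x4] * mu_f3_x3[x3] * mu_f4_x4[x4]
                    for x3 in V for x4 in V) for x2 in V}
mu_f1_x1 = {x1: sum(f1[x1, x2] * mu_f2_x2[x2] for x2 in V) for x1 in V}

# P(x1) is proportional to the product of incoming messages (just one here).
Z = sum(mu_f1_x1.values())
P_x1 = {x1: mu_f1_x1[x1] / Z for x1 in V}

# Brute-force check.
def joint(x1, x2, x3, x4, x5, x6):
    return f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]
brute = {x1: sum(joint(x1, *r) for r in itertools.product(V, repeat=5))
         for x1 in V}
Zb = sum(brute.values())
```

Running the full schedule in both directions would yield every marginal P(xn) at once, which is the point of the algorithm.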
6. Max-Product Algorithm (aka belief revision)
Quickly computes the Maximum A-Posteriori configuration of a tree graph.

Instead of summing P(X), we take the maximum to get the maximal (instead of the marginal):
  M(x1) = max_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
        = max_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (max_{x5} f3(x3,x5)) (max_{x6} f4(x4,x6))
Use the same message-passing system to compute the maximal of each variable.
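Swapping Σ for max in the previous sketch gives the maximal M(x1). The factors below are again hypothetical random tables; the factored maxima agree with brute-force maximization, and the argmax of M(x1) is the MAP value of x1.

```python
# Sketch: max-product on the same tree, with sums replaced by maxima.
# Factor values are hypothetical (random), for illustration only.
import itertools
import random

random.seed(3)
V = (0, 1)
f1 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}
f2 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=3)}
f3 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}
f4 = {k: random.uniform(0.1, 1) for k in itertools.product(V, repeat=2)}

g3 = {x3: max(f3[x3, x5] for x5 in V) for x3 in V}   # max over x5
g4 = {x4: max(f4[x4, x6] for x6 in V) for x4 in V}   # max over x6
M = {x1: max(f1[x1, x2] * f2[x2, x3, x4] * g3[x3] * g4[x4]
             for x2 in V for x3 in V for x4 in V) for x1 in V}

# Brute-force check over all 2^5 settings of x2..x6.
def joint(xs):
    x1, x2, x3, x4, x5, x6 = xs
    return f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]
brute = {x1: max(joint((x1,) + r) for r in itertools.product(V, repeat=5))
         for x1 in V}

# The argmax of M(x1) is the MAP assignment for x1.
map_x1 = max(V, key=lambda x: M[x])
```

Recovering the full MAP configuration additionally requires recording the argmax at each maximization (back-pointers) and backtracking, which this sketch omits.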
7. Computational Cost of Max-Product and Sum-Product
- Each message is of size M, where M is the number of states in the random variable.
  - Usually pretty small.
- Each variable → factor node message requires (N-2)M multiplies, where N is the number of neighbors of the variable node.
  - That's tiny.
- Each factor → variable node message requires summation over N-1 variables, each of size M. Total computation per message is O(N M^N).
  - Not bad, as long as there aren't any hub-like nodes.
8. What if the graph is not a tree?
- Several alternative methods
  - Gibbs sampling
  - Expectation Maximization
  - Variational methods
  - Elimination Algorithm
  - Junction-Tree algorithm
  - Loopy Belief Propagation

9. Elimination Algorithm: Inferring P(x1)
10. Loopy Belief Propagation
- Just apply BP rules in spite of loops
- In each iteration, each node sends all messages in parallel
- Seems to work for some applications
  - Decoding TurboCodes
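The parallel-update rule can be sketched on the smallest loopy case, a three-variable cycle with pairwise potentials (hypothetical random values, written as a pairwise MRF rather than a full factor graph for brevity). Every directed edge message is recomputed each iteration from the previous iteration's messages, and beliefs are read off at the end; on a loopy graph these beliefs are approximations, not exact marginals.

```python
# Sketch: loopy BP with fully parallel message updates on a pairwise 3-cycle MRF.
# Potentials are hypothetical (random); beliefs are approximate marginals.
import itertools
import math
import random

random.seed(4)
V = (0, 1)
edges = [(0, 1), (1, 2), (0, 2)]
psi = {e: {k: random.uniform(0.5, 1.5) for k in itertools.product(V, repeat=2)}
       for e in edges}

def pot(i, j, xi, xj):
    """Pairwise potential, looked up regardless of edge orientation."""
    return psi[(i, j)][xi, xj] if (i, j) in psi else psi[(j, i)][xj, xi]

nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
# m[i][j][xj]: message from variable i to variable j, initialized uniform.
m = {i: {j: {x: 1.0 for x in V} for j in nbrs[i]} for i in nbrs}

for _ in range(200):
    new = {i: {} for i in nbrs}
    for i in nbrs:
        for j in nbrs[i]:
            # m_{i->j}(xj) = sum_xi psi(xi,xj) * prod of other incoming messages
            raw = {xj: sum(pot(i, j, xi, xj)
                           * math.prod(m[k][i][xi] for k in nbrs[i] if k != j)
                           for xi in V) for xj in V}
            s = sum(raw.values())
            new[i][j] = {x: raw[x] / s for x in V}  # normalize for stability
    delta = max(abs(new[i][j][x] - m[i][j][x])
                for i in nbrs for j in nbrs[i] for x in V)
    m = new
    if delta < 1e-12:
        break

# Belief at each variable: normalized product of incoming messages.
belief = {}
for i in nbrs:
    b = {x: math.prod(m[k][i][x] for x in (x,) for k in nbrs[i]) for x in V}
    s = sum(b.values())
    belief[i] = {x: b[x] / s for x in V}
```

Normalizing each message (the `s` division) does not change the beliefs but keeps the iteration numerically stable; on this mild single loop the updates settle quickly, but as the next slide notes, convergence is not guaranteed in general.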
11. Trouble with LBP
- May not converge
  - A variety of tricks can help
- Cycling Error: old information is mistaken as new
- Convergence Error: unlike in a tree, neighbors need not be independent. However, LBP treats them as if they were.

Bolt, van der Gaag, "On the convergence error in loopy propagation" (2004).
12. Good news about MAP in LBP
- For a single loop, MAP values are correct
  - Although the maximals are not
- If LBP converges, the resulting MAP configuration has higher probability than any other configuration in the Single Loops and Trees (SLT) Neighborhood

Example: SLT neighborhoods on a grid

Weiss, Freeman, "On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs" (2001).
13. MMSE in LBP
- If P(X) is jointly Gaussian, LBP will converge to the correct marginals.
- For pairwise-connected Markov random fields, if LBP converges, its marginals will minimize the Bethe free energy.

Weiss, Freeman, "Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology" (2001).
Yedidia, Freeman, Weiss, "Bethe free energy, Kikuchi approximations, and belief propagation algorithms" (2001).
14. Free Energy
Suppose we were able to compute the marginals of a probability distribution b(X) that closely approximated P(X|Y). We would want b(X) to resemble P(X|Y) as much as possible. The free energy F of b(X) is the Kullback-Leibler divergence between b(X) and P(X|Y):
  F = Σ_X b(X) log( b(X) / P(X|Y) )
However, F is difficult to compute. Also, the b(X) we are working with is often ill-defined.
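As a tiny worked example of the KL-divergence form of F, here are two hypothetical three-state distributions: the divergence is positive when b differs from the target and exactly zero when they match, which is why minimizing F drives b(X) toward P(X|Y).

```python
# Sketch: free energy as KL divergence between b(X) and a target distribution.
# The distributions are hypothetical toy values, for illustration only.
import math

p = [0.5, 0.3, 0.2]   # stands in for the target P(X|Y)
b = [0.4, 0.4, 0.2]   # approximating distribution b(X)

# F = sum_x b(x) log( b(x) / p(x) )  >= 0, with equality iff b == p.
F = sum(bi * math.log(bi / pi) for bi, pi in zip(b, p))

# KL of a distribution with itself is zero.
F_self = sum(pi * math.log(pi / pi) for pi in p)
```

In practice P(X|Y) itself is intractable, which is why the next slide approximates F over small clusters instead of the full joint.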
15. Kikuchi Free Energy
We can approximate the total free energy using the Kikuchi free energy.
- Select a set of clusters of nodes of the factor graph
  - All nodes must be in at least one cluster
  - For each factor node in a cluster, all adjacent variable nodes must also be included
- For each cluster of variables S_i, compute the free energy. Sum them together.
  - F_b(S_i) is the KL-divergence between b(S_i) and the marginal P(S_i|Y)
- Now we have double-counted the intersections between sets S_i. Subtract the free energy of the intersections. Repeat.

Bethe free energy is Kikuchi free energy starting with all clusters of size 2.
16. More Advanced Algorithms: Greater accuracy, at a price
- Generalized Belief Propagation algorithms have been developed to minimize Kikuchi free energy (Yedidia, Freeman, Weiss, 2004)
  - The junction-tree algorithm is a special case
- Alan Yuille (2000) has devised a message-passing algorithm that minimizes Bethe free energy and is guaranteed to converge.
- Other groups are working on fast, robust Bethe minimization (Pretti, Pelizzola 2003).