Title: Some Surprises in the Theory of Generalized Belief Propagation
1 Some Surprises in the Theory of Generalized Belief Propagation
- Jonathan Yedidia
- Mitsubishi Electric Research Labs (MERL)
- Collaborators
- Bill Freeman (MIT)
- Yair Weiss (Hebrew University)
See "Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms," MERL TR2004-040, to be published in IEEE Trans. Info. Theory.
2 Outline
- Introduction to GBP
- Quick Review of Standard BP
- Free Energies
- Region Graphs and Valid Approximations
- Some Surprises
- Maxent Normal Approximations
- a useful heuristic
3 Factor Graphs
(Kschischang et al., 2001)
4 Computing Marginal Probabilities
Fundamental for
- Decoding error-correcting codes
- Inference in Bayesian networks
- Computer vision
- Statistical physics of magnets
Non-trivial because of the huge number of terms
in the sum.
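To make the cost concrete, here is a minimal sketch of the brute-force sum for binary variables. The function name, the factor representation, and the three-variable example are assumptions made for illustration; none of them come from the talk.

```python
from itertools import product

def brute_force_marginals(n_vars, factors):
    """Exact single-variable marginals of p(x) proportional to the product of factors.

    factors is a list of (variables, function) pairs; each function maps a
    tuple of 0/1 states of its variables to a non-negative weight.
    """
    totals = [[0.0, 0.0] for _ in range(n_vars)]
    z = 0.0
    for x in product((0, 1), repeat=n_vars):      # 2**n_vars joint states
        w = 1.0
        for variables, f in factors:
            w *= f(tuple(x[v] for v in variables))
        z += w
        for i, xi in enumerate(x):
            totals[i][xi] += w
    return [[t / z for t in row] for row in totals], z

def agree(x):
    """Hypothetical pairwise factor that favours equal states."""
    return 2.0 if x[0] == x[1] else 1.0

def bias(x):
    """Hypothetical single-variable factor that favours state 1."""
    return (1.0, 3.0)[x[0]]

# Hypothetical example: chain 0 - 1 - 2 with a bias on variable 0.
marginals, z = brute_force_marginals(
    3, [((0,), bias), ((0, 1), agree), ((1, 2), agree)]
)
print(marginals, z)
```

The loop visits every joint state, so the running time doubles with each added binary variable. That is why the sum is impractical for the applications listed above.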
5 Error-correcting Codes
(Tanner, 1981; Gallager, 1963)
Marginal probabilities = a posteriori bit probabilities
6 Statistical Physics
Marginal probabilities = local magnetization
7 Standard Belief Propagation
[Figure labels: beliefs, messages]
The belief is the BP approximation of the marginal probability.
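In the standard formulation, the beliefs are written as products of incoming messages. The block below is a reconstruction in common notation, not a copy of the slide. It assumes that $m_{a \to i}$ is the message from factor node $a$ to variable node $i$, $N(\cdot)$ is the set of neighbors of a node, and $X_a$ is the joint state of the variables attached to $a$.

```latex
\begin{align}
b_i(x_i) &\propto \prod_{a \in N(i)} m_{a \to i}(x_i) \\
b_a(X_a) &\propto f_a(X_a) \prod_{i \in N(a)} \; \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)
\end{align}
```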
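8 BP Message-update Rules
Using the marginalization condition
$$b_i(x_i) = \sum_{X_a \setminus x_i} b_a(X_a)$$
we get
$$m_{a \to i}(x_i) = \sum_{X_a \setminus x_i} f_a(X_a) \prod_{j \in N(a) \setminus i} \; \prod_{b \in N(j) \setminus a} m_{b \to j}(x_j)$$
where $N(\cdot)$ denotes the neighbors of a node.
[Figure: message from factor node a to variable node i]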
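The message-update rule can be exercised on a small tree, where BP is exact. The following is a minimal sketch assuming binary variables, synchronous updates, and a hypothetical three-variable chain; the function and factor names are illustrative only.

```python
from itertools import product

def f_pair(x):
    """Hypothetical pairwise factor that favours equal states."""
    return 2.0 if x[0] == x[1] else 1.0

def f_bias(x):
    """Hypothetical single-variable factor that favours state 1."""
    return (1.0, 3.0)[x[0]]

# Chain 0 - 1 - 2 with a bias on variable 0 (a tree, so BP is exact).
factors = {"g": ((0,), f_bias), "a": ((0, 1), f_pair), "b": ((1, 2), f_pair)}
n_vars = 3

def run_bp(n_vars, factors, sweeps=10):
    # m[(a, i)][x_i]: message from factor a to variable i, initialised uniform.
    m = {(a, i): [1.0, 1.0] for a, (vs, _) in factors.items() for i in vs}
    for _ in range(sweeps):
        new = {}
        for a, (vs, f) in factors.items():
            for i in vs:
                out = [0.0, 0.0]
                for xa in product((0, 1), repeat=len(vs)):
                    term = f(xa)
                    for j, xj in zip(vs, xa):
                        if j == i:
                            continue
                        # Messages into j from every factor except a.
                        for (b, k), msg in m.items():
                            if k == j and b != a:
                                term *= msg[xj]
                    out[xa[vs.index(i)]] += term
                s = sum(out)
                new[(a, i)] = [o / s for o in out]   # normalise for stability
        m = new
    beliefs = []
    for i in range(n_vars):
        bel = [1.0, 1.0]
        for (a, k), msg in m.items():
            if k == i:
                bel = [bel[0] * msg[0], bel[1] * msg[1]]
        s = sum(bel)
        beliefs.append([v / s for v in bel])
    return beliefs

beliefs = run_bp(n_vars, factors)
print(beliefs)
```

On a graph with cycles the same updates can still be iterated, but the resulting beliefs are then only approximations of the marginals.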
9 Variational (Gibbs) Free Energy
Kullback-Leibler Distance
Boltzmann's Law (serves to define the energy)
Variational free energy is minimized when the belief equals the true distribution, b(x) = p(x).
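The three quantities named on this slide are related as follows. This is a reconstruction in standard notation, assuming the temperature is absorbed into the energy $E(x)$ and that $Z$ is the partition function.

```latex
\begin{align}
D(b \,\|\, p) &= \sum_{x} b(x) \ln \frac{b(x)}{p(x)} \\
p(x) &= \frac{1}{Z} e^{-E(x)} \\
F(b) &= \sum_{x} b(x) E(x) + \sum_{x} b(x) \ln b(x) = -\ln Z + D(b \,\|\, p)
\end{align}
```

Since $D(b \,\|\, p) \ge 0$ with equality only at $b = p$, the minimum value of $F$ is $-\ln Z$.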
10 So we've replaced the intractable problem of computing marginal probabilities with the even more intractable problem of minimizing a variational free energy over the space of possible beliefs. But the point is that now we can introduce interesting approximations.
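11 Region-based Approximations to the Variational Free Energy
(Kikuchi, 1951)
Exact: free energy as a functional of the full joint belief b(x) (intractable)
Regions: free energy as a functional of the region beliefs b_r(x_r)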
12 Defining a Region
- A region r is a set of variable nodes V_r and factor nodes F_r such that if a factor node a belongs to F_r, all variable nodes neighboring a must belong to V_r.
[Figure: examples labeled "Regions" and "Not a Region"]
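The definition translates directly into a check. This is a sketch, with the function name chosen for illustration; the adjacency used is the factor graph listed on the Bethe Method slide later in the talk.

```python
def is_region(variables, factors, neighbors):
    """True if every factor node in the set has all of its variable nodes in the set.

    neighbors maps each factor node to the set of variable nodes attached to it.
    """
    return all(neighbors[a] <= set(variables) for a in factors)

# Factor graph taken from the Bethe Method slide (factor -> attached variables).
neighbors = {
    "A": {1, 2, 4, 5}, "B": {2, 3, 5, 6}, "C": {4, 5},
    "D": {5, 6}, "E": {4, 5, 7, 8}, "F": {5, 6, 8, 9},
}

ok_with_factor = is_region({4, 5}, {"C"}, neighbors)    # C's neighbors are present
missing_variable = is_region({4}, {"C"}, neighbors)     # variable 5 is missing
variables_only = is_region({2, 5}, set(), neighbors)    # no factors, nothing to check
print(ok_with_factor, missing_variable, variables_only)
```

A set of variable nodes with no factor nodes always passes, which is why intersection regions such as {2,5} are legitimate.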
13 Region Definitions
Region states
Region beliefs
Region energy
Region average energy
Region entropy
Region free energy
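The formulas behind these six labels did not survive extraction. The standard definitions from the region-based free energy literature are given below as a reconstruction; here $x_r$ is the joint state of the variable nodes in $V_r$, and $f_a$ is the function attached to factor node $a$.

```latex
\begin{align}
\text{Region states:} \quad & x_r = \{ x_i : i \in V_r \} \\
\text{Region beliefs:} \quad & b_r(x_r) \\
\text{Region energy:} \quad & E_r(x_r) = -\sum_{a \in F_r} \ln f_a(x_a) \\
\text{Region average energy:} \quad & U_r(b_r) = \sum_{x_r} b_r(x_r) E_r(x_r) \\
\text{Region entropy:} \quad & H_r(b_r) = -\sum_{x_r} b_r(x_r) \ln b_r(x_r) \\
\text{Region free energy:} \quad & F_r(b_r) = U_r(b_r) - H_r(b_r)
\end{align}
```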
14 Valid Approximations
Introduce a set of regions R, and a counting number c_r for each region r in R, such that c_r = 1 for the largest regions, and for every factor node a and variable node i,
$$\sum_{r \in R} c_r \, I(a \in F_r) = \sum_{r \in R} c_r \, I(i \in V_r) = 1,$$
where the $I(\cdot)$ are indicator functions.
Count every node once!
15 Entropy and Energy
- Counting each factor node once makes the approximate energy exact (if the beliefs are).
- Counting each variable node once makes the approximate entropy reasonable (at least the entropy is correct when all states are equiprobable).
16 Comments
- We could actually use different counting numbers for energy and entropy, which leads to fractional BP (Wiegerinck and Heskes, 2002) or convexified free energy (Wainwright et al., 2002) algorithms.
- Of course, we also need to impose normalization and consistency constraints on our free energy.
17 Methods to Generate Valid Region-based Approximations
Region Graphs
Junction graphs
Cluster Variation Method (Kikuchi)
Aji-McEliece
Junction trees
Bethe
(Bethe is an example of Kikuchi for factor graphs with no 4-cycles; Bethe is an example of Aji-McEliece for normal factor graphs.)
18 Example of a Region Graph
Top row: [A,C,1,2,4,5] [B,D,2,3,5,6] [C,E,4,5,7,8] [D,F,5,6,8,9]
Middle row: [2,5] [C,4,5] [D,5,6] [5,8]
Bottom row: [5]
[5] is a child of [2,5].
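19 Definition of a Region Graph
- Labeled, directed graph of regions.
- An arc may exist from region A to region B if B is a subset of A.
- Sub-graphs formed from regions containing a given node are connected.
- $c_r = 1 - \sum_{s \in A(r)} c_s$, where $A(r)$ is the set of ancestors of region r. (Möbius function)
- We insist that every factor node and every variable node is counted exactly once: $\sum_{r} c_r \, I(a \in F_r) = \sum_{r} c_r \, I(i \in V_r) = 1$.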
20 Bethe Method
(after Bethe, 1935)
Two sets of regions:
- Large regions, each containing a single factor node a and all attached variable nodes.
- Small regions, each containing a single variable node i.
[Figure: Bethe region graph. Large regions: [A,1,2,4,5] [B,2,3,5,6] [C,4,5] [D,5,6] [E,4,5,7,8] [F,5,6,8,9]. Small regions: [1] [2] [3] [4] [5] [6] [7] [8] [9].]
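21 Bethe Approximation to Gibbs Free Energy
Equal to the exact Gibbs free energy when the factor graph is a tree, because in that case
$$b(x) = \prod_a b_a(x_a) \prod_i \big( b_i(x_i) \big)^{1 - d_i},$$
where $d_i$ is the number of factor nodes neighboring variable node i.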
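The tree claim can be checked numerically. The sketch below assumes the usual Bethe form, with counting number 1 for each factor region and 1 - d_i for each variable region (d_i being the number of factors attached to variable i). The three-variable chain is a hypothetical example: the code evaluates the Bethe free energy at the exact marginals and compares it with -ln Z.

```python
from itertools import product
from math import log

# Hypothetical tree factor graph: chain 0 - 1 - 2 with a bias on variable 0.
factors = [
    ((0,), lambda x: (1.0, 3.0)[x[0]]),
    ((0, 1), lambda x: 2.0 if x[0] == x[1] else 1.0),
    ((1, 2), lambda x: 2.0 if x[0] == x[1] else 1.0),
]
n_vars = 3

def weight(x):
    """Unnormalised probability of the joint state x."""
    w = 1.0
    for vs, f in factors:
        w *= f(tuple(x[v] for v in vs))
    return w

states = list(product((0, 1), repeat=n_vars))
Z = sum(weight(x) for x in states)

def marginal(vs):
    """Exact marginal over the variables vs, by brute-force summation."""
    out = {}
    for x in states:
        key = tuple(x[v] for v in vs)
        out[key] = out.get(key, 0.0) + weight(x) / Z
    return out

bethe = 0.0
for vs, f in factors:                      # large regions, counting number 1
    b_a = marginal(vs)
    bethe += sum(p * log(p / f(xa)) for xa, p in b_a.items() if p > 0)
for i in range(n_vars):                    # small regions, counting number 1 - d_i
    d_i = sum(i in vs for vs, _ in factors)
    b_i = marginal((i,))
    bethe -= (d_i - 1) * sum(p * log(p) for p in b_i.values() if p > 0)

exact = -log(Z)
print(bethe, exact)
```

On a graph with cycles the two numbers generally differ, which is the sense in which the Bethe free energy is an approximation.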
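22 Minimizing the Bethe Free Energy
23 Bethe = BP
Identify the Lagrange multipliers of the constrained minimization with logarithms of products of messages to obtain the BP equations.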
24 Cluster Variation Method
(Kikuchi, 1951)
Form a region graph with an arbitrary number of different-sized regions. Start with the largest regions. Then find the intersection regions of the largest regions, discarding any regions that are sub-regions of other intersection regions. Continue finding intersections of those intersection regions, etc. All intersection regions obey $c_r = 1 - \sum_{s \in S(r)} c_s$, where $S(r)$ is the set of super-regions of region r.
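The construction can be sketched in a few lines. The function names and the encoding of a region as a string of node labels (letters for factor nodes, digits for variable nodes) are assumptions made for illustration. The four largest regions are the ones used in the talk's region-graph example.

```python
from itertools import combinations

def cvm_regions(largest):
    """Largest regions plus successive levels of intersection regions."""
    levels = [{frozenset(r) for r in largest}]
    while True:
        inter = {a & b for a, b in combinations(levels[-1], 2)}
        inter.discard(frozenset())
        # Discard intersections that are sub-regions of other intersections.
        inter = {r for r in inter if not any(r < s for s in inter)}
        if not inter:
            return [r for level in levels for r in level]
        levels.append(inter)

def counting_numbers(regions):
    """c_r = 1 - sum of c_s over the super-regions s of r."""
    c = {}
    for r in sorted(regions, key=len, reverse=True):   # supersets come first
        c[r] = 1 - sum(c[s] for s in c if r < s)
    return c

largest = ["AC1245", "BD2356", "CE4578", "DF5689"]
regions = cvm_regions(largest)
c = counting_numbers(regions)
for r in regions:
    print("".join(sorted(r)), c[r])

# Validity: every factor node and every variable node is counted exactly once.
valid = all(
    sum(c[r] for r in regions if node in r) == 1 for node in "ABCDEF123456789"
)
print(valid)
```

For these inputs the procedure yields the four largest regions with counting number 1, the four intersection regions [2,5], [C,4,5], [D,5,6], [5,8] with counting number -1, and the region [5] with counting number 1.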
25 Region Graph Created Using CVM
Top row: [A,C,1,2,4,5] [B,D,2,3,5,6] [C,E,4,5,7,8] [D,F,5,6,8,9]
Middle row: [2,5] [C,4,5] [D,5,6] [5,8]
Bottom row: [5]
26 Minimizing a Region Graph Free Energy
- Minimization is possible, but it may be awkward because of all the constraints that must be satisfied.
- We introduce generalized belief propagation algorithms whose fixed points are provably identical to the stationary points of the region graph free energy.
27 Generalized Belief Propagation
- Belief in a region is the product of:
  - Local information (factors in region)
  - Messages from parent regions
  - Messages into descendant regions from parents who are not descendants.
- Message-update rules obtained by enforcing marginalization constraints.
28-31 Generalized Belief Propagation
[Figures: 3x3 lattice of variable nodes 1-9]
32-35 Generalized Belief Propagation
Use Marginalization Constraints to Derive Message-Update Rules
[Figures: 3x3 lattice of variable nodes 1-9, shown singly and as two side-by-side copies]
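36 (Mild) Surprise 1
- Region beliefs (even those given by Bethe/BP) may not be realizable as the marginals of any global belief.
[Figure: factor graph with variable nodes 1, 2, 3 and factor nodes a, b, c]
The set of beliefs shown on the slide is a perfectly acceptable solution to BP, but cannot arise from any global belief.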
37 (Minor) Surprise 2
- For some sets of beliefs (sometimes even when the marginal beliefs are exactly the correct ones), the Bethe entropy is negative!
Each large region has counting number 1 and each small region has counting number -2, so if the beliefs are
then the entropy is negative.
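The slide's own example did not survive extraction, so the sketch below uses an assumed one with the same counting numbers: a fully connected pairwise graph on four binary variables, where each variable touches three factors and so has counting number 1 - 3 = -2. The beliefs are the exact marginals of a global distribution that puts probability 1/2 on all-zeros and 1/2 on all-ones.

```python
from itertools import combinations
from math import log

def entropy(belief):
    """Shannon entropy (in nats) of a belief given as {state: probability}."""
    return -sum(p * log(p) for p in belief.values() if p > 0)

n = 4
pairs = list(combinations(range(n), 2))                        # 6 pairwise factors
degree = {i: sum(i in pr for pr in pairs) for i in range(n)}   # 3 for every variable

# Exact marginals of the global distribution 1/2 [all zeros] + 1/2 [all ones].
pair_belief = {(0, 0): 0.5, (1, 1): 0.5}
single_belief = {(0,): 0.5, (1,): 0.5}

bethe_entropy = (
    sum(1 * entropy(pair_belief) for _ in pairs)
    + sum((1 - degree[i]) * entropy(single_belief) for i in range(n))
)
true_entropy = log(2)   # the global distribution has two equally likely states
print(bethe_entropy, true_entropy)
```

Under these assumptions the Bethe entropy comes to 6 ln 2 - 8 ln 2 = -2 ln 2, even though the true entropy is ln 2.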
38 (Serious) Surprise 3
- When there are no interactions, the minimum of the CVM free energy should correspond to the equiprobable global distribution, but sometimes it doesn't!
39 Example
Fully connected pairwise model, using CVM and all
triplets as the largest regions, with pairs and
singlets as the intersection regions.
40 Two simple distributions
- Equiprobable distribution: each state is equally probable for all beliefs. For binary variables, each region r gives entropy $N_r \ln 2$, where $N_r$ is the number of variable nodes in r.
- Equipolarized distribution: obtained by marginalizing the global distribution that only allows all-zeros or all-ones. Each region gives entropy $\ln 2$.
41 Surprise 3 (cont.)
- Surprisingly, the equipolarized distribution can have a greater entropy than the equiprobable distribution for some valid CVM approximations.
- This is a serious problem, because if the model gets the wrong answer without any interactions, it can't be expected to be correct with interactions.
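For the fully connected pairwise model with all triplets as the largest regions, the comparison can be worked out from the counting numbers alone. The sketch below assumes binary variables, so a region with k variables has entropy k ln 2 under the equiprobable beliefs and ln 2 under the equipolarized beliefs. The function names are illustrative.

```python
from math import comb, log

def triplet_cvm_counting_numbers(n):
    """Counting numbers for triplets, pairs and singlets on n fully connected variables."""
    c_triplet = 1
    # Each pair lies in n - 2 triplets.
    c_pair = 1 - (n - 2) * c_triplet
    # Each singlet lies in comb(n - 1, 2) triplets and in n - 1 pairs.
    c_single = 1 - comb(n - 1, 2) * c_triplet - (n - 1) * c_pair
    return c_triplet, c_pair, c_single

def cvm_entropies(n):
    """(equiprobable, equipolarized) region-based entropies in nats."""
    c3, c2, c1 = triplet_cvm_counting_numbers(n)
    n3, n2, n1 = comb(n, 3), comb(n, 2), n
    equiprobable = (3 * c3 * n3 + 2 * c2 * n2 + 1 * c1 * n1) * log(2)
    equipolarized = (c3 * n3 + c2 * n2 + c1 * n1) * log(2)
    return equiprobable, equipolarized

for n in range(4, 9):
    eq, pol = cvm_entropies(n)
    print(n, round(eq / log(2), 6), round(pol / log(2), 6))
```

In units of ln 2, the equiprobable entropy is always n, while the equipolarized entropy is n(n^2 - 6n + 11)/6. The two tie at n = 5, and the equipolarized entropy is larger from n = 6 onward.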
42 The Fix: Consider only Maxent-Normal Approximations
- A maxent-normal approximation is one which gives a maximum of the entropy when the beliefs correspond to the equiprobable distribution.
- Bethe approximations are provably maxent-normal.
- Some other CVM and other region-graph approximations are also provably maxent-normal.
  - Using quartets on square lattices
  - Empirically, these approximations give good results
- Other valid CVM or region-graph approximations are not maxent-normal.
  - Empirically, these approximations give poor results
43 A Very Simple Heuristic
- Sum all the counting numbers.
- If the sum is greater than N (the number of variable nodes), the equipolarized distribution will have a greater entropy than the equiprobable distribution. (Too Much: Very Bad)
- If the sum is less than 0, the equipolarized distribution will have a negative entropy. (Too Little)
- If the sum equals 1, the equipolarized distribution will have the correct entropy. (Just Right)
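The heuristic amounts to a single sum. In the sketch below the function name and the "inconclusive" label for the remaining cases are assumptions. The three inputs are the counting numbers of the CVM region graph from the talk's example, of a Bethe approximation on four fully connected binary variables, and of the all-triplets CVM approximation on six variables.

```python
def counting_number_check(counting_numbers, n_variables):
    """Classify an approximation by the sum of its counting numbers."""
    total = sum(counting_numbers)
    if total > n_variables:
        return "too much"       # equipolarized entropy exceeds equiprobable entropy
    if total < 0:
        return "too little"     # equipolarized entropy is negative
    if total == 1:
        return "just right"     # equipolarized entropy is correct
    return "inconclusive"       # label assumed here; the heuristic is silent on these cases

# Region graph from the talk's CVM example: 4 large, 4 intersection, 1 singlet, 9 variables.
cvm_example = counting_number_check([1] * 4 + [-1] * 4 + [1], 9)
# Bethe on 4 fully connected variables: 6 pair regions, 4 singlets with c = -2.
bethe_k4 = counting_number_check([1] * 6 + [-2] * 4, 4)
# All-triplets CVM on 6 variables: 20 triplets, 15 pairs with c = -3, 6 singlets with c = 6.
triplets_6 = counting_number_check([1] * 20 + [-3] * 15 + [6] * 6, 6)
print(cvm_example, bethe_k4, triplets_6)
```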
44 10x10 Ising Spin Glass
Random fields
Random interactions
45 (No Transcript)
46 Conclusions
- Standard BP is essentially equivalent to minimizing the Bethe free energy.
- The Bethe method and the cluster variation method are special cases of the more general region graph method for generating valid free energy approximations.
- GBP is essentially equivalent to minimizing the region graph free energy.
- One should be careful to use maxent-normal approximations.
- Sometimes you can prove that your approximation is maxent-normal, and simple heuristics can be used to prove when it isn't.