
Optimization With Parity Constraints: From Binary Codes to Discrete Integration

- Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman
- Cornell University
- IBM Watson Research Center
- UAI 2013

High-dimensional integration

- High-dimensional integrals in statistics, ML, physics
- Expectations / model averaging
- Marginalization
- Partition function / rank models / parameter learning
- Curse of dimensionality
- Quadrature involves a weighted sum over an exponential number of items (e.g., units of volume)

[Figure: an n-dimensional hypercube with side lengths L1, L2, L3, ..., Ln]

Discrete Integration

[Figure: 2^n items; size visually represents weight]

- We are given
- A set of 2^n items
- Non-negative weights w
- Goal: compute total weight
- Compactly specified weight function
- factored form (Bayes net, factor graph, CNF, ...)
- Example 1: n = 2 variables, sum over 4 items with weights 5, 0, 2, 1. Goal: compute 5 + 0 + 2 + 1 = 8
- Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable)
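To ground the "intractable" claim, here is a minimal Python sketch of discrete integration by brute force over a small factored model (the factors and names are hypothetical illustrations, not from the slides):

```python
import itertools

# Toy factored weight function over n = 3 binary variables:
# w(x) = f1(x0, x1) * f2(x1, x2), each factor given as a lookup table.
# (Hypothetical factors for illustration.)
f1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
f2 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 4.0, (1, 1): 0.5}

def weight(x):
    """Weight of one item (one variable assignment) in factored form."""
    return f1[(x[0], x[1])] * f2[(x[1], x[2])]

n = 3
# Brute-force discrete integration: a sum over all 2^n items.
# Fine at n = 3 (8 items); hopeless at n = 100 (~10^30 items).
Z = sum(weight(x) for x in itertools.product((0, 1), repeat=n))
print(Z)  # 27.0 for this toy model
```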

Hardness

[Figure: complexity hierarchy, from Easy to Hard: P, NP, PH, PP, PSPACE, EXP]

- 0/1 weights case
- Is there at least a 1? → SAT
- How many 1s? → #SAT
- NP-complete vs. #P-complete. Much harder
- General weights
- Find heaviest item (combinatorial optimization, MAP)
- Sum weights (discrete integration)
- ICML-13 WISH: Approximate Discrete Integration via Optimization. E.g., partition function via MAP inference
- MAP inference often fast in practice
- Relaxations / bounds
- Pruning

WISH: Integration by Hashing and Optimization

- The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor
- Outer loop over n variables:
- MAP inference on the model augmented with random parity constraints
- Repeat log(n) times
- Aggregate MAP inference solutions (see the sketch below)

[Figure: AUGMENTED MODEL — the original graphical model over n binary variables s ∈ {0,1}^n, plus parity check nodes enforcing A s = b (mod 2)]
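A minimal Python sketch of this loop, assuming a user-supplied map_solver(A, b) that returns max_x w(x) subject to A x = b (mod 2) (the solver interface and helper names are mine, not from the paper); the final aggregation M_0 + Σ_i M_{i+1} 2^i follows the ICML-13 WISH paper:

```python
import math
import random
from statistics import median

def wish(n, map_solver, delta=0.01):
    """Sketch of WISH over n binary variables. map_solver(A, b) is assumed
    to return max_x w(x) subject to A x = b (mod 2); with i = 0 the
    constraint set is empty, so it must return the unconstrained MAP value."""
    T = math.ceil(math.log(n / delta))      # repetitions per constraint level
    M = []
    for i in range(n + 1):                  # i = number of parity constraints
        vals = []
        for _ in range(T):
            # Random parity constraints: A in {0,1}^{i x n}, b in {0,1}^i.
            A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
            b = [random.randint(0, 1) for _ in range(i)]
            vals.append(map_solver(A, b))
        M.append(median(vals))
    # Aggregate: Z_hat = M_0 + sum_i M_{i+1} * 2^i
    return M[0] + sum(M[i + 1] * 2 ** i for i in range(n))
```

On tiny models, a brute-force map_solver lets this estimate be sanity-checked against exact enumeration.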

Visual working of the algorithm

- How it works: starting from the function to be integrated, add 1, 2, 3, ..., n random parity constraints; at each level, repeat log(n) times and take the median of the MAP values:
- Mode M0 (no constraints)
- 1 random parity constraint → median M1
- 2 random parity constraints → median M2
- 3 random parity constraints → median M3
- ... (n levels in total)

Accuracy Guarantees

- Theorem [ICML-13]: With probability at least 1 − δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
- Theorem [ICML-13]: Can improve the approximation factor to (1 + ε) by adding extra variables and factors.
- Example: factor-2 approximation with 4n variables
- Remark: faster than enumeration only when combinatorial optimization is efficient

Summary of contributions

- Introduction and previous work
- WISH: Approximate Discrete Integration via Optimization
- Partition function / marginalization via MAP inference
- Accuracy guarantees
- MAP inference subject to parity constraints
- Tractable cases and approximations
- Integer Linear Programming formulation
- New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (will reach within a constant factor)
- Sparsity of the parity constraints
- Techniques to improve solution time and bounds quality
- Experimental improvements over variational techniques

MAP inference with parity constraints

- Hardness, approximations, and bounds

Making WISH more scalable

- Would approximations to the optimization (MAP inference with parity constraints) be useful? YES
- Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral); see the sketch below
- Lower bounds (local search) on MAP → lower bounds on Z
- Upper bounds (LP, SDP relaxation) on MAP → upper bounds on Z
- Constant-factor approximations on MAP → constant factor on Z
- Question: Are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?
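A sketch of the bound-translation point, assuming per-level bounds on the MAP medians are available (e.g., from local search and LP relaxations; the function and inputs are hypothetical): the WISH aggregation is monotone in each median, so plugging in MAP bounds yields bounds on Z (still probabilistic, up to the constant factor of the theorem).

```python
def z_bounds_from_map_bounds(M_lower, M_upper):
    """Translate per-level MAP bounds into bounds on the partition function Z.
    M_lower[i] / M_upper[i] bound the median MAP value with i parity
    constraints; the WISH aggregation is monotone in each entry."""
    n = len(M_lower) - 1
    z_lb = M_lower[0] + sum(M_lower[i + 1] * 2 ** i for i in range(n))
    z_ub = M_upper[0] + sum(M_upper[i + 1] * 2 ** i for i in range(n))
    return z_lb, z_ub

# Hypothetical per-level bounds for a model over n = 3 variables:
print(z_bounds_from_map_bounds([4.0, 2.0, 1.0, 0.5], [5.0, 3.0, 2.0, 1.0]))
# (10.0, 16.0): a lower and an upper bound on Z
```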

Error correcting codes

- Communication over a noisy channel
- Bob: "There has been a transmission error! What was the message actually sent by Alice?"
- Must be a valid codeword
- As close as possible to received message y

[Figure: x = 01001 → noisy channel → y = 01101]

- Redundant parity check bit: 0 XOR 1 XOR 0 XOR 0 = 1
- Error detected: received parity bit 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0
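A short check makes the parity example concrete (a sketch; the variable names are mine):

```python
def parity_bit(data_bits):
    """XOR of the data bits: the redundant parity check bit."""
    p = 0
    for bit in data_bits:
        p ^= bit
    return p

sent = [0, 1, 0, 0, 1]      # data 0100, parity bit 1 = 0^1^0^0
received = [0, 1, 1, 0, 1]  # one bit flipped by the channel

# Recompute parity from the received data bits and compare.
expected = parity_bit(received[:-1])   # 0^1^1^0 = 0
print(expected != received[-1])        # True: transmission error detected
```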

Decoding a binary code

- Max-likelihood decoding: find the valid codeword closest to the received word (decode y = 01101 back to x = 01001)

[Figure: x → noisy channel → y]

ML-decoding graphical model

- Noisy channel model relating x and y
- Transmitted string must be a codeword
- More complex probabilistic model
- MAP inference is NP-hard to approximate within any constant factor [Stern; Arora, ...]
- Max w(x) subject to A x = b (mod 2): equivalent to MAP inference on the augmented model
- LDPC codes are routinely solved in practice (10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, ...)
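To make "Max w(x) subject to A x = b (mod 2)" concrete, here is a brute-force maximum-likelihood decoder sketch (exhaustive, so only for tiny codes; Hamming distance is the standard likelihood for a binary symmetric channel, and all names here are mine):

```python
import itertools

def satisfies_parity(A, x, b):
    """Check A x = b (mod 2) for a 0/1 matrix A and 0/1 vectors x, b."""
    return all(sum(a * xi for a, xi in zip(row, x)) % 2 == bi
               for row, bi in zip(A, b))

def ml_decode(A, b, y):
    """Brute-force ML decoding: among all valid codewords (A x = b mod 2),
    return one closest in Hamming distance to the received word y."""
    n = len(y)
    best = None
    for x in itertools.product((0, 1), repeat=n):
        if satisfies_parity(A, x, b):
            dist = sum(xi != yi for xi, yi in zip(x, y))
            if best is None or dist < best[0]:
                best = (dist, x)
    return best[1]

# Code from the slides: fifth bit is the parity of the first four,
# i.e., x0 ^ x1 ^ x2 ^ x3 ^ x4 = 0 (even overall parity).
A = [[1, 1, 1, 1, 1]]
b = [0]
# 01101 is invalid; any single bit flip repairs parity, so several
# codewords are tied at distance 1 and one of them is returned.
print(ml_decode(A, b, [0, 1, 1, 0, 1]))
```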

Decoding via Integer Programming

- MAP inference subject to parity constraints encoded as an Integer Linear Program (ILP); a simple encoding is sketched below
- Standard MAP encoding
- Compact (polynomial) encoding by Yannakakis for parity constraints
- LP relaxation: relax the integrality constraint
- Polynomial-time upper bounds
- ILP solving strategy: cuts + branching + LP relaxations
- Solve a sequence of LP relaxations
- Upper and lower bounds that improve over time
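One simple way to see that a parity constraint is ILP-expressible is the textbook auxiliary-variable trick (not the compact Yannakakis encoding mentioned above): x1 + ... + xk = b + 2t with an extra integer variable t. A sketch using the PuLP modeling library (assumed installed; weights are hypothetical):

```python
import pulp

# Maximize a linear objective subject to the parity constraint
# x0 ^ x1 ^ x2 = 1, encoded as x0 + x1 + x2 = 1 + 2*t with integer t.
prob = pulp.LpProblem("parity_map", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(3)]
t = pulp.LpVariable("t", lowBound=0, upBound=1, cat="Integer")

w = [1.0, 2.0, 3.0]                                  # hypothetical weights
prob += pulp.lpSum(wi * xi for wi, xi in zip(w, x))  # objective
prob += pulp.lpSum(x) == 1 + 2 * t                   # parity: odd number of ones

prob.solve()
print([int(v.value()) for v in x])  # [1, 1, 1]: odd parity, objective 6
```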

Iterative bound tightening

- Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time
- Recall: bounds on optimization (MAP) → (probabilistic) bounds on the partition function Z. New family of bounds.
- WISH: when MAP is solved to optimality (LowerBound = UpperBound), guaranteed constant-factor approximation of Z

Sparsity of the parity constraints

- Improving solution time and bounds quality

Inducing sparsity

- Observations
- Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes)
- Quality of the LP relaxation depends on A and b, not just on the solution space. Elementary row operations (e.g., summing 2 equations) do not change the solution space but affect the LP relaxation.
- Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field); see the sketch below
- Greedy application of elementary row operations

[Figure: matrix A in row-echelon form]
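A minimal sketch of Gaussian elimination over GF(2) (the function name is mine): rows are combined by XOR, which is exactly the elementary row operation referred to above, so the reduced system has the same solution space.

```python
def gf2_row_echelon(A, b):
    """Reduce [A | b] to row-echelon form over GF(2). XOR-ing rows
    preserves the solution set of A x = b (mod 2)."""
    rows, cols = len(A), len(A[0])
    M = [row + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    r = 0
    for c in range(cols):
        # Find a pivot row with a 1 in column c.
        pivot = next((i for i in range(r, rows) if M[i][c]), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Eliminate column c from all rows below the pivot.
        for i in range(r + 1, rows):
            if M[i][c]:
                M[i] = [a ^ p for a, p in zip(M[i], M[r])]
        r += 1
    return [row[:-1] for row in M], [row[-1] for row in M]

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
b = [1, 0, 1]
A_re, b_re = gf2_row_echelon(A, b)
print(A_re, b_re)  # third row becomes all-zero: the equations were dependent
```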

Improvements from sparsity

- Quality of LP relaxations significantly improves
- Finds integer solutions faster (better lower bounds)

[Figure: without sparsification, the solver fails at finding integer solutions (lower bounds); plot of upper bound improvement]

Generating sparse constraints

We optimize over solutions of A x = b (mod 2) (parity constraints)

- WISH is based on universal hashing
- Randomly generate A in {0,1}^{i×n}, b in {0,1}^i
- Then A x = b (mod 2) is
- Uniform over {0,1}^i
- Pairwise independent
- Suppose we generate a sparse matrix A
- At most k variables per parity constraint (up to k ones per row of A)
- A x = b (mod 2) is still uniform, but not pairwise independent anymore
- E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables. Lots of correlation. (Knowing A x = b tells me a lot about A y = b)

[Figure: A is an i × n 0/1 matrix, x a length-n vector, b a length-i vector, all mod 2]

Pairwise independence: given distinct variable assignments x ≠ y, the events A x = b (mod 2) and A y = b (mod 2) are independent.
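A quick Monte Carlo check of the two properties above for dense random A (a sketch; the function and inputs are mine): for any fixed x ≠ y, each event occurs with probability 2^-i and the joint probability factors, which is exactly pairwise independence.

```python
import random

def event_probs(x, y, i, trials=100_000):
    """Monte Carlo estimate of P[Ax=b], P[Ay=b], P[both] over
    uniformly random A in {0,1}^{i x n} and b in {0,1}^i."""
    n = len(x)
    hx = hy = hboth = 0
    for _ in range(trials):
        A = [[random.getrandbits(1) for _ in range(n)] for _ in range(i)]
        b = [random.getrandbits(1) for _ in range(i)]
        ex = all(sum(a * v for a, v in zip(row, x)) % 2 == bi
                 for row, bi in zip(A, b))
        ey = all(sum(a * v for a, v in zip(row, y)) % 2 == bi
                 for row, bi in zip(A, b))
        hx += ex
        hy += ey
        hboth += ex and ey
    return hx / trials, hy / trials, hboth / trials

x, y = [0, 1, 0, 1], [1, 1, 0, 0]   # two distinct assignments, n = 4
print(event_probs(x, y, i=2))       # ~ (0.25, 0.25, 0.0625) = (2^-i, 2^-i, 2^-2i)
```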

Using sparse parity constraints

- Theorem: With probability at least 1 − δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function.
- PRO: easier MAP inference queries
- For example, random parity constraints of length 1 (i.e., on a single variable) are equivalent to MAP with some variables fixed.
- CON: we lose the upper-bound part; the output can underestimate the partition function.
- CON: no constant-factor approximation anymore

MAP with sparse parity constraints

- MAP inference with sparse constraints: evaluation
- ILP and Branch & Bound outperform message passing (BP, MP, and MPLP)

[Figure: results on a 10x10 attractive Ising grid and a 10x10 mixed Ising grid]

Experimental results

- ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)


Conclusions

- ICML-13 WISH: discrete integration reduced to a small number of optimization instances (MAP)
- Strong (probabilistic) accuracy guarantees
- MAP inference is still NP-hard
- Scalability: approximations and bounds
- Connection with max-likelihood decoding
- ILP formulation + sparsity (Gauss sparsification + uniform hashing)
- New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function; can be iteratively tightened (will reach within a constant factor)
- Future work
- Extension to continuous integrals and variables
- Sampling from high-dimensional probability distributions
