
Optimization With Parity Constraints: From Binary Codes to Discrete Integration

- Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman
- Cornell University
- IBM Watson Research Center
- UAI 2013

High-dimensional integration

- High-dimensional integrals in statistics, ML, physics
- Expectations / model averaging
- Marginalization
- Partition function / rank models / parameter learning
- Curse of dimensionality
- Quadrature involves a weighted sum over an exponential number of items (e.g., units of volume)

[Figure: an n-dimensional hypercube with side lengths L1, L2, L3, ..., Ln]

Discrete Integration

[Figure: 2^n items; size visually represents weight]

- We are given
- A set of 2^n items
- Non-negative weights w
- Goal: compute total weight
- Compactly specified weight function
- factored form (Bayes net, factor graph, CNF, ...)
- Example 1: n = 2 variables, sum over 4 items with weights 5, 0, 2, 1. Goal: compute 5 + 0 + 2 + 1 = 8
- Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable)
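To ground the "intractable" claim, here is a minimal Python sketch of discrete integration by brute force over a small factored model (the factors and names are hypothetical illustrations, not from the slides):

```python
import itertools

# Toy factored weight function over n = 3 binary variables:
# w(x) = f1(x0, x1) * f2(x1, x2), each factor given as a lookup table.
# (Hypothetical factors for illustration.)
f1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
f2 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 4.0, (1, 1): 0.5}

def weight(x):
    """Weight of one item (one variable assignment) in factored form."""
    return f1[(x[0], x[1])] * f2[(x[1], x[2])]

n = 3
# Brute-force discrete integration: a sum over all 2^n items.
# Fine at n = 3 (8 items); hopeless at n = 100 (~10^30 items).
Z = sum(weight(x) for x in itertools.product((0, 1), repeat=n))
print(Z)  # 27.0 for this toy model
```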

Hardness

[Figure: complexity hierarchy, from Easy to Hard: P, NP, PH, PP, PSPACE, EXP]

- 0/1 weights case
- Is there at least a 1? → SAT
- How many 1s? → #SAT
- NP-complete vs. #P-complete. Much harder
- General weights
- Find heaviest item (combinatorial optimization, MAP)
- Sum weights (discrete integration)
- ICML-13 WISH: Approximate Discrete Integration via Optimization. E.g., partition function via MAP inference
- MAP inference often fast in practice
- Relaxations / bounds
- Pruning

WISH: Integration by Hashing and Optimization

- The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor
- Outer loop over n variables:
- MAP inference on the model augmented with random parity constraints
- Repeat log(n) times
- Aggregate MAP inference solutions (see the sketch below)

[Figure: AUGMENTED MODEL — the original graphical model over n binary variables s ∈ {0,1}^n, plus parity check nodes enforcing A s = b (mod 2)]
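A minimal Python sketch of this loop, assuming a user-supplied map_solver(A, b) that returns max_x w(x) subject to A x = b (mod 2) (the solver interface and helper names are mine, not from the paper); the final aggregation M_0 + Σ_i M_{i+1} 2^i follows the ICML-13 WISH paper:

```python
import math
import random
from statistics import median

def wish(n, map_solver, delta=0.01):
    """Sketch of WISH over n binary variables. map_solver(A, b) is assumed
    to return max_x w(x) subject to A x = b (mod 2); with i = 0 the
    constraint set is empty, so it must return the unconstrained MAP value."""
    T = math.ceil(math.log(n / delta))      # repetitions per constraint level
    M = []
    for i in range(n + 1):                  # i = number of parity constraints
        vals = []
        for _ in range(T):
            # Random parity constraints: A in {0,1}^{i x n}, b in {0,1}^i.
            A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
            b = [random.randint(0, 1) for _ in range(i)]
            vals.append(map_solver(A, b))
        M.append(median(vals))
    # Aggregate: Z_hat = M_0 + sum_i M_{i+1} * 2^i
    return M[0] + sum(M[i + 1] * 2 ** i for i in range(n))
```

On tiny models, a brute-force map_solver lets this estimate be sanity-checked against exact enumeration.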

Visual working of the algorithm

- How it works: starting from the function to be integrated, add 1, 2, 3, ..., n random parity constraints; at each level, repeat log(n) times and take the median of the MAP values:
- Mode M0 (no constraints)
- 1 random parity constraint → median M1
- 2 random parity constraints → median M2
- 3 random parity constraints → median M3
- ... (n levels in total)

Accuracy Guarantees

- Theorem [ICML-13]: With probability at least 1 − δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
- Theorem [ICML-13]: Can improve the approximation factor to (1 + ε) by adding extra variables and factors.
- Example: factor-2 approximation with 4n variables
- Remark: faster than enumeration only when combinatorial optimization is efficient

Summary of contributions

- Introduction and previous work
- WISH: Approximate Discrete Integration via Optimization
- Partition function / marginalization via MAP inference
- Accuracy guarantees
- MAP inference subject to parity constraints
- Tractable cases and approximations
- Integer Linear Programming formulation
- New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (will reach within a constant factor)
- Sparsity of the parity constraints
- Techniques to improve solution time and bounds quality
- Experimental improvements over variational techniques

MAP inference with parity constraints

- Hardness, approximations, and bounds

Making WISH more scalable

- Would approximations to the optimization (MAP inference with parity constraints) be useful? YES
- Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral); see the sketch below
- Lower bounds (local search) on MAP → lower bounds on Z
- Upper bounds (LP, SDP relaxation) on MAP → upper bounds on Z
- Constant-factor approximations on MAP → constant factor on Z
- Question: Are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?
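A sketch of the bound-translation point, assuming per-level bounds on the MAP medians are available (e.g., from local search and LP relaxations; the function and inputs are hypothetical): the WISH aggregation is monotone in each median, so plugging in MAP bounds yields bounds on Z (still probabilistic, up to the constant factor of the theorem).

```python
def z_bounds_from_map_bounds(M_lower, M_upper):
    """Translate per-level MAP bounds into bounds on the partition function Z.
    M_lower[i] / M_upper[i] bound the median MAP value with i parity
    constraints; the WISH aggregation is monotone in each entry."""
    n = len(M_lower) - 1
    z_lb = M_lower[0] + sum(M_lower[i + 1] * 2 ** i for i in range(n))
    z_ub = M_upper[0] + sum(M_upper[i + 1] * 2 ** i for i in range(n))
    return z_lb, z_ub

# Hypothetical per-level bounds for a model over n = 3 variables:
print(z_bounds_from_map_bounds([4.0, 2.0, 1.0, 0.5], [5.0, 3.0, 2.0, 1.0]))
# (10.0, 16.0): a lower and an upper bound on Z
```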

Error correcting codes

- Communication over a noisy channel
- Bob: "There has been a transmission error! What was the message actually sent by Alice?"
- Must be a valid codeword
- As close as possible to received message y

[Figure: x = 01001 → noisy channel → y = 01101]

- Redundant parity check bit: 0 XOR 1 XOR 0 XOR 0 = 1
- Error detected: received parity bit 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0
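A short check makes the parity example concrete (a sketch; the variable names are mine):

```python
def parity_bit(data_bits):
    """XOR of the data bits: the redundant parity check bit."""
    p = 0
    for bit in data_bits:
        p ^= bit
    return p

sent = [0, 1, 0, 0, 1]      # data 0100, parity bit 1 = 0^1^0^0
received = [0, 1, 1, 0, 1]  # one bit flipped by the channel

# Recompute parity from the received data bits and compare.
expected = parity_bit(received[:-1])   # 0^1^1^0 = 0
print(expected != received[-1])        # True: transmission error detected
```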

Decoding a binary code

- Max-likelihood decoding: find the valid codeword closest to the received word (decode y = 01101 back to x = 01001)

[Figure: x → noisy channel → y]

ML-decoding graphical model

- Noisy channel model relating x and y
- Transmitted string must be a codeword
- More complex probabilistic model
- MAP inference is NP-hard to approximate within any constant factor [Stern; Arora, ...]
- Max w(x) subject to A x = b (mod 2): equivalent to MAP inference on the augmented model
- LDPC codes are routinely solved in practice (10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, ...)
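To make "Max w(x) subject to A x = b (mod 2)" concrete, here is a brute-force maximum-likelihood decoder sketch (exhaustive, so only for tiny codes; Hamming distance is the standard likelihood for a binary symmetric channel, and all names here are mine):

```python
import itertools

def satisfies_parity(A, x, b):
    """Check A x = b (mod 2) for a 0/1 matrix A and 0/1 vectors x, b."""
    return all(sum(a * xi for a, xi in zip(row, x)) % 2 == bi
               for row, bi in zip(A, b))

def ml_decode(A, b, y):
    """Brute-force ML decoding: among all valid codewords (A x = b mod 2),
    return one closest in Hamming distance to the received word y."""
    n = len(y)
    best = None
    for x in itertools.product((0, 1), repeat=n):
        if satisfies_parity(A, x, b):
            dist = sum(xi != yi for xi, yi in zip(x, y))
            if best is None or dist < best[0]:
                best = (dist, x)
    return best[1]

# Code from the slides: fifth bit is the parity of the first four,
# i.e., x0 ^ x1 ^ x2 ^ x3 ^ x4 = 0 (even overall parity).
A = [[1, 1, 1, 1, 1]]
b = [0]
# 01101 is invalid; any single bit flip repairs parity, so several
# codewords are tied at distance 1 and one of them is returned.
print(ml_decode(A, b, [0, 1, 1, 0, 1]))
```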

Decoding via Integer Programming

- MAP inference subject to parity constraints encoded as an Integer Linear Program (ILP); a simple encoding is sketched below
- Standard MAP encoding
- Compact (polynomial) encoding by Yannakakis for parity constraints
- LP relaxation: relax the integrality constraint
- Polynomial-time upper bounds
- ILP solving strategy: cuts + branching + LP relaxations
- Solve a sequence of LP relaxations
- Upper and lower bounds that improve over time
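One simple way to see that a parity constraint is ILP-expressible is the textbook auxiliary-variable trick (not the compact Yannakakis encoding mentioned above): x1 + ... + xk = b + 2t with an extra integer variable t. A sketch using the PuLP modeling library (assumed installed; weights are hypothetical):

```python
import pulp

# Maximize a linear objective subject to the parity constraint
# x0 ^ x1 ^ x2 = 1, encoded as x0 + x1 + x2 = 1 + 2*t with integer t.
prob = pulp.LpProblem("parity_map", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(3)]
t = pulp.LpVariable("t", lowBound=0, upBound=1, cat="Integer")

w = [1.0, 2.0, 3.0]                                  # hypothetical weights
prob += pulp.lpSum(wi * xi for wi, xi in zip(w, x))  # objective
prob += pulp.lpSum(x) == 1 + 2 * t                   # parity: odd number of ones

prob.solve()
print([int(v.value()) for v in x])  # [1, 1, 1]: odd parity, objective 6
```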

Iterative bound tightening

- Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time
- Recall: bounds on optimization (MAP) → (probabilistic) bounds on the partition function Z. New family of bounds.
- WISH: when MAP is solved to optimality (LowerBound = UpperBound), guaranteed constant-factor approximation of Z

Sparsity of the parity constraints

- Improving solution time and bounds quality

Inducing sparsity

- Observations
- Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes)
- Quality of the LP relaxation depends on A and b, not just on the solution space. Elementary row operations (e.g., summing 2 equations) do not change the solution space but affect the LP relaxation.
- Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field); see the sketch below
- Greedy application of elementary row operations

[Figure: matrix A in row-echelon form]
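A minimal sketch of Gaussian elimination over GF(2) (the function name is mine): rows are combined by XOR, which is exactly the elementary row operation referred to above, so the reduced system has the same solution space.

```python
def gf2_row_echelon(A, b):
    """Reduce [A | b] to row-echelon form over GF(2). XOR-ing rows
    preserves the solution set of A x = b (mod 2)."""
    rows, cols = len(A), len(A[0])
    M = [row + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    r = 0
    for c in range(cols):
        # Find a pivot row with a 1 in column c.
        pivot = next((i for i in range(r, rows) if M[i][c]), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Eliminate column c from all rows below the pivot.
        for i in range(r + 1, rows):
            if M[i][c]:
                M[i] = [a ^ p for a, p in zip(M[i], M[r])]
        r += 1
    return [row[:-1] for row in M], [row[-1] for row in M]

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
b = [1, 0, 1]
A_re, b_re = gf2_row_echelon(A, b)
print(A_re, b_re)  # third row becomes all-zero: the equations were dependent
```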

Improvements from sparsity

- Quality of LP relaxations significantly improves
- Finds integer solutions faster (better lower bounds)

[Figure: without sparsification, the solver fails at finding integer solutions (lower bounds); plot of upper bound improvement]

Generating sparse constraints

We optimize over solutions of A x = b (mod 2) (parity constraints)

- WISH is based on universal hashing
- Randomly generate A in {0,1}^{i×n}, b in {0,1}^i
- Then A x = b (mod 2) is
- Uniform over {0,1}^i
- Pairwise independent
- Suppose we generate a sparse matrix A
- At most k variables per parity constraint (up to k ones per row of A)
- A x = b (mod 2) is still uniform, but not pairwise independent anymore
- E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables. Lots of correlation. (Knowing A x = b tells me a lot about A y = b)

[Figure: A is an i × n 0/1 matrix, x a length-n vector, b a length-i vector, all mod 2]

Pairwise independence: given distinct variable assignments x ≠ y, the events A x = b (mod 2) and A y = b (mod 2) are independent.
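A quick Monte Carlo check of the two properties above for dense random A (a sketch; the function and inputs are mine): for any fixed x ≠ y, each event occurs with probability 2^-i and the joint probability factors, which is exactly pairwise independence.

```python
import random

def event_probs(x, y, i, trials=100_000):
    """Monte Carlo estimate of P[Ax=b], P[Ay=b], P[both] over
    uniformly random A in {0,1}^{i x n} and b in {0,1}^i."""
    n = len(x)
    hx = hy = hboth = 0
    for _ in range(trials):
        A = [[random.getrandbits(1) for _ in range(n)] for _ in range(i)]
        b = [random.getrandbits(1) for _ in range(i)]
        ex = all(sum(a * v for a, v in zip(row, x)) % 2 == bi
                 for row, bi in zip(A, b))
        ey = all(sum(a * v for a, v in zip(row, y)) % 2 == bi
                 for row, bi in zip(A, b))
        hx += ex
        hy += ey
        hboth += ex and ey
    return hx / trials, hy / trials, hboth / trials

x, y = [0, 1, 0, 1], [1, 1, 0, 0]   # two distinct assignments, n = 4
print(event_probs(x, y, i=2))       # ~ (0.25, 0.25, 0.0625) = (2^-i, 2^-i, 2^-2i)
```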

Using sparse parity constraints

- Theorem: With probability at least 1 − δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function.
- PRO: easier MAP inference queries
- For example, random parity constraints of length 1 (i.e., on a single variable) are equivalent to MAP with some variables fixed.
- CON: we lose the upper-bound part; the output can underestimate the partition function.
- CON: no constant-factor approximation anymore

MAP with sparse parity constraints

- MAP inference with sparse constraints: evaluation
- ILP and Branch & Bound outperform message passing (BP, MP, and MPLP)

[Figure: results on a 10x10 attractive Ising grid and a 10x10 mixed Ising grid]

Experimental results

- ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)


Conclusions

- ICML-13 WISH: discrete integration reduced to a small number of optimization instances (MAP)
- Strong (probabilistic) accuracy guarantees
- MAP inference is still NP-hard
- Scalability: approximations and bounds
- Connection with max-likelihood decoding
- ILP formulation + sparsity (Gauss sparsification + uniform hashing)
- New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function; can be iteratively tightened (will reach within a constant factor)
- Future work
- Extension to continuous integrals and variables
- Sampling from high-dimensional probability distributions
