# Optimization With Parity Constraints: From Binary Codes to Discrete Integration

## Transcript and Presenter's Notes
1
Optimization With Parity Constraints: From Binary Codes to Discrete Integration
• Stefano Ermon*, Carla P. Gomes*, Ashish Sabharwal+, and Bart Selman*
• * Cornell University
• + IBM Watson Research Center
• UAI 2013

2
High-dimensional integration
• High-dimensional integrals in statistics, ML, physics
• Expectations / model averaging
• Marginalization
• Partition function / rank models / parameter learning
• Curse of dimensionality
• Quadrature involves a weighted sum over an exponential number of items (e.g., units of volume)

[Figure: an n-dimensional hypercube of side L; the number of unit volumes grows as L, L^2, L^3, L^4, ..., L^n]
3
Discrete Integration
[Figure: 2^n items, where size visually represents weight]
• We are given
• A set of 2^n items
• Non-negative weights w
• Goal: compute total weight
• Compactly specified weight function
• factored form (Bayes net, factor graph, CNF, ...)
• Example 1: n = 2 variables, sum over 4 items (see the brute-force sketch below)
• Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable)
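For Example 1, here is a minimal brute-force sketch (our own illustration, not from the slides): a hypothetical factor graph over n = 2 binary variables whose four item weights are 5, 0, 2, 1, summed by explicit enumeration.

```python
# Illustrative sketch (hypothetical factors): discrete integration by
# brute-force enumeration of all 2^n assignments of a factored model.
from itertools import product

# Toy factor graph over n = 2 binary variables: w(x1, x2) = f1(x1) * f2(x1, x2)
f1 = {0: 5.0, 1: 2.0}
f2 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.5}

def weight(x):
    """Weight of one item (assignment) as a product of factors."""
    x1, x2 = x
    return f1[x1] * f2[(x1, x2)]

# The four item weights are 5, 0, 2, 1; the total matches the slide's example.
total = sum(weight(x) for x in product([0, 1], repeat=2))
print(total)  # 8.0
```

For Example 2 this loop would have to visit 2^100 items, which is exactly the intractability the talk addresses.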

[Figure: four items with weights 5, 0, 2, 1. Goal: compute 5 + 0 + 2 + 1 = 8.]

4
Hardness
[Figure: complexity ladder from Easy (P, NP) up to Hard (PH, PP, PSPACE, EXP)]
• 0/1 weights case
• Is there at least one 1? → SAT
• How many 1s? → #SAT
• NP-complete vs. #P-complete. Much harder.
• General weights
• Find heaviest item (combinatorial optimization, MAP)
• Sum weights (discrete integration)
• [ICML-13] WISH: Approximate Discrete Integration via Optimization. E.g., partition function via MAP inference
• MAP inference often fast in practice
• Relaxations / bounds
• Pruning

5
WISH: Integration by Hashing and Optimization
• The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor

Outer loop over the n variables:
  MAP inference on the model augmented with random parity constraints (repeat log(n) times)
  Aggregate the MAP inference solutions

[Figure: augmented model. Original graphical model over s ∈ {0,1}^n (n binary variables), with parity check nodes enforcing A s = b (mod 2).]
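A minimal runnable sketch of the loop just described (our own Python, with a brute-force MAP oracle standing in for a real solver, so it is only feasible for tiny n):

```python
# Sketch of the WISH scheme (Ermon et al., ICML 2013); names are ours.
import random
import statistics

def map_with_parity(weight, n, A, b):
    """max_x weight(x) s.t. A x = b (mod 2), by enumeration over 2^n items.
    Returns 0 if no assignment satisfies the constraints."""
    best = 0.0
    for v in range(2 ** n):
        x = [(v >> j) & 1 for j in range(n)]
        if all(sum(a * xi for a, xi in zip(row, x)) % 2 == bi
               for row, bi in zip(A, b)):
            best = max(best, weight(x))
    return best

def wish(weight, n, T=5):
    """Estimate the partition function with O(n log n) MAP queries."""
    M = []
    for i in range(n + 1):          # outer loop: i random parity constraints
        samples = []
        for _ in range(T):          # repeat ~log(n) times, take the median
            A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
            b = [random.randint(0, 1) for _ in range(i)]
            samples.append(map_with_parity(weight, n, A, b))
        M.append(statistics.median(samples))
    # Aggregate the MAP solutions: M[0] + sum_i M[i+1] * 2^i
    return M[0] + sum(M[i + 1] * 2 ** i for i in range(n))
```

With the toy weight function from the earlier slide, `wish(weight, 2)` returns an estimate of the total weight 8 that is, with high probability, within the constant factor guaranteed by the theorem on the next slides.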
6
Visual working of the algorithm
• How it works: add i random parity constraints for i = 1, ..., n, optimizing each augmented model log(n) times

[Figure: the function to be integrated, then the surviving items after 1, 2, 3, ... random parity constraints. Repeating each optimization log(n) times yields the mode M0 (no constraints) and the medians M1, M2, M3, ... of the constrained optima.]
7
Accuracy Guarantees
• Theorem [ICML-13]: With probability at least 1 − δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
• Theorem [ICML-13]: The approximation factor can be improved to (1 + ε) by adding extra variables and factors.
• Example: a factor-2 approximation with 4n variables
• Remark: faster than enumeration only when the combinatorial optimization is efficient

8
Summary of contributions
• Introduction and previous work
• WISH: Approximate Discrete Integration via Optimization
• Partition function / marginalization via MAP inference
• Accuracy guarantees
• MAP inference subject to parity constraints
• Tractable cases and approximations
• Integer Linear Programming formulation
• New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (reaching a constant factor)
• Sparsity of the parity constraints
• Techniques to improve solution time and bound quality
• Experimental improvements over variational techniques

9
MAP inference with parity constraints
• Hardness, approximations, and bounds

10
Making WISH more scalable
• Would approximations to the optimization (MAP inference with parity constraints) be useful? YES
• Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral)
• Lower bounds (local search) on MAP → lower bounds on Z
• Upper bounds (LP, SDP relaxations) on MAP → upper bounds on Z
• Constant-factor approximations on MAP → constant factor on Z
• Question: Are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?
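Why the translation holds, in a short sketch (our notation): the WISH estimate is monotone in the MAP values M_i, so bracketing each M_i brackets the estimate.

```latex
% Sketch, in our notation: bounds on each MAP value bound the WISH estimate,
% where M_i is the median MAP value with i random parity constraints.
\[
\mathrm{LB}_i \le M_i \le \mathrm{UB}_i \ (\forall i)
\;\Longrightarrow\;
\mathrm{LB}_0 + \sum_{i=0}^{n-1} \mathrm{LB}_{i+1}\,2^i
\;\le\;
\underbrace{M_0 + \sum_{i=0}^{n-1} M_{i+1}\,2^i}_{\text{WISH estimate of } Z}
\;\le\;
\mathrm{UB}_0 + \sum_{i=0}^{n-1} \mathrm{UB}_{i+1}\,2^i
\]
```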

11
Error correcting codes
• Communication over a noisy channel
• Bob: "There has been a transmission error! What was the message actually sent by Alice?"
• Must be a valid codeword
• As close as possible to the received message y

Noisy channel: x = 01001 is sent, y = 01101 is received
Redundant parity check bit: 0 XOR 1 XOR 0 XOR 0 = 1
Parity check fails on y: 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0
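The slide's single-parity-bit example as a small sketch (our own code):

```python
# Toy sketch of the slide's single-parity-check example.
def parity(bits):
    """XOR of a bit sequence."""
    p = 0
    for b in bits:
        p ^= b
    return p

x = [0, 1, 0, 0, 1]  # transmitted: data 0100 plus parity bit 1
y = [0, 1, 1, 0, 1]  # received: one bit flipped by the channel

print(parity(x[:4]) == x[4])  # True: x is a valid codeword
print(parity(y[:4]) == y[4])  # False: Bob detects a transmission error
```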
12
Decoding a binary code
Noisy channel: x = 01001 is sent, y = 01101 is received
• Max-likelihood decoding

[Figure: ML-decoding graphical model, combining a noisy channel model with the constraint that the transmitted string x must be a codeword; a more complex probabilistic model.]

MAP inference is NP-hard to approximate within any constant factor [Stern; Arora et al.]
Max w(x) subject to A x = b (mod 2): equivalent to MAP inference on the augmented model
LDPC codes: routinely solved (10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, ...)
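A brute-force sketch of max-likelihood decoding under a binary symmetric channel (our own illustration; here maximizing w(x) reduces to minimizing Hamming distance to y):

```python
# Sketch: ML decoding = nearest valid codeword to y in Hamming distance,
# i.e. max w(x) subject to the parity checks A x = b (mod 2).
from itertools import product

def decode_ml(y, A, b):
    """Brute-force ML decoding: nearest codeword to y (exponential in n)."""
    n = len(y)
    best, best_dist = None, n + 1
    for x in product([0, 1], repeat=n):
        # keep only valid codewords: A x = b (mod 2)
        if any(sum(a * xi for a, xi in zip(row, x)) % 2 != bi
               for row, bi in zip(A, b)):
            continue
        dist = sum(xi != yi for xi, yi in zip(x, y))
        if dist < best_dist:
            best, best_dist = list(x), dist
    return best

# Single parity check from the previous slide: all five bits XOR to 0.
A, b = [[1, 1, 1, 1, 1]], [0]
# Several codewords tie at distance 1: one parity bit detects a single
# flipped bit but cannot uniquely correct it.
print(decode_ml([0, 1, 1, 0, 1], A, b))
```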
13
Decoding via Integer Programming
• MAP inference subject to parity constraints encoded as an Integer Linear Program (ILP)
• Standard MAP encoding
• Compact (polynomial-size) encoding by Yannakakis for the parity constraints
• LP relaxation: relax the integrality constraints
• Polynomial-time upper bounds
• ILP solving strategy: cuts + branching + LP relaxations
• Solve a sequence of LP relaxations
• Upper and lower bounds that improve over time
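The slides use Yannakakis' compact encoding; as a simpler hedged sketch, here is the standard integer-slack MILP form of one parity constraint in PuLP (the toy weights are our own assumption):

```python
# Sketch: encode sum_j x_j = b (mod 2) as sum_j x_j = b + 2z, z integer.
# (Not the Yannakakis encoding from the slides; a simpler standard form.)
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

n = 4
prob = LpProblem("map_with_parity", LpMaximize)
x = [LpVariable(f"x{j}", cat="Binary") for j in range(n)]

# Hypothetical unary weights: maximize sum_j c_j * x_j
c = [3.0, -1.0, 2.0, 0.5]
prob += lpSum(cj * xj for cj, xj in zip(c, x))

# Parity constraint x0 + x1 + x2 + x3 = 1 (mod 2), via integer slack z
z = LpVariable("z", lowBound=0, upBound=n // 2, cat="Integer")
prob += lpSum(x) == 1 + 2 * z

prob.solve()
print([int(value(xj)) for xj in x], value(prob.objective))  # [1, 0, 1, 1] 5.5
```

Relaxing the Binary/Integer categories to continuous variables gives exactly the LP relaxation mentioned above, and hence a polynomial-time upper bound.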

14
Iterative bound tightening
• Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time
• Recall: bounds on optimization (MAP) → (probabilistic) bounds on the partition function Z. A new family of bounds.
• WISH: when MAP is solved to optimality (LowerBound = UpperBound), guaranteed constant-factor approximation of Z

15
Sparsity of the parity constraints
• Improving solution time and bound quality

16
Inducing sparsity
• Observations
• Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes)
• The quality of the LP relaxation depends on A and b, not just on the solution space. Elementary row operations (e.g., summing two equations) do not change the solution space but affect the LP relaxation.
• Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field), as sketched below
• Greedy application of elementary row operations

[Figure: matrix A in row-echelon form]
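A minimal sketch of the Gaussian-elimination step over GF(2) (our own code; row swaps and row XORs are the elementary row operations that preserve the solution space):

```python
# Sketch: reduce A x = b (mod 2) to (reduced) row-echelon form over GF(2).
def gf2_row_echelon(A, b):
    A = [row[:] for row in A]  # work on copies
    b = b[:]
    m, n = len(A), len(A[0])
    r = 0
    for col in range(n):
        # find a pivot row with a 1 in this column
        pivot = next((i for i in range(r, m) if A[i][col]), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        b[r], b[pivot] = b[pivot], b[r]
        for i in range(m):
            if i != r and A[i][col]:  # XOR the pivot row into row i
                A[i] = [ai ^ aj for ai, aj in zip(A[i], A[r])]
                b[i] ^= b[r]
        r += 1
    return A, b
```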
17
Improvements from sparsity
• The quality of the LP relaxations improves significantly
• Integer solutions are found faster (better lower bounds)

[Plots: without sparsification, the solver fails to find integer solutions (LB); the upper bound also improves]
18
Generating sparse constraints
We optimize over the solutions of A x = b (mod 2) (parity constraints)
• WISH is based on universal hashing
• Randomly generate A in {0,1}^{i×n}, b in {0,1}^i
• Then A x = b (mod 2) is
• Uniform over {0,1}^i
• Pairwise independent
• Suppose we generate a sparse matrix A
• At most k variables per parity constraint (up to k ones per row of A)
• A x = b (mod 2) is still uniform, but not pairwise independent anymore
• E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables. Lots of correlation. (Knowing A x = b tells me a lot about A y = b.)

[Figure: A is an i × n 0/1 matrix, x an n-vector, b an i-vector, with A x = b (mod 2). Pairwise independence: given distinct variable assignments x and y, the events A x = b (mod 2) and A y = b (mod 2) are independent.]
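A sketch of the two constructions being contrasted (our own code; names are illustrative):

```python
# Dense vs. sparse random parity constraints A x = b (mod 2).
import random

def dense_hash(n, i):
    """Random A in {0,1}^{i x n}, b in {0,1}^i: uniform and pairwise
    independent (the universal-hash family WISH relies on)."""
    A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
    b = [random.randint(0, 1) for _ in range(i)]
    return A, b

def sparse_hash(n, i, k):
    """k ones per row of A: still uniform, but no longer pairwise
    independent (e.g., k = 1 just fixes i of the variables)."""
    A = []
    for _ in range(i):
        row = [0] * n
        for j in random.sample(range(n), k):
            row[j] = 1
        A.append(row)
    b = [random.randint(0, 1) for _ in range(i)]
    return A, b
```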
19
Using sparse parity constraints
• Theorem: With probability at least 1 − δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function.
• PRO: easier MAP inference queries
• For example, random parity constraints of length 1 (i.e., on a single variable) are equivalent to MAP with some variables fixed.
• CON: we lose the upper-bound part; the output can underestimate the partition function.
• CON: no constant-factor approximation anymore
20
MAP with sparse parity constraints
• Evaluation of MAP inference with sparse constraints
• ILP and Branch-and-Bound outperform message passing (BP, MP, and MPLP)

[Plots: 10x10 attractive Ising grid; 10x10 mixed Ising grid]
21
Experimental results
• ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)

22
Experimental results (2)
• ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)

23
Conclusions
• [ICML-13] WISH: discrete integration reduced to a small number of optimization instances (MAP)
• Strong (probabilistic) accuracy guarantees
• MAP inference is still NP-hard
• Scalability: approximations and bounds
• Connection with max-likelihood decoding
• ILP formulation + sparsity (Gaussian sparsification + uniform hashing)
• New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function. Can be iteratively tightened (reaching a constant factor).
• Future work
• Extension to continuous integrals and variables
• Sampling from high-dimensional probability distributions

24
Extra slides