
Optimization With Parity Constraints: From Binary Codes to Discrete Integration

- Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman
- Cornell University
- IBM Watson Research Center
- UAI 2013

High-dimensional integration

- High-dimensional integrals in statistics, ML, physics
- Expectations / model averaging
- Marginalization
- Partition function / rank models / parameter learning
- Curse of dimensionality
- Quadrature involves a weighted sum over an exponential number of items (e.g., units of volume)

[Figure: an n-dimensional hypercube with side lengths L1, L2, L3, ..., Ln]

Discrete Integration

[Figure: 2^n items; size visually represents weight]

- We are given
- A set of 2^n items
- Non-negative weights w
- Goal: compute total weight
- Compactly specified weight function
- factored form (Bayes net, factor graph, CNF, ...)
- Example 1: n = 2 variables, sum over 4 items with weights 5, 0, 2, 1. Goal: compute 5 + 0 + 2 + 1 = 8
- Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable)
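To ground the "intractable" claim, here is a minimal Python sketch of discrete integration by brute force over a small factored model (the factors and names are hypothetical illustrations, not from the slides):

```python
import itertools

# Toy factored weight function over n = 3 binary variables:
# w(x) = f1(x0, x1) * f2(x1, x2), each factor given as a lookup table.
# (Hypothetical factors for illustration.)
f1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
f2 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 4.0, (1, 1): 0.5}

def weight(x):
    """Weight of one item (one variable assignment) in factored form."""
    return f1[(x[0], x[1])] * f2[(x[1], x[2])]

n = 3
# Brute-force discrete integration: a sum over all 2^n items.
# Fine at n = 3 (8 items); hopeless at n = 100 (~10^30 items).
Z = sum(weight(x) for x in itertools.product((0, 1), repeat=n))
print(Z)  # 27.0 for this toy model
```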

Hardness

[Figure: complexity hierarchy, from Easy to Hard: P, NP, PH, PP, PSPACE, EXP]

- 0/1 weights case
- Is there at least a 1? → SAT
- How many 1s? → #SAT
- NP-complete vs. #P-complete. Much harder
- General weights
- Find heaviest item (combinatorial optimization, MAP)
- Sum weights (discrete integration)
- ICML-13 WISH: Approximate Discrete Integration via Optimization. E.g., partition function via MAP inference
- MAP inference often fast in practice
- Relaxations / bounds
- Pruning

WISH: Integration by Hashing and Optimization

- The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor
- Outer loop over n variables:
- MAP inference on the model augmented with random parity constraints
- Repeat log(n) times
- Aggregate MAP inference solutions (see the sketch below)

[Figure: AUGMENTED MODEL — the original graphical model over n binary variables s ∈ {0,1}^n, plus parity check nodes enforcing A s = b (mod 2)]
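A minimal Python sketch of this loop, assuming a user-supplied map_solver(A, b) that returns max_x w(x) subject to A x = b (mod 2) (the solver interface and helper names are mine, not from the paper); the final aggregation M_0 + Σ_i M_{i+1} 2^i follows the ICML-13 WISH paper:

```python
import math
import random
from statistics import median

def wish(n, map_solver, delta=0.01):
    """Sketch of WISH over n binary variables. map_solver(A, b) is assumed
    to return max_x w(x) subject to A x = b (mod 2); with i = 0 the
    constraint set is empty, so it must return the unconstrained MAP value."""
    T = math.ceil(math.log(n / delta))      # repetitions per constraint level
    M = []
    for i in range(n + 1):                  # i = number of parity constraints
        vals = []
        for _ in range(T):
            # Random parity constraints: A in {0,1}^{i x n}, b in {0,1}^i.
            A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
            b = [random.randint(0, 1) for _ in range(i)]
            vals.append(map_solver(A, b))
        M.append(median(vals))
    # Aggregate: Z_hat = M_0 + sum_i M_{i+1} * 2^i
    return M[0] + sum(M[i + 1] * 2 ** i for i in range(n))
```

On tiny models, a brute-force map_solver lets this estimate be sanity-checked against exact enumeration.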

Visual working of the algorithm

- How it works: starting from the function to be integrated, add 1, 2, 3, ..., n random parity constraints; at each level, repeat log(n) times and take the median of the MAP values:
- Mode M0 (no constraints)
- 1 random parity constraint → median M1
- 2 random parity constraints → median M2
- 3 random parity constraints → median M3
- ... (n levels in total)

Accuracy Guarantees

- Theorem [ICML-13]: With probability at least 1 − δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
- Theorem [ICML-13]: Can improve the approximation factor to (1 + ε) by adding extra variables and factors.
- Example: factor-2 approximation with 4n variables
- Remark: faster than enumeration only when combinatorial optimization is efficient

Summary of contributions

- Introduction and previous work
- WISH: Approximate Discrete Integration via Optimization
- Partition function / marginalization via MAP inference
- Accuracy guarantees
- MAP inference subject to parity constraints
- Tractable cases and approximations
- Integer Linear Programming formulation
- New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (will reach within a constant factor)
- Sparsity of the parity constraints
- Techniques to improve solution time and bounds quality
- Experimental improvements over variational techniques

MAP inference with parity constraints

- Hardness, approximations, and bounds

Making WISH more scalable

- Would approximations to the optimization (MAP inference with parity constraints) be useful? YES
- Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral); see the sketch below
- Lower bounds (local search) on MAP → lower bounds on Z
- Upper bounds (LP, SDP relaxation) on MAP → upper bounds on Z
- Constant-factor approximations on MAP → constant factor on Z
- Question: Are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?
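A sketch of the bound-translation point, assuming per-level bounds on the MAP medians are available (e.g., from local search and LP relaxations; the function and inputs are hypothetical): the WISH aggregation is monotone in each median, so plugging in MAP bounds yields bounds on Z (still probabilistic, up to the constant factor of the theorem).

```python
def z_bounds_from_map_bounds(M_lower, M_upper):
    """Translate per-level MAP bounds into bounds on the partition function Z.
    M_lower[i] / M_upper[i] bound the median MAP value with i parity
    constraints; the WISH aggregation is monotone in each entry."""
    n = len(M_lower) - 1
    z_lb = M_lower[0] + sum(M_lower[i + 1] * 2 ** i for i in range(n))
    z_ub = M_upper[0] + sum(M_upper[i + 1] * 2 ** i for i in range(n))
    return z_lb, z_ub

# Hypothetical per-level bounds for a model over n = 3 variables:
print(z_bounds_from_map_bounds([4.0, 2.0, 1.0, 0.5], [5.0, 3.0, 2.0, 1.0]))
# (10.0, 16.0): a lower and an upper bound on Z
```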

Error correcting codes

- Communication over a noisy channel
- Bob: "There has been a transmission error! What was the message actually sent by Alice?"
- Must be a valid codeword
- As close as possible to received message y

[Figure: x = 01001 → noisy channel → y = 01101]

- Redundant parity check bit: 0 XOR 1 XOR 0 XOR 0 = 1
- Error detected: received parity bit 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0
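A short check makes the parity example concrete (a sketch; the variable names are mine):

```python
def parity_bit(data_bits):
    """XOR of the data bits: the redundant parity check bit."""
    p = 0
    for bit in data_bits:
        p ^= bit
    return p

sent = [0, 1, 0, 0, 1]      # data 0100, parity bit 1 = 0^1^0^0
received = [0, 1, 1, 0, 1]  # one bit flipped by the channel

# Recompute parity from the received data bits and compare.
expected = parity_bit(received[:-1])   # 0^1^1^0 = 0
print(expected != received[-1])        # True: transmission error detected
```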

Decoding a binary code

- Max-likelihood decoding: find the valid codeword closest to the received word (decode y = 01101 back to x = 01001)

[Figure: x → noisy channel → y]

ML-decoding graphical model

- Noisy channel model relating x and y
- Transmitted string must be a codeword
- More complex probabilistic model
- MAP inference is NP-hard to approximate within any constant factor [Stern; Arora, ...]
- Max w(x) subject to A x = b (mod 2): equivalent to MAP inference on the augmented model
- LDPC codes are routinely solved in practice (10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, ...)
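To make "Max w(x) subject to A x = b (mod 2)" concrete, here is a brute-force maximum-likelihood decoder sketch (exhaustive, so only for tiny codes; Hamming distance is the standard likelihood for a binary symmetric channel, and all names here are mine):

```python
import itertools

def satisfies_parity(A, x, b):
    """Check A x = b (mod 2) for a 0/1 matrix A and 0/1 vectors x, b."""
    return all(sum(a * xi for a, xi in zip(row, x)) % 2 == bi
               for row, bi in zip(A, b))

def ml_decode(A, b, y):
    """Brute-force ML decoding: among all valid codewords (A x = b mod 2),
    return one closest in Hamming distance to the received word y."""
    n = len(y)
    best = None
    for x in itertools.product((0, 1), repeat=n):
        if satisfies_parity(A, x, b):
            dist = sum(xi != yi for xi, yi in zip(x, y))
            if best is None or dist < best[0]:
                best = (dist, x)
    return best[1]

# Code from the slides: fifth bit is the parity of the first four,
# i.e., x0 ^ x1 ^ x2 ^ x3 ^ x4 = 0 (even overall parity).
A = [[1, 1, 1, 1, 1]]
b = [0]
# 01101 is invalid; any single bit flip repairs parity, so several
# codewords are tied at distance 1 and one of them is returned.
print(ml_decode(A, b, [0, 1, 1, 0, 1]))
```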

Decoding via Integer Programming

- MAP inference subject to parity constraints encoded as an Integer Linear Program (ILP); a simple encoding is sketched below
- Standard MAP encoding
- Compact (polynomial) encoding by Yannakakis for parity constraints
- LP relaxation: relax the integrality constraint
- Polynomial-time upper bounds
- ILP solving strategy: cuts + branching + LP relaxations
- Solve a sequence of LP relaxations
- Upper and lower bounds that improve over time
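One simple way to see that a parity constraint is ILP-expressible is the textbook auxiliary-variable trick (not the compact Yannakakis encoding mentioned above): x1 + ... + xk = b + 2t with an extra integer variable t. A sketch using the PuLP modeling library (assumed installed; weights are hypothetical):

```python
import pulp

# Maximize a linear objective subject to the parity constraint
# x0 ^ x1 ^ x2 = 1, encoded as x0 + x1 + x2 = 1 + 2*t with integer t.
prob = pulp.LpProblem("parity_map", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(3)]
t = pulp.LpVariable("t", lowBound=0, upBound=1, cat="Integer")

w = [1.0, 2.0, 3.0]                                  # hypothetical weights
prob += pulp.lpSum(wi * xi for wi, xi in zip(w, x))  # objective
prob += pulp.lpSum(x) == 1 + 2 * t                   # parity: odd number of ones

prob.solve()
print([int(v.value()) for v in x])  # [1, 1, 1]: odd parity, objective 6
```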

Iterative bound tightening

- Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time
- Recall: bounds on optimization (MAP) → (probabilistic) bounds on the partition function Z. New family of bounds.
- WISH: when MAP is solved to optimality (LowerBound = UpperBound), guaranteed constant-factor approximation of Z

Sparsity of the parity constraints

- Improving solution time and bounds quality

Inducing sparsity

- Observations
- Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes)
- Quality of the LP relaxation depends on A and b, not just on the solution space. Elementary row operations (e.g., summing 2 equations) do not change the solution space but affect the LP relaxation.
- Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field); see the sketch below
- Greedy application of elementary row operations

[Figure: matrix A in row-echelon form]
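A minimal sketch of Gaussian elimination over GF(2) (the function name is mine): rows are combined by XOR, which is exactly the elementary row operation referred to above, so the reduced system has the same solution space.

```python
def gf2_row_echelon(A, b):
    """Reduce [A | b] to row-echelon form over GF(2). XOR-ing rows
    preserves the solution set of A x = b (mod 2)."""
    rows, cols = len(A), len(A[0])
    M = [row + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    r = 0
    for c in range(cols):
        # Find a pivot row with a 1 in column c.
        pivot = next((i for i in range(r, rows) if M[i][c]), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Eliminate column c from all rows below the pivot.
        for i in range(r + 1, rows):
            if M[i][c]:
                M[i] = [a ^ p for a, p in zip(M[i], M[r])]
        r += 1
    return [row[:-1] for row in M], [row[-1] for row in M]

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
b = [1, 0, 1]
A_re, b_re = gf2_row_echelon(A, b)
print(A_re, b_re)  # third row becomes all-zero: the equations were dependent
```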

Improvements from sparsity

- Quality of LP relaxations significantly improves
- Finds integer solutions faster (better lower bounds)

[Figure: without sparsification, the solver fails at finding integer solutions (lower bounds); plot of upper bound improvement]

Generating sparse constraints

We optimize over solutions of A x = b (mod 2) (parity constraints)

- WISH is based on universal hashing
- Randomly generate A in {0,1}^{i×n}, b in {0,1}^i
- Then A x = b (mod 2) is
- Uniform over {0,1}^i
- Pairwise independent
- Suppose we generate a sparse matrix A
- At most k variables per parity constraint (up to k ones per row of A)
- A x = b (mod 2) is still uniform, but not pairwise independent anymore
- E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables. Lots of correlation. (Knowing A x = b tells me a lot about A y = b)

[Figure: A is an i × n 0/1 matrix, x a length-n vector, b a length-i vector, all mod 2]

Pairwise independence: given distinct variable assignments x ≠ y, the events A x = b (mod 2) and A y = b (mod 2) are independent.
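A quick Monte Carlo check of the two properties above for dense random A (a sketch; the function and inputs are mine): for any fixed x ≠ y, each event occurs with probability 2^-i and the joint probability factors, which is exactly pairwise independence.

```python
import random

def event_probs(x, y, i, trials=100_000):
    """Monte Carlo estimate of P[Ax=b], P[Ay=b], P[both] over
    uniformly random A in {0,1}^{i x n} and b in {0,1}^i."""
    n = len(x)
    hx = hy = hboth = 0
    for _ in range(trials):
        A = [[random.getrandbits(1) for _ in range(n)] for _ in range(i)]
        b = [random.getrandbits(1) for _ in range(i)]
        ex = all(sum(a * v for a, v in zip(row, x)) % 2 == bi
                 for row, bi in zip(A, b))
        ey = all(sum(a * v for a, v in zip(row, y)) % 2 == bi
                 for row, bi in zip(A, b))
        hx += ex
        hy += ey
        hboth += ex and ey
    return hx / trials, hy / trials, hboth / trials

x, y = [0, 1, 0, 1], [1, 1, 0, 0]   # two distinct assignments, n = 4
print(event_probs(x, y, i=2))       # ~ (0.25, 0.25, 0.0625) = (2^-i, 2^-i, 2^-2i)
```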

Using sparse parity constraints

- Theorem: With probability at least 1 − δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function.
- PRO: easier MAP inference queries
- For example, random parity constraints of length 1 (i.e., on a single variable) are equivalent to MAP with some variables fixed.
- CON: we lose the upper-bound part; the output can underestimate the partition function.
- CON: no constant-factor approximation anymore

MAP with sparse parity constraints

- MAP inference with sparse constraints: evaluation
- ILP and Branch & Bound outperform message passing (BP, MP, and MPLP)

[Figure: results on a 10x10 attractive Ising grid and a 10x10 mixed Ising grid]

Experimental results

- ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)


Conclusions

- ICML-13 WISH: discrete integration reduced to a small number of optimization instances (MAP)
- Strong (probabilistic) accuracy guarantees
- MAP inference is still NP-hard
- Scalability: approximations and bounds
- Connection with max-likelihood decoding
- ILP formulation + sparsity (Gauss sparsification + uniform hashing)
- New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function; can be iteratively tightened (will reach within a constant factor)
- Future work
- Extension to continuous integrals and variables
- Sampling from high-dimensional probability distributions
