Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Transcript and Presenter's Notes
1
Optimization With Parity Constraints: From Binary Codes to Discrete Integration
  • Stefano Ermon, Carla P. Gomes,
  • Ashish Sabharwal, and Bart Selman
  • Cornell University
  • IBM Watson Research Center
  • UAI 2013

2
High-dimensional integration
  • High-dimensional integrals in statistics, ML,
    physics
  • Expectations / model averaging
  • Marginalization
  • Partition function / rank models / parameter
    learning
  • Curse of dimensionality
  • Quadrature involves a weighted sum over an exponential number of items (e.g., units of volume)

[Figure: an n-dimensional hypercube with side length L contains L^n units of volume (L, L^2, L^3, L^4, ..., L^n)]
3
Discrete Integration
[Figure: 2^n items; size visually represents weight]
  • We are given
  • A set of 2^n items
  • Non-negative weights w
  • Goal: compute total weight
  • Compactly specified weight function
  • factored form (Bayes net, factor graph, CNF, ...)
  • Example 1: n = 2 variables, sum over 4 items
  • Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable)

[Figure: the four items of Example 1, with weights 5, 0, 2, 1; goal: compute the total weight 5 + 0 + 2 + 1 = 8]
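To ground the problem, here is a minimal brute-force sketch in Python (the assignment-to-weight mapping in `table` is my own illustration of the four-item example; any non-negative weight function works):

```python
import itertools

def discrete_integral(w, n):
    """Sum a non-negative weight function w over all 2^n binary
    assignments. Feasible only for tiny n: for n = 100 this loop
    would have to visit 2^100 (about 10^30) items."""
    return sum(w(x) for x in itertools.product([0, 1], repeat=n))

# Example 1 from the slide: n = 2, four items with weights 5, 0, 2, 1
# (which weight goes with which assignment is an assumption here).
table = {(0, 0): 5, (0, 1): 0, (1, 0): 2, (1, 1): 1}
print(discrete_integral(lambda x: table[x], 2))  # 5 + 0 + 2 + 1 = 8
```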
4
Hardness

[Figure: complexity hierarchy from Hard to Easy: EXP, PSPACE, PP, PH, NP, P]
  • 0/1 weights case
  • Is there at least one 1? → SAT
  • How many 1s? → #SAT
  • NP-complete vs. #P-complete. Much harder
  • General weights
  • Find heaviest item (combinatorial optimization, MAP)
  • Sum weights (discrete integration)
  • [ICML-13] WISH: Approximate Discrete Integration via Optimization. E.g., partition function via MAP inference
  • MAP inference often fast in practice
  • Relaxations / bounds
  • Pruning

5
WISH: Integration by Hashing and Optimization
  • The algorithm requires only O(n log n) MAP
    queries to approximate the partition function
    within a constant factor

Outer loop over the n variables:
  MAP inference on the model augmented with random parity constraints; repeat log(n) times
Aggregate the MAP inference solutions

[Figure: AUGMENTED MODEL = original graphical model over n binary variables s ∈ {0,1}^n, plus parity check nodes enforcing A s = b (mod 2)]
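A minimal Python sketch of this loop, assuming a brute-force MAP oracle so it only runs for tiny n (in the real algorithm the inner maximization is handed to a MAP/ILP solver); the aggregation rule Ẑ = M0 + Σ_i M_{i+1}·2^i follows the ICML-13 paper:

```python
import math
import numpy as np

def wish(w, n, T=None, seed=0):
    """Sketch of WISH: for i = 0..n random parity constraints, take the
    median over T runs of max w(x) s.t. A x = b (mod 2), then aggregate
    the medians into an estimate of the discrete integral Z."""
    rng = np.random.default_rng(seed)
    T = T or max(1, math.ceil(math.log2(n)))
    xs = [np.array(x) for x in np.ndindex(*([2] * n))]  # all 2^n assignments
    M = []
    for i in range(n + 1):
        vals = []
        for _ in range(T):
            A = rng.integers(0, 2, size=(i, n))  # i random parity constraints
            b = rng.integers(0, 2, size=i)
            feasible = (x for x in xs if np.array_equal(A @ x % 2, b))
            vals.append(max((w(tuple(x)) for x in feasible), default=0))
        M.append(sorted(vals)[T // 2])  # median of the T MAP values
    return M[0] + sum(M[i + 1] * 2**i for i in range(n))
```

With the four-item example from slide 3, `wish(lambda x: table[x], 2)` gives an estimate of the true total 8, within the constant factor guaranteed on the next slide.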
6
Visual working of the algorithm
  • How it works: start from the function to be integrated; its unconstrained MAP value is the mode M0. Add 1, 2, 3, ... random parity constraints (up to n of them); for each count i, repeat log(n) times and take the median MAP value Mi (median M1, median M2, median M3, ...).

[Figure: the weight function after adding 1, 2, and 3 random parity constraints; each added constraint cuts the feasible set roughly in half]
7
Accuracy Guarantees
  • Theorem [ICML-13]: With probability at least 1 − δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
  • Theorem [ICML-13]: Can improve the approximation factor to (1 + ε) by adding extra variables and factors.
  • Example: factor-2 approximation with 4n variables
  • Remark: faster than enumeration only when combinatorial optimization is efficient
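Spelled out, the first theorem says the WISH output Ẑ and the true partition function Z satisfy

    Pr[ Z / 16  ≤  Ẑ  ≤  16 · Z ]  ≥  1 − δ,

using Θ(n log n) MAP queries; the (1 + ε) version trades extra variables and factors for a tighter interval.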

8
Summary of contributions
  • Introduction and previous work
WISH: Approximate Discrete Integration via Optimization
  • Partition function / marginalization via MAP
    inference
  • Accuracy guarantees
  • MAP Inference subject to parity constraints
  • Tractable cases and approximations
  • Integer Linear Programming formulation
  • New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (will reach within a constant factor)
  • Sparsity of the parity constraints
  • Techniques to improve solution time and bounds
    quality
  • Experimental improvements over variational
    techniques

9
MAP inference with parity constraints
  • Hardness, approximations, and bounds

10
Making WISH more scalable
  • Would approximations to the optimization (MAP inference with parity constraints) be useful? YES
  • Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral), as sketched below
  • Lower bounds (local search) on MAP ⇒ lower bounds on Z
  • Upper bounds (LP, SDP relaxations) on MAP ⇒ upper bounds on Z
  • Constant-factor approximations on MAP ⇒ constant factor on Z
  • Question: Are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?
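Why the translation holds: the WISH estimate Ẑ = M0 + Σ_i M_{i+1}·2^i is monotone in each MAP value M_i, so substituting lower bounds L_i ≤ M_i or upper bounds M_i ≤ U_i gives

    L0 + Σ_i L_{i+1}·2^i  ≤  Ẑ  ≤  U0 + Σ_i U_{i+1}·2^i,

and these inherit Ẑ's probabilistic relation to Z, hence the "probabilistic" bounds on the discrete integral.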

11
Error correcting codes
  • Communication over a noisy channel
  • Bob: There has been a transmission error! What was the message actually sent by Alice?
  • Must be a valid codeword
  • As close as possible to the received message y

[Figure: noisy channel maps the transmitted string x = 01001 to the received string y = 01101. The redundant parity check bit of x is 0 XOR 1 XOR 0 XOR 0 = 1; the check on y fails, since 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0]
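A tiny sketch of the parity check from the figure (pure Python; only the 5-bit example shown above):

```python
def parity(bits):
    """XOR of the given bits: the redundant check bit."""
    p = 0
    for b in bits:
        p ^= b
    return p

data = [0, 1, 0, 0]
x = data + [parity(data)]     # transmitted codeword: 01001
y = [0, 1, 1, 0, 1]           # received string:      01101
print(parity(y[:4]) == y[4])  # False -> check fails: Bob detects an error
```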
12
Decoding a binary code
Noisy channel: x = 01001 transmitted, y = 01101 received
  • Max-likelihood decoding

[Figure: ML-decoding graphical model = noisy channel model + constraint that the transmitted string x must be a codeword]

Max w(x) subject to A x = b (mod 2): equivalent to MAP inference on the augmented model. For this more complex probabilistic model, MAP inference is NP-hard to approximate within any constant factor [Stern; Arora et al.]. Yet LDPC decoding is routinely solved in practice: 10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, ...
13
Decoding via Integer Programming
  • MAP inference subject to parity constraints, encoded as an Integer Linear Program (ILP)
  • Standard MAP encoding
  • Compact (polynomial-size) encoding by Yannakakis for the parity constraints
  • LP relaxation: relax the integrality constraints
  • Polynomial-time upper bounds
  • ILP solving strategy: cuts + branching + LP relaxations
  • Solve a sequence of LP relaxations
  • Upper and lower bounds that improve over time
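As an illustration (a textbook linearization with one auxiliary integer per constraint; the paper's compact Yannakakis encoding is a tighter, polynomial-size description of the parity polytope), a parity constraint can enter the ILP as

    Σ_{j : A_ij = 1} x_j  =  b_i + 2·z_i,    x_j ∈ {0, 1},  z_i ∈ {0, 1, 2, ...}

and the LP relaxation (x_j ∈ [0, 1], z_i ≥ 0) is solvable in polynomial time, yielding the upper bounds above.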

14
Iterative bound tightening
  • Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time
  • Recall: bounds on optimization (MAP) ⇒ (probabilistic) bounds on the partition function Z. New family of bounds.
  • WISH: When MAP is solved to optimality (LowerBound = UpperBound), guaranteed constant-factor approximation of Z

15
Sparsity of the parity constraints
  • Improving solution time and bounds quality

16
Inducing sparsity
  • Observations:
  • Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes)
  • Quality of the LP relaxation depends on A and b, not just on the solution space. Elementary row operations (e.g., summing 2 equations) do not change the solution space but affect the LP relaxation.
  • Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field); a sketch follows below
  • Greedy application of elementary row operations
[Figure: matrix A in row-echelon form]
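A minimal sketch of the elimination step (numpy over GF(2); the greedy row-operation phase mentioned above is omitted):

```python
import numpy as np

def gf2_row_echelon(A, b):
    """Reduce A x = b (mod 2) to reduced row-echelon form by Gaussian
    elimination over GF(2). Row swaps and XORs of rows preserve the
    solution set but can make the system much sparser."""
    A, b = A.copy() % 2, b.copy() % 2
    m, n = A.shape
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if A[r, col]), None)
        if pivot is None:
            continue  # no 1 in this column at or below the current row
        A[[row, pivot]], b[[row, pivot]] = A[[pivot, row]], b[[pivot, row]]
        for r in range(m):
            if r != row and A[r, col]:
                A[r] ^= A[row]  # XOR of rows = addition mod 2
                b[r] ^= b[row]
        row += 1
        if row == m:
            break
    return A, b
```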
17
Improvements from sparsity
  • Quality of LP relaxations significantly improves
  • Finds integer solutions faster (better lower
    bounds)

[Plots: upper bound improvement with sparsification; without sparsification, the solver fails at finding integer solutions (lower bounds)]
18
Generating sparse constraints
We optimize over the solutions of A x = b (mod 2) (parity constraints)
  • WISH is based on universal hashing
  • Randomly generate A ∈ {0,1}^(i×n), b ∈ {0,1}^i
  • Then A x + b (mod 2) is
  • Uniform over {0,1}^i
  • Pairwise independent
  • Suppose we generate a sparse matrix A
  • At most k variables per parity constraint (up to k ones per row of A)
  • A x + b (mod 2) is still uniform, but not pairwise independent anymore
  • E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables. Lots of correlation. (Knowing A x = b tells me a lot about A y = b.)

[Figure: A is an i × n 0/1 matrix; the constraint is A x = b (mod 2)]

Pairwise independence: for any two distinct variable assignments x and y, the events A x = b (mod 2) and A y = b (mod 2) are independent.
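A small empirical check of this contrast (a hypothetical demo, not from the paper; for dense A the joint probability of the two events factorizes to 2^-i · 2^-i, while k = 1 rows make it much larger):

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_prob(n=8, i=4, k=None, trials=100_000):
    """Estimate Pr[A x = b and A y = b (mod 2)] over random (A, b) for
    two fixed, distinct assignments x, y. Dense rows (k=None): the events
    are pairwise independent. Sparse rows (k ones per row): the hash is
    still uniform marginally, but the two events are correlated."""
    x = rng.integers(0, 2, n)
    y = x.copy()
    y[0] ^= 1                      # y differs from x in a single bit
    hits = 0
    for _ in range(trials):
        if k is None:
            A = rng.integers(0, 2, size=(i, n))
        else:
            A = np.zeros((i, n), dtype=np.int64)
            for r in range(i):     # exactly k ones per row
                A[r, rng.choice(n, size=k, replace=False)] = 1
        b = rng.integers(0, 2, size=i)
        if np.array_equal(A @ x % 2, b) and np.array_equal(A @ y % 2, b):
            hits += 1
    return hits / trials

print(joint_prob())      # close to 2**-8 ~ 0.0039: independent events
print(joint_prob(k=1))   # noticeably larger (~0.037 here): correlated
```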
19
Using sparse parity constraints
  • Theorem: With probability at least 1 − δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function.
  • PRO: Easier MAP inference queries
  • For example, random parity constraints of length 1 (i.e., on a single variable) are equivalent to MAP with some variables fixed.
  • CON: We lose the upper bound part. The output can underestimate the partition function.
  • CON: No constant-factor approximation anymore

20
MAP with sparse parity constraints
  • MAP inference with sparse constraints: evaluation
  • ILP and Branch & Bound outperform message-passing (BP, MP, and MPLP)

[Plots: 10x10 attractive Ising grid; 10x10 mixed Ising grid]
21
Experimental results
  • ILP provides probabilistic upper and lower bounds
    that improve over time and are often tighter than
    variational methods (BP, MF, TRW)

22
Experimental results (2)
  • ILP provides probabilistic upper and lower bounds
    that improve over time and are often tighter than
    variational methods (BP, MF, TRW)

23
Conclusions
  • [ICML-13] WISH: discrete integration reduced to a small number of optimization instances (MAP)
  • Strong (probabilistic) accuracy guarantees
  • MAP inference is still NP-hard
  • Scalability: approximations and bounds
  • Connection with max-likelihood decoding
  • ILP formulation + sparsity (Gaussian sparsification + uniform hashing)
  • New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function. Can be iteratively tightened (will reach within a constant factor).
  • Future work
  • Extension to continuous integrals and variables
  • Sampling from high-dimensional probability distributions

24
Extra slides