Towards Efficient Sampling: Exploiting Random Walk Strategy - PowerPoint PPT Presentation

About This Presentation

Title:

Towards Efficient Sampling: Exploiting Random Walk Strategy



Slides: 29

Provided by: wei8151

Learn more at: http://www.cs.cornell.edu


Transcript and Presenter's Notes


1
Towards Efficient Sampling: Exploiting Random Walk Strategy

Wei Wei, Jordan Erenrich, and Bart Selman

2
Motivations

Recent years have seen tremendous improvements in SAT solving: from formulas with up to 300 variables (1992) to formulas with one million variables.
Various techniques exist for answering the question: does a satisfying assignment exist for a formula?
But there are harder questions to answer: how many satisfying assignments does a formula have? Or, closely related, can we sample from the satisfying assignments of a formula?

3
Complexity

SAT is NP-complete. 2-SAT is solvable in linear time.
Counting satisfying assignments (even for 2-CNF) is #P-complete, and is NP-hard to approximate (Valiant, 1979).
Approximate counting and sampling are equivalent
if the problem is downward self-reducible.
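To make the link between counting and sampling concrete: a CNF formula is downward self-reducible because, for any variable x, #F = #F[x=1] + #F[x=0], where F[x=v] is F with x fixed to v. The Python sketch below, assuming DIMACS-style clauses (lists of signed integers) and a brute-force counter standing in for a real counting oracle, uses this identity in the counting-to-sampling direction: fixing each variable to 1 with probability #F[x=1] / #F yields an exactly uniform solution. The function names `count` and `sample_uniform` are hypothetical.

```python
import itertools
import random

def count(clauses, n, fixed):
    """Brute-force #SAT over variables 1..n; `fixed` maps some variables to bools."""
    free = [v for v in range(1, n + 1) if v not in fixed]
    total = 0
    for bits in itertools.product([False, True], repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, bits))}
        # Literal l is true when variable abs(l) has value (l > 0).
        if all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in clauses):
            total += 1
    return total

def sample_uniform(clauses, n, rng):
    """Draw one uniformly random solution using only calls to the counter."""
    fixed = {}
    remaining = count(clauses, n, fixed)      # #F for the current restricted formula
    if remaining == 0:
        raise ValueError("formula is unsatisfiable")
    for x in range(1, n + 1):
        with_true = count(clauses, n, {**fixed, x: True})
        # #F = #F[x=1] + #F[x=0], so P(x = 1) = #F[x=1] / #F.
        value = rng.random() < with_true / remaining
        fixed[x] = value
        remaining = with_true if value else remaining - with_true
    return fixed

clauses = [[1, 2], [-1, 3]]   # (x1 or x2) and (not x1 or x3)
print(count(clauses, 3, {}))  # 4
print(sample_uniform(clauses, 3, random.Random(0)))
```

With an approximate counter in place of `count`, the same loop gives approximate sampling; the converse direction (sampling to counting) is the one the ApproxCount slides below rely on.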

4
Challenge

Can we extend SAT techniques to solve harder
counting/sampling problems?
Such an extension would lead us to a wide range
of new applications.

5
Standard Methods for Sampling - MCMC

Based on setting up a Markov chain with a
predefined stationary distribution.
Draw samples from the stationary distribution by
running the Markov chain for sufficiently long.
Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.

6
Simulated Annealing

Simulated Annealing uses the Boltzmann distribution as the stationary distribution.
At low temperature, the distribution concentrates around minimum energy states.
For the satisfiability problem, each satisfying assignment (with 0 cost) gets the same probability.
Again, reaching such a stationary distribution takes exponential time for interesting problems, as shown in a later slide.
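For SAT, a natural energy E(s) is the number of unsatisfied clauses, so the Boltzmann distribution P(s) ∝ exp(-E(s)/T) gives every satisfying assignment (E = 0) the same, maximal probability. Below is a minimal sketch of one fixed-temperature SA move, assuming a Metropolis-style proposal that flips a uniformly chosen variable; the names `energy` and `sa_move` and the temperature value are illustrative, not taken from the talk.

```python
import math
import random

def energy(clauses, a):
    """Number of unsatisfied clauses; a[v] is the value of variable v (a[0] unused)."""
    return sum(not any(a[abs(l)] == (l > 0) for l in c) for c in clauses)

def sa_move(clauses, a, temperature, rng):
    """One Metropolis move at a fixed temperature; updates `a` in place."""
    var = rng.randint(1, len(a) - 1)           # propose flipping a uniformly chosen variable
    before = energy(clauses, a)
    a[var] = not a[var]
    delta = energy(clauses, a) - before
    # Always accept moves that do not raise the energy; accept uphill moves with
    # probability exp(-delta / T) (never, when T = 0).
    if delta > 0 and (temperature == 0 or rng.random() >= math.exp(-delta / temperature)):
        a[var] = not a[var]                    # reject: undo the flip

rng = random.Random(0)
clauses = [[1, 2], [-1, 2], [1, -2]]           # only solution: x1 = x2 = True
state = [None, False, False]
for _ in range(1000):
    sa_move(clauses, state, 0.5, rng)
```

Running the chain long enough approaches the Boltzmann distribution; the slide's point is that "long enough" can be exponential.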

7
Standard Methods for Counting

Current solution counting procedures extend DPLL methods with component analysis.
Two counting procedures are available: relsat (Bayardo and Pehoushek, 2000) and cachet (Sang, Beame, and Kautz, 2004). Both count the exact number of solutions.
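Component analysis rests on one observation: if the clauses split into groups that share no variables, the model count is the product of the groups' counts; otherwise the counter branches on a variable and adds the two sub-counts. A toy sketch of that recursion follows, assuming DIMACS-style integer literals. It is not relsat or cachet: it omits the unit propagation, clause learning, and caching that real counters use, and the function names are hypothetical.

```python
def simplify(clauses, lit):
    """Make literal `lit` true: drop satisfied clauses, delete the falsified literal -lit."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def components(clauses):
    """Group clauses into connected components (clauses sharing a variable are linked)."""
    groups = []  # list of (variable set, clause list); groups never share variables
    for clause in clauses:
        clause_vars = {abs(l) for l in clause}
        merged_vars, merged_clauses, rest = set(clause_vars), [clause], []
        for gvars, gclauses in groups:
            if gvars & clause_vars:            # this clause connects to the group
                merged_vars |= gvars
                merged_clauses += gclauses
            else:
                rest.append((gvars, gclauses))
        groups = rest + [(merged_vars, merged_clauses)]
    return groups

def count_models(clauses, variables):
    """Exact number of assignments to `variables` satisfying `clauses`."""
    if any(len(c) == 0 for c in clauses):
        return 0                               # an empty clause is unsatisfiable
    if not clauses:
        return 2 ** len(variables)             # every remaining variable is free
    groups = components(clauses)
    used = set().union(*(gvars for gvars, _ in groups))
    free_factor = 2 ** len(variables - used)   # variables in no clause
    if len(groups) > 1:                        # independent components multiply
        result = free_factor
        for gvars, gclauses in groups:
            result *= count_models(gclauses, gvars)
        return result
    gvars, gclauses = groups[0]
    x = min(gvars)                             # DPLL-style branch on one variable
    rest = gvars - {x}
    return free_factor * (count_models(simplify(gclauses, x), rest)
                          + count_models(simplify(gclauses, -x), rest))

print(count_models([[1, 2], [3, 4]], {1, 2, 3, 4, 5}))  # 18 = 3 * 3 * 2
```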

Question: Can state-of-the-art local search procedures be used for SAT sampling/counting (as alternatives to standard Markov chain Monte Carlo and DPLL methods)?

Yes! Shown in this talk.
9
Our Approach - Biased Random Walk

Biased random walk = greedy bias + pure random walk. Example: WalkSat (Selman et al., 1994), effective on SAT.
Can we use it to sample from the solution space?

Does WalkSat reach all solutions?

How uniform is the sampling?
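The biased random walk can be illustrated with a minimal sketch of the commonly described WalkSat loop, assuming DIMACS-style literals and a `noise` parameter giving the probability of a pure random-walk flip. The greedy bias flips the variable in a random unsatisfied clause that breaks the fewest currently satisfied clauses, and a flip that breaks nothing is always taken. Parameter names and defaults are illustrative assumptions.

```python
import random

def satisfied(clause, a):
    return any(a[abs(l)] == (l > 0) for l in clause)

def break_count(clauses, a, var):
    """How many currently satisfied clauses would become unsatisfied if var flipped."""
    a[var] = not a[var]
    broken = sum(1 for c in clauses
                 if (var in c or -var in c) and not satisfied(c, a))
    a[var] = not a[var]
    return broken

def walksat(clauses, n, noise=0.5, max_flips=100_000, seed=None):
    """Return a satisfying assignment (list, index 0 unused) or None."""
    rng = random.Random(seed)
    a = [False] + [rng.random() < 0.5 for _ in range(n)]
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, a)]
        if not unsat:
            return a
        clause = rng.choice(unsat)             # focus on a random unsatisfied clause
        breaks = {abs(l): break_count(clauses, a, abs(l)) for l in clause}
        best = min(breaks, key=breaks.get)     # greedy bias: fewest broken clauses
        if breaks[best] > 0 and rng.random() < noise:
            best = abs(rng.choice(clause))     # pure random-walk move
        a[best] = not a[best]
    return None

clauses = [[1, 2], [-1, 3], [2, -3, 4]]
print(walksat(clauses, 4, seed=0))
```

Because the walk stops at the first solution it hits, which solutions it reaches, and how often, depends on the search dynamics; that is exactly what the two questions above probe.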

10
WalkSat
[Figure; axis: Hamming distance]
11
Probability Ranges in Different Domains
12
Improving the Uniformity of Sampling
SampleSat = WalkSat + SA
With probability p, the algorithm makes a biased random walk move.
With probability 1-p, the algorithm makes an SA (simulated annealing) move.
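The two-move mixture can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the defaults for `p`, `noise`, and `temperature`, the simplified greedy score (fewest unsatisfied clauses after the flip), and the stopping rule (`extra_steps` moves past the first solution, then return at the next solution) are all assumptions made for the example.

```python
import math
import random

def satisfied(clause, a):
    return any(a[abs(l)] == (l > 0) for l in clause)

def unsat_after_flip(clauses, a, v):
    """Unsatisfied-clause count if variable v were flipped; `a` is left unchanged."""
    a[v] = not a[v]
    count = sum(not satisfied(c, a) for c in clauses)
    a[v] = not a[v]
    return count

def samplesat(clauses, n, p=0.5, noise=0.5, temperature=0.1,
              extra_steps=1000, max_steps=1_000_000, seed=None):
    """Return a satisfying assignment (list, index 0 unused), or None.

    Hypothetical stopping rule: after the first solution is found, keep moving for
    `extra_steps` moves, then return at the next solution. Requires temperature > 0.
    """
    rng = random.Random(seed)
    a = [False] + [rng.random() < 0.5 for _ in range(n)]
    found, steps_after = False, 0
    for _ in range(max_steps):
        unsat = [c for c in clauses if not satisfied(c, a)]
        if not unsat:
            found = True
            if steps_after >= extra_steps:
                return a[:]
        if found:
            steps_after += 1
        if unsat and rng.random() < p:
            # Biased random walk move on a random unsatisfied clause.
            clause = rng.choice(unsat)
            if rng.random() < noise:
                var = abs(rng.choice(clause))
            else:
                var = min((abs(l) for l in clause),
                          key=lambda v: unsat_after_flip(clauses, a, v))
        else:
            # Fixed-temperature SA (Metropolis) move; the only option once all
            # clauses are satisfied, since the walk needs an unsatisfied clause.
            var = rng.randint(1, n)
            delta = unsat_after_flip(clauses, a, var) - len(unsat)
            if delta > 0 and rng.random() >= math.exp(-delta / temperature):
                continue                       # reject the uphill move
        a[var] = not a[var]
    return None

clauses = [[1, 2], [-1, 3], [2, -3, 4]]
print(samplesat(clauses, 4, extra_steps=200, seed=1))
```

Once every clause is satisfied, only SA moves remain, and at a low temperature these are mostly sideways flips between neighbouring solutions.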

13
Comparison Between WalkSat and SampleSat
[Figure panels: WalkSat, SampleSat]
14
SampleSat
[Figure; axis: Hamming distance]
15
(No Transcript)
16
Analysis
17
Property of F

Proposition 1: SA with a fixed temperature takes exponential time to find a solution of F.
This shows that even for some simple 2-CNF formulas, SA cannot reach a solution in polynomial time.

18
Analysis, cont.
19
SampleSat

In the SampleSat algorithm, we can divide the search into two stages. Before SampleSat reaches its first solution, it behaves like WalkSat.

20
SampleSat, cont.

After reaching a solution, the random walk component is turned off because all clauses are satisfied, and SampleSat behaves like SA.
Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly.
This two-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.
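One way to see why Proposition 3 holds, assuming a "cluster" means a set of solutions connected by single-variable flips: at zero temperature, a flip proposed from a solution (probability 1/n per variable) is accepted only if the result is also a solution. The transition probability between two neighbouring solutions is therefore 1/n in both directions, the transition matrix is symmetric, and the uniform distribution is stationary. The sketch below checks this exactly, with `fractions.Fraction`, on a small formula; the helper names are hypothetical.

```python
import itertools
from fractions import Fraction

def solutions(clauses, n):
    """All satisfying assignments as tuples of bools (bits[v - 1] is variable v)."""
    return [bits for bits in itertools.product([False, True], repeat=n)
            if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)]

def zero_temp_transitions(clauses, n):
    """Transition matrix of zero-temperature SA started inside the solution set."""
    sols = solutions(clauses, n)
    index = {s: i for i, s in enumerate(sols)}
    P = [[Fraction(0)] * len(sols) for _ in sols]
    for s in sols:
        for v in range(n):                     # propose flipping variable v, prob 1/n
            t = s[:v] + (not s[v],) + s[v + 1:]
            # Accepted only if the energy stays 0 (t is a solution); else stay at s.
            target = index.get(t, index[s])
            P[index[s]][target] += Fraction(1, n)
    return sols, P

def is_stationary(P, pi):
    """Check pi P == pi exactly."""
    return all(sum(pi[i] * P[i][j] for i in range(len(pi))) == pi[j]
               for j in range(len(pi)))

clauses = [[1, 2], [-1, 3], [2, -3, 4]]
sols, P = zero_temp_transitions(clauses, 4)
uniform = [Fraction(1, len(sols))] * len(sols)
print(is_stationary(P, uniform))  # True
```

Uniformity holds for any such formula; reaching every solution additionally requires the solutions to form a single connected cluster, which is why the proposition is stated per cluster.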

21
Verification on Larger Formulas - ApproxCount

Small formulas -> figures of solution frequencies.
How can we verify on large formulas? ApproxCount.
ApproxCount approximates the number of solutions of Boolean formulas, based on the SampleSat algorithm.
Besides justifying the accuracy of our sampling approach, ApproxCount is interesting in its own right.

22
Algorithm

The algorithm works as follows (Jerrum and Valiant, 1986):
1. Pick a variable X in the current formula.
2. Draw K samples from the solution space of the current formula.
3. Set X to its most sampled value t; the multiplier for X is K / #(X = t), where #(X = t) is the number of samples with X = t. Since t is the majority value, 1 ≤ multiplier ≤ 2.
4. Repeat steps 1-3 until all variables are set.
The number of solutions of the original formula is the product of all multipliers.
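The steps above can be sketched in Python as follows. The sketch assumes DIMACS-style clauses and uses an exact uniform sampler built by brute-force enumeration as a stand-in for SampleSat, so it only runs on tiny formulas. Variables are picked in index order, ties go to True, and all function names are hypothetical.

```python
import itertools
import random

def solutions(clauses, n, fixed):
    """All solutions extending the partial assignment `fixed` (brute force, tiny n only)."""
    free = [v for v in range(1, n + 1) if v not in fixed]
    result = []
    for bits in itertools.product([False, True], repeat=len(free)):
        a = {**fixed, **dict(zip(free, bits))}
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            result.append(a)
    return result

def uniform_sampler(clauses, n, fixed, k, rng):
    """Stand-in for SampleSat: k exactly uniform samples of the current formula."""
    sols = solutions(clauses, n, fixed)
    return [rng.choice(sols) for _ in range(k)]

def approx_count(clauses, n, k=1000, seed=0, sampler=uniform_sampler):
    rng = random.Random(seed)
    fixed, estimate = {}, 1.0
    for x in range(1, n + 1):                           # step 1: next variable X
        samples = sampler(clauses, n, fixed, k, rng)    # step 2: K samples
        ones = sum(s[x] for s in samples)
        t = ones >= k - ones                            # step 3: most sampled value t
        hits = ones if t else k - ones                  # #(X = t) >= K / 2
        estimate *= k / hits                            # multiplier in [1, 2]
        fixed[x] = t                                    # set X = t in the formula
    return estimate  # product of multipliers; the fully set formula has 1 solution

clauses = [[1, 2], [-1, 3], [2, -3, 4]]
print(len(solutions(clauses, 4, {})))  # exact count: 7
print(approx_count(clauses, 4))
```

Each multiplier estimates #F / #F[X=t], so the product telescopes to #F; choosing the majority value keeps every multiplier between 1 and 2, which limits the relative error introduced at each step.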

23
Accumulation of Errors
24
Within the Capacity of Exact Counters

We compare the results of ApproxCount with those of the exact counters.

25
And beyond

We developed a family of formulas whose solutions are hard to count.
The formulas are based on SAT encodings of the following combinatorial problem:
Given n different items, choose from them a list (order matters) of m items (m < n). Let P(n,m) be the number of different lists that can be constructed; P(n,m) = n!/(n-m)!.
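The slides do not spell out the encoding. One natural choice, assumed here, uses a Boolean variable for each (position, item) pair, so P(20,10) needs 10 × 20 = 200 variables, consistent with the variable count quoted for the hard instance. Each position holds exactly one item and each item is used at most once. The sketch builds that CNF and checks by brute force, on a small case, that its model count equals n!/(n-m)! (computed with Python's `math.perm`). The function names are hypothetical.

```python
import itertools
import math

def permutation_cnf(n, m):
    """CNF whose models are ordered lists of m distinct items chosen from n.

    Variable var(i, j) is true when position i (0..m-1) holds item j (0..n-1).
    """
    def var(i, j):
        return i * n + j + 1                   # DIMACS-style numbering from 1

    clauses = []
    for i in range(m):
        clauses.append([var(i, j) for j in range(n)])      # position i holds some item
        for j, k in itertools.combinations(range(n), 2):
            clauses.append([-var(i, j), -var(i, k)])       # ...and at most one item
    for j in range(n):
        for i, h in itertools.combinations(range(m), 2):
            clauses.append([-var(i, j), -var(h, j)])       # item j used at most once
    return n * m, clauses

def brute_force_count(num_vars, clauses):
    return sum(1 for bits in itertools.product([False, True], repeat=num_vars)
               if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses))

num_vars, clauses = permutation_cnf(4, 2)
print(num_vars, brute_force_count(num_vars, clauses), math.perm(4, 2))  # 8 12 12
```

For the hard instance, P(20,10) = 20!/10! = 670,442,572,800, far too many models to enumerate.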

26
Hard Instances

The encoding of P(20,10) has only 200 variables, but neither cachet nor relsat was able to count it within 5 days in our experiments.
On the other hand, ApproxCount finishes in 2 hours, and it can also estimate the number of solutions of even larger instances.

27
Summary

Small formulas -> complete analysis of the search space
Larger formulas -> compare ApproxCount results with results of exact counting procedures
Harder formulas -> handcrafted formulas, compared with analytic results

28
Conclusion and Future Work

This work shows a good opportunity to extend SAT solvers to develop algorithms for sampling and counting tasks.
Next step: use our methods in probabilistic reasoning and Bayesian inference domains.
