Transcript and Presenter's Notes

Title: Learning DNF Formulas: the current state of affairs, and how you can win $1000


1
Learning DNF Formulas: the current state of
affairs, and how you can win $1000
  • Ryan O'Donnell
  • including joint work with:
  • Nader Bshouty (Technion), Elchanan Mossel (UC
    Berkeley), Rocco Servedio (Columbia)

2
Learning theory
  • Computational learning theory deals with an
    algorithmic problem:
  • Given random labeled examples from an unknown
    function f : {0,1}^n → {0,1}, try to find a
    hypothesis h : {0,1}^n → {0,1} which is good at
    predicting the labels of future examples.

3
Valiant's PAC model [Val84]
  • PAC = Probably Approximately Correct
  • a learning problem is identified with a concept
    class C, which is a set of functions
    (concepts) f : {0,1}^n → {0,1}
  • nature/adversary chooses one particular f ∈ C
    and a probability distribution D on inputs
  • the learning algorithm now takes as inputs ε and
    δ, and also gets random examples ⟨x, f(x)⟩, x
    drawn from D
  • goal: with probability 1−δ, output a hypothesis h
    which satisfies Pr_{x∼D}[h(x) ≠ f(x)] < ε
  • efficiency: running time of the algorithm,
    counting time 1 for each example; hopefully
    poly(n, 1/ε, 1/δ)

4
Example: learning conjunctions
  • As an example, we present an algorithm [Val84]
    for learning the concept class of conjunctions,
    i.e., C is the set of all AND functions.
  • start with the hypothesis h = x1 ∧ x2 ∧ ⋯ ∧ xn
  • draw O((n/ε) log(1/δ)) examples
  • whenever you see a positive example, e.g.,
    ⟨11010110, 1⟩, you know that the zero
    coordinates (in this case, x3, x5, x8) can't be
    in the target AND; delete them from the
    hypothesis
  • It takes a little reasoning to show this works,
    but it does (a sketch follows below).
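
A minimal sketch of this elimination algorithm, assuming a
hypothetical example oracle draw_example() (the function and variable
names here are illustrative, not from the talk):

    import math
    import random

    def learn_conjunction(n, eps, delta, draw_example):
        """draw_example() returns (x, label), x a tuple of n bits."""
        # Start with the full hypothesis x1 AND x2 AND ... AND xn.
        hypothesis = set(range(n))
        num_examples = math.ceil((n / eps) * math.log(1 / delta))
        for _ in range(num_examples):
            x, label = draw_example()
            if label == 1:
                # Zero coordinates of a positive example cannot be in
                # the target AND; delete them from the hypothesis.
                hypothesis -= {i for i in range(n) if x[i] == 0}
        return hypothesis  # indices of the variables kept in the AND

For instance, if the target is x1 ∧ x4 ∧ x7, the oracle labels x by
whether those three coordinates are all 1, and positive examples
quickly prune every other variable from the hypothesis.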

5
Learning DNF formulas
  • Probably the most important concept class we
    would like to learn is DNF formulas, e.g., the
    set of all functions like
  • f = (x1 ∧ ¬x2 ∧ x6) ∨ (¬x1 ∧ x3) ∨ (x4 ∧ x5 ∧ ¬x7 ∧ x8).
  • (We actually mean poly-sized DNF: the number of
    terms should be n^{O(1)}, where n is the number of
    variables.)
  • Why so important?
  • natural form of knowledge representation for
    people
  • historical reasons: considered by Valiant, who
    called the problem "tantalizing" and "apparently
    simple"
  • yet it has proved a great challenge over the last
    20 years

6
Talk overview
  • In this talk I will:
  • 1. Describe variants of the PAC model, and the
    fastest known algorithms for learning DNF in each
    of them
  • 2. Point out roadblocks to improving these
    results, and present some open problems which are
    simple, concrete, and lucrative
  • I will not:
  • discuss every known model
  • consider the important problem of learning
    restricted DNF, e.g., monotone DNF, O(1)-term
    DNF, etc.

7
The original PAC model
  • The trouble with this model is that, despite
    Valiant's initial optimism, PAC-learning DNF
    formulas appears to be very hard.
  • The fastest known algorithm is due to Klivans and
    Servedio [KS01], and runs in time
    exp(n^{1/3} log^2 n).
  • Technique: They show that for any DNF formula,
    there is a polynomial in x1, …, xn of degree at
    most n^{1/3} log n which is positive whenever the
    DNF is true and negative whenever the DNF is
    false. Linear programming can be used to find a
    hypothesis consistent with every example in time
    exp(n^{1/3} log^2 n) (see the sketch below).
  • Note: Consider the model, more difficult than
    PAC, in which the learner is forced to output a
    hypothesis which is itself a DNF. In this case,
    the problem is NP-hard.
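
A minimal sketch of the second step, finding a sign-consistent
low-degree polynomial by linear programming. This uses scipy as a
generic LP solver; the setup is illustrative, not the actual [KS01]
procedure:

    from itertools import combinations
    import numpy as np
    from scipy.optimize import linprog

    def monomial_features(x, d):
        """All products of at most d coordinates of x in {0,1}^n."""
        feats = []
        for size in range(d + 1):
            for S in combinations(range(len(x)), size):
                v = 1
                for i in S:
                    v *= x[i]
                feats.append(v)
        return feats

    def fit_ptf(examples, d):
        """examples: list of (x, label), label in {0,1}. Returns w with
        sign(w . phi(x)) matching every label, or None if infeasible."""
        Phi = np.array([monomial_features(x, d) for x, _ in examples], float)
        signs = np.array([1.0 if lbl == 1 else -1.0 for _, lbl in examples])
        # Require sign * (w . phi(x)) >= 1, written as A_ub w <= b_ub.
        res = linprog(c=np.zeros(Phi.shape[1]),
                      A_ub=-signs[:, None] * Phi,
                      b_ub=-np.ones(len(examples)),
                      bounds=[(None, None)] * Phi.shape[1])
        return res.x if res.success else None

With d = n^{1/3} log n there are exp(n^{1/3} log^2 n) monomial
features, which is where the running time comes from.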

8
Distributional issues
  • One aspect of PAC learning that makes it very
    difficult is that the adversary gets to pick a
    different probability distribution on the
    examples for every concept.
  • Thus the adversary can pick a distribution which
    puts all the probability weight on the most
    difficult examples.
  • A very commonly studied easier model of learning
    is called Uniform Distribution learning. Here,
    the adversary must always use the uniform
    distribution on {0,1}^n.
  • Under the uniform distribution, DNF can be
    learned in quasipolynomial time.

9
Uniform Distribution learning
  • In 1990, Verbeurgt [Ver90] observed that, under
    the uniform distribution, any term in the target
    DNF which is longer than log(n/ε) is essentially
    always false, and thus irrelevant. This fairly
    easily leads to an algorithm for learning DNF
    under uniform in quasipolynomial time, roughly
    n^{log n}.
  • In 1993, Linial, Mansour, and Nisan [LMN93]
    introduced a powerful and sophisticated method of
    learning under the uniform distribution based on
    Fourier analysis. In particular, their algorithm
    could learn depth-d circuits in time roughly
    n^{log^d n} (the approach is sketched below).
  • Fourier analysis proved very important for
    subsequent learning of DNF under the uniform
    distribution.
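
A minimal sketch of the resulting "low-degree algorithm": estimate
every Fourier coefficient of degree at most d from uniform examples,
then predict with the sign of the truncated expansion. Names are
illustrative, and f is assumed ±1-valued:

    from itertools import combinations

    def chi(S, x):
        """Fourier character (-1)^(sum of x_i over i in S)."""
        return -1 if sum(x[i] for i in S) % 2 else 1

    def low_degree_learn(samples, n, d):
        """samples: list of (x, y) with y = f(x) in {-1,+1}, x uniform.
        Returns a hypothesis h approximating f."""
        coeffs = {}
        for size in range(d + 1):
            for S in combinations(range(n), size):
                # Empirical estimate of f-hat(S) = E[f(x) chi_S(x)].
                coeffs[S] = sum(y * chi(S, x) for x, y in samples) / len(samples)
        def h(x):
            val = sum(c * chi(S, x) for S, c in coeffs.items())
            return 1 if val >= 0 else -1
        return h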

10
Membership queries
  • Still, DNF formulas are not known to be learnable
    in polynomial time even under the uniform
    distribution.
  • Another common way to make learning easier is to
    allow the learner membership queries. By this we
    mean that in addition to getting random examples,
    the learner is allowed to ask for the value of f
    on any input it wants.
  • In some sense this begins to stray away from the
    traditional model of learning, in which the
    learner is passive. However, it's a natural,
    commonly studied, important model.
  • Angluin and Kharitonov [AK91] showed that, under
    cryptographic assumptions, membership queries do
    not help in PAC-learning DNF.

11
Uniform distribution with queries
  • However, membership queries are helpful for
    learning under the uniform distribution.
  • Uniform Distribution learning with membership
    queries is the easiest model we've seen thus far,
    and in it there is finally some progress!
  • Mansour [Man92], building on earlier work of
    Kushilevitz and Mansour [KM93], gave a
    Fourier-based algorithm in this model learning
    DNF in time n^{log log n} (the core coefficient
    search is sketched below).
  • Finally, in 1994 Jackson [Jac94] produced the
    celebrated Harmonic Sieve, a novel algorithm
    combining Fourier analysis and Freund and
    Schapire's boosting to learn DNF in polynomial
    time.
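
A minimal sketch of the Kushilevitz-Mansour-style coefficient search
that powers these algorithms: membership queries let one estimate the
Fourier weight sitting under any prefix of a coefficient's index, so
prefixes with small weight can be discarded recursively. f is assumed
±1-valued; the sampling constant is illustrative:

    import random

    def prefix_weight(f, n, s, num_samples=2000):
        """Estimate the sum of f-hat(S)^2 over all S whose indicator
        string starts with s, via E_x[(E_y f(yx) chi_s(y))^2]."""
        k = len(s)
        total = 0.0
        for _ in range(num_samples):
            x = [random.randint(0, 1) for _ in range(n - k)]
            y = [random.randint(0, 1) for _ in range(k)]
            yp = [random.randint(0, 1) for _ in range(k)]
            sign = (-1) ** (sum(s[i] * y[i] for i in range(k))
                            + sum(s[i] * yp[i] for i in range(k)))
            total += sign * f(y + x) * f(yp + x)  # two membership queries
        return total / num_samples

    def km_search(f, n, theta):
        """Find (indicator strings of) all S with |f-hat(S)| >= theta."""
        found, stack = [], [[]]
        while stack:
            s = stack.pop()
            if prefix_weight(f, n, s) < theta ** 2 / 2:
                continue  # too little Fourier weight under this prefix
            if len(s) == n:
                found.append(s)
            else:
                stack.extend([s + [0], s + [1]])
        return found

Since the total Fourier weight of a ±1-valued function is 1, only
about 2/θ² prefixes can survive at each level, which keeps the search
polynomial in n and 1/θ.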

12
Random Walk model
  • Recently we've been able to improve on this last
    result. [BMOS03] (Bshouty-Mossel-O'Donnell-Servedio)
    considers a natural, passive model of learning of
    difficulty intermediate between Uniform
    Distribution learning and Uniform Distribution
    learning with membership queries.
  • In the Random Walk model, examples are not given
    i.i.d. as usual, but are instead generated by a
    standard random walk on the hypercube (sketched
    below). The learner's hypothesis is evaluated
    under the uniform distribution.
  • It can be shown that DNF formulas are also
    learnable in polynomial time in this model. The
    proof begins with Jackson's algorithm and does
    some extra Fourier analysis.
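
A minimal sketch of the Random Walk example oracle (the target f is
any 0/1-valued function; names are illustrative):

    import random

    def random_walk_examples(f, n, num_steps):
        """Yield (x, f(x)) pairs along a random walk on {0,1}^n."""
        x = [random.randint(0, 1) for _ in range(n)]  # uniform start
        for _ in range(num_steps):
            yield tuple(x), f(x)
            x[random.randrange(n)] ^= 1  # flip one random coordinate
        # (In the lazy variant the chosen coordinate is re-randomized
        # rather than flipped; both walks mix to uniform.)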

13
The current picture
Learning model                                Time                      Source
PAC learning (distributional)                 2^{O(n^{1/3} log^2 n)}    [KS01]
Uniform Distribution                          n^{O(log n)}              [Ver90]
Random Walk                                   poly(n)                   [BMOS03]
Uniform Distribution + Membership queries     poly(n)                   [Jac94]
(models listed from hardest to easiest)
14
Poly time under uniform?
  • Perhaps the biggest open problem in DNF learning
    (in all of learning theory?) is whether DNF can
    be learned in polynomial time under the uniform
    distribution.
  • One might ask: What is the current stumbling
    block? Why can't we seem to do better than
    n^{log n} time?
  • The answer is that we're stuck on the
    junta-learning problem.
  • Definition: A k-junta is a function on n bits
    which happens to depend on only k of the bits.
    (All other n−k coordinates are irrelevant.)

15
Learning juntas
  • Since every boolean function on k bits has a DNF
    of size 2^k, it follows that the set of all
    log(n)-juntas is a subset of the set of all
    polynomial-size DNF formulas.
  • Thus to learn DNF under uniform in polynomial
    time, we must be able to learn log(n)-juntas
    under uniform in polynomial time.
  • The problem of learning k-juntas dates back to
    Blum [B94] and Blum and Langley [BL94]. There is
    an extremely naive algorithm running in time n^k:
    essentially, test all possible sets of k
    variables to see if they are the junta (sketched
    below). However, even getting an n^{(1−Ω(1))k}
    algorithm took some time.
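
A minimal sketch of that naive n^k algorithm (the sample format and
names are illustrative):

    from itertools import combinations

    def naive_junta_learner(samples, n, k):
        """samples: list of (x, label). Returns a set of k variables
        consistent with every sample, or None."""
        for S in combinations(range(n), k):
            table, ok = {}, True
            for x, label in samples:
                key = tuple(x[i] for i in S)
                if table.setdefault(key, label) != label:
                    ok = False  # same restriction to S, two labels
                    break
            if ok:
                return S  # some function of S explains all samples
        return None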

16
Learning juntas
  • [MOS03] (Mossel-O'Donnell-Servedio) gave an
    algorithm for learning k-juntas under the uniform
    distribution which runs in time n^{.704k}. The
    technique involves trading off different
    polynomial representations of boolean functions.
  • This is not much of an improvement for the
    important case of k = log n. However, at least
    it demonstrates that the n^k barrier can be
    broken.
  • It would be a gigantic breakthrough to learn
    k-juntas in time n^{o(k)}, or even to learn
    ω(1)-juntas in polynomial time.

17
An open problem
  • The junta-learning problem is a beautiful and
    simple-to-state problem:
  • An unknown and arbitrary function f : {0,1}^n →
    {0,1} is selected, which depends on only some k
    of the bits.
  • The algorithm gets access to uniformly randomly
    chosen examples ⟨x, f(x)⟩.
  • With probability 1−δ the algorithm should output
    at least one bit (equivalently, all k bits) upon
    which f depends.
  • The algorithm's running time is considered to be
    of the form n^a · poly(n, 2^k, 1/δ), and a is the
    important measure of complexity.
  • Can one get a ≪ k?

18
Cash money
  • Avrim Blum has put up CA$H for anyone who can
    make progress on this problem.
  • $1000: Solve the problem for k = log n or
    k = log log n, in polynomial time.
  • $500: Solve the problem in polynomial time when
    the function is known to be MAJORITY on log n
    bits XORed with PARITY on log n bits (sketched
    below).
  • $200: Find an algorithm running in time n^{.499k}.
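
For concreteness, a minimal sketch of the target function in the $500
challenge (which 2 log n of the n bits are relevant is hidden from
the learner; the index sets here are illustrative):

    def maj_xor_parity(x, maj_bits, par_bits):
        """x: tuple of bits; maj_bits, par_bits: disjoint index sets
        of size log n each."""
        maj = int(2 * sum(x[i] for i in maj_bits) > len(maj_bits))
        par = sum(x[i] for i in par_bits) % 2
        return maj ^ par

Roughly speaking, the parity part shifts all the Fourier weight to
high degree, defeating low-degree algorithms, while the majority part
prevents the function from being a pure parity, defeating Gaussian
elimination.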