Transcript and Presenter's Notes

Title: Learning DNF Formulas: the current state of affairs, and how you can win $1000


1
Learning DNF Formulas: the current state of
affairs, and how you can win $1000
  • Ryan O'Donnell
  • including joint work with:
  • Nader Bshouty (Technion), Elchanan Mossel (UC
    Berkeley), Rocco Servedio (Columbia)

2
Learning theory
  • Computational learning theory deals with an
    algorithmic problem:
  • Given random labeled examples from an unknown
    function f : {0,1}^n → {0,1}, try to find a
    hypothesis h : {0,1}^n → {0,1} which is good at
    predicting the labels of future examples.

3
Valiant's PAC model [Val84]
  • PAC = Probably Approximately Correct
  • a learning problem is identified with a concept
    class C, which is a set of functions
    (concepts) f : {0,1}^n → {0,1}
  • nature/adversary chooses one particular f ∈ C
    and a probability distribution D on inputs
  • the learning algorithm now takes as inputs ε and
    δ, and also gets random examples ⟨x, f(x)⟩, x
    drawn from D
  • goal: with probability 1−δ, output a hypothesis h
    which satisfies Pr_{x∼D}[h(x) ≠ f(x)] < ε
  • efficiency: running time of the algorithm,
    counting time 1 for each example; hopefully
    poly(n, 1/ε, 1/δ)

4
Example: learning conjunctions
  • As an example, we present an algorithm [Val84]
    for learning the concept class of conjunctions,
    i.e., C is the set of all AND functions.
  • start with the hypothesis h = x1 ∧ x2 ∧ ⋯ ∧ xn
  • draw O((n/ε) log(1/δ)) examples
  • whenever you see a positive example, e.g.,
    ⟨11010110, 1⟩, you know that the zero
    coordinates (in this case, x3, x5, x8) can't be
    in the target AND; delete them from the
    hypothesis
  • It takes a little reasoning to show this works,
    but it does (a sketch follows below).
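
A minimal sketch of this elimination algorithm, assuming a
hypothetical example oracle draw_example() (the function and variable
names here are illustrative, not from the talk):

    import math
    import random

    def learn_conjunction(n, eps, delta, draw_example):
        """draw_example() returns (x, label), x a tuple of n bits."""
        # Start with the full hypothesis x1 AND x2 AND ... AND xn.
        hypothesis = set(range(n))
        num_examples = math.ceil((n / eps) * math.log(1 / delta))
        for _ in range(num_examples):
            x, label = draw_example()
            if label == 1:
                # Zero coordinates of a positive example cannot be in
                # the target AND; delete them from the hypothesis.
                hypothesis -= {i for i in range(n) if x[i] == 0}
        return hypothesis  # indices of the variables kept in the AND

For instance, if the target is x1 ∧ x4 ∧ x7, the oracle labels x by
whether those three coordinates are all 1, and positive examples
quickly prune every other variable from the hypothesis.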

5
Learning DNF formulas
  • Probably the most important concept class we
    would like to learn is DNF formulas, e.g., the
    set of all functions like
  • f = (x1 ∧ ¬x2 ∧ x6) ∨ (¬x1 ∧ x3) ∨ (x4 ∧ x5 ∧ ¬x7 ∧ x8).
  • (We actually mean poly-sized DNF: the number of
    terms should be n^{O(1)}, where n is the number of
    variables.)
  • Why so important?
  • natural form of knowledge representation for
    people
  • historical reasons: considered by Valiant, who
    called the problem "tantalizing" and "apparently
    simple"
  • yet it has proved a great challenge over the last
    20 years

6
Talk overview
  • In this talk I will:
  • 1. Describe variants of the PAC model, and the
    fastest known algorithms for learning DNF in each
    of them
  • 2. Point out roadblocks to improving these
    results, and present some open problems which are
    simple, concrete, and lucrative
  • I will not:
  • discuss every known model
  • consider the important problem of learning
    restricted DNF, e.g., monotone DNF, O(1)-term
    DNF, etc.

7
The original PAC model
  • The trouble with this model is that, despite
    Valiant's initial optimism, PAC-learning DNF
    formulas appears to be very hard.
  • The fastest known algorithm is due to Klivans and
    Servedio [KS01], and runs in time
    exp(n^{1/3} log^2 n).
  • Technique: They show that for any DNF formula,
    there is a polynomial in x1, …, xn of degree at
    most n^{1/3} log n which is positive whenever the
    DNF is true and negative whenever the DNF is
    false. Linear programming can be used to find a
    hypothesis consistent with every example in time
    exp(n^{1/3} log^2 n) (see the sketch below).
  • Note: Consider the model, more difficult than
    PAC, in which the learner is forced to output a
    hypothesis which is itself a DNF. In this case,
    the problem is NP-hard.
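
A minimal sketch of the second step, finding a sign-consistent
low-degree polynomial by linear programming. This uses scipy as a
generic LP solver; the setup is illustrative, not the actual [KS01]
procedure:

    from itertools import combinations
    import numpy as np
    from scipy.optimize import linprog

    def monomial_features(x, d):
        """All products of at most d coordinates of x in {0,1}^n."""
        feats = []
        for size in range(d + 1):
            for S in combinations(range(len(x)), size):
                v = 1
                for i in S:
                    v *= x[i]
                feats.append(v)
        return feats

    def fit_ptf(examples, d):
        """examples: list of (x, label), label in {0,1}. Returns w with
        sign(w . phi(x)) matching every label, or None if infeasible."""
        Phi = np.array([monomial_features(x, d) for x, _ in examples], float)
        signs = np.array([1.0 if lbl == 1 else -1.0 for _, lbl in examples])
        # Require sign * (w . phi(x)) >= 1, written as A_ub w <= b_ub.
        res = linprog(c=np.zeros(Phi.shape[1]),
                      A_ub=-signs[:, None] * Phi,
                      b_ub=-np.ones(len(examples)),
                      bounds=[(None, None)] * Phi.shape[1])
        return res.x if res.success else None

With d = n^{1/3} log n there are exp(n^{1/3} log^2 n) monomial
features, which is where the running time comes from.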

8
Distributional issues
  • One aspect of PAC learning that makes it very
    difficult is that the adversary gets to pick a
    different probability distribution on the
    examples for every concept.
  • Thus the adversary can pick a distribution which
    puts all the probability weight on the most
    difficult examples.
  • A very commonly studied easier model of learning
    is called Uniform Distribution learning. Here,
    the adversary must always use the uniform
    distribution on {0,1}^n.
  • Under the uniform distribution, DNF can be
    learned in quasipolynomial time.

9
Uniform Distribution learning
  • In 1990, Verbeurgt [Ver90] observed that, under
    the uniform distribution, any term in the target
    DNF which is longer than log(n/ε) is essentially
    always false, and thus irrelevant. This fairly
    easily leads to an algorithm for learning DNF
    under uniform in quasipolynomial time, roughly
    n^{log n}.
  • In 1993, Linial, Mansour, and Nisan [LMN93]
    introduced a powerful and sophisticated method of
    learning under the uniform distribution based on
    Fourier analysis. In particular, their algorithm
    could learn depth-d circuits in time roughly
    n^{log^d n} (the approach is sketched below).
  • Fourier analysis proved very important for
    subsequent learning of DNF under the uniform
    distribution.
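
A minimal sketch of the resulting "low-degree algorithm": estimate
every Fourier coefficient of degree at most d from uniform examples,
then predict with the sign of the truncated expansion. Names are
illustrative, and f is assumed ±1-valued:

    from itertools import combinations

    def chi(S, x):
        """Fourier character (-1)^(sum of x_i over i in S)."""
        return -1 if sum(x[i] for i in S) % 2 else 1

    def low_degree_learn(samples, n, d):
        """samples: list of (x, y) with y = f(x) in {-1,+1}, x uniform.
        Returns a hypothesis h approximating f."""
        coeffs = {}
        for size in range(d + 1):
            for S in combinations(range(n), size):
                # Empirical estimate of f-hat(S) = E[f(x) chi_S(x)].
                coeffs[S] = sum(y * chi(S, x) for x, y in samples) / len(samples)
        def h(x):
            val = sum(c * chi(S, x) for S, c in coeffs.items())
            return 1 if val >= 0 else -1
        return h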

10
Membership queries
  • Still, DNF formulas are not known to be learnable
    in polynomial time even under the uniform
    distribution.
  • Another common way to make learning easier is to
    allow the learner membership queries. By this we
    mean that in addition to getting random examples,
    the learner is allowed to ask for the value of f
    on any input it wants.
  • In some sense this begins to stray away from the
    traditional model of learning, in which the
    learner is passive. However, it's a natural,
    commonly studied, important model.
  • Angluin and Kharitonov [AK91] showed that, under
    cryptographic assumptions, membership queries do
    not help in PAC-learning DNF.

11
Uniform distribution with queries
  • However, membership queries are helpful for
    learning under the uniform distribution.
  • Uniform Distribution learning with membership
    queries is the easiest model we've seen thus far,
    and in it there is finally some progress!
  • Mansour [Man92], building on earlier work of
    Kushilevitz and Mansour [KM93], gave a
    Fourier-based algorithm in this model learning
    DNF in time n^{log log n} (the core coefficient
    search is sketched below).
  • Finally, in 1994 Jackson [Jac94] produced the
    celebrated Harmonic Sieve, a novel algorithm
    combining Fourier analysis and Freund and
    Schapire's boosting to learn DNF in polynomial
    time.
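
A minimal sketch of the Kushilevitz-Mansour-style coefficient search
that powers these algorithms: membership queries let one estimate the
Fourier weight sitting under any prefix of a coefficient's index, so
prefixes with small weight can be discarded recursively. f is assumed
±1-valued; the sampling constant is illustrative:

    import random

    def prefix_weight(f, n, s, num_samples=2000):
        """Estimate the sum of f-hat(S)^2 over all S whose indicator
        string starts with s, via E_x[(E_y f(yx) chi_s(y))^2]."""
        k = len(s)
        total = 0.0
        for _ in range(num_samples):
            x = [random.randint(0, 1) for _ in range(n - k)]
            y = [random.randint(0, 1) for _ in range(k)]
            yp = [random.randint(0, 1) for _ in range(k)]
            sign = (-1) ** (sum(s[i] * y[i] for i in range(k))
                            + sum(s[i] * yp[i] for i in range(k)))
            total += sign * f(y + x) * f(yp + x)  # two membership queries
        return total / num_samples

    def km_search(f, n, theta):
        """Find (indicator strings of) all S with |f-hat(S)| >= theta."""
        found, stack = [], [[]]
        while stack:
            s = stack.pop()
            if prefix_weight(f, n, s) < theta ** 2 / 2:
                continue  # too little Fourier weight under this prefix
            if len(s) == n:
                found.append(s)
            else:
                stack.extend([s + [0], s + [1]])
        return found

Since the total Fourier weight of a ±1-valued function is 1, only
about 2/θ² prefixes can survive at each level, which keeps the search
polynomial in n and 1/θ.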

12
Random Walk model
  • Recently we've been able to improve on this last
    result. [BMOS03] (Bshouty-Mossel-O'Donnell-Servedio)
    considers a natural, passive model of learning of
    difficulty intermediate between Uniform
    Distribution learning and Uniform Distribution
    learning with membership queries.
  • In the Random Walk model, examples are not given
    i.i.d. as usual, but are instead generated by a
    standard random walk on the hypercube (sketched
    below). The learner's hypothesis is evaluated
    under the uniform distribution.
  • It can be shown that DNF formulas are also
    learnable in polynomial time in this model. The
    proof begins with Jackson's algorithm and does
    some extra Fourier analysis.
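
A minimal sketch of the Random Walk example oracle (the target f is
any 0/1-valued function; names are illustrative):

    import random

    def random_walk_examples(f, n, num_steps):
        """Yield (x, f(x)) pairs along a random walk on {0,1}^n."""
        x = [random.randint(0, 1) for _ in range(n)]  # uniform start
        for _ in range(num_steps):
            yield tuple(x), f(x)
            x[random.randrange(n)] ^= 1  # flip one random coordinate
        # (In the lazy variant the chosen coordinate is re-randomized
        # rather than flipped; both walks mix to uniform.)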

13
The current picture
Learning model                                Time                      Source
PAC learning (distributional)                 2^{O(n^{1/3} log^2 n)}    [KS01]
Uniform Distribution                          n^{O(log n)}              [Ver90]
Random Walk                                   poly(n)                   [BMOS03]
Uniform Distribution + Membership queries     poly(n)                   [Jac94]
(models listed from hardest to easiest)
14
Poly time under uniform?
  • Perhaps the biggest open problem in DNF learning
    (in all of learning theory?) is whether DNF can
    be learned in polynomial time under the uniform
    distribution.
  • One might ask: What is the current stumbling
    block? Why can't we seem to do better than
    n^{log n} time?
  • The answer is that we're stuck on the
    junta-learning problem.
  • Definition: A k-junta is a function on n bits
    which happens to depend on only k of the bits.
    (All other n−k coordinates are irrelevant.)

15
Learning juntas
  • Since every boolean function on k bits has a DNF
    of size 2^k, it follows that the set of all
    log(n)-juntas is a subset of the set of all
    polynomial-size DNF formulas.
  • Thus to learn DNF under uniform in polynomial
    time, we must be able to learn log(n)-juntas
    under uniform in polynomial time.
  • The problem of learning k-juntas dates back to
    Blum [B94] and Blum and Langley [BL94]. There is
    an extremely naive algorithm running in time n^k:
    essentially, test all possible sets of k
    variables to see if they are the junta (sketched
    below). However, even getting an n^{(1−Ω(1))k}
    algorithm took some time.
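
A minimal sketch of that naive n^k algorithm (the sample format and
names are illustrative):

    from itertools import combinations

    def naive_junta_learner(samples, n, k):
        """samples: list of (x, label). Returns a set of k variables
        consistent with every sample, or None."""
        for S in combinations(range(n), k):
            table, ok = {}, True
            for x, label in samples:
                key = tuple(x[i] for i in S)
                if table.setdefault(key, label) != label:
                    ok = False  # same restriction to S, two labels
                    break
            if ok:
                return S  # some function of S explains all samples
        return None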

16
Learning juntas
  • [MOS03] (Mossel-O'Donnell-Servedio) gave an
    algorithm for learning k-juntas under the uniform
    distribution which runs in time n^{.704k}. The
    technique involves trading off different
    polynomial representations of boolean functions.
  • This is not much of an improvement for the
    important case of k = log n. However, at least
    it demonstrates that the n^k barrier can be
    broken.
  • It would be a gigantic breakthrough to learn
    k-juntas in time n^{o(k)}, or even to learn
    ω(1)-juntas in polynomial time.

17
An open problem
  • The junta-learning problem is a beautiful and
    simple-to-state problem:
  • An unknown and arbitrary function f : {0,1}^n →
    {0,1} is selected, which depends on only some k
    of the bits.
  • The algorithm gets access to uniformly randomly
    chosen examples ⟨x, f(x)⟩.
  • With probability 1−δ the algorithm should output
    at least one bit (equivalently, all k bits) upon
    which f depends.
  • The algorithm's running time is considered to be
    of the form n^a · poly(n, 2^k, 1/δ), and a is the
    important measure of complexity.
  • Can one get a ≪ k?

18
Cash money
  • Avrim Blum has put up CA$H for anyone who can
    make progress on this problem.
  • $1000: Solve the problem for k = log n or
    k = log log n, in polynomial time.
  • $500: Solve the problem in polynomial time when
    the function is known to be MAJORITY on log n
    bits XORed with PARITY on log n bits (sketched
    below).
  • $200: Find an algorithm running in time n^{.499k}.
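
For concreteness, a minimal sketch of the target function in the $500
challenge (which 2 log n of the n bits are relevant is hidden from
the learner; the index sets here are illustrative):

    def maj_xor_parity(x, maj_bits, par_bits):
        """x: tuple of bits; maj_bits, par_bits: disjoint index sets
        of size log n each."""
        maj = int(2 * sum(x[i] for i in maj_bits) > len(maj_bits))
        par = sum(x[i] for i in par_bits) % 2
        return maj ^ par

Roughly speaking, the parity part shifts all the Fourier weight to
high degree, defeating low-degree algorithms, while the majority part
prevents the function from being a pure parity, defeating Gaussian
elimination.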