1
A Quarter-Century of Efficient Learnability
  • Rocco Servedio
  • Columbia University

Valiant 60th Birthday Symposium Bethesda,
Maryland May 30, 2009
2
1984
and of course...
3
(No Transcript)
4
Probably Approximately Correct learning Valiant84

Valiant84 presents a range of learning models and
oracles.
  • Concept class C of Boolean functions over
    domain X, typically X = {0,1}^n or R^n
  • Unknown target concept f in C to be learned
    from examples
  • Unknown and arbitrary distribution D over X;
    D models the (possibly complex) world

Learner has access to i.i.d. draws from D,
labeled according to f: each example is (x, f(x)),
x in X drawn i.i.d. from D.

5
PAC learning concept class C
  • Learner's goal:
  • come up with a hypothesis h that will
    have high accuracy on future examples.

Efficiently:
  • For any target function f in C,
  • for any distribution D over X,
  • with probability 1 - δ the learner outputs a
    hypothesis h that is ε-accurate w.r.t. D.

Algorithm must be computationally efficient:
should run in time poly(n, 1/ε, 1/δ, size(f)).
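To make the quantifiers concrete, here is the standard formalization of the PAC criterion; this is a sketch restoring the usual symbol names (ε, δ, m) where the slide's originals were lost in transcription:

```latex
% Algorithm A PAC-learns concept class C if: for every f in C, every
% distribution D over X, and all epsilon, delta in (0,1), given m i.i.d.
% examples (x, f(x)) with x ~ D, A outputs a hypothesis h satisfying
\Pr_{x_1,\ldots,x_m \sim D}\Bigl[\, \Pr_{x \sim D}\bigl[ h(x) \neq f(x) \bigr] \le \varepsilon \,\Bigr] \ge 1 - \delta,
% with running time (and hence sample size m) poly(n, 1/epsilon, 1/delta, size(f)).
```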
6
So, what can be learned efficiently?
The PAC model, and its variants, provide a clean
theoretical framework for studying the
computational complexity of learning
problems. From Valiant84:
"The results of learnability theory would then
indicate the maximum granularity of the single
concepts that can be acquired without
programming. This paper attempts to explore
the limits of what is learnable as allowed by
algorithmic complexity. The identification of
these limits is a major goal of the line of work
proposed in this paper."
7
25 years of efficient learnability
(Valiant didn't just ask the question "what can be
learned efficiently?"; he did a great deal
toward answering it. This talk highlights some of
these contributions and how the field has evolved
since then.)
In the rest of the 1980s, Valiant and
colleagues gave remarkable results on the
abilities and limitations of computationally
efficient learning algorithms. This work
introduced research directions and questions that
continue to be intensively studied to this day.
  • Rest of talk: a survey of some
  • positive results (algorithms)
  • negative results (two flavors of hardness
    results)

8
Positive results: learning k-DNF
Theorem Valiant84: k-DNF is learnable in
polynomial time for any k = O(1). For k ≥ 2: view a
k-DNF as a disjunction over
"metavariables" (conjunctions of at most k
literals), and learn the disjunction using
elimination.
25 years later, improving this to superconstant k
is still a major open question! Much has been
learned in trying for this improvement.
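A minimal sketch of the metavariable idea in Python (helper names are mine): the hypothesis starts as the OR of all conjunctions of at most k literals, and each negative example eliminates every term it satisfies. Terms of the target k-DNF are never true on a negative example, so they survive and the positives stay covered:

```python
from itertools import combinations, product

def candidate_terms(n, k):
    """All conjunctions of at most k literals over x_0..x_{n-1}.
    A term is a tuple of (index, sign) pairs: sign 1 means x_i, 0 means NOT x_i."""
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product((0, 1), repeat=size):
                yield tuple(zip(idxs, signs))

def term_satisfied(term, x):
    return all(x[i] == s for i, s in term)

def learn_k_dnf(examples, n, k):
    """Elimination: every negative example deletes the terms it satisfies.
    Returns the surviving terms; the hypothesis is h(x) = OR of them."""
    terms = set(candidate_terms(n, k))
    for x, label in examples:
        if label == 0:  # negative example kills each term it satisfies
            terms = {t for t in terms if not term_satisfied(t, x)}
    return terms
```

The number of metavariables is O((2n)^k), so this runs in polynomial time exactly when k = O(1); that is why going beyond constant k is the open question mentioned above.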
9
Poly-time PAC learning, general distributions
  • Decision lists (greedy alg.) Rivest87
  • Halfspaces (poly-time LP) Littlestone87, BEHW89
  • Parities, integer lattices (Gaussian elim.)
    HelmboldSloanWarmuth92, FischerSimon92
    (see the sketch after this list)
  • Restricted types of branching programs (decision
    lists + parities) ErgunKumarRubinfeld95,
    BshoutyTamonWilson98
  • Geometric concept classes (random projections)
    BshoutyChenHomer94, BGMST98, Vempala99
  • and more






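As promised in the list above, a minimal sketch (my own code, not from the talk, assuming noiseless examples) of learning a parity by Gaussian elimination over GF(2): a consistent hypothesis is any solution s of the linear system X·s = y (mod 2):

```python
import numpy as np

def learn_parity(X, y):
    """Find s with X @ s = y (mod 2), i.e. a parity chi_s consistent with
    the examples. X: (m, n) 0/1 array; y: length-m 0/1 label array."""
    A = np.column_stack([X, y]).astype(np.uint8) % 2  # augmented matrix
    m, n = X.shape
    pivot_cols, row = [], 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]   # swap pivot row into place
        for r in range(m):                  # clear this column elsewhere
            if r != row and A[r, col]:
                A[r] ^= A[row]
        pivot_cols.append(col)
        row += 1
    if any(A[r, -1] for r in range(row, m)):  # a 0 = 1 row: no parity fits
        return None
    s = np.zeros(n, dtype=np.uint8)
    for r, col in enumerate(pivot_cols):      # free variables default to 0
        s[col] = A[r, -1]
    return s  # hypothesis: h(x) = <s, x> mod 2
```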
10
General-distribution PAC learning, cont'd
  • Quasi-poly / sub-exponential-time learning:
  • poly-size decision trees EhrenfeuchtHaussler89,
    Blum92
  • poly-size DNF Bshouty96, TaruiTsukiji99,
    KlivansS01
  • intersections of few poly(n)-weight halfspaces
    KlivansO'DonnellS02
  • PTF method (halfspaces + metavariables): link
    with complexity theory (see the sketch after the
    figure below)

[Figure: a decision tree over variables x1, ..., x5
with ±1 leaves, and an OR-of-ANDs (DNF) formula
over literals from x1, ..., x7]
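The PTF method in miniature: a hedged sketch (my illustration, not the algorithm of KlivansS01, which uses linear programming and proven degree bounds) that treats every monomial of degree at most d as a metavariable and runs the perceptron in the expanded space. If the target has a degree-d polynomial threshold function, a consistent halfspace exists over these features:

```python
from itertools import combinations
import numpy as np

def monomials(x, d):
    """Expand a +/-1 vector into all monomials of degree <= d
    (the metavariables the halfspace is learned over)."""
    feats = [1.0]
    for deg in range(1, d + 1):
        for idxs in combinations(range(len(x)), deg):
            feats.append(float(np.prod([x[i] for i in idxs])))
    return np.array(feats)

def learn_via_ptf(examples, d, epochs=50):
    """Perceptron over the monomial expansion; examples are (x, y)
    with x in {-1,1}^n and y in {-1,+1}."""
    w = np.zeros_like(monomials(examples[0][0], d))
    for _ in range(epochs):
        for x, y in examples:
            phi = monomials(x, d)
            if y * w.dot(phi) <= 0:   # mistake: update toward correct sign
                w += y * phi
    return lambda x: 1 if w.dot(monomials(x, d)) > 0 else -1
```

KlivansS01 show poly-size DNF have PTF degree O(n^{1/3} log n), which is where the sub-exponential 2^{Õ(n^{1/3})} running time comes from.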
11
Distribution-specific learning
  • Theorem KearnsLiValiant87: monotone Boolean
    functions can be weakly learned (accuracy
    1/2 + Ω(1/n)) in poly time under the uniform
    distribution on {0,1}^n
  • Ushered in the study of algorithms for
    uniform-distribution and distribution-specific
    learning: halfspaces Baum90, DNF Verbeurgt90,
    Jackson95, decision trees KushilevitzMansour93,
    AC0 LinialMansourNisan89, FurstJacksonSmith91,
    extended AC0 JacksonKlivansS02, juntas
    MosselO'DonnellS03, general monotone functions
    BshoutyTamon96, BlumBurchLangford98,
    O'DonnellWimmer09, monotone decision trees
    O'DonnellS06, intersections of halfspaces
    BlumKannan94, Vempala97, KwekPitt98,
    KlivansO'DonnellS08, convex sets, and much more
  • Key tool: Fourier analysis of Boolean functions
    (a coefficient-estimation sketch follows below)
  • Recently come full circle on monotone functions:
  • O'DonnellWimmer09: poly time,
    accuracy 1/2 + Ω(log n / √n); optimal! (by
    BlumBurchLangford98)

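As a small illustration of the key tool (my own sketch, using the standard definitions): estimating a single Fourier coefficient f̂(S) = E[f(x)·χ_S(x)] from uniform random examples. Algorithms like LinialMansourNisan89 and KushilevitzMansour93 are built from exactly such estimates:

```python
import numpy as np

def estimate_fourier_coeff(f, S, n, m=100_000, seed=0):
    """Empirical estimate of f_hat(S) = E_{x ~ uniform on {-1,1}^n}[f(x) chi_S(x)],
    where chi_S(x) = prod_{i in S} x_i. Accurate to ~1/sqrt(m) w.h.p."""
    rng = np.random.default_rng(seed)
    X = rng.choice(np.array([-1, 1]), size=(m, n))
    chi_S = X[:, list(S)].prod(axis=1) if S else np.ones(m)
    fx = np.apply_along_axis(f, 1, X)
    return float(np.mean(fx * chi_S))

# e.g. for majority on 3 bits, each singleton coefficient is 1/2:
maj3 = lambda x: 1 if x.sum() > 0 else -1
# estimate_fourier_coeff(maj3, {0}, 3) ~ 0.5
```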
12
Other variants
  • After Valiant84, efficient learning algorithms
    were studied in many settings:
  • Learning in the presence of noise: malicious
    Valiant85, agnostic KearnsSchapireSellie93,
    random misclassification AngluinLaird87, ...
  • Related models: exact learning from queries and
    counterexamples Angluin87, Statistical Query
    learning Kearns93, many others
  • PAC-style analyses of unsupervised learning
    problems: learning discrete distributions
    KMRRSS94, learning mixture distributions
    Dasgupta99, AroraKannan01, many others
  • Evolvability framework: Valiant07, Feldman08, ...
  • Nice algorithmic results in all these settings.

13
Limits of efficient learnability: is proper
learning feasible?
Proper learning: a learning algorithm for class C
must use hypotheses from C.
  • There are efficient proper learning algorithms
    for conjunctions, disjunctions, halfspaces,
    decision lists, parities, k-DNF, k-CNF.
  • What about k-term DNF: can we learn k-term DNF
    using k-term DNF as hypotheses?

14
Proper learning is computationally hard
Theorem PittValiant87: If RP ≠ NP, then
no poly-time algorithm can learn 3-term DNF
using 3-term DNF hypotheses. Given a graph G, the
reduction produces a distribution over
labeled examples such that a high-accuracy
3-term DNF exists iff G is
3-colorable. Note: we can learn 3-term DNF in
poly time using 3-CNF hypotheses! Often a
change of representation can make a difficult
learning task easy.
Distribution over {0,1}^6:
(011111, +) (001111, -) (101111, +)
(010111, -) (110111, +) (011101, -)
[produced by the reduction from a graph G]
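A minimal sketch (my encoding, one bit per vertex) of the example distribution described above:

```python
def pitt_valiant_examples(n_vertices, edges):
    """From a graph, build labeled examples over {0,1}^n_vertices:
    vertex v -> positive example with a single 0 in position v;
    edge (u, v) -> negative example with 0s in positions u and v.
    A 3-term DNF consistent with these exists iff the graph is
    3-colorable (one term per color class)."""
    pos = []
    for v in range(n_vertices):
        x = [1] * n_vertices
        x[v] = 0
        pos.append((tuple(x), +1))
    neg = []
    for u, v in edges:
        x = [1] * n_vertices
        x[u] = x[v] = 0
        neg.append((tuple(x), -1))
    return pos + neg

# e.g. pitt_valiant_examples(6, [(0, 1), (0, 2), (0, 4)]) reproduces
# the slide's examples such as (011111, +) and (001111, -).
```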
15
From 1987...
This work showed computational barriers to
learning with restricted representations in
general, not just proper learning:
Theorem PittValiant87: Learning k-term DNF
using (2k-3)-term DNF hypotheses is
hard. Opened the door to a whole range of hardness
results: class C is hard to learn using
hypotheses from class C'.
16
...to 2009
  • Great progress in recent years using
    sophisticated machinery from hardness of
    approximation.
  • ABFKP04: Hard to learn n-term DNF using
    n^100-size OR-of-halfspace hypotheses.
    Feldman06: Holds even if the learner can make
    membership queries to the target function.
  • KhotSaket08: Hard to (even weakly) learn an
    intersection of 2 halfspaces using 100 halfspaces
    as the hypothesis.
  • If the data is corrupted with 1% noise, then:
  • FeldmanGopalanKhotPonnuswami08: Hard to (even
    weakly) learn an AND using an AND as the
    hypothesis. Same for halfspaces.
  • GopalanKhotSaket07, Viola08: Hard to (even
    weakly) learn a parity even using degree-100
    GF(2) polynomials as hypotheses.
  • Active area with lots of ongoing work.

17
Representation-Independent Hardness
Suppose there are no hypothesis restrictions:
any poly-size circuit is OK. Are there learning
problems that are still hard for computational
reasons?
Yes:
  • Valiant84: Existence of pseudorandom functions
    GoldreichGoldwasserMicali84 implies that
    general Boolean circuits are
    (representation-independently) hard to learn.

18
PKC and hardness of learning
  • Key insight of KearnsValiant89: Public-key
    cryptosystems imply hard-to-learn functions.
  • The adversary can create labeled examples of f
    by herself, so f must not be learnable from
    labeled examples, or else the cryptosystem would
    be insecure!
  • Theorem KearnsValiant89: Simple classes of
    functions (NC1, TC0, poly-size DFAs) are
    inherently hard to learn.

Theorem Regev05, KlivansSherstov06: Really
simple functions (poly-size ORs of halfspaces)
are inherently hard to learn. Closing the gap:
Can these results be extended to show that DNF
are inherently hard to learn? Or are DNF
efficiently learnable?
19
Efficient learnability: Model and Results
  • Valiant:
  • provided an elegant model for the computational
    study of learning
  • followed this up with foundational results on
    what is (and isn't) efficiently learnable
  • These fundamental questions continue to be
    intensively studied and cross-fertilize other
    topics in TCS.

Thank you, Les!