1
Learning, testing, and approximating halfspaces
  • Rocco Servedio
  • Columbia University

DIMACS-RUTCOR Jan 2009
2
Overview
Halfspaces over {-1,1}^n
learning
testing
approximation
3
Joint work with
Kevin Matulef
Ilias Diakonikolas
Ryan O'Donnell
Ronitt Rubinfeld
4
Approximation
  • Given a function f: {-1,1}^n → {-1,1}, the goal is to obtain a simpler
    function g such that Pr[f(x) ≠ g(x)] ≤ ε.

  • Measure distance between functions under the uniform
    distribution (see the code sketch below).
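A minimal Python sketch of this distance measure, assuming the {-1,1}^n domain used throughout the talk; the names dist, maj3, and dictator are illustrative choices, not anything from the slides:

    import itertools

    def dist(f, g, n):
        # Exact Pr_x[f(x) != g(x)] under the uniform distribution over
        # {-1,1}^n, by enumerating all 2^n points (small n only).
        disagreements = sum(
            1 for x in itertools.product([-1, 1], repeat=n) if f(x) != g(x)
        )
        return disagreements / 2 ** n

    # Example: majority of 3 bits vs. the single variable x1.
    maj3 = lambda x: 1 if sum(x) > 0 else -1
    dictator = lambda x: x[0]
    print(dist(maj3, dictator, 3))   # 0.25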

5
Approximating classes of functions
  • Interested in statements of the form:
    Every function in class C has a simple
    approximator.

Example statement:
Every size-s decision tree can
be ε-approximated by a decision tree of depth log(s/ε).
[figure: example decision tree with 0/1 leaf labels]
6
Testing
  • Goal: infer a global property of a function via a few
    local inspections
  • Tester makes black-box queries to an arbitrary
    oracle for f
  • Tester must output
  • yes whp if f is in the class C
  • no whp if f is ε-far from every g in C
  • any answer OK if f is ε-close to some g in C

distance: dist(f, g) = Pr[f(x) ≠ g(x)]
Usual focus is information-theoretic: # of queries
required
7
Some known property testing results
Class of functions over {-1,1}^n (and # of queries needed):
  • parity functions [BLR93]
  • degree-d polynomials [AKK03]
  • literals [PRS02]
  • conjunctions [PRS02]
  • k-juntas [FKRSS04]
  • s-term monotone DNF [PRS02]
  • s-term DNF [DLM07]
  • size-s decision trees [DLM07]
  • s-sparse polynomials [DLM07]
8
We'll get to learning later.
9
Halfspaces
A function f: {-1,1}^n → {-1,1}
is a halfspace if there exist real weights w_1, ..., w_n and a threshold θ
such that f(x) = sign(w_1 x_1 + ... + w_n x_n - θ)
for all x in {-1,1}^n
  • Also called linear threshold functions (LTFs),
    threshold gates, etc.
  • Well studied in complexity theory
  • Fundamental to learning theory
  • Halfspaces are at the heart of many learning
    algorithms: Perceptron, Winnow, boosting,
    Support Vector Machines, ...

10
Some examples of halfspaces
Weights can be all the same ...

but don't have to be ...

(decision list)

(see the code sketch below)
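A small Python sketch of halfspaces like these; the equal-weight and decreasing-weight examples below are illustrative stand-ins for the elided formulas on this slide:

    def halfspace(w, theta=0):
        # x -> sign(w . x - theta) over {-1,1}^n, with sign(0) taken as -1.
        return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta > 0 else -1

    # Weights can all be the same (majority) ...
    maj = halfspace([1, 1, 1, 1, 1])
    # ... but don't have to be: rapidly decreasing weights give a decision list.
    dec_list = halfspace([8, 4, 2, 1])

    print(maj((1, 1, -1, -1, 1)), dec_list((-1, 1, 1, 1)))   # 1 -1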

11
What's a simple halfspace?
  • Every halfspace has a representation with integer
    weights
  • finite domain, so can nudge the weights to rationals,
    then scale to integers
  • so every halfspace is equivalent to one with integer weights

Some halfspaces over {-1,1}^n require very large
integer weights [MTT61, H94]. Low-weight
halfspaces are nice for complexity and learning.
12
Approximating halfspaces using small weights?
Let f be
an arbitrary halfspace. If g is a halfspace
which ε-approximates f, how large do the
weights of g need to be?
Let's warm up with a concrete example.
Consider the following function
(view the inputs as n-bit binary numbers):
this is a halfspace, and
any halfspace computing it exactly requires very large weight ...
but it's easy to ε-approximate it
with small weight.
13
Approximating all halfspaces using small weights?
Let f be
an arbitrary halfspace. If g is a halfspace
which ε-approximates f, how large do the
weights of g need to be?
So there are halfspaces that require large weight but
can be ε-approximated with small weight.
Can every halfspace be approximated by a
small-weight halfspace?
Yes
14
Every halfspace has a low-weight approximator
Theorem [S06]: Let f
be any halfspace. For any ε there is an
ε-approximator f'
with integer weights of magnitude at most ...
How good is this bound?
  • Can't do better in terms of n: may need weights of size ...
  • Dependence on ε must be at least ...
    [H94]

15
Idea behind the approximation
  • Let f(x) = sign(w_1 x_1 + ... + w_n x_n - θ)

WLOG have |w_1| ≥ |w_2| ≥ ... ≥ |w_n|
Key idea: look at how these weights decrease.
  • If the weights decrease rapidly, then f is well
    approximated by a junta
  • If the weights decrease slowly, then w · x is "nice":
    can get a handle on the distribution of w · x

16
A few more details
Let f(x) = sign(w_1 x_1 + ... + w_n x_n - θ), with |w_1| ≥ ... ≥ |w_n|.
How do these weights decrease?
Def: The critical index of w
is the first index k such that w_k
is small relative to the remaining weights w_k, ..., w_n
(see the code sketch below)
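A Python sketch of one concrete way to instantiate this definition; the threshold tau below is an illustrative choice and the exact condition used in the analysis differs in its constants:

    import math

    def critical_index(w, tau=0.1):
        # First index k (0-based) at which |w_k| is small relative to the
        # remaining mass: |w_k| <= tau * sqrt(w_k^2 + ... + w_n^2).
        w = sorted((abs(v) for v in w), reverse=True)
        for k in range(len(w)):
            tail = math.sqrt(sum(v * v for v in w[k:]))
            if w[k] <= tau * tail:
                return k
        return len(w)

    # Four rapidly decreasing weights followed by a long "smooth" tail.
    print(critical_index([16, 8, 4, 2] + [1] * 200))   # 4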
17
Sketch of approximation case 1
The critical index is the first index k
such that w_k is small relative to the remaining weights.
First case: the critical index k is large.
  • The first k weights all decrease rapidly (by a
    factor of ... each)
  • The remaining weight after index k is very small
  • Can show f is ε-close to its truncation onto the first k
    variables, so can approximate just by truncating
  • The truncation has only k relevant variables, so it can be
    expressed with integer weights, each at most ...

18
Why does truncating work?
Let's write H for the kept "head" w_1 x_1 + ... + w_k x_k and T for the dropped "tail".
Have sign(H + T) ≠ sign(H)
only if either
|T| is large
or |H| is small.
Each of the tail weights is small, so |T| large is unlikely by the
Hoeffding bound.
|H| small is unlikely by a more complicated argument (split H up
into blocks; a symmetry argument on each block
bounds the probability by 1/2; use independence across blocks).
(see the Monte Carlo sketch below)
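A Monte Carlo sketch in Python of why truncating rarely changes the sign; the geometric weight decay and the cutoff k are illustrative assumptions, not the parameters from the proof:

    import random

    sign = lambda v: 1 if v > 0 else -1

    # Rapidly decreasing weights: w_i = 1.5^{-i}.  H is the "head" kept by
    # the truncation, T the dropped "tail"; sign(H + T) != sign(H) requires
    # |T| large or |H| small, and both are empirically rare.
    n, k, trials = 30, 10, 100_000
    w = [1.5 ** -i for i in range(n)]
    random.seed(0)
    flips = 0
    for _ in range(trials):
        x = [random.choice([-1, 1]) for _ in range(n)]
        head = sum(wi * xi for wi, xi in zip(w[:k], x[:k]))
        tail = sum(wi * xi for wi, xi in zip(w[k:], x[k:]))
        if sign(head + tail) != sign(head):
            flips += 1
    print(flips / trials)   # a small disagreement probability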
19
Sketch of approximation case 2
The critical index is the first index k
such that w_k is small relative to the remaining weights.
Second case: the critical index k is small.
  • The weights w_k, ..., w_n are "smooth"
  • Intuition: w_k x_k + ... + w_n x_n
    behaves like a Gaussian
  • Can show it's OK to round the weights
    to small integers (at most ...)

20
Why does rounding work?
Let w' denote the rounded weights,
so each |w_i - w'_i| is small.

Have sign(w · x) ≠ sign(w' · x)
only if either
|(w - w') · x| is large
or |w · x| is small.
Each (w_i - w'_i) is small, so the first event is unlikely by the Hoeffding bound.
The second event is unlikely since a Gaussian is anticoncentrated.
(see the Monte Carlo sketch below)
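A Monte Carlo sketch in Python of why rounding works in the smooth case; the weight range and the rounding granularity K are illustrative assumptions:

    import random

    sign = lambda v: 1 if v > 0 else -1

    # "Smooth" weights: no single weight dominates.  Rounding each weight
    # to the nearest multiple of 1/K perturbs w.x by at most n/(2K), and
    # w.x is rarely that close to 0 (Gaussian-like anticoncentration).
    n, K, trials = 50, 200, 100_000
    random.seed(0)
    w = [random.uniform(0.5, 1.5) for _ in range(n)]
    w_rounded = [round(K * wi) / K for wi in w]
    disagree = 0
    for _ in range(trials):
        x = [random.choice([-1, 1]) for _ in range(n)]
        if sign(sum(a * b for a, b in zip(w, x))) != sign(sum(a * b for a, b in zip(w_rounded, x))):
            disagree += 1
    print(disagree / trials)   # a small disagreement probability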
21
Sketch of approximation case 2
The critical index is the first index k
such that w_k is small relative to the remaining weights.
Second case: the critical index k is small.
  • The weights w_k, ..., w_n are "smooth"
  • Intuition: w_k x_k + ... + w_n x_n
    behaves like a Gaussian
  • Can show it's OK to round the weights
    to small integers (at most ...)
  • Also need to deal with the first k
    weights, but there are at most ... many; they cost at most ...

END OF SKETCH
22
Extensions
Let f be any
halfspace. For any ε there is an ε-approximator f'

with integer weights of magnitude at most ...
We saw this bound above.
Recent improvement [DS09]: replace the √n factor
with the total influence I(f) of f.
Here Inf_i(f) = Pr[f(x) ≠ f(x with the i-th bit flipped)],
and I(f) = Σ_i Inf_i(f).
Standard fact: every halfspace has I(f) = O(√n)
(but it can be much less)
23
  • Proof uses structural properties of halfspaces
    from testing and learning.
  • Can be viewed as an (exponential) sharpening of
    Friedgut's theorem:
  • Every Boolean f is ε-close to a function
    on 2^{O(I(f)/ε)} variables.
  • We show:
  • Every halfspace is ε-close to a function
    on poly(I(f), 1/ε) variables.
  • Combines
  • Littlewood-Offord type theorems on
    anticoncentration of weighted sums of ±1 bits
  • delicate linear programming arguments
  • Gives a new proof of the original bound that does not
    use the critical index

approximation
24
So halfspaces have low-weight approximators. What
about testing?
Use the approximation viewpoint: two possibilities
depending on the critical index.
First case: critical index large
  • f is close to a junta: a halfspace over few
    variables
  • Implicitly identify the junta variables (high
    influence)
  • Do Occam-type implicit learning similar to
    [DLMORSW07] (building on [FKRSS02]): check
    every possible halfspace over the junta variables
  • If f is a halfspace, it'll be close to some
    function you check
  • If f is far from every halfspace, it'll be close
    to no function you check

25
So halfspaces have low-weight approximators. What
about testing?
Second case: critical index small
  • every restriction of the high-influence variables
    makes f "regular":
  • all weights and influences are small
  • Low-influence halfspaces have nice Fourier
    properties
  • Can use Fourier analysis to check that each
    restriction is close to a low-influence halfspace
  • Also need to check
  • cross-consistency of different restrictions
    (close to low-influence halfspaces with the same
    weights?)
  • global consistency with a single set of
    high-influence weights

26
A taste of Fourier
  • A helpful Fourier result about low-influence
    halfspaces
  • Theorem [MORS07]: Let f be any Boolean
    function such that
  • all the degree-1 Fourier coefficients of f are
    small
  • the degree-0 Fourier coefficient synchs up with
    the degree-1 coeffs
  • Then f is close to a halfspace

27
A taste of Fourier
  • A helpful Fourier result about low-influence
    halfspaces
  • Theorem [MORS07]: Let f be any Boolean
    function such that
  • all the degree-1 Fourier coefficients of f are
    small
  • the degree-0 Fourier coefficient synchs up with
    the degree-1 coeffs
  • Then f is close to a halfspace; in fact,
    close to the halfspace sign(f̂(1) x_1 + ... + f̂(n) x_n)
  • Useful for the soundness portion of the test

28
Testing halfspaces
  • When all the dust settles:

Theorem [MORS07]:
The class of halfspaces over {-1,1}^n is
testable with poly(1/ε) queries.
testing
approximation
29
What about learning?

[figure: positively and negatively labeled example points separated by a halfspace]

Learning halfspaces from random labeled examples
is easy using poly-time linear programming.
There are other, harder learning models:
  • The RFA model
  • Agnostic learning under uniform distribution

30
The RFA learning model
  • Introduced by [BDD92]: "restricted focus of
    attention"
  • For each labeled example (x, f(x)), the
    learner gets to choose one bit of the
    example that he can see (plus the label,
    of course).
  • Examples are drawn from the uniform distribution over {-1,1}^n
  • Goal is to construct an ε-accurate hypothesis

Question [BDD92, ADJKS98, G01]: Are
halfspaces learnable in the RFA model?
31
The RFA learning model in action
May I have a random example, please?
Sure, which bit would you like to see?
Oh, man... uh, x7.
Here's your example: ...
Thanks, I guess.
Watch your manners!
learner
oracle
32
Very brief Fourier interlude
  • Every f: {-1,1}^n → ℝ
    has a unique Fourier representation
    f(x) = Σ_{S ⊆ [n]} f̂(S) Π_{i in S} x_i

The coefficients f̂(∅), f̂({1}), ..., f̂({n})
are sometimes called the Chow parameters of f
33
Another view of the RFA learning model
RFA model: for each example the learner gets one bit x_i of its choice, plus the label f(x).
  • Every f: {-1,1}^n → ℝ
    has a unique Fourier representation
    f(x) = Σ_{S ⊆ [n]} f̂(S) Π_{i in S} x_i

The coefficients f̂(∅), f̂({1}), ..., f̂({n})
are sometimes called the Chow parameters of f
Not hard to see:
In the RFA model, all the learner can do is
estimate the Chow parameters
  • With O(log(1/δ)/τ²) examples, can estimate any given
    Chow parameter to additive accuracy ±τ (see the sketch below)
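A Python sketch of estimating the Chow parameters from uniform random labeled examples; the sample size and the majority target are illustrative choices:

    import random

    def chow_parameters(f, n, samples=50_000):
        # Estimate the Chow parameters of f: the degree-0 coefficient
        # E[f(x)] and the degree-1 coefficients E[f(x) * x_i], from
        # uniform random examples.
        sums = [0.0] * (n + 1)
        for _ in range(samples):
            x = [random.choice([-1, 1]) for _ in range(n)]
            y = f(x)
            sums[0] += y
            for i in range(n):
                sums[i + 1] += y * x[i]
        return [s / samples for s in sums]

    maj = lambda x: 1 if sum(x) > 0 else -1
    print(chow_parameters(maj, 5))   # approximately [0, .375, .375, .375, .375, .375]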

34
(Approximately) reconstructing halfspaces from
their (approximate) Chow parameters
  • Perfect information about the Chow parameters
    suffices for halfspaces:

Theorem [C61]: If f is a halfspace and g: {-1,1}^n → {-1,1} has
ĝ(S) = f̂(S) for all |S| ≤ 1, then g = f.
  • To solve the 1-RFA learning problem, we need a version
    of Chow's theorem which is both robust and
    effective
  • robust: we only get approximate Chow parameters
    (and can only hope for an approximation to f)
  • effective: we want an actual poly(n) time algorithm!

35
Previous results
[ADJKS98] proved:
Theorem: Let f be a weight-W halfspace.
Let g be
any Boolean function satisfying ... for all |S| ≤ 1.
Then g is an ε-approximator for f.
  • Good for low-weight halfspaces, but W could
    be very large.

[Goldberg01] proved:
Theorem: Let f be any halfspace. Let
g be any function
satisfying ... for all |S| ≤ 1. Then g is
an ε-approximator for f.
  • Better bound for high-weight halfspaces, but
    superpolynomial in n.
Neither of these results is algorithmic.
36
Robust, effective version of Chows theorem
Theorem [OS08]: For any constant ε
and any halfspace f, given accurate enough
approximations of the Chow parameters
of f, the algorithm runs in poly(n) time and w.h.p.
outputs a halfspace that is ε-close to f.
Corollary [OS08]: Halfspaces are learnable to
any constant accuracy in poly(n) time in
the RFA model.
  • Fastest runtime dependence on n of any
    algorithm for learning halfspaces, even in the usual
    random-examples model
  • Previous best runtime: ... time
    for learning to constant accuracy
  • Any algorithm needs ... examples, i.e.
    ... bits of input

37
A tool from testing halfspaces
  • Recall the helpful Fourier result about low-influence
    halfspaces:
  • Theorem: Let f be any Boolean function
    such that
  • all the degree-1 Fourier coefficients of f are
    small
  • the degree-0 Fourier coefficient synchs up with
    the degree-1 coeffs
  • Then f is close to the halfspace sign(f̂(1) x_1 + ... + f̂(n) x_n)

If f itself is a low-influence halfspace, this means
we can plug in the degree-1 Fourier coefficients
as weights and get a good approximator (see the sketch
below). We also need to deal with the high-influence
case; a hassle, but doable.
We know (approximations to) these in the RFA
setting!
polynomial time!
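A Python sketch of this reconstruction idea for a regular (low-influence) halfspace: compute the degree-1 Fourier coefficients exactly by enumeration and plug them back in as weights. The dimension n and the weight distribution are illustrative, and this ignores the high-influence case and the sampling error handled by the actual algorithm:

    import itertools, random

    n = 12
    random.seed(1)
    w = [random.uniform(0.5, 1.5) for _ in range(n)]
    f = lambda x: 1 if sum(a * b for a, b in zip(w, x)) > 0 else -1

    # Exact degree-1 Fourier coefficients f^(i) = E[f(x) * x_i].
    pts = list(itertools.product([-1, 1], repeat=n))
    chow = [sum(f(x) * x[i] for x in pts) / len(pts) for i in range(n)]

    # Use the degree-1 coefficients themselves as the weights of a new halfspace.
    g = lambda x: 1 if sum(c * b for c, b in zip(chow, x)) > 0 else -1
    dist = sum(f(x) != g(x) for x in pts) / len(pts)
    print(dist)   # small: g is a good approximator of f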
38
Recap of whole talk
learning
testing
Halfspaces over {-1,1}^n
approximation
  • Every halfspace can be approximated to any
    constant accuracy with small integer weights.
  • Halfspaces can be tested with poly(1/ε)
    queries.
  • Halfspaces can be efficiently learned from
    (approximations of) their degree-0 and degree-1
    Fourier coefficients.

39
Future directions
  • Better quantitative results (dependence on ε?)
  • Testing
  • Approximating
  • Learning (from Chow parameters)
  • What about approximating, testing, and learning
    w.r.t. other distributions?
  • Rich theory of distribution-independent PAC
    learning
  • Less fully developed theory of distribution-independent
    testing [HK03, HK04, HK05, AC06]
  • Things are harder; what is doable?
  • [GS07]: Any distribution-independent algorithm for
    testing whether f is a halfspace requires ...
    queries.

40
Thank you for your attention
41
II. Learning a concept class
PAC learning a concept class C under the
uniform distribution
  • Setup: the learner is given a sample of labeled
    examples (x, f(x))
  • The target function f ∈ C is unknown to the
    learner
  • Each example x in the sample is independent and
    uniform over {-1,1}^n

Goal: for every ε and δ, with probability at least 1 − δ the
learner should output a hypothesis h
such that Pr[h(x) ≠ f(x)] ≤ ε