Harmonic Analysis in Learning Theory (presentation transcript)

1
Harmonic Analysis in Learning Theory
  • Jeff Jackson
  • Duquesne University

2
Themes
  • Harmonic analysis is central to learning-theoretic
    results in a wide variety of models
  • Results are generally the strongest known for learning
    with respect to the uniform distribution
  • Work on learning problems has led to some new
    harmonic results
  • Spectral properties of Boolean function classes
  • Algorithms for approximating Boolean functions

3
Uniform Learning Model
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
Target function f: {0,1}^n → {0,1}
Uniform random examples <x, f(x)>
Example oracle EX(f)
Learning algorithm A
Accuracy ε > 0
4
Circuit Classes
  • Constant-depth AND/OR circuits (AC0 without the
    polynomial-size restriction; call this class CDC)
  • DNF: depth-2 circuit with an OR at the root

[Figure: a depth-d circuit of alternating OR (∨) and AND (∧) gates over inputs v1, v2, v3, ..., vn; negations allowed]
5
Decision Trees
[Figure: a decision tree with internal nodes labeled v3, v2, v1, v4 and leaves labeled 0 and 1]
6
Decision Trees
[Figure: evaluating the tree on x = 11001; the root queries v3 and x3 = 0]
7
Decision Trees
[Figure: evaluation continues along the path; the next node queries v1 and x1 = 1]
8
Decision Trees
[Figure: the path ends at a leaf labeled 1, so f(x) = 1 for x = 11001]
9
Function Size
  • Each function representation has a natural size
    measure
  • CDC, DNF: number of gates
  • DT: number of leaves
  • Size s_F(f) of f with respect to class F is the size
    of the smallest representation of f within F
  • For all Boolean f, s_CDC(f) ≤ s_DNF(f) ≤ s_DT(f)

10
Efficient Uniform Learning Model
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
Time poly(n, s_F, 1/ε)
Target function f: {0,1}^n → {0,1}
Uniform random examples <x, f(x)>
Example oracle EX(f)
Learning algorithm A
Accuracy ε > 0
11
Harmonic-Based Uniform Learning
  • LMN: constant-depth circuits are quasi-efficiently
    (n^polylog(s/ε)-time) uniform learnable
  • BT: monotone Boolean functions are uniform
    learnable in time roughly 2^(√n · log n)
  • Monotone: for all x and i, f(x|x_i←0) ≤ f(x|x_i←1)
  • Also exponential in 1/ε (so assumes ε constant)
  • But independent of any size measure

12
Notation
  • Assume f: {0,1}^n → {-1,1}
  • For all a in {0,1}^n, χ_a(x) = (-1)^(a·x)
  • For all a in {0,1}^n, the Fourier coefficient f̂(a) of
    f at a is f̂(a) = E_{x~U}[f(x)·χ_a(x)]  (sketch below)
  • Sometimes write, e.g., f̂(1) for f̂(100)
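The definitions above can be checked directly for small n. Below is a minimal sketch (not from the talk) that computes χ_a and the coefficient f̂(a) = E_{x~U}[f(x)·χ_a(x)] by brute-force enumeration; the example function maj3 is an illustrative choice.

```python
from itertools import product

def chi(a, x):
    """Parity character chi_a(x) = (-1)^(a . x) for bit vectors a, x."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def fourier_coefficient(f, a, n):
    """Exact f^(a) = E_{x~U}[f(x) * chi_a(x)], by enumerating all 2^n inputs
    (feasible only for small n; the learning algorithms below sample instead)."""
    return sum(f(x) * chi(a, x) for x in product((0, 1), repeat=n)) / 2 ** n

# Illustrative example (not from the slides): 3-bit majority, mapped to {-1,+1}.
maj3 = lambda x: 1 if sum(x) >= 2 else -1
print(fourier_coefficient(maj3, (1, 0, 0), 3))  # -0.5: first-order coefficients
                                                # of a monotone f are negative here
```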




13
Fourier Properties of Classes
  • LMN: f is a constant-depth circuit of depth d and
    S = {a : |a| < log^d(s/ε)}  (|a| = number of 1s in a)
  • BT: f is a monotone Boolean function and
    S = {a : |a| < √n / ε}

14
Spectral Properties
[Figure: for both classes above, the Fourier weight outside S is small, i.e., Σ_{a∉S} f̂²(a) < ε]
15
Proof Techniques
  • LMN: Håstad's Switching Lemma + harmonic analysis
  • BT: based on KKL
  • Define AS(f) = n · Pr_{x,i}[f(x|x_i←0) ≠ f(x|x_i←1)]
    (estimator sketched below)
  • If S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε
  • For monotone f, harmonic analysis + Cauchy-Schwarz
    shows AS(f) ≤ √n
  • Note: this is tight for MAJ
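As a concrete illustration of AS(f) (an addition, not from the talk), the probability in the definition can be estimated by sampling; the Monte-Carlo sketch below assumes f is given as a callable on lists of bits.

```python
import random

def average_sensitivity(f, n, samples=100_000, rng=random):
    """Monte-Carlo estimate of AS(f) = n * Pr_{x,i}[f(x|x_i<-0) != f(x|x_i<-1)],
    with x uniform over {0,1}^n and i uniform over the n coordinates."""
    flips = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        i = rng.randrange(n)
        x0, x1 = list(x), list(x)
        x0[i], x1[i] = 0, 1
        if f(x0) != f(x1):
            flips += 1
    return n * flips / samples

# MAJ is the tight case noted above: AS(MAJ) = Theta(sqrt(n)).
maj = lambda x: 1 if 2 * sum(x) >= len(x) else -1
print(average_sensitivity(maj, 101))  # roughly 8 for n = 101
```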


16
Function Approximation
  • For all Boolean f, f(x) = Σ_a f̂(a) χ_a(x)
  • For S ⊆ {0,1}^n, define f_S(x) = Σ_{a∈S} f̂(a) χ_a(x)
  • LMN: Pr_x[f(x) ≠ sign(f_S(x))] ≤ E[(f - f_S)²] = Σ_{a∉S} f̂²(a)

17
The Fourier Learning Algorithm
  • Given ε (and perhaps s, d)
  • Determine k such that for S = {a : |a| < k},
    Σ_{a∉S} f̂²(a) < ε
  • Draw a sufficiently large sample of examples
    <x, f(x)> to closely estimate f̂(a) for all a ∈ S
  • Chernoff bounds: sample size ~ n^k/ε is sufficient
  • Output h = sign(Σ_{a∈S} f̂(a) χ_a)  (sketch below)
  • Run time ~ n^(2k)/ε
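A sketch of this low-degree (LMN-style) algorithm, assuming labels in {-1,+1} and examples given as (x, f(x)) pairs; the function name low_degree_learn is illustrative, not from the talk.

```python
from itertools import combinations

def chi(a, x):
    """chi_a(x) = (-1)^(a . x)."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def low_degree_learn(examples, n, k):
    """Estimate every Fourier coefficient of degree < k from uniform examples
    <x, f(x)> and return h = sign of the truncated Fourier expansion."""
    m = len(examples)
    coeffs = {}
    for degree in range(k):
        for subset in combinations(range(n), degree):
            a = tuple(1 if i in subset else 0 for i in range(n))
            coeffs[a] = sum(y * chi(a, x) for x, y in examples) / m  # ~ f^(a)
    def h(x):
        value = sum(c * chi(a, x) for a, c in coeffs.items())
        return 1 if value >= 0 else -1
    return h
```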




18
Halfspaces
  • KOS: halfspaces are efficiently uniform
    learnable (given ε is constant)
  • Halfspace: ∃ w ∈ R^(n+1) s.t. f(x) = sign(w · (x∘1))
  • If S = {a : |a| < (21/ε)²} then Σ_{a∉S} f̂²(a) < ε
  • Apply the LMN algorithm
  • A similar result applies for an arbitrary function
    applied to a constant number of halfspaces
  • Intersection of halfspaces is a key learning problem


19
Halfspace Techniques
  • O (cf. BKS, BJTa)
  • Noise sensitivity of f at γ is the probability that
    corrupting each bit of x with probability γ
    changes f(x)  (estimator sketched below)
  • NS_γ(f) = ½(1 - Σ_a (1-2γ)^|a| f̂²(a))
  • KOS:
  • If S = {a : |a| < 1/γ} then Σ_{a∉S} f̂²(a) < 3·NS_γ(f)
  • If f is a halfspace then NS_ε < 9√ε
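Noise sensitivity also has a direct sampling definition; the sketch below (an illustration, not from the talk) estimates NS_γ(f) by flipping each bit independently with probability γ.

```python
import random

def noise_sensitivity(f, n, gamma, samples=100_000, rng=random):
    """Monte-Carlo estimate of NS_gamma(f) = Pr[f(x) != f(y)], where x is
    uniform and y is x with each bit flipped independently with prob. gamma."""
    changed = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [b ^ 1 if rng.random() < gamma else b for b in x]
        if f(x) != f(y):
            changed += 1
    return changed / samples

# Illustrative halfspace; the KOS-style bound says NS_gamma is O(sqrt(gamma)).
halfspace = lambda x: 1 if 3*x[0] + 2*x[1] - x[2] + x[3] - 2 >= 0 else -1
print(noise_sensitivity(halfspace, 4, 0.05))
```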



20
Monotone DT
  • OS: monotone functions are efficiently uniform
    learnable given
  • ε is constant
  • s_DT(f) is used as the size measure
  • Techniques:
  • Harmonic analysis: for monotone f, AS(f) ≤ √(log s_DT(f))
  • BT: if S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε
  • Friedgut: ∃ a set T of ≤ 2^(AS(f)/ε) variables s.t.
    Σ_{A⊄T} f̂²(A) < ε



21
Weak Approximators
  • KKL also show that if f is monotone, there is an
    i such that -f̂(i) ≥ log²(n)/n
  • Therefore Pr[f(x) = -χ_i(x)] ≥ ½ + log²(n)/(2n)
  • In general, an h s.t. Pr[f = h] ≥ ½ + 1/poly(n,s) is
    called a weak approximator to f
  • If A outputs a weak approximator for every f in
    F, then F is weakly learnable


22
Uniform Learning Model
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
Target function f: {0,1}^n → {0,1}
Uniform random examples <x, f(x)>
Example oracle EX(f)
Learning algorithm A
Accuracy ε > 0
23
Weak Uniform Learning Model
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ½ - 1/p(n,s)
Target function f: {0,1}^n → {0,1}
Uniform random examples <x, f(x)>
Example oracle EX(f)
Learning algorithm A
24
Efficient Weak Learning Algorithm for Monotone
Boolean Functions
  • Draw a set of n² examples <x, f(x)>
  • For i = 1 to n:
  • Estimate f̂(i)
  • Output h = -χ_i for the i maximizing -f̂(i)  (sketch below)
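A sketch of this weak learner, assuming labels in {-1,+1}; its advantage comes from the KKL fact on slide 21 that some -f̂(i) is at least log²(n)/n for monotone f.

```python
def weak_learn_monotone(examples, n):
    """Estimate each first-order coefficient f^(i) = E[f(x) * (-1)^(x_i)] from
    uniform examples <x, f(x)> and output the single negated character -chi_i
    that correlates best with f (for monotone f the f^(i) are non-positive)."""
    m = len(examples)
    est = [sum(y * (-1) ** x[i] for x, y in examples) / m for i in range(n)]
    best = min(range(n), key=lambda i: est[i])      # most negative f^(i)
    return lambda x: -((-1) ** x[best])             # h(x) = -chi_best(x)
```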



25
Weak Approximation for MAJ of Constant-Depth
Circuits
  • Note that adding a single MAJ gate to a CDC destroys
    the LMN spectral property
  • JKS: MAJ of CDCs is quasi-efficiently
    quasi-weakly uniform learnable
  • If f is a MAJ of CDCs of depth d, and if the
    number of gates in f is s, then there is a set
    A ⊆ {1,...,n} such that
  • |A| < log^d(s) ≡ k
  • Pr[f(x) = χ_A(x)] ≥ ½ + 1/(4s·n^k)

26
Weak Learning Algorithm
  • Compute k = log^d(s)
  • Draw ~s·n^k examples <x, f(x)>
  • Repeat for each A with |A| < k:
  • Estimate f̂(A)
  • Until an A is found s.t. |f̂(A)| > 1/(2s·n^k)
  • Output h = χ_A  (sketch below)
  • Run time ~ n^polylog(s)
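A sketch of the search loop, assuming labels in {-1,+1}; the threshold parameter corresponds to 1/(2s·n^k) on the slide, and the function name is illustrative.

```python
from itertools import combinations

def weak_parity_search(examples, n, k, threshold):
    """Scan all parities chi_A with |A| < k and return one whose empirical
    correlation with f exceeds the threshold; by the JKS property above, such
    an A exists when f is a MAJ of CDCs and enough examples are drawn."""
    m = len(examples)
    for degree in range(k):
        for A in combinations(range(n), degree):
            corr = sum(y * (-1) ** sum(x[i] for i in A) for x, y in examples) / m
            if abs(corr) > threshold:
                return A, corr      # hypothesis: chi_A (negated if corr < 0)
    return None
```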



27
Weak ApproximatorProof Techniques
  • Discriminator Lemma (HMPST)
  • Implies one of the CDCs is a weak approximator
    to f
  • LMN spectral characterization of CDC
  • Harmonic analysis
  • Beigel result used to extend weak learning to CDC
    with polylog MAJ gates

28
Boosting
  • In many (not all) cases, uniform weak learning
    algorithms can be converted to uniform (strong)
    learning algorithms using a boosting technique
    (S, F, …)
  • Need to learn weakly with respect to near-uniform
    distributions
  • For a near-uniform distribution D, find a weak h_j
    s.t. Pr_{x~D}[h_j = f] > ½ + 1/poly(n,s)
  • Final h is typically a MAJ of weak approximators

29
Strong Learning for MAJ of Constant-Depth
Circuits
  • JKS: MAJ of CDC is quasi-efficiently uniform
    learnable
  • Show that for near-uniform distributions, some
    parity function is a weak approximator
  • Beigel result again extends this to CDC with polylog
    MAJ gates
  • KP: for boosting in general, there are distributions
    for which no parity is a weak approximator

30
Uniform Learning from a Membership Oracle
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
Target function f: {0,1}^n → {0,1}
Membership oracle MEM(f)
Learning algorithm A
Query: x
Response: f(x)
Accuracy ε > 0
31
Uniform Membership Learning of Decision Trees
  • KM:
  • L₁(f) = Σ_a |f̂(a)| ≤ s_DT(f)
  • If S = {a : |f̂(a)| ≥ ε/L₁(f)} then Σ_{a∉S} f̂²(a) < ε
  • GL: algorithm (membership oracle) for finding the a
    with |f̂(a)| ≥ θ in time ~n/θ⁶  (sketch below)
  • So DT is efficiently uniform membership learnable
  • Output h has the same form as LMN: h = sign(Σ_{a∈S} f̂(a) χ_a)
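Below is a rough sketch in the spirit of the KM / GL coefficient search (an illustration under simplifying assumptions, not the exact algorithm from the talk): prefixes of candidate indices are kept only if the Fourier weight of their extensions, estimated via membership queries, is large; Parseval guarantees that only ~1/θ² prefixes can survive each level.

```python
import random

def chi(a, y):
    return -1 if sum(ai & yi for ai, yi in zip(a, y)) % 2 else 1

def bucket_weight(mem, n, prefix, samples=2000, rng=random):
    """Estimate W(prefix) = sum over suffixes b of f^(prefix b)^2 via the
    identity W(prefix) = E_x[(E_y[f(y x) chi_prefix(y)])^2]; the inner square
    is estimated with two independent copies of y.  mem is a membership oracle."""
    j = len(prefix)
    total = 0.0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n - j)]
        y1 = [rng.randint(0, 1) for _ in range(j)]
        y2 = [rng.randint(0, 1) for _ in range(j)]
        total += mem(y1 + x) * chi(prefix, y1) * mem(y2 + x) * chi(prefix, y2)
    return total / samples

def large_coefficients(mem, n, theta):
    """Return candidate indices a that (likely) satisfy |f^(a)| >= theta."""
    live = [[]]
    for _ in range(n):
        live = [p + [b] for p in live for b in (0, 1)
                if bucket_weight(mem, n, p + [b]) >= theta ** 2 / 2]  # slack for noise
    return [tuple(p) for p in live]
```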








32
Uniform Membership Learning of DNF
  • J:
  • ∀ distributions D, ∃ χ_a s.t. Pr_{x~D}[f(x) = χ_a(x)] ≥
    ½ + 1/(6·s_DNF)
  • Modified GL can efficiently locate such a χ_a
    given an oracle for near-uniform D
  • Boosters can provide such an oracle when uniform
    learning
  • Boosting then provides strong learning
  • BJTb (see also KS):
  • Modified Levin algorithm finds χ_a in time ~n·s²

33
Uniform Learning from a Classification Noise
Oracle
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
Target function f: {0,1}^n → {0,1}
Classification noise oracle EX_η(f)
Learning algorithm A
Uniform random x
Pr[<x, f(x)>] = 1-η,  Pr[<x, -f(x)>] = η
Accuracy ε > 0
Error rate η > 0
34
Uniform Learning from a Statistical Query Oracle
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
Target function f: {0,1}^n → {0,1}
Statistical query oracle SQ(f)
Learning algorithm A
Query: (q(·,·), τ)
Response: E_U[q(x, f(x))] ± τ
Accuracy ε > 0
35
SQ and Classification Noise Learning
  • K:
  • If F is uniform SQ learnable in time poly(n,
    s_F, 1/ε, 1/τ) then F is uniform CN learnable in
    time poly(n, s_F, 1/ε, 1/τ, 1/(1-2η))
  • Empirically, it is almost always true that if F is
    efficiently uniform learnable then F is
    efficiently uniform SQ learnable (i.e., 1/τ is poly
    in the other parameters)
  • Exception: F = PAR_n = {χ_a : a ∈ {0,1}^n, |a| ≤ n}

36
Uniform SQ Hardness for PAR
  • BFJKMR:
  • Harmonic analysis shows that for any q and χ_a,
    E_U[q(x, χ_a(x))] = q̂(0^(n+1)) + q̂(a∘1)
    (derivation sketched below)
  • Thus the adversarial SQ response to (q, τ) is q̂(0^(n+1))
    whenever |q̂(a∘1)| < τ
  • Parseval: |q̂(b∘1)| < τ for all but 1/τ² Fourier
    coefficients
  • So a bad query eliminates only polynomially many
    candidate coefficients
  • Even PAR_{log n} is not efficiently SQ learnable
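A sketch of the calculation behind the identity above, under the convention (assumed here) that the query q takes the {0,1}-valued label bit ℓ(x) = (1 - χ_a(x))/2 as an (n+1)-st input and is expanded in the Fourier basis over {0,1}^(n+1):

```latex
\begin{align*}
\mathbf{E}_{x \sim U}\bigl[q(x,\ell(x))\bigr]
  &= \sum_{b \in \{0,1\}^n} \sum_{c \in \{0,1\}}
     \hat{q}(b \circ c)\,\mathbf{E}_x\bigl[\chi_b(x)\,(-1)^{c\,\ell(x)}\bigr] \\
  &= \sum_{b} \hat{q}(b \circ 0)\,\mathbf{E}_x[\chi_b(x)]
   + \sum_{b} \hat{q}(b \circ 1)\,\mathbf{E}_x[\chi_b(x)\,\chi_a(x)] \\
  &= \hat{q}(0^{n+1}) + \hat{q}(a \circ 1),
\end{align*}
% since (-1)^{\ell(x)} = \chi_a(x), E_x[\chi_b(x)] vanishes unless b = 0^n,
% and E_x[\chi_b \chi_a] vanishes unless b = a.
```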






37
Uniform Learning from an Attribute Noise Oracle
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
Target function f: {0,1}^n → {0,1}
Attribute noise oracle EX_{D_N}(f)
Learning algorithm A
Uniform random x
<x⊕r, f(x)>, r ~ D_N
Accuracy ε > 0
Noise model D_N
38
Uniform Learning with Independent Attribute Noise
  • BJTa:
  • The LMN algorithm produces estimates of
    f̂(a)·E_{r~D_N}[χ_a(r)]
  • Example application:
  • Assume the noise process D_N is a product
    distribution
  • D_N(x) = Π_i (p_i·x_i + (1-p_i)·(1-x_i))
  • Assume p_i < 1/polylog(n) and 1/ε at most
    quasi-poly(n) (mild restrictions)
  • Then modified LMN uniform learns attribute-noisy
    AC0 in quasi-poly time  (correction sketched below)
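A sketch of the resulting correction (an illustration under the product-noise assumption above, not the exact procedure from the talk): the naive coefficient estimate from noisy examples converges to f̂(a)·Π_{i: a_i=1}(1-2p_i), so dividing by that attenuation factor recovers f̂(a) as long as the factor is not too small.

```python
def corrected_coefficient(noisy_examples, a, p):
    """Estimate f^(a) from attribute-noisy examples <x xor r, f(x)>, where bit i
    of r is 1 independently with probability p[i].  The naive empirical estimate
    has expectation f^(a) * prod_{i: a_i = 1} (1 - 2 p[i]), so divide it out."""
    m = len(noisy_examples)
    naive = sum(y * (-1) ** sum(zi for zi, ai in zip(z, a) if ai)
                for z, y in noisy_examples) / m
    attenuation = 1.0
    for ai, pi in zip(a, p):
        if ai:
            attenuation *= 1 - 2 * pi
    return naive / attenuation
```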


39
Agnostic Learning Model
Arbitrary Boolean function (no class restriction)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] is minimized
Target function f: {0,1}^n → {0,1}
Uniform random examples <x, f(x)>
Example oracle EX(f)
Learning algorithm A
40
Near-Agnostic Learning via LMN
  • KKM:
  • Let f be an arbitrary Boolean function
  • Fix any set S of Fourier indices and fix ε
  • Let g be any function s.t.
  • Σ_{a∉S} ĝ²(a) < ε and
  • Pr[f ≠ g] is minimized (call this value Δ)
  • Then for the h learned by LMN by estimating the
    coefficients of f over S:
  • Pr[f ≠ h] < 4Δ + ε


41
Average Case Uniform Learning Model
Boolean function class F (e.g., DNF)
Hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
D-random target function f: {0,1}^n → {0,1}
Uniform random examples <x, f(x)>
Example oracle EX(f)
Learning algorithm A
Accuracy ε > 0
42
Average Case Learning of DT
  • JSa:
  • D: uniform over complete, non-redundant, log-depth
    DTs
  • DT is efficiently uniform learnable on average
  • Output is a DT (proper learning)

43
Average Case Learning of DT
  • Technique:
  • KM: all Fourier coefficients of a DT with min
    depth d are rational with denominator 2^d
  • In an average-case tree, the coefficient f̂(i) of at
    least one variable v_i has an odd numerator
  • So log₂(denominator) is the min depth of the tree
  • Try all variables at the root and find the depth of the
    child trees, choosing the root with the shallowest children
  • Recurse on the child trees to choose their roots


44
Average Case Learning of DNF
  • JSb:
  • D: s terms, each term drawn uniformly from terms of
    length log s
  • Monotone DNF with < n² terms and DNF with < n^1.5
    terms are properly and efficiently uniform learnable
    on average
  • Harmonic property:
  • In an average-case DNF, the sign of f̂({i,j}) (usually)
    indicates whether v_i and v_j appear in a common term
    or not


45
Summary
  • Most uniform-learning results depend on harmonic
    analysis
  • Learning theory provides motivation for new
    harmonic observations
  • Even very weak harmonic results can be useful
    in learning-theory algorithms

46
Some Open Problems
  • Efficient uniform learning of monotone DNF
  • Best to date for small s_DNF is S, time ~n·s^(log s)
    (based on BT, M, LMN)
  • Non-uniform learning
  • Relatively easy to extend many results to product
    distributions, e.g., FJS extends LMN
  • Key issue in real-world applicability

47
Open Problems (cont'd)
  • Weaker dependence on ε
  • Several algorithms are fully exponential (or worse)
    in 1/ε
  • Additional proper learning results
  • Proper learning allows for interpretation of the learned hypothesis