Rigorous Learning Curve Bounds from Statistical Mechanics

Transcript and Presenter's Notes
1
Rigorous Learning Curve Bounds from Statistical
Mechanics
  • D. Haussler, M. Kearns,
  • H. S. Seung, N. Tishby

Presentation: Talya Meltzer
2
Motivation
  • According to the VC-theory, minimizing the
    empirical error within a function class F on a
    random sample will lead to generalization error
    bounds
  • Realizable case
  • Unrealizable case
  • The VC-bounds are the best distribution-independent
    upper bounds

3
Motivation
  • Yet, these bounds are vacuous for m < d, i.e. when
    the sample size is smaller than the VC dimension d
  • And fail to capture the true behavior of
    particular learning curves
  • Experimental learning curves fit a variety of
    functional forms, including exponentials
  • Learning curves analyzed using statistical mechanics
    methods exhibit phase transitions (sudden
    drops in the generalization error)

4
Main Ideas
  • Decompose the hypothesis class into error
    shells
  • Attribute to each hypothesis its correct
    generalization error, while taking the specific
    distribution into account
  • Use the thermodynamic limit method
  • Identify the correct scale at which to analyze a
    learning curve
  • Express the learning curve as a competition
    between an entropy function and an energy function

5
Overview: The PAC Learning Model
  • The hypothesis class
  • Input
  • Assumptions
  • The examples in the training set S are sampled
    i.i.d according to a distribution D over X
  • D is unknown
  • D is fixed throughout the learning process
  • There exists a target function f : X → Y, i.e.
    y_i = f(x_i)
  • Goal: find the target function

6
Overview: The PAC Learning Model
  • Training (empirical) error
  • Generalization error
  • The class F is PAC-learnable if there exists a
    learning algorithm which, given ε and δ, returns h ∈ F
    such that
  • The training error is minimal

7
The Finite Realizable Case
  • The version space
  • The ε-ball
  • If B(ε) includes VS(S), then any function in the
    version space has generalization error at most ε
    (see the toy sketch below)
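The formulas behind these definitions (training error, generalization error, version space, ε-ball) were images on the slides; the snippet below is a minimal sketch of them on a made-up finite threshold class. The class, distribution and target are illustrative assumptions, not taken from the paper.

  import random

  # Toy setup (hypothetical): X = {0,...,7}, uniform D, threshold hypotheses.
  X = list(range(8))
  D = [1.0 / len(X)] * len(X)                          # uniform distribution over X
  F = [lambda x, t=t: int(x >= t) for t in range(9)]   # threshold functions h_t(x) = 1[x >= t]
  target = F[5]                                        # target f = h_5

  def gen_error(h):
      # ε_gen(h) = Pr_{x ~ D}[h(x) != f(x)]
      return sum(p for x, p in zip(X, D) if h(x) != target(x))

  def train_error(h, S):
      # ε_train(h) = fraction of sample points that h labels incorrectly
      return sum(h(x) != y for x, y in S) / len(S)

  def version_space(S):
      # VS(S) = all hypotheses consistent with the labelled sample S
      return [h for h in F if train_error(h, S) == 0]

  def eps_ball(eps):
      # B(ε) = hypotheses with generalization error at most ε
      return [h for h in F if gen_error(h) <= eps]

  random.seed(0)
  S = [(x, target(x)) for x in random.choices(X, weights=D, k=12)]
  VS, eps = version_space(S), 0.25
  # If VS(S) is contained in B(ε), every consistent hypothesis has error <= ε.
  print(all(h in eps_ball(eps) for h in VS))
  print([round(gen_error(h), 3) for h in VS])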

8
The Finite Realizable Case
9
Decomposition into error shells
In a finite class, there is a finite number of
possible error values: 0 ≤ ε_1 < ε_2 < … < ε_r ≤ 1,
with r ≤ |F| < ∞
10
Decomposition into error shells
So, we can replace the union bound with the exact
sum over the error shells
Now, with probability at least 1-δ, any h
consistent with the sample obeys the resulting shell
bound (sketched below)
To understand the behavior of this bound, we will
use the thermodynamic limit method
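A minimal numerical sketch of that shell bound, reusing the hypothetical threshold class from the earlier snippet: the probability that some hypothesis with error above ε stays consistent with m examples is at most the tail sum over shells with ε_j > ε of |F_j|·(1-ε_j)^m, and the reported bound is the smallest ε at which this tail drops below δ.

  from collections import Counter

  def shell_bound_eps(errors, m, delta):
      # smallest ε with  sum over shells ε_j > ε of |F_j| * (1 - ε_j)^m  <=  δ
      shells = sorted(Counter(errors).items())           # [(ε_j, |F_j|), ...]
      for eps in sorted({0.0, *errors}):
          tail = sum(size * (1.0 - e) ** m for e, size in shells if e > eps)
          if tail <= delta:
              return eps
      return 1.0

  # error values of the 9 threshold hypotheses w.r.t. the target h_5 above
  errors = [abs(t - 5) / 8 for t in range(9)]
  for m in (5, 20, 80):
      print(m, shell_bound_eps(errors, m, delta=0.05))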
11
The Thermodynamic Limit
  • We consider an infinite sequence of classes of
    functions F_1, F_2, …, F_N, …
  • F_N ⊆ { f : X_N → {0,1} },  N = log₂(|F_N|)
  • We are often interested in a parametric class of
    functions
  • The number of functions in the class at any given
    error value may have a limiting asymptotic
    behavior, as the number of parameters grows

12
The Thermodynamic Limit
  • Rewrite the expression
  • Introduce the scaling function t(N): when chosen
    properly, it captures the scale at which the
    learning curve is most interesting
  • Find a permissible entropy bound that tightly
    captures the behavior of the error-shell sizes

The entropy of the j-th error shell enters as a POSITIVE term;
the (minus) energy of the j-th error shell enters as a
NEGATIVE term (see the reconstructed expression below)
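The rewritten expression itself did not survive the transcript; presumably, with α = m/t(N), each error shell's contribution to the bound of slide 10 is rewritten as

  |F_j| · (1-ε_j)^m = exp{ t(N) · [ (1/t(N))·ln|F_j| + α·ln(1-ε_j) ] },

where the first bracketed term is the (positive) entropy of the j-th shell and the second is the (negative) energy term, so the bound becomes a sum of such exponentials over the shells with ε_j above the target error.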
13
The Thermodynamic Limit
  • Formal definitions
  • t(N): a mapping from the natural numbers to the
    natural numbers, such that
  • s(ε): a continuous function
  • s(ε) is called a permissible entropy bound if
    there exists a natural number N_0 such that for
    all N ≥ N_0 and for all 1 ≤ j ≤ r(N)
    (reconstructed conditions below)
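The right-hand sides of those two definitions were images on the slide; presumably, following the paper's construction, they are

  t(N) → ∞ as N → ∞,    and    (1/t(N)) · ln|F_j^N| ≤ s(ε_j^N)   for all N ≥ N_0 and 1 ≤ j ≤ r(N),

i.e. s upper-bounds the per-t(N) log-size of every error shell, uniformly for all sufficiently large N.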

14
The Thermodynamic Limit
α = m/t(N) remains constant as m, N → ∞; α controls the
competition between the entropy and the energy
15
The Thermodynamic Limit
  • In order to describe infinite systems
  • We describe a system of finite size, then let the
    size grow to infinity
  • We normalize extensive variables by the volume
  • We keep the density fixed: ρ = N/V constant, as
    N, V → ∞

16
The Thermodynamic Limit
The Learning System vs. The Thermodynamic System
17
The Thermodynamic Limit
  • Benefits: N is isolated in the factor t(N), and the
    remaining factor is a continuous function of ε
  • Define ε* as the largest ε such that s(ε) ≥ -α ln(1-ε)
  • In the thermodynamic limit, under certain
    conditions, we can bound the generalization error
    of any consistent hypothesis by (essentially) ε*
    (see the numerical sketch below)
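A minimal numerical sketch of that crossing-point computation. The entropy bound s below is a made-up single-peak example used only for illustration; the search simply scans ε from 1 down to 0 and returns the first (i.e. rightmost) point where the entropy s(ε) is still at least the energy -α ln(1-ε).

  import math

  def rightmost_crossing(s, alpha, grid=100_000):
      # largest ε in (0, 1) with s(ε) >= -α ln(1 - ε); this ε* bounds the
      # generalization error of any consistent hypothesis in the
      # thermodynamic limit (up to an arbitrarily small τ)
      for k in range(grid - 1, 0, -1):
          eps = k / grid
          if s(eps) >= -alpha * math.log(1.0 - eps):
              return eps
      return 0.0   # no crossing at ε > 0: the bound already gives perfect learning

  def s_single_peak(eps):
      # illustrative (hypothetical) single-peak entropy bound: binary entropy in nats
      return 0.0 if eps in (0.0, 1.0) else -(eps * math.log(eps) + (1 - eps) * math.log(1 - eps))

  print(rightmost_crossing(s_single_peak, alpha=0.5))   # few examples per t(N): large error bound
  print(rightmost_crossing(s_single_peak, alpha=2.0))   # more examples per t(N): smaller bound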

18
The Thermodynamic Limit
We will see that for ε > ε* the thermodynamic limit
of the sum is 0. Let 0 < τ ≤ 1 be an arbitrarily
small quantity
19
The Thermodynamic Limit
The limit will indeed be zero, provided that
r(N) = o(exp(t(N)·τ)), i.e. the number of error shells
grows sub-exponentially in t(N)
20
The Thermodynamic Limit
  • Summary
  • ε* is the rightmost crossing point of s(ε) and
    -α ln(1-ε)
  • In the thermodynamic limit, any hypothesis h
    consistent with m = α·t(N) examples will have
    ε_gen(h) ≤ ε* + τ (with probability 1).

21
Scaled Learning Curves
  • Extracting scaled learning curves
  • Let the value of α vary
  • Apply the thermodynamic limit method to each
    value
  • Plot the generalization error bound as a function
    of α (instead of m ⇒ "scaled"); see the sketch below
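Continuing the earlier sketch, a scaled learning curve is just ε*(α) traced over a range of α values (again with the made-up single-peak entropy bound):

  # scaled learning curve: generalization-error bound as a function of α = m/t(N),
  # reusing rightmost_crossing and s_single_peak from the sketch above
  for k in range(1, 31):
      alpha = 0.1 * k
      print(f"alpha = {alpha:4.1f}   bound = {rightmost_crossing(s_single_peak, alpha):.3f}")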

22
Artificial Examples
Using the weak permissible entropy bound, for some
scaling function t(N):
s(ε) = 1
(worked out below)
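With s(ε) ≡ 1 the rightmost crossing solves 1 = -α ln(1-ε*), giving ε*(α) = 1 - e^(-1/α). Presumably the intended scaling function here is t(N) = ln|F_N|, which makes s ≡ 1 permissible (every shell satisfies ln|F_j| ≤ ln|F_N|); then α = m/ln|F_N| and ε* = 1 - e^(-ln|F_N|/m) ≤ ln|F_N|/m, i.e. essentially the standard cardinality bound, with no phase transition.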
23
Artificial Examples
Using single-peak permissible entropy bound
24
Artificial Examples
Using a different single-peak permissible
entropy bound
25
Artificial Examples
Using double-peak permissible entropy bound
26
Phase Transitions
  • The sudden drops in the learning curves are
    called phase transitions
  • In thermodynamic systems, a phase transition is
    the transformation from one phase to another
  • A critical point is the set of conditions (such as
    temperature and pressure) at which the transition
    occurs

27
Phase Transitions
Well-known phase transitions: solid to liquid,
liquid to gas...
28
Phase Transitions: more
29
Phase Transitions: Learning
  • In some learning curves, we see a transition from
    a finite generalization error to perfect learning
  • The transition occurs at a critical α = α_C, i.e. when
    the sample reaches size m = α_C·t(N)
  • At this critical point the system "realizes" the
    problem all at once

30
(Almost) Real Examples: The Ising Perceptron
f_N: an arbitrary target function, defined by a weight
vector w_0
31
(Almost) Real Examples: The Ising Perceptron
Due to the spherically symmetric distribution, the
generalization error depends only on the Hamming
distance between a hypothesis and the target
The number of perceptrons with Hamming distance j
from the target is the binomial coefficient C(N, j)
32
(Almost) Real Examples: The Ising Perceptron
We've seen this entropy bound as the single-peak
example
  • The phase transition to perfect learning occurs
    at α_C ≈ 1.448 (see the numerical sketch below)
  • The critical m for perfect learning according to
    both the VC and cardinality bounds is …,
    rather than …
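A minimal numerical sketch of that transition, under our own reconstruction of the slide's entropy bound: for the Ising perceptron with a spherically symmetric input distribution, a hypothesis at Hamming distance j from the target w_0 has generalization error ε = (1/π)·arccos(1 - 2j/N), so by Stirling the per-N entropy of the shell at error ε is the binary entropy H(q(ε)) with q(ε) = (1 - cos πε)/2. Perfect learning sets in at the smallest α for which the energy -α ln(1-ε) exceeds this entropy at every ε > 0:

  import math

  def binary_entropy(q):
      # H(q) in nats, with H(0) = H(1) = 0
      return 0.0 if q in (0.0, 1.0) else -(q * math.log(q) + (1 - q) * math.log(1 - q))

  def ising_entropy(eps):
      # reconstructed entropy bound: Hamming fraction q satisfies cos(pi*eps) = 1 - 2q
      return binary_entropy((1.0 - math.cos(math.pi * eps)) / 2.0)

  # alpha_C = max over eps of s(eps) / (-ln(1 - eps)); below alpha_C the entropy and
  # energy curves still cross at some eps > 0, above it they never do, so the
  # rightmost crossing (the error bound) drops to 0, i.e. perfect learning.
  alpha_C = max(ising_entropy(k / 10_000) / (-math.log(1.0 - k / 10_000))
                for k in range(1, 10_000))
  print(round(alpha_C, 3))   # about 1.448, matching the value quoted on the slide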

33
(Almost) Real Examples: The Ising Perceptron
  • The right zero crossing yields the upper bound on
    the generalization error
  • With high probability, there are no hypotheses in
    VS(S) with error less than the left zero crossing
    except for the target itself
  • VS(S) minus the target is contained within these
    zero crossings

34
The Thermodynamic Limit: Lower Bound
  • The thermodynamic limit method can provide a
    lower bound to the generalization error
  • The lower bound shows that the behavior seen in the
    scaled learning curves, including phase
    transitions, can actually occur for certain
    function classes and distributions
  • We will use the energy function 2αε
  • The qualitative behavior of the curves obtained by
    intersecting s(ε) with 2αε and with -α ln(1-ε) is
    essentially the same, since ε ≤ -ln(1-ε) ≤ 2ε
    for 0 ≤ ε ≤ ½

35
The Thermodynamic Limit: Lower Bound
  • We can construct
  • a function class sequence F_N over X_N
  • a distribution sequence D_N over X_N
  • a target function sequence f_N
  • such that
  • s(ε) is a permissible entropy bound with respect
    to t(N) = N
  • for the largest ε ≤ ½ for which 2αε ≤ s(ε), there
    is a constant probability of finding a consistent
    hypothesis with ε_gen(h) ≥ ε
  • ⇒ ε is a lower bound on the error of the worst
    consistent hypothesis

36
The Finite Unrealizable Case
  • The data can be labeled according to a function
    not within our class
  • Or sampled from a distribution D_N over X_N × {0,1},
    which can also model noise in the examples
  • Use u(ε) as a permissible energy bound if, for
    any h in F and any sample size m

for the realizable case we had u(ε) = -ln(1-ε), with the
exact equality Pr[h consistent with S] = (1-ε_gen(h))^m
37
The Finite Unrealizable Case
  • We can always choose
  • (and in certain cases we can do better)
  • The standard cardinality bound is obtained
  • Since the class is finite, we can slice it into
    error shells and apply the thermodynamic limit,
    just as in the realizable case.
  • Choosing ε* to be the rightmost intersection of
    s(ε) and α·u(ε), we get for any τ > 0
    (an illustrative sketch follows below)
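The specific u(ε) and the resulting bound did not survive the transcript. Purely as an illustration, the snippet below reuses the crossing search from the earlier sketch with a hypothetical Hoeffding-style energy u(ε) = 2(ε - ε_min)², where ε_min is an assumed best-in-class error; this choice is our assumption, not necessarily the one on the slide.

  # unrealizable case, illustration only: rightmost intersection of s(ε) with α·u(ε),
  # reusing s_single_peak from the earlier sketch
  eps_min = 0.1                                    # assumed best error achievable in the class
  u = lambda eps: 2.0 * (eps - eps_min) ** 2       # hypothetical Hoeffding-style energy bound

  def rightmost_crossing_u(s, u, alpha, grid=100_000):
      for k in range(grid - 1, 0, -1):
          eps = k / grid
          if eps > eps_min and s(eps) >= alpha * u(eps):
              return eps
      return eps_min

  print(rightmost_crossing_u(s_single_peak, u, alpha=5.0))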

38
The Infinite Case
  • The covering approach: build a finite Δ-cover,
    F_Δ, of the infinite class ⇒ a best achievable
    error ε_min(Δ) within the cover
  • Apply the thermodynamic limit by building a
    sequence of nested covers
  • Result: a bound on the error at ε*_Δ, the rightmost
    crossing of s_Δ(ε) and α·u_Δ(ε)
  • Trade-off
  • The best error achievable in the chosen cover
    F_Δ improves as Δ → 0
  • The size of F_Δ increases as Δ → 0

39
Real World Example
Sufficient Dimensionality Reduction with
Irrelevance Statistics, A. Globerson, G. Chechik,
N. Tishby
  • In this example
  • Main data: images of all men with a neutral facial
    expression, lit either from the right or from the
    left
  • Irrelevance data: similarly created, with female
    images

40
Real World Example
41
Summary
  • Benefits of the method
  • Derives tighter bounds
  • Allows one to describe the behavior for small samples
    as well ⇒ useful in practice, where we often have to
    work with m on the order of d (or smaller)
  • Captures the phase transitions in learning
    curves, including transitions to perfect
    learning, which can actually occur
    experimentally in certain problems
  • Further work to be done
  • Refined extensions to the infinite case