A survey on PAC learning - PowerPoint PPT Presentation

1
A survey on PAC learning
  • 2004.7.16
  • Chikayama Lab.
  • Shibata Takeshi

2
Abstract
  • The probably approximately correct (PAC)
    learning model is a theory for methods of machine
    learning from examples, such as Support Vector
    Machines, Neural Networks, and Decision Trees.
  • Using the PAC learning model, we can
    quantitatively analyze lower bounds with respect
    to the accuracy and confidence of the results
    that learning algorithms output.

3
A Rectangle Learning Game
  • Class: axis-aligned rectangles on R².
  • Positive or negative examples are drawn
    independently from a single but unknown
    distribution on R².
  • Algorithm: take the minimal rectangle that is
    consistent with the positive examples.

target (unknown)
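The algorithm on this slide can be sketched in a few lines of Python. This is a minimal sketch: the uniform distribution on [0,1]² and the particular target rectangle are illustrative assumptions, not part of the original slides.

```python
import random

def learn_rectangle(examples):
    """Minimal axis-aligned rectangle containing every positive example.

    examples: list of ((x, y), label) pairs, label 1 = positive, 0 = negative.
    Returns (xmin, xmax, ymin, ymax), or None if no positive example was seen.
    """
    pos = [p for p, label in examples if label == 1]
    if not pos:
        return None
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def in_rect(rect, point):
    xmin, xmax, ymin, ymax = rect
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

# Unknown target and distribution (illustrative: uniform on [0,1]^2).
target = (0.2, 0.7, 0.3, 0.8)
random.seed(0)
points = [(random.random(), random.random()) for _ in range(500)]
labeled = [(p, 1 if in_rect(target, p) else 0) for p in points]
h = learn_rectangle(labeled)
```

Because h is the tightest rectangle around the positives, it always lies inside the target, so all of its error mass sits in the four strips between h and t.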
4
A Rectangle Learning Game
  • Target rectangle t
  • Hypothesis rectangle h
  • Difference between t and h: the symmetric
    difference t Δ h
  • Distribution P
  • Distribution of m examples: the product
    distribution P^m

[Figure: the error region t Δ h is covered by four
strips along the sides of t; for such a strip A,
let P(A) > ε/4]
5
Definition of the PAC learning model
  • A class on X: C ⊆ 2^X
  • A given but unknown target t ∈ C
  • A given but unknown distribution P over X
  • Examples are drawn independently.
  • If an example is in the target, the oracle
    returns 1, otherwise 0.
  • Learning algorithms:
  • input: ε, δ (both positive and < 1)
  • output: a hypothesis h
  • must satisfy Pr[ P(t Δ h) ≤ ε ] ≥ 1 − δ

6
Occam's Algorithm
  • A class C is efficiently PAC learnable iff some
    learning algorithm for C runs in polynomial time
    w.r.t. |t| and 1/ε, 1/δ, n, where n is the
    maximal description length of the drawn examples.
  • A hypothesis h is consistent with a set of
    examples S iff h(x) = t(x) for all (x, t(x)) ∈ S.
  • An Occam algorithm:
  • input: m examples
  • output: a hypothesis h that is consistent with
    the input examples
  • must satisfy |h| ≤ (n|t|)^α · m^β, where β < 1

7
Occam's Razor
  • An Occam algorithm is an efficient PAC learning
    algorithm.
  • Let h₀ be the output of some Occam algorithm.
  • Let H_C = { h : h is consistent with the m examples }
  • Let H_L = { h : |h| ≤ (n|t|)^α · m^β }
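The counting argument behind Occam's razor — there are at most 2^((n|t|)^α · m^β) hypotheses in H_L, and each one with error above ε survives m examples with probability at most (1 − ε)^m — can be turned into a sample-size calculation. This is a sketch; the doubling search and the exact parameter values are illustrative assumptions.

```python
import math

def occam_sample_bound(n, t_size, alpha, beta, eps, delta):
    """Smallest power-of-two m such that |H_L| * (1 - eps)**m <= delta,
    where |H_L| <= 2 ** ((n * t_size) ** alpha * m ** beta).

    Works in log space to avoid overflow; because beta < 1 the hypothesis
    count grows sublinearly in m while (1 - eps)**m shrinks geometrically,
    so the doubling search terminates.
    """
    assert 0 < beta < 1 and 0 < eps < 1 and 0 < delta < 1

    def log_bad_prob(m):
        bits = (n * t_size) ** alpha * m ** beta
        return bits * math.log(2) + m * math.log(1 - eps)

    m = 1
    while log_bad_prob(m) > math.log(delta):
        m *= 2
    return m
```

The condition β < 1 is exactly what makes this work: a shorter-than-trivial hypothesis class is small enough that the union bound over its bad members eventually drops below δ.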

8
VC Dimension
  • Let S be a finite subset of X.
  • A class C shatters S iff { S ∩ c : c ∈ C } = 2^S
  • The VC dimension of C is max { |S| : C shatters S }

For example, the VC dim. of hyperplanes on R^N is
N + 1.
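The slide's example uses hyperplanes; a simpler class where shattering can be checked by brute force is the class of closed intervals [a, b] on the reals, whose VC dimension is 2. This is a sketch; the candidate-interval enumeration is an assumption that suffices for finite point sets.

```python
from itertools import product

def intervals_shatter(points):
    """Check whether closed intervals [a, b] on the reals shatter the point
    set: every 0/1 labeling must be realized by some interval (a point gets
    label 1 iff it lies inside the interval)."""
    pts = sorted(points)
    # For a finite set it is enough to try intervals whose endpoints are
    # drawn from the points themselves, plus one empty interval (a > b).
    candidates = [(a, b) for a in pts for b in pts if a <= b] + [(1, 0)]
    for labeling in product([0, 1], repeat=len(pts)):
        if not any(all((1 if a <= x <= b else 0) == want
                       for x, want in zip(pts, labeling))
                   for a, b in candidates):
            return False
    return True
```

Any two points are shattered, but no three are: an interval containing both outer points must also contain the middle one, so the labeling (1, 0, 1) can never be realized.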
9
Growth Function
  • The growth function for C is defined as
    Π_C(m) = max { |{ S ∩ c : c ∈ C }| : S ⊆ X, |S| = m }
  • It is known (Sauer's lemma) that
    Π_C(m) ≤ Σ_{i=0}^{d} C(m, i) ≤ (em/d)^d,
    where d is the VC dimension of C.
  • The growth function is polynomial in m if the
    VC dimension is finite.
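The Sauer bound Σ_{i=0}^{d} C(m, i) is easy to evaluate and compare against the trivial 2^m:

```python
from math import comb

def sauer_bound(m, d):
    """Upper bound on the growth function Pi_C(m) when VCdim(C) = d:
    Pi_C(m) <= sum_{i=0}^{d} C(m, i), which is O(m**d) for fixed d."""
    return sum(comb(m, i) for i in range(d + 1))
```

For m ≤ d the bound equals 2^m (every labeling is possible); beyond that it grows only polynomially, which is what makes finite VC dimension useful.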

10
PAC Learning under Finite VC dim.
  • Assume that an algorithm outputs a hypothesis h
    that is consistent with the m examples.
  • Then the required m is bounded by a polynomial
    w.r.t. 1/ε, 1/δ (and the VC dimension d).
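One explicit form of this polynomial bound is due to Blumer, Ehrenfeucht, Haussler, and Warmuth; the constants below are from that result, and other references give slightly different ones.

```python
from math import ceil, log2

def vc_sample_size(d, eps, delta):
    """Sufficient number of examples for any consistent learner over a
    class of VC dimension d (Blumer et al. 1989 form):
    m >= max((4/eps) * log2(2/delta), (8*d/eps) * log2(13/eps))."""
    return ceil(max((4 / eps) * log2(2 / delta),
                    (8 * d / eps) * log2(13 / eps)))
```

Note that m grows only linearly in d and near-linearly in 1/ε, with the confidence 1/δ entering logarithmically.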

11
Structural Risk Minimization
  • Let us consider the case where the target t ∉ C.
  • Hypotheses need not be consistent with the
    examples, i.e. for all h ∈ C, P(t Δ h) ≠ 0
    in some cases.
  • Thus the best output is the h ∈ C that minimizes
    the empirical risk on the examples.
  • On the other hand, the following inequality
    holds (with probability ≥ 1 − δ), where d is the
    VC dim. of C:
    risk(h) ≤ empirical risk(h)
              + √( (d(ln(2m/d) + 1) + ln(4/δ)) / m )

[Annotations: the first term on the right is the
empirical risk; the second depends on the VC dim.
Select a class that minimizes the bound, and hence
the left side.]
12
SRM on SVMs and Margin Maximization
  • If ‖x‖ ≤ R for all x ∈ R^N and the margin is
    more than γ, the VC dim. d of γ-margin
    hyperplanes satisfies d ≤ min( ⌈R²/γ²⌉, N ) + 1.
  • Thus, maximizing the margin decreases the
    VC dim.
  • That in turn decreases the upper bound on the
    risk.
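The bound above can be evaluated directly; widening the margin γ shrinks it until the ambient dimension N takes over. A sketch of the stated bound d ≤ min(⌈R²/γ²⌉, N) + 1:

```python
from math import ceil

def margin_vc_bound(R, gamma, N):
    """VC-dimension bound for gamma-margin separating hyperplanes on data
    with norm at most R: d <= min(ceil(R**2 / gamma**2), N) + 1."""
    return min(ceil(R ** 2 / gamma ** 2), N) + 1
```

For fixed R = 1 in a 1000-dimensional space, a margin of 0.1 gives a bound of 101, while a margin of 0.5 gives only 5: the effective capacity is controlled by the margin, not by N.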

13
Distribution-Dependent Models
  • Some classes, such as DFAs and boolean formulas,
    can be reduced to the discrete cube root problem;
    under cryptographic assumptions they are
    therefore not efficiently PAC learnable.
  • One reason for this difficulty is that the PAC
    model is distribution-free.
  • Under simple distributions, MAT-learnable
    classes such as DFAs are efficiently PAC
    learnable.

14
Minimally Adequate Teacher Learning
  • The Minimally Adequate Teacher (MAT) learning
    model is one of the query learning models.
    Learning algorithms can use two kinds of queries.
  • membership queries: whether an example belongs
    to the target or not
  • equivalence queries: whether the hypothesis h is
    equal to the target t or not, and if not, a
    counter-example ( ∈ t Δ h )
  • A class C is efficiently MAT learnable iff some
    algorithm outputs an equivalent representation in
    polynomial time w.r.t. the maximal size of the
    counter-examples and |t|, for all t ∈ C.
  • DFAs are MAT learnable.

15
Kolmogorov Complexity
  • Fix a universal program u that emulates the
    program f given y on input z.
  • The conditional Kolmogorov complexity of x given
    y is K(x|y) = min { |f| : u(f, y) = x }.
  • For example, the (unconditional) Kolmogorov
    complexity of the string 1^100100 is at most the
    length of the program "write 1 100100 times".
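Kolmogorov complexity is uncomputable, but an off-the-shelf compressor gives a computable, upper-bound-flavored proxy that makes the 1^100100 example concrete. This is a rough analogy only, not K itself.

```python
import random
import zlib

def compressed_len(s: bytes) -> int:
    """Length of s after zlib compression at level 9 - a crude, computable
    stand-in for the (uncomputable) Kolmogorov complexity of s."""
    return len(zlib.compress(s, 9))

ones = b"1" * 100100   # highly regular: a tiny program describes it
random.seed(0)
rand = bytes(random.getrandbits(8) for _ in range(100100))  # incompressible
```

compressed_len(ones) is far smaller than compressed_len(rand), which stays close to the raw 100100 bytes, mirroring the gap between K(1^100100) and the complexity of a typical random string.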

16
The Simple Distribution
  • The simple distribution over {0,1}* given a
    representation r of c ∈ C is
    m_r(x) = λ · 2^(−K(x|r)), with λ a normalizing
    constant.
  • Note that Σ_x 2^(−K(x|r)) ≤ 1 (Kraft's
    inequality), so m_r(x) ≥ 2^(−K(x|r)).
  • The PACS learning model is the PAC learning
    model under the simple distribution.

17
PACS and MAT Learning
  • The teacher for the MAT model, T_min(t), returns
    the lexicographically smallest of the
    minimal-length counter-examples, where t is the
    target concept.
  • Let A be an efficient learning algorithm for the
    MAT model.
  • Let n be the length of the longest query or
    counter-example arising from A and T_min.
  • Each query or counter-example is determined by
    the combination of A and T_min together with its
    index in the run.
  • Thus the Kolmogorov complexity of each such
    string given t satisfies K(x|t) < log₂ g(n, |t|)
    for some polynomial g.
18
PACS and MAT Learning
  • The number of x ∈ {0,1}* such that
    K(x|r) < log₂ g(n, |t|) is less than 2g(n, |t|).
  • For all x ∈ {0,1}* such that
    K(x|r) < log₂ g(n, |t|), we have
    m_r(x) > 1 / g(n, |t|).
  • Thus the probability that some query or
    counter-example is not drawn among the m examples
    is no more than 2g(n, |t|) · (1 − 1/g(n, |t|))^m.
  • m is bounded by a polynomial w.r.t. n, |t|, 1/ε,
    1/δ.
  • C is efficiently PACS learnable by running A and
    finding the same queries and counter-examples
    among the m examples.

19
Summary
  • Fundamental concepts of the PAC learning model
  • Occam's Razor
  • VC dimension
  • Structural Risk Minimization
  • Distribution-dependent PAC learning models
  • PACS (PAC with Simple Distribution)
  • A Relationship between the PACS and MAT learning
    model