A survey on PAC learning - PowerPoint PPT Presentation

1
A survey on PAC learning
  • 2004.7.16
  • Chikayama Lab.
  • Shibata Takeshi

2
Abstract
  • The probably approximately correct (PAC)
    learning model is a theory for methods of machine
    learning from examples, such as Support Vector
    Machines, Neural Networks, and Decision Trees.
  • Using the PAC learning model, we can
    quantitatively analyze lower bounds with respect
    to the accuracy and confidence of the results
    that learning algorithms output.

3
A Rectangle Learning Game
  • Class: axis-aligned rectangles on R².
  • Positive or negative examples are drawn
    independently from a single but unknown
    distribution on R².
  • Algorithm: take the minimal rectangle that is
    consistent with the positive examples.

target (unknown)
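The algorithm on this slide can be sketched in a few lines of Python. This is a minimal sketch: the uniform distribution on [0,1]² and the particular target rectangle are illustrative assumptions, not part of the original slides.

```python
import random

def learn_rectangle(examples):
    """Minimal axis-aligned rectangle containing every positive example.

    examples: list of ((x, y), label) pairs, label 1 = positive, 0 = negative.
    Returns (xmin, xmax, ymin, ymax), or None if no positive example was seen.
    """
    pos = [p for p, label in examples if label == 1]
    if not pos:
        return None
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def in_rect(rect, point):
    xmin, xmax, ymin, ymax = rect
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

# Unknown target and distribution (illustrative: uniform on [0,1]^2).
target = (0.2, 0.7, 0.3, 0.8)
random.seed(0)
points = [(random.random(), random.random()) for _ in range(500)]
labeled = [(p, 1 if in_rect(target, p) else 0) for p in points]
h = learn_rectangle(labeled)
```

Because h is the tightest rectangle around the positives, it always lies inside the target, so all of its error mass sits in the four strips between h and t.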
4
A Rectangle Learning Game
  • Target rectangle t
  • Hypothesis rectangle h
  • Difference between t and h: the symmetric
    difference t Δ h
  • Distribution P
  • Distribution of m examples: the product
    distribution P^m

[Figure: the error region t Δ h is covered by four
strips along the sides of t; for such a strip A,
let P(A) > ε/4]
5
Definition of the PAC learning model
  • A class on X: C ⊆ 2^X
  • A given but unknown target t ∈ C
  • A given but unknown distribution P over X
  • Examples are drawn independently.
  • If an example is in the target, the oracle
    returns 1, otherwise 0.
  • Learning algorithms:
  • input: ε, δ (both positive and < 1)
  • output: a hypothesis h
  • must satisfy Pr[ P(t Δ h) ≤ ε ] ≥ 1 − δ

6
Occam's Algorithm
  • A class C is efficiently PAC learnable iff some
    learning algorithm for C runs in polynomial time
    w.r.t. |t| and 1/ε, 1/δ, n, where n is the
    maximal description length of the drawn examples.
  • A hypothesis h is consistent with a set of
    examples S iff h(x) = t(x) for all (x, t(x)) ∈ S.
  • An Occam algorithm:
  • input: m examples
  • output: a hypothesis h that is consistent with
    the input examples
  • must satisfy |h| ≤ (n|t|)^α · m^β, where β < 1

7
Occam's Razor
  • An Occam algorithm is an efficient PAC learning
    algorithm.
  • Let h₀ be the output of some Occam algorithm.
  • Let H_C = { h : h is consistent with the m examples }
  • Let H_L = { h : |h| ≤ (n|t|)^α · m^β }
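The counting argument behind Occam's razor — there are at most 2^((n|t|)^α · m^β) hypotheses in H_L, and each one with error above ε survives m examples with probability at most (1 − ε)^m — can be turned into a sample-size calculation. This is a sketch; the doubling search and the exact parameter values are illustrative assumptions.

```python
import math

def occam_sample_bound(n, t_size, alpha, beta, eps, delta):
    """Smallest power-of-two m such that |H_L| * (1 - eps)**m <= delta,
    where |H_L| <= 2 ** ((n * t_size) ** alpha * m ** beta).

    Works in log space to avoid overflow; because beta < 1 the hypothesis
    count grows sublinearly in m while (1 - eps)**m shrinks geometrically,
    so the doubling search terminates.
    """
    assert 0 < beta < 1 and 0 < eps < 1 and 0 < delta < 1

    def log_bad_prob(m):
        bits = (n * t_size) ** alpha * m ** beta
        return bits * math.log(2) + m * math.log(1 - eps)

    m = 1
    while log_bad_prob(m) > math.log(delta):
        m *= 2
    return m
```

The condition β < 1 is exactly what makes this work: a shorter-than-trivial hypothesis class is small enough that the union bound over its bad members eventually drops below δ.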

8
VC Dimension
  • Let S be a finite subset of X.
  • A class C shatters S iff { S ∩ c : c ∈ C } = 2^S
  • The VC dimension of C is max { |S| : C shatters S }

For example, the VC dim. of hyperplanes on R^N is
N + 1.
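The slide's example uses hyperplanes; a simpler class where shattering can be checked by brute force is the class of closed intervals [a, b] on the reals, whose VC dimension is 2. This is a sketch; the candidate-interval enumeration is an assumption that suffices for finite point sets.

```python
from itertools import product

def intervals_shatter(points):
    """Check whether closed intervals [a, b] on the reals shatter the point
    set: every 0/1 labeling must be realized by some interval (a point gets
    label 1 iff it lies inside the interval)."""
    pts = sorted(points)
    # For a finite set it is enough to try intervals whose endpoints are
    # drawn from the points themselves, plus one empty interval (a > b).
    candidates = [(a, b) for a in pts for b in pts if a <= b] + [(1, 0)]
    for labeling in product([0, 1], repeat=len(pts)):
        if not any(all((1 if a <= x <= b else 0) == want
                       for x, want in zip(pts, labeling))
                   for a, b in candidates):
            return False
    return True
```

Any two points are shattered, but no three are: an interval containing both outer points must also contain the middle one, so the labeling (1, 0, 1) can never be realized.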
9
Growth Function
  • The growth function for C is defined as
    Π_C(m) = max { |{ S ∩ c : c ∈ C }| : S ⊆ X, |S| = m }
  • It is known (Sauer's lemma) that
    Π_C(m) ≤ Σ_{i=0}^{d} C(m, i) ≤ (em/d)^d,
    where d is the VC dimension of C.
  • The growth function is polynomial in m if the
    VC dimension is finite.
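The Sauer bound Σ_{i=0}^{d} C(m, i) is easy to evaluate and compare against the trivial 2^m:

```python
from math import comb

def sauer_bound(m, d):
    """Upper bound on the growth function Pi_C(m) when VCdim(C) = d:
    Pi_C(m) <= sum_{i=0}^{d} C(m, i), which is O(m**d) for fixed d."""
    return sum(comb(m, i) for i in range(d + 1))
```

For m ≤ d the bound equals 2^m (every labeling is possible); beyond that it grows only polynomially, which is what makes finite VC dimension useful.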

10
PAC Learning under Finite VC dim.
  • Assume that an algorithm outputs a hypothesis h
    that is consistent with the m examples.
  • Then the required m is bounded by a polynomial
    w.r.t. 1/ε, 1/δ (and the VC dimension d).
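One explicit form of this polynomial bound is due to Blumer, Ehrenfeucht, Haussler, and Warmuth; the constants below are from that result, and other references give slightly different ones.

```python
from math import ceil, log2

def vc_sample_size(d, eps, delta):
    """Sufficient number of examples for any consistent learner over a
    class of VC dimension d (Blumer et al. 1989 form):
    m >= max((4/eps) * log2(2/delta), (8*d/eps) * log2(13/eps))."""
    return ceil(max((4 / eps) * log2(2 / delta),
                    (8 * d / eps) * log2(13 / eps)))
```

Note that m grows only linearly in d and near-linearly in 1/ε, with the confidence 1/δ entering logarithmically.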

11
Structural Risk Minimization
  • Let us consider the case where the target t ∉ C.
  • Hypotheses need not be consistent with the
    examples, i.e. for all h ∈ C, P(t Δ h) ≠ 0
    in some cases.
  • Thus the best output is the h ∈ C that minimizes
    the empirical risk on the examples.
  • On the other hand, the following inequality
    holds (with probability ≥ 1 − δ), where d is the
    VC dim. of C:
    risk(h) ≤ empirical risk(h)
              + √( (d(ln(2m/d) + 1) + ln(4/δ)) / m )

[Annotations: the first term on the right is the
empirical risk; the second depends on the VC dim.
Select a class that minimizes the bound, and hence
the left side.]
12
SRM on SVMs and Margin Maximization
  • If ‖x‖ ≤ R for all x ∈ R^N and the margin is
    more than γ, the VC dim. d of γ-margin
    hyperplanes satisfies d ≤ min( ⌈R²/γ²⌉, N ) + 1.
  • Thus, maximizing the margin decreases the
    VC dim.
  • That in turn decreases the upper bound on the
    risk.
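The bound above can be evaluated directly; widening the margin γ shrinks it until the ambient dimension N takes over. A sketch of the stated bound d ≤ min(⌈R²/γ²⌉, N) + 1:

```python
from math import ceil

def margin_vc_bound(R, gamma, N):
    """VC-dimension bound for gamma-margin separating hyperplanes on data
    with norm at most R: d <= min(ceil(R**2 / gamma**2), N) + 1."""
    return min(ceil(R ** 2 / gamma ** 2), N) + 1
```

For fixed R = 1 in a 1000-dimensional space, a margin of 0.1 gives a bound of 101, while a margin of 0.5 gives only 5: the effective capacity is controlled by the margin, not by N.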

13
Distribution-Dependent Models
  • Some classes, such as DFAs and boolean formulas,
    can be reduced to the discrete cube root problem;
    under cryptographic assumptions they are
    therefore not efficiently PAC learnable.
  • One reason for this difficulty is that the PAC
    model is distribution-free.
  • Under simple distributions, MAT-learnable
    classes such as DFAs are efficiently PAC
    learnable.

14
Minimally Adequate Teacher Learning
  • The Minimally Adequate Teacher (MAT) learning
    model is one of the query learning models.
    Learning algorithms can use two kinds of queries.
  • membership queries: whether an example belongs
    to the target or not
  • equivalence queries: whether the hypothesis h is
    equal to the target t or not, and if not, a
    counter-example ( ∈ t Δ h )
  • A class C is efficiently MAT learnable iff some
    algorithm outputs an equivalent representation in
    polynomial time w.r.t. the maximal size of the
    counter-examples and |t|, for all t ∈ C.
  • DFAs are MAT learnable.

15
Kolmogorov Complexity
  • Fix a universal program u that emulates the
    program f given y on input z.
  • The conditional Kolmogorov complexity of x given
    y is K(x|y) = min { |f| : u(f, y) = x }.
  • For example, the (unconditional) Kolmogorov
    complexity of the string 1^100100 is at most the
    length of the program "write 1 100100 times".
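Kolmogorov complexity is uncomputable, but an off-the-shelf compressor gives a computable, upper-bound-flavored proxy that makes the 1^100100 example concrete. This is a rough analogy only, not K itself.

```python
import random
import zlib

def compressed_len(s: bytes) -> int:
    """Length of s after zlib compression at level 9 - a crude, computable
    stand-in for the (uncomputable) Kolmogorov complexity of s."""
    return len(zlib.compress(s, 9))

ones = b"1" * 100100   # highly regular: a tiny program describes it
random.seed(0)
rand = bytes(random.getrandbits(8) for _ in range(100100))  # incompressible
```

compressed_len(ones) is far smaller than compressed_len(rand), which stays close to the raw 100100 bytes, mirroring the gap between K(1^100100) and the complexity of a typical random string.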

16
The Simple Distribution
  • The simple distribution over {0,1}* given a
    representation r of c ∈ C is
    m_r(x) = λ · 2^(−K(x|r)), with λ a normalizing
    constant.
  • Note that Σ_x 2^(−K(x|r)) ≤ 1 (Kraft's
    inequality), so m_r(x) ≥ 2^(−K(x|r)).
  • The PACS learning model is the PAC learning
    model under the simple distribution.

17
PACS and MAT Learning
  • The teacher for the MAT model, T_min(t), returns
    the lexicographically smallest of the
    minimal-length counter-examples, where t is the
    target concept.
  • Let A be an efficient learning algorithm for the
    MAT model.
  • Let n be the length of the longest query or
    counter-example arising from A and T_min.
  • Each query or counter-example is determined by
    the combination of A and T_min together with its
    index in the run.
  • Thus the Kolmogorov complexity of each such
    string given t satisfies K(x|t) < log₂ g(n, |t|)
    for some polynomial g.
18
PACS and MAT Learning
  • The number of x ∈ {0,1}* such that
    K(x|r) < log₂ g(n, |t|) is less than 2g(n, |t|).
  • For all x ∈ {0,1}* such that
    K(x|r) < log₂ g(n, |t|), we have
    m_r(x) > 1 / g(n, |t|).
  • Thus the probability that some query or
    counter-example is not drawn among the m examples
    is no more than 2g(n, |t|) · (1 − 1/g(n, |t|))^m.
  • m is bounded by a polynomial w.r.t. n, |t|, 1/ε,
    1/δ.
  • C is efficiently PACS learnable by running A and
    finding the same queries and counter-examples
    among the m examples.

19
Summary
  • Fundamental concepts of the PAC learning model
  • Occam's Razor
  • VC dimension
  • Structural Risk Minimization
  • Distribution-dependent PAC learning models
  • PACS (PAC with Simple Distribution)
  • A Relationship between the PACS and MAT learning
    model