1
A model of Inductive Bias Learning
  • Jonathan Baxter
  • Frans Oliehoek
  • <faolieho@science.uva.nl>

2
Overview of presentation
  • Problem: selecting the inductive bias
  • Environment of related tasks
  • Revision: the PAC-learning model
  • Bias learning model
  • Example: feature learning
  • The covering numbers (?)
  • Conclusion
  • Questions / Discussion

3
Problem: selecting the inductive bias
  • An important question
  • COIL example
  • How to choose the hypothesis space?
  • Large enough to contain a solution
  • Small enough to generalize well
  • Model for learning the inductive bias
  • Assumption: the learner is embedded in an
    environment of related tasks

4
Environment of related tasks
  • Idea: learning a bias for related tasks
  • Fewer examples required per task
  • Bias appropriate for new tasks of the same type
  • Examples
  • Handwritten character recognition
  • 1 task: distinguish 'A' from all other characters
  • Pre-processing that is good for all these different tasks
  • Recognizing n faces
  • Learning a bias that is good for recognizing new
    faces

5
Revision: the PAC-learning model
  • Input and output spaces X, Y
  • P: probability distribution on X × Y
  • defines the (non-deterministic) task
  • H: hypothesis space, h: X → Y
  • l: loss function, l: Y × Y → R
  • Training error er_z(h)
  • training set z = ((x1,y1),…,(xm,ym))
  • Generalization error er_P(h) (both errors written out below)
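For reference, the two error measures can be written out explicitly (standard definitions, consistent with Baxter's notation; not copied from the slide):

  er_z(h) = \frac{1}{m} \sum_{i=1}^{m} l(h(x_i), y_i)

  er_P(h) = \mathbb{E}_{(x,y) \sim P}\, l(h(x), y)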

6
Revision: the PAC-learning model (2)
  • Upper bound on the number of examples needed for a
    certain generalization error, via the VC-dimension of H
  • m ≥ (1/ε) · ( 4 log2(2/δ) + 8 VC(H) log2(13/ε) )
    (evaluated numerically after this slide)
  • Gives a condition under which er_P(h) is likely to
    be small, but this is no guarantee: H must still
    contain a good hypothesis!
  • Bias = selection of the hypothesis space H
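As a quick illustration, a small Python sketch that evaluates this bound numerically; the function name and the chosen ε, δ, VC(H) values are illustrative, not from the slides:

  import math

  def pac_sample_bound(eps, delta, vc_dim):
      """Number of examples sufficient for generalization error <= eps
      with probability >= 1 - delta (the bound quoted on this slide)."""
      return (4 * math.log2(2 / delta) + 8 * vc_dim * math.log2(13 / eps)) / eps

  # Illustrative values (not from the slides): eps = 0.1, delta = 0.05, VC(H) = 10
  print(math.ceil(pac_sample_bound(0.1, 0.05, 10)))   # roughly 5,800 examples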

7
Bias learning model
  • Goal: learning a good bias, i.e.
  • find an appropriate H for
  • the environment of related tasks
  • a probability distribution P on X × Y is a task
  • Q is a probability distribution over tasks
  • what tasks the learner is likely to see
  • gives the environment (P, Q)
  • Hypothesis space family ℍ = {H}: a set of hypothesis spaces
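Following Baxter's paper, the quality of one hypothesis space H for this environment can be written as (my rendering of the paper's definition, not copied from the slide):

  er_Q(H) = \mathbb{E}_{P \sim Q} \Big[ \inf_{h \in H} er_P(h) \Big]

i.e. the expected best-achievable generalization error when a task is drawn from Q and the learner must choose its hypothesis from H.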

8
1 task vs. bias learning
  • Single-task learning:
  • find h ∈ H
  • er_P(h)
  • er_z(h)
  • z = ((x1,y1),…,(xm,ym))
  • sample complexity bounded by the VC-dimension
  • Bias learning:
  • find H ∈ ℍ
  • er_Q(H)
  • er_z(H) (written out below)
  • an (n,m)-sample z:
      (x11,y11) … (x1m,y1m)
          ⋮             ⋮
      (xn1,yn1) … (xnm,ynm)
  • bounded by covering numbers
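The empirical counterpart of er_Q(H) on an (n,m)-sample, consistent with the definitions above (again my rendering, not copied from the slide):

  er_z(H) = \frac{1}{n} \sum_{i=1}^{n} \inf_{h \in H} \frac{1}{m} \sum_{j=1}^{m} l(h(x_{ij}), y_{ij})

Each task contributes the training error of the best hypothesis in H on its own m examples.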

9
Uniform convergence for bias learners
  • Covering numbers
  • capacities of ℍ, related to a lower bound on the
    generalization error and to average loss
    functions over n hypotheses
  • Sample complexity bounds
  • Given
  • environment (P, Q)
  • an (n,m)-sample z
  • n > … (enough tasks)
  • m > … (enough examples per task)
  • with probability at least 1 − δ, all H ∈ ℍ satisfy
    er_Q(H) ≤ er_z(H) + ε

10
Implications
  • For any bias H the learner selects, er_Q(H) can be bounded
  • In order to learn a bias such that
  • er_Q(H) ≤ er_z(H) + ε, for all H ∈ ℍ
  • both m and n need to be sufficiently large
  • When an H ∈ ℍ with a small er_z(H) has been learned,
    this H can be used to learn new related tasks
    with improved bounds
  • For fixed δ and ε, the number of examples required per
    task, m, decreases as the number of tasks, n, increases ⇒
    information is shared between tasks

11
Example: feature learning
  • Feature learning as a bias learning problem
  • Selecting strong features
  • f: X → V maps the input to a space V of lower dimension
  • F = {f}, the set of all feature maps
  • Then apply classification (regression, etc.)
  • g: V → Y, g ∈ G
  • G is a class of functions (a hypothesis space relative to V)
  • H = G ∘ f = {g ∘ f : g ∈ G} for each f
  • ℍ = {G ∘ f : f ∈ F}

12
Feature learning 2
  • ℍ = {H_w}
  • Each H_w has W parameters (v_ij, u_ij); w is the vector of
    parameters
  • The feature map computes k features, using a neural net
    with h hidden units
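A minimal Python sketch of this architecture: a shared feature map (weights v_ij, u_ij) followed by one simple output map per task. Layer sizes, function names and the use of NumPy are my own illustrative assumptions, not from the slides:

  import numpy as np

  def init_params(d_in, h, k, n_tasks, rng):
      """Shared feature map (d_in -> h hidden -> k features) plus one linear head per task."""
      return {
          "V": 0.1 * rng.standard_normal((d_in, h)),   # shared input-to-hidden weights (v_ij)
          "U": 0.1 * rng.standard_normal((h, k)),      # shared hidden-to-feature weights (u_ij)
          "heads": [0.1 * rng.standard_normal(k) for _ in range(n_tasks)],  # per-task g in G
      }

  def features(params, x):
      """Shared feature map f_w : X -> V (k features), sigmoid hidden units."""
      hidden = 1.0 / (1.0 + np.exp(-x @ params["V"]))
      return hidden @ params["U"]

  def predict(params, x, task):
      """Task-specific hypothesis g(f_w(x)): a linear head applied to the shared features."""
      return features(params, x) @ params["heads"][task]

  rng = np.random.default_rng(0)
  params = init_params(d_in=16, h=8, k=4, n_tasks=3, rng=rng)
  x = rng.standard_normal((5, 16))            # a batch of 5 inputs
  print(predict(params, x, task=0).shape)     # -> (5,)

Sharing V and U across tasks is what makes this a bias learner: the learned feature map constrains the hypothesis space used for each individual task.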

13
Feature learning 3
  • We want to learn a good bias
  • z: an (n,m)-sample
  • locate an H_w with a small er_z(H_w)
  • er_z(H_w): the average, over the n tasks, of the best
    empirical error achievable within H_w (sketched after this slide)
  • gradient descent over w and (a_1, …, a_{k+1})
  • What n, m are needed for good generalization?
  • given by the theorem, but what about the capacities?
  • for squared loss …
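With the composed hypotheses g ∘ f_w, the empirical error being minimized can be written as follows (my reconstruction from the definitions above; the slide's original formula is not preserved in this transcript):

  er_z(H_w) = \frac{1}{n} \sum_{i=1}^{n} \min_{g \in G} \frac{1}{m} \sum_{j=1}^{m} l\big( g(f_w(x_{ij})),\, y_{ij} \big)

Gradient descent is then run jointly over the shared feature-map weights w and the per-task output weights.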

14
The covering numbers
  • The convergence theorem depends on the covering
    numbers
  • Characteristics of ℍ that play a role similar to the
    VC dimension of H
  • We start with …

15
The covering numbers 2
  • For each H ∈ ℍ:
  • a function that maps each task P to the lowest achievable
    generalization error within H
  • the set of all these functions
  • a pseudo-metric:
  • the difference in this generalization error under the
    task distribution Q

16
The covering numbers 3
  • an ε-cover of …
  • is a set …
  • The size of the smallest ε-cover …
  • Now the capacity of ℍ is …
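For reference, the generic definitions being instantiated here (standard covering-number definitions; the slide's specific sets and pseudo-metrics are the ones described in the surrounding bullets):

  A set T \subseteq S is an ε-cover of a pseudo-metric space (S, d) if
  \forall s \in S \ \exists t \in T : d(s, t) \le \varepsilon.

  \mathcal{N}(\varepsilon, S, d) = the size of the smallest ε-cover of (S, d).

The capacities used in the theorem are covering numbers of the function classes derived from ℍ, taken with respect to the pseudo-metrics introduced on these slides.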

17
The covering numbers 4
  • the average error of n hypotheses on n different
    tasks
  • all of these functions, for a certain H
  • the union over the hypothesis space family ℍ

18
The covering numbers 5
  • a pseudo-metric:
  • the difference in error over a fixed vector of tasks
    P = (P1, …, Pn)
  • again, the size of the smallest ε-cover
  • the capacity of …

19
Feature learning (continued)
  • For the network used it can be shown that …
  • Therefore, if …
  • with probability at least 1 − δ, any H_w satisfies
  • er_Q(H_w) ≤ er_z(H_w) + ε

20
Choosing the hypothesis space family
  • Choosing the hypothesis space family ℍ
  • which to select?
  • hyper-bias
  • claimed to be easier

21
Conclusions
  • A formal model of bias learning
  • Assumption: the learner is embedded in an environment of
    related tasks
  • Bounds on sample complexity
  • A first step towards a formal model of hierarchical learning

22
Discussion questions
  • Who starts?