1
Active Learning
2
Learning from Examples
  • Passive learning
  • The learner receives a random set of labeled examples

3
Active Learning
  • Active learning
  • The learner chooses the specific examples to be labeled
  • ⇒ The learner works harder, in order to use fewer examples

4
Membership Queries
  • The learner constructs the examples from basic units
  • Some problems are only solvable under this setting (e.g., learning finite automata)
  • ⇒ Problem: constructed queries might fall in irrelevant regions of the input space

5
Selective Sampling
  • Two oracles are available:
  • Sample: returns unlabeled examples drawn according to the input distribution
  • Label: given an unlabeled example, returns its label
  • Query filtering: from the set of unlabeled examples, choose the most informative ones and query for their labels.

6
Selecting the Most Informative Queries
  • Input: x ~ D (over R^d)
  • Concepts: c : X → {0,1}
  • Bayesian model: the target concept is drawn from a prior P over the concept class C
  • Version space: V_i = V(⟨x_1,c(x_1)⟩, …, ⟨x_i,c(x_i)⟩), the concepts consistent with the first i labeled examples

7
Selecting the Most Informative Queries
  • Instantaneous information gain from the ith
    example
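The formula itself did not survive the transcript. A sketch of the standard Bayesian formulation, consistent with the version-space definitions on the previous slide: the instantaneous information gain of the i-th labeled example is the log-reduction in version-space probability.

```latex
% Instantaneous information gain of the i-th example (a sketch,
% assuming the standard Bayesian version-space formulation):
\mathcal{I}_i \;=\; -\log_2 \Pr\!\big(c(x_i)=y_i \mid V_{i-1}\big)
           \;=\; \log_2 \frac{P(V_{i-1})}{P(V_i)}
```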

8
Selecting the Most Informative Queries
  • For the next example x_i:
  • P_0 = Pr(c(x_i) = 0); the expected information gain is the binary entropy H(P_0)
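A minimal sketch of the quantity above: the expected information gain H(P_0) is maximized when the candidate example splits the version space in half (P_0 = 1/2), which is the intuition the rest of the deck builds on.

```python
import math

def binary_entropy(p0):
    """Expected information gain (in bits) from querying an example
    whose label is 0 with probability p0 under the current posterior."""
    if p0 in (0.0, 1.0):
        return 0.0
    p1 = 1.0 - p0
    return -(p0 * math.log2(p0) + p1 * math.log2(p1))

# The gain peaks at 1 bit when the example bisects the version space.
print(binary_entropy(0.5))   # 1.0 bit
print(binary_entropy(0.9))   # ~0.47 bits
```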

9
Example
  • X = [0,1]
  • w ~ U[0,1] (a threshold concept)
  • V_i = [the max x value labeled 0, the min x value labeled 1]

10
Example - Expected Prediction Error
  • The final predictor's error is proportional to the length of the VS segment
  • Both w_final and w_target are selected uniformly from the VS (p = 1/L)
  • The error of each such pair is |w_final − w_target|
  • ⇒ Using n random examples, the error is on the order of 1/n
  • But by always querying the middle of the VS, the expected error decreases exponentially
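The 1/n versus 2^−n gap on the threshold example can be checked directly. A small simulation sketch (hypothetical helper names, assuming the uniform-threshold setup of the previous slide):

```python
import random

def passive_error(n, trials=2000, rng=random.Random(0)):
    """Average error of a passive learner: predict the midpoint of the
    version-space segment after n random labeled examples from U[0,1]."""
    total = 0.0
    for _ in range(trials):
        w = rng.random()                  # target threshold
        lo, hi = 0.0, 1.0                 # version space [lo, hi]
        for _ in range(n):
            x = rng.random()
            if x < w: lo = max(lo, x)     # x labeled 0
            else:     hi = min(hi, x)     # x labeled 1
        total += abs((lo + hi) / 2 - w)
    return total / trials

def active_error(n, trials=2000, rng=random.Random(0)):
    """Binary search: each query bisects the version space,
    so the error shrinks exponentially in the number of labels."""
    total = 0.0
    for _ in range(trials):
        w = rng.random()
        lo, hi = 0.0, 1.0
        for _ in range(n):
            x = (lo + hi) / 2             # most informative query
            if x < w: lo = x
            else:     hi = x
        total += abs((lo + hi) / 2 - w)
    return total / trials

print(passive_error(20))   # on the order of 1/n
print(active_error(20))    # on the order of 2**-n
```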

11
Query by Committee (Seung, Opper & Sompolinsky 1992; Freund, Seung, Shamir & Tishby)
  • Uses the following oracles:
  • Gibbs(V, x):
  •   h ← rand_P(V)
  •   return h(x)
  • Sample()
  • Label(x)

12
Query by Committee (Seung, Opper & Sompolinsky 1992; Freund, Seung, Shamir & Tishby)
  • While (t < T_n)
  •   x = Sample()
  •   y1 = Gibbs(V, x)
  •   y2 = Gibbs(V, x)
  •   If (y1 != y2) then
  •     Label(x) (and use it to learn and to get the new VS)
  •     t = 0
  •     Update T_n
  •   endif
  • End
  • Return Gibbs(V_n, x)
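The loop above can be sketched concretely for the 1-D threshold class from slide 9. This is a simplified illustration, not the paper's algorithm: it replaces the growing bound T_n with a fixed patience parameter `t_max`, and Gibbs sampling is just a uniform draw from the version-space interval.

```python
import random

def qbc_threshold(label, t_max=50, rng=random.Random(1)):
    """Minimal QBC sketch for thresholds on [0,1].
    `label(x)` is the costly Label oracle; Sample() is U[0,1];
    Gibbs draws a hypothesis uniformly from the version space."""
    lo, hi = 0.0, 1.0          # version space: thresholds in [lo, hi]
    t, n_labels = 0, 0
    while t < t_max:
        x = rng.random()                       # Sample()
        h1 = lo + (hi - lo) * rng.random()     # Gibbs(V, x)
        h2 = lo + (hi - lo) * rng.random()     # Gibbs(V, x)
        y1, y2 = int(x >= h1), int(x >= h2)
        if y1 != y2:                           # committee disagrees
            n_labels += 1
            if label(x) == 0: lo = max(lo, x)  # shrink version space
            else:             hi = min(hi, x)
            t = 0                              # reset patience
        else:
            t += 1
    return lo + (hi - lo) * rng.random(), n_labels   # final Gibbs draw

w = 0.37
h, n_labels = qbc_threshold(lambda x: int(x >= w))
print(abs(h - w), n_labels)   # small error from few label queries
```

Note how the disagreement filter only ever pays for labels of points that land inside the current version space; points outside it are discarded for free.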

13
QBC Finds Better Queries than Random
  • Probability of querying an example x which divides the VS into fractions F and 1−F: 2F(1−F)
  • Reminder: the information gain is H(F)
  • But this is not enough

14
Example
  • W is in [0,1]^2
  • X is a line parallel to one of the axes
  • The error is proportional to the perimeter of the VS rectangle

15
  • If, for a concept class C:
  • VCdim(C) < ∞
  • the expected information gain of queries made by QBC is uniformly lower bounded by g > 0
  • then, with probability larger than 1−δ over the target concepts, the sequence of examples, and the choices made by QBC:
  • N_Sample is bounded
  • N_Label is proportional to log(N_Sample)
  • the error probability of Gibbs(V_QBC, x) < ε

16
QBC will Always Stop
  • The information gain of all the samples (I_samples) grows more slowly as the number of samples grows (proportional to d·log(em/d))
  • The information gain from queries (I_queries) is lower bounded and thus grows linearly
  • I_samples ≥ I_queries
  • ⇒ The time between two query events grows exponentially
  • The algorithm will pass the T_n bound and stop

17
(No Transcript)
18
I_samples
  • Cumulative information gain
  • The expected cumulative information gain

19
I_samples
  • Sauer's Lemma: the number of different sets of labels for m examples is at most (em/d)^d
  • The uniform distribution over N labelings has the maximum entropy
  • ⇒ The max expected cumulative info. gain is d·log(em/d)

20
The Error Probability
  • Definition: Pr(h(x) ≠ c(x)), where h, c ~ P_VS
  • This is exactly the probability of querying a sample in QBC
  • ⇒ This is the stopping condition in QBC

21
Before We Go Further
  • The basic intuition: gain more information by choosing examples that cut the VS into parts of similar size
  • This condition is not sufficient
  • If there exists a lower bound on the expected info. gain, QBC will work
  • The error bound in QBC is based on the analogy between the problem definition and Gibbs, not on the VS cutting.

22
But in Practice
  • Proved for linear separators, if the sample space and VS distributions are uniform.
  • Is this setting realistic?
  • Implementation of Gibbs by sampling from convex bodies

23
Kernel QBC
24
What about Noise?
  • In practice labels might be noisy
  • Active learners are sensitive to noise since they
    try to minimize redundancy

25
Noise Tolerant QBC
  • do
  •   Let x be a random instance.
  •   θ1 ← rand(posterior)
  •   θ2 ← rand(posterior)
  •   If argmax_y p(y|x,θ1) ≠ argmax_y p(y|x,θ2) then
  •     ask for the label of x.
  •     Update the posterior.
  • Until no labels were requested for t consecutive instances.
  • Return rand(posterior)
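The loop above can be sketched for the noisy 1-D threshold case. This is an illustration under my own assumptions (a discretized posterior over a grid of thresholds and a known label-flip rate `eta`), not the paper's exact construction:

```python
import random

def noisy_qbc(label, eta=0.1, grid=101, patience=30, rng=random.Random(2)):
    """Noise-tolerant QBC sketch for a 1-D threshold with label noise
    rate eta, keeping an explicit posterior over a grid of thresholds."""
    thetas = [i / (grid - 1) for i in range(grid)]
    post = [1.0 / grid] * grid               # uniform prior

    def draw():                              # theta ~ posterior
        r, acc = rng.random(), 0.0
        for th, p in zip(thetas, post):
            acc += p
            if r <= acc:
                return th
        return thetas[-1]

    t, n_queries = 0, 0
    while t < patience:
        x = rng.random()
        t1, t2 = draw(), draw()
        if (x >= t1) != (x >= t2):           # predicted labels disagree
            y = label(x)                     # ask for the (noisy) label
            n_queries += 1
            for i, th in enumerate(thetas):  # Bayesian posterior update
                lik = 1 - eta if (x >= th) == (y == 1) else eta
                post[i] *= lik
            z = sum(post)
            post = [p / z for p in post]
            t = 0
        else:
            t += 1
    return draw(), n_queries                 # rand(posterior)

w = 0.6
def noisy_label(x, rng=random.Random(3)):
    y = int(x >= w)
    return 1 - y if rng.random() < 0.1 else y   # 10% label noise

theta, n_queries = noisy_qbc(noisy_label)
print(theta, n_queries)
```

Because disagreements are measured between posterior samples rather than version-space members, a single noisy label shifts the posterior instead of destroying it, which is why this variant tolerates noise where vanilla QBC does not.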

26
SVM Active Learning with Applications to Text Classification (Tong & Koller, 2001)
  • Setting: pool-based active learning
  • Aim: fast reduction of the VS's size
  • Identifying the query that halves the VS:
  • Simple Margin: choose as the next query the point closest to the current separator: min_i |w · Φ(x_i)|
  • MaxMin Margin: maximize min(m+, m−) to get a maximally balanced split
  • Ratio Margin: maximize the ratio between the two margins to get an equal split
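The Simple Margin heuristic is the cheapest of the three. A minimal sketch (the helper name and toy pool are mine; `w` stands for the current SVM weight vector, assumed already trained):

```python
def simple_margin_query(pool, w, b=0.0):
    """Simple Margin heuristic (sketch): from the unlabeled pool,
    query the point closest to the current separator,
    i.e. the one minimizing |w . x + b|."""
    def dist(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return min(pool, key=dist)

pool = [(2.0, 1.0), (0.1, -0.2), (-1.5, 0.5)]
w = (1.0, 1.0)                        # current separator (assumed given)
print(simple_margin_query(pool, w))   # -> (0.1, -0.2)
```

The point nearest the hyperplane is the one the current committee of consistent separators is least certain about, so it approximately bisects the version space.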

27
SVM Active Learning with Applications to Text Classification (Tong & Koller, 2001)
  • The VS in SVM is a set of unit weight vectors
  • (The data must be separable in the feature space)
  • Points in F ↔ hyperplanes in W

28
SVM Active Learning with Applications to Text Classification (Tong & Koller, 2001)
29
Results
  • Reuters and newsgroups data
  • Each document is represented by a ~10^5-dimensional vector of word frequencies

30
Results
31
Results
32
What's Next
  • Theory meets practice
  • New methods (other than cutting the VS)
  • Generative setting (committee-based sampling for training probabilistic classifiers, Dagan & Engelson, 1995)
  • Interesting applications

33
Thank You!