1
Active Learning with Feedback on Both Features and Instances
H. Raghavan, O. Madani and R. Jones
Journal of Machine Learning Research 7 (2006)
  • Presented by John Paisley

2
Outline
  • Discuss problem
  • Discuss proposed solution
  • Discuss results
  • Conclusion

3
Problem of Paper
  • Imagine you want to filter junk email with a classifier, and you're willing to help train that classifier by labeling examples, but you want the process to be quick because you're impatient.
  • Or imagine you want to sort a database of news articles into categories.
  • This paper is concerned with speeding this process up, i.e., reaching high performance in fewer labeling iterations.

4
Suggestion of Paper
  • Traditionally, active learning queries a user about instances (articles, emails, etc.) and the user provides a label for each instance (one-vs-rest in this paper).
  • This paper suggests that the user also be queried about features (words) and their relevance for distinguishing classes, to speed up the learning process.
  • The reason is that, in typical applications, all words of a document are used as features for classification. The feature space is therefore very high-dimensional and, with only a few labeled data points, it's hard to build a good classifier.
  • By asking about features, the dimensionality is (effectively) reduced early on, with the nuisance dimensions (effectively) removed.

5
Traditional Active Learning
  1. Several instances are selected at random and labeled by a user.
  2. A model is built (an SVM with a linear kernel here).
  3. Sequentially, the most uncertain instances (those closest to the boundary, hence "uncertainty sampling") are selected and labeled, and the model is updated.
  4. The algorithm terminates at some point (when a high enough level of performance is reached). A sketch of this loop follows below.
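
A minimal sketch of this loop, assuming scikit-learn's LinearSVC as the SVM and a label_oracle callback standing in for the human annotator (both names, and the seed/budget sizes, are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

def uncertainty_sampling(X, label_oracle, n_seed=5, n_queries=42, seed=0):
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X), size=n_seed, replace=False))  # step 1: random seed set
    y = {i: label_oracle(i) for i in labeled}
    clf = LinearSVC()
    while len(labeled) < n_queries:
        clf.fit(X[labeled], [y[i] for i in labeled])                # step 2: (re)build the model
        pool = [i for i in range(len(X)) if i not in y]
        margins = np.abs(clf.decision_function(X[pool]))            # distance to the boundary
        pick = pool[int(np.argmin(margins))]                        # step 3: most uncertain instance
        y[pick] = label_oracle(pick)
        labeled.append(pick)
    clf.fit(X[labeled], [y[i] for i in labeled])                    # step 4: stop at the query budget
    return clf
```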

6
Their Feature Feedback Addition
  1. (same) Several instances are selected at random and labeled by a user.
  2. (same) A model is built (an SVM with a linear kernel here).
  3. (same) Sequentially, the most uncertain instances (those closest to the boundary, i.e., uncertainty sampling) are selected and labeled, and the model is updated.
  4. Then, the user is shown a list of features (words) and asked whether they are relevant to distinguishing this class from others. Their algorithm incorporates this in further training by simply multiplying each relevant dimension by 10 (an arbitrary factor) to increase the impact that dimension has on classification (because of the linear kernel, I assume). A sketch of this scaling step follows below.
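
A minimal sketch of that feature-feedback step, assuming a dense NumPy feature matrix; relevant_idx (the word indices the user marked relevant) is illustrative:

```python
import numpy as np

def scale_relevant(X, relevant_idx, factor=10.0):
    X = X.astype(float)            # copy, so the original matrix is untouched
    X[:, relevant_idx] *= factor   # boost the influence of user-approved words
    return X
```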

7
How They Assess Performance (1)
  • Before humans are involved, they create an "oracle" that can rank features by importance (it has all labels a priori), as determined via Information Gain:

    IG(t) = Σ_{c' ∈ {c, ¬c}} Σ_{t' ∈ {t, ¬t}} P(c', t') log [ P(c', t') / (P(c') P(t')) ]

  • where P(c) is the probability of the class of interest, P(t) is the probability of the word of interest appearing in an article, and P(c, t) is their joint probability. The larger the IG, the more informative the word is for determining the class (e.g., "football" is informative for sports). A small computation sketch follows below.
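
A sketch of the IG computation above, treating the word's presence and the class membership as binary events (IG here is the mutual information between the two indicators):

```python
import numpy as np

def information_gain(has_word, in_class):
    """has_word, in_class: boolean arrays, one entry per document."""
    ig = 0.0
    for t in (True, False):                        # word present / absent
        for c in (True, False):                    # in class / not in class
            p_tc = np.mean((has_word == t) & (in_class == c))
            p_t, p_c = np.mean(has_word == t), np.mean(in_class == c)
            if p_tc > 0:                           # skip empty cells (0 log 0 = 0)
                ig += p_tc * np.log2(p_tc / (p_t * p_c))
    return ig
```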

8
How They Assess Performance (2)
  • They devise a performance metric they call "efficiency" (defined in the figure caption below).
  • F1 is the harmonic mean of precision and recall, where precision is the fraction of (e.g.) articles classified as "1" that are correct, and recall is the fraction of articles correctly classified as "1" out of all articles with label "1".
  • They set M = 1000, assuming that the classifier will be nearly perfect at that point; they're measuring how far active learning (ACT) falls short of that perfection compared with random sampling.

(Figure, right panel: efficiency is defined as one minus the blue area divided by the grey area, i.e., the area between the active-learning F1 curve and the F1 at M documents, relative to the same area for random sampling. They only measure efficiency after seeing 42 documents throughout the paper.)
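
A sketch of the efficiency computation as the caption describes it; f1_act and f1_rnd (F1 after t = 1..M labeled documents for active learning and random sampling) are illustrative names:

```python
import numpy as np

def efficiency(f1_act, f1_rnd):
    f1_act, f1_rnd = np.asarray(f1_act), np.asarray(f1_rnd)
    f1_final = f1_rnd[-1]                 # near-perfect F1 at M = 1000 documents
    blue = np.sum(f1_final - f1_act)      # shortfall of active learning ("blue area")
    grey = np.sum(f1_final - f1_rnd)      # shortfall of random sampling ("grey area")
    return 1.0 - blue / grey
```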
9
Results with Oracle
  • These results show the ideal performance of feature feedback, to see if it's worthwhile to begin with.
  • Basically, they select the top n features that maximize performance (via Information Gain) and do active learning, reporting the efficiency after 42 documents as well as the F1 score after 7 and 22 documents. The F1 results are upper-bounded by the far-right column. The results indicate that selecting the most informative features speeds up learning (the uninformative features are distractions for the classifier in the early stages, when there are only a few labels); a sketch of this oracle selection follows below.
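
A sketch of the oracle's feature selection under these assumptions, reusing the information_gain function sketched earlier; it keeps the n highest-IG columns of a document-term matrix X with labels y:

```python
import numpy as np

def oracle_top_n(X, y, n):
    scores = np.array([information_gain(X[:, j] > 0, y == 1)  # IG uses all labels, as the oracle may
                       for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:n]    # indices of the n most informative words
    return X[:, top], top                 # reduced matrix, plus the kept word indices
```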

10
Results with Human
  • How well can a human label features compared with the oracle, and if not as well, is feature feedback still beneficial?
  • Experiment: Have a human read an article and show them the top 20 words from the oracle mixed in with some other words. Have the user mark "relevant" or "not relevant / don't know" for each. Below, the human is compared with the oracle. Also shown is the ability of 50 labeled documents (picked via uncertainty sampling) to select the top 20 words (via Information Gain), i.e., traditional active learning after 50 documents.
  • What it says is that after seeing one document, a human can tell the relevant features better than the classifier can after 50. Kappa is a measure of how well the humans agree with one another (which they say is good); a sketch of this statistic follows below.
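
The slides don't define kappa; as one plausible reading, a minimal sketch of the standard two-rater Cohen's kappa applied to the "relevant" / "not relevant" annotations:

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_obs = np.mean(a == b)                            # observed agreement
    p_chance = sum(np.mean(a == v) * np.mean(b == v)   # agreement expected by chance
                   for v in np.union1d(a, b))
    return (p_obs - p_chance) / (1.0 - p_chance)
```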

11
Putting Humans In the Loop
  • They then took the human responses and simulated active learning with feature feedback: the experimenters were shown an article and the features to respond to (relevant or not) for that article, and they input what the humans of the previous slide said. UNC is no feature feedback, ORA is the oracle (correct answers for the feature queries), and HIL ("human in the loop") is the human response (as opposed to the oracle).
  • It says that humans speed up the active learning process.

12
Conclusions
  • Knowing which features are relevant in the early stages of active learning helps speed up the process of building an accurate classifier.
  • Far fewer instances need to be labeled for the classifier to reach high performance.
  • Humans are able to identify these features (in this case, identifying relevant words in documents).