1
Active Learning with Feedback on Both Features and Instances
H. Raghavan, O. Madani and R. Jones
Journal of Machine Learning Research 7 (2006)
  • Presented by John Paisley

2
Outline
  • Discuss problem
  • Discuss proposed solution
  • Discuss results
  • Conclusion

3
Problem of Paper
  • Imagine you want to filter junk email with a classifier, and you're willing to help train that classifier by labeling examples, but you want the process to be quick because you're impatient.
  • Or imagine you want to sort a database of news articles into categories.
  • This paper is concerned with speeding this process up, i.e., reaching high performance in fewer labeling iterations.

4
Suggestion of Paper
  • Traditionally, active learning queries a user about instances (articles, emails, etc.) and the user provides a label for each instance (one-vs-rest in this paper).
  • This paper suggests that the user also be queried about features (words) and their relevance for distinguishing classes, to speed up the learning process.
  • The reason is that, in typical applications, all words of a document are used as features for classification. The feature space is therefore very high-dimensional and, with only a few labeled data points, it's hard to build a good classifier.
  • By asking about features, the dimensionality is (effectively) reduced early on, with the nuisance dimensions (effectively) removed.

5
Traditional Active Learning
  1. Several instances are selected at random and labeled by a user.
  2. A model is built (an SVM with a linear kernel here).
  3. Sequentially, the most uncertain instances (those closest to the boundary, hence "uncertainty sampling") are selected and labeled, and the model is updated.
  4. The algorithm terminates at some point (when a high enough level of performance is reached). A sketch of this loop follows below.
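
A minimal sketch of this loop, assuming scikit-learn's LinearSVC as the SVM and a label_oracle callback standing in for the human annotator (both names, and the seed/budget sizes, are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

def uncertainty_sampling(X, label_oracle, n_seed=5, n_queries=42, seed=0):
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X), size=n_seed, replace=False))  # step 1: random seed set
    y = {i: label_oracle(i) for i in labeled}
    clf = LinearSVC()
    while len(labeled) < n_queries:
        clf.fit(X[labeled], [y[i] for i in labeled])                # step 2: (re)build the model
        pool = [i for i in range(len(X)) if i not in y]
        margins = np.abs(clf.decision_function(X[pool]))            # distance to the boundary
        pick = pool[int(np.argmin(margins))]                        # step 3: most uncertain instance
        y[pick] = label_oracle(pick)
        labeled.append(pick)
    clf.fit(X[labeled], [y[i] for i in labeled])                    # step 4: stop at the query budget
    return clf
```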

6
Their Feature Feedback Addition
  1. (same) Several instances are selected at random and labeled by a user.
  2. (same) A model is built (an SVM with a linear kernel here).
  3. (same) Sequentially, the most uncertain instances (those closest to the boundary, i.e., uncertainty sampling) are selected and labeled, and the model is updated.
  4. Then, the user is shown a list of features (words) and asked whether they are relevant to distinguishing this class from others. Their algorithm incorporates this in further training by simply multiplying each relevant dimension by 10 (an arbitrary factor) to increase the impact that dimension has on classification (because of the linear kernel, I assume). A sketch of this scaling step follows below.
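
A minimal sketch of that feature-feedback step, assuming a dense NumPy feature matrix; relevant_idx (the word indices the user marked relevant) is illustrative:

```python
import numpy as np

def scale_relevant(X, relevant_idx, factor=10.0):
    X = X.astype(float)            # copy, so the original matrix is untouched
    X[:, relevant_idx] *= factor   # boost the influence of user-approved words
    return X
```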

7
How They Assess Performance (1)
  • Before humans are involved, they create an "oracle" that can rank features by importance (it has all labels a priori), as determined via Information Gain:

    IG(t) = Σ_{c' ∈ {c, ¬c}} Σ_{t' ∈ {t, ¬t}} P(c', t') log [ P(c', t') / (P(c') P(t')) ]

  • where P(c) is the probability of the class of interest, P(t) is the probability of the word of interest appearing in an article, and P(c, t) is their joint probability. The larger the IG, the more informative the word is for determining the class (e.g., "football" is informative for sports). A small computation sketch follows below.
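
A sketch of the IG computation above, treating the word's presence and the class membership as binary events (IG here is the mutual information between the two indicators):

```python
import numpy as np

def information_gain(has_word, in_class):
    """has_word, in_class: boolean arrays, one entry per document."""
    ig = 0.0
    for t in (True, False):                        # word present / absent
        for c in (True, False):                    # in class / not in class
            p_tc = np.mean((has_word == t) & (in_class == c))
            p_t, p_c = np.mean(has_word == t), np.mean(in_class == c)
            if p_tc > 0:                           # skip empty cells (0 log 0 = 0)
                ig += p_tc * np.log2(p_tc / (p_t * p_c))
    return ig
```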

8
How They Assess Performance (2)
  • They devise a performance metric they call "efficiency" (defined in the figure caption below).
  • F1 is the harmonic mean of precision and recall, where precision is the fraction of (e.g.) articles classified as "1" that are correct, and recall is the fraction of articles correctly classified as "1" out of all articles with label "1".
  • They set M = 1000, assuming that the classifier will be nearly perfect at that point; they're measuring how far active learning (ACT) falls short of that perfection compared with random sampling.

(Figure, right panel: efficiency is defined as one minus the blue area divided by the grey area, i.e., the area between the active-learning F1 curve and the F1 at M documents, relative to the same area for random sampling. They only measure efficiency after seeing 42 documents throughout the paper.)
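
A sketch of the efficiency computation as the caption describes it; f1_act and f1_rnd (F1 after t = 1..M labeled documents for active learning and random sampling) are illustrative names:

```python
import numpy as np

def efficiency(f1_act, f1_rnd):
    f1_act, f1_rnd = np.asarray(f1_act), np.asarray(f1_rnd)
    f1_final = f1_rnd[-1]                 # near-perfect F1 at M = 1000 documents
    blue = np.sum(f1_final - f1_act)      # shortfall of active learning ("blue area")
    grey = np.sum(f1_final - f1_rnd)      # shortfall of random sampling ("grey area")
    return 1.0 - blue / grey
```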
9
Results with Oracle
  • These results show the ideal performance of feature feedback, to see if it's worthwhile to begin with.
  • Basically, they select the top n features that maximize performance (via Information Gain) and do active learning, reporting the efficiency after 42 documents as well as the F1 score after 7 and 22 documents. The F1 results are upper-bounded by the far-right column. The results indicate that selecting the most informative features speeds up learning (the uninformative features are distractions for the classifier in the early stages, when there are only a few labels); a sketch of this oracle selection follows below.
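
A sketch of the oracle's feature selection under these assumptions, reusing the information_gain function sketched earlier; it keeps the n highest-IG columns of a document-term matrix X with labels y:

```python
import numpy as np

def oracle_top_n(X, y, n):
    scores = np.array([information_gain(X[:, j] > 0, y == 1)  # IG uses all labels, as the oracle may
                       for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:n]    # indices of the n most informative words
    return X[:, top], top                 # reduced matrix, plus the kept word indices
```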

10
Results with Human
  • How well can a human label features compared with the oracle, and if not as well, is feature feedback still beneficial?
  • Experiment: Have a human read an article and show them the top 20 words from the oracle mixed in with some other words. Have the user mark "relevant" or "not relevant / don't know" for each. Below, the human is compared with the oracle. Also shown is the ability of 50 labeled documents (picked via uncertainty sampling) to select the top 20 words (via Information Gain), i.e., traditional active learning after 50 documents.
  • What it says is that after seeing one document, a human can tell the relevant features better than the classifier can after 50. Kappa is a measure of how well the humans agree with one another (which they say is good); a sketch of this statistic follows below.
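
The slides don't define kappa; as one plausible reading, a minimal sketch of the standard two-rater Cohen's kappa applied to the "relevant" / "not relevant" annotations:

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_obs = np.mean(a == b)                            # observed agreement
    p_chance = sum(np.mean(a == v) * np.mean(b == v)   # agreement expected by chance
                   for v in np.union1d(a, b))
    return (p_obs - p_chance) / (1.0 - p_chance)
```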

11
Putting Humans In the Loop
  • They then took the human responses and simulated active learning with feature feedback: the experimenters were shown an article and the features to respond to (relevant or not) for that article, and they input what the humans of the previous slide said. UNC is no feature feedback, ORA is the oracle (correct answers for the feature queries), and HIL ("human in the loop") is the human response (as opposed to the oracle).
  • It says that humans speed up the active learning process.

12
Conclusions
  • Knowing which features are relevant in the early stages of active learning helps speed up the process of building an accurate classifier.
  • Far fewer instances need to be labeled for the classifier to reach high performance.
  • Humans are able to identify these features (in this case, identifying relevant words in documents).