1
Effective Multi-Label Active Learning for Text
Classification
  • Bishan Yang (1), Jian-Tao Sun (2), Tengjiao Wang (1), and Zheng Chen (2)

(1) Computer Science Department, Peking University
(2) Microsoft Research Asia
KDD 2009, Paris
2
Outline
  • Motivation
  • Related Work
  • SVM-Based Active Learning for Multi-Label Text
    Classification
  • Experiments
  • Summary

3
Motivation
  • Text classification is everywhere
  • Web search
  • News classification
  • Email classification
  • Many text documents are multi-labeled

Example labels: Business, Politics, Travel, World news, Entertainment, Local news

4
Labeling Effort is Huge
  • Supervised learning approach
  • The model is trained on a randomly selected set of labeled data
  • Requires a sufficient amount of labeled data to ensure the quality of the model

The more categories there are, the more judging effort each document requires, and the more data needs to be labeled.
[Figure: three sample documents, each requiring a relevance judgment for every category C1-C5]
5
Active Learning Reduces Labeling Effort
[Diagram: the active learning loop - train the classifier, use the selection strategy to pick an optimal set from the data pool, query for the true labels, augment the labeled set, and retrain]
With an effective selection strategy, an active learner can obtain accuracy comparable to a supervised learner using much less labeled data. This is especially important for multi-label text classification. (A minimal sketch of the loop follows.)
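A minimal Python sketch of this loop, under stated assumptions: train, select, and query_labels are hypothetical callables standing in for the classifier trainer, the selection strategy (e.g., the MMC score sketched on slide 11), and the human annotator; the defaults mirror the experiment setup (50 iterations, S = 20).

```python
# Pool-based active learning loop (a sketch; helper callables are
# hypothetical stand-ins, not part of the original slides).

def active_learning_loop(train, select, query_labels,
                         labeled, pool, n_iterations=50, batch_size=20):
    model = train(labeled)
    for _ in range(n_iterations):
        # Selection strategy: pick the most informative batch from the pool.
        batch = select(model, pool, batch_size)
        # Query an oracle (human annotator) for the true label vectors.
        labeled.extend((x, query_labels(x)) for x in batch)
        # Augment the labeled set, shrink the pool, and retrain.
        pool = [x for x in pool if x not in batch]
        model = train(labeled)
    return model
```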
6
Challenges for Multi-Label Active Learning
  • How to select the most informative multi-labeled data?
  • Can we simply reuse a selection strategy from the single-label case? No.
  • E.g. (see the toy computation after this slide):

[Figure: per-class classifier scores - x1: 0.8, 0.5, 0.1 and x2: 0.6, 0.1, 0.1 over classes C3, C1, C2]
Is x2 more informative? Under single-label uncertainty sampling it is, since its top score (0.6) is lower than x1's (0.8). But what if x1 actually has two labels? Then x1's second score of 0.5 is highly uncertain for its own class, and a single-label strategy would wrongly skip x1.
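A toy computation of the example above (the scores are read off the figure; multilabel_uncertainty is an illustrative per-class uncertainty sum, not the paper's MMC criterion, which is developed on the following slides):

```python
# Per-class scores from the figure on this slide.
x1 = [0.8, 0.5, 0.1]
x2 = [0.6, 0.1, 0.1]

def single_label_uncertainty(scores):
    # Least-confidence sampling: 1 - max probability.
    return 1 - max(scores)

print(single_label_uncertainty(x1))  # 0.2 -> looks "certain", skipped
print(single_label_uncertainty(x2))  # 0.4 -> looks more informative

# Under a multi-label view, each class has its own binary decision, and a
# score near 0.5 is maximally uncertain for that class.
def multilabel_uncertainty(scores):
    # Sum of per-class binary uncertainties (illustrative, not MMC).
    return sum(1 - abs(2 * p - 1) for p in scores)

print(multilabel_uncertainty(x1))  # 0.4 + 1.0 + 0.2 = 1.6 -> x1 wins
print(multilabel_uncertainty(x2))  # 0.8 + 0.2 + 0.2 = 1.2
```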
7
Related Work
  • Single-label active learning
  • Uncertainty sampling [SIGIR '94, JMLR '05]
  • Aims to label the most uncertain data
  • Expected-error reduction [NIPS '95, ICML '01, ICCV '03]
  • Labels the data that minimizes the expected error
  • Committee-based [COLT '92, JMLR '02]
  • Labels the data with the largest disagreement among several committee members (classifiers) drawn from the version space
  • Multi-label active learning
  • BinMin [Springer '06]
  • Minimizes the loss on the most uncertain category for each data point
  • MML [ICIP '04]
  • Optimizes the mean of the SVM hinge loss over the predicted classes
  • Two-dimensional active learning [ICCV '08, TPAMI '08]
  • Minimizes the classification error on image-label pairs

8
Our Approach: SVM-Based Active Learning for
Multi-Label Text Classification
  • Optimization goal: maximize the reduction of the expected model loss,

    $x^* = \arg\max_{x \in \mathcal{U}} \sum_{\mathbf{y}} P(\mathbf{y} \mid x) \, \big( \mathcal{L}(\mathcal{D}_L) - \mathcal{L}(\mathcal{D}_L \cup \{(x, \mathbf{y})\}) \big)$

    where $\mathcal{U}$ is the unlabeled pool, $\mathcal{L}$ the model loss, and $\mathbf{y} = (y_1, \ldots, y_k)$ a label vector with $y_i = 1$ if $x$ belongs to category $c_i$, and $y_i = -1$ otherwise
9
Sample Selection Strategy with SVM
  • Two main issues
  • How to measure the loss reduction of the multi-label classifier?
  • How to provide a good estimate of the conditional probability $P(\mathbf{y} \mid x)$?

(These are the loss-reduction and probability-estimation terms in the optimization goal above.)
10
Estimation of Loss Reduction
  • Decompose the multi-label problem into several binary classifiers
  • For each binary classifier, the model loss is measured by the size of its version space
  • SVM version space [S. Tong '02]: $\mathcal{W}$ is the parameter space; the size of a version space is defined as the surface area it occupies on the unit hypersphere in $\mathcal{W}$ (a LaTeX sketch of the definition follows)
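For reference, a LaTeX sketch of Tong's version space definition assumed here, with $\Phi$ the kernel feature map and $(x_i, y_i)$ the labeled pairs:

```latex
% Version space: unit-norm weight vectors consistent with all labeled data.
\mathcal{V} = \left\{ w \in \mathcal{W} \;:\; \|w\| = 1,\;
    y_i \,\bigl(w \cdot \Phi(x_i)\bigr) > 0 \ \text{for all labeled } (x_i, y_i) \right\}
% Its size is the surface area it occupies on the unit hypersphere in W:
\mathrm{Size}(\mathcal{V}) = \int_{\mathcal{V}} d\sigma(w)
```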

11
Estimation of Loss Reduction (Cont.)
  • With version space duality, the loss reduction rate can be approximated using the SVM output margin
  • Maximize the sum of the loss reduction rates over all binary classifiers:

    $x^* = \arg\max_{x} \sum_{i=1}^{k} \frac{1 - y_i f_i(x)}{2}$

    where $f_i$ is the binary classifier built on the labeled data, associated with class $c_i$; $\mathrm{Size}(\mathcal{V}_i)$ is the size of its version space, whose reduction rate the $i$-th term approximates; $y_i = 1$ if $x$ belongs to class $c_i$, and $y_i = -1$ otherwise
  • If $f_i$ predicts $x$ correctly, then $y_i f_i(x) \ge 0$ and the uncertainty term $\frac{1 - y_i f_i(x)}{2}$ is at most $1/2$; if $f_i$ mispredicts $x$, then $y_i f_i(x) < 0$ and the term exceeds $1/2$ (a Python sketch follows)
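A minimal Python sketch of this selection rule, assuming f_scores holds the margin outputs $f_i(x)$ of the $k$ binary SVMs and y_pred is the predicted label vector from slides 13-14; the function names are illustrative:

```python
import numpy as np

def mmc_score(f_scores: np.ndarray, y_pred: np.ndarray) -> float:
    """Loss reduction score for one sample: sum_i (1 - y_i * f_i(x)) / 2.

    f_scores: margin outputs f_i(x) of the k binary SVMs for this sample.
    y_pred:   predicted label vector, +1 if x is predicted to belong to
              class c_i, -1 otherwise (from the label-number predictor).
    """
    return float(np.sum((1.0 - y_pred * f_scores) / 2.0))

def select_batch(pool_scores, pool_labels, batch_size=20):
    """Rank the unlabeled pool by score and return the top batch (S = 20)."""
    scores = [mmc_score(f, y) for f, y in zip(pool_scores, pool_labels)]
    return np.argsort(scores)[::-1][:batch_size]
```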
12
Probability Estimation
  • It is intractable to directly compute the expected loss function
  • Limited training data
  • The number of possible label vectors $\mathbf{y}$ for each $x$ is huge ($2^k$ for $k$ classes; see the count below)
  • Approximate the expectation by the loss at the label vector with the largest conditional probability:

    $\mathbf{y}^* = \arg\max_{\mathbf{y}} P(\mathbf{y} \mid x)$
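A toy count showing why full enumeration is infeasible (the value of k here is illustrative; RCV1-V2 has over 100 topic categories):

```python
from itertools import product

k = 20
print(2 ** k)  # 1,048,576 candidate label vectors for just 20 classes

# Full enumeration is feasible only for tiny k:
vectors = list(product([-1, 1], repeat=3))
print(len(vectors))  # 8 label vectors when k = 3
```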
13
How to predict $\mathbf{y}^*$?
  • Main ideas
  • First build a classification model to predict the number of labels each data point may have
  • Then determine the label vector based on the prediction result

14
How to predict $\mathbf{y}^*$? (Cont.)
  • Assign a probability output to each class (from the binary SVMs)
  • For each $x$, sort the class probabilities in decreasing order and normalize them so they sum to 1; these are the features, and the true label number of $x$ is the target
  • Train a logistic regression model on the labeled data
  • For each unlabeled data point, predict the probabilities of having different numbers of labels
  • If the label number with the largest probability is $j$, take the $j$ classes with the highest probabilities as the positive labels in $\mathbf{y}^*$ (a sketch of the pipeline follows)
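A sketch of this pipeline with scikit-learn; the features (sorted, normalized class probabilities) and the top-j assignment follow the slide, while the function names and the max_iter setting are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_count_features(probs: np.ndarray) -> np.ndarray:
    """Sort each row's class probabilities in decreasing order and
    normalize so they sum to 1 (the features described on this slide)."""
    sorted_probs = np.sort(probs, axis=1)[:, ::-1]
    return sorted_probs / sorted_probs.sum(axis=1, keepdims=True)

def train_label_count_model(probs_labeled, label_counts):
    """Fit logistic regression: features -> true number of labels."""
    model = LogisticRegression(max_iter=1000)
    model.fit(label_count_features(probs_labeled), label_counts)
    return model

def predict_label_vector(model, probs_unlabeled):
    """Predict y*: the j most probable classes get +1, the rest -1."""
    j_hat = model.predict(label_count_features(probs_unlabeled))
    y = -np.ones_like(probs_unlabeled)
    for row, j in enumerate(j_hat.astype(int)):
        top = np.argsort(probs_unlabeled[row])[::-1][:j]
        y[row, top] = 1.0
    return y
```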
15
Experiments
  • Data sets
  • RCV1-V2 [D. D. Lewis '04]
  • Reuters newswire stories
  • Yahoo! webpage collection [N. Ueda '02, H. Kazawa '05]
  • Webpages linked from Yahoo!'s top directory

16
Experiment Setup
  • Comparing methods
  • MMC (Maximum loss reduction with Maximal Confidence, the proposed method)
  • BinMin
  • MML
  • Random
  • SVMlight [T. Joachims '02] is used as the base classifier
  • Performance measure
  • Micro-average F1 score:

    $\text{Micro-F1} = \frac{2 \sum_{j} \sum_{i} y_i^{(j)} \hat{y}_i^{(j)}}{\sum_{j} \sum_{i} y_i^{(j)} + \sum_{j} \sum_{i} \hat{y}_i^{(j)}}$

    where $\hat{y}^{(j)}$ are the predicted labels and $y^{(j)}$ the true labels of document $j$, as 0/1 indicators
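A minimal Python check of this measure on 0/1 indicator matrices (rows = documents, columns = classes); it agrees with scikit-learn's f1_score(..., average='micro'):

```python
import numpy as np

def micro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Micro-averaged F1 over 0/1 label indicator matrices."""
    tp = np.sum(y_true * y_pred)  # true positives pooled over all classes
    return 2.0 * tp / (np.sum(y_true) + np.sum(y_pred))

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])
print(micro_f1(y_true, y_pred))  # 2*2 / (3 + 3) = 0.667
```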
17
Results on RCV1-V2 Data set
  • Compare the label prediction methods
  • The proposed prediction method
  • SCut [D. D. Lewis '04]
  • Tunes a threshold for each class
  • SCut (threshold = 0)

18
Results on RCV1-V2 Data set (Cont.)
  • Initial labeled set: 500 examples
  • 50 iterations, S = 20 (S = number of samples selected per iteration)

19
Results on RCV1-V2 Data set (Cont.)
  • Vary the size of the initial labeled set; 50 iterations, S = 20

20
Results on RCV1-V2 Data set (Cont.)
  • Vary the sampling size per iteration; initial labeled set: 500 examples
  • Stop after adding 1,000 labeled examples

21
Results on Yahoo! Data set
  • Initial labeled set: 500 examples
  • 50 iterations, S = 50

22
Summary
  • Multi-label active learning for text classification
  • Important for reducing human labeling effort
  • A challenging task
  • SVM-based multi-label active learning
  • Optimizes the loss reduction rate based on the SVM version space
  • An effective label prediction method
  • Successfully reduces labeling effort on real-world datasets
  • Future work
  • More efficient evaluation on the unlabeled pool
  • More multi-label classification tasks, e.g., image classification

23
Thank you!