Efficient Text Categorization with a Large Number of Categories
1
Efficient Text Categorization with a Large Number
of Categories
  • Rayid Ghani
  • KDD Project Proposal

2
Text Categorization
3
How do people deal with a large number of classes?
  • Use fast multiclass algorithms (e.g., Naïve Bayes), which build one model per class
  • Use binary classification algorithms (e.g., SVMs) and break an n-class problem into n binary problems
  • What happens with a 1000-class problem?
  • Can we do better?

4
ECOC to the Rescue!
  • An n-class problem can be solved by solving log₂(n) binary problems (e.g., ⌈log₂ 1000⌉ = 10 binary classifiers for a 1000-class problem, versus 1000 one-per-class models)
  • More efficient than one-per-class
  • Does it actually perform better?

5
What is ECOC?
  • Solve multiclass problems by decomposing them into multiple binary problems (Dietterich & Bakiri 1995)
  • Use a learner to learn the binary problems (a minimal training sketch follows below)
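A minimal sketch of this decomposition, assuming numpy and scikit-learn; the random code matrix, the logistic-regression base learner, and the function name are illustrative choices, not the proposal's actual code:

```python
# Illustrative ECOC training: each column of a binary code matrix defines one
# two-class relabeling of the data, and one binary learner is fit per column.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ecoc(X, y, n_classes, code_len, seed=0):
    rng = np.random.default_rng(seed)
    # One codeword (row) per class, one binary function (column) per learner.
    # Assumes each sampled column splits the classes into two non-empty groups.
    code = rng.integers(0, 2, size=(n_classes, code_len))
    learners = []
    for j in range(code_len):
        y_bin = code[y, j]  # relabel each example with bit j of its class's codeword
        learners.append(LogisticRegression(max_iter=1000).fit(X, y_bin))
    return code, learners
```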

6
Training ECOC
Testing ECOC
[Figure: the ECOC code matrix, one row per class A-D, one column per binary function f1-f5; at test time, instance X receives the predicted bit string 1 1 1 1 0.]

      f1 f2 f3 f4 f5
  A    0  0  1  1  0
  B    1  0  1  0  0
  C    0  1  1  1  0
  D    0  1  0  0  1

  X    1  1  1  1  0
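Assuming the bits in the slide graphic read row-wise as in the table above, the decoding can be checked directly with a few lines of numpy:

```python
import numpy as np

# Class codewords as recovered above (rows A-D, columns f1-f5).
code = np.array([[0, 0, 1, 1, 0],   # A
                 [1, 0, 1, 0, 0],   # B
                 [0, 1, 1, 1, 0],   # C
                 [0, 1, 0, 0, 1]])  # D
x_pred = np.array([1, 1, 1, 1, 0])  # bit string predicted for test instance X

dists = (code != x_pred).sum(axis=1)  # Hamming distances: [2, 2, 1, 4]
print("ABCD"[int(dists.argmin())])    # nearest codeword -> 'C'
```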
7
ECOC - Picture
[Figure, built up over four otherwise identical slides (7-10): the same code matrix as above, with classes A-D as rows and functions f1-f5 as columns; the final frame adds test instance X with predicted bit string 1 1 1 1 0, decoded to the nearest codeword.]
11
This Proposal
[Figure: plot of efficiency vs. classification performance, positioning NB and ECOC (as used in Berger 1999); this proposal targets high values of both.]
Preliminary Results: ECOC reduces the error of the Naïve Bayes classifier by 66% with NO increase in computational cost.
12
Proposed Solutions
  • Design codewords that minimize cost and maximize
    performance
  • Investigate the assignment of codewords to
    classes
  • Learn the decoding function
  • Incorporate unlabeled data into ECOC

13
Use unlabeled data with a large number of classes
  • How?
  • Use EM
  • Mixed Results
  • Think Again!
  • Use Co-Training
  • Disastrous Results
  • Think one more time

14
Use Unlabeled data
  • Current learning algorithms that use unlabeled data (EM, Co-Training) don't work well with a large number of categories
  • ECOC works great with a large number of classes, but there is no framework for using unlabeled data

15
Use Unlabeled Data
  • ECOC decomposes multiclass problems into binary
    problems
  • Co-Training works great with binary problems
  • ECOC + Co-Train: learn each binary problem in ECOC with Co-Training (see the sketch below)
  • Preliminary results: not so great! (very sensitive to the initial labeled documents)
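A minimal co-training sketch for a single binary ECOC problem, purely illustrative: the two feature views X1/X2, the MultinomialNB learners, and the top-k confidence scheme are assumptions, not the proposal's implementation.

```python
# Blum & Mitchell-style co-training for one binary problem: two classifiers,
# one per feature view, each labeling its most confident unlabeled examples
# for the shared pool. All names here are hypothetical.
import numpy as np
from sklearn.naive_bayes import MultinomialNB  # assumes count-style features

def co_train(X1, X2, y, labeled, unlabeled, rounds=10, k=5):
    y = np.array(y)
    labeled, unlabeled = list(labeled), list(unlabeled)
    clf1, clf2 = MultinomialNB(), MultinomialNB()
    for _ in range(rounds):
        clf1.fit(X1[labeled], y[labeled])
        clf2.fit(X2[labeled], y[labeled])
        for clf, X in ((clf1, X1), (clf2, X2)):
            if not unlabeled:
                break
            proba = clf.predict_proba(X[unlabeled])
            top = np.argsort(-proba.max(axis=1))[:k]   # k most confident examples
            newly = [unlabeled[i] for i in top]
            y[newly] = clf.classes_[proba[top].argmax(axis=1)]
            labeled += newly
            unlabeled = [u for u in unlabeled if u not in set(newly)]
    return clf1, clf2
```

The sensitivity noted above shows up here directly: the first few rounds amplify whatever the initial labeled documents suggest, so a bad seed set propagates errors into the pool.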

16
What Next?
  • Use improved version of co-training (gradient
    descent)
  • Less prone to random fluctuations
  • Uses all unlabeled data at every iteration
  • Use Co-EM (Nigam & Ghani 2000) - a hybrid of EM and Co-Training

17
Work Plan
  • Collect Datasets ?
  • Codeword Assignment - 2 weeks
  • Learning the Decoding - 1-2 weeks
  • Using Unlabeled Data - 2 weeks
  • Design Codes - 2 weeks
  • Project Write-up - 1 week

18
Summary
  • Use ECOC for efficient text classification with a
    large number of categories
  • Reduce code length without sacrificing
    performance
  • Fix code length and Increase Performance
  • Generalize to domain-independent classification
    tasks involving a large number of categories

19
Testing ECOC
  • To test a new instance:
  • Apply each of the n binary classifiers to the new instance
  • Combine the predictions to obtain a binary string (codeword) for the new point
  • Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure); see the sketch below
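A sketch of this procedure under the same assumptions as the training sketch earlier (numpy arrays, one scikit-learn learner per bit):

```python
# Illustrative ECOC decoding: predict one bit per learner, then pick the class
# whose codeword is nearest in Hamming distance.
import numpy as np

def predict_ecoc(X, code, learners):
    # (n_examples, code_len) matrix of predicted bits, one column per learner.
    bits = np.column_stack([clf.predict(X) for clf in learners])
    # Hamming distance from every example's bit string to every class codeword.
    dists = (bits[:, None, :] != code[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)  # index of the class with the nearest codeword
```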

20
The Decoding Step
  • Standard: map to the nearest codeword according to Hamming distance
  • Can we do better?

21
The Real Question?
  • Tradeoff between the learnability of the binary problems and the error-correcting power of the code

22
Codeword assignment
  • Standard procedure: assign codewords to classes randomly
  • Can we do better?

23
Goal of Current Research
  • Improve classification performance without increasing cost
  • Develop algorithms that increase performance without affecting code length
  • Design short codes that perform well
24
Previous Results
  • Performance increases with length of code
  • Gives the same percentage increase in performance
    over NB regardless of training set size
  • BCH Codes > Random Codes > Hand-constructed Codes

25
Others have shown that ECOC
  • Works great with arbitrarily long codes
  • Longer codes → more error-correcting power → better performance
  • Longer codes → more computational cost

26
ECOC to the Rescue!
  • An n-class problem can be solved by solving log₂(n) binary problems
  • More efficient than one-per-class
  • Does it actually perform better?

27
Previous Results
(Ghani 2000)
Industry Sector Data Set (accuracy, %):

  Naïve Bayes   Shrinkage¹   ME²   ME w/ Prior³   ECOC (63-bit)
  66.1          76           79    81.1           88.5

ECOC reduces the error of the Naïve Bayes classifier by 66% with no increase in computational cost.
  1. (McCallum et al. 1998)   2, 3. (Nigam et al. 1999)

28
Design codewords
  • Maximize performance (Accuracy, Precision, Recall, F1?)
  • Minimize length of codes
  • Search the space of codewords through gradient descent on G = Error + Code_Length (an illustrative search sketch follows below)
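The slides do not spell out how the search over discrete codes is carried out, so the sketch below simply scores random candidate codes against an objective of the reconstructed form; evaluate_error (e.g., cross-validated ECOC error for a given code) and the weighting lam are assumed, hypothetical pieces.

```python
# Hypothetical code-design search: sample candidate code matrices and keep the
# one minimizing G = error + lam * code_length. evaluate_error is an assumed
# callback, not part of the proposal.
import numpy as np

def search_codes(X, y, n_classes, max_len, evaluate_error, lam=0.01, trials=50):
    rng = np.random.default_rng(0)
    best_code, best_g = None, np.inf
    for _ in range(trials):
        length = int(rng.integers(4, max_len + 1))
        code = rng.integers(0, 2, size=(n_classes, length))
        g = evaluate_error(X, y, code) + lam * length  # G = Error + lam * len
        if g < best_g:
            best_code, best_g = code, g
    return best_code, best_g
```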

29
Codeword Assignment
  • Generate the confusion matrix and use it to assign the most confusable classes the codewords that are farthest apart (a greedy sketch follows below)
  • Pros: focusing more on the confusable classes can help
  • Cons: the individual binary problems can become very hard
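One plausible greedy reading of this idea (the slide does not specify an algorithm): place the most-confused classes first, giving each the unused codeword farthest, in confusion-weighted Hamming distance, from the codewords already placed.

```python
# Hypothetical greedy codeword assignment from a confusion matrix.
import numpy as np

def assign_codewords(conf, code):
    # conf[i, j]: how often class i is predicted as class j; code: pool of
    # candidate codewords (one per row), with len(code) >= number of classes.
    off = conf.astype(float).copy()
    np.fill_diagonal(off, 0)  # keep only the mistakes
    order = np.argsort(-(off.sum(axis=1) + off.sum(axis=0)))  # most confused first
    assigned, free = {}, list(range(len(code)))
    for c in order:
        # Choose the free codeword maximizing confusion-weighted Hamming
        # distance to the codewords already placed (first pick is arbitrary).
        def score(r):
            return sum((off[c, o] + off[o, c]) * (code[r] != code[ro]).sum()
                       for o, ro in assigned.items())
        best = max(free, key=score)
        assigned[c] = best
        free.remove(best)
    return np.array([code[assigned[c]] for c in range(len(conf))])
```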

30
The Decoding Step
  • Weight the individual classifiers according to their training accuracies and do weighted majority decoding (see the sketch below)
  • Pose the decoding as a separate learning problem and use regression / a neural network
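A sketch of the first variant, with the per-classifier training accuracies assumed precomputed; the learned-decoder variant would replace this fixed rule with a trained regressor or network.

```python
# Hypothetical weighted-majority decoding: a disagreement in bit j costs acc[j],
# so unreliable binary classifiers influence the nearest-codeword choice less.
import numpy as np

def weighted_decode(bits, code, acc):
    # bits: (n_examples, code_len) predicted bits; code: (n_classes, code_len)
    # codewords; acc: (code_len,) training accuracy of each binary classifier.
    dists = ((bits[:, None, :] != code[None, :, :]) * acc).sum(axis=2)
    return dists.argmin(axis=1)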
