1
Improving Text Classification by Shrinkage in a Hierarchy of Classes
  • Andrew McCallum (Just Research & CMU)
  • Tom Mitchell (CMU)
  • Roni Rosenfeld (CMU)
  • Andrew Y. Ng (MIT AI Lab)
2
  • The Task: Document Classification
  • (also called Document Categorization, Routing, or Tagging)
  • Automatically placing documents in their correct categories.

(Diagram: a test document, whose true class is Crops, is to be assigned to one of the flat categories Magnetism, Relativity, Evolution, Botany, Irrigation, and Crops; each category has training documents such as "corn wheat silo farm grow...", "water grating ditch farm tractor...", and "selection mutation Darwin Galapagos DNA...".)
3
The Idea: Shrinkage / Deleted Interpolation
We can improve the parameter estimates in a leaf by averaging them with the estimates in its ancestors.
(Diagram: the same categories and training data, now arranged in a hierarchy: Science at the root; Physics over Magnetism and Relativity; Biology over Evolution and Botany; Agriculture over Irrigation and Crops.)
4
A Probabilistic Approach to Document Classification
Naïve Bayes:
  Pr(cj | d) ∝ Pr(cj) · Π_i Pr(w_di | cj)
where cj is a class, d is a document, and w_di is the i-th word of document d.
Maximum a posteriori estimate of Pr(w | c), with a Dirichlet prior, α = 1 (a.k.a. Laplace smoothing):
  Pr(w | c) = (1 + Σ_{d in c} N(w, d)) / (|V| + Σ_{w'} Σ_{d in c} N(w', d))
where N(w, d) is the number of times word w occurs in document d, and |V| is the vocabulary size.
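This classifier and smoothing scheme can be sketched in a few lines of Python (a toy illustration, not the authors' code; the function names and training corpora below are made up):

```python
from collections import Counter
import math

def train(docs_by_class):
    """docs_by_class maps a class name to a list of token lists.
    Returns priors Pr(c) and Laplace-smoothed word models Pr(w|c)."""
    vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    priors, word_probs = {}, {}
    for c, docs in docs_by_class.items():
        priors[c] = len(docs) / total_docs
        counts = Counter(w for d in docs for w in d)
        n = sum(counts.values())
        # MAP estimate with Dirichlet prior alpha = 1 (Laplace smoothing)
        word_probs[c] = {w: (1 + counts[w]) / (len(vocab) + n) for w in vocab}
    return priors, word_probs

def classify(doc, priors, word_probs):
    """Return argmax_c of log Pr(c) + sum_i log Pr(w_i | c)."""
    def score(c):
        probs = word_probs[c]
        return math.log(priors[c]) + sum(
            math.log(probs[w]) for w in doc if w in probs)
    return max(priors, key=score)
```

Smoothing matters here: without the added count of 1, a single unseen word would zero out a class's score.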
5
Shrinkage / Deleted Interpolation
(James and Stein, 1961 / Jelinek and Mercer, 1980)
The smoothed estimate for a leaf class c is a weighted average of the estimates along its path to the root, plus a uniform distribution:
  P̂(w | c) = λ1 P(w | c) + λ2 P(w | parent(c)) + ... + λk P(w | root) + λ(k+1) · (1 / |V|)
(Diagram: Uniform above Science; Science over Physics, Biology, Agriculture; Physics over Magnetism and Relativity; Biology over Evolution and Botany; Agriculture over Irrigation and Crops.)
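The interpolated leaf estimate can be sketched as follows (illustrative code, not from the paper; `path_estimates` holds Pr(w | node) dictionaries ordered from leaf up to root):

```python
def shrinkage_estimate(word, lambdas, path_estimates, vocab_size):
    """Smoothed Pr(word | leaf): a lambda-weighted average of the leaf's
    estimate, each ancestor's estimate, and the uniform distribution.
    lambdas must sum to 1 and has one more entry than path_estimates
    (the last weight belongs to the uniform 1/|V| component)."""
    components = [est.get(word, 0.0) for est in path_estimates]
    components.append(1.0 / vocab_size)  # uniform fallback
    return sum(l * p for l, p in zip(lambdas, components))
```

Note that a word unseen anywhere on the path still gets nonzero probability from the uniform component.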
6
Learning Mixture Weights
Learn the λs via EM, performing the E-step with leave-one-out cross-validation.
E-step: use the current λs to estimate the degree to which each node on the path (e.g. Uniform, Science, Agriculture, Crops) was likely to have generated the words in held-out documents (e.g. "corn wheat silo farm grow...").
M-step: use those estimates to recalculate new values for the λs.
7
Learning Mixture Weights
E-step: for each held-out word w of class c, compute the responsibility of each node j on the path:
  βj(w) = λj P̂j(w | c) / Σm λm P̂m(w | c)
M-step: set each λj to its normalized total responsibility over the held-out words:
  λj = Σw βj(w) / Σm Σw βm(w)
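The E- and M-steps above can be sketched as a small EM loop (a toy version with a fixed held-out set rather than the paper's leave-one-out procedure; all names are illustrative):

```python
def em_mixture_weights(held_out_words, node_estimates, vocab_size,
                       n_iters=50):
    """Learn the lambdas for one leaf's path to the root.
    node_estimates: Pr(w|node) dicts ordered from leaf up to root; an
    implicit uniform 1/|V| component is appended as the last node.
    held_out_words: tokens from held-out documents of this leaf class."""
    k = len(node_estimates) + 1       # + uniform component
    lambdas = [1.0 / k] * k           # start with uniform weights
    for _ in range(n_iters):
        totals = [0.0] * k
        for w in held_out_words:
            comps = [est.get(w, 0.0) for est in node_estimates]
            comps.append(1.0 / vocab_size)
            # E-step: responsibility of each node for generating w
            weighted = [l * p for l, p in zip(lambdas, comps)]
            z = sum(weighted)
            if z == 0.0:
                continue
            for j in range(k):
                totals[j] += weighted[j] / z
        # M-step: new lambdas are the normalized responsibilities
        s = sum(totals)
        lambdas = [t / s for t in totals]
    return lambdas
```

When the leaf's own estimates explain the held-out words well, EM shifts weight toward the leaf; with sparse leaf data, weight flows to the ancestors and the uniform component.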
8
Newsgroups Data Set
(Subset of Ken Lang's 20 Newsgroups set)
(Diagram: hierarchy with top-level groups computers, religion, sport, politics, and motor; computers over graphics, ibm, mac, windows, X; religion over atheism, christian, misc; sport over baseball, hockey; politics over guns, mideast, misc; motor over auto, motorcycle.)
15 classes, 15k documents, 1.7 million words, 52k-word vocabulary
9
Newsgroups Hierarchy: Mixture Weights
10
Newsgroups Hierarchy: Mixture Weights
235 training documents (15/class)
7497 training documents (500/class)
11
Industry Sector Data Set
(www.marketguide.com)
(Diagram: hierarchy with 11 top-level sectors, including transportation, utilities, consumer, energy, and services; sample leaves include air, railroad, trucking, water, misc, coal, electric, gas, appliance, film, furniture, communication, integrated, and oil & gas.)
71 classes, 6.5k documents, 1.2 million words, 30k-word vocabulary
12
Industry Sector Classification Accuracy
13
Newsgroups Classification Accuracy
14
Yahoo Science Data Set
(www.yahoo.com/Science)
(Diagram: hierarchy with 30 top-level categories, including agriculture, biology, physics, CS, and space; sample leaves include agronomy, crops, dairy, forestry, botany, cell, evolution, magnetism, relativity, AI, courses, HCI, craft, and missions.)
264 classes, 14k documents, 3 million words, 76k-word vocabulary
15
Yahoo Science Classification Accuracy
16
Pruning the tree for computational efficiency
(www.marketguide.com)
(Diagram: the Industry Sector hierarchy shown earlier, used to illustrate pruning away lower subtrees.)
17
Related Work
  • Shrinkage in Statistics
  • Stein 1955; James & Stein 1961
  • Deleted Interpolation in Language Modeling
  • Jelinek & Mercer 1980; Seymore & Rosenfeld 1997
  • Bayesian Hierarchical Modeling for n-grams
  • MacKay & Peto 1994
  • Class hierarchies for text classification
  • Koller & Sahami 1997
  • Using EM to set mixture weights in a hierarchical clustering model for unsupervised learning
  • Hofmann & Puzicha 1998

18
Conclusions
  • Shrinkage in a hierarchy of classes can dramatically improve classification accuracy (29%).
  • Shrinkage helps especially when training data is sparse. In models more complex than naïve Bayes, it should be even more helpful.
  • The hierarchy can be pruned for an exponential reduction in the computation necessary for classification, with only minimal loss of accuracy.

19
Future Work
  • Learning hierarchies that aid classification.
  • Using more complex generative models.
  • Capturing word dependencies.
  • Clustering words in each ancestor.