1
Improving Text Classification by Shrinkage in a Hierarchy of Classes
  • Andrew McCallum (Just Research & CMU)
  • Tom Mitchell (CMU)
  • Roni Rosenfeld (CMU)
  • Andrew Y. Ng (MIT AI Lab)
2
  • The Task: Document Classification
  • (also called Document Categorization, Routing, or Tagging)
  • Automatically placing documents in their correct categories.

(Diagram: a test document, whose true class is Crops, is to be assigned to one of the flat categories Magnetism, Relativity, Evolution, Botany, Irrigation, and Crops; each category has training documents such as "corn wheat silo farm grow...", "water grating ditch farm tractor...", and "selection mutation Darwin Galapagos DNA...".)
3
The Idea: Shrinkage / Deleted Interpolation
We can improve the parameter estimates in a leaf by averaging them with the estimates in its ancestors.
(Diagram: the same categories and training data, now arranged in a hierarchy: Science at the root; Physics over Magnetism and Relativity; Biology over Evolution and Botany; Agriculture over Irrigation and Crops.)
4
A Probabilistic Approach to Document Classification
Naïve Bayes:
  Pr(cj | d) ∝ Pr(cj) · Π_i Pr(w_di | cj)
where cj is a class, d is a document, and w_di is the i-th word of document d.
Maximum a posteriori estimate of Pr(w | c), with a Dirichlet prior, α = 1 (a.k.a. Laplace smoothing):
  Pr(w | c) = (1 + Σ_{d in c} N(w, d)) / (|V| + Σ_{w'} Σ_{d in c} N(w', d))
where N(w, d) is the number of times word w occurs in document d, and |V| is the vocabulary size.
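This classifier and smoothing scheme can be sketched in a few lines of Python (a toy illustration, not the authors' code; the function names and training corpora below are made up):

```python
from collections import Counter
import math

def train(docs_by_class):
    """docs_by_class maps a class name to a list of token lists.
    Returns priors Pr(c) and Laplace-smoothed word models Pr(w|c)."""
    vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    priors, word_probs = {}, {}
    for c, docs in docs_by_class.items():
        priors[c] = len(docs) / total_docs
        counts = Counter(w for d in docs for w in d)
        n = sum(counts.values())
        # MAP estimate with Dirichlet prior alpha = 1 (Laplace smoothing)
        word_probs[c] = {w: (1 + counts[w]) / (len(vocab) + n) for w in vocab}
    return priors, word_probs

def classify(doc, priors, word_probs):
    """Return argmax_c of log Pr(c) + sum_i log Pr(w_i | c)."""
    def score(c):
        probs = word_probs[c]
        return math.log(priors[c]) + sum(
            math.log(probs[w]) for w in doc if w in probs)
    return max(priors, key=score)
```

Smoothing matters here: without the added count of 1, a single unseen word would zero out a class's score.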
5
Shrinkage / Deleted Interpolation
(James and Stein, 1961 / Jelinek and Mercer, 1980)
The smoothed estimate for a leaf class c is a weighted average of the estimates along its path to the root, plus a uniform distribution:
  P̂(w | c) = λ1 P(w | c) + λ2 P(w | parent(c)) + ... + λk P(w | root) + λ(k+1) · (1 / |V|)
(Diagram: Uniform above Science; Science over Physics, Biology, Agriculture; Physics over Magnetism and Relativity; Biology over Evolution and Botany; Agriculture over Irrigation and Crops.)
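The interpolated leaf estimate can be sketched as follows (illustrative code, not from the paper; `path_estimates` holds Pr(w | node) dictionaries ordered from leaf up to root):

```python
def shrinkage_estimate(word, lambdas, path_estimates, vocab_size):
    """Smoothed Pr(word | leaf): a lambda-weighted average of the leaf's
    estimate, each ancestor's estimate, and the uniform distribution.
    lambdas must sum to 1 and has one more entry than path_estimates
    (the last weight belongs to the uniform 1/|V| component)."""
    components = [est.get(word, 0.0) for est in path_estimates]
    components.append(1.0 / vocab_size)  # uniform fallback
    return sum(l * p for l, p in zip(lambdas, components))
```

Note that a word unseen anywhere on the path still gets nonzero probability from the uniform component.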
6
Learning Mixture Weights
Learn the λs via EM, performing the E-step with leave-one-out cross-validation.
E-step: use the current λs to estimate the degree to which each node on the path (e.g. Uniform, Science, Agriculture, Crops) was likely to have generated the words in held-out documents (e.g. "corn wheat silo farm grow...").
M-step: use those estimates to recalculate new values for the λs.
7
Learning Mixture Weights
E-step: for each held-out word w of class c, compute the responsibility of each node j on the path:
  βj(w) = λj P̂j(w | c) / Σm λm P̂m(w | c)
M-step: set each λj to its normalized total responsibility over the held-out words:
  λj = Σw βj(w) / Σm Σw βm(w)
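The E- and M-steps above can be sketched as a small EM loop (a toy version with a fixed held-out set rather than the paper's leave-one-out procedure; all names are illustrative):

```python
def em_mixture_weights(held_out_words, node_estimates, vocab_size,
                       n_iters=50):
    """Learn the lambdas for one leaf's path to the root.
    node_estimates: Pr(w|node) dicts ordered from leaf up to root; an
    implicit uniform 1/|V| component is appended as the last node.
    held_out_words: tokens from held-out documents of this leaf class."""
    k = len(node_estimates) + 1       # + uniform component
    lambdas = [1.0 / k] * k           # start with uniform weights
    for _ in range(n_iters):
        totals = [0.0] * k
        for w in held_out_words:
            comps = [est.get(w, 0.0) for est in node_estimates]
            comps.append(1.0 / vocab_size)
            # E-step: responsibility of each node for generating w
            weighted = [l * p for l, p in zip(lambdas, comps)]
            z = sum(weighted)
            if z == 0.0:
                continue
            for j in range(k):
                totals[j] += weighted[j] / z
        # M-step: new lambdas are the normalized responsibilities
        s = sum(totals)
        lambdas = [t / s for t in totals]
    return lambdas
```

When the leaf's own estimates explain the held-out words well, EM shifts weight toward the leaf; with sparse leaf data, weight flows to the ancestors and the uniform component.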
8
Newsgroups Data Set
(Subset of Ken Lang's 20 Newsgroups set)
(Diagram: hierarchy with top-level groups computers, religion, sport, politics, and motor; computers over graphics, ibm, mac, windows, X; religion over atheism, christian, misc; sport over baseball, hockey; politics over guns, mideast, misc; motor over auto, motorcycle.)
15 classes, 15k documents, 1.7 million words, 52k-word vocabulary
9
Newsgroups Hierarchy: Mixture Weights
10
Newsgroups Hierarchy: Mixture Weights
235 training documents (15/class)
7497 training documents (500/class)
11
Industry Sector Data Set
(www.marketguide.com)
(Diagram: hierarchy with 11 top-level sectors, including transportation, utilities, consumer, energy, and services; sample leaves include air, railroad, trucking, water, misc, coal, electric, gas, appliance, film, furniture, communication, integrated, and oil & gas.)
71 classes, 6.5k documents, 1.2 million words, 30k-word vocabulary
12
Industry Sector Classification Accuracy
13
Newsgroups Classification Accuracy
14
Yahoo Science Data Set
(www.yahoo.com/Science)
(Diagram: hierarchy with 30 top-level categories, including agriculture, biology, physics, CS, and space; sample leaves include agronomy, crops, dairy, forestry, botany, cell, evolution, magnetism, relativity, AI, courses, HCI, craft, and missions.)
264 classes, 14k documents, 3 million words, 76k-word vocabulary
15
Yahoo Science Classification Accuracy
16
Pruning the tree for computational efficiency
(www.marketguide.com)
(Diagram: the Industry Sector hierarchy shown earlier, used to illustrate pruning away lower subtrees.)
17
Related Work
  • Shrinkage in Statistics
  • Stein 1955; James & Stein 1961
  • Deleted Interpolation in Language Modeling
  • Jelinek & Mercer 1980; Seymore & Rosenfeld 1997
  • Bayesian Hierarchical Modeling for n-grams
  • MacKay & Peto 1994
  • Class hierarchies for text classification
  • Koller & Sahami 1997
  • Using EM to set mixture weights in a hierarchical clustering model for unsupervised learning
  • Hofmann & Puzicha 1998

18
Conclusions
  • Shrinkage in a hierarchy of classes can dramatically improve classification accuracy (29%).
  • Shrinkage helps especially when training data is sparse. In models more complex than naïve Bayes, it should be even more helpful.
  • The hierarchy can be pruned for an exponential reduction in the computation necessary for classification, with only minimal loss of accuracy.

19
Future Work
  • Learning hierarchies that aid classification.
  • Using more complex generative models.
  • Capturing word dependencies.
  • Clustering words in each ancestor.