Distributional Clustering of Words for Text Classification

1
Distributional Clustering of Words for Text
Classification
  • L. Douglas Baker
  • Andrew Kachites McCallum
  • SIGIR '98

2
Distributional Clustering
  • Word similarity based on class label distribution
    • "puck" and "goalie"
    • "team"

3
Distributional Clustering
  • Clustering words based on class distribution (supervised)
  • Similarity between words w_t and w_s is measured as the
    similarity between P(C|w_t) and P(C|w_s)
  • Information-theoretic measure to calculate
    similarity between distributions
  • Kullback-Leibler divergence to the mean
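
The class distribution P(C|w) that the bullets above rely on can be estimated from labelled documents. The following is a minimal sketch using plain occurrence counts, which is one simple estimator and not necessarily the one used in the paper. The function name `class_distributions` and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def class_distributions(docs, num_classes):
    """Estimate P(C | w) for every word from labelled documents.

    docs: iterable of (tokens, class_index) pairs.
    Returns {word: [P(c_0 | w), ..., P(c_{k-1} | w)]}.
    """
    # counts[word][c] = number of occurrences of word in documents of class c
    counts = defaultdict(lambda: [0] * num_classes)
    for tokens, label in docs:
        for tok in tokens:
            counts[tok][label] += 1
    dists = {}
    for word, row in counts.items():
        total = sum(row)
        dists[word] = [n / total for n in row]
    return dists

# Toy corpus (made up): class 0 = hockey, class 1 = baseball.
toy_docs = [
    (["puck", "goalie", "team"], 0),
    (["goalie", "puck", "puck"], 0),
    (["team", "pitcher"], 1),
]
toy_dists = class_distributions(toy_docs, num_classes=2)
# "puck" and "goalie" both get [1.0, 0.0]; "team" gets [0.5, 0.5]
```

In this toy data "puck" and "goalie" end up with identical class distributions, so a distribution-based similarity would treat them as close, while "team" is spread across both classes.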

4
Distributional Clustering
Class 8 Autos and Class 9 Motorcycles
5
Distributional Clustering
6
Kullback-Leibler Divergence
Here, D is asymmetric, and D → ∞ when P(y) = 0 and P(x) ≠ 0.
Also, D ≥ 0.
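
The equation from this slide is not preserved in the transcript. For reference, the standard definition of the Kullback-Leibler divergence between two discrete distributions is given below. Reusing the slide's P(x) and P(y) as the names of the two distributions, and i as the index over outcomes, is an assumption about the original notation.

```latex
D\bigl(P(x)\,\|\,P(y)\bigr) \;=\; \sum_{i} P(x_i)\,\log\frac{P(x_i)}{P(y_i)}
```

Terms with P(x_i) = 0 are taken to contribute zero. Swapping the two arguments generally changes the value, and any outcome with P(y_i) = 0 but P(x_i) ≠ 0 makes the sum infinite, which matches the properties stated on the slide.
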
7
Kullback-Leibler Divergence
Jensen-Shannon divergence is the special case of the
symmetrised KL divergence in which P(w_t) = P(w_s) = 0.5.
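
A minimal Python sketch of the "KL divergence to the mean" may help make the relation to Jensen-Shannon divergence concrete. The function names and the choice to normalise the two word priors so they sum to one are assumptions of this sketch, not details taken from the slides.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits; terms with p_i == 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # q gives zero mass where p does not
            total += pi * math.log2(pi / qi)
    return total

def kl_to_the_mean(p_t, p_s, prior_t=0.5, prior_s=0.5):
    """Prior-weighted KL divergence of each distribution to their mixture.

    With prior_t == prior_s this is the Jensen-Shannon divergence.
    """
    z = prior_t + prior_s
    a, b = prior_t / z, prior_s / z  # normalised weights (assumption)
    mean = [a * x + b * y for x, y in zip(p_t, p_s)]
    return a * kl_divergence(p_t, mean) + b * kl_divergence(p_s, mean)
```

Unlike plain KL divergence, this quantity is symmetric in its two arguments when the priors are equal, and it stays finite because the mixture is nonzero wherever either input is nonzero.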
8
Clustering Algorithm
Characteristics
  • Greedy, aggressive
  • Locally optimal
  • Hard clustering
  • Agglomerative
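
The characteristics listed above can be illustrated with a generic greedy agglomerative procedure: start with one cluster per word and repeatedly merge the closest pair until a target number of clusters remains. This is a sketch under stated assumptions, not necessarily the exact procedure from the paper. The names `greedy_cluster`, `word_dists`, and `word_priors` are hypothetical, all priors are assumed positive, and `num_clusters` is assumed to be at least 1.

```python
import math
from itertools import combinations

def _kl(p, q):
    # D(p || q) in bits; zero-probability terms of p contribute nothing.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def _distance(ca, cb):
    # KL divergence to the mean, weighted by normalised cluster priors.
    za = ca["prior"] / (ca["prior"] + cb["prior"])
    zb = 1.0 - za
    mean = [za * x + zb * y for x, y in zip(ca["dist"], cb["dist"])]
    return za * _kl(ca["dist"], mean) + zb * _kl(cb["dist"], mean)

def _merge(ca, cb):
    # The merged cluster's class distribution is the prior-weighted average.
    prior = ca["prior"] + cb["prior"]
    za = ca["prior"] / prior
    dist = [za * x + (1.0 - za) * y for x, y in zip(ca["dist"], cb["dist"])]
    return {"words": ca["words"] | cb["words"], "prior": prior, "dist": dist}

def greedy_cluster(word_dists, word_priors, num_clusters):
    """Hard, bottom-up clustering of words by class distribution."""
    clusters = [
        {"words": frozenset([w]), "prior": word_priors[w], "dist": list(d)}
        for w, d in word_dists.items()
    ]
    while len(clusters) > num_clusters:
        # Greedy step: pick the currently closest pair and merge it.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: _distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = _merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [set(c["words"]) for c in clusters]

# Made-up class distributions over (hockey, baseball).
example_dists = {
    "puck": [0.9, 0.1],
    "goalie": [0.8, 0.2],
    "pitcher": [0.1, 0.9],
    "inning": [0.2, 0.8],
}
example_priors = {w: 0.25 for w in example_dists}
example_clusters = greedy_cluster(example_dists, example_priors, num_clusters=2)
```

Each merge is final and each word belongs to exactly one cluster, so the result is a hard clustering that is only locally optimal: an early merge is never revisited.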
9
Experiments
  • Datasets
    • 20 Newsgroups
    • Reuters-21578
    • Yahoo Science Hierarchy
  • Compared with
    • Supervised Latent Semantic Indexing
    • Class-based clustering
    • Feature selection by mutual information with the
      class variable
    • Feature selection by Markov-blanket method
  • Classifier: naive Bayes classifier (NBC)

10
Results
11
Conclusion
  • Useful semantic word clusterings
  • Higher classification accuracy
  • Smaller classification models
  • Word clustering vs. feature selection?
  • What if the data is
    • Noisy?
    • Sparse?