Classifying Parts of Speech Based on Sparse Data

1
Classifying Parts of Speech Based on Sparse Data

Katherine Brainard
2
The Problem

Sparse data (infrequent words) offers little contextual information
Many words fall into this category
Automatic part-of-speech (PoS) taggers and finders are useful

3
Approach

Categories are relatively easy to learn from frequent words
Infrequent words are often more regular than their common counterparts
Learn the frequent words first, then use them to classify the infrequent ones
Clustering is used for the frequent words
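The slides do not say which context features, clustering algorithm, or frequency cutoff were used. The sketch below is one possible reading of the approach, under assumed choices: each word is represented by counts of its left and right neighbours (restricted to frequent words), the frequent words are grouped with a simple cosine k-means, and each infrequent word is then assigned to the nearest cluster centroid. The names classify_sparse, min_count, and k are hypothetical, and k-means is a swapped-in stand-in for whatever clustering the author actually used.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, feature_words):
    # Assumed features: counts of (side, neighbour) pairs, where the
    # neighbour must be one of the frequent "feature" words.
    vecs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        if i > 0 and tokens[i - 1] in feature_words:
            vecs[w][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in feature_words:
            vecs[w][("R", tokens[i + 1])] += 1
    return vecs


def cosine(a, b):
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    # Sum of length-normalised member vectors (cosine ignores overall scale).
    total = Counter()
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        for key, x in v.items():
            total[key] += x / norm
    return total


def kmeans(words, vecs, k, iters=10):
    # Deterministic seeding: the first k words (most frequent first) are seeds.
    centroids = [Counter(vecs[w]) for w in words[:k]]
    assign = {}
    for _ in range(iters):
        assign = {w: max(range(k), key=lambda c: cosine(vecs[w], centroids[c]))
                  for w in words}
        new_centroids = []
        for c in range(k):
            members = [vecs[w] for w in words if assign[w] == c]
            new_centroids.append(centroid(members) if members else centroids[c])
        centroids = new_centroids
    return assign, centroids


def classify_sparse(tokens, k=2, min_count=3):
    counts = Counter(tokens)
    frequent = sorted((w for w, n in counts.items() if n >= min_count),
                      key=lambda w: (-counts[w], w))
    rare = sorted(w for w, n in counts.items() if n < min_count)
    vecs = context_vectors(tokens, set(frequent))
    # Step 1: cluster the frequent words on their contexts.
    assign, centroids = kmeans(frequent, vecs, k)
    # Step 2: place each infrequent word in the most similar existing cluster.
    for w in rare:
        assign[w] = max(range(k), key=lambda c: cosine(vecs[w], centroids[c]))
    return assign


tokens = "the dog a cat the cat a dog the dog a cat the ferret".split()
print(classify_sparse(tokens))
# {'the': 1, 'a': 1, 'cat': 0, 'dog': 0, 'ferret': 0}
```

Here "ferret" occurs only once, yet its single context ("the" on its left) is enough to place it with "dog" and "cat", because those clusters were learned first from the frequent words. Cluster ids are arbitrary labels; only the grouping matters.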

4
Evaluating the Model

Somewhat tricky: the evaluation function should not encourage degenerate behavior
Evaluation is kept separate from clustering
Used both a bigram probability model and comparison with already-tagged data
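The slides name the two evaluation methods but give no formulas. A minimal sketch, assuming a class-based bigram model P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i) for the first method and "many-to-one" accuracy (label each cluster with its most common gold tag) for the second; the function names, the add-alpha smoothing, and scoring on the training text itself are all illustrative assumptions. Both measures show why degenerate behavior is a concern: a clustering with one class per word type scores at least as well on many-to-one accuracy as any coarser clustering, and it turns the class bigram model into a word bigram model that fits its own training text very closely, so held-out data matters.

```python
import math
from collections import Counter, defaultdict


def class_bigram_logprob(tokens, assign, alpha=1.0):
    """Log-probability of tokens under P(c_i | c_{i-1}) * P(w_i | c_i).

    Parameters are estimated from `tokens` themselves (no held-out split);
    class transitions use add-alpha smoothing.
    """
    classes = [assign[w] for w in tokens]
    n_classes = len(set(assign.values()))
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    transitions = Counter(zip(classes, classes[1:]))
    prev_counts = Counter(classes[:-1])
    logp = 0.0
    for i in range(1, len(tokens)):
        c_prev, c = classes[i - 1], classes[i]
        p_trans = (transitions[(c_prev, c)] + alpha) / (prev_counts[c_prev] + alpha * n_classes)
        # Hard clustering: each word belongs to exactly one class, so
        # count(w) / count(c) is a proper distribution over the words in c.
        p_emit = word_counts[tokens[i]] / class_counts[c]
        logp += math.log(p_trans) + math.log(p_emit)
    return logp


def many_to_one_accuracy(assign, tagged):
    """Map each cluster to its most frequent gold tag and score token accuracy."""
    by_cluster = defaultdict(Counter)
    for word, tag in tagged:
        by_cluster[assign[word]][tag] += 1
    correct = sum(tags.most_common(1)[0][1] for tags in by_cluster.values())
    return correct / len(tagged)


tokens = "the dog a cat the cat a dog the dog a cat the ferret".split()
good = {"the": 1, "a": 1, "dog": 0, "cat": 0, "ferret": 0}
lump = {w: 0 for w in good}  # everything in one class
tagged = [(w, "DET" if w in ("the", "a") else "NOUN") for w in tokens]

print(class_bigram_logprob(tokens, good) > class_bigram_logprob(tokens, lump))  # True
print(many_to_one_accuracy(good, tagged))  # 1.0
print(many_to_one_accuracy(lump, tagged))  # 0.5
```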

5
Results

Improvement of 36 from delaying the processing of data
About 2.5 times better than lumping all infrequent words into a single class
Using contextual data alone produced the best performance
