Classifying Parts of Speech Based on Sparse Data

1
Classifying Parts of Speech Based on Sparse Data

Katherine Brainard

2
The Problem
  • Sparse data has little contextual information
  • Many words fall into this category
  • Automatic PoS taggers and finders are useful

3
Approach
  • Relatively easy to learn categories from frequent
    words
  • Infrequent words are often more regular than their
    common counterparts
  • Learn categories for frequent words, then use these
    to classify infrequent ones
  • Uses clustering for the frequent words
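
The slides do not specify the features or the clustering algorithm. The following is a minimal Python sketch of the frequent-first idea, assuming left/right neighbor counts as context features, a greedy "leader" clustering with cosine similarity, and a simple frequency cutoff. `induce_tags`, `min_count`, `threshold`, and the other names are hypothetical:

```python
from collections import Counter, defaultdict
import math

# Assumed design (not taken from the slides): words are described by counts of
# their left/right neighbors, and only frequent words serve as features.

def context_vectors(tokens, feature_words):
    """Count each token's left and right neighbors that are feature words."""
    vecs = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0 and tokens[i - 1] in feature_words:
            vecs[word]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in feature_words:
            vecs[word]["R:" + tokens[i + 1]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def leader_cluster(words, vecs, threshold):
    """Greedy 'leader' clustering: each word joins the most similar existing
    cluster if its similarity reaches `threshold`, otherwise starts a new one."""
    clusters = []  # list of (members, summed context counts)
    for word in words:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vecs[word], cluster[1])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append(([word], Counter(vecs[word])))
        else:
            best[0].append(word)
            best[1].update(vecs[word])  # a sum is fine: cosine ignores scale
    return clusters

def classify_rare(rare_words, vecs, clusters):
    """Assign each infrequent word to the index of its most similar cluster."""
    labels = {}
    for word in rare_words:
        sims = [cosine(vecs[word], centroid) for _, centroid in clusters]
        labels[word] = max(range(len(clusters)), key=sims.__getitem__)
    return labels

def induce_tags(tokens, min_count=2, threshold=0.5):
    """Cluster frequent words first, then place infrequent words in those clusters."""
    counts = Counter(tokens)
    frequent = sorted(w for w, c in counts.items() if c >= min_count)
    rare = sorted(w for w, c in counts.items() if c < min_count)
    vecs = context_vectors(tokens, set(frequent))
    clusters = leader_cluster(frequent, vecs, threshold)
    return clusters, classify_rare(rare, vecs, clusters)

if __name__ == "__main__":
    tokens = "the cat sat . the dog sat . the cat ran . the dog ran . a cow sat .".split()
    clusters, labels = induce_tags(tokens)
    for i, (members, _) in enumerate(clusters):
        print(i, members)
    print(labels)
```

On the toy corpus in the `__main__` block, `cat`/`dog` and `sat`/`ran` form clusters. The rare words `a` and `cow` are attached to the cluster of `the` and to the cluster of `cat`/`dog`, respectively. The leader algorithm was chosen here only because it is short and deterministic; k-means or agglomerative clustering would slot into the same place.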

4
Evaluating the Model
  • Somewhat tricky: want an evaluation function that
    doesn't encourage degenerate behavior
  • Evaluation separated from clustering
  • Used both a bigram probability model and comparison
    with already-tagged data
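
The slides name a bigram probability model and a comparison with already-tagged data but give no formulas. One hedged reading is sketched below: a clustering is scored with a class-based bigram model, P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i), and against gold tags with many-to-one accuracy. Both metric choices and the function names are assumptions made for illustration:

```python
import math
from collections import Counter

# Assumed scoring functions for illustration; the slides do not give formulas.

def class_bigram_log_likelihood(tokens, cluster_of):
    """Average log P(w_i | w_{i-1}) under a class-based bigram model,
    P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i),
    using maximum-likelihood estimates from the same tokens (no smoothing)."""
    classes = [cluster_of[w] for w in tokens]
    emit = Counter(zip(classes, tokens))        # count(c, w)
    class_count = Counter(classes)              # count(c)
    trans = Counter(zip(classes, classes[1:]))  # count(c_prev, c)
    prev_count = Counter(classes[:-1])          # count(c as a predecessor)
    total = 0.0
    for i in range(1, len(tokens)):
        c_prev, c, w = classes[i - 1], classes[i], tokens[i]
        total += math.log(trans[c_prev, c] / prev_count[c_prev])
        total += math.log(emit[c, w] / class_count[c])
    return total / (len(tokens) - 1)

def many_to_one_accuracy(induced, gold):
    """Map each induced cluster to the gold tag it most often co-occurs with,
    then return the fraction of tokens whose mapped tag equals the gold tag."""
    pairs = Counter(zip(induced, gold))
    best = {}  # cluster -> (tag, count)
    for (cluster, tag), n in pairs.items():
        if n > best.get(cluster, (None, 0))[1]:
            best[cluster] = (tag, n)
    return sum(n for _, n in best.values()) / len(gold)

if __name__ == "__main__":
    tokens = "the cat sat . the dog ran .".split()
    gold = ["DET", "N", "V", ".", "DET", "N", "V", "."]
    one_lump = {w: 0 for w in tokens}
    per_word = {w: w for w in tokens}
    for name, clustering in [("one lump", one_lump), ("one per word", per_word)]:
        induced = [clustering[w] for w in tokens]
        print(name,
              round(class_bigram_log_likelihood(tokens, clustering), 3),
              many_to_one_accuracy(induced, gold))
```

On this toy input both scores prefer putting every word in its own cluster. A per-word clustering reproduces the maximum-likelihood word bigram model on its own training data, and it makes many-to-one accuracy trivially 1.0. This is one concrete form of the degenerate behavior mentioned above, and it is why such scores are typically compared at a fixed number of clusters.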

5
Results
  • Improvement of 36 from delaying processing of
    data
  • About 2.5 times better than classifying
    infrequent words into one lump
  • Using just contextual data produced the best
    performance