Classifying Unknown Proper Noun Phrases Without Context - PowerPoint PPT Presentation

About This Presentation
Title:

Classifying Unknown Proper Noun Phrases Without Context

Description:

Classifying Unknown Proper Noun Phrases Without Context. Joseph Smarr & Christopher D. Manning, Symbolic Systems Program, Stanford University, April 5, 2002.

Slides: 27
Provided by: josephs94

Transcript and Presenter's Notes

Title: Classifying Unknown Proper Noun Phrases Without Context


1
Classifying Unknown Proper Noun Phrases Without Context
  • Joseph Smarr & Christopher D. Manning
  • Symbolic Systems Program
  • Stanford University
  • April 5, 2002

2
The Problem of Unknown Words
  • No statistics are generated for unknown words →
    problematic for statistical NLP
  • Same problem for Proper Noun Phrases
  • Also need to bracket entire PNP
  • Particularly acute in domains with large number
    of terms or new words being constantly generated
  • Drug names
  • Company names
  • Movie titles
  • Place names
  • People's names

3
Proper Noun Phrase Classification
  • Task: Given a Proper Noun Phrase (one or more
    words that collectively refer to an entity),
    assign it a semantic class (e.g. drug name,
    company name, etc.)
  • Example: MUC ENAMEX test (classifying PNPs in
    text as organizations, places, and people)
  • Problem: How do we classify unknown PNPs?

4
Existing Techniques for PNP Classification
  • Large, manually constructed lists of names
  • Includes common words (Inc., Dr., etc.)
  • Syntactic patterns in surrounding context
  • XXXX himself → person
  • profession of/at/with XXXX → organization
  • Machine learning with word-level features
  • Capitalization, punctuation, special chars, etc.

5
Limitations of Existing Techniques
  • Manually constructed lists and rules
  • Slow/expensive to create and maintain
  • Domain-specific solutions
  • Won't generalize to new categories
  • Misses valuable source of information
  • People often classify PNPs by how they look

  • e.g. Cotrimoxazole, Wethersfield, Alien Fury Countdown to Invasion
6
What's in a Name?
  • Claim: If people can classify unknown PNPs
    without context, they must be using the
    composition of the PNP itself
  • Common accompanying words
  • Common letters and letter sequences
  • Number and length of words in PNP
  • Idea: Build a statistical generative model that
    captures these features from data

7
Common Words and Letter Sequences
8
Number and Length of Words
9
Generative Model Used for Classification
  • Probabilistic generative model for each category
  • Parameters set from
  • statistics in training data
  • cross-validation on held-out data (20%)
  • Standard Bayesian Classification

\[
\text{Predicted-Category}(pnp) = \arg\max_c P(c \mid pnp) = \arg\max_c P(c)^{\alpha}\, P(pnp \mid c)
\]
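The decision rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `classify`, the dictionaries `log_priors` and `log_likelihoods`, and the toy likelihoods are all hypothetical. The parameter `alpha` is an assumed exponent on the prior; with `alpha=1.0` the rule is plain Bayesian classification.

```python
import math

def classify(pnp, log_priors, log_likelihoods, alpha=1.0):
    """Return the category c that maximizes alpha*log P(c) + log P(pnp | c)."""
    scores = {
        c: alpha * log_priors[c] + log_likelihoods[c](pnp)
        for c in log_priors
    }
    return max(scores, key=scores.get)

# Toy, made-up models: each "likelihood" only looks at the final letter.
log_priors = {"drug": math.log(0.5), "person": math.log(0.5)}
log_likelihoods = {
    "drug": lambda s: math.log(0.9 if s.endswith("e") else 0.1),
    "person": lambda s: math.log(0.2 if s.endswith("e") else 0.8),
}

print(classify("Cotrimoxazole", log_priors, log_likelihoods))
print(classify("Baldwin", log_priors, log_likelihoods))
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters once the per-category likelihood is a product over words and characters.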
10
Generative Model for Each Category
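Length n-gram model and word model:
\[
P(pnp \mid c) = P_{\text{n-gram}}(\text{word-lengths}(pnp)) \prod_{w_i \in pnp} P(w_i \mid \text{word-length}(w_i))
\]
Word model: mixture of character n-gram model and common word model:
\[
P(w_i \mid len) = \lambda_{len}\, P_{\text{n-gram}}(w_i \mid len)^{k/len} + (1 - \lambda_{len})\, P_{\text{word}}(w_i \mid len)
\]
N-gram models: deleted interpolation:
\[
P_{0\text{-gram}}(symbol \mid history) = \text{uniform distribution}
\]
\[
P_{n\text{-gram}}(s \mid h) = \lambda_{C(h)}\, P_{\text{empirical}}(s \mid h) + (1 - \lambda_{C(h)})\, P_{(n-1)\text{-gram}}(s \mid h)
\]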
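The recursive interpolation on this slide can be illustrated with a small character model. This is a sketch under stated assumptions: the class `InterpolatedNGram` and its methods are hypothetical names, and the weight `C(h) / (C(h) + 1)` in `_lambda` is only a placeholder. The slides say the interpolation weights depend on the history count C(h) and are set on held-out data, but they do not give the values.

```python
from collections import defaultdict

class InterpolatedNGram:
    """Character n-gram model interpolated down to a uniform 0-gram."""

    def __init__(self, order, alphabet):
        self.order = order                      # order 3 = trigram
        self.alphabet = sorted(alphabet)
        # counts[k][history][symbol], where len(history) == k
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(order)]

    def train(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                for k in range(self.order):
                    if i >= k:
                        self.counts[k][seq[i - k:i]][sym] += 1

    def _lambda(self, history_count):
        # Placeholder weight (an assumption): trust counts more as C(h) grows.
        return history_count / (history_count + 1.0)

    def prob(self, sym, history):
        p = 1.0 / len(self.alphabet)            # 0-gram: uniform distribution
        for k in range(self.order):             # k = length of history used
            if k > len(history):
                break
            h = history[len(history) - k:]
            table = self.counts[k].get(h)
            total = sum(table.values()) if table else 0
            lam = self._lambda(total)
            empirical = table[sym] / total if total else 0.0
            p = lam * empirical + (1 - lam) * p  # interpolate with lower order
        return p

names = ["alec", "baldwin", "alan", "albert"]
model = InterpolatedNGram(order=3, alphabet=set("".join(names)))
model.train(names)
print(model.prob("e", "al"), model.prob("w", "al"))
```

Because every level mixes a proper distribution with the level below it, the result still sums to one over the alphabet, and unseen histories fall back smoothly to shorter ones.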
11
Walkthrough Example: Alec Baldwin
  • Length sequence: 0, 0, 0, 4, 7, 0
  • Words: ____Alec , lec Baldwin

[Figure: cumulative log probability]
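The inputs in this walkthrough can be reproduced with two small helpers. This is one reading of the slide, not a confirmed detail of the model: the three leading zeros suggest a length 4-gram with a trailing 0 as end marker, and the word strings suggest four characters of left context padded with underscores. The names `length_sequence` and `word_windows` and their parameters are hypothetical.

```python
def length_sequence(pnp, order=4):
    """Word lengths padded with order-1 leading zeros and one trailing zero."""
    lengths = [len(w) for w in pnp.split()]
    return [0] * (order - 1) + lengths + [0]

def word_windows(pnp, context=4, pad="_"):
    """Each word prefixed by the `context` characters that precede it."""
    text = pad * context
    windows = []
    for word in pnp.split():
        windows.append(text[-context:] + word)
        text += word + " "
    return windows

print(length_sequence("Alec Baldwin"))  # [0, 0, 0, 4, 7, 0]
print(word_windows("Alec Baldwin"))     # ['____Alec', 'lec Baldwin']
```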
12
Walkthrough Example: Baldwin
Note: Baldwin appears both in a person's name and
in a place name
13
Experimental Setup
  • Five categories of Proper Noun Phrases
  • Drugs, companies, movies, places, people
  • Train on 90% of data, test on 10%
  • 20% of training data held-out for parameter
    setting (cross validation)
  • 5000 examples per category total
  • Each result presented is average/stdev of 10
    separate train/test folds
  • Three types of tests
  • pairwise: 1 category vs. 1 category
  • 1-all: 1 category vs. union of all other
    categories
  • n-way: every category for itself
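The data split described on this slide can be sketched as follows. The helper names `make_folds` and `summarize` are hypothetical, and the slide does not say whether the 10 folds are disjoint partitions or independent random splits; this sketch assumes disjoint test sets.

```python
import random
import statistics

def make_folds(examples, n_folds=10, held_out_frac=0.2, seed=0):
    """Yield (train, held_out, test) lists, one triple per fold."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    for fold in range(n_folds):
        test = items[fold::n_folds]                  # 10% of the data
        rest = [x for i, x in enumerate(items) if i % n_folds != fold]
        cut = int(len(rest) * held_out_frac)         # 20% of the remaining 90%
        yield rest[cut:], rest[:cut], test

def summarize(accuracies):
    """Average and standard deviation of per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

folds = list(make_folds(range(5000)))
train, held_out, test = folds[0]
print(len(train), len(held_out), len(test))  # 3600 900 500
```

With 5000 examples in a category, each fold trains on 3600, tunes parameters on 900, and tests on 500.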

14
Experimental Results: Classification Accuracy
15
Experimental Results: Confusion Matrix
[Figure: confusion matrix of Correct Category vs. Predicted Category over drug, nyse, movie, place, person]
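A confusion matrix like the one on this slide can be tallied from pairs of correct and predicted labels. The sketch below uses made-up labels, not the reported results. Putting correct categories on rows and predicted categories on columns is an assumption, and `confusion_matrix` is a hypothetical helper.

```python
from collections import Counter

CATEGORIES = ["drug", "nyse", "movie", "place", "person"]

def confusion_matrix(gold, predicted, categories=CATEGORIES):
    """Rows are correct categories, columns are predicted categories."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in categories] for g in categories]

# Illustrative labels only.
gold = ["place", "person", "movie", "drug"]
predicted = ["nyse", "movie", "movie", "drug"]

for category, row in zip(CATEGORIES, confusion_matrix(gold, predicted)):
    print(f"{category:>6}", row)
```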
16
Sources of Incorrect Classification
  • Words that appear in one category drive
    classification in other categories
  • e.g. Delaware misclassified as company because of
    GTE Delaware LP, etc.
  • Inherent ambiguity
  • e.g. movies named after people/places/etc.
  • Nuremberg
  • John Henry
  • Love, Inc.
  • Prozac Nation

17
Examples of Misclassified PNPs
  • Errors from misleading words
  • Calcium Stanley
  • Best Foods (24 movies with Best, 2 companies)
  • Bloodhounds, Inc.
  • Nebraska (movie One Standing Nebraska)
  • Chris Rock (24 movies with Rock, no other people)
  • Can you classify these PNPs?
  • R C
  • Randall Hopkirk
  • Steeple Aston
  • Nandanar
  • Gerdau

18
Contribution of Model Features
  • Character n-gram is best single feature
  • Word model is good, but subsumed by character
    n-gram
  • Length n-gram helps character n-gram, but not much

19
Effect of Increasing N-Gram Length
[Figure panels: character n-gram model; length n-gram model]
  • Classification accuracy of n-gram models alone
  • Longer n-grams are useful, but only to a point

20
Effect of Increasing Training Data
  • Classifier approaches full potential with little
    training data
  • Increasing training data even more is unlikely to
    help much

21
Compensating for Word-Length Bias
  • Problem: Character n-gram model places more
    emphasis on longer words because more terms get
    multiplied
  • But are longer words really more important?
  • Solution: Raise each word's probability to the
    power k/length
  • Treat each word like a single base with an
    ignored exponent
  • Observation: Performance is best when k > 1
  • Deviation from theoretical expectation
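A small numeric sketch shows why the compensation removes the length bias. It assumes the correction means raising a word's character n-gram probability to the power k/length, which in log space is a multiplication by k/length. The function name `compensated_log_prob` and the per-character probability of 0.1 are made up for illustration.

```python
import math

def compensated_log_prob(char_log_probs, k=1.0):
    """Scale a word's character n-gram log-probability by k / length."""
    length = len(char_log_probs)
    return (k / length) * sum(char_log_probs)

# Made-up numbers: every character has probability 0.1 under the model.
short = [math.log(0.1)] * 4      # a 4-letter word such as "Alec"
long_ = [math.log(0.1)] * 7      # a 7-letter word such as "Baldwin"

# Uncompensated: the longer word contributes a much larger log term.
print(sum(short), sum(long_))

# Compensated: both words contribute k * log(0.1), regardless of length.
print(compensated_log_prob(short, k=2.0), compensated_log_prob(long_, k=2.0))
```

With the exponent applied, each word behaves like a single base probability raised to the power k, so k controls how much weight the character model gets per word.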

22
Compensating for Word-Length Bias
23
Generative Models Can Also Generate!
  • Step 1 Stochastically generate word-length
    sequence using length n-gram model
  • Step 2 Generate each word using character n-gram
    model

  • movie: Alien in Oz Dragons The Ever Harlane El Tombre
  • place: Archfield Lee-Newcastleridge Qatad
  • drug: Ambenylin Carbosil DM 49 Esidrine Plus Base with Moisturalent
  • nyse: Downe Financial Grp PR Host Manage U.S.B. Householding Ltd. Intermedia Inc.
  • person: Benedict W. Suthberg Elias Lindbert Atkinson Hugh Grob II
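The two generation steps can be sketched with a deliberately simplified model. This is not the model from the slides: it uses unsmoothed bigrams over word lengths and over characters, where the slides describe longer, interpolated n-grams. The functions `train`, `sample`, and `generate` are hypothetical, and the training names are a handful taken from the bonus slide.

```python
import random
from collections import Counter, defaultdict

def train(names):
    """Collect length-bigram and character-bigram counts (unsmoothed)."""
    length_next = defaultdict(Counter)  # previous word length -> next (0 = end)
    char_next = defaultdict(Counter)    # previous character -> next character
    for name in names:
        words = name.split()
        lengths = [0] + [len(w) for w in words] + [0]
        for prev, cur in zip(lengths, lengths[1:]):
            length_next[prev][cur] += 1
        for w in words:
            for prev, cur in zip("_" + w, w):   # "_" marks the word start
                char_next[prev][cur] += 1
    return length_next, char_next

def sample(counter, rng):
    """Draw one key with probability proportional to its count."""
    symbols = list(counter)
    return rng.choices(symbols, weights=[counter[s] for s in symbols])[0]

def generate(length_next, char_next, rng, max_words=6):
    words, prev_len = [], 0
    while len(words) < max_words:
        n = sample(length_next[prev_len], rng)  # Step 1: next word length
        if n == 0:                              # 0 marks the end of the phrase
            break
        word, prev = "", "_"
        for _ in range(n):                      # Step 2: one character at a time
            options = char_next[prev] or char_next["_"]  # fall back if unseen
            prev = sample(options, rng)
            word += prev
        words.append(word)
        prev_len = n
    return " ".join(words)

NAMES = ["Ron Kaplan", "Dan Klein", "Ivan Sag", "Emily Bender"]
length_next, char_next = train(NAMES)
rng = random.Random(0)
for _ in range(3):
    print(generate(length_next, char_next, rng))
```

Fixing the word lengths first and then filling in characters mirrors the factorization of the classifier's model, so the same counts serve both classification and generation.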
24
Acquiring Proficiency in New Domains
  • Challenge: quickly build a high-accuracy PNP
    classifier for two novel categories
  • Example: Cheese or Disease?
  • Game show on MTV's Idiot Savants
  • Results: 93.5% accuracy within 10 minutes of
    suggesting categories!
  • Not possible with previous methods

25
Conclusions
  • Reliable regularities in the way names are
    constructed
  • Can be used to complement contextual cues (e.g.
    Bayesian prior)
  • Not surprising given conscious process of
    constructing names (e.g. Prozac)
  • Statistical methods perform well without the need
    for domain-specific knowledge
  • Allows for quick generalization to new domains

26
Bonus: Does Your Name Look Like A Name?
  • Ron Kaplan
  • Dan Klein
  • Miler Lee
  • Chris Manning / Christopher D. Manning
  • Bob Moore / Robert C. Moore
  • Emily Bender
  • Ivan Sag
  • Chung-chieh Shan
  • Stu Shieber / Stuart M. Shieber
  • Joseph Smarr
  • Mark Stevenson
  • Dominic Widdows