Transcript and Presenter's Notes

Title: Word Senses and Word Sense Disambiguation

1
Word Senses and Word Sense Disambiguation
  • CIS 530 Introduction to NLP

2
I. Lexical Meaning: Word Sense
  • Slides adapted from slides by
  • Bonnie Dorr, Martha Palmer, David Yarowsky,

3
An Ambiguous Word: bank
  • The bank on State Street
  • Two clear senses
  • The rising ground bordering a lake, river or sea
  • An establishment for the custody, loan, exchange,
    or issue of money, for the extension of credit,
    and for facilitating the transmission of funds
  • Different senses are not always so easily
    delineated.

4
Word Sense Disambiguation is Important for
Machine Translation
  • Iraq lost the battle.
  • Ilakuka centwey ciessta.
  • Iraq battle lost.
  • John lost his computer.
  • John-i computer-lul ilepelyessta.
  • John computer misplaced.
  • (Korean)

5
WSD Is Required for Speech Synthesis
  • slightly elevated lead levels
  • → lead role (rhymes with seed) or
  • → lead mines (rhymes with bed)
  • The speaker produces too little bass
  • → string bass (rhymes with vase) or
  • → sea bass (rhymes with lass)

6
French/Spanish Accent Restoration
  • une famille des pecheurs
  • → pêcheurs (meaning fishermen) or
  • → pécheurs (meaning sinners)

7
WSD Really Requires Semantic Constraints
  • Iraq lost the battle.
  • Ilakuka centwey ciessta.
  • Iraq battle lost.
  • John lost his computer.
  • John-i computer-lul ilepelyessta.
  • John computer misplaced.
  • Semantic Constraints
  • lose1(Agent, Patient: competition) ↔ ciessta
  • lose2(Agent, Patient: physobj) ↔ ilepelyessta
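A real MT system would read constraints like these off a lexicon; the toy sketch below only hard-codes a few Patient types to show how the two Korean verbs get selected. The PATIENT_TYPES table and the translate_lose() helper are invented for this illustration, not part of any actual system.

```python
# Toy illustration of the selectional constraints above: choose the Korean verb
# for English "lose" from the semantic type of the Patient argument.
PATIENT_TYPES = {
    "battle": "competition",
    "game": "competition",
    "computer": "physobj",
    "keys": "physobj",
}

def translate_lose(patient: str) -> str:
    """lose1 (Patient: competition) -> ciessta; lose2 (Patient: physobj) -> ilepelyessta."""
    ptype = PATIENT_TYPES.get(patient, "physobj")  # unknown nouns default to physobj (an assumption)
    return "ciessta" if ptype == "competition" else "ilepelyessta"

print(translate_lose("battle"))    # ciessta       -- Iraq lost the battle
print(translate_lose("computer"))  # ilepelyessta  -- John lost his computer
```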

8
Lexical Relations I: Homonymy
  • A bank holds investments in a custodial account
  • Agriculture is burgeoning on the east bank
  • Variants
  • homophones: read vs. red
  • homographs: bass vs. bass

9
Lexical Relations II: Polysemy
  • The bank is constructed from red brick.
  • I withdrew the money from the bank.
  • Distinguishing polysemy from homonymy is not
    straightforward

10
Word Sense Disambiguation
  • For any given lexeme, can its senses be reliably
    distinguished?
  • Assumes a fixed set of senses for each lexical
    item

11
Lexical Relations III: Synonymy
  • What is synonymy?
  • How big is that plane?
  • How large is that plane?
  • Very hard to find true synonyms
  • A big fat apple
  • ?A large fat apple
  • Influences on substitutability
  • subtle shades of meaning differences
  • polysemy
  • register
  • collocational constraints

12
WordNet
  • Most widely used hierarchically organized lexical
    database for English (Fellbaum, 1998)

Demo: http://www.cogsci.princeton.edu/wn/
13
Word Sense and OntoNotes
  • Meanings of nouns and verbs are specified using a
    catalog of possible senses
  • All the senses are annotatable at 90% ITA
    (inter-tagger agreement)

Concerns about the pace of the Vienna talks --
which are aimed at the destruction of some
100,000 weapons, as well as major reductions and
realignments of troops in central Europe -- also
are being registered at the Pentagon.
  • registered: Enter into an official record
  • aimed: Wish, purpose or intend to achieve something

14
WSD with OntoNotes verbs
  • Picked the 217 verbs with the largest number of
    instances annotated with sense groupings
  • 35K instances total
  • WN polysemy of 10.4 reduced to 5.1
  • WN polysemy range: 59 to 2
  • Coarse polysemy range: 16 to 2
  • Results
  • Average baseline accuracy: 0.6803
  • Average ITA: 0.8253
  • Average MaxEnt accuracy: 0.8272 (not statistically
    significantly different from ITA)
  • Average SVM accuracy: 0.8220 (not statistically
    significantly different from ITA)
  • Other classifiers were also tried, with worse results
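For concreteness, here is a minimal sketch of this kind of supervised verb-sense classifier, using scikit-learn's LogisticRegression as a stand-in for the MaxEnt model (sklearn.svm.SVC would play the SVM role). The four contexts and sense labels are invented toy data and do not reproduce the OntoNotes setup or features.

```python
# Minimal supervised WSD sketch: bag-of-words features over the sentence
# containing the target verb, feeding a MaxEnt-style classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

contexts = [
    "iraq lost the battle after heavy fighting",   # sense: fail_to_win
    "the team lost the game in overtime",          # sense: fail_to_win
    "john lost his computer at the airport",       # sense: misplace
    "she lost her keys somewhere in the house",    # sense: misplace
]
senses = ["fail_to_win", "fail_to_win", "misplace", "misplace"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(contexts, senses)

print(model.predict(["the army lost the decisive battle",
                     "he lost his phone on the bus"]))
```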

15
Format of WordNet Entries
16
Distribution of Senses among WordNet Verbs
17
Lexical Relations in WordNet
18
Synsets in WordNet
  • Example: {chump, fish, fool, gull, mark, patsy,
    fall guy, sucker, schlemiel, shlemiel, soft
    touch, mug}
  • Definition: a person who is gullible and easy to
    take advantage of.
  • Important: this exact synset makes up one sense
    for each of the entries listed in the synset.
  • Theoretically, each synset can be viewed as a
    concept in a taxonomy.
  • WN represents 'give' with 45 senses, one of which
    is the synset {supply, provide, render, furnish}.
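These synsets are easy to browse programmatically. A short sketch using NLTK's WordNet interface is shown below; it assumes NLTK is installed and the WordNet data has been fetched once with nltk.download('wordnet'). The exact sense count for 'give' depends on the WordNet version bundled with NLTK.

```python
# Browse WordNet synsets like the 'chump' example above.
from nltk.corpus import wordnet as wn

# Each synset is a single sense shared by all of its member lemmas.
for syn in wn.synsets("chump", pos=wn.NOUN):
    print(syn.name(), [lemma.name() for lemma in syn.lemmas()])
    print("   ", syn.definition())

# 'give' belongs to one synset per sense; the slide cites 45 verb senses,
# but the count varies with the WordNet version.
print(len(wn.synsets("give", pos=wn.VERB)), "verb synsets for 'give'")
```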

19
Hyponymy in WordNet
20
II. Decision Lists for Word Sense Disambiguation
  • Slides adapted from slides by
  • David Yarowsky
  • (describing David's PhD dissertation work)

21
Decision Lists for Homonym Disambiguation
22
Outline of Decision List Algorithm I
23
Step 2: Collect Training Contexts
24
Step 3: Measure Collocational Distributions
25
Step 3: Measure Collocational Distributions
26
Step 4: Sort by Log-Likelihood
27
Step 5: Classify New Data
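As a concrete illustration of Steps 2-5, here is a small Python sketch of a Yarowsky-style decision list: collect collocational features from labeled contexts, score each feature by a smoothed log-likelihood ratio of the two senses, sort the features, and classify new contexts with the first matching entry. The feature set (words within two positions of the target), the smoothing constant, and the toy 'bank' data are simplifications chosen for brevity, not the configuration used in the evaluations that follow.

```python
import math
from collections import defaultdict

def features(tokens, i, window=2):
    """Words within `window` positions of the target at index i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [w for j, w in enumerate(tokens[lo:hi], start=lo) if j != i]

def build_decision_list(labeled, target, alpha=0.1):
    """labeled: (token list, sense) pairs; exactly two senses are assumed."""
    counts = defaultdict(lambda: defaultdict(float))  # feature -> sense -> count
    senses = set()
    for tokens, sense in labeled:
        senses.add(sense)
        for f in features(tokens, tokens.index(target)):
            counts[f][sense] += 1
    s1, s2 = sorted(senses)
    dlist = []
    for f, c in counts.items():
        llr = math.log((c[s1] + alpha) / (c[s2] + alpha))  # smoothed log-likelihood ratio
        dlist.append((abs(llr), f, s1 if llr > 0 else s2))
    return sorted(dlist, reverse=True)                     # strongest evidence first

def classify(dlist, tokens, target, default):
    context = set(features(tokens, tokens.index(target)))
    for _, f, sense in dlist:
        if f in context:          # first matching feature decides the sense
            return sense
    return default

train = [
    ("deposit the check at the bank".split(), "finance"),
    ("the bank raised interest rates".split(), "finance"),
    ("we fished from the river bank".split(), "river"),
    ("the muddy bank of the stream".split(), "river"),
]
dlist = build_decision_list(train, "bank")
print(classify(dlist, "the bank lowered interest rates".split(), "bank", "finance"))
```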
28
Performance: Accent Restoration
29
Performance: WSD for Machine Translation
30
Performance: Speech Synthesis
31
Comparative Evaluation I
32
Comparative Evaluation II
33
An Unsupervised(!) Algorithm
  • Yarowsky, D. "Decision Lists for Lexical
    Ambiguity Resolution: Application to Accent
    Restoration in Spanish and French."
  • In Proceedings of the 32nd Annual Meeting of the
    Association for Computational Linguistics, Las
    Cruces, NM, pp. 88-95, 1994.

34
One Sense per Discourse Hypothesis
  • Words tend to exhibit only one sense in a given
    discourse or document

35
Step 1: Identify all examples of the target word
  • Store contexts in initial untagged training set

36
Step 2: Tag examples
  • For each sense, identify a small set of labelled
    training examples
  • Use seed words: for 'plant', manufacturing vs. life

37
Sample Initial State after Step 2
38
Step 3a: Run supervised algorithm
39
OSPD (One Sense Per Discourse) constraint
40
Steps 3b, 3c: Apply classifier (+ OSPD) to all
41
Iterate until Done.
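A schematic version of this bootstrapping loop (Steps 1-3c plus iteration) is sketched below, reusing the features() and build_decision_list() functions from the decision-list sketch earlier in this transcript. The seed mechanism, confidence threshold, and stopping test are simplified placeholders, and the one-sense-per-discourse relabeling step is omitted for brevity; it assumes the seeds produce examples of both senses.

```python
def yarowsky_bootstrap(contexts, target, seeds, max_iters=10, min_score=1.0):
    """contexts: token lists containing `target`; seeds: collocate word -> sense label."""
    labeled, unlabeled = [], []
    for tokens in contexts:                         # Step 2: seed-tag a few examples
        hits = {sense for word, sense in seeds.items() if word in tokens}
        if len(hits) == 1:
            labeled.append((tokens, hits.pop()))
        else:
            unlabeled.append(tokens)

    for _ in range(max_iters):                      # Steps 3a-3c: train, relabel, repeat
        dlist = build_decision_list(labeled, target)
        newly, still = [], []
        for tokens in unlabeled:
            context = set(features(tokens, tokens.index(target)))
            match = next(((score, sense) for score, f, sense in dlist if f in context), None)
            if match and match[0] >= min_score:     # keep only confidently labeled examples
                newly.append((tokens, match[1]))
            else:
                still.append(tokens)
        if not newly:                               # nothing new was confident enough: stop
            break
        labeled, unlabeled = labeled + newly, still

    return build_decision_list(labeled, target)

# For the 'plant' example from Step 2, the seeds might be
# {"manufacturing": "factory-sense", "life": "living-sense"}.
```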
42
Final Decision List
43
Evaluation