WORD SENSE DISAMBIGUATION

Transcript and Presenter's Notes

1
WORD SENSE DISAMBIGUATION
  • Basak Mutlum
  • ECOE - 20030807

2
OUTLINE
  • Problem definition of Word Sense Disambiguation
  • WSD Approaches
  • Our Approach
  • An Example
  • Evaluation and Results
  • Related Work
  • Conclusion

3
INTRODUCTION
  • Problem: one word, many senses
  • Most words have multiple meanings or senses, so there is ambiguity about how they are to be interpreted.
  • Ex: "The money is in the bank." Riverbank? Financial institution? Row of items?

4
AMBIGUITY
  • How many senses does the noun stock have?
  • 17 senses
  • Verbs are more ambiguous.
  • How many senses does the verb fall have?
  • 32 senses (a quick way to check such WordNet sense counts is sketched below)
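These counts come from WordNet; a minimal sketch using NLTK's WordNet interface is shown below (counts vary between WordNet versions, so they may not match the WordNet 2.0 numbers above exactly).

```python
# Count WordNet senses for a word and part of speech with NLTK.
# Sense counts depend on the installed WordNet version, so they may
# differ slightly from the WordNet 2.0 numbers quoted on the slide.
from nltk.corpus import wordnet as wn

for word, pos in [("stock", wn.NOUN), ("fall", wn.VERB)]:
    senses = wn.synsets(word, pos=pos)
    print(f"{word} ({pos}): {len(senses)} senses")
    for s in senses[:3]:                     # show the first few glosses
        print("  ", s.name(), "-", s.definition())
```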

5
Word Sense Disambiguation (WSD)
  • Highly ambiguous words cause problems for Natural
    Language Processing applications.
  • Task: to determine which of the senses of an ambiguous word is invoked in a particular use of the word.

6
Word Sense
  • A meaning of a word in WordNet.
  • A word sense is an association of a particular
    word form with a particular concept.
  • Sense-tagged data means data which is
    semantically annotated with senses based on a
    sense inventory such as WordNet.
  • [Figure: word forms w1 and w2 linked to concepts c1, c2, c3 through their senses s11, s12, s21, s22]

7
Applications of WSD
  • Machine Translation
  • Information Retrieval
  • Speech Processing
  • Text Processing
  • Grammatical Analysis
  • Content Analysis

8
Methodologies for WSD
  • Supervised Disambiguation
  • During training, sense-tagged data is used.
  • Unsupervised Disambiguation
  • During the training, the sense label of a word
    occurrence is not known.

9
Supervised WSD
  • Aim: to build a classifier from the sense-tagged corpus which correctly classifies new cases.
  • Machine Learning algorithms are used:
  • Naïve Bayes
  • Decision Lists
  • Decision Trees
  • K-nearest Neighbor
  • Neural Networks

10
Naïve Bayes
  • The selected sense is s* = argmax_{s ∈ S} P(s) ∏_{v ∈ V} P(v | s), where S is the set of senses of the ambiguous word and V is its context.
  • The probability of observing the conjunction of attributes is just the product of the probabilities of the individual attributes, based on the assumption that the attribute values are conditionally independent given the target value (the sense).
  • P(s) is found by counting the frequency of sense s in the training data; P(v | s) comes from the counts of context word v with sense s (a minimal sketch follows below).
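A minimal sketch of this scoring, assuming sense-tagged training examples of the form (context words, sense); the add-one smoothing, log scores, and all names are illustrative rather than a description of any particular published system.

```python
# Naive Bayes WSD sketch.  Training data is assumed to look like
# [(["money", "deposit", "cash"], "bank_finance"), ...].  Log-probabilities
# and add-one smoothing are used to avoid underflow and zero counts.
import math
from collections import Counter, defaultdict

def train(examples):
    sense_freq = Counter()                   # counts of each sense s
    word_freq = defaultdict(Counter)         # counts of context word v given s
    vocab = set()
    for context, sense in examples:
        sense_freq[sense] += 1
        for w in context:
            word_freq[sense][w] += 1
            vocab.add(w)
    return sense_freq, word_freq, vocab

def disambiguate(context, sense_freq, word_freq, vocab):
    total = sum(sense_freq.values())
    best_sense, best_score = None, float("-inf")
    for sense, freq in sense_freq.items():
        score = math.log(freq / total)       # log P(s)
        denom = sum(word_freq[sense].values()) + len(vocab)
        for w in context:                    # add log P(v | s) for each context word
            score += math.log((word_freq[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```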

11
Decision Lists
  • Ambiguous words have a pointer to a decision
    list.
  • This list is searched for the highest-ranking match in the word's context, and the corresponding sense is returned for that match.
  • If all entries in the decision list fail to match in a particular context, a default value is used (a minimal lookup sketch follows below).
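A minimal sketch of that lookup, assuming the decision list has already been learned as (score, test, sense) triples; the toy rules for bank are purely illustrative.

```python
# Decision-list lookup sketch: rules are (score, test, sense) triples sorted
# by score; the first matching test in the word's context decides the sense,
# otherwise the default is returned.
def decision_list_wsd(context_words, rules, default_sense):
    for score, test, sense in sorted(rules, key=lambda r: r[0], reverse=True):
        if test(context_words):
            return sense
    return default_sense

# Two toy rules for the noun "bank"
rules = [
    (8.5, lambda ctx: "river" in ctx, "bank_shore"),
    (7.9, lambda ctx: "deposit" in ctx, "bank_finance"),
]
print(decision_list_wsd(["walk", "along", "the", "river"], rules, "bank_finance"))
```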

12
Supervised WSD (cont.)
  • Naïve Bayes has been frequently applied to WSD with good results (Pedersen, 2000).
  • Gale, Church and Yarowsky (1992a; 1992b) used a variant of the Bayes ratio on six ambiguous nouns and report 90% accuracy.
  • Mooney (1996) reported that Naïve Bayes and neural networks achieved the highest performance among six classifiers.
  • Yarowsky (2000) makes use of hierarchical decision lists and achieves top performance in the SENSEVAL framework on the 36 test words.

13
Resources for Sense Tagged Data
  • WordNet
  • Hand-built (but large) hierarchy of word senses
  • Basically a hierarchical thesaurus
  • SensEval
  • A WSD competition, of which there have been 3
    iterations
  • Training / test sets for a wide range of words,
    difficulties, and parts-of-speech
  • SemCor
  • A big chunk of the Brown corpus annotated with
    WordNet senses
  • Other Resources
  • The Open Mind Word Expert
  • Parallel texts

14
Problems of Supervised WSD
  • Lack of sense-tagged data
  • Bootstrapping
  • Yarowsky (1995)
  • Parallel Corpora: ambiguities are resolved in translation
  • Brown et al. 1991b
  • Ng et al. (2003)
  • A word-aligned parallel corpus can be viewed as a
    partially sense-tagged corpus
  • Gale et al. 1992

15
Data Sparseness
  • SemCor: 200,000 words
  • In our system: a 100-million-word corpus
  • English has about 100,000 words
  • A word has 3.22 senses in WordNet 2.0 on average.
  • A typical user uses 10,000-50,000 words in daily life
  • On the Internet: about 8 billion pages of unlabeled data

16
Unsupervised WSD
  • No sense-tagged data is used.
  • Yarowsky (1995)
  • "One sense per discourse": the sense of a word is highly consistent within a document
  • "One sense per collocation": a word reoccurring in collocation with the same word will almost surely have the same sense
  • Lesk (1986) used a dictionary; Yarowsky (1992) used a thesaurus
  • Use of a parallel corpus (Brown et al., 1991) or a bilingual dictionary (Dagan and Itai, 1994)

17
Word Sense Discrimination
  • In word sense discrimination, the goal is to
    discriminate senses without having a predefined
    set of senses to choose from.
  • Word sense discrimination typically uses
    unsupervised learning methods to group
    tokens/contexts into clusters based on some
    similarity metric.
  • Schütze and Pedersen 1995
  • Pedersen and Bruce, 1997
  • Schütze, 1998

18
Knowledge Bases for WSD
  • To specify the senses of a word, various sense inventories have been used:
  • Machine-Readable Dictionaries (MRDs)
  • Oxford Advanced Learner's Dictionary
  • Longman Dictionary of Contemporary English
  • Thesauri
  • Roget's International Thesaurus
  • Computational Lexicons
  • WordNet
  • COMLEX

19
OUTLINE
  • Problem definition of Word Sense Disambiguation
  • WSD Approaches
  • Our Approach
  • An Example
  • Evaluation and Results
  • Related Work
  • Conclusion

20
OUR APPROACH
  • An unsupervised algorithm based on sense
    similarity and syntactic context.
  • Main intuition:
  • Two different words are likely to have similar meanings if they occur in similar local contexts.

21
Resources of the Algorithm
  • an untagged training corpus
  • 100-million-word WSJ corpus
  • a concept hierarchy
  • WordNet 2.0
  • a similarity measure
  • Lin's (1997) similarity measure
  • a broad-coverage parser
  • MINIPAR

22
LOCAL CONTEXT
  • Ex: "I will buy a red car."
  • buy has 5 meanings in WordNet 2.0.
  • The local context of buy is:
  • ( V subj I mod N )
  • ( V obj car mod N )
  • ( V aux will mod Aux )

23
LOCAL CONTEXT (cont.)
  • word: local context
  • buy: ( V aux will mod Aux ), ( V subj I mod N ), ( V obj car mod N )
  • will: ( Aux aux buy head V )
  • I: ( N subj buy head V )
  • car: ( N obj buy head V ), ( N mod red mod A ), ( N det a mod Det )
  • red: ( A mod car head N )
  • a: ( Det det car head N )
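The feature format shown above can be modeled as plain tuples; the sketch below does this by hand for "I will buy a red car", since running MINIPAR itself is outside the scope of this example, and the database layout is an assumption.

```python
# Local-context features as (category, relation, related_word, role, related_category)
# tuples, hand-written here for "I will buy a red car" to mirror the slide.
from collections import defaultdict

local_context = {
    "buy": [("V", "aux", "will", "mod", "Aux"),
            ("V", "subj", "I", "mod", "N"),
            ("V", "obj", "car", "mod", "N")],
    "car": [("N", "obj", "buy", "head", "V"),
            ("N", "mod", "red", "mod", "A"),
            ("N", "det", "a", "mod", "Det")],
}

# A feature database maps each feature to the set of words observed with it,
# e.g. all nouns ever seen as the object of "buy" in the training corpus.
feature_db = defaultdict(set)
for word, feats in local_context.items():
    for feat in feats:
        feature_db[feat].add(word)
```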

24
  • GRAMMATICAL RELATIONSHIPS OF MINIPAR
  • appo: "ACME president, --appo-> P.W. Buckman"
  • aux: "should <-aux-- resign"
  • be: "is <-be-- sleeping"
  • c: "that <-c-- John loves Mary"
  • comp1: first complement
  • det: "the <-det-- hat"
  • gen: "Jane's <-gen-- uncle"
  • have: "have <-have-- disappeared"
  • i: relationship between a C clause and its I clause
  • inv-aux: inverted auxiliary: "Will <-inv-aux-- you stop it?"
  • inv-be: inverted be: "Is <-inv-be-- she sleeping"
  • inv-have: inverted have: "Have <-inv-have-- you slept"
  • mod: relationship between a word and its adjunct modifier
  • pnmod: post-nominal modifier
  • p-spec: specifier of prepositional phrases
  • pcomp-c: clausal complement of prepositions
  • pcomp-n: nominal complement of prepositions

25
Posterior Modifications of MINIPAR
Rule-1: Prepositions
  • I am sitting at a table.
  • sit ( V mod at mod Prep )
  • at ( Prep mod sit head V )
  • at ( Prep pcomp-n table mod N )
  • table ( N pcomp-n at head Prep )
  • sit ( V prep-at table modifier N )
  • table ( N prep-at sit head V )
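A minimal sketch of the Rule-1 transformation, assuming MINIPAR output is kept as (word, category, relation, related word, role, related category) tuples in the format of the previous slides; the representation and function name are illustrative.

```python
# Rule-1 sketch: collapse a preposition node into a direct "prep-X" relation
# between its head (e.g. the verb "sit") and its nominal complement ("table").
def apply_rule1(triples):
    out = list(triples)
    for w, cat, rel, other, role, ocat in triples:
        if cat == "Prep" and rel == "pcomp-n" and role == "mod":
            prep, noun, ncat = w, other, ocat
            # find the head that this preposition modifies
            for w2, cat2, rel2, other2, role2, ocat2 in triples:
                if w2 == prep and rel2 == "mod" and role2 == "head":
                    head, hcat = other2, ocat2
                    out.append((head, hcat, "prep-" + prep, noun, "modifier", ncat))
                    out.append((noun, ncat, "prep-" + prep, head, "head", hcat))
    return out

triples = [
    ("sit", "V", "mod", "at", "mod", "Prep"),
    ("at", "Prep", "mod", "sit", "head", "V"),
    ("at", "Prep", "pcomp-n", "table", "mod", "N"),
    ("table", "N", "pcomp-n", "at", "head", "Prep"),
]
# apply_rule1(triples) adds ("sit", "V", "prep-at", "table", "modifier", "N")
# and ("table", "N", "prep-at", "sit", "head", "V"), matching the slide.
```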

26
Rule-2: s-ob relation
  • The gardener cut the grass.
  • cut ( V subj gardener mod N )
  • gardener ( N subj cut head V )
  • cut ( V obj grass mod N )
  • grass ( N obj cut head V )
  • gardener ( N s-ob grass modifier N )
  • grass ( N s-ob gardener head N )

27
Rule-3: Subordinates
  • The thief who stole her bag was caught.
  • thief ( N rel fin mod C )
  • fin ( C rel thief head N )
  • fin ( C whn who mod N )
  • who ( N whn fin head C )
  • fin ( C i steal mod V )
  • steal ( V i fin head C )
  • steal ( V subj who mod N )
  • who ( N subj steal head V )
  • steal ( V subj thief modifier N )
  • thief ( N subj steal head V )

28
Rule-4: subj-be relation
  • His car is very fast.
  • be ( VBE pred fast mod A )
  • fast ( A pred be head VBE )
  • fast ( A subj car mod N )
  • car ( N subj fast head A )
  • fast ( A subj-be car modifier N )
  • car ( N subj-be fast head A )

29
ALGORITHM
  • Parse the training corpus and build up a local
    context feature database by extracting local
    contexts and applying the rules.
  • Parse the input text and extract local context of
    each ambiguous noun.

30
Example
  • Jane applied for a job.
  • job has 15 meanings in WordNet 2.0.
  • The local context of job is:
  • ( N prep-for apply head V )

31
ALGORITHM
  • Parse the training corpus and build up a local
    context feature database by extracting local
    contexts and applying the rules.
  • Parse the input text and extract local context of
    each ambiguous noun.
  • For each ambiguous noun w,
  • search the local context feature database for
    each feature in its local context and find words
    that are in a similar local context as w.

32
Example (cont.)
  • Other words that appeared in the prep-for relation with the verb apply:

33
ALGORITHM
  • Parse the training corpus and build up a local
    context feature database by extracting local
    contexts and applying the rules.
  • Parse the input text and extract local context of
    each ambiguous noun.
  • For each ambiguous noun w,
  • search the local context feature database for
    each feature in its local context and find words
    that are in a similar local context as w.
  • These words are called the selector set S of w.
  • Select the sense of w that maximizes the similarity between the senses of w and the senses of its selector set S (a sketch of this selection step follows below).
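A minimal sketch of that selection step using NLTK's WordNet interface and the Brown information-content file; the slides do not spell out the exact support weighting (which later yields values such as 5.25), so summing each selector's best Lin similarity is only an approximation, and the function name is illustrative.

```python
# Pick the noun sense of `word` with the largest total Lin similarity to the
# senses of its selectors.  Requires the NLTK wordnet and wordnet_ic data.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")

def choose_sense(word, selectors):
    best_sense, best_support = None, 0.0
    for sense in wn.synsets(word, pos=wn.NOUN):
        support = 0.0
        for sel in selectors:
            sims = [sense.lin_similarity(s, brown_ic)
                    for s in wn.synsets(sel, pos=wn.NOUN)]
            if sims:
                support += max(sims)         # best-matching sense of this selector
        if support > best_support:
            best_sense, best_support = sense, support
    return best_sense, best_support

# e.g. choose_sense("work", ["politics", "business", "journalism", "sport"])
```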

34
Lin Similarity Measure
  • Similarity between hill and coast is:
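Lin's (1997) measure, which this example instantiates: lso(c1, c2) is the lowest common subsumer of the two concepts in the WordNet hierarchy and P(c) the probability of encountering an instance of concept c; for hill and coast the subsumer is geological formation, and Lin reports a similarity of roughly 0.59 for this pair.

```latex
% Lin's information-theoretic similarity between concepts c_1 and c_2
\[
  \mathrm{sim}(c_1, c_2) \;=\;
  \frac{2\,\log P\bigl(\mathrm{lso}(c_1, c_2)\bigr)}{\log P(c_1) + \log P(c_2)}
\]
% Instantiated for hill and coast, whose lowest common subsumer is
% "geological formation":
\[
  \mathrm{sim}(\mathit{hill}, \mathit{coast}) \;=\;
  \frac{2\,\log P(\textit{geological formation})}
       {\log P(\mathit{hill}) + \log P(\mathit{coast})} \;\approx\; 0.59
\]
```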

35
Similarity Matrix
36
A Walk Through Example
  • Ex: "... the modern world of work."
  • Let's disambiguate the noun work.
  • The local context of work is:
  • ( N prep-of world head N )
  • work has 7 senses in WordNet 2.0 as a noun.

37
A Walk Through Example (cont.)
  • Selectors of work that have been used with "world of":
  • wonder, medicine, politics, computer, finance, fashion, banking, telecommunication, work, music, business, channel, difference, retailing, art, bloc, journalism, advertising, sport, computing

38
A Walk Through Example (cont.)
  • Senses of work
  • 1. (435) work -- (activity directed toward making
    or doing something "she checked several points
    needing further work")
  • 2. (359) work, piece of work -- (a product
    produced or accomplished through the effort or
    activity or agency of a person or thing "it is
    not regarded as one of his more memorable works"
    "the symphony was hailed as an ingenious work"
    "he was indebted to the pioneering work of John
    Dewey" "the work of an active imagination"
    "erosion is the work of wind or water over time")
  • 3. (96) employment, work -- (the occupation for
    which you are paid "he is looking for
    employment" "a lot of people are out of work")
  • 4. (44) study, work -- (applying the mind to
    learning and understanding a subject (especially
    by reading) "mastering a second language
    requires a lot of work" "no schools offer
    graduate study in interior design")
  • 5. (41) oeuvre, work, body of work -- (the total
    output of a writer or artist (or a substantial
    part of it) "he studied the entire Wagnerian
    oeuvre" "Picasso's work can be divided into
    periods")
  • 6. (18) workplace, work -- (a place where work is
    done "he arrived at work early today")
  • 7. (14) work -- ((physics) a manifestation of energy; the transfer of energy from one physical system to another, expressed as the product of a force and the distance through which it moves a body in the direction of that force; "work equals force times distance")

39
A Walk Through Example (cont.)
  • Sense-1 and Sense-3 are subclasses of act, human action, human activity.
  • Sense-3 is also a kind of occupation, business, job, line of work, line.
  • Sense-2 and Sense-5 are subclasses of artifact.
  • Sense-4 is a subclass of basic cognitive process.
  • Sense-6 is a kind of location.
  • Sense-7 is a physical phenomenon.

40
A Walk Through Example (cont.)
  • When the WordNet 2.0 hierarchy is examined, it is found that many of the selectors have an act, human action, human activity sense:
  • politics, business, journalism, sport, medicine, banking, art, fashion, music, channel, retailing, computing, advertising
  • Moreover, politics, business, journalism, sport, and medicine also have the sense occupation, business, job, line of work, line.
  • Since Sense-3 received the majority of support from the selectors, it was selected as the correct sense, with a support value of 5.25.
  • The other senses had support from only one or two selector words (a sketch of this hierarchy check follows below).
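A minimal sketch of that hierarchy check with NLTK's WordNet interface; looking the node up via the lemma human_activity is an assumption about which synset corresponds to the slide's act, human action, human activity, and results may differ slightly from WordNet 2.0.

```python
# Count how many selectors have at least one noun sense under a given
# WordNet hypernym (here the act / human action / human activity subtree).
from nltk.corpus import wordnet as wn

def selectors_under(hypernym, selectors):
    hits = []
    for sel in selectors:
        for sense in wn.synsets(sel, pos=wn.NOUN):
            ancestors = set(sense.closure(lambda s: s.hypernyms()))
            if hypernym in ancestors:
                hits.append(sel)
                break
    return hits

act = wn.synsets("human_activity", pos=wn.NOUN)[0]   # act, human action, human activity
selectors = ["politics", "business", "journalism", "sport", "medicine",
             "banking", "art", "fashion", "music", "retailing", "advertising"]
print(selectors_under(act, selectors))
```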

41
OUTLINE
  • Problem definition of Word Sense Disambiguation
  • WSD Approaches
  • Our Approach
  • An Example
  • Evaluation and Results
  • Related Work
  • Conclusion

42
EVALUATION
  • SENSEVAL-2 and SENSEVAL-3 English all-words task
    data were used for evaluation.
  • Only nouns were disambiguated.
  • There are 1136 annotated nouns in SENSEVAL-2
  • 951 annotated nouns in SENSEVAL-3

43
RESULTS
  • Results on SENSEVAL data

44
RESULTS (cont.)
  • Top-10 performing systems in SENSEVAL-2

45
RESULTS (cont.)
  • Reasons why the recall of our system is low:
  • words given wrong POS tags by the parser are eliminated
  • local context features of the nouns sometimes did not exist in the local context feature database
  • there were times when all of the support values became zero, so no sense could be selected and the algorithm stopped.
  • An accuracy of 62% is obtained if the first-sense heuristic is used for the unanswered cases (a minimal fallback sketch follows below).
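A minimal sketch of that first-sense fallback with NLTK's WordNet interface; the function name is illustrative.

```python
# Back off to WordNet's first (most frequent) noun sense when the similarity
# maximization step returns no sense at all.
from nltk.corpus import wordnet as wn

def first_sense_fallback(word, chosen_sense):
    if chosen_sense is not None:
        return chosen_sense
    senses = wn.synsets(word, pos=wn.NOUN)   # WordNet lists senses by frequency
    return senses[0] if senses else None
```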

46
RESULTS (cont.)
  • Comparison of similarity measures

47
OUTLINE
  • Problem definition of Word Sense Disambiguation
  • WSD Approaches
  • Our Approach
  • An Example
  • Evaluation and Results
  • Related Work
  • Conclusion

48
Related Work
  • Very similar to the algorithm presented by Lin
    (1997)
  • A 25-million-word Wall Street Journal corpus was parsed
  • An accuracy of 56.1% is reported for nouns.
  • Since the Wall Street Journal is mostly business news, Lin used only the press reportage part of the SemCor corpus for testing.
  • The similarity maximization algorithm and the similarity measure used are the same.
  • When Lin's algorithm is run on our data, an accuracy of 46.5% is obtained, while we obtained 59.1% accuracy.

49
Related Work (cont.)
  • Differences from Lin's work:
  • new grammatical relationships are introduced
  • every occurrence of the same word in the text is treated independently, whereas Lin adopted the one sense per discourse heuristic advocated in (Gale et al., 1992).
  • A comparison between two similarity measures is made by using the Lesk similarity measure from the WordNet::Similarity package (the gloss-overlap idea behind Lesk is illustrated below).
  • In the evaluation phase, we didn't restrict our test cases based on the topical content of the training data, as Lin did by using only the press reportage part of the SemCor corpus.
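As an illustration of the gloss-overlap idea behind the Lesk measure, NLTK ships a simplified Lesk algorithm; this is not the WordNet::Similarity implementation used in the comparison above, only a readily available approximation.

```python
# Simplified Lesk: choose the sense whose gloss overlaps most with the context.
from nltk.wsd import lesk

context = "I went to the bank to deposit my paycheck".split()
print(lesk(context, "bank", pos="n"))        # returns a noun Synset of "bank"
```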

50
Related Work (cont.)
  • Resnik, 1995
  • used a similar algorithm to our similarity maximization algorithm
  • Stetina et al. (1998)
  • achieved good results with syntactic relations as
    features
  • Martinez et al. (2002)
  • discussed the contribution of various syntactic
    features to WSD
  • the set of syntactic features was extracted using
    Minipar

51
CONCLUSION
  • Syntactic relations are helpful for WSD.
  • In the similarity maximization part, not every feature should be treated equally.
  • The intuition that similar words occur in similar
    contexts does not always hold.
  • Unsupervised methods should be examined more
    deeply.

52
Thank you...