Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation
1
Combining Lexical and Syntactic Features for
Supervised Word Sense Disambiguation
  • Master's Thesis: Saif Mohammad
  • Advisor: Dr. Ted Pedersen
  • University of Minnesota, Duluth
  • Date: August 1, 2003

2
Path Map
  • Introduction
  • Background
  • Data
  • Experiments
  • Conclusions

3
Word Sense Disambiguation
  • Harry cast a bewitching spell
  • Humans immediately understand spell to mean a charm or incantation, not reading out letter by letter or a period of time
  • Words with multiple senses: polysemy, ambiguity
  • Humans utilize background knowledge and context; machines lack background knowledge
  • Automatically identifying the intended sense of a word in written text, based on its context, remains a hard problem
  • Features are identified from the context
  • Best accuracies in the latest international evaluation are around 65%

4
Why do we need WSD?
  • Information Retrieval
  • Query: cricket bat
  • Documents pertaining to the insect (cricket) and the mammal (bat) are irrelevant
  • Machine Translation
  • Consider English to Hindi translation
  • head translates to sar (upper part of the body) or adhyaksh (leader)
  • Machine-human interaction
  • Instructions to machines
  • Interactive home system: turn on the lights
  • Domestic android: get the door
  • Applications are widespread and will affect our way of life

5
Terminology
  • Harry cast a bewitching spell
  • Target word: the word whose intended sense is to be identified (here, spell)
  • Context: the sentence housing the target word, and possibly 1 or 2 sentences around it (here, Harry cast a bewitching spell)
  • Instance: the target word along with its context
  • WSD is a classification problem wherein the occurrence of the target word is assigned to one of its many possible senses

6
Corpus-Based Supervised Machine Learning
  • "A computer program is said to learn from experience if its performance at tasks improves with experience" - Mitchell
  • Task: Word Sense Disambiguation of given test instances
  • Performance: ratio of instances correctly disambiguated to the total number of test instances, i.e. accuracy (a minimal sketch follows below)
  • Experience: manually created instances in which the target words are marked with the intended sense (training instances)
  • Harry cast a bewitching spell / incantation
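A minimal Python sketch of this framing (toy data and a trivial most-frequent-sense learner, purely illustrative and not the thesis system): the "experience" is sense-tagged training instances, the "task" is labelling test instances, and the "performance" is accuracy.

    # Toy supervised WSD: learn from sense-tagged instances, measure accuracy.
    from collections import Counter

    train = [("Harry cast a bewitching spell", "incantation"),
             ("She was under his spell for weeks", "incantation"),
             ("Can you spell the word aloud?", "read-out-letters")]
    test = [("The witch recited a spell", "incantation"),
            ("Please spell your surname", "read-out-letters")]

    # A trivial learner: always predict the most frequent sense seen in training.
    most_frequent_sense = Counter(sense for _, sense in train).most_common(1)[0][0]
    predictions = [most_frequent_sense for _ in test]

    # Performance: instances correctly disambiguated / total test instances.
    accuracy = sum(p == gold for p, (_, gold) in zip(predictions, test)) / len(test)
    print(f"accuracy = {accuracy:.2f}")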

7
Path Map
  • Introduction
  • Background
  • Data
  • Experiments
  • Conclusions

8
Decision Trees
  • A kind of classifier
  • Assigns a class by asking a series of questions
  • Questions correspond to features of the instance
  • The question asked depends on the answer to the previous question
  • Inverted tree structure of interconnected nodes
  • The topmost node is called the root
  • Each node corresponds to a question / feature
  • Each possible value of the feature has a corresponding branch
  • Leaves terminate every path from the root
  • Each leaf is associated with a class

9
Automating Toy Selection for Max
[Decision tree diagram: the root node asks "Moving parts?"; internal nodes ask "Car?", "Color?" (blue / red / other) and "Size?" (big / small); every path from the root ends in a leaf labelled LOVE, HATE or SO SO.]
10
WSD Tree
[Decision tree diagram: internal nodes test binary features (Feature 1 through Feature 4, with branches for 0 and 1); each path from the root ends in a leaf labelled with a sense (SENSE 1 through SENSE 4). A classification sketch follows below.]
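A minimal Python sketch of how a tree like the one above assigns a sense, assuming hand-built nodes over binary features (the thesis learns such trees automatically with ID3/C4.5-style algorithms; this hand-coded tree only loosely mirrors the diagram's shape).

    # Illustrative decision-tree classification for WSD.
    def make_node(feature, if_zero, if_one):
        """Internal node: test `feature`, then follow the 0 or 1 branch."""
        return {"feature": feature, 0: if_zero, 1: if_one}

    def classify(node, instance):
        """Walk from the root to a leaf; leaves are plain sense labels."""
        while isinstance(node, dict):
            answer = instance.get(node["feature"], 0)   # unseen feature -> branch 0
            node = node[answer]
        return node

    # A small hand-built tree, loosely following the WSD tree diagram above.
    tree = make_node("Feature 1",
                     if_zero=make_node("Feature 2", "SENSE 1",
                                       make_node("Feature 3", "SENSE 3", "SENSE 2")),
                     if_one=make_node("Feature 4", "SENSE 3", "SENSE 1"))

    instance = {"Feature 1": 0, "Feature 2": 1, "Feature 3": 1}
    print(classify(tree, instance))   # -> SENSE 2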
11
Issues
  • Why use decision trees for WSD?
  • How are decision trees learnt? The ID3 and C4.5 algorithms
  • What is bagging, and what are its advantages?
  • Drawbacks of decision trees and bagging
  • (Pedersen, 2002): Choosing the right features is of greater significance than the learning algorithm itself

12
Lexical Features
  • Surface form
  • A word form as observed in text
  • case (n): 1. object of investigation 2. frame or covering 3. a weird person
  • Surface forms: case, cases, casing
  • An occurrence of casing suggests sense 2
  • Unigrams and bigrams
  • One-word and two-word sequences in text (see the sketch below)
  • The interest rate is low
  • Unigrams: the, interest, rate, is, low
  • Bigrams: the interest, interest rate, rate is, is low
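A small Python sketch of these lexical features on the example sentence (whitespace tokenization is a simplification):

    # Unigrams and bigrams from "The interest rate is low".
    text = "The interest rate is low"
    tokens = text.lower().split()

    unigrams = tokens
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]

    print(unigrams)  # ['the', 'interest', 'rate', 'is', 'low']
    print(bigrams)   # ['the interest', 'interest rate', 'rate is', 'is low']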

13
Part of Speech Tagging
  • A prerequisite for many natural language tasks
  • Parsing, WSD, anaphora resolution
  • The Brill Tagger is the most widely used tool
  • Accuracy around 95%
  • Source code available
  • Easily understood rules
  • Harry/NNP cast/VBD a/DT bewitching/JJ spell/NN
  • NNP: proper noun, VBD: verb (past tense), DT: determiner, NN: noun

14
Pre-Tagging
  • Pre-tagging is the act of manually assigning tags to selected words in a text prior to tagging
  • Mona will sit in the pretty chair//NN this time
  • chair is the pre-tagged word; NN is its pre-tag
  • Pre-tags act as reliable anchors or seeds around which tagging is done
  • The Brill Tagger facilitates pre-tagging
  • The pre-tag is not always respected!
  • Mona/NNP will/MD sit/VB in/IN the/DT pretty/RB chair//VB this/DT time/NN

15
Contextual Rules
  • The initial-state tagger assigns the most frequent tag for a type, based on entries in a lexicon (pre-tag respected)
  • The final-state tagger may modify the tag of a word based on context (pre-tag not given special treatment)
  • Relevant lexicon entries:

    Type     Most frequent tag   Other possible tags
    chair    NN (noun)           VB (verb)
    pretty   RB (adverb)         JJ (adjective)

  • Relevant contextual rules:

    Current tag   New tag   When
    NN            VB        NEXTTAG DT
    RB            JJ        NEXTTAG NN

16
Guaranteed Pre-Tagging
  • A patch to the tagger, BrillPatch, is provided
  • Application of contextual rules to pre-tagged words is bypassed
  • Application of contextual rules to non-pre-tagged words is unchanged
  • Mona/NNP will/MD sit/VB in/IN the/DT pretty/JJ chair//NN this/DT time/NN
  • The tag of chair is retained as NN: the contextual rule changing chair from NN to VB is not applied
  • The tag of pretty is transformed: the contextual rule changing pretty from RB to JJ is applied (see the sketch below)
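A simplified Python sketch of how contextual rules interact with pre-tags (this is not the Brill Tagger or BrillPatch code; the rules are reduced to the two NEXTTAG rules above). Without the patch, the NN-to-VB rule overwrites the pre-tag on chair, which in turn keeps the RB-to-JJ rule from firing on pretty; with guaranteed pre-tagging the pre-tagged word is skipped and pretty is corrected.

    # Rules: change tag OLD to NEW when the next word's tag is NEXT.
    RULES = [("NN", "VB", "DT"),   # chair: NN -> VB before a determiner
             ("RB", "JJ", "NN")]   # pretty: RB -> JJ before a noun

    def apply_rules(tagged, pretagged=frozenset(), guaranteed=False):
        tags = [t for _, t in tagged]
        for old, new, next_tag in RULES:
            for i in range(len(tags) - 1):
                if guaranteed and i in pretagged:
                    continue          # guaranteed pre-tagging: never touch a pre-tag
                if tags[i] == old and tags[i + 1] == next_tag:
                    tags[i] = new
        return list(zip((w for w, _ in tagged), tags))

    # Initial-state tags, with "chair" (index 6) pre-tagged as NN.
    sentence = [("Mona", "NNP"), ("will", "MD"), ("sit", "VB"), ("in", "IN"),
                ("the", "DT"), ("pretty", "RB"), ("chair", "NN"),
                ("this", "DT"), ("time", "NN")]
    print(apply_rules(sentence, pretagged={6}))                   # chair -> VB, pretty stays RB
    print(apply_rules(sentence, pretagged={6}, guaranteed=True))  # chair stays NN, pretty -> JJ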

17
Part of Speech Features
  • A word in different parts of speech has different senses
  • A word used in different senses is likely to have different sets of POS tags around it
  • Why did jack turn/VB against/IN his/PRP team/NN
  • Why did jack turn/VB left/VBN at/IN the/DT crossing
  • Features used (see the sketch below):
  • Individual word POS: P-2, P-1, P0, P1, P2
  • P2 = JJ implies the word two positions to the right of the target is an adjective
  • Sequential POS: P-1P0, P-1P0P1, and so on
  • P-1P0 = NN, VB implies P-1 is a noun and P0 is a verb
  • A combination of the above
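A small Python sketch of extracting these POS features around the target word turn (the tag window and feature names follow the slide; positions outside the sentence get a placeholder tag):

    # Individual and sequential POS features around a target word.
    tagged = [("Why", "WRB"), ("did", "VBD"), ("jack", "NNP"), ("turn", "VB"),
              ("against", "IN"), ("his", "PRP"), ("team", "NN")]
    target = 3                                  # index of the target word "turn"

    def pos_at(offset):
        i = target + offset
        return tagged[i][1] if 0 <= i < len(tagged) else "NONE"

    features = {f"P{off}": pos_at(off) for off in (-2, -1, 0, 1, 2)}
    features["P-1P0"] = features["P-1"] + " " + features["P0"]            # sequential POS
    features["P-1P0P1"] = " ".join(features[k] for k in ("P-1", "P0", "P1"))
    print(features)
    # {'P-2': 'VBD', 'P-1': 'NNP', 'P0': 'VB', 'P1': 'IN', 'P2': 'PRP', ...}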

18
Parse Features
  • The Collins Parser is used to parse the data
  • Source code available
  • Uses part-of-speech tagged data as input
  • Head word of a phrase: the hard work, the hard surface
  • The phrase itself: noun phrase, verb phrase and so on
  • Parent: head word of the parent phrase: fasten the line, cross the line
  • Parent phrase (see the sketch below)
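A toy Python sketch of the four parse features, using a hand-built, head-annotated phrase record in place of real Collins Parser output (the nested-dict representation is purely illustrative):

    # Head word, phrase type, parent head word and parent phrase type
    # for the target word's phrase, as in "cross the line".
    target_phrase = {"label": "NP", "head": "line",
                     "parent": {"label": "VP", "head": "cross"}}

    def parse_features(phrase):
        parent = phrase.get("parent", {})
        return {"Head": phrase["head"],
                "Phrase": phrase["label"],
                "Parent head": parent.get("head", "NONE"),
                "Parent phrase": parent.get("label", "NONE")}

    print(parse_features(target_phrase))
    # {'Head': 'line', 'Phrase': 'NP', 'Parent head': 'cross', 'Parent phrase': 'VP'}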

19
Sample Parse Tree
[Parse tree diagram: (SENTENCE (NOUN PHRASE Harry/NNP) (VERB PHRASE cast/VBD (NOUN PHRASE a/DT bewitching/JJ spell/NN)))]
20
Path Map
  • Introduction
  • Background
  • Data
  • Experiments
  • Conclusions

21
Sense-Tagged Data
  • Senseval-2 data
  • 4,328 test instances and 8,611 training instances, ranging over 73 different nouns, verbs and adjectives
  • Senseval-1 data
  • 8,512 test instances and 13,276 training instances, ranging over 35 nouns, verbs and adjectives
  • line, hard, interest, serve data
  • 4,149, 4,337, 4,378 and 2,476 sense-tagged instances with line, hard, serve and interest as the head words
  • Around 50,000 sense-tagged instances in all!

22
Data Processing
  • Packages to convert the line, hard, serve and interest data to Senseval-1 and Senseval-2 data formats
  • refine preprocesses data in Senseval-2 data format to make it suitable for tagging
  • Restores one sentence per line and one line per sentence, pre-tags the target words, splits long sentences
  • posSenseval part-of-speech tags any data in Senseval-2 data format
  • The Brill Tagger, along with Guaranteed Pre-tagging, is utilized
  • parseSenseval parses data in the format output by the Brill Tagger
  • Restores XML tags, creating a parsed file in Senseval-2 data format
  • Uses the Collins Parser

23
Sample line data instance
  • Original instance:
    art aphb 01301041
    " There's none there . " He hurried outside to see if there were any dry ones on the line .
  • Senseval-2 data format:
    <instance id="line-n.art aphb 01301041">
    <answer instance="line-n.art aphb 01301041" senseid="cord"/>
    <context>
    <s> " There's none there . " </s> <s> He hurried outside to see if there were any dry ones on the <head>line</head> . </s>
    </context>
    </instance>

24
Sample Output from parseSenseval
  • <instance id="harry">
    <answer instance="harry" senseid="incantation"/>
    <context>
    Harry cast a bewitching <head>spell</head>
    </context>
    </instance>
  • <instance id="harry">
    <answer instance="harry" senseid="incantation"/>
    <context>
    <P="TOP~cast~1~1"> <P="S~cast~2~2"> <P="NPB~Potter~2~2"> Harry <p="NNP"/> <P="VP~cast~2~1"> cast <p="VB"/> <P="NPB~spell~3~3"> a <p="DT"/> bewitching <p="JJ"/> spell <p="NN"/> </P> </P> </P> </P>
    </context>
    </instance>

25
Issues
  • How is the target word identified in the line, hard and serve data?
  • How is the data tokenized for better quality POS tagging and parsing?
  • How is the data pre-tagged?
  • How is the parse output of the Collins Parser interpreted?
  • How is the parsed output XML-ized and brought back to Senseval-2 data format?
  • Idiosyncrasies of the line, hard, serve, interest, Senseval-1 and Senseval-2 data, and how they are handled

26
Path Map
  • Introduction
  • Background
  • Data
  • Experiments
  • Conclusions

27
Surface Forms, Unigrams and Bigrams (accuracy, %)

                Senseval-2   Senseval-1
Majority           47.7         56.3
Surface Form       49.3         62.9
Unigrams           55.3         66.9
Bigrams            55.1         66.9
28
Individual Word POS (Senseval-1) (accuracy, %)

           All     Nouns   Verbs   Adj.
Majority   56.3    57.2    56.9    64.3
P-2        57.5    58.2    58.6    64.0
P-1        59.2    62.2    58.2    64.3
P0         60.3    62.5    58.2    64.3
P1         63.9    65.4    64.4    66.2
P2         59.9    60.0    60.8    65.2
29
Individual Word POS (Senseval-2) (accuracy, %)

           All     Nouns   Verbs   Adj.
Majority   47.7    51.0    39.7    59.0
P-2        47.1    51.9    38.0    57.9
P-1        49.6    55.2    40.2    59.0
P0         49.9    55.7    40.6    58.2
P1         53.1    53.8    49.1    61.0
P2         48.9    50.2    43.2    59.4
30
Combining POS Features (accuracy, %)

                        Senseval-2   Senseval-1   line
Majority                   47.7         56.3      54.3
P0, P1                     54.3         66.7      54.1
P-1, P0, P1                54.6         68.0      60.4
P-2, P-1, P0, P1, P2       54.6         67.8      62.3
31
Effect of Guaranteed Pre-tagging on WSD (accuracy, %)

                        Senseval-1            Senseval-2
                        Guar. P.   Reg. P.    Guar. P.   Reg. P.
P-1, P0                 62.2       62.1       50.8       50.9
P0, P1                  66.7       66.7       54.3       53.8
P-1, P0, P1             68.0       67.6       54.6       54.7
P-1P0, P0P1             66.7       66.3       54.0       53.7
P-2, P-1, P0, P1, P2    67.8       66.1       54.6       54.1
32
Parse Features (Senseval-1) (accuracy, %)

            All     Nouns   Verbs   Adj.
Majority    56.3    57.2    56.9    64.3
Head        64.3    70.9    59.8    66.9
Parent      60.6    62.6    60.3    65.8
Phrase      58.5    57.5    57.2    66.2
Par. Phr.   57.9    58.1    58.3    66.2
33
Parse Features (Senseval-2) (accuracy, %)

            All     Nouns   Verbs   Adj.
Majority    47.7    51.0    39.7    59.0
Head        51.7    58.5    39.8    64.0
Parent      50.0    56.1    40.1    59.3
Phrase      48.3    51.7    40.3    59.5
Par. Phr.   48.5    53.0    39.1    60.3
34
Thoughts
  • Both lexical and syntactic features perform comparably
  • But do they get the same instances right?
  • How redundant are the individual feature sets?
  • Are there instances correctly disambiguated by one feature set and not by the other?
  • How complementary are the individual feature sets?
  • Is the effort to combine lexical and syntactic features justified?

35
Measures
  • Baseline Ensemble: accuracy of a hypothetical ensemble which predicts the sense correctly only if both individual feature sets do so
  • Quantifies redundancy amongst the feature sets
  • Optimal Ensemble: accuracy of a hypothetical ensemble which predicts the sense correctly if either of the individual feature sets does so
  • The difference from the individual accuracies quantifies complementarity
  • We used a simple ensemble which sums the probabilities assigned to each sense by the individual feature sets to decide the intended sense (see the sketch below)
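A small Python sketch of the three measures, assuming each feature set outputs a probability distribution over senses for every test instance (all names and numbers are illustrative):

    # Baseline ensemble, optimal ensemble, and a simple sum-of-probabilities ensemble.
    gold = ["s1", "s2", "s1"]
    lex = [{"s1": 0.6, "s2": 0.4}, {"s1": 0.7, "s2": 0.3}, {"s1": 0.2, "s2": 0.8}]
    syn = [{"s1": 0.5, "s2": 0.5}, {"s1": 0.1, "s2": 0.9}, {"s1": 0.9, "s2": 0.1}]

    def argmax(dist):
        return max(dist, key=dist.get)

    lex_ok = [argmax(p) == g for p, g in zip(lex, gold)]
    syn_ok = [argmax(p) == g for p, g in zip(syn, gold)]
    n = len(gold)

    baseline = sum(l and s for l, s in zip(lex_ok, syn_ok)) / n   # both right: redundancy
    optimal  = sum(l or s for l, s in zip(lex_ok, syn_ok)) / n    # either right: upper bound
    simple   = sum(argmax({k: p1[k] + p2[k] for k in p1}) == g    # sum probabilities, pick best
                   for p1, p2, g in zip(lex, syn, gold)) / n
    print(baseline, optimal, simple)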

36
Best Combinations (accuracy, %)

Data       Set 1 (acc.)      Set 2 (acc.)          Base   Maj.   Ens.   Opt.
Sval2      Unigrams (55.3)   P-1, P0, P1 (55.3)    43.6   47.7   57.0   67.9
Sval1      Unigrams (66.9)   P-1, P0, P1 (68.0)    57.6   56.3   71.1   78.0
line       Unigrams (74.5)   P-1, P0, P1 (60.4)    55.1   54.3   74.2   82.0
hard       Bigrams (89.5)    Head, Parent (87.7)   86.1   81.5   88.9   91.3
serve      Unigrams (73.3)   P-1, P0, P1 (73.0)    58.4   42.2   81.6   89.9
interest   Bigrams (79.9)    P-1, P0, P1 (78.8)    67.6   54.9   83.2   90.1
37
Path Map
  • Introduction
  • Background
  • Data
  • Experiments
  • Conclusions

38
Conclusions
  • Significant amount of complementarity across lexical and syntactic features
  • Combination of the two is justified
  • The part of speech of the word immediately to the right of the target word was found most useful
  • POS tags of words immediately to the right of the target word are best for verbs and adjectives
  • Nouns are helped by tags on either side
  • The head word of the phrase is particularly useful for adjectives
  • Nouns are helped by both head and parent

39
Other Contributions
  • Converted the line, hard, serve and interest data into Senseval-2 data format
  • Part-of-speech tagged and parsed the Senseval-2, Senseval-1, line, hard, serve and interest data
  • Developed the Guaranteed Pre-tagging mechanism to improve the quality of POS tagging
  • Showed that Guaranteed Pre-tagging improves WSD

40
Code, Data, Resources and Publication
  • posSenseval: part-of-speech tags any data in Senseval-2 data format
  • parseSenseval: parses data in the format output by the Brill Tagger; output is in Senseval-2 data format with part-of-speech and parse information as XML tags
  • Packages to convert the line, hard, serve and interest data to Senseval-1 and Senseval-2 data formats
  • BrillPatch: a patch to the Brill Tagger to employ Guaranteed Pre-Tagging
  • http://www.d.umn.edu/~tpederse/data.html
  • Brill Tagger: http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z
  • Collins Parser: http://www.ai.mit.edu/people/mcollins
  • Guaranteed Pre-Tagging for the Brill Tagger, Mohammad and Pedersen, Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2003), February 2003, Mexico

41
Thank You