Title: Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation

1
Combining Lexical and Syntactic Features for
Supervised Word Sense Disambiguation

Master's Thesis: Saif Mohammad
Advisor: Dr. Ted Pedersen
University of Minnesota, Duluth
Date: August 1, 2003

2
Path Map

Introduction
Background
Data
Experiments
Conclusions

3
Word Sense Disambiguation

Harry cast a bewitching spell
Humans immediately understand "spell" to mean a charm or incantation
rather than reading out letter by letter, or a period of time
Words with multiple senses: polysemy, ambiguity
Humans utilize background knowledge and context
Machines lack background knowledge
Automatically identifying the intended sense of a word in written text, based on its context, remains a hard problem
Features are identified from the context
Best accuracies in the latest international event: around 65%

4
Why do we need WSD?

Information Retrieval
Query: cricket bat
Documents pertaining to the insect and the mammal are irrelevant
Machine Translation
Consider English to Hindi translation
"head" translates to sar (upper part of the body) or adhyaksh (leader)
Machine-Human Interaction
Instructions to machines
Interactive home system: turn on the lights
Domestic android: get the door
Applications are widespread and will affect our way of life

5
Terminology

Harry cast a bewitching spell
Target word: the word whose intended sense is to be identified
spell
Context: the sentence housing the target word and, possibly, 1 or 2 sentences around it
Harry cast a bewitching spell
Instance: target word along with its context
WSD is a classification problem wherein the occurrence of the target word is assigned to one of its many possible senses

6
Corpus-Based Supervised Machine Learning

A computer program is said to learn from experience if its performance at tasks improves with experience
- Mitchell
Task: Word Sense Disambiguation of given test instances
Performance: ratio of instances correctly disambiguated to the total test instances (accuracy)
Experience: manually created instances in which target words are marked with the intended sense (training instances)
Harry cast a bewitching spell / incantation
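
The performance measure on this slide can be sketched in a few lines. This is only an illustration: the function name `accuracy` and the sense labels for the four test instances are made up for the example and do not come from the thesis data.

```python
def accuracy(predicted, gold):
    """Ratio of correctly disambiguated instances to all test instances."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no test instances")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical sense labels for four test instances of "spell".
gold = ["incantation", "period", "incantation", "letters"]
predicted = ["incantation", "incantation", "incantation", "letters"]
score = accuracy(predicted, gold)  # 3 of 4 correct -> 0.75
```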

7
Path Map

Introduction
Background
Data
Experiments
Conclusions

8
Decision Trees

A kind of classifier
Assigns a class by asking a series of questions
Questions correspond to features of the instance
Question asked depends on answer to previous
question
Inverted tree structure
Interconnected nodes
Topmost node is called the root
Each node corresponds to a question / feature
Each possible value of feature has corresponding
branch
Leaves terminate every path from root
Each leaf is associated with a class
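
A minimal sketch of how such a tree assigns a class, assuming a tree stored as nested tuples and dictionaries. The feature names (`has_word_cast`, `next_pos_is_IN`) and the sense labels are hypothetical and chosen only to make the example concrete; learning the tree (ID3, C4.5) is not shown.

```python
# A node is either a class label (leaf, a string) or a pair
# (feature name, {feature value: child node}).
tree = ("has_word_cast", {
    1: "incantation",
    0: ("next_pos_is_IN", {1: "period", 0: "letters"}),
})


def classify(node, instance):
    """Walk from the root, following the branch that matches each answer."""
    while not isinstance(node, str):
        feature, branches = node
        node = branches[instance[feature]]
    return node


sense = classify(tree, {"has_word_cast": 1, "next_pos_is_IN": 0})
```

Each question asked depends on the answer to the previous one, and every path from the root ends in a leaf that carries a class.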

9
Automating Toy Selection for Max

[Figure: decision tree for toy selection. The root asks "Moving Parts?" (No/Yes); internal nodes ask "Color?" (Blue/Red/Other), "Car?" (No/Yes) and "Size?" (Big/Small); the leaves are LOVE, HATE or SO SO.]

10
WSD Tree

[Figure: decision tree for WSD. The root tests Feature 1; internal nodes test Features 2, 3 and 4, each with branches 0 and 1; the leaves are SENSE 1 to SENSE 4.]

11
Issues

Why use decision trees for WSD?
How are decision trees learnt?
ID3 and C4.5 algorithms
What is bagging and what are its advantages?
Drawbacks of decision trees and bagging
Pedersen [2002]: Choosing the right features is of greater significance than the learning algorithm itself

12
Lexical Features

Surface form
A word we observe in text
Case (n)
1. Object of investigation 2. Frame or covering 3. A weird person
Surface forms: case, cases, casing
An occurrence of "casing" suggests sense 2
Unigrams and Bigrams
One-word and two-word sequences in text
The interest rate is low
Unigrams: the, interest, rate, is, low
Bigrams: the interest, interest rate, rate is, is low
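
The unigram and bigram lists above can be reproduced with a short sketch. Whitespace tokenization and lowercasing are assumptions made for the example; the slides do not say how the text was tokenized.

```python
def ngrams(tokens, n):
    """Return all contiguous n-word sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "The interest rate is low".lower().split()
unigrams = ngrams(tokens, 1)  # ['the', 'interest', 'rate', 'is', 'low']
bigrams = ngrams(tokens, 2)   # ['the interest', 'interest rate', 'rate is', 'is low']
```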

13
Part of Speech Tagging

Pre-requisite for many Natural Language Tasks
Parsing, WSD, Anaphora resolution
Brill Tagger: most widely used tool
Accuracy around 95%
Source code available
Easily understood rules
Harry/NNP cast/VBD a/DT bewitching/JJ spell/NN
NNP: proper noun, VBD: verb past, DT: determiner, NN: noun

14
Pre-Tagging

Pre-tagging is the act of manually assigning tags to selected words in a text prior to tagging
Mona will sit in the pretty chair//NN this time
chair is the pre-tagged word, NN is its pre-tag
Pre-tagged words serve as reliable anchors or seeds around which tagging is done
Brill Tagger facilitates pre-tagging
Pre-tag not always respected!
Mona/NNP will/MD sit/VB in/IN the/DT pretty/RB chair//VB this/DT time/NN

15
Contextual Rules

Initial state tagger: assigns the most frequent tag for a type based on entries in a lexicon (pre-tag respected)
Final state tagger: may modify the tag of a word based on context (pre-tag not given special treatment)

Relevant Lexicon Entries
Type     Most frequent tag   Other possible tags
chair    NN (noun)           VB (verb)
pretty   RB (adverb)         JJ (adjective)

Relevant Contextual Rules
Current Tag   New Tag   When
NN            VB        NEXTTAG DT
RB            JJ        NEXTTAG NN

16
Guaranteed Pre-Tagging

A patch to the tagger is provided: BrillPatch
Application of contextual rules to the pre-tagged words is bypassed
Application of contextual rules to non-pre-tagged words is unchanged
Mona/NNP will/MD sit/VB in/IN the/DT pretty/JJ chair//NN this/DT time/NN
Tag of chair retained as NN
Contextual rule to change tag of chair from NN to VB not applied
Tag of pretty transformed
Contextual rule to change tag of pretty from RB to JJ applied
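
The difference between regular and guaranteed pre-tagging can be traced with a toy simulation. This is a sketch, not the Brill Tagger or BrillPatch: it models only the two contextual rules shown on the slides, applies each rule in one left-to-right pass, and fills the lexicon entries for words other than chair and pretty from the tagged sentence (an assumption). The function name `tag` and its `guaranteed` flag are hypothetical.

```python
# Most frequent tag per word (initial state tagger).
LEXICON = {"Mona": "NNP", "will": "MD", "sit": "VB", "in": "IN", "the": "DT",
           "pretty": "RB", "chair": "NN", "this": "DT", "time": "NN"}

# Contextual rules: (current tag, new tag, required tag of the next word).
RULES = [("NN", "VB", "DT"), ("RB", "JJ", "NN")]


def tag(tokens, guaranteed):
    words, tags, pretagged = [], [], []
    for tok in tokens:
        word, sep, pretag = tok.partition("//")
        words.append(word)
        pretagged.append(bool(sep))
        # The initial state respects the pre-tag in both modes.
        tags.append(pretag if sep else LEXICON[word])
    for current, new, next_tag in RULES:
        for i in range(len(words) - 1):
            if guaranteed and pretagged[i]:
                continue  # contextual rules bypassed for pre-tagged words
            if tags[i] == current and tags[i + 1] == next_tag:
                tags[i] = new
    return " ".join(w + ("//" if p else "/") + t
                    for w, t, p in zip(words, tags, pretagged))


sentence = "Mona will sit in the pretty chair//NN this time".split()
regular = tag(sentence, guaranteed=False)
patched = tag(sentence, guaranteed=True)
```

In the regular run, chair is changed to VB first, so the rule for pretty no longer fires and pretty stays RB. In the guaranteed run, chair keeps NN and pretty becomes JJ, matching the two tagged sentences on the slides.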

17
Part of Speech Features

A word in different parts of speech has different senses
A word used in different senses is likely to have different sets of POS tags around it
Why did jack turn/VB against/IN his/PRP team/NN
Why did jack turn/VB left/VBN at/IN the/DT crossing
Features used
Individual word POS: P-2, P-1, P0, P1, P2
P2 = JJ implies P2 is an adjective
Sequential POS: P-1P0, P-1P0P1, and so on
P-1P0 = NN, VB implies P-1 is a noun and P0 is a verb
A combination of the above
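
A sketch of how the individual and sequential POS features could be read off a tagged sentence. The function names and the "NONE" padding value for positions outside the sentence are assumptions for illustration; the slides do not say how sentence boundaries were handled.

```python
def pos_features(tags, target, window=2, pad="NONE"):
    """Individual word POS features P-2 .. P2 around the target position."""
    feats = {}
    for offset in range(-window, window + 1):
        i = target + offset
        feats[f"P{offset}"] = tags[i] if 0 <= i < len(tags) else pad
    return feats


def sequence_feature(feats, offsets):
    """Sequential POS feature, e.g. offsets (-1, 0) gives P-1P0."""
    return tuple(feats[f"P{o}"] for o in offsets)


# Harry/NNP cast/VBD a/DT bewitching/JJ spell/NN, target word "spell".
tags = ["NNP", "VBD", "DT", "JJ", "NN"]
feats = pos_features(tags, target=4)
p_minus1_p0 = sequence_feature(feats, (-1, 0))  # ('JJ', 'NN')
```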

18
Parse Features

Collins Parser used to parse the data
Source code available
Uses part of speech tagged data as input
Head: head word of a phrase
the hard work, the hard surface
Phrase: the phrase itself (noun phrase, verb phrase and so on)
Parent: head word of the parent phrase
fasten the line, cross the line
Parent Phrase: the parent phrase itself

19
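Sample Parse Tree

[Figure: parse tree of "Harry cast a bewitching spell". SENTENCE splits into a NOUN PHRASE (Harry/NNP) and a VERB PHRASE (cast/VBD) that contains a NOUN PHRASE (a/DT bewitching/JJ spell/NN).]
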
20
Path Map

Introduction
Background
Data
Experiments
Conclusions

21
Sense-Tagged Data

Senseval-2 data
4,328 instances of test data and 8,611 instances of training data ranging over 73 different nouns, verbs and adjectives
Senseval-1 data
8,512 test instances and 13,276 training instances, ranging over 35 nouns, verbs and adjectives
line, hard, serve and interest data
4,149, 4,337, 4,378 and 2,476 sense-tagged instances with line, hard, serve and interest as the head words
Around 50,000 sense-tagged instances in all!

22
Data Processing

Packages to convert line, hard, serve and interest data to Senseval-1 and Senseval-2 data formats
refine: preprocesses data in Senseval-2 data format to make it suitable for tagging
Restores one sentence per line and one line per sentence, pre-tags the target words, splits long sentences
posSenseval: part of speech tags any data in Senseval-2 data format
Brill Tagger along with Guaranteed Pre-tagging utilized
parseSenseval: parses data in the format output by the Brill Tagger
Restores XML tags, creating a parsed file in Senseval-2 data format
Uses the Collins Parser

23
Sample line data instance

Original instance
art aphb 01301041
" There's none there . " He hurried outside to
see if there were any dry ones on the line .
Senseval-2 data format
<instance id="line-n.art aphb 01301041">
<answer instance="line-n.art aphb 01301041" senseid="cord"/>
<context>
<s> " There's none there . " </s> <s> He hurried outside to see if there were any dry ones on the <head>line</head> . </s>
</context>
</instance>
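
An instance in this format can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on the sample instance and assumes the instance is well-formed XML, which may not hold for every file in the original data; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

instance_xml = """<instance id="line-n.art aphb 01301041">
<answer instance="line-n.art aphb 01301041" senseid="cord"/>
<context>
<s> " There's none there . " </s> <s> He hurried outside to see if there were any dry ones on the <head>line</head> . </s>
</context>
</instance>"""

inst = ET.fromstring(instance_xml)
instance_id = inst.get("id")
sense = inst.find("answer").get("senseid")   # the manually assigned sense
target = inst.find(".//head").text           # the target word
# Flatten the context to plain text and normalize whitespace.
context = " ".join("".join(inst.find("context").itertext()).split())
```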

24
Sample Output from parseSenseval

<instance id="harry">
<answer instance="harry" senseid="incantation"/>
<context>
Harry cast a bewitching <head>spell</head>
</context>
</instance>
<instance id="harry">
<answer instance="harry" senseid="incantation"/>
<context>
<P="TOP~cast~1~1"> <P="S~cast~2~2">
<P="NPB~Potter~2~2"> Harry
<p="NNP"/> <P="VP~cast~2~1"> cast <p="VB"/>
<P="NPB~spell~3~3">
a <p="DT"/> bewitching <p="JJ"/> spell <p="NN"/>
</P> </P> </P> </P>
</context>
</instance>

25
Issues

How is the target word identified in line, hard and serve data?
How is the data tokenized for better quality POS tagging and parsing?
How is the data pre-tagged?
How is the parse output of the Collins Parser interpreted?
How is the parsed output XMLized and brought back to Senseval-2 data format?
Idiosyncrasies of line, hard, serve, interest, Senseval-1 and Senseval-2 data and how they are handled

26
Path Map

Introduction
Background
Data
Experiments
Conclusions

27
Surface Forms: Senseval-1 and Senseval-2

Accuracy (%)    Senseval-2   Senseval-1
Majority        47.7         56.3
Surface Form    49.3         62.9
Unigrams        55.3         66.9
Bigrams         55.1         66.9

28
Individual Word POS (Senseval-1)

Accuracy (%)   All    Nouns   Verbs   Adj.
Majority       56.3   57.2    56.9    64.3
P-2            57.5   58.2    58.6    64.0
P-1            59.2   62.2    58.2    64.3
P0             60.3   62.5    58.2    64.3
P1             63.9   65.4    64.4    66.2
P2             59.9   60.0    60.8    65.2

29
Individual Word POS (Senseval-2)

Accuracy (%)   All    Nouns   Verbs   Adj.
Majority       47.7   51.0    39.7    59.0
P-2            47.1   51.9    38.0    57.9
P-1            49.6   55.2    40.2    59.0
P0             49.9   55.7    40.6    58.2
P1             53.1   53.8    49.1    61.0
P2             48.9   50.2    43.2    59.4

30
Combining POS Features

Accuracy (%)            Senseval-2   Senseval-1   line
Majority                47.7         56.3         54.3
P0, P1                  54.3         66.7         54.1
P-1, P0, P1             54.6         68.0         60.4
P-2, P-1, P0, P1, P2    54.6         67.8         62.3

31
Effect of Guaranteed Pre-tagging on WSD

                        Senseval-1            Senseval-2
Accuracy (%)            Guar. P.   Reg. P.    Guar. P.   Reg. P.
P-1, P0                 62.2       62.1       50.8       50.9
P0, P1                  66.7       66.7       54.3       53.8
P-1, P0, P1             68.0       67.6       54.6       54.7
P-1P0, P0P1             66.7       66.3       54.0       53.7
P-2, P-1, P0, P1, P2    67.8       66.1       54.6       54.1
Guar. P.: guaranteed pre-tagging; Reg. P.: regular pre-tagging

32
Parse Features (Senseval-1)

Accuracy (%)   All    Nouns   Verbs   Adj.
Majority       56.3   57.2    56.9    64.3
Head           64.3   70.9    59.8    66.9
Parent         60.6   62.6    60.3    65.8
Phrase         58.5   57.5    57.2    66.2
Par. Phr.      57.9   58.1    58.3    66.2

33
Parse Features (Senseval-2)

Accuracy (%)   All    Nouns   Verbs   Adj.
Majority       47.7   51.0    39.7    59.0
Head           51.7   58.5    39.8    64.0
Parent         50.0   56.1    40.1    59.3
Phrase         48.3   51.7    40.3    59.5
Par. Phr.      48.5   53.0    39.1    60.3

34
Thoughts

Both lexical and syntactic features perform comparably
But do they get the same instances right?
How redundant are the individual feature sets?
Are there instances correctly disambiguated by one feature set and not by the other?
How complementary are the individual feature sets?
Is the effort to combine lexical and syntactic features justified?

35
Measures

Baseline Ensemble: accuracy of a hypothetical ensemble which predicts the sense correctly only if both individual feature sets do so
Quantifies redundancy amongst feature sets
Optimal Ensemble: accuracy of a hypothetical ensemble which predicts the sense correctly if either of the individual feature sets does so
Difference with individual accuracies quantifies complementarity
We used a simple ensemble which sums the probabilities assigned to each sense by the individual feature sets to decide the intended sense
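
The three measures can be sketched as follows, assuming each feature set's results are available as a per-instance list of correct/incorrect flags and, for the simple ensemble, as a dictionary of sense probabilities. The function names, the toy flags and the probabilities are hypothetical, and ties in the simple ensemble are broken alphabetically here, which the slides do not specify.

```python
def baseline_ensemble(correct_a, correct_b):
    """Accuracy if an instance counts only when BOTH feature sets are right."""
    return sum(a and b for a, b in zip(correct_a, correct_b)) / len(correct_a)


def optimal_ensemble(correct_a, correct_b):
    """Accuracy if an instance counts when EITHER feature set is right."""
    return sum(a or b for a, b in zip(correct_a, correct_b)) / len(correct_a)


def simple_ensemble(probs_a, probs_b):
    """Pick the sense with the highest summed probability."""
    senses = sorted(set(probs_a) | set(probs_b))
    return max(senses, key=lambda s: probs_a.get(s, 0.0) + probs_b.get(s, 0.0))


# Toy data: four test instances, two feature sets.
lexical_correct = [True, True, False, False]
syntactic_correct = [True, False, True, False]
base = baseline_ensemble(lexical_correct, syntactic_correct)  # 0.25
opt = optimal_ensemble(lexical_correct, syntactic_correct)    # 0.75

chosen = simple_ensemble({"cord": 0.6, "phone": 0.4},
                         {"cord": 0.3, "phone": 0.7})         # 'phone'
```

A large gap between the optimal ensemble and the better individual accuracy indicates that the two feature sets get different instances right.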

36
Best Combinations

Data       Set 1      Acc.   Set 2          Acc.   Base   Maj.   Ens.   Opt.
Sval2      Unigrams   55.3   P-1, P0, P1    55.3   43.6   47.7   57.0   67.9
Sval1      Unigrams   66.9   P-1, P0, P1    68.0   57.6   56.3   71.1   78.0
line       Unigrams   74.5   P-1, P0, P1    60.4   55.1   54.3   74.2   82.0
hard       Bigrams    89.5   Head, Par.     87.7   86.1   81.5   88.9   91.3
serve      Unigrams   73.3   P-1, P0, P1    73.0   58.4   42.2   81.6   89.9
interest   Bigrams    79.9   P-1, P0, P1    78.8   67.6   54.9   83.2   90.1
Base: baseline ensemble; Maj.: majority classifier; Ens.: simple ensemble; Opt.: optimal ensemble (all accuracies in %)

37
Path Map

Introduction
Background
Data
Experiments
Conclusions

38
Conclusions

Significant amount of complementarity across lexical and syntactic features
Combination of the two justified
Part of speech of the word immediately to the right of the target word found most useful
POS of words immediately to the right of the target word best for verbs and adjectives
Nouns helped by tags on either side
Head word of phrase particularly useful for adjectives
Nouns helped by both head and parent

39
Other Contributions

Converted line, hard, serve and interest data into Senseval-2 data format
Part of speech tagged and parsed the Senseval-2, Senseval-1, line, hard, serve and interest data
Developed the Guaranteed Pre-tagging mechanism to improve quality of POS tagging
Showed that guaranteed pre-tagging improves WSD
40
Code, Data, Resources and Publication

posSenseval: part of speech tags any data in Senseval-2 data format
parseSenseval: parses data in the format output by the Brill Tagger. Output is in Senseval-2 data format with part of speech and parse information as XML tags.
Packages to convert line, hard, serve and interest data to Senseval-1 and Senseval-2 data formats
BrillPatch: patch to the Brill Tagger to employ Guaranteed Pre-Tagging
http://www.d.umn.edu/~tpederse/data.html
Brill Tagger: http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z
Collins Parser: http://www.ai.mit.edu/people/mcollins
Guaranteed Pre-Tagging for the Brill Tagger,
Mohammad and Pedersen, Fourth International
Conference of Intelligent Systems and Text
Processing, February 2003, Mexico