1
Word Sense Disambiguation at Senseval-II
  • Bernardo Magnini, Carlo Strapparava
  • Giovanni Pezzulo and Alfio Gliozzo
  • ITC-irst, Centro per la Ricerca Scientifica e
    Tecnologica
  • Povo (Trento) - Italy
  • {magnini, strappa, pezzulo, gliozzo}@itc.it

2
Outline
  • Word Sense Disambiguation (WSD)
  • Definition of the task
  • Methodological issues
  • Senseval-II Experience
  • Overview
  • Systems description

3
Word Sense Disambiguation (WSD)
WSD is the process of deciding the meaning of a word in its context.
  • The problem arises from lexical ambiguity:
  • he cashed a check at the bank
  • he sat on the bank of the river and watched the
    currents

The same word can assume different senses depending on the particular context in which it occurs.
4
WSD Preliminary observations
  • All the senses for a word are collected into a dictionary.
  • Evaluating a WSD program is a process of comparing human and system answers.
  • Most common words have more than one meaning (Zipf's law), so most of the terms in a text are polysemous.

5
Uses of WSD systems
  • Many NLP applications could be improved using a
    good WSD module
  • Examples
  • Machine Translation
  • Information Retrieval
  • Question Answering

6
The WSD Problem
  • Choose the sense repository (meanings are
    represented in different ways in different
    dictionaries)
  • Develop WSD procedures and systems
  • Evaluate system results

7
Machine Readable Dictionaries (MRD)
  • Provide sense repertories for disambiguation
    systems
  • Different dictionaries present different sense
    distinctions for the same word (granularity)
  • Some algorithms use information taken from
    dictionaries
  • The most widely used dictionaries for WSD are WordNet, LDOCE, and Hector

8
Choosing the right sense
  • he cashed a check at the bank

Fine-grained dictionary (WordNet):
1. depository financial institution, bank, banking concern, banking company
2. bank -- (sloping land (especially the slope beside a body of water))
3. bank -- (a supply or stock held in reserve for future use (especially in emergencies))
4. bank, bank building
5. bank -- (an arrangement of similar objects in a row or in tiers)
6. savings bank, coin bank, money box, bank
7. bank -- (a long ridge or pile)
8. bank -- (the funds held by a gambling house or the dealer in some gambling games)
9. bank, cant, camber
10. bank -- (a flight maneuver; aircraft tips laterally about its longitudinal axis)

Coarse-grained dictionary (WordNet Domains):
1. ECONOMY (institution or place where money can be saved)
2. GEOGRAPHY (the sloping land beside a body of water)
3. FACTOTUM (an arrangement of similar objects in a row or in tiers)
4. ARCHITECTURE (a slope in the turn of a road)
5. TRANSPORT (a flight maneuver)
9
Evaluation of WSD systems
  • Consists of a comparison between system and human answers
  • Human answers are collected in an annotated
    corpus (Gold Standard)
  • Precision and Recall can be used.
  • Baseline and upper bound can be fixed.

10
Corpora
  • Large collections of texts
  • Sense Annotated
  • Semcor (200,000), DSO (192,000 semantically annotated occurrences of 121 nouns and 70 verbs), Senseval training data (8,699 texts for 73 words), TAL-treebank (80,000)
  • Difficult and expensive to realise.
  • Non Annotated
  • Brown, LOB, Reuters
  • Available in large quantity
  • Uses for WSD
  • To evaluate systems (gold standard)
  • Learning

11
Gold Standard Datasets
Manually sense-tagged corpora with respect to a given dictionary
  • Requirements
  • Sense selections must be made independently by more than one person using the same dictionary; in cases of disagreement a supervisor is called in to choose.
  • Inter-Tagger Agreement must be high enough (more than 80%)

12
Inter-Tagger Agreement (ITA)
  • People often disagree on the sense to be assigned
    to a corpus instance of a word
  • ITA can be evaluated if more than one person made
    the sense selection on the same text
  • It is the percentage of instances on which the annotators make the same choice
  • It can also be evaluated using the Kappa measure (a minimal sketch follows below).
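A minimal sketch of how the Kappa measure corrects raw agreement for chance, in plain Python (the sense labels are toy data; this is illustrative, not the official Senseval scorer):

  from collections import Counter

  def cohen_kappa(a, b):
      """Cohen's Kappa for two annotators; a and b hold one sense label
      per corpus instance."""
      n = len(a)
      p_observed = sum(x == y for x, y in zip(a, b)) / n
      ca, cb = Counter(a), Counter(b)
      # chance agreement: probability that both taggers pick the same label
      p_chance = sum(ca[label] * cb[label] for label in ca) / (n * n)
      return (p_observed - p_chance) / (1 - p_chance)

  # raw ITA is 2/3 here, but Kappa is 0: agreement is no better than chance
  print(cohen_kappa(["bank#1", "bank#2", "bank#1"],
                    ["bank#1", "bank#1", "bank#1"]))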

13
Precision and Recall for WSD
  • Precision = Good / (Good + Bad)
  • Recall = Good / (Good + Bad + Null)
  • where Good is the number of correct answers provided by the system
  • Bad is the number of wrong answers provided by the system
  • Null is the number of cases in which the system doesn't provide any answer
  • Example: with Good = 60, Bad = 20 and Null = 20, precision is 60/80 = 0.75 and recall is 60/100 = 0.60

Note: many systems provide multiple senses for a single instance of a word, so variations of the measures shown above can be used.
14
Classification of WSD systems
  • Unsupervised
  • Knowledge based (WN, dictionaries)
  • Learning from non annotated corpora
  • Supervised
  • Learning from sense annotated corpora (e.g.
    Semcor, DSO, TAL-treebank, and training data)

Many systems make use of mixed techniques to
improve their results.
15
Baselines for a WSD system
  • Very easy (naive) WSD procedures
  • Used to measure the improvement in a WSD system
    performance
  • Represent the lower bound of a WSD system's accuracy.
  • Examples
  • Unsupervised: Random, Simple Lesk
  • Supervised: Most Frequent, Lesk-plus-corpus.

16
Lesk's algorithm (1986)
  • Simple
  • Choose the sense whose dictionary definition and example texts have the most words in common with the words around the instance to be disambiguated.
  • Plus corpus
  • As Simple Lesk, but also considers the words contained in the tagged training data.

Simple Lesk is unsupervised; Lesk-plus-corpus is supervised (a minimal sketch of Simple Lesk follows).
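A minimal sketch of Simple Lesk in Python; the two-sense inventory below is a toy fragment standing in for the real dictionary definitions and example texts:

  def simple_lesk(context_words, senses):
      """Pick the sense whose definition-plus-example text shares the
      most words with the context of the instance to disambiguate."""
      context = set(w.lower() for w in context_words)
      return max(senses,
                 key=lambda s: len(context & set(senses[s].lower().split())))

  senses = {
      "bank#1": "a financial institution; he cashed a check at the bank",
      "bank#2": "sloping land beside a body of water; the bank of the river",
  }
  print(simple_lesk("he cashed a check at the bank".split(), senses))  # bank#1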
17
Is ITA the upper bound for the accuracy of WSD
systems?
  • If a second human agrees with a first only 80% of the time, then it is not clear what it means to say that a program was more than 80% accurate (Kilgarriff, 1998)
  • The debate is still open
  • ITA defines the upper bound of how well a
    computer program can perform (Kilgarriff)
  • Computers could work better than humans (Wilks)
  • If a WSD system achieves a recall higher than ITA, then either the system or the task itself is wrongly designed (our opinion).

18
Outline
  • Word Sense Disambiguation (WSD)
  • Definition of the task
  • Methodological issues
  • Senseval-II Experience
  • Overview
  • Description of some systems

19
SENSEVAL goals
  • Provide a common framework to compare WSD systems
  • Standardise the task (especially evaluation
    procedures)
  • Build and distribute new lexical resources
    (dictionaries and sense tagged corpora)
  • "There are now many computer programs for automatically determining the sense of a word in context (Word Sense Disambiguation or WSD). The purpose of Senseval is to evaluate the strengths and weaknesses of such programs with respect to different words, different varieties of language, and different languages." (from http://www.sle.sharp.co.uk/senseval2)

20
SENSEVAL History
  • ACL-SIGLEX workshop (1997)
  • Yarowsky and Resnik paper
  • SENSEVAL-I (1998)
  • Lexical Sample for English, French, and Italian
  • SENSEVAL-II (Toulouse, 2001)
  • Lexical Sample and All Words
  • Organization: Kilgarriff (Brighton)
  • SENSEVAL-III (???)
  • Senseval workshop (ACL 2002)

21
WSD at SENSEVAL-II
  • Choosing the right sense for a word among those
    of WordNet

Sense 1: horse, Equus caballus -- (solid-hoofed herbivorous quadruped domesticated since prehistoric times)
Sense 2: horse -- (a padded gymnastic apparatus on legs)
Sense 3: cavalry, horse cavalry, horse -- (troops trained to fight on horseback: "500 horse led the attack")
Sense 4: sawhorse, horse, sawbuck, buck -- (a framework for holding wood that is being sawed)
Sense 5: knight, horse -- (a chessman in the shape of a horse's head; can move two squares horizontally and one vertically (or vice versa))
Sense 6: heroin, diacetyl morphine, H, horse, junk, scag, shit, smack -- (a morphine derivative)

Example instance: Corton has been involved in the design, manufacture and installation of horse stalls and horse-related equipment like external doors, shutters and accessories.
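For illustration, a hedged sketch of listing a word's WordNet senses with NLTK's WordNet interface (NLTK ships a more recent WordNet than the special 1.7 Senseval release, so sense numbering and glosses may differ from the list above; requires nltk.download('wordnet')):

  from nltk.corpus import wordnet as wn

  for i, synset in enumerate(wn.synsets("horse", pos=wn.NOUN), start=1):
      lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
      print(f"Sense {i}: {lemmas} -- ({synset.definition()})")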
22
SENSEVAL-II Schedule
23
SENSEVAL-II Tasks
  • All Words (without training data): Czech, Dutch, English, Estonian
  • Lexical Sample (with training data): Basque, Chinese, Danish, English, Italian, Japanese, Korean, Spanish, Swedish

24
English SENSEVAL-II
  • Organization: Martha Palmer (UPENN)
  • Gold standard: 2 annotators and 1 supervisor (Fellbaum)
  • Interchange data format: XML
  • Sense repository: WordNet 1.7 (special Senseval release)
  • Competitors:
  • All Words: 11 systems
  • Lexical Sample: 16 systems

25
English All Words
  • Data: 3 texts for a total of 1,770 words
  • Average polysemy: 6.5
  • Example: (part of) Text 1

The art of change-ringing is peculiar to the English and, like most English peculiarities, unintelligible to the rest of the world. -- Dorothy L. Sayers, "The Nine Tailors"
ASLACTON, England -- Of all scenes that evoke rural England, this is one of the loveliest: An ancient stone church stands amid the fields, the sound of bells cascading from its tower, calling the faithful to evensong. The parishioners of St. Michael and All Angels stop to chat at the church door, as members here always have.
26
English All Words Systems
  • Supervised (5)
  • S. Sebastian (decision lists in Semcor)
  • UCLA (Semcor, Semantic Distance and Density,
    AltaVista for frequency)
  • Sinequa (Semcor and Semantic Classes)
  • Antwerp (Semcor, Memory Based Learning)
  • Moldovan (Semcor plus an additional sense tagged
    corpus, heuristics)
  • Unsupervised (6)
  • UNED (relevance matrix over a Project Gutenberg corpus)
  • Illinois (Lexical Proximity)
  • Malaysia (MTD, Machine Tractable Dictionary)
  • Litkowski (New Oxford Dictionary and Contextual Clues)
  • Sheffield (Anaphora and WN hierarchy)
  • IRST (WordNet Domains)

27
Fine and coarse grained senses
  • Fine-grained: the answers are compared to the senses from the Gold Standard.
  • Coarse-grained: the answers are mapped to coarse-grained senses and compared to the Gold Standard tags, also mapped to coarse-grained senses.
  • Example: groups for the verb "to use"
  • GROUP 1
  • use1: use, utilize, utilise, apply, employ -- (put into service)
  • use3: use -- (seek or achieve an end)
  • use5: practice, apply, use -- (avail oneself to)
  • GROUP 2
  • use2: use -- (take or consume (regularly))
  • use4: use, expend -- (use up, consume fully ...)
  • GROUP 3
  • use6: use -- (habitually do something)

28
(No Transcript)
29
Lexical Sample
  • Data: 8,699 texts for 73 words
  • Average WN polysemy: 9.22
  • Training data: 8,166 instances (average 118 per word)
  • Baseline (commonest sense): 0.47 precision
  • Baseline (Lesk): 0.51 precision

30
Lexical Sample
Example: to leave

<instance id="leave.130">
<context>
I 'd been seeing Johnnie almost a year now, but I still didn't want to <head>leave</head> him for five whole days.
</context>
</instance>
<instance id="leave.157">
<context>
And he saw them all as he walked up and down. At two that morning, he was still walking -- up and down Peony, up and down the veranda, up and down the silent, moonlit beach. Finally, in desperation, he opened the refrigerator, filched her hand lotion, and <head>left</head> a note.
</context>
</instance>
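A sketch of how such instances could be read with Python's standard library, assuming they are wrapped in a single root element in a file named leave.xml (the file name and wrapping are assumptions, not the actual Senseval distribution layout):

  import xml.etree.ElementTree as ET

  root = ET.parse("leave.xml").getroot()
  for instance in root.iter("instance"):
      context = instance.find("context")
      head = context.find("head")       # the target word to disambiguate
      words = " ".join(context.itertext()).split()
      print(instance.get("id"), "->", head.text, "|", " ".join(words[:10]), "...")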
31
English Lexical Sample Systems
  • Unsupervised (5): Sunderland, UNED, Illinois, Litkowski, ITRI
  • Supervised (12): S. Sebastian, Sinequa, Manning, Pedersen, Korea, Yarowsky, Resnik, Pennsylvania, Barcelona, Moldovan, Alicante, IRST

32
Supervised Techniques
  • Algorithms
  • Decision Lists
  • Boosting
  • Domain Driven Disambiguation
  • ...
  • Features
  • Lexical Context
  • Words
  • Morphological roots
  • Syntactic Context
  • POS bigrams/trigrams
  • Semantic Context
  • Domains
  • ...

33
Decision Lists 1/2
  • Training: lexical context of n words
  • Example: the word "bank"
  • bank1: depository financial institution ...
  • bank2: sloping land ...

34
Decision Lists 2/2
  • The pieces of evidence most strongly indicative of a particular pattern have the largest log-likelihood (strongest and most reliable evidence)
  • The log-likelihood of each piece of evidence takes into account positive and negative examples

Classification of new examples: the highest line in the list that matches the given context wins (a minimal sketch follows).
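A minimal Yarowsky-style decision list sketch in Python for a two-sense word; the smoothing constant and the toy training contexts are illustrative assumptions:

  import math
  from collections import defaultdict

  def train_decision_list(examples, alpha=0.1):
      """examples: (set_of_context_words, sense) pairs, sense in {0, 1}."""
      counts = defaultdict(lambda: [0.0, 0.0])
      for features, sense in examples:
          for f in features:
              counts[f][sense] += 1
      rules = []
      for f, (n0, n1) in counts.items():
          llr = math.log((n1 + alpha) / (n0 + alpha))  # smoothed evidence
          rules.append((abs(llr), f, 1 if llr > 0 else 0))
      return sorted(rules, reverse=True)  # strongest evidence first

  def classify(rules, features, default=0):
      for _, f, sense in rules:
          if f in features:   # the highest matching line wins
              return sense
      return default

  data = [({"cashed", "check"}, 0), ({"river", "currents"}, 1),
          ({"money", "check"}, 0), ({"slope", "river"}, 1)]
  print(classify(train_decision_list(data), {"river", "bank"}))  # -> 1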
35
Boosting 1/2
  • Combine many simple and moderately accurate Weak
    Classifiers (WC)
  • Train WCs sequentially, each on the examples
    which were most difficult to classify by the
    preceding WCs
  • Examples of WCs:
  • preceding_word = house
  • domain = sport
  • ...

36
Boosting 2/2
  • WCi is trained and tested on the whole corpus
  • Each pair (word, synset) is given an importance weight h depending on how difficult it was for WC1, ..., WCi to classify
  • WCi+1 is tuned to classify the worst (word, synset) pairs correctly, and it is tested on the whole corpus
  • so h is updated at each step

At the end all the WCs are combined into a single rule, the combined hypothesis: each WC is weighted according to its effectiveness in the tests (a sketch follows).
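An AdaBoost-flavoured sketch of this scheme in Python; single-feature tests stand in for the weak classifiers, and the data and feature names are illustrative assumptions:

  import math

  def train_boosting(examples, features, rounds=3):
      """examples: (set_of_features, label) pairs with label in {-1, +1}."""
      w = [1.0 / len(examples)] * len(examples)   # importance weights (h)
      committee = []                              # (alpha, feature) pairs
      for _ in range(rounds):
          # pick the weak classifier (feature test) with lowest weighted error
          errs = {f: sum(wi for wi, (x, y) in zip(w, examples)
                         if (1 if f in x else -1) != y) for f in features}
          f = min(errs, key=errs.get)
          e = min(max(errs[f], 1e-9), 1 - 1e-9)
          alpha = 0.5 * math.log((1 - e) / e)     # effectiveness of this WC
          committee.append((alpha, f))
          # raise the weight of misclassified examples, then renormalise
          w = [wi * math.exp(-alpha * y * (1 if f in x else -1))
               for wi, (x, y) in zip(w, examples)]
          w = [wi / sum(w) for wi in w]
      return committee

  def classify(committee, x):
      score = sum(a * (1 if f in x else -1) for a, f in committee)
      return 1 if score >= 0 else -1

  data = [({"preceding_word=house", "domain=sport"}, 1),
          ({"domain=economy"}, -1), ({"domain=sport"}, 1)]
  feats = {"preceding_word=house", "domain=sport", "domain=economy"}
  print(classify(train_boosting(data, feats), {"domain=sport"}))  # -> 1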
37
Domain Driven Disambiguation 1/3
  • Comparison between
  • the domain(s) of each synset of a word
  • the domain(s) of the context where the word
    appears
  • Domain information is collected in Domain Vectors
    having 41 dimensions (one for each domain label)
  • We build
  • Text Vectors
  • Synset Vectors
  • and we compare them using scalar products

38
Domain Driven Disambiguation 2/3
Example:
Bank1: depository financial institution ...
Bank2: sloping land ...
TEXT: "He cashed a check at the bank"
Scores (scalar products): Bank1 = 1.731878, Bank2 = 0.06185
  • The module of a Synset Vector is proportional to its frequency (in Semcor or in other training data)
  • The direction is indicative of the contribution of its domain(s)

39
Domain Driven Disambiguation 3/3
  • Obtaining Text Vectors:
  • a text categorisation technique based on the WordNet Domains resource
  • Obtaining Synset Vectors:
  • from training data
  • from manual annotation (WordNet Domains)
  • (a minimal scoring sketch follows)
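A hedged sketch of the scoring step in Python; the two-sense inventory, the domain weights, and the 2-dimensional vectors (41 dimensions in the real system) are illustrative assumptions:

  def dot(u, v):
      """Scalar product of two sparse domain vectors."""
      return sum(u.get(d, 0.0) * v.get(d, 0.0) for d in u)

  text_vector = {"ECONOMY": 0.9, "GEOGRAPHY": 0.1}  # "He cashed a check ..."
  synset_vectors = {
      "bank#1": {"ECONOMY": 1.7},      # module grows with training frequency
      "bank#2": {"GEOGRAPHY": 0.3},
  }
  scores = {s: dot(text_vector, v) for s, v in synset_vectors.items()}
  print(max(scores, key=scores.get))   # -> bank#1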

40
(No Transcript)
41
Discussion about IRST Results
  • Domain Driven Disambiguation cannot be successfully applied to words that do not carry relevant domain information,
  • for instance:
  • factotum words (i.e. words with many generic senses, e.g. the verb "to be")
  • words whose senses have domains that are far from the relevant ones in the context
  • In these cases the system gives no answer; this explains the low recall

42
IRST Results at SENSEVAL-II
43
(No Transcript)
44
Word Sense Disambiguation at Senseval-II
  • Bernardo Magnini, Carlo Strapparava
  • Giovanni Pezzulo and Alfio Gliozzo
  • ITC-irst, Centro per la Ricerca Scientifica e
    Tecnologica
  • Povo (Trento) - Italy
  • {magnini, strappa, pezzulo, gliozzo}@itc.it

45
Domain Driven Disambiguation
  • Semantic domains play an important role in the
    disambiguation process
  • Underlying assumption
  • Knowing in advance the relevant semantic
    domain(s) of a text makes word sense
    disambiguation easier

46
Domain Information 1/5
From the plush Connolly hide leather sofa and
chairs in the living room to the Bang and Olufsen
stereo, and remote control television complete
with video, you're surrounded by the HIGHEST
QUALITY. The inlaid chequerboard top of the
coffee table houses all kind of games, including
backgammon, chess and Scrabble. You'll also find
a selection of books, from Queen Victoria's
Highland journals, to the very latest bestselling
thriller. The dinner table and chairs are
elegant yet comfortable, and you can be assured
of the finest tableware and crystal for meals at
home.
1. FURNITURE: chair -- (a seat for one person)
2. UNIVERSITY: professorship, chair -- (the position of professor)
3. ADMINISTRATION: president, chairman, chairwoman, chair, chairperson
4. LAW: electric chair, chair, death chair, hot seat

Domains evoked by the text: FURNITURE, PLAY, LITERATURE
47
Domain Information 2/5
(Same passage as in 1/5; in the original slide, domain-relevant words are highlighted)
48
Domain Information 3/5
(Same passage, with further domain-relevant words highlighted)
49
Domain Information 4/5
(Same passage, with further domain-relevant words highlighted)
50
Domain Information 5/5
(Same passage, with further domain-relevant words highlighted)
51
Domain Information Sources
  • Annotated WordNet (WordNet Domains)
  • ontology-based (according to the WordNet
    hierarchical structure)
  • focused on technical senses (e.g. believe)
  • Categorised corpora
  • word clustering reflects the distribution of words over texts
  • focused on common use

52
WordNet Domains
  • Integrates taxonomic and domain-oriented information
  • Cross-hierarchy relations:
  • doctor#2 [MEDICINE] --> person#1
  • hospital#1 [MEDICINE] --> location#1
  • Cross-category relations: operate#3 [MEDICINE]
  • Cross-language information

53
Polysemy Reduction
(Figure: polysemy reduction across domain labels such as PUBLISHING, RELIGION, THEATER, COMMERCE and FACTOTUM)
54
Semantic Domains Organization
  • 250 domain labels collected from dictionaries
  • Four-level hierarchy (Dewey Decimal Classification)
  • 41 basic domains used for Senseval

55
WordNet Domains Statistics 1/2
56
WordNet Domains Statistics 2/2
57
Domain Overlapping
(Figure: domain overlap between ALIMENTATION (supermarket, recipe, restaurant, cooking, food, eating, fork, kitchen, drinking) and MEDICINE (hospital, illness, doctor), with words such as diet and bulimia in the overlap)
58
This was another difficult verb to group, possibly even more difficult than "match" (and thankfully less polysemous!). The problem with grouping "use" -- and I remember encountering this in my tagging -- is that the various senses of "use" sort of shade off into one another, so that the boundaries are fuzzy even for verbs. In fact, of all the verbs I tagged, this one is the murkiest. Ultimately, these groups are almost artificial. This is true for any grouping assignment really, but in this case the artifice is worn on the sleeve.

GROUP ONE. If the sense seemed to be fairly explicit about the existence of an inherent function or purpose, I grouped it here. This ended up being the general all-purpose when-in-doubt-tag-to-this-sense sense (that would be sense 1), as well as the specific exploitative sense where the subject is using the direct object to further his own advantage (sense 3) and the sense which refers to using more abstract sorts of principles (sense 5).

GROUP TWO. If the sense seemed to imply that the thing being used was a commodity, and that it was being consumed, I put it here. This ended up being the drug-addict sense (sense 2) and the deplete sense (sense 4).

SENSE 6. I love sense 6. Since it is really only an aspectual marker, you can't group it with anything, and no matter how lumpy you might be, you can't argue that it should be grouped anywhere.