Word Sense Disambiguation

Transcript and Presenter's Notes
1
Word Sense Disambiguation
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
2
WSD: Outline
  • Setting
  • Unsupervised WSD systems
  • Supervised WSD systems
  • Using the Web and EWN for WSD

3
Using the Web and EWN for WSD: Setting
  • Word Sense Disambiguation
  • is the problem of assigning the appropriate
    meaning (sense) to a given word in a text
  • WSD is perhaps the great open problem at the
    lexical level of NLP (Resnik & Yarowsky 97)
  • WSD resolution would allow
  • acquisition of subcategorisation structure,
    parsing
  • improving existing Information Retrieval
  • Machine Translation
  • Natural Language Understanding

4
Using the Web and EWN for WSD: Setting
  • Example
  • Senses (WordNet 1.5)
  • age 1: the length of time something (or someone)
    has existed ("his age was 71"; "it was replaced
    because of its age")
  • age 2: a historic period ("the Victorian age";
    "we live in a litigious age")
  • DSO corpus examples (Ng 96)
  • He was mad about stars at the >> age 1 << of
    nine.
  • About 20,000 years ago the last ice >> age 2 <<
    ended.

5
Using the Web and EWN for WSD: Setting
  • Knowledge-Driven WSD (Unsupervised)
  • knowledge-based WSD
  • 100% coverage
  • 55% accuracy (SensEval-1)
  • No Training Process
  • Large scale lexical knowledge resources
  • WordNet
  • MRDs
  • Thesaurus

6
Using the Web and EWN for WSD: Setting
  • Data-Driven (Supervised)
  • corpus-based WSD
  • statistical-based WSD
  • Machine-Learning WSD
  • no full coverage
  • 75% accuracy (SensEval-1)
  • Training Process
  • learning from large amount of sense annotated
    corpora
  • (Ng 97): an estimated effort of 16 man-years per
    language

7
Unsupervised Word Sense Disambiguation Systems
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
8
Unsupervised WSD Systems: Outline
  • Setting
  • Knowledge-driven WSD methods
  • MRDs
  • Thesauri and Corpus
  • LKBs
  • LKBs: Conceptual Distance
  • LKBs: Conceptual Density
  • LKBs and Corpus
  • Experiments: Genus Sense Disambiguation
  • Future Work

9
Unsupervised WSD Systems: Setting
  • Knowledge-Driven (Unsupervised)
  • No need of large annotated corpora
  • Tested on unrestricted domains (words and
    senses)
  • - Worse results

10
Unsupervised WSD Systems: MRDs
  • Lesk Method
  • (Lesk 86)
  • Counting word overlap between the context and
    the senses of the word (sketched below)
  • (Cowie et al. 92)
  • simulated annealing for overcoming the
    combinatorial explosion, using LDOCE
  • (Wilks & Stevenson 97)
  • simulated annealing
  • 57% accuracy at the sense level

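The overlap idea above can be made concrete with a small sketch. The toy sense glosses and stopword list below are illustrative assumptions, not LDOCE or WordNet data, and the code is a simplified Lesk, not the exact procedure of (Lesk 86) or (Cowie et al. 92).

```python
# Simplified Lesk: choose the sense whose definition shares the most
# (non-stopword) words with the context. Toy glosses, for illustration only.
STOPWORDS = {"the", "of", "a", "in", "that", "and", "or", "at", "was", "he", "such", "as"}

def lesk(context_words, sense_definitions):
    context = {w.lower() for w in context_words} - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        gloss = set(definition.lower().split()) - STOPWORDS
        overlap = len(context & gloss)            # count shared content words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

senses_of_bank = {
    "bank_1": "a financial institution that accepts deposits and lends money",
    "bank_2": "sloping land beside a body of water such as a river",
}
print(lesk("he deposited the money in a bank account".split(), senses_of_bank))  # -> bank_1
```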
11
Unsupervised WSD Systems: MRDs
  • Cooccurrence Word Vectors (sketched below)
  • (Wilks et al. 93)
  • word-context vectors from LDOCE
  • testing a large set of relatedness functions
  • 13 senses of the word bank
  • 45% accuracy
  • (Rigau et al. 97)
  • (Noun) Genus Sense Disambiguation
  • 60% accuracy

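A hedged sketch of the word-vector idea: represent each sense by a bag of salient cooccurring words and score a new context by cosine similarity. The vectors and weights below are toy counts, not the LDOCE-derived data used in (Wilks et al. 93) or (Rigau et al. 97).

```python
# Score each sense vector against the context vector with cosine similarity.
from collections import Counter
from math import sqrt

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def disambiguate(context_words, sense_vectors):
    ctx = Counter(context_words)
    return max(sense_vectors, key=lambda s: cosine(ctx, sense_vectors[s]))

sense_vectors = {                                  # toy cooccurrence weights
    "bank_1": Counter({"money": 5, "account": 3, "loan": 2}),
    "bank_2": Counter({"river": 4, "water": 3, "shore": 2}),
}
print(disambiguate("she deposited money in her account".split(), sense_vectors))  # -> bank_1
```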
12
Unsupervised WSD Systems: MRDs
371,616 connections (cooccurrence entries for "queso"):
  11.8004   9.8  16  elaborado    queso        35  113
  10.8938   8.0  23  pasta        queso       178  113
  10.4846   7.5  25  leche        queso       274  113
  10.2483   9.2  13  oveja        queso        45  113
   9.1513   7.6  16  queso        sabor       113  160
   7.4956   8.3   8  queso        tortilla    113   51
   6.7732   7.5   8  queso        vaca        113   84
   6.5830   6.1  12  maíz         queso       347  113
   6.2208   8.9   5  queso        suero       113   21
   6.1509   8.8   5  mantequilla  queso        22  113
   6.1474   7.9   6  compacta     queso        50  113
   5.9918   7.7   6  picante      queso        55  113
   5.9002   9.8   4  manchego     queso         9  113
   5.6805   7.3   6  cabra        queso        75  113
   5.6300   5.9   9  pan          queso       287  113
13
Unsupervised WSD Systems: Thesauri and Corpus
  • (Yarowsky 92)
  • uses Roget's Thesaurus to partition
    Grolier's Encyclopedia
  • 1042 categories
  • 92% accuracy for 12 polysemous words
  • (Yarowsky 95)
  • seed words
  • (Liddy & Paik 92)
  • subject-code correlation matrix
  • 122 LDOCE semantic codes
  • 166 sentences of the Wall Street Journal
  • 89% correct subject code

14
Unsupervised WSD Systems: LKBs, Conceptual Distance
  • (Rada et al. 92)
  • length of the shortest path
  • (Sussna 93)
  • (Agirre et al. 94)
  • (Rigau 94; Rigau et al. 95, 97; Atserias et al.
    97)
  • length of the shortest path
  • specificity of the concepts

15
Unsupervised WSD Systems: LKBs, Conceptual Density
  • (Agirre & Rigau 95, 96)

16
Unsupervised WSD Systems: LKBs, Conceptual Density
  • (Agirre & Rigau 95, 96), intuition sketched below
  • length of the shortest path
  • the depth in the hierarchy
  • concepts in a dense part of the hierarchy are
    relatively closer than those in a more sparse
    region.
  • the measure should be independent of the number
    of concepts involved

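Only the intuition of Conceptual Density is sketched here: the sense of the target word that falls inside the subhierarchy most densely populated by senses of the context words wins. The formula of (Agirre & Rigau 95, 96) is not reproduced, and the tiny taxonomy below is invented for illustration.

```python
# Density of context senses inside each candidate subhierarchy (intuition only).
def descendants(node, hyponyms):
    """All concepts in the subhierarchy rooted at node, including node itself."""
    stack, seen = [node], {node}
    while stack:
        for child in hyponyms.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def density(root, context_senses, hyponyms):
    sub = descendants(root, hyponyms)
    marks = sum(1 for s in context_senses if s in sub)   # context senses falling inside
    return marks / len(sub)                              # relative to subhierarchy size

# Toy hierarchy separating the two readings of "bank".
hyponyms = {
    "institution": ["bank_1", "school_1"],
    "formation": ["bank_2", "shore_1", "slope_1"],
}
context_senses = ["school_1"]                 # senses of nearby context words
best_root = max(["institution", "formation"],
                key=lambda r: density(r, context_senses, hyponyms))
print(best_root)   # -> institution, so the bank_1 sense inside it is chosen
```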
17
Unsupervised WSD Systems: LKBs and Corpus
  • (Resnik 95)
  • Information Content (sketched below)
  • (Richardson et al. 94)
  • (Jiang & Conrath 97)

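For the information-content line (Resnik 95), the usual formulation is IC(c) = -log P(c), with P estimated from corpus counts propagated up the hierarchy, and similarity taken as the IC of the lowest common subsumer. The counts and mini-taxonomy below are invented for illustration only.

```python
# Resnik-style similarity on a toy taxonomy with made-up, propagated counts.
import math

counts = {"entity": 1000, "animal": 300, "dog": 120, "cat": 80}
parents = {"dog": "animal", "cat": "animal", "animal": "entity", "entity": None}
total = counts["entity"]

def information_content(concept):
    return -math.log(counts[concept] / total)      # IC(c) = -log P(c)

def ancestors(concept):
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parents[concept]
    return chain

def resnik_similarity(c1, c2):
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(a) for a in common)   # IC of the lowest common subsumer

print(resnik_similarity("dog", "cat"))   # IC("animal"), roughly 1.20
```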
18
Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
  • Unsupervised WSD
  • Unrestricted WSD (coverage 100%)
  • Eight Heuristics (McRoy 92)
  • Combining several lexical resources
  • Combining several methods

19
Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
  • 0) Monosemous Genus Term
  • 1) Entry Sense Ordering
  • 2) Explicit Semantic Domain
  • 3) Word Matching (Lesk 86)
  • 4) Simple Concordance
  • 5) Cooccurrence Word Vectors
  • 6) Semantic Vectors
  • 7) Conceptual Distance

20
Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
  • Results

21
Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
  • Knowledge provided by each heuristic

22
Supervised Word Sense Disambiguation Systems
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
23
WSD using ML algorithms: Outline
  • Setting
  • Methodology
  • Machine Learning algorithms
  • Naive Bayes (Mooney 98)
  • Snow (Dagan et al. 97)
  • Exemplar-based (Ng 97)
  • LazyBoosting (Escudero et al. 00)
  • Experimental Results
  • Naive Bayes vs. Exemplar Based
  • Portability and Tuning of Supervised WSD
  • Future Work

24
WSD using ML algorithms: Setting
  • Data-Driven (Supervised)
  • Better results
  • - Need of large corpora
  • knowledge acquisition bottleneck
  • (Gale et al. 93; Ng 97)
  • - Tested on limited domains (words and senses)

25
WSD using ML algorithms: Setting
  • Current research lines to open the bottleneck
  • Design of efficient example sampling methods
    (Engelson & Dagan 96; Fujii et al. 98)
  • Use of WordNet and the Web to automatically obtain
    examples (Leacock et al. 98; Mihalcea & Moldovan
    99)
  • Use of unsupervised methods for estimating
    parameters (Pedersen & Bruce 98)

26
WSD using ML algorithms: Setting
  • Contradictory Previous Work
  • (Mooney 98)
  • Student's t-test of significance
  • n-fold cross-validation
  • - Only the word "line", with 4,149 examples and 6
    senses (Leacock et al. 93).
  • - Neither parameter setting nor algorithm
    tuning
  • (Ng 97)
  • Large corpora (192,800 occurrences of 191
    words)
  • - Direct test (no n-fold cross-validation).
  • - Small set of features.

27
WSD using ML algorithms: Outline
  • Setting
  • Methodology
  • Machine Learning algorithms
  • Naive Bayes (Mooney 98)
  • Snow (Dagan et al. 97)
  • Exemplar-based (Ng 97)
  • LazyBoosting (Escudero et al. 00)
  • Experimental Results
  • Naive Bayes vs. Exemplar Based
  • Portability and Tuning of Supervised WSD
  • Future Work

28
WSD using ML algorithms: Methodology
  • Main goals
  • Study supervised methods for WSD
  • Use them with examples automatically extracted
    from the Web using WordNet
  • Rigorous direct comparisons
  • Supervised WSD Methods
  • Naive Bayes
  • State-of-the-art accuracy (Mooney 98)
  • Snow
  • From Text Categorization (Dagan et al. 97)
  • Exemplar-based
  • State-of-the-art accuracy (Ng 97)
  • Boosting
  • From Text Categorization (Schapire & Singer, to
    appear; Escudero, Màrquez & Rigau 2000)

29
WSD using ML algorithms: Methodology
  • Evaluation (Dietterich 98)
  • 10-fold cross-validation (protocol sketched
    below)
  • Student's t-test of significance
  • Data
  • LDC (Ng 96)
  • 192,800 occurrences of 191 words
  • (121 nouns, 70 verbs)
  • Avg. number of senses: 7.2 (N), 12.6 (V), 9.2 (all)
  • WSJ Corpus (Corpus A)
  • Brown Corpus (Corpus B)
  • Sets of attributes
  • Set A (Ng 97)
  • Small set of features
  • No broad-context attributes
  • Set B (Ng 96)
  • Large set of features
  • Broad-context attributes

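A hedged sketch of this evaluation protocol (10-fold cross-validation plus a paired Student's t-test over the folds, in the spirit of Dietterich 98), using scikit-learn and scipy as stand-ins. X and y stand for the feature vectors and sense labels built from the DSO examples, which are not shown here.

```python
# Compare two WSD classifiers with 10-fold CV and a paired t-test on fold accuracies.
from scipy.stats import ttest_rel
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB          # stand-in for Naive Bayes
from sklearn.neighbors import KNeighborsClassifier     # stand-in for exemplar-based

def compare(X, y, n_splits=10, seed=0):
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    nb_scores = cross_val_score(MultinomialNB(), X, y, cv=cv)
    eb_scores = cross_val_score(KNeighborsClassifier(n_neighbors=15), X, y, cv=cv)
    t_stat, p_value = ttest_rel(nb_scores, eb_scores)   # paired over the 10 folds
    return nb_scores.mean(), eb_scores.mean(), p_value
```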
30
WSD using ML algorithms: Outline
  • Setting
  • Methodology
  • Machine Learning algorithms
  • Naive Bayes (Mooney 98)
  • Snow (Dagan et al. 97)
  • Exemplar-based (Ng 97)
  • LazyBoosting (Escudero et al. 00)
  • Experimental Results
  • Naive Bayes vs. Exemplar Based
  • Portability and Tuning of Supervised WSD
  • Future Work

31
WSD using ML algorithms: Naive Bayes
  • Based on Bayes' theorem (Duda & Hart 73)
  • Frequencies used as probabilities
  • Assumed independence of example features
  • Smoothing technique (Ng 97); a minimal sketch
    follows below

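A minimal sketch of such a Naive Bayes sense classifier, assuming examples are (feature list, sense) pairs; the add-eps smoothing below is a stand-in, not the exact smoothing of (Ng 97).

```python
# Pick the sense s maximizing P(s) * prod_i P(f_i | s), estimated from frequencies.
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    def __init__(self, eps=0.1):
        self.eps = eps                              # smoothing constant (assumed)
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):                      # examples: iterable of (features, sense)
        for features, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(features)
            self.vocab.update(features)

    def classify(self, features):
        total = sum(self.sense_counts.values())
        def log_score(sense):
            score = math.log(self.sense_counts[sense] / total)     # prior from frequencies
            n = sum(self.feature_counts[sense].values())
            for f in features:
                # smoothing keeps unseen (feature, sense) pairs from zeroing the product
                score += math.log((self.feature_counts[sense][f] + self.eps)
                                  / (n + self.eps * len(self.vocab)))
            return score
        return max(self.sense_counts, key=log_score)
```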
32
WSD using ML algorithms: Exemplar-based WSD
  • k-NN approach (Ng 96; Ng 97), sketched below
  • Distances
  • Hamming
  • Modified Value Difference Metric
  • MVDM (Cost & Salzberg 93)
  • Variants
  • Example weighting
  • Attribute weighting (RLM 91)

(figure: k-NN example with k = 3)
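A minimal sketch of the exemplar-based approach with Hamming distance over discrete attributes and majority voting; example weighting and the MVDM metric would replace the plain distance below. The tiny training set is invented, not Ng's data.

```python
# k-NN over discrete attribute tuples with Hamming distance and majority vote.
from collections import Counter

def hamming(a, b):
    """Number of attribute positions in which two examples differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def knn_classify(example, training, k=3):
    """training: list of (attribute_tuple, sense); vote among the k nearest."""
    nearest = sorted(training, key=lambda item: hamming(example, item[0]))[:k]
    return Counter(sense for _, sense in nearest).most_common(1)[0][0]

training = [                         # toy (w-1, w+1) attributes around "age"
    (("the", "of"), "age_1"),
    (("its", "old"), "age_1"),
    (("ice", "ended"), "age_2"),
    (("ice", "glacial"), "age_2"),
]
print(knn_classify(("ice", "melted"), training, k=3))   # -> age_2
```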
33
WSD using ML algorithms: Snow
  • Snow (Golding & Roth 99)
  • Sparse Network of Winnows
  • On-line learning system
  • Winnow (Littlestone 88), sketched below
  • linear threshold unit
  • mistake-driven updates (only when the predicted
    class is wrong)

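A hedged sketch of one Winnow node of the kind Snow trains per sense: a linear threshold over the active binary features with multiplicative, mistake-driven updates. The promotion/demotion constants and the threshold are illustrative, not the Snow defaults.

```python
# One Winnow unit: promote/demote weights of active features only, and only on mistakes.
class Winnow:
    def __init__(self, n_features, threshold=None, alpha=2.0, beta=0.5):
        self.w = [1.0] * n_features
        self.threshold = threshold if threshold is not None else n_features / 2.0
        self.alpha, self.beta = alpha, beta        # promotion / demotion factors (assumed)

    def predict(self, active):                     # active: indices of features that fire
        return sum(self.w[i] for i in active) >= self.threshold

    def update(self, active, label):               # label: True if this sense is correct
        if self.predict(active) == label:
            return                                 # mistake-driven: no change when correct
        factor = self.alpha if label else self.beta
        for i in active:                           # update only the active features
            self.w[i] *= factor
```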
34
WSD using ML algorithms: Snow
(figure: Snow architecture, one Winnow node per sense feeding a MAX decision,
with word features such as w-1 = "average", w+2 = "42", w+1 = "of",
w+2 = "nuclear", taken from the contexts "... an average <age_1> of 42 ..."
and "... in this <age_2> of nuclear ...")
35
WSD using ML algorithms: Boosting
  • AdaBoost.MH (Freund & Schapire 00)
  • Combine many simple weak classifiers
    (hypotheses)
  • Weak classifiers are trained sequentially
  • Each iteration concentrates on the most difficult
    cases
  • Results: better than NB and EB
  • - Problem: computational complexity
  • Time and space grow linearly with the number of
    examples.
  • Solution: LazyBoosting! (sketched below)

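A hedged sketch of the LazyBoosting trick for one sense treated as a binary problem (labels in {+1, -1}): at every boosting round the weak learner looks only at a small random fraction of the attributes. The weak learner here is a plain decision stump over binary attributes and the reweighting is standard binary AdaBoost, not the full AdaBoost.MH of the system described here.

```python
# Boosting where each round inspects only a random subset of the attributes.
import math
import random

def weighted_error(attr, examples, weights):
    # decision stump: predict +1 if the binary attribute is active, else -1
    return sum(w for (x, y), w in zip(examples, weights)
               if (1 if x[attr] else -1) != y)

def lazyboost(examples, n_attrs, rounds=100, fraction=0.1, seed=0):
    """examples: list of (binary attribute vector, label in {+1, -1})."""
    rng = random.Random(seed)
    weights = [1.0 / len(examples)] * len(examples)
    ensemble = []                                    # list of (attribute, alpha)
    for _ in range(rounds):
        # LazyBoosting: examine only a small random fraction of the attributes
        candidates = rng.sample(range(n_attrs), max(1, int(fraction * n_attrs)))
        attr = min(candidates, key=lambda a: weighted_error(a, examples, weights))
        err = min(max(weighted_error(attr, examples, weights), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((attr, alpha))
        # concentrate on the hardest cases: raise weights of misclassified examples
        weights = [w * math.exp(-alpha * y * (1 if x[attr] else -1))
                   for (x, y), w in zip(examples, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble
```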
36
WSD using ML algorithms: Outline
  • Setting
  • Methodology
  • Machine Learning algorithms
  • Naive Bayes (Mooney 98)
  • Snow (Dagan et al. 97)
  • Exemplar-based (Ng 97)
  • LazyBoosting (Escudero et al. 00)
  • Experimental Results
  • Naive Bayes vs. Exemplar Based
  • Portability and Tuning of Supervised WSD
  • Future Work

37
WSD using ML algorithms: Experimental Results (LazyBoosting)
  • Features from Set A (Ng 97)
  • w-2, w-1, w+1, w+2, (w-2, w-1), (w-1, w+1),
    (w+1, w+2)
  • 15 reference words (10 N, 5 V)
  • Average
                  senses  examples  attributes
    nouns (121)     8.6     1040       3978
    verbs (70)     17.9     1266       4432
    total (191)    12.1     1115       4150
  • Accuracy (%)
                  MFS   NB    EB1   EB15  AB750  ABSC
    nouns (121)  57.4  71.7  65.8  71.1  73.5   73.4
    verbs (70)   46.6  57.6  51.1  58.1  59.3   59.1
    total (191)  53.3  66.4  60.2  66.2  68.1   68.0

38
WSD using ML algorithms: Experimental Results (LazyBoosting)
  • Accelerating the WeakLearner
  • Reducing Feature Space
  • Frequency filtering (Freq)
  • Discard those features occurring less than N
    times
  • Local frequency filtering (LFreq)
  • Selects the N most frequent features of each
    sense
  • RLM ranking (López de Mantaras 91)
  • Selects the N most relevant features
  • Reducing the number of Attributes examined
  • LazyBoosting
  • A small proportion of attributes are randomly
    selected at each iteration

39
WSD using ML algorithms: Experimental Results (LazyBoosting)
  • Accelerating the WeakLearner
  • All methods perform quite well
  • many irrelevant attributes in the domain
  • LFreq is slightly better than Freq
  • RLM performs better than LFreq and Freq
  • LazyBoosting is better than all other methods
  • acceptable performance with 1% of exploration
    when looking for a weak rule
  • 10% achieves the same performance as 100%
  • 7 times faster!

40
WSD using ML algorithms: Experimental Results (LazyBoosting)
  • 7 features from Set A (Ng 97)
  • w-2, w-1, w+1, w+2, (w-2, w-1), (w-1, w+1),
    (w+1, w+2)
  • 15 reference words (10 N, 5 V)
  • Average
                  senses  examples  attributes
    nouns (121)     8.6     1040       3978
    verbs (70)     17.9     1266       4432
    total (191)    12.1     1115       4150
  • Accuracy (%)
                  MFS   NB    EB15  LB10SC
    nouns (121)  56.4  68.7  68.0   70.8
    verbs (70)   46.7  64.8  64.9   67.5
    total (191)  52.3  67.1  66.7   69.5

41
WSD using ML algorithms: Experimental Results (NB vs EB)
  • Experiments on Set A with 15 words
  • Results
  • Conclusions
  • NB and EB are better than MFS
  • k-NN performs better with k > 1
  • Variants of EB improve the basic EB
  • The MVDM (Cost & Salzberg) metric is better than
    the Hamming distance
  • EB performs better than NB

42
WSD using ML algorithms: Experimental Results (NB vs EB)
  • Experiments on Set B with 15 words
  • Results
  • What happened?
  • Problem with the binary representation of the
    broad-context attributes.
  • Examples are represented with sparse vectors
    (5,000 positions).
  • Any two examples coincide in the majority of
    values.
  • This biases the similarity measure in favour of
    the shortest sentences.
  • Related work clarified
  • (Mooney 98)
  • Poor results of the k-NN algorithm
  • (Ng 96; Ng 97)
  • Lower results of a system with a large number of
    attributes

43
WSD using ML algorithms: Experimental Results (NB vs EB)
  • Improving both methods (NB and EB) (Escudero et
    al. 00b)
  • Use only positive information
  • Treat the broad-context attributes as
    multivalued attributes
  • Given two values of a multivalued attribute, the
    similarity S between them has to be redefined
    accordingly
  • This representation allows a very
    computationally efficient implementation
    (sketched below)
  • Positive Naive Bayes (PNB)
  • Positive Exemplar-based (PEB)

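A hedged sketch of the positive-information idea applied to Naive Bayes: the broad context is kept as a multivalued attribute (a set of words), and only the words actually present in the example contribute to the score, instead of iterating over thousands of mostly-zero binary attributes. The smoothing below is a crude stand-in, and the redefined similarity of (Escudero et al. 00b) is not reproduced.

```python
# "Positive" Naive Bayes: score a sense using only the words present in the context.
import math
from collections import Counter, defaultdict

class PositiveNB:
    def __init__(self, eps=0.1):
        self.eps = eps
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)     # sense -> context-word counts

    def train(self, examples):                      # examples: (set of context words, sense)
        for words, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(words)

    def classify(self, words):
        total = sum(self.sense_counts.values())
        def log_score(sense):
            n = sum(self.word_counts[sense].values()) + 1
            score = math.log(self.sense_counts[sense] / total)
            for w in words:                         # positive information only
                score += math.log((self.word_counts[sense][w] + self.eps) / n)
            return score
        return max(self.sense_counts, key=log_score)
```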
44
WSD using ML algorithms: Experimental Results (NB vs EB)
  • Experiments on Set B with 15 words
  • Results
  • Conclusions
  • PEB improves the accuracy of EB by 12.2 points
  • PEB accuracy is higher than on Set A, except for
    the PEB(h,10,e,a) variant
  • PNB is at least as accurate as NB
  • The positive approach greatly increases the
    efficiency of the algorithms (80 times for NB and
    15 times for EB)
  • PEB accuracy is higher than PNB

45
WSD using ML algorithms: Experimental Results (NB vs EB)
  • Global Results (191 words)
  • Conclusions
  • In Set A,
  • The best option is Exemplar-based using MVDM
    metric
  • In Set B,
  • The best option is Exemplar-based using Hamming
    distance and example weighting
  • MVDM metric has higher accuracy but is currently
    computationally prohibitive
  • Positive Exemplar-based allows the addition of
    unordered contextual attributes with an accuracy
    improvement
  • Positive information greatly improves the
    efficiency

46
WSD using ML algorithms: Experimental Results (Portability)
  • 15 features from Set A (Ng 96)
  • p-3, p-2, p-1, p+1, p+2, p+3, w-1, w+1, (w-2,
    w-1), (w-1, w+1), (w+1, w+2), (w-3, w-2, w-1),
    (w-2, w-1, w+1), (w-1, w+1, w+2), (w+1, w+2,
    w+3)
  • 21 reference words (13 N, 8 V)
  • DSO Corpus
  • Wall Street Journal (Corpus A)
  • Brown Corpus (Corpus B)
  • 7 combinations of training-test sets
  • AB-AB, AB-A, AB-B
  • A-A, B-B, A-B, B-A
  • Forcing the number of examples of corpora A and B
    to be the same (reducing to the size of the smallest)

47
WSD using ML algorithms: Experimental Results (Portability)
First Experiment (% accuracy)

    Method  AB-AB  AB-A  AB-B
    MFC      46.6  53.0  39.2
    NB       61.6  67.3  55.9
    EB       63.0  69.0  57.0
    Snow     60.1  65.6  56.3
    LB       66.3  71.8  60.9

    Method   A-A   B-B   A-B   B-A
    MFC     56.0  45.5  36.4  38.7
    NB      65.9  56.8  41.4  47.7
    EB      69.0  57.4  45.3  51.1
    Snow    67.1  56.1  44.1  49.8
    LB      71.3  59.0  47.1  52.0
48
WSD using ML algorithms: Experimental Results (Portability)
  • Conclusions of First Experiment
  • LazyBoosting outperforms all other methods in all
    cases
  • the knowledge acquired from a single corpus
    almost covers the knowledge of combining both
    corpora
  • Very disappointing results!
  • Looking at Kappa values
  • NB is the most similar to MFC
  • LB is the most similar to DSO
  • LB is the most dissimilar to MFC

49
WSD using ML algorithms: Experimental Results (Portability)
  • Second Experiment
  • Adding tuning material
  • BA-A, AB-B, A-A, B-B
  • ranging from 10% to 50% (50% remaining for test)
  • For NB, EB, Snow it is not worth keeping the
    original corpus
  • LB has a moderate (but consistent) improvement
    when retaining the original training set

50
WSD using ML algorithms: Experimental Results (Portability)
  • Third Experiment
  • Two main reasons
  • Corpora A and B have very different distributions
    of senses
  • Examples from corpora A and B contain different
    information
  • New sense-balanced corpus
  • Forcing the number of examples of each sense of
    corpora A and B to be the same (reducing to the
    size of the smallest)

51
WSD using ML algorithms: Experimental Results (Portability)
  • Third Experiment (% accuracy)

    Method  AB-AB  AB-A  AB-B
    MFC      48.6  48.6  48.5
    LB       64.4  66.2  62.5

    Method   A-A   B-B   A-B   B-A
    MFC     48.6  48.5  48.7  48.7
    LB      65.2  61.7  56.1  58.0

  • Even when the same distribution of senses is
    preserved between training and test examples,
    portability is not guaranteed!

52
WSD using ML algorithms: Outline
  • Setting
  • Methodology
  • Machine Learning algorithms
  • Naive Bayes (Mooney 98)
  • Snow (Dagan et al. 97)
  • Exemplar-based (Ng 97)
  • LazyBoosting (Escudero et al. 00)
  • Experimental Results
  • Naive Bayes vs. Exemplar Based
  • Portability and Tuning of Supervised WSD
  • Future Work

53
WSD using ML algorithms: Future Work
  • Other methods (SVMs, DLs, ...)
  • Other corpora (Semcor, Senseval, Bruce, ...)
  • Comparison with unsupervised methods
  • Combination of classifiers
  • Search of the optimum set of features for each
    method
  • Try new sets of features (semantic features,
    ...)
  • The 3 research lines for solving the knowledge
    acquisition bottleneck
  • Other tagsets (synsets, semantic fields, base
    concepts, groups of synsets, ...)

54
Using the Web and EuroWordNet for Word Sense Disambiguation
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
55
Using the Web and EWN for WSD: Outline
  • Setting
  • Exploiting EWN Semantic Relations
  • Collecting training Corpus from the Web

56
Using the Web and EWN for WSD: Setting
  • Our approach
  • Unsupervised
  • Automatically obtain training corpora
  • using the Web or on-line corpora
  • to feed a supervised ML WSD system

57
Using the Web and EWN for WSD: Outline
  • Setting
  • Exploiting EWN Semantic Relations
  • Collecting training Corpus from the Web

58
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
  • WordNet
  • WordNet is organized conceptually
  • 123,497 content words
  • 11,514 polysemous
  • 99,642 synsets

wine, vino -- (fermented juice (of grapes especially))
  => sake, saki -- (Japanese beverage from fermented rice ...)
  => vintage -- (a season's yield of wine from a vineyard)
  => red wine -- (wine having a red color derived from skins ...)
  => Pinot noir -- (dry red California table wine ...)
  => claret, red Bordeaux -- (dry red Bordeaux or Bordeaux-like wine)
  => Saint Emilion -- (full-bodied red wine from ...)
  => Chianti -- (dry red Italian table wine from the Chianti ...)
  => Cabernet, Cabernet Sauvignon -- (superior Bordeaux-type red wine)
  => Rioja -- (dry red table wine from the Rioja ...)
  => zinfandel -- (dry fruity red wine from California)
59
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
  SR          PoS   Examples
  Synonymy    Noun  coche, automóvil
              Verb  salir, pasear
              Adj   feliz, contento
              Adv   duramente, severamente
  Hyponymy    Noun  coche -> vehículo
  Meronymy    Noun  motor -> coche
  Troponymy   Verb  marchar -> caminar
  Entailment  Verb  roncar -> dormir

60
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
61
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
partido 1 (political party):
  Todos los partidos piden reformas legales para TV3.
  La derecha planea agruparse en un partido.
  El diputado reiteró que ni él ni UDC, como partido, han recibido dinero de Pellerols.
partido 2 (match, game):
  Pero España puso al partido intensidad, ritmo y coraje.
  El seleccionador cree que el partido de hoy contra Italia dará la medida de España.
  El Racing no gana en su campo desde hace seis partidos.
62
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
partido 1 (political party):
  No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
  Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
  Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.
partido 2 (match, game):
  Rivera pide el soporte de la afición para encarrilar las semifinales.
  Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
  El Racing ganó los cuartos de final en su campo.
63
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
  • 11,514 polysemous words
  • 1 sense

            synonym  brother  father  daughter  grandchild
    1 step     2095     8903    3894       759         116
    2 step                 3    1331        16           3
    3 step                       512
    4 step                       147
    5 step                        43
    total      2905     8906    5927       775         119

64
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
  • 11,514 polysemous words
  • 2 senses

            synonym  brother  father  daughter  grandchild
    1 step      479     6988     584       408          87
    2 step                24      97         8           2
    3 step                         9
    4 step                         3
    total       479     7012     693       417          89

65
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
11,514 polysemous words, 3 senses

            synonym  brother  father  daughter  grandchild
    1 step      108     5640      76       239          59
    2 step                22                  6           1
    total       108     5662      76       245          60
66
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
11,514 polysemous words, 1 sense

              SB     SD    SBD   SBDF  SBDFC
    1 step  8903   3461   9257  10284  10284
    2 step     3     34    188   1068   1068
    3 step            2     30    137    137
    4 step                   4     19     19
    total   8906   3487   9479  11508  11508
67
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
11,514 polysemous words, 2 senses

              SB     SD    SBD   SBDF  SBDFC
    1 step  7580   1282   8048   8891   8899
    2 step   281     16    461   1196   1213
    3 step    11      1     33    264    245
    4 step                   2     80     74
    5 step                         13     13
    6 step                          2      2
    total   7872   1299   8544  10446  10446
68
Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
  • 11,514 polysemous words
  • 3 senses

              SB     SD    SBD   SBDF  SBDFC
    1 step  6116    568   6691   7657   7673
    2 step   274      5    482   1030   1039
    3 step            5     46    295    311
    4 step                   7     91     78
    5 step                   1     28     12
    6 step                          3      3
    total   6395    573   7230   9104   9113

69
Using the Web and EWN for WSD: Outline
  • Setting
  • Exploiting EWN Semantic Relations
  • Collecting training Corpus from the Web

70
Using the Web and EWN for WSD: Collecting Training Corpus from the Web
  • (Mihalcea & Moldovan 99)
  • Search engine: Altavista
  • Complex queries
  • synonyms
  • definitions
  • 120 word senses
  • 91% precision
  • Example (query construction sketched below)
  • <grow, raise, farm, produce> (cultivate by
    growing)
  • cultivate NEAR growing AND (grow OR raise OR farm
    OR produce)
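A small sketch of how a query like the one above could be assembled from a synset: take its synonyms and one or two salient gloss words and combine them with the boolean/NEAR operators of the Altavista-style example on the slide. The synset data is hard-coded here for illustration.

```python
# Build a sense-targeted search query from synset synonyms and gloss keywords.
def build_query(synonyms, gloss_keywords):
    synonym_part = " OR ".join(synonyms)
    gloss_part = " NEAR ".join(gloss_keywords)
    return f"{gloss_part} AND ({synonym_part})"

print(build_query(["grow", "raise", "farm", "produce"], ["cultivate", "growing"]))
# -> cultivate NEAR growing AND (grow OR raise OR farm OR produce)
```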