Comparing Corpus Co-Occurrence, Dictionary and Wikipedia Entries as Resources for Semantic Relatedness Information - PowerPoint PPT Presentation

About This Presentation
Description:

Title: Relationen zwischen Nomen und ihren Assoziationen (Relations between nouns and their associations). Author: Cil. Last modified by: Michael Roth. Created: 5/31/2006 7:16:34 AM. Document presentation format: PowerPoint PPT presentation.


Transcript and Presenter's Notes

1
Comparing Corpus Co-Occurrence, Dictionary and
Wikipedia Entries as Resources for Semantic
Relatedness Information
  • Michael Roth
  • Sabine Schulte im Walde
  • Universität Stuttgart
2
Overview
  • Motivation / Introduction
  • Data-intensive lexical semantics
  • Corpus-based descriptions
  • Semantic Associations
  • Our Work
  • Evaluation of data-driven models
  • Cross-comparison between resources
  • Summary / Conclusions

3
Data-intensive lexical semantics
  • Modelling word meaning
  • Using meaning aspects
  • Automatically obtainable
  • Goal: Determine (dis)similarity of words
  • Applications
  • Word sense discrimination
  • Anaphora resolution
  • ...

4
Corpus-based Descriptions
  • Disadvantage: Corpus co-occurrence does not cover
    all aspects of word meaning
  • Especially world knowledge
  • Our question: Can we find complementary
    information in other resources?
  • Dictionaries?
  • Encyclopaedias?

5
Dictionary and Encyclopaedia
  • Consider other resources
  • Dictionaries contain detailed information
    about word senses
  • Encyclopaedias: written knowledge compendiums
  • How to identify meaning aspects?
  • In our work, we rely on semantic associations

6
Semantic Associations
  • Definition
  • We define semantic associations as concepts
    spontaneously called to mind by other concepts
    (stimuli)
  • Assumption
  • Evoked words reflect highly salient linguistic
    and conceptual features

7
Data Collection: Verb Stimuli
  • Associates to verb stimuli
  • Web experiment
  • 330 verb stimuli
  • 30 seconds per verb

Stimulus: klagen (complain, moan, sue)
Associate     Gloss          Count
Gericht       court          19
jammern       moan           18
weinen        cry            13
Anwalt        lawyer         11
Richter       judge          9
Klage         complaint      7
Leid          suffering      6
Trauer        mourning       6
Klagemauer    Wailing Wall   5
laut          noisy          5
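
The counts in a table like the one above can be obtained by tallying the raw responses per stimulus. The following is a minimal sketch, assuming the responses arrive as a flat list of words; the function name and the toy responses are hypothetical and are not taken from the slides.

```python
from collections import Counter

def association_strengths(responses):
    """Count how often each associate was given for one stimulus.

    `responses` is a flat list of associate words, one item per response
    (an assumed input format, not described in the slides).
    """
    # Strip whitespace only; German nouns keep their capitalisation.
    cleaned = [r.strip() for r in responses if r.strip()]
    return Counter(cleaned)

# Toy responses for the stimulus "klagen" (invented for illustration).
responses = ["Gericht", "jammern", "Gericht", "weinen", "jammern", "Gericht"]
strengths = association_strengths(responses)
top = strengths.most_common(2)  # the two most frequent associates
```

Sorting by count, as `most_common` does, gives the ranked layout used in the slide tables.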
8
Data Collection: Noun Stimuli
  • Associates to noun stimuli
  • Offline experiment
  • 409 noun stimuli
  • 3 associates per noun

Stimulus: Schloss (castle, lock)
Associate     Gloss          Count
Schlüssel     key            51
Tür           door           15
Prinzessin    princess       8
Burg          castle         8
sicher        safe           7
Fahrrad       bike           7
schließen     close          7
Keller        cellar         7
König         king           7
Turm          tower          6
9
Knowledge Resources
  • Corpus data
  • German newspaper corpus
  • 200 million words
  • Dictionary: WDG (Wörterbuch der deutschen Gegenwartssprache,
    "Dictionary of Contemporary German")
  • Freely available dictionary (130,000 entries)
  • Average of 840 words/entry
  • Encyclopedia: Wikipedia
  • Free online encyclopedia (650,000 articles)
  • Average of 1,164 words/article

10
Analysis: Procedure
  • Corpus data
  • Extract co-occurrence windows of stimuli
  • Check windows for associations
  • WDG / Wikipedia
  • Download stimuli entries
  • Check content for associations
  • Missing entries
  • WDG - 7/0
  • Wikipedia - 2/54
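
The corpus step above (extract co-occurrence windows, then check them for associations) can be sketched roughly as follows. The window size, the exact-match comparison without lemmatisation, and all function names are assumptions made for illustration; the slides do not specify them.

```python
def cooccurrence_windows(tokens, stimulus, window=5):
    """Yield the tokens within `window` positions of each stimulus occurrence."""
    for i, tok in enumerate(tokens):
        if tok == stimulus:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            yield left + right

def associates_in_windows(tokens, stimulus, associates, window=5):
    """Return the subset of `associates` seen in any window around `stimulus`."""
    wanted = set(associates)
    found = set()
    for context in cooccurrence_windows(tokens, stimulus, window):
        found.update(wanted.intersection(context))
    return found

# Toy sentence (invented); a real run would iterate over the newspaper corpus.
tokens = "der Anwalt will vor Gericht klagen weil der Richter schweigt".split()
candidates = ["Gericht", "Richter", "Trauer"]
found_wide = associates_in_windows(tokens, "klagen", candidates, window=3)
found_narrow = associates_in_windows(tokens, "klagen", candidates, window=2)
```

For the dictionary and encyclopedia entries the same membership check could be run over the tokens of the whole entry instead of over windows.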

11
Analysis: Resource Coverage
  • Noun associate (all)

Resource     Types   Tokens   Token/type ratio
Corpus       70      84       1.2
WDG          12      28       2.3
Wikipedia    26      46       1.8

  • Verb associate (all)

Resource     Types   Tokens   Token/type ratio
Corpus       67      77       1.2
WDG          12      25       2.0
Wikipedia    6       10       1.7

  • Resources differ in ...
  • coverage per stimulus part-of-speech
  • token/type ratio
  • proportions per associate part-of-speech
    (next slide)
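
A rough sketch of how type coverage, token coverage and their ratio could be computed for one stimulus is given below. It assumes that the Types and Tokens figures are coverage percentages and that token coverage weights each associate by its response count; the slides do not state this explicitly, and the function name and toy data are hypothetical.

```python
def coverage(strengths, found):
    """Return (type_pct, token_pct) of associates that appear in a resource.

    `strengths` maps each associate to its response count for one stimulus;
    `found` is the set of associates located in the resource.
    """
    n_types = len(strengths)
    n_tokens = sum(strengths.values())
    if n_types == 0 or n_tokens == 0:
        return 0.0, 0.0  # nothing to cover
    hit_types = sum(1 for a in strengths if a in found)
    hit_tokens = sum(c for a, c in strengths.items() if a in found)
    return 100.0 * hit_types / n_types, 100.0 * hit_tokens / n_tokens

# Toy data based on the "Schloss" example; `found` is invented.
strengths = {"Schlüssel": 51, "Tür": 15, "Prinzessin": 8, "Burg": 8}
found = {"Schlüssel", "Burg"}
type_pct, token_pct = coverage(strengths, found)
ratio = token_pct / type_pct  # above 1 when strong associates are covered more often

# The ratio reported for the corpus on noun stimuli follows from the slide figures:
slide_ratio = round(84 / 70, 1)
```

A ratio above 1 indicates that the resource tends to contain the frequently named associates rather than the rare ones.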

12
Analysis: Resource Coverage (2)
  • Proportions per associate part-of-speech
  • Noun stimuli
  • Corpus: 88 V > 84 N > 83 Adj
  • WDG: 43 V > 31 Adj > 26 N
  • Wikipedia: 49 N > 39 Adj > 37 V
  • Verb stimuli
  • Corpus: 91 Adv > 79 V > 77 Adj > 76 N
  • WDG: 29 Adv > 28 V > 25 N > 24 Adj
  • Wikipedia: 12 N > 9 Adj/Adv > 6 V

13
Analysis: Cross-Comparison
  • Noun associate

            Corpus   Dic     Wiki
Corpus      -        55.0    46.0
WDG         0.8      -       5.7
Wiki        3.2      18.1    -

  • Verb associate

            Corpus   Dic     Wiki
Corpus      -        45.8    22.1
WDG         0.7      -       3.9
Wiki        0.5      3.6     -

  • World knowledge?
  • Only in WDG/Wiki: carrot → orange, cry → tears,
    ...
  • Only in Corpus: igloo → eskimo, teach → school,
    ...
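
The slides do not say how the cells of the cross-comparison matrices are computed. One possible reading, assumed here purely for illustration, is that each cell gives the percentage of associate tokens found in the row resource but not in the column resource. The sketch below implements that reading on invented data; the function name and inputs are hypothetical.

```python
def only_in_matrix(found_by_resource, strengths):
    """For each ordered pair (a, b): % of associate tokens found in a but not in b.

    `found_by_resource` maps a resource name to the set of associates it contains;
    `strengths` maps each associate to its response count.
    """
    total = sum(strengths.values())
    matrix = {}
    for a, in_a in found_by_resource.items():
        for b, in_b in found_by_resource.items():
            if a == b:
                continue  # the diagonal is left empty, as in the slide tables
            tokens = sum(strengths[w] for w in in_a - in_b if w in strengths)
            matrix[(a, b)] = 100.0 * tokens / total
    return matrix

# Invented toy data, not the figures from the slides.
strengths = {"Schlüssel": 50, "Tür": 30, "Burg": 20}
found_by_resource = {
    "Corpus": {"Schlüssel", "Tür"},
    "WDG": {"Tür"},
    "Wiki": {"Tür", "Burg"},
}
matrix = only_in_matrix(found_by_resource, strengths)
```

Under this reading, a large value in a resource's row means that it contributes associates the other resource lacks, which is the sense in which resources can complement each other.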
14
Summary / Conclusions
  • Analysis of associations across resources
  • Results
  • Different coverage per stimulus part-of-speech (noun vs. verb)
  • Different (predominant) PoS in word descriptions
  • Different strength of semantic relatedness
  • Resources complement each other
  • => A combination of resources should be helpful
    for modelling word meaning and similarity

15
(No Transcript)
16
Questions?