Title: Word Sense Disambiguation: Study on WordNet Ontology
- Akilan Velmurugan
- Computer Networks CS 790G
Overview
- What is WSD?
- How WordNet is analyzed as a complex network
- What the results are
- Project Methodology
- Area of study
- Key Findings/Results
- New approaches
- Improvement techniques
- Conclusion
Project Description
- Objective
- Study on WSD
- Effects of WSD on word sense ontology
- Characteristics of WordNet
- Results
- How to match words with other words
- Parameters taken for the study of word sense
- Improve them by making the necessary changes
- Study network characteristics
WordNet - Overview
- Machine-readable semantic dictionary, interlinked by semantic relations
- Developed at Princeton University as a large lexical database for the English language
- One of the most widely used lexical resources in NLP
- Freely available to the public (WordNet license, a permissive BSD-style license)
- Forms a scale-free network with a small average shortest path length, with words as nodes and semantic relations as links
- Easily navigable
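WordNet (Structure)
- Organizes words by part of speech: noun, verb, adjective, adverb
- Relations include:
- Synonym (same meaning)
- Hypernym (more general term: a dog is a kind of animal, so animal is a hypernym of dog)
- Hyponym (more specific term, the inverse of hypernym: dog is a hyponym of animal)
- Troponym (for verbs: a particular way to do something)
- Meronym (part of)
- About 25 relations in total
- Also available for online navigation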
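Hypernym and hyponym are the same is-a link read in opposite directions, so one table can be derived from the other. A minimal sketch, assuming a hypothetical hand-written HYPERNYM table; real WordNet links synsets rather than bare words and has intermediate levels (for example between dog and animal).

```python
from collections import defaultdict

# Hypothetical, heavily simplified is-a table: word -> its hypernym.
HYPERNYM = {"dog": "animal", "cat": "animal", "sedan": "car", "car": "motor vehicle"}
# Hypothetical part-of table: whole -> its meronyms.
MERONYMS = {"car": ["wheel", "engine"]}


def invert_hypernyms(table):
    """Build the hyponym table by reversing every hypernym link."""
    inverse = defaultdict(list)
    for child, parent in table.items():
        inverse[parent].append(child)
    return dict(inverse)


HYPONYM = invert_hypernyms(HYPERNYM)
print(HYPONYM["animal"])  # ['dog', 'cat']
```

Storing only one direction and deriving the other keeps the two relations consistent by construction.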
WordNet Online, by Princeton University
[Figure: WordNet online]
WordNet Browser
[Figure: WordNet application]
WordNet (Working)
- WSD
- Corpus-based approaches
- Use a set of samples to train the system
- Knowledge-based approaches
- Use a machine-readable dictionary with relations
- WordNet research
- Open source
- Ranking of synsets derived from word frequencies in the British National Corpus (top 1000)
- Content manipulation of text
- Dataset I: controlled and calibrated study
- Dataset II: collected on pairs using Amazon Mechanical Turk
[Figure: WordNet database]
Word Sense Disambiguation (WSD)
- Task of determining the meaning of an ambiguous word in a given context
- Example: bank
- Edge of a river
- or
- Financial institution that accepts deposits
- Refers to the resolution of lexical semantic ambiguity; its goal is to assign the correct sense to each word (often described as an AI-complete problem)
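To make the bank example concrete, here is a sketch of the simplified Lesk algorithm, a classic knowledge-based WSD method (not necessarily the one studied in this project): pick the sense whose dictionary gloss shares the most content words with the sentence. The glosses and stop-word list are hypothetical stand-ins written for this example, not WordNet's actual definitions.

```python
import re

# Hypothetical sense inventory; glosses written for this example, not copied from WordNet.
SENSES = {
    "bank": {
        "bank#river": "sloping land beside a body of water such as a river",
        "bank#finance": "financial institution that accepts deposits and lends money",
    }
}
# Hypothetical minimal stop-word list.
STOPWORDS = {"a", "an", "the", "of", "and", "such", "as", "that", "to", "in", "on", "at", "he", "she"}


def content_words(text):
    """Lower-case alphabetic tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "He sat on the bank of the river and watched the water"))
print(simplified_lesk("bank", "The bank accepts deposits and pays interest on money"))
```

The first sentence overlaps the river gloss on "river" and "water"; the second overlaps the financial gloss on "accepts", "deposits" and "money".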
WSD Area of Research
- Assigning the correct sense to words, using an electronic dictionary as the source of word definitions
- Open research field in Natural Language Processing (NLP)
- A hard problem, and therefore a popular area of research
- Used in speech synthesis to identify the correct sense (and hence the pronunciation) of a word
JavaScript Visual WordNet
[Figure: Visual WordNet]
Visual Thesaurus
[Figure: Visual Thesaurus]
WordNet Theoretical Aspects
- WordNet word sense ontology
- Symbols are words
- Synsets: sets of synonymous words, with semantic relations between them
- Word sense disambiguation
- WordNet structure using latent semantics
- Variable lexical notation for a concept
- Citibase Thesaurus
- Semantic relatedness
- And a few others
WSD Using Latent Semantics
- Measures the semantic distance between concepts
- Relatedness and betweenness are calculated
- A matrix form of the WordNet data structure is used
- Can be integrated with other applications
- Uses the Singular Value Decomposition (SVD) algorithm
- Example: "car" belongs to multiple synsets
- car, gondola
- car, railway car
- car, automobile
- Related synsets: motor vehicle (more general); coupe, sedan, taxi (more specific)
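The SVD step can be illustrated with an LSA-style sketch: factor a term-by-context matrix, keep the top k singular values, and compare terms by cosine similarity in the reduced space. The slide does not say how the WordNet matrix is built, so the count matrix below is hypothetical toy data in which "car" appears mostly in road contexts and once in a cable-car context.

```python
import numpy as np

# Hypothetical term-by-context count matrix (toy data, not derived from WordNet).
TERMS = ["car", "automobile", "sedan", "gondola", "cable"]
COUNTS = np.array([
    [2, 1, 0, 1],  # car: mostly road contexts, once in a cable-car context
    [1, 2, 0, 0],  # automobile
    [1, 1, 0, 0],  # sedan
    [0, 0, 2, 1],  # gondola
    [0, 0, 1, 2],  # cable
], dtype=float)


def latent_term_vectors(matrix, k):
    """Project each row (term) into a k-dimensional latent space via truncated SVD."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]  # scale left singular vectors by their singular values


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vectors = latent_term_vectors(COUNTS, k=2)
INDEX = {term: i for i, term in enumerate(TERMS)}


def relatedness(t1, t2):
    """Cosine similarity of two terms in the reduced space."""
    return cosine(vectors[INDEX[t1]], vectors[INDEX[t2]])


print(f"car ~ automobile: {relatedness('car', 'automobile'):.2f}")
print(f"car ~ gondola:    {relatedness('car', 'gondola'):.2f}")
```

Keeping only two latent dimensions merges the road contexts and the cable contexts into two "topics", so car lands much closer to automobile than to gondola.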
MDS Example
- Geodesic (shortest-path) distance matrix of a 13-node network:

    1  2  3  4  5  6  7  8  9 10 11 12 13
 1  0  1  1  1  2  2  3  1  1  2  4  2  2
 2  1  0  2  2  1  2  3  2  2  3  4  3  3
 3  1  2  0  2  3  3  4  2  2  3  5  3  3
 4  1  2  2  0  3  2  3  2  2  1  4  1  3
 5  2  1  3  3  0  1  2  2  2  2  3  3  3
 6  2  2  3  2  1  0  1  1  1  1  2  2  2
 7  3  3  4  3  2  1  0  2  2  2  1  3  3
 8  1  2  2  2  2  1  2  0  2  2  3  3  1
 9  1  2  2  2  2  1  2  2  0  2  3  3  1
10  2  3  3  1  2  1  2  2  2  0  3  1  3
11  4  4  5  4  3  2  1  3  3  3  0  4  4
12  2  3  3  1  3  2  3  3  3  1  4  0  4
13  2  3  3  3  3  2  3  1  1  3  4  4  0

- MDS maps the distance matrix to 2-D coordinates:

Node     x      y
 1    -1.22  -0.12
 2    -0.88  -0.39
 3    -2.12  -0.29
 4    -1.01   1.07
 5     0.43  -0.28
 6     0.78   0.04
 7     1.81   0.02
 8    -0.09  -0.77
 9    -0.09  -0.77
10     0.30   1.18
11     2.85   0.00
12    -0.47   2.13
13    -0.29  -1.81

- k-means on the MDS coordinates splits the nodes into two clusters: {1, 2, 3, 4, 10, 12} and {5, 6, 7, 8, 9, 11, 13}
Source: Lecture 18, Community Structure, by Prof. Gunes
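The first two steps of this pipeline (geodesic distances, then MDS) can be sketched as follows. The edge list is an assumption: it is read off the distance-1 entries of the matrix above, and running BFS from every node on those 17 edges reproduces the whole matrix. The MDS step assumes classical (Torgerson) MDS, which the slide does not name; eigenvector signs are arbitrary, so the coordinates may come out mirrored relative to the table. The k-means step is left out because its result depends on initialization.

```python
from collections import deque

import numpy as np

# Edges read off the distance-1 entries of the slide's matrix (nodes numbered 1..13).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 8), (1, 9), (2, 5), (4, 10), (4, 12),
         (5, 6), (6, 7), (6, 8), (6, 9), (6, 10), (7, 11), (8, 13), (9, 13),
         (10, 12)]
N_NODES = 13


def geodesic_matrix(edges, n):
    """All-pairs shortest-path (hop) distances via one BFS per node."""
    adj = {v: [] for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = np.zeros((n, n), dtype=int)
    for src in range(1, n + 1):
        hops = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in hops:
                    hops[w] = hops[u] + 1
                    queue.append(w)
        for v, h in hops.items():
            dist[src - 1, v - 1] = h
    return dist


def classical_mds(dist, k=2):
    """Classical MDS: double-centre the squared distances and keep the top-k eigenpairs."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist.astype(float) ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


D = geodesic_matrix(EDGES, N_NODES)
coords = classical_mds(D)
print(D)
print(np.round(coords, 2))
```

Classical MDS always returns centred coordinates, which matches the table above: both of its columns sum to (almost exactly) zero.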
WSD Using Latent Semantics
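WSD Variable Lexical Notations for a Concept
- Generic concept notation
- $D = I \cup J \cup K$
- $\therefore J = D - (I \cup K)$
- $= (D - I) \cap (D - K)$
- $= D \cap \overline{(I \cup K)}$
- $J = D \cap (\overline{I} \cap \overline{K})$
- Since $B = D \cup E \cup F$,
- $D = B - (E \cup F)$
- $= (B - E) \cap (B - F)$
- $= B \cap \overline{(E \cup F)}$
- $D = B \cap (\overline{E} \cap \overline{F})$
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications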
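WSD Variable Lexical Notations for a Concept
- $J = D \cap (\overline{I} \cap \overline{K})$
- $= (B \cap (\overline{E} \cap \overline{F})) \cap (\overline{I} \cap \overline{K})$
- $J = B \cap ((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K}))$
- where J = fly,
- D = fish lure,
- I = spinner,
- K = troll
- Introducing Boolean operators:
- AND for $\cap$
- OR for $\cup$
- NOT for complement ($\overline{X}$)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications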
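The derivation on the last two slides can be checked on small concrete sets. This sketch assumes, as the derivation implicitly does, that I, J and K are disjoint (and likewise D, E and F); the set members and the universe U used for complements are hypothetical toy data.

```python
# Hypothetical toy sets; the derivation assumes I, J, K disjoint and D, E, F disjoint.
I = {"spinner-1", "spinner-2"}        # spinner
J = {"fly-1", "fly-2", "fly-3"}       # fly
K = {"troll-1"}                       # troll
D = I | J | K                         # fish lure = spinner ∪ fly ∪ troll
E = {"ground-bait-1"}                 # ground bait
F = {"stool-pigeon-1"}                # stool pigeon
B = D | E | F                         # lure = fish lure ∪ ground bait ∪ stool pigeon
U = B | {"unrelated-1"}               # universe of discourse for complements


def comp(x):
    """Complement relative to U (the overbar in the notation)."""
    return U - x


# Each line of the derivation, evaluated on the toy sets.
step1 = D - (I | K)
step2 = (D - I) & (D - K)
step3 = D & comp(I | K)
step4 = D & (comp(I) & comp(K))
D_via_B = B & (comp(E) & comp(F))
J_via_B = B & ((comp(E) & comp(F)) & (comp(I) & comp(K)))

print(step1 == step2 == step3 == step4 == J)  # True
print(D_via_B == D, J_via_B == J)             # True True
```

Python's `-`, `&` and `|` on sets correspond directly to set difference, intersection and union, so each line mirrors one step of the notation.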
WSD Variable Lexical Notations for a Concept
- (fly) becomes
- (fisherman's lure OR fish lure) AND ((NOT spinner) AND (NOT troll))
- Then, with B = lure,
- E = ground bait,
- F = stool pigeon,
- (fly) becomes
- (bait OR decoy OR lure) AND ((NOT ground bait) AND (NOT stool pigeon) AND ((NOT spinner) AND (NOT troll)))
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
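The substitution that turns "fly" into these Boolean queries can be sketched as a small function that walks up from a concept to a broader one, collecting the sibling terms to exclude at each level. The two lookup tables are a hypothetical mini-lexicon reproducing the slide's example, not data read from WordNet, and the function name is illustrative.

```python
# Hypothetical mini-lexicon reproducing the slide's example (not read from WordNet).
SYNONYMS = {
    "lure": ["bait", "decoy", "lure"],               # B
    "fish lure": ["fisherman's lure", "fish lure"],  # D
}
# concept -> (broader concept, sibling terms excluded when narrowing to the concept)
NARROWING = {
    "fish lure": ("lure", ["ground bait", "stool pigeon"]),  # D = B ∩ ¬E ∩ ¬F
    "fly": ("fish lure", ["spinner", "troll"]),              # J = D ∩ ¬I ∩ ¬K
}


def boolean_query(concept, levels):
    """Rewrite `concept` as a query over the concept `levels + 1` steps up the hierarchy."""
    exclusions = []  # innermost (closest to `concept`) first
    current = concept
    for _ in range(levels + 1):
        parent, siblings = NARROWING[current]
        exclusions.append(siblings)
        current = parent
    # `current` is now the broader concept whose synonyms form the OR-group.
    base = "(" + " OR ".join(SYNONYMS[current]) + ")"
    expr = "(" + " AND ".join(f"(NOT {t})" for t in exclusions[0]) + ")"
    for outer in exclusions[1:]:
        parts = [f"(NOT {t})" for t in outer] + [expr]
        expr = "(" + " AND ".join(parts) + ")"
    return f"{base} AND {expr}"


print(boolean_query("fly", 0))
print(boolean_query("fly", 1))
```

With `levels=0` the query is expressed over fish lure; with `levels=1` it is expressed over lure and reproduces the second query on the slide.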
Thesaurus as a Complex Network
- Modeled as a directed graph
- Sinks: the 73,046 terms with no outgoing links ($k_{out} = 0$)
- Sources: the 30,260 terms with at least one outgoing link ($k_{out} > 0$)
- Root words (absolute sources): sources without incoming links ($k_{in} = 0$)
- Normal sources: $k_{out} > 0$ and $k_{in} > 0$
- Bridge sources: sources without outgoing links to root words ($k_{out}^{(source)} = 0$)
[Figure legend: 1 = normal source, 2 = bridge source, 3 = absolute source, 4 = sink]
Source: arXiv:cond-mat/0312586v1 (2003)
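The node categories above depend only on in-degree and out-degree, so they can be computed directly from an edge list. A sketch under stated assumptions: the toy graph is hypothetical, categories are checked in the order shown, and a bridge source is taken to be a non-root source whose outgoing links all point to sinks, which is one reading of $k_{out}^{(source)} = 0$.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: term -> related term (directed links).
EDGES = [("a", "b"), ("b", "c"), ("c", "x"), ("c", "y"), ("d", "b")]


def classify_terms(edges):
    """Label each term as sink, absolute source, bridge source or normal source."""
    out_links = defaultdict(set)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_links[src].add(dst)
        in_degree[dst] += 1
        nodes.update((src, dst))
    sources = {n for n in nodes if out_links[n]}  # k_out > 0
    labels = {}
    for n in sorted(nodes):
        if n not in sources:
            labels[n] = "sink"                    # k_out = 0
        elif in_degree[n] == 0:
            labels[n] = "absolute source"         # root word: k_in = 0
        elif not (out_links[n] & sources):
            labels[n] = "bridge source"           # assumed: links only to sinks
        else:
            labels[n] = "normal source"           # k_out > 0 and k_in > 0
    return labels


print(classify_terms(EDGES))
```

On this toy graph, "a" and "d" are roots, "b" is a normal source, "c" bridges into the sinks "x" and "y".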
WSD: Semantic Relatedness and Word Sense Disambiguation
- Concepts that co-occur more frequently and closer to each other are more related than concepts that co-occur less frequently and farther apart
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
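One simple way to turn "more frequently and closer" into a number is to sum, over every co-occurrence inside a fixed window, a weight of 1/distance. This is a sketch, not the measure from the cited proceedings: the window size, the 1/distance weighting and the sample text are all assumptions, and stop words are not filtered.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_relatedness(tokens, window=4):
    """Score word pairs: each co-occurrence within `window` tokens adds 1/distance."""
    scores = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        dist = j - i
        if dist <= window and tokens[i] != tokens[j]:
            pair = tuple(sorted((tokens[i], tokens[j])))
            scores[pair] += 1.0 / dist
    return scores


# Hypothetical sample text.
TEXT = "the river bank was muddy the river bank flooded money in the bank account"
scores = cooccurrence_relatedness(TEXT.split())
print(scores[("bank", "river")])  # 2.25
print(scores[("bank", "money")])
```

"river" appears right next to "bank" twice, while "money" only appears a few tokens away, so bank/river scores higher.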
WordNet Relationship
- Semantic relatedness
- Involves relationships among words
- car-wheel (meronym)
- hot-cold (antonym)
- pencil-paper (functional)
- penguin-Antarctica (association)
- bank-trust company (synonym)
- Probability and distance calculations
- Frequency of synsets or words
- Performance in NLP applications
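For the distance calculation, a common path-based measure scores two concepts as 1 / (1 + shortest path length) over is-a links, the same form as NLTK's `path_similarity`. This sketch runs it on a hypothetical, simplified is-a table rather than real WordNet data.

```python
from collections import deque

# Hypothetical, simplified is-a links (child, parent); not real WordNet data.
IS_A = [("sedan", "car"), ("taxi", "car"), ("coupe", "car"),
        ("car", "motor vehicle"), ("motor vehicle", "vehicle"),
        ("bicycle", "vehicle")]


def build_graph(pairs):
    """Undirected adjacency sets, so paths may go up and down the hierarchy."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, start, goal):
    """Fewest is-a edges between two concepts (BFS), or None if unconnected."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def path_similarity(graph, a, b):
    """1 / (1 + shortest path length): closer concepts score higher."""
    length = path_length(graph, a, b)
    return None if length is None else 1.0 / (1 + length)


G = build_graph(IS_A)
print(path_similarity(G, "sedan", "taxi"))     # 2 edges apart, so 1/3
print(path_similarity(G, "sedan", "bicycle"))  # 4 edges apart, so 1/5
```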
WordNet Relationship Browser
[Figure: WordNet Relationship Browser]
WordNet Connect
- Program to find all possible connections between two words in WordNet
- Used in computing semantic opposition within the word sense ontology
- The WordNet lexical database is used to read the semantic relations
- Properties such as the number of paths, the shortest path, and the overall network structure are studied
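The two capabilities listed above, counting connections and finding the shortest one, can be sketched with a depth-limited search for simple paths. The word graph and the `max_len` cutoff are hypothetical; WordNet Connect's actual implementation is not described here.

```python
# Hypothetical word graph; links stand in for WordNet semantic relations.
EDGES = [("hot", "warm"), ("warm", "tepid"), ("tepid", "cool"),
         ("cool", "cold"), ("hot", "cold"), ("warm", "cool")]
GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, []).append(b)
    GRAPH.setdefault(b, []).append(a)


def all_simple_paths(graph, start, goal, max_len):
    """Every path from start to goal with no repeated word and at most max_len links."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        if len(path) - 1 == max_len:
            continue  # depth limit reached without arriving at the goal
        for nxt in graph[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths


paths = all_simple_paths(GRAPH, "hot", "cold", max_len=4)
print(len(paths), "paths")
print("shortest:", min(paths, key=len))
```

The number of simple paths grows quickly with `max_len`, which is why a cutoff is needed on a graph the size of WordNet.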
WordNet Connect
[Figures: WordNet Connect, three slides]
Future Work
- WordNet structure in terms of a complex network
- Key assumptions
- The WordNet lexical dictionary is analyzed in terms of a source node and a target node, with an additional reference node
- Find a cost-effective path that is conditionally related to the mean reference node
- Control the path traversal with a relation of focus
- Include a Common File Number to make it more efficient
Conclusion
- A single visualization cannot reveal the entire structure of WordNet
- There are different ways of analyzing the effectiveness of the overall system
- A new method to evaluate the usefulness of the WordNet network structure
Questions and Comments