Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies - PowerPoint PPT Presentation

Slides: 23
Provided by: eur87

1
Design and Evaluation of Semantic Similarity
Measures for Concepts Stemming from the Same or
Different Ontologies
  • Euripides G.M. Petrakis
  • Giannis Varelas
  • Angelos Hliaoutakis
  • Paraskevi Raftopoulou

2
Semantic Similarity
  • Relates to computing the conceptual similarity
    between terms that are not necessarily
    lexically similar
  • e.g., car-automobile-vehicle
  • e.g., drug-medicine
  • A tool for making knowledge commonly understandable
    in applications such as information retrieval (IR)
    and information communication in general

3
Methodology
  • Terms from different communicating sources are
    represented by ontologies
  • Map two terms to an ontology and compute their
    relationship in that ontology
  • Terms from different ontologies: discover
    linguistic relationships or affinities between
    terms in different ontologies

4
Contributions
  • We investigate several semantic similarity
    methods and evaluate their performance
  • http://www.intelligence.tuc.gr/similarity
  • We propose a novel semantic similarity measure
    for comparing concepts from different ontologies

5
Ontologies
  • Tools for representing information on a subject
  • Hierarchical categorization of terms, from the
    most general to the most specific
  • object → artifact → construction → stadium
  • Domain ontologies: represent knowledge of a
    specific domain
  • e.g., MeSH medical ontology
  • General ontologies: represent common-sense
    knowledge about the world
  • e.g., WordNet
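The general-to-specific ordering can be pictured as a chain of Is-A links. The sketch below is a minimal illustration, assuming the hierarchy is stored as a hypothetical child-to-parent dictionary (IS_A); it holds only the object → artifact → construction → stadium chain from this slide and is not how WordNet or MeSH actually store their data.

```python
# Hypothetical child -> parent map holding the Is-A chain from the slide:
# object -> artifact -> construction -> stadium (general to specific).
IS_A = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
}


def hypernym_path(term: str, is_a: dict[str, str]) -> list[str]:
    """Return the chain from the root (most general) down to `term` (most specific)."""
    path = [term]
    # Follow parent links until a term with no parent (the root) is reached.
    while path[-1] in is_a:
        path.append(is_a[path[-1]])
    return list(reversed(path))


def depth(term: str, is_a: dict[str, str]) -> int:
    """Depth of `term` in the hierarchy, counting the root as depth 1."""
    return len(hypernym_path(term, is_a))


if __name__ == "__main__":
    print(" -> ".join(hypernym_path("stadium", IS_A)))  # object -> artifact -> construction -> stadium
    print(depth("stadium", IS_A))  # 4
```

Depth computed this way is what depth-aware similarity measures build on.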

6
WordNet
  • A vocabulary and a thesaurus offering a
    hierarchical categorization of natural language
    terms
  • More than 100,000 terms
  • Nouns, verbs, adjectives and adverbs are grouped
    into synonym sets (synsets)
  • Synsets represent terms or concepts with similar
    meaning
  • stadium, bowl, arena, sports stadium (a large
    structure for open-air sports or entertainments)

7
WordNet Hierarchies
  • The synsets are also organized into senses
  • Senses: different meanings of the same term
  • The synsets are related to other synsets higher
    or lower in the hierarchy by different types of
    relationships, e.g.:
  • Hyponym/Hypernym (Is-A relationships)
  • Meronym/Holonym (Part-Of relationships)
  • Nine noun and several verb Is-A hierarchies
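To make the synset/sense distinction concrete, here is a small sketch assuming a hypothetical Synset record with a gloss and a dictionary of relations keyed by relationship type. The stadium synset and its gloss come from the WordNet slide and its "construction" hypernym follows the chain on the Ontologies slide; the second "bowl" synset and its gloss are invented purely to show a term with two senses. Related synsets are named by a single word here, which simplifies how WordNet really links them.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous terms that share one meaning."""
    words: frozenset[str]
    gloss: str
    # Related synsets keyed by relationship type, e.g. "hypernym" (Is-A)
    # or "meronym" (Part-Of); each related synset is named by one word here.
    relations: dict[str, set[str]] = field(default_factory=dict)


def build_senses(synsets: list[Synset]) -> dict[str, list[Synset]]:
    """Index synsets by term: a term that appears in several synsets has several senses."""
    senses: dict[str, list[Synset]] = {}
    for synset in synsets:
        for word in synset.words:
            senses.setdefault(word, []).append(synset)
    return senses


stadium = Synset(
    frozenset({"stadium", "bowl", "arena", "sports stadium"}),
    "a large structure for open-air sports or entertainments",
    {"hypernym": {"construction"}},
)
# Hypothetical second sense, added only to illustrate a term with two meanings.
dish = Synset(frozenset({"bowl"}), "a round open vessel (illustrative gloss)", {"hypernym": {"vessel"}})

senses = build_senses([stadium, dish])
print(len(senses["bowl"]), len(senses["arena"]))  # 2 1
```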

8
A Fragment of the WordNet Is-A Hierarchy
9
MeSH
  • MeSH: ontology of medical and biological terms
    developed by the U.S. National Library of Medicine (NLM)
  • Organized into Is-A hierarchies
  • More than 15 taxonomies, more than 22,000 terms
  • No part-of relationships
  • The terms are organized into synsets called
    entry terms

10
A Fragment of the MeSH Is-A Hierarchy
11
Semantic Similarity Methods
  • Map terms to an ontology and compute their
    relationship in that ontology
  • Four main categories of methods
  • Edge counting: path length between the terms
  • Information content: a function of the terms'
    probability of occurrence in a corpus
  • Feature based: similarity between the terms'
    properties (e.g., definitions) or between their
    relationships to other similar terms
  • Hybrid: combine the above ideas
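As a rough illustration of the edge-counting category, the sketch below measures the shortest path between two terms in an Is-A hierarchy, in the spirit of Rada et al. 1989. The hierarchy (TOY_IS_A) is a hypothetical toy, not WordNet's actual structure, and links are treated as undirected so that the path can climb to a common ancestor and descend again.

```python
from collections import deque

# Hypothetical toy Is-A hierarchy (child -> parent); not WordNet's real structure.
TOY_IS_A = {
    "artifact": "object",
    "construction": "artifact",
    "instrumentality": "artifact",
    "stadium": "construction",
    "bridge": "construction",
    "conveyance": "instrumentality",
}


def edge_distance(a: str, b: str, is_a: dict[str, str]) -> int:
    """Number of Is-A links on the shortest path between a and b (breadth-first search)."""
    if a == b:
        return 0
    # Build an undirected adjacency list from the child -> parent links.
    adjacent: dict[str, set[str]] = {}
    for child, parent in is_a.items():
        adjacent.setdefault(child, set()).add(parent)
        adjacent.setdefault(parent, set()).add(child)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacent.get(node, set()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {a!r} and {b!r}")


print(edge_distance("stadium", "bridge", TOY_IS_A))      # 2
print(edge_distance("stadium", "conveyance", TOY_IS_A))  # 4
```

A smaller distance means more similar terms; edge-counting measures differ mainly in how they turn this distance (and possibly depth) into a similarity score.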

12
Example
  • The edge-counting distance between conveyance and
    ceramic is 2
  • An information content method would associate
    the two terms with their common subsumer and with
    their probabilities of occurrence in a corpus
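The information content idea can be sketched as follows, assuming the usual definition IC(c) = -log p(c), where p(c) counts occurrences of c and of every term below it. resnik() scores a pair by the IC of its lowest common subsumer, and lin() normalizes that by the ICs of the two terms (named after the Resnik and Lin measures in the evaluation tables). The hierarchy and corpus counts are hypothetical, and this is a simplified sketch rather than the evaluated implementations.

```python
import math

# Hypothetical toy hierarchy (child -> parent) and corpus counts.
TOY_IS_A = {
    "artifact": "object",
    "construction": "artifact",
    "instrumentality": "artifact",
    "stadium": "construction",
    "bridge": "construction",
    "conveyance": "instrumentality",
}
# Every concept must end up with a non-zero total, otherwise log(0) fails.
COUNTS = {"stadium": 30, "bridge": 20, "conveyance": 50}


def ancestors(term: str, is_a: dict[str, str]) -> list[str]:
    """`term` followed by its hypernyms, from most specific up to the root."""
    chain = [term]
    while chain[-1] in is_a:
        chain.append(is_a[chain[-1]])
    return chain


def concept_probabilities(counts: dict[str, int], is_a: dict[str, str]) -> dict[str, float]:
    """p(c): occurrences of c and of every term it subsumes, over the total count."""
    totals = dict.fromkeys(set(is_a) | set(is_a.values()) | set(counts), 0)
    for term, n in counts.items():
        for concept in ancestors(term, is_a):
            totals[concept] += n
    grand_total = sum(counts.values())
    return {c: n / grand_total for c, n in totals.items()}


def ic(concept: str, probs: dict[str, float]) -> float:
    """Information content: -log p(c); the root (p = 1) has IC 0."""
    return -math.log(probs[concept])


def lowest_common_subsumer(a: str, b: str, is_a: dict[str, str]) -> str:
    above_b = set(ancestors(b, is_a))
    # Walking upward from a, the first shared concept is the lowest one.
    for concept in ancestors(a, is_a):
        if concept in above_b:
            return concept
    raise ValueError("terms share no ancestor")


def resnik(a: str, b: str, is_a: dict[str, str], probs: dict[str, float]) -> float:
    return ic(lowest_common_subsumer(a, b, is_a), probs)


def lin(a: str, b: str, is_a: dict[str, str], probs: dict[str, float]) -> float:
    denominator = ic(a, probs) + ic(b, probs)
    return 2 * resnik(a, b, is_a, probs) / denominator if denominator else 1.0


probs = concept_probabilities(COUNTS, TOY_IS_A)
print(round(resnik("stadium", "bridge", TOY_IS_A, probs), 3))  # 0.693 (IC of "construction", p = 0.5)
print(round(lin("stadium", "bridge", TOY_IS_A, probs), 3))
```

With these counts, stadium and conveyance meet only at "artifact", which subsumes the whole corpus, so their Resnik score is 0.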

13
X-Similarity
  • Relies on matching between synsets and between
    term description sets
  • A, B: synsets or term description sets
  • Do the same for the sets of terms reached through
    each relationship type (Is-A, Part-Of) and take
    the maximum
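Because the slide's formula did not survive extraction, the following is only a sketch of the matching idea under stated assumptions: the set match is taken to be |A ∩ B| / |A ∪ B|, neighborhood sets are compared per relationship type with the maximum kept, and the way the synset, neighborhood and description scores are combined (any synset overlap gives 1, otherwise the larger of the other two) is assumed for illustration. The term records are hypothetical.

```python
def set_similarity(a: set[str], b: set[str]) -> float:
    """Assumed set match: |A ∩ B| / |A ∪ B| (0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def x_similarity(t1: dict, t2: dict) -> float:
    """Sketch of X-Similarity-style matching between two term records."""
    # Assumed rule: terms sharing a synonym are treated as fully similar.
    if set_similarity(t1["synset"], t2["synset"]) > 0:
        return 1.0
    # Compare neighborhoods for each relationship type (Is-A, Part-Of, ...); keep the maximum.
    relation_types = set(t1["neighbors"]) | set(t2["neighbors"])
    neighborhoods = max(
        (
            set_similarity(t1["neighbors"].get(r, set()), t2["neighbors"].get(r, set()))
            for r in relation_types
        ),
        default=0.0,
    )
    descriptions = set_similarity(t1["description"], t2["description"])
    return max(neighborhoods, descriptions)


# Hypothetical term records: synonyms, description words, related terms per relationship type.
stadium = {
    "synset": {"stadium", "arena"},
    "description": {"large", "structure", "sports"},
    "neighbors": {"is_a": {"construction", "artifact"}},
}
bridge = {
    "synset": {"bridge", "span"},
    "description": {"structure", "crossing", "river"},
    "neighbors": {"is_a": {"construction", "artifact", "object"}, "part_of": {"road"}},
}

print(round(x_similarity(stadium, bridge), 3))  # 0.667: Is-A neighborhoods share 2 of 3 terms
```

Only set operations are involved, so nothing here depends on the two terms sharing a hierarchy, which is what makes this kind of matching usable across ontologies.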

14
Example
  • S(Hypothyroidism, Hyperthyroidism) = 0.387

WordNet term: Hypothyroidism
<term> hypothyroidism
  <definition> An underactive thyroid gland; a glandular disorder resulting from insufficient production of thyroid hormones. </definition>
  <synset> Hypothyroidism </synset>
  <hypernyms> glandular disease, disorder, condition, state </hypernyms>
  <hyponyms> myxedema, cretinism </hyponyms>
</term>

MeSH term: Hyperthyroidism
<term> hyperthyroidism
  <definition> Hypersecretion of Thyroid Hormones from Thyroid Gland. Elevated levels of thyroid hormones increase Basal Metabolic Rate. </definition>
  <synset> Hyperthyroidism </synset>
  <hypernyms> disease, thyroid, Endocrine System Diseases, diseases </hypernyms>
  <hyponyms> thyrotoxicosis, thyrotoxicoses </hyponyms>
</term>
15
Evaluation
  • The most popular methods are evaluated
  • All methods are applied to a set of 38 term pairs
  • Their similarity values are correlated with
    scores obtained from humans
  • The higher the correlation of a method, the
    better the method
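The evaluation step amounts to correlating two lists of scores for the same term pairs. The slides do not say which coefficient was used; the sketch below assumes Pearson's correlation, and the score lists in the usage lines are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between a method's similarity values and human scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


# Hypothetical scores for four term pairs (not data from the evaluation).
human_scores = [3.9, 3.5, 1.2, 0.4]
method_scores = [0.95, 0.80, 0.40, 0.10]
print(round(pearson(human_scores, method_scores), 2))
```

On Python 3.10 and later, statistics.correlation computes the same coefficient; the hand-written version just makes the formula visible.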

16
Evaluation on WordNet
Method            Type            Correlation
Rada 1989         Edge Counting   0.59
Wu 1994           Edge Counting   0.74
Li 2003           Edge Counting   0.82
Leacock 1998      Edge Counting   0.82
Richardson 1994   Edge Counting   0.63
Resnik 1999       Info. Content   0.79
Lin 1993          Info. Content   0.82
Lord 2003         Info. Content   0.79
Jiang 1998        Info. Content   0.83
Tversky 1977      Feature Based   0.73
X-Similarity      Feature Based   0.74
Rodriguez 2003    Hybrid          0.71
17
Evaluation on MeSH
Method            Type            Correlation
Rada 1989         Edge Counting   0.50
Wu 1994           Edge Counting   0.67
Li 2003           Edge Counting   0.70
Leacock 1998      Edge Counting   0.74
Richardson 1994   Edge Counting   0.64
Resnik 1999       Info. Content   0.71
Lin 1993          Info. Content   0.72
Lord 2003         Info. Content   0.70
Jiang 1998        Info. Content   0.71
Tversky 1977      Feature Based   0.67
X-Similarity      Feature Based   0.71
Rodriguez 2003    Hybrid          0.71
18
Cross Ontology Measures
  • We used 40 MeSH term pairs
  • One term of each pair is also a WordNet term
  • We measured the correlation with scores obtained
    from experts

Method            Type            Correlation
X-Similarity      Feature Based   0.70
Rodriguez         Hybrid          0.55
19
Comments
  • Edge counting and information content methods
    work by exploiting structural information
  • Good methods take the position of the terms in
    the hierarchy into account
  • Higher similarity for terms that are close
    together but lower in the hierarchy, e.g., Li
    et al. 2003
  • X-Similarity performs at least as well as other
    feature-based methods
  • It outperforms other cross-ontology methods
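The point about depth can be illustrated with the measure of Li et al. 2003, usually written as S = e^(-αl) · tanh(βh), where l is the shortest path length between the terms and h the depth of their lowest common subsumer. This is a sketch from that commonly cited form; the default values α = 0.2 and β = 0.6 are the ones usually quoted for it and should be treated as an assumption here.

```python
import math


def li_similarity(path_length: int, subsumer_depth: int,
                  alpha: float = 0.2, beta: float = 0.6) -> float:
    """exp(-alpha * l) * tanh(beta * h).

    The first factor shrinks as the terms get farther apart; the second grows
    with the depth h of their lowest common subsumer, so pairs lower in the
    hierarchy score higher. The alpha/beta defaults are assumed values.
    """
    # tanh(x) == (e^x - e^-x) / (e^x + e^-x), the depth factor in its usual form.
    return math.exp(-alpha * path_length) * math.tanh(beta * subsumer_depth)


# Two hypothetical pairs, both 2 links apart: one near the root, one deeper.
shallow = li_similarity(path_length=2, subsumer_depth=2)
deep = li_similarity(path_length=2, subsumer_depth=6)
print(shallow < deep)  # True
```

Pure edge counting would give both pairs the same score; the depth factor is what separates them.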

20
Conclusions
  • Semantic similarity methods approximate the
    human notion of similarity, reaching correlation
    up to 0.83
  • Cross-ontology similarity is a difficult problem
    that requires further investigation
  • Work towards integrating semantic similarity into
    the IntelliSearch information retrieval system for
    Web documents
  • http://www.intelligence.tuc.gr/intellisearch

21
Try our system on the Web
  • http://www.intelligence.tuc.gr/similarity
  • Implementation:
  • Giannis Varelas
  • Spyros Argyropoulos

22
www.intelligence.tuc.gr/similarity