Title: Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies
1Design and Evaluation of Semantic Similarity
Measures for Concepts Stemming from the Same or
Different Ontologies
- Euripides G.M. Petrakis
- Giannis Varelas
- Angelos Hliaoutakis
- Paraskevi Raftopoulou
2Semantic Similarity
- Relates to computing the conceptual similarity between terms which are not necessarily lexically similar
  - car - automobile - vehicle
  - drug - medicine
- Tool for making knowledge commonly understandable in applications such as IR and information communication in general
3Methodology
- Terms from different communicating sources are represented by ontologies
  - Map two terms to an ontology and compute their relationship in that ontology
- Terms from different ontologies: discover linguistic relationships or affinities between terms in different ontologies
4Contributions
- We investigate several semantic similarity methods and evaluate their performance
  - http://www.intelligence.tuc.gr/similarity
- We propose a novel semantic similarity measure for comparing concepts from different ontologies
5Ontologies
- Tools of information representation on a subject
- Hierarchical categorization of terms from general to most specific terms
  - object → artifact → construction → stadium
- Domain ontologies: represent knowledge of a domain
  - e.g., MeSH medical ontology
- General ontologies: represent common-sense knowledge about the world
  - e.g., WordNet
6WordNet
- A vocabulary and a thesaurus offering a hierarchical categorization of natural language terms
  - More than 100,000 terms
- Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)
  - Synsets represent terms or concepts with similar meaning
  - {stadium, bowl, arena, sports stadium}: a large structure for open-air sports or entertainments
7WordNet Hierarchies
- The synsets are also organized into senses
  - Senses: different meanings of the same term
- The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g.,
  - Hyponym/Hypernym (Is-A relationships)
  - Meronym/Holonym (Part-Of relationships)
- Nine noun and several verb Is-A hierarchies
8A Fragment of the WordNet Is-A Hierarchy
9MeSH
- MeSH: ontology for medical and biological terms by the N.L.M.
  - Organized in Is-A hierarchies
- More than 15 taxonomies, more than 22,000 terms
- No part-of relationships
- The terms are organized into synsets called
entry terms
10A Fragment of the MeSH Is-A Hierarchy
11Semantic Similarity Methods
- Map terms to an ontology and compute their relationship in that ontology
- Four main categories of methods
  - Edge counting: path length between terms
  - Information content: as a function of their probability of occurrence in a corpus
  - Feature based: similarity between their properties (e.g., definitions) or based on their relationships to other similar terms
  - Hybrid: combine the above ideas
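The edge counting idea can be sketched on a toy hierarchy. The taxonomy below is hypothetical (a handful of hand-written Is-A links, not WordNet data), and the function names `ancestors` and `path_length` are illustrative only. The sketch counts the Is-A edges on the path joining two terms through their nearest common ancestor.

```python
# Hypothetical toy Is-A taxonomy (child -> parent); not taken from WordNet data.
PARENT = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
}


def ancestors(term, parent=PARENT):
    """Return the chain from `term` up to the root, `term` included."""
    chain = [term]
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def path_length(a, b, parent=PARENT):
    """Count Is-A edges between `a` and `b` via their nearest common ancestor."""
    steps_from_b = {t: i for i, t in enumerate(ancestors(b, parent))}
    for steps_from_a, t in enumerate(ancestors(a, parent)):
        if t in steps_from_b:
            return steps_from_a + steps_from_b[t]
    raise ValueError("terms share no common ancestor")


print(path_length("conveyance", "ceramic"))  # 2
print(path_length("stadium", "conveyance"))  # 4
```

A shorter path means the terms are treated as more similar; the individual edge counting methods differ in how they turn this length into a score.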
12Example
- Edge counting: the distance between conveyance and ceramic is 2
- An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
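A minimal sketch of the information content idea, under stated assumptions: the taxonomy and the corpus counts in `COUNTS` are invented for illustration, the probability of a concept is taken as the relative frequency of the concept and everything below it, and the two scores follow the commonly cited Resnik and Lin formulations (information content of the common subsumer, and its normalized variant). None of the numbers come from the evaluation in these slides.

```python
import math

# Hypothetical toy Is-A taxonomy (child -> parent) and invented corpus counts.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
}
COUNTS = {"object": 10, "artifact": 20, "instrumentality": 30,
          "conveyance": 25, "ceramic": 15}
TOTAL = sum(COUNTS.values())


def ancestors(term):
    """Return the chain from `term` up to the root, `term` included."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def probability(term):
    """Relative frequency of `term` together with all terms below it."""
    covered = sum(n for t, n in COUNTS.items() if term in ancestors(t))
    return covered / TOTAL


def information_content(term):
    return -math.log(probability(term))


def common_subsumer(a, b):
    """Most specific term that is an ancestor of both `a` and `b`."""
    above_b = set(ancestors(b))
    return next(t for t in ancestors(a) if t in above_b)


def resnik(a, b):
    return information_content(common_subsumer(a, b))


def lin(a, b):
    shared = information_content(common_subsumer(a, b))
    return 2 * shared / (information_content(a) + information_content(b))


print(common_subsumer("conveyance", "ceramic"))  # instrumentality
print(round(resnik("conveyance", "ceramic"), 3))
print(round(lin("conveyance", "ceramic"), 3))
```

The rarer (more specific) the common subsumer, the higher its information content and the more similar the two terms are judged to be.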
13X-Similarity
- Relies on matching between synsets and term description sets
  - A, B: synsets or term description sets
- Do the same with all Is-A and Part-Of relationships and take their maximum
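The matching formula on this slide did not survive in the text, so the following is only a sketch under an assumption: set matching is taken to be the ratio of shared to total elements (a Jaccard-style overlap), applied per relationship type, with the maximum kept. The names `set_overlap` and `neighborhood_similarity` and the two toy term records are hypothetical.

```python
def set_overlap(a, b):
    """Assumed matching score: |A intersect B| / |A union B| (0.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def neighborhood_similarity(term_a, term_b, relations=("is_a", "part_of")):
    """Match the sets of related terms per relationship type; keep the maximum."""
    return max(
        set_overlap(term_a.get(rel, ()), term_b.get(rel, ()))
        for rel in relations
    )


# Hypothetical term records: relationship type -> set of related terms.
term_a = {"is_a": {"disease", "disorder"}, "part_of": set()}
term_b = {"is_a": {"disease", "thyroid"}, "part_of": set()}

print(neighborhood_similarity(term_a, term_b))
```

The same `set_overlap` function could be applied to synsets or to the word sets of two term descriptions, since both are plain sets.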
14Example
- S(Hypothyroidism, Hyperthyroidism) = 0.387
WordNet term: Hypothyroidism
<term> hypothyroidism
<definition> An underactive thyroid gland; a glandular disorder resulting from insufficient production of thyroid hormones. </definition>
<synset> Hypothyroidism </synset>
<hypernyms> glandular disease, disorder, condition, state </hypernyms>
<hyponyms> myxedema, cretinism </hyponyms>
</term>
MeSH term: Hyperthyroidism
<term> hyperthyroidism
<definition> Hypersecretion of Thyroid Hormones from Thyroid Gland. Elevated levels of thyroid hormones increase Basal Metabolic Rate. </definition>
<synset> Hyperthyroidism </synset>
<hypernyms> disease, thyroid, Endocrine System Diseases, diseases </hypernyms>
<hyponyms> thyrotoxicosis, thyrotoxicoses </hyponyms>
</term>
15Evaluation
- The most popular methods are evaluated
- All methods are applied to a set of 38 term pairs
- Their similarity values are correlated with scores obtained by humans
  - The higher the correlation of a method, the better the method is
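The evaluation protocol can be sketched as follows. The slides do not say which correlation coefficient was used, so Pearson's coefficient is an assumption here, and the score lists are invented placeholders, not the 38 term pairs of the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder scores, one entry per term pair.
human_scores = [3.9, 3.5, 2.1, 0.8, 0.1]
method_scores = [0.95, 0.70, 0.55, 0.30, 0.05]

print(round(pearson(human_scores, method_scores), 2))
```

A method whose scores rise and fall with the human judgments yields a coefficient close to 1; this is the figure reported per method in the evaluation tables.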
16Evaluation on WordNet
Method            Type            Correlation
Rada 1989         Edge Counting   0.59
Wu 1994           Edge Counting   0.74
Li 2003           Edge Counting   0.82
Leacock 1998      Edge Counting   0.82
Richardson 1994   Edge Counting   0.63
Resnik 1999       Info. Content   0.79
Lin 1993          Info. Content   0.82
Lord 2003         Info. Content   0.79
Jiang 1998        Info. Content   0.83
Tversky 1977      Feature Based   0.73
X-Similarity      Feature Based   0.74
Rodriguez 2003    Hybrid          0.71
17Evaluation on MeSH
Method            Type            Correlation
Rada 1989         Edge Counting   0.50
Wu 1994           Edge Counting   0.67
Li 2003           Edge Counting   0.70
Leacock 1998      Edge Counting   0.74
Richardson 1994   Edge Counting   0.64
Resnik 1999       Info. Content   0.71
Lin 1993          Info. Content   0.72
Lord 2003         Info. Content   0.70
Jiang 1998        Info. Content   0.71
Tversky 1977      Feature Based   0.67
X-Similarity      Feature Based   0.71
Rodriguez 2003    Hybrid          0.71
18Cross Ontology Measures
- We used 40 MeSH term pairs
- One of the terms is also a WordNet term
- We measured correlation with scores obtained by experts
Method            Type            Correlation
X-Similarity      Feature-Based   0.70
Rodriguez         Hybrid          0.55
19Comments
- Edge counting/Info. Content methods work by exploiting structure information
- Good methods take the position of the terms into account
  - Higher similarity for terms which are close together but lower in the hierarchy, e.g., Li et al. 2003
- X-Similarity performs at least as well as other Feature-Based methods
  - Outperforms other Cross-Ontology methods
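The remark about position in the hierarchy can be illustrated with the form usually attributed to Li et al. 2003, which multiplies a term that decays with path length by a term that grows with the depth of the common subsumer. This is a sketch from memory of that measure, not code from the evaluated system; the default values of `alpha` and `beta` are assumptions.

```python
import math


def li_similarity(path_len, subsumer_depth, alpha=0.2, beta=0.6):
    """exp(-alpha * l) * tanh(beta * h): shorter paths and deeper subsumers score higher."""
    return math.exp(-alpha * path_len) * math.tanh(beta * subsumer_depth)


# Same path length (2 edges), common subsumer near the root vs. deep in the hierarchy.
print(round(li_similarity(2, 1), 3))
print(round(li_similarity(2, 5), 3))
```

With the path length held fixed, the pair whose common subsumer lies deeper in the hierarchy receives the higher score, which is the behaviour the comment describes.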
20Conclusions
- Semantic similarity methods approximated the human notion of similarity, reaching correlation up to 0.83
- Cross-ontology similarity is a difficult problem that requires further investigation
- Work towards integrating semantic similarity within the IntelliSearch information retrieval system for Web documents
  - http://www.intelligence.tuc.gr/intellisearch
21Try our system on the Web
- http://www.intelligence.tuc.gr/similarity
- Implementation
- Giannis Varelas
- Spyros Argyropoulos
22www.intelligence.tuc.gr/similarity