Title: Modelling and Use of Domain-Specific Knowledge for Similarity and Retrieval
1 Modelling and Use of Domain-Specific Knowledge for Similarity and Retrieval
- Troels Andreasen, Henrik Bulskov, and Rasmus Knappe
2 Outline
- Motivation
- Representation of Ontologies
- Modelling Ontologies
- General Ontology
- Domain-specific Ontology
- Deriving Similarity
- Applications
- Prototype System
- Navigating and Surveying
- Query Visualization
3 Motivation
- The use of ontologies
- Contributes to the organization of concepts, structure, and relations within a knowledge domain
- Which in turn
- Provides means for enhanced, knowledge-based approaches to surveying, indexing, and querying of document collections
4 Concept Language - OntoLog
- Basically a lattice algebra with attribution
- Compound concepts are built from
- atomic concepts of the ontology, and
- attribution
- Attribution attaches features using semantic relations such as
- WRT: with respect to
- CHR: characterized by
- CBY: caused by
- PNT: patient of act or process
- LOC: location, position
- ...
5 Representation of Ontologies
- A generative framework consisting of
- A basis ontology O_basis = (A, ISA_KB)
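- A generative concept language (OntoLog) defining the set of well-formed concepts L. Given atomic concepts A and a set of semantic relations R, the set of well-formed concepts L is
$$L = A \cup \{\, x[r_1\colon y_1, \dots, r_n\colon y_n] \mid x \in A,\ r_i \in R,\ y_i \in L \,\}$$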
6 Concept Examples
- The red and yellow bird
- bird[CHR: red, CHR: yellow]
- The meeting on Thursday
- meeting[TMP: thursday]
- Disorder caused by lack of vitamin C
- disorder[CBY: lack[WRT: vitaminC]]
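The compound concepts above can be represented as a small recursive data structure. The following is a minimal sketch, not the authors' implementation: the class name `Concept`, the helper `atom`, and the bracketed rendering `head[REL: value]` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    """An atomic concept plus zero or more (relation, concept) attributes."""
    head: str
    attrs: Tuple[Tuple[str, "Concept"], ...] = ()

    def __str__(self) -> str:
        # Atomic concepts render as their name only.
        if not self.attrs:
            return self.head
        inner = ", ".join(f"{rel}: {value}" for rel, value in self.attrs)
        return f"{self.head}[{inner}]"


def atom(name: str) -> Concept:
    """Hypothetical helper: build an atomic concept without attributes."""
    return Concept(name)


# The three examples from the slide.
bird = Concept("bird", (("CHR", atom("red")), ("CHR", atom("yellow"))))
meeting = Concept("meeting", (("TMP", atom("thursday")),))
disorder = Concept("disorder", (("CBY", Concept("lack", (("WRT", atom("vitaminC")),))),))

for concept in (bird, meeting, disorder):
    print(concept)
```

Because an attribute value is itself a `Concept`, nesting such as "lack with respect to vitamin C" needs no special handling.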
7 Modelling Ontologies
- Modelling in this context consists of two parts
- The inclusion of knowledge from available resources into a general ontology, and
- A restriction to the part of the general ontology covering the instantiated concepts in the document collection
8 The General Ontology (1)
- Built from various sources
- Taxonomy supplemented with word and term lists, dictionaries, thesauri
- We assume the presence of a taxonomy in the form of a simple taxonomic concept inclusion relation ISA_KB over the set of atomic concepts A
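9 The General Ontology (2)
- Based on ISA_TRAN, the transitive closure of ISA_KB, we generalize into a relation ≤ over well-formed concepts L
$$
\begin{aligned}
x \le y &\quad \text{if } x \;\mathrm{ISA}_{TRAN}\; y \\
x[\dots, r\colon z] \le y[\dots] &\quad \text{if } x[\dots] \le y[\dots] \\
x[\dots, r\colon z] \le y[\dots, r\colon z] &\quad \text{if } x[\dots] \le y[\dots] \\
z[\dots, r\colon x] \le z[\dots, r\colon y] &\quad \text{if } x \le y
\end{aligned}
$$
- where the repeated "..." in each inequality denotes zero or more attributes of the form $r_i\colon w_i$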
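The transitive closure of the expert-given taxonomy can be computed with a simple fixed-point iteration. This is a minimal sketch under stated assumptions: the relation is held as a set of (child, parent) pairs, and the example taxonomy (dog, canine, mammal, animal) is hypothetical.

```python
def transitive_closure(isa_kb):
    """Return every (x, z) pair such that z is reachable from x via ISA_KB."""
    closure = set(isa_kb)
    changed = True
    while changed:
        changed = False
        # Join the relation with itself until no new pair appears.
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


# Hypothetical taxonomy, written as (child, parent) pairs.
ISA_KB = {("dog", "canine"), ("canine", "mammal"), ("mammal", "animal")}
ISA_TRAN = transitive_closure(ISA_KB)

print(sorted(ISA_TRAN))
```

The quadratic join is fine for a toy taxonomy; for something the size of WordNet a per-node graph traversal would be the more practical choice.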
10 The General Ontology (3)
- O_general = (L, ≤, R) thus encompasses
- A set of well-formed concepts L derived in the concept language from a set of atomic concepts A
- An inclusion relation ≤ generalized from an expert-provided relation ISA_KB
- A supplementary set of semantic relations R
11 Domain-specific Ontology
- Transform the global generative ontology into a domain-specific ontology
- Restrict the global ontology to cover only the concepts in a collection, for instance those appearing in a given document collection
- The result can be considered an instantiated ontology with respect to the document collection
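The restriction step can be sketched as a graph traversal. The sketch below assumes that the instantiated ontology keeps each concept found in the collection together with everything upwards reachable from it; that reading, the function name `instantiate`, and the toy edges are assumptions for illustration only.

```python
def instantiate(edges, instantiated):
    """Keep only the edges lying on upward paths from instantiated concepts.

    edges: set of (child, parent, relation) triples of the general ontology.
    instantiated: concepts that actually occur in the document collection.
    """
    keep = set()
    frontier = list(instantiated)
    while frontier:
        node = frontier.pop()
        if node in keep:
            continue
        keep.add(node)
        # Follow every outgoing (upward) edge of the current node.
        frontier.extend(parent for child, parent, _rel in edges if child == node)
    return {(c, p, r) for (c, p, r) in edges if c in keep and p in keep}


# Hypothetical general ontology.
GENERAL = {
    ("dog", "animal", "ISA"),
    ("cat", "animal", "ISA"),
    ("animal", "top", "ISA"),
    ("church", "building", "ISA"),
    ("building", "top", "ISA"),
}

# Only "dog" occurs in the (hypothetical) document collection.
DOMAIN = instantiate(GENERAL, {"dog"})
print(sorted(DOMAIN))
```

In this toy case the branches for cat, church, and building are pruned, leaving only the path from dog up to the top concept.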
12 Domain-specific Ontology, an example
- Figure: example knowledge base ontology ISA_KB
- Figure: example instantiated ontology
13 Deriving Similarity
- Domain-specific ontology as basis for deriving a similarity measure for use in connection with querying of documents
- The measure must reflect nearness of concepts in the ontology
- Well-known approach: Shortest Path
- Problem: multiple connections are ignored
- Applied here: Shared Nodes
14 Shortest Path Similarity
sim(dog[CHR: gray], cat[CHR: gray]) = sim(dog[CHR: gray], bird[CHR: yellow])
- Counterintuitive
- dog and cat share the property gray
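The weakness of shortest-path measures can be shown on a small graph. The sketch below uses a hypothetical toy ontology, not the one on the slide: a gray dog is connected to a gray cat through two routes (the shared colour and the shared ancestor animal), but to a gray car only through the colour. A breadth-first search over undirected edges still reports the same distance for both pairs.

```python
from collections import deque


def shortest_path_length(edges, start, goal):
    """Number of edges on a shortest undirected path, or None if unconnected."""
    neighbours = {}
    for child, parent, _rel in edges:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical toy ontology as (child, parent, relation) triples.
EDGES = {
    ("dog", "animal", "ISA"), ("cat", "animal", "ISA"), ("animal", "top", "ISA"),
    ("car", "vehicle", "ISA"), ("vehicle", "top", "ISA"),
    ("gray", "color", "ISA"), ("color", "top", "ISA"),
    ("dog[CHR: gray]", "dog", "ISA"), ("dog[CHR: gray]", "gray", "CHR"),
    ("cat[CHR: gray]", "cat", "ISA"), ("cat[CHR: gray]", "gray", "CHR"),
    ("car[CHR: gray]", "car", "ISA"), ("car[CHR: gray]", "gray", "CHR"),
}

d_cat = shortest_path_length(EDGES, "dog[CHR: gray]", "cat[CHR: gray]")
d_car = shortest_path_length(EDGES, "dog[CHR: gray]", "car[CHR: gray]")
print(d_cat, d_car)  # prints "2 2": the extra dog/cat connection is ignored
```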
15 Shared Nodes
- Similarity between two concepts in this approach
is based on the set of upwards reachable nodes
shared between the concepts
16 Shared Nodes Similarity
sim(dog[CHR: gray], cat[CHR: gray]) > sim(dog[CHR: gray], bird[CHR: yellow])
- Notice
- Shared nodes take multiple connections into account
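17 Shared Nodes Similarity
- with $\alpha(x)$ as the set of nodes upwards reachable from x
- similarity is proportional to
- $|\alpha(x) \cap \alpha(y)|$
- We have
- $\alpha(x) = \{x\} \cup \bigcup_{(x,y,r) \in E} \alpha(y)$, with $\alpha(T) = \{T\}$
- where the triple (x,y,r) is the edge of type r from x to y, E the set of all edges, T the topmost concept, and $r \in R = \{\mathrm{ISA}, \mathrm{CBY}, \mathrm{CHR}, \dots\}$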
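The set of upwards reachable nodes, and the shared-node count built on it, can be sketched directly from the description above. This is an illustrative sketch: the function names `reachable` and `shared_nodes` and the toy edges are assumptions, and every node is taken to be reachable from itself.

```python
def reachable(edges, x):
    """Nodes upwards reachable from x, including x itself."""
    result = {x}
    for child, parent, _rel in edges:
        if child == x:
            result |= reachable(edges, parent)
    return result


def shared_nodes(edges, x, y):
    """Number of upwards reachable nodes that x and y have in common."""
    return len(reachable(edges, x) & reachable(edges, y))


# Hypothetical toy ontology as (child, parent, relation) triples.
EDGES = {
    ("dog", "animal", "ISA"), ("cat", "animal", "ISA"), ("bird", "animal", "ISA"),
    ("animal", "top", "ISA"),
    ("gray", "color", "ISA"), ("yellow", "color", "ISA"), ("color", "top", "ISA"),
    ("dog[CHR: gray]", "dog", "ISA"), ("dog[CHR: gray]", "gray", "CHR"),
    ("cat[CHR: gray]", "cat", "ISA"), ("cat[CHR: gray]", "gray", "CHR"),
    ("bird[CHR: yellow]", "bird", "ISA"), ("bird[CHR: yellow]", "yellow", "CHR"),
}

gray_pair = shared_nodes(EDGES, "dog[CHR: gray]", "cat[CHR: gray]")
mixed_pair = shared_nodes(EDGES, "dog[CHR: gray]", "bird[CHR: yellow]")
print(gray_pair, mixed_pair)  # prints "4 3"
```

The gray dog and gray cat share animal, top, gray, and color, while the gray dog and yellow bird share only animal, top, and color, so the shared property gray now counts.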
18 Shared Nodes Similarity
sim(dog[CHR: gray], cat[CHR: gray]) > sim(dog[CHR: gray], dog[CHR: large])
- Counterintuitive
- concept inclusion (ISA) should have higher importance than the characterized-by (CHR) property
19 Weighted Shared Nodes Similarity
sim(dog[CHR: gray], cat[CHR: gray]) < sim(dog[CHR: gray], dog[CHR: large])
- Solution
- attach weights in [0,1] to relations, so that α(x), the set of nodes upwards reachable from x, becomes a fuzzy set
20 Weighted Shared Nodes Similarity
- Similarity is proportional to
- Shared nodes: the number of upwards reachable nodes that x and y have in common
- Weighted shared nodes: the same count, with each shared node weighted by its degree of membership
- where the function weight(r) attaches a weight to each relation type
- We assume that
- weight(ISA) = 1
- and the weights attached to all other relations are less than 1
- For instance
- weight(CHR) = 0.8
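One way to turn the relation weights into a fuzzy set of reachable nodes is sketched below. The slides do not show how weights combine, so the rules used here are assumptions: a node's membership degree is the product of the weights along an upward path (taking the best path when there are several), and the shared degree of a node is the minimum of its two memberships. The toy edges and function names are hypothetical as well.

```python
def fuzzy_reachable(edges, weight, x):
    """Map each node upwards reachable from x to a membership degree in [0, 1]."""
    degrees = {x: 1.0}
    for child, parent, rel in edges:
        if child == x:
            for node, degree in fuzzy_reachable(edges, weight, parent).items():
                # Assumption: multiply weights along a path, keep the best path.
                candidate = weight[rel] * degree
                degrees[node] = max(degrees.get(node, 0.0), candidate)
    return degrees


def weighted_shared(edges, weight, x, y):
    """Sum of min-memberships over the nodes reachable from both x and y."""
    ax = fuzzy_reachable(edges, weight, x)
    ay = fuzzy_reachable(edges, weight, y)
    return sum(min(ax[n], ay[n]) for n in ax.keys() & ay.keys())


# Hypothetical toy ontology as (child, parent, relation) triples.
EDGES = {
    ("dog", "animal", "ISA"), ("cat", "animal", "ISA"), ("animal", "top", "ISA"),
    ("gray", "color", "ISA"), ("color", "top", "ISA"),
    ("large", "size", "ISA"), ("size", "top", "ISA"),
    ("dog[CHR: gray]", "dog", "ISA"), ("dog[CHR: gray]", "gray", "CHR"),
    ("cat[CHR: gray]", "cat", "ISA"), ("cat[CHR: gray]", "gray", "CHR"),
    ("dog[CHR: large]", "dog", "ISA"), ("dog[CHR: large]", "large", "CHR"),
}

SLIDE_WEIGHTS = {"ISA": 1.0, "CHR": 0.8}   # values given on the slide
LOWER_WEIGHTS = {"ISA": 1.0, "CHR": 0.4}   # illustrative alternative

for weights in (SLIDE_WEIGHTS, LOWER_WEIGHTS):
    same_colour = weighted_shared(EDGES, weights, "dog[CHR: gray]", "cat[CHR: gray]")
    same_kind = weighted_shared(EDGES, weights, "dog[CHR: gray]", "dog[CHR: large]")
    print(weights["CHR"], same_colour, same_kind)
```

In this toy graph, weight(CHR) = 0.8 only narrows the gap between the two pairs, while a lower weight such as 0.4 makes the two dogs more similar than the gray dog and gray cat. Whether the ordering changes therefore depends on the ontology and the chosen weights.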
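21 Similarity measure
- Similarity can be defined in various ways
- An asymmetric measure
- $sim(x,y) = \rho \, \frac{|\alpha(x) \cap \alpha(y)|}{|\alpha(x)|} + (1-\rho) \, \frac{|\alpha(x) \cap \alpha(y)|}{|\alpha(y)|}$
- is used here, where α(x) denotes the nodes upwards reachable from x and $\rho \in [0,1]$ sets the weighted average that determines the degree of influence of the nodes reachable from x and y respectively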
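The asymmetry of such a measure is easiest to see numerically. The sketch below assumes the measure is a weighted average of two ratios, the shared nodes relative to the nodes reachable from x and relative to those reachable from y. The parameter name `rho`, the unweighted node sets, and the toy edges are assumptions for illustration.

```python
def reachable(edges, x):
    """Nodes upwards reachable from x, including x itself."""
    result = {x}
    for child, parent, _rel in edges:
        if child == x:
            result |= reachable(edges, parent)
    return result


def sim(edges, x, y, rho):
    """Assumed form: rho * |shared| / |from x| + (1 - rho) * |shared| / |from y|."""
    ax, ay = reachable(edges, x), reachable(edges, y)
    shared = len(ax & ay)
    return rho * shared / len(ax) + (1 - rho) * shared / len(ay)


# Hypothetical toy ontology as (child, parent, relation) triples.
EDGES = {
    ("dog", "animal", "ISA"), ("animal", "top", "ISA"),
    ("gray", "color", "ISA"), ("color", "top", "ISA"),
    ("dog[CHR: gray]", "dog", "ISA"), ("dog[CHR: gray]", "gray", "CHR"),
}

specific_to_general = sim(EDGES, "dog[CHR: gray]", "dog", rho=0.8)
general_to_specific = sim(EDGES, "dog", "dog[CHR: gray]", rho=0.8)
print(specific_to_general, general_to_specific)
```

With `rho` at 0.5 the two directions coincide; moving it towards 0 or 1 lets either the first or the second argument dominate, which is what makes the measure asymmetric.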
22 Applications
- Prototype system
- Navigating and Surveying
- Query Visualization
23 Prototype System
- Based on
- WordNet 2.0
- Suggested Upper Merged Ontology (SUMO)
- Mid-Level Ontology (MILO)
- Contains
- Approximately 100,000 concepts (synsets)
- ISA_KB = hypernym relation
24 Navigating and Surveying
- Instantiated Concepts
- palisade,
- stockade[CHR: old],
- rampart[CHR: old], and
- church[CHR: old]
- Two aspects
- Fortifications
- Place of worship
- More abstract
- Buildings
- Something dated back in time
25 Visualizing Queries
- Polysemous concepts
- Q = {bank, huge}
- Domain specific concepts
26 Conclusion
- Domain-specific ontologies
- Restriction of the general ontology has applications with respect to
- Deriving Similarity
- Weighted Shared Nodes
- Navigation and Surveying
- Query Visualization