Title: Evaluating semantic similarity using GML in Geographic Information Systems
1Evaluating semantic similarity using GML in
Geographic Information Systems
- Fernando Ferri 1, Anna Formica 2, Patrizia Grifoni 1, and Maurizio Rafanelli 2
- 1 IRPPS-CNR, via Nizza 128, 00198 Roma, Italy
- fernando.ferri_at_irpps.cnr.it, patrizia.grifoni_at_irpps.cnr.it
- 2 IASI-CNR, viale Manzoni 30, 00185 Roma, Italy
- formica_at_iasi.cnr.it, rafanelli_at_iasi.cnr.
2Summary
- Motivation
- Related works
- Coding a Part-of Hierarchy using GML
- Similarity evaluation
- Conclusion
3Motivation (1)
- In Geographic Information Systems (GISs), semantic similarity plays an important role, as it supports the identification of objects that are conceptually close but not identical.
- GML (Geography Markup Language) is emerging as the dominant standard for exchanging geographic data across the Internet.
- A semantic similarity model facilitates the comparison of entities and allows information retrieval and integration to handle semantically similar concepts. The goal of a similarity model is to obtain flexible and better matches between user-expected and system-retrieved information.
4Motivation (2)
- Given the relevance of the Is-in relationship in the geographic context, we focus on GML elements organized according to Part-of (meronymic) hierarchies.
- The semantics essentially concerns parts which are similar to and inseparable from the whole.
5Related works (1)
- Similarity of hierarchically related concepts has been widely investigated in the literature (Resnik; Rodriguez and Egenhofer).
- Among the various proposals, we followed the probabilistic approach of Lin, which is based on the notion of information content and overcomes the drawbacks of the traditional edge-counting approach.
6Related Works (2)
- Resnik proposes algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguities.
- Lin builds on Resnik's work and also takes into account the information content of the concepts being compared.
7Coding a Part-of Hierarchy with GML (1)
- The real world in the geographic domain can be represented as a set of features, and AbstractFeatureType codifies a geographic feature in GML.
- Its geometry type is an important property: it is given in the reference coordinate system and describes the extent, position or relative location of the represented concept.
8Coding a Part-of Hierarchy with GML (2)
- The geometric types defined in GML provide the framework for modelling all geographical concepts.
- By means of this framework it is possible to model, for example, the concepts composing a network of communication ways, such as roads, rivers, canals and other communication infrastructures.
9Coding a Part-of Hierarchy with GML (3)
- This figure shows an example of a type hierarchy
that introduces concepts concerning communication
infrastructures starting from the GML geometric
types.
10Coding a Part-of Hierarchy with GML (4)
- As mentioned in the motivation, due to the relevance of the Is-in relationship in the geographic context, the paper focuses on GML elements organized according to Part-of (meronymic) hierarchies.
- For instance, in our example a Part-of relationship exists between communication ways (ComWay) and roads, rivers and canals.
11Coding a Part-of Hierarchy with GML (5)
- Usually, in the literature, Part-of hierarchies are modelled in XML using sequences of elements, and a similar approach could be followed in GML.
- However, this approach does not make it possible to distinguish between elements of the Part-of hierarchy and other elements possibly defined outside it, such as Kind and Country.
12Coding a Part-of Hierarchy with GML (6)
- To make meronymic relationships explicit within the GML element hierarchy, a Part-of hierarchy could be modelled by introducing special geographic types such as PartOfWayType, PartOfRivType and PartOfCanType.
- Each special type is introduced to model a Part-of relationship between a geographic concept and its component concepts.
13Coding a Part-of Hierarchy with GML (7)
    <!-- Global elements of the communication ways example -->
    <element name="ComWay" type="ComWayType"/>
    <element name="Road" type="RoadType"/>
    <element name="River" type="RiverType"/>
    <element name="Canal" type="CanalType"/>
    <element name="NavRiver" type="NavSegmentType"/>
    <element name="NNavRiver" type="NNavSegmentType"/>
    <element name="NavCanal" type="NavSegmentType"/>
    <element name="NNavCanal" type="NNavSegmentType"/>

    <!-- kind and country are ordinary elements; PartOfWay carries the Part-of relationship -->
    <complexType name="ComWayType">
      <sequence>
        <element name="kind" type="string"/>
        <element name="country" type="string"/>
        <element name="PartOfWay" type="PartOfWayType"/>
      </sequence>
      <attribute name="label" type="string"/>
      <attribute name="length" type="integer"/>
    </complexType>
- This GML code shows how to make a meronymic relationship explicit within the GML element hierarchy by introducing a special geographic type such as PartOfWayType.
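The benefit of a dedicated type can be illustrated with a small Python sketch that reads the ComWayType declaration and separates Part-of elements from ordinary ones. It is only an illustration under stated assumptions: the un-namespaced `<schema>` wrapper, the helper `split_children`, and the rule "a type name starting with `PartOf` marks a Part-of element" are hypothetical conventions chosen here, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

# Schema fragment from the slide, wrapped in a hypothetical un-namespaced root.
SCHEMA = """
<schema>
  <complexType name="ComWayType">
    <sequence>
      <element name="kind" type="string"/>
      <element name="country" type="string"/>
      <element name="PartOfWay" type="PartOfWayType"/>
    </sequence>
    <attribute name="label" type="string"/>
    <attribute name="length" type="integer"/>
  </complexType>
</schema>
""".strip()


def split_children(complex_type):
    """Split child elements into Part-of elements and ordinary elements.

    Assumption: a Part-of element is recognised by a type name that
    starts with "PartOf" (e.g. PartOfWayType).
    """
    part_of, ordinary = [], []
    for el in complex_type.iter("element"):
        is_part_of = el.get("type", "").startswith("PartOf")
        (part_of if is_part_of else ordinary).append(el.get("name"))
    return part_of, ordinary


root = ET.fromstring(SCHEMA)
com_way = root.find("complexType[@name='ComWayType']")
part_of, ordinary = split_children(com_way)
print("Part-of elements:", part_of)
print("Other elements:", ordinary)
```

With a plain sequence of elements and no special type, the same traversal would have no way to tell PartOfWay apart from kind and country.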
14Evaluating similarity (1)
- For evaluating concept similarity, this paper combines and revisits:
- the information content approach [Lin98],
- a proposal inspired by the maximum weighted matching problem in bipartite graphs [FM02].
15Evaluating similarity (2)
- The starting assumption is that associating probabilities with the Part-of taxonomy allows us to introduce the notion of a weighted element hierarchy. In particular, in our example the probabilities have been estimated in line with WordNet 2.0.
- For instance, the concepts Road and River are defined below, with the related frequencies (the numbers in parentheses).
(95) Road: an open way (generally public) for travel and transportation
(55) River: a large natural stream of water (larger than a creek)
16Evaluating similarity (3)
- The probability of a concept
- The probability of a concept c is defined as
- $p(c) = \mathrm{freq}(c) / N$
- where freq(c) is the frequency of the concept c in the taxonomy, and N is the total number of concepts.
- In the example, probabilities have been assigned according to WordNet.
17Evaluating similarity (4)
- Example Weighted Concept Hierarchy
18Evaluating similarity (5)
- Following the standard approach of information theory [Ross76], the information content of a concept c can be quantified as
- $-\log p(c)$
- that is, as the probability increases, the informativeness decreases.
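The probability and information content definitions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: only the frequencies 95 (Road) and 55 (River) come from the slides, while the total `N = 1000`, the choice of the natural logarithm, and the function names are assumptions made for the example.

```python
import math


def probability(freq: int, total: int) -> float:
    """p(c) = freq(c) / N."""
    if not 0 < freq <= total:
        raise ValueError("need 0 < freq <= total")
    return freq / total


def information_content(p: float) -> float:
    """Information content of a concept with probability p: -log p."""
    return -math.log(p)


N = 1000  # hypothetical total; the slides do not give N
FREQ = {"Road": 95, "River": 55}  # frequencies quoted for Road and River

for concept, freq in FREQ.items():
    p = probability(freq, N)
    print(f"{concept}: p = {p:.3f}, IC = {information_content(p):.3f}")
```

Whatever value of N is used, the less frequent concept (River) gets the higher information content, which is the behaviour described above.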
19Evaluating similarity (6)
- The information content similarity (ics) of two concepts such as River and Canal is defined as
- $\mathrm{ics}(\mathrm{River}, \mathrm{Canal}) = \dfrac{2 \log p(\mathrm{ComWay})}{\log p(\mathrm{River}) + \log p(\mathrm{Canal})} = 0.72$
- where ComWay is the concept representing the maximum information content shared by River and Canal. According to Lin's approach, the more information two concepts share, the more similar they are.
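A possible way to compute ics over a Part-of hierarchy is sketched below. The probabilities are hypothetical placeholders (the weighted hierarchy figure is not reproduced in this transcript), so the script is not expected to reproduce the value 0.72. The placement of NavRiver under River is also an assumption made only to show the ancestor walk.

```python
import math

# Part-of hierarchy as a child -> parent map. Road, River and Canal under
# ComWay follow the slides; NavRiver under River is an assumption.
PARENT = {
    "Road": "ComWay",
    "River": "ComWay",
    "Canal": "ComWay",
    "NavRiver": "River",
}

# Hypothetical probabilities, NOT the values used in the paper.
PROB = {"ComWay": 0.30, "Road": 0.095, "River": 0.055,
        "Canal": 0.040, "NavRiver": 0.020}


def ancestors(concept):
    """Return the concept followed by its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def ics(c1, c2):
    """Lin-style similarity: 2 log p(shared) / (log p(c1) + log p(c2))."""
    if c1 == c2:
        return 1.0
    up2 = set(ancestors(c2))
    # The nearest common ancestor is the shared concept with the
    # highest information content in a tree-shaped hierarchy.
    shared = next((a for a in ancestors(c1) if a in up2), None)
    if shared is None:
        return 0.0
    return 2 * math.log(PROB[shared]) / (math.log(PROB[c1]) + math.log(PROB[c2]))


print(f"ics(River, Canal)    = {ics('River', 'Canal'):.2f}")
print(f"ics(NavRiver, River) = {ics('NavRiver', 'River'):.2f}")
```

The ratio does not depend on the base of the logarithm, so natural logarithms are used for convenience.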
20Evaluating similarity (7)
- Structural similarity (asim)
- Inspired by the maximum weighted matching problem in bipartite graphs, we have to identify the set of pairs of typed attributes that maximizes the sum of the products of the information content similarities of the attributes and of their related types.
21Evaluating similarity (8)
RiverType: label: string, length: integer, flow: integer, deepness: integer
CanalType: label: string, profundity: integer, capacity: integer, length: integer
22Evaluating similarity (9)
- In the previous example the set of pairs of attributes that maximizes the sum of the related information content similarity is the following:
- (label, label), (length, length), (flow, capacity), (deepness, profundity)
23Evaluating similarity (10)
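- In fact, by assuming that deepness and profundity are synonyms, we have
- $\mathrm{ics}(\mathrm{label}, \mathrm{label}) = \mathrm{ics}(\mathrm{length}, \mathrm{length}) = \mathrm{ics}(\mathrm{deepness}, \mathrm{profundity}) = 1$
- and $\mathrm{ics}(\mathrm{flow}, \mathrm{capacity}) = 0$.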
24Evaluating similarity (11)
- The similarity of the sets of attributes of complexTypes (asim) is therefore defined as the above maximum sum divided by the greater of the cardinalities of the two sets of attributes compared.
- In the case of RiverType and CanalType we have
- $\mathrm{asim}(\mathrm{RiverType}, \mathrm{CanalType}) = 3/4 = 0.75$
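The asim computation for RiverType and CanalType can be checked with a short brute-force sketch. It uses the attribute lists and the ics values stated in the slides (1 for identical names and for the synonyms deepness/profundity, 0 for flow/capacity). Two points are assumptions made for the example: the similarity of two attribute types is taken as 1 when the types are equal and 0 otherwise, and the matching is found by enumerating permutations rather than by a dedicated maximum weighted matching algorithm.

```python
from itertools import permutations

RIVER_TYPE = {"label": "string", "length": "integer",
              "flow": "integer", "deepness": "integer"}
CANAL_TYPE = {"label": "string", "profundity": "integer",
              "capacity": "integer", "length": "integer"}

# Synonym pairs assumed in the slides.
SYNONYMS = {frozenset({"deepness", "profundity"})}


def name_ics(a, b):
    """ics of two attribute names: 1 if identical or synonyms, else 0."""
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


def type_ics(t1, t2):
    """Assumption: attribute types are similar only when they are equal."""
    return 1.0 if t1 == t2 else 0.0


def asim(attrs1, attrs2):
    """Best matching score divided by the larger attribute-set cardinality."""
    small, large = sorted((attrs1, attrs2), key=len)
    if not large:
        return 0.0  # assumption: two empty attribute sets score 0
    small_items = list(small.items())
    best = 0.0
    # Try every injective assignment of the smaller set into the larger one.
    for chosen in permutations(large.items(), len(small_items)):
        total = sum(name_ics(n1, n2) * type_ics(t1, t2)
                    for (n1, t1), (n2, t2) in zip(small_items, chosen))
        best = max(best, total)
    return best / len(large)


print(asim(RIVER_TYPE, CANAL_TYPE))
```

Enumeration is adequate for a handful of attributes; for larger sets an assignment solver such as SciPy's `linear_sum_assignment` would be a more suitable choice.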
25Evaluating similarity (12)
- Concept Similarity (Gsim)
- The similarity (Gsim) of the concepts River and Canal is defined as
- $\mathrm{Gsim}(\mathrm{River}, \mathrm{Canal}) = \big(\mathrm{ics}(\mathrm{River}, \mathrm{Canal}) \cdot w + \mathrm{asim}(\mathrm{River}, \mathrm{Canal}) \cdot (1 - w)\big) \cdot \mathrm{Bt}(\mathrm{RiverType}, \mathrm{CanalType})$
- where
- ics(River, Canal) is the information content similarity
- asim(River, Canal) is the structural similarity
- w is a weight, s.t. $0 < w < 1$.
- Bt is a Boolean function that, given two complexTypes, returns 0 if their least upper bound in the type hierarchy is AbstractFeatureType, otherwise it returns 1.
26Evaluating similarity (13)
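- In particular, if we assume $w = 0.5$:
- $\mathrm{Gsim}(\mathrm{River}, \mathrm{Canal}) = \big(\mathrm{ics}(\mathrm{River}, \mathrm{Canal}) \cdot w + \mathrm{asim}(\mathrm{River}, \mathrm{Canal}) \cdot (1 - w)\big) \cdot \mathrm{Bt}(\mathrm{RiverType}, \mathrm{CanalType})$
- $\mathrm{Gsim}(\mathrm{River}, \mathrm{Canal}) = 0.5 \cdot (0.72 + 0.75) \cdot 1 = 0.735 \approx 0.74$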
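The final combination can be sketched as follows, using the values 0.72 and 0.75 from the slides. The supertype map is hypothetical: the actual type hierarchy appears only in a figure that is not reproduced here, so `WaterwayType` and `BuildingType` are placeholder names introduced to exercise both outcomes of Bt.

```python
ABSTRACT = "AbstractFeatureType"

# Hypothetical child -> parent type map; WaterwayType and BuildingType
# are placeholders, not types taken from the paper.
SUPERTYPE = {
    "RiverType": "WaterwayType",
    "CanalType": "WaterwayType",
    "WaterwayType": ABSTRACT,
    "BuildingType": ABSTRACT,
}


def supertypes(t):
    """Return t followed by its supertypes, nearest first."""
    chain = [t]
    while chain[-1] in SUPERTYPE:
        chain.append(SUPERTYPE[chain[-1]])
    return chain


def bt(t1, t2):
    """Return 0 if the least upper bound is AbstractFeatureType, else 1."""
    up2 = set(supertypes(t2))
    lub = next((t for t in supertypes(t1) if t in up2), ABSTRACT)
    return 0 if lub == ABSTRACT else 1


def gsim(ics_value, asim_value, t1, t2, w=0.5):
    """Gsim = (ics * w + asim * (1 - w)) * Bt(t1, t2), with 0 < w < 1."""
    if not 0 < w < 1:
        raise ValueError("w must satisfy 0 < w < 1")
    return (ics_value * w + asim_value * (1 - w)) * bt(t1, t2)


score = gsim(0.72, 0.75, "RiverType", "CanalType")
print(f"Gsim(River, Canal) = {score:.3f}")
```

Because Bt multiplies the whole expression, two concepts whose types meet only at AbstractFeatureType get similarity 0 regardless of their ics and asim values.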
27Conclusion