Title: Next Generation Semantic Web Applications
1Next Generation Semantic Web Applications
- Enrico Motta, Mathieu d'Aquin, Sofia Angeletou, Claudio Baldassarre, Martin Dzbor, Laurian Gridinoc, Davide Guidi, Ainhoa Llorente, Vanessa Lopez, Marta Sabou
- Knowledge Media Institute, The Open University, Milton Keynes, UK
2Introduction
- This talk presents a number of projects that are part of an integrated effort to explore the possibilities opened by the Semantic Web, viewed as a domain-independent, large-scale supplier of formally encoded background knowledge, with respect to enabling intelligent problem solving.
- We call the resulting applications
- Next Generation Semantic Web Applications
3Organization of the Talk
- The Semantic Web
- The Semantic Web in the context of AI research
- Next Generation Semantic Web Applications
- What are they?
- Why are they different from 1st generation SW Applications?
- Infrastructure needs
- Examples
- Ontology Matching
- Integrating Web2.0 and SW
- Semantic Web Browsing
- Question Answering
- Conclusions
4The Semantic Web
- The collection of all formal, machine-processable, web-accessible, ontology-based statements (semantic metadata) about web resources and other entities in the world, expressed in a knowledge representation language based on an XML syntax (e.g., OWL, DAML, DAML+OIL, RDF, etc.)
5Ontology
Metadata
UoD
7Semantic Web Document
<foaf:PersonalProfileDocument rdf:about="http://kmi.open.ac.uk/people/rdf.cfm/idstring/sofia-angeletou/">
  <dc:title>Sofia Angeletou&apos;s RDF Description</dc:title>
  <rdfs:label>Sofia Angeletou's RDF Description</rdfs:label>
  <dc:description>RDF description for Sofia Angeletou in machine-readable RDF/XML</dc:description>
  <dc:creator rdf:resource="http://identifiers.kmi.open.ac.uk/people/sofia-angeletou/"/>
  <foaf:maker rdf:resource="http://identifiers.kmi.open.ac.uk/people/sofia-angeletou/"/>
  <foaf:primaryTopic rdf:resource="http://identifiers.kmi.open.ac.uk/people/sofia-angeletou/"/>
</foaf:PersonalProfileDocument>
<foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/sofia-angeletou/">
  <foaf:name>Sofia Angeletou</foaf:name>
  <foaf:firstName>Sofia</foaf:firstName>
  <foaf:surname>Angeletou</foaf:surname>
  <foaf:mbox_sha1sum>F78114D4E45CFC6AC811E6191F50182FB9838938</foaf:mbox_sha1sum>
  <foaf:phone rdf:resource="tel:+44-(0)1908-654777"/>
  <foaf:jabberID>s.angeletouopen.ac.uk_at_msg.open.ac.uk</foaf:jabberID>
  <foaf:homepage rdf:resource="http://"/>
  <foaf:publications rdf:resource="http://kmi.open.ac.uk/publications/publications.cfm?id=167"/>
  <foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
  <foaf:workInfoHomepage rdf:resource="http://kmi.open.ac.uk/people/index.cfm?id=167"/>
  <foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/Sofia4F8E0276.jpg"/>
8Increasing Semantic Content
<rdf:RDF>
  <Feature rdf:about="http://sws.geonames.org/2638049/">
    <name>Shenley Church End</name>
    <alternateName>Shenley</alternateName>
    <inCountry rdf:resource="http://www.geonames.org/countries/GB"/>
  </Feature>
</rdf:RDF>
9Charting the web
10Charting the web (2)
11Domain Coverage on the SW
- Great variety: some topics are almost not covered (e.g. Adult), while some are over-represented (e.g. Society, Computers)
- As we can expect, there is a large number of narrow-coverage documents and a small number of large-coverage ones.
Distribution of documents in the 16 top
categories of DMOZ
Distribution of the documents according to their
coverage
12Example: Annotating the queen's birthday dinner
<RDF triple> <RDF triple> <RDF triple> <RDF triple> <RDF triple> <RDF triple>
<RDF triple> <RDF triple> <RDF triple> <RDF triple> <RDF triple> <RDF triple>
13Knowledge Sparseness
Nr.  (t1, t2, t3)                      (t1)  (t2)  (t3)  (t1, t2)  (t1, t3)  (t2, t3)  (t1, t2, t3)
1    (project, article, researcher)    84    90    24    9         13        9         8
2    (researcher, student, university) 24    101   64    16        15        38        13
3    (research, publication, author)   15    77    138   8         5         36        4
4    (adventurer, expedition, photo)   1     0     32    0         1         0         0
5    (mountain, team, talk)            12    25    9     2         1         1         1
6    (queen, birthday, dinner)         0     9     2     0         0         1         0
7    (project, relatedTo, researcher)  84    11    24    0         13        0         0
8    (researcher, worksWith, Ontology) 24    9     52    0         3         0         0
9    (academic, memberOf, project)     21    36    84    0         3         5         0
10   (article, hasAuthor, person)      90    14    371   8         32        2         0
11   (person, trip, photo)             371   7     32    1         20        1         1
12   (woman, birthday, dinner)         32    9     2     1         1         1         1
13   (person, memberOf, project)       371   36    84    16        46        5         5
14   (publication, hasAuthor, person)  77    14    371   2         52        2         2
14Example: Annotating the queen's birthday dinner
15The Rise of Semantics
16Thesis 1
- The SW today has already reached a level of scale good enough to make it a very useful source of knowledge to support intelligent applications
- In other words, the Semantic Web is no longer an aspiration but a reality
- The availability of such large-scale amounts of formalised knowledge is unprecedented in the history of AI
17Thesis 2
- The SW may well provide a solution to one of the classic AI challenges: how to acquire and manage large volumes of knowledge to develop truly intelligent problem solvers and address the brittleness of traditional KBS
18Knowledge Representation Hypothesis in AI
- Any mechanically embodied intelligent process will be comprised of structural ingredients that
- we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and
- independent of such external semantic attribution, play a formal but causal and essential role in engendering the behaviour that manifests that knowledge
- Brian Smith, 1982
19Intelligence as a function of possessing domain
knowledge
KA Bottleneck
Intelligent Behaviour
20The Knowledge Acquisition Bottleneck
KA Bottleneck
Intelligent Behaviour
22SW as Enabler of Intelligent Behaviour
Intelligent Behaviour
23Overall Goal
Our research programme is to contribute to the
development of this large-scale web of data and
develop a new generation of web applications able
to exploit it to provide intelligent
functionalities
24First Generation Semantic Web Applications
26Bibliographic Data
CS Dept Data
AKT Reference Ontology
RDF Data
29Features of 1st generation SW Applications
- Typically use a single ontology
- Usually providing a homogeneous view over heterogeneous data sources.
- Limited use of existing SW data
- Closed to semantic resources
Hence current SW applications are more similar
to traditional KBS (closed semantic systems) than
to 'real' SW applications (open semantic systems)
30It is still early days..
1895
2007
32Next Generation Semantic Web Applications
33Architecture of NGSW Apps
Semantic Web Gateway
34Issue: Semantic Web Infrastructure
35Current Gateway to the Semantic Web
36Limitations of Swoogle
- Limited quality control mechanisms
- Many ontologies are duplicated
- Limited query/search mechanisms
- Only keyword search; no distinction between types of elements
- No support for formal query languages (such as SPARQL)
- Limited range of ontology ranking mechanisms
- Swoogle only uses a 'popularity-based' one
- Limited API
- No support for ontology modularization and combination
37A New Gateway to the Semantic Web
http://watson.kmi.open.ac.uk
38- Sophisticated quality control mechanism
- Detects duplications
- Fixes obvious syntax problems
- E.g., duplicated ontology IDs, namespaces, etc.
- Structures ontologies in a network
- Using relations such as extends, inconsistentWith, duplicates
- Provides efficient API
- Supports formal queries (SPARQL)
- Variety of ontology ranking mechanisms
- Modularization/Combination support
- Plug-ins for Protégé and NeOn Toolkit (under development)
- Very cool logo!
39Networked Ontologies
[Diagram: a network of ontologies O1 to O4 and mappings M1 and M2 (each with a source and a target), linked by the relations priorVersionOf, relatedWith, incompatibleWith, dependsOn and extends]
47Examples of Next Generation Semantic Web
Applications
48Example 1: Ontology Matching
49Ontology Matching
50New paradigm: use of background knowledge
[Diagram: Background Knowledge (external source) connects the terms A and B through a relation R]
51External Source: One Ontology
- Aleksovski et al., EKAW'06
- Map (anchor) terms into concepts from a richly axiomatized domain ontology
- Derive a mapping based on the relation of the anchor terms
Assumes that a suitable (rich, large) domain ontology (DO) is available.
52External Source: Web
- van Hage et al., ISWC'05
- Rely on Google and an online dictionary in the food domain to extract semantic relations between candidate terms using IR techniques
Does not rely on a rich domain ontology.
Precision increases significantly if domain-specific sources are used: 50% for the Web, 75% for domain texts.
[Diagram: IR methods over the Web and an online dictionary yield rel(A, B)]
53External Source: SW
- Proposal
- Rely on online ontologies (Semantic Web) to derive mappings
- Ontologies are dynamically discovered and combined
Does not rely on any pre-selected knowledge sources.
[Diagram: the Semantic Web yields rel(A, B)]
M. Sabou, M. d'Aquin, E. Motta, "Using the Semantic Web as Background Knowledge in Ontology Mapping", Ontology Mapping Workshop, ISWC'06. Best Paper Award
54Strategy 1 - Definition
Find ontologies that contain equivalent classes for A and B and use their relationship in the ontologies to derive the mapping.
For each ontology use these rules
[Diagram: A and B are matched to equivalent classes (A1, B1), (A2, B2), ..., (An, Bn) in the online ontologies O1, O2, ..., On on the Semantic Web, yielding rel(A, B)]
These rules can be extended to take into account indirect relations between A and B, e.g., between parents of A and B
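Strategy 1 can be illustrated with a minimal sketch. It is not the authors' implementation: ontologies are modelled as plain sets of triples, "equivalent classes" are approximated by a case-insensitive label match, and the names `find_relation`, `strategy1` and the three toy ontologies are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy model (assumption): an ontology is a set of (subject, relation, object)
# triples whose relation is "equivalentTo", "subClassOf" or "disjointWith".
Ontology = Set[Tuple[str, str, str]]

# How a relation reads when the two terms are swapped.
INVERSE = {"subClassOf": "superClassOf"}


def find_relation(onto: Ontology, a: str, b: str) -> Optional[str]:
    """Return the relation one ontology states between a and b, if any."""
    a, b = a.lower(), b.lower()
    for s, rel, o in sorted(onto):
        s, o = s.lower(), o.lower()
        if (s, o) == (a, b):
            return rel
        if (s, o) == (b, a):
            return INVERSE.get(rel, rel)  # symmetric relations stay unchanged
    return None


def strategy1(ontologies: Dict[str, Ontology], a: str, b: str) -> List[Tuple[str, str]]:
    """List (ontology name, relation) for every ontology that relates a and b."""
    found = []
    for name in sorted(ontologies):
        rel = find_relation(ontologies[name], a, b)
        if rel is not None:
            found.append((name, rel))
    return found


# Hypothetical ontologies standing in for what a gateway search would return.
web = {
    "O1": {("Beef", "subClassOf", "Food")},
    "O2": {("Food", "subClassOf", "Thing"), ("Meat", "subClassOf", "Food")},
    "O3": {("Food", "disjointWith", "Beverage")},
}

print(strategy1(web, "Beef", "Food"))  # [('O1', 'subClassOf')]
print(strategy1(web, "Food", "Beef"))  # [('O1', 'superClassOf')]
```

Only direct relations are checked here; the extension to indirect relations (for example between parents of A and B) would need a traversal of the class hierarchy.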
55Strategy 1- Examples
56Strategy 2 - Definition
Principle: If no ontologies are found that contain the two terms, then combine information from multiple ontologies to find a mapping.
Details:
(1) Select all ontologies containing A' equivalent to A
(2) For each ontology containing A':
(a) if A' is subsumed by a concept C, find a relation between C and B.
(b) if A' subsumes a concept C, find a relation between C and B.
[Diagram: A is related to an intermediate concept C in one ontology on the Semantic Web, C is related to B in another, and the two relations are combined into rel(A, B)]
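The idea of chaining two ontologies through an intermediate concept C can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used in the talk: only the case where A is a subclass of C is covered, only two composition rules are applied (A subClassOf C and C subClassOf B gives A subClassOf B; A subClassOf C and C disjointWith B gives A disjointWith B), and the ontology names and contents are made up.

```python
from typing import Dict, List, Set, Tuple

Ontology = Set[Tuple[str, str, str]]  # (subject, relation, object) triples


def parents(onto: Ontology, term: str) -> List[str]:
    """Direct superclasses C of term (term subClassOf C) in one ontology."""
    return sorted(o for s, rel, o in onto if rel == "subClassOf" and s == term)


def relations_between(onto: Ontology, c: str, b: str) -> List[str]:
    """Relations one ontology states between c and b that can be inherited by a."""
    rels = set()
    for s, rel, o in onto:
        if rel == "subClassOf" and (s, o) == (c, b):
            rels.add("subClassOf")
        if rel == "disjointWith" and {s, o} == {c, b}:  # disjointness is symmetric
            rels.add("disjointWith")
    return sorted(rels)


def strategy2(ontologies: Dict[str, Ontology], a: str, b: str) -> List[Tuple[str, str, str, str]]:
    """Return (a, relation, b, intermediate concept) for every chain found."""
    mappings = []
    for name1 in sorted(ontologies):
        for c in parents(ontologies[name1], a):
            for name2 in sorted(ontologies):
                for rel in relations_between(ontologies[name2], c, b):
                    mappings.append((a, rel, b, c))
    return mappings


# Hypothetical ontologies; none of them contains both end terms.
web = {
    "onto_a": {("Duck", "subClassOf", "Poultry")},
    "onto_b": {("Poultry", "subClassOf", "Food")},
    "onto_c": {("Seafood", "disjointWith", "Poultry")},
}

print(strategy2(web, "Duck", "Food"))     # [('Duck', 'subClassOf', 'Food', 'Poultry')]
print(strategy2(web, "Duck", "Seafood"))  # [('Duck', 'disjointWith', 'Seafood', 'Poultry')]
```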
57Strategy 2 - Examples
Ex1 (r1): (midlevel-onto) and (Tap). (Same results for Duck, Goose, Turkey)
Ex2 (r1): (pizza-to-go) and (SUMO)
Ex3 (r3): (pizza-to-go) and (wine.owl)
58Large Scale Evaluation
Matching AGROVOC (16k terms) and NALT (41k terms)
(derived from 180 different ontologies)
Evaluation: 1600 mappings, two teams
Overall performance comparable to best in class
M. Sabou, M. d'Aquin, W.R. van Hage, E. Motta, "Improving Ontology Matching by Dynamically Exploring Online Knowledge". In press
59Chart 2
60Thesis 3
- Using the SW to dynamically provide background knowledge for the AGROVOC/NALT mapping problem is the first ever test case in which the SW, viewed as a large-scale heterogeneous resource, has been successfully used to address a real-world problem
62Thesis 4
- The claim that the information on the SW is of poor quality and therefore not useful to support intelligent problem solving is a myth not supported by concrete experience
- Our experience in the NALT/AGROVOC ontology matching benchmark problem shows that, without any particularly intelligent filter, the information available on the SW already allows an 85% theoretical precision for our algorithm, well beyond the performance of any other ontology matching algorithm
63Example 2: Integrating SW and Web2.0
64Features of Web2.0 sites
- Tagging as opposed to rigid classification
- Dynamic vocabulary does not require much annotation effort and evolves easily
- Shared vocabularies emerge over time
- Certain tags become particularly popular
65Limitations of tagging
- Different granularity of tagging
- rome vs colosseum vs roman monument
- flower vs tulip
- etc.
- Multilinguality
- Spelling errors, different terminology, plural vs singular, etc.
- This has a number of negative implications for the effective use of tagged resources
- e.g., search exhibits very poor recall
66Giving meaning to tags
67What does it mean to add semantics to tags?
- 1. Mapping a tag to a SW element
- "japan"
<akt:Country Japan>
68Applications of the approach
- To improve recall in keyword search
- To support annotation by dynamically suggesting relevant tags or visualizing the structure of relevant tags
- To enable formal queries over a space of tags
- Hence, going beyond keyword search
- To support new forms of intelligent navigation
- i.e., using the 'semantic layer' to support navigation
69Folksonomy
[Flowchart: Folksonomy -> Clustering (analyze co-occurrence of tags, build the co-occurrence matrix, cluster tags into Cluster1, Cluster2, ..., Clustern) -> Concept and relation identification (take 2 related tags and find mappings and a relation for the pair of tags using a SW search engine, Wikipedia and Google; remaining tags? Yes: repeat, No: END) -> <concept, relation, concept>]
70Pre-processing
- Scope: subsets of Flickr and del.icio.us tags.
- Pre-processing (thresholds)
- To be similar, Levenshtein similarity > 0.83
- A tag has to occur at least 10 times.
             Total    Total    Distinct  Distinct   Distinct
             entries  tags     users     resources  tags
del.icio.us  19,605   89,978   7,164     14,211     11,960
Flickr       49,087   167,130  6,140     49,087     17,956
             Total    Total    Distinct  Distinct   Distinct
             entries  tags     users     resources  tags
del.icio.us  18,882   70,194   7,090     13,579     1,265
Flickr       44,032   127,098  5,321     44,032     2,696
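The two thresholds above can be illustrated with a short sketch. Several points are assumptions rather than details from the slides: the similarity is taken to be 1 minus the edit distance divided by the length of the longer tag, similar tags are folded into the most frequent spelling, and the function names are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1] (assumed normalisation)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def preprocess(tags: Iterable[str], min_count: int = 10,
               threshold: float = 0.83) -> Dict[str, int]:
    """Fold near-duplicate tags together, then drop infrequent ones."""
    counts = Counter(t.lower() for t in tags)
    merged: Counter = Counter()
    # Visit frequent tags first so variants fold into the popular spelling.
    for tag, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = next((k for k in merged if similarity(tag, k) > threshold), tag)
        merged[target] += n
    return {t: n for t, n in merged.items() if n >= min_count}


sample = ["semanticweb"] * 9 + ["semanticwebs"] * 2 + ["rdf"] * 12 + ["owl"] * 3
print(preprocess(sample))  # {'rdf': 12, 'semanticweb': 11}
```

In the sample, "semanticwebs" is one edit away from "semanticweb" (similarity 11/12), so its two occurrences are merged and lift the tag over the frequency threshold, while "owl" is dropped.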
71Clustering
- 1. Each pair of similar tags, as determined by a co-occurrence analysis (e.g., audio and mp3), is a seed constituting an initial cluster
- 2. The cluster is enlarged by including tags that are similar to both the initial tags
- 3. Repeat the procedure recursively for all tags: each new candidate tag for a cluster must be similar to the whole (possibly enlarged) set of tags in that cluster.
- 4. If there are no more candidates for the cluster, go to step 1 with a new seed (e.g., audio and music).
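A minimal sketch of this seed-and-grow procedure is given below. It assumes the co-occurrence analysis has already produced a set of similar tag pairs; the function name `build_clusters` and the sample pairs are made up for the example.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def build_clusters(similar: Set[FrozenSet[str]]) -> List[Set[str]]:
    """Grow one cluster from every pair of similar tags (the seeds)."""
    tags = sorted({t for pair in similar for t in pair})

    def is_similar(a: str, b: str) -> bool:
        return frozenset((a, b)) in similar

    clusters: List[Set[str]] = []
    for a, b in combinations(tags, 2):
        if not is_similar(a, b):
            continue  # only similar pairs act as seeds
        cluster = {a, b}
        grown = True
        while grown:  # repeat until no candidate fits any more
            grown = False
            for cand in tags:
                # A candidate must be similar to every tag already in the cluster.
                if cand not in cluster and all(is_similar(cand, m) for m in cluster):
                    cluster.add(cand)
                    grown = True
        if cluster not in clusters:  # different seeds can yield the same cluster
            clusters.append(cluster)
    return clusters


# Hypothetical output of a co-occurrence analysis.
pairs = {frozenset(p) for p in [
    ("audio", "mp3"), ("audio", "music"), ("mp3", "music"),
    ("audio", "radio"), ("rdf", "owl"),
]}

for cluster in build_clusters(pairs):
    print(sorted(cluster))
# ['audio', 'mp3', 'music']
# ['audio', 'radio']
# ['owl', 'rdf']
```

Note that "radio" does not join the first cluster because it is similar to "audio" only, which is exactly the "similar to the whole set" condition of step 3.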
72Clustering
   audio      semantic-web  adult  apple      chat
1  mp3        rdf           girls  mac        aim
2  music      ontology      nude   macintosh  messenger
3  playlist   owl           babes  tiger      gtalk
4  streaming  semweb        pics   osx        msn
5  radio      daml          sex    macosx     icq
? Fruit
73Combining Clusters
- Smoothing heuristics are applied to avoid having a number of very similar clusters originating from distinct seeds that are similar to each other.
- For every two clusters
- If one cluster contains the other, i.e., if the larger cluster contains all the tags of the smaller, remove the smaller cluster
- If clusters differ within a small margin, i.e., the number of different tags in the smaller cluster represents less than a percentage of the number of tags in the smaller and larger clusters, add the distinct words from the smaller to the larger cluster and remove the smaller.
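The two smoothing rules can be sketched as below. The percentage is not given on the slide, so `margin=0.2` is a placeholder value, and the ratio is read here as (tags only in the smaller cluster) / (size of smaller + size of larger); both are assumptions, as is the function name.

```python
from typing import Iterable, List, Set


def combine_clusters(clusters: Iterable[Set[str]], margin: float = 0.2) -> List[Set[str]]:
    """Merge contained or nearly identical clusters until nothing changes."""
    # Deterministic start order: largest first, ties broken alphabetically.
    result = [set(c) for c in sorted(clusters, key=lambda c: (-len(c), sorted(c)))]
    merged = True
    while merged:
        merged = False
        for i in range(len(result)):
            for j in range(i + 1, len(result)):
                large, small = (i, j) if len(result[i]) >= len(result[j]) else (j, i)
                extra = result[small] - result[large]  # tags only in the smaller one
                total = len(result[small]) + len(result[large])
                # Rule 1: containment (no extra tags). Rule 2: small difference.
                if not extra or len(extra) / total < margin:
                    result[large] |= extra
                    del result[small]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return result


sample = [
    {"audio", "mp3", "music", "radio"},
    {"audio", "mp3"},
    {"audio", "mp3", "music", "playlist"},
    {"rdf", "owl"},
]

for cluster in combine_clusters(sample):
    print(sorted(cluster))
# ['audio', 'mp3', 'music', 'playlist', 'radio']
# ['owl', 'rdf']
```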
74Extracting relations
- For each pair of tags for which the search engine retrieved information, investigate the possible relationships
- A tag can be an ancestor of the other. For example, in the FOOD ontology, apple is a subclass of fruit.
- A tag is the range or the value of a property of another tag. E.g., class Zinfandel has a property hasColor, with value red
- Both tags have the same direct parent: apple and pear are subclasses of fruit
- Both tags have the same ancestors: assembly has as ancestors building (1st level) and construction (2nd), while formation has fabrication (1st) and construction (2nd) in WordNet.
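The four relationship checks can be sketched over a toy hierarchy built from the examples above. The dictionaries, the order in which the checks are tried and the returned labels are assumptions for illustration; a real system would query online ontologies instead of hard-coded data.

```python
from typing import Dict, List, Optional, Tuple

# Toy knowledge taken from the examples on the slide (child -> direct parent).
SUBCLASS_OF: Dict[str, str] = {
    "apple": "fruit",
    "pear": "fruit",
    "assembly": "building",
    "building": "construction",
    "formation": "fabrication",
    "fabrication": "construction",
}
# (class, property) -> value
PROPERTIES: Dict[Tuple[str, str], str] = {("zinfandel", "hasColor"): "red"}


def ancestors(term: str) -> List[str]:
    """Superclasses of term, nearest first."""
    chain: List[str] = []
    while term in SUBCLASS_OF and SUBCLASS_OF[term] not in chain:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def relate(t1: str, t2: str) -> Optional[Tuple[str, str]]:
    """Return (kind of relationship, supporting concept or property), if any."""
    up1, up2 = ancestors(t1), ancestors(t2)
    if t2 in up1 or t1 in up2:
        return ("ancestor", t2 if t2 in up1 else t1)
    for (cls, prop), value in PROPERTIES.items():
        if (cls, value) in ((t1, t2), (t2, t1)):
            return ("property value", prop)
    if up1 and up2 and up1[0] == up2[0]:
        return ("same direct parent", up1[0])
    shared = [c for c in up1 if c in up2]
    if shared:
        return ("same ancestor", shared[0])
    return None


print(relate("apple", "fruit"))         # ('ancestor', 'fruit')
print(relate("zinfandel", "red"))       # ('property value', 'hasColor')
print(relate("apple", "pear"))          # ('same direct parent', 'fruit')
print(relate("assembly", "formation"))  # ('same ancestor', 'construction')
```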
75Examples
Cluster_1 admin application archive collection component control developer dom example form innovation interface layout planning program repository resource sourcecode
76Examples
Cluster_2 college commerce corporate course education high instructing learn learning lms school student
1 http://gate.ac.uk/projects/htechsight/Employment.daml
2 http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
3 http://www.mondeca.com/owl/moses/ita.owl
4 http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl
77Faceted Ontology
- Ontology creation and maintenance is automated
- Ontology evolution is driven by task features and by user changes
- Large-scale integration of ontology elements from massively distributed online ontologies
- Very different from traditional top-down-designed ontologies
78Lessons Learnt
- Approach proven to be feasible and promising.
- However
- Assumptions in initial experiments (e.g., single ontology coverage for pairs of tags, focus on classes, clustering-based approach, etc.) too restrictive
- Swoogle is too limited to support a fully automated approach
- We are now using Watson for the current experiments
- Integration with SW-enabled ontology matching algorithm is essential to improve term matching
79Example 3: Semantics-Enhanced Web Browsing
81Magpie Architecture
86PowerMagpie Architecture
91Example 4: Question Answering on the Semantic Web
92AquaLog: QA for Corporate Semantic Webs
<akt:Person rdf:about="akt:PeterScott">
  <rdfs:label>Peter Scott</rdfs:label>
  <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
  <akt:hasJobTitle>kmi deputy director</akt:hasJobTitle>
  <akt:worksInUnit rdf:resource="akt:KnowledgeMediaInstitute"/>
  <akt:hasGivenName>Peter</akt:hasGivenName>
  <akt:hasFamilyName>Scott</akt:hasFamilyName>
  <akt:hasPrettyName>Peter Scott</akt:hasPrettyName>
  <akt:hasPostalAddress rdf:resource="akt:KmiPostalAddress"/>
  <akt:hasEmailAddress>peter.scott_at_open.ac.uk</akt:hasEmailAddress>
  <akt:hasHomePage rdf:resource="http://kmi.open.ac.uk/people/scott/"/>
</akt:Person>
Which KMi researchers work on the Semantic Web?
Answer
93An Ontology-Modular System
Which premiership footballers have played for Leeds and Chelsea?
<ftb:Footballer rdf:about="ftb:WayneRooney">
  <rdfs:label>Wayne Rooney</rdfs:label>
  <ftb:playsFor ftb:ManUnited>
  <ftb:hasPosition ftb:Forward>
  <ftb:hasPreviousClub ftb:Everton>
</ftb:Footballer>
<ftb:Footballer rdf:about="ftb:DavidBeckham">
  <rdfs:label>David Beckham</rdfs:label>
  <ftb:playsFor ftb:RealMadrid>
  <ftb:hasPosition ftb:RightMidfield>
  <ftb:hasPreviousClub ftb:ManUnited>
</ftb:Footballer>
AquaLog
Answer
94Coarse-grained Architecture
Linguistic Component: obtains an intermediate representation from the input query
Relation Similarity Service: maps the intermediate representation to the ontology/KB
95Relation Similarity Service
Which are the KMi researchers in the semantic web
area?
Translated query
Ontological structures
KMi researchers (person/organization, semantic
web area)
Has-research-interest (kmi-research-staff-member,
Semantic-web-area)
Mechanisms: ontology relationships and taxonomy, string algorithms, WordNet, learning mechanism, user feedback
96Learning Mechanism
[Diagram: for the query "Which academics work in Akt?", user disambiguation maps the term "work", between the ontology concepts academic and project, to the relation has-project-member (inverse-of); the mapping is recorded in the user lexicon]
97Learning Mechanism
98Interpretation Mechanisms: ontology structure, string algorithms, WordNet, machine learning, user feedback
99User Feedback
101AquaLog --> PowerAqua
<akt:Person rdf:about="akt:PeterScott">
  <rdfs:label>Peter Scott</rdfs:label>
  <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
</akt:Person>
NL Query
<ftb:Footballer rdf:about="ftb:WayneRooney">
  <rdfs:label>Wayne Rooney</rdfs:label>
  <ftb:playsFor ftb:ManUnited>
  <ftb:hasPosition ftb:Forward>
  <ftb:hasPreviousClub ftb:Everton>
</ftb:Footballer>
<ptb:Builder rdf:about="ptb:Bob">
  <rdfs:label>Bob the Builder</rdfs:label>
  <ptb:playedBy>
</ptb:Builder>
PowerAqua
Answer
102PowerAqua vs AquaLog
- Challenges when consulting and aggregating (dynamically mapping) information derived from multiple heterogeneous ontologies
- Locating the right ontologies
- Intra-ontology semantic relevance analysis
- Filtering the right mappings
- Integrating heterogeneous information to provide an answer
- This reduces to deciding whether two instances specified according to different ontologies denote the same entity
104Conclusions
- The SW provides an unprecedented opportunity to build a new generation of intelligent systems, able to exploit large-scale background knowledge
- The large-scale background knowledge provided by the SW may address one of the fundamental premises (and holy grails) of AI
- The SW is not an aspiration: it is a concrete technology that is already in place today and is steadily becoming larger and more robust
- The new class of systems enabled by the SW is fundamentally different in many respects both from traditional KBS and even from early SW applications
- The examples shown in this talk provide an initial taste of the new generation of applications which will be made possible by the emerging Semantic Web
105References
- Ontology Mapping
- Lopez, V., Sabou, M., Motta, E. (2006). "Mapping the real semantic web on the fly". ISWC 2006.
- Sabou, M., d'Aquin, M., Motta, E. (2006). "Using the semantic web as background knowledge for ontology mapping". ISWC 2006 Workshop on Ontology Mapping.
- Integration of Web2.0 and Semantic Web
- Specia, L., Motta, E. (2007). "Integrating Folksonomies with the Semantic Web". ESWC 2007.
- Angeletou, S., Sabou, M., Specia, L., Motta, E. (2007). "Bridging the Gap Between Folksonomies and the Semantic Web: An Experience Report". ESWC 2007 Workshop on Bridging the Gap between Semantic Web and Web 2.0.
- Watson
- d'Aquin, M., Sabou, M., Dzbor, M., Baldassarre, C., Gridinoc, L., Angeletou, S., Motta, E. (2007). "WATSON: A Gateway for the Semantic Web". Poster Session at ESWC 2007.
106'Vision' Papers
- Motta, E., Sabou, M. (2006). "Next Generation Semantic Web Applications". 1st Asian Semantic Web Conference, Beijing.
- Motta, E., Sabou, M. (2006). "Language Technologies and the Evolution of the Semantic Web". LREC 2006, Genoa, Italy.
- Motta, E. (2006). "Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis". IEEE Intelligent Systems, Vol. 21, No. 3, pp. 88-90.