Title: Exploiting large scale web semantics to build end user applications
1. Exploiting large scale web semantics to build end user applications
- Enrico Motta
- Professor of Knowledge Technologies
- Knowledge Media Institute
- The Open University
2. Aims of the Talk
- What is the Semantic Web
- Perspectives
- The SW as a web of data
- The SW as a new context in which to build semantic applications and an unprecedented opportunity in which to address some classic AI problems
- Typical misconceptions
- What the SW is not!
- Semantic Web for Users
- Applications that do something interesting and useful to users, by exploiting available web semantics
3. The Semantic Web as a Web of Data
- Making data available to SW-aware software
6.
<foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
  <foaf:name>Enrico Motta</foaf:name>
  <foaf:firstName>Enrico</foaf:firstName>
  <foaf:surname>Motta</foaf:surname>
  <foaf:phone rdf:resource="tel:+44-(0)1908-653506"/>
  <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
  <foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
  <foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/enrico.jpg"/>
  <foaf:topic_interest>Knowledge Technologies</foaf:topic_interest>
  <foaf:topic_interest>Semantic Web</foaf:topic_interest>
  <foaf:topic_interest>Ontologies</foaf:topic_interest>
  <foaf:topic_interest>Problem Solving Methods</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Modelling</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Management</foaf:topic_interest>
  <foaf:based_near>
    <geo:Point>
      <geo:lat>52.024868</geo:lat>
      <geo:long>-0.707143</geo:long>
    </geo:Point>
  </foaf:based_near>
  <contact:nearestAirport>
    <airport:name>London Luton Airport</airport:name>
    <airport:iataCode>LTN</airport:iataCode>
    <airport:location>Luton, United Kingdom</airport:location>
    <geo:lat>51.866666666667</geo:lat>
    <geo:long>-0.36666666666667</geo:long>
    <rdfs:seeAlso rdf:resource="http://www.daml.org/cgi-bin/airport?LTN"/>
  </contact:nearestAirport>
  <foaf:currentProject>
    <foaf:Project>
      <foaf:name>AquaLog</foaf:name>
    </foaf:Project>
  </foaf:currentProject>
</foaf:Person>
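To illustrate what "making data available to SW-aware software" can look like in practice, here is a minimal sketch that reads a few properties from an abbreviated copy of the FOAF description above. The enclosing rdf:RDF element and its namespace declarations are assumptions (the slide does not show them), and plain XML parsing is used only for illustration; it is not a full RDF parser.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and FOAF.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Abbreviated copy of the slide's description. The rdf:RDF wrapper and the
# xmlns declarations are assumed; the slide omits them.
FOAF_DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
    <foaf:name>Enrico Motta</foaf:name>
    <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
    <foaf:topic_interest>Semantic Web</foaf:topic_interest>
    <foaf:topic_interest>Ontologies</foaf:topic_interest>
  </foaf:Person>
</rdf:RDF>
"""


def describe_person(rdf_xml):
    """Extract URI, name, homepage and interests of the first foaf:Person."""
    root = ET.fromstring(rdf_xml)
    person = root.find("foaf:Person", NS)
    homepage = person.find("foaf:homepage", NS)
    return {
        "uri": person.get("{%s}about" % NS["rdf"]),
        "name": person.findtext("foaf:name", namespaces=NS),
        "homepage": homepage.get("{%s}resource" % NS["rdf"]),
        "interests": [e.text for e in person.findall("foaf:topic_interest", NS)],
    }


info = describe_person(FOAF_DOC)
print(info["name"], info["interests"])
```

A real application would more likely use an RDF library, since the same graph can be serialised in many different ways.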
7. The web of SW documents
8. Current status of the semantic web
- 10-20 million semantic web documents
- Expressed in RDF, OWL, DAML+OIL
- 7K-10K ontologies
- These cover a variety of domains: music, multimedia, computing, management, bio-medical sciences, upper level concepts, etc.
- Hence
- To a significant extent the semantic web is already in place
- However, domain coverage is very uneven
- Still primarily a research enterprise; however, interest is rapidly increasing in both governmental and business organizations
- Early adopters phase
The above figures refer to resources which are publicly accessible on the web
9.
<data data data>
<data data data>
<data data data>
<data data data>
<data data data>
<data data data>
11. Bibliographic Data
CS Dept Data
Geography
AKT Reference Ontology
RDF Data
13. Corporate Semantic Webs
- A corporate ontology is used to provide a homogeneous view over heterogeneous data sources.
- Often tackle Enterprise Information Integration scenarios
- Hailed by Gartner as one of the key emerging strategic technology trends
- E.g., Garlik is a multi-million startup recently set up in the UK to support personal information management, which uses an ontology to integrate data mined from the web on a large scale
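As a rough illustration of how an ontology can provide a homogeneous view over heterogeneous sources, the sketch below maps records from two differently structured sources onto shared property names. The source names, field names and mappings are all hypothetical; real Enterprise Information Integration systems are far richer than this.

```python
# Hypothetical corporate ontology: each source field is mapped to a shared
# property name. Fields with no mapping are dropped from the unified view.
MAPPINGS = {
    "hr_db": {"emp_name": "name", "dept": "department"},
    "crm_export": {"FullName": "name", "Unit": "department"},
}


def to_ontology_view(source, record):
    """Rewrite one source record using the shared property names."""
    mapping = MAPPINGS[source]
    return {mapping[field]: value
            for field, value in record.items() if field in mapping}


# Two records with different schemas (illustrative data only).
records = [
    ("hr_db", {"emp_name": "A. Smith", "dept": "Sales", "badge": 17}),
    ("crm_export", {"FullName": "B. Jones", "Unit": "Support"}),
]

unified = [to_ontology_view(src, rec) for src, rec in records]
print(unified)
```

Once every source is expressed in the same vocabulary, a single query over `unified` can replace one query per source.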
16. AquaLog
17. Applications that exploit large scale semantic content
18. The web of data
19. Gateways to the SW
Semantic Web
Application
Semantic Web Gateway
20.
- Sophisticated quality control mechanism
- Detects duplications
- Fixes obvious syntax problems
- E.g., duplicated ontology IDs, namespaces, etc.
- Structures ontologies in a network
- Using relations such as extends, inconsistentWith, duplicates
- Provides interfaces for both human users and software programs
- Provides efficient API
- Supports formal queries (SPARQL)
- Variety of ontology ranking mechanisms
- Modularization/Combination support
- Plug-ins for Protégé and NeOn Toolkit
- Very cool logo!
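The duplicate detection and network structuring listed above can be pictured with a small sketch. Everything here is hypothetical: the URLs, the rule that documents sharing an ontology ID are linked by a "duplicates" edge, and the data structures are assumptions for illustration, not a description of how the gateway is actually implemented.

```python
from collections import defaultdict

# Hypothetical crawl results: (document URL, declared ontology ID).
crawled = [
    ("http://example.org/a.owl", "http://example.org/onto/pets"),
    ("http://mirror.example.net/a-copy.owl", "http://example.org/onto/pets"),
    ("http://example.org/b.owl", "http://example.org/onto/animals"),
]


def find_duplicated_ids(docs):
    """Return ontology IDs declared by more than one document."""
    by_id = defaultdict(list)
    for url, onto_id in docs:
        by_id[onto_id].append(url)
    return {oid: urls for oid, urls in by_id.items() if len(urls) > 1}


def build_network(docs, extends):
    """Return (subject, relation, object) edges between documents.

    Assumption: documents sharing an ID are linked to the first one seen
    with a 'duplicates' edge; 'extends' pairs are supplied by the caller.
    """
    edges = [(a, "extends", b) for a, b in extends]
    for urls in find_duplicated_ids(docs).values():
        first = urls[0]
        edges += [(other, "duplicates", first) for other in urls[1:]]
    return edges


network = build_network(
    crawled,
    extends=[("http://example.org/a.owl", "http://example.org/b.owl")],
)
for edge in network:
    print(edge)
```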
23. Case Study 1: Automatic Alignment of Thesauri in the Agricultural/Fishery Domain
24. Method
- SCARLET - matching by Harvesting the SW
- Automatically select and combine multiple online ontologies to derive a relation
Access
Semantic Web
Scarlet
Deduce
Concept_A (e.g., Supermarket)
Concept_B (e.g., Building)
Semantic Relation ( )
25. Two strategies
[Figure: two panels, each showing Scarlet querying the Semantic Web.
(A) Building, PublicBuilding, Shop, Supermarket; Scarlet relates Supermarket and Building.
(B) OrganicChemical, Lipid, Steroid and Steroid, Cholesterol; Scarlet relates Cholesterol and OrganicChemical.]
Deriving relations from (A) one ontology and (B) across ontologies.
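The two strategies can be sketched as a search over subclass links. This is a toy illustration only: the concept names come from the slide, but the split into three ontologies, the direction of the links, and the matching of concepts by identical name are assumptions, and the real system's selection and combination of online ontologies is not shown.

```python
from collections import deque

# Toy "online ontologies" as child -> parents maps. Concept names follow the
# slide; the grouping into ontologies and the link direction are assumed.
ONTOLOGIES = {
    "buildings": {
        "Supermarket": ["Shop"],
        "Shop": ["PublicBuilding"],
        "PublicBuilding": ["Building"],
    },
    "chem_a": {"Cholesterol": ["Steroid"]},
    "chem_b": {"Steroid": ["Lipid"], "Lipid": ["OrganicChemical"]},
}


def is_subclass(a, b, parents):
    """True if b is reachable from a by following parent links (breadth-first)."""
    queue, seen = deque([a]), {a}
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def strategy_a(a, b):
    """Strategy (A): the relation must be derivable inside a single ontology."""
    return any(is_subclass(a, b, onto) for onto in ONTOLOGIES.values())


def strategy_b(a, b):
    """Strategy (B): merge parent links from all ontologies, then search."""
    merged = {}
    for onto in ONTOLOGIES.values():
        for child, parents in onto.items():
            merged.setdefault(child, []).extend(parents)
    return is_subclass(a, b, merged)


print(strategy_a("Supermarket", "Building"))
print(strategy_a("Cholesterol", "OrganicChemical"))
print(strategy_b("Cholesterol", "OrganicChemical"))
```

In this toy data, the Supermarket/Building relation is found within one ontology, while Cholesterol/OrganicChemical only emerges once the two chemistry ontologies are combined through the shared concept Steroid.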
26. Experiment
- Matching
- AGROVOC
- UN's Food and Agriculture Organisation (FAO) thesaurus
- 28,174 descriptor terms
- 10,028 non-descriptor terms
- NALT
- US National Agricultural Library Thesaurus
- 41,577 descriptor terms
- 24,525 non-descriptor terms
27. 226 Used Ontologies
http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf
http://reliant.teknowledge.com/DAML/SUMO.daml
http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
http://reliant.teknowledge.com/DAML/Economy.daml
http://gate.ac.uk/projects/htechsight/Technologies.daml
28. Evaluation 1 - Precision
- Manual assessment of 1000 mappings (15%)
- Evaluators
- Researchers in the area of the Semantic Web
- 6 people split in two groups
- Results
- Comparable to best results for background
knowledge based matchers.
29. Evaluation 2: Error Analysis
30. Case Study 2: Folksonomy Tagspace Enrichment
31. Features of Web 2.0 sites
- Tagging as opposed to rigid classification
- Dynamic vocabulary does not require much annotation effort and evolves easily
- Shared vocabulary emerges over time
- Certain tags become particularly popular
32. Limitations of tagging
- Different granularity of tagging
- rome vs colosseum vs roman monument
- flower vs tulip
- etc.
- Multilinguality
- Spelling errors, different terminology, plural vs singular, etc.
- This has a number of negative implications for the effective use of tagged resources
- E.g., search exhibits very poor recall
33. Giving meaning to tags
34. What does it mean to add semantics to tags?
- 1. Mapping a tag to a SW element
- "japan"
<akt:Country Japan>
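Mapping a tag to a SW element can be sketched as a lookup from a normalised tag to an ontology element. The index below, the `ex:Flower` entry, the (class, instance) pair representation and the naive plural handling are all assumptions made for illustration; only the "japan" example comes from the slide.

```python
# Hypothetical index from normalised tag labels to (class, instance) pairs.
LABEL_INDEX = {
    "japan": ("akt:Country", "Japan"),
    "tulip": ("ex:Flower", "Tulip"),
}


def normalize(tag):
    """Lower-case and trim a raw tag."""
    return tag.strip().lower()


def map_tag(tag):
    """Return the mapped element, trying a naive singular form as a fallback."""
    key = normalize(tag)
    if key in LABEL_INDEX:
        return LABEL_INDEX[key]
    # Naive plural handling: drop a trailing "s" and retry.
    if key.endswith("s") and key[:-1] in LABEL_INDEX:
        return LABEL_INDEX[key[:-1]]
    return None


print(map_tag(" Japan "))
print(map_tag("Tulips"))
```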
35. Applications of the approach
- To improve recall in keyword search
- To support annotation by dynamically suggesting relevant tags or visualizing the structure of relevant tags
- To enable formal queries over a space of tags
- Hence, going beyond keyword search
- To support new forms of intelligent navigation
- i.e., using the 'semantic layer' to support navigation
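The recall improvement mentioned above can be illustrated with a small sketch in which a query tag is expanded through a semantic layer before matching. The resources, the tags and the broader-to-narrower links are hypothetical and chosen to echo the granularity examples given earlier (rome vs colosseum, flower vs tulip).

```python
# Hypothetical tagged resources and a toy semantic layer
# (broader tag -> narrower tags).
RESOURCES = {
    "photo1": {"rome"},
    "photo2": {"colosseum"},
    "photo3": {"tulip"},
}
NARROWER = {"rome": {"colosseum"}, "flower": {"tulip"}}


def expand(tag):
    """Return the tag plus every tag reachable through NARROWER links."""
    result, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(NARROWER.get(current, ()))
    return result


def search(tag, use_semantics=False):
    """Return the sorted names of resources matching the (expanded) query."""
    wanted = expand(tag) if use_semantics else {tag}
    return sorted(name for name, tags in RESOURCES.items() if tags & wanted)


print(search("rome"))
print(search("rome", use_semantics=True))
```

Plain keyword search for "rome" misses the resource tagged only "colosseum"; the expanded query retrieves both.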
36. [Flow diagram: Folksonomy]
Clustering: Analyze co-occurrence of tags; Co-occurrence matrix; Cluster tags; Cluster 1, Cluster 2, ..., Cluster n
Concept and relation identification: 2 related tags; SW search engine; Wikipedia; Google; Find mappings and relation for pair of tags; <concept, relation, concept>
Remaining tags? Yes / No; END
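The clustering stage of the diagram (co-occurrence analysis followed by grouping of tags) can be sketched as follows. The talk does not say which clustering algorithm is used, so this example swaps in a deliberately simple one: tags are grouped when they are connected by pairs that co-occur at least `min_count` times. The sample posts and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged resources: one set of tags per bookmark or photo.
POSTS = [
    {"school", "student", "learning"},
    {"school", "student"},
    {"student", "learning"},
    {"interface", "layout"},
    {"interface", "layout", "form"},
]


def cooccurrence(posts):
    """Count how often each (alphabetically ordered) pair of tags co-occurs."""
    counts = Counter()
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def cluster(posts, min_count=2):
    """Group tags linked by pairs co-occurring at least min_count times."""
    parent = {}

    def find(tag):
        # Union-find lookup with path halving.
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for (a, b), n in cooccurrence(posts).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(group) for group in groups.values())


print(cluster(POSTS))
```

Tags that never reach the threshold (here "form") are left out of every cluster, which loosely corresponds to the "Remaining tags?" branch of the diagram.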
37. Examples
Cluster_1: admin application archive collection component control developer dom example form innovation interface layout planning program repository resource sourcecode
38. Examples
Cluster_2: college commerce corporate course education high instructing learn learning lms school student
1. http://gate.ac.uk/projects/htechsight/Employment.daml
2. http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
3. http://www.mondeca.com/owl/moses/ita.owl
4. http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl
39. Faceted Ontology
- Ontology creation and maintenance is automated
- Ontology evolution is driven by task features and by user changes
- Large scale integration of ontology elements from massively distributed online ontologies
- Very different from traditional top-down-designed ontologies
40. Case Study 3: Reviewing and Rating on the Web
41. Revyu.com
47. Trust Factors
expertise: the source has relevant expertise of the domain of the recommendation-seeking; this may be formally validated through qualifications or acquired over time.
experience: the source has experience of solving similar scenarios in this domain, but without extensive expertise.
impartiality: the source does not have vested interests in a particular resolution to the scenario.
affinity: the source has characteristics in common with the recommendation seeker, such as shared tastes, standards, values, viewpoints, interests, or expectations.
track record: the source has previously provided successful recommendations to the recommendation seeker.
48. [Figure axes: solution (subjective, objective); factors emphasised (affinity, expertise/experience)]
49. Applying the framework to revyu.com
- Affinity
- Operationalised as the degree of overlap in items reviewed, and in ratings given
- Experience
- Proxy metric: usage of particular tags (as proxies for topics)
- Experience scores based on tagging data
- Also integrates data from del.icio.us for those users who have chosen to publish their del.icio.us account in their FOAF profile
- Expertise
- Proxy metric: credibility
- Captures the social aspect of expertise endorsement
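The affinity factor, described above as the degree of overlap in items reviewed and in ratings given, could be operationalised along the following lines. This is only one possible reading: the Jaccard overlap, the 1-5 rating scale, the way the two components are multiplied together, and the sample data are all assumptions, not the metric actually used on revyu.com.

```python
def affinity(ratings_a, ratings_b):
    """Score two reviewers in [0, 1] from dicts mapping item -> rating.

    Assumptions: ratings are on a 1-5 scale; the score is the Jaccard
    overlap of reviewed items multiplied by the agreement of the ratings
    on the shared items.
    """
    shared = set(ratings_a) & set(ratings_b)
    union = set(ratings_a) | set(ratings_b)
    if not shared:
        return 0.0
    item_overlap = len(shared) / len(union)
    mean_diff = sum(abs(ratings_a[i] - ratings_b[i]) for i in shared) / len(shared)
    rating_agreement = 1 - mean_diff / 4  # 4 is the largest gap on a 1-5 scale
    return item_overlap * rating_agreement


# Hypothetical reviewers and ratings.
alice = {"film_x": 5, "cafe_y": 4, "book_z": 2}
bob = {"film_x": 4, "cafe_y": 4}

print(round(affinity(alice, bob), 3))
```

A score like this could then serve as one input when ranking reviews for a particular reader, alongside experience and expertise scores.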
50. Using trust factors for ranking reviews
52. PowerAqua and PowerMagpie
53. How does the Semantic Web relate to Artificial Intelligence research?
54. AI as Heuristic Search
55. The knowledge-based paradigm in AI
- Today there has been a shift in paradigm. The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use
- Goldstein and Papert, 1977
57. Knowledge Representation Hypothesis in AI
- Any mechanically embodied intelligent process will be comprised of structural ingredients that
- we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and
- independent of such external semantic attribution, play a formal but causal and essential role in engendering the behaviour that manifests that knowledge
- Brian Smith, 1982
58. Knowledge-Based Systems
Intelligent Behaviour
59. The Knowledge Acquisition Bottleneck
KA Bottleneck
Intelligent Behaviour
60. The Cyc project
61. Structured libraries of reusable components
Problem Solving Method
Generic Task
Parametric Design
Library of PSMs
Classification
Mapping Knowledge
Application-specific Problem-Solving Knowledge
Scheduling
Ontology
Mapping Ontology
Etc
Application Configuration
Domain Model
62. The next knowledge medium
- However, our approach based on structured libraries of problem solving components only addressed the economic cost of KBS development
63. SW as Enabler of Intelligent Behaviour
Both a platform for knowledge publishing and a large scale source of knowledge
Intelligent Behaviour
64. KBS vs SW Systems
                  Classic KBS     SW Systems
Provenance        Centralized     Distributed
Size              Small/Medium    Extra Huge
Repr. Schema      Homogeneous     Heterogeneous
Quality           High            Very Variable
Degree of trust   High            Very Variable
65. Key Paradigm Shift
Intelligence
Classic KBS: A function of sophisticated, logical, task-centric problem solving
SW Systems: A side-effect of being able to integrate different types of reasoning to handle size and heterogeneous quality and representation
66. Conclusions
67. Typical misconceptions
- The SW is a long-term vision
- Ehm... actually it already exists
- The SW will never work because nobody is going to annotate their web pages
- The SW is not about annotating web pages; the SW is a web of data, most of which are generated from DBs, or from web mining software, or from applications which produce SW technology
- The idea of a universal ontology has failed before and will fail again. Hence the SW is doomed
- The SW is not about a single universal ontology. Already there are around 10K ontologies and the number is growing
- SW applications may use 1, 2, 3, or even hundreds of ontologies.
68. Large Scale Distributed Semantics
- Widespread production of formalised knowledge models (ontologies and metadata), from a variety of different groups and individuals
- E.g., legal, bio-medical, governmental, environmental, music, art, multimedia, computing, etc.
- Knowledge modelling to become a new form of literacy?
- Stutt and Motta, 1997
- This large scale heterogeneous resource will enable a new generation of semantic-aware technologies
- These developments may provide a new context in which to address the economic barriers to KBS development
- The SW already exists to some extent; however, there is still a way to go before it reaches the required degree of maturity
69. Large Scale Distributed Semantics
- Much like AI, the semantic web will only succeed if it becomes ubiquitous and hidden
There's this stupid myth out there that A.I. has failed, but A.I. is everywhere around you every second of the day. People just don't notice it. You've got A.I. systems in cars, tuning the parameters of the fuel injection systems. When you land in an airplane, your gate gets chosen by an A.I. scheduling system. Every time you use a piece of Microsoft software, you've got an A.I. system trying to figure out what you're doing, like writing a letter, and it does a pretty damned good job. Every time you see a movie with computer-generated characters, they're all little A.I. characters behaving as a group. Every time you play a video game, you're playing against an A.I. system.
- Rodney Brooks