Exploiting large scale web semantics to build end user applications



1
Exploiting large scale web semantics to build
end user applications
  • Enrico Motta
  • Professor of Knowledge Technologies
  • Knowledge Media Institute
  • The Open University

2
Aims of the Talk
  • What is the Semantic Web
  • Perspectives
  • The SW as a web of data
  • The SW as a new context in which to build
    semantic applications and an unprecedented
    opportunity in which to address some classic AI
    problems
  • Typical misconceptions
  • What the SW is not!
  • Semantic Web for Users
  • Applications that do something interesting and
    useful to users, by exploiting available web
    semantics

3
The Semantic Web as a Web of Data
  • Making data available to SW-aware software

4
(No Transcript)
5
(No Transcript)
6
<foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
  <foaf:name>Enrico Motta</foaf:name>
  <foaf:firstName>Enrico</foaf:firstName>
  <foaf:surname>Motta</foaf:surname>
  <foaf:phone rdf:resource="tel:+44-(0)1908-653506"/>
  <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
  <foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
  <foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/enrico.jpg"/>
  <foaf:topic_interest>Knowledge Technologies</foaf:topic_interest>
  <foaf:topic_interest>Semantic Web</foaf:topic_interest>
  <foaf:topic_interest>Ontologies</foaf:topic_interest>
  <foaf:topic_interest>Problem Solving Methods</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Modelling</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Management</foaf:topic_interest>
  <foaf:based_near>
    <geo:Point>
      <geo:lat>52.024868</geo:lat>
      <geo:long>-0.707143</geo:long>
  <contact:nearestAirport>
    <airport:name>London Luton Airport</airport:name>
    <airport:iataCode>LTN</airport:iataCode>
    <airport:location>Luton, United Kingdom</airport:location>
    <geo:lat>51.866666666667</geo:lat>
    <geo:long>-0.36666666666667</geo:long>
    <rdfs:seeAlso rdf:resource="http://www.daml.org/cgi-bin/airport?LTN"/>
  <foaf:currentProject>
    <foaf:Project>
      <foaf:name>AquaLog</foaf:name>
  </foaf:currentProject>
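As a rough illustration of what "SW-aware software" could do with a profile like the one above, the following sketch reads a few FOAF properties using only the Python standard library. It is a minimal sketch, not the tooling used in the talk: the `rdf:RDF` wrapper, the namespace declarations, and the `read_people` helper are assumptions added so the excerpt parses on its own, and a real application would more likely use a proper RDF parser than plain XML handling.

```python
import xml.etree.ElementTree as ET

# Well-formed excerpt of the profile above. The rdf:RDF wrapper and the
# namespace declarations are added here (assumption) so it parses standalone.
FOAF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
    <foaf:name>Enrico Motta</foaf:name>
    <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
    <foaf:topic_interest>Semantic Web</foaf:topic_interest>
    <foaf:topic_interest>Ontologies</foaf:topic_interest>
  </foaf:Person>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
# ElementTree stores namespaced attributes under "{namespace-uri}localname".
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def read_people(xml_text):
    """Return one dict per foaf:Person element found in the RDF/XML text."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("foaf:Person", NS):
        homepage = person.find("foaf:homepage", NS)
        people.append({
            "uri": person.get(RDF_ABOUT),
            "name": person.findtext("foaf:name", namespaces=NS),
            "homepage": homepage.get(RDF_RESOURCE) if homepage is not None else None,
            "interests": [e.text for e in person.findall("foaf:topic_interest", NS)],
        })
    return people


people = read_people(FOAF_XML)
print(people[0]["name"], people[0]["interests"])
```

The point of the sketch is only that the data is addressable by property name (`foaf:name`, `foaf:topic_interest`) rather than by page layout.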
7
The web of SW documents
8
Current status of the semantic web
  • 10-20 million semantic web documents
  • Expressed in RDF, OWL, DAML+OIL
  • 7K-10K ontologies
  • These cover a variety of domains - music,
    multimedia, computing, management, bio-medical
    sciences, upper level concepts, etc
  • Hence
  • To a significant extent the semantic web is
    already in place
  • However, domain coverage is very uneven
  • Still primarily a research enterprise, however
    interest is rapidly increasing in both
    governmental and business organizations
  • early adopters phase

The above figures refer to resources which are
publicly accessible on the web
9
<data data data>
<data data data>
<data data data>
<data data data>
<data data data>
<data data data>
10
(No Transcript)
11
Bibliographic Data
CS Dept Data
Geography
AKT Reference Ontology
RDF Data
12
(No Transcript)
13
Corporate Semantic Webs
  • A corporate ontology is used to provide a
    homogeneous view over heterogeneous data sources.
  • Often tackle Enterprise Information Integration
    scenarios
  • Hailed by Gartner as one of the key emerging
    strategic technology trends
  • E.g., Garlik is a multi-million startup recently
    set up in the UK to support personal information
    management, which uses an ontology to integrate
    data mined from the web on a large scale

14
(No Transcript)
15
(No Transcript)
16
AquaLog
17
Applications that exploit large scale semantic
content
18
The web of data
19
Gateways to the SW
Semantic Web
Application
Semantic Web Gateway
20
  • Sophisticated quality control mechanism
  • Detects duplications
  • Fixes obvious syntax problems
  • E.g., duplicated ontology IDs, namespaces, etc.
  • Structures ontologies in a network
  • Using relations such as extends,
    inconsistentWith, duplicates
  • Provides interfaces for both human users and
    software programs
  • Provides efficient API
  • Supports formal queries (SPARQL)
  • Variety of ontology ranking mechanisms
  • Modularization/Combination support
  • Plug-ins for Protégé and NeOn Toolkit
  • Very cool logo!

21
(No Transcript)
22
(No Transcript)
23
Case Study 1: Automatic Alignment of Thesauri in
the Agricultural/Fishery Domain
24
Method
  • SCARLET - matching by Harvesting the SW
  • Automatically select and combine multiple online
    ontologies to derive a relation

[Diagram: Scarlet accesses the Semantic Web to deduce a semantic relation between Concept_A (e.g., Supermarket) and Concept_B (e.g., Building)]
25
Two strategies
[Figure: Scarlet queries the Semantic Web in two ways. Panel (A), Supermarket and Building, with node labels Building, PublicBuilding, Shop, Supermarket. Panel (B), Cholesterol and OrganicChemical, with node labels OrganicChemical, Lipid, Steroid, Steroid, Cholesterol.]
Deriving relations from (A) one ontology and (B)
across ontologies.
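The two strategies can be sketched with toy data. This is a minimal sketch under stated assumptions, not Scarlet's implementation: the ontology names, the subclass edges (an assumed reading of how the figure's labels connect), and the helper functions are all hypothetical, and the naive merge in strategy (B) treats identically named concepts in different ontologies as the same concept.

```python
from collections import deque

# Hypothetical toy ontologies: each maps a concept to its direct superclasses.
# The edges are an assumed reading of the labels in the figure above.
ONTOLOGIES = {
    "buildings": {
        "Supermarket": ["Shop"],
        "Shop": ["PublicBuilding"],
        "PublicBuilding": ["Building"],
    },
    "biochemistry": {"Cholesterol": ["Steroid"]},
    "chemistry": {"Steroid": ["Lipid"], "Lipid": ["OrganicChemical"]},
}


def is_subclass(edges, a, b):
    """Breadth-first search: is b reachable from a via superclass edges?"""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        for parent in edges.get(node, []):
            if parent == b:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def within_one_ontology(a, b):
    """Strategy (A): names of ontologies that alone entail 'a subClassOf b'."""
    return [name for name, edges in ONTOLOGIES.items() if is_subclass(edges, a, b)]


def across_ontologies(a, b):
    """Strategy (B): merge all subclass edges, then search the combined graph."""
    merged = {}
    for edges in ONTOLOGIES.values():
        for child, parents in edges.items():
            merged.setdefault(child, []).extend(parents)
    return is_subclass(merged, a, b)


print(within_one_ontology("Supermarket", "Building"))        # ['buildings']
print(within_one_ontology("Cholesterol", "OrganicChemical"))  # []
print(across_ontologies("Cholesterol", "OrganicChemical"))    # True
```

In this toy setup no single ontology relates Cholesterol to OrganicChemical, so only the cross-ontology strategy finds the relation.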
26
Experiment
  • Matching
  • AGROVOC
  • UN's Food and Agriculture Organisation (FAO)
    thesaurus
  • 28,174 descriptor terms
  • 10,028 non-descriptor terms
  • NALT
  • US National Agricultural Library Thesaurus
  • 41,577 descriptor terms
  • 24,525 non-descriptor terms

27
226 Used Ontologies
http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf
http://reliant.teknowledge.com/DAML/SUMO.daml
http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
http://reliant.teknowledge.com/DAML/Economy.daml
http://gate.ac.uk/projects/htechsight/Technologies.daml
28
Evaluation 1 - Precision
  • Manual assessment of 1000 mappings (15)
  • Evaluators
  • Researchers in the area of the Semantic Web
  • 6 people split in two groups
  • Results
  • Comparable to best results for background
    knowledge based matchers.

29
Evaluation 2 - Error Analysis
30
Case Study 2: Folksonomy Tagspace Enrichment
31
Features of Web2.0 sites
  • Tagging as opposed to rigid classification
  • Dynamic vocabulary does not require much
    annotation effort and evolves easily
  • A shared vocabulary emerges over time
  • certain tags become particularly popular

32
Limitations of tagging
  • Different granularity of tagging
  • rome vs colosseum vs roman monument
  • Flower vs tulip
  • etc.
  • Multilinguality
  • Spelling errors, different terminology, plural vs
    singular, etc
  • This has a number of negative implications for
    the effective use of tagged resources
  • e.g., Search exhibits very poor recall

33
Giving meaning to tags
34
What does it mean to add semantics to tags?
  • 1. Mapping a tag to a SW element
  • "japan"
    <akt:Country Japan>

35
Applications of the approach
  • To improve recall in keyword search
  • To support annotation by dynamically suggesting
    relevant tags or visualizing the structure of
    relevant tags
  • To enable formal queries over a space of tags
  • Hence, going beyond keyword search
  • To support new forms of intelligent navigation
  • i.e., using the 'semantic layer' to support
    navigation

36
[Flowchart: Folksonomy → Clustering (analyze co-occurrence of tags; co-occurrence matrix; cluster tags into Cluster1, Cluster2, ..., Clustern) → Concept and relation identification (take 2 related tags; SW search engine, Wikipedia, Google; find mappings and relation for pair of tags; Remaining tags? Yes: repeat, No: END). Output: <concept, relation, concept>]
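The clustering stage of the pipeline above can be sketched as follows. The talk does not say which clustering algorithm is used, so this minimal sketch swaps in a deliberately simple one (connected components over tag pairs that co-occur often enough); the tag data, the `min_count` threshold, and the function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag assignments: one set of tags per tagged resource.
POSTS = [
    {"school", "student", "education"},
    {"school", "education", "learning"},
    {"student", "learning", "education"},
    {"admin", "interface", "layout"},
    {"admin", "interface", "form"},
    {"interface", "layout", "form"},
]


def cooccurrence(posts):
    """Count, for each unordered tag pair, how many posts carry both tags."""
    counts = Counter()
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def cluster(posts, min_count=2):
    """Group tags linked by pairs that co-occur at least min_count times."""
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in cooccurrence(posts).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for tag in parent:
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(g) for g in groups.values())


for tags in cluster(POSTS):
    print(tags)
```

Each resulting cluster would then be passed to the concept and relation identification stage, pair of tags by pair of tags.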
37
Examples
Cluster_1: admin, application, archive, collection, component, control, developer, dom, example, form, innovation, interface, layout, planning, program, repository, resource, sourcecode
38
Examples
Cluster_2: college, commerce, corporate, course, education, high, instructing, learn, learning, lms, school, student
1. http://gate.ac.uk/projects/htechsight/Employment.daml
2. http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
3. http://www.mondeca.com/owl/moses/ita.owl
4. http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl
39
Faceted Ontology
  • Ontology creation and maintenance is automated
  • Ontology evolution is driven by task features and
    by user changes
  • Large scale integration of ontology elements from
    massively distributed online ontologies
  • Very different from traditional top-down-designed
    ontologies

40
Case Study 3: Reviewing and Rating on the Web
41
Revyu.com
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
(No Transcript)
47
Trust Factors
expertise: the source has relevant expertise of the domain of the recommendation-seeking; this may be formally validated through qualifications or acquired over time.
experience: the source has experience of solving similar scenarios in this domain, but without extensive expertise.
impartiality: the source does not have vested interests in a particular resolution to the scenario.
affinity: the source has characteristics in common with the recommendation seeker, such as shared tastes, standards, values, viewpoints, interests, or expectations.
track record: the source has previously provided successful recommendations to the recommendation seeker.
48
[Figure: factors emphasised (affinity; expertise, experience) against type of solution (subjective; objective)]
49
Applying the framework to revyu.com
  • Affinity
  • Operationalised as the degree of overlap in items
    reviewed, and in ratings given
  • Experience
  • Proxy metric: usage of particular tags (as
    proxies for topics)
  • Experience scores based on tagging data
  • Integrates also data from del.icio.us for those
    users who have chosen to publish their
    del.icio.us account on FOAF
  • Expertise
  • Proxy metric: credibility
  • Captures the social aspect of expertise
    endorsement
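One way to read "degree of overlap in items reviewed, and in ratings given" is sketched below. The talk does not give the formula used on revyu.com, so this is only an assumed operationalisation: Jaccard overlap of the reviewed items, scaled by how closely the two users' ratings agree on the shared items. The review data, the 1-5 rating scale, and the `affinity` function are hypothetical.

```python
# Hypothetical review data: reviewer -> {item: rating on a 1-5 scale}.
REVIEWS = {
    "alice": {"film_a": 5, "film_b": 2, "pub_c": 4},
    "bob": {"film_a": 4, "film_b": 2, "book_d": 3},
    "carol": {"book_e": 1},
}


def affinity(user, other, max_gap=4):
    """Item overlap (Jaccard) scaled by rating agreement on the shared items.

    max_gap is the largest possible rating difference (5 - 1 on this scale).
    """
    mine, theirs = REVIEWS[user], REVIEWS[other]
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0
    overlap = len(shared) / len(mine.keys() | theirs.keys())
    mean_gap = sum(abs(mine[i] - theirs[i]) for i in shared) / len(shared)
    agreement = 1 - mean_gap / max_gap
    return overlap * agreement


def rank_by_affinity(user):
    """Other reviewers ordered from highest to lowest affinity with user."""
    others = [o for o in REVIEWS if o != user]
    return sorted(others, key=lambda o: affinity(user, o), reverse=True)


print(affinity("alice", "bob"))
print(rank_by_affinity("alice"))
```

A score like this could be one input when ranking reviews for a given reader, alongside the experience and expertise proxies.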

50
Using trust factors for ranking reviews
51
(No Transcript)
52
PowerAqua and PowerMagpie
53
How does the Semantic Web relate to Artificial
Intelligence research?
54
AI as Heuristic Search
55
The knowledge-based paradigm in AI
  • Today there has been a shift in paradigm. The
    fundamental problem of understanding intelligence
    is not the identification of a few powerful
    techniques, but rather the question of how to
    represent large amounts of knowledge in a fashion
    that permits their effective use
  • Goldstein and Papert, 1977

56
(No Transcript)
57
Knowledge Representation Hypothesis in AI
  • Any mechanically embodied intelligent process
    will be comprised of structural ingredients that
  • we as external observers naturally take to
    represent a propositional account of the
    knowledge that the overall process exhibits, and
  • independent of such external semantic
    attribution, play a formal but causal and
    essential role in engendering the behaviour that
    manifests that knowledge
  • Brian Smith, 1982

58
Knowledge-Based Systems
Intelligent Behaviour
59
The Knowledge Acquisition Bottleneck
KA Bottleneck
Intelligent Behaviour
60
The Cyc project
61
Structured libraries of reusable components
Problem Solving Method
Generic Task
Parametric Design
Library of PSMs
Classification
Mapping Knowledge
Application-specific Problem-Solving Knowledge
Scheduling
Ontology
Mapping Ontology
Etc
Application Configuration
Domain Model
62
The next knowledge medium
  • However, our approach based on structured
    libraries of problem solving components only
    addressed the economic cost of KBS development

63
SW as Enabler of Intelligent Behaviour
Both a platform for knowledge publishing and a
large scale source of knowledge
Intelligent Behaviour
64
KBS vs SW Systems
                  Classic KBS     SW Systems
Provenance        Centralized     Distributed
Size              Small/Medium    Extra Huge
Repr. Schema      Homogeneous     Heterogeneous
Quality           High            Very Variable
Degree of trust   High            Very Variable
65
Key Paradigm Shift
Intelligence
  • Classic KBS: a function of sophisticated, logical, task-centric problem solving
  • SW Systems: a side-effect of being able to integrate different types of reasoning to handle size and heterogeneous quality and representation
66
Conclusions
67
Typical misconceptions
  • The SW is a long-term vision
  • Ehm... actually it already exists
  • The SW will never work because nobody is going
    to annotate their web pages
  • The SW is not about annotating web pages, the SW
    is a web of data, most of which are generated
    from DBs, or from web mining software, or from
    applications which produce SW technology
  • The idea of a universal ontology has failed
    before and will fail again. Hence the SW is
    doomed
  • The SW is not about a single universal ontology.
    Already there are around 10K ontologies and the
    number is growing
  • SW applications may use 1, 2, 3, or even hundreds
    of ontologies.

68
Large Scale Distributed Semantics
  • Widespread production of formalised knowledge
    models (ontologies and metadata), from a variety
    of different groups and individuals
  • E.g., legal, bio-medical, governmental,
    environmental, music, art, multimedia, computing,
    etc.
  • Knowledge modelling to become a new form of
    literacy?
  • Stutt and Motta, 1997
  • This large scale heterogeneous resource will
    enable a new generation of semantic-aware
    technologies
  • These developments may provide a new context in
    which to address the economic barriers to KBS
    development
  • The SW already exists to some extent, however
    there is still a way to go, before it will reach
    the required degree of maturity

69
Large Scale Distributed Semantics
  • Much like AI, the semantic web will only succeed
    if it becomes ubiquitous and hidden

"There's this stupid myth out there that A.I. has
failed, but A.I. is everywhere around you every
second of the day. People just don't notice it.
You've got A.I. systems in cars, tuning the
parameters of the fuel injection systems. When
you land in an airplane, your gate gets chosen by
an A.I. scheduling system. Every time you use a
piece of Microsoft software, you've got an A.I.
system trying to figure out what you're doing,
like writing a letter, and it does a pretty
damned good job. Every time you see a movie with
computer-generated characters, they're all little
A.I. characters behaving as a group. Every time
you play a video game, you're playing against an
A.I. system."
Rodney Brooks
70
(No Transcript)