Title: Where does it break? or: Why Semantic Web research is not just Computer Science as usual

1. Where does it break? or: Why Semantic Web research is not just Computer Science as usual
- Frank van Harmelen
- AI Department
- Vrije Universiteit Amsterdam
2. But first...
- The Semantic Web forces us to rethink the foundations of many subfields of Computer Science.
- The challenge of the Semantic Web continues to break many silently held and shared assumptions underlying decades of research.
- I will try to identify silently held assumptions which are no longer true on the Semantic Web, prompting a radical rethink of many past results.
3. Oh no, not more vision...
- Don't worry, there will be lots of technical content.
4. Grand Topics
- What are the science challenges in the Semantic Web?
- Which implicit traditional assumptions break?
- Illustrated with 4 such traditional assumptions
- and also:
- Which Semantic Web?
5. Before we go on: Which Semantic Web are we talking about?
6. Typical SemWeb slide 1
7. Typical SemWeb slide 2
8. Which Semantic Web?
- Version 1: "Semantic Web as Web of Data" (TBL)
- recipe: expose databases on the web, use RDF, integrate
- meta-data from expressing DB schema semantics in machine-interpretable ways
- enables integration and unexpected re-use
9. Which Semantic Web?
- Version 2: Enrichment of the current Web
- recipe: annotate, classify, index
- meta-data from automatically produced markup: named-entity recognition, concept extraction, tagging, etc.
- enables personalisation, search, browse, ...
10. Which Semantic Web?
- Version 1: Semantic Web as Web of Data
- Version 2: Enrichment of the current Web
- Different use-cases
- Different techniques
- Different users
11. Before we go on: The current state of the Semantic Web
12. What's up in the Semantic Web? The 4 hard questions
- Q1: Where does the meta-data come from?
  - NL technology is delivering on concept-extraction
- Q2: Where do the ontologies come from?
  - many handcrafted ontologies
  - ontology learning remains hard
  - relation extraction remains hard
- Q3: What to do with many ontologies?
  - ontology mapping/aligning remains VERY hard
- Q4: Where's the Web in the Semantic Web?
  - more attention to social aspects (P2P, FOAF)
  - non-textual media remains hard
13. What's up in the Semantic Web? The 4 hard questions
- healthy uptake in some areas:
  - knowledge management / intranets
  - data-integration (Boeing)
  - life-sciences (e-Science)
  - convergence with the Semantic Grid
  - cultural heritage
- emerging applications in search & browse:
  - Elsevier, Ilse, MagPie, KIM
- very few applications in:
  - personalisation
  - mobility/context awareness
- Most applications are for companies, few are for the public
14. Semantic Web: Science or technology?
15. Semantic Web as Technology
- better search & browse
- personalisation
- semantic linking
- semantic web services
- ...
Semantic Web as Science
16. 4 examples of "where does it break?"
- old assumptions that no longer hold,
- old approaches that no longer work
17. 4 examples of "where does it break?"
- Traditional complexity measures
18. Who cares about decidability?
- Decidability + completeness: a guarantee to find an answer, or tell you it doesn't exist, given enough run-time and memory
- Sources of incompleteness:
  - incompleteness of the input data
  - insufficient run-time to wait for the answer
- Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm
19. Who cares about undecidability?
- Undecidability ≠ always guaranteed not to find an answer
- Undecidability = not always guaranteed to find an answer
- Undecidability may be harmless in many cases (in all cases that matter)
20. Who cares about complexity?
- worst-case may be exponentially rare
- asymptotic
- ignores constants
21. What to do instead?
- Practical observations on RDF Schema:
  - compute the full closure of O(10^5) statements
- Practical observations on OWL:
  - NEXPTIME, but fine on many practical cases
- Do more experimental performance profiles with realistic data
- Think hard about average-case complexity
22. 4 examples of "where does it break?"
- Traditional complexity measures
- Hard in theory, easy in practice
23. Example: Reasoning with Inconsistent Knowledge
- This work with:
- Zhisheng Huang
- Annette ten Teije
24. Knowledge will be inconsistent
- Because of
- mistreatment of defaults
- homonyms
- migration from another formalism
- integration of multiple sources
25. New formal notions are needed
- New notions
- Accepted
- Rejected
- Overdetermined
- Undetermined
- Soundness (only classically justified results)
26. Basic Idea
- Start from the query
- Incrementally select larger parts of the ontology that are relevant to the query, until:
  - you have an ontology subpart that is small enough to be consistent and large enough to answer the query, or
  - the selected subpart is already inconsistent before it can answer the query
- Selection function
27. General Framework
[Figure: nested selections s(T,φ,0) ⊆ s(T,φ,1) ⊆ s(T,φ,2) ⊆ ...]
28. More precisely
- Use a selection function s(T,φ,k), with s(T,φ,k) ⊆ s(T,φ,k+1)
- Start with k=0: check whether s(T,φ,0) ⊨ φ or s(T,φ,0) ⊨ ¬φ
- Increase k until s(T,φ,k) ⊨ φ or s(T,φ,k) ⊨ ¬φ
- Abort when:
  - undetermined at maximal k
  - overdetermined at some k
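The incremental loop of slides 26-28 can be sketched as follows. This is a hedged illustration, not the authors' implementation: `select` stands for an arbitrary monotone selection function s(T,φ,k), `entails` for a classical reasoner, and the four return values correspond to the formal notions of slide 25.

```python
def linear_extension_query(ontology, query, select, entails, k_max):
    """Incrementally grow a query-relevant subset of a (possibly
    inconsistent) ontology until it answers the query or fails.

    select(ontology, query, k) is assumed monotone in k (each step
    returns a superset of the previous one); entails(axioms, formula)
    is a classical reasoner.  Returns one of 'accepted', 'rejected',
    'overdetermined', 'undetermined'.
    """
    for k in range(k_max + 1):
        subset = select(ontology, query, k)
        pos = entails(subset, query)            # subset |= query ?
        neg = entails(subset, ("not", query))   # subset |= not-query ?
        if pos and neg:
            return "overdetermined"  # selected subpart already inconsistent
        if pos:
            return "accepted"
        if neg:
            return "rejected"
    return "undetermined"            # no verdict at maximal k
```

Soundness in the slide-25 sense comes from only ever reasoning over the selected (consistent) subpart: every "accepted"/"rejected" verdict is classically justified by some subset of the ontology.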
29. Nice general framework, but...
- Which selection function s(T,φ,k) to use?
- Simple option: syntactic distance
  - put all formulae in clausal form: a1 ∨ a2 ∨ ... ∨ an
  - distance k=1 if some clause letters overlap: a1 ∨ X ∨ ... ∨ an, b1 ∨ X ∨ ... ∨ bn
  - distance k if a chain of k overlapping clauses is needed: a1 ∨ X ∨ X1 ∨ ... ∨ an, b1 ∨ X1 ∨ X2 ∨ ... ∨ bn, ..., c1 ∨ Xk ∨ X ∨ ... ∨ cn
30. Evaluation
- Ontologies:
  - Transport: 450 concepts
  - Communication: 200 concepts
  - MadCow: 55 concepts
- Selection functions:
  - symbol-relevance: axioms overlap by ≥1 symbol
  - concept-relevance: axioms overlap by ≥1 concept
- Query: a random set of subsumption queries Concept1 ⊑ Concept2?
31. Evaluation Lessons
- This makes concept-relevance a high-quality sound approximation (>90% recall, 100% precision)
32. Works surprisingly well
- On our benchmarks, almost all answers are intuitive
- Not well understood why
- Theory doesn't predict that this is easy:
  - paraconsistent logic,
  - relevance logic,
  - multi-valued logic
- Hypothesis: due to the local structure of knowledge?
33. 4 examples of "where does it break?"
- Traditional complexity measures
- Hard in theory, easy in practice
- Context-specific nature of knowledge
34. Opinion poll
35. Opinion poll
36. Example: Ontology mapping with community support
- This work with:
- Zharko Aleksovski, Michel Klein
37. The general idea
[Diagram: source and target linked by a mapping, derived by inference over background knowledge]
38. Example 1
39. Example 2
40. Results
- Example matchings discovered:
  - OLVG "Acute respiratory failure" ↔ AMC "Asthma cardiale"
  - OLVG "Aspergillus fumigatus" ↔ AMC "Aspergilloom"
  - OLVG "duodenum perforation" ↔ AMC "Gut perforation"
  - OLVG "HIV" ↔ AMC "AIDS"
  - OLVG "Aorta thoracalis dissectie type B" ↔ AMC "Dissection of artery"
41. Experimental results
- Source and target: flat lists of 1400 ICU terms each
- Anchoring: substring matching with simple Germanic morphology
- Background: DICE (2300 concepts in DL)
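The anchoring step ("substring matching with simple Germanic morphology") can be caricatured as below. The normalization rules here are invented stand-ins for illustration only, not the rules used in the actual experiment:

```python
def normalize(term):
    """Crude normalization standing in for 'simple Germanic morphology':
    lowercase and strip a few common endings.  These rules are an
    illustrative assumption, not the experiment's actual morphology."""
    word = term.lower()
    for suffix in ("ies", "es", "s", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def anchor(term, background_concepts):
    """Anchor a source/target term onto every background concept whose
    normalized label occurs as a substring of the normalized term."""
    normalized_term = " ".join(normalize(w) for w in term.split())
    return {c for c in background_concepts
            if normalize(c) in normalized_term}
```

Once source and target terms are anchored into the background ontology, the subsumption structure of the background (DICE, MeSH, ICD-10) does the actual inference work.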
42. New results
- More background knowledge makes mappings better:
  - DICE (2300 concepts)
  - MeSH (22000 concepts)
  - ICD-10 (11000 concepts)
- Monotonic improvement of quality
- Linear increase of cost
43. Distributed/P2P setting
[Diagram: as before, but background knowledge and inference are distributed over peers linking source, mapping and target]
44. So...
- The OLVG and AMC terms get their meaning from the context in which they are being used.
- Different background knowledge would have resulted in different mappings.
- Their semantics is not context-free.
- See also S-MATCH by Trento.
45. 4 examples of "where does it break?"
- Traditional complexity measures
- Hard in theory, easy in practice
- Context-specific nature of knowledge
- Logic vs. statistics
46. Logic vs. statistics
- DB schema integration is only logic, no statistics
- AI has both logic and statistics, but completely disjoint
- Find combinations of the two worlds?
  - Statistics in the logic?
  - Statistics to control the logic?
  - Statistics to define the semantics of the logic?
47. Statistics in the logic? Fuzzy DL
- (TalksByFrank ⊑ InterestingTalks) ≥ 0.7
- (Turkey : EuropeanCountry) ≥ 0.2
- youngPerson ≡ Person ⊓ ∃age.Young
- veryYoungPerson ≡ Person ⊓ ∃age.very(Young)
[Figure: membership functions Young(x) and very(Young), equal to 1 up to 10yr and falling to 0 at 30yr]
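Reading the plot's axis labels, Young has membership 1 up to 10yr, falling to 0 at 30yr. A sketch, assuming a linear fall-off and the common Zadeh-style reading of the hedge `very` as squaring (the talk itself does not fix either choice):

```python
def young(age):
    """Membership degree of 'Young': 1 below 10yr, 0 above 30yr,
    assumed linear in between (reconstructed from the axis labels)."""
    if age <= 10:
        return 1.0
    if age >= 30:
        return 0.0
    return (30 - age) / 20

def very(membership):
    """The hedge 'very' as concentration by squaring -- a standard
    fuzzy-logic choice, assumed here rather than taken from the talk."""
    return membership ** 2
```

Squaring keeps the degrees in [0, 1] but pushes intermediate values down, so very(Young) is a stricter fuzzy set than Young, as the second plot suggests.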
48. Statistics to control the logic?
- Query: A ⊑ B?
- With B ≡ B1 ⊓ B2 ⊓ B3, ask instead: A ⊑ B1, A ⊑ B2, A ⊑ B3?
[Figure: Venn diagram of A overlapping B1 and B3]
49. Statistics to control the logic?
- Use Google distance to decide which ones are reasonable to focus on
- Google distance:
  - symmetric conditional probability of co-occurrence
  - estimate of semantic distance
  - estimate of the contribution to A ⊑ B1 ⊓ B2 ⊓ B3
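The "Google distance" here is commonly formalized as the Normalized Google Distance of Cilibrasi and Vitanyi, computed from search-engine page counts. A sketch; `hits` and `total_pages` are hypothetical caller-supplied parameters standing in for a real search API:

```python
from math import log

def ngd(x, y, hits, total_pages):
    """Normalized Google Distance between terms x and y.

    hits(*terms) returns the page count for the terms occurring
    together; total_pages is the size of the index.  NGD is 0 when the
    terms always co-occur and grows as co-occurrence becomes rarer.
    """
    fx, fy, fxy = log(hits(x)), log(hits(y)), log(hits(x, y))
    return (max(fx, fy) - fxy) / (log(total_pages) - min(fx, fy))
```

In the decomposition of slide 48, such distances between A and each conjunct Bi could rank which subsumption checks A ⊑ Bi are worth running first.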
50. Statistics to define semantics?
- Many peers have many mappings on many terms to many other peers
- A mapping is good if the results of the whispering game are truthful
- Punish mappings that contribute to bad whispering results
- The network will converge to a set of good mappings (or at least consistent ones)
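The whispering game can be caricatured as sending a term around a chain of mappings and checking whether it comes back unchanged, then rewarding or punishing the mappings involved. Everything below (dict-based mappings, the update rates) is an illustrative assumption, not the actual protocol:

```python
def whisper_round(chain, term):
    """Send a term through a chain of peer-to-peer mappings (plain
    dicts from one peer's vocabulary to the next).  The round is
    'truthful' if the term survives the round trip unchanged."""
    current = term
    for mapping in chain:
        current = mapping.get(current)
        if current is None:          # chain broken: no translation
            return False
    return current == term

def update_weights(weights, chain_ids, truthful, reward=0.05, penalty=0.1):
    """Reward every mapping on a truthful chain, punish every mapping
    on an untruthful one; the rates are illustrative assumptions."""
    for mid in chain_ids:
        delta = reward if truthful else -penalty
        weights[mid] = min(1.0, max(0.0, weights[mid] + delta))
    return weights
```

Repeating such rounds over many chains is the stochastic process of slide 51: the surviving high-weight mappings are the (hopefully stable) state the network converges to.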
51. Statistics to define semantics?
- Meaning of terms = relations to other terms
- Determined by a stochastic process
- Meaning = stable state of a self-organising system
  - statistics: getting the system to a meaning-defining stable state
  - logic: description of such a stable state
- Note: meaning is still a binary, classical truth-value
- Note: the same system may have multiple stable states
52. 4 examples of "where does it break?"
- old assumptions that no longer hold,
- old approaches that no longer work
- Traditional complexity measures don't work
  - completeness, decidability, complexity
- Sometimes hard in theory, easy in practice
  - Q/A over inconsistent ontologies is easy, but why?
- Meaning dependent on context
  - meaning determined by background knowledge
- Logic versus statistics
  - statistics in the logic
  - statistics to control the logic
  - statistics to determine semantics
53. Final comments
- These 4 broken assumptions/old methods were just examples; there are many more (e.g. Hayes and Halpin on identity, equality and reference).
- Notice that they are interlinked: e.g. hard theory/easy practice with complexity, and meaning in context with logic/statistics.
- Working on these will not be SemWeb work per se, but:
  - it will be inspired by SemWeb challenges
  - it will help the SemWeb effort (either V1 or V2)
54. Have fun with the puzzles!