Title: Where does it break? or: Why Semantic Web research is not just Computer Science as usual

1. Where does it break? or: Why Semantic Web research is not just Computer Science as usual
- Frank van Harmelen
- AI Department
- Vrije Universiteit Amsterdam
2. But first...
- The Semantic Web forces us to rethink the foundations of many subfields of Computer Science.
- The challenge of the Semantic Web continues to break many silently held and shared assumptions underlying decades of research.
- I will try to identify silently held assumptions which are no longer true on the Semantic Web, prompting a radical rethink of many past results.
3. Oh no, not more vision...
- Don't worry, there will be lots of technical content.
4. Grand Topics
- What are the science challenges in the Semantic Web?
- Which implicit traditional assumptions break?
- Illustrated with 4 such traditional assumptions
- and also:
- Which Semantic Web?
5. Before we go on: Which Semantic Web are we talking about?
6. Typical SemWeb slide 1
7. Typical SemWeb slide 2
8. Which Semantic Web?
- Version 1: "Semantic Web as Web of Data" (TBL)
- recipe: expose databases on the web, use RDF, integrate
- meta-data from expressing DB schema semantics in machine-interpretable ways
- enables integration and unexpected re-use
9. Which Semantic Web?
- Version 2: Enrichment of the current Web
- recipe: annotate, classify, index
- meta-data from automatically produced markup: named-entity recognition, concept extraction, tagging, etc.
- enables personalisation, search, browse, ...
10. Which Semantic Web?
- Version 1: Semantic Web as Web of Data
- Version 2: Enrichment of the current Web
- Different use-cases
- Different techniques
- Different users
11. Before we go on: The current state of the Semantic Web
12. What's up in the Semantic Web? The 4 hard questions
- Q1: Where does the meta-data come from?
  - NL technology is delivering on concept-extraction
- Q2: Where do the ontologies come from?
  - many handcrafted ontologies
  - ontology learning remains hard
  - relation extraction remains hard
- Q3: What to do with many ontologies?
  - ontology mapping/aligning remains VERY hard
- Q4: Where's the Web in the Semantic Web?
  - more attention to social aspects (P2P, FOAF)
  - non-textual media remains hard
13. What's up in the Semantic Web? The 4 hard questions
- healthy uptake in some areas:
  - knowledge management / intranets
  - data-integration (Boeing)
  - life-sciences (e-Science)
  - convergence with the Semantic Grid
  - cultural heritage
- emerging applications in search & browse:
  - Elsevier, Ilse, MagPie, KIM
- very few applications in:
  - personalisation
  - mobility/context awareness
- Most applications are for companies, few are for the public
14. Semantic Web: Science or technology?
15. Semantic Web as Technology
- better search & browse
- personalisation
- semantic linking
- semantic web services
- ...
Semantic Web as Science
16. 4 examples of "where does it break?"
- old assumptions that no longer hold,
- old approaches that no longer work
17. 4 examples of "where does it break?"
- Traditional complexity measures
18. Who cares about decidability?
- Decidability + completeness: a guarantee to find an answer, or tell you it doesn't exist, given enough run-time and memory
- Sources of incompleteness:
  - incompleteness of the input data
  - insufficient run-time to wait for the answer
- Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm
19. Who cares about undecidability?
- Undecidability ≠ always guaranteed not to find an answer
- Undecidability = not always guaranteed to find an answer
- Undecidability may be harmless in many cases (in all cases that matter)
20. Who cares about complexity?
- worst-case may be exponentially rare
- asymptotic
- ignores constants
21. What to do instead?
- Practical observations on RDF Schema:
  - compute the full closure of O(10^5) statements
- Practical observations on OWL:
  - NEXPTIME, but fine on many practical cases
- Do more experimental performance profiles with realistic data
- Think hard about average-case complexity
22. 4 examples of "where does it break?"
- Traditional complexity measures
- Hard in theory, easy in practice
23. Example: Reasoning with Inconsistent Knowledge
- This work with:
- Zhisheng Huang
- Annette ten Teije
24. Knowledge will be inconsistent
- Because of
- mistreatment of defaults
- homonyms
- migration from another formalism
- integration of multiple sources
25. New formal notions are needed
- New notions
- Accepted
- Rejected
- Overdetermined
- Undetermined
- Soundness (only classically justified results)
26. Basic Idea
- Start from the query
- Incrementally select larger parts of the ontology that are relevant to the query, until:
  - you have an ontology subpart that is small enough to be consistent and large enough to answer the query, or
  - the selected subpart is already inconsistent before it can answer the query
- Selection function
27. General Framework
[Figure: nested selections s(T,φ,0) ⊆ s(T,φ,1) ⊆ s(T,φ,2) ⊆ ...]
28. More precisely
- Use a selection function s(T,φ,k), with s(T,φ,k) ⊆ s(T,φ,k+1)
- Start with k=0: check whether s(T,φ,0) ⊨ φ or s(T,φ,0) ⊨ ¬φ
- Increase k until s(T,φ,k) ⊨ φ or s(T,φ,k) ⊨ ¬φ
- Abort when:
  - undetermined at maximal k
  - overdetermined at some k
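The incremental loop of slides 26-28 can be sketched as follows. This is a hedged illustration, not the authors' implementation: `select` stands for an arbitrary monotone selection function s(T,φ,k), `entails` for a classical reasoner, and the four return values correspond to the formal notions of slide 25.

```python
def linear_extension_query(ontology, query, select, entails, k_max):
    """Incrementally grow a query-relevant subset of a (possibly
    inconsistent) ontology until it answers the query or fails.

    select(ontology, query, k) is assumed monotone in k (each step
    returns a superset of the previous one); entails(axioms, formula)
    is a classical reasoner.  Returns one of 'accepted', 'rejected',
    'overdetermined', 'undetermined'.
    """
    for k in range(k_max + 1):
        subset = select(ontology, query, k)
        pos = entails(subset, query)            # subset |= query ?
        neg = entails(subset, ("not", query))   # subset |= not-query ?
        if pos and neg:
            return "overdetermined"  # selected subpart already inconsistent
        if pos:
            return "accepted"
        if neg:
            return "rejected"
    return "undetermined"            # no verdict at maximal k
```

Soundness in the slide-25 sense comes from only ever reasoning over the selected (consistent) subpart: every "accepted"/"rejected" verdict is classically justified by some subset of the ontology.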
29. Nice general framework, but...
- Which selection function s(T,φ,k) to use?
- Simple option: syntactic distance
  - put all formulae in clausal form: a1 ∨ a2 ∨ ... ∨ an
  - distance k=1 if some clause letters overlap: a1 ∨ X ∨ ... ∨ an, b1 ∨ X ∨ ... ∨ bn
  - distance k if a chain of k overlapping clauses is needed: a1 ∨ X ∨ X1 ∨ ... ∨ an, b1 ∨ X1 ∨ X2 ∨ ... ∨ bn, ..., c1 ∨ Xk ∨ X ∨ ... ∨ cn
30. Evaluation
- Ontologies:
  - Transport: 450 concepts
  - Communication: 200 concepts
  - MadCow: 55 concepts
- Selection functions:
  - symbol-relevance: axioms overlap by ≥1 symbol
  - concept-relevance: axioms overlap by ≥1 concept
- Query: a random set of subsumption queries Concept1 ⊑ Concept2?
31. Evaluation Lessons
- This makes concept-relevance a high-quality sound approximation (>90% recall, 100% precision)
32. Works surprisingly well
- On our benchmarks, almost all answers are intuitive
- Not well understood why
- Theory doesn't predict that this is easy:
  - paraconsistent logic,
  - relevance logic,
  - multi-valued logic
- Hypothesis: due to the local structure of knowledge?
33. 4 examples of "where does it break?"
- Traditional complexity measures
- Hard in theory, easy in practice
- Context-specific nature of knowledge
34. Opinion poll
35. Opinion poll
36. Example: Ontology mapping with community support
- This work with:
- Zharko Aleksovski, Michel Klein
37. The general idea
[Diagram: source and target linked by a mapping, derived by inference over background knowledge]
38. Example 1
39. Example 2
40. Results
- Example matchings discovered:
  - OLVG "Acute respiratory failure" ↔ AMC "Asthma cardiale"
  - OLVG "Aspergillus fumigatus" ↔ AMC "Aspergilloom"
  - OLVG "duodenum perforation" ↔ AMC "Gut perforation"
  - OLVG "HIV" ↔ AMC "AIDS"
  - OLVG "Aorta thoracalis dissectie type B" ↔ AMC "Dissection of artery"
41. Experimental results
- Source and target: flat lists of 1400 ICU terms each
- Anchoring: substring matching with simple Germanic morphology
- Background: DICE (2300 concepts in DL)
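The anchoring step ("substring matching with simple Germanic morphology") can be caricatured as below. The normalization rules here are invented stand-ins for illustration only, not the rules used in the actual experiment:

```python
def normalize(term):
    """Crude normalization standing in for 'simple Germanic morphology':
    lowercase and strip a few common endings.  These rules are an
    illustrative assumption, not the experiment's actual morphology."""
    word = term.lower()
    for suffix in ("ies", "es", "s", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def anchor(term, background_concepts):
    """Anchor a source/target term onto every background concept whose
    normalized label occurs as a substring of the normalized term."""
    normalized_term = " ".join(normalize(w) for w in term.split())
    return {c for c in background_concepts
            if normalize(c) in normalized_term}
```

Once source and target terms are anchored into the background ontology, the subsumption structure of the background (DICE, MeSH, ICD-10) does the actual inference work.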
42. New results
- More background knowledge makes mappings better:
  - DICE (2300 concepts)
  - MeSH (22000 concepts)
  - ICD-10 (11000 concepts)
- Monotonic improvement of quality
- Linear increase of cost
43. Distributed/P2P setting
[Diagram: as before, but background knowledge and inference are distributed over peers linking source, mapping and target]
44. So...
- The OLVG and AMC terms get their meaning from the context in which they are being used.
- Different background knowledge would have resulted in different mappings.
- Their semantics is not context-free.
- See also S-MATCH by Trento.
45. 4 examples of "where does it break?"
- Traditional complexity measures
- Hard in theory, easy in practice
- Context-specific nature of knowledge
- Logic vs. statistics
46. Logic vs. statistics
- DB schema integration is only logic, no statistics
- AI has both logic and statistics, but completely disjoint
- Find combinations of the two worlds?
  - Statistics in the logic?
  - Statistics to control the logic?
  - Statistics to define the semantics of the logic?
47. Statistics in the logic? Fuzzy DL
- (TalksByFrank ⊑ InterestingTalks) ≥ 0.7
- (Turkey : EuropeanCountry) ≥ 0.2
- youngPerson ≡ Person ⊓ ∃age.Young
- veryYoungPerson ≡ Person ⊓ ∃age.very(Young)
[Figure: membership functions Young(x) and very(Young), equal to 1 up to 10yr and falling to 0 at 30yr]
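Reading the plot's axis labels, Young has membership 1 up to 10yr, falling to 0 at 30yr. A sketch, assuming a linear fall-off and the common Zadeh-style reading of the hedge `very` as squaring (the talk itself does not fix either choice):

```python
def young(age):
    """Membership degree of 'Young': 1 below 10yr, 0 above 30yr,
    assumed linear in between (reconstructed from the axis labels)."""
    if age <= 10:
        return 1.0
    if age >= 30:
        return 0.0
    return (30 - age) / 20

def very(membership):
    """The hedge 'very' as concentration by squaring -- a standard
    fuzzy-logic choice, assumed here rather than taken from the talk."""
    return membership ** 2
```

Squaring keeps the degrees in [0, 1] but pushes intermediate values down, so very(Young) is a stricter fuzzy set than Young, as the second plot suggests.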
48. Statistics to control the logic?
- Query: A ⊑ B?
- With B ≡ B1 ⊓ B2 ⊓ B3, ask instead: A ⊑ B1, A ⊑ B2, A ⊑ B3?
[Figure: Venn diagram of A overlapping B1 and B3]
49. Statistics to control the logic?
- Use Google distance to decide which ones are reasonable to focus on
- Google distance:
  - symmetric conditional probability of co-occurrence
  - estimate of semantic distance
  - estimate of the contribution to A ⊑ B1 ⊓ B2 ⊓ B3
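The "Google distance" here is commonly formalized as the Normalized Google Distance of Cilibrasi and Vitanyi, computed from search-engine page counts. A sketch; `hits` and `total_pages` are hypothetical caller-supplied parameters standing in for a real search API:

```python
from math import log

def ngd(x, y, hits, total_pages):
    """Normalized Google Distance between terms x and y.

    hits(*terms) returns the page count for the terms occurring
    together; total_pages is the size of the index.  NGD is 0 when the
    terms always co-occur and grows as co-occurrence becomes rarer.
    """
    fx, fy, fxy = log(hits(x)), log(hits(y)), log(hits(x, y))
    return (max(fx, fy) - fxy) / (log(total_pages) - min(fx, fy))
```

In the decomposition of slide 48, such distances between A and each conjunct Bi could rank which subsumption checks A ⊑ Bi are worth running first.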
50. Statistics to define semantics?
- Many peers have many mappings on many terms to many other peers
- A mapping is good if the results of the whispering game are truthful
- Punish mappings that contribute to bad whispering results
- The network will converge to a set of good mappings (or at least consistent ones)
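The whispering game can be caricatured as sending a term around a chain of mappings and checking whether it comes back unchanged, then rewarding or punishing the mappings involved. Everything below (dict-based mappings, the update rates) is an illustrative assumption, not the actual protocol:

```python
def whisper_round(chain, term):
    """Send a term through a chain of peer-to-peer mappings (plain
    dicts from one peer's vocabulary to the next).  The round is
    'truthful' if the term survives the round trip unchanged."""
    current = term
    for mapping in chain:
        current = mapping.get(current)
        if current is None:          # chain broken: no translation
            return False
    return current == term

def update_weights(weights, chain_ids, truthful, reward=0.05, penalty=0.1):
    """Reward every mapping on a truthful chain, punish every mapping
    on an untruthful one; the rates are illustrative assumptions."""
    for mid in chain_ids:
        delta = reward if truthful else -penalty
        weights[mid] = min(1.0, max(0.0, weights[mid] + delta))
    return weights
```

Repeating such rounds over many chains is the stochastic process of slide 51: the surviving high-weight mappings are the (hopefully stable) state the network converges to.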
51. Statistics to define semantics?
- Meaning of terms = relations to other terms
- Determined by a stochastic process
- Meaning = stable state of a self-organising system
  - statistics: getting the system to a meaning-defining stable state
  - logic: description of such a stable state
- Note: meaning is still a binary, classical truth-value
- Note: the same system may have multiple stable states
52. 4 examples of "where does it break?"
- old assumptions that no longer hold,
- old approaches that no longer work
- Traditional complexity measures don't work
  - completeness, decidability, complexity
- Sometimes hard in theory, easy in practice
  - Q/A over inconsistent ontologies is easy, but why?
- Meaning dependent on context
  - meaning determined by background knowledge
- Logic versus statistics
  - statistics in the logic
  - statistics to control the logic
  - statistics to determine semantics
53. Final comments
- These 4 broken assumptions/old methods were just examples; there are many more (e.g. Hayes and Halpin on identity, equality and reference).
- Notice that they are interlinked: e.g. hard theory/easy practice with complexity, and meaning in context with logic/statistics.
- Working on these will not be SemWeb work per se, but:
  - it will be inspired by SemWeb challenges
  - it will help the SemWeb effort (either V1 or V2)
54. Have fun with the puzzles!