Where does it break? or: Why Semantic Web research is not just Computer Science as usual
Frank van Harmelen (ESWC 2006, 48 slides)

Transcript and Presenter's Notes
1
Where does it break? or: Why Semantic Web research is not just Computer Science as usual
  • Frank van Harmelen
  • AI Department
  • Vrije Universiteit Amsterdam

2
But first...
The Semantic Web forces us to rethink the foundations of many subfields of Computer Science.
The challenge of the Semantic Web continues to break many (often silently held and shared) assumptions underlying decades of research.
I will try to identify silently held assumptions which are no longer true on the Semantic Web, prompting a radical rethink of many past results.
3
Oh no, not more vision
  • Don't worry, there will be lots of technical content

4
Grand Topics
  • What are the science challenges in the Semantic Web?
  • Which implicit traditional assumptions break?
  • Illustrated with 4 such traditional assumptions
  • and also:
  • Which Semantic Web?

5
Before we go on: Which Semantic Web are we talking about?
6
Typical SemWeb slide 1
7
Typical SemWeb slide 2
8
Which Semantic Web?
  • Version 1"Semantic Web as Web of Data" (TBL)
  • recipeexpose databases on the web, use RDF,
    integrate
  • meta-data from
  • expressing DB schema semantics in machine
    interpretable ways
  • enable integration and unexpected re-use

9
Which Semantic Web?
  • Version 2: Enrichment of the current Web
  • recipe: annotate, classify, index
  • meta-data from:
  • automatically producing markup: named-entity recognition, concept extraction, tagging, etc.
  • enables personalisation, search, browse, ...

10
Which Semantic Web?
  • Version 1: Semantic Web as Web of Data
  • Version 2: Enrichment of the current Web
  • Different use-cases
  • Different techniques
  • Different users

11
Before we go on: The current state of the Semantic Web
12
What's up in the Semantic Web? The 4 hard questions
  • Q1: Where does the meta-data come from?
  • NL technology is delivering on concept-extraction
  • Q2: Where do the ontologies come from?
  • many handcrafted ontologies
  • ontology learning remains hard
  • relation extraction remains hard
  • Q3: What to do with many ontologies?
  • ontology mapping/aligning remains VERY hard
  • Q4: Where's the Web in the Semantic Web?
  • more attention to social aspects (P2P, FOAF)
  • non-textual media remains hard
  • Q1 "where does the meta-data come from?
  • Q2 where do the ontologies come from?
  • Q3 what to do with many ontologies?
  • Q4 wheres the Web in the Semantic Web?

13
What's up in the Semantic Web? The 4 hard questions
  • healthy uptake in some areas:
  • knowledge management / intranets
  • data-integration (Boeing)
  • life-sciences (e-Science)
  • convergence with the Semantic Grid
  • cultural heritage
  • emerging applications in search & browse:
  • Elsevier, Ilse, MagPie, KIM
  • very few applications in:
  • personalisation
  • mobility/context awareness
  • Most applications are for companies, few are for the public

14
Semantic Web: Science or technology?
15
Semantic Web as Technology
  • better search & browse
  • personalisation
  • semantic linking
  • semantic web services
  • ...

Semantic Web as Science
16
4 examples of "where does it break?"
  • old assumptions that no longer hold,
  • old approaches that no longer work

17
4 examples of "where does it break?"
  • Traditional complexity measures
18
Who cares about decidability?
  • Decidability & completeness: the guarantee to find an answer, or to tell you it doesn't exist, given enough run-time & memory
  • Sources of incompleteness:
  • incompleteness of the input data
  • insufficient run-time to wait for the answer
  • Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm

19
Who cares about undecidability?
  • Undecidability ≠ always guaranteed not to find an answer
  • Undecidability = not always guaranteed to find an answer
  • Undecidability may be harmless in many cases, perhaps even in all cases that matter

20
Who cares about complexity?
  • worst-case analysis: the worst cases may be exponentially rare
  • asymptotic analysis: ignores the constants

21
What to do instead?
  • Practical observations on RDF Schema:
  • compute the full closure of O(10^5) statements
  • Practical observations on OWL:
  • NEXPTIME, but fine on many practical cases
  • Do more experimental performance profiles with realistic data
  • Think hard about average-case complexity.

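A minimal sketch of such an experimental performance profile, assuming the Python rdflib and owlrl packages and a hypothetical file data.ttl holding a realistic set of triples:

  import time
  import rdflib
  import owlrl  # pip install rdflib owlrl

  g = rdflib.Graph()
  g.parse("data.ttl")  # hypothetical file; aim for O(10^5) realistic triples
  print(f"{len(g)} triples before closure")

  start = time.perf_counter()
  # materialise the full RDFS deductive closure of the graph, in place
  owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
  print(f"{len(g)} triples after closure ({time.perf_counter() - start:.1f}s)")

Profiles like this, over realistic data rather than worst-case constructions, are what the slide is arguing for.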
22
4 examples of "where does it break?"
  • Traditional complexity measures ✓
  • Hard in theory, easy in practice
23
Example: Reasoning with Inconsistent Knowledge
  • This is joint work with:
  • Zhisheng Huang
  • Annette ten Teije

24
Knowledge will be inconsistent
  • Because of
  • mistreatment of defaults
  • homonyms
  • migration from another formalism
  • integration of multiple sources

25
New formal notions are needed
  • New notions:
  • Accepted
  • Rejected
  • Overdetermined (both the query and its negation follow)
  • Undetermined (neither follows)
  • Soundness (only classically justified results)

26
Basic Idea
  • Start from the query
  • Incrementally select larger parts of the ontology that are relevant to the query, until:
  • you have an ontology subpart that is small enough to be consistent and large enough to answer the query, or
  • the selected subpart is already inconsistent before it can answer the query

Selection function
27
General Framework
(diagram: nested selections s(T,φ,0) ⊆ s(T,φ,1) ⊆ s(T,φ,2) ⊆ ... growing around the query)
28
More precisely
  • Use a selection function s(T,φ,k), with s(T,φ,k) ⊆ s(T,φ,k+1)
  • Start with k=0: does s(T,φ,0) |≈ φ or s(T,φ,0) |≈ ¬φ hold?
  • Increase k, until s(T,φ,k) |≈ φ or s(T,φ,k) |≈ ¬φ
  • Abort when:
  • undetermined at maximal k
  • overdetermined at some k
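As a concrete illustration, here is a minimal executable sketch of this loop for propositional knowledge bases in clausal form (the representation, the brute-force entailment test, and all names are illustrative; the real system works on DL ontologies):

  from itertools import product

  # A clause is a frozenset of literals; a literal is a (symbol, polarity) pair.

  def entails(clauses, literal):
      """Brute-force check that every model of `clauses` satisfies `literal`.
      Exponential, but fine for a toy example. Note: an inconsistent clause
      set vacuously entails both a literal and its negation."""
      syms = sorted({s for c in clauses for (s, _) in c} | {literal[0]})
      for values in product([True, False], repeat=len(syms)):
          model = dict(zip(syms, values))
          if all(any(model[s] == p for (s, p) in c) for c in clauses):
              if model[literal[0]] != literal[1]:
                  return False  # a model of the clauses violates the literal
      return True

  def answer(T, query, select, k_max):
      """Grow the selected subtheory s(T, query, k) until it answers the
      query, becomes overdetermined, or k_max is reached."""
      for k in range(k_max + 1):
          sub = select(T, query, k)  # monotone: s(T,q,k) ⊆ s(T,q,k+1)
          pos = entails(sub, query)
          neg = entails(sub, (query[0], not query[1]))
          if pos and neg:
              return "overdetermined"  # selected subpart is inconsistent
          if pos:
              return "accepted"
          if neg:
              return "rejected"
      return "undetermined"  # still no answer at maximal k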

29
Nice general framework, but...
  • which selection function s(T,φ,k) to use?
  • Simple option: syntactic distance
  • put all formulae in clausal form: a1 ∨ a2 ∨ ... ∨ an
  • distance k=1 if some clausal letters overlap: a1 ∨ X ∨ ... ∨ an and b1 ∨ X ∨ ... ∨ bn
  • distance k if a chain of k overlapping clauses is needed: a1 ∨ X ∨ X1 ∨ ... ∨ an; b1 ∨ X1 ∨ X2 ∨ ... ∨ bn; ...; c1 ∨ Xk ∨ X ∨ ... ∨ cn
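A toy version of this syntactic-distance selection, run over a deliberately inconsistent knowledge base (this continues the sketch above; `answer` and the clause representation come from the previous snippet):

  def syntactic_selection(T, query, k):
      """s(T, q, k): all clauses reachable from the query's symbol via a
      chain of at most k symbol overlaps (s(T, q, 0) is empty)."""
      reached, selected = {query[0]}, set()
      for _ in range(k):
          layer = {c for c in T if any(s in reached for (s, _) in c)}
          if layer <= selected:
              break  # fixpoint: no new relevant clauses
          selected |= layer
          reached |= {s for c in selected for (s, _) in c}
      return selected

  # Bird; Bird implies Flies; plus an unrelated contradiction about Rain.
  T = {frozenset({("Bird", True)}),
       frozenset({("Bird", False), ("Flies", True)}),
       frozenset({("Rain", True)}),
       frozenset({("Rain", False)})}

  # T as a whole is inconsistent, yet the query is answered from the
  # consistent relevant subpart:
  print(answer(T, ("Flies", True), syntactic_selection, k_max=3))  # accepted

The contradiction about Rain never overlaps the query's symbols, so it is never selected; this is the intuition behind why local structure makes the approach work.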

30
Evaluation
  • Ontologies:
  • Transport: 450 concepts; Communication: 200 concepts; MadCow: 55 concepts
  • Selection functions:
  • symbol-relevance: axioms overlap by ≥1 symbol
  • concept-relevance: axioms overlap by ≥1 concept
  • Queries: a random set of subsumption queries
  • Concept1 ⊑ Concept2 ?

31
Evaluation Lessons
  • this makes concept-relevance a high-quality sound approximation (>90% recall, 100% precision)

32
Works surprisingly well
  • On our benchmarks, almost all answers are intuitive
  • It is not well understood why:
  • theory doesn't predict that this is easy:
  • paraconsistent logic,
  • relevance logic,
  • multi-valued logic
  • Hypothesis: due to the local structure of knowledge?

33
4 examples of "where does it break?"
  • Traditional complexity measures ✓
  • Hard in theory, easy in practice ✓
  • Context-specific nature of knowledge
34
Opinion poll
35
Opinion poll
36
Example: Ontology mapping with community support
  • This is joint work with:
  • Zharko Aleksovski, Michel Klein

37
The general idea
(diagram: a source and a target vocabulary are both anchored into background knowledge; mappings between source and target are derived by inference in the background ontology)
38
Example 1
39
Example 2
40
Results
  • Example matchings discovered (OLVG term → AMC term):
  • OLVG "Acute respiratory failure" → AMC "Asthma cardiale"
  • OLVG "Aspergillus fumigatus" → AMC "Aspergilloom"
  • OLVG "duodenum perforation" → AMC "Gut perforation"
  • OLVG "HIV" → AMC "AIDS"
  • OLVG "Aorta thoracalis dissectie type B" → AMC "Dissection of artery"

41
Experimental results
  • Source & target: flat lists of 1400 ICU terms each
  • Anchoring: substring matching + simple Germanic morphology
  • Background: DICE (2300 concepts in DL)
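The flavour of this setup can be shown in a small sketch (all data and names are illustrative, and the substring matching here is far cruder than the real anchoring):

  # Toy background ontology: a parent concept and its subconcepts.
  SUBSUMES = {"gut perforation": {"duodenum perforation", "stomach perforation"}}
  CONCEPTS = set(SUBSUMES) | set().union(*SUBSUMES.values())

  def anchors(term):
      """Anchor a source/target term to background concepts by substring match."""
      t = term.lower()
      return {c for c in CONCEPTS if c in t or t in c}

  def mapped(source_term, target_term):
      """Map source under target if some source anchor equals, or is
      subsumed by, some target anchor in the background ontology."""
      return any(s == t or s in SUBSUMES.get(t, set())
                 for s in anchors(source_term) for t in anchors(target_term))

  print(mapped("Duodenum perforation", "Gut perforation"))  # True

Neither flat term list relates the two terms; the relation only becomes derivable through the background ontology, which is the point of the architecture.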

42
New results
  • more background knowledge makes mappings better
  • DICE (2300 concepts)
  • MeSH (22000 concepts)
  • ICD-10 (11000 concepts)
  • Monotonic improvement of quality
  • Linear increase of cost

43
Distributed/P2P setting
(diagram: the same source/target/background-knowledge mapping architecture, now with the background knowledge distributed over peers)
44
So
  • The OLVG and AMC terms get their meaning from the context in which they are being used.
  • Different background knowledge would have
    resulted in different mappings
  • Their semantics is not context-free
  • See also S-MATCH by Trento

45
4 examples of "where does it break?"
  • Traditional complexity measures ✓
  • Hard in theory, easy in practice ✓
  • Context-specific nature of knowledge ✓
  • Logic vs. statistics
46
Logic vs. statistics
  • DB schema integration is only logic, no statistics
  • AI has both logic and statistics, but completely disjoint
  • Find combinations of the two worlds?
  • Statistics in the logic?
  • Statistics to control the logic?
  • Statistics to define the semantics of the logic?

47
Statistics in the logic? Fuzzy DL
  • (TalksByFrank ⊑ InterestingTalks) ≥ 0.7
  • (Turkey : EuropeanCountry) ≥ 0.2
  • youngPerson ≡ Person ⊓ ∃age.Young
  • veryYoungPerson ≡ Person ⊓ ∃age.very(Young)

(plots: membership functions Young(x) and very(Young)(x), each falling from 1 to 0 between 10yr and 30yr)
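For intuition, a tiny sketch of such membership functions: the trapezoidal shape between 10yr and 30yr is read off the plots, and modelling "very" as squaring is the standard Zadeh concentration hedge, assumed here:

  def young(age):
      """Degree to which `age` is Young: 1 up to 10yr, 0 from 30yr,
      linear in between (shape assumed from the slide's plot)."""
      if age <= 10:
          return 1.0
      if age >= 30:
          return 0.0
      return (30 - age) / 20

  def very(membership):
      """Zadeh's 'very' hedge: concentrate a fuzzy set by squaring degrees."""
      return lambda x: membership(x) ** 2

  very_young = very(young)
  print(young(20), very_young(20))  # 0.5 0.25: 'very young' is stricter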
48
Statistics to control the logic?
  • query: A ⊑ B ?
  • if B ≡ B1 ⊓ B2 ⊓ B3, check A ⊑ B1, A ⊑ B2, A ⊑ B3 ?

(diagram: concept A overlapping B1 and B3)
49
Statistics to control the logic?
  • Use Google distance to decide which of these subqueries are reasonable to focus on
  • Google distance:
  • symmetric conditional probability of co-occurrence
  • an estimate of semantic distance
  • an estimate of the contribution to A ⊑ B1 ⊓ B2 ⊓ B3
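The measure meant here is presumably the Normalized Google Distance of Cilibrasi & Vitányi; a small self-contained sketch, with a toy corpus standing in for Google page counts:

  import math

  # Toy "web": each page is the set of terms it contains.
  PAGES = [{"jaguar", "car"}, {"jaguar", "cat"}, {"car", "engine"},
           {"cat", "pet"}, {"car", "pet"}, {"jaguar", "car", "engine"}]

  def hits(*terms):
      """Pages containing all given terms (stand-in for Google hit counts)."""
      return sum(all(t in page for t in terms) for page in PAGES)

  def ngd(x, y):
      """NGD(x,y) = (max(log f(x), log f(y)) - log f(x,y))
                    / (log N - min(log f(x), log f(y)))
      Small values mean strong co-occurrence (assumes f(x,y) > 0)."""
      fx, fy, fxy, n = hits(x), hits(y), hits(x, y), len(PAGES)
      return (max(math.log(fx), math.log(fy)) - math.log(fxy)) / (
              math.log(n) - min(math.log(fx), math.log(fy)))

  print(ngd("jaguar", "car"))  # lower distance = more plausible subquery

Subqueries A ⊑ Bi whose concept names are at small distance from A's name would then be the ones worth spending reasoning effort on.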

50
Statistics to define semantics?
  • Many peers have many mappings on many terms to many other peers
  • A mapping is good if the results of the "whispering game" are truthful
  • Punish mappings that contribute to bad whispering results
  • The network will converge to a set of good (or at least consistent) mappings
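A minimal sketch of such a check (peer names, data, and the punishment factor are all illustrative): compose mappings around a cycle of peers, and lower the weight of every mapping on the cycle when a term does not survive the round trip:

  # (peer_a, peer_b) -> term translation table; toy data
  MAPPINGS = {
      ("P1", "P2"): {"fever": "pyrexia"},
      ("P2", "P3"): {"pyrexia": "high temperature"},
      ("P3", "P1"): {"high temperature": "headache"},  # a bad mapping
  }
  weights = {edge: 1.0 for edge in MAPPINGS}

  def whisper(term, cycle):
      """Translate `term` along a cycle of peers; None if a step is missing."""
      for edge in cycle:
          term = MAPPINGS[edge].get(term)
          if term is None:
              return None
      return term

  cycle = [("P1", "P2"), ("P2", "P3"), ("P3", "P1")]
  if whisper("fever", cycle) != "fever":  # round trip is not truthful
      for edge in cycle:
          weights[edge] *= 0.9  # punish every mapping on the bad cycle

Repeated over many cycles and terms, such punishment is what would drive the network towards a consistent set of mappings.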

51
Statistics to define semantics?
  • Meaning of terms = relations to other terms
  • Determined by a stochastic process
  • Meaning = stable state of a self-organising system
  • statistics: getting the system to a meaning-defining stable state
  • logic: description of such a stable state
  • Note: meaning is still a binary, classical truth-value
  • Note: the same system may have multiple stable states

52
4 examples of "where does it break?"
  • old assumptions that no longer hold,
  • old approaches that no longer work
  • Traditional complexity measures don't work:
  • completeness, decidability, complexity
  • Sometimes hard in theory, easy in practice:
  • Q/A over inconsistent ontologies is easy, but why?
  • Meaning is dependent on context:
  • meaning determined by background knowledge
  • Logic versus statistics:
  • statistics in the logic
  • statistics to control the logic
  • statistics to determine semantics

53
Final comments
  • These 4 broken assumptions/old methods were just examples. There are many more (e.g. Hayes and Halpin on identity, equality and reference).
  • Notice that they are interlinked, e.g. hard theory/easy practice ↔ complexity, and meaning in context ↔ logic/statistics
  • Working on these will not be SemWeb work per se, but:
  • they will be inspired by SemWeb challenges
  • they will help the SemWeb effort (either V1 or V2)

54
Have fun with the puzzles!