DB Lunch @ Berkeley 10.28.05 - PowerPoint PPT Presentation

About This Presentation
Title:

DB Lunch @ Berkeley 10.28.05

Description:

... in Research are managed by the Swiss National Science Foundation on behalf of ... – PowerPoint PPT presentation

Slides: 43
Provided by: philippecu

Transcript and Presenter's Notes


1
DB Lunch @ Berkeley 10.28.05
  • Semantic Interoperability in Large Scale
    Heterogeneous Networks
  • Philippe Cudré-Mauroux, EPFL
  • Joint work with
  • Karl Aberer (advisor @ EPFL)
  • Manfred Hauswirth (Semantic Gossiping)
  • T. van Pelt, L. Zhou, A. Feher (Implementation)

2
Overview
  • Motivation
  • Picture Sharing in Decentralized Settings
  • Decentralized Data Integration
  • Peer Data Management Systems
  • Probabilistic Message-passing
  • Aspects of self-organization
  • Studying semantic interoperability in the large
  • Applications
  • GridVine
  • PicShark
  • Conclusions

3
1. Motivation: Picture Sharing
  • Profusion of Digital Images
  • Variety of powerful devices
  • Gigabytes of pictures are the new norm
  • Most of the images are kept local
  • Some are shared
  • Mostly point-to-point
  • Primitive search capabilities

4
Opportunity
  • More and more software uses metadata to organize
    images locally
  • (Semi) Structured metadata (e.g., XML, PSA)
  • Ontological metadata (e.g., RDF, XMP)
  • Type-based metadata (e.g., WinFS)

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xapmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description about='' xmlns:xap='http://ns.adobe.com/xap/1.0/'>
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<xap:Creator> John Doe </xap:Creator>
</rdf:Description>
5
Hurdle: Metadata Heterogeneity
  • Why not take advantage of this metadata in a
    distributed setting?
  • X Syntactic discrepancies
  • X Semantic heterogeneity
  • All the aforementioned standards are extensible
  • Shared representation is not enough

ImageGUID   cDate
A0657B25    05.08.04
109E7A25    05.08.04
VS
<es:cDate> 05/08/2004 </es:cDate>

<rdf:Property rdf:ID="Length-Y">
  <rdfs:label>Length-Y</rdfs:label>
  <rdfs:subPropertyOf rdf:resource="length"/>
</rdf:Property>
VS
<rdf:Property rdf:ID="width">
  <rdfs:label>Width</rdfs:label>
  <rdfs:subPropertyOf rdf:resource="length"/>
</rdf:Property>
6
Beyond Keyword Search
  • searching semantically richer objects in large
    scale heterogeneous networks

(Figure: a "date?" query posed against heterogeneous date encodings held by different peers)
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<es:DofCreation> 05/08/2004 </es:DofCreation>
<myRDFDate> Jan 1, 2005 </myRDFDate>
7
2. Decentralized Semantics
  • Traditional database techniques (e.g., LAV/GAV)
    rely on centralized schemas to integrate data
    sources
  • Not applicable to our context
  • Scale (upper ontologies?)
  • Churn
  • Autonomy

(Figure: a centralized schema concept Date mapped to local attributes: m(Date) = myDate, m(Date) = yourDate)
8
Semantic Interoperability
Q1 = <GUID>p/GUID</GUID> FOR p IN /Photoshop_Image WHERE p/Creator LIKE "Robi"
Q2 = <GUID>p/GUID</GUID> FOR p IN T12 WHERE p/Creator LIKE "Robi"

Photoshop (own schema):
<Photoshop_Image>
  <GUID>178A8CD8865</GUID>
  <Creator>Robinson</Creator>
  <Subject>
    <Bag>
      <Item>Tunbridge Wells</Item>
      <Item>Royal Council</Item>
    </Bag>
  </Subject>
</Photoshop_Image>

WinFS (known schema):
<WinFSImage>
  <GUID>178A8CD8866</GUID>
  <Author>
    <DisplayName>Henry Peach Robinson</DisplayName>
    <Role>Photographer</Role>
  </Author>
  <Keyword>Tunbridge</Keyword>
  <Keyword>Council</Keyword>
</WinFSImage>

T12 = <Photoshop_Image> <GUID>fs/GUID</GUID> <Creator>fs/Author/DisplayName</Creator> </Photoshop_Image> FOR fs IN /WinFSImage
  • → Extending semantic interoperability techniques
    to decentralized settings

9
2.1 Peer Data Management Systems
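(Figure: peers with local schemas linked by pairwise mappings, e.g. es:cDate mapped to xap:CreateDate; other labels in the figure: weather, article)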
  • Local pairwise mappings
  • Peer Data Management Systems (PDMS)
  • Pairwise mappings overcome global schema
    heterogeneity
  • Transitive closures on mapping operations
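
A minimal sketch of what a transitive closure on mapping operations can look like, assuming each pairwise mapping is a plain attribute-to-attribute dictionary. The schema names, attribute names, and the `reformulate` helper are hypothetical illustrations, not part of the presented system.

```python
from typing import Dict, List, Optional

# A pairwise mapping translates attribute names of one schema into another.
# An attribute that is absent from the dict has no counterpart and is lost.
Mapping = Dict[str, str]


def reformulate(attribute: str, path: List[Mapping]) -> Optional[str]:
    """Translate an attribute through a chain of mappings; None if it is lost."""
    current: Optional[str] = attribute
    for mapping in path:
        if current is None:
            return None
        current = mapping.get(current)
    return current


# Hypothetical mappings between three schemas (es, xap, winfs).
es_to_xap = {"es:cDate": "xap:CreateDate", "es:Author": "xap:Creator"}
xap_to_winfs = {"xap:CreateDate": "winfs:DateTaken", "xap:Creator": "winfs:Author"}
# The last mapping contains a deliberately faulty entry for the author attribute.
winfs_to_es = {"winfs:DateTaken": "es:cDate", "winfs:Author": "es:Title"}

cycle = [es_to_xap, xap_to_winfs, winfs_to_es]
date_round_trip = reformulate("es:cDate", cycle)     # comes back unchanged
author_round_trip = reformulate("es:Author", cycle)  # comes back as another attribute
lost = reformulate("es:Title", cycle)                # no mapping at the first hop
```

Composing mappings along a path lets a peer reach schemas it has no direct mapping to, but one faulty or incomplete link is enough to distort or drop an attribute.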

10
Problem: Precision/Recall Tradeoff
  • Semantic Query routing
  • To whom shall I forward a query posed against my
    local schema?
  • Some (most) mappings will be (partially) faulty
  • Low expressive power of mappings
  • Automatic schema alignment techniques
  • Granularity of conceptualizations
  • Local query resolution
  • Low recall
  • Flooding (PDMS)
  • Low precision
  • Standard deductive integration is not sufficient
  • Uncertainty on mappings and conceptualizations
  • abductive reasoning (on transitive closures of
    mappings)

11
2.2. Probabilistic Message Passing
  • Link-based analysis of the PDMS
  • Mapping Cycles
  • Parallel Paths
  • → Semantics as global agreement

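(Figure: mapping graph with mappings m0 to m5; along a cycle, the original query q is compared with its round-trip image: q VS m3(m4(m0(q))))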
12
Computing a Marginal for one cycle
observed
unknown
  • $P(m_0, m_1, m_2, m_3, f_0) = P(m_0)\,P(m_1)\,P(m_2)\,P(m_3)\,P(f_0 \mid m_0, m_1, m_2, m_3)$
  • $P(m_0 \mid f_0) = \sum_{m_1, m_2, m_3} P(m_0, m_1, m_2, m_3, f_0)\; P(f_0)^{-1}$
  • But feedbacks on different cycles are correlated
  • Need to express a global probabilistic model for
    the mapping graph
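
The single-cycle marginal can be checked by brute-force enumeration. The sketch below is illustrative only: the slide does not give the conditional P(f0 | m0, ..., m3), so the code assumes a simple model in which positive feedback is certain when all mappings are correct, impossible when exactly one is wrong, and has probability delta when two or more errors may compensate. The function names are hypothetical.

```python
from itertools import product


def feedback_likelihood(correct, positive, delta):
    """Assumed model for P(f0 | m0..mn); not taken from the slides."""
    wrong = sum(1 for c in correct if not c)
    if wrong == 0:
        p_positive = 1.0      # all mappings correct: the cycle closes
    elif wrong == 1:
        p_positive = 0.0      # a single error cannot be compensated
    else:
        p_positive = delta    # several errors may cancel out
    return p_positive if positive else 1.0 - p_positive


def posterior_m0(priors, positive, delta=0.1):
    """P(m0 = correct | f0), summing the joint over all other mappings."""
    joint = {True: 0.0, False: 0.0}
    for correct in product((True, False), repeat=len(priors)):
        weight = feedback_likelihood(correct, positive, delta)
        for is_correct, prior in zip(correct, priors):
            weight *= prior if is_correct else 1.0 - prior
        joint[correct[0]] += weight
    # Dividing by joint[True] + joint[False] is the multiplication by P(f0)^-1.
    return joint[True] / (joint[True] + joint[False])


after_positive = posterior_m0([0.7] * 4, positive=True)
after_negative = posterior_m0([0.7] * 4, positive=False)
```

Under these assumptions, positive feedback on a four-mapping cycle raises the belief in m0 above its 0.7 prior, and negative feedback lowers it.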

13
A Brief Intro to Factor-Graphs
  • $g(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)$
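
To illustrate why the factorization matters, the sketch below computes the marginal of x2 twice: once by summing g over all assignments, and once by multiplying the two messages sent to x2 by the factors fA and fB. The factor tables are made-up values over binary variables, chosen only for the example.

```python
from itertools import product

# Hypothetical factor tables over binary variables.
F_A = [[0.9, 0.1],
       [0.4, 0.6]]            # F_A[x1][x2]


def f_a(x1, x2):
    return F_A[x1][x2]


def f_b(x2, x3, x4):
    return 0.8 if x2 == (x3 & x4) else 0.2


def g(x1, x2, x3, x4):
    return f_a(x1, x2) * f_b(x2, x3, x4)


def marginal_x2_brute_force():
    """Sum the global function over every assignment of x1, x3, x4."""
    totals = [0.0, 0.0]
    for x1, x2, x3, x4 in product((0, 1), repeat=4):
        totals[x2] += g(x1, x2, x3, x4)
    z = sum(totals)
    return [t / z for t in totals]


def marginal_x2_sum_product():
    """Multiply the messages that each factor sends to x2."""
    msg_a = [sum(f_a(x1, x2) for x1 in (0, 1)) for x2 in (0, 1)]
    msg_b = [sum(f_b(x2, x3, x4) for x3, x4 in product((0, 1), repeat=2))
             for x2 in (0, 1)]
    unnormalized = [a * b for a, b in zip(msg_a, msg_b)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


brute = marginal_x2_brute_force()
local = marginal_x2_sum_product()
```

Because this factor graph is a tree, the two results agree exactly; each factor only sums over its own neighbors.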

14
Deriving PDMS Factor-Graphs
15
PDMS Factor-Graphs
  • Cyclic graph
  • Junction Tree? Clustering / Stretching of
    variables?
  • Not applicable (decentralization)
  • Iterative Sum-Product
  • Approximate results
  • How to perform iterative sum-product by message
    passing on the mapping graph?
  • Message passing in factor graph does not
    correspond to connectivity of mapping graph
  • We want to rely on decentralized computations
    only
  • Locality VS Globality of nodes in the factor
    graph
  • Mappings: local
  • Feedback factor: common, global knowledge
  • Observed feedback variables: neighborhood
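
A minimal sketch of iterative sum-product on a cyclic factor graph, run here in one process over a global view rather than embedded in the mapping graph as the talk proposes. The feedback model (certain positive feedback with no error, none with exactly one error, delta otherwise), the function names, and the example cycles are all assumptions made for illustration.

```python
from itertools import product


def feedback_factor(positive, delta=0.1):
    """Assumed P(observed feedback | mappings); 1 = correct, 0 = incorrect."""
    def factor(assignment):
        wrong = assignment.count(0)
        p_pos = 1.0 if wrong == 0 else (0.0 if wrong == 1 else delta)
        return p_pos if positive else 1.0 - p_pos
    return factor


def loopy_sum_product(variables, factors, priors, iterations=20):
    """factors: list of (scope, function); priors: P(variable = correct)."""
    pairs = [(i, v) for i, (scope, _) in enumerate(factors) for v in scope]
    f2v = {(i, v): [1.0, 1.0] for i, v in pairs}
    v2f = {(v, i): [1.0 - priors[v], priors[v]] for i, v in pairs}
    for _ in range(iterations):
        # Factor-to-variable messages: sum out all other variables in scope.
        for i, (scope, fn) in enumerate(factors):
            for v in scope:
                out = [0.0, 0.0]
                for assign in product((0, 1), repeat=len(scope)):
                    weight = fn(assign)
                    for u, a in zip(scope, assign):
                        if u != v:
                            weight *= v2f[(u, i)][a]
                    out[assign[scope.index(v)]] += weight
                z = sum(out)
                f2v[(i, v)] = [o / z for o in out]
        # Variable-to-factor messages: prior times messages from other factors.
        for (v, i) in v2f:
            out = [1.0 - priors[v], priors[v]]
            for (j, u), msg in f2v.items():
                if u == v and j != i:
                    out = [o * m for o, m in zip(out, msg)]
            z = sum(out)
            v2f[(v, i)] = [o / z for o in out]
    beliefs = {}
    for v in variables:
        b = [1.0 - priors[v], priors[v]]
        for (j, u), msg in f2v.items():
            if u == v:
                b = [x * m for x, m in zip(b, msg)]
        beliefs[v] = b[1] / sum(b)
    return beliefs


names = ["m0", "m1", "m2", "m3"]
prior = {m: 0.7 for m in names}
# Two cycles sharing m1 and m2, which makes the factor graph cyclic.
all_positive = loopy_sum_product(names, [
    (("m0", "m1", "m2"), feedback_factor(True)),
    (("m1", "m2", "m3"), feedback_factor(True)),
], prior)
mixed = loopy_sum_product(names, [
    (("m0", "m1", "m2"), feedback_factor(True)),
    (("m1", "m2", "m3"), feedback_factor(False)),
], prior)
```

The resulting beliefs are approximations, since the two feedback factors are correlated through the shared mappings. In the mixed case, the mapping that appears only in the cycle with negative feedback (m3) ends up below its prior.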

16
Embedded Message-Passing (1)
17
Embedded Message-Passing (2)
18
Sending Messages in the Mapping Graph
  • Message-Passing Schedules
  • Periodic
  • Lazy (piggybacking on query forwarding)
  • No message overhead

19
Implemented System
  • Schemas
  • Import from OWL (Web Ontology Language)
  • Mappings
  • KnowledgeWeb Ontology Alignment API
  • Import from RDF/XML
  • Automated on-the-fly creation
  • Comparison to standard alignments
  • Automatic derivation of quality measures
    $P(m = \text{correct} \mid F)$ for the mappings using
    iterative message-passing
  • Per-Hop Forwarding Behaviors (Semantic Gossiping)

20
Some (Preliminary) Results: Convergence
(undirected example graph, prior = 0.7, delta = 0.1)
21
Impact Of Cycle Length
(simple cycle, prior = 0.5)
22
Fault-tolerance (faulty links)
(undirected example graph, prior = 0.8, delta = 0.1)
23
Preliminary Results: EON (Alignment contest)
  • Worst-case scenario: no prior knowledge
  • Set of 6 schemas on bibliographic data (approx.
    30-40 attributes)
  • 396 generated attribute mappings (84 incorrect)

24
2.3. Semantic Gossiping
  • Selectively reformulate queries through mapping
    links
  • Semantic distances
  • Cycles analysis (?)
  • Results analysis
  • Syntactic distance
  • Lost predicates

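(Figure: the query π_Titre σ_{Auteur=Joe}(R1) is reformulated along mapping links into π_Title σ_{Author=Joe}(R2), π_Title σ_{Creator=Joe}(R3), and π_Title σ_{Creator=Joe}(R4); faulty reformulations, such as π_Title σ_{Creature=Joe}(R5) and a reformulation over R4 that keeps only σ_{Author=Joe}, are marked X)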
25
Self-Organization
  • Two types of self-organization
  • Static network
  • Self-organizing dissemination of queries (?)
  • Dynamic network
  • Self-organizing network of mappings
  • Idea
  • Quality evaluation of mappings through Semantic
    Gossiping
  • Drop low quality links
  • Reorganized network leads to different quality
    evaluation
  • Dynamic network changes
  • → self-organizing, self-referential semantic
    network

26
Some Results (1)
Sensitivity to TTL (cycle analysis only, 25
schemas, 4 concepts)
27
Some Results (2)
Scalability (results analysis only, 4 concepts,
TTL = 3, misclassification rate = 0.1, 2
documents/peer on avg.)
28
2.4. Semantic Interoperability in the Large
  • Do we have enough (good) mappings?
  • Modeling semantic interoperability
  • The semantic connectivity graph
  • Idea: as for physical network analyses, define a
    connectivity layer
  • Unweighted, non-redundant version of the
    Schema-to-schema graph
  • Observation
  • Peers in a set $P_s$ are semantically interoperable
    iff $S_s$ is strongly connected, with
    $S_s \equiv \{ s \mid \exists p \in P_s,\ p \leftrightarrow s \}$
  • Schema-to-Schema Graph
  • Logical model
  • Directed
  • Weighted
  • Redundant
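
The observation above reduces to a strong-connectivity test on the directed graph of schemas. A minimal sketch, assuming the connectivity layer is given as a set of schemas and a list of directed (source, target) mapping pairs; the helper names and example schemas are hypothetical.

```python
from collections import defaultdict


def reachable(start, edges):
    """Set of nodes reachable from start by following directed edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected(schemas, mappings):
    """True if every schema can reach every other one through mappings."""
    schemas = set(schemas)
    if not schemas:
        return True
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst in mappings:
        # Only mappings between schemas of the set count.
        if src in schemas and dst in schemas:
            forward[src].add(dst)
            backward[dst].add(src)
    root = next(iter(schemas))
    # Strongly connected iff root reaches everyone and everyone reaches root.
    return (schemas <= reachable(root, forward)
            and schemas <= reachable(root, backward))


ring = strongly_connected({"A", "B", "C"}, [("A", "B"), ("B", "C"), ("C", "A")])
chain = strongly_connected({"A", "B", "C"}, [("A", "B"), ("B", "C")])
```

A ring of unidirectional mappings is enough for every peer to reformulate queries toward every other schema, whereas a chain is not.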

29
Analyzing Semantic Interoperability in the Large
  • Analyzing semantic interoperability in
    large-scale, decentralized networks
  • Percolation theory for directed graphs
  • Based on recent graph-theoretic frameworks
  • Random graphs with specific degree distributions
    $p_{jk}$, clustering coefficients $cc$ and
    bidirectionality coefficient $bc$
  • Necessary condition for semantic interoperability
    in the large: $\sum_{j,k} (jk - j(bc + cc) - k)\, p_{jk} \geq 0$
  • Excellent approximations of the size of
    semantically interoperable clusters in the graph
  • Analysis: Sequence Retrieval System
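
The necessary condition can be evaluated directly from a list of mappings. The sketch below assumes that j is the in-degree and k the out-degree of a schema, and that bc and cc are supplied by the caller; the function names are hypothetical.

```python
from collections import Counter


def degree_distribution(schemas, mappings):
    """Empirical p_jk: fraction of schemas with in-degree j and out-degree k."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in mappings:
        out_deg[src] += 1
        in_deg[dst] += 1
    counts = Counter((in_deg[s], out_deg[s]) for s in schemas)
    total = len(schemas)
    return {jk: n / total for jk, n in counts.items()}


def connectivity_indicator(p_jk, bc, cc):
    """Sum over (j, k) of (jk - j(bc + cc) - k) * p_jk."""
    return sum((j * k - j * (bc + cc) - k) * p for (j, k), p in p_jk.items())


schemas = ["A", "B", "C", "D"]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
p_ring = degree_distribution(schemas, ring)
ci_ring = connectivity_indicator(p_ring, bc=0.0, cc=0.0)

# Isolated pairs with mappings in both directions: every link is bidirectional.
pairs = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")]
p_pairs = degree_distribution(schemas, pairs)
ci_pairs = connectivity_indicator(p_pairs, bc=1.0, cc=0.0)
```

A unidirectional ring sits exactly at the threshold (indicator 0), while isolated bidirectional pairs fall below it (indicator -1), which matches the intuition that such pairs cannot form a large interoperable cluster.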

30
3. Applications
  • GridVine
  • Self-organizing semantic overlay network
  • PicShark
  • Self-organizing middleware to export pictures and
    create mappings

31
3.1 GridVine
  • Building large-scale semantic systems
  • Self-organizing semantic overlay network

32
Semantic Mediation Layer
(Figure: three-layer architecture with the semantic mediation layer on top of the overlay layer on top of the physical layer; adjacent layers are labeled Correlated / Uncorrelated)
33
Features
  • Based on the P-Grid P2P structure
  • Distributed Hash Table developed at EPFL
  • Self-organized, scalable, decentralized
  • Resolves key-based searches in O(log n), even
    for unbalanced trees
  • Semantic Web compliant
  • RDF triples, RDFS schemas, OWL mappings
  • Structured searches
  • RDQL queries
  • Semantic Gossiping
  • Fosters semantic interoperability

34
GridVine: Annotating Content
35
Decentralized Query Resolution: Overview
36
3.2 PicShark
  • Where do the translation links come from?
  • Middleware for sharing semi-structured metadata
    attached to pictures and creating translation
    links

(Figure: PicShark architecture. Labels: PicShark, Features Extractor, 60 moments, Metadata Extractor, Information Tracker, XMP, PSP, WinFS, Insert, Retrieve, (Distributed) Hashtable (e.g., GridVine))
37
Features
  • Self-Organization of mappings
  • Based on low-level features extracted from
  • Picture (color moment, textures)
  • Structured Metadata (lexicographical analysis)
  • Self-Organization of annotations
  • Probabilistic propagation of annotations between
    similar individuals
  • Self-Organization of query propagation
  • Schema distance based on probabilistic
    subsumption
  • Propagation within a certain diameter
  • Driven by user interaction
  • Scalable
  • Computationally expensive operations are local at
    the peers
  • Only simple in-network operations (look-ups)
  • (on-going) collaborative effort with Microsoft
    Research Asia

38
PicShark Prototype
39
4. Conclusions
  • Fundamental issue: interoperability in large-scale
    (semi-)structured environments
  • Content Sharing
  • Information search
  • Semantic Web?
  • Traditional techniques are not sufficient
  • Scale
  • Autonomy
  • Uncertainty
  • Self-organizing, decentralized stochastic
    processes
  • Data Indexation
  • Data Integration
  • Query dissemination

40
Some References (1)
Semantic Gossiping
  • A Framework for Semantic Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. SIGMOD Record, 31(4), December 2002.
  • The Chatty Web: Emergent Semantics through Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. International World Wide Web Conference (WWW 03).
  • Probabilistic Message-Passing in Peer-Data Management Systems. Philippe Cudré-Mauroux, Karl Aberer, and Andras Feher. International Conference on Data Engineering (ICDE 06).
Self-Organizing Semantics
  • Start making sense: The Chatty Web approach for global semantic agreements. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. Journal of Web Semantics, 1(1), December 2003.
  • Emergent Semantics Principles and Issues. Karl Aberer, Philippe Cudré-Mauroux and Aris M. Ouksel (editors), Tiziana Catarci, Mohand-Said Hacid, Arantza Illarramendi, Vipul Kashyap, Massimo Mecella, Eduardo Mena, Erich J. Neuhold, Olga De Troyer, Thomas Risse, Monica Scannapieco, Fèlix Saltor, Luca de Santis, Stefano Spaccapietra, Steffen Staab and Rudi Studer. International Conference on Database Systems for Advanced Applications (DASFAA 04).
41
Some References (2)
Semantic Interoperability in the Large
  • A Necessary Condition For Semantic Interoperability In The Large. Philippe Cudré-Mauroux and Karl Aberer. International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 04).
  • Analyzing Semantic Interoperability in Bioinformatic Database Networks. Philippe Cudré-Mauroux, Julien Gaugaz, Adriana Budura and Karl Aberer. Semantic Network Analysis (SNA 05).
  • GridVine: Building Internet-Scale Semantic Overlay Networks. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth and Tim van Pelt. International Semantic Web Conference (ISWC 04).
  • Semantic Overlay Networks (tutorial). Karl Aberer and Philippe Cudré-Mauroux. International Conference on Very Large Data Bases (VLDB 05).
  • More references at http://lsirpeople.epfl.ch/pcudre/
42
Questions?