Semantic Mediation of Scientific Data via Logic-Based Data Federation Software

1
Semantic Mediation of Scientific Data via
Logic-Based Data Federation Software
  • Amarnath Gupta
  • Bertram Ludäscher
  • Reagan Moore
  • San Diego Supercomputer Center
  • University of California, San Diego

2
Information Integration / Mediation
  • Goal
    • combine data from different sources such that the
      integrated whole is more than the sum of its
      isolated parts
    • => SDSC/CSE MIX project (Mediation of Information
      in XML)
  • Standard Scenarios
    • C2B, e.g. comparison shopping
      AddAll = IntegratedView(amazon, barnesnoble, ...)
    • B2B, e.g. marketplaces
      Virt_Market = IntegratedView(supplier_1, ...,
      supplier_n)
    • C2M, e.g. home-buyer
      Full_Picture = IntegratedView(Realtor, Crime,
      Schools, ...)

One-World Mediation: e.g. join on ISBN
Simple Multiple-World Mediation: e.g. join on ZIP
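
To make the "one-world" case concrete, here is a minimal Python sketch of an integrated view that joins two book sources on ISBN. It is only an illustration: the Offer record, the function names, the ISBN strings and the prices are invented for this example and are not part of MIX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One book offer as exported by a (hypothetical) source wrapper."""
    isbn: str
    title: str
    price: float
    source: str


def integrated_view(*sources):
    """Group the offers of all sources by ISBN, the shared join key."""
    view = {}
    for offers in sources:
        for offer in offers:
            view.setdefault(offer.isbn, []).append(offer)
    return view


def cheapest(view, isbn):
    """Return the lowest-priced offer for an ISBN, or None if unknown."""
    offers = view.get(isbn, [])
    return min(offers, key=lambda o: o.price) if offers else None


# Invented sample data; only the store names come from the slide.
amazon = [Offer("isbn-1", "Book A", 30.0, "amazon"),
          Offer("isbn-2", "Book B", 12.0, "amazon")]
barnesnoble = [Offer("isbn-1", "Book A", 27.5, "barnesnoble")]

add_all = integrated_view(amazon, barnesnoble)
best = cheapest(add_all, "isbn-1")
```

The join works here because both sources describe the same kind of object and share a key. The multiple-world cases on the later slides are harder precisely because no such shared key exists.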
3
MIX Mediation Challenges
  • MIX Mediator Architecture (middleware)
    • wrappers wrap different data into a common format
      (XML)
    • mediator combines the sources' XML views into an
      IntegratedView
  • MIX Mediator Components
    • declarative mediator view definition language
    • XMAS (XML Matching And Structuring) language,
      algebra, and first prototype 1999
      [SIGMOD99, EDBT00, ...]
    • query composition and rewriting, esp. with limited
      source capabilities
    • on-demand (lazy) query processing of virtual
      XML docs (DOM-VXD)
    • Blended Browsing and Querying user interface
      (BBQ)
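
The wrapper/mediator split and the idea of on-demand evaluation can be sketched with Python generators. This is an analogy under stated assumptions, not the DOM-VXD interface: wrap_rows, integrated_view, the record element and the sample rows are all hypothetical names chosen for the example.

```python
import itertools
import xml.etree.ElementTree as ET


def wrap_rows(source_name, rows, log):
    """Wrapper: lazily turn native rows (dicts) into XML elements."""
    for row in rows:
        log.append(source_name)  # record that the source was actually read
        elem = ET.Element("record", source=source_name)
        for key, value in row.items():
            ET.SubElement(elem, key).text = str(value)
        yield elem


def integrated_view(*wrapped_sources):
    """Mediator: chain the source views without materializing them."""
    return itertools.chain(*wrapped_sources)


log = []
s1 = wrap_rows("s1", [{"id": 1}, {"id": 2}, {"id": 3}], log)
s2 = wrap_rows("s2", [{"id": 4}], log)
view = integrated_view(s1, s2)

# The client navigates only into the first two records of the virtual view.
first_two = list(itertools.islice(view, 2))
first_xml = ET.tostring(first_two[0], encoding="unicode")
```

After this runs, log contains two entries for s1 and none for s2: the second source is never contacted because the client did not ask for that part of the view.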

4
New MIX Challenges from Scientific Applications
  • Complex Data (S2S)
    • SDSC's Scientific Data Applications
      (current/planned, e.g. Neurosciences SciDAC/SDM,
      NCMIR, NIH BIRN, Earth sciences, ...) show that
      syntactic/structural integration is insufficient
      for ...
  • Complex Multiple-World Mediation Problems
    • complex, disjoint, seemingly unrelated data
    • hidden semantics in complex, indirect
      relationships
  • => Semantic (aka Model/Knowledge-Based) Mediation
    • lift mediation to the level of conceptual models
      (CMs)
    • use domain experts' knowledge formalized as rules
      over CMs
  • => Specialized Extensions
    • temporal, geospatial, statistical, DQ/accuracy, ...
      operations
  • => Extend Mediation Scope and Power via Deductive
    Rules

5
A Neuroscience Question
What is the cerebellar distribution of rat
proteins with more than 70% homology with human
NCS-1? Any structure specificity? How about other
rodents?
6
Example for Formalizing Domain Knowledge: Domain
Map (Ontology) for SYNAPSE and NCMIR
  • A domain map comprises
    • Description Logic facts ...
      - concepts ("classes")
      - roles ("associations")
    • derived properties ...
      ... expressed as logic rules (e.g. F-logic)
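
The split between stored facts and rule-derived properties can be illustrated with a small Python sketch. It is not F-logic and not the actual SYNAPSE/NCMIR domain map: the concept names, the two roles (isa, has) and the inheritance rule are assumptions picked for the example.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


# Stored facts (illustrative only): concepts linked by two roles.
ISA = {("purkinje_cell", "neuron"), ("neuron", "cell")}
HAS = {("neuron", "dendrite"), ("dendrite", "spine")}


def derive(isa, has):
    """Apply three rules: isa is transitive, has is transitive,
    and a concept inherits the parts of its superconcepts
    (X isa Y and Y has Z implies X has Z)."""
    isa_star = transitive_closure(isa)
    has_star = transitive_closure(has)
    inherited = {(x, z)
                 for (x, y) in isa_star
                 for (y2, z) in has_star
                 if y == y2}
    return isa_star, has_star | inherited


isa_derived, has_derived = derive(ISA, HAS)
```

Nothing states directly that a Purkinje cell has spines. The pair ("purkinje_cell", "spine") appears in has_derived only because the rules combine the stored isa and has facts.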

7
Domain Map Refinement
... source can register new concepts at the
mediator ...
8
Semantic Annotation Tool for Domain Scientists
9
Extended Mediator Architecture for Semantic
Mediation
[Architecture diagram. Labels: USER/Client; CM
(Integrated View); Domain Map DM; Integrated View
Definition IVD; Mediator Repository DB; CM Plug-In;
CM Queries & Results (exchanged in XML); Logic API
(capabilities); sources S2, S3.]
First results & demos: SSDBM00, VLDB00, ICDE01,
NIH-HBP01
10
ANATOM Domain Map with Registered Data
11
Query Processing
12
Mediator System Architecture
13
Mediation Services: Source Registration (System
Issues)
14
Mediation Services: Source Registration
(Semantics Issues)
  • Domain Map Registration
    • provide concept space/ontology
    • as a private object (myANATOM)
    • merge with others (give semantic bridges)
      and check for conflicts
  • Conceptual Model Registration
    • schema classes, associations, attributes
    • domain constraints
    • put data into context (linking data to the
      domain map)
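
The registration steps above can be sketched in Python as follows. This is a toy model under stated assumptions: the Mediator class, its method names, the concept names and the notion of "conflict" used here (a cycle in the isa hierarchy after merging) are all hypothetical and stand in for whatever checks the real mediator performs.

```python
def reachable(edges):
    """Transitive closure of a set of (sub, super) edges."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


class Mediator:
    def __init__(self):
        self.isa = set()     # merged concept hierarchy
        self.anchors = {}    # concept -> set of (source, object_id)

    def concepts(self):
        return {c for edge in self.isa for c in edge}

    def register_domain_map(self, edges, bridges=()):
        """Merge a source's map via its semantic bridges; reject cycles."""
        candidate = self.isa | set(edges) | set(bridges)
        cyclic = sorted(a for (a, b) in reachable(candidate) if a == b)
        if cyclic:
            raise ValueError(f"conflicting concepts: {cyclic}")
        self.isa = candidate

    def register_data(self, source, object_id, concept):
        """Put a data object into context by linking it to a concept."""
        if concept not in self.concepts():
            raise KeyError(concept)
        self.anchors.setdefault(concept, set()).add((source, object_id))

    def objects_under(self, concept):
        """All registered objects anchored at a concept or below it."""
        below = {a for (a, b) in reachable(self.isa) if b == concept}
        below.add(concept)
        return {obj for c in below for obj in self.anchors.get(c, set())}


m = Mediator()
m.register_domain_map({("neuron", "cell")})                 # shared map
m.register_domain_map({("my_purkinje", "my_neuron")},       # private map
                      bridges={("my_neuron", "neuron")})
m.register_data("S1", "img-17", "my_purkinje")
found = m.objects_under("cell")
```

Because of the bridge from my_neuron to neuron, a query at the shared concept "cell" also returns the object that the source registered under its private concept.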

15
Mediation Services: Client Registration
16
Other Existing Infrastructure
  • Transparent Access to Remote Data Collections:
    Storage Resource Broker (SRB) and Metadata
    Catalog (MCAT)
  • Production-Level Software
  • PPDG interface to LBNL Storage Manager,
    collection creation, replication management
  • Use of manual and automatic wrapper technology
    (Minerva, Roadrunner, V. Crescenzi, Università di
    Roma Tre)
  • => XWrap Elite

17
SRB and the Particle Physics Data Grid
[Diagram. Labels: Wisc Client 1; Wisc Client 2;
S-Commands; SRB Server @Wisc; SRB Server @LBL;
esrb.driver; IPC; esrb.server; HRM; FC; Disk cache;
File caching request; file caching; file purging;
Stage(), purge(), fileStatus().]
18
Year 1 Deliverables
  • define interface metadata format (Critchlow)
  • extend XWrap to generate wrappers using the
    interface metadata description instead of
    requiring human interaction (GT)
  • develop a canonical XML-based query and response
    format as a dynamic interface between query
    engine and wrappers (Critchlow, GT, SDSC)
  • communication via agent protocols? How about
    using digital library infrastructure (e.g. Simple
    Digital Library Interoperability Protocol, SDLIP)?
  • use extended XWrap to create wrappers for the
    genomics domain for evaluation (GT)
  • extend the SDSC query and metadata architecture
    to interoperate with the LLNL DataFoundry (SDSC,
    Critchlow)
  • ... interoperation at the wrapper level: Minerva
    wrappers, XWrap
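
One of the deliverables is a canonical XML-based query and response format between the query engine and the wrappers. The slides do not define that format, so the Python sketch below invents one purely for illustration: the element names (query, eq, response, record), the attribute names and the sample rows are all assumptions.

```python
import xml.etree.ElementTree as ET


def build_query(collection, **conditions):
    """Query engine side: encode equality conditions as an XML query."""
    query = ET.Element("query", collection=collection)
    for attr, value in sorted(conditions.items()):
        ET.SubElement(query, "eq", attr=attr).text = str(value)
    return ET.tostring(query, encoding="unicode")


def answer(query_xml, rows):
    """Wrapper side: evaluate the query over native rows, reply in XML."""
    query = ET.fromstring(query_xml)
    wanted = {eq.get("attr"): eq.text for eq in query.findall("eq")}
    response = ET.Element("response", collection=query.get("collection"))
    for row in rows:
        if all(str(row.get(k)) == v for k, v in wanted.items()):
            record = ET.SubElement(response, "record")
            for key, value in row.items():
                ET.SubElement(record, key).text = str(value)
    return ET.tostring(response, encoding="unicode")


# Invented rows standing in for a wrapped genomics source.
rows = [{"gene": "g1", "organism": "rat"},
        {"gene": "g2", "organism": "mouse"}]

query_xml = build_query("genes", organism="rat")
response_xml = answer(query_xml, rows)
records = ET.fromstring(response_xml).findall("record")
```

The point of a shared format is that the query engine never needs to know how a given wrapper was produced, whether by hand or generated by a tool such as XWrap.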

19
References
  • Model-Based Mediation with Domain Maps, B.
    Ludäscher, A. Gupta, M. E. Martone, 17th Intl.
    Conference on Data Engineering (ICDE),
    Heidelberg, Germany, IEEE Computer Society, April
    2001.
  • Model-Based Information Integration in a
    Neuroscience Mediator System, B. Ludäscher, A.
    Gupta, M. E. Martone, demonstration track, 26th
    Intl. Conference on Very Large Databases (VLDB),
    Cairo, Egypt, September 2000.
  • Knowledge-Based Integration of Neuroscience Data
    Sources, A. Gupta, B. Ludäscher, M. E. Martone,
    12th Intl. Conference on Scientific and
    Statistical Database Management (SSDBM), Berlin,
    Germany, IEEE Computer Society, July 2000.