Title: Semantic Mediation of Scientific Data via Logic-Based Data Federation Software
1 Semantic Mediation of Scientific Data via Logic-Based Data Federation Software
- Amarnath Gupta
- Bertram Ludäscher
- Reagan Moore
- San Diego Supercomputer Center
- University of California, San Diego
2 Information Integration / Mediation
- Goal
- combine data from different sources such that the integrated whole is more than the sum of its isolated parts
- => SDSC/CSE MIX project (Mediation of Information in XML)
- Standard Scenarios
- C2B, e.g. comparison shopping
- AddAll = IntegratedView(amazon, barnesnoble, ...)
- B2B, e.g. marketplaces
- Virt_Market = IntegratedView(supplier_1, ... supplier_n)
- C2M, e.g. home-buyer
- Full_Picture = IntegratedView(Realtor, Crime, Schools, ...)
- One-World Mediation, e.g. join on ISBN
- Simple Multiple-World Mediation, e.g. join on ZIP
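The one-world case (all sources describe the same kind of object and share a key such as ISBN) can be sketched in a few lines of Python. This is only an illustration of the idea behind AddAll = IntegratedView(amazon, barnesnoble, ...): the `Offer` record, the `integrated_view` and `cheapest` helpers, and all sample data are assumptions made up for this sketch, not part of the MIX system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical record exported by a source wrapper."""
    source: str
    isbn: str
    title: str
    price: float


def integrated_view(*sources):
    """Group the offers of all sources by ISBN, the shared join key."""
    view = {}
    for offers in sources:
        for offer in offers:
            view.setdefault(offer.isbn, []).append(offer)
    return view


def cheapest(view, isbn):
    """Return the lowest-priced offer for an ISBN, or None if no source has it."""
    offers = view.get(isbn, [])
    return min(offers, key=lambda o: o.price) if offers else None


# Made-up sample data; the ISBN is a placeholder, not a real book.
amazon = [Offer("amazon", "0-000-00000-0", "Example Book", 31.50)]
barnesnoble = [Offer("barnesnoble", "0-000-00000-0", "Example Book", 29.95)]

add_all = integrated_view(amazon, barnesnoble)
best = cheapest(add_all, "0-000-00000-0")
```

The multiple-world case (e.g. joining realtor, crime, and school data on ZIP) has the same shape, except that the joined records describe different kinds of objects that merely share a location key.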
3 MIX Mediation Challenges
- MIX Mediator Architecture (middleware)
- wrappers wrap different data into a common format (XML)
- mediator combines the sources' XML views into IntegratedView
- MIX Mediator Components
- declarative mediator view definition language
- XMAS (XML Matching And Structuring) language, algebra, and first prototype 1999 (SIGMOD99, EDBT00, ...)
- query composition and rewriting, esp. with limited source capabilities
- on-demand (lazy) query processing of virtual XML docs (DOM-VXD)
- Blended Browsing and Querying user interface (BBQ)
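The on-demand (lazy) processing idea can be illustrated with Python generators: the integrated view is virtual, and a source is only touched when the client actually navigates to one of its elements. This is a minimal sketch under stated assumptions; `wrapper`, `mediator`, and the access `log` are hypothetical names for this example and do not reproduce the DOM-VXD interface.

```python
from itertools import islice
import xml.etree.ElementTree as ET


def wrapper(name, rows, log):
    """Hypothetical wrapper: turns one source row into one XML element, on demand."""
    for row in rows:
        log.append((name, row["id"]))  # record that the source was actually accessed
        elem = ET.Element("item", source=name)
        for key, value in row.items():
            ET.SubElement(elem, key).text = str(value)
        yield elem


def mediator(*wrapped_sources):
    """Virtual integrated view: elements are produced only when the client asks."""
    for source in wrapped_sources:
        yield from source


log = []
s1 = wrapper("s1", [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], log)
s2 = wrapper("s2", [{"id": 3, "name": "c"}], log)

view = mediator(s1, s2)
first = list(islice(view, 1))   # the client looks at the first element only
accessed_so_far = list(log)     # only one row of s1 has been fetched at this point
rest = list(view)               # navigating further pulls the remaining rows
```

The design point is that defining the view costs nothing; work is done per navigation step, which matters when sources are remote or large.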
4 New MIX Challenges from Scientific Applications
- Complex Data (S2S)
- SDSC's Scientific Data Applications (current/planned, e.g. Neurosciences SciDAC/SDM, NCMIR, NIH BIRN, Earth sciences, ...) show that syntactic/structural integration is insufficient for ...
- Complex Multiple-World Mediation Problems
- complex, disjoint, seemingly unrelated data
- hidden semantics in complex, indirect relationships
- => Semantic (aka Model/Knowledge-Based) Mediation
- lift mediation to the level of conceptual models (CMs)
- use domain experts' knowledge formalized as rules over CMs
- => Specialized Extensions
- temporal, geospatial, statistical, DQ/accuracy... operations
- => Extend Mediation Scope and Power via Deductive Rules
5 A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
6 Example for Formalizing Domain Knowledge: Domain Map (Ontology) for SYNAPSE and NCMIR
- A domain map comprises
- Description Logic facts ...
- - concepts ("classes")
- - roles ("associations")
- derived properties ...
- ... expressed as logic rules
- - (e.g. F-logic)
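To make the split between stored facts and derived properties concrete, here is a small sketch in plain Python that emulates two deductive rules over a toy domain map. It is not F-logic syntax and not the authors' implementation; the concept names, the two relations `isa` and `has`, and the `derive` function are assumptions chosen for illustration. The rules are: `isa` is transitive, `has` is transitive, and a concept inherits `has` links from its superconcepts.

```python
def derive(isa, has):
    """Compute the closure of the facts under the illustrative rules (naive fixpoint)."""
    isa_star, has_star = set(isa), set(has)
    changed = True
    while changed:
        # isa(X, Z) <- isa(X, Y), isa(Y, Z)
        new_isa = {(x, z) for (x, y) in isa_star for (y2, z) in isa_star if y == y2}
        # has(X, P) <- isa(X, Y), has(Y, P)   (inheritance along isa)
        new_has = {(x, p) for (x, y) in isa_star for (y2, p) in has_star if y == y2}
        # has(X, Q) <- has(X, P), has(P, Q)   (transitive containment)
        new_has |= {(x, q) for (x, p) in has_star for (p2, q) in has_star if p == p2}
        changed = not (new_isa <= isa_star and new_has <= has_star)
        isa_star |= new_isa
        has_star |= new_has
    return isa_star, has_star


# Stored facts: concepts ("classes") linked by roles ("associations").
isa_facts = {("Purkinje_Cell", "Neuron")}
has_facts = {("Neuron", "Dendrite"), ("Dendrite", "Spine")}

isa_all, has_all = derive(isa_facts, has_facts)
derived_only = has_all - has_facts
```

Here `derived_only` holds the links that were never stored but follow from the rules, for example that a Purkinje cell has spines.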
7 Domain Map Refinement
... source can register new concepts at the
mediator ...
8 Semantic Annotation Tool for Domain Scientists
9 Extended Mediator Architecture for Semantic Mediation
[Figure: architecture diagram. Labels: USER/Client; CM (Integrated View); Domain Map DM; Integrated View Definition IVD; Mediator Repository DB; CM Plug-In; CM Queries & Results (exchanged in XML); Logic API (capabilities); sources S2, S3]
- first results & demos: SSDBM00, VLDB00, ICDE01, NIH-HBP01
10 ANATOM Domain Map with Registered Data
11 Query Processing
12 Mediator System Architecture
13 Mediation Services: Source Registration (System Issues)
14 Mediation Services: Source Registration (Semantics Issues)
- Domain Map Registration
- provide concept space/ontology
- as a private object (myANATOM)
- merge with others (give semantic bridges)
- and check for conflicts
- Conceptual Model Registration
- schema classes, associations, attributes
- domain constraints
- put data into context (linking data to the domain map)
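The registration steps listed above can be sketched as a small in-memory registry. Everything here is an assumption for illustration: the `Mediator` class, its method names, and the choice to treat a cycle in the isa hierarchy as the "conflict" to check for are not taken from the slides, which do not say how conflicts are defined.

```python
def has_cycle(edges):
    """Return True if the directed (sub, super) edges contain a cycle."""
    graph = {}
    for sub, sup in edges:
        graph.setdefault(sub, set()).add(sup)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        cyclic = any(visit(n) for n in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return cyclic

    return any(visit(n) for n in list(graph))


class Mediator:
    """Hypothetical registry for domain maps and source conceptual models."""

    def __init__(self):
        self.concepts = set()
        self.isa = set()        # (sub, super) edges of the merged domain map
        self.anchors = {}       # (source, class) -> domain map concept

    def _commit(self, new_concepts, new_edges):
        # Validate on a copy so a rejected registration leaves the state untouched.
        concepts = self.concepts | set(new_concepts)
        edges = self.isa | set(new_edges)
        unknown = {c for edge in edges for c in edge} - concepts
        if unknown:
            raise ValueError(f"unknown concepts: {sorted(unknown)}")
        if has_cycle(edges):
            raise ValueError("conflict: isa cycle")
        self.concepts, self.isa = concepts, edges

    def register_domain_map(self, concepts, isa_edges):
        self._commit(concepts, isa_edges)

    def add_bridge(self, sub, sup):
        """Semantic bridge: assert that concept `sub` is a kind of `sup`."""
        self._commit((), [(sub, sup)])

    def register_source(self, source, class_to_concept):
        """Put source data into context by anchoring its classes in the domain map."""
        unknown = set(class_to_concept.values()) - self.concepts
        if unknown:
            raise ValueError(f"unknown concepts: {sorted(unknown)}")
        for cls, concept in class_to_concept.items():
            self.anchors[(source, cls)] = concept


m = Mediator()
m.register_domain_map({"Neuron", "Purkinje_Cell"}, {("Purkinje_Cell", "Neuron")})
m.register_domain_map({"Cell"}, set())     # a private map, in the spirit of myANATOM
m.add_bridge("Neuron", "Cell")             # bridge between the two maps
m.register_source("S1", {"PurkinjeRecord": "Purkinje_Cell"})

try:
    m.add_bridge("Cell", "Purkinje_Cell")  # would close a cycle, so it is rejected
    conflict = None
except ValueError as err:
    conflict = str(err)
```

Validating on a copy before committing keeps the merged map consistent even when a source proposes a bridge that cannot be accepted.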
15 Mediation Services: Client Registration
16 Other Existing Infrastructure
- Transparent Access to Remote Data Collections: Storage Resource Broker (SRB) and Metadata Catalog (MCAT)
- Production-Level Software
- PPDG interface to LBNL Storage Manager, collection creation, replication management
- Use of manual and automatic wrapper technology (Minerva, Roadrunner, V. Crescenzi, Università di Roma Tre)
- => XWrap Elite
17 SRB and the Particle Physics Data Grid
[Figure: SRB deployment diagram. Labels: Wisc Client 1; Wisc Client 2; S-Commands; SRB Server @Wisc; SRB Server @LBL; esrb.driver; esrb.server; IPC; HRM; FC; Disk cache; file caching; file purging; File caching request; Stage(), purge(), fileStatus()]
18 Year 1 Deliverables
- define interface metadata format (Critchlow)
- extend XWrap to generate wrappers using the interface metadata description instead of requiring human interaction (GT)
- develop a canonical XML-based query and response format as a dynamic interface between query engine and wrappers (Critchlow, GT, SDSC)
- communication via agent protocols? How about using digital library infrastructure (e.g. Simple Digital Library Interoperability Protocol, SDLIP)?
- use extended XWrap to create wrappers for the genomics domain for evaluation (GT)
- extend the SDSC query and metadata architecture to interoperate with the LLNL DataFoundry (SDSC, Critchlow)
- ... interoperation at the wrapper level: Minerva wrappers, XWrap
19 References
- Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, April 2001.
- Model-Based Information Integration in a Neuroscience Mediator System, B. Ludäscher, A. Gupta, M. E. Martone, demonstration track, 26th Intl. Conference on Very Large Databases (VLDB), Cairo, Egypt, September 2000.
- Knowledge-Based Integration of Neuroscience Data Sources, A. Gupta, B. Ludäscher, M. E. Martone, 12th Intl. Conference on Scientific and Statistical Database Management (SSDBM), Berlin, Germany, IEEE Computer Society, July 2000.