A Uniform Approach to Data and Workflow Integration for the Life Sciences

1
A Uniform Approach to Data and Workflow
Integration for the Life Sciences
  • L. Zamboulis (1,2,3), N. Martin (1,3), A. Poulovassilis (1,3)
  • {lucas,nigel,ap}@dcs.bbk.ac.uk
  • (1) School of Computer Science and Information Systems, Birkbeck
  • (2) Dept. of Biochemistry and Molecular Biology, UCL
  • (3) London Knowledge Lab

2
Projects
  • AutoMed (EPSRC, 2001-2003; still running)
    • Birkbeck and Imperial College
    • http://www.doc.ic.ac.uk/automed
    • Framework for the integration of heterogeneous
      data sources (mediator approach)
  • ISPIDER (BBSRC, 2005-2007)
    • Birkbeck, UCL, Manchester, EBI
    • http://www.ispider.manchester.ac.uk
    • Integrated Grid platform of proteomic resources

3
Outline
  • AutoMed
  • Data integration approaches
  • The BAV approach
  • The AutoMed system
  • ISPIDER
  • BioMap integration
  • ISPIDER integration
  • AutoMed Taverna interoperation

4
Data Integration
  • Global-As-View (GAV) approach: describe global schema (GS)
    constructs with view definitions over local schema (LSi)
    constructs
  • Local-As-View (LAV) approach: describe LSi
    constructs with view definitions over GS
    constructs

5
GAV Example
  • student(id,name,left,degree) =
    {x,y,z,w | ⟨x,y,z,w,_⟩ ∈ ug ∨ (⟨x,y,z,_⟩ ∈ phd ∧ w = 'phd')}
  • monitors(sno,id) =
    {x,y | (⟨y,_,_,_,x⟩ ∈ ug ∧ ⟨x,_,_,_⟩ ∉ phd) ∨ ⟨x,y⟩ ∈ supervises}
  • staff(sno,sname,dept) =
    {x,y,z | ⟨x,y,z⟩ ∈ supervisor ∨ (⟨x,y⟩ ∈ tutor ∧ ⟨x,_,_⟩ ∉ supervisor)}
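
A minimal sketch of how a GAV view can be read, using the student view as the example: the global relation is simply a query over the local relations, so a query on the global schema is answered by unfolding the view. The attribute names chosen for ug and phd and all sample tuples are assumptions made for illustration; only the arities come from the slide.

```python
# Hypothetical local relations (attribute names and tuples are assumptions).
# ug(id, name, left, degree, sno); phd(id, name, left, title)
ug = {
    (1, "Ann", 2004, "BSc", 10),
    (2, "Bob", 2005, "BA", 11),
}
phd = {
    (3, "Cat", 2006, "Query processing"),
}


def student():
    """GAV view: global relation student(id, name, left, degree) over ug and phd."""
    from_ug = {(x, y, z, w) for (x, y, z, w, _) in ug}
    from_phd = {(x, y, z, "phd") for (x, y, z, _) in phd}
    return from_ug | from_phd


def names_with_degree(degree):
    """A query on the global schema, answered by unfolding the view definition."""
    return sorted(y for (_, y, _, w) in student() if w == degree)


print(names_with_degree("phd"))
```

Under LAV the direction is reversed: each local relation would be described as a view over the global schema, and answering a global query requires rewriting it using those views rather than simple unfolding.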

6
Both-As-View (BAV) Approach
  • Schema transformation approach
  • For each pair (LSi, GS): incrementally modify
    LSi/GS to match GS/LSi

7
BAV Example
  • Transformation pathway consists of primitive
    transformations
  • Pathway contains both GAV and LAV definitions
  • Transformations are automatically reversible
  • Metadata in AutoMed Repository
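
The reversibility point can be sketched as follows. The primitive names (add/delete, extend/contract, rename) follow the BAV literature rather than this slide, and the Step class, the example pathway and its query strings are hypothetical; this is not the AutoMed API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Each primitive transformation paired with its inverse.
INVERSE = {
    "add": "delete",
    "delete": "add",
    "extend": "contract",
    "contract": "extend",
    "rename": "rename",
}


@dataclass(frozen=True)
class Step:
    kind: str                       # one of the keys of INVERSE
    construct: str                  # schema construct added, removed or renamed
    query: Optional[str] = None     # defining query, kept as an opaque string
    new_name: Optional[str] = None  # only used by rename

    def reverse(self) -> "Step":
        if self.kind == "rename":
            # rename(c, c') is undone by rename(c', c)
            return Step("rename", self.new_name, None, self.construct)
        # add <-> delete and extend <-> contract keep the same construct and query
        return Step(INVERSE[self.kind], self.construct, self.query)


def reverse_pathway(pathway: List[Step]) -> List[Step]:
    """Reverse the order of the steps and invert each one."""
    return [step.reverse() for step in reversed(pathway)]


# Hypothetical pathway from a local schema towards a global schema.
pathway = [
    Step("add", "student", "<query over ug and phd>"),
    Step("delete", "ug", "<query over student>"),
    Step("rename", "tutor", new_name="staff"),
]

for step in reverse_pathway(pathway):
    print(step)
```

Because every step carries the query needed by its inverse, reversing the reversed pathway gives back the original, which is why a single pathway can serve in both directions.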

8
AutoMed
  • Heterogeneous data integration system
  • AutoMed advantages
    • Subsumes GAV, LAV and GLAV
    • Handles heterogeneity; easily extensible
    • Virtual/materialised/hybrid integration
    • Schema evolution
  • Available components
    • GUI
    • data warehousing (data provenance, incremental view
      maintenance)
    • schema matching (model independent)
    • schema evolution
    • semi-automatic XML transformation/integration
    • P2P infrastructure
    • Grid support: OGSA-DAI/DQP
    • Parallel distributed query processing

9
Outline
  • AutoMed
  • Data integration approaches
  • The BAV approach
  • The AutoMed system
  • ISPIDER
  • Review
  • BioMap integration
  • ISPIDER integration
  • AutoMed Taverna interoperation

10
ISPIDER
  • Produce an integrated platform for biologists
  • Laboratories across the world produce vast
    amounts of experimental data
  • Combining efforts will result in added value
  • Challenges
    • Data are overlapping and heterogeneous
    • Data are rapidly updated/modified/evolved
    • Physical distance between repositories
    • Need for processing power

11
ISPIDER Objectives
16
BioMap Integration
  • Relational → XML (automatic)
  • Schema conformance and data cleansing (manual)
  • XML integration (automatic)
  • XML or SQL materialisation
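
The first step, turning relational data into XML automatically, can be sketched generically as below. This is a plain row-to-element mapping using Python's standard library, not the algorithm AutoMed uses; the table name, column names and rows are hypothetical.

```python
import xml.etree.ElementTree as ET


def table_to_xml(table_name, columns, rows):
    """Turn one relational table into an XML element:
    one <row> child per tuple, one grandchild per column value."""
    table_el = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(table_el, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return table_el


# Hypothetical table used only for illustration.
protein = table_to_xml("protein", ["id", "name"], [(1, "P53"), (2, "BRCA1")])
print(ET.tostring(protein, encoding="unicode"))
```

A mechanical mapping like this needs no user input, which is consistent with the step being automatic; the later conformance and cleansing step is where manual decisions come in.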

17
ISPIDER integration
  • Sources wrapped with OGSA-DAI
  • AutoMed toolkit wraps OGSA-DAI resources
  • Integration of OGSA-DAI resources
  • Queries submitted to AutoMed QP are evaluated
    with the help of OGSA-DQP

[Figure: architecture diagram. Components shown: AutoMed Query Processor, Metadata Repository, Global AutoMed Schema, schema transformation pathways, AutoMed Schemas, AutoMed Wrappers (AutoMed DAI wrappers, AutoMed DQP wrapper; IQL query/result), OGSA-DQP Distributed Query Processor with QDQS and QES (OQL query/result), OGSA-DAI GDS, DB]

22
Service Reconciliation
  • Plethora of offered bioinformatics services
    • → impedes service composition → service
      discovery used to reduce the search space
  • Semantically compatible services not able to
    interoperate
    • service technologies
    • differences in data model, modelling, data types
    • → need for service reconciliation
  • Reconciliation problem amplified due to
    • simple strings used rather than complex types
    • service providers disinclined to supply
      annotations

23
Proposed Approach
  • Service reconciliation via mediation using the
    AutoMed system
  • Requirements
    • Wide coverage of interoperability issues
    • Scalability of approach, promote reusability
    • Static/dynamic mediation

24
Heterogeneity Types
  • Data model heterogeneity
    • different data models (e.g. legacy flat files and
      XML)
    • different schema types (e.g. DTD and XML Schema)
    • a service producing/consuming XML data may not
      have an XML schema
  • Semantic heterogeneity
    • use of different terminology
    • describing the same information at different
      levels of granularity
  • Schematic heterogeneity
    • modelling the same information in different ways
    • amplified in XML due to hierarchical structure
      and elements vs. attributes
  • Data type heterogeneity
    • use of different primitive data types, e.g. int
      and varchar
    • use of different units, e.g. miles and yards
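
Data type heterogeneity is the easiest of these to illustrate. The sketch below reconciles a value held as text (varchar) in either miles or yards into an integer number of yards; the function name and the choice of yards as the target unit are assumptions for illustration.

```python
YARDS_PER_MILE = 1760


def reconcile_distance(value: str, unit: str) -> int:
    """Convert a distance given as text, in miles or yards,
    to an integer number of yards."""
    number = float(value)          # primitive type: varchar -> numeric
    if unit == "miles":
        number *= YARDS_PER_MILE   # unit scaling: miles -> yards
    elif unit != "yards":
        raise ValueError(f"unsupported unit: {unit}")
    return round(number)


print(reconcile_distance("2", "miles"))
print(reconcile_distance("350", "yards"))
```

The other three kinds of heterogeneity cannot be handled by a value-level conversion like this, since they concern how the data is structured and named rather than how a single value is encoded.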

25
Service Reconciliation By Schema Transformation
  • Heterogeneity types
    • service technology
    • data model
    • semantic
    • schematic
    • primitive data type/scaling
  • Assumption: the workflow tool addresses service
    technology reconciliation

26
Multiple Ontologies
  • In a setting where
    • X1 corresponds to O1 using C1
    • X2 corresponds to O2 using C2
    • there is a (direct or indirect) AutoMed pathway
      O1 → O2
  • Automatically produce a new set of correspondences
    C1′ for X1 and O2 (using query reformulation)
  • Setting is now identical to the single ontology
    setting
  • Proviso: C1 syntax must conform to our
    correspondences language
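
A heavily simplified sketch of the idea: if correspondences are treated as a mapping from X1 elements to O1 concepts, and the O1-to-O2 pathway is reduced to a one-to-one concept mapping, the new correspondences are just the composition of the two. The approach described here uses query reformulation over the pathway, which is more general; every name below is hypothetical.

```python
def compose(correspondences: dict, pathway: dict) -> dict:
    """Re-express correspondences to O1 as correspondences to O2.
    Entries whose O1 concept has no image in O2 are dropped."""
    return {
        x: pathway[o1]
        for x, o1 in correspondences.items()
        if o1 in pathway
    }


# C1: elements of schema X1 mapped to concepts of ontology O1 (hypothetical).
c1 = {"/entry/acc": "O1:Accession", "/entry/seq": "O1:Sequence"}

# Pathway between O1 and O2, reduced here to a concept-to-concept mapping.
o1_to_o2 = {"O1:Accession": "O2:ProteinId", "O1:Sequence": "O2:AminoAcidSequence"}

c1_prime = compose(c1, o1_to_o2)
print(c1_prime)
```

Once X1 is expressed against O2, both schemas refer to the same ontology and the single-ontology case applies.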

27
Architectures
  • A workflow tool can use our approach either
    dynamically or statically
  • Mediation service
    • Workflow tool invokes service S1 and receives its
      output
    • Workflow tool submits the output of S1, the schema of
      S2 and the two sets of correspondences to an
      AutoMed service
    • The AutoMed service transforms the output of S1
      into a suitable input for consumption by S2
  • Shim generation
    • AutoMed is used to generate a shim for services
      S1 and S2
    • XMLDSS schema transformation algorithm is
      (currently) tightly coupled with AutoMed →
      functionality exported as a single XQuery query
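
The two architectures can be contrasted with a small sketch. The services s1 and s2, their data formats and the transform function are all hypothetical stand-ins; in particular, transform stands in for whatever AutoMed would produce and is not AutoMed code.

```python
def s1(query: str) -> str:
    """Hypothetical service S1: returns a flat 'accession|sequence' record."""
    return "P12345|MKTAYIAK"


def s2(xml_input: str) -> int:
    """Hypothetical service S2: expects <protein>...<seq>...</seq></protein>
    and returns the sequence length."""
    start = xml_input.index("<seq>") + len("<seq>")
    end = xml_input.index("</seq>")
    return end - start


def transform(s1_output: str) -> str:
    """Stand-in for the transformation from S1's output to S2's input."""
    acc, seq = s1_output.split("|")
    return f"<protein><acc>{acc}</acc><seq>{seq}</seq></protein>"


def run_with_mediation_service(query: str) -> int:
    """Mediation service: the workflow tool hands S1's output
    to the mediator at run time, then calls S2 with the result."""
    out1 = s1(query)
    mediated = transform(out1)  # would be a remote call to the mediation service
    return s2(mediated)


def compose(*steps):
    """Shim generation: the transformation is produced once and
    wired into the workflow as an ordinary step."""
    def pipeline(value):
        for step in steps:
            value = step(value)
        return value
    return pipeline


workflow_with_shim = compose(s1, transform, s2)

print(run_with_mediation_service("any query"))
print(workflow_with_shim("any query"))
```

Both routes give the same result; the difference is where the transformation lives. With a mediation service the workflow depends on the mediator at run time, whereas a generated shim can be deployed like any other service.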

28
References
  • AutoMed
    • P.J. McBrien and A. Poulovassilis, Data
      Integration by Bi-Directional Schema
      Transformation Rules, Proc. International
      Conference on Data Engineering (ICDE), 2003
  • ISPIDER
    • M. Maibaum, L. Zamboulis, G. Rimon, N.
      Martin, A. Poulovassilis, Cluster based
      Integration of Heterogeneous Biological Databases
      using the AutoMed toolkit, Proc. Data Integration
      in the Life Sciences (DILS), 2005
    • L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen,
      A. Jones, N. Martin, A. Poulovassilis, S.
      Hubbard, S. M. Embury, N. W. Paton, Data Access
      and Integration in the ISPIDER Proteomics Grid,
      Proc. Data Integration in the Life Sciences
      (DILS), 2006
    • L. Zamboulis, N. Martin, A. Poulovassilis,
      Bioinformatics Service Reconciliation By
      Heterogeneous Schema Transformation, Proc. Data
      Integration in the Life Sciences (DILS), 2007