Title: IXodus: a knowledge discovery process based on the SIMDAT-Pharma GRID technologies
1. IXodus: a knowledge discovery process based on the SIMDAT-Pharma GRID technologies
- Richard Kamuzinzi
- Université Libre de Bruxelles, Bioinformatics
- June 5-7, 2007
- World Wide Workflow GRID ASIA 2007
- Singapore
2. SIMDAT Facts
- EU Information Society Technologies (IST)
- GRID Project
- Duration: 4 years
- Start date: September 1st, 2004
- 26 partners
3. Scope
- Product and Process Development (automobiles, aircraft, drugs, meteorological services):
  - Is complex
  - Involves several independent organizations at different locations
- Managing this complexity at a single site is too expensive => cost/risk sharing with partners => GRID
4. Strategic objectives
- to test and enhance Data Grid technology for product development and production process design,
- to develop federated versions of problem-solving environments by leveraging enhanced Grid services,
- to exploit Data Grids as a basis for distributed knowledge discovery,
- to promote de facto standards for these enhanced Grid technologies across a range of disciplines and sectors, as well as
- to raise awareness of the advantages of Data Grids in important industrial sectors
5. Project organization (SIMDAT-Pharma)
NEC, GSK, Inpharmatica, ULB, Fraunhofer SCAI-Bio and UKA
6. IXodus: The scientific problem
- Lyme disease: a significant source of human and animal pathology in temperate areas of the world (identified in the 90s)
- Caused by the bite of a tick of genus IXodes, infected by the pathogenic bacterium Borrelia burgdorferi
- The study of host-parasite interactions is an active research area, as 20 ticks have been found infected by the bacterium
- IXodus: a scientific protocol designed to characterise the genes expressed in the salivary gland of the tick IXodes ricinus at various stages of the host-parasite interaction process
7. IXodus: Workflow design (1)
- Going from the IXodus scientific protocol to the IXodus workflow (WF) design, we identify 2 use cases:
- New cDNA sequences: the workflow is fed daily with a batch of nucleic sequences from the systematic sequencing of thousands of salivary gland cDNAs
- Databank update: whenever a new version of a relevant biological databank appears, the core workflow analysis is re-enacted to discover potentially new information
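The two use cases can be read as two triggers for the same core analysis. The sketch below is only an illustration of that idea in Python, not the InforSense KDE workflow itself: `WorkflowState`, `run_core_analysis`, `on_new_batch` and `on_databank_update` are hypothetical names, and the "analysis" is a stand-in for the real bioinformatics tools.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Hypothetical in-memory state; the real workflow uses a knowledge DB."""
    databank_version: str = "v1"
    sequences: dict = field(default_factory=dict)    # sequence id -> nucleic sequence
    analysed_with: dict = field(default_factory=dict)  # sequence id -> databank version


def run_core_analysis(seq_id, sequence, databank_version):
    # Placeholder for the core analysis flow (assumption: real tools go here).
    return {"seq_id": seq_id, "length": len(sequence),
            "databank_version": databank_version}


def on_new_batch(state, batch):
    """Use case 1: a daily batch of new cDNA sequences arrives."""
    results = []
    for seq_id, sequence in batch.items():
        state.sequences[seq_id] = sequence
        results.append(run_core_analysis(seq_id, sequence, state.databank_version))
        state.analysed_with[seq_id] = state.databank_version
    return results


def on_databank_update(state, new_version):
    """Use case 2: re-enact the analysis for sequences not yet run on new_version."""
    state.databank_version = new_version
    results = []
    for seq_id, sequence in state.sequences.items():
        if state.analysed_with.get(seq_id) != new_version:
            results.append(run_core_analysis(seq_id, sequence, new_version))
            state.analysed_with[seq_id] = new_version
    return results
```

Recording which databank version each sequence was last analysed with is one simple way to make a repeated update notification harmless: a second call with the same version re-runs nothing.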
8. IXodus design (2): Use Case 1
9. IXodus design (3): Use Case 2
10. IXodus: Implementation
- Workflow technology platform: InforSense™ KDE
- Implementation is tightly coupled with the deployment environment, which is mainly driven by 2 kinds of constraints:
- GRID approach
- Semantic Web (SW) approach
11. IXodus implementation: The test-bed GRID approach
(Diagram label: Knowledge DB IXodus)
12. IXodus implementation: The test-bed SW approach
(Diagram labels: Semantic Broker, Service advertising)
13. IXodus implementation, InforSense KDE: The complete Workflow
14. IXodus implementation, InforSense KDE: User sequences gathering
15. IXodus implementation, InforSense KDE: Management of sequences overlapping
16. IXodus implementation, InforSense KDE: Main analysis flow (Bioinformatics tools)
17. IXodus implementation, InforSense KDE: Service instance selection and launching
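Service instance selection can be pictured as choosing, at run time, one of several advertised instances of the same service. The Python sketch below is a generic illustration under stated assumptions, not the mechanism used in the InforSense KDE workflow: `ServiceInstance`, its fields, and the "shortest queue wins" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    """Hypothetical description of one advertised service instance."""
    name: str          # logical service name
    site: str          # where the instance is deployed
    available: bool    # whether it currently accepts jobs
    queue_length: int  # number of pending jobs


def select_instance(instances, service_name):
    """Pick the available instance of service_name with the shortest queue."""
    candidates = [i for i in instances
                  if i.name == service_name and i.available]
    if not candidates:
        raise LookupError(f"no available instance of {service_name}")
    return min(candidates, key=lambda i: i.queue_length)
```

Keeping the rule in one small function means the workflow only names the logical service; which site actually runs it is decided when the step is launched.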
18. IXodus: General benefits
- Workflow tool maturity: designing complex WF to support demanding problems within a reasonable delivery time is a reality (RWD vs. RAD)
- The WF on GRID approach is really valuable and provides the confidence we need to face the data/services tsunami in Life sciences: the good news is...
19. IXodus: General benefits (2)
...thanks to WF technologies, scientists no longer fear the vertiginous beast (data/services explosion)
20. IXodus: Remaining challenges
- B2A Grids: we still need a precise understanding of the strategic benefits for both sides (win-win)
- WF technologies need a better distinction between abstract WF and operational WF
- How to decouple?
- Runtime service selection using the concept of rules?
- At design phase, the designer would appreciate a semantic approach to searching for services
- From WF to Service
- Partial (?args) vs. Complete (?args)
- Different profiles of user
- From WF to UI
- At design phase, need to define how WF actors interact with the whole system
- To leverage the WF log in order to generate textual information that would support the writing of scientific papers/notebooks (who, service_name, service_version, database_version, ...)
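The last challenge, turning the WF log into text for papers or notebooks, can be sketched in a few lines of Python. This is an illustration only: the field names come from the slide, but the `LogEntry` record, the sentence wording, and the sample values in the usage example are assumptions, not the actual log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    """One workflow log record, using the fields named on the slide."""
    who: str
    service_name: str
    service_version: str
    database_version: Optional[str] = None  # not every service uses a databank


def describe(entry):
    """Render one log record as a sentence (hypothetical wording)."""
    text = f"{entry.who} ran {entry.service_name} (version {entry.service_version})"
    if entry.database_version:
        text += f" against databank release {entry.database_version}"
    return text + "."


def methods_paragraph(entries):
    """Join the sentences for all distinct records, keeping log order."""
    seen = set()
    sentences = []
    for entry in entries:
        if entry not in seen:  # skip exact repeats of the same step
            seen.add(entry)
            sentences.append(describe(entry))
    return " ".join(sentences)
```

For example, with made-up values, `describe(LogEntry("alice", "aligner", "1.0", "r12"))` returns `"alice ran aligner (version 1.0) against databank release r12."`.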
21. SIMDAT: Major outcomes to expect
- The SIMDAT approach will provide state-of-the-art components
- To enable an industry-strength environment for e-Science activities
- To support academia/industry collaborations in R&D activities (B2B and B2A Grids)
- B2A Grids: how precisely is the win-win model configured?
- To help build up virtual organisations that federate data, services and scientific expertise
22.
- Web: http://www.simdat.org
- Contact: richard.kamuzinzi_at_ulb.ac.be
- Acknowledgments
- Co-author: Robert Herzog, Université Libre de Bruxelles (ULB)
- Scientific expert: Valérie Ledent, ULB
- Edmond Godfroid and Bernard Couvreur, Laboratory of Applied Genetics, ULB
- SIMDAT colleagues: Joseph Mavor (ULB), Falk Zimmermann (NEC), Changtao Qu (NEC), Nabeel Azam (InforSense), Moustapha Ghanem (InforSense), Kai Kumpf (SCAI-Bio)