Scientific workflow management in the VL-e framework

1
Scientific workflow management in the VL-e framework
Sub-program 2.5, Department of Computer Science,
Universiteit van Amsterdam
2
Outline
  • Background
  • Scientific experiments, Workflow and e-Science
    framework
  • Workflow management in the VL-e framework
  • The approach followed: review of related work
  • Application use cases and workflow support
  • Future work

3
Scientific experiments and e-Science
  • Complex experiments
    • have complex processes
    • require interdisciplinary expertise
    • require large-scale resources

(Diagram labels: Grid, high-level support, scientific workflows)
4
Scientific Workflow Management Systems in an
e-Science environment
  • Functionalities
    • Automating experiment routines
    • Rapid prototyping of experimental computing
      systems
    • Hiding integration details between resources
    • Managing the experiment lifecycle
    • Crossing different layers of middleware for
      managing
      • Data
      • Computing
      • Information
      • Knowledge

In the VL-e project, the targeted e-Science
framework is:
(Diagram labels: Domain-specific applications, Workflow
management system, Knowledge, Information, e-Science
framework, Computing tasks, Data management, Generic
Grid middleware, Grid infrastructure)
5
VL-e workflow wish list
  • A list of 36 points was established to
    characterise the ideal workflow system for VL-e
  • Classified into 4 categories
    • Functionality and capability
    • User interface characteristics
    • Run-time capabilities
    • Software engineering aspects
  • VL-e SIG Workflow meeting, Jan 11th, 2005,
    10:00-11:30, H220 (NIKHEF building)
  • Present: Belleman, Belloum, Bouwhuis, Breanndán,
    Kaletas, Konijnenburg, Marshall, Rauwerda, Sterk,
    Sluiter, Terpstra, Vasunin, Wibisono, Yakali.

6
Prioritizing the workflow requirements based on
the VL-e applications
  • A list of 12 points was established to
    characterise the practical workflow for VL-e
  • Classified into 4 categories
    • Application domain model
    • Engineering
    • Underlying middleware
    • Workflow management system
      • Composition / engine (runtime issues) / user
        support
  • VL-e sub-program 2.5 in collaboration with SP1.X
    developers
  • SP1.X contributors: Belleman, Klous,
    Konijnenburg, Marshall, Rauwerda, Sluiter,
    Terpstra.

7
Application use cases and workflow requirements
  • Application use cases
    • Several rounds of meetings
    • Distinguishing workflow requirements
  • Summary
    • From the resource perspective
      • To support legacy tools
      • To support standard middleware, e.g., Web/Grid
        services
      • To be able to invoke resources from different
        systems
      • To provide a rich library of workflow components
    • From the application process perspective
      • To efficiently manage parallel processes/tasks in
        an experiment (job farming)
      • To efficiently explore a large parameter space
        (parameter sweep)
      • To support knowledge-based information processing
        (semantic-level data integration)
    • From the perspective of using a SWMS
      • To provide a friendly user interface (preferably
        a GUI)
      • To support the development of new workflow
        components (Java, scripts, C, documentation
        and support)
      • To be able to execute tasks on distributed
        resources (clusters or Grid)
      • To be stable at runtime
      • To be able to interoperate with different
        workflow management systems
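The parameter sweep and job farming requirements can be illustrated with a minimal Python sketch, under stated assumptions: `parameter_sweep` and `toy_model` are hypothetical names, not part of VLAM-G or any VL-e tool, and a local thread pool stands in for the farm of workers that a SWMS would dispatch to cluster or Grid resources. Each combination of parameter values becomes one independent job.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Any, Callable


def parameter_sweep(task: Callable[..., Any], grid: dict[str, list[Any]],
                    max_workers: int = 4) -> list[tuple[dict[str, Any], Any]]:
    """Run `task` once for every combination of parameter values in `grid`.

    Parameter sweep: expand the grid into independent jobs.
    Job farming: hand those jobs to a pool of workers (here local threads,
    a stand-in for remote cluster or Grid submission).
    """
    names = list(grid)
    jobs = [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]
    with ThreadPoolExecutor(max_workers=max_workers) as farm:
        # map() returns results in submission order, so they line up with jobs
        results = list(farm.map(lambda params: task(**params), jobs))
    return list(zip(jobs, results))


def toy_model(rate: float, steps: int) -> float:
    """Hypothetical experiment: grow a quantity by `rate` for `steps` steps."""
    value = 1.0
    for _ in range(steps):
        value *= 1 + rate
    return value


if __name__ == "__main__":
    for params, result in parameter_sweep(toy_model, {"rate": [0.1, 0.2], "steps": [1, 2]}):
        print(params, round(result, 3))
```

The sweep over two rates and two step counts produces four jobs. Keeping job generation separate from job execution is the design point of the sketch: an engine could swap the thread pool for remote submission without changing how the sweep itself is described.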

8
Workflow management in VL-e
  • First prototype
    • VLAM-G
    • Shortcomings (GUI, control flow, monitoring, etc.;
      software engineering)
  • Approach
    • Collect and analyze application use cases
    • Review the state of the art of workflow systems
    • Propose workflow systems for the PoC environment
    • Be active in use case projects
    • Learn lessons from use cases
    • Propose a new design

Measured against the list of 36 items established
to characterize the ideal workflow for VL-e,
VLAM-G scored 13 Yes, 5 supported but needing
reimplementation, 9 No, 2 Partially supported,
and 6 In progress or planned.
9
Survey of existing workflow systems
http://staff.science.uva.nl/gvlam//doc/P2/WorkflowSurvey
Participants: Belloum, De Boer, Guevara-Masis, Korkhov,
Mirzadeh, Terpstra, van Hooft, Vasunin, Wibisono,
Yakali, Zhao.
10
Survey results
  • Based on the survey and the practical tests on
    the nine workflow systems, we learned that:
    • All of the systems are still beta versions
      (some even alpha) and tend to crash when we
      run relatively complex tests.
    • None of the systems supports collaboration,
      data sharing, or information management.
    • None of the systems enforces best practice or
      provides support for knowledge capture.
    • Most of the systems are not geared towards
      Grid-based systems; they were built to work on a
      single system, with some features to submit jobs
      to a remote host (the user is still exposed to
      Grid-related issues such as writing RSLs).
    • We had problems when testing some features
      described in the documentation.

http://staff.science.uva.nl/gvlam//doc/P2/SWMSRecommendationReport.pdf
Participants: Belloum, De Boer, Korkhov, Terpstra,
van Hooft, Vasunin, Wibisono, Zhao.
11
Recommendation for PoC R1 (part of the short-term
solution)
12
Use cases and small project teams
  • Use case project teams
    • Participants from SPs of P1, P2, P3 and P4.
    • Contributions from the workflow team: distinguish
      reusable components and provide integration
      solutions.
    • We are also active in project management, such as
      decomposing the implementation into concrete
      tasks and tracking the progress.
  • Inside SP2.5, we divide the group members:
    • SP1.2: Belloum, Korkhov
    • SP1.3: Belloum, De Boer
    • SP1.4: Zhao, Vasunin
    • SP1.5: Zhao, Wibisono
    • SP1.6: Belloum, Paul De Boer

13
Collaboration with VL-e Applications
  • SP1.2 AID-Food informatics-IvI
    • WCFS case searching in Research Management
      System (selected by the VLeIT) (ongoing)
  • SP1.3 AMC-IvI
    • High-volume data management in the PoC SRB
      (selected by the VLeIT) (ongoing)
  • SP1.4 IBED-IvI
    • Run KansK toolbox in a workflow environment
      (Master's thesis project) (ongoing)

14
Collaboration with VL-e Applications
  • SP1.5 IBU-IvI
    • Histone code - semantic data integration
      (selected by the VLeIT) (ongoing)
    • Running R scripts on multiple nodes using a web
      service (finished)
    • Running R scripts in workflows (ongoing)
    • Ridge-O-grammer (ongoing)
  • SP1.6 AMOLF-IvI
    • SRB metadata update from file header (selected
      by the VLeIT) (ongoing)

15
SP1.2 WCFS case searching in Research
Management System
  • Much data in scientific research
  • But
    • No reuse: data not available across projects
    • No context: meaning of data not known
    • Experiments not reproducible
    • Only successful experiments traceable
  • Wish
    • A Research Management System to manage
      experimental data for WCFS researchers

(Diagram label: AID tools)
16
SP1.3 High-volume data management in the PoC SRB
  • The goal of the use case is to
    • Facilitate data management and analysis for
      functional MRI studies by using PoC resources
      for computation and storage
      • Matrix cluster
      • SRB
  • An fMRI pilot is going to be developed as a first
    step.

17
SP1.4 Run KansK toolbox in Workflow environment
  • The toolbox's main processes deal with data
    preparation, evaluation, prediction, and
    display
  • The workflow is about predicting the location
    of birds
  • To be integrated in a workflow
    • VLAM

18
SP1.5 Histone code - semantic data integration
(Diagram labels: Knowledge data discovery; Data
exploration: extract overlapping genome locations;
Data import)
  • Scaling problems
    • Sesame
    • Jena
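The "extract overlapping genome locations" step can be sketched in Python as an interval-overlap join. This is an illustration under assumptions, not the use case's implementation (the slide only names the Sesame and Jena RDF stores): regions are assumed to be half-open `[start, end)` intervals, and `Region` and `overlapping_regions` are hypothetical names.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Region(NamedTuple):
    """A genomic interval; half-open [start, end) coordinates are an assumption."""
    chrom: str
    start: int
    end: int
    label: str


def overlapping_regions(a: Iterable[Region], b: Iterable[Region]) -> list[tuple[Region, Region]]:
    """Return every pair (x, y), x from a and y from b, on the same chromosome that overlap."""
    by_chrom: dict[str, list[Region]] = defaultdict(list)
    for y in b:
        by_chrom[y.chrom].append(y)
    for regions in by_chrom.values():
        regions.sort(key=lambda r: r.start)

    pairs = []
    for x in sorted(a, key=lambda r: (r.chrom, r.start)):
        for y in by_chrom.get(x.chrom, []):
            if y.start >= x.end:
                break  # b is sorted by start, so no later region can overlap x
            if y.end > x.start:
                pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    marks = [Region("chr1", 100, 200, "mark1"), Region("chr1", 500, 600, "mark2")]
    genes = [Region("chr1", 150, 400, "geneA"), Region("chr2", 100, 200, "geneB")]
    for mark, gene in overlapping_regions(marks, genes):
        print(mark.label, gene.label)  # prints: mark1 geneA
```

Sorting each chromosome's regions by start lets the inner loop stop early; for very large inputs an interval tree or an indexed store would scale better than this nested loop.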

19
SP1.5 Running R scripts in workflows
Steps on the SP1.5 side (Frans and Han) and the SP2.5
side (Wibi, Zhiming):
  • SP1.5: define a concrete description; SP2.5:
    provide UML-based analysis diagrams
  • Both: have a meeting to decompose the task
  • SP1.5: implement the functionality in the modules
    (Kepler actor or VLAM module); SP2.5: work
    together and give the necessary support
  • Both: integrate the modules into a workflow (an
    integration meeting)
  • SP1.5: refine the modules; SP2.5: refine the
    workflow
  • Both: final demonstration
20
SP1.5 Ridge-O-grammer
Identify ridges (regions of increased gene
expression)
The outcome of this work is going to be presented
at the Netherlands Bioinformatics Conference,
24 April 2006.
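The slide does not describe how the Ridge-O-grammer detects ridges. As a purely hypothetical illustration of the concept, the Python sketch below treats a ridge as a run of at least `min_length` consecutive genes (ordered by chromosomal position) whose windowed median expression is at least `threshold` times the overall median. The function name, the median smoothing, and all parameter values are assumptions.

```python
from statistics import median


def find_ridges(expression: list[float], window: int = 3,
                threshold: float = 1.5, min_length: int = 2) -> list[tuple[int, int]]:
    """Return (first, last) index pairs of ridges in position-ordered expression values.

    Hypothetical rule: a ridge is a run of at least `min_length` consecutive genes
    whose windowed median expression is >= `threshold` times the overall median.
    """
    n = len(expression)
    baseline = median(expression)
    half = window // 2
    # Smooth with a sliding median so single outlier genes do not form ridges
    smoothed = [median(expression[max(0, i - half): i + half + 1]) for i in range(n)]

    ridges, start = [], None
    for i, value in enumerate(smoothed):
        if value >= threshold * baseline:
            if start is None:
                start = i  # a candidate ridge begins here
        elif start is not None:
            if i - start >= min_length:
                ridges.append((start, i - 1))
            start = None
    # Close a ridge that runs to the end of the sequence
    if start is not None and n - start >= min_length:
        ridges.append((start, n - 1))
    return ridges


if __name__ == "__main__":
    print(find_ridges([1, 1, 1, 5, 6, 5, 1, 1, 1]))  # [(3, 5)]
```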
21
Ongoing development activities on the rapid
prototyping environment
  • Simple file management tools for SRB and GridFTP
  • R scripts in the workflow system
  • Parameter sharing between workflow components
  • Service discovery using a P2P approach
  • Parameter sweep and job farming

22
Future work
  • By far the most active and rapidly progressing
    WMS is Kepler
    • Beta version in March 2006
    • Kepler/Ptolemy has two ways of extending the
      system
      • Actors
      • Directors
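The split between actors and directors can be illustrated with a toy Python sketch. Kepler and Ptolemy II are Java systems and none of the classes below are their API: `Actor`, `run_chain`, `SequentialDirector` and `ThreadPoolDirector` are hypothetical names. What the sketch shows is the division of responsibility: an actor only encapsulates a computation, while the director decides how actors are executed, so the same actors can run under a different execution model unchanged.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class Actor:
    """A workflow component that transforms one input token into one output token."""

    def __init__(self, name: str, fn: Callable[[Any], Any]):
        self.name = name
        self.fn = fn

    def fire(self, token: Any) -> Any:
        return self.fn(token)


def run_chain(actors: list[Actor], token: Any) -> Any:
    """Pass a single token through a linear chain of actors."""
    for actor in actors:
        token = actor.fire(token)
    return token


class SequentialDirector:
    """Execution model 1: tokens are processed one after another."""

    def run(self, actors: list[Actor], tokens: list[Any]) -> list[Any]:
        return [run_chain(actors, token) for token in tokens]


class ThreadPoolDirector:
    """Execution model 2: tokens are processed concurrently; the actors are unchanged."""

    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers

    def run(self, actors: list[Actor], tokens: list[Any]) -> list[Any]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # map() keeps results in the same order as the input tokens
            return list(pool.map(lambda token: run_chain(actors, token), tokens))


if __name__ == "__main__":
    pipeline = [Actor("double", lambda x: 2 * x), Actor("increment", lambda x: x + 1)]
    print(SequentialDirector().run(pipeline, [1, 2, 3]))  # [3, 5, 7]
    print(ThreadPoolDirector().run(pipeline, [1, 2, 3]))  # [3, 5, 7]
```

Extending the system with a new actor adds a new computation; extending it with a new director changes how existing workflows execute.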

23
Summary
  • Survey results showed that the e-science WMS
    targeted in VL-e
  • Does not exist yet
  • Collaboration with other Workflow project will
    likely speed up the development process
  • Project teams working on application use case is
    the only way to progress
  • VLAM is still quite useful for rapid prototyping

24
References
  • People
    • Adam Belloum (SP2.5 leader), Zhiming Zhao, Paul
      van Hooft (post doc), Adianto Wibisono, Dmitry
      Vasyunin, Vladimir Korkhov, Frank Terpstra
      (Ph.D. students), Piter de Boer (programmer)
  • VL-e Reports
    • PoC recommendation report
  • Publications
    • Z. Zhao, A. Belloum, H. Yakali, P.M.A. Sloot and
      L.O. Hertzberger, "Dynamic Workflow in a Grid
      Enabled Problem Solving Environment", in
      Proceedings of the 5th International Conference
      on Computer and Information Technology, pp.
      339-345, IEEE Computer Society Press, Shanghai,
      China, September 2005.
    • Z. Zhao, A. Belloum, A. Wibisono, F. Terpstra,
      P.T. de Boer, P.M.A. Sloot and L.O. Hertzberger,
      "Scientific workflow management between
      generality and applicability", in Proceedings of
      the International Workshop on Grid and
      Peer-to-Peer based Workflows, pp. 357-364, IEEE
      Computer Society Press, Melbourne, Australia,
      September 19th-21st 2005.
    • Z. Zhao, A. Belloum, P.M.A. Sloot and
      L.O. Hertzberger, "Agent technology and scientific
      workflow management in an e-Science environment",
      in Proceedings of the 17th IEEE International
      Conference on Tools with Artificial Intelligence,
      pp. 19-23, IEEE Computer Society Press, Hong Kong,
      China, November 14th-16th 2005.
  • Activity
    • International workshop on Workflow systems in
      e-Science, organized by Zhiming Zhao and Adam
      Belloum, in the context of ICCS06, Reading
      University, May 28, 2006.
    • Workshop on Workflow systems in e-Science, to be
      held during the next e-Science conference in
      Amsterdam, December 2006.