Deployment of Testcases Jean Salzemann LPC IN2P3CNRS credits: Nicolas Jacq, Tristan Glatard - PowerPoint PPT Presentation

1
Deployment of Test-cases
Jean Salzemann, LPC IN2P3/CNRS
Credits: Nicolas Jacq, Tristan Glatard
2
The EMBRACE test-cases
  • HMM-profile sequence analysis (CPU-intensive)
  • Data management and storage: database update
    service (huge data transfers)
  • WISDOM (requires a lot of CPU power and produces
    a lot of data)

3
WISDOM: Wide In Silico Docking On Malaria
  • Goals of the first biomedical data challenge
    (July - August 2005)
    • Biological goal: proposition of new inhibitors
      for a family of proteins produced by
      Plasmodium falciparum
    • Biomed. informatics goal: deployment of in
      silico virtual docking on the grid
    • Grid goal: deployment of a CPU-consuming
      application generating large data flows to
      test the grid infrastructure and services
  • Partners
    • Fraunhofer SCAI (Project PI: Martin Hofmann)
    • LPC Clermont-Ferrand (CNRS/IN2P3)
    • CMBA (Center for Bio-Active Molecules screening)
  • Representing different projects
    • EGEE (EU FP6)
    • Simdat (EU FP6)
    • AuverGrid and Campus Grid (French and German
      regional grids)
    • Accamba project (French ACI project)

4
Grid deployment
  • Objective
    • Producing a large amount of data in a limited
      time with a minimal human cost during the data
      challenge (DC)
  • Need an optimized environment
    • Limited time
    • Performance goal
  • Need a fault-tolerant environment
    • Grid is heterogeneous and dynamic
    • Stress usage of the grid during the DC
  • Need an automatic production environment
    • Execution with the Biomedical Task Force
    • Grid APIs are not fully adapted for bulk use at
      a large scale

5
WISDOM development
[Diagram: WISDOM components and the roles that operate them]
  • Roles
    • Executer (LPC, SCAI, Biomed Task Force)
    • Installer (LPC)
    • Tester (LPC)
    • Superviser (LPC, SCAI, Biomed Task Force)
    • Superviser (EIS, SA1)
    • Executer/End-user (LPC, SCAI, free access)
  • Components
    • wisdom_env: wisdom_resources, instances rep.
    • wisdom_test: CE, SE, RB, conf.
    • wisdom_install: soft, db copy, publication
    • wisdom_exe: workload definition, multithreaded
      jobs submission, jobs monitoring (status,
      content), jobs bookkeeping
    • wisdom_quality: fault tracking, fault repairing,
      jobs resubmission
    • wisdom_env: meta-instances/statistics
    • wisdom_collect: data transfer and register
    • wisdom_site: statistics study, instances access
    • wisdom_db: meta-instances/statistics
    • wisdom_access: instances access
  • Grid
    • biomed VO
    • LCG components: WMS, RLS/RMC
    • LCG resources: RB, CE, SE
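
The slide lists multithreaded job submission (wisdom_exe) and job resubmission (wisdom_quality) without showing how they work. The following is a minimal sketch of that idea, not the WISDOM code: the function name, the `submit` callback and the retry limit are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_with_resubmission(
    jobs: List[str],
    submit: Callable[[str], bool],  # hypothetical: returns True on success
    max_attempts: int = 3,          # assumed retry limit
    workers: int = 4,               # assumed number of submission threads
) -> Tuple[Dict[str, int], List[str]]:
    """Submit jobs in parallel and resubmit the failed ones.

    Returns (attempts used per job, jobs abandoned after max_attempts).
    """
    attempts = {job: 0 for job in jobs}
    abandoned: List[str] = []
    pending = list(jobs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            for job in pending:
                attempts[job] += 1
            # pool.map keeps the input order, so outcomes line up with pending
            outcomes = list(pool.map(submit, pending))
            failed = [job for job, ok in zip(pending, outcomes) if not ok]
            abandoned += [job for job in failed if attempts[job] >= max_attempts]
            pending = [job for job in failed if attempts[job] < max_attempts]
    return attempts, abandoned
```

Abandoned jobs are returned rather than raised as errors, so that a supervisor could inspect them by hand; that is one possible design, not one stated on the slides.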
6
Grid workflow
[Diagram labels: User interface; Compounds database;
Compounds list; Parameter settings, target structures,
compounds sublists; Storage Element; Site1; Site2;
Software; Results; Statistics]
  • FlexX license server
    • 3000 floating licenses given by BioSolveIT to
      SCAI
    • Maximum number of used licenses was 1008
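
The workflow diagram mentions compound sublists sent out together with parameter settings and target structures. One simple way to produce such sublists is sketched below; the function name and the fixed sublist size are assumptions, and the slides do not say how the real splitting was done.

```python
from typing import List, Sequence


def split_into_sublists(compounds: Sequence[str], size: int) -> List[List[str]]:
    """Cut a compound list into consecutive sublists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(compounds[i:i + size]) for i in range(0, len(compounds), size)]
```

Each sublist could then become the input of one grid job, with the number of jobs running at once kept below the number of available licenses.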

7
Integrating Wisdom with Taverna
  • Algorithms are encapsulated into Web-Services
    • Standard WSDL interface
    • Web-service to handle the job execution,
      independent from the implementation
  • Basic interaction pattern between Taverna and
    EGEE
  • Authentication problems
    • From the Web-Services to the User Interface (ssh
      tunnelling?)
    • From the User Interface to Grid Resources (proxy
      creation?)

[Diagram: Taverna workflow manager -> SOAP (over HTTP)
-> Registration Web-Service -> ssh tunnelling -> EGEE
User Interface -> command line interface -> Grid
Resources]
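
The slide asks for a job-execution service whose interface does not depend on the implementation behind it. A minimal sketch of that separation follows. All class and method names are hypothetical, and the in-memory backend only stands in for a real one that would reach the grid through the User Interface.

```python
from abc import ABC, abstractmethod
from typing import Dict


class JobExecutionService(ABC):
    """Hypothetical interface a workflow manager would call."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Start a job and return its identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the current state of a job."""


class InMemoryBackend(JobExecutionService):
    """Stand-in backend: every submitted job is immediately DONE."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def submit(self, command: str) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = "DONE"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs.get(job_id, "UNKNOWN")


def run_step(service: JobExecutionService, command: str) -> str:
    """Caller code sees only the interface, never the backend."""
    return service.status(service.submit(command))
```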
8
A service to update and replicate databases
  • RUGBI: French project financed by the Gen'homme
    network
    • Grid for biologists
    • Based on existing technologies (Web Services,
      Globus Toolkit 4, native XML databases)
    • 3 sites in France: Grenoble, Lyon,
      Clermont-Ferrand
  • Biologists mostly use flat-file databases,
    available on FTP repositories.
  • These databases change and grow constantly and
    therefore need regular updates to keep the most
    up-to-date version available.
  • This is an application-level service, integrable
    in a grid environment, which automatically
    performs regular updates and propagates them
    through the grid.

9
Service concept
  • Master Service
    • Gets the information from the information
      system (Controller)
    • Compares the states of the databases
    • Downloads the differences
    • Notifies the clients
  • Client Service
    • Gets the information from the information
      system
    • Downloads the differences
  • Implemented in Java as Web Services and TCP
    sockets.
  • Compatible with Axis, Globus Toolkit 3, Globus
    Toolkit 4.

[Diagram labels: FTP server, SER]
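
The master service compares database states and downloads only the differences. A minimal sketch of such a comparison is given below. It assumes each repository state is a mapping from file name to a version tag (for example a timestamp or a checksum); the slides do not describe the real representation.

```python
from typing import Dict, List, Tuple


def diff_states(remote: Dict[str, str], local: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare two repository states.

    Returns (files to download, files to delete locally).
    """
    # New files and files whose version tag changed must be fetched.
    to_download = sorted(name for name, tag in remote.items() if local.get(name) != tag)
    # Files that disappeared from the remote repository are obsolete.
    to_delete = sorted(name for name in local if name not in remote)
    return to_download, to_delete
```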
10
Main Steps of the process
  • 1. The SER updates its repository (performs a
    comparison and downloads the differences) and
    notifies the clients.
  • 2. The SE gets the notification and downloads the
    updates with GridFTP.
  • 3. The SER asks for a REGISTER of the new database
    and an UNREGISTER of the old version.
  • 4. The SE notifies the SER of the success of the
    deployment.
  • 5. The SER waits for a deletion notification for
    the old version; when it is received, it deletes
    the old database and propagates this
    notification through the grid.
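
The five steps above form a strict sequence. One way to enforce it is a small state machine that rejects out-of-order events, sketched below. The state names and the `UpdateCycle` class are assumptions for illustration, not part of the service described in the slides.

```python
from enum import Enum, auto


class Step(Enum):
    IDLE = auto()
    CLIENTS_NOTIFIED = auto()      # step 1: SER updated, clients notified
    UPDATE_DOWNLOADED = auto()     # step 2: SE fetched the update
    CATALOG_SWITCHED = auto()      # step 3: REGISTER new, UNREGISTER old
    DEPLOYMENT_CONFIRMED = auto()  # step 4: SE reported success to SER
    OLD_VERSION_DELETED = auto()   # step 5: old database removed


ORDER = list(Step)  # Enum iteration follows definition order


class UpdateCycle:
    """Tracks one update and only accepts the next step in sequence."""

    def __init__(self) -> None:
        self.state = Step.IDLE

    def advance(self, to: Step) -> None:
        position = ORDER.index(self.state)
        if position + 1 >= len(ORDER) or ORDER[position + 1] is not to:
            raise RuntimeError(f"cannot go from {self.state.name} to {to.name}")
        self.state = to
```

Rejecting skipped steps means, for instance, that the old database cannot be deleted before the new one has been registered and confirmed.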

11
The challenge
  • The databases
    • Swiss-Prot, 700 MB
    • TrEMBL, 2.4 GB
    • PDB, 2.9 GB
    • KEGG, 13 GB
    • EMBL, 476 GB, 180 GB (release, without
      annotations)
  • Need for a reliable file transfer service.
  • Need for an information system that allows
    database registration and discovery.
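
To give a sense of scale, the listed sizes can be turned into a rough transfer time. The sketch below takes the figures from the slide (using the 476 GB figure for EMBL). The 100 Mbit/s bandwidth and the decimal units (1 GB = 10^9 bytes) are assumptions.

```python
# Sizes in GB, as listed on the slide (EMBL: the 476 GB figure).
SIZES_GB = {"Swiss-Prot": 0.7, "TrEMBL": 2.4, "PDB": 2.9, "KEGG": 13.0, "EMBL": 476.0}


def transfer_hours(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead."""
    bits = size_gb * 1e9 * 8
    return bits / (bandwidth_mbit_s * 1e6) / 3600


total_gb = sum(SIZES_GB.values())             # about 495 GB
full_copy_hours = transfer_hours(total_gb, 100.0)  # about 11 hours
```

Under these assumptions a full copy takes about 11 hours per site, which shows the benefit of transferring only the differences.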

12
Deployment with LCG
[Diagram: the Update Service on the User Interface
compares and downloads from the FTP server, then copies
and registers the data on the SE and in the RLS with
lcg-cr]
  • Application-level service (only needs to be
    deployed on a User Interface)
  • Uses a specific certificate and is registered in
    a VO
  • Uses the services integrated in the grid.
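
The copy-and-registration step relies on lcg-cr. The sketch below only builds the command line and does not run it. The option spelling (`--vo`, `-d`, `-l`) follows the LCG-2 data management tools as recalled here and should be checked against the installed version. The host name, logical file name and local path in the usage note are hypothetical.

```python
from typing import List


def lcg_cr_command(vo: str, dest_se: str, lfn: str, local_path: str) -> List[str]:
    """Build an lcg-cr call that copies a local file to an SE and registers it.

    local_path is expected to be an absolute path.
    """
    return [
        "lcg-cr",
        "--vo", vo,          # virtual organisation, e.g. biomed
        "-d", dest_se,       # destination Storage Element
        "-l", f"lfn:{lfn}",  # logical file name to register
        f"file://{local_path}",
    ]
```

For example, `lcg_cr_command("biomed", "se.example.org", "/grid/biomed/db/swissprot.dat", "/data/swissprot.dat")` gives an argument list that could be passed to `subprocess.run` on a User Interface with a valid proxy.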

13
QUESTIONS
  • ?