Title: Deployment of Testcases Jean Salzemann LPC IN2P3CNRS credits: Nicolas Jacq, Tristan Glatard
1 Deployment of Test-cases
Jean Salzemann, LPC IN2P3/CNRS
Credits: Nicolas Jacq, Tristan Glatard
2 The EMBRACE test-cases
- HMM-profile sequence analysis (CPU intensive)
- Data management and storage: database update service (huge data transfers)
- WISDOM (requires a lot of CPU power and produces a lot of data)
3 WISDOM: Wide In Silico Docking On Malaria
- Goals of the first biomedical data challenge (July - August 2005)
- Biological goal: proposition of new inhibitors for a family of proteins produced by Plasmodium falciparum
- Biomedical informatics goal: deployment of in silico virtual docking on the grid
- Grid goal: deployment of a CPU-consuming application generating large data flows to test the grid infrastructure and services
- Partners
- Fraunhofer SCAI (Project PI: Martin Hofmann)
- LPC Clermont-Ferrand (CNRS/IN2P3)
- CMBA (Center for Bio-Active Molecules screening)
- Representing different projects
- EGEE (EU FP6)
- Simdat (EU FP6)
- AuverGrid and Campus Grid (French and German regional grids)
- Accamba project (French ACI project)
4 Grid deployment
- Objective
- Producing a large amount of data in a limited time with a minimal human cost during the data challenge
- Need an optimized environment
- Limited time
- Performance goal
- Need a fault tolerant environment
- Grid is heterogeneous and dynamic
- Stress usage of the grid during the data challenge (DC)
- Need an automatic production environment
- Execution with the Biomedical Task Force
- Grid APIs are not fully adapted to bulk use at a large scale
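The fault-tolerance and automation needs listed above can be illustrated with a minimal sketch. It is not the WISDOM implementation: `run_with_resubmission` and the `submit` callable are hypothetical names, and `submit` stands in for whatever grid submission command is actually used. The sketch only shows the idea of submitting jobs from several threads and resubmitting the ones that fail.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_with_resubmission(
    job_ids: Iterable[str],
    submit: Callable[[str], bool],
    max_attempts: int = 3,
    workers: int = 4,
) -> Dict[str, int]:
    """Submit every job from a thread pool, resubmitting failures.

    `submit` is a hypothetical stand-in for the real grid submission call;
    it returns True on success. The result maps each job id to the number
    of attempts it needed, or -1 if it still failed after max_attempts.
    """

    def attempt(job_id: str) -> int:
        for n in range(1, max_attempts + 1):
            try:
                if submit(job_id):
                    return n
            except Exception:
                pass  # any submission error counts as a failed attempt
        return -1

    ids = list(job_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `ids`
        return dict(zip(ids, pool.map(attempt, ids)))
```

A job that keeps failing is reported with -1 rather than retried forever, so a supervisor can inspect it by hand.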
5 WISDOM development
[Diagram: roles, WISDOM components and grid resources]
Roles
- Installer (LPC)
- Tester (LPC)
- Executor (LPC, SCAI, Biomed Task Force)
- Supervisor (LPC, SCAI, Biomed Task Force)
- Supervisor (EIS, SA1)
- Executor/End-user (LPC, SCAI, free access)
Components
- wisdom_env, wisdom_resources: instances rep.
- wisdom_test: CE, SE, RB configuration
- wisdom_install: software and database copy, publication
- wisdom_exe: workload definition, multithreaded job submission, job monitoring (status, content), job bookkeeping
- wisdom_quality: fault tracking, fault repairing, job resubmission
- wisdom_env: meta-instances/statistics
- wisdom_collect: data transfer and registration
- wisdom_site: statistics study, instances access
- wisdom_db: meta-instances/statistics
- wisdom_access: instances access
Grid
- biomed VO
- LCG components: WMS, RLS/RMC
- LCG resources: RB, CE, SE
6 Grid workflow
[Diagram labels: User interface; Site1; Site2; Storage Element; Compounds database; Compounds list; Software; Parameter settings, target structures, compounds sublists; Results; Statistics]
- FlexX license server
- 3000 floating licenses given by BioSolveIT to SCAI
- Maximum number of used licenses was 1008
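The workflow diagram mentions parameter settings, target structures and compounds sublists being handed to the sites. A minimal sketch of how such a workload could be defined follows. It assumes, for illustration only, that one job is one (target, parameter set, compounds sublist) combination; the function names and the job dictionary layout are hypothetical and not taken from the WISDOM scripts.

```python
from itertools import product
from typing import Dict, List, Sequence


def split_into_sublists(compounds: Sequence[str], size: int) -> List[List[str]]:
    """Cut the compounds list into consecutive sublists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(compounds[i:i + size]) for i in range(0, len(compounds), size)]


def define_workload(
    targets: Sequence[str],
    parameter_sets: Sequence[str],
    compounds: Sequence[str],
    sublist_size: int,
) -> List[Dict[str, object]]:
    """Build one job description per (target, parameter set, sublist)."""
    sublists = split_into_sublists(compounds, sublist_size)
    return [
        {"target": t, "parameters": p, "compounds": s}
        for t, p, s in product(targets, parameter_sets, sublists)
    ]
```

The sublist size is the tuning knob: smaller sublists give more, shorter jobs, which also means more simultaneous requests to the license server.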
7 Integrating WISDOM with Taverna
- Algorithms are encapsulated into Web-Services
- Standard WSDL interface
- Web-service to handle the job execution, independent from the implementation
- Basic interaction pattern between Taverna and EGEE
- Authentication problems
- From the Web-Services to the User Interface (ssh tunnelling?)
- From the User Interface to Grid Resources (proxy creation?)
[Diagram labels: Taverna workflow manager; Registration Web-Service; EGEE User Interface; Grid Resources; links: SOAP (over HTTP), ssh tunnelling, command line interface]
8 A service to update and replicate databases
- RUGBI: French project financed by the Gen'homme network
- Grid for biologists
- Based on existing technologies (Web Services, Globus Toolkit 4, native XML databases)
- 3 sites in France: Grenoble, Lyon, Clermont-Ferrand
- Biologists mostly use flat-file databases, available on FTP repositories.
- These databases change and grow constantly and therefore need regular updates to keep the most up-to-date version available.
- This service is an application-level service that can be integrated into a grid environment; it automatically performs regular updates and propagates them through the grid.
9 Service concept
- Master Service
- Get the information from the information system (Controller)
- Compare the states of the databases
- Download the differences
- Notify the clients
- Client Service
- Get the information from the information system
- Download the differences
- Implemented in Java as Web Services and TCP sockets.
- Compatible with Axis, Globus Toolkit 3, Globus Toolkit 4.
[Diagram labels: FTP server; SER]
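The "compare the states of the databases" and "download the differences" steps can be sketched as a comparison of two file listings. The slides do not say how a database state is represented, so this sketch assumes a mapping from file name to a version marker (a checksum or a timestamp, for instance); `compare_states` and `Diff` are hypothetical names, and the service itself is implemented in Java rather than Python.

```python
from typing import Dict, List, NamedTuple


class Diff(NamedTuple):
    to_download: List[str]  # new or changed files on the reference side
    to_delete: List[str]    # files that no longer exist on the reference side


def compare_states(remote: Dict[str, str], local: Dict[str, str]) -> Diff:
    """Compare two states given as {file name: version marker}.

    `remote` is the reference repository (the FTP server in the slides),
    `local` is the copy to bring up to date.
    """
    to_download = sorted(
        name for name, version in remote.items() if local.get(name) != version
    )
    to_delete = sorted(name for name in local if name not in remote)
    return Diff(to_download, to_delete)
```

Only the files listed in `to_download` need to be transferred, which matters when a database is hundreds of gigabytes and a release changes a fraction of it.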
10 Main steps of the process
- 1. The SER updates its repository and notifies the clients
- (Performs a comparison and downloads the differences)
- 2. The SE gets the notification and downloads the updates with GridFTP.
- 3. The SER asks for a REGISTER of the new database and an UNREGISTER of the old version.
- 4. The SE notifies the SER of the success of the deployment.
- 5. The SER waits for a deletion notification for the old version; when it is received, it deletes the old database and propagates this notification through the grid.
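The five steps above can be walked through with a small in-memory simulation. This is a sketch under stated assumptions, not the RUGBI service: the class and method names are hypothetical, the GridFTP transfer is replaced by a dictionary write, and the information system is reduced to a set of registered (database, version) pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Client:
    """Plays the SE: holds local copies of the databases."""
    name: str
    held: Dict[str, str] = field(default_factory=dict)

    def download(self, db: str, version: str) -> None:
        # Step 2: fetch the update (GridFTP in the slides, a dict write here).
        self.held[db] = version

    def confirm(self, db: str, version: str) -> bool:
        # Step 4: report whether the deployment succeeded.
        return self.held.get(db) == version


@dataclass
class Master:
    """Plays the SER: owns the reference copy and drives the update."""
    clients: List[Client]
    current: Dict[str, str] = field(default_factory=dict)
    registered: Set[Tuple[str, str]] = field(default_factory=set)
    pending_deletion: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def update(self, db: str, new_version: str) -> bool:
        old: Optional[str] = self.current.get(db)
        self.current[db] = new_version              # step 1: update repository
        for client in self.clients:                 # step 1: notify the clients
            client.download(db, new_version)        # step 2: clients download
        self.registered.add((db, new_version))      # step 3: REGISTER new version
        replaced = old is not None and old != new_version
        if replaced:
            self.registered.discard((db, old))      # step 3: UNREGISTER old version
        ok = all(c.confirm(db, new_version) for c in self.clients)  # step 4
        if ok and replaced:
            self.pending_deletion[db] = old         # step 5: wait for notification
        return ok

    def on_deletion_notification(self, db: str) -> None:
        # Step 5: delete the old version; a real service would also
        # propagate the notification to the other sites.
        old = self.pending_deletion.pop(db, None)
        if old is not None:
            self.log.append(f"deleted {db} {old}")
```

Keeping the old version until the deletion notification arrives means a job that started against it can still finish.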
11 The challenge
- The databases
- Swiss-Prot, 700 MB
- TrEMBL, 2.4 GB
- PDB, 2.9 GB
- KEGG, 13 GB
- EMBL, 476 GB, 180 GB (release, without annotations)
- Need for a reliable file transfer service.
- Need for an information system that allows database registration and discovery
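To give a feel for why a reliable transfer service is needed, the sizes above can be turned into a rough transfer-time estimate. The sizes come from the slide; the sustained rate of 100 Mbit/s is purely an assumption chosen for the arithmetic, not a measurement from the project, and sizes are taken as decimal gigabytes.

```python
from typing import Dict

# Sizes from the slide, in GB (EMBL taken at 476 GB; the slide also
# quotes 180 GB for the release without annotations).
SIZES_GB: Dict[str, float] = {
    "Swiss-Prot": 0.7,
    "TrEMBL": 2.4,
    "PDB": 2.9,
    "KEGG": 13.0,
    "EMBL": 476.0,
}


def total_gb(sizes: Dict[str, float]) -> float:
    """Total volume of a full replication, in GB."""
    return sum(sizes.values())


def transfer_hours(size_gb: float, rate_mbit_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate in Mbit/s."""
    seconds = size_gb * 8_000 / rate_mbit_s  # 1 GB = 8000 Mbit
    return seconds / 3600
```

Under these assumptions the full set is about 495 GB, which takes roughly 11 hours per site at 100 Mbit/s with no interruption. A transfer of that length is likely to be interrupted at some point, which is the case a reliable, restartable transfer service has to handle.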
12 Deployment with LCG
[Diagram labels: FTP server; User Interface (Update Service); SE; RLS; links: comparison and download, copy and registration (lcg-cr)]
- Application-level service (only needs to be deployed on the User Interface)
- Uses a specific certificate and is registered in a VO
- Uses the services integrated in the grid.
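The "copy and registration" link in the diagram names `lcg-cr`. As a hedged illustration, the sketch below only builds the argument list for such a call and does not run it. The `--vo`, `-d` and `-l` options are the ones documented for `lcg-cr` in the LCG-2 user documentation and should be checked against the installed middleware version. The host name, logical file name and local path in the usage note are hypothetical.

```python
from typing import List


def build_lcg_cr_command(
    vo: str, storage_element: str, logical_name: str, local_path: str
) -> List[str]:
    """Build the argv for copying a local file to an SE and registering it.

    Assumes the lcg-cr options --vo (virtual organisation), -d (destination
    storage element) and -l (logical file name); verify them against the
    installed middleware before use.
    """
    if not local_path.startswith("/"):
        raise ValueError("local_path must be absolute")
    return [
        "lcg-cr",
        "--vo", vo,
        "-d", storage_element,
        "-l", f"lfn:{logical_name}",
        f"file://{local_path}",
    ]
```

For example, `build_lcg_cr_command("biomed", "se.example.org", "/grid/biomed/db/swissprot.dat", "/data/swissprot.dat")` returns a list that could be passed to `subprocess.run` on a User Interface where a valid proxy exists.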
13 Questions