Title: A demonstration of the use of Datagrid testbed and services for the biomedical community
1. A demonstration of the use of the DataGrid testbed and services for the biomedical community
- Biomedical applications work package
- V. Breton, Y. Legré (CNRS/IN2P3)
- R. Météry (CS)
- Credits: C. Blanchet, T. Contamine, S. Gadras, M. Joubert, A. Minne, J. Montagnat
2. The Visual DataGrid Blast
- A graphical interface to enter query sequences and select the reference database
- A script to execute the BLAST algorithm on the grid
- A graphical interface to analyze results
3. When/Where do biologists use BLAST?
- (When?) As the first step in analysing new sequences: to compare DNA or protein sequences with others stored in personal or public databases
- (Where?) In a laboratory with an up-to-date version of the genomics and post-genomics data banks
- Requires equipment to store databases and run algorithms
- Requires manpower for system and network maintenance and frequent updates of the databases
- Most biologists use integrated web portals for their comparative genomics analysis: no need to worry about the biological file format and the method arguments
4. Web portals for biologists under growing pressure
- Biologist enters sequences through web interface
- Pipelined execution of bio-informatics algorithms
- Genomics comparative analysis
- Phylogenetics
- 2D, 3D molecular structure of proteins
- The algorithms are executed on a local cluster
- Big labs have big clusters
- But the pressure is growing
- More and more biologists
- compare larger and larger sequences (whole genomes)
- to more and more genomes
- with fancier and fancier algorithms!
5. Executing BLAST on the grid
[Figure: executing BLAST on the grid. Recoverable labels: Replica Catalog, DB, DB. Credit: Fabio Hernandez]
6. Actual demonstration
[Figure: actual demonstration. Recoverable labels: Input file (entries "Seq1 >", "Seq2 >", ..., "Seqn >", each followed by placeholder sequence text), UI, Computing element (three times), RESULT (placeholder output text)]
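The labels on the demonstration slide (one input file holding Seq1 to Seqn, several computing elements, one RESULT) suggest that the query sequences are spread over several computing elements. The sketch below shows one possible way to prepare such a split. It assumes the input is in FASTA format and uses a simple round-robin distribution. The function names `parse_fasta`, `split_round_robin` and `to_fasta` are hypothetical and are not part of the DataGrid software or of the script used in the demonstration.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    parts: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(parts)
            header = line[1:]
            parts = []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_round_robin(records: Iterable[Record], n_jobs: int) -> List[List[Record]]:
    """Distribute records over n_jobs lists, one list per computing element."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    jobs: List[List[Record]] = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        jobs[i % n_jobs].append(record)
    return jobs


def to_fasta(records: Iterable[Record]) -> str:
    """Serialise records back to FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    # Illustrative input only; the sequences are made up.
    demo = ">Seq1\nACGT\nAC\n>Seq2\nGGTT\n>Seq3\nTTAA\n"
    for n, job in enumerate(split_round_robin(parse_fasta(demo), 2)):
        print(f"--- job {n} ---")
        print(to_fasta(job), end="")
```

Each job's FASTA text could then be submitted as the input of one BLAST run, and the per-job results concatenated into a single result file. How the demonstration actually partitions and merges its data is not stated on the slides.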
7. The Grid impact on computing
- Swissprot vs Swissprot (100000 sequences)
- Running time on one CPU: 228 hours
- Tests at Institut de Biologie et Chimie des Protéines (quadripro): 49 hours
- Tests on DataGrid (cc-in2p3): 3 hours
- Impacts
- Reduced pressure on local computing
- Ability to handle very large jobs
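The three running times quoted on this slide can be turned into speedup factors relative to the single-CPU run. The short calculation below is only a worked piece of arithmetic on those figures. The helper name `speedup` is introduced here for illustration.

```python
def speedup(baseline_hours: float, hours: float) -> float:
    """Return how many times faster a run is than the baseline run."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return baseline_hours / hours


# Running times in hours, as quoted on the slide.
runs = {
    "one CPU": 228,
    "IBCP (quadripro)": 49,
    "DataGrid (cc-in2p3)": 3,
}

baseline = runs["one CPU"]
for name, hours in runs.items():
    print(f"{name}: {hours} h, speedup x{speedup(baseline, hours):.1f}")
```

The local test is roughly 4.7 times faster than one CPU, and the DataGrid test is 76 times faster. The slides do not say how many CPUs the DataGrid run used, so a parallel efficiency cannot be derived from these numbers.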
8. The grid impact on data handling
- DataGrid will allow mirroring of databases
- An alternative to the current costly replication mechanism
- Allowing web portals on the grid to access updated databases
[Figure: recoverable labels: Trembl (EBI), Biomedical Replica Catalog]
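To make the idea of a replica catalog more concrete, the toy model below maps a logical database name to the sites that hold a copy, and picks the sites whose copy is current. It is a conceptual sketch only: the class `ToyReplicaCatalog`, the site names and the integer version numbers are all hypothetical, and none of this reflects the actual DataGrid replica catalog interface.

```python
from typing import Dict, List, Tuple


class ToyReplicaCatalog:
    """Hypothetical in-memory catalog: logical name -> [(site, version), ...]."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[Tuple[str, int]]] = {}

    def register(self, logical_name: str, site: str, version: int) -> None:
        """Record that `site` holds `version` of the named database."""
        self._replicas.setdefault(logical_name, []).append((site, version))

    def latest_version(self, logical_name: str) -> int:
        """Return the highest version registered for the named database."""
        if logical_name not in self._replicas:
            raise KeyError(logical_name)
        return max(version for _, version in self._replicas[logical_name])

    def up_to_date_sites(self, logical_name: str) -> List[str]:
        """Return the sites whose copy matches the latest version."""
        latest = self.latest_version(logical_name)
        return [site for site, version in self._replicas[logical_name]
                if version == latest]


if __name__ == "__main__":
    catalog = ToyReplicaCatalog()
    # Site names and versions are made up for the example.
    catalog.register("trembl", "site-a", 1)
    catalog.register("trembl", "site-b", 2)
    catalog.register("trembl", "site-c", 2)
    print(catalog.up_to_date_sites("trembl"))
```

In this picture, a web portal would ask the catalog where a current copy of a database lives instead of maintaining its own mirror.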
9. This demo illustrates how grids can bring a revolution to genomics
- Grids expand the performance of genomics web portals
- Distributed execution of bio-informatics algorithms, even the ones requiring huge amounts of CPU
- Maintenance of up-to-date biological databases over the network
- Grids open new perspectives in large-scale genomics analysis
- Complete genome annotation
- Cross-genome analysis
- Data mining on distributed databases
- Pipelining of huge automatic bio-informatics analyses