A demonstration of the use of Datagrid testbed and services for the biomedical community - PowerPoint PPT Presentation

Provided by: equipeb

Transcript and Presenter's Notes

1
A demonstration of the use of Datagrid testbed
and services for the biomedical community
  • Biomedical applications work package
  • V. Breton, Y. Legré (CNRS/IN2P3)
  • R. Météry (CS)
  • Credits: C. Blanchet, T. Contamine, S. Gadras,
    M. Joubert, A. Minne, J. Montagnat

2
The Visual DataGrid Blast
  • A graphical interface to enter query sequences
    and select the reference database
  • A script to execute the BLAST algorithm on the
    grid
  • A graphical interface to analyze results

3
When and where do biologists use BLAST?
  • When: as the first step in analysing new
    sequences, to compare DNA or protein sequences
    with others stored in personal or public
    databases
  • Where: in a laboratory with an up-to-date copy
    of the genomics and post-genomics data banks
  • Requires equipment to store databases and run
    algorithms
  • Requires manpower for system and network
    maintenance and frequent database updates
  • Most biologists use integrated web portals for
    their comparative genomics analysis: no need to
    worry about biological file formats or
    method arguments

4
Web portals for biologists under growing
pressure
  • Biologist enters sequences through a web interface
  • Pipelined execution of bio-informatics algorithms
      • Genomics comparative analysis
      • Phylogenetics
      • 2D and 3D molecular structure of proteins
  • The algorithms are executed on a local cluster
      • Big labs have big clusters
  • But the pressure is growing
      • More and more biologists
      • compare larger and larger sequences (whole
        genomes)
      • to more and more genomes
      • with fancier and fancier algorithms!

5
Executing BLAST on the grid
[Diagram: BLAST execution on the grid, showing a Replica Catalog and two databases (DB). Credit: Fabio Hernandez]

6
Actual demonstration
[Diagram: an input file of query sequences (Seq1, Seq2, ..., Seqn), the user interface (UI), three computing elements, and a RESULT file. The sequence and result contents shown on the slide are placeholder text.]

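
The demonstration's input file holds several query sequences (Seq1, Seq2, ..., Seqn), and the work is spread over several computing elements. One way such an input might be divided is sketched below in Python. This is an illustrative assumption, not the script used in the demo: it assumes the input is in FASTA format (each record starting with a `>` header line), and the function names `parse_fasta`, `split_round_robin`, and `to_fasta`, as well as the round-robin strategy, are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    parts: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(parts)
            header = line[1:].strip()
            parts = []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_round_robin(records: Iterable[Record], n_jobs: int) -> List[List[Record]]:
    """Deal records into at most n_jobs groups, one group per hypothetical grid job."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    jobs: List[List[Record]] = [[] for _ in range(n_jobs)]
    for index, record in enumerate(records):
        jobs[index % n_jobs].append(record)
    # Drop empty groups when there are fewer records than jobs.
    return [job for job in jobs if job]


def to_fasta(records: Iterable[Record]) -> str:
    """Serialise records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)


# Made-up sequences, for illustration only.
example = ">Seq1\nACGT\nTTGA\n>Seq2\nGGCC\n>Seq3\nATAT\n"
jobs = split_round_robin(parse_fasta(example), n_jobs=2)
for number, job in enumerate(jobs, start=1):
    print(f"job {number}: {[header for header, _ in job]}")
```

Each group could then be written out with `to_fasta` and submitted as a separate job. Round-robin is only the simplest choice; balancing groups by total sequence length would be a natural refinement.
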
7
The Grid impact on computing
  • Swissprot vs Swissprot (100,000 sequences)
      • Running time on one CPU: 228 hours
      • Tests at Institut de Biologie et Chimie des
        Protéines (quadripro): 49 hours
      • Tests on DataGrid (cc-in2p3): 3 hours
  • Impacts
      • Reduced pressure on local computing
      • Ability to handle very large jobs
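
The three running times quoted on this slide can be turned into speedup factors relative to the single-CPU run. The short Python calculation below uses only the figures from the slide; the variable and function names are illustrative, and the slide does not say how many processors each test used.

```python
# Running times quoted on the slide, in hours.
baseline_hours = 228.0  # one CPU
timings = {
    "IBCP (quadripro)": 49.0,
    "DataGrid (cc-in2p3)": 3.0,
}


def speedup(baseline: float, measured: float) -> float:
    """Return how many times faster `measured` is than `baseline`."""
    if measured <= 0:
        raise ValueError("measured time must be positive")
    return baseline / measured


speedups = {name: speedup(baseline_hours, hours) for name, hours in timings.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster than one CPU")
```

This gives roughly 4.7x for the IBCP test and 76x for the DataGrid test.
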

8
The grid impact on data handling
  • DataGrid will allow mirroring of databases
      • An alternative to the current costly replication
        mechanism
      • Allowing web portals on the grid to access
        updated databases

[Diagram: TrEMBL (EBI) and the Biomedical Replica Catalog]

9
This demo illustrates how grids can bring a
revolution to genomics
  • Grids expand the performance of genomics web
    portals
      • Distributed execution of bio-informatics
        algorithms, even the ones requiring huge
        amounts of CPU
      • Maintenance of up-to-date biological databases
        over the network
  • Grids open new perspectives in large-scale
    genomics analysis
      • Complete genome annotation
      • Cross-genome analysis
      • Data mining on distributed databases
      • Pipelining of huge automatic bio-informatics
        analyses