Title: Porting of BioInformatic Tools for Plant Virology on a Computational Grid
1Porting of Bio-Informatic Tools for Plant
Virology on a Computational Grid
- Gaetano Lanzalone (1,3), Alessandro Lombardo (1,2), Annamaria Muoio (1), Marcello Iacono-Manno (1), Roberto Barbera (1,4)
- (1) INFN Sezione di Catania and Consorzio COMETA, Catania, IT
- (2) Dipartimento di Scienze e Tecnologie Fitosanitarie, Catania, IT
- (3) INFN LNS, Catania, IT
- (4) Dipartimento di Fisica, Università di Catania, IT
- Catania, June 2008
2Outline
- 1 Introduction on the Biological problem.
- 2 TriGrid and Cometa projects.
- 3 Problem Solution by GENIUS.
- 4 Results and Conclusions
3- 1 Brief introduction on the Biological problem.
4 WORLD-WIDE CITRUS PRODUCTION(FAO)
Orange production is estimated at around 66.4
million tons, of which 36.3 million tons are fresh
product and 30.1 million tons the remaining product.
5Biological problem
TESTBEDS
- CMV (Cucumber mosaic virus)
- TYLCV (Tomato yellow leaf curl virus)
- TYLCSV (Tomato yellow leaf curl Sardinia virus)
- TSWV (Tomato spotted wilt virus)
- CTV (Citrus tristeza virus)
6- Symptoms
- Rapid decline and death of Citrus grafted on
bitter orange (Citrus aurantium L.)
- Stem pitting, reduced yields, poor fruit quality.
- Yellow seedlings and leaves.
- Low growth rate
Vectors: aphids (Toxoptera citricida, Aphis
gossypii)
7CTV Geographic distribution
Algeria, American
Samoa, Antigua and Barbuda, Argentina, Australia,
Belize, Bermuda, Bolivia, Brazil, Brunei
Darussalam, Cameroon, the Central African
Republic, Chad, China, Colombia, Costa Rica,
Cyprus, the Dominican Republic, Ecuador, Egypt,
El Salvador, Ethiopia, Fiji, French Polynesia,
Gabon, Ghana, Guyana, India, Indonesia, Iran,
Israel, Italy, Jamaica, Japan, Kenya, Korea
Republic, Malaysia, Mauritius, Morocco,
Mozambique, Nepal, Netherlands Antilles, New
Caledonia, New Zealand, Nicaragua, Nigeria,
Pakistan, Panama, Paraguay, Peru, the
Philippines, Portugal, Puerto Rico, Saudi Arabia,
Spain, Sri Lanka, Suriname, Taiwan, Tanzania,
Thailand, Trinidad and Tobago, Turkey, the USA,
Uganda, Uruguay, Venezuela, Vietnam, Zaire,
Zambia, Western Samoa, the former Yugoslavia,
Zimbabwe.
8CTV (Citrus Tristeza Virus)
ZOOM
- Particle dimensions: 2000 nm x 11 nm
- Genome: single-stranded RNA, 19.3 kb
- Genome organization: 12 Open Reading Frames, 2 Untranslated Terminal Regions
- Proteins produced: at least 19
- Complete genomes in GenBank: 9
9ZOOM
NUCLEOTIDE
10ClustalW (Thompson et al., 1994)
MULTIPLE ALIGNMENT OF NUCLEOTIDE SEQUENCES
POINT MUTATIONS, INSERTIONS, DELETIONS, TRANSVERSIONS,
RECOMBINATIONS
Similarity Plot
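A similarity plot of this kind can be approximated by sliding a window along two aligned sequences and recording the fraction of identical positions in each window. The following is only a minimal sketch of that idea, not the method used by the tools named in these slides; the function name `window_identity`, the window and step sizes, and the toy sequences are all assumptions made for illustration.

```python
def window_identity(seq_a, seq_b, window=4, step=2):
    """Return (start, identity) pairs for sliding windows over two aligned sequences.

    Identity is the fraction of matching columns in the window; columns where
    both sequences carry a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    points = []
    for start in range(0, len(seq_a) - window + 1, step):
        columns = zip(seq_a[start:start + window], seq_b[start:start + window])
        pairs = [(x, y) for x, y in columns if not (x == "-" and y == "-")]
        if not pairs:
            continue  # window made only of shared gaps: nothing to compare
        matches = sum(1 for x, y in pairs if x == y)
        points.append((start, matches / len(pairs)))
    return points


# Toy aligned sequences (hypothetical, not CTV data)
query = "ACGTACGTAC"
other = "ACGTTCGAAC"
profile = window_identity(query, other, window=4, step=2)
for start, identity in profile:
    print(start, identity)
```

A drop in identity over a stretch of windows marks a region where the two sequences diverge, which is the kind of signal a similarity plot makes visible.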
11PHYLOGENETIC TREES
12LOCATION OF THE RECOMBINATION EVENTS
TOPALi
TOPALi V2.0 (BioSS, Biomathematics and Statistics
Scotland)
- DSS (Difference of Sums of Squares; McGuire and Wright, 2000)
- PDM (Probabilistic Divergence Measures; Husmeier and Wright, 2001)
- HMM (Hidden Markov Model; Husmeier and McGuire, 2003)
Analysis time for PDM on the CTV alignment on a
user PC (3.2 GHz): 44.2 h!
13- 2 TriGrid and Cometa projects.
14The Sicilian Grid in one slide
- 1500 CPUs
- 250 TBytes
- 15.000.000 in 3 years!
- 300 FTEs! (2/3 newly hired staff)
15Objectives of an e-Infrastructure in Sicily
- Create a Virtual Laboratory in Sicily, both for
scientific and industrial applications, built on
top of a Grid infrastructure
- Connect the Sicilian e-Infrastructure to those
already existing in Italy, Europe and the rest of
the world, improving scientific collaboration
and increasing the competitiveness of e-Science
and e-Industry made in Sicily
- Disseminate the Grid paradigm through the
organization of dedicated events and training
courses
- Trigger/foster the creation of spin-offs in the
ICT area in order to reduce the brain drain of
brilliant young people to other parts of Italy
and beyond
16The TriGrid e-Infrastructure
- 288 cores AMD Opteron 280
- 400 GB of memory
- LSF 6.1 HPC everywhere
- Infiniband-1X at INAF-OACT and CECUM for HPC apps.
- 57 TB of raw disk storage FC-2-SATA
- Distributed/parallel GPFS filesystem
17Lay-out of large sites
Site 1
(expansion w.r.t. TriGrid)
Site 3
Site 6
Padova, V Workshop INFN Grid, 18.12.2006
18Computing, Networking, and Storage (2/3)
- 8 IBM BladeCenter H enclosures
- 84 IBM LS21 blades
- 336 cores AMD Opteron 2218 rev. F
- 772 GB of RAM (2 GB/core)
- 0.55 MSpecInt2000
- 0.66 MSpecFP2000
- More than 6 kSpec(Int/FP)Rate
- 48.8 mW/SpecInt2000 at full load !
- G-Ethernet service network
- CISCO Topspin Infiniband-4X additional
low-latency network for HPC applications
- LSF 6.1 HPC included !
19Computing, Networking, and Storage (3/3)
- 4 IBM DS4200 Storage Systems (sites 1, 2, 3, and 6)
- FC-2-SATA technology
- 136 500-GB disks
- 68 TB of storage (raw) in total
- Expandability up to 0.45 PB
- GPFS distributed/parallel file system included !
20- 3 Problem Solution by GENIUS.
21- Command line interface
- Expert User -> long time before start
- Web interface
- Dummy User -> immediate start
- We need
- Friendly User Interface -> ENGINFRAME (GENIUS)
22A web portal: why and how?
- It can be accessed from everywhere and by
everything (desktop, laptop, PDA, WAP phone).
- It can keep the same user interface to several
back-ends (grid dialects and command-line UIs).
- It must be secure at all levels
- 1) secure about web transactions,
- 2) secure about user authentication,
- 3) trustworthy at VO level.
- All available Grid services must be incorporated
in a logical way, just one mouse click away.
- Its layout must be easily understandable and
user-friendly.
23EnginFrame in brief
- Standard based GRID portal
- Java, Tomcat, XML/XSL GridML
- Solves back-end integration problems
- Visual rendering for most Grid objects
- jobs, job arrays, hosts, etc.
- Multiple Grid technologies support
- Globus, LSF, SGE, LoadLeveler, PBS, even OS!
- Authentication delegation
- Data management, UL/DL remote file browsing
- Integration with interactive applications, tools,
24EnginFrame workflow
Application Servers
Interactive applications
Web Server
Clients
EnginFrame Server
Standard Web Browser
Grid / Compute Farm
25GENIUS web portal
Applications specific layer
ALICE
ATLAS
CMS
Bio apps
High level GRID middleware
EGEE architecture
Basic Services
GLOBUS toolkit
OS Net services
26Porting of Bio-Informatic Tools for Plant Virology
- Applications
- ClustalW, TOPALi, SplitsTree and Knetfold.
- ClustalW, a data analysis program for multiple
alignments, runs on the Grid as an MPI job.
- The TOPALi and SplitsTree programs run as interactive
jobs on the Grid.
- The Knetfold application runs as a parametric job.
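A parametric job runs the same executable once per value of a parameter. The sketch below only illustrates that expansion on the client side; it is not the Grid middleware's own mechanism. The helper name `expand_parametric`, the command line shown for Knetfold, and the input file names are hypothetical, and the `_PARAM_` placeholder is used here simply as a marker to substitute.

```python
def expand_parametric(template, start, stop, step=1, placeholder="_PARAM_"):
    """Expand one command template into one command per parameter value.

    Values follow Python range semantics: start inclusive, stop exclusive.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return [template.replace(placeholder, str(value))
            for value in range(start, stop, step)]


# Hypothetical command line: one sub-job per numbered input file
commands = expand_parametric("knetfold seq_PARAM_.aln", 0, 3)
for command in commands:
    print(command)
```

Each generated command is independent of the others, which is why this kind of workload spreads well over many Grid worker nodes.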
27PROGRAMS WORK FLOW
28JDL
XML
DEVELOPER
USER
29JDL
41TOPALi
input
42TOPALi
output
43SplitsTree
input.aln
output.jpg
44SplitsTree
algorithm bootstrapping
45- 4 Results and Conclusions
46Sequence alignment time with ClustalW
Elapsed time of the ClustalW-MPI alignment of 9
complete Citrus Tristeza Virus genomes as a
function of the number of processors.
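Elapsed times measured at different processor counts are usually summarized as speed-up S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p. The sketch below shows that calculation; the function name `speedup_table` is an assumption and the timings are hypothetical placeholders, not the measurements plotted on this slide.

```python
def speedup_table(timings):
    """Return (processors, speed-up, efficiency) rows.

    `timings` maps a processor count to the elapsed time in seconds and
    must contain the single-processor run (key 1) as the baseline.
    """
    if 1 not in timings:
        raise ValueError("timings must include the single-processor run")
    baseline = timings[1]
    rows = []
    for procs in sorted(timings):
        speedup = baseline / timings[procs]
        rows.append((procs, speedup, speedup / procs))
    return rows


# Hypothetical elapsed times in seconds (NOT the values from the slide)
hypothetical = {1: 1200.0, 2: 640.0, 4: 360.0, 8: 240.0}
rows = speedup_table(hypothetical)
for procs, speedup, efficiency in rows:
    print(f"{procs:2d} processors: speed-up {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency that falls as processors are added indicates growing communication or serial overhead in the MPI run.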
47Comparison of analysis times for TOPALi
Comparison of TOPALi2 analysis times on CTV
sequences, DSS method, carried out on different
computational architectures.
48Summary and Conclusions
- TriGrid VL and PI2S2 are the first Grid projects
in Italy at a true regional level. One year after
its start, the TriGrid e-Infrastructure is now a
reality and a large portfolio of applications is
about to be deployed on it. The PI2S2
infrastructure has also been available for six
months.
- The process speed-up, together with the
integration of the whole phylogenetic analysis
into a coherent and easy-to-use framework, will lead
to remarkable progress in such investigations.
49A citation
Telephone, Light bulb, Telegraph, Radio, TV,
Computer, Network, PC, Web, (in the same order
as they were invented)
50- This way you can copy files from or to a remote
server; you can even copy files from one remote
server to another without passing through your PC.
- Usage
- scp user@from-host:source-file user@to-host:destination-file
- Description of options
- from-host
- The name or IP address of the host where the source
file is located. It can be omitted if the from-host is
the host where you are issuing the command.
- user
- For the from-host, the user who has the right to
access the file or directory to be copied; for the
to-host, the user who has the right to write there.
- source-file
- The file or files to be copied to the destination
host. It can be a directory, but in that case you
need to specify the -r option to copy the contents
of the directory.
- destination-file
- The name the copied file will take on the to-host.
If none is given, all copied files keep their names.
- scp *.txt user@remote.server.com:/home/user/
- This copies all files with the .txt extension to
the directory /home/user on the remote.server.com
host.
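When the copy has to be launched from a script, the scp command line described above can be assembled as an argument list. This is a minimal sketch under stated assumptions: the helper name `build_scp_command` and the file names are hypothetical, and only the scp syntax and the -r option come from the slide.

```python
def build_scp_command(sources, user, host, dest_path, recursive=False):
    """Build the argument list for: scp [-r] sources... user@host:dest_path"""
    if not sources:
        raise ValueError("at least one source file is required")
    command = ["scp"]
    if recursive:
        command.append("-r")  # needed when a source is a directory
    command.extend(sources)
    command.append(f"{user}@{host}:{dest_path}")
    return command


# Hypothetical local files. A shell pattern such as *.txt is not expanded
# when an argument list is used, so expand it first (e.g. with glob.glob).
command = build_scp_command(["a.txt", "b.txt"], "user",
                            "remote.server.com", "/home/user/")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where scp is installed and the remote host is reachable.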
51Any Questions ?
Thank you very much for your kind attention!
This work makes use of results produced by the
PI2S2 Project managed by the Consorzio COMETA, a
project co-funded by the Italian Ministry of
University and Research (MIUR) within the Piano
Operativo Nazionale Ricerca Scientifica,
Sviluppo Tecnologico, Alta Formazione (PON
2000-2006). More information is available at
http://www.pi2s2.it and http://www.consorzio-cometa.it