Porting of BioInformatic Tools for Plant Virology on a Computational Grid

Title: Porting of BioInformatic Tools for Plant Virology on a Computational Grid


1
Porting of Bio-Informatic Tools for Plant
Virology on a Computational Grid
  • Gaetano Lanzalone (1,3), Alessandro Lombardo (1,2),
    Annamaria Muoio (1), Marcello Iacono-Manno (1),
    Roberto Barbera (1,4)
  • (1) INFN Sezione di Catania and Consorzio COMETA,
    Catania, IT
  • (2) Dipartimento di Scienze e Tecnologie
    Fitosanitarie, Catania, IT
  • (3) INFN LNS, Catania, IT
  • (4) Dipartimento di Fisica, Università di Catania, IT
  • Catania, June 2008

2
Outline
  • 1 Introduction on the Biological problem.
  • 2 TriGrid and Cometa projects.
  • 3 Problem Solution by GENIUS.
  • 4 Results and Conclusions

3
  • 1 Brief introduction on the Biological problem.

4
WORLD-WIDE CITRUS PRODUCTION (FAO)
Orange production is estimated at around 66.4
million tons, of which 36.3 million tons is fresh
product and the remaining 30.1 million tons other product
5
Biological problem
TESTBEDS
  • CMV (Cucumber mosaic virus)
  • TYLCV (Tomato yellow leaf curl virus)
  • TYLCSV (Tomato yellow leaf curl Sardinia virus)
  • TSWV (Tomato spotted wilt virus)
  • CTV (Citrus tristeza virus)
6
  • Symptoms
  • Rapid decline and death of Citrus grafted on
    bitter orange (Citrus aurantium L.)
  • Stem pitting, reduced yields, poor fruit
    quality.
  • Yellowing of seedlings and leaves.
  • Low growth rate

Vectors: aphids (Toxoptera citricida, Aphis
gossypii)
7
CTV geographic distribution: Algeria, American
Samoa, Antigua and Barbuda, Argentina, Australia,
Belize, Bermuda, Bolivia, Brazil, Brunei
Darussalam, Cameroon, the Central African
Republic, Chad, China, Colombia, Costa Rica,
Cyprus, the Dominican Republic, Ecuador, Egypt,
El Salvador, Ethiopia, Fiji, French Polynesia,
Gabon, Ghana, Guyana, India, Indonesia, Iran,
Israel, Italy, Jamaica, Japan, Kenya, Korea
Republic, Malaysia, Mauritius, Morocco,
Mozambique, Nepal, Netherlands Antilles, New
Caledonia, New Zealand, Nicaragua, Nigeria,
Pakistan, Panama, Paraguay, Peru, the
Philippines, Portugal, Puerto Rico, Saudi Arabia,
Spain, Sri Lanka, Suriname, Taiwan, Tanzania,
Thailand, Trinidad and Tobago, Turkey, the USA,
Uganda, Uruguay, Venezuela, Vietnam, Zaire,
Zambia, Western Samoa, the former Yugoslavia,
Zimbabwe.
8
CTV (Citrus Tristeza Virus)
  • Particle dimensions: 2000 nm x 11 nm
  • Genome: single-stranded RNA, 19.3 kb
  • Genome organization: 12 Open Reading Frames,
    2 Untranslated Terminal Regions
  • Proteins produced: at least 19
  • Complete genomes in GenBank: 9
9
ZOOM
NUCLEOTIDE
10
ClustalW (Thompson et al., 1994)
MULTIPLE ALIGNMENT OF NUCLEOTIDE SEQUENCES
POINT MUTATIONS, INSERTIONS, DELETIONS,
TRANSVERSIONS, RECOMBINATIONS
Similarity Plot
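
A similarity plot of this kind can be illustrated with a small sliding-window calculation. The sketch below is not the tool used in the presentation: the function name `window_identity`, the window size, and the two toy sequences are assumptions made up for illustration. It only shows how the fraction of identical positions between two already aligned sequences can be computed window by window, with gap characters never counted as matches.

```python
def window_identity(seq_a, seq_b, window=5, step=1):
    """Return (window start, fraction of identical positions) pairs
    for two sequences taken from the same multiple alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    points = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # A column counts as a match only if both residues agree
        # and neither is a gap ("-").
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        points.append((start, matches / window))
    return points


# Toy aligned sequences (hypothetical, not CTV data).
ref = "ATGGCTAACG"
alt = "ATGGCTT-CG"
profile = window_identity(ref, alt, window=5, step=5)
print(profile)
```

A drop in the profile marks a region where the two sequences diverge, which is where point mutations, insertions or deletions are concentrated.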
11
PHYLOGENETIC TREES
12
LOCATION OF THE RECOMBINATION EVENTS
TOPALi V2.0 (BioSS - Biomathematics and Statistics
Scotland)
  • DSS (Difference of Sums of Squares, McGuire and
    Wright, 2000)
  • PDM (Probabilistic Divergence Measures, Husmeier
    and Wright, 2001)
  • HMM (Hidden Markov Model, Husmeier and McGuire,
    2003)
Analysis time for PDM on the CTV alignment on a
user PC (3.2 GHz): 44.2 h!
13
  • 2 TriGrid and Cometa projects.

14
The Sicilian Grid in one slide
  • 1500 CPUs
  • 250 TBytes
  • 15.000.000 in 3 years!
  • 300 FTEs! (2/3 newly hired staff)
15
Objectives of an e-Infrastructure in Sicily
  • Create a Virtual Laboratory in Sicily, both for
    scientific and industrial applications, built on
    top of a Grid infrastructure
  • Connect the Sicilian e-Infrastructure to those
    already existing in Italy, Europe and the rest of
    the world, improving scientific collaboration
    and increasing the competitiveness of e-Science
    and e-Industry made in Sicily
  • Disseminate the Grid paradigm through the
    organization of dedicated events and training
    courses
  • Trigger/foster the creation of spin-offs in the
    ICT area in order to reduce the brain drain of
    brilliant young people to other parts of Italy
    and beyond

16
The TriGrid e-Infrastructure
  • 288 cores AMD Opteron 280
  • 400 GB of memory
  • LSF 6.1 HPC everywhere
  • Infiniband-1X at INAF-OACT and CECUM for HPC
    apps.
  • 57 TB of raw disk storage FC-2-SATA
  • Distributed/parallel GPFS filesystem

17
Lay-out of large sites
Site 1
(expansion w.r.t. TriGrid)
Site 3
Site 6
Padova, V Workshop INFN Grid, 18.12.2006
18
Computing, Networking, and Storage (2/3)
  • 8 IBM BladeCenter H enclosures
  • 84 IBM LS21 blades
  • 336 cores AMD Opteron 2218 rev. F
  • 772 GB of RAM (2 GB/core)
  • 0.55 MSpecInt2000
  • 0.66 MSpecFP2000
  • More than 6 kSpec(Int/FP)Rate
  • 48.8 mW/SpecInt2000 at full load !
  • G-Ethernet service network
  • CISCO Topspin Infiniband-4X additional
    low-latency network for HPC applications
  • LSF 6.1 HPC included !

19
Computing, Networking, and Storage (3/3)
  • 4 IBM DS4200 Storage Systems (sites 1, 2, 3, and
    6)
  • FC-2-SATA technology
  • 136 500-GB disks
  • 68 TB of storage (raw) in total
  • Expandability up to 0.45 PB
  • GPFS distributed/parallel file system included !

20
  • 3 Problem Solution by GENIUS.

21
  • Command line interface
  • Expert User → long time before start
  • Web interface
  • Dummy User → immediate start
  • We need
  • Friendly User Interface → ENGINFRAME (GENIUS)

22
A web portal: why and how?
  • It can be accessed from everywhere and by
    everything (desktop, laptop, PDA, WAP phone).
  • It can keep the same user interface to several
    back-ends (grid dialects ? command-line UIs).
  • It must be secure at all levels:
  • 1) secure web transactions,
  • 2) secure user authentication,
  • 3) trustworthiness at VO level.
  • All available Grid services must be incorporated
    in a logical way, just one mouse click away.
  • Its layout must be easily understandable and user
    friendly.

23
EnginFrame in brief
  • Standards-based GRID portal
  • Java, Tomcat, XML/XSL GridML
  • Solves back-end integration problems
  • Visual rendering for most Grid objects
  • jobs, job arrays, hosts, etc.
  • Multiple Grid technologies support
  • Globus, LSF, SGE, LoadLeveler, PBS, even OS!
  • Authentication delegation
  • Data management, UL/DL remote file browsing
  • Integration with interactive applications, tools,

24
EnginFrame workflow
Application Servers
Interactive applications
Web Server
Clients
EnginFrame Server
Standard Web Browser
Grid / Compute Farm
25

GENIUS web portal
Applications specific layer
ALICE
ATLAS
CMS
Bio apps
High level GRID middleware
EGEE architecture
Basic Services
GLOBUS toolkit
OS Net services
26
Porting of Bio-Informatic Tools for Plant Virology
  • Applications
  • ClustalW, TOPALi, SplitsTree and Knetfold.
  • ClustalW, a data analysis program for multiple
    alignments, runs as an MPI job on the Grid
  • TOPALi and SplitsTree run as interactive
    jobs on the Grid
  • Knetfold application runs as a parametric job.
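
The idea behind a parametric job, one job description expanded into many runs that differ only in a parameter value, can be sketched in a few lines. This is an illustration under stated assumptions, not the job description used in the presentation: the placeholder string `_PARAM_`, the function `expand_parametric`, and the command `run_analysis` with its `--input` flag are hypothetical names and do not reflect the real Knetfold command line.

```python
def expand_parametric(template, start, stop, step):
    """Expand one command template into a list of command lines,
    one per parameter value in range(start, stop, step).

    Every occurrence of the placeholder "_PARAM_" (an assumption of
    this sketch) is replaced by the current parameter value.
    """
    return [template.replace("_PARAM_", str(value))
            for value in range(start, stop, step)]


# Hypothetical executable and file naming, for illustration only.
commands = expand_parametric("run_analysis --input seq-_PARAM_.aln", 1, 4, 1)
for line in commands:
    print(line)
```

Each generated line corresponds to one independent sub-job, so the runs can be scheduled on different worker nodes at the same time.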

27
PROGRAMS WORK FLOW
28
JDL
XML
DEVELOPER
USER
29
JDL
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
2
40
(No Transcript)
41
TOPALi
input
42
TOPALi
output
43
SplitsTree
input.aln
output.jpg
44
SplitsTree
algorithm bootstrapping
45
  • 4 Results and Conclusions

46
Sequence alignment time with ClustalW
Elapsed time of the ClustalW-MPI runs on 9
complete Citrus Tristeza Virus genomes as a
function of the number of processors.
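
Elapsed times measured as a function of the number of processors are usually summarised as speed-up and parallel efficiency. The sketch below shows those two standard definitions only. The function name `speedup_and_efficiency` is an assumption, and the timings in the example are made-up numbers, not the results of this presentation.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speed-up, efficiency) for one parallel run.

    speed-up   = t_serial / t_parallel
    efficiency = speed-up / n_procs
    """
    if t_serial <= 0 or t_parallel <= 0 or n_procs <= 0:
        raise ValueError("times and processor count must be positive")
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs


# Hypothetical timings (seconds), chosen only to illustrate the formulas.
s, e = speedup_and_efficiency(t_serial=100.0, t_parallel=25.0, n_procs=8)
print(f"speed-up {s:.1f}x, efficiency {e:.0%}")
```

An efficiency well below 1 indicates that adding processors no longer pays off, typically because communication or the serial part of the alignment dominates.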
47
Time comparison for TOPALi
Comparison of TOPALi2 analysis times on CTV
sequences, DSS method, carried out on different
computational architectures.
48
Summary and Conclusions
  • TriGrid VL and PI2S2 are the first Grid projects
    in Italy at a true regional level. One year
    after its start, the e-Infrastructure of
    TriGrid is now a reality and a big portfolio of
    applications is about to be deployed on it. The
    PI2S2 Infrastructure has also been available
    for six months.
  • The process speed-up, together with the
    integration of the whole phylogenetic analysis
    into a coherent and easy-to-use frame, will lead
    to remarkable progress in such investigations.

49
A citation
Telephone, Light bulb, Telegraph, Radio, TV,
Computer, Network, PC, Web, (in the same order
as they were invented)
50
  • In this way you can copy files from or to a
    remote server; you can even copy files from one
    remote server to another without passing
    through your PC.
  • Usage
  • scp user@from-host:source-file
    user@to-host:destination-file
  • Description of options
  • from-host
  • The name or IP address of the host where the
    source file is located. It can be omitted if the
    from-host is the host where you are issuing the
    command.
  • user
  • On the from-host, the user who has the right to
    access the file or directory to be copied; on
    the to-host, the user who has the right to
    write.
  • source-file
  • The file or files to be copied to the
    destination host. It can be a directory, in
    which case you need to specify the -r option to
    copy its contents.
  • destination-file
  • The name the copied file will take on the
    to-host. If none is given, all copied files
    keep their names.
  • scp *.txt user@remote.server.com:/home/user/
  • This will copy all files with the .txt extension
    to the directory /home/user on the
    remote.server.com host.
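
When such a copy is driven from a script rather than typed by hand, the same command line can be assembled programmatically. The sketch below only builds and prints the argument list; it does not run scp. The helper name `build_scp_command` and the file names are assumptions for illustration, while the `-r` option and the `user@host:path` form are the ones described above.

```python
import shlex


def build_scp_command(sources, user, host, dest_path, recursive=False):
    """Build the argument list for copying local files to a remote host.

    The result has the form:
        scp [-r] source... user@host:dest_path
    """
    if not sources:
        raise ValueError("at least one source file is required")
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # needed when a source is a directory
    cmd.extend(sources)
    cmd.append(f"{user}@{host}:{dest_path}")
    return cmd


# Hypothetical file names; the host and path are those of the example above.
cmd = build_scp_command(["a.txt", "b.txt"], "user",
                        "remote.server.com", "/home/user/")
print(shlex.join(cmd))
```

In the shell, `*.txt` is expanded before scp starts. With an argument list no shell is involved, so the file names have to be expanded explicitly, for example with Python's `glob.glob("*.txt")`, and the list could then be passed to `subprocess.run`.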

51
Any Questions ?
Thank you very much for your kind attention!
This work makes use of results produced by the
PI2S2 Project managed by the Consorzio COMETA, a
project co-funded by the Italian Ministry of
University and Research (MIUR) within the Piano
Operativo Nazionale Ricerca Scientifica,
Sviluppo Tecnologico, Alta Formazione (PON
2000-2006). More information is available at
http://www.pi2s2.it and http://www.consorzio-cometa.it