Interoperability Achieved by GADU in Using Multiple Grids: OSG, TeraGrid and ANL Jazz

1
Interoperability Achieved by GADU in Using
Multiple Grids: OSG, TeraGrid and ANL Jazz
Mathematics and Computer Science Division,
Argonne National Laboratory
Computational Institute, University of Chicago
Presented by Dinanath Sulakhe
2
GADU Applications
It's all about comparative analysis
  • Insights in biology are gained by comparative
    analysis.
  • Unknown genes are compared against known ones.
  • Similar genes tend to perform the same functions.
  • Comparative analysis shows what is the same and
    what differs between two strains of an organism.
  • Example: what differs between an organism living
    at boiling temperatures, such as 108 °C, and one
    living in extreme freezing conditions?
  • Differences between pathogenic and non-pathogenic
    organisms.
  • Mycobacterium tuberculosis, the pathogen that
    causes TB, differs by only 12 genes from the
    non-pathogenic BCG used as a vaccine against TB.

Tools: BLAST, Blocks, Chisel, InterPro, etc. An
embarrassingly parallel workload.
3
GADU's Evolution
  • GADU just evolved into what it is today.
  • Chiba City at Argonne.
  • Jazz Cluster at Argonne.
  • Grid2003 to OSG.
  • TeraGrid.
  • All of them together.

4
Some Results and Highlights
Status  Site Name       Site Test     MaxNodes  Gridcat
        ASGC_OSG        18            199       Pass
        FNAL_FERMIGRID  12            12        Pass
        FNAL_GPFARM     266           749       Pass
        GRASE-CCR-U2    114           2112      Pass
        Nebraska        FAIL_TIMEOUT  252       Pass
        OSG_LIGO_PSU    28            312       Pass
        Purdue-ITaP     13            1224      Pass
        Purdue-Physics  14            63        Pass
        STAR-BNL        FAIL_TIMEOUT  672       Pass
        UFlorida-PG     279           268       Pass
        UMATLAS         FAIL_TIMEOUT  771       Pass
        UTA_DPCC        18            154       Inactive
        UWMadisonCMS    FAIL_TIMEOUT  90        Pass
        grow-UNI-P      FAIL_TIMEOUT  17        Pass
        TG_UC           44            316       NONE
        TG_NCSA         55            1000      NONE
        TG_PURDUE       FAIL_FTP      1024      NONE
  • GADU can successfully use OSG and TeraGrid
    resources simultaneously.
  • Individual clusters such as ANL Jazz are also
    used in parallel.
  • Site selection and scheduling work across
    multiple grids.
  • A new site can easily be added to the pool of
    sites.

Last run (last week): 38830 BLAST jobs, 70% on
OSG and 30% on TeraGrid.
5
Grid Resources
Open Science Grid and TeraGrid.
  • Authentication
      • OSG
          • OSG GADU VOMS server.
          • DOE Grid certificates are picked up
            automatically by the sites.
      • TeraGrid
          • Individual accounts via allocations.
          • DOE Grid certificates are added manually
            to each site (gx-map).
  • Application deployment
      • OSG
          • The OSG variables OSG_APP and OSG_DATA are
            used to install GADU's applications and
            pre-stage databases such as NR.
      • TeraGrid
          • GADU has a community space available on
            each of the sites. Applications are
            installed within this community space.
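
The deployment convention on this slide can be sketched as a small lookup from grid type to the environment variable that names the shared application area. This is a minimal sketch, not GADU's own code: the function name `install_root`, the `gadu` subdirectory, the example paths, and the use of `TG_COMMUNITY` as the variable for the TeraGrid community space are assumptions for illustration. Only `OSG_APP`, `OSG_DATA` and `TG_COMMUNITY` are names that appear in the slides.

```python
import os
from pathlib import PurePosixPath

# Hypothetical mapping from grid type to the environment variable that
# names the shared application area on a site.
APP_ROOT_VARIABLE = {
    "OSG": "OSG_APP",
    "TeraGrid": "TG_COMMUNITY",
}


def install_root(grid, env=None, subdir="gadu"):
    """Return the directory under which applications would be installed.

    `env` is the site's environment as a dict (defaults to os.environ).
    Raises KeyError if the grid type or the variable is not known.
    """
    env = os.environ if env is None else env
    variable = APP_ROOT_VARIABLE[grid]
    return PurePosixPath(env[variable]) / subdir


if __name__ == "__main__":
    # Example values only; real sites publish their own paths.
    site_env = {"OSG_APP": "/share/app", "OSG_DATA": "/share/data"}
    print(install_root("OSG", site_env))
```

Keeping the variable name in a table means a new grid type needs one new entry rather than a new code path.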

6
Resource Independent GADU.
GADU uses Pegasus-based VDS and Condor-G.
GADU's automated analysis server expresses,
executes and tracks scientific workflows on the
Grid.
[Architecture diagram. Labels: abstract workflow
as VDL; Pegasus with tc.data, pool.config and
information services; Condor submit files; DAGMan
and Condor-G on the submit host; Globus GRAM
interface; remote resources, each with a
gatekeeper/JobManager, a job management system
and worker nodes (WN).]
7
Resource Independent GADU.
GADU uses Pegasus based VDS and Condor-G
    TR FileBreaker( input filename, none nodes,
                    output sequences[], none species ) {
      argument = ${species};
      argument = ${filename};
      argument = ${nodes};
      profile globus.maxwalltime = "300";
    }

    TR BLAST( none OutPre, none evalue,
              input query, none type ) {
      argument = ${OutPre};
      argument = ${evalue};
      profile globus.maxwalltime = "300";
    }

    DV jobNo_1_1separator->FileBreaker(
      filename=@{input:"inputfile.1"|rt},
      nodes="5",
      sequences=[ @{output:"job1.0":"tmp"},
                  @{output:"job1.1":"tmp"},
                  @{output:"job1.2":"tmp"},
                  @{output:"job1.3":"tmp"},
                  @{output:"job1.4":"tmp"} ],
      species="Aeropyrum_Pernix"
    );

VDL for BLAST workflow
The Workflow Generator in GADU is responsible for
producing a workflow suitable for execution in
the Grid environment. This task is accomplished
through the use of the virtual data language
(VDL). Once the VDL for the workflow is written,
VDS converts it into Condor submit files and a
DAG that can be submitted to the site selected by
the site selector.
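
The slide does not show what `FileBreaker` does internally. Its signature (one input file, a node count, several output sequence files) suggests that it splits the input sequences into one chunk per job. The sketch below assumes FASTA input and a round-robin split; both are assumptions, and `read_fasta` and `split_round_robin` are hypothetical names, not part of GADU or VDS.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_round_robin(records, nodes):
    """Distribute records over `nodes` buckets, one bucket per job."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    buckets = [[] for _ in range(nodes)]
    for index, record in enumerate(records):
        buckets[index % nodes].append(record)
    return buckets


if __name__ == "__main__":
    fasta = [">g1", "ATGC", "ATGCA", ">g2", "GGCC", ">g3", "TTAA"]
    for job, bucket in enumerate(split_round_robin(read_fasta(fasta), 2)):
        print(job, [name for name, _ in bucket])
```

Because each chunk can be searched independently, every bucket maps onto one BLAST job, which is what makes the workload embarrassingly parallel.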
8
Resource Independent GADU.
[Diagram labels: 1000 sequences; ATGCATGCA;
4 million sequences]
9
Resource Independent GADU.
Representing a site and the applications on it:
pool.config and tc.data
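
The two files play different roles: pool.config describes each site, and tc.data records where each application lives on each site. The sketch below models that split in memory. It is a minimal illustration under stated assumptions: the class names, field names, host names and paths are hypothetical and do not reproduce the actual pool.config or tc.data file formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical stand-in for one pool.config entry."""
    name: str
    gatekeeper: str   # Globus GRAM contact string for job submission
    gridftp: str      # GridFTP server used for staging files
    workdir: str      # scratch directory for jobs on the site


class TransformationCatalog:
    """Hypothetical stand-in for tc.data: (site, transformation) -> path."""

    def __init__(self):
        self._entries = {}

    def add(self, site, transformation, path):
        self._entries[(site, transformation)] = path

    def lookup(self, site, transformation):
        """Return the installed path, or None if not installed there."""
        return self._entries.get((site, transformation))

    def sites_with(self, transformation):
        """Return the sorted names of sites that provide a transformation."""
        return sorted(s for (s, t) in self._entries if t == transformation)


if __name__ == "__main__":
    site = Site("TG_UC", "gk.example.org/jobmanager-pbs",
                "gsiftp://ftp.example.org", "/scratch/gadu")
    catalog = TransformationCatalog()
    catalog.add(site.name, "BLAST", "/community/gadu/bin/blastall")
    print(catalog.sites_with("BLAST"))
```

Adding a new site then amounts to one new site record plus one catalog entry per installed application, with no change to the abstract workflow.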
10
Resource Independent GADU.
GADU uses Pegasus-based VDS and Condor-G.
[Architecture diagram, repeated from slide 6.]
11
Requirements: Information Services
A VDS-like system can provide an
architecture-independent mechanism for using
different sites (Grids).
To add a new Grid site automatically, we need
information about the site.
  • Information services at various levels.
  • Authentication: to check whether the certs are
    valid at this site.
  • Architecture: is it an ia-32 cluster or an
    ia-64 one?
  • Gatekeeper, GridFTP server.
  • Environment variables: OSG_APP, TG_COMMUNITY.
  • Number of CPUs.
  • Number of used CPUs.
  • Number of idle CPUs.
  • VO (user) specific jobs running at a given site.
  • VO (user) specific jobs sitting in the queue at
    a given site (why?).
  • We need standards and protocols for these
    information services, and we need to identify
    further information variables that the Grids
    should publish.
  • GridCat or MDS or something else.
  • Currently GADU uses GridCat to collect
    site-specific information for OSG and adds the
    information for TeraGrid and Jazz manually. We
    are working on an MDS-based information
    interface on TeraGrid.
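
The list above amounts to a per-site record that an information service would need to fill in before a site can be added automatically. The sketch below is one way to hold such a record and report what is still missing. The class and field names are illustrative assumptions, not a GridCat or MDS schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SiteInfo:
    """Per-site information; fields mirror the list on the slide."""
    name: str
    cert_valid: Optional[bool] = None
    architecture: Optional[str] = None    # e.g. "ia-32" or "ia-64"
    gatekeeper: Optional[str] = None
    gridftp_server: Optional[str] = None
    app_variable: Optional[str] = None    # e.g. "OSG_APP" or "TG_COMMUNITY"
    total_cpus: Optional[int] = None
    used_cpus: Optional[int] = None
    vo_jobs_running: Optional[int] = None
    vo_jobs_queued: Optional[int] = None

    def missing(self):
        """Names of fields that no information service has supplied."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def idle_cpus(self):
        """Idle CPUs derived from total and used counts, if both are known."""
        if self.total_cpus is None or self.used_cpus is None:
            return None
        return max(self.total_cpus - self.used_cpus, 0)


if __name__ == "__main__":
    info = SiteInfo(name="example_site", total_cpus=100, used_cpus=60)
    print(info.idle_cpus(), info.missing())
```

A site whose `missing()` list is empty could be added to the pool without manual steps, which is the situation the slide describes for OSG but not yet for TeraGrid and Jazz.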

12
Another Big Challenge: Site Selection
GADU has access to 60 OSG sites and 5 TeraGrid
sites.
One challenge in using the Grid reliably for
high-throughput analysis is monitoring the state
of all Grid sites and how well they have
performed for job requests from a given submit
host.
We view a site as available if our submit host
can communicate with it, if it is responding to
Globus job-submission commands, and if it will
run our jobs promptly, with minimal queuing
delays.
[Diagram labels: OSG; GADU Server]
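
The three availability criteria above can be written as a simple filter. This is a minimal sketch, not GADU's site selector: the `SiteStatus` fields, the 600-second threshold standing in for "minimal queuing delays", and the ranking by shortest delay are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    name: str
    reachable: bool        # the submit host can communicate with the site
    accepts_jobs: bool     # the site responds to Globus job submission
    queue_delay_s: float   # recently observed queuing delay, in seconds


def available_sites(statuses, max_queue_delay_s=600.0):
    """Return names of available sites, shortest queuing delay first."""
    usable = [
        s for s in statuses
        if s.reachable
        and s.accepts_jobs
        and s.queue_delay_s <= max_queue_delay_s
    ]
    return [s.name for s in sorted(usable, key=lambda s: s.queue_delay_s)]


if __name__ == "__main__":
    observed = [
        SiteStatus("site_a", True, True, 120.0),
        SiteStatus("site_b", True, False, 10.0),
        SiteStatus("site_c", True, True, 30.0),
    ]
    print(available_sites(observed))
```

Because the statuses are measured from one submit host, the same site may rank differently for another host, which matches the slide's point that performance is judged per submit host.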
13
Another Big Challenge: Site Selection
GADU has access to 60 OSG sites and 5 TeraGrid
sites.
Web interface to control the selection of sites
for GADU:
http://compbio.mcs.anl.gov/sulakhe/cgi-bin/site_selection_new.pl?user=dina
Web interface showing live status of usage:
http://compbio.mcs.anl.gov/gaduvo/gadu_jobs.cgi
Grid may not worry about this
14
Next Steps
  • Working with the TeraGrid Information Services
    group on an MDS-based interface.
  • Continue to improve GADU's implementation of
    site selection.
  • Trying to generalize site selection using
    information services such as MDS and GridCat.
  • Continue to deploy faster scientific
    applications for the Bioinformatics Group at
    Argonne.

15
Acknowledgements
  • Globus and VDS
  • Mike Wilde
  • Nika Nefedova
  • Jens Voeckler
  • Ian Foster
  • Rick Stevens
  • VDT Support.
  • Condor Support.
  • Systems at MCS.
  • Bioinformatics Group
  • Natalia Maltsev, PI
  • Alex Rodriguez
  • Elizabeth Glass
  • Mark D'Souza
  • Mustafa Syed
  • Yi Zhang
  • Open Science Grid
  • Thanks to Ruth Pordes and OSG team for their
    wonderful support
  • TeraGrid
  • Charlie Catlett
  • Special thanks to David O'Neal, Joseph Insley,
    and Sergiu Sanielevici