1
Workflow Optimization and Sharing
  • Ewa Deelman
  • USC Information Sciences Institute
  • presented by
  • Rizos Sakellariou, University of Manchester

2
Acknowledgments
  • Pegasus: Gaurang Mehta, Mei-Hui Su, Karan Vahi
    (developers), Nandita Mandal, Arun Ramakrishnan,
    Tsai-Ming Tseng (students)
  • DAGMan: Miron Livny and the Condor team
  • Other Collaborators: Yolanda Gil, Jihie Kim,
    Varun Ratnakar (Wings System), Henan Zhao, Rizos
    Sakellariou
  • LIGO: Kent Blackburn, Duncan Brown, Stephen
    Fairhurst, David Meyers
  • Montage: Bruce Berriman, John Good, Dan Katz, and
    Joe Jacobs
  • SCEC: Tom Jordan, Robert Graves, Phil Maechling,
    David Okaya, Li Zhao

3
Scientific (Computational) Workflows
  • Enable the assembly of community codes into
    large-scale analyses
  • Montage example: generating science-grade mosaics
    of the sky (Bruce Berriman, Caltech)

4
Pegasus and Condor DAGMan
  • Automatically map high-level, resource-independent
    workflow descriptions onto distributed resources
    such as the Open Science Grid and the TeraGrid
    (a sketch of such a description follows this list)
  • Improve performance of applications through
    • Data reuse to avoid duplicate computations and
      provide reliability
    • Workflow restructuring to improve resource
      allocation
    • Automated task and data transfer scheduling to
      improve overall runtime
  • Provide reliability through dynamic workflow
    remapping and execution
  • Pegasus and DAGMan applications include LIGO's
    Binary Inspiral Analysis, NVO's Montage, SCEC's
    CyberShake simulations, Neuroscience, Artificial
    Intelligence, Genomics (GADU), and others
  • Workflows with thousands of tasks and terabytes
    of data
  • Use Condor and Globus to provide the middleware
    for distributed environments
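The following is a minimal, illustrative sketch of what a resource-independent workflow description amounts to: tasks named only by the logical files they read and write, with no site, host, or path information. The Task and Workflow classes and the task names are hypothetical and are not the Pegasus DAX API; a planner such as Pegasus takes a description of this kind and produces the executable workflow.

    # Minimal sketch of an abstract (resource-independent) workflow.
    # Hypothetical classes and task names, not the Pegasus DAX format.
    from dataclasses import dataclass, field

    @dataclass
    class Task:
        name: str
        inputs: list = field(default_factory=list)
        outputs: list = field(default_factory=list)

    @dataclass
    class Workflow:
        tasks: list = field(default_factory=list)

        def dependencies(self):
            """Derive edges from logical-file producer/consumer pairs."""
            producer = {f: t.name for t in self.tasks for f in t.outputs}
            return [(producer[f], t.name)
                    for t in self.tasks for f in t.inputs if f in producer]

    # A toy two-stage analysis: project two images, then combine them.
    wf = Workflow(tasks=[
        Task("project_1", inputs=["raw_1.fits"], outputs=["proj_1.fits"]),
        Task("project_2", inputs=["raw_2.fits"], outputs=["proj_2.fits"]),
        Task("combine", inputs=["proj_1.fits", "proj_2.fits"],
             outputs=["mosaic.fits"]),
    ])
    print(wf.dependencies())  # [('project_1', 'combine'), ('project_2', 'combine')]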

5
Pegasus Workflow Mapping
Original workflow: 15 compute nodes, devoid of
resource assignment
6
Typical Pegasus and DAGMan Deployment
7
Scalability
SCEC workflows run each week using Pegasus and
DAGMan on the TeraGrid and USC resources.
Cumulatively, the workflows consisted of over
half a million tasks and used over 2.5 CPU Years.
Managing Large-Scale Workflow Execution from
Resource Provisioning to Provenance Tracking: The
CyberShake Example, Ewa Deelman, Scott Callaghan,
Edward Field, Hunter Francoeur, Robert Graves,
Nitin Gupta, Vipin Gupta, Thomas H. Jordan, Carl
Kesselman, Philip Maechling, John Mehringer,
Gaurang Mehta, David Okaya, Karan Vahi, Li Zhao,
e-Science 2006, Amsterdam, December 4-6, 2006
(best paper award)
8
Montage application: 7,000 compute jobs in the
instance, 10,000 nodes in the executable workflow,
the same number of clusters as processors, and a
speedup of 15 on 32 processors.
Performance optimization through workflow
restructuring (a clustering sketch follows the
citation below).
Small 1,200-node Montage workflow
Pegasus: A Framework for Mapping Complex
Scientific Workflows onto Distributed Systems,
Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James
Blythe, Yolanda Gil, Carl Kesselman, Gaurang
Mehta, Karan Vahi, G. Bruce Berriman, John Good,
Anastasia Laity, Joseph C. Jacob, Daniel S. Katz,
Scientific Programming Journal, Volume 13, Number
3, 2005
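The restructuring referred to above groups many short tasks into as many clusters as there are processors, so per-task scheduling overhead is amortized. A minimal sketch of that idea, assuming simple round-robin clustering of one workflow level (the actual Pegasus clustering strategies may differ):

    # Minimal sketch of level-based task clustering: tasks that can run
    # in parallel are grouped into as many clusters as there are
    # processors. Illustrative only, not the Pegasus clustering code.
    def cluster_level(tasks, num_processors):
        """Round-robin the tasks of one workflow level into clusters."""
        clusters = [[] for _ in range(min(num_processors, len(tasks)))]
        for i, task in enumerate(tasks):
            clusters[i % len(clusters)].append(task)
        return clusters

    level_tasks = [f"mProject_{i}" for i in range(128)]  # hypothetical task names
    clusters = cluster_level(level_tasks, num_processors=32)
    print(len(clusters), "clusters of", len(clusters[0]), "tasks each")
    # 32 clusters of 4 tasks each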
9
Data Reuse
  • Sometimes it is cheaper to access the data than
    to regenerate it
  • Keeping track of data as it is generated supports
    workflow-level checkpointing (a pruning sketch
    follows the citation below)

Mapping Complex Workflows Onto Grid Environments,
E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G.
Mehta, K. Vahi, K. Blackburn, A. Lazzarini, A.
Arbree, R. Cavanaugh, S. Koranda, Journal of Grid
Computing, Vol. 1, No. 1, 2003, pp. 25-39
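In this setting, data reuse amounts to pruning jobs whose outputs already exist (for example, are registered in a replica catalog) and letting their consumers read the existing files. A minimal sketch, with a hypothetical existing_files set standing in for the catalog lookup; the actual Pegasus workflow-reduction algorithm is more involved:

    # Minimal sketch of data reuse: skip tasks whose outputs are already
    # available, so the workflow restarts from the last materialized data
    # (a form of workflow-level checkpointing). Hypothetical structures.
    def reduce_workflow(tasks, existing_files):
        """tasks: list of dicts with 'name', 'inputs', 'outputs' keys."""
        reduced = []
        for task in tasks:
            if all(f in existing_files for f in task["outputs"]):
                continue  # all outputs exist, so the task can be skipped
            reduced.append(task)
        return reduced

    tasks = [
        {"name": "tmpltbank", "inputs": ["frame.gwf"], "outputs": ["bank.xml"]},
        {"name": "inspiral", "inputs": ["bank.xml"], "outputs": ["triggers.xml"]},
    ]
    print([t["name"] for t in reduce_workflow(tasks, existing_files={"bank.xml"})])
    # ['inspiral']  -- tmpltbank is reused rather than rerun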
10
Data Reuse
  • Share the full version of the workflow?
  • or
  • Share a shorter version with data files?

Mapping Complex Workflows Onto Grid Environments,
E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G.
Mehta, K. Vahi, K. Blackburn, A. Lazzarini, A.
Arbree, R. Cavanaugh, S. Koranda, Journal of Grid
Computing, Vol. 1, No. 1, 2003, pp. 25-39
11
Efficient data handling
  • Workflow input data is staged dynamically and new
    data products are generated during execution
  • For large workflows: 10,000 input files
    (and a similar number of intermediate/output files)
  • If there is not enough space, failures occur
  • Solution:
    • Determine which data are no longer needed and
      when
    • Add nodes to the workflow to clean up data along
      the way (a sketch follows this list)
    • Take into account disk space on resources
  • Benefits: simulations show up to 57% space
    improvement for LIGO-like workflows
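A minimal sketch of the cleanup idea: a file can be deleted once every task that reads it has finished, so one cleanup job is added per file, depending on all of its consumers. Illustrative only; the algorithm in the CCGrid 2007 paper cited below is more involved and also respects per-resource disk-space limits.

    # Minimal sketch of cleanup-node placement. Hypothetical structures.
    def add_cleanup_jobs(tasks):
        """tasks: dict name -> {'inputs': [...], 'outputs': [...]}.
        Returns cleanup jobs as (file, [tasks that must finish first])."""
        consumers = {}
        for name, t in tasks.items():
            for f in t["inputs"]:
                consumers.setdefault(f, []).append(name)
        return [(f, users) for f, users in consumers.items()]

    tasks = {
        "project": {"inputs": ["raw.fits"], "outputs": ["proj.fits"]},
        "combine": {"inputs": ["proj.fits"], "outputs": ["mosaic.fits"]},
    }
    for f, after in add_cleanup_jobs(tasks):
        print(f"cleanup of {f} runs after {after}")
    # cleanup of raw.fits runs after ['project']
    # cleanup of proj.fits runs after ['combine']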

Scheduling Data-Intensive Workflows onto
Storage-Constrained Distributed Resources, A.
Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R.
Sakellariou, K. Vahi, K. Blackburn, D. Meyers,
and M. Samidi, accepted to CCGrid 2007
12
44% improvement in footprint for a Montage workflow
running on OSG
13
Efficient data handling
  • Sharing a workflow with nodes that clean up data
    is resource-independent.
  • Taking into account space constraints on
    resources is resource-dependent. Sharing the
    workflow would also require information about the
    specific resource constraints.

Scheduling Data-Intensive Workflows onto
Storage-Constrained Distributed Resources, A.
Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R.
Sakellariou, K. Vahi, K. Blackburn, D. Meyers,
and M. Samidi, accepted to CCGrid 2007
14
LIGO Inspiral Analysis Workflow. Small workflow:
164 nodes. Full-scale analysis: 185,000 nodes and
466,000 edges, 10 TB of input data, and 1 TB of
output data.
LIGO workflow running on OSG
Optimizing Workflow Data Footprint, G. Singh, K.
Vahi, A. Ramakrishnan, G. Mehta, E. Deelman, H.
Zhao, R. Sakellariou, K. Blackburn, D. Brown, S.
Fairhurst, D. Meyers, G. B. Berriman, J. Good,
D. S. Katz, submitted.
15
LIGO Workflows
26% improvement in disk space usage, 50% slower
runtime
16
LIGO Workflows
56% improvement in space usage, 3 times slower
runtime
Looking into new DAGMan capabilities for workflow
node prioritization. Automated techniques are
needed to determine priorities (one possible
heuristic is sketched below).
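One possible automated heuristic (an illustrative assumption, not a technique stated on this slide) is to prioritize a ready node by how much data its completion lets the cleanup jobs release, so disk space is freed as early as possible:

    # Hypothetical footprint-driven priority: bytes freed if this task
    # is the last remaining consumer of each of its input files.
    def priority(task, file_sizes, remaining_consumers):
        return sum(file_sizes[f] for f in task["inputs"]
                   if remaining_consumers[f] == 1)

    file_sizes = {"h1.gwf": 8_000_000_000, "bank.xml": 2_000_000}
    remaining_consumers = {"h1.gwf": 1, "bank.xml": 3}
    task = {"name": "inspiral", "inputs": ["h1.gwf", "bank.xml"]}
    print(priority(task, file_sizes, remaining_consumers))  # 8000000000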
17
Aggressive Optimizations for Workflow Footprint
  • They will affect the performance of the
    executable workflow

Optimizing Workflow Data Footprint, G. Singh, K.
Vahi, A. Ramakrishnan, G. Mehta, E. Deelman, H.
Zhao, R. Sakellariou, K. Blackburn, D. Brown, S.
Fairhurst, D. Meyers, G. B. Berriman, J. Good,
D. S. Katz, submitted.
18
  • What information related to optimizations do we
    need to keep track of for efficient workflow
    sharing?

19
What do Pegasus and DAGMan do for an application?
  • Provide a level of abstraction above GridFTP,
    condor_submit, globus-job-run, and similar
    commands
  • Provide automated mapping and execution of
    workflow applications onto distributed resources
  • Manage data files; can store and catalog
    intermediate and final data products
  • Improve successful application execution
  • Improve application performance
  • Provide provenance tracking capabilities
  • Provide a Grid-aware workflow management tool

20
Relevant Links
  • Pegasus: pegasus.isi.edu
    • Currently released as part of VDS and VDT
    • Standalone Pegasus distribution v2.0 coming out
      in May 2007; will remain part of VDT
  • DAGMan: www.cs.wisc.edu/condor/dagman
  • NSF Workshop on Challenges of Scientific
    Workflows: www.isi.edu/nsf-workflows06, E.
    Deelman and Y. Gil (chairs)
  • Workflows for e-Science, Taylor, I.J., Deelman,
    E., Gannon, D.B., Shields, M. (Eds.), Dec. 2006
  • Open Science Grid: www.opensciencegrid.org
  • LIGO: www.ligo.caltech.edu/
  • SCEC: www.scec.org
  • Montage: montage.ipac.caltech.edu/
  • Condor: www.cs.wisc.edu/condor/
  • Globus: www.globus.org
  • TeraGrid: www.teragrid.org

Ewa Deelman, deelman@isi.edu, www.isi.edu/deelman,
pegasus.isi.edu