Title: On Large Data-Flow Scientific Workflows: An Astrophysics Case Study of the Integration of Heterogeneous Datasets Using Scientific Workflow Engineering

1 On Large Data-Flow Scientific Workflows: An Astrophysics Case Study of the Integration of Heterogeneous Datasets Using Scientific Workflow Engineering
2 Team (Scientific Process Automation - SPA)
- Sangeeta Bhagwanani (MS student, GUI interfaces)
- John Blondin (NCSU Faculty, TSI PI)
- Zhengang Cheng (PhD student, services, V&V)
- Dan Colonnese (MS student, graduated; workflow grid and reliability issues)
- Ruben Lobo (PhD student, packaging)
- Pierre Moualem (MS student, fault-tolerance)
- Jason Kekas (PhD student, Technical Support)
- Phoemphun Oothongsap (NCSU, Postdoc, high-throughput flows)
- Elliot Peele (NCSU, Technical Support)
- Mladen A. Vouk (NCSU faculty, SPA PI)
- Brent Marinello (NCSU, workflow extensions)
- Others
3 NC State researchers are simulating the death of a massive star leading to a supernova explosion. Of particular interest are the dynamics of the shock wave generated by the initial implosion of the star, which ultimately destroys the star in a highly energetic supernova.
4 Key Current Task: Emulating live workflows
5 Key Issue
- It is very important to distinguish between a custom-made workflow solution and a more canonical set of operations, methods, and solutions that can be composed into a scientific workflow.
- Complexity, skill level needed to implement, usability, maintainability, standardization
- e.g., sort, uniq, grep, ftp, ssh on unix boxes
- vs.
- SAS (which can do sorting), home-made sort, SABUL, bbcp (free, but not standard), etc.
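The contrast above can be made concrete: a "canonical" workflow step often reduces to composing standard unix tools with no custom code to write or maintain. A minimal sketch (the log file and slice names are illustrative, not from the actual workflow):

```shell
# Hypothetical log of completed time-slice transfers (names illustrative),
# possibly containing duplicates and out of order.
printf 'slice_003\nslice_001\nslice_002\nslice_001\n' > slices.log
# Canonical composition: sort + uniq yield a deduplicated, ordered manifest.
sort slices.log | uniq
```

The same result from a home-made deduplicator would need implementation, testing, and maintenance; the canonical pipeline needs none of the three.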
6 Topic: Computational Astrophysics
- Dr. Blondin is carrying out research in the field of circumstellar gas dynamics. The numerical hydrodynamics code VH-1 is used on supercomputers to study a vast array of objects observed by astronomers, both from ground-based observatories and from orbiting satellites.
- The two primary subjects under investigation are interacting binary stars - including normal stars like the Algol binary and compact-object systems like the high-mass X-ray binary SMC X-1 - and supernova remnants, from very young, like SNR 1987a, to older remnants like the Cygnus Loop.
- Other astrophysical processes of current interest include radiatively driven winds from hot stars, the interaction of stellar winds with the interstellar medium, the stability of radiative shock waves, the propagation of jets from young stellar objects, and the formation of globular clusters.
7 Logistic Network: L-Bone
[Diagram: data-flow labels - Input Data; Highly Parallel Compute; Output 500x500 files; aggregate to 500 files (< 10 GB each) in Local Mass Storage (14 TB); aggregate to one file (1 TB each) for a Data Depot; HPSS archive; Local 44-Proc. Data Cluster (data sits on local nodes for weeks); Viz Software; Viz Wall; Viz Client.]
8 Workflow - Abstraction
[Diagram: workflow stages Model, Merge, Backup, Split, Viz - parallel computation writes to mass storage over Fibre Channel or local NFS; a data-mover channel (e.g., LORS, BBCP, SABUL, FC over SONET) links SendData/RecvData head-node services to parallel visualization feeding the VizWall; a web or client GUI uses web services for control: Construct, Orchestrate, Monitor/Steer, Change, Stop/Start.]
12 Current and Future Bottlenecks
- Computing resources and computational speed (1000 Cray X1 processors, compute times of 30 hrs, wait time)
- Storage and disks (14 TB; reliable and sustainable transfer speeds of 300 MB/s); automation
- Reliable and sustainable network transfer rates (300 MB/s)
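For scale, the 300 MB/s sustainable rate can be sanity-checked against the data volumes in this deck; a quick calculation (binary units assumed):

```shell
# At a sustained 300 MB/s, moving 1 TB (the nightly volume targeted
# elsewhere in this deck) takes roughly an hour.
awk 'BEGIN {
  mb = 1024 * 1024          # 1 TB expressed in MB (binary units)
  secs = mb / 300           # transfer time in seconds at 300 MB/s
  printf "1 TB at 300 MB/s: %.0f min\n", secs / 60
}'
# -> 1 TB at 300 MB/s: 58 min
```

So the 300 MB/s figure is not arbitrary: it is roughly the minimum rate at which a night's output can be moved within a night.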
13 Bottlenecks (B-specific)
- Supercomputer, storage, HPSS, EnSight memory
- Average per-job wait time is 24-48 hrs (could be longer if more processors are requested or more time slices are calculated).
- One 6-hr run on the Cray X1 currently uses 140 processors and produces 10 time steps. Each time step has 140 Fortran binary files (28 GB total); hence, currently, this is about 280 GB per 6-hr run. A full visualization takes about 300 to 500 time slices (30 to 50 runs, and about 28 GB x (300 to 500), i.e., 10 to 14 TB of space).
- The 140 files of a time step are merged into one (1) netCDF file (takes about 10 min).
- BBCP the file to NCSU at about 30 MB/s, or about 15 min per time slice (this can be done in parallel with the next time-slice computation). In the future, network transfer speeds and disk access speeds may be an issue.
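The per-slice transfer estimate above follows directly from the sizes quoted; a quick check (binary units assumed):

```shell
# One merged time slice is ~28 GB; bbcp moves it at ~30 MB/s.
awk 'BEGIN {
  mb = 28 * 1024            # 28 GB expressed in MB
  secs = mb / 30            # transfer time in seconds at 30 MB/s
  printf "%.0f min per slice\n", secs / 60
}'
# -> 16 min per slice
```

That ~16 min is comfortably under the 6-hr compute time of the next run, which is why the transfer can overlap the next time-slice computation.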
14 B-specific Top-Level W/F Operations
- Operators: Create W/F (reserve resources), Run Model, Backup Output, PostProcess Output (e.g., Merge, Split), MoveData, AnalyzeData (Viz, other?), Monitor Progress (state, audit, backtrack, errors, provenance), Modify Parameters
- States: Modeling, Backup, Postprocessing (A, .. Z), MovingData, Analyzing Remotely
- Creators: CreateWF, Model?, Expand
- Modifiers: Merge, Split, Move, Backup, Start, Stop, ModifyParameters
- Behaviors: Monitor, Audit, Visualize, Error/Exception Handling, Data Provenance
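The States listed above suggest a simple linear progression per run; a toy sketch, using the state names from the slide (the transition order is an assumption - the deck does not fix one):

```shell
# Illustrative state walk for one B-specific run; names from the slide,
# ordering assumed. A real engine would drive these from events, not a loop.
state="Modeling"
for next in Backup Postprocessing MovingData AnalyzingRemotely; do
  echo "$state -> $next"
  state=$next
done
```

Even this toy form shows why "Monitor Progress" is listed as an operator: each arrow is a point where state, audit, and provenance records can be emitted.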
15 Goal: Ubiquitous Canonical Operations for Scientific W/F Support
- Fast data transfer from A to B (e.g., LORS, SABUL, GridFTP, BBCP?, other)
- Database access
- Stream merging and splitting
- Flow monitoring
- Tracking, auditing, provenance
- Verification and validation
- Communication services (web services, grid services, XML-RPC, etc.)
- Other
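A canonical "fast transfer from A to B" could be wrapped so that workflows never hard-code one tool; a minimal sketch, assuming the tools named above are on PATH when installed (the function name and fallback order are hypothetical, and the transfer itself is stubbed out as an echo to stay side-effect free):

```shell
# Hypothetical canonical mover: prefer fast tools, fall back to plain scp.
move_data() {
  src=$1; dest=$2
  for tool in bbcp globus-url-copy scp; do
    if command -v "$tool" >/dev/null 2>&1; then
      # A real wrapper would invoke the tool here with its own flags.
      echo "would move $src -> $dest via $tool"
      return 0
    fi
  done
  echo "no transfer tool available for $src" >&2
  return 1
}
move_data /data/step_0001.nc ncsu:/archive/
```

The point is the interface, not the tool list: a workflow that calls `move_data` survives the swap from one mover to another, which is exactly the "canonical operation vs. custom solution" distinction this deck keeps raising.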
16 Issues (1)
- Communication coupling (loose, tight, v. tight, code-level) and granularity (fine, medium?, coarse)
- Communication methods (e.g., ssh tunnels, XML-RPC, SNMP, web/grid services, etc.); e.g., apparently poor support for Cray
- Storage issues (e.g., p-netcdf support, bandwidth)
- Direct and indirect data flows (functionality, throughput, delays, other QoS parameters)
- End-to-end performance
- Level of abstraction
- Workflow description language(s) and exchange issues; interoperability
- Standard scientific-computing W/F functions
17 Issues (2)
- The problem is currently similar to old-time punched-card job submission (long turn-around time; can be expensive due to the front-end computational-resource I/O bottleneck) - we need up-front verification and validation; things will change.
- Back-end bottleneck due to hierarchical storage issues (e.g., retrieval from HPSS)
- Long-term workflow state preservation - needed
- Recovery (transfers, other failures) - more needed
- Tracking data and files - needed
- Who maintains equipment, storage, data, scripts, workflow elements? Elegant solutions may not be good solutions from the perspective of autonomy.
- EXTREMELY IMPORTANT!!! We are trying to get out of the business of totally custom-made solutions.
18 Workflow - Abstraction
[Diagram: the same workflow abstraction as slide 8 (Model, Merge, Backup, Move, Split, Viz; parallel computation to mass storage over Fibre Channel or local NFS; data-mover channel, e.g., LORS, SABUL, FC over SONET, between SendData/RecvData head-node services and parallel visualization on the VizWall; web or client GUI with web-services control: Construct, Orchestrate, Monitor/Steer, Change, Stop/Start), annotated with performance goals - 2-3 Gbps transfer rates end-to-end, and 1 TB per night to the VizWall.]
19 Communications
- Web/Java-based GUI
- Web services for orchestration - overall and less-than-tightly-coupled sub-workflows
- LSF and MPI for parallel computation
- Scripts (in this example csh/sh-based; could be Perl, Python, etc.) on local machines - interpreted language
- High-level programming language for simulations, complex data-movement algorithms, and similar - compiled language
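The script layer above amounts to a per-time-step driver on the local machines; a stubbed sh sketch (the merge and send commands are echo placeholders for the netCDF merger and bbcp steps described earlier, and the step list is illustrative):

```shell
#!/bin/sh
# Stubbed per-time-step driver: merge the 140 per-processor files of a
# step, then ship the merged slice while the next slice is computed.
for step in 0001 0002 0003; do
  echo "merge: 140 files of step $step -> step_${step}.nc"
  echo "send:  step_${step}.nc -> NCSU viz cluster"
done
```

Keeping this layer in an interpreted language matches the slide's split: scripts for glue that changes often, compiled code for the simulation and heavy data movement.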
20 B-specific Top-Level W/F Operations
- Operators: Create W/F (reserve resources), Run Model, Backup Output, PostProcess Output (e.g., Merge, Split), MoveData, AnalyzeData (Viz, other?), Monitor Progress (state, audit, backtrack, errors, provenance), Modify Parameters
- States: Modeling, Backup, Postprocessing (A, .. Z), MovingData, Analyzing Remotely
- Constructors: CreateWF, Model?, Expand
- Modifiers: Merge, Split, Move, Backup, Start, Stop, ModifyParameters
- Behaviors: Monitor, Audit, Visualize