Title: Beyond Workflows - DOE Cloud Computing Paradigm and the SDM Role and Future
1. Beyond Workflows - DOE Cloud Computing Paradigm and the SDM Role and Future
- Mladen A. Vouk, Nagiza Samatova, Paul Breimyer, Pierre Mouallem, Mei Nagappan, and the whole SPA team (list available separately)
- Scientific Data Management Center, Scientific Process Automation Group
- NC State University, Raleigh, NC 27695
2. Overview
- Scientific Workflow technology
  - A success story from the past 7 years in the SDM center (a technology used in production or otherwise by application people)
  - Developed components: Workflows, Provenance, Dashboard, other
- DOE SDM Cloud
  - Vision for the future of the SDM center
  - Integration of components: Intelligent Analytics and Social Networks, Component-based cloud, Integrated Services (service-oriented architecture)
- Sustainable science
  - Long-term approach for the survival of SDM center technology (beyond SciDAC and longer)
  - Integration of Research, Engineering, Transfer-of-Technology, Partnerships, Results (ROI, TOC)
3. Scientific Process Automation
- A key differentiating element of a successful information technology (IT) is its ability to become a true, valuable, and economical contributor to cyberinfrastructure.
- An IT-assisted workflow represents a series of structured activities and computations that arise in information-assisted problem solving.
- Scientific process automation principles, as well as production-level pilots, are SDM's key contribution over the last 7 years (Smoky Mountains retreat).
- From NC State: numerous publications, 3 graduated PhD and 4 MS-with-thesis students, several in progress, several generations of software.
4. Environment
[Diagram labels: Analytics; Computations; Control Panels (Dashboard), Display; Networking, Local/Remote Cloud Services; Orchestration (Kepler); Data, Databases, Provenance, Storage]
5. Workflow Framework
[Diagram labels: Control Plane (light data flows); Provenance, Tracking and Meta-Data (DBs and Portals); Kepler; Execution Plane ("heavy lifting" computations and flows); Synchronous or Asynchronous]
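The split between a light control plane and a heavy execution plane can be illustrated with a minimal Python sketch. It is not how Kepler is implemented: the `ControlPlane` class, its method names, and the in-memory provenance list are hypothetical, and a thread pool stands in for the execution plane. The control plane only passes small messages and records provenance events, while the work itself runs either synchronously or asynchronously.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class ControlPlane:
    """Hypothetical orchestrator: light messages plus provenance records."""

    def __init__(self, workers: int = 2) -> None:
        # The thread pool stands in for the execution plane.
        self._execution_plane = ThreadPoolExecutor(max_workers=workers)
        self.provenance: List[Dict[str, Any]] = []

    def _record(self, step: str, event: str) -> None:
        # Provenance here is just a list of small metadata records.
        self.provenance.append({"step": step, "event": event, "time": time.time()})

    def run_sync(self, step: str, task: Callable[..., Any], *args: Any) -> Any:
        # Synchronous: the control plane waits for the step to finish.
        self._record(step, "started")
        result = task(*args)
        self._record(step, "finished")
        return result

    def run_async(self, step: str, task: Callable[..., Any], *args: Any) -> Future:
        # Asynchronous: hand the step off and return a handle immediately.
        self._record(step, "submitted")
        future = self._execution_plane.submit(task, *args)
        future.add_done_callback(lambda _done: self._record(step, "finished"))
        return future

    def shutdown(self) -> None:
        # Wait for outstanding work, including completion callbacks.
        self._execution_plane.shutdown(wait=True)


def heavy_sum(n: int) -> int:
    """Placeholder for a heavy computation."""
    return sum(range(n))


control = ControlPlane()
sync_result = control.run_sync("small-check", heavy_sum, 10)
future = control.run_async("big-run", heavy_sum, 1000)
async_result = future.result()
control.shutdown()
events = [(record["step"], record["event"]) for record in control.provenance]
```

Keeping provenance in the control plane means the heavy tasks need no knowledge of tracking; in a real deployment the records would go to a database rather than a list.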
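6. Actor/Process in a Broader Sense
[Diagram labels: In; Out; Network/Cloud]
bsub < code_run
------------ where code_run is a script --------------
#! /bin/csh
# Set up the LSF environment for csh.
source /usr/local/lsf/conf/cshrc.lsf
# Run-time limit of 5 minutes, 100 tasks.
#BSUB -W 5
#BSUB -n 100
mpiexec ./code
# Standard output and error files (%J expands to the job ID), and the job name.
#BSUB -o /share/vouk/WFLOW/code.out.%J
#BSUB -e /share/vouk/WFLOW/code.err.%J
#BSUB -J codevouk
-------------------------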
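One way to picture an actor that wraps a batch submission like the one on this slide is the Python sketch below. It is an illustration under stated assumptions, not the Kepler actor itself: the `LsfJob` class and its field names are hypothetical, and the rendered script omits the site-specific `source` line. Only `render()` is exercised here; `submit()` pipes the script to `bsub` on standard input (the equivalent of `bsub < code_run`) and needs an LSF installation.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class LsfJob:
    """Inputs of a hypothetical actor that wraps one LSF batch job."""

    name: str
    command: str
    tasks: int
    minutes: int
    out_dir: str

    def render(self) -> str:
        # Build the job script text; LSF expands %J to the job ID.
        lines = [
            "#! /bin/csh",
            f"#BSUB -J {self.name}",
            f"#BSUB -n {self.tasks}",
            f"#BSUB -W {self.minutes}",
            f"#BSUB -o {self.out_dir}/{self.name}.out.%J",
            f"#BSUB -e {self.out_dir}/{self.name}.err.%J",
            self.command,
        ]
        return "\n".join(lines) + "\n"

    def submit(self) -> str:
        # Equivalent to `bsub < script`; requires LSF on this machine.
        done = subprocess.run(
            ["bsub"], input=self.render(), text=True,
            capture_output=True, check=True,
        )
        return done.stdout


job = LsfJob(name="code", command="mpiexec ./code", tasks=100,
             minutes=5, out_dir="/share/vouk/WFLOW")
script = job.render()
```

Separating `render()` from `submit()` lets a workflow engine log the exact script as provenance before anything is sent to the scheduler.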
7. Modular Framework
[Diagram labels: Trust; Storage; Supercomputers and Analytics Nodes; Kepler; Data Store; Access; Rec API; Disp API; Dash; Management API; Orchestration; Meta-Data about Processes, Data, Workflows, System, Apps and Environment]
8. Read More
- Singh M.P. and M.A. Vouk, "Network Computing," in John G. Webster (editor), Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, New York, Vol. 14, pp. 114-132, 1999.
- S. Klasky, M. Beck, V. Bhat, E. Feibush, B. Ludäscher, M. Parashar, A. Shoshani, D. Silver and M. Vouk, "Data management on the fusion computational pipeline," SciDAC 2005, Journal of Physics: Conference Series 16 (2005), 510-520, doi:10.1088/1742-6596/16/1/070.
- Ilkay Altintas, Oscar Barney, Zhengang Cheng, Terence Critchlow, Bertram Ludaescher, Steve Parker, Arie Shoshani and Mladen Vouk, "Accelerating the scientific exploration process with scientific workflows," SciDAC 2006, Journal of Physics: Conference Series 46 (2006), 468-478, doi:10.1088/1742-6596/46/1/065.
- M. A. Vouk, I. Altintas, R. Barreto, J. Blondin, Z. Cheng, T. Critchlow, A. Khan, S. Klasky, J. Ligon, B. Ludaescher, P. A. Mouallem, S. Parker, N. Podhorszki, A. Shoshani, C. Silva, "Automation of Network-Based Scientific Workflows," Proc. of the IFIP WoCo 9 on Grid-based Problem Solving Environments: Implications for Development and Deployment of Numerical Software, IFIP WG 2.5 on Numerical Software, Prescott, AZ, 2006, printed in IFIP, Vol. 239, "Grid-Based Problem Solving Environments," eds. Gaffney PW and Pool JCT (Boston: Springer), pp. 35-61, 2007.
- Klasky, S.; Barreto, R.; Kahn, A.; Parashar, M.; Podhorszki, N.; Parker, S.; Silver, D.; Vouk, M.A., "Collaborative visualization spaces for petascale simulations," Proceedings of the CTS 2008 - International Symposium on Collaborative Technologies and Systems, pp. 203-211, Digital Object Identifier 10.1109/CTS.2008.4543933, 10-23 May 2008.
- More: http://sdm.ncsu.edu
9. DOE Cloud
- Cloud computing builds on decades of research in virtualization, distributed computing, utility computing, grids, and more recently networking, web and software services.
- It implies a seamless service-oriented and component-based architecture: delivery of an integrated and orchestrated suite of on-demand functions to an end-user through composition of both loosely and tightly coupled functions, or services (often network-based); reduced information technology overhead for the end-user; service orchestration; virtualization of resources; great flexibility; reduced total cost of ownership; different flavors.
- Intelligent Analytics and Knowledge-Creating Social Networks, Component-based Clouds, Seamless/Integrated Services.
- Necessary in the context of peta- and exa-sciences, data, etc.
10. "Analytics Cloud"
[Diagram labels, top to bottom:
Knowledge creation: Integration, Social Networking, Provenance, Tracking and Meta-Data (DBs and Portals);
Workflow control plane; Concept-driven Analytics; W/F Engine; W/F Generation Wizard; Synchronous and Asynchronous Services; Run-time Manager and Scheduler;
Execution Plane: heavy-duty in-cloud Computations, Flows and Services;
Analytics-Enabled Resources; Supercomputers; Clusters; Active Storage; Other cloud devices]
11. Components
- Reusability (elements can be re-used in other workflows),
- Substitutability (alternative implementations are easy to insert, very precisely specified interfaces are available, run-time component replacement mechanisms exist, there is ability to verify and validate substitutions, etc.),
- Extensibility and scalability (ability to readily extend the system component pool and to scale it, increase capabilities of individual components, have an extensible and scalable architecture that can automatically discover new functionalities and resources, etc.),
- Customizability (ability to customize generic features to the needs of a particular scientific domain and problem),
- Composability (easy construction of more complex functional solutions using basic components, reasoning about such compositions, etc.).
There are other characteristics that are also very important:
- Reliability and availability of the components and services,
- Cost: the cost of the services, total cost of ownership, economy of scale,
- Security and privacy, and so on.
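Substitutability and composability can be made concrete with a small Python sketch. It assumes a deliberately narrow interface (each component maps one number to another) and a hypothetical `Registry` class acting as the component pool; neither comes from the SDM software. Because components are looked up by role when the workflow runs, registering a new implementation under an existing role replaces it without rebuilding the composition.

```python
from typing import Callable, Dict, List

# Assumed interface: a component maps one float to another float.
Component = Callable[[float], float]


class Registry:
    """Hypothetical component pool keyed by role name."""

    def __init__(self) -> None:
        self._pool: Dict[str, Component] = {}

    def register(self, role: str, component: Component) -> None:
        # Registering under an existing role substitutes the implementation.
        self._pool[role] = component

    def compose(self, roles: List[str]) -> Component:
        # Components are resolved at call time, so later substitutions
        # take effect in workflows that were composed earlier.
        def pipeline(value: float) -> float:
            for role in roles:
                value = self._pool[role](value)
            return value

        return pipeline


registry = Registry()
registry.register("scale", lambda x: x * 2.0)
registry.register("offset", lambda x: x + 1.0)

# Composability: build a larger function from basic components.
workflow = registry.compose(["scale", "offset"])
result_before = workflow(3.0)  # (3 * 2) + 1 = 7.0

# Substitutability: swap the "scale" implementation at run time.
registry.register("scale", lambda x: x * 10.0)
result_after = workflow(3.0)  # (3 * 10) + 1 = 31.0
```

A real component pool would also verify that a substitute honors the interface before accepting it, which this sketch leaves out.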
12. Example Meta-Data Framework
[Diagram labels: Storage; Supercomputers and Analytics; Kepler?; Other ...; Dash; Custom Web; Orchestration]
13. Fault-Tolerance: Clouds of Clouds
[Diagram label: Master DB (replicated)]
14. User Categories
- Developers (10)
- Service Authors (100 to 1,000)
- Service Integrators (100 to 10,000)
- End-users (1,000 - ?)
15. Read More
- Sam Averitt, Michael Bugaev, Aaron Peeler, Henry Shaffer, Eric Sills, Sarah Stein, Josh Thompson, Mladen Vouk, "Virtual Computing Laboratory (VCL)," in the Proceedings of the International Conference on Virtual Computing Initiative, May 7-8, 2007, IBM Corp., Research Triangle Park, NC, pp. 1-16.
- Mladen Vouk, Sam Averitt, Michael Bugaev, Andy Kurth, Aaron Peeler, Andy Rindos, Henry Shaffer, Eric Sills, Sarah Stein, Josh Thompson, "Powered by VCL - Using Virtual Computing Laboratory (VCL) Technology to Power Cloud Computing," published in the Prelim. Proceedings of the 2nd International Conference on Virtual Computing Initiative, 15-16 May 2008, RTP, NC, pp. 1-10, final version to be available through the ACM Digital Library.
- Mladen A. Vouk, "Cloud Computing - Issues, Research and Implementations," ITI08, to appear in IEEE Digital Library.
- Google for "cloud computing"
- Other ...
16. Sustainable Science
- A long-term approach for the survival of SDM center technology (beyond SciDAC and longer):
- Research
- Engineering
- Transfer-of-Technology
- Partnerships with scientists
- Operational open-source tools
- Visible results (agreed-upon ROI, and an accounting of TOC)