Title: Applying GPCE Approaches to Distributed and Parallel High-Performance Scientific Computing
Nanbor Wang (nanbor_at_txcorp.com), Tech-X Corporation
Workshop on GPCE for QoS Provisioning in Distributed Systems, Portland, OR, October 23, 2006
Funded by DOE OASCR and OHEP, DoD DARPA and Navy, and Vanderbilt University
Overview of HPC Environments
- Introduction to the problem domains: experiences from our projects
- The GRID
- OGSA/WSRF framework of services
- Software Installation Management
- HPC Component environment
- Common Component Architecture
- SIDL/Babel language interoperability tool
- DPHPC
- Motivations and approaches
- Remoting component approaches
- Deployment of DPHPC applications
- Computational QoS from ANL
HPC Using the Grid
- The GRID
- Making computing resources as readily available as the power grid makes electricity
- Data storage
- CPU cycles
- Globus is the de facto standard tool
- Example application domains
- HENP (high-energy and nuclear physics) data analysis
- Protein folding
- New OGSA (WSRF-based)
- Code generation tools
- Java/C-based containers
Enhancing Software Reusability on the Grid
- Improve the job success rate on the GRID
- Grid Software Installation Management Framework
- Robust, flexible software installation
- Based on OGSA services
- Does not address the actual parallel application implementations
- Need for modern software development paradigms
Common Component Architecture (CCA)
- A component architecture specifically designed to address HPC needs
- Supports scientific programming languages (FORTRAN)
- Built-in support for multi-dimensional arrays and complex numbers
- Does not hinder parallel communication
- Core tools
- SIDL/Babel compilers
- Ccaffeine component run-time environment
Babel Language Interoperability Tool
- Enables mixing and matching components developed in different languages
- Scientific Interface Definition Language (SIDL)
- Generates the Intermediate Object Representation (IOR) and specific language mappings
- Also supports component implementations
- CCA is defined in SIDL
- Ccaffeine can then load components dynamically
http://www.llnl.gov/casc/components/babel.html
Running Parallel CCA Applications
[Figure: SCMD (single component, multiple data) and MCMD (multiple component, multiple data) configurations]
- Supports both SPMD and MPMD scenarios
- Stays out of the way of component parallelism
- Components handle parallel communication
- MPMD can be very complicated to design
- Also very brittle, non-portable, and hard to configure
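The SCMD/SPMD execution style above can be illustrated with a minimal, sequential simulation, assuming no MPI is available: every "rank" runs the same hypothetical component on its own block of the data, and the component, not the framework, combines the partial results. The names `block_range`, `SumComponent`, and `run_scmd` are illustrative only. In a real run each rank would be a separate MPI process and the final sum would be an MPI reduction issued by the component.

```python
# Sequential simulation of SCMD/SPMD execution (no MPI assumed).
# All names are hypothetical and only illustrate the execution model.
from typing import List, Sequence


def block_range(n: int, rank: int, size: int) -> range:
    """Indices owned by `rank` when n items are split into `size` near-equal blocks."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


class SumComponent:
    """Hypothetical component: the same code is instantiated on every rank."""

    def __init__(self, rank: int, size: int) -> None:
        self.rank = rank
        self.size = size

    def local_sum(self, data: Sequence[float]) -> float:
        # Each instance touches only its own block of the data.
        return sum(data[i] for i in block_range(len(data), self.rank, self.size))


def run_scmd(data: Sequence[float], size: int) -> float:
    # One component instance per simulated process (the SCMD picture).
    partials: List[float] = [SumComponent(r, size).local_sum(data) for r in range(size)]
    # Stand-in for the collective the component itself would issue (e.g. a reduce).
    return sum(partials)


print(run_scmd(list(range(10)), 3))  # 45
```

The framework's only job here is instantiation; all data decomposition and combination lives inside the component, which is what lets the framework "stay out of the way".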
Motivations for Mixing Distributed Technologies and Parallelism
- Provide another high-level abstraction for HPC infrastructure
- A new dimension for partitioning application compositions
- Motivating example scenarios
- Integrate separately developed, established codes (FSP)
- Provide a different paradigm for partitioning problems (multi-physics simulations)
- Provide ways to better utilize hardware with high CPU counts
- Combine the computing resources of multiple clusters/computing centers
- Enable parallel data streaming between computing tasks and post-processing tasks
- What we need: a Distributed and Parallel High-Performance Computing (DPHPC) environment
An Illustration of a DPHPC Application
- Still supports conventional CCA component-managed parallelism
- Provides an additional framework-mediated, distributed inter-component communication capability
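To make "framework-mediated" communication concrete, here is a minimal sketch assuming a simplified port model: a provider registers a port by name, a user declares a slot for it, and the framework performs the connection. `Framework`, `add_provides_port`, `connect`, and the two components are hypothetical stand-ins, not the gov.cca API. Because the user never constructs its provider, a distributed framework could hand it a remote proxy instead of a local object without changing the user's code.

```python
# Simplified, hypothetical model of framework-mediated port wiring.
# Not the gov.cca Services API; it only mirrors the uses/provides idea.
from typing import Dict, List, Optional, Protocol


class FieldPort(Protocol):
    def get_field(self, step: int) -> List[float]: ...


class Solver:
    """Provides a FieldPort."""

    def get_field(self, step: int) -> List[float]:
        return [0.5 * step] * 4


class Visualizer:
    """Uses a FieldPort but never creates its provider itself."""

    def __init__(self) -> None:
        self.field_port: Optional[FieldPort] = None  # set by the framework

    def render(self, step: int) -> float:
        if self.field_port is None:
            raise RuntimeError("FieldPort is not connected")
        return max(self.field_port.get_field(step))


class Framework:
    """Keeps a name -> provider table and connects users to providers."""

    def __init__(self) -> None:
        self.provided: Dict[str, object] = {}

    def add_provides_port(self, name: str, provider: object) -> None:
        self.provided[name] = provider

    def connect(self, user: object, slot: str, name: str) -> None:
        # A distributed framework could substitute a remote proxy here.
        setattr(user, slot, self.provided[name])


fw = Framework()
fw.add_provides_port("field", Solver())
viz = Visualizer()
fw.connect(viz, "field_port", "field")
print(viz.render(2))  # 1.0
```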
Solution Approach
- Goal: mix distributed and parallel approaches to provide a higher-level composition abstraction
- Existing CCA implementations don't support both models
- Integration has been demonstrated before, but is not part of the distributions
- Approach
- Connect distributed parallel CCA applications using well-accepted tools
- Explore the challenges of developing Distributed and Parallel HPC (DPHPC) applications
Remoting CCA Components
- Connect distributed parallel computations by composing remote-capable proxy components into applications
- Remoting component generator
- Babel RMI library
- CORBA mapping
- Other mappings (under development)
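A remote-capable proxy component can be sketched minimally as follows, assuming JSON messages over an in-process loopback "transport" in place of Babel RMI or CORBA. The proxy implements the same port interface as the real component, so its user cannot tell a local provider from a remote one; a skeleton on the far side decodes each request and invokes the real component. `MeshPort`, `MeshComponent`, `Skeleton`, and `MeshProxy` are hypothetical names; a remoting component generator would emit equivalent marshalling code instead of having it written by hand.

```python
# Hypothetical proxy/skeleton pair; JSON over a loopback callable stands in
# for a real transport such as Babel RMI or CORBA.
import json
from typing import Callable


class MeshPort:
    """Port interface shared by the real component and its proxy."""

    def refine(self, level: int) -> int:
        raise NotImplementedError


class MeshComponent(MeshPort):
    """The real component, living in the 'remote' parallel job."""

    def refine(self, level: int) -> int:
        return 4 ** level  # cells after `level` uniform 2-D refinements of one cell


class Skeleton:
    """Server side: decodes a request and calls the real component."""

    ALLOWED = {"refine"}  # only dispatch methods that belong to the port

    def __init__(self, target: MeshPort) -> None:
        self.target = target

    def handle(self, request: bytes) -> bytes:
        msg = json.loads(request)
        if msg["method"] not in self.ALLOWED:
            return json.dumps({"error": "unknown method"}).encode()
        result = getattr(self.target, msg["method"])(*msg["args"])
        return json.dumps({"result": result}).encode()


class MeshProxy(MeshPort):
    """Client side: same interface, but each call is marshalled and sent."""

    def __init__(self, send: Callable[[bytes], bytes]) -> None:
        self.send = send  # any bytes-in/bytes-out transport

    def refine(self, level: int) -> int:
        request = json.dumps({"method": "refine", "args": [level]}).encode()
        reply = json.loads(self.send(request))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


proxy = MeshProxy(Skeleton(MeshComponent()).handle)
print(proxy.refine(3))  # 64
```

Swapping the loopback `send` for a socket, an RMI stub, or a CORBA reference changes only the transport; the port interface seen by the application stays the same.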
Examining Deployment Strategies for DPHPC Applications
- Local-CCA-component-centric view
- Local applications (implemented benchmarking programs for large datasets)
- Employ a distributed builder service for registering/requesting distributed ports
- Distributed-component-centric view
- Two-tier deployment: remote components and their implementations
- (Beginning to review the latest Data Parallel CORBA standard)
- (Brainstormed on implementing a distributed CCA runtime using CCM)
- Grid view
- Making distributed components grid services
Research on parallel components: PACO by the Paris group at IRISA, France (http://www.irisa.fr/paris/General/)
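A distributed builder service for registering and requesting distributed ports could, at its simplest, be a registry keyed by port name. The sketch below assumes each registration records the providing component, a transport endpoint, and the size of the parallel job behind the port, and that a request returns the largest job meeting a minimum process count. `BuilderService`, `PortRecord`, the endpoint strings, and the selection policy are all hypothetical; the slide does not specify the real service's interface.

```python
# Hypothetical registry for distributed ports; the selection policy
# (largest job with at least `min_procs` processes) is an assumption.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PortRecord:
    port_name: str   # e.g. "MeshPort"
    component: str   # component instance providing the port
    endpoint: str    # where its remote-capable proxy can be reached
    nprocs: int      # number of processes in the job behind the port


class BuilderService:
    def __init__(self) -> None:
        self._ports: Dict[str, List[PortRecord]] = {}

    def register(self, record: PortRecord) -> None:
        self._ports.setdefault(record.port_name, []).append(record)

    def request(self, port_name: str, min_procs: int = 1) -> Optional[PortRecord]:
        candidates = [r for r in self._ports.get(port_name, []) if r.nprocs >= min_procs]
        return max(candidates, key=lambda r: r.nprocs, default=None)


svc = BuilderService()
svc.register(PortRecord("MeshPort", "mesh0", "tcp://cluster-a:9000", 64))
svc.register(PortRecord("MeshPort", "mesh1", "tcp://cluster-b:9000", 256))
print(svc.request("MeshPort", min_procs=128).component)  # mesh1
```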
Computational QoS
- CQoS
- Performance: sequential and parallel efficiency
- Accuracy
- Performance monitoring
- TAU (S. Shende, University of Oregon)
- Adaptive composition
- Global application context
- Dynamic and adaptive implementation switching
P. Hovland, K. Keahey, L. C. McInnes, B. Norris (ANL), L. Freitag (Sandia), P. Raghavan (Penn State)
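The CQoS ingredients above (performance monitoring feeding dynamic implementation switching) can be illustrated with a small sketch. It assumes a simple measure-then-exploit policy: each candidate implementation is timed for a few calls, after which the one with the lowest mean time is always used. It also includes the usual parallel-efficiency definition E = T1 / (p * Tp). `AdaptiveComponent` and its parameters are hypothetical; this is not the ANL CQoS implementation, and it ignores the accuracy dimension listed above.

```python
# Hypothetical CQoS-style adaptive switching; the policy (time each
# implementation `trials` times, then always use the fastest mean) is an assumption.
import math
import time
from typing import Callable, Dict, List, Sequence


def parallel_efficiency(t_serial: float, t_parallel: float, nprocs: int) -> float:
    """E = T1 / (p * Tp); 1.0 means perfect scaling on p processes."""
    return t_serial / (nprocs * t_parallel)


class AdaptiveComponent:
    def __init__(
        self,
        impls: Dict[str, Callable[[Sequence[float]], float]],
        trials: int = 1,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self.impls = impls
        self.trials = trials
        self.clock = clock  # injectable so the policy can be tested deterministically
        self.timings: Dict[str, List[float]] = {name: [] for name in impls}

    def choose(self) -> str:
        # Monitoring phase: run any implementation that lacks enough samples.
        for name, samples in self.timings.items():
            if len(samples) < self.trials:
                return name
        # Switching phase: use the implementation with the lowest mean time.
        return min(self.timings, key=lambda n: sum(self.timings[n]) / len(self.timings[n]))

    def __call__(self, data: Sequence[float]) -> float:
        name = self.choose()
        start = self.clock()
        result = self.impls[name](data)
        self.timings[name].append(self.clock() - start)
        return result


print(parallel_efficiency(t_serial=100.0, t_parallel=31.25, nprocs=4))  # 0.8

summer = AdaptiveComponent({"builtin": sum, "fsum": math.fsum}, trials=2)
for _ in range(5):
    summer([0.1] * 1000)
print(summer.choose())  # whichever measured faster on this machine
```

Injecting the clock keeps the switching policy separate from the monitoring source, so a tool such as TAU could in principle supply the measurements instead of `time.perf_counter`.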