Title: Open MPI - A High Performance Fault Tolerant MPI Library
1Open MPI - A High Performance Fault Tolerant MPI Library
- Richard L. Graham
- Advanced Computing Laboratory, Group Leader (acting)
2Overview
- Open MPI Collaboration
- MPI
- Run-time
- Future directions
3Collaborators
- Los Alamos National Laboratory (LA-MPI)
- Sandia National Laboratories
- Indiana University (LAM/MPI)
- The University of Tennessee (FT-MPI)
- High Performance Computing Center, Stuttgart (PACX-MPI)
- University of Houston
- Cisco Systems
- Mellanox
- Voltaire
- Sun
- Myricom
- IBM
- QLogic
- URL www.open-mpi.org
4A Convergence of Ideas
FT-MPI (U of TN)
Open MPI
LA-MPI (LANL)
LAM/MPI (IU)
PACX-MPI (HLRS)
OpenRTE
Fault Detection (LANL, Industry)
FDDP (Semi. Mfg. Industry)
Resilient Computing Systems
Robustness (CSU)
Grid (many)
Autonomous Computing (many)
5Components
- Formalized interfaces
- Specifies black box implementation
- Different implementations available at run-time
- Can compose different systems on the fly
Caller
Interface 1
Interface 2
Interface 3
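The component idea above can be illustrated with a minimal sketch: one formal interface, several black-box implementations, and a caller that picks one by name at run time. All names here (Transport, COMPONENTS, select_component) are hypothetical and chosen for illustration; this is not Open MPI's actual component API.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Formalized interface: callers see only this contract."""

    @abstractmethod
    def send(self, payload: bytes) -> str:
        """Deliver payload and return a short description of the path used."""


class SharedMemoryTransport(Transport):
    """One black-box implementation (hypothetical)."""

    def send(self, payload: bytes) -> str:
        return f"sm:{len(payload)}"


class TcpTransport(Transport):
    """A second implementation behind the same interface (hypothetical)."""

    def send(self, payload: bytes) -> str:
        return f"tcp:{len(payload)}"


# Hypothetical table of available implementations, keyed by name.
COMPONENTS = {"sm": SharedMemoryTransport, "tcp": TcpTransport}


def select_component(name: str) -> Transport:
    """Pick an implementation at run time; the caller never names a class."""
    try:
        return COMPONENTS[name]()
    except KeyError:
        raise ValueError(f"unknown component: {name!r}") from None


def caller(component_name: str, payload: bytes) -> str:
    """The caller depends only on the Transport interface."""
    transport = select_component(component_name)
    return transport.send(payload)
```

Because the caller only touches the interface, a different implementation can be swapped in by changing the name passed at run time, without touching the calling code.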
6Performance Impact
7MPI
8Two Sided Communications
9P2P Component Frameworks
10Shared Memory - Bandwidth
11Shared Memory - Latency
12IB Performance - Latency
Message Size    Latency - Open MPI    Latency - MVAPICH
0               3.09                  9.6 (anomaly?)
1               3.48                  3.09
32              3.60                  3.30
128             4.48                  4.16
2048            7.93                  8.67
8192            15.72                 22.86
16384           27.14                 29.37
13IB Performance - Bandwidth
14GM Performance Data - Ping-Pong Latency (usec)
Data Size    Open MPI    MPICH-GM
0 Byte       8.13        8.07
8 Byte       8.32        8.22
64 Byte      8.68        8.65
256 Byte     12.52       12.11
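Ping-pong latency is usually reported as half of the averaged round-trip time for a message bounced between two endpoints. The sketch below shows that measurement pattern using a local socket pair and a thread from the Python standard library. It is an assumption-laden stand-in, not the benchmark behind these slides: it does not use MPI, GM, or any interconnect, and the function names are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, nbytes: int) -> bytes:
    """Read exactly nbytes from a stream socket."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo(sock: socket.socket, nbytes: int, iterations: int) -> None:
    """The 'pong' side: return every message to the sender."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, nbytes))


def ping_pong_latency_usec(nbytes: int, iterations: int = 1000) -> float:
    """Average one-way latency in microseconds (half the round-trip time)."""
    if nbytes < 1:
        # A stream socket cannot signal an empty message, unlike MPI.
        raise ValueError("this sketch needs at least 1 byte per message")
    ping, pong = socket.socketpair()
    worker = threading.Thread(target=echo, args=(pong, nbytes, iterations))
    worker.start()
    message = b"x" * nbytes
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            ping.sendall(message)      # ping
            recv_exact(ping, nbytes)   # wait for the pong
        elapsed = time.perf_counter() - start
    finally:
        ping.close()   # unblocks the echo thread if the loop failed early
        worker.join()
        pong.close()
    return elapsed / (2 * iterations) * 1e6
```

A call such as ping_pong_latency_usec(8) times 8-byte messages. The absolute numbers depend on the machine and on Python overhead, so they are not comparable with the table values.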
15GM Performance Data - Ping-Pong Latency (usec) - Data FT
Data Size    Open MPI - OB1    Open MPI - FT    LA-MPI - FT
0 Byte       5.24              8.65             9.2
8 Byte       5.50              8.67             9.26
64 Byte      6.00              9.07             9.45
256 Byte     8.52              13.01            13.54
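One way to read this table is as the relative latency cost of data fault tolerance over the OB1 baseline. The short calculation below is a sketch of that arithmetic using only the values on this slide. The dictionary keys and the function name are hypothetical labels, not identifiers from Open MPI.

```python
# Ping-pong latencies in microseconds, copied from the GM data-FT slide.
LATENCY_USEC = {
    0: {"ob1": 5.24, "open_mpi_ft": 8.65, "la_mpi_ft": 9.2},
    8: {"ob1": 5.50, "open_mpi_ft": 8.67, "la_mpi_ft": 9.26},
    64: {"ob1": 6.00, "open_mpi_ft": 9.07, "la_mpi_ft": 9.45},
    256: {"ob1": 8.52, "open_mpi_ft": 13.01, "la_mpi_ft": 13.54},
}


def ft_overhead_percent(size_bytes: int, ft_key: str = "open_mpi_ft") -> float:
    """Extra latency of a fault-tolerant path relative to OB1, in percent."""
    row = LATENCY_USEC[size_bytes]
    return (row[ft_key] - row["ob1"]) / row["ob1"] * 100.0


if __name__ == "__main__":
    for size in LATENCY_USEC:
        print(
            f"{size:>4} Byte: Open MPI - FT +{ft_overhead_percent(size):.1f}%, "
            f"LA-MPI - FT +{ft_overhead_percent(size, 'la_mpi_ft'):.1f}%"
        )
```

For the 0 Byte row this gives roughly 65% for Open MPI - FT and roughly 76% for LA-MPI - FT, both relative to Open MPI - OB1.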
16GM Performance Data - Ping-Pong Bandwidth
17MX Ping-Pong Latency (usec)
Message Size    Open MPI - MTL    MPICH - MX
0               3.14              2.87
8               3.22              2.89
64              3.91              3.6
256             5.76              5.25
18MX Performance Data - Ping-Pong Bandwidth (MB/sec)
19XT3 Performance - Latency
Implementation    1 Byte Latency
Native Portals    5.30 us
MPICH-2           7.14 us
Open MPI          8.50 us
20XT3 Performance - Bandwidth
21Collective Operations
22MPI Reduce - Performance
23MPI Broadcast - Performance
24MPI Reduction - II
25Open RTE
26Open RTE - Design Overview
Seamless, transparent environment for high-performance applications
- Inter-process communications within and across cells
- Distributed publish/subscribe registry
- Supports event-driven logic across applications, cells
- Persistent, fault tolerant
- Dynamic spawn of processes, applications both within and across cells
[Diagram: Cluster, Grid, Cluster, Single Computer]
27Open RTE - Components
Cluster
UNIVERSE
Grid
Cluster
Single Computer
28General Purpose Registry
- Cached, distributed storage/retrieval system
- All common data types plus user-defined
- Heterogeneity between storing process and recipient automatically resolved
- Publish/subscribe
- Support event-driven coordination and notification
- Subscribe to individual data elements, groups of elements, wildcard collections
- Specify actions that trigger notifications
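The publish/subscribe behaviour listed above can be sketched in a few lines. The example is a single-process, synchronous toy: it does not model caching, distribution, or heterogeneity, and the class and method names (Registry, put, subscribe) are hypothetical rather than taken from Open RTE. Wildcards use shell-style patterns from the standard fnmatch module, which is an assumption made for illustration.

```python
from fnmatch import fnmatchcase


class Registry:
    """Toy key/value store with wildcard subscriptions (illustrative only)."""

    def __init__(self):
        self._data = {}
        self._subscriptions = []  # list of (pattern, callback) pairs

    def subscribe(self, pattern, callback):
        """Register callback(action, key, value) for keys matching pattern."""
        self._subscriptions.append((pattern, callback))

    def put(self, key, value):
        """Store a value and notify subscribers of the entry or change."""
        action = "modified" if key in self._data else "entered"
        self._data[key] = value
        self._notify(action, key, value)

    def delete(self, key):
        """Remove a value and notify subscribers with the last stored value."""
        value = self._data.pop(key)
        self._notify("deleted", key, value)

    def get(self, pattern):
        """Return every stored item whose key matches the pattern."""
        return {k: v for k, v in self._data.items() if fnmatchcase(k, pattern)}

    def _notify(self, action, key, value):
        for pattern, callback in list(self._subscriptions):
            if fnmatchcase(key, pattern):
                callback(action, key, value)


if __name__ == "__main__":
    registry = Registry()
    # A subscription may be placed before any matching data exists.
    registry.subscribe("job1.*", lambda action, key, value: print(action, key, value))
    registry.put("job1.state", "running")
    registry.put("job1.state", "done")
    registry.delete("job1.state")
```

Subscribing with a pattern such as "job1.*" covers an individual element, a group, or a wildcard collection with the same call, and the action name passed to the callback tells the subscriber what triggered the notification.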
29Subscription Services
- Subscribe to container and/or keyval entry
- Can be entered before data arrives
- Specifies data elements to be monitored
- Container tokens and/or data keys
- Wildcards supported
- Specifies action that generates event
- Data entered, modified, deleted
- Number of matching elements equals, exceeds, is less than specified level
- Number of matching elements transitions (increases/decreases) through specified level
- Events generate message to subscriber
- Includes specified data elements
- Asynchronously delivered to specified callback function on subscribing process
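The level-based triggers above (count of matching elements equals, exceeds, or is less than a level) can be sketched as follows. This is a simplified illustration under stated assumptions: the names CountTrigger and COMPARATORS are hypothetical, delivery is synchronous rather than asynchronous, and transition triggers (crossing a level) are left out.

```python
import operator
from fnmatch import fnmatchcase

# Hypothetical condition names mapped to comparisons on the match count.
COMPARATORS = {
    "equals": operator.eq,
    "exceeds": operator.gt,
    "less_than": operator.lt,
}


class CountTrigger:
    """Fire a callback when the number of matching keys meets a condition."""

    def __init__(self, pattern, condition, level, callback):
        self.pattern = pattern
        self.compare = COMPARATORS[condition]
        self.level = level
        self.callback = callback

    def check(self, data):
        """Evaluate the trigger against the current registry contents."""
        matches = {k: v for k, v in data.items() if fnmatchcase(k, self.pattern)}
        if self.compare(len(matches), self.level):
            # The event message includes the matching data elements.
            self.callback(matches)
            return True
        return False


if __name__ == "__main__":
    data = {}
    # The trigger is created before any data arrives.
    trigger = CountTrigger("proc.*.ready", "equals", 2, print)
    data["proc.0.ready"] = True
    trigger.check(data)  # one match, no event yet
    data["proc.1.ready"] = True
    trigger.check(data)  # two matches, the callback receives both entries
```

A trigger of this kind gives a simple rendezvous: a subscriber asks to be told when a given number of processes have reported in, instead of polling for each entry.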
30Future Directions
31Revise MPI Standard
- Clarify standard
- Standardize the interface
- Simplify standard
- Make the standard more H/W Friendly
32Beyond Simple Performance Measures
- Performance and scalability are important, but
- What about future HPC systems?
- Heterogeneity
- Multi-core
- Mix of processors
- Mix of networks
- Fault-tolerance
33Focus on Programmability
- Performance and scalability are important, but what about
- Programmability