Portable MPI and Related Parallel Development Tools

Transcript and Presenter's Notes

1
Portable MPI and Related Parallel Development
Tools
  • Rusty Lusk
  • Mathematics and Computer Science Division
  • Argonne National Laboratory
  • (The rest of our group: Bill Gropp, Rob Ross,
    David Ashton, Brian Toonen, Anthony Chan)

2
Outline
  • MPI
  • What is it?
  • Where did it come from?
  • One implementation
  • Why has it succeeded?
  • Case study: an MPI application
  • Portability
  • Libraries
  • Tools
  • Future developments in parallel programming
  • MPI development
  • Languages
  • Speculative approaches

3
What is MPI?
  • A message-passing library specification
  • extended message-passing model
  • not a language or compiler specification
  • not a specific implementation or product
  • For parallel computers, clusters, and
    heterogeneous networks
  • Full-featured
  • Designed to provide access to advanced parallel
    hardware for
  • end users
  • library writers
  • tool developers
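
  Not in the original slides: a minimal sketch of what an MPI program
  looks like in C, using only six functions (MPI_Init, MPI_Comm_rank,
  MPI_Comm_size, MPI_Send, MPI_Recv, MPI_Finalize). Each non-root
  process sends its rank to process 0.

    /* minimal illustrative sketch, not from the presentation */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);               /* start MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* my process number */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes */
        if (rank == 0) {
            int i, value;
            MPI_Status status;
            for (i = 1; i < size; i++) {
                MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &status);
                printf("got %d from rank %d\n", value,
                       status.MPI_SOURCE);
            }
        } else {
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();                       /* shut down MPI */
        return 0;
    }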

4
Where Did MPI Come From?
  • Early vendor systems (NX, EUI, CMMD) were not
    portable.
  • Early portable systems (PVM, p4, TCGMSG,
    Chameleon) were mainly research efforts.
  • Did not address the full spectrum of
    message-passing issues
  • Lacked vendor support
  • Were not implemented at the most efficient level
  • The MPI Forum organized in 1992 with broad
    participation by vendors, library writers, and
    end users.
  • MPI Standard (1.0) released June 1994; many
    implementation efforts followed.
  • MPI-2 Standard (1.2 and 2.0) released July, 1997.

5
Informal Status Assessment
  • All MPP vendors now have MPI-1 (1.0, 1.1, or
    1.2).
  • Public implementations (MPICH, LAM, CHIMP)
    support heterogeneous workstation networks.
  • MPI-2 implementations are being undertaken now by
    all vendors.
  • MPI-2 is harder to implement than MPI-1 was.
  • MPI-2 implementations will appear piecemeal, with
    I/O first.

6
MPI Sources
  • The Standard itself
  • at http://www.mpi-forum.org
  • All MPI official releases, in both PostScript and
    HTML
  • Books on MPI and MPI-2
  • Using MPI: Portable Parallel Programming with
    the Message-Passing Interface (2nd edition), by
    Gropp, Lusk, and Skjellum, MIT Press, 1999.
  • Using MPI-2: Extending the Message-Passing
    Interface, by Gropp, Lusk, and Thakur, MIT Press,
    1999.
  • MPI: The Complete Reference, volumes 1 and 2,
    MIT Press, 1999.
  • Other information on Web
  • at http://www.mcs.anl.gov/mpi
  • pointers to lots of stuff, including other talks
    and tutorials, a FAQ, other MPI pages

7
The MPI Standard Documentation
8
Tutorial Material on MPI, MPI-2
9
The MPICH Implementation of MPI
  • As a research project: exploring tradeoffs
    between performance and portability; conducting
    research in implementation issues.
  • As a software project: providing a free MPI
    implementation on most machines; enabling
    vendors and others to build complete MPI
    implementations on their own communication
    services.
  • MPICH 1.2.2 just released, with complete MPI-1,
    parts of MPI-2 (I/O and C), port to
    Windows 2000.
  • Available at http://www.mcs.anl.gov/mpi/mpich

10
Lessons From MPI: Why Has It Succeeded?
  • The MPI Process
  • Portability
  • Performance
  • Simplicity
  • Modularity
  • Composability
  • Completeness

11
The MPI Process
  • Started with open invitation to all those
    interested in standardizing message-passing model
  • Participation from
  • Parallel computing vendors
  • Computer Scientists
  • Application scientists
  • Open process
  • All invited, but hard work required
  • All deliberations available at all times
  • Reference implementation developed during design
    process
  • Helped debug design
  • Immediately available when design completed

12
Portability
  • Most important property of a programming model
    for high-performance computing
  • Application lifetimes: 5 to 20 years
  • Hardware lifetimes much shorter
  • (not to mention corporate lifetimes!)
  • Need not lead to lowest common denominator
    approach
  • Example: MPI semantics allow direct copy of data
    from a user-space send buffer to a user-space
    receive buffer
  • Might be implemented by a hardware data mover
  • Might be implemented by network hardware
  • Might be implemented by sockets
  • The hard part: portability with performance

13
Performance
  • MPI can help manage the crucial memory hierarchy
  • Local vs. remote memory is explicit
  • A received message is likely to be in cache
  • MPI provides collective operations for both
    communication and computation that hide
    complexity or non-portability of scalable
    algorithms from the programmer.
  • Can interoperate with optimizing compilers
  • Promotes use of high-performance libraries
  • Doesn't provide performance portability
  • This problem is still too hard, even for the best
    compilers
  • E.g., BLAS
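
  As an illustration of the collective-operation point above, here is a
  hedged sketch (not from the slides) of a global dot product: the
  single MPI_Allreduce call hides whatever scalable reduction algorithm
  the implementation uses.

    #include <mpi.h>

    /* dot product of the local pieces of two distributed vectors */
    double parallel_dot(const double *x, const double *y, int n,
                        MPI_Comm comm)
    {
        double local = 0.0, global = 0.0;
        int i;
        for (i = 0; i < n; i++)
            local += x[i] * y[i];
        /* combine partial sums; every process receives the result */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
        return global;
    }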

14
Simplicity
  • Simplicity is in the eye of the beholder
  • MPI-1 has about 125 functions
  • Too big!
  • Too small!
  • MPI-2 has about 150 more
  • Even this is not very many by comparison
  • Few applications use all of MPI
  • But few MPI functions go unused
  • One can write serious MPI programs with as few as
    six functions
  • Other programs with a different six
  • Economy of concepts
  • Communicators encapsulate both process groups and
    contexts
  • Datatypes both enable heterogeneous communication
    and allow non-contiguous message buffers (see the
    datatype sketch below)
  • Symmetry helps make MPI easy to understand.
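
  The datatype sketch referred to above (illustrative, not from the
  slides): a derived datatype describes a non-contiguous buffer, here
  one column of a row-major matrix, so it can be sent with a single
  call.

    #include <mpi.h>

    #define N 8

    void send_column(double a[N][N], int col, int dest, MPI_Comm comm)
    {
        MPI_Datatype column;
        /* N blocks of 1 double, spaced N doubles apart: one column */
        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);
        MPI_Send(&a[0][col], 1, column, dest, 0, comm);
        MPI_Type_free(&column);
    }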

15
Modularity
  • Modern applications often combine multiple
    parallel components.
  • MPI supports component-oriented software through
    its use of communicators
  • Support of libraries means applications may
    contain no MPI calls at all.
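
  A hedged sketch of the communicator mechanism behind this modularity
  (the solver_ names are illustrative, not a real library): a library
  duplicates the caller's communicator so its internal messages can
  never be confused with the application's.

    #include <mpi.h>

    static MPI_Comm solver_comm = MPI_COMM_NULL;

    void solver_init(MPI_Comm user_comm)
    {
        /* private communication context for this library */
        MPI_Comm_dup(user_comm, &solver_comm);
    }

    void solver_finalize(void)
    {
        if (solver_comm != MPI_COMM_NULL)
            MPI_Comm_free(&solver_comm);
    }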

16
Composability
  • MPI works with other tools
  • Compilers
  • Since it is a library
  • Debuggers
  • Debugging interface used by MPICH, TotalView,
    others
  • Profiling tools
  • The MPI profiling interface is part of the standard
  • MPI-2 provides precise interaction with
    multi-threaded programs
  • MPI_THREAD_SINGLE
  • MPI_THREAD_FUNNELED (OpenMP loops)
  • MPI_THREAD_SERIALIZED (OpenMP single)
  • MPI_THREAD_MULTIPLE
  • The interface provides for both portability and
    performance
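
  A minimal sketch (not from the slides) of how a program requests one
  of these thread levels with the MPI-2 call MPI_Init_thread and checks
  what the library actually provides.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int provided;
        /* ask for FUNNELED: only the master thread will call MPI */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        if (provided < MPI_THREAD_FUNNELED)
            fprintf(stderr, "MPI library only provides level %d\n",
                    provided);
        /* ... OpenMP loops here, MPI calls from the master thread ... */
        MPI_Finalize();
        return 0;
    }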

17
Completeness
  • MPI provides a complete programming model.
  • Any parallel algorithm can be expressed.
  • Collective operations operate on subsets of
    processes.
  • Easy things are not always easy, but
  • Hard things are possible.
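
  A hedged sketch (not from the slides) of a collective operation on a
  subset of processes: MPI_Comm_split groups processes into "row"
  communicators, and each row then reduces independently.

    #include <mpi.h>

    void row_sum(int rank, int ncols, double local, double *row_total)
    {
        MPI_Comm row_comm;
        int row = rank / ncols;   /* same color = same subgroup */
        MPI_Comm_split(MPI_COMM_WORLD, row, rank, &row_comm);
        MPI_Allreduce(&local, row_total, 1, MPI_DOUBLE, MPI_SUM,
                      row_comm);
        MPI_Comm_free(&row_comm);
    }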

18
The Center for the Study of Astrophysical
Thermonuclear Flashes
  • To simulate matter accumulation on the surface of
    compact stars, nuclear ignition of the accreted
    (and possibly underlying stellar) material, and
    the subsequent evolution of the star's interior,
    surface, and exterior
  • X-ray bursts (on neutron star surfaces)
  • Novae (on white dwarf surfaces)
  • Type Ia supernovae (in white dwarf interiors)

19
FLASH Scientific Results
  • Wide range of compressibility
  • Wide range of length and time scales
  • Many interacting physical processes
  • Only indirect validation possible
  • Rapidly evolving computing environment
  • Many people in collaboration

Flame-vortex interactions
Compressible turbulence
Laser-driven shock instabilities
Nova outbursts on white dwarfs
Richtmyer-Meshkov instability
Cellular detonations
Helium burning on neutron stars
Rayleigh-Taylor instability
Gordon Bell prize at SC2000
20
The FLASH Code: MPI in Action
  • Solves complex systems of equations for
    hydrodynamics and nuclear burning
  • Written primarily in Fortran 90
  • Uses the Paramesh library for adaptive mesh
    refinement; Paramesh is implemented with MPI
  • I/O (for checkpointing, visualization, and other
    purposes) done with the HDF-5 library, which is
    implemented with MPI-IO (see the I/O sketch after
    this list)
  • Debugged with TotalView, using standard debugger
    interface
  • Tuned with Jumpshot and Vampir, using MPI
    profiling interface
  • Gordon Bell prize winner in 2000
  • Portable to all parallel computing environments
    (because it uses only MPI)
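
  The I/O sketch referred to above. This is not FLASH or HDF-5 code; it
  only illustrates, under simplifying assumptions, the MPI-IO layer
  those libraries sit on: every process writes its block of data at its
  own offset in one shared checkpoint file, using a collective write.

    #include <mpi.h>

    void write_checkpoint(const char *fname, const double *data,
                          int n, MPI_Comm comm)
    {
        MPI_File fh;
        int rank;
        MPI_Offset offset;

        MPI_Comm_rank(comm, &rank);
        offset = (MPI_Offset) rank * n * sizeof(double);

        MPI_File_open(comm, (char *) fname,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        /* collective write: the implementation may merge requests */
        MPI_File_write_at_all(fh, offset, (void *) data, n,
                              MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }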

21
FLASH Scaling Runs
22
X-Ray Burst on the Surface of a Neutron Star
23
Showing the AMR Grid
24
MPI Performance Visualization with Jumpshot
  • For detailed analysis of parallel program
    behavior, timestamped events are collected into a
    log file during the run.
  • A separate display program (Jumpshot) aids the
    user in conducting a post mortem analysis of
    program behavior.
  • Log files can become large, making it impossible
    to inspect the entire program at once.
  • The FLASH Project motivated an indexed file
    format (SLOG) that uses a preview to select a
    time of interest and quickly display an interval.
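
  A hedged sketch of how such timestamped logging can hook in through
  the standard profiling interface (the real MPE/Jumpshot logging code
  is more elaborate): the tool supplies its own MPI_Send, records the
  event, and forwards to the implementation via the PMPI_ name.

    #include <stdio.h>
    #include <mpi.h>

    int MPI_Send(void *buf, int count, MPI_Datatype type, int dest,
                 int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int err = PMPI_Send(buf, count, type, dest, tag, comm);
        /* a real tool would buffer this event and write a log file */
        fprintf(stderr, "send to %d at %.6f, took %.6f s\n",
                dest, t0, MPI_Wtime() - t0);
        return err;
    }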

25
Removing Barriers From Paramesh
26
Using Jumpshot
  • MPI functions and messages automatically logged
  • User-defined states
  • Nested states
  • Zooming and scrolling
  • Spotting opportunities for optimization
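
  A sketch of logging a user-defined state with the MPE library that
  Jumpshot reads. The call names follow common MPE usage, but the exact
  signatures should be checked against the MPE documentation; treat
  this as an approximation, not a verified example.

    #include "mpe.h"

    void log_compute_phase(void (*compute)(void))
    {
        int ev_start = MPE_Log_get_event_number();
        int ev_end   = MPE_Log_get_event_number();

        /* a named, colored state that Jumpshot draws as a bar */
        MPE_Describe_state(ev_start, ev_end, "compute", "red");

        MPE_Log_event(ev_start, 0, "begin compute");
        compute();
        MPE_Log_event(ev_end, 0, "end compute");
    }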

27
Future Developments in Parallel Programming: MPI
and Beyond
  • MPI not perfect
  • Any widely-used replacement will have to share
    the properties that made MPI a success.
  • Some directions (in increasing order of
    speculativeness)
  • Improvements to MPI implementations
  • Improvements to the MPI definition
  • Continued evolution of libraries
  • Research and development for parallel languages
  • Further out: radically different programming
    models for radically different architectures.

28
MPI Implementations
  • Implementations beget implementation research
  • Datatypes, I/O, memory motion elimination
  • On most platforms, better collective operations
  • Most MPI implementations build collective
    operations on point-to-point, which is too
    high-level (see the sketch after this list)
  • Need stream-oriented methods that understand MPI
    datatypes
  • Optimize for new hardware
  • In progress for VIA, InfiniBand
  • Need more emphasis on collective operations
  • Off-loading message processing onto NIC
  • Scaling beyond 10,000 processes
  • Parallel I/O
  • Clusters
  • Remote
  • Fault-tolerance
  • Intercommunicators provide an approach
  • Working with multithreading approaches
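
  The sketch referred to above: a broadcast layered purely on
  point-to-point messages (illustrative only). Because every byte goes
  through MPI_Send/MPI_Recv, such an implementation cannot exploit
  special collective hardware or stream data using knowledge of MPI
  datatypes.

    #include <mpi.h>

    void naive_bcast(void *buf, int count, MPI_Datatype type,
                     int root, MPI_Comm comm)
    {
        int rank, size, i;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        if (rank == root) {
            for (i = 0; i < size; i++)     /* linear, not scalable */
                if (i != root)
                    MPI_Send(buf, count, type, i, 0, comm);
        } else {
            MPI_Recv(buf, count, type, root, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }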

29
Improvements to MPI Itself
  • Better Remote-memory-access interface
  • Simpler for some simple operations
  • Atomic fetch-and-increment
  • Some minor fixup already in progress
  • MPI 2.1
  • Building on experience with MPI-2
  • Interactions with compilers
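
  A hedged sketch of the MPI-2 remote-memory-access interface being
  discussed: MPI_Accumulate can atomically add to a remote counter
  inside a fence epoch, but no single MPI-2 call also returns the old
  value, which is the kind of fetch-and-increment gap noted above.

    #include <mpi.h>

    void add_to_counter(MPI_Win win, int target_rank, int amount)
    {
        MPI_Win_fence(0, win);
        /* add 'amount' to the int at displacement 0 on the target */
        MPI_Accumulate(&amount, 1, MPI_INT, target_rank, 0,
                       1, MPI_INT, MPI_SUM, win);
        MPI_Win_fence(0, win);
    }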

30
Libraries and Languages
  • General Libraries
  • Global Arrays
  • PETSc
  • ScaLAPACK
  • Application-specific libraries
  • Most built on MPI, at least for the portable version.

31
More Speculative Approaches
  • HTMT for Petaflops
  • Blue Gene
  • PIMS
  • MTA
  • All will need a programming model that explicitly
    manages a deep memory hierarchy.
  • Exotic + small benefit = dead

32
Summary
  • MPI is a successful example of a community
    defining, implementing, and adopting a standard
    programming methodology.
  • It happened because of the open MPI process, the
    MPI design itself, and early implementation.
  • MPI research continues to refine implementations
    on modern platforms, and this is the main road
    ahead.
  • Tools that work with MPI programs are thus a good
    investment.
  • MPI provides portability and performance for
    complex applications on a variety of
    architectures.