Transcript and Presenter's Notes

Title: High Performance Computing Course Notes 2007-2008 Message Passing Programming I


1
High Performance Computing Course Notes 2007-2008
Message Passing Programming I
2
Message Passing Programming
  • Message passing is the most widely used parallel programming model
  • Message passing works by creating a number of tasks, uniquely named, that interact by sending and receiving messages to and from one another (hence the name message passing)
  • Generally, processes communicate by sending data from the address space of one process to that of another
  • Processes communicate via files, pipes or sockets
  • Threads within a process communicate via a global data area
  • Programs based on message passing can be based on standard sequential language programs (C/C++, Fortran), augmented with calls to library functions for sending and receiving messages

3
Message Passing Interface (MPI)
  • MPI is a specification, not a particular
    implementation
  • Does not specify process startup, error codes,
    amount of system buffer, etc
  • MPI is a library, not a language
  • The goals of MPI: functionality, portability and efficiency
  • Message passing model -> MPI specification -> MPI implementation

4
OpenMP vs MPI
  • In a nutshell
  • MPI is used on distributed-memory systems
  • OpenMP is used for code parallelisation on
    shared-memory systems
  • Both are explicit parallelism
  • High-level control (OpenMP), lower-level control
    (MPI)

5
A little history
  • Message-passing libraries developed for a number
    of early distributed memory computers
  • By 1993 there were many vendor-specific implementations
  • By 1994 MPI-1 came into being
  • By 1996 MPI-2 was finalized

6
The MPI programming model
  • MPI standards -
  • MPI-1 (1.1, 1.2), MPI-2 (2.0)
  • Forwards compatibility preserved between versions
  • Standard bindings - for C, C++ and Fortran. Non-standard MPI bindings also exist for Python, Java, etc.
  • We will stick to the C binding for the lectures and coursework. More info on MPI: www.mpi-forum.org
  • Implementations - for your laptop, pick up MPICH, a free portable implementation of MPI (http://www-unix.mcs.anl.gov/mpi/mpich/index.htm)
  • Coursework will use MPICH

7
MPI
  • MPI is a complex system comprising 129 functions with numerous parameters and variants
  • Six of them are indispensable, and with just those six you can already write a large number of useful programs
  • Other functions add flexibility (datatypes), robustness (non-blocking send/receive), efficiency (ready-mode communication), modularity (communicators, groups) or convenience (collective operations, topology)
  • In the lectures, we are going to cover the most commonly encountered functions

8
The MPI programming model
  • Computation comprises one or more processes that communicate by calling library routines to send and receive messages to and from other processes
  • (Generally) a fixed set of processes created at
    outset, one process per processor
  • Different from PVM

9
Intuitive Interfaces for sending and receiving
messages
  • Send(data, destination), Receive(data, source)
  • is the minimal interface
  • Not enough in some situations; we also need
  • Message matching: add a message_id at both the send and receive interfaces
  • They become Send(data, destination, msg_id) and Receive(data, source, msg_id)
  • Message_id
  • is expressed using an integer, termed the message tag
  • allows the programmer to deal with the arrival of messages in an orderly fashion (e.g. queue them and then deal with them later)

10
How to express the data in the send/receive
interfaces
  • In the early stages:
  • (address, length) for the send interface
  • (address, max_length) for the receive interface
  • These are not always adequate:
  • The data to be sent may not be in contiguous memory locations
  • The storage format of the data may not be the same, or known in advance, on heterogeneous platforms
  • Eventually, a triple (address, count, datatype) is used to express the data to be sent, and (address, max_count, datatype) for the data to be received
  • This reflects the fact that a message contains much more structure than just a string of bits; for example, (vector_A, 300, MPI_REAL)
  • Programmers can construct their own datatypes
  • Now the interfaces become send(address, count, datatype, destination, msg_id) and receive(address, max_count, datatype, source, msg_id); see the sketch below
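
A minimal C sketch of the (address, count, datatype) triple; MPI_FLOAT is the C counterpart of MPI_REAL, and the destination rank, tag and use of MPI_Send (introduced on later slides) are choices made for the illustration:

    float vector_A[300];
    /* (address, count, datatype) = (vector_A, 300, MPI_FLOAT), sent to process 1 with message id 99 */
    MPI_Send(vector_A, 300, MPI_FLOAT, 1, 99, MPI_COMM_WORLD);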

11
How to distinguish messages
  • The message tag is necessary, but not sufficient, to distinguish messages
  • So the notion of a communicator is introduced

12
Communicators
  • Messages are put into contexts
  • Contexts are allocated at run time by the system
    in response to programmer requests
  • The system can guarantee that each generated
    context is unique
  • The processes belong to groups
  • The notions of context and group are combined in
    a single object, which is called a communicator
  • A communicator identifies a group of processes
    and a communication context
  • The MPI library defines an initial communicator, MPI_COMM_WORLD, which contains all the processes running in the system
  • The messages from different process groups can
    have the same tag
  • So the send interface becomes send(address,
    count, datatype, destination, tag, comm)

13
Status of the received messages
  • The structure of the message status is added to the receive interface
  • Status holds the information about source, tag and the actual message size
  • In the C binding, the source can be retrieved by accessing status.MPI_SOURCE,
  • the tag can be retrieved from status.MPI_TAG, and
  • the actual message size can be retrieved by calling the function MPI_Get_count(&status, datatype, &count)
  • The receive interface becomes receive(address, maxcount, datatype, source, tag, communicator, status); see the sketch below
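
A minimal sketch of querying the status object after a receive; the buffer size and the use of the wildcards MPI_ANY_SOURCE/MPI_ANY_TAG (introduced on the following slides) are choices made for the example:

    int buf[100], count;
    MPI_Status status;
    MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    printf("message came from rank %d with tag %d\n", status.MPI_SOURCE, status.MPI_TAG);
    MPI_Get_count(&status, MPI_INT, &count);   /* actual number of ints received */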

14
How to express source and destination
  • The processes in a communicator (group) are
    identified by ranks
  • If a communicator contains n processes, process
    ranks are integers from 0 to n-1
  • Source and destination processes in the
    send/receive interface are the ranks

15
Some other issues
  • In the receive interface, tag can be a wildcard (MPI_ANY_TAG), which means any message will be received
  • In the receive interface, source can also be a wildcard (MPI_ANY_SOURCE), which matches any source

16
MPI basics
  • First six functions (C bindings)
  • MPI_Send (buf, count, datatype, dest, tag, comm)
  • Send a message
  • buf: address of send buffer
  • count: no. of elements to send (>0)
  • datatype: datatype of elements
  • dest: process id of destination
  • tag: message tag
  • comm: communicator (handle)

19
MPI basics
  • First six functions (C bindings)
  • MPI_Send (buf, count, datatype, dest, tag, comm)
  • Calculating the size of the data to be sent
  • buf: address of send buffer
  • count * sizeof(datatype) bytes of data

22
MPI basics
  • First six functions (C bindings)
  • MPI_Recv (buf, count, datatype, source, tag, comm, status)
  • Receive a message
  • buf: address of receive buffer (var param)
  • count: max no. of elements in receive buffer (>0)
  • datatype: datatype of receive buffer elements
  • source: process id of source process, or MPI_ANY_SOURCE
  • tag: message tag, or MPI_ANY_TAG
  • comm: communicator
  • status: status object

23
MPI basics
  • First six functions (C bindings)
  • MPI_Init (int *argc, char ***argv)
  • Initiate a computation
  • argc (number of arguments) and argv (argument vector) are the main program's arguments
  • Must be called first, and once per process
  • MPI_Finalize ( )
  • Shut down a computation
  • The last thing that happens - no MPI calls may follow it

24
MPI basics
  • First six functions (C bindings)
  • MPI_Comm_size (MPI_Comm comm, int *size)
  • Determine the number of processes in comm
  • comm is the communicator handle; MPI_COMM_WORLD is the default (including all MPI processes)
  • size holds the number of processes in the group
  • MPI_Comm_rank (MPI_Comm comm, int *pid)
  • Determine the id of the current (or calling) process
  • pid holds the id of the current process

25
MPI basics: a basic example

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Hello, world.  I am %d of %d\n", rank, nprocs);
        MPI_Finalize();
        return 0;
    }

mpirun -np 4 myprog
Hello, world. I am 1 of 4
Hello, world. I am 3 of 4
Hello, world. I am 0 of 4
Hello, world. I am 2 of 4
26
MPI basics: send and recv example (1)

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i;
        int buffer[10];
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (size < 2)
        {
            printf("Please run with two processes.\n");
            MPI_Finalize();
            return 0;
        }
        if (rank == 0)
        {
            for (i = 0; i < 10; i++)
                buffer[i] = i;
            MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
        }

27
MPI basics: send and recv example (2)

        if (rank == 1)
        {
            for (i = 0; i < 10; i++)
                buffer[i] = -1;
            MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
            for (i = 0; i < 10; i++)
            {
                if (buffer[i] != i)
                    printf("Error: buffer[%d] = %d but is expected to be %d\n", i, buffer[i], i);
            }
        }
        MPI_Finalize();
        return 0;
    }

28
MPI language bindings
  • Standard (accepted) bindings for Fortran, C and C++
  • Java bindings are work in progress
  • JavaMPI: Java wrapper to native calls
  • mpiJava: JNI wrappers
  • jmpi: pure Java implementation of the MPI library
  • MPIJ: same idea
  • Java Grande Forum is trying to sort it all out
  • We will use the C bindings

29
High Performance Computing Course Notes 2007-2008
  • Message Passing Programming II

30
Modularity
  • MPI supports modular programming via
    communicators
  • Provides information hiding by encapsulating
    local communications and having local namespaces
    for processes
  • All MPI communication operations specify a
    communicator (process group that is engaged in
    the communication)

31
Forming new communicators: one approach
  • MPI_Comm world, workers;
  • MPI_Group world_group, worker_group;
  • int ranks[1];
  • MPI_Init(&argc, &argv);
  • world = MPI_COMM_WORLD;
  • MPI_Comm_size(world, &numprocs);
  • MPI_Comm_rank(world, &myid);
  • server = numprocs - 1;
  • MPI_Comm_group(world, &world_group);
  • ranks[0] = server;
  • MPI_Group_excl(world_group, 1, ranks, &worker_group);
  • MPI_Comm_create(world, worker_group, &workers);
  • MPI_Group_free(&world_group);
  • MPI_Comm_free(&workers);

32
Forming new communicators - functions
  • int MPI_Comm_group(MPI_Comm comm, MPI_Group *group)
  • int MPI_Group_excl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)
  • int MPI_Group_incl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)
  • int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm)
  • int MPI_Group_free(MPI_Group *group)
  • int MPI_Comm_free(MPI_Comm *comm)

33
Forming new communicators: another approach (1)
  • MPI_Comm_split (comm, colour, key, newcomm)
  • Creates one or more new communicators from the original comm
  • comm: communicator (handle)
  • colour: control of subset assignment (processes with the same colour are in the same new communicator)
  • key: control of rank assignment
  • newcomm: new communicator
  • Is a collective communication operation (must be executed by all processes in the process group comm)
  • Is used to (re-)allocate processes to communicators (groups)

34
Forming new communicators: another approach (2)
  • MPI_Comm_split (comm, colour, key, newcomm)
  • MPI_Comm comm, newcomm; int myid, color;
  • MPI_Comm_rank(comm, &myid);   // id of the current process
  • color = myid % 3;
  • MPI_Comm_split(comm, color, myid, &newcomm);

[Figure: eight processes with ranks 0-7 in comm are split with colour = myid % 3; this yields three new communicators (colours 0, 1 and 2), and within each new communicator the member processes are re-ranked starting from 0.]
35
Forming new communicators: another approach (3)
  • MPI_Comm_split (comm, colour, key, newcomm)
  • New communicator created for each new value of
    colour
  • Each new communicator (sub-group) comprises those
    processes that specify its value in colour
  • These processes are assigned new identifiers
    (ranks, starting at zero) with the order
    determined by the value of key (or by their ranks
    in the old communicator in event of ties)

36
Communications
  • Point-to-point communications involve exactly two processes, one sender and one receiver
  • For example, MPI_Send() and MPI_Recv()
  • Collective communications involve a group of processes

37
Collective operations
  • i.e. coordinated communication operations involving multiple processes
  • The programmer could do this by hand (tedious); MPI provides specialized collective communication operations
  • barrier: synchronizes all processes
  • broadcast: sends data from one process to all processes
  • gather: gathers data from all processes to one process
  • scatter: scatters data from one process to all processes
  • reduction operations: sum, multiply, etc. over distributed data
  • all executed collectively (on all processes in the group, at the same time, with the same parameters)

38
Collective operations
  • MPI_Barrier (comm)
  • Global synchronization
  • comm is the communicator handle
  • No process returns from the function until all processes have called it
  • A good way of separating one phase from another; see the sketch below
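
A minimal sketch of using a barrier to separate two phases (compute_phase_one/compute_phase_two are hypothetical functions standing in for real work):

    compute_phase_one();              /* hypothetical: first phase of the computation */
    MPI_Barrier(MPI_COMM_WORLD);      /* no process continues until all processes have reached this point */
    compute_phase_two();              /* hypothetical: second phase */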

39
Barrier synchronizations
  • You are only as quick as your slowest process

[Figure: timeline of processes reaching two barrier synchronisation points; faster processes wait at each barrier for the slowest one.]
40
Collective operations
  • MPI_Bcast (buf, count, type, root, comm)
  • Broadcast data from root to all processes
  • buf: address of input buffer or output buffer (at root)
  • count: no. of entries in buffer (>0)
  • type: datatype of buffer elements
  • root: process id of root process
  • comm: communicator

[Figure: one-to-all broadcast - MPI_BCAST copies the root's data block A0 to every process.]
41
Example of MPI_Bcast
  • Broadcast 100 ints from process 0 to every
    process in the group
  • MPI_Comm comm;
  • int array[100];
  • int root = 0;
  • MPI_Bcast(array, 100, MPI_INT, root, comm);

42
Collective operations
  • MPI_Gather (inbuf, incount, intype, outbuf, outcount, outtype, root, comm)
  • Collective data movement function
  • inbuf: address of input buffer
  • incount: no. of elements sent from each (>0)
  • intype: datatype of input buffer elements
  • outbuf: address of output buffer (var param)
  • outcount: no. of elements received from each
  • outtype: datatype of output buffer elements
  • root: process id of root process
  • comm: communicator

[Figure: all-to-one gather - MPI_GATHER collects blocks A0, A1, A2, A3, one from each of processes 0-3, into the root's buffer.]
46
MPI_Gather example
  • Gather 100 ints from every process in group to
    root
  • MPI_Comm comm;
  • int gsize, sendarray[100];
  • int root, myrank, *rbuf;
  • ...
  • MPI_Comm_rank(comm, &myrank);   // find proc. id
  • if (myrank == root) {
  •     MPI_Comm_size(comm, &gsize);   // find group size
  •     rbuf = (int *) malloc(gsize*100*sizeof(int));   // allocate receive buffer
  • }
  • MPI_Gather(sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

47
Collective operations
  • MPI_Scatter (inbuf, incount, intype, outbuf, outcount, outtype, root, comm)
  • Collective data movement function
  • inbuf: address of input buffer
  • incount: no. of elements sent to each (>0)
  • intype: datatype of input buffer elements
  • outbuf: address of output buffer
  • outcount: no. of elements received by each
  • outtype: datatype of output buffer elements
  • root: process id of root process
  • comm: communicator

[Figure: one-to-all scatter - MPI_SCATTER distributes the root's blocks A0, A1, A2, A3 to processes 0-3, one block per process.]
48
Example of MPI_Scatter
  • MPI_Scatter is the reverse of MPI_Gather
  • It is as if the root sends to each process i using
  • MPI_Send(inbuf + i*incount*sizeof(intype), incount, intype, i, ...)
  • MPI_Comm comm;
  • int gsize, *sendbuf;
  • int root, rbuf[100];
  • MPI_Comm_size(comm, &gsize);
  • sendbuf = (int *) malloc(gsize*100*sizeof(int));
  • MPI_Scatter(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

49
Collective operations
  • MPI_Reduce (inbuf, outbuf, count, type, op, root, comm)
  • Collective reduction function
  • inbuf: address of input buffer
  • outbuf: address of output buffer
  • count: no. of elements in input buffer (>0)
  • type: datatype of input buffer elements
  • op: reduction operation
  • root: process id of root process
  • comm: communicator

[Figure: MPI_REDUCE with op = MPI_MIN and root = 0 - the element-wise minimum of (2,4), (5,7), (0,3) and (2,6) held by the four processes, i.e. (0,2), is returned at process 0.]
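
A minimal sketch of the reduction above in the C binding (the variable names are made up for the example; the initial values follow the MPI_MIN picture):

    int local_vals[2] = {2, 4};   /* this process's contribution (values from the picture) */
    int global_min[2];
    /* element-wise minimum across all processes; the result is stored only at root 0 */
    MPI_Reduce(local_vals, global_min, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);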
50
Collective operations
  • MPI_Reduce (inbuf, outbuf, count, type, op,
    root, comm)
  • Collective reduction function
  • inbuf address of input buffer
  • outbuf address of output buffer
  • count no. of elements in input buffer (gt0)
  • type datatype of input buffer elements
  • op operation
  • root process id of root process
  • comm communicator

[Figure: MPI_REDUCE with op = MPI_SUM and root = 1 - the element-wise sums of the values held by the four processes are returned at process 1.]
51
Collective operations
  • MPI_Allreduce (inbuf, outbuf, count, type, op, comm)
  • Collective reduction function
  • inbuf: address of input buffer
  • outbuf: address of output buffer (var param)
  • count: no. of elements in input buffer (>0)
  • type: datatype of input buffer elements
  • op: reduction operation
  • comm: communicator

[Figure: MPI_ALLREDUCE with op = MPI_MIN - the element-wise minimum (0,2) of the processes' values is returned on every process.]
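
A minimal sketch of the corresponding MPI_Allreduce call (same made-up variable names as above); the differences from MPI_Reduce are that there is no root argument and every process receives the result:

    int local_vals[2] = {2, 4};
    int global_min[2];
    /* element-wise minimum across all processes; every process receives the result in global_min */
    MPI_Allreduce(local_vals, global_min, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);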
52
Buffering in MPI communications
  • Application buffer: specified by the first parameter of the MPI_Send/Recv functions
  • System buffer:
  • Hidden from the programmer and managed by the MPI library
  • Is limited and can be easy to exhaust

53
Blocking and non-blocking communications
  • Blocking send
  • The sender doesn't return until the application buffer can be re-used (which often means that the data have been copied from the application buffer to a system buffer); it doesn't mean that the data have been received
  • MPI_Send (buf, count, datatype, dest, tag, comm)
  • Blocking receive
  • The receiver doesn't return until the data are ready to be used by the receiver (which often means that the data have been copied from the system buffer to the application buffer)
  • Non-blocking send/receive
  • The calling process returns immediately
  • It just requests that the MPI library perform the operation; the user cannot predict when that will happen
  • It is unsafe to modify the application buffer until you can make sure the requested operation has been performed (MPI provides routines to test this)
  • Can be used to overlap computation with communication, for possible performance gains
  • MPI_Isend (buf, count, datatype, dest, tag, comm, request); see the sketch below
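
A minimal sketch of the non-blocking send call (buffer contents, destination and tag are made-up values; the completion tests that make it safe to reuse the buffer are covered on the following slides):

    int buf[10] = {0};                /* application buffer (example data) */
    MPI_Request request;
    /* returns immediately; buf must not be modified until the request is known to have completed */
    MPI_Isend(buf, 10, MPI_INT, /* dest */ 1, /* tag */ 0, MPI_COMM_WORLD, &request);
    /* ... computation that does not touch buf can be overlapped with the communication here ... */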

54
Testing non-blocking communications for completion
  • Completion tests come in two types
  • WAIT type
  • TEST type
  • WAIT type: the WAIT-type testing routines block until the communication has completed.
  • A non-blocking communication immediately followed by a WAIT-type test is equivalent to the corresponding blocking communication
  • TEST type: these routines return a TRUE or FALSE value immediately
  • The process can perform other tasks if the communication has not yet completed

55
Testing non-blocking communications for completion
  • The WAIT-type test is:
  • MPI_Wait (request, status)
  • This routine blocks until the communication specified by the handle request has completed. The request handle will have been returned by an earlier call to a non-blocking communication routine.
  • The TEST-type test is:
  • MPI_Test (request, flag, status)
  • In this case the communication specified by the handle request is simply queried to see if it has completed, and the result of the query (TRUE or FALSE) is returned immediately in flag; see the sketch below.
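
A minimal sketch of the two completion tests applied to a request returned by an earlier MPI_Isend/MPI_Irecv; the two options are alternatives for the same request, and do_useful_work is a hypothetical function standing in for other work:

    MPI_Status status;
    int flag = 0;

    /* Option 1 - WAIT type: block until the communication has completed */
    MPI_Wait(&request, &status);

    /* Option 2 - TEST type: poll, doing other work until it completes */
    while (!flag) {
        MPI_Test(&request, &flag, &status);
        if (!flag)
            do_useful_work();         /* hypothetical work performed while waiting */
    }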

56
Testing multiple non-blocking communications for
completion
  • Wait for all communications to complete
  • MPI_Waitall (count, array_of_requests,
    array_of_statuses)
  • This routine blocks until all the communications
    specified by the request handles,
    array_of_requests, have completed. The statuses
    of the communications are returned in the array
    array_of_statuses and each can be queried in the
    usual way for the source and tag if required
  • Test if all communications have completed
  • MPI_Testall (count, array_of_requests, flag,
    array_of_statuses)
  • If all the communications have completed, flag is
    set to TRUE, and information about each of the
    communications is returned in array_of_statuses.
    Otherwise flag is set to FALSE and
    array_of_statuses is undefined.
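
A minimal sketch of waiting on several outstanding requests at once (NREQ and the requests array are assumptions; the requests would come from earlier non-blocking calls):

    #define NREQ 4
    MPI_Request requests[NREQ];       /* filled in by earlier MPI_Isend/MPI_Irecv calls */
    MPI_Status  statuses[NREQ];
    /* block until all NREQ communications have completed */
    MPI_Waitall(NREQ, requests, statuses);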

57
Testing multiple non-blocking communications for
completion
  • Query a number of communications at a time to
    find out if any of them have completed
  • Wait: MPI_Waitany (count, array_of_requests, index, status)
  • MPI_WAITANY blocks until one or more of the
    communications associated with the array of
    request handles, array_of_requests, has
    completed.
  • The index of the completed communication in the
    array_of_requests handles is returned in index,
    and its status is returned in status.
  • Should more than one communication have
    completed, the choice of which is returned is
    arbitrary.
  • Test: MPI_Testany (count, array_of_requests, index, flag, status)
  • The result of the test (TRUE or FALSE) is returned immediately in flag; see the sketch below.
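
A minimal sketch of servicing whichever request completes first (nreq, requests and process_message are made-up names for the example):

    int index;
    MPI_Status status;
    /* block until at least one of the nreq outstanding requests has completed */
    MPI_Waitany(nreq, requests, &index, &status);
    /* requests[index] has completed; its source and tag can be read from status */
    process_message(index, &status);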