Title: PARALLEL COMPUTING WITH MPI
Slide 1: Parallel Programming on the SGI Origin2000
Taub Computer Center, Technion
Moshe Goldberg, mgold_at_tx.technion.ac.il
With thanks to Igor Zacharov / Benoit Marchand, SGI
Mar 2004 (v1.2)
Slide 2: Parallel Programming on the SGI Origin2000
- Parallelization Concepts
- SGI Computer Design
- Efficient Scalar Design
- Parallel Programming - OpenMP
- Parallel Programming - MPI
Slide 3: Parallel Programming - MPI
Slide 4: Parallel classification
- Parallel architectures
  - Shared Memory / Distributed Memory
- Programming paradigms
  - Data parallel / Message passing
Slide 5: Shared Memory
- Each processor can access any part of the memory
- Access times are uniform (in principle)
- Easier to program (no explicit message passing)
- Bottleneck when several tasks access the same location
Slide 6: Distributed Memory
- Each processor can access only its own local memory
- Access times depend on location
- Processors must communicate via explicit message passing
Slide 7: Distributed Memory
(Figure: processor/memory pairs connected by an interconnection network.)
Slide 8: Message Passing Programming
- Separate program on each processor
- Local memory
- Control over distribution and transfer of data
- Additional debugging complexity due to communications
Slide 9: Performance issues
- Concurrency: the ability to perform actions simultaneously
- Scalability: performance is not impaired by an increasing number of processors
- Locality: a high ratio of local to remote memory accesses (or low communication)
Slide 10: SP2 Benchmark
- Goal: checking performance of real-world applications on the SP2
- Execution time (seconds): CPU time for the applications
- Speedup = (execution time for 1 processor) / (execution time for p processors)
Slide 11: (figure, no transcript)
Slide 12: WHAT is MPI?
- A message-passing library specification
- An extended message-passing model
- Not specific to an implementation or computer
Slide 13: BASICS of MPI PROGRAMMING
- MPI is a message-passing library
- Assumes a distributed-memory architecture
- Includes routines for performing communication (exchange of data and synchronization) among the processors
Slide 14: Message Passing
- Data transfer + synchronization
- Synchronization: the act of bringing one or more processes to known points in their execution
- Distributed memory: memory split into segments, each of which may be accessed by only one process
Slide 15: Message Passing
(Figure: handshake between two processes: "May I send?" / "yes" / send data.)
Slide 16: MPI STANDARD
- Standard by consensus, designed in an open forum
- Introduced by the MPI Forum in May 1994, updated in June 1995
- MPI-2 (1998) provides extensions to the MPI standard
Slide 17: Why use MPI?
- Standardization
- Portability
- Performance
- Richness
- Designed to enable libraries
Slide 18: Writing an MPI Program
- If there is a serial version, make sure it is debugged
- If not, try to write a serial version first
- When debugging in parallel, start with a few nodes first
Slide 19: Format of MPI routines
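As a reminder of the standard convention: in C, every MPI routine is a function that returns an integer error code, while in Fortran each routine is a subroutine whose last argument returns the error code (e.g. call MPI_INIT(ierror)). A minimal C sketch:

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int ierr;
        ierr = MPI_Init(&argc, &argv);   /* MPI_SUCCESS on success */
        ierr = MPI_Finalize();
        return ierr;
    }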
Slide 20: Six useful MPI functions
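These are presumably the six calls the hello-world examples below rely on; their standard C prototypes are:

    int MPI_Init(int *argc, char ***argv);          /* start MPI           */
    int MPI_Comm_size(MPI_Comm comm, int *size);    /* number of processes */
    int MPI_Comm_rank(MPI_Comm comm, int *rank);    /* my id: 0..size-1    */
    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm);
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Comm comm, MPI_Status *status);
    int MPI_Finalize(void);                         /* shut down MPI       */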
Slide 21: Communication routines
Slide 22: End MPI part of program
Slide 23: Fortran example
      program hello
      include 'mpif.h'
      integer status(MPI_STATUS_SIZE)
      integer rank, size, tag, i, ierror
      character*12 message
      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      tag = 100
      if (rank .eq. 0) then
         message = 'Hello, world'
         do i = 1, size-1
            call MPI_SEND(message, 12, MPI_CHARACTER, i, tag,
     .                    MPI_COMM_WORLD, ierror)
         enddo
      else
         call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag,
     .                 MPI_COMM_WORLD, status, ierror)
      endif
      print*, 'node', rank, ':', message
      call MPI_FINALIZE(ierror)
      end
Slide 24: C example
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        int tag = 100;
        int rank, size, i;
        MPI_Status status;
        char message[12];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        strcpy(message, "Hello,world");
        if (rank == 0)
            for (i = 1; i < size; i++)
                MPI_Send(message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
        else
            MPI_Recv(message, 12, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
        printf("node %d %s\n", rank, message);
        MPI_Finalize();
        return 0;
    }
Slide 25: MPI Messages
- DATA: the data to be sent
- ENVELOPE: information used to route the data
Slide 26: Description of MPI_Send (MPI_Recv)
Slide 27: Description of MPI_Send (MPI_Recv)
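The argument tables are summarized here as an annotated sketch (standard MPI argument meanings; the first three arguments describe the data, the rest form the envelope of slide 25):

    MPI_Send(buf,       /* address of send buffer              (data)     */
             count,     /* number of elements to send          (data)     */
             datatype,  /* type of each element, e.g. MPI_INT  (data)     */
             dest,      /* rank of the destination process     (envelope) */
             tag,       /* message tag                         (envelope) */
             comm);     /* communicator, e.g. MPI_COMM_WORLD   (envelope) */

    MPI_Recv(buf, count, datatype,
             source,    /* rank of sender, or MPI_ANY_SOURCE   (envelope) */
             tag,       /* expected tag, or MPI_ANY_TAG        (envelope) */
             comm,
             &status);  /* returns actual source, tag, length            */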
Slide 28: Some useful remarks
- Source = MPI_ANY_SOURCE means that any source is acceptable
- Tags specified by sender and receiver must match; with MPI_ANY_TAG, any tag is acceptable
- The communicator must be the same for send and receive; usually MPI_COMM_WORLD
Slide 29: POINT-TO-POINT COMMUNICATION
- Transmission of a message between one pair of processes
- The programmer can choose the mode of transmission
Slide 30: MODE of TRANSMISSION
- Can be chosen by the programmer, or the system can be left to decide (see the call names below)
- Synchronous mode
- Ready mode
- Buffered mode
- Standard mode
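Each mode has its own send call in MPI (the receive side is the same for all); a minimal sketch of the calls:

    MPI_Ssend(buf, count, datatype, dest, tag, comm); /* synchronous: completes after the matching receive starts */
    MPI_Rsend(buf, count, datatype, dest, tag, comm); /* ready: the receive must already be posted */
    MPI_Bsend(buf, count, datatype, dest, tag, comm); /* buffered: copied to a buffer from MPI_Buffer_attach */
    MPI_Send (buf, count, datatype, dest, tag, comm); /* standard: the system decides */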
Slide 31: BLOCKING / NON-BLOCKING COMMUNICATIONS
Slide 32: BLOCKING STANDARD SEND (size > threshold)
(Figure: the sending task S calls MPI_SEND and waits; the transfer begins
when the receiver R posts MPI_RECV; data transfer from the source then
completes, and the receiving task continues when the transfer into its
buffer is complete.)
Slide 33: NON-BLOCKING STANDARD SEND (size > threshold)
(Figure: S calls MPI_ISEND and continues working; the transfer begins when
R posts MPI_IRECV; each side calls MPI_WAIT to complete the operation.
There is no interruption if the wait is posted late enough.)
Slide 34: BLOCKING STANDARD SEND (size < threshold)
(Figure: for short messages MPI_SEND transfers the data to a buffer on the
receiver and completes at once; the receiving task continues when the
transfer from that buffer into the user's buffer is complete after
MPI_RECV.)
Slide 35: NON-BLOCKING STANDARD SEND (size < threshold)
(Figure: MPI_ISEND returns with no delay even though the message is not yet
in the buffer on R; the transfer to the intermediate buffer can be avoided
if MPI_IRECV is posted early enough; MPI_WAIT causes no delay if it is
called late enough.)
Slide 36: BLOCKING COMMUNICATION
Slide 37: NON-BLOCKING COMMUNICATION
Slide 38: (figure, no transcript)
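A minimal C sketch of the non-blocking pattern shown on slides 33 and 35 (buffer name, size, dest and tag are illustrative):

    MPI_Request req;
    MPI_Status  status;
    double work[1000];                /* assumed application buffer */
    MPI_Isend(work, 1000, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &req);
    /* ... overlap computation here, without modifying work[] ... */
    MPI_Wait(&req, &status);          /* afterwards, work[] may be reused */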
Slide 39: Deadlock program (cont.)
      if ( irank .EQ. 0 ) then
         idest     = 1
         isrc      = 1
         isend_tag = ITAG_A
         irecv_tag = ITAG_B
      else if ( irank .EQ. 1 ) then
         idest     = 0
         isrc      = 0
         isend_tag = ITAG_B
         irecv_tag = ITAG_A
      end if
C     ----------------------------------------------------------------
C     send and receive messages
C     ----------------------------------------------------------------
      print *, " Task ", irank, " has sent the message"
      call MPI_Send ( rmessage1, MSGLEN, MPI_REAL, idest, isend_tag,
     .                MPI_COMM_WORLD, ierr )
      call MPI_Recv ( rmessage2, MSGLEN, MPI_REAL, isrc, irecv_tag,
     .                MPI_COMM_WORLD, istatus, ierr )
      print *, " Task ", irank, " has received the message"
      call MPI_Finalize (ierr)
      end
Slide 40: DEADLOCK example
(Figure: tasks A and B each call MPI_SEND to the other and only then
MPI_RECV, so each send waits for a receive that is never posted.)
Slide 41: Deadlock example
- SP2 implementation: no receive has been posted yet, so both processes block
- Solutions:
  - Different ordering of the calls
  - Non-blocking calls
  - MPI_Sendrecv (see the sketch below)
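A hedged C sketch of the MPI_Sendrecv solution (variable names echo the Fortran program above; float buffers assumed):

    /* the combined call lets MPI order the two transfers internally,
       so neither task blocks the other */
    MPI_Sendrecv(rmessage1, MSGLEN, MPI_FLOAT, idest, isend_tag,
                 rmessage2, MSGLEN, MPI_FLOAT, isrc,  irecv_tag,
                 MPI_COMM_WORLD, &istatus);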
Slide 42: Determining Information about Messages
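The standard mechanism is the status object filled in by a receive, plus MPI_Get_count; a sketch (buf and MAXLEN are assumed application names):

    MPI_Status status;
    int count, src, tag;
    MPI_Recv(buf, MAXLEN, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    src = status.MPI_SOURCE;                 /* who actually sent it   */
    tag = status.MPI_TAG;                    /* with which tag         */
    MPI_Get_count(&status, MPI_INT, &count); /* how many elements came */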
Slide 43: MPI_WAIT
- Useful for both sender and receiver of non-blocking communications
- The receiving process blocks until the message is received, under programmer control
- The sending process blocks until the send operation completes, at which time the message buffer is available for re-use
Slide 44: MPI_WAIT
(Figure: S computes while the transmission proceeds; MPI_WAIT on R blocks
until the transfer is complete.)
Slide 45: MPI_TEST
(Figure: S posts MPI_Isend and keeps computing, calling MPI_TEST from time
to time to check whether the transmission to R has completed.)
Slide 46: MPI_TEST
- Used by both sender and receiver of non-blocking communication
- Non-blocking call
- The receiver checks whether a specific sender has sent a message that is waiting to be delivered; messages from all other senders are ignored
Slide 47: MPI_TEST (cont.)
- The sender can find out whether the message buffer can be re-used; it has to wait until the operation is complete before doing so
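A short C sketch (req is the request returned by an earlier MPI_Isend or MPI_Irecv):

    int flag;
    MPI_Status status;
    MPI_Test(&req, &flag, &status);   /* returns immediately */
    if (flag) {
        /* operation complete: the send buffer may be reused,
           or the received data is ready */
    }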
Slide 48: MPI_PROBE
- The receiver is notified when messages from potentially any sender arrive and are ready to be processed
- Blocking call
Slide 49: Programming recommendations
- Blocking calls are needed when:
  - Tasks must synchronize
  - MPI_Wait immediately follows the communication call
Slide 50: Collective Communication
- Establishes a communication pattern within a group of nodes
- All processes in the group call the communication routine, with matching arguments
- Collective routine calls can return as soon as their participation in the collective communication is complete
Slide 51: Properties of collective calls
- On completion, the caller is free to access locations in the communication buffer
- Completion does NOT indicate that the other processors in the group have completed
- Only MPI_BARRIER will synchronize all processes
Slide 52: Properties
- MPI guarantees that a message generated by collective communication calls will not be confused with a message generated by point-to-point communication
- The communicator is the group identifier
Slide 53: Barrier
- Synchronization primitive: a node calling it will block until all the nodes within the group have called it
- Syntax: MPI_Barrier(comm, ierr)
Slide 54: Broadcast
- Sends data from one node to all other nodes in the communicator
- MPI_Bcast(buffer, count, datatype, root, comm, ierr)
Slide 55: Broadcast
(Data movement: before the call only P0 holds A0; after it, P0, P1, P2 and
P3 each hold A0.)
Slide 56: Gather and Scatter
(Data movement: scatter takes A0, A1, A2, A3 on P0 and leaves Ai on Pi;
gather is the inverse, collecting the blocks on the root in rank order.)
Slide 57: Allgather effect
(Data movement: before the call each process holds its own block: P0 has
A0, P1 has B0, P2 has C0, P3 has D0; after it, every process holds A0, B0,
C0, D0.)
Slide 58: Syntax for Scatter / Gather
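The standard C prototypes (the Fortran forms take the same arguments plus a trailing ierr):

    int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    void *recvbuf, int recvcount, MPI_Datatype recvtype,
                    int root, MPI_Comm comm);
    int MPI_Gather (void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    void *recvbuf, int recvcount, MPI_Datatype recvtype,
                    int root, MPI_Comm comm);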
Slide 59: Scatter and Gather
- Gather: collect data from every member of the group (including the root) on the root node, in linear order by the rank of the node
- Scatter: distribute data from the root to every member of the group, in linear order by node
Slide 60: ALLGATHER
- All processes, not just the root, receive the result; the jth block of the receive buffer is the block of data sent from the jth process
- Syntax: MPI_Allgather(sndbuf, scount, datatype, recvbuf, rcount, rdatatype, comm, ierr)
Slide 61: Gather example
      DIMENSION A(25,100), b(100), cpart(25), ctotal(100)
      INTEGER root
      DATA root/0/
      DO I = 1, 25
         cpart(I) = 0.
         DO K = 1, 100
            cpart(I) = cpart(I) + A(I,K)*b(K)
         END DO
      END DO
      call MPI_GATHER(cpart, 25, MPI_REAL, ctotal, 25, MPI_REAL,
     .                root, MPI_COMM_WORLD, ierr)
Slide 62: AllGather example
      DIMENSION A(25,100), b(100), cpart(25), ctotal(100)
      INTEGER root
      DO I = 1, 25
         cpart(I) = 0.
         DO K = 1, 100
            cpart(I) = cpart(I) + A(I,K)*b(K)
         END DO
      END DO
      call MPI_ALLGATHER(cpart, 25, MPI_REAL, ctotal, 25, MPI_REAL,
     .                   MPI_COMM_WORLD, ierr)
Slide 63: Parallel matrix-vector multiplication
(Figure: c = A*b with the 100x100 matrix A split by rows among P1..P4, 25
rows per processor; each processor computes its 25 elements of c, which are
then gathered, as in the examples above.)
Slide 64: Global Computations
Slide 65: Reduction
- The partial result in each process in the group is combined in one specified process
Slide 66: Reduction
- Dj = D(0,j) + D(1,j) + ... + D(n-1,j)
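A minimal C sketch of a sum reduction onto rank 0 (variable names assumed):

    double part = 1.0;   /* this rank's partial result */
    double total;
    MPI_Reduce(&part, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    /* total now holds the sum over all ranks, on rank 0 only */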
Slide 67: Scan operation
- The scan (prefix-reduction) operation performs partial reductions on distributed data
- D(k,j) = D(0,j) + D(1,j) + ... + D(k,j), for k = 0, 1, ..., n-1
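The matching C call (an inclusive prefix sum; rank is this process's id, assumed already obtained):

    int mine = rank + 1;
    int prefix;
    /* on rank k: prefix = mine(0) + mine(1) + ... + mine(k) */
    MPI_Scan(&mine, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);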
Slide 68: Varying-size gather and scatter
- Both the size and the memory location of the messages may vary
- More flexibility in writing code
- Less need to copy data into temporary buffers
- More compact final code
- The vendor implementation may be optimal
Slide 69: Scatterv syntax
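The standard C prototype (the Fortran form appends ierr):

    int MPI_Scatterv(void *sendbuf, int *sendcounts, int *displs,
                     MPI_Datatype sendtype, void *recvbuf, int recvcount,
                     MPI_Datatype recvtype, int root, MPI_Comm comm);
    /* rank i receives sendcounts[i] elements taken from sendbuf starting
       at offset displs[i] (in elements) */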
Slide 70: SCATTER
(Figure: P0 sends equal-size blocks to P0, P1, P2, P3.)
Slide 71: SCATTERV
(Figure: P0 sends blocks of varying sizes, taken from varying
displacements, to P0, P1, P2, P3.)
Slide 72: Advanced Datatypes
- Predefined basic datatypes: contiguous data of the same type
- We sometimes need:
  - non-contiguous data of a single type
  - contiguous data of mixed types
Slide 73: Solutions
- Multiple MPI calls to send and receive each data element
- Copy the data to a buffer before sending it (MPI_PACK)
- Use MPI_BYTE to get around the datatype-matching rules
Slide 74: Drawbacks
- Slow, clumsy, and wasteful of memory
- Using MPI_BYTE or MPI_PACKED can hamper portability
Slide 75: General Datatypes and Typemaps
- A sequence of basic datatypes
- A sequence of integer (byte) displacements
Slide 76: Typemaps
- typemap = {(type0, disp0), (type1, disp1), ..., (type n-1, disp n-1)}
- Displacements are relative to the start of the buffer
- Example: Typemap(MPI_INT) = {(int, 0)}
Slide 77: Extent of a Derived Datatype
Slide 78: MPI_TYPE_EXTENT
- MPI_TYPE_EXTENT(datatype, extent, ierr)
- Gives the distance (in bytes) from the start of one instance of the datatype to the start of the next
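The C form of this MPI-1 call (a one-line sketch; MPI-2 renames it MPI_Type_get_extent):

    MPI_Aint extent;
    MPI_Type_extent(MPI_INT, &extent);  /* e.g. 4 bytes on most systems */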
Slide 79: How and When Do I Use Derived Datatypes?
- MPI derived datatypes are created at run-time through calls to MPI library routines
Slide 80: How to use
- Construct the datatype
- Commit the datatype (MPI_TYPE_COMMIT)
- Use the datatype
- Deallocate the datatype (MPI_TYPE_FREE)
Slide 81: EXAMPLE
      integer oldtype, newtype, count, blocklength, stride
      integer ierr, n
      real buffer(n,n)
      call MPI_TYPE_VECTOR(count, blocklength, stride, oldtype,
     .                     newtype, ierr)
      call MPI_TYPE_COMMIT(newtype, ierr)
c     use it in a communication operation
      call MPI_SEND(buffer, 1, newtype, dest, tag, comm, ierr)
c     deallocate it
      call MPI_TYPE_FREE(newtype, ierr)
Slide 82: Example of MPI_TYPE_VECTOR
(Figure: newtype consists of blocks of consecutive oldtype elements,
repeated at a fixed stride.)
Slide 83: Summary
- Derived datatypes are datatypes built from the basic MPI datatypes
- Derived datatypes provide a portable and elegant way of communicating non-contiguous or mixed types in a message
- Efficiency may depend on the implementation (see how it compares to MPI_BYTE)
Slide 84: Several datatypes
Slide 85: Several datatypes
Slide 86: GROUP
Slide 87: Group (cont.)
Slide 88: Group (cont.)
      if (rank .eq. 1) then
         print*, 'sum of group1', (rbuf(i), i=1, count)
c        print*, 'sum of group1', (sbuf(i), i=1, count)
      endif
      count2 = size
      do i = 1, count2
         sbuf2(i) = rank*rank
      enddo
      CALL MPI_REDUCE(SBUF2, RBUF2, COUNT2, MPI_INTEGER,
     .                MPI_SUM, 0, WCOMM, IERR)
      if (rank .eq. 0) then
         print*, 'sum of wgroup', (rbuf2(i), i=1, count2)
      else
         CALL MPI_COMM_FREE(SUBCOMM, IERR)
      endif
      CALL MPI_GROUP_FREE(GROUP1, IERR)
      CALL MPI_FINALIZE(IERR)
      stop
      end
Slide 89: PERFORMANCE ISSUES
- Hidden communication takes place
- Performance depends on the implementation of MPI
- Because of forced synchronization, it is not always best to use collective communication
Slide 90: Example: simple broadcast
(Figure: node 1 sends B to each of the other P-1 nodes in turn.
Data = B(P-1), Steps = P-1.)
Slide 91: Example: simple scatter
(Figure: node 1 sends a distinct block of size B to each of the other P-1
nodes in turn. Data = B(P-1), Steps = P-1.)
Slide 92: Example: better scatter
(Figure, for P = 8: in step 1 node 1 sends half of the data (4B) to node 5;
in step 2 nodes 1 and 5 each send 2B; in step 3 each holder passes on B,
reaching all 8 nodes. Data = B(P/2)logP, Steps = log P.)
Slide 93: Timing for sending a message
- The time is composed of the startup time (time to send a 0-length message) and the transfer time (time to transfer one byte of data):
  Tcomm = Tstartup + B * Ttransfer
- It may be worthwhile to group several sends together
Slide 94: Performance evaluation
- Fortran:
      real*8 t1
      t1 = MPI_Wtime()   ! returns elapsed wall-clock time
- C:
      double t1;
      t1 = MPI_Wtime();
Slide 95: MPI References
- The MPI Standard: www-unix.mcs.anl.gov/mpi/index.html
- Parallel Programming with MPI, Peter S. Pacheco, Morgan Kaufmann, 1997
- Using MPI, W. Gropp, Ewing Lusk, Anthony Skjellum, The MIT Press, 1999
Slide 96: Example: better broadcast
(Figure, for P = 8: node 1 sends B to node 5; in the next step nodes 1 and
5 each forward B; the number of nodes holding B doubles each step until all
8 have it. Data = B(P-1), Steps = log P.)