Transcript and Presenter's Notes

Title: Introduction to Parallel Programming with C and MPI at MCSR, Part 1 (The University of Southern Mississippi, April 8, 2010)


1
Introduction to Parallel Programming with C and MPI at MCSR
Part 1
The University of Southern Mississippi
April 8, 2010
2
What is a Supercomputer?
  • Loosely speaking, it is a large computer with an architecture that has been optimized for solving bigger problems faster than a conventional desktop, mainframe, or server computer.
    - Pipelining
    - Parallelism (lots of CPUs or computers)

3
Supercomputers at MCSR: mimosa
  • 253-CPU Intel Linux Cluster (Pentium 4)
  • Distributed memory: 500MB - 1GB per node
  • Gigabit Ethernet

4
Supercomputers at MCSR: redwood
  • 224-CPU shared memory supercomputer
  • Intel Itanium 2
  • Shared memory: 1GB per node

5
Supercomputers at MCSR: sequoia
  • 46-node Linux Cluster
  • 8 cores (CPUs) per node; 368 cores total
  • 2 GB memory per core (16 GB per node)
  • Shared memory intra-node
  • Distributed memory inter-node
  • Intel Xeon processors

6
Supercomputers at MCSR: sequoia
7
What is Parallel Computing?
  • Using more than one computer (or processor) to
    complete a computational problem

8
How May a Problem be Parallelized?
  • Data Decomposition
  • Task Decomposition
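
To make data decomposition concrete, here is a minimal sketch, not taken from the workshop files, of splitting an array of N elements into equal chunks by MPI rank; the constant N and all variable names are illustrative, and an even split is assumed.

    /* Illustrative sketch of data decomposition: each MPI process works on
     * its own contiguous chunk of an N-element array. Assumes N is evenly
     * divisible by the number of processes.                               */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int chunk = N / size;          /* elements per process       */
        int start = rank * chunk;      /* first index this rank owns */
        int end   = start + chunk;     /* one past the last index    */

        printf("Process %d works on elements %d..%d\n", rank, start, end - 1);

        MPI_Finalize();
        return 0;
    }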

9
Models of Parallel Programming
  • Message Passing Computing
    - Processes coordinate and communicate results via calls to message passing library routines
    - Programmers parallelize the algorithm and add message calls
    - At MCSR, this is via MPI programming with C or Fortran on:
        Sweetgum: Origin 2800 Supercomputer (128 CPUs)
        Mimosa: Beowulf Cluster with 253 nodes
        Redwood: Altix 3700 Supercomputer (224 CPUs)
  • Shared Memory Computing
    - Processes or threads coordinate and communicate results via shared memory variables
    - Care must be taken not to modify the wrong memory areas
    - At MCSR, this is via OpenMP programming with C or Fortran on sweetgum

10
Message Passing Computing at MCSR
  • Process Creation
  • Manager and Worker Processes
  • Static vs. Dynamic Work Allocation
  • Compilation
  • Models
  • Basics
  • Synchronous Message Passing
  • Collective Message Passing
  • Deadlocks
  • Examples

11
Message Passing Process Creation
  • Dynamic
    - One process spawns other processes and gives them work
    - PVM
    - More flexible
    - More overhead: process creation and cleanup
  • Static
    - Total number of processes determined before execution begins
    - MPI

12
Message Passing Processes
  • Often, one process will be the manager, and the
    remaining processes will be the workers
  • Each process has a unique rank/identifier
  • Each process runs in a separate memory space and
    has its own copy of variables

13
Message Passing Work Allocation
  • Manager Process
    - Does initial sequential processing
    - Initially distributes work among the workers, statically or dynamically
    - Collects the intermediate results from the workers
    - Combines them into the final solution
  • Worker Processes
    - Receive work from, and return results to, the manager
    - May distribute work amongst themselves (decentralized load balancing)

14
Message Passing Compilation
  • Compile/link programs with the message passing libraries using regular (sequential) compilers
  • Fortran MPI example: include 'mpif.h'
  • C MPI example: #include <mpi.h> (a minimal skeleton follows this slide)
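
As a concrete illustration (a sketch, not the workshop's actual source), the only MPI-specific piece a C file needs at compile time is the mpi.h header; the compiler line in the comment is the one the PBS script later in this deck uses on sequoia, and many other installations provide an mpicc wrapper instead.

    /* Minimal compile-ready MPI C skeleton (illustrative only).
     * The sample PBS script later in this deck compiles on sequoia with:
     *     icc -lmpi -o add_mpi.exe add_mpi.c
     * Other systems often use an "mpicc" wrapper instead.                */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);   /* set up the message passing environment */
        /* ... parallel work goes here ... */
        MPI_Finalize();           /* shut it down cleanly                   */
        return 0;
    }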

15
Message Passing Compilation
16
(No Transcript)
17
Message Passing Models
  • SPMD: Single Program/Multiple Data
    - Single version of the source code used for each process
    - Manager executes one portion of the program; workers execute another; some portions are executed by both
    - Requires one compilation per architecture type
    - MPI
  • MPMD: Multiple Program/Multiple Data
    - One source code for the master, another for the slave
    - Each must be compiled separately
    - PVM

18
Message Passing Basics
  • Each process must first establish the message passing environment
  • Fortran MPI example:
      integer ierror
      call MPI_INIT(ierror)
  • C MPI example:
      MPI_Init(&argc, &argv);

19
Message Passing Basics
  • Each process has a rank, or id number
    - 0, 1, 2, ..., n-1, where there are n processes
  • With SPMD, each process must determine its own rank by calling a library routine
  • Fortran MPI example:
      integer comm, rank, ierror
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  • C MPI example:
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

20
Message Passing Basics
  • Each process has a rank, or id number
    - 0, 1, 2, ..., n-1, where there are n processes
  • Each process may use a library call to determine how many total processes it has to play with
  • Fortran MPI example:
      integer comm, size, ierror
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
  • C MPI example:
      MPI_Comm_size(MPI_COMM_WORLD, &size);

21
Message Passing Basics
  • Each process has a rank, or id number
    - 0, 1, 2, ..., n-1, where there are n processes
  • Once a process knows the size, it also knows the ranks (ids) of the other processes, and can send or receive a message to/from any other process.
  • C example (the first three arguments describe the data, the next three the envelope, and status reports the result; a complete example follows this slide):
      MPI_Send(buf, count, datatype, dest, tag, comm);
      MPI_Recv(buf, count, datatype, source, tag, comm, &status);
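
Putting these arguments together, here is a hedged, self-contained example (not from the workshop files) in which process 1 sends one integer to process 0; the variable names and tag value are made up.

    /* Illustrative sketch: process 1 sends one integer to process 0. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value, tag = 99;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        } else if (rank == 0) {
            MPI_Recv(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
            printf("Process 0 received %d from process 1\n", value);
        }

        MPI_Finalize();
        return 0;
    }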

22
MPI Send and Receive Arguments
  • buf: starting location of the data
  • count: number of elements
  • datatype: e.g., MPI_INTEGER, MPI_REAL, MPI_CHARACTER (Fortran) or MPI_INT, MPI_FLOAT, MPI_CHAR (C)
  • dest: rank of the process to whom the message is being sent
  • source: rank of the sender from whom the message is being received, or MPI_ANY_SOURCE
  • tag: integer chosen by the program to indicate the type of message, or MPI_ANY_TAG
  • comm: identifies the process team, e.g., MPI_COMM_WORLD
  • status: the result of the call (such as information about the data items received)

23
Synchronous Message Passing
  • Message calls may be blocking or nonblocking
  • Blocking Send
    - Waits to return until the message has been received by the destination process
    - This synchronizes the sender with the receiver
  • Nonblocking Send
    - Returns immediately, without regard for whether the message has been transferred to the receiver
    - DANGER: the sender must not change the variable containing the old message before the transfer is done
    - MPI_Isend() is nonblocking

24
Synchronous Message Passing
  • Locally Blocking Send
    - The message is copied from the send parameter variable to an intermediate buffer in the calling process
    - Returns as soon as the local copy is complete
    - Does not wait for the receiver to transfer the message from the buffer
    - Does not synchronize
    - The sender's message variable may safely be reused immediately
    - MPI_Send() is locally blocking

25
Synchronous Message Passing
  • Blocking Receive
    - The call waits until a message matching the given tag has been received from the specified source process
    - MPI_Recv() is blocking
  • Nonblocking Receive
    - If this process has a qualifying message waiting, retrieves that message and returns
    - If no messages have been received yet, returns anyway
    - Used if the receiver has other work it can be doing while it waits
    - Status tells the receiver whether the message was received
    - MPI_Irecv() is nonblocking
    - MPI_Wait() and MPI_Test() can be used to periodically check whether the message is ready, and finally wait for it, if desired (a sketch follows this slide)
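
A short sketch of the nonblocking pattern described above, with made-up variable names: post MPI_Irecv, do other work, then call MPI_Wait before touching the received data. This is illustrative, not the workshop's code.

    /* Illustrative sketch of a nonblocking receive: post MPI_Irecv, do
     * other work, then MPI_Wait before using the data.                 */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value = 0;
        MPI_Request request;
        MPI_Status  status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
            /* ... do useful work here while the message is in flight ... */
            MPI_Wait(&request, &status);   /* now it is safe to use value */
            printf("Received %d\n", value);
        } else if (rank == 1) {
            value = 7;
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }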

26
Collective Message Passing
  • Broadcast
    - Sends a message from one process to all processes in the group
  • Scatter
    - Distributes each element of a data array to a different process for computation
  • Gather
    - The reverse of scatter: retrieves data elements into an array from multiple processes

27
Collective Message Passing w/MPI
  • MPI_Bcast(): broadcast from the root to all other processes
  • MPI_Gather(): gather values from a group of processes
  • MPI_Scatter(): scatter a buffer in parts to a group of processes
  • MPI_Alltoall(): send data from all processes to all processes
  • MPI_Reduce(): combine values from all processes into a single value
  • MPI_Reduce_scatter(): combine values, then scatter the results to the processes
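
As an illustrative sketch (not the workshop's code) of two of these collectives: the root broadcasts a value with MPI_Bcast, every process computes a partial result, and MPI_Reduce sums the partial results on the root. All names and the per-process "work" are invented for the example.

    /* Illustrative sketch of MPI_Bcast followed by MPI_Reduce.           */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, n = 0, partial, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) n = 100;                       /* root sets the value   */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); /* everyone now has n    */

        partial = rank * n;                           /* some per-process work */
        MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("Total = %d\n", total);

        MPI_Finalize();
        return 0;
    }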

28
Message Passing Deadlock
  • Deadlock can occur when all critical processes are waiting for messages that never come, or waiting for buffers to clear out so that their own messages can be sent
  • Possible Causes
    - Program/algorithm errors
    - Message and buffer sizes
  • Solutions
    - Order operations more carefully (see the sketch after this slide)
    - Use nonblocking operations
    - Add debugging output statements to your code to find the problem
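
A hedged sketch of the "order operations more carefully" fix: if both ranks below called MPI_Recv first (or both posted large sends that could not be buffered), each would wait forever on the other; ordering the send and receive differently on each rank breaks the cycle. The code is illustrative, not from the workshop.

    /* Illustrative sketch of avoiding deadlock when two processes exchange
     * messages: rank 0 sends first, rank 1 receives first.               */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, mine, theirs;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        mine = rank;

        if (rank == 0) {            /* rank 0: send first, then receive */
            MPI_Send(&mine,   1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&theirs, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {     /* rank 1: receive first, then send */
            MPI_Recv(&theirs, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&mine,   1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }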

29
(No Transcript)
30
Sample PBS Script
  • sequoia$ vi example.pbs
      #!/bin/bash
      #PBS -l nodes=4          # Mimosa
      #PBS -l ncpus=4          # Redwood
      #PBS -l ncpus=4          # Sequoia
      #PBS -l cput=0:5:0       # Request 5 minutes of CPU time
      #PBS -N example
      cd $PWD
      rm *.pbs.[eo]*
      icc -lmpi -o add_mpi.exe add_mpi.c     # Sequoia
      mpiexec -n 4 add_mpi.exe               # Sequoia
  • sequoia$ qsub example.pbs
      37537.sequoia.mcsr.olemiss.edu

31
PBS Querying Jobs
32
MPI Programming Exercises
  • Hello World
    - sequential
    - parallel (w/MPI and PBS)
  • Add the prime numbers in an array of numbers
    - sequential
    - parallel (w/MPI and PBS)
33
Log in to sequoia and get the workshop files
  • A. Use secure shell to log in from your PC to hpcwoods:
      ssh trn_N8Y9@hpcwoods.olemiss.edu
  • B. Use secure shell to go from hpcwoods to your training account on sequoia:
      ssh tracct1@sequoia
      ssh tracct2@sequoia
  • C. Copy workshop files into your home directory by running:
      /usr/local/apps/ppro/prepare_mpi_workshop
34
(No Transcript)
35
Examine, compile, and execute hello.c
36
Examine hello_mpi.c
37
Examine hello_mpi.c
Add macro to include the header file for the MPI library calls.
38
Examine hello_mpi.c
Add function call to initialize the MPI
environment
39
Examine hello_mpi.c
Add function call to find out how many parallel processes there are.
40
Examine hello_mpi.c
Add function call to find out which process this is (the MPI process ID of this process).
41
Examine hello_mpi.c
Add IF structure so that the manager/boss process can do one thing, and everyone else (the workers/servants) can do something else.
42
Examine hello_mpi.c
All processes, whether manager or worker, must
finalize MPI operations.
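
Since the hello_mpi.c shown on the preceding slides appears only as screenshots, here is a hedged reconstruction that follows the steps just described (include the header, initialize, get size and rank, branch on manager vs. worker, finalize); it is a sketch, not the original workshop file.

    /* Hedged reconstruction of a hello_mpi.c-style program, following the
     * steps described on the preceding slides.                           */
    #include <stdio.h>
    #include <mpi.h>                 /* header for the MPI library calls  */

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                  /* initialize MPI        */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processes?   */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I?   */

        if (rank == 0)                           /* manager/boss process  */
            printf("Hello from the manager (process 0 of %d)\n", size);
        else                                     /* workers/servants      */
            printf("Hello from worker %d of %d\n", rank, size);

        MPI_Finalize();                          /* everyone finalizes    */
        return 0;
    }
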
43
Compile hello_mpi.c
Compile it.
Why won't this compile?
You must link to the MPI library.
44
Run hello_mpi.exe
On 1 CPU
On 2 CPUs
On 4 CPUs
45
hello_mpi.pbs
46
hello_mpi.pbs
47
hello_mpi.pbs
48
hello_mpi.pbs
49
hello_mpi.pbs
50
hello_mpi.pbs
51
Submit hello_mpi.pbs
52
Submit hello_mpi.pbs
53
Submit hello_mpi.pbs
54
Examine, compile, and execute add_mpi.c
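
The add_mpi.c examined on the following slides is likewise shown only as screenshots, so this is a hedged sketch of the general approach suggested by the exercise description (add the prime numbers in an array): the manager fills the array, it is broadcast, each process sums the primes in its own chunk, and MPI_Reduce combines the partial sums. MAXSIZE, is_prime, and all other names are assumptions, and the even-division assumption here is exactly what Exercise 1 later asks you to remove.

    /* Hedged sketch only: reconstructs the idea of add_mpi.c (sum the
     * primes in an array) rather than the workshop's actual source.      */
    #include <stdio.h>
    #include <mpi.h>

    #define MAXSIZE 1000

    static int is_prime(int n)
    {
        if (n < 2) return 0;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return 0;
        return 1;
    }

    int main(int argc, char *argv[])
    {
        int data[MAXSIZE], rank, size, partial = 0, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)                              /* manager fills the array */
            for (int i = 0; i < MAXSIZE; i++) data[i] = i + 1;

        MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

        /* Each process sums the primes in its own chunk (assumes MAXSIZE
         * is evenly divisible by the number of processes).               */
        int chunk = MAXSIZE / size;
        for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
            if (is_prime(data[i])) partial += data[i];

        MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("Sum of primes = %d\n", total);

        MPI_Finalize();
        return 0;
    }
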
55
Examine, compile, and execute add_mpi.c
56
Examine, compile, and execute add_mpi.c
57
Examine, compile, and execute add_mpi.c
58
Examine, compile, and execute add_mpi.c
59
(No Transcript)
60
Examine, compile, and execute add_mpi.c
61
Examine, compile, and execute add_mpi.c
62
Examine, compile, and execute add_mpi.c
63
Examine, compile, and execute add_mpi.c
64
(No Transcript)
65
Examine, compile, and execute add_mpi.c
66
Examine, compile, and execute add_mpi.c
67
Examine, compile, and execute add_mpi.c
68
Examine add_mpi.pbs
69
Examine add_mpi.pbs
70
Examine add_mpi.pbs
71
Submit PBS Script add_mpi.pbs
72
Examine Output and Errors add_mpi.c
73
Determine Speedup
74
Determine Parallel Efficiency
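
For reference, the standard definitions (not spelled out in the transcribed text): speedup compares the sequential runtime to the parallel runtime on p processes, and parallel efficiency normalizes speedup by p.

    Speedup:    S(p) = T_sequential / T_parallel(p)
    Efficiency: E(p) = S(p) / p      (ideal case: S(p) = p, so E(p) = 1)
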
75
(No Transcript)
76
How Could Speedup/Efficiency Improve?
77
What Happens to Results When MAXSIZE Is Not Evenly Divisible by n?
78
Exercise 1: Change Code to Work When MAXSIZE Is Not Evenly Divisible by n
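
One common way to handle this, sketched here under the assumption that each rank processes a contiguous index range (all names are illustrative, not from the workshop code): give the first MAXSIZE % size ranks one extra element, so every element is covered exactly once.

    /* Illustrative sketch: distributing MAXSIZE elements over "size" ranks
     * when MAXSIZE is not evenly divisible by size.                       */
    #include <mpi.h>
    #include <stdio.h>

    #define MAXSIZE 1003   /* deliberately not divisible by typical sizes */

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int chunk = MAXSIZE / size;                 /* base elements per rank */
        int extra = MAXSIZE % size;                 /* leftovers to hand out  */
        int start = rank * chunk + (rank < extra ? rank : extra);
        int count = chunk + (rank < extra ? 1 : 0);

        printf("Process %d handles indices %d..%d\n",
               rank, start, start + count - 1);

        MPI_Finalize();
        return 0;
    }
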
79
Exercise 2: Change Code to Improve Speedup