Title: Parallel Programming Models: Shared Memory Programming, Intro to Message Passing, and Shared Objects
1. Parallel Programming Models: Shared Memory Programming, Intro to Message Passing, and Shared Objects Programming in Charm++
2. Writing parallel programs
- Programming model
  - How should a programmer view the parallel machine?
  - Sequential programming: the von Neumann model
- Parallel programming models
  - Shared memory (shared address space) model
  - Message passing model
  - Shared objects model
3. Shared Address Space Model
- All memory is accessible to all processes
- Processes are mapped to processors, typically by a symmetric OS
- Coordination among processes
  - by sharing variables
- Avoid stepping on toes
  - using locks and barriers
4. Matrix multiplication

  for (i=0; i<M; i++)
    for (j=0; j<N; j++)
      for (k=0; k<N; k++)
        C[i][j] += A[i][k] * B[k][j];

In a shared memory style, this program is trivial to parallelize: just have each processor deal with a different range of i (or j? or both?).
5. Programming Models (2)
- Basics of shared address space programming and message passing
6. Shared Address Space Model
- All memory is accessible to all processes
- Processes are mapped to processors, typically by a symmetric OS
- Coordination among processes
  - by sharing variables
- Avoid stepping on toes
  - using locks and barriers
7. Matrix multiplication

  for (i=0; i<M; i++)
    for (j=0; j<N; j++)
      for (k=0; k<L; k++)
        C[i][j] += A[i][k] * B[k][j];

In a shared memory style, this program is trivial to parallelize: just have each processor deal with a different range of i (or j? or both?).
8. SAS version pseudocode

  size = M/numPEs();
  myStart = myPE() * size;
  for (i=myStart; i<myStart+size; i++)
    for (j=0; j<N; j++)
      for (k=0; k<L; k++)
        C[i][j] += A[i][k] * B[k][j];
9. Running example: computing π
- Area of circle = πr²
- Ratio of the area of a circle to that of the enclosing square = π/4
- Method: compute a set of random number pairs (in the range 0-1) and count the number of pairs that fall inside the circle
  - The ratio gives us an estimate for π/4
- In parallel: let each processor compute a different set of random number pairs (in the range 0-1) and count the number of pairs that fall inside the circle
10. Pi on shared memory

  int count;
  Lock countLock;

  piFunction(int myProcessor) {
    seed s = makeSeed(myProcessor);
    for (i=0; i<100000/P; i++) {
      x = random(s); y = random(s);
      if (x*x + y*y < 1.0) {
        lock(countLock);
        count++;
        unlock(countLock);
      }
    }
    barrier();
    if (myProcessor == 0)
      printf("pi=%f\n", 4.0*count/100000);
  }
11.

  main() {
    countLock = createLock();
    parallel(piFunction);
  }

The system needs to provide the functions for locks, barriers, and thread (or process) creation.
12. Pi on shared memory: efficient version

  int count;
  Lock countLock;

  piFunction(int myProcessor) {
    int c = 0;
    seed s = makeSeed(myProcessor);
    for (i=0; i<100000/P; i++) {
      x = random(s); y = random(s);
      if (x*x + y*y < 1.0) c++;
    }
    lock(countLock);
    count += c;
    unlock(countLock);
    barrier();
    if (myProcessor == 0)
      printf("pi=%f\n", 4.0*count/100000);
  }
13. Real SAS systems
- POSIX threads (Pthreads) is a standard for threads-based shared memory programming
- Shared memory calls: just a few, normally standard calls
- In addition, lower level calls: fetch-and-inc, fetch-and-add
14. Message Passing
- Assume that processors have direct access to only their own memory
- Each processor typically executes the same executable, but may be running a different part of the program at a time
15. Message passing basics
- Basic calls: send and recv
  - send(int proc, int tag, int size, char *buf)
  - recv(int proc, int tag, int size, char *buf)
  - recv may return the actual number of bytes received in some systems
- tag and proc may be wildcarded in a recv
  - recv(ANY, ANY, 1000, buf)
- broadcast
- Other global operations (reductions)
16. POSIX Threads on Origin 2000
- Shared memory programming on Origin 2000: important calls
- Thread creation and joining
  - pthread_create(pthread_t *threadID, pthread_attr_t *attr, void *(*functionName)(void *), void *arg)
  - pthread_join(pthread_t threadID, void **result)
- Locks
  - pthread_mutex_t lock;
  - pthread_mutex_lock(&lock);
  - pthread_mutex_unlock(&lock);
- Condition variables
  - pthread_cond_t cv;
  - pthread_cond_init(&cv, (pthread_condattr_t *) 0);
  - pthread_cond_wait(&cv, &cv_mutex);
  - pthread_cond_broadcast(&cv);
- Semaphores, and other calls
17. Declarations

  /* pgm.c */
  #include <pthread.h>
  #include <stdlib.h>
  #include <stdio.h>

  #define nThreads 4
  #define nSamples 1000000

  typedef struct _shared_value {
    pthread_mutex_t lock;
    int value;
  } shared_value;

  shared_value sval;
18. Function in each thread

  void *doWork(void *id)
  {
    size_t tid = (size_t) id;
    int nsucc, ntrials, i;
    ntrials = nSamples/nThreads;
    nsucc = 0;
    srand48((long) tid);
    for (i=0; i<ntrials; i++) {
      double x = drand48();
      double y = drand48();
      if ((x*x + y*y) < 1.0) nsucc++;
    }
    pthread_mutex_lock(&(sval.lock));
    sval.value += nsucc;
    pthread_mutex_unlock(&(sval.lock));
    return 0;
  }
19. Main function

  int main(int argc, char *argv[])
  {
    pthread_t tids[nThreads];
    size_t i;
    double est;
    pthread_mutex_init(&(sval.lock), NULL);
    sval.value = 0;
    printf("Creating Threads\n");
    for (i=0; i<nThreads; i++)
      pthread_create(&tids[i], NULL, doWork, (void *) i);
    printf("Created Threads... waiting for them to complete\n");
    for (i=0; i<nThreads; i++)
      pthread_join(tids[i], NULL);
    printf("Threads Completed...\n");
    est = 4.0 * ((double) sval.value / (double) nSamples);
    printf("Estimated Value of PI = %lf\n", est);
    exit(0);
  }
20. Compiling: Makefile

  # Makefile
  # for solaris: FLAGS = -mt
  # for Origin2000: FLAGS =

  pgm: pgm.c
  	cc -o pgm $(FLAGS) pgm.c -lpthread

  clean:
  	rm -f pgm *.o
21. Message Passing
- Program consists of independent processes,
  - each running in its own address space
- Processors have direct access to only their own memory
- Each processor typically executes the same executable, but may be running a different part of the program at a time
- Special primitives exchange data: send/receive
- Early theoretical systems
  - CSP: communicating sequential processes
  - send and matching receive from another processor: both wait
  - OCCAM on Transputers used this model
  - Performance problems due to unnecessary(?) wait
- Current systems
  - Send operations don't wait for receipt on the remote processor
22. Message Passing

[Figure: PE0 executes a send and PE1 a receive; the data is copied from PE0's buffer to PE1's buffer.]
23. Basic Message Passing
- We will describe a hypothetical message passing system,
  - with just a few calls that define the model
- Later, we will look at real message passing models (e.g. MPI), with a more complex set of calls
- Basic calls
  - send(int proc, int tag, int size, char *buf)
  - recv(int proc, int tag, int size, char *buf)
  - recv may return the actual number of bytes received in some systems
- tag and proc may be wildcarded in a recv
  - recv(ANY, ANY, 1000, buf)
- broadcast
- Other global operations (reductions)
24. Pi with message passing

  int count, c;

  main() {
    seed s = makeSeed(myProcessor);
    for (i=0; i<100000/P; i++) {
      x = random(s); y = random(s);
      if (x*x + y*y < 1.0) count++;
    }
    send(0, 1, 4, &count);
25. Pi with message passing

    if (myProcessorNum() == 0) {
      for (i=0; i<maxProcessors(); i++) {
        recv(i, 1, 4, &c);
        count += c;
      }
      printf("pi=%f\n", 4.0*count/100000);
    }
  } /* end function main */
26. Collective calls
- Message passing is often, but not always, used for the SPMD style of programming
  - SPMD: Single Program Multiple Data
  - All processors execute essentially the same program, and the same steps, but not in lockstep
- All communication is almost in lockstep
- Collective calls
  - global reductions (such as max or sum)
  - syncBroadcast (often just called broadcast)
    - syncBroadcast(whoAmI, dataSize, dataBuffer)
    - whoAmI: sender or receiver
27. Standardization of message passing
- Historically
  - nxlib (on Intel hypercubes)
  - nCUBE variants
  - PVM
  - Everyone had their own variants
- MPI standard
  - Vendors, ISVs, and academics got together
    - with the intent of standardizing current practice
  - Ended up with a large standard
  - Popular, due to vendor support
  - Support for
    - communicators: avoiding tag conflicts, ...
    - data types
    - ...
28. Parallel programming tasks
- Decomposition (what to do in parallel)
- Mapping
- Scheduling (sequencing)
- Machine dependent expression
29. Spectrum of parallel languages

[Figure: parallel languages arranged by level of abstraction versus degree of specialization; MPI is marked on the spectrum.]
30. Charm++
- Data driven objects
- Asynchronous method invocation
- Prioritized scheduling
- Object arrays
- Object groups
  - a global object with a representative on each PE
- Information sharing abstractions
31. Data Driven Execution

[Figure: each processor runs a scheduler with its own message queue; the scheduler picks the next message from the queue and invokes the corresponding method on the target object.]
32.

  CkChareID mainhandle;

  main::main(CkArgMsg *m)     // execution begins here; m carries argc/argv
  {
    int i, low = 0;
    for (i=0; i<100; i++)
      new CProxy_piPart();
    responders = 100;
    count = 0;
    mainhandle = thishandle;  // readonly initialization
  }                           // scheduler resumes after the method returns

  void main::results(DataMsg *msg)
  {
    count += msg->count;
    if (0 == --responders) {
      CkPrintf("pi = %f \n", 4.0*count/100000);
      CkExit();               // exit the scheduler
    }
  }
33.

  piPart::piPart()
  {
    // declarations..
    CProxy_main mainproxy(mainhandle);
    srand48((long) this);
    mySamples = 100000/100;
    for (i=0; i<mySamples; i++) {
      x = drand48();
      y = drand48();
      if ((x*x + y*y) < 1.0) localCount++;
    }
    DataMsg *result = new DataMsg;
    result->count = localCount;
    mainproxy.results(result);
    delete this;
  }
34. Chares (data driven objects)
- Regular C++ classes,
  - with some methods designated as remotely invokable (called entry methods)
- Entry methods have only one parameter
  - of type message
- Creation of an instance of chare class C:
  - new CProxy_C(msg);
- Creates an instance of C on a specified processor pe:
  - new CProxy_C(msg, pe);
- CProxy_C: a proxy class generated by Charm++ for the chare class C declared by the user
35. Messages
- A user-defined C++ class
  - inherits from a system-defined class
- Messages can be communicated to others as parameters
- Has regular data fields
- Declaration: normal C++,
  - inherit from a system defined class
- Creation (just usual C++):
  - MsgType *m = new MsgType;
36. Remote method invocation
- Proxy classes
  - For each chare class C, the system generates a proxy class (C : CProxy_C)
- Each chare has a global ID (ChareID)
  - Global in the sense of being valid on all processors
  - thishandle (analogous to this) gets you the ChareID
  - You can send thishandle in messages
- Given a handle h, you can create a proxy
  - CProxy_C p(h); // or q = new CProxy_C(h);
  - p.method(msg); // or q->method(msg);
37. Object Arrays
- A collection of chares,
  - with a single global name for the collection, and
  - each member addressed by an index
- Mapping of element objects to processors handled by the system

[Figure: in the user's view, a single array A[0], A[1], A[2], A[3], ...; in the system's view, the elements are distributed across processors, e.g. A[0] and A[3] on one PE.]
38. Object Groups
- A group of objects (chares)
  - with exactly one representative on each processor
- A single ID for the group as a whole
- Invoke methods in a branch (asynchronously), all branches (broadcast), or in the local branch
- Creation:
  - groupId = new CProxy_C(msg);
- Remote invocation:
  - CProxy_C p(groupId);
  - p.methodName(msg); // p.methodName(msg, peNum);
  - p.LocalBranch->f(...);
39. Information sharing abstractions
- Observation
  - Information is shared in several specific modes in parallel programs
- Other models support only a limited set of modes
  - Shared memory: everything is shared: a sledgehammer approach
  - Message passing: messages are the only method
- Charm++ identifies and supports several modes
  - Readonly / writeonce
  - Tables (hash tables)
  - Accumulators
  - Monotonic variables
40. Compiling Charm++ programs
- Need to define an interface specification file
  - mod.ci for each module mod
- Contains declarations that the system uses to produce proxy classes
- These produced classes must be included in your mod.C file
- See examples provided on the class web site.
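As an illustration of what such an interface file might look like for the pi example in these slides, here is a sketch of a possible pgm.ci; the module name and the exact form of the declarations are our guesses, not taken from the course materials:

```
mainmodule pgm {
  readonly CkChareID mainhandle;

  message DataMsg;

  mainchare main {
    entry main(CkArgMsg *m);
    entry void results(DataMsg *msg);
  };

  chare piPart {
    entry piPart(void);
  };
};
```

Running the Charm++ translator on a file like this generates the CProxy_main and CProxy_piPart classes used on the earlier slides, which you then include in your .C file.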