# Parallel Programming Concepts - PowerPoint PPT Presentation

1
Parallel Programming Concepts
• Performance measures and related issues
• Parallelisation approaches
• Code organization
• Sources of parallelism
2
Performance Measures and Related Issues
• Speedup
• Amdahl's law
• Load balancing
• Granularity
3
Superlinear Speedup
• Should not be possible
• you can simulate the fast parallel algorithm on a single processor to beat the best sequential algorithm
• Yet it sometimes happens in practice
• more memory available in a parallel computer
• different search order in search problems

Example: the sequential search examines 12 moves before finding the solution; the parallel search finds it after only 2 moves.
4
• Efficiency
• speedup / p
• Cost
• time × p
• Scalability
• how efficiently the hardware/algorithm can use additional processors
• Gustafson's law (observation)
• the situation is not as tragic as Amdahl's law suggests
• the serial fraction usually stays (nearly) constant as the problem size increases
• consequence: nearly linear speedup is possible if the problem size increases with the number of processors
• Isoefficiency function: see the book by Kumar et al.

5
Exercises: Speedup, Efficiency, Cost
Example 1: Compute the sums of the columns of an upper triangular matrix.

```c
/* Program 1: for processor i, n processors (one column per processor) */
sum[i] = 0;
for (j = 0; j <= i; j++)
    sum[i] += A[j][i];

/* Program 2: for processor i, p processors (block distribution, n/p columns each) */
for (k = i*(n/p); k <= (i+1)*(n/p) - 1; k++) {
    sum[k] = 0;
    for (j = 0; j <= k; j++)
        sum[k] += A[j][k];
}

/* Program 3: for processor i, p processors (cyclic distribution of columns) */
for (k = i; k < n; k += p) {
    sum[k] = 0;
    for (j = 0; j <= k; j++)
        sum[k] += A[j][k];
}
```
6
Exercises: Speedup, Efficiency, Cost
Can we do any better? The cost of summing column k grows with k, so the distributions above leave the load unbalanced. One scheme that balances it pairs cheap columns with expensive ones:

```c
int AddColumn(int col)
{
    int sum = 0;
    for (i = 0; i <= col; i++)
        sum += A[i][col];
    return sum;
}

/* Program 4: for processor i, p processors.
   Columns are dealt out in a zig-zag: i, 2p-1-i, 2p+i, 4p-1-i, ...
   Each consecutive pair of columns has the same total cost, so every
   processor ends up with (nearly) equal work. */
col = i;
d = 2*(p - i) - 1;
while (col < n) {
    sum[col] = AddColumn(col);
    col += d;
    d = 2*p - d;   /* alternate the step size */
}
```
7
Exercises: Speedup, Efficiency, Cost
Example 2: Compute the sum of n numbers (n a power of two).

```c
/* Program: for processor i, n processors */
tmpSum = A[i];
for (j = 2; j <= n; j *= 2) {
    if (i % j == 0) {
        recv(i + j/2, &hisSum);   /* receive the partner's partial sum */
        tmpSum += hisSum;
    } else {
        send(i - j/2, tmpSum);    /* pass the partial sum down the tree */
        break;
    }
}
/* processor 0 ends up holding the total in tmpSum */
```
8
Sources of inefficiency
[Diagram: execution timeline for processors P0–P8; bars mark computation, gaps mark idle time]
9
Sources of inefficiency II
[Diagram: execution timeline for P0–P8 showing computation, idle time, and communication]
10
Sources of inefficiency
[Diagram: execution timeline for P0–P8 showing computation, idle time, communication, and repeated computation]
11
[Diagram: execution timeline for P0–P4 showing computation and (wasted) idle time]
12
[Diagram: execution timeline for P0–P4 showing computation, (wasted) idle time, and work moved between processors, either before execution or during execution]
13
Granularity
• The size of the computation segments between communications.
• Spectrum: fine grained (e.g. instruction-level parallelism) to coarse grained (e.g. loop parallelism)
• The most efficient granularity depends on the algorithm and on the hardware environment in which it runs
• In most cases the overhead associated with communication and synchronization is high relative to execution speed, so it is advantageous to have coarse granularity

14
Fine Grain Parallelism
• All tasks execute a small number of instructions between communication cycles
• Low computation-to-communication ratio
• Implies high communication overhead and less opportunity for performance enhancement
• If granularity is too fine, the overhead required for communication and synchronization between tasks can take longer than the computation itself

15
Coarse Grain Parallelism
• Typified by long computations: large numbers of instructions between communication/synchronization points
• High computation-to-communication ratio
• Lower communication overhead, more opportunity for performance increase
• Harder to load balance efficiently

[Diagram: execution timeline for P0–P4 showing long computation phases separated by brief communication]
16
Granularity vs. Coupling
• Granularity spectrum: fine grained to coarse grained
• Coupling spectrum: tightly coupled to loosely coupled (SMP, ccNUMA, NUMA, MPP, Ethernet cluster)
• The looser the coupling, the coarser the granularity must be for the communication not to overwhelm the computation

17
Parallel Programming Concepts
• Performance measures and related issues
• Parallelisation approaches
• Code organization
• Sources of parallelism
18
Parallelisation Approaches
• Parallelizing compiler
• Parallel domain-specific libraries
• e.g. linear algebra, numerical libraries, quantum chemistry
• usually a good choice; use when possible
• Communication libraries
• message-passing libraries: MPI, PVM
• shared-memory libraries: declare and access shared variables (on MPP machines done by emulation)
• disadvantage: low-level programming ("parallel assembler")

19
Parallelisation Approaches (cont.)
• New parallel languages
• use a language with built-in explicit control of parallelism
• no language is the best in every domain
• needs a new compiler
• fights against inertia
• Parallel features in existing languages
• adding parallel features to an existing language
• e.g. for expressing loop parallelism (pardo) and data placement
• example: High Performance Fortran
• Additional possibilities in shared-memory systems
• preprocessor compiler directives (OpenMP)

20
Parallelisation Approaches Our Focus
• Communication libraries: MPI, PVM
• industry standard, available for every platform
• very general, low-level approach
• a perfect match for clusters
• most likely to be useful for you
• Shared memory programming
• also very important
• likely to be useful on future generations of PCs

21
Parallel Programming Concepts
• Performance measures and related issues
• Parallelisation approaches
• Code organization
• Sources of parallelism
22
Code Organization - SPMD
• Single Program Multiple Data
• well suited for SIMD computers
• popular choice even for MIMD, as it keeps
everything in one place
• typical in MPI programs
• static process creation
• may waste memory
• Example
• a heap-like computation, the SPMD way

```c
main() {
    if (id == 0)        rootNode();
    else if (id < p/2)  innerNode();
    else                leafNode();
}
```
23
Code Organization - MPMD
• Multiple Programs Multiple Data
• allows dynamic process creation
• typically master-slave approach
• more memory-efficient
• typical in PVM

```c
/* master.c */
main() {
    for (i = 1; i <= p; i++)
        sid[i] = spawn(slave(i));
}

/* slave.c */
main(int id) {
    /* slave code here */
}
```
24
Parallel Programming Concepts
• Performance measures and related issues
• Parallelisation approaches
• Code organization
• Sources of parallelism
25
Sources of Parallelism
26
Data Parallelism
• divide data up amongst processors.
• process different data segments in parallel
• communicate boundary information, if necessary
• includes loop parallelism
• well suited for SIMD machines
• communication is often implicit (HPF)

27
Functional (Task) Parallelism
• decompose the algorithm into different sections
• assign sections to different processors
• often uses fork()/join()/spawn()
• usually does not lend itself to a high degree of parallelism

28
Pipelining
• a sequence of tasks whose executions can overlap
• a sequential processor must execute them one after another, without overlap
• a parallel computer can overlap the tasks, increasing throughput (but not decreasing latency)

29
New Concepts and Terms - Summary
• speedup, efficiency, cost, scalability
• Amdahl's law, Gustafson's law