Parallel Programming Concepts - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Parallel Programming Concepts


1
Parallel Programming Concepts
Performance measures and related issues
Parallelisation approaches
Code organization
Sources of parallelism
2
Performance Measures and Related Issues
Speedup
Amdahl's law
Load Balancing
Granularity
3
Superlinear Speedup
  • Should not be possible
  • in principle, you can simulate the fast parallel algorithm on
    a single processor and thus beat the best sequential
    algorithm
  • Yet it sometimes happens in practice
  • more memory available in a parallel computer
  • different search order in search problems

Sequential search: 12 moves
Parallel search: 2 moves
4
Additional terms
  • Efficiency: speedup / p
  • Cost: time x p
  • Scalability
  • how efficiently the hardware/algorithm can use additional
    processors
  • Gustafson's law (observation)
  • the situation is not as tragic as Amdahl's law suggests
  • the serial fraction usually stays (nearly) constant as the
    problem size increases
  • consequence: nearly linear speedup is possible if the problem
    size increases with the number of processors
  • Isoefficiency function: see the Kumar et al. book
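A quick sketch of the arithmetic behind these definitions (not part of the original slides; the function names are my own). With serial fraction f, Amdahl's law caps speedup at 1/(f + (1-f)/p), while Gustafson's scaled speedup p - f(p-1) grows nearly linearly:

```python
# Definitions from the slides: speedup S = T1/Tp, efficiency E = S/p,
# cost C = p * Tp.
def speedup(t1, tp):       return t1 / tp
def efficiency(t1, tp, p): return speedup(t1, tp) / p
def cost(tp, p):           return p * tp

# Amdahl's law: serial fraction f limits speedup to 1 / (f + (1-f)/p)
def amdahl(f, p):
    return 1.0 / (f + (1.0 - f) / p)

# Gustafson's law: scaled speedup S(p) = p - f*(p-1)
def gustafson(f, p):
    return p - f * (p - 1)

f, p = 0.05, 100
print(amdahl(f, p))     # ~16.8: capped far below p
print(gustafson(f, p))  # ~95.05: nearly linear
```

Even a 5% serial fraction caps Amdahl speedup near 17 on 100 processors, while the Gustafson view (problem size growing with p) stays close to linear.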

5
Exercises Speedup, Efficiency, Cost
Example 1: Compute the sums of columns in an upper triangular
matrix.

Program1 (for processor i, n processors):
    sum[i] = 0;
    for (j = 0; j <= i; j++)
        sum[i] += A[j][i];

Program2 (for processor i, p processors):
    for (k = i*p; k <= (i+1)*p - 1; k++) {
        sum[k] = 0;
        for (j = 0; j <= k; j++)
            sum[k] += A[j][k];
    }

Program3 (for processor i, p processors):
    for (k = i; k < n; k += p) {
        sum[k] = 0;
        for (j = 0; j <= k; j++)
            sum[k] += A[j][k];
    }
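A small simulation (not from the slides; helper names are my own) of why Program3's cyclic distribution balances better than Program2's block distribution: column k costs k+1 additions, so consecutive blocks give later processors far more work.

```python
# Compare per-processor work (number of additions) for block vs. cyclic
# column distribution in an n x n upper triangular matrix, p processors.
# Column k costs k+1 additions.
def block_work(n, p, i):               # Program2-style: consecutive columns
    cols = range(i * (n // p), (i + 1) * (n // p))
    return sum(k + 1 for k in cols)

def cyclic_work(n, p, i):              # Program3-style: every p-th column
    return sum(k + 1 for k in range(i, n, p))

n, p = 16, 4
print([block_work(n, p, i) for i in range(p)])   # [10, 26, 42, 58] - skewed
print([cyclic_work(n, p, i) for i in range(p)])  # [28, 32, 36, 40] - nearly even
```

The total work is the same in both cases; only its distribution across processors changes, which is exactly what determines the parallel finishing time.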
6
Exercises Speedup, Efficiency, Cost
Can we do any better?

    int AddColumn(int col) {
        int sum = 0;
        for (i = 0; i <= col; i++)
            sum += A[i][col];
        return sum;
    }

Program4 (for processor i, p processors):
    col = -i;
    while (col < n) {
        col += 2*i;
        sum[col] = AddColumn(col);
        col += 2*p - 2*i;
        sum[col] = AddColumn(col);
    }
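The idea behind Program4 is to pair a cheap (short) column with an expensive (long) one so every processor's pair costs the same. A simplified Python rendering of that pairing idea (not the slide's exact schedule; the pairing j with n-1-j is my illustrative variant):

```python
# Pair column j (cost j+1) with column n-1-j (cost n-j): every pair costs
# n+1 additions, so the work is balanced by construction.
def paired_work(n, p, i):
    total = 0
    for j in range(i, n // 2, p):      # processor i takes pairs (j, n-1-j)
        total += (j + 1) + (n - j)     # cheap column + expensive column
    return total

n, p = 16, 4
print([paired_work(n, p, i) for i in range(p)])  # [34, 34, 34, 34]
```

Every processor ends up with exactly the same number of additions, which block and cyclic distributions only approximate.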
7
Exercises Speedup, Efficiency, Cost
Example 2: Compute the sum of n numbers.

Program (for processor i, n processors):
    tmpSum = A[i];
    for (j = 2; j <= n; j *= 2) {
        if (i % j == 0) {
            receive(i + j/2, &hisSum);
            tmpSum += hisSum;
        } else {
            send(i - j/2, tmpSum);
            break;
        }
    }
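A serial simulation of this tree reduction (not from the slides; `tree_sum` and the mailbox dictionary are my own devices for modelling send/receive):

```python
# Simulate the tree reduction: in round j, each active processor not
# aligned to j sends its partial sum to partner i - j/2 and drops out;
# processors aligned to j receive and accumulate. Assumes len(A) is a
# power of two.
def tree_sum(A):
    n = len(A)
    tmp = list(A)                      # each processor's tmpSum
    j = 2
    while j <= n:
        half = j // 2
        # sends of this round: active processors with i % j != 0
        mailbox = {i - half: tmp[i] for i in range(0, n, half) if i % j != 0}
        # receives: processors with i % j == 0 add their partner's sum
        for i in range(0, n, j):
            tmp[i] += mailbox[i]
        j *= 2
    return tmp[0]

print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

The reduction takes log2(n) rounds, versus n-1 sequential additions, at the price of one message per dropped-out processor per round.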
8
Sources of inefficiency
[Diagram: execution timeline for processors P0..P8, shaded by
computation vs. idle time]
9
Sources of inefficiency II
[Diagram: execution timeline for processors P0..P8, shaded by
computation, idle, and communication time]
10
Sources of inefficiency
[Diagram: execution timeline for processors P0..P8, shaded by
computation, idle, communication, and additional or repeated
computation]
11
Load Balancing
Efficiency is adversely affected by an uneven workload.

[Diagram: execution timeline for processors P0..P4 with uneven
computation and idle (wasted) time]
12
Load Balancing (cont.)
  • Load balancing: shifting work from heavily loaded processors
    to lightly loaded ones.

[Diagram: timelines for processors P0..P4 showing computation,
idle (wasted) time, and moved work]

  • Static load balancing
  • before execution
  • Dynamic load balancing
  • during execution
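A minimal sketch of dynamic load balancing (not from the slides): idle workers pull the next task from a shared pool, so uneven task sizes spread themselves out. The simulation below is my own illustration.

```python
# Simulate greedy dynamic scheduling: each task goes to whichever worker
# becomes free first. Returns each worker's finishing time.
import heapq

def dynamic_schedule(task_costs, p):
    workers = [(0.0, i) for i in range(p)]     # (time when free, worker id)
    heapq.heapify(workers)
    finish = [0.0] * p
    for cost in task_costs:
        t, i = heapq.heappop(workers)          # earliest-free worker
        finish[i] = t + cost
        heapq.heappush(workers, (finish[i], i))
    return finish

# One long task plus seven short ones, two workers: a static half/half
# split would finish at time 12; dynamic scheduling finishes at time 9.
print(dynamic_schedule([9, 1, 1, 1, 1, 1, 1, 1], 2))
```

The same pull-based pattern underlies real work queues and master-slave task farms.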

13
Granularity
  • The size of the computation segments between communication.
  • spectrum: fine grained (ILP) - loop parallelism - task
    parallelism (coarse grained)
  • The most efficient granularity depends on the algorithm and
    the hardware environment in which it runs
  • In most cases the overhead associated with communication and
    synchronization is high relative to execution speed, so it is
    advantageous to have coarse granularity.
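A toy cost model of this trade-off (my own construction, not from the slides): splitting W units of work into chunks of size g incurs a fixed communication/synchronization overhead o per chunk, so too-fine grains let overhead dominate.

```python
# Toy granularity model: p processors, W total work units, chunk size g,
# per-chunk overhead o, per-unit compute time t.
def parallel_time(W, p, g, o, t=1.0):
    chunks_per_proc = W / (p * g)          # chunks each processor handles
    return chunks_per_proc * (g * t + o)   # compute + per-chunk overhead

W, p, o = 10_000, 10, 50.0
for g in (1, 10, 100, 1000):
    print(g, parallel_time(W, p, g, o))
```

With o = 50, grain size 1 spends 50x more time on overhead than on computation; at grain 1000 the overhead is about 5% of the total, matching the slide's advice to prefer coarse granularity when overhead is high.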

14
Fine Grain Parallelism
  • All tasks execute a small number of instructions
    between communication cycles
  • Low computation to communication ratio
  • Facilitates load balancing
  • Implies high communication overhead and less
    opportunity for performance enhancement
  • If granularity is too fine it is possible that
    the overhead required for communications and
    synchronization between tasks takes longer than
    the computation

15
Coarse Grain Parallelism
  • Typified by long computations consisting of large numbers of
    instructions between communication/synchronization points
  • High computation to communication ratio
  • Lower communication overhead, more opportunity for
    performance increase
  • Harder to load balance efficiently

[Diagram: timelines for processors P0..P4 showing long
computation segments separated by communication]
16
Granularity vs. Coupling
  • Granularity: fine grained - coarse grained
  • Coupling: tightly - loosely
  • hardware spectrum: SMP, ccNUMA, NUMA, MPP, ethernet cluster
    (tightest to loosest coupling)
  • the looser the coupling, the coarser the granularity must be
    for the communication not to overwhelm the computation

17
Parallel Programming Concepts
Performance measures and related issues
Parallelisation approaches
Code organization
Sources of parallelism
18
Parallelisation Approaches
  • Parallelizing compiler
  • advantage: use your current code
  • disadvantage: very limited abilities
  • Parallel domain-specific libraries
  • e.g. linear algebra, numerical libraries, quantum chemistry
  • usually a good choice, use when possible
  • Communication libraries
  • message passing libraries: MPI, PVM
  • shared memory libraries: declare and access shared memory
    variables (on MPP machines done by emulation)
  • advantage: use a standard compiler
  • disadvantage: low level programming ("parallel assembler")

19
Parallelisation Approaches (cont.)
  • New parallel languages
  • use a language with built-in explicit control for parallelism
  • no language is the best in every domain
  • needs a new compiler
  • fights against inertia
  • Parallel features in existing languages
  • adding parallel features to an existing language
  • e.g. for expressing loop parallelism (pardo) and data
    placement
  • example: High Performance Fortran
  • Additional possibilities in shared-memory systems
  • use threads
  • preprocessor compiler directives (OpenMP)

20
Parallelisation Approaches Our Focus
  • Communication libraries: MPI, PVM
  • industry standard, available for every platform
  • very general, low level approach
  • perfect match for clusters
  • most likely to be useful for you
  • Shared memory programming
  • also very important
  • likely to be useful in coming generations of PCs

21
Parallel Programming Concepts
Performance measures and related issues
Parallelisation approaches
Code organization
Sources of parallelism
22
Code Organization - SPMD
  • Single Program Multiple Data
  • well suited for SIMD computers
  • popular choice even for MIMD, as it keeps
    everything in one place
  • typical in MPI programs
  • static process creation
  • may waste memory
  • Example: a heap-like computation, the SPMD way

main() {
    if (id == 0)
        rootNode();
    else if (id < p/2)
        innerNode();
    else
        leafNode();
}
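The SPMD pattern can be simulated in a few lines (my own illustration, not from the slides): every "process" runs the same program and branches only on its own id.

```python
# Single Program Multiple Data: the same function runs everywhere; the
# process id decides which role each instance plays.
def spmd_program(id, p):
    if id == 0:
        return "root"
    elif id < p // 2:
        return "inner"
    else:
        return "leaf"

p = 8
print([spmd_program(id, p) for id in range(p)])
# ['root', 'inner', 'inner', 'inner', 'leaf', 'leaf', 'leaf', 'leaf']
```

This is exactly how typical MPI programs are organized: one executable, with `rank`-based branching replacing the `id` parameter here.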
23
Code Organization - MPMD
  • Multiple Programs Multiple Data
  • allows dynamic process creation
  • typically master-slave approach
  • more memory-efficient
  • typical in PVM

master.c:
    main() {
        for (i = 1; i < p; i++)
            sid[i] = spawn(slave(i));
    }

slave.c:
    main(int id) {
        // slave code here
    }
24
Parallel Programming Concepts
Performance measures and related
issues Parallelisation approaches Code
organization Sources of parallelism
25
Sources of Parallelism
Data Parallelism
Task Parallelism
Pipelining
26
Data Parallelism
  • divide data up amongst processors.
  • process different data segments in parallel
  • communicate boundary information, if necessary
  • includes loop parallelism
  • well suited for SIMD machines
  • communication is often implicit (HPF)
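Data parallelism in miniature (my own sketch, not from the slides): each worker gets a contiguous slice of the data and applies the same operation to it, with partial results combined at the end.

```python
# Divide the data among p workers; each sums its own slice "in parallel",
# and the partial sums are then combined.
def split(data, p):
    n = len(data)
    return [data[i * n // p:(i + 1) * n // p] for i in range(p)]

data = list(range(12))
partial = [sum(chunk) for chunk in split(data, 4)]
print(partial, sum(partial))  # [3, 12, 21, 30] 66
```

Loop parallelism is this same pattern applied to iteration ranges instead of explicit array slices.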

27
Task Parallelism
  • decompose algorithm into different sections
  • assign sections to different processors
  • often uses fork()/join()/spawn()
  • usually does not lend itself to a high degree of
    parallelism

28
Pipelining
  • a sequence of tasks whose execution can overlap
  • a sequential processor must execute them one after another,
    without overlap
  • a parallel computer can overlap the tasks, increasing
    throughput (but not decreasing latency)
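The throughput/latency distinction above is plain arithmetic; a small sketch (my own, not from the slides): with s stages of time t each, one item still takes s*t, but n items finish in (s + n - 1)*t instead of n*s*t.

```python
# Pipeline timing: s stages, t time per stage, n items.
def sequential_time(n, s, t): return n * s * t        # no overlap
def pipelined_time(n, s, t):  return (s + n - 1) * t  # stages overlap

n, s, t = 100, 4, 1.0
print(sequential_time(n, s, t))  # 400.0
print(pipelined_time(n, s, t))   # 103.0
```

For large n the pipeline approaches one result per stage time t (throughput up by nearly s), yet a single item's latency is unchanged at s*t.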

29
New Concepts and Terms - Summary
  • speedup, efficiency, cost, scalability
  • Amdahl's law, Gustafson's law
  • Load Balancing: static, dynamic
  • Granularity: fine, coarse
  • Tightly vs. loosely coupled systems
  • SPMD, MPMD
  • Data Parallelism, Task Parallelism, Pipelining