Parallelisation of Grid-oriented Problems

Provided by: dirkr

1
Parallelisation of Grid-oriented Problems
  • Algoritmen voor parallelle computers (Algorithms
    for Parallel Computers)
  • 8/11/2000

2
Grid-oriented problems
  • PDEs, image processing, ...: data set defined on a
    grid; local computations with small stencils →
    data dependencies between neighbouring grid
    points
  • "grid point": generic name for data associated
    with a grid point, pixel, cell, finite element, ...
  • grid, data set and associated work are partitioned
    into subdomains; the subdomains are assigned
    (mapped) to processors

3
Grid-oriented problems (cont.)
  • extra tasks (compared with sequential code)
  • partitioning and mapping: to ensure work load
    balance and communication minimisation
  • communication between neighbouring subdomains

4
Model problems
  • PDEs
  • explicit time integration (forward Euler)
  • relaxation methods (Jacobi, Gauss-Seidel, SOR, ...)
  • on a structured (regular) 2D grid
  • image processing
  • convolution
  • on a 2D pixel matrix
  • same data-dependency pattern
  • → same parallelisation strategy
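
To make the shared data-dependency pattern concrete, here is a minimal sketch in plain Python. The helper `apply_stencil` and the two weight tables are illustrative names, not taken from the slides. A Jacobi sweep for the Laplace equation and a 3x3 box blur are both expressed as the same loop over a small stencil; only the weights differ. Both kernels are symmetric, so the stencil sum coincides with a convolution.

```python
def apply_stencil(grid, weights):
    """Apply a 3x3 stencil to every interior point of a 2D grid (list of lists).

    Boundary points are copied unchanged. Each new value depends only on the
    old values of the point itself and its direct neighbours.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = sum(
                weights[di + 1][dj + 1] * grid[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
            )
    return new


# Jacobi sweep for the Laplace equation: average of the 4 neighbours
# (5-point stencil with a zero centre weight).
JACOBI = [[0.0, 0.25, 0.0],
          [0.25, 0.0, 0.25],
          [0.0, 0.25, 0.0]]

# 3x3 box blur: a typical image-processing convolution (9-point stencil).
BOX_BLUR = [[1.0 / 9.0] * 3 for _ in range(3)]

grid = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 4.0, 8.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
jacobi_result = apply_stencil(grid, JACOBI)
blur_result = apply_stencil(grid, BOX_BLUR)
```

Because both computations read only a fixed neighbourhood of old values, the same partitioning and overlap-exchange scheme applies to either one.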

5
Explicit time integration and convolution

6
Computational molecules
  • 5-point stencil; 9-point stencil

7
Computational molecules (cont.)
  • two different 9-point stencils

8
Subdomains and overlap regions
  • Note: the overlap region can have a width > 1

9
Skeleton of a typical program
  • in every subdomain (processor)
  • exchange data in the overlap region
  • communication with procs. holding neighbouring
    subdomains
  • do calculations for all grid points in subdomain
  • check for stopping criterion (e.g. convergence
    check)
  • global communication (reduction)
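
The skeleton can be sketched without a message-passing library by simulating the subdomains inside one process. Everything below is an illustrative assumption rather than the course's code: the function names, the stripwise split, the overlap width of 1, and the Jacobi update are chosen only to show the three steps (exchange, calculate, reduce). It assumes p does not exceed the number of interior rows.

```python
def split_rows(grid, p):
    """Stripwise partitioning: each of p subdomains gets a block of interior
    rows plus one overlap (ghost) row above and below."""
    interior = len(grid) - 2
    bounds = [1 + k * interior // p for k in range(p + 1)]
    return [[row[:] for row in grid[bounds[k] - 1:bounds[k + 1] + 1]]
            for k in range(p)]


def exchange(subs):
    """Copy border rows of each subdomain into the neighbours' ghost rows."""
    for k in range(len(subs) - 1):
        subs[k][-1] = subs[k + 1][1][:]   # lower ghost <- neighbour's first own row
        subs[k + 1][0] = subs[k][-2][:]   # upper ghost <- neighbour's last own row


def sweep(sub):
    """One Jacobi sweep over the subdomain's own points.
    Returns the new subdomain and the largest change of any point."""
    new = [row[:] for row in sub]
    change = 0.0
    for i in range(1, len(sub) - 1):
        for j in range(1, len(sub[0]) - 1):
            new[i][j] = 0.25 * (sub[i - 1][j] + sub[i + 1][j]
                                + sub[i][j - 1] + sub[i][j + 1])
            change = max(change, abs(new[i][j] - sub[i][j]))
    return new, change


def solve(grid, p, tol=1e-6, max_iters=10000):
    subs = split_rows(grid, p)
    for it in range(1, max_iters + 1):
        exchange(subs)                          # 1. exchange overlap regions
        results = [sweep(s) for s in subs]      # 2. local calculations
        subs = [new for new, _ in results]
        if max(c for _, c in results) < tol:    # 3. global reduction (max)
            break
    out = [grid[0][:]]                          # gather the own rows again
    for s in subs:
        out.extend(row[:] for row in s[1:-1])
    out.append(grid[-1][:])
    return out, it


# Hypothetical test problem: top boundary held at 1, everything else at 0.
grid = [[1.0] * 6] + [[0.0] * 6 for _ in range(7)]
solution, iterations = solve(grid, p=3)
```

In a real parallel code, `exchange` would be point-to-point messages between neighbouring processors and the `max` over subdomains would be a global reduction; the simulated version gives the same numbers as running with a single subdomain.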

10
Exchange overlap regions
  • 5-point stencil

11
Analysis of communication overhead
  • assume p processors; n = nx × ny points per
    subdomain
  • only communication overhead: no sequential
    part, no load imbalance
  • T(n,p): parallel execution time; T(n,1):
    execution time on 1 proc.
  • Tcalc: calculation time; Tcomm: communication
    time
  • Speedup
  • Efficiency
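
The speedup and efficiency formulas did not survive in this transcript. A standard reconstruction under the slide's assumptions (no sequential part, no load imbalance) is sketched below. It additionally assumes that Tcalc and Tcomm are the per-processor times of the parallel run, so that one processor needs p times Tcalc for the whole problem; the symbols S and E are chosen here for illustration.

```latex
\begin{align*}
T(n,p) &= T_{calc} + T_{comm}, \qquad T(n,1) = p\,T_{calc},\\
S(n,p) &= \frac{T(n,1)}{T(n,p)} = \frac{p\,T_{calc}}{T_{calc} + T_{comm}},\\
E(n,p) &= \frac{S(n,p)}{p} = \frac{1}{1 + T_{comm}/T_{calc}}.
\end{align*}
```

Under these assumptions the efficiency depends only on the ratio of communication time to calculation time.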

12
Analysis of communication overhead (cont.)
  • Communication overhead fc = Tcomm / Tcalc:
    relative to calculation cost !
  • For the model problems
  • tcalc: time to perform a floating point
    operation
  • tcomm: average time to communicate one floating
    point number
  • Note: in case of 1 message of length m, tcomm =
    (ts + m tw)/m !!
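
The note on message length can be illustrated numerically. In the sketch below, ts is read as the start-up time per message and tw as the transfer time per number; that reading and the numeric values are assumptions for illustration only.

```python
def t_comm_per_number(m, ts, tw):
    """Average time to communicate one number when m numbers travel in a
    single message: (ts + m * tw) / m."""
    return (ts + m * tw) / m


ts, tw = 100.0, 1.0  # hypothetical machine: start-up costs 100 transfers
for m in (1, 10, 100, 1000):
    print(m, t_comm_per_number(m, ts, tw))
```

The start-up cost is amortised over the message, so the average cost per number approaches tw only for long messages. This is why overlap data is sent as one message per neighbour rather than point by point.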

13
Analysis of communication overhead (cont.)
  • Communication overhead depends on
  • the size of the subdomain: large subdomains have
    a small perimeter to surface ratio
  • the machine characteristic tcomm/tcalc:
    indicates how fast communication can be performed
    compared with floating point operations
  • the algorithm, via the ratio cc/cf: fc is small
    when there are many flops per grid point (cf)
    compared with the amount of data associated with
    a grid point (cc)

14
Partitioning strategies
  • 2D grid: M grid points; n = M/p grid points per
    proc.
  • blockwise partitioning vs. stripwise
    partitioning
  • n = nx × ny; square blockwise partitioning if
    nx = ny = √n
15
Partitioning strategies (cont.)
  • communication volume ∝ perimeter of subdomain
  • square subdomains (nx = ny): minimal perimeter
  • → blockwise partitioning is to be preferred
  • BUT
  • stripwise partitioning
  • higher communication volume
  • fewer neighbours → fewer messages
  • choice depends on problem and machine
    characteristics
  • stripwise partitioning may also be better when
    communication is mainly in one direction
    (anisotropic communication)
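
The trade-off between communication volume and number of messages can be sketched with a simple cost model. The assumptions here are mine, not the slides': a square grid of M points, an interior subdomain, a 5-point stencil with overlap width 1, and a cost of ts per message plus tw per number sent. The function names are hypothetical.

```python
import math


def comm_time_block(M, p, ts, tw):
    """Square block of M/p points: 4 neighbours, sqrt(M/p) numbers each."""
    side = math.sqrt(M / p)
    return 4 * ts + 4 * side * tw


def comm_time_strip(M, p, ts, tw):
    """Strip across the full grid width: 2 neighbours, sqrt(M) numbers each."""
    length = math.sqrt(M)
    return 2 * ts + 2 * length * tw


M, p = 1024 * 1024, 16
for ts in (100.0, 1000.0):  # hypothetical start-up costs, with tw = 1
    print(ts, comm_time_block(M, p, ts, 1.0), comm_time_strip(M, p, ts, 1.0))
```

With a cheap start-up the smaller perimeter of the blocks wins; with an expensive start-up the two fewer messages of the strips can outweigh their larger volume. Which case applies depends on the machine.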

16
Comm. overhead: dependence on problem size
  • 2D grid: M grid points; n = M/p grid points
    per proc.
  • blockwise partitioning
  • per proc.: √n × √n points
  • fc (and speedup efficiency) is constant when n
    (problem size per proc.) is constant and p grows
  • fc ↑ (speedup efficiency ↓) when total problem
    size is constant and p grows

17
Comm. overhead: dependence on problem size (cont.)
  • 2D grid: M grid points; n = M/p grid points
    per proc.
  • stripwise partitioning
  • per proc.: √M × (√M / p) points
  • fc ↑ (speedup efficiency ↓) when n is constant
    and p grows
  • fc ↑↑ (speedup efficiency ↓↓) when total
    problem size is constant and p grows
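
The scaling behaviour of both partitionings can be checked with a small model. It assumes a square grid and takes fc proportional to perimeter divided by n; the factor `ratio` stands for (cc tcomm)/(cf tcalc). The function names and the exact perimeter counts are illustrative assumptions, since the formulas on these slides were lost.

```python
import math


def fc_block(n, p, ratio=1.0):
    """Square block of n points: perimeter 4 * sqrt(n), independent of p."""
    return ratio * 4 * math.sqrt(n) / n


def fc_strip(n, p, ratio=1.0):
    """Strip of n points cut from a sqrt(M) x sqrt(M) grid with M = n * p:
    two edges of length sqrt(M)."""
    return ratio * 2 * math.sqrt(n * p) / n


# n constant, p grows: block overhead stays flat, strip overhead grows.
for p in (4, 16, 64):
    print("n fixed", p, fc_block(1024, p), fc_strip(1024, p))

# Total size M constant, p grows: both grow, strips faster.
M = 2 ** 20
for p in (4, 16, 64):
    print("M fixed", p, fc_block(M // p, p), fc_strip(M // p, p))
```

In this model, quadrupling p at fixed M doubles the block overhead but quadruples the strip overhead, which matches the single and double arrows on the slides.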

18
Comm. overhead: dependence on problem size (cont.)
  • 3D problems: communication volume
  • ∝ surface to volume ratio of the subdomains
  • blockwise partitioning: n^(1/3) × n^(1/3) ×
    n^(1/3) points per proc.
  • fc changes more slowly as a function of n than
    in the 2D case
  • d-dimensional problems: fc ∝ 1/n^(1/d)
19
Comm. overhead: dependence on comput. molecule
  • computational molecules of increasing size
  • when the molecule covers the whole domain (i.e.
    the new value of a grid point depends on all
    other grid points) !!

20
Analysis of load imbalance
  • Let Ti = calculation time for processor i,
    i = 1, ..., p
  • Taverage = average calculation time
  • Tmax = maximal calculation time (over all
    procs.)
  • Assume
  • number of operations (counted sequentially)
    independent of p
  • communication time and sequential fraction can
    be neglected
  • Execution time of the parallel program
    determined by Tmax
  • Efficiency
  • Load balance factor

21
Analysis of load imbalance (cont.)
  • Load balance factor does not depend on !

22
Analysis of load imbalance (cont.)
  • Assume in addition
  • amount of work is equal for each grid point
  • procs. are (implicitly) synchronised by the
    communication at the end of each iteration
  • Let Nmax = maximum number of grid points per
    subdomain
  • Naverage = M/p = average number of grid
    points per subdomain
  • then

23
Load imbalance and partitioning
  • If computational cost is NOT equal for each grid
    point
  • different physics in different regions
  • grid points corresponding to boundary
    conditions
  • optimal partitioning w.r.t. work load balance
    difficult to compute
  • If work load imbalance is due only to boundary
    conditions
  • then blockwise partitioning ensures that the
    boundary conditions are well distributed over
    the processors
  • → good load balance
  • blockwise partitioning vs. stripwise
    partitioning

24
Load imbalance and partitioning (cont.)
  • If, in a rectangular grid, the number of grid
    lines is not a multiple of p, then typically the
    grid is partitioned in (unequal) rectangles
  • not optimal w.r.t. work load balance, but easy
  • Also in this case, a blockwise partitioning leads
    to minimal work load imbalance
  • blockwise partitioning vs. stripwise
    partitioning
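
A small worked example of the unequal-rectangles case, assuming equal work per grid point and efficiency = M / (p × Nmax). The helper names and the 10 × 10 grid on 9 processors are illustrative choices, not taken from the slides.

```python
def split(lines, parts):
    """Sizes of `parts` nearly equal chunks of `lines` grid lines."""
    base, extra = divmod(lines, parts)
    return [base + 1] * extra + [base] * (parts - extra)


def efficiency_strip(rows, cols, p):
    """Strips of whole rows: the largest strip sets the run time."""
    n_max = max(split(rows, p)) * cols
    return rows * cols / (p * n_max)


def efficiency_block(rows, cols, pr, pc):
    """pr x pc blocks: the largest rectangle sets the run time."""
    n_max = max(split(rows, pr)) * max(split(cols, pc))
    return rows * cols / (pr * pc * n_max)


# 10 x 10 grid on 9 processors: 10 grid lines is not a multiple of 9 or 3.
print(efficiency_strip(10, 10, 9))     # largest strip: 2 x 10 points
print(efficiency_block(10, 10, 3, 3))  # largest block: 4 x 4 points
```

In the strip case one extra grid line is a whole row of the grid, whereas in the block case it is only a row of one block, so the imbalance caused by rounding is smaller for blocks.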