1
Domain decomposition in parallel computing
COT 5410 Spring 2004
  • Ashok Srinivasan
  • www.cs.fsu.edu/asriniva
  • Florida State University

2
Outline
  • Background
  • Geometric partitioning
  • Graph partitioning
  • Static
  • Dynamic
  • Important points

3
Background
  • Tasks in a parallel computation need access to
    certain data
  • Same datum may be needed by multiple tasks
  • Example: In matrix-vector multiplication c = Ab, b2 is needed
    for the computation of every ci, 1 ≤ i ≤ n
  • If a process does not own a datum needed by its
    task, then it has to get it from a process that
    has it
  • This communication is expensive
  • Aims of domain decomposition
  • Distribute the data in such a manner that the
    communication required is minimized
  • Ensure that the computational loads on
    processes are balanced

4
Domain decomposition example
  • Finite difference computation
  • New value of a node depends on old values of its
    neighbors
  • We want to divide the nodes amongst the processes
    so that
  • Communication is minimized (this is the measure of partition
    quality)
  • Computational load is evenly balanced

5
Geometric partitioning
  • Partition a set of points
  • Uses only coordinate information
  • Balances the load
  • The heuristic tries to ensure that communication
    costs are low
  • Algorithms are typically fast, but the partitions are not of
    high quality
  • Examples
  • Orthogonal recursive bisection
  • Inertial
  • Space filling curves

6
Orthogonal recursive bisection
  • Recursively bisect orthogonal to the longest
    dimension
  • Assume communication is proportional to the
    surface area of the domain, and aligned with
    coordinate axes
  • Recursive bisection
  • Divide into two pieces, keeping load balanced
  • Apply recursively, until desired number of
    partitions obtained
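
A minimal sketch of ORB in Python (an illustration, not the slides' code; it assumes unit vertex weights and a power-of-two number of partitions):

```python
import numpy as np

def orb(points, n_parts):
    """Orthogonal recursive bisection: split perpendicular to the
    longest dimension of the bounding box, keeping halves balanced."""
    if n_parts == 1:
        return [points]
    extents = points.max(axis=0) - points.min(axis=0)
    axis = np.argmax(extents)            # longest dimension
    order = np.argsort(points[:, axis])
    half = len(points) // 2              # median split balances the load
    return (orb(points[order[:half]], n_parts // 2) +
            orb(points[order[half:]], n_parts // 2))
```

For example, orb(np.random.rand(1000, 2), 4) returns four point sets of 250 points each.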

7
Inertial
  • ORB may not be effective if cuts along the x, y,
    or z directions are not good ones
  • Inertial
  • Recursively bisect orthogonal to the inertial axis

8
Space filling curves
  • Space filling curves
  • A continuous curve that fills the space
  • Order the points based on their relative position
    on the curve
  • Choose a curve that preserves proximity
  • Points that are close in space should be close in
    the ordering too
  • Example
  • Hilbert curve

9
Hilbert curve
  • Sources
  • http://www.dcs.napier.ac.uk/andrew/hilbert.html
  • http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html

10
Domain decomposition with a space filling curve
  • Order points based on their position on the curve
  • Divide into P parts
  • P is the number of processes
  • Space filling curves can be used in adaptive
    computations too
  • They can be extended to higher dimensions too
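
A sketch of SFC partitioning in Python. For brevity it uses the Morton (Z-order) curve, which is simpler to code than the Hilbert curve but preserves proximity somewhat less well; the partitioning step is the same for either curve:

```python
import numpy as np

def morton_key(ix, iy, bits=16):
    """Position of integer coordinates (ix, iy) along a Z-order curve,
    obtained by interleaving the bits of the two coordinates."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key

def sfc_partition(points, n_parts, bits=16):
    """Order 2-D points in [0, 1)^2 along the curve, then cut the
    ordering into n_parts contiguous pieces of nearly equal size."""
    scaled = (points * (1 << bits)).astype(int)
    keys = [morton_key(x, y, bits) for x, y in scaled]
    order = np.argsort(keys)
    return np.array_split(order, n_parts)   # index sets, one per process
```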

11
Graph partitioning
  • Model as graph partitioning
  • Graph G = (V, E)
  • Each task is represented by a vertex
  • A weight can be used to represent the
    computational effort
  • An edge exists between tasks if one needs data
    owned by the other
  • Weights can be associated with edges too
  • Goal
  • Partition vertices into P parts such that each
    partition has equal vertex weights
  • Minimize the weights of edges cut
  • The problem is NP-hard
  • Edge cut metric
  • Judge the quality of the partitioning by the
    number of edges cut
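
The edge cut is cheap to evaluate; a small sketch, with edges given as (u, v) pairs and optional edge weights:

```python
def edge_cut(edges, part, weight=None):
    """Total weight of edges whose endpoints lie in different parts;
    part[v] gives v's partition, and weights default to 1 per edge."""
    weight = weight or {e: 1 for e in edges}
    return sum(weight[(u, v)] for u, v in edges if part[u] != part[v])
```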

12
Static graph partitioning
  • Combinatorial
  • Levelized nested dissection
  • Kernighan-Lin/Fiduccia-Mattheyses
  • Spectral partitioning
  • Multi-level methods

13
Combinatorial partitioning
  • Use only connectivity information
  • Examples
  • Levelized nested dissection
  • Kernighan-Lin/Fiduccia-Mattheyses

14
Levelized nested dissection (LND)
  • Idea is similar to the geometric methods
  • But cannot use coordinate information
  • Instead of projecting vertices along the longest
    axis, order them based on distance from a vertex
    that may be one extreme of the longest dimension
    of a graph
  • Pseudo-peripheral vertex
  • Perform a breadth-first search, starting from an
    arbitrary vertex
  • The vertex that is encountered last might be a
    good approximation to a peripheral vertex
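
A sketch of the heuristic in Python, with adj mapping each vertex to its neighbors. The slides describe a single BFS sweep; repeating the sweep from the last-found vertex, as below, is a common refinement:

```python
from collections import deque

def bfs_last(adj, start):
    """Breadth-first search; return the last vertex dequeued,
    i.e., one at maximum BFS distance from start."""
    seen, queue, last = {start}, deque([start]), start
    while queue:
        last = queue.popleft()
        for v in adj[last]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return last

def pseudo_peripheral(adj, start):
    """Two sweeps: the last vertex of a BFS from an arbitrary start
    is itself a good starting point for a second BFS."""
    return bfs_last(adj, bfs_last(adj, start))
```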

15
LND example: Finding a pseudoperipheral vertex
(Figure: BFS from an initial vertex labels the vertices with levels 1-4; a vertex at the maximum level, 4, is taken as the pseudoperipheral vertex)
16
LND example: Partitioning
(Figure: BFS levels 1-6 from the initial, pseudoperipheral vertex; the lower levels form one partition and the rest the other)
Recursively bisect the subgraphs
17
Kernighan-Lin/Fiduccia-Mattheyses
  • Refines an existing partition
  • Kernighan-Lin
  • Consider pairs of vertices from different
    partitions
  • Choose a pair whose swapping will result in the
    best improvement in partition quality
  • The best improvement may actually be a worsening
  • Perform several passes
  • Choose best partition among those encountered
  • Fiduccia-Mattheyses
  • Similar but more efficient
  • Boundary Kernighan-Lin
  • Consider only boundary vertices to swap
  • ... and many other variants
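
A simplified one-pass sketch in Python, assuming part[v] in {0, 1}, adj[v] a set of neighbors, and comparable vertex ids. The O(n²) pair selection is for exposition only; Fiduccia-Mattheyses achieves the same effect in near-linear time with bucket data structures:

```python
def swap_gain(adj, part, u, v):
    """Reduction in edge cut from swapping u and v (in different parts):
    gain = D(u) + D(v) - 2*[u adjacent to v], where D(x) is x's
    external degree minus its internal degree."""
    D = lambda x: sum(1 if part[y] != part[x] else -1 for y in adj[x])
    return D(u) + D(v) - 2 * (v in adj[u])

def kl_pass(adj, part):
    """One Kernighan-Lin pass: tentatively swap the best unlocked pair
    (even if its gain is negative), then roll back the swaps made
    after the best cumulative gain was reached."""
    A = [x for x in adj if part[x] == 0]
    B = [x for x in adj if part[x] == 1]
    locked, swaps, gains, total = set(), [], [], 0
    for _ in range(min(len(A), len(B))):
        cand = [(swap_gain(adj, part, u, v), u, v)
                for u in A for v in B
                if u not in locked and v not in locked]
        if not cand:
            break
        g, u, v = max(cand)
        part[u], part[v] = part[v], part[u]   # tentative swap
        locked |= {u, v}
        total += g
        swaps.append((u, v))
        gains.append(total)
    if gains:
        best = max(range(len(gains)), key=gains.__getitem__)
        keep = best + 1 if gains[best] > 0 else 0
        for u, v in swaps[keep:]:             # undo past the best point
            part[u], part[v] = part[v], part[u]
```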

18
Kernighan-Lin example
(Figure: the existing partition has edge cut = 4; swapping the indicated pair of vertices yields a better partition with edge cut = 3)
19
Spectral method
  • Based on the observation that a Fiedler vector of
    a graph contains connectivity information
  • Laplacian of a graph, L
  • lii = di (the degree of vertex i)
  • lij = -1 if edge (i, j) exists, otherwise 0
  • The smallest eigenvalue of L is 0, with the all-1s vector as its
    eigenvector
  • All other eigenvalues are positive for a
    connected graph
  • Fiedler vector
  • Eigenvector corresponding to the second smallest
    eigenvalue
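
For concreteness, building L in Python/NumPy (dense storage for clarity; real codes use sparse matrices):

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian L = D - A for vertices 0..n-1: degrees on the
    diagonal, -1 in positions (i, j) and (j, i) for each edge."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L
```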

20
Fiedler vector
  • Consider a partitioning of V into A and B
  • Let yi = 1 if vi ∈ A, and yi = -1 if vi ∈ B
  • For load balance, Σi yi = 0
  • Also, Σ(i,j)∈E (yi - yj)² = 4 × (number of edges across
    partitions)
  • Also, yᵀLy = Σi di yi² - 2 Σ(i,j)∈E yi yj = Σ(i,j)∈E (yi - yj)²

21
Optimization problem
  • The optimal partition is obtained by solving
  • Minimize yᵀLy
  • Constraints
  • yi ∈ {-1, 1}
  • Σi yi = 0
  • This is NP-hard
  • Relaxed problem
  • Minimize yᵀLy
  • Constraints
  • Σi yi = 0
  • Add a constraint on a norm of y, for example ||y||2 = n^0.5
  • Note
  • (1, 1, ..., 1)ᵀ is an eigenvector with eigenvalue 0
  • For a connected graph, all other eigenvalues are positive, and
    their eigenvectors are orthogonal to (1, 1, ..., 1)ᵀ, which
    implies Σi yi = 0
  • The objective function is minimized by a Fiedler vector

22
Spectral algorithm
  • Find a Fiedler vector of the Laplacian of the
    graph
  • Note that the Fiedler value (the second smallest
    eigenvalue) yields a lower bound on the
    communication cost, when the load is balanced
  • From the Fiedler vector, bisect the graph
  • Let all vertices with Fiedler-vector components greater than the
    median be in one part, and the rest in the other
  • Recursively apply this to each partition
  • Note: Finding the Fiedler vector of a large graph can be time
    consuming
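
A sketch of one bisection step, using the laplacian helper above. The dense eigensolver is exactly the expensive step the last bullet warns about; production codes use sparse iterative eigensolvers (e.g., Lanczos) or the multilevel methods described next:

```python
import numpy as np

def spectral_bisect(L):
    """Split vertices at the median of the Fiedler vector."""
    vals, vecs = np.linalg.eigh(L)        # eigenpairs in ascending order
    fiedler = vecs[:, 1]                  # vector of 2nd-smallest eigenvalue
    return fiedler > np.median(fiedler)   # boolean part labels
```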

23
Multilevel methods
  • Idea
  • It takes time to partition a large graph
  • So partition a small graph instead!
  • Three phases
  • Graph coarsening
  • Combine vertices to create a smaller graph
  • Example: Find a suitable matching
  • Apply this recursively until a suitably small
    graph is obtained
  • Partitioning
  • Use spectral or another partitioning algorithm to
    partition the small graph
  • Multilevel refinement
  • Uncoarsen the graph to get a partitioning of the
    original graph
  • At each level, perform some graph refinement
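
A sketch of one coarsening level in Python, using a simple greedy matching (heavy-edge matching, which prefers heavy edges, is the usual refinement). Matched pairs merge and their vertex weights add; a full implementation would likewise accumulate coarse edge weights:

```python
from collections import defaultdict

def coarsen(adj, vweight):
    """Match each vertex with an unmatched neighbor, then merge each
    matched pair (or lone vertex) into one coarse vertex."""
    coarse_of, next_id = {}, 0
    for u in adj:
        if u in coarse_of:
            continue
        mate = next((v for v in adj[u] if v not in coarse_of), None)
        coarse_of[u] = next_id
        if mate is not None:
            coarse_of[mate] = next_id
        next_id += 1
    cweight = defaultdict(int)
    for u, w in vweight.items():
        cweight[coarse_of[u]] += w               # vertex weights accumulate
    cadj = defaultdict(set)
    for u in adj:
        for v in adj[u]:
            if coarse_of[u] != coarse_of[v]:
                cadj[coarse_of[u]].add(coarse_of[v])
    return cadj, cweight, coarse_of              # coarse_of drives uncoarsening
```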

24
Multilevel example (without refinement)
(Figure: the original graph, with 16 vertices)
25
Multilevel example (without refinement)
(Figure: a matching is selected on the graph)
26
Multilevel example (without refinement)
(Figure: matched vertices are merged; coarse vertices and edges carry accumulated weights of 1 and 2)
27
Multilevel example (without refinement)
(Figure: the coarsened, weighted graph)
28
Multilevel example (without refinement)
(Figure: the coarse graph is partitioned, and the partition is projected back to the original graph)
29
Dynamic partitioning
  • We have an initial partitioning
  • Now, the graph changes
  • Determine a good partition, fast
  • Also minimize the number of vertices that need
    to be moved
  • Examples
  • PLUM
  • Jostle
  • Diffusion

30
PLUM
  • Partition based on the initial mesh
  • Only the vertex and edge weights change
  • Map partitions to processors
  • Use more partitions than processors
  • Ensures finer granularity
  • Compute a similarity matrix based on data already
    on a process
  • Measures savings on data redistribution cost for
    each (process, partition) pair
  • Choose an assignment of partitions to processors
  • Example: Maximum-weight matching
  • Duplicate each processor (number of partitions)/P times
  • Alternative Greedy approximation algorithm
  • Assign in order of maximum similarity value
  • http://citeseer.nj.nec.com/oliker98plum.html
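
A sketch of the greedy alternative in Python; S is a hypothetical similarity matrix with S[p, q] the redistribution cost saved by placing partition p on processor q:

```python
import numpy as np

def greedy_assign(S, parts_per_proc):
    """Assign partitions to processors in decreasing order of
    similarity, giving each processor parts_per_proc partitions."""
    n_parts, n_procs = S.shape
    owner = -np.ones(n_parts, dtype=int)     # owner[p] = processor of p
    load = np.zeros(n_procs, dtype=int)
    pairs = sorted(((S[p, q], p, q) for p in range(n_parts)
                    for q in range(n_procs)), reverse=True)
    for _, p, q in pairs:
        if owner[p] == -1 and load[q] < parts_per_proc:
            owner[p] = q
            load[q] += 1
    return owner
```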

31
JOSTLE
  • Uses Hu and Blake's scheme for load balancing
  • Solve Lx = b using conjugate gradient
  • L = Laplacian of the processor graph; bi = (weight on process
    Pi) - (average weight)
  • Move max(xi - xj, 0) weight from Pi to neighbor Pj
  • Leads to balanced load
  • Equivalent to Pi sending xi load to each neighbor j, and each
    neighbor Pj sending xj to Pi
  • Net loss in load for Pi = di xi - Σ(neighbors j) xj = L(i)x = bi
  • where L(i) is row i of L, and di is the degree of Pi
  • New load for Pi = (weight on Pi) - bi = average weight
  • Leads to minimum L2 norm of load moved
  • Using max(xi - xj, 0)
  • Select vertices to move based on relative gain
  • http://citeseer.nj.nec.com/walshaw97parallel.html
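
A sketch of the balancing solve in Python/NumPy. JOSTLE runs conjugate gradient on the (singular) processor-graph Laplacian; here lstsq stands in, returning the minimum-norm solution, which is exact because b sums to zero on a connected processor graph:

```python
import numpy as np

def hu_blake_flow(L, weights):
    """Solve L x = b with b_i = w_i - mean(w). Processor i then sends
    max(x_i - x_j, 0) load to each neighbor j, balancing the load."""
    b = weights - weights.mean()
    x, *_ = np.linalg.lstsq(L, b, rcond=None)
    return x
```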

32
Diffusion
  • Involves only communication with neighbors
  • A simple scheme
  • Processor Pi repeatedly sends α wi weight to each neighbor
  • wi = weight on Pi
  • w_k = (I - α L) w_{k-1}, where w_k is the weight vector at
    iteration k
  • Simple criteria exist for choosing α to ensure convergence
  • Example: α = 0.5/(maxi di)
  • More sophisticated schemes exist
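
The first-order scheme is a few lines of NumPy. With α = 0.5/(maxi di), the iteration matrix I - αL has eigenvalues in [0, 1], so on a connected graph the loads converge to the average:

```python
import numpy as np

def diffuse(L, w, iters=100):
    """Iterate w <- (I - alpha L) w; the diagonal of L holds degrees."""
    alpha = 0.5 / L.diagonal().max()
    for _ in range(iters):
        w = w - alpha * (L @ w)
    return w
```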

33
Important points
  • Goals of domain decomposition
  • Balance the load
  • Minimize communication
  • Space filling curves
  • Graph partitioning model
  • Spectral method
  • Relax the NP-hard integer optimization to a real-valued one,
    then discretize to get an approximate integer solution
  • Multilevel methods
  • Three phases
  • Dynamic partitioning: additional requirements
  • Use old solution to find new one fast
  • Minimize number of vertices moved