1
Partitioning and divide and conquer strategies
2
Partitioning
  • Partitioning simply divides the problem into parts
  • Data partitioning (domain decomposition): divide the data
  • Functional decomposition: divide the computation
  • Divide and conquer
  • Characterized by dividing a problem into sub-problems of the same form as the larger problem. Further division into still smaller sub-problems is usually done by recursion.
  • Recursive divide and conquer is amenable to parallelization because separate processes can be used for the divided parts.
  • Also, the data is usually naturally localized.

3
Partitioning/divide and conquer examples
  • Many possibilities
  • Operations on sequences of numbers such as simply
    adding them together
  • Sorting algorithms can often be partitioned or
    constructed in a recursive fashion
  • Numerical integration (quadrature)
  • N-body problem

4
Summing numbers
We have a set of n numbers x1, x2, …, xn that we wish to collapse into a single sum. With p processors, divide the set into p parts of n/p numbers each.
n/p additions per processor
p − 1 additions in the recombination phase
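For illustration (not from the original slides), a minimal sequential C sketch of the partitioning idea: the array is split into p contiguous parts, each part is summed as one processor would sum it, and the p partial sums are then recombined. The names partial_sum and partitioned_sum are illustrative.

#include <stdio.h>

/* Sum one contiguous part of the array (the work one processor would do). */
static int partial_sum(const int *x, int lo, int hi) {
    int s = 0;
    for (int i = lo; i < hi; i++)
        s += x[i];
    return s;
}

/* Split n numbers into p parts, sum each part, then recombine the p partial sums. */
static int partitioned_sum(const int *x, int n, int p) {
    int total = 0;
    for (int part = 0; part < p; part++) {
        int lo = part * n / p;            /* start of this part            */
        int hi = (part + 1) * n / p;      /* one past the end of this part */
        total += partial_sum(x, lo, hi);
    }
    return total;
}

int main(void) {
    int x[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    printf("%d\n", partitioned_sum(x, 12, 4));   /* prints 78 */
    return 0;
}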
5
Speedup analysis
Number of steps:
  t_serial = n − 1
  t_parallel = n/p + p − 2
Speedup:
  S = t_serial / t_parallel = (n − 1) / (n/p + p − 2)
No speedup for n = p, and parallel performance is worse for n < p. For large n, S → p, as expected.
(The analysis ignores start-up and communication times; see text p. 110.)
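As an illustrative check (example values, not from the slides): with n = 1024 and p = 8, S = 1023 / (128 + 6) = 1023/134 ≈ 7.6, close to the ideal speedup of p = 8.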
6
Divide and conquer implementation
Termination when 2 numbers left
Sequential recursive add (pseudocode)
int add(int s) {                             /* add list of numbers s */
  if (number(s) <= 2) return (n1 + n2);      /* termination: two numbers left */
  else {
    divide(s, s1, s2);                       /* divide s into two parts, s1 and s2 */
    part_sum1 = add(s1); part_sum2 = add(s2);/* recursive calls on the sub-lists */
    return (part_sum1 + part_sum2);
  }
}
Add calls itself recursively
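For reference (not from the original slides), a self-contained, runnable C version of the same recursion, representing the list as an array with an explicit length instead of the textbook's number()/divide() helpers:

#include <stdio.h>

/* Recursively sum s[0..len-1] by repeatedly splitting the array in half. */
static int add(const int *s, int len) {
    if (len <= 2)                               /* termination: at most two numbers left */
        return len == 2 ? s[0] + s[1] : (len == 1 ? s[0] : 0);
    int half = len / 2;                         /* divide the list into two parts */
    int part_sum1 = add(s, half);               /* recursive calls on the sub-lists */
    int part_sum2 = add(s + half, len - half);
    return part_sum1 + part_sum2;
}

int main(void) {
    int x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%d\n", add(x, 8));                  /* prints 36 */
    return 0;
}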
7
Tree diagram
[Diagram: a binary tree of recursive add() calls; each call spawns two further add() calls on the halves of its list.]
In a parallel implementation, we want to traverse
several parts of the tree simultaneously.
8
Obvious (naïve) method
Assign one processor to each node of the tree. This requires 2m − 1 processors (one per tree node) and is very inefficient, as many processors would be idle. Better: re-use processors at different levels of the tree.
9
[Diagram: the original list is divided among processors P0–P7, which compute partial sums; the partial sums are then combined up the tree: P0, P2, P4 and P6 each absorb a neighbour's result, then P0 and P4 combine, and finally P0 holds the final sum.]
10
Speedup in divide and conquer
There are log2 p levels in the tree. Number of computational steps:
  t_parallel = n/p + log2 p
Speedup:
  S = (n − 1) / (n/p + log2 p)
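Continuing the illustrative example above (n = 1024, p = 8; values not from the slides): S = 1023 / (128 + 3) = 1023/131 ≈ 7.8, slightly better than the ≈ 7.6 obtained for simple partitioning, because the log2 p = 3 tree-combining steps replace the p − 2 = 6 sequential additions of the recombination phase.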
11
M-ary divide and conquer
Divide and conquer can also be applied where a
task is divided into more than two parts at each
stage
int add(int s) {                                    /* add list of numbers s */
  if (number(s) <= M) return (n1 + n2 + ... + nM);  /* termination */
  else {
    divide(s, s1, s2, ..., sM);                     /* divide s into M parts */
    part_sum1 = add(s1); part_sum2 = add(s2);       /* recursive calls on */
    ... part_sumM = add(sM);                        /* each of the M parts */
    return (part_sum1 + part_sum2 + ... + part_sumM);
  }
}
12
Sorting numbers
Given a sequence of unsorted numbers, one can use any of a number of O(n log n) algorithms to sort them (e.g. quicksort). How can we parallelize this?
13
Sorting using bucket sort
Assumes the unsorted numbers are (roughly) uniformly distributed over some known interval. Divide the interval into equal-sized regions and assign one bucket to each region. Place each number into the bucket for its region, sort each bucket using some standard algorithm (e.g. quicksort), then concatenate the buckets to give the sorted sequence.
(See text p. 117 (p. 119 in the old edition) for diagrams and more explanation.)
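For illustration (not from the original slides), a minimal sequential C sketch of bucket sort for values assumed to be uniformly distributed in [0, 1); the number of buckets and all names are illustrative:

#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS 4

static int cmp(const void *a, const void *b) {
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

/* Bucket sort for n values assumed uniformly distributed in [0, 1). */
static void bucket_sort(double *x, int n) {
    double *bucket[NBUCKETS];
    int count[NBUCKETS] = {0};
    for (int b = 0; b < NBUCKETS; b++)
        bucket[b] = malloc(n * sizeof(double));     /* worst case: all values in one bucket */
    for (int i = 0; i < n; i++) {                   /* place each number in its bucket */
        int b = (int)(x[i] * NBUCKETS);
        bucket[b][count[b]++] = x[i];
    }
    int k = 0;
    for (int b = 0; b < NBUCKETS; b++) {            /* sort each bucket, then concatenate */
        qsort(bucket[b], count[b], sizeof(double), cmp);
        for (int i = 0; i < count[b]; i++)
            x[k++] = bucket[b][i];
        free(bucket[b]);
    }
}

int main(void) {
    double x[] = {0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68};
    bucket_sort(x, 10);
    for (int i = 0; i < 10; i++) printf("%.2f ", x[i]);
    printf("\n");
    return 0;
}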
14
Parallelizing bucket sort
Obvious method: one processor per bucket!
15
Further parallelization
Partition the sequence into p regions, one region for each processor. Each processor maintains p small buckets and separates the numbers in its region into its own small buckets. The small buckets are then emptied into the p final (large) buckets for sorting. This requires each processor to send one small bucket to each of the other processors (small bucket i to processor i).
16
Recall all-to-all broadcast
[Diagram: the unsorted numbers are split so that each of the p processors holds n/p numbers and fills p small buckets; an all-to-all exchange empties small bucket i on every processor into the large bucket on processor i; each processor sorts the contents of its large bucket; merging/concatenating the lists gives the sorted numbers.]
17
Steps on one processor
Take its subset of n/p numbers. Assign p small buckets to cover the range of the numbers; note that small bucket i covers the same range as the large bucket on processor i. Place each number into a small bucket according to its value. Empty each small bucket into the appropriate large bucket, i.e. send small bucket i to processor i. Sort its own large bucket.
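Not from the original slides: a sketch of these per-processor steps using MPI, assuming the keys are doubles uniformly distributed over a known range [min, max). The function and variable names (bucket_of, parallel_bucket_sort, etc.) are illustrative; bucket sizes are exchanged with MPI_Alltoall and the bucket contents with MPI_Alltoallv, which plays the role of the all-to-all step in the diagram. Concatenating each rank's sorted large bucket in rank order gives the fully sorted sequence.

#include <mpi.h>
#include <stdlib.h>

/* Map a key to the rank whose large bucket covers its sub-range. */
static int bucket_of(double x, double min, double max, int p) {
    int b = (int)((x - min) / (max - min) * p);
    return b >= p ? p - 1 : b;                 /* clamp x == max into the last bucket */
}

static int cmp(const void *a, const void *b) {
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

void parallel_bucket_sort(double *local, int n_local, double min, double max,
                          double **sorted, int *n_sorted, MPI_Comm comm) {
    int p;
    MPI_Comm_size(comm, &p);

    /* 1. Count how many local keys fall into each of the p small buckets. */
    int *send_counts = calloc(p, sizeof(int));
    for (int i = 0; i < n_local; i++)
        send_counts[bucket_of(local[i], min, max, p)]++;

    /* 2. Pack the keys so the p small buckets are laid out contiguously. */
    int *send_displs = calloc(p, sizeof(int));
    for (int b = 1; b < p; b++)
        send_displs[b] = send_displs[b - 1] + send_counts[b - 1];
    double *send_buf = malloc((n_local > 0 ? n_local : 1) * sizeof(double));
    int *fill = calloc(p, sizeof(int));
    for (int i = 0; i < n_local; i++) {
        int b = bucket_of(local[i], min, max, p);
        send_buf[send_displs[b] + fill[b]++] = local[i];
    }

    /* 3. Exchange bucket sizes, then send small bucket i to processor i. */
    int *recv_counts = calloc(p, sizeof(int));
    MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT, comm);
    int *recv_displs = calloc(p, sizeof(int));
    int total = recv_counts[0];
    for (int b = 1; b < p; b++) {
        recv_displs[b] = recv_displs[b - 1] + recv_counts[b - 1];
        total += recv_counts[b];
    }
    double *recv_buf = malloc((total > 0 ? total : 1) * sizeof(double));
    MPI_Alltoallv(send_buf, send_counts, send_displs, MPI_DOUBLE,
                  recv_buf, recv_counts, recv_displs, MPI_DOUBLE, comm);

    /* 4. Sort the large bucket; rank order of the buckets gives the global order. */
    qsort(recv_buf, total, sizeof(double), cmp);
    *sorted = recv_buf;
    *n_sorted = total;

    free(send_counts); free(send_displs); free(send_buf);
    free(fill); free(recv_counts); free(recv_displs);
}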
18
Analysis
Sequential bucket sort (with p buckets):
  t_s ≈ n + p · (n/p) · log(n/p) = n + n · log(n/p)
Parallel bucket sort (p small buckets plus one big bucket per processor):
  t_p ≈ n/p + (n/p) · log(n/p)
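As an illustrative check (example values, communication ignored): with n = 2^20 and p = 16, n/p = 2^16 and log2(n/p) = 16, so the sequential time is roughly n(1 + 16) ≈ 1.8 × 10^7 steps, while each processor needs roughly (n/p)(1 + 16) ≈ 1.1 × 10^6 steps, a factor of p = 16 improvement.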
19
Numerical integration - quadrature
  • Just straightforward data partitioning
  • Several ways of doing the quadrature
  • Using rectangles
  • Trapezoidal method
  • Adaptive quadrature

See text
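For illustration (not from the original slides), a minimal C sketch of data-partitioned quadrature using the trapezoidal method: the interval is split into p equal subranges, each "processor" (here just a loop iteration standing in for one worker) integrates its own subrange with n/p strips, and the partial results are added. All names and parameter values are illustrative.

#include <math.h>
#include <stdio.h>

typedef double (*func_t)(double);

/* Composite trapezoidal rule over [a, b] with n strips. */
static double trapezoid(func_t f, double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; i++)
        sum += f(a + i * h);
    return sum * h;
}

int main(void) {
    const int p = 4, n = 1000;             /* 4 "processors", 1000 strips in total */
    double a = 0.0, b = 1.0, total = 0.0;
    double w = (b - a) / p;                /* width of each processor's subrange */
    for (int rank = 0; rank < p; rank++)   /* each iteration is one worker's share */
        total += trapezoid(sin, a + rank * w, a + (rank + 1) * w, n / p);
    printf("integral of sin over [0,1] = %f (exact: %f)\n", total, 1.0 - cos(1.0));
    return 0;
}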
20
N-body simulations
Finding positions and movements of bodies in
space subject to the gravitational forces from
other bodies, using Newtonian laws of physics.
21
Gravitational N-body equations
The gravitational force between two bodies of masses m1 and m2 is
  F = G · m1 · m2 / r^2
where G is the gravitational constant and r is the distance between the bodies. Subject to a force, a body accelerates according to Newton's second law:
  F = m · a
22
Details
Let the time interval be Δt. For a body of mass m, the force is
  F = m (v^(t+1) − v^t) / Δt
so the new velocity is
  v^(t+1) = v^t + F Δt / m
where v^(t+1) is the velocity at time t + 1 and v^t is the velocity at time t. Over the time interval Δt, the position changes by
  x^(t+1) − x^t = v Δt
where x^t is the position at time t. Once the bodies have moved to new positions, the forces change, so the computation has to be repeated.
23
Sequential code
Overall gravitational N-body computation can be
described by
for (t = 0; t < tmax; t++) {             /* for each time period */
  for (i = 0; i < N; i++) {              /* for each body: */
    F = force_func(i);                   /* compute the force on body i, */
    v[i]new = v[i] + F * dt / m;         /* its new velocity, */
    x[i]new = x[i] + v[i] * dt;          /* and its new position */
  }
  for (i = 0; i < nmax; i++) {           /* then update all positions and velocities */
    x[i] = x[i]new; v[i] = v[i]new;
  }
}
24
Parallel code
The sequential algorithm is an O(N2) algorithm
(for one iteration) as each of the N bodies is
influenced by each of the other N-1 bodies. Not
feasible to use this direct algorithm for the
more interesting N-body problems where N is very
large.
25
Time complexity can be reduced by using the fact that a cluster of distant bodies can be approximated as a single distant body with the total mass of the cluster, sited at the centre of mass of the cluster.
[Diagram: a distant cluster of bodies at distance r, replaced by a single body of the cluster's total mass at its centre of mass.]
26
Barnes-Hut Algorithm
  • Start with the whole space, in which one cube contains all the bodies (or particles)
  • First, this cube is divided into eight subcubes
  • If a subcube contains no particles, it is deleted
    from further consideration
  • If a subcube contains one body, this subcube is
    retained
  • If a subcube contains more than one body, it is
    recursively divided until every subcube contains
    one body.

27
This creates an octree: a tree with up to eight edges from each node. The leaves represent cells containing one body. After the tree has been constructed, the total mass and the centre of mass of the subcube are stored at each node.
28
The force on each body is obtained by traversing the tree, starting at the root and stopping when the clustering approximation can be used, i.e. when
  r ≥ d / θ
where θ is a constant, typically 1.0 or less (d is the dimension of the subcube and r the distance to its centre of mass). Constructing the tree has a time complexity of O(n log n), and so does computing the forces, so the overall time complexity of the method is O(n log n).
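Not from the original slides: a minimal C sketch of this traversal, assuming a hypothetical octree node type that already stores each cell's total mass, centre of mass and side length d. The r >= d / theta test decides whether a cell can be treated as a single distant body; otherwise its children are visited recursively.

#include <math.h>

typedef struct Node {
    double mass;            /* total mass of the bodies in this cell              */
    double cx, cy, cz;      /* centre of mass of the cell                         */
    double d;               /* side length of the cell (subcube)                  */
    int nbodies;            /* 1 means a leaf containing a single body            */
    struct Node *child[8];  /* octree children (NULL where a subcube is empty)    */
} Node;

/* Accumulate the gravitational force on a body of mass m at (x, y, z). */
void add_force(const Node *cell, double x, double y, double z, double m,
               double *fx, double *fy, double *fz) {
    if (cell == NULL) return;
    const double G = 6.674e-11, theta = 1.0;
    double dx = cell->cx - x, dy = cell->cy - y, dz = cell->cz - z;
    double r = sqrt(dx * dx + dy * dy + dz * dz);
    if (cell->nbodies == 1 || r >= cell->d / theta) {  /* leaf, or far enough away: */
        if (r == 0.0) return;                          /* skip the body itself      */
        double f = G * m * cell->mass / (r * r);       /* treat cell as one body    */
        *fx += f * dx / r;  *fy += f * dy / r;  *fz += f * dz / r;
    } else {                                           /* otherwise open the cell   */
        for (int i = 0; i < 8; i++)
            add_force(cell->child[i], x, y, z, m, fx, fy, fz);
    }
}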
29
[Diagram: recursive division of two-dimensional space.]