CSE621 : Parallel Algorithms - PowerPoint PPT Presentation

Provided by: poste74

1
CSE621: Parallel Algorithms, Lecture 3: Sorting
September 13, 1999
2
Overview
  • Review of the previous lecture
  • Sorting on 2-D: n-step algorithm
  • Sorting on 2-D: 0-1 sorting lemma
  • Sorting on 2-D: $\sqrt{n}(\log n + 1)$-step
    algorithm
  • Sorting on 2-D: $3\sqrt{n} + o(\sqrt{n})$
    algorithm
  • Sorting: matching lower bound
  • Sorting on 2-D: word-model vs. bit-model
  • Summary

3
Review of the previous lecture
  • Sorting on the CRCW and CREW PRAMs
  • Odd-Even Merge Sort on the EREW PRAM
  • Sorting on the One-Dimensional Mesh
    • Insertion sort
    • Transposition sort
  • Sorting on the Two-Dimensional Mesh
    • Snake-order row-column sort
  • Sorting Networks
    • Odd-even merge sort network
    • Bitonic sort network

4
Sorting on 1-D: n-step algorithm
  • The previously shown insertion sort and odd-even
    transposition sort are n-step algorithms.
  • How do we prove that the sorting finishes within
    O(n) steps?
  • => 0-1 Sorting Lemma
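
As a reminder of what an n-step algorithm on the 1-D mesh looks like, here is a minimal sequential simulation of odd-even transposition sort. It is only a sketch: the function name is made up for illustration, and each pass of the inner loop stands in for one parallel step in which all pairs would compare and exchange at once.

```python
def odd_even_transposition_sort(values):
    """Simulate n steps of odd-even transposition sort on a linear array."""
    cells = list(values)
    n = len(cells)
    for step in range(n):
        # Even steps pair cells (0,1), (2,3), ...; odd steps pair (1,2), (3,4), ...
        start = step % 2
        for i in range(start, n - 1, 2):
            # Compare-exchange: the smaller value moves to the left cell.
            if cells[i] > cells[i + 1]:
                cells[i], cells[i + 1] = cells[i + 1], cells[i]
    return cells


print(odd_even_transposition_sort([5, 4, 3, 2, 1, 0]))
```

The sequence of compared positions never depends on the data, which is what makes the algorithm oblivious and lets the 0-1 Sorting Lemma apply.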

5
Sorting: The 0-1 Sorting Lemma
  • Lemma (The 0-1 Sorting Lemma)
  • If an oblivious comparison-exchange algorithm
    sorts all input sets consisting solely of 0s and
    1s, then it sorts all input sets with arbitrary
    values.
  • Proof (by contradiction)
  • 1. Assume that an oblivious comparison-exchange
    algorithm sorts all 0-1 inputs but fails to
    correctly sort some set of input values
    $x_1, x_2, \ldots, x_n$.
  • 2. Let $p$ be a permutation such that
    $x_{p(1)} \le x_{p(2)} \le \cdots \le x_{p(n)}$
    and let $s$ be a permutation such that the output
    of the sorting algorithm is
    $x_{s(1)}, x_{s(2)}, \ldots, x_{s(n)}$.
  • 3. Let $k$ be the smallest value such that
    $x_{s(k)} \ne x_{p(k)}$ (it exists by 1).
  • 4. By definition, this means that
    $x_{s(i)} = x_{p(i)}$ for $0 < i < k$ and
    $x_{s(k)} > x_{p(k)}$. Hence, there must be a
    value $r > k$ such that $x_{s(r)} = x_{p(k)}$.
  • 5. Define $x_i^* = 0$ if $x_i \le x_{p(k)}$ and
    $x_i^* = 1$ if $x_i > x_{p(k)}$, and examine the
    actions of the algorithm on the input set obtained
    by replacing $x_i$ with $x_i^*$ for $1 \le i \le n$.
  • 6. Since $x_i \ge x_j \Rightarrow x_i^* \ge x_j^*$
    for every $i$ and $j$, the algorithm performs the
    same comparison-exchange operations on the $x^*$
    inputs as it did on the original inputs.

6
Sorting: The 0-1 Sorting Lemma (the proof, contd)
  • Proof (by contradiction)
  • 7. Hence the output on the 0-1 values will be
  • $x_{s(1)}^*, x_{s(2)}^*, \ldots, x_{s(n)}^* = 0, 0, \ldots, 0, 1, \ldots, 0, \ldots$
  • which is incorrect: $x_{s(k)}^* = 1$ appears
    before $x_{s(r)}^* = 0$ although $r > k$.
  • 8. This contradicts the assumption that the
    algorithm sorts all 0-1 inputs.
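
The lemma suggests a practical check: to validate an oblivious compare-exchange network on n inputs, it is enough to try the 2^n inputs of 0s and 1s. The sketch below assumes a network is given as a list of index pairs (i, j) meaning "put the smaller value at i and the larger at j". The two 4-input networks and all function names are hypothetical, chosen only for illustration. `threshold` is the mapping from step 5 of the proof.

```python
from itertools import product


def apply_network(network, values):
    """Run an oblivious compare-exchange network over a list of values."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def sorts_all_zero_one_inputs(network, n):
    """Check the network on all 2**n inputs consisting of 0s and 1s."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


def threshold(values, pivot):
    """Step 5 of the proof: map x_i to 0 if x_i <= pivot, else to 1."""
    return [0 if v <= pivot else 1 for v in values]


# Hypothetical 4-input networks used only as examples.
GOOD_NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
BAD_NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3)]  # last comparator removed

print(sorts_all_zero_one_inputs(GOOD_NETWORK, 4))
print(sorts_all_zero_one_inputs(BAD_NETWORK, 4))
```

Step 6 of the proof corresponds to the fact that thresholding before or after running the network gives the same 0-1 output, for example `apply_network(net, threshold(x, t)) == threshold(apply_network(net, x), t)`.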

7
Sorting on 2-D: Snake-order row-column sort
  • A.k.a. shear sort
  • How do we prove that shear sort completes the
    sorting in $\log n + 1$ phases?
  • Use the 0-1 Sorting Lemma
  • After every two phases (a row sort followed by a
    column sort), the number of unsorted (dirty) rows
    is at most half of what it was.
  • After the rows are sorted in the 1st phase, pair
    up adjacent rows. Each pair is one of three kinds:
    • more 0s, e.g. 00000111 over 11100000
    • more 1s, e.g. 00111111 over 11111100
    • equal number, e.g. 00001111 over 11110000
  • After the column exchange (compare-exchange within
    each column of the pair):
    • more 0s: 00000000 over 11100111 (upper row all 0)
    • more 1s: 00111100 over 11111111 (lower row all 1)
    • equal number: 00000000 over 11111111
  • After the sorting of columns, all 0-rows move to
    the upper region and all 1-rows move to the lower
    region.
  • Since at least one row in each pair becomes all-0
    or all-1 and is moved out of the middle region
    after a row sort and a column sort, the middle
    region (dirty region) shrinks to at most half its
    size for each pair of phases.
  • Total number of phases => $2\log(\sqrt{n}) = \log n$,
    plus one final row sort for the last dirty row.
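
The phase structure argued above can be sketched as a short sequential simulation. This is an illustrative sketch, not the lecture's code: the names are made up, the mesh is a list of equal-length rows, and Python's built-in sort stands in for the odd-even transposition sort that each row or column would run in parallel.

```python
import math


def shear_sort(mesh):
    """Sort a mesh (list of equal-length rows) into snake order."""
    grid = [list(row) for row in mesh]
    rows = len(grid)

    def sort_rows():
        # Even rows ascend left to right, odd rows descend (snake order).
        for r in range(rows):
            grid[r].sort(reverse=(r % 2 == 1))

    def sort_columns():
        # Every column ascends from top to bottom.
        for c in range(len(grid[0])):
            column = sorted(grid[r][c] for r in range(rows))
            for r in range(rows):
                grid[r][c] = column[r]

    # Each row+column pair at least halves the dirty region of a 0-1 input.
    pairs = math.ceil(math.log2(rows)) if rows > 1 else 0
    for _ in range(pairs):
        sort_rows()
        sort_columns()
    sort_rows()  # final row phase sorts the one remaining dirty row
    return grid


def snake_order(grid):
    """Read the mesh along the snake: row 0 left to right, row 1 right to left, ..."""
    out = []
    for r, row in enumerate(grid):
        out.extend(row if r % 2 == 0 else reversed(row))
    return out


mesh = [[15, 3, 9, 1], [7, 12, 0, 5], [2, 14, 6, 11], [10, 4, 13, 8]]
print(snake_order(shear_sort(mesh)))
```

For a $\sqrt{n} \times \sqrt{n}$ mesh the loop runs $\log \sqrt{n}$ row and column pairs followed by one row phase, which matches the $\log n + 1$ phase count.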

8
Sort on 2-D: Quadrant sorting
  • Algorithm
  • 1. Recursively sort each quadrant in snake order
  • 2. Sort the rows in snake order
  • 3. Sort the columns
  • 4. Do $4\sqrt{n}$ steps of snake-order bubble sort

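[Figure: two sorted quadrants, each with 0 rows, one 50/50 row, and 1 rows, separated by border lines. After the row and column sorts the mesh is all 0 at the top and all 1 at the bottom, with 4 dirty rows in between => transposition sort.]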
9
Sort on 2-D: Quadrant sorting (contd)
  • Timing Analysis
  • Phase 1: $O(n^{1/4} \times \frac{1}{2}\log n) = O(\frac{1}{2} n^{1/4} \log n)$
  • Phase 2: $O(\sqrt{n})$
  • Phase 3: $O(\sqrt{n})$
  • Phase 4: $4\sqrt{n}$
  • Total: $O(n^{1/4} \log n + 6\sqrt{n}) = O(\sqrt{n})$
  • Extension of the idea
  • Sort a mesh of size $2^i \times 2^i$
  • After the sort of each $2^{i-1} \times 2^{i-1}$
    submesh, the algorithm requires $6 \times 2^i$
    additional steps
  • $6(1 + 2 + 4 + \cdots + \sqrt{n}/2 + \sqrt{n}) = O(\sqrt{n})$
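
The geometric sum in the extension can be checked numerically. The sketch below assumes, as the slide does, that growing from a $2^{i-1} \times 2^{i-1}$ sorted submesh to a $2^i \times 2^i$ mesh costs exactly $6 \times 2^i$ steps, and that the side length is a power of two. The function name is hypothetical.

```python
def quadrant_sort_steps(side):
    """Sum 6 * (1 + 2 + 4 + ... + side) for a side x side mesh."""
    total = 0
    size = 1
    while size <= side:
        total += 6 * size  # cost of the level whose mesh side is `size`
        size *= 2
    return total


for side in (4, 16, 256):
    # The sum 1 + 2 + ... + side equals 2*side - 1, so the total stays below 12*side.
    print(side, quadrant_sort_steps(side), 12 * side)
```

Since side is $\sqrt{n}$, the total stays below $12\sqrt{n}$, which is the $O(\sqrt{n})$ bound stated on the slide.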

10
Sort on 2-D: $3\sqrt{n} + o(\sqrt{n})$-step algorithm
  • Algorithm
  • 1. Divide the mesh into $n^{1/4}$ blocks of size
    $n^{3/8} \times n^{3/8}$ and simultaneously sort
    each block in snake order.
  • 2. Perform an $n^{1/8}$-way unshuffle of the
    columns. In particular, permute the columns so
    that the $n^{3/8}$ columns in each block are
    distributed evenly among the $n^{1/8}$ vertical
    slices.
  • 3. Sort each block into snake order.
  • 4. Sort each column in linear order.
  • 5. Collectively sort blocks 1 and 2, blocks 3 and
    4, etc. of each vertical slice into snake order.
  • 6. Collectively sort blocks 2 and 3, blocks 4 and
    5, etc. of each vertical slice into snake order.
  • 7. Sort each row in linear order according to the
    direction of the overall n-cell snake.
  • 8. Perform $2n^{3/8}$ steps of odd-even
    transposition sort on the overall n-cell snake.
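
Step 2 is the least familiar step, so here is a small sketch of one way to realize the unshuffle. The index formula is an assumption (columns are dealt out to the slices like cards, keeping their relative order), since the slide does not spell it out. The function names and the example size $n = 2^{16}$ are chosen only for illustration.

```python
def unshuffle_destination(col, num_cols, ways):
    """Destination of column `col` under a `ways`-way unshuffle (assumed formula).

    Column j is dealt to vertical slice j % ways and keeps its relative
    order among the columns that land in the same slice.
    """
    slice_width = num_cols // ways
    return (col % ways) * slice_width + col // ways


def slice_histogram(block_index, block_width, num_cols, ways):
    """Count how many columns of one block land in each vertical slice."""
    slice_width = num_cols // ways
    counts = [0] * ways
    start = block_index * block_width
    for col in range(start, start + block_width):
        counts[unshuffle_destination(col, num_cols, ways) // slice_width] += 1
    return counts


# n = 2**16: sqrt(n) = 256 columns, n**(1/8) = 4 slices, n**(3/8) = 64 columns per block.
print(slice_histogram(0, 64, 256, 4))
```

With these sizes each block contributes $n^{3/8} / n^{1/8} = n^{1/4} = 16$ columns to every vertical slice, which is the even distribution that step 2 asks for.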

11
[Figure: Blocks and Slices]
After phase 1: at most 1 dirty row in each block
After phase 3: at most 2 dirty rows in each horizontal
slice
12
[Figure] After phase 6: each vertical slice contains at
most one dirty row
13
Sort on 2-D: $3\sqrt{n} + o(\sqrt{n})$-step algorithm
  • Timing Analysis
  • Phase 1: $O(n^{3/8} \log n)$
  • Phase 2: $\sqrt{n} + o(n^{3/8})$
  • Phase 3: $O(n^{3/8} \log n)$
  • Phase 4: $\sqrt{n}$
  • Phase 5: $O(n^{3/8} \log n)$
  • Phase 6: $O(n^{3/8} \log n)$
  • Phase 7: $\sqrt{n}$
  • Phase 8: $2n^{3/8}$
  • TOTAL: $3\sqrt{n} + O(n^{3/8} \log n) = 3\sqrt{n} + o(\sqrt{n})$

14
Sorting on 2-D: Matching lower bound
  • Claim: the lower bound is $3\sqrt{n} - o(\sqrt{n})$
    steps to sort n items on the 2-D mesh.
  • Reason 1: any sorting algorithm on the 2-D mesh
    must take at least $2\sqrt{n} - 2$ steps, the time
    to move an item from position (1,1) to position
    $(\sqrt{n}, \sqrt{n})$.
  • Reason 2 (stronger lower bound)
  • Consider the numbers in the upper-left triangle of
    size $2n^{1/4} \times 2n^{1/4}$ (unknown values).
  • The numbers between 1 and $n - 2\sqrt{n}$ are
    stored arbitrarily in the remainder of the mesh.

15
Sorting on 2-D: Matching lower bound
  • Reason 2 (stronger lower bound), contd
  • Let x denote the number in cell $(\sqrt{n}, \sqrt{n})$
    after $2\sqrt{n} - 2n^{1/4} - 3$ steps.
  • Then x is independent of the numbers in the
    triangle.
  • Let C(m,x) denote the correct column for x when
    precisely m of the unknown values are set to 0
    and $2\sqrt{n} - m$ values are set to n.
  • As m varies between 0 and $2\sqrt{n}$, C(m,x)
    varies between 1 and $\sqrt{n}$, achieving each
    possible value at least twice.
  • Pick m so that C(m,x) = 1. Then x will have to
    move from cell $(\sqrt{n}, \sqrt{n})$ to a cell in
    the first column.
  • This will take at least $\sqrt{n} - 1$ additional
    steps.
  • Thus the algorithm takes at least
  • $3\sqrt{n} - 2n^{1/4} - 4$ steps
  • $= 3\sqrt{n} - o(\sqrt{n})$ steps
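
The final count is the sum of the two stages of the argument. The worked equation below spells that out. The name $T_{\text{lower}}$ and the sample value $n = 2^{16}$ are introduced here only for illustration.

```latex
\begin{align*}
T_{\text{lower}}(n)
  &= \underbrace{\left(2\sqrt{n} - 2n^{1/4} - 3\right)}_{\text{steps during which } x \text{ cannot depend on the triangle}}
   + \underbrace{\left(\sqrt{n} - 1\right)}_{\text{moving } x \text{ to column 1}} \\
  &= 3\sqrt{n} - 2n^{1/4} - 4 .
\end{align*}
% Example: n = 2^{16} gives \sqrt{n} = 256 and n^{1/4} = 16,
% so T_lower = 3 \cdot 256 - 2 \cdot 16 - 4 = 732 steps.
```

Because $2n^{1/4} + 4 = o(\sqrt{n})$, the bound matches the $3\sqrt{n} + o(\sqrt{n})$ upper bound up to lower-order terms.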

16
Sorting: Word-model vs. Bit-model
  • Previous sorting algorithms
    • Algorithms and analysis are based on the word model.
  • Bit model: a more precise model
    • used to analyze the number of gates or components
      actually needed to build the device
    • close to the low-level machine
  • Sorting algorithms in the word model
    • key functions: comparison / store / send / receive
    • how do we change these functions to the bit model?

17
Sorting: Word-model vs. Bit-model (contd)
  • Comparison in the bit model
    • Method 1: use a linear array to compare the
      numbers bit by bit, starting with the MSB
    • Method 2: use a complete binary tree network. The
      result is condensed and propagated.
  • Method 2 is superior to the linear array method
    in two respects:
    • it uses $\log k + 1$ bit steps
    • it uses the tree to tell each leaf simultaneously
      which number to pass
  • Can we do better?
    • If there are many numbers to compare (consider
      insertion sort), pipelined execution on a linear
      array is better.
    • Total complexity: $2n + k - 2$ bit steps
  • Factors deciding the complexity: interconnection
    parameters such as bandwidth, diameter, and
    bisection width.
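
Method 1 can be illustrated with a small sequential sketch that examines one bit position per step, most significant bit first, and reports how many positions it had to look at. The function name and return convention are assumptions made for this example. It models only the bit-by-bit comparison, not the processors of the linear array.

```python
def compare_msb_first(a, b, k):
    """Compare two k-bit numbers one bit per step, most significant bit first.

    Returns (result, steps): result is -1, 0 or 1, and steps is the number
    of bit positions examined before the outcome was known.
    """
    for step in range(1, k + 1):
        shift = k - step
        bit_a = (a >> shift) & 1
        bit_b = (b >> shift) & 1
        if bit_a != bit_b:
            # The first differing bit decides the comparison.
            return (1 if bit_a > bit_b else -1), step
    return 0, k  # all k bits agree


print(compare_msb_first(0b1010, 0b1001, 4))
```

In the worst case (equal numbers, or numbers differing only in the last bit) all k positions are examined, which is why a tree that combines the bit results in parallel needs fewer bit steps.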

20
Sorting: Non-comparison based sorting
  • Sorting n k-bit numbers on a binary tree
    • Assume that each leaf consists of a k-cell linear
      array of bit processors.
    • Root: a $\log n$-cell linear array of bit
      processors
    • Each leaf initially contains one of the k-bit
      binary numbers to be sorted
    • The sorting completes when the ith leaf contains
      the ith smallest number
    • Analysis based on interconnection parameters
      indicates that the time complexity is
      $\Omega(nk)$ bit steps
    • The argument is correct if k is bigger than
      $(1+\epsilon)\log n$ for some constant
      $\epsilon$. But for smaller k, there are
      $O(\log n)$-time algorithms.
  • Sorting n 1-bit numbers on a binary tree
    • Change it to a counting problem.
    • After counting the number of 1s in the leaves (say
      m), set the values in the rightmost m leaves to
      1 and the values in the leftmost n-m leaves to 0.
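
The counting idea for 1-bit keys can be sketched as an upward pass followed by a downward pass over a complete binary tree. The recursion below only simulates the tree sequentially and uses made-up names. It assumes the number of leaves is a power of two.

```python
def sort_bits_on_tree(bits):
    """Sort n 1-bit values held at the leaves of a complete binary tree.

    Upward pass: every node learns how many 1s its subtree holds.
    Downward pass: the root's count m is split so that the rightmost
    m leaves end up holding 1 and the remaining leaves hold 0.
    """
    n = len(bits)
    leaves = [0] * n

    def count_ones(lo, hi):
        # Number of 1s in leaves lo..hi-1 (the value sent up to the parent).
        if hi - lo == 1:
            return bits[lo]
        mid = (lo + hi) // 2
        return count_ones(lo, mid) + count_ones(mid, hi)

    def place_ones(lo, hi, ones):
        # Put `ones` 1s into the rightmost leaves of the range lo..hi-1.
        if hi - lo == 1:
            leaves[lo] = ones
            return
        mid = (lo + hi) // 2
        right = min(ones, hi - mid)  # fill the right subtree first
        place_ones(mid, hi, right)
        place_ones(lo, mid, ones - right)

    if n:
        place_ones(0, n, count_ones(0, n))
    return leaves


print(sort_bits_on_tree([1, 0, 1, 1, 0, 0, 1, 0]))
```

On a real tree both passes proceed level by level, so the depth of the tree, not the number of leaves, determines the running time.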

23
Other Issues on Sorting
  • Other sorting algorithms
    • Quick sort
    • Radix sort
  • Extending to other topologies
    • Hypercube
    • Tree

24
Other sorting algorithms: Quick sort
  • Quick sort
  • Sequential version
    • Choose a pivot
    • Divide the list into two sub-lists: the elements
      smaller than or equal to the pivot and the
      elements larger than the pivot
    • Recurse on each sub-list
    • The choice of pivot is very important to avoid
      the worst case
  • Parallel version on 2-D mesh
    • Suppose we sort $p_m, p_{m+1}, p_{m+2}, \ldots, p_{m+k}$.
    • Choose a random pivot and broadcast this pivot to
      all k processors using an embedded tree.
    • Each processor propagates two values (the number
      of elements larger than the pivot and the number
      of elements smaller than the pivot) to its parent.
    • Information is propagated down the tree to enable
      each element to be moved to its proper position.
  • Parallel version on hypercube
    • Split along each dimension by a newly chosen
      pivot value.
25
Other sorting algorithms: Radix sort
  • Radix sort algorithm
    • Relies on the binary representation of the
      elements to be sorted
    • Examines the elements to be sorted r bits at a
      time, where $r < b$ and b is the number of bits
      per element.
    • Radix sort requires b/r iterations
  • Parallel radix sort
    • Load balance
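
A minimal sequential sketch of the radix sort described above follows. The slide does not say in which order the r-bit groups are examined. This example assumes least-significant group first with a stable bucket pass per group, and the function name is made up for illustration.

```python
def radix_sort(values, b, r):
    """Sort non-negative b-bit integers, examining r bits per iteration.

    Runs ceil(b / r) stable bucket passes, least significant group first.
    """
    items = list(values)
    mask = (1 << r) - 1
    for shift in range(0, b, r):
        buckets = [[] for _ in range(1 << r)]
        for v in items:
            # Stable distribution by the current r-bit group.
            buckets[(v >> shift) & mask].append(v)
        items = [v for bucket in buckets for v in bucket]
    return items


print(radix_sort([13, 2, 7, 0, 15, 8], b=4, r=2))
```

With b = 4 and r = 2 the loop runs b/r = 2 times. In a parallel version the bucket sizes may differ widely between processors, which is where the load-balance issue comes from.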

26
Summary
  • Sorting on 2-D: n-step algorithm
  • Sorting on 2-D: 0-1 sorting lemma
    • Proof of correctness and time complexity
  • Sorting on 2-D: $\sqrt{n}(\log n + 1)$-step
    algorithm
    • Shear sort
  • Sorting on 2-D: $3\sqrt{n} + o(\sqrt{n})$
    algorithm
    • Reducing dirty region
  • Sorting: matching lower bound
    • $3\sqrt{n} - o(\sqrt{n})$
  • Sorting on 2-D: word-model vs. bit-model