Case Studies: Algorithms Sorting, FFT and All-pairs-meet

Slides: 14
Provided by: san7196
Learn more at: http://charm.cs.uiuc.edu

Transcript and Presenter's Notes

1
Case Studies: Algorithms: Sorting, FFT and All-pairs-meet
  • CS320
  • Spring 2003
  • Laxmikant Kale
  • http://charm.cs.uiuc.edu
  • Parallel Programming Laboratory
  • Dept. of Computer Science
  • University of Illinois at Urbana Champaign

2
Sorting
  • Problem
  • Given N records, each with an integer key
  • Distributed, possibly non-uniformly (and randomly),
    across processors to begin with
  • Sort them across processors
  • So that processor 0 holds the smallest keys, processor 1
    the next smallest, and so on
  • We will focus on keys only
  • The rest of the data moves with the keys
  • Either whenever the key moves, or once at the end.

3
Parallel Sorting Algorithms
  • Comparison based vs radix based
  • Algorithms in literature
  • Bitonic Sort
  • Radix sort
  • Load balanced sort
  • Sample sort
  • Multiphase Histogramming sort
  • We will focus on the last

4
Basic Idea
  • Phase I
  • Iteratively guess-and-correct a set of P-1
    splitter keys
  • These form the boundary values between processors
  • So that they divide the data equally
  • Phase II
  • Then have each processor send each record to the
    destined processor
  • How to do phase I?
  • Histograms
  • Let proc 0 guess the splitter keys and broadcast them
  • All processors compute the number of keys in each
    partition
  • Reduction to proc 0, which adjusts the splitter
    keys and repeats
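
The guess-and-correct loop and the final routing step can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how proc 0 adjusts its guesses, so the sketch uses a simple bisection on each splitter; the "processors" are plain Python lists; keys are assumed to be integers in a known range [key_min, key_max]; and the names find_splitters, local_counts and destination are hypothetical.

```python
from bisect import bisect_left, bisect_right


def local_counts(sorted_keys, splitters):
    """Histogram step on one processor: local keys <= each splitter."""
    return [bisect_right(sorted_keys, s) for s in splitters]


def find_splitters(local_data, key_min, key_max):
    """Phase I: refine P-1 splitter guesses until the data divides equally.

    Assumption: each splitter is corrected by bisection on the key range.
    The slides only say that proc 0 "adjusts" the guesses.
    """
    P = len(local_data)
    local_sorted = [sorted(keys) for keys in local_data]
    N = sum(len(keys) for keys in local_data)
    # Splitter i should have (i+1)*N/P keys at or below it.
    targets = [(i + 1) * N // P for i in range(P - 1)]
    lo = [key_min] * (P - 1)
    hi = [key_max] * (P - 1)
    iterations = 0
    while any(l < h for l, h in zip(lo, hi)):
        guess = [(l + h) // 2 for l, h in zip(lo, hi)]   # proc 0 guesses, "broadcast"
        per_proc = [local_counts(keys, guess) for keys in local_sorted]
        totals = [sum(col) for col in zip(*per_proc)]    # "reduction" to proc 0
        for i, (total, target) in enumerate(zip(totals, targets)):
            if total < target:
                lo[i] = guess[i] + 1   # guess too small: too few keys below it
            else:
                hi[i] = guess[i]       # guess large enough: try smaller
        iterations += 1
    return lo, iterations


def destination(key, splitters):
    """Phase II: the processor whose key range contains this key."""
    return bisect_left(splitters, key)


data = [[50, 3, 99, 42], [7, 8, 61], [15, 90, 23, 77, 1]]
splitters, rounds = find_splitters(data, 0, 100)
print(splitters, rounds)
```

With distinct keys, as in this toy input, each of the three processors ends up responsible for exactly four keys. The number of rounds here depends only on the width of the key range, which is one reason a real implementation might prefer a smarter correction rule.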

5
Problems with basic idea
  • Histogramming may take too long
  • Optimizations
  • use m(P-1) splitter keys for faster convergence
  • Major change
  • Make the algorithm a multiphase one
  • In the first phase, partition the data into k
    partitions of P/k processors each
  • Repeat the algorithm in each phase, within each partition
  • Consequence: the histogram in each phase is small
  • But each key moves multiple times (may be OK)
  • Challenge: data transfer in early phases
  • To which processor should processor i send its data for
    partition j?
  • There are multiple processors in that partition
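
As a rough worked example of the trade-off, the number of phases needed for P processors is the smallest number of k-way splits that separates them all. The sketch below assumes every phase splits each current group k ways (the last phase may need fewer); the function name num_phases is hypothetical.

```python
def num_phases(P, k):
    """Smallest number of k-way splitting phases so that k**phases >= P."""
    if k < 2:
        raise ValueError("k must be at least 2")
    phases, groups = 0, 1
    while groups < P:
        groups *= k      # each phase multiplies the number of groups by k
        phases += 1
    return phases


for k in (4, 8, 16, 128):
    print(k, num_phases(128, k))
```

For P = 128 this gives 4, 3, 2 and 1 phases for k = 4, 8, 16 and 128, which agrees with the Phases column in the 128-processor results later in these slides. Each phase then needs on the order of k-1 splitter keys per group rather than P-1.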

6
Solution to data transfer challenge
  • Use the reduction-broadcast phase to generate
    data-transfer schedules
  • Do the histogram reduction explicitly over a
    spanning tree
  • I.e. don't call MPI_Reduce
  • At each intermediate node of the tree
  • Record the histograms for each subtree
  • At the root, when the final splitter keys are
    decided
  • Send a quintuple to each child, for each
    partition
  • (startProc, numStart, numMid, endProc, numEnd)
  • Each intermediate node can use this quintuple,
    and the stored (last) histogram to generate
    quintuples for each child
  • At the leaf,
  • each proc knows, for each partition, how many
    keys to send to each processor
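
One way to read the quintuple is as a compact description of a run of destination slots: numStart keys go to startProc, numMid keys go to each processor strictly between startProc and endProc, and numEnd keys go to endProc. The slides do not spell this out, so the sketch below is an interpretation, not the original implementation. It shows how a tree node could split its quintuple among its children using the per-child counts it recorded during the reduction; expand, compress and split_quintuple are hypothetical names.

```python
def expand(q):
    """List the (processor, key_count) slots that a quintuple describes."""
    start, n_start, n_mid, end, n_end = q
    if start == end:
        return [(start, n_start)]
    middle = [(p, n_mid) for p in range(start + 1, end)]
    return [(start, n_start)] + middle + [(end, n_end)]


def compress(slots, n_mid):
    """Pack a run of consecutive slots back into a quintuple."""
    first_proc, first_count = slots[0]
    last_proc, last_count = slots[-1]
    if len(slots) == 1:
        return (first_proc, first_count, n_mid, first_proc, 0)
    return (first_proc, first_count, n_mid, last_proc, last_count)


def split_quintuple(q, child_counts):
    """Give each child (in order) the next child_counts[i] keys' worth of slots.

    child_counts comes from the histogram stored at this tree node.
    A child that holds no keys for this partition gets None.
    """
    slots = expand(q)
    if sum(child_counts) != sum(count for _, count in slots):
        raise ValueError("child counts must add up to the quintuple's total")
    result = []
    i = 0
    available = slots[0][1]          # keys still unassigned in slot i
    for wanted in child_counts:
        part = []
        while wanted > 0:
            if available == 0:       # current slot is full, move to the next
                i += 1
                available = slots[i][1]
                continue
            taken = min(wanted, available)
            part.append((slots[i][0], taken))
            wanted -= taken
            available -= taken
        result.append(compress(part, q[2]) if part else None)
    return result


# Partition destined for processors 4..6: 10 keys to proc 4, 20 to proc 5, 5 to proc 6.
children = split_quintuple((4, 10, 20, 6, 5), [12, 23])
print(children)
```

At a leaf, expanding the received quintuple yields the per-processor send counts directly, for example 18 keys to processor 5 and 5 keys to processor 6 for the second child above.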

7
Performance
P       Ncube/2   iPSC/860
64      12.3      3.87
128     6.87      2.66
256     3.93
512     2.46
1024    2.00
With number of partitions k = 8

8
Non uniform data
Distribution   Histo iteration   Hist time (%)   Total time
D1             332               4.4             4.0
D2             432               5               3.93
D3             1183              10.1            4.23
D4             1252              10.4            4.28
D5             1531              8.2             5.2

9
Effect of using multiple phases: 128 procs
K     Phases   Hist %   Move   Total time
4     4        2.2      12.6   7.27
8     3        2.1      10.4   6.87
16    2        2.7      8.1    6.44
128   1        7.9      7.9    6.81

10
Further optimizations?
  • Your suggestions?

11
All-pairs-meet
  • Example
  • N atoms distributed across P processors
  • Must calculate forces between each pair of atoms
  • (Better algorithms exist, but we will focus on
    explicit calculation for illustration)
  • Straightforward implementation
  • Each proc broadcasts its atoms to all
  • Problems?

12
Ring
  • Each processor sends its atoms to the next
    processor
  • In each subsequent phase, they forward the forces
    and atoms to the next proc
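
The ring scheme can be simulated sequentially to check that every pair of atoms meets. The sketch below is a toy under stated assumptions: atoms are 1-D positions, the force law toy_force is made up purely so the result can be checked by hand, and each travelling block carries its accumulated forces with it, as the slide suggests. The names ring_all_pairs and toy_force are hypothetical.

```python
def toy_force(a, b):
    """Made-up pairwise 'force' on atom a due to atom b (zero for a == b)."""
    return b - a


def ring_all_pairs(blocks):
    """Simulate P processors passing atom blocks (with forces) around a ring."""
    P = len(blocks)
    # travelling[p] is the (home, atoms, forces) block currently held by proc p.
    travelling = [(p, blocks[p], [0.0] * len(blocks[p])) for p in range(P)]
    for _ in range(P):
        updated = []
        for p in range(P):
            home, atoms, forces = travelling[p]
            # Proc p adds the effect of its resident atoms on the visitors.
            forces = [f + sum(toy_force(a, b) for b in blocks[p])
                      for a, f in zip(atoms, forces)]
            updated.append((home, atoms, forces))
        # Every processor forwards its visiting block to the next processor.
        travelling = [updated[(p - 1) % P] for p in range(P)]
    # After P hops each block is back on its home processor.
    result = [None] * P
    for home, _, forces in travelling:
        result[home] = forces
    return result


blocks = [[0.0, 1.0], [2.0], [5.0, 7.0]]
forces = ring_all_pairs(blocks)
print(forces)
```

Each processor only ever talks to its neighbour, so no processor has to receive all N atoms at once, unlike the broadcast version. In this simple form every cross-processor pair is evaluated twice, once from each side.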

13
Pair-objects
  • Let there be a set of k × k objects
  • Each responsible for calculating interaction
    between a subset of pairs of processors