An Install-Time System for Automatic Generation of Optimized Parallel Sorting Algorithms

1
An Install-Time System for Automatic Generation
of Optimized Parallel Sorting Algorithms
  • Marek Olszewski and Michael Voss
  • ECE Department
  • University of Toronto

2
Motivation
  • Sorting is a fundamental algorithm
  • Many algorithmic choices for sorting
  • Performance heavily influenced by
    • Data being sorted (type, entropy)
    • Target machine being used
  • How can we build the best sort for a given
    machine?
    • An empirical install-time system

3
Outline of Talk
  • Motivation
  • An Overview of Sorting Algorithms
  • Our install-time empirical system
  • An adaptive hybrid sequential sort
  • An adaptive hybrid parallel sort
  • An Evaluation
  • Related Work
  • Conclusions

4
An overview of sorting algorithms
  • The Art of Computer Programming, Vol. 3 (Knuth)
    • 25 algorithms comprehensively studied
  • Comparison sorts
    • Lower bound shown to be Ω(n log n)
    • Examples include insertion sort, quicksort and
      merge sort
  • Non-comparison sorts
    • Can be linear time, i.e. O(n)
    • But require knowing the range of the data
    • Examples include radix sort and bucket sort
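
To make the "linear time, but requires knowing the range" point concrete, here is a minimal counting sort sketch. Counting sort is not one of the examples named on the slide; it is used here only because it is the shortest non-comparison sort to write down. The function name `counting_sort` and its `lo`/`hi` parameters are illustrative assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Counting sort for integer keys known to lie in [lo, hi].
// Runs in O(n + k) time, where k = hi - lo + 1 is the size of the key range,
// so it is only "linear" when the range is known and not much larger than n.
void counting_sort(std::vector<int>& data, int lo, int hi) {
    if (lo > hi) throw std::invalid_argument("empty key range");
    const std::size_t k =
        static_cast<std::size_t>(static_cast<long long>(hi) - lo) + 1;
    std::vector<std::size_t> counts(k, 0);

    // Pass 1: histogram of key occurrences.
    for (int x : data) {
        if (x < lo || x > hi) throw std::out_of_range("key outside declared range");
        ++counts[static_cast<std::size_t>(static_cast<long long>(x) - lo)];
    }

    // Pass 2: rewrite the input in key order.
    std::size_t out = 0;
    for (std::size_t key = 0; key < k; ++key) {
        for (std::size_t c = 0; c < counts[key]; ++c) {
            data[out++] = static_cast<int>(lo + static_cast<long long>(key));
        }
    }
}
```

No element is ever compared with another element, which is why the Ω(n log n) comparison bound does not apply; the price is the `counts` array, whose size depends on the key range rather than on n.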

5
An overview of sorting algorithms
  • Hybrid sorts
    • Divide and conquer sorts are recursive
    • May be beneficial to switch algorithms
  • Most C++ STL sorts are hybrid sorts
    • GNU std::sort is a hybrid sort with pre-defined
      points to switch between heap sort, quicksort,
      merge sort and insertion sort
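
A hybrid sort with a pre-defined switch point can be sketched as follows. This is not the GNU implementation; it is a minimal quicksort that hands small subproblems to insertion sort, and the cutoff value `kInsertionCutoff` is a made-up constant of the kind an install-time system would try to tune.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Iter = std::vector<int>::iterator;

// Hypothetical fixed switch point; not a measured value.
constexpr std::ptrdiff_t kInsertionCutoff = 16;

// Plain insertion sort on [first, last): cheap for very small ranges.
void insertion_sort(Iter first, Iter last) {
    for (Iter i = first; i != last; ++i) {
        int key = *i;
        Iter j = i;
        while (j != first && *(j - 1) > key) {
            *j = *(j - 1);
            --j;
        }
        *j = key;
    }
}

// Quicksort that switches to insertion sort below the cutoff.
void hybrid_sort(Iter first, Iter last) {
    while (last - first > kInsertionCutoff) {
        const int pivot = *(first + (last - first) / 2);
        // Three-way split: [first, mid1) < pivot, [mid1, mid2) == pivot,
        // [mid2, last) > pivot. The middle part is never empty.
        Iter mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
        Iter mid2 = std::partition(mid1, last, [pivot](int x) { return x == pivot; });
        // Recurse into the smaller side, loop on the larger one.
        if (mid1 - first < last - mid2) {
            hybrid_sort(first, mid1);
            first = mid2;
        } else {
            hybrid_sort(mid2, last);
            last = mid1;
        }
    }
    insertion_sort(first, last);  // the algorithm switch happens here
}
```

Calling `hybrid_sort(v.begin(), v.end())` sorts `v` in place. The only machine-dependent choice in this sketch is the single constant; the system described in these slides replaces such constants with a learned decision function.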

6
An overview of parallel sorts
  • Ideally, O((n log n) / p)
    • If p = n, then O(log n)
    • Several parallel sorts demonstrate this bound,
      e.g. column sort
  • Parallelized sequential sorts often better for
    low numbers of processors (our focus)
  • Parallelized divide and conquer algorithms
    • Effective for small numbers of processors
    • Use a work-queue model
    • Tasks are placed in a shared work queue
    • Idle processors remove tasks from the queue
    • Good load balance

7
Our install-time system
[Flowchart of the install-time process]
  • Start: sample input data provided to installer
  • Time sorts: random algorithms at each recursive step
  • Calculate best sorting algorithm for each data set size
  • C4.5 creates decision tree
  • Convert tree to C++
  • Specialized decision function placed in library
  • Parallel? If no, end
  • If yes, time sorts: different input sizes and work-share
    points
  • Work-share cutoff point tree and C++ functions generated
  • End

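
The decision function that the flow above places in the library might look roughly like the sketch below: nested `if` tests that mirror the splits of a decision tree. Everything specific here is an assumption for illustration: the `SortAlgo` labels, the name `choose_sort`, the use of input size as the only attribute, and both thresholds.

```cpp
#include <cstddef>

// Hypothetical labels for the algorithms the hybrid sort can pick from.
enum class SortAlgo { Insertion, Quick, Merge };

// Sketch of a generated decision function. Each `if` corresponds to one
// split node of a decision tree; the thresholds are invented, not measured.
inline SortAlgo choose_sort(std::size_t n) {
    if (n <= 32) {
        return SortAlgo::Insertion;
    }
    if (n <= 100000) {
        return SortAlgo::Quick;
    }
    return SortAlgo::Merge;
}
```

A hybrid sort would call such a function at every recursive step and dispatch on the result, so the choice of algorithm can change as the subproblems shrink.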
8
Algorithms available to our hybrid sort
9
Hybrid Adaptive Sequential Sort
  • Use random data to train system
  • Up to 10 million elements
  • Insertion sort not used for large inputs
  • Not all inputs sorted to completion
  • Dynamic programming used to find best choice
  • Assume best sort at each subsequent step
  • Per-step timings were measured
  • C4.5 decision tree used to analyze this data
  • C4.5 tree converted to C++ template code
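
The dynamic-programming step can be illustrated with a small sketch. It assumes, purely for illustration, that sizes are powers of two, that a recursive algorithm splits a problem of size 2^k into two halves of size 2^(k-1), and that the per-step timings arrive as a table. The names `Plan`, `plan_sorts`, `step_cost` and `recursive` are hypothetical, and at least one non-recursive algorithm is expected so that the smallest size has a finite cost.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Result of the planning pass: for each size level k (size 2^k), the cheapest
// total cost found and the index of the algorithm that achieves it.
struct Plan {
    std::vector<double> best_cost;
    std::vector<std::size_t> best_algo;
};

// step_cost[a][k]: time of ONE step of algorithm a on 2^k elements.
// recursive[a]:    true if algorithm a then sorts two halves of 2^(k-1)
//                  elements; false if its step sorts the input completely.
Plan plan_sorts(const std::vector<std::vector<double>>& step_cost,
                const std::vector<bool>& recursive) {
    const std::size_t levels = step_cost.empty() ? 0 : step_cost[0].size();
    Plan plan{
        std::vector<double>(levels, std::numeric_limits<double>::infinity()),
        std::vector<std::size_t>(levels, 0)};

    for (std::size_t k = 0; k < levels; ++k) {          // smallest sizes first
        for (std::size_t a = 0; a < step_cost.size(); ++a) {
            double cost = step_cost[a][k];
            if (recursive[a]) {
                if (k == 0) continue;                   // nothing left to split
                // Assume the best known sort is used for both subproblems.
                cost += 2.0 * plan.best_cost[k - 1];
            }
            if (cost < plan.best_cost[k]) {
                plan.best_cost[k] = cost;
                plan.best_algo[k] = a;
            }
        }
    }
    return plan;
}
```

Because each level only needs one step of each algorithm plus the already computed best cost of the level below, no candidate has to be run to completion on large inputs, which matches the remark that not all inputs are sorted to completion.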

10
Hybrid Adaptive Parallel Sort
  • Start with sequential hybrid sort
  • Determine work-sharing cutoff point
    • When should a thread execute its own tasks?
    • When should a thread place tasks in the work queue?
    • Determines the task size below which the work is
      too small to amortize synchronization costs
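
The two questions above can be sketched in code as a quicksort running on a shared work queue. A subrange at least as large as the cutoff is placed in the queue for any idle thread; a smaller one is sorted by the thread that produced it, since locking the queue would cost more than the work is worth. This is a sketch under assumptions, not the authors' implementation: the class name `WorkQueueSorter`, the function `parallel_sort`, and the use of `std::sort` for small ranges are all illustrative, and the cutoff is passed in rather than learned.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class WorkQueueSorter {
public:
    using Range = std::pair<std::ptrdiff_t, std::ptrdiff_t>;  // half-open [first, second)

    WorkQueueSorter(std::vector<int>& data, std::ptrdiff_t share_cutoff)
        : data_(data), cutoff_(std::max<std::ptrdiff_t>(share_cutoff, 2)) {}

    void run(unsigned num_threads) {
        push(Range(0, static_cast<std::ptrdiff_t>(data_.size())));
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < std::max(num_threads, 1u); ++i)
            workers.emplace_back([this] { worker(); });
        for (auto& w : workers) w.join();
    }

private:
    // Idle threads block here until a task arrives or all work is finished.
    void worker() {
        for (;;) {
            Range r;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !queue_.empty() || pending_ == 0; });
                if (queue_.empty()) return;  // pending_ == 0: nothing left anywhere
                r = queue_.front();
                queue_.pop_front();
            }
            process(r);
            std::lock_guard<std::mutex> lock(m_);
            if (--pending_ == 0) cv_.notify_all();
        }
    }

    void process(Range r) {
        std::ptrdiff_t lo = r.first, hi = r.second;
        const auto base = data_.begin();
        while (hi - lo >= cutoff_) {
            const int pivot = data_[static_cast<std::size_t>(lo + (hi - lo) / 2)];
            auto mid1 = std::partition(base + lo, base + hi,
                                       [pivot](int x) { return x < pivot; });
            auto mid2 = std::partition(mid1, base + hi,
                                       [pivot](int x) { return x == pivot; });
            const std::ptrdiff_t m1 = mid1 - base, m2 = mid2 - base;
            if (m1 - lo >= cutoff_) push(Range(lo, m1));  // large: share via the queue
            else std::sort(base + lo, mid1);              // small: execute it ourselves
            lo = m2;                                      // keep the right-hand part
        }
        std::sort(base + lo, base + hi);
    }

    void push(Range r) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(r);
            ++pending_;  // counts tasks that are queued or still being processed
        }
        cv_.notify_one();
    }

    std::vector<int>& data_;
    std::ptrdiff_t cutoff_;
    std::deque<Range> queue_;
    std::size_t pending_ = 0;
    std::mutex m_;
    std::condition_variable cv_;
};

void parallel_sort(std::vector<int>& data, unsigned num_threads,
                   std::ptrdiff_t share_cutoff) {
    WorkQueueSorter(data, share_cutoff).run(num_threads);
}
```

For example, `parallel_sort(v, 4, 4096)` sorts `v` with four worker threads and a hypothetical cutoff of 4096 elements. Tasks always cover disjoint ranges of the vector, so the only synchronization is on the queue itself, and that cost is exactly what the cutoff is meant to keep amortized.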

11
Methodology: Platforms
  • Sequential platforms
    • Linux 2.4.18 on a 1.6 GHz Intel Pentium 4 Xeon
    • Linux 2.4.24 on an AMD Athlon XP 1700+
    • SunOS 5.8 on a 600 MHz SPARC workstation
  • Parallel platform
    • 4-processor 1.6 GHz Intel Xeon SMP
    • Modified 2.4.18-smp kernel (allowed binding)

12
Methodology: Comparisons
  • Adaptive Hybrid Sequential Sort
  • Adaptive Hybrid Parallel Sort
  • GNU G++ 2.96 std::sort and std::stable_sort
    • Also hybrid sorts
    • Complex, not easily parallelized
    • 8 equally sized merge sorts that called std::sort
      and std::stable_sort in parallel
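
One plausible reading of this parallel baseline is sketched below: split the input into 8 equal parts, sort each part with `std::sort` on its own thread, then merge the sorted parts. The slides do not say how the merging was organized, so the sequential pairwise merge here is an assumption, as are the name `chunked_parallel_sort` and its `parts` parameter.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

void chunked_parallel_sort(std::vector<int>& data, std::size_t parts = 8) {
    if (parts == 0) parts = 1;
    const std::size_t n = data.size();

    // bounds[i] is the start of part i; bounds[parts] == n.
    std::vector<std::size_t> bounds(parts + 1);
    for (std::size_t i = 0; i <= parts; ++i) bounds[i] = i * n / parts;

    // Sort every part in parallel with the library sort.
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < parts; ++i) {
        threads.emplace_back([&data, &bounds, i] {
            std::sort(data.begin() + bounds[i], data.begin() + bounds[i + 1]);
        });
    }
    for (auto& t : threads) t.join();

    // Merge neighbouring sorted runs pairwise until one run remains.
    for (std::size_t width = 1; width < parts; width *= 2) {
        for (std::size_t i = 0; i + width < parts; i += 2 * width) {
            const std::size_t end = std::min(i + 2 * width, parts);
            std::inplace_merge(data.begin() + bounds[i],
                               data.begin() + bounds[i + width],
                               data.begin() + bounds[end]);
        }
    }
}
```

Swapping `std::sort` for `std::stable_sort` gives the second variant. The parts are fixed up front, so unlike a work-queue scheme this baseline cannot rebalance when some parts take longer than others.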

13
Serial Non-Optimized (w/o O) Results
14
Serial Optimized (with O) Results
15
Parallel Work-share Cutoff Point
16
Parallel Non-Optimized (w/o O) Results
17
Parallel Optimized (with O) Results
18
Parallel Sort Speedups
19
Related Work
  • Install-time empirical optimization systems
    • ATLAS: Level 3 BLAS
    • FFTW: FFT
  • STAPL: Adaptive Parallel C++ Library
    • Uses decision trees like our approach
    • Uses only single-level sorts, not hybrids
    • Not available for comparison
  • A Dynamically Tuned Sorting Library (CGO '04)
    • Install-time tuning of sequential sorts
    • Only single-level sorts, not hybrid

20
Conclusion
  • Presented an install-time system for empirically
    constructing a best sorting algorithm for a
    target machine
  • Competitive with STL sort on 1 processor
  • Better than a parallelized STL sort on multiple
    processors