Complex Sorting - PowerPoint PPT Presentation

About This Presentation

Slides: 11
Provided by: Mariaj423

Transcript and Presenter's Notes


1
Complex Sorting

2
The Perl Sorting Paradigm
  • 1. Preprocess the input to extract the sortkeys.
  • 2. Sort the data by comparing the sortkeys.
  • 3. Postprocess the output to retrieve the data.
  • The steps may be separate, or chained into one expression:
        @out = map  POSTPROCESS($_) =>
               sort sortsub
               map  PREPROCESS($_) => @in;
  • The default sort:
        @out = sort @in;
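
As a rough illustration of the three steps kept separate, the sketch below sorts a few words by length. The sample data and the choice of length as the sortkey are assumptions made for the example, not part of the presentation.

```perl
use strict;
use warnings;

my @in = qw(pear fig banana kiwi);   # hypothetical sample data

# 1. Preprocess: pair each item with its sortkey (here, its length).
my @keyed = map { [ length($_), $_ ] } @in;

# 2. Sort by comparing the sortkeys; break ties on the item itself.
my @sorted = sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } @keyed;

# 3. Postprocess: drop the sortkeys and keep the data.
my @out = map { $_->[1] } @sorted;

print "@out\n";   # fig kiwi pear banana
```

Chaining the three statements into a single map/sort/map expression gives the same result without the intermediate arrays.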

3
Perl Sorting Techniques
  • Naive (no pre- or postprocessing): sortkeys
    recomputed on every comparison.
  • Cached sortkeys (the Orcish Maneuver): sortkeys
    cached in hashes.
  • The Schwartzian Transform: sortkeys cached in
    anonymous arrays.
  • The Packed-Default Sort: sortkeys and operands
    packed in strings.
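
The difference between the first two techniques can be sketched as follows. The helper `ip_key`, the call counter, and the sample addresses are hypothetical names and data introduced only for this illustration.

```perl
use strict;
use warnings;

my @in = ("10.0.0.10", "9.255.0.1", "10.0.0.2");   # hypothetical sample data

my $calls = 0;   # counts how often a sortkey is computed

# Hypothetical helper: turn a dotted quad into a single number.
sub ip_key {
    my ($ip) = @_;
    $calls++;
    my @q = $ip =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
    return (($q[0] * 256 + $q[1]) * 256 + $q[2]) * 256 + $q[3];
}

# Naive: the sortkey is recomputed for both operands on every comparison.
my @naive = sort { ip_key($a) <=> ip_key($b) } @in;
my $naive_calls = $calls;

# Orcish Maneuver ("or-cache"): compute each sortkey once, then reuse it.
# Note that ||= recomputes a key whose value is false (0, for 0.0.0.0).
$calls = 0;
my %key;
my @orcish = sort {
    ($key{$a} ||= ip_key($a)) <=> ($key{$b} ||= ip_key($b))
} @in;
my $orcish_calls = $calls;

print "@orcish\n";   # 9.255.0.1 10.0.0.2 10.0.0.10
print "naive: $naive_calls key computations, cached: $orcish_calls\n";
```

With the cache, the number of key computations grows with the number of items rather than with the number of comparisons.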

4
Schwartzian Transformation (ST)
Sort a list of strings according to a dotted-quad IP address.
    @out = map  $_->[0] =>
           sort { $a->[1] <=> $b->[1] ||
                  $a->[2] <=> $b->[2] ||
                  $a->[3] <=> $b->[3] ||
                  $a->[4] <=> $b->[4] }
           map  [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] =>
           @in;
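
A runnable version of this transform, written with block-style map for readability, might look like the sketch below. The log lines are hypothetical sample data; only the dotted quad in each line affects the ordering.

```perl
use strict;
use warnings;

# Hypothetical log lines, each starting with an IP address.
my @in = (
    "10.0.0.10 GET /index.html",
    "9.255.0.1 GET /about.html",
    "10.0.0.2 POST /login",
);

my @out =
    map  { $_->[0] }                                # 3. retrieve the line
    sort { $a->[1] <=> $b->[1] ||                   # 2. compare the octets
           $a->[2] <=> $b->[2] ||
           $a->[3] <=> $b->[3] ||
           $a->[4] <=> $b->[4] }
    map  { [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] }   # 1. cache the subkeys
    @in;

print "$_\n" for @out;
```

Each anonymous array holds the original line in slot 0 and the four octets in slots 1 to 4, so the regular expression runs once per line instead of once per comparison.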
5
ST with Packed Sortkeys
Concatenate the subkeys into a sortable string.
    @out = map  $_->[0] =>
           sort { $a->[1] cmp $b->[1] }
           map  [ $_, pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) ] =>
           @in;
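
Why a single `cmp` is enough can be checked on two keys. This is a small sketch; the variable names `$k2` and `$k10` are made up for the example.

```perl
use strict;
use warnings;

# pack('C4', ...) stores each octet (0..255) as one unsigned byte.
my $k2  = pack('C4', 10, 0, 0, 2);
my $k10 = pack('C4', 10, 0, 0, 10);

print length($k2), "\n";                      # 4
print join('.', unpack('C4', $k10)), "\n";    # 10.0.0.10

# As plain text, "10.0.0.10" sorts before "10.0.0.2" ...
my $as_text = "10.0.0.10" cmp "10.0.0.2";
print "text:   $as_text\n";                   # text:   -1

# ... but the packed keys compare in numeric order with a single cmp.
my $as_packed = $k10 cmp $k2;
print "packed: $as_packed\n";                 # packed: 1
```

Because every subkey occupies exactly one byte, a byte-by-byte string comparison of the packed keys visits the octets in the same order as the chained numeric comparisons.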
6
The Packed-Default Sort
Append the operands to the packed sortkeys.
    @out = map  substr($_, 4) =>
           sort
           map  pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ =>
           @in;
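
The same idea as a self-contained sketch, again in block-style map. The host records are hypothetical, and the code assumes every line contains a dotted quad whose octets fit in one byte each.

```perl
use strict;
use warnings;

# Hypothetical records: a host name followed by its address.
my @in = (
    "printer 192.168.1.3",
    "gateway 192.168.1.20",
    "nas 192.168.1.1",
);

my @out =
    map  { substr($_, 4) }                                   # strip the 4-byte key
    sort                                                     # default string sort
    map  { pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ }   # key . operand
    @in;

print "$_\n" for @out;
```

No comparison subroutine is called at all. Records with equal keys are ordered by the operand text that follows the key.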
7
Selected Benchmarks
CPU time (microseconds per line)
O(N log N) comparisons dominate the ST.
O(N) preprocessing dominates the packed-default sort (P-D).
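
A comparison of this kind could be set up with the core Benchmark module, roughly as sketched below. The input size, the iteration count, and the subroutine names `sort_st` and `sort_pd` are assumptions for illustration; the timings it prints are not the figures from the presentation and will vary by machine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical input: 2000 random dotted quads.
my @in = map { join('.', map { int(rand(256)) } 1 .. 4) } 1 .. 2000;

# Schwartzian Transform with packed sortkeys.
sub sort_st {
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [ $_, pack('C4', split(/\./, $_)) ] } @_;
}

# Packed-default sort: key and operand in one string, default comparison.
sub sort_pd {
    return map  { substr($_, 4) }
           sort
           map  { pack('C4', split(/\./, $_)) . $_ } @_;
}

# Run each variant 50 times and print the CPU time used.
timethese(50, {
    'ST'  => sub { my @out = sort_st(@in) },
    'P-D' => sub { my @out = sort_pd(@in) },
});
```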
8
Packing the Sortkeys
  • Strings: fixed or varying lengths; ascending or
    descending; can be case-insensitive.
  • Integers: chars, shorts, or longs; signed or
    unsigned; ascending or descending.
  • Floating-point numbers: floats or doubles;
    ascending or descending.
  • Indexes of strings (to achieve stable sorting) or
    indexes of arrays or hashes (for retrieval).
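
Two of these options can be combined in a short sketch: a signed integer key packed for descending order, followed by the record's index, which both keeps ties in input order and serves for retrieval. The record format and names are hypothetical, and the arithmetic assumes scores within the signed 32-bit range.

```perl
use strict;
use warnings;

my @in = ("alice 10", "bob -3", "carol 10", "dave 7");   # hypothetical records

my $i = 0;
my @packed = map {
    my ($score) = /(-?\d+)$/;
    # 'N' is an unsigned 32-bit big-endian integer, so byte order matches
    # numeric order. Subtracting from 2**31 - 1 maps larger scores to
    # smaller values, which yields a descending sort.
    pack('NN', 2_147_483_647 - $score, $i++);
} @in;

# The default sort orders by key, then by index; the index (bytes 4..7)
# is then used to fetch the original records.
my @out = @in[ map { unpack('N', substr($_, 4)) } sort @packed ];

print "$_\n" for @out;   # alice 10, carol 10, dave 7, bob -3 (one per line)
```

Here the operands are not appended to the keys at all; only the index travels through the sort.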

9
The Sort::Records Module
  • Combines the packed-default sort technique with
    automatic subkey extraction using a simple
    attribute/value syntax.
  • Sort /etc/passwd by user name:
        $sort1 = Sort::Records->new(width => 10, split => [':', 0]);
        @pw = $sort1->sort(`cat /etc/passwd`);
  • Sort /etc/passwd by user ID:
        $sort2 = Sort::Records->new(type => 'int', split => [':', 2]);
        @pw = $sort2->sort(`cat /etc/passwd`);
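
What such sorts might do internally can be sketched with the packed-default technique directly. The code below is not the module's API; the sample records, the fixed width of 10, and the 32-bit unsigned user ID are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical passwd-style records.
my @passwd = (
    "root:x:0:0:root:/root:/bin/sh",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/false",
    "alice:x:1000:1000:Alice:/home/alice:/bin/sh",
    "bob:x:999:999:Bob:/home/bob:/bin/sh",
);

# By user name: field 0, space-padded (or truncated) to 10 bytes.
my @by_name =
    map  { substr($_, 10) }
    sort
    map  { pack('A10', (split /:/)[0]) . $_ } @passwd;

# By user ID: field 2, packed as an unsigned 32-bit big-endian integer.
my @by_uid =
    map  { substr($_, 4) }
    sort
    map  { pack('N', (split /:/)[2]) . $_ } @passwd;

print join(" ", map { (split /:/)[0] } @by_name), "\n";   # alice bob daemon root
print join(" ", map { (split /:/)[0] } @by_uid),  "\n";   # root daemon bob alice
```

In both cases the key has a fixed width, so the postprocessing step is a plain `substr` at a known offset.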

10
Conclusions
  • Packing subkeys into sortable strings speeds up
    large sorts, using any sorting method.
  • Appending the operands to the sortkeys makes it
    possible to use the fast default lexicographic
    sort comparison.
  • The module Sort::Records encapsulates the code
    conveniently.
  • <URL:http://www.hpl.hp.com/personal/Larry_Rosler/sort/>
  • <URL:http://www.sysarch.com/perl/sort/>