Title: Complex Sorting
The Perl Sorting Paradigm
1. Preprocess the input to extract the sortkeys.
2. Sort the data by comparing the sortkeys.
3. Postprocess the output to retrieve the data.
- These may be separate steps:

      @out = map  POSTPROCESS($_) =>
             sort sortsub
             map  PREPROCESS($_) =>
             @in;

- The default sort:

      @out = sort @in;
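Written as separate statements, the three steps look like the following minimal sketch. The word list and the choice of `lc` (case-insensitive order) as the sortkey are illustrative assumptions, not part of the slide.

```perl
use strict;
use warnings;

my @in = qw(banana Apple cherry date Elder);

# 1. Preprocess: pair each operand with its sortkey (here, the lowercased word).
my @keyed = map { [ $_, lc $_ ] } @in;

# 2. Sort by comparing only the sortkeys.
my @sorted = sort { $a->[1] cmp $b->[1] } @keyed;

# 3. Postprocess: retrieve the original operands.
my @out = map { $_->[0] } @sorted;

print "@out\n";        # Apple banana cherry date Elder

# The default sort compares the operands themselves (uppercase sorts first in ASCII).
my @default = sort @in;
print "@default\n";    # Apple Elder banana cherry date
```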
Perl Sorting Techniques
- Naive (no pre- or postprocessing)
  - Sortkeys recomputed on every comparison.
- Cached sortkeys: the Orcish Maneuver
  - Sortkeys cached in hashes.
- The Schwartzian Transform
  - Sortkeys cached in anonymous arrays.
- The Packed-Default Sort
  - Sortkeys and operands packed in strings.
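A minimal sketch contrasting the naive sort with the Orcish Maneuver, using the IP-address example that appears later in this talk. The helper `ip_key` and the sample lines are hypothetical. The `//=` operator needs Perl 5.10 or later; the classic form uses `||=`, which recomputes any key that happens to be false, such as "0" or the empty string.

```perl
use strict;
use warnings;

my @in = ('gateway 192.168.1.1', 'dns 10.0.0.53', 'web 192.168.1.20',
          'db 10.0.0.5', 'mail 9.255.0.1');

# Hypothetical helper: a 4-byte sortkey built from the first dotted quad in a string.
sub ip_key { return pack 'C4', $_[0] =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/ }

# Naive: the sortkey is recomputed for both operands on every comparison.
my $naive_keys = 0;
my @naive = sort { $naive_keys += 2; ip_key($a) cmp ip_key($b) } @in;

# Orcish Maneuver ("or-cache"): compute each sortkey once and cache it in a hash.
my %key;
my $orcish_keys = 0;
my @orcish = sort {
    ( $key{$a} //= do { $orcish_keys++; ip_key($a) } )
        cmp
    ( $key{$b} //= do { $orcish_keys++; ip_key($b) } )
} @in;

print "naive: $naive_keys key computations, orcish: $orcish_keys\n";
```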
Schwartzian Transform (ST)
Sort a list of strings by the dotted-quad IP address that each one contains.

    @out =
        map  $_->[0] =>
        sort { $a->[1] <=> $b->[1] ||
               $a->[2] <=> $b->[2] ||
               $a->[3] <=> $b->[3] ||
               $a->[4] <=> $b->[4] }
        map  [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] =>
        @in;
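A runnable version of the ST above, with made-up input lines. Each string must contain a dotted quad; otherwise its subkeys are undefined and `<=>` warns.

```perl
use strict;
use warnings;

my @in = ('gateway 192.168.1.1', 'dns 10.0.0.53', 'web 192.168.1.20',
          'db 10.0.0.5', 'mail 9.255.0.1');

my @out =
    map  $_->[0] =>                               # postprocess: keep the operand
    sort { $a->[1] <=> $b->[1] ||                 # compare the four numeric
           $a->[2] <=> $b->[2] ||                 # subkeys in turn, moving on
           $a->[3] <=> $b->[3] ||                 # only when the previous
           $a->[4] <=> $b->[4] }                  # subkeys are equal
    map  [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] =>  # preprocess: [operand, subkeys]
    @in;

print "$_\n" for @out;
```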
ST with Packed Sortkeys
Concatenate the subkeys into a sortable string.

    @out =
        map  $_->[0] =>
        sort { $a->[1] cmp $b->[1] }
        map  [ $_, pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) ] =>
        @in;
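Why packing works: 'C4' turns each octet into one unsigned byte, so a byte-wise string comparison of the packed keys agrees with the numeric order of the addresses, while comparing the dotted-quad text does not. A small check, with two arbitrarily chosen addresses:

```perl
use strict;
use warnings;

# Packed sortkeys for two dotted quads: one unsigned byte per octet (0..255).
my ($k1, $k2) = map { pack 'C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/ } '9.255.0.1', '10.0.0.5';

printf "%s %s\n", unpack('H*', $k1), unpack('H*', $k2);   # 09ff0001 0a000005

# The packed keys compare in numeric order with a single string comparison ...
print "packed:   ", ($k1 cmp $k2), "\n";                    # -1
# ... whereas comparing the dotted-quad text does not.
print "unpacked: ", ('9.255.0.1' cmp '10.0.0.5'), "\n";     # 1
```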
The Packed-Default Sort
Append the operands to the packed sortkeys.

    @out =
        map  substr($_, 4) =>
        sort
        map  pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ =>
        @in;
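A runnable version of the packed-default sort with the same kind of made-up input. The comparison is Perl's built-in default string comparison, so no Perl code runs per comparison. Note that under `use locale` the default sort follows the locale's collation instead, which could break the byte ordering this technique relies on.

```perl
use strict;
use warnings;

my @in = ('gateway 192.168.1.1', 'dns 10.0.0.53', 'web 192.168.1.20',
          'db 10.0.0.5', 'mail 9.255.0.1');

my @out =
    map  substr($_, 4) =>                                    # strip the 4-byte key
    sort                                                     # default sort: no sortsub
    map  pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ =>  # packed key . operand
    @in;

print "$_\n" for @out;
```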
Selected Benchmarks
[Table: CPU time (microseconds per line)]
For the ST, the O(N log N) comparisons dominate the cost; for the P-D sort, the O(N) preprocessing dominates.
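The benchmark figures are not reproduced here, but the cost model can be illustrated by counting work. This sketch (random input with a fixed seed; the two counters are hypothetical instrumentation) counts how often the preprocessing map and the sortsub run in a packed-key ST: the former runs once per element, the latter roughly N log N times. The packed-default sort has no sortsub at all, so its comparisons stay inside Perl's C code.

```perl
use strict;
use warnings;

srand 42;    # fixed seed so repeated runs use the same input
my @in = map { join '.', map { int rand 256 } 1 .. 4 } 1 .. 1000;

# ST with packed sortkeys, instrumented to count the work done in Perl code.
my ($preprocessed, $compared) = (0, 0);
my @out =
    map  $_->[0] =>
    sort { $compared++; $a->[1] cmp $b->[1] }
    map  { $preprocessed++; [ $_, pack 'C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] }
    @in;

printf "N = %d: %d preprocessing calls, %d sortsub calls\n",
    scalar @in, $preprocessed, $compared;
```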
Packing the Sortkeys
- Strings: fixed or varying lengths; ascending or descending; can be case-insensitive
- Integers: chars, shorts, or longs; signed or unsigned; ascending or descending
- Floating-point numbers: floats or doubles; ascending or descending
- Indexes of strings (to achieve stable sorting) or indexes of arrays or hashes (for retrieval)
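A minimal sketch combining several of the key types above in one packed string per record. The record layout, field widths and sample data are assumptions for illustration. Instead of appending the operand, it appends the array index, which both breaks ties (giving a stable order) and retrieves the record afterwards. Signed integers and floating-point numbers need an extra bit manipulation before packing, which this sketch omits.

```perl
use strict;
use warnings;

# Hypothetical records: [name, department, salary].
my @recs = (
    [ 'alice', 'Sales', 50000 ],
    [ 'bob',   'eng',   72000 ],
    [ 'carol', 'Eng',   65000 ],
    [ 'dave',  'sales', 50000 ],
);

# One packed string per record:
#   a8 : department, lowercased (case-insensitive), null-padded or truncated to 8 bytes, ascending
#   N  : salary as an unsigned 32-bit big-endian integer, complemented for descending order
#   N  : the record's array index, which breaks ties stably and is used for retrieval
my @packed = map {
    my ( undef, $dept, $salary ) = @{ $recs[$_] };
    pack 'a8 N N', lc $dept, 0xFFFFFFFF - $salary, $_;
} 0 .. $#recs;

# Default sort, then recover each record from the index in the last 4 bytes.
my @sorted = map {
    my ($i) = unpack 'N', substr( $_, -4 );
    $recs[$i];
} sort @packed;

print join( ' ', map { $_->[0] } @sorted ), "\n";    # bob carol alice dave
```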
The SortRecords Module
- Combines the packed-default sort technique with automatic subkey extraction using a simple attribute/value syntax.
- Sort /etc/passwd by user name:

      $sort1 = SortRecords->new(width => 10, split => [':', 0]);
      @pw = $sort1->sort(`cat /etc/passwd`);

- Sort /etc/passwd by user ID:

      $sort2 = SortRecords->new(type => 'int', split => [':', 2]);
      @pw = $sort2->sort(`cat /etc/passwd`);
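The slide shows only these two calls, so the following is not the SortRecords module itself. It is a hand-written packed-default sort that should give the same orderings, assuming that the `split` attribute names a separator (a colon) and a field number, that `width => 10` pads or truncates the user name to 10 bytes, and that `type => 'int'` means an unsigned integer key. The sample lines are made up.

```perl
use strict;
use warnings;

# Made-up /etc/passwd-style lines, used instead of reading the real file.
my @lines = (
    "root:x:0:0:root:/root:/bin/sh\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000::/home/alice:/bin/sh\n",
    "bob:x:501:20::/home/bob:/bin/sh\n",
);

# By user name (field 0): null-padded to a fixed width of 10 bytes.
my @by_name =
    map  substr($_, 10) =>
    sort
    map  pack('a10', (split /:/)[0]) . $_ =>
    @lines;

# By user ID (field 2): an unsigned 32-bit big-endian integer.
my @by_uid =
    map  substr($_, 4) =>
    sort
    map  pack('N', (split /:/)[2]) . $_ =>
    @lines;

print @by_name, "\n", @by_uid;
```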
Conclusions
- Packing subkeys into sortable strings speeds up large sorts, whichever of the sorting methods above is used.
- Appending the operands to the sortkeys makes it possible to use the fast default lexicographic sort comparison.
- The module SortRecords encapsulates the code conveniently.
- <URL:http://www.hpl.hp.com/personal/Larry_Rosler/sort/>
- <URL:http://www.sysarch.com/perl/sort/>