Ubuntu Manpage: Sort::Maker - A simple way to make efficient sort subs

name
synopsis
description
"make_sorter"
key caching
tests
benchmarks
export
bugs
author
acknowledgements

Provided by: libsort-maker-perl_0.06-2_all

NAME

       Sort::Maker - A simple way to make efficient sort subs

SYNOPSIS

               use Sort::Maker ;

               my $sorter = make_sorter( ... ) ;

DESCRIPTION

       This module has two main goals: to make it easy to create correct sort functions, and to
       make it simple to select the optimum sorting algorithm for the number of items to be
       sorted. Sort::Maker generates complete sort subroutines in one of four styles, plain,
       orcish manouver, Schwartzian Transform and the Guttman-Rosler Transform. You can also get
       the source for a sort sub you create via the sorter_source call.

"make_sorter"

       The sub "make_sorter" is exported by Sort::Maker. It makes a sort sub and returns a
       reference to it. You describe how you want it to sort its input by passing general options
       and key descriptions to "make_sorter".

   Arguments to "make_sorter"
       There are two types of arguments, boolean and value. Boolean arguments can be set just
       with the option name and can optionally be followed by '1'. You can easily set multiple
       boolean general arguments with qw(). Value arguments must have a following value.
       Arguments can appear in any order but the key descriptions (see below) must appear in
       their sort order. The code examples below show various ways to set the various arguments.

       Arguments fall into four categories: selecting the style of the sort, key descriptions,
       setting defaults for key description attributes, and setting general flags and values. The
       following sections will describe the categories and their associated arguments.

   Sort Style
       The style of the sort to be made is selected by setting one of the following Boolean
       arguments. Only one may be set otherwise an error is reported (see below for error
       handling). Also see below for detailed descriptions of the supported sort styles.

               plain
               orcish
               ST
               GRT

               # Make a plain sorter
               my $plain_sorter = make_sorter( qw( plain ) ... ) ;

               # Make an orcish manouevre sorter
               my $orcish_sorter = make_sorter( orcish => 1 ... ) ;

               # Make a Schwartzian Transform sorter
               my $st_sorter = make_sorter( 'ST', 1, ... ) ;

               # Make a GRT sort
               my $GRT = make_sorter( 'GRT', ... ) ;

   Key Attribute Defaults
       The following arguments set defaults for the all of the keys' attributes.  These default
       values can be overridden in any individual key.  Only one of the attributes in each of the
       groups below can be set as defaults or for any given key. If more than one attribute in
       each group is set, then "make_sorter" will return an error.  The attribute that is the
       default for each group is marked.  See below for details on key attributes.

               ascending       (default)
               descending

               case            (default)
               no_case

               signed
               unsigned
               signed_float    (default)
               unsigned_float

               fixed
               varying

   General Options
       These arguments set general options that apply to how the generated sorter interacts with
       the outside world.

       "name"

       This is a value option which exports the generated sort sub to that name. The call to
       "make_sorter" must be run to install the named named function before it is called. You
       should still check the result of "make_sorter" to see if an error occurred (it returns
       undef).

               my $sorter = make_sorter( name => 'sort_func', ... ) ;
               die "make_sorter: $@" unless $sorter ;

               ...

               @sorted = sort_func @unsorted ;

       "ref_in/ref_out"

       This boolean arguments specifies that the input to and output from the sort sub will be
       array references. "ref_in" makes the sorter only take as input a single array reference
       (which contains the unsorted records). "ref_out" makes the sorter only return a single
       array reference (which contains the sorted records). You can set both of these options in
       a sorter.

       Note: This does not affect key extraction code which still gets each record in $_. It only
       modifies the I/O of the generated sorter.

               # input records are in an array reference
               my $sorter = make_sorter( qw( ref_in ), ... ) ;
               @sorted_array = $sorter->( \@unsorted_input ) ;

               # sorted output records are in an array reference
               my $sorter = make_sorter( ref_out => 1, ... ) ;
               $sorted_array_ref = $sorter->( @unsorted_input ) ;

               # input and output records are in array references
               my $sorter = make_sorter( qw( ref_in ref_out ), ... ) ;
               $sorted_array_ref = $sorter->( \@unsorted_input ) ;

       "string_data"

       This boolean argument specifies that the input records will be plain text strings with no
       null (0x0) bytes in them.  It is only valid for use with the GRT and it is ignored for the
       other sort styles. It tells the GRT that it can put the record directly into the string
       cache and it will be separated from the packed keys with a null byte (hence that
       restriction). This is an optimization that can run slightly faster than the normal index
       sorting done with the GRT. Run this to see the benchmark results.

               perl t/string_data.t -bench

       "init_code"

       This value argument is code that will be put into the beginning of the generated sorter
       subroutine. It is meant to be used to declare lexical variables that the extraction code
       can use. Normally different extraction code have no way to share common code. By declaring
       lexicals with the "init_code" option, some key extraction code can save data there for use
       by another key. This is useful if you have two (or more) keys that share a complex piece
       of code such as accessing a deep value in a record tree.

       For example, suppose the input record is an array of arrays of hashes of strings and the
       string has 2 keys that need to be grabbed by a regex. The string is a string key, a ':'
       and a number key. So the common part of the key extraction is:

               $_->[0][0]{a}

       And the make_sorter call is:

               my $sorter = make_sorter(
                       'ST',
                       init_code => 'my( $str, $num ) ;',
                       string => 'do{( $str, $num ) =
                               $_->[0][0]{a} =~ /^(\w+):(\d+)$/; $str}',
                       number => '$num'
               ) ;

       In the above code both keys are extracted in the first key extraction code and the number
       key is saved in $num. The second key extraction code just uses that saved value.

       Note that "init_code" is only useful in the ST and GRT sort styles as they process all the
       keys of a record at one time and can use variables declared in "init_code" to transfer
       data to later keys. The plain and orcish sorts may not process a later key at the same
       time as an earlier key (that only happens when the earlier key is compared to an equal
       key). Also for "init_code" to be a win, the data set must be large enough and the work to
       extract the keys must be hard enough for the savings to be noticed. The test init_code.t
       shows some examples and you can see the speedup when you run:

               perl t/init_code.t -bench

   Key Description Arguments
       Sorting data requires that records be compared in some way so they can be put into a
       proper sequence. The parts of the records that actually get compared are called its keys.
       In the simplest case the entire record is the key, as when you sort a list of numbers or
       file names. But in many cases the keys are embedded in the full record and they need to be
       extracted before they can be used in comparisons.  Sort::Maker uses key descriptions that
       extract the key from the record, and optional other attributes that will help optimize the
       sorting operation. This section will explain how to pass key description arguments to the
       make_sorter subroutine and what the various attributes mean and how to best use them.

       The generated sorter will sort the records according to the order of the key arguments.
       The first key is used to compare a pair of records and if they are deemed equal, then the
       next key is examined. This happens until the records are given an ordering or you run out
       of keys and the records are deemed equal in sort order.  Key descriptions can be mixed
       with the other arguments which can appear in any order and anywhere in the argument list,
       but the keys themselves must be in the desired order.

       A key argument is either 'string' or 'number' followed by optional attributes. The key
       type sets the way that the key is compared (e.g. using 'cmp' or '<=>').  All key
       attributes can be set from the default values in the global arguments or set in each
       individual key description.

       There are 4 ways to provide attributes to a key:

       No attributes

       A key argument which is either at the end of the argument list or is followed by a valid
       keyword token has no explict attributes. This key will use the default attributes.  In
       both of these examples, a default attribute was set and used by the key description which
       is just a single key argument.

               # sort the record as a single number in descending order
               my $sorter = make_sorter( qw( plain number descending ) ) ;

               # sort the record as a case sensitive string
               my $sorter = make_sorter( qw( plain case string ) ) ;

               # sort the record as a single number in ascending order
               my $sorter = make_sorter( qw( ST number ) ) ;

       Only Code as a Value

       A key argument which is followed by a scalar value which is not a valid keyword token,
       will use that scalar value as its key extraction code. See below for more on key
       extraction code.

               # sort by the first (optionally signed) number matched
               my $sorter = make_sorter( qw( plain number /([+-]?\d+)/ ) ) ;

               # string sort by the 3rd field in the input records (array refs)
               my $sorter = make_sorter( 'ST', string => '$_->[2]' ) ;

       An Array Reference

       A key argument which is followed by an array reference will parse that array for its
       description attributes. As with the general boolean arguments, any boolean attribute can
       be optionally followed by a '1'. Value attributes must be followed by their value.

               # another way to specify the same sort as above
               # sort by the first (optionally signed) number matched

               my $sorter = make_sorter(
                       qw( plain ),
                       number => [
                               code => '/(\d+)/',
                               'descending',
                       ],
               ) ;

               # same sort but for the GRT which uses the 'unsigned'
               # attribute to optimize the sort.

               my $sorter = make_sorter(
                       qw( GRT ),
                       number => [
                               qw( descending unsigned ),
                               code => '/(\d+)/',
                       ],
               ) ;

       A Hash Reference

       A key argument which is followed a hash reference will use that hash as its description
       attributes. Any boolean attribute in the hash must have a value of '1'.  Value attributes
       must be followed by their value.

               # another way to specify the same sort as above
               # sort by the first (optionally signed) number matched

               my $sorter = make_sorter(
                       qw( plain ),
                       number => {
                               code => '/(\d+)/',
                               descending => 1,
                       },
               ) ;

               # a multi-key sort. the first key is a descending unsigned
               # integer and the second is a string padded to 10 characters

               my $sorter = make_sorter(
                       qw( GRT ),
                       number => {
                               code => '/(\d+)/',
                               descending => 1,
                               unsigned => 1,
                       },
                       string => {
                               code => '/FOO<(\w+)>/',
                               fixed => 10,
                       },
               ) ;

   Key Description Attributes
       What follows are the attributes for key descriptions. Most use the default values passed
       in the arguments to "make_sorter".

       "code"

       This value attribute is the code that will be used to extract a key from the input record.
       It can be a string of Perl code, a qr// regular expression (Regexp reference) or an
       anonymous sub (CODE reference) that operates on $_ and extracts a value.  The code will be
       wrapped in a do{} block and called in a list context so that regular expressions can just
       use () to grab a key value. The code defaults to $_ which means the entire record is used
       for this key. You can't set the default for code (unlike all the other key attributes).
       See the section on Extraction Code for more.

               # make an ST sort of the first number grabbed in descending order

               my $sorter = make_sorter(
                       qw( ST ),
                       number => {
                               code    => '/(\d+)/',
                               descending => 1,
                       },
               ) ;

       "ascending/descending"

       These two Boolean attributes control the sorting order for this key. If a key is marked as
       "ascending" (which is the initial default for all keys), then lower keys will sort before
       higher keys. "descending" sorts have the higher keys sort before the lower keys. It is
       illegal to have both set in the defaults or in any key.

               # sort by descending order of the first grabbed number
               # and then sort in ascending order the first grabbed <word>

               my $sorter = make_sorter(
                       qw( ST descending ),
                       number => {
                               code    => '/(\d+)/',
                       },
                       string => {
                               code    => '/<(\w+)>/',
                               ascending => 1,
                       },
               ) ;

               # this will return undef and store an error in $@.
               # you can't have both 'ascending' and 'descending' as defaults

               my $sorter = make_sorter(
                       qw( ST ascending descending ),
                       number => {
                               code    => '/(\d+)/',
                               descending => 1,
                       },
               ) ;

               # this will return undef and store an error in $@.
               # you can't have both 'ascending' and 'descending' in a key

               my $sorter = make_sorter(
                       qw( ST )
                       number => {
                               code    => '/(\d+)/',
                               descending => 1,
                               ascending => 1,
                       },
               ) ;

       "case/no_case"

       These two Boolean attributes control how 'string' keys handle case sensitivity. If a key
       is marked as "case" (which is the initial default for all keys), then keys will treat
       upper and lower case letters as different.  If the key is marked as "no_case" then they
       are treated as equal.  It is illegal to have both set in the defaults or in any key.
       Internally this uses the uc() function so you can use locale settings to affect string
       sorts.

               # sort by the first grabbed word with no case
               # and then sort the grabbed <word> with case

               my $sorter = make_sorter(
                       qw( ST no_case ),
                       string => {
                               code    => '/(\w+)/',
                       },
                       string => {
                               code    => '/<(\w+)>/',
                               case => 1,
                       },
               ) ;

               # this will return undef and store an error in $@.
               # you can't have both 'case' and 'no_case' as defaults

               my $sorter = make_sorter(
                       qw( ST no_case case ),
                       string => {
                               code    => '/(\w+)/',
                       },
               ) ;

               # this will return undef and store an error in $@.
               # you can't have both 'case' and 'no_case' in a key

               my $sorter = make_sorter(
                       qw( ST )
                       string => {
                               code    => '/(\w+)/',
                               no_case => 1,
                               case    => 1,
                       },
               ) ;

       "closure"

       This Boolean attribute causes this key to use call its CODE reference to extract its
       value. This is useful if you need to access a lexical variable during the key extraction.
       A typical use would be if you have a sorting order stored in a lexical and need to access
       that from the extraction code. If you didn't set the "closure" attribute for this key, the
       generated source (see Key Extraction) would not be able to see that lexical which will
       trigger a Perl compiling error in make_sorter.

               my @months = qw(
                       January February March April May June
                       July August September October November December ) ;
               my @month_jumble = qw(
                       February June October March January April
                       July November August December May September ) ;

               my %month_to_num ;
               @month_to_num{ @months } = 1 .. @months ;

       # this will fail to generate a sorter if 'closure' is removed # as %month_to_num will not
       be in scope to the eval inside sort_maker.

               my $sorter = make_sorter(
                       'closure',
                       number => sub { $month_to_num{$_} },
               ) ;

               my @sorted = $sorter->( @month_jumble ) ;

       "signed/unsigned/signed_float/unsigned_float" (GRT only)

       These Boolean attributes are only used by the GRT sort style. They are meant to describe
       the type of a number key so that the GRT can best process and cache the key's value. It is
       illegal to have more than one of them set in the defaults or in any key. See the section
       on GRT sorting for more.

       The "signed" and "unsigned" attributes mark this number key as an integer. The GRT does
       the least amount of work processing an unsigned integer and only slightly more work for a
       signed integer. It is worth using these attributes if a sort key is restricted to
       integers.

       The "signed_float" (which is the normal default for all keys) and "unsigned_float"
       attributes mark this number key as a float. The GRT does the less work processing an
       unsigned float then a signed float.  It is worth using the "unsigned_float" attribute if a
       sort key is restricted to non-negative values. The "signed_float" attribute is supported
       to allow overriding defaults and to make it easier to auto-generate sorts.

       "fixed/varying" (GRT only)

       These attributes are only used by the GRT sort style. They are used to describe the type
       of a string key so that the GRT can properly process and cache the key's value. It is
       illegal to have more than one of them set in the defaults or in any key. See the section
       on GRT sorting for more.

       "fixed" is a value attribute that marks this string key as always being this length. The
       extracted value will either be padded with null (0x0) bytes or truncated to the specified
       length (the value of "fixed"). The data in this key may have embedded null bytes (0x0) and
       may be sorted in descending order.

       "varying" is a Boolean attribute marks this string key as being of varying lengths. The
       GRT sorter will do a scan of all of this key's values to find the maximum string length
       and then it pads all the extracted values to that length. The data in this key may have
       embedded null bytes (0x0) and may be sorted in descending order.

   Key Extraction Code
       Each input record must have its sort keys extracted from the data.  This is the purpose of
       the 'code' attribute in key descriptions.  The code has to operate on a record which is in
       $_ and it must return the key value. The code is executed in a list context so you can use
       grabs in m// to return the key. Note that only the first grab will be used but you
       shouldn't have more than one anyway. See the examples below.

       Code can be either a string, a qr// object (Regexp reference) or an anonymous sub (CODE
       reference).

       If qr// is used, the actual generated code will be m($qr) which works because qr// will
       interpolate to its string representation. The advantage of qr// over a string is that the
       qr// will be syntax checked at compile time while the string only later when the generated
       sorter is compiled by an eval.

       If a CODE reference is found, it is used to extract the key in the generated sorter. As
       with qr//, the advantage is that the extraction code is syntax checked at compile time and
       not runtime. Also the deparsed code is wrapped in a "do{}" block so you may use complex
       code to extract the key. In the default case a CODE reference will be deparsed by the
       B::Deparse module into Perl source. If the key has the "closure" attribute set, the code
       will be called to extract the key.

       The following will generate sorters with exact same behavior:

               $sorter = make_sorter( 'ST', string => '/(\w+)/' ) ;
               $sorter = make_sorter( 'ST', string => qr/(\w+)/ ) ;
               $sorter = make_sorter( 'ST', string => sub { /(\w+)/ } ) ;
               $sorter = make_sorter( 'ST', 'closure', string => sub { /(\w+)/ } ) ;

       Extraction code for a key can be set in one of three ways.

       No explicit code

       If you don't pass any extraction code to a key, it will default to $_ which is the entire
       record. This is useful in certain cases such as in simple sorts where you are sorting the
       entire record.

               # sort numerically and in reverse order
               my $sorter = make_sorter( qw( plain number descending ) ;

               # sort with case folding
               my $sorter = make_sorter( qw( plain no_case string ) ;

               # sort by file time stamp and then by name
               my $sorter = make_sorter( 'ST', number => '-M', 'string' ) ;

       Code is the only key attribute

       In many cases you don't need to specify any specific key attributes (the normal or
       globally set defaults are fine) but you need extraction code. If the argument that follows
       a key type ( 'string' or 'number' ) is not a valid keyword, it will be assumed to be the
       extraction code for that key.

               # grab the first number string as the key
               my $sorter = make_sorter( qw( plain number /(\d+)/ ) ) ;

               # no_case string sort on the 3rd-5th chars of the 2nd array element
               my $sorter = make_sorter(
                       plain   => 1,
                       no_case => 1,
                       string  => 'substr( $_->[1], 2, 3)'
               ) ;

       Key needs specific attributes

       When the key needs to have its own specific attributes other than its code, you need to
       pass them in an ARRAY or HASH reference. This is mostly needed when there are multiple
       keys and the defaults are not correct for all the keys.

               # string sort by the first 3 elements of the array record with
               # different case requirements

               my $sorter = make_sorter(
                       ST      => 1,
                       string  => {
                               code    => '$_->[0]',
                               no_case => 1,
                       },
                       string  => '$_->[1]',
                       string  => {
                               code    => '$_->[2]',
                               no_case => 1,
                       },
               ) ;

               # GRT sort with multiple unsigned integers and padded strings
               # note that some keys use a hash ref and some an array ref
               # the record is marked with key\d: sections
               my $sorter = make_sorter(
                       GRT     => 1,
                       descending => 1,
                       number  => {
                               code    => 'key1:(\d+)',
                               unsigned => 1,
                       },
                       number  => [
                               code    => 'key2:([+-]?\d+)',
                               qw( signed ascending ),
                       ],
                       string  => [
                               code    => 'key3:(\w{10})',
                               fixed => 1,
                               ascending => 1,
                       ],
                       # pad the extracted keys to 8 chars
                       string  => {
                               code    => 'key4:([A-Z]+)',
                               pad => 8,
                       },
               ) ;

Key Caching

A good question to ask is "What speed advantages do you get from this module when all the
sorts generated use Perl's internal sort function?" The sort function has a O( N * log N
) growth function which means that the amount of work done increases by that formula as N
(the number of input records) increases. In a plain sort this means the the key extraction
code is executed N * log N times when you only have N records. That can be a very large
waste of cpu time. So the other three sort styles speed up the overall sort by only
executing the extraction code N times by caching the extracted keys. How they cache the
keys is their primary difference. To compare or study the actual code generated for the
different sort styles, you can run make_sorter and just change the style. Then call
sorter_source (not exported by default) and pass it the sort code reference returned by
make_sorter. It will return the generated sort source.

"plain"
Plain sorting doesn't do any key caching. It is fine for short input lists (see the
Benchmark section) and also as a way to see how much CPU is saved when using one of the
other styles.

"orcish"
The Orcish maneuvre (created by Joseph Hall) caches the extracted keys in a hash. It does
this with code like this:

$cache{$a} ||= CODE($a) ;

CODE is the extract code and it operates on a record in $a. If we have never seen this
record before then the cache entry will be undef and the ||= operator will assign the
extracted key to that hash slot. The next time this record is seen in a comparison, the
saved extracted key will be found in the hash and used. The name orcish comes from OR-
cache.

"ST"
The ST (Schwartzian Transform and popularized by Randal Schwartz) uses an anonymous array
to store the record and its extracted keys. It first executes a map that creates an
anonymous array:

map [ $_, CODE1( $_ ), CODE2( $_ ) ], @input

The CODE's extract the set of keys from the record but only once per record so it is O(N).
Now the sort function can just do the comparisons and it returns a list of sorted
anonymous arrays.

sort {
$a->[1] cmp $b->[1]
||
$a->[2] cmp $b->[2]
}

Finally, we need to get back the original records which are in the first slot of the
anonymous array:

map $_->[0]

This is why the ST is known as a map/sort/map technique.

"GRT"
The Guttman-Rosler Transform (popularized by Uri Guttman and Larry Rosler) uses a string
to cache the extracted keys as well as either the record or its index. It is also a
map/sort/map technique but because its cache is a string, it can be sorted without any
Perl level callback (the {} block passed to sort). This is a signifigant win since that
callback is running O( N log N). But this speedup comes at a cost of complexity. You can't
just join the keys into a string and properly sort them. Each key may need to be processed
so that it will correctly sort in order and it doesn't interfere with other keys. That is
why the GRT has several key attributes to enable it to properly and efficiently pack the
sort keys into a single string. The following lists the GRT key attributes, when you need
them and what key processing is done for each. Note that you can always enable the GRT
specific attributes as they are just ignored by the other sort styles.

The GRT gains its speed by using a single byte string to cache all of the extracted keys
from a given input record. Packing keys into a string such that it will lexically sort the
correct way requires some deep mojo and data munging. But that is why this module was
written - to hide all that from the coder. Below are descriptions of how the various key
types are packed and how to best use the GRT specific key attributes. Note: you can only
use one of the GRT number or string attributes for any key. Setting more than one in
either the defaults or in any given key is an error (a key's attribute can override a
default choice).

"unsigned"

The 'unsigned' Boolean attribute tells the GRT that this number key is a non-negative
integer. This allows the GRT to just pack it into 4 bytes using the N format (network
order - big endian). An integer packed this way will have its most significant bytes
compared before its least signifigant bytes. This involves the least amount of key munging
and so it is the most efficient way to sort numbers in the GRT.

If you want this key to sort in descending order, then the key value is negated and
normalized (see the 'signed' attribute) so there is no advantage to using 'unsigned'.

"signed"

The 'signed' Boolean attribute tells the GRT that this number key is an integer. This
allows the GRT to just pack it into 4 bytes using the N format (network order - big
endian). The key value must first be normalized which will convert it to an unsigned
integer but with the same ordering as a signed integer. This is simply done by inverting
the sign (highest order) bit of the integer. As mentioned above, when sorting this key in
descending order, the GRT just negates the key value.

NOTE: In the GRT the signed and unsigned integer attributes only work on perl built with
32 bit integers. This is due to using the N format of pack which is specified to be 32
bits. A future version may support 64 bit integers (anyone want to help?).

"unsigned_float"

The 'unsigned_float' Boolean attribute tells the GRT that this number key is a non-
negative floating point number. This allows the GRT to pack it into 8 bytes using the 'd'
format. A float packed this way will have its most significant bytes compared before its
least signifigant bytes.

"signed_float"

The "signed_float" Boolean attribute (which is the default for all number keys when using
the GRT) tells the GRT that this number key is a floating point number. This allows the
GRT to pack it into 8 bytes using the 'd' format. A float packed this way will have its
most significant bytes compared before its least signifigant bytes. When processed this
key will be normalized to an unsigned float similar to to the "signed" to "unsigned"
conversion mentioned above.

NOTE: The GRT only works with floats that are in the IEEE format for doubles. This
includes most modern architectures including x86, sparc, powerpc, mips, etc. If the cpu
doesn't have IEEE floats you can either use the integer attributes or select another sort
style (all the others have no restriction on float formats).

simple string.

If a string key is being sorted in ascending order with the GRT and it doesn't have one of
the GRT string attributes, it will be packed without any munging and a null (0x0) byte
will be appended to it. This byte enables a shorter string to sort before longer ones that
start with the shorter string.

NOTE: You cannot sort strings in descending order in the GRT unless the key has either the
'fixed' or 'varying' attributes set. Also, if a string is being sorted in ascending order
but has any null (0x0) bytes in it, the key must have one of those attributes set.

"fixed"

This value attribute tells the GRT to pack this key value as a fixed length string. The
extracted value will either be padded with null (0x0) bytes or truncated to the specified
length (the value of the "fixed" attribute). This means it can be packed into the cache
string with no padding and no trailing null byte is needed. The key can contain any data
including null (0x0) bytes. Data munging happens only if the key's sort order is
descending. Then the key value is xor'ed with a same length string of 0xff bytes. This
toggles each bit which allows for a lexical comparison but in the reverse order. This same
bit inversion is used for descending varying strings.

"varying"

This Boolean attribute tells the GRT that this key value is a varying length string and
has no predetermined padding length. A prescan is done to determine the maximum string
length for this key and that is used as the padding length. The rest is the same as with
the 'fixed' attribute.

"sorter_source"
This sub (which can be exported) returns the source of a generated sort sub or the source
of the last one that had an error. To get the source of an existing sort sub, pass it a
reference to that sub (i.e. the reference returned from make_sorter). To get the source
for a failed call to make_sorter, don't pass in any arguments.

my $sorter = make_sorter( ... ) ;
print sorter_source( $sorter ) ;

make_sorter( name => 'my_sorter', ... ) ;
print sorter_source( \&my_sorter ) ;

my $sorter = make_sorter( ... )
or die "make_sorter error: $@\n", sorter_source();

If all you want is the generated source you can just do:

print sorter_source make_sorter( ... ) ;

Error Handling
When "make_sorter" detects an error (either bad arguments or when the generated sorter
won't compile), it returns undef and set $@ to an error message. The error message will
include the generated source and compiler and warning errors if the sorter didn't compile
correctly. The test t/errors.t covers all the possible error messages. You can also
retrieve the generated source after a compiling error by calling "sorter_source".

TESTS

       "Sort::Maker" uses a table of test configurations that can both run tests and benchmarks.
       Each test script is mostly a table that generates multiple versions of the sorters,
       generate sample data and compares the sorter results with a sort that is known to be good.
       If you run the scripts directly and with a -bench argument, then they generate the same
       sorter subs and benchmark them. This design ensures that benchmarks are running on
       correctly generated code and it makes it very easy to add more test and benchmark
       variations. The code that does all the work is in t/common.pl. Here is a typical test
       table entry:

               {
                       skip    => 0,
                       source  => 0,
                       name    => 'init_code',
                       gen     => sub { rand_choice( @string_keys ) . ':' .
                                        rand_choice( @number_keys ) },
                       gold    => sub {
                                ($a =~ /^(\w+)/)[0] cmp ($b =~ /^(\w+)/)[0]
                                               ||
                                ($a =~ /(\d+$)/)[0] <=> ($b =~ /(\d+$)/)[0]
                       },
                       args    => [
                               init_code => 'my( $str, $num ) ;',
                               string => 'do{( $str, $num ) = /^(\w+):(\d+)$/; $str}',
                               number => '$num',
                       ],
               },

       "skip" is a boolean that causes this test/benchmark to be skipped.  Setting "source"
       causes the sorter's source to be printed out.  "gen" is a sub that generates a single
       input record. There are support subs in t/common.pl that will generate random data. Some
       tests have a "data" field which is fixed data for a test (instead of the generated data).
       The <gold> field is a comparision subroutine usable by the sort function. It is used to
       sort the test data into a golden result which is used to compare against all the generated
       sorters.  "args" is an anonymous array of arguments for a sorter or a hash ref with
       multiple named/args pairs. See t/io.t for an example of that.

BENCHMARKS

EXPORT

       This module always exports the "make_sorter" sub.  It can also optionally export
       "sorter_source".

BUGS

       Sort::Maker GRT currently works only with 32 bit integers due to pack N format being
       exactly 32 bits. If someone with a 64 bit Perl wants to work on using the Q format or the
       ! suffix and dealing with endian issues, I will be glad to help and support it. It would
       be best if there was a network (big endian) pack format for quads/longlongs but that can
       be done similarly to how floats are packed now.

AUTHOR

       Uri Guttman, <uri@stemsystems.com>

ACKNOWLEDGEMENTS

       I would like to thank the inimitable Damian Conway for his help in the API design, the
       POD, and for being a good Perl friend.

       And thanks to Boston.pm for the idea of allowing qr// for key extraction code.