Schwartzian transform

Abigail abigail at abigail.be
Wed Aug 13 13:24:34 BST 2014


On Wed, Aug 13, 2014 at 09:45:59PM +1000, Damian Conway wrote:
> Mark and Aaron are ntirely correct that the ST is not required for your example,
> but if your actual application uses a significantly larger dataset than just
> four hash entries, then the ST may still be preferable...as array lookups
> are around twice as fast as hash lookups. That might be a significant
> performance enhancement if you have to do O(NlogN) key look-ups for
> a sufficiently large value of N.
> 
> As always, only benchmarking on real(istic) data can determine whether
> you will actually benefit from using the ST or not.


But if your dataset is that large that you will benefit from halving
the lookup time, you can save twice the amount of time by not having
a lookup time at all -- use the Guttman-Rossler Transform.

Here's an (untested) version of the GRT which doesn't require "position"
to be a single digit integer. Instead, we assume both the keys of %$ref
and "positions" are integers:

my $count = 0;

my @sorted_records =
    map  {my $r = $$ref {(unpack "LL", $_) [1]};
            $$r {position} = ++ $count; $r}
    sort
    map  {pack "LL" => 0 + $$ref {$_} {position}, $_} keys %$ref;



Abigail


More information about the london.pm mailing list