Stripping duplicates from a list
Greg McCarroll
greg at mccarroll.org.uk
Sat Jun 14 09:50:07 BST 2008
On Thu, Jun 12, 2008 at 05:28:28PM +0100, Kake L Pugh wrote:
> On Thu 12 Jun 2008, Adrian Lai <perl at loathe.me.uk> wrote:
> > my %hash = map {lc($_) => $_} reverse sort @list;
> > my @uniquelist = values %hash;
>
> Aidy is the winner. Thanks everyone for replying.
>
> Kake
But Kake, his solution has all the nasty memory overhead of a hash and
the CPU cost of the hashing function. Why not simply reduce it all down to a
single number, a small static array of lookup values (the same for all lists
of data) and some simple maths?
I give you uniq_via_primes ...
#!/usr/local/bin/perl
# TODO: Kake's requirement involves keeping the 'most capitalized'
# word; let's worry about this in v2.
use strict;
use warnings;
use bigint;
use Math::Prime::TiedArray; # Marvel at my efficiency!
my @primes;
tie @primes, 'Math::Prime::TiedArray';
# n.b. I haven't tested this with your test data, but it works fine
# with the following, so I'm sure it's fine with larger words.
my @non_uniq_list = qw( Aa a b b c cat CaT CAT a aa aaa aaa);
sub word_to_number {
    my ($word) = @_;
    my @chars = reverse split(//,$word);
    my $num = 0;
    my $exp = 0;
    foreach my $char (@chars) {
        $num += (ord(lc($char))-96) * (27 ** $exp);
        $exp++;
    }
    return $num;
}
my $summary = 1; # Keeps track of words we've already seen
# using just one scalar! Take that hash
# solution!
my @uniq_list = ();
foreach my $word (@non_uniq_list) {
    my $prime = $primes[word_to_number($word)-1];
    if ($summary == 1) {
        $summary *= $prime;
        push(@uniq_list,$word);
    } else {
        unless (($summary % $prime) == 0) {
            $summary *= $prime;
            push(@uniq_list,$word);
        }
    }
}
print join(',',@non_uniq_list),"\n";
print join(',',@uniq_list),"\n";
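
For anyone who wants to poke at the idea without installing Math::Prime::TiedArray, here is a minimal sketch of the same trick using only core Perl. It rests on unique factorisation: each word is read as a base-27 number n (a=1 ... z=26, case folded), mapped to the n-th prime, and a word counts as "seen" exactly when its prime divides the running product. The helper nth_prime and the wrapper uniq_via_primes are names made up for this sketch and are not part of the script above. Trial division is assumed to be good enough for short words, and only the letters a-z are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bigint;    # the running product outgrows native integers quickly

# Same base-27 encoding as word_to_number above (a=1 ... z=26),
# written with Horner's rule instead of explicit powers.
sub word_to_number {
    my ($word) = @_;
    my $num = 0;
    $num = $num * 27 + (ord(lc($_)) - 96) for split //, $word;
    return $num;
}

# Hypothetical helper: the n-th prime (1-based) by naive trial division.
# Stands in for the tied @primes array; slow for long words.
sub nth_prime {
    my ($n) = @_;
    my ($count, $candidate) = (0, 1);
    while ($count < $n) {
        $candidate++;
        my $is_prime = 1;
        for (my $d = 2; $d * $d <= $candidate; $d++) {
            if ($candidate % $d == 0) { $is_prime = 0; last; }
        }
        $count++ if $is_prime;
    }
    return $candidate;
}

# Hypothetical wrapper: keep the first spelling of each word,
# ignoring case, with one scalar as the only "seen" record.
sub uniq_via_primes {
    my @words   = @_;
    my $summary = 1;
    my @uniq;
    for my $word (@words) {
        my $prime = nth_prime(word_to_number($word));
        next if $summary % $prime == 0;    # prime already in the product
        $summary *= $prime;
        push @uniq, $word;
    }
    return @uniq;
}

# 'cat' encodes as 3*27**2 + 1*27 + 20 = 2234
print word_to_number('cat'), "\n";
# a -> 2, b -> 3, ab -> 29th prime = 109; repeats are skipped
print join(',', uniq_via_primes(qw(a A b ab AB b))), "\n";    # a,b,ab
```

Because 1 is divisible by no prime, the special case for $summary == 1 is not needed in this version; the modulus test alone handles the first word.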