Stripping duplicates from a list

Sat Jun 14 09:50:07 BST 2008

On Thu, Jun 12, 2008 at 05:28:28PM +0100, Kake L Pugh wrote:
> On Thu 12 Jun 2008, Adrian Lai <perl at loathe.me.uk> wrote:
> > my %hash = map {lc($_) => $_} reverse sort @list;
> > my @uniquelist = values %hash;
> 
> Aidy is the winner.  Thanks everyone for replying.
> 
> Kake

But Kake, his solution has all the nasty memory overhead of a hash and
the CPU cost of the hashing function. Why not simply reduce it all down to
a single number, a small static array of lookup values (the same for all
lists of data) and some simple maths?

I give you uniq_via_primes ...

#!/usr/local/bin/perl
# TODO: Kake's requirement involves keeping the 'most capitalized'
#       word; let's worry about this in v2.

use strict;
use warnings;

use bigint;
use Math::Prime::TiedArray; # Marvel at my efficiency!
my @primes;
tie @primes, 'Math::Prime::TiedArray';

# n.b. I haven't tested this with your test data, but it works fine
# with the following, so I'm sure it's fine with larger words.
my @non_uniq_list = qw( Aa a b b c cat CaT CAT a aa aaa aaa);

sub word_to_number {
  my ($word) = @_;
  my @chars = reverse split(//,$word);
  my $num = 0;
  my $exp = 0;
  foreach my $char (@chars) {
    # Treat the word as a base-27 number: a=1 .. z=26 (letters a-z only).
    $num += (ord(lc($char))-96) * (27 ** $exp);
    $exp++;
  }
  return $num;
}

my $summary = 1; # Keeps track of words we've already seen
                 # using just one scalar! Take that, hash
                 # solution!
my @uniq_list = ();
foreach my $word (@non_uniq_list) {
  my $prime = $primes[word_to_number($word)-1];
  # 1 is divisible by no prime, so the first word needs no special case.
  unless (($summary % $prime) == 0) {
    $summary *= $prime;
    push(@uniq_list,$word);
  }
}

print join(',',@non_uniq_list),"\n";
print join(',',@uniq_list),"\n";
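
For anyone without Math::Prime::TiedArray to hand, the same trick can be
sketched using only core modules. This is a rough sketch rather than the
script above: nth_prime and uniq_via_primes are hypothetical helper names,
the primes come from naive trial division, and the words are assumed to
contain only the letters a-z. As a worked case, "cat" becomes
3*27**2 + 1*27 + 20 = 2234, so it is represented by the 2234th prime.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt;    # core module; the running product outgrows native ints

# Hypothetical helper: the n-th prime (1-based), found by trial division
# against the primes cached so far.
my @known_primes = (2);
sub nth_prime {
    my ($n) = @_;
    my $candidate = $known_primes[-1];
    while (@known_primes < $n) {
        $candidate++;
        my $is_prime = 1;
        foreach my $p (@known_primes) {
            last if $p * $p > $candidate;
            if ($candidate % $p == 0) { $is_prime = 0; last; }
        }
        push @known_primes, $candidate if $is_prime;
    }
    return $known_primes[$n - 1];
}

# Read the lower-cased word as a base-27 number (a=1 .. z=26).
# Assumes the word contains only the letters a-z.
sub word_to_number {
    my ($word) = @_;
    my $num = 0;
    foreach my $char (split //, lc $word) {
        $num = $num * 27 + (ord($char) - 96);
    }
    return $num;
}

# Keep the first spelling seen of each word, ignoring case. A word has
# been seen already if its prime divides the running product.
sub uniq_via_primes {
    my @words   = @_;
    my $summary = Math::BigInt->new(1);
    my @uniq;
    foreach my $word (@words) {
        my $prime = nth_prime(word_to_number($word));
        next if ($summary % $prime)->is_zero;
        $summary *= $prime;
        push @uniq, $word;
    }
    return @uniq;
}

my @list = qw( Aa a b b c cat CaT CAT a aa aaa aaa );
print join(',', uniq_via_primes(@list)), "\n";    # prints Aa,a,b,c,cat,aaa
```

Like the original, this keeps whichever capitalisation turns up first, so
the 'most capitalized' requirement is still waiting for v2.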