[JOB] Perl Software Developer and Database programmer

McGlinchy, Alistair Alistair.McGlinchy at marks-and-spencer.com
Wed Feb 22 22:05:04 GMT 2006


Ovid proposed:
>   Imagine we're building an app and it turns out to be slow.  We've
>   profiled the code and reduced the problem down to this:
> 
>     foreach my $alpha ( @alphas ) {
>         foreach my $beta ( @betas ) {
>             if ( $alpha == $beta ) {
>                 push @results, $beta;
>             }
>         }
>     }
> 
>   1.  What is a likely reason for the performance problem?
>   2.  How might we speed it up?

That is a really peculiar piece of code! It is highly deceptive as, at
first glance, as if it finds the intersection of the two lists, but it
does not do that. It's no wonder it's tripped up some of the potential
staff requested to answer the question as well as some L.PMers. All the
suggested "improvements" I have seen so far are wrong for @alphas
containing duplicates (or more precisely, values that evaluate to
duplicates) and/or they return @results in the wrong order. [ They also
look very un-perlish IMHO ].

Consider: 
	my @betas =  qw( 1.0  01 1.00 );
	my @alphas = qw( 0x1    1a    );   # Note "0x1" and "1a" == 1
Output 
	@results  must be ( "1.0",  "01", "1.00", "1.0",  "01", "1.00"
);  

The point of the question (I presume) is to observe the inefficiency of
the O( N**2 ) loop.   Unfortunately the worst case will require N**2 so
we can't win every time. Consider @alphas == @betas == (1) x $N where
@results will contain $N**2 elements.

Having identified we can't avoid O(N**2) in all cases we can still speed
the process for many cases. A key to this problem is to realise the that
the inner loop is a function of alpha which says. 
	sub inner_loop {
		my $alpha=shift;
		push @results, map {$_ == $alpha} @betas; 
	}

This function is a constant for a given @betas so it need be evaluated
only once, perhaps by caching via a hash.  There are several other
alternatives that would be worth considering depending on what we expect
from a normal @alphas and @betas, and how we value the risk of the worst
case. All these hypothesis should be benchmarked.

__CODE__

# Cache of ==equal sub-lists of @betas, in original order.

my %hash_of_equals;   
# if @betas is ("1.0", 1, "1a" ,  2 , [2, 2, "2.0" , 'cat', undef)
# %hash_of_equals will become 
# ( 1 => [ "1.0", 1, "1a" ] ,  2 => [2, 2, "2.0"], 0=> ['cat',undef])

# Build the cache
for my $beta (@betas) {
    my $beta_as_number = $beta + 0;                   # force numerical
value
    push @{$hash_of_equals{$beta_as_number}}, $beta;  # Add it to our
cache
}

# 
my @results;
for my $alpha (@alphas) {
    my $alpha_as_number= $alpha+0;                  
    next if !$hash_of_equals{$alpha_as_number}; 
    push @results, @{$hash_of_equals{$alpha_as_number}}
}

# Benchmarked with:
# my @alphas = shuffle ( 1.. 100 , 50 .. 1500); # Overlap 50..100 to
ensure duplicates
# my @betas  = shuffle ( 1.. 100 , 50 .. 1500); # same again in a
different order
# Benchmark: running one, two for at least 5 CPU seconds...
#      orig:  6 wallclock secs ( 6.55 usr +  0.00 sys =  6.55 CPU) @
0.61/s (n=4)
#      amcg:  6 wallclock secs ( 5.52 usr +  0.05 sys =  5.56 CPU) @
30.92/s (n=172)

# Further optimisations
# - Maybe faster as lisp? 
#   my @results = map  { $hash_of_equals{ $_ + 0 } } 
#               grep { $hash_of_equals{ $_ + 0 } } 
#               @alphas
# - build a %_is_in_alphas hash too to prevent building $h_o_e for
unneeded $betas
# - build h_o_e via lisp too
# - Memoize
# 
__END__

Now can I have the job?  :-)

Cheers,

Alistair

**********************************************************************
Registered Office:
Marks and Spencer plc
Waterside House
35 North Wharf Road
London
W2 1NW

Registered No. 214436 in England and Wales.

Telephone (020) 7935 4422
Facsimile (020) 7487 2670

<<www.marksandspencer.com>>

Please note that electronic mail may be monitored.

This e-mail is confidential. If you received it by mistake, please let us know and then delete it from your system; you should not copy, disclose, or distribute its contents to anyone nor act in reliance on this e-mail, as this is prohibited and may be unlawful.
2005





More information about the london.pm mailing list