An interesting problem

Fri Jan 5 20:45:15 GMT 2007

* McGlinchy, Alistair (Alistair.McGlinchy at marks-and-spencer.com) [070104 17:41]:
> Simon Matthews wrote:
> > One thing that might speed it up even further would be to 
> > pre-cache the 
> > total of (0..x, 0..y) for each x and y.
> 
> I've had a go at this method. Its slower than my previous post (up to
> 40secs from about 16secs).  There's some hash ref caching that may help
> speed it up a bit. What I cannot work out is how you think this method
> can be intergrare within the previous method.  Please advise.
> 

I've done some experiments and replacing the use of the intermediate
variables $A, $AB, $AC and $ABCD with direct access to $sum_from_origin
changed the run time from 25 seconds to around 15 seconds (the previous
source code takes 17 seconds to run). I reduced this even further down to
~10 seconds by changing the function to cache some of the intermediate 
values.

I noticed a problem where it was giving the wrong results - when
calculating $A etc it needs to refer to the cell(s) before the range area.
Hence:
    $A = $sum_from_origin[$x1-1][$y1-1];
    $AB = $sum_from_origin[$x1-1][$y2];
    $AC = $sum_from_origin[$x2][$y1-1];

This then needs handling for the cases where $x1 and $y1 are 0, so I ended up
inserting 0s into the $sum_from_origin area to cope with this.

See below for the code with the appropriate modifications.

Simon

__CODE__

# Top part as before

sub prepare_origin_range {

    # Increment since sum_from_original has an extra row and column for
    # 0s
    $max_x++;
    $max_y++;

    # $sum_from_origin[$x][$y] is the sum of the cells from
    # 0 .. $x and 0 .. $y in row
    for my $y (0 .. $max_y) {
	$sum_from_origin[0][$y] = 0;
    }

    for my $x (1 .. $max_x) {
	my $tot = 0;
	$sum_from_origin[$x][0] = 0;

	for my $y (1 .. $max_y) {
	    $tot += $table[$x-1][$y-1];
	    $sum_from_origin[$x][$y] = $sum_from_origin[$x-1][$y] + $tot;
	}
    }
}

sub max_sum_range {
    my $max = $sum_from_origin[1][1];
    my $val;

    for my $x1(1   .. $max_x) {
	for my $y1(1   .. $max_y) {
	    my $A    = $sum_from_origin[$x1-1][$y1-1];

	    for my $x2($x1 .. $max_x) {
		my $tmp   = $A - $sum_from_origin[$x2][$y1-1];

		for my $y2($y1 .. $max_y) {
		    #+---|---|
		    #| A | B |
		    #----+---+
		    #| C | D |
		    #----+---+
		    $val = $sum_from_origin[$x2][$y2] -   # ABCD
			$sum_from_origin[$x1-1][$y2] +    # AC
			$tmp                              # (A - AB)
#			$sum_from_origin[$x1-1][$y1-1] -  # A
#			$sum_from_origin[$x2][$y1-1]      # AB
		        ;

		    if ($max  < $val) {
			$max = $val;
			print "Range ($x1,$y1) , ($x2,$y2) is better with $val\n";
			@result = ($x1,$y1,$x2,$y2);
		    }
		}
	    }
	}
    }

    # Fix up the coordinates returned to be 0-based
    $result[0]--;
    $result[1]--;
    $result[2]--;
    $result[3]--;

    return ($max, @result);
}   

__EDOC__