[OT] benchmarking "typical" programs

Nicholas Clark nick at ccl4.org
Fri Sep 21 10:22:44 BST 2012

On Fri, Sep 21, 2012 at 08:56:34AM +0100, Simon Wistow wrote:
> On Thu, Sep 20, 2012 at 12:35:18PM +0100, Nicholas Clark said:
> > Lots of "one trick pony" type benchmarks exist, but very few that actually
> > try to look like they are doing typical things typical programs do, at the
> > typical scales real programs work at, so
> As a search engineer (recovering) I'm inclined to say - get a corpus of 
> docs, build an inverted index out of it and then do some searches. This 
> will test
> 1) File I/O performance (Reading in the corpus)
> 2) Text manipulation (Tokenizing, Stop word removal, Stemming)
> 3) Data structure performance (Building the index)
> 4) Maths Calculation (performing TF/IDF searches)
> All in pretty good, discrete steps. Plus by tweaking the size of the 
> corpus you can stress memory as well.
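
As a rough illustration of the four quoted steps, here is a minimal sketch of
an inverted index with TF-IDF scoring. It is written in Python purely as a
neutral sketch language; it is not Perl and not a proposed benchmark. The names
tokenize, build_index, search and STOP_WORDS are hypothetical, the corpus is an
in-memory dict rather than files read from disk (so step 1 is not exercised),
stemming is left out, and the weighting (raw term frequency times log(N/df))
is just one common TF-IDF variant assumed for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Toy stop-word list; an assumption for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}


def tokenize(text):
    """Step 2: lowercase, split on non-letters, drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(docs):
    """Step 3: map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, n_docs, query):
    """Step 4: score documents by summed tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur anywhere in the corpus
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = {
        "d1": "the cat sat on the mat",
        "d2": "the dog chased the cat",
        "d3": "dogs and cats",
    }
    index = build_index(docs)
    print(search(index, len(docs), "dog"))
```

Growing the docs dict is the knob the quoted message mentions for stressing
memory; a real benchmark would also need the file-reading and stemming steps
that this sketch skips.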

Thanks, this is a useful suggestion, but...

I'm not a search engineer (recovering or otherwise), so this represents
rather more work than I wanted to do: I first have to learn enough of how
to *be* a search engineer to figure out how to write the above code to do
something useful, *then* how to write such code to a reasonably performant
production standard, and then how to turn working code into something
sufficiently standalone to be a benchmark.

I don't want to be spending my time figuring out the right way to do all the
above algorithms in Perl. I want to get as fast as possible to the point of
figuring out how the perl interpreter (mis)behaves when presented with
extant decent code to do the above.

Unless there's a CPAN-in-a-box for doing most of the four steps (one which
doesn't depend on external C libraries; that was one of my "preferably"
criteria).

So, next question - if I wanted to be as lazy as possible and write a search
engine (as described above) using as much of CPAN as possible, which modules
are recommended? :-)

Nicholas Clark
