[OT] benchmarking "typical" programs

Fri Sep 21 10:38:12 BST 2012

On Fri, 21 Sep 2012, Nicholas Clark wrote:

> On Fri, Sep 21, 2012 at 08:56:34AM +0100, Simon Wistow wrote:
>> On Thu, Sep 20, 2012 at 12:35:18PM +0100, Nicholas Clark said:
>>> Lots of "one trick pony" type benchmarks exist, but very few that actually
>>> try to look like they are doing typical things typical programs do, at the
>>> typical scales real programs work out, so
>>
>> As a search engineer (recovering) I'm inclined to say - get a corpus of
>> docs, build an inverted index out of it and then do some searches. This
>> will test
>>
>>
>> 1) File/IO Performance (Reading in the corpus)
>> 2) Text manipulation (Tokenizing, Stop word removal, Stemming)
>> 3) Data structure performance (Building the index)
>> 4) Maths Calculation (performing TF/IDF searches)
>>
>> All in pretty good, discrete steps. Plus by tweaking the size of the
>> corpus you can stress memory as well.
>
> Thanks, this is a useful suggestion, but...
>
> I'm not a search engineer (recovering or otherwise), so this represents
> rather more work than I wanted to do. In that I first have to learn enough
> of how to *be* a search engineer to figure out how to write the above code
> to do something useful, and *then* how to write such code to a reasonably
> performant production versions, and then to turn working code into something
> sufficiently stand alone to be a benchmark.
>
> I don't want to be spending my time figuring out the right way to do all the
> above algorithms in Perl. I want to get as fast as possible to the point of
> figuring out how the perl interpreter (mis)behaves when presented with
> extant decent code to do the above.
>
> Unless there's a CPAN-in-a-box for doing most of the four steps.
> (which doesn't depend on external C libraries. That was one of my
> "preferably" criteria)
>
> So, next question - if I wanted to be as lazy as possible and write a search
> engine (as described above) using as much of CPAN as possible, which modules
> are recommended? :-)
>

The Plucene test suite may be the answer. I know it certainly does the
indexing bit.
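
For a sense of scale, the four steps Simon lists can be sketched in core
Perl with no CPAN modules and no external C libraries. This is only a toy
under stated assumptions, not Plucene's API and not production search code:
the in-memory corpus, the stop word list and the sub names (tokenize,
build_index, search) are all made up for illustration, stemming is left
out, and reading the corpus from disk (step 1) is replaced by a hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy corpus; a real benchmark would read files from disk (step 1).
my %corpus = (
    doc1 => "The quick brown fox jumps over the lazy dog",
    doc2 => "The lazy dog sleeps all day",
    doc3 => "A quick search of the index finds the fox",
);

# Tiny illustrative stop word list (an assumption, not a standard one).
my %stop = map { $_ => 1 } qw(a all of over the);

# Step 2: lowercase, split on non-letters, drop stop words. No stemming here.
sub tokenize {
    my ($text) = @_;
    return grep { length($_) && !$stop{$_} } split /[^a-z]+/, lc($text);
}

# Step 3: inverted index, term => { doc id => term frequency }.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $id (keys %$docs) {
        $index{$_}{$id}++ for tokenize($docs->{$id});
    }
    return \%index;
}

# Step 4: rank documents by summed tf * idf over the query terms.
sub search {
    my ($index, $n_docs, $query) = @_;
    my %score;
    for my $term (tokenize($query)) {
        my $postings = $index->{$term} or next;
        my $idf = log($n_docs / scalar(keys %$postings));
        $score{$_} += $postings->{$_} * $idf for keys %$postings;
    }
    # Highest score first; ties broken by document id for stable output.
    return sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
}

my $index = build_index(\%corpus);
my @hits  = search($index, scalar(keys %corpus), "quick fox");
print "@hits\n";
```

Scaling the corpus hash up (or filling it from a directory of files) is
where the memory and I/O stress would come from; the index here is just
nested hashes, which is presumably the sort of thing the interpreter
benchmark wants to exercise anyway.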

-- 
bob walker
everything should be purple and bendy
http://randomness.org.uk