[OT] benchmarking "typical" programs

Fri Sep 21 09:17:21 BST 2012

+1

And as a bonus, you cover pretty much the whole data munging market as a side effect with this one. 

On 21/09/2012, at 17:56, Simon Wistow <simon at thegestalt.org> wrote:

> On Thu, Sep 20, 2012 at 12:35:18PM +0100, Nicholas Clark said:
>> Lots of "one trick pony" type benchmarks exist, but very few that actually
>> try to look like they are doing typical things typical programs do, at the
>> typical scales real programs work out, so
> 
> As a search engineer (recovering) I'm inclined to say - get a corpus of 
> docs, build an inverted index out of it and then do some searches. This 
> will test:
> 
> 1) File I/O performance (reading in the corpus)
> 2) Text manipulation (tokenizing, stop-word removal, stemming)
> 3) Data structure performance (building the index)
> 4) Maths calculation (performing TF-IDF searches)
> 
> All in pretty good, discrete steps. Plus by tweaking the size of the 
> corpus you can stress memory as well.
> 
> Simon
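
For what it's worth, the steps quoted above could be sketched roughly as follows in Python. This is only an illustration, not a proposed benchmark: the stop-word list, the "strip a trailing s" stemmer, the helper names (tokenize, build_index, search, timed) and the three-line corpus are all made up for the example. A real run would read a large corpus from disk, which is what would exercise step 1 and memory; here the documents are held in memory so the sketch stays self-contained.

```python
import math
import re
import time
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real benchmark would use a fuller one.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "on", "is"}


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words, crudely stem."""
    words = re.findall(r"[a-z]+", text.lower())
    # Stand-in for real stemming: strip one trailing "s" from longer words.
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w
            for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, n_docs, query):
    """Rank documents by summed TF-IDF over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms found in fewer documents get a higher weight.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def timed(label, fn, *args):
    """Run fn(*args), print the wall-clock time taken, return the result."""
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - start:.6f}s")
    return result


# Tiny made-up corpus; swap in documents read from disk for a real run.
DOCS = [
    "The cat sat on the mat",
    "Dogs and cats are pets",
    "The stock market fell today",
]

index = timed("index", build_index, DOCS)
results = timed("search", search, index, len(DOCS), "cats")
```

Timing each phase separately keeps the steps discrete, as suggested above, and growing DOCS is the knob for stressing memory.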