Perl API to Lucene

Marvin Humphrey marvin at rectangular.com
Thu May 4 17:57:55 BST 2006


Jonathan Peterson wrote:

 > KinoSearch is a Perl search engine inspired by [P]Lucene rather than
 > replicating it, and its performance is in the same range as Lucene
 > (though still significantly slower).

For now, that is...

Benchmarking performance on my G4 laptop
-----------------------------------------------------------
     version 0.08      148.71 secs (6 reps, truncated mean)
     version 0.09_03    92.94 secs (6 reps, truncated mean)
     version 0.09       73.51 secs (6 reps, truncated mean)

I have not yet begun to optimize.  :)

 > It uses a good deal less RAM however.

At index-time, this is because KinoSearch uses a different and  
theoretically better merge model than Lucene.

I haven't yet measured search-time RAM usage in a controlled
experiment.  However, Lucene and CLucene store all text in RAM as
UTF-16, whereas KinoSearch doesn't care what encoding you use.  For
text in Latin-1, UTF-8 and similar encodings, that means the cached
term dictionary ought to take up a lot less space.  I plan to exploit
this by making it possible to load the entire term dictionary into
RAM, rather than just a portion of it as both Lucene and KinoSearch
do now.  That will mean fewer disk seeks and, ultimately, faster
searching.
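
To put rough numbers on the encoding point, here is a minimal sketch
in Python.  It is not KinoSearch or Lucene code; the term list and the
total_bytes helper are hypothetical, and it counts only the encoded
bytes of each term, ignoring any per-string overhead a real term
dictionary would add.

```python
# Hypothetical sample terms, chosen only for illustration.
# \u00e9 is e-acute and \u00ef is i-diaeresis (both precomposed).
terms = ["search", "engine", "caf\u00e9", "na\u00efve", "index"]


def total_bytes(words, encoding):
    """Sum the encoded size of every term, ignoring per-string overhead."""
    return sum(len(word.encode(encoding)) for word in words)


utf16 = total_bytes(terms, "utf-16-le")  # 2 bytes per character here, no BOM
utf8 = total_bytes(terms, "utf-8")       # 1 byte for ASCII, 2 for the accents
latin1 = total_bytes(terms, "latin-1")   # 1 byte per character

print(f"UTF-16: {utf16}  UTF-8: {utf8}  Latin-1: {latin1}")
# prints "UTF-16: 52  UTF-8: 28  Latin-1: 26"
```

For mostly-ASCII terms like these, UTF-16 needs close to twice the
bytes of UTF-8 or Latin-1.  The gap narrows or reverses for scripts
whose characters take three bytes in UTF-8.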

Looking forward to some healthy competition,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


