Perl API to Lucene
Marvin Humphrey
marvin at rectangular.com
Thu May 4 17:57:55 BST 2006
Jonathan Peterson wrote:
> KinoSearch is a Perl search engine inspired by [P]Lucene rather than
> replicating it, and its performance is in the same range as Lucene's
> (though still significantly slower).
For now, that is...
Benchmarking performance on my G4 laptop
-----------------------------------------------------------
version 0.08      148.71 secs  (6 reps, truncated mean)
version 0.09_03    92.94 secs  (6 reps, truncated mean)
version 0.09       73.51 secs  (6 reps, truncated mean)
I have not yet begun to optimize. :)
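
For anyone unfamiliar with the "truncated mean" in the figures above, here
is a rough sketch in Python. It assumes the fastest and slowest runs are
discarded before averaging; the post does not say how many runs were
actually trimmed, and the timings below are made up for illustration.

```python
def truncated_mean(times, trim=1):
    """Average the timings after dropping the `trim` lowest and highest.

    trim=1 is an assumption; the benchmark's real trim count is not stated.
    """
    if len(times) <= 2 * trim:
        raise ValueError("not enough timings to trim")
    kept = sorted(times)[trim:len(times) - trim]
    return sum(kept) / len(kept)


# Hypothetical timings in seconds for 6 reps (not real benchmark data).
sample = [74.1, 73.2, 73.6, 80.9, 73.4, 72.8]
print(round(truncated_mean(sample), 2))
```

Dropping the extremes keeps one slow run (a background process, a cold
cache) from skewing the reported figure.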
> It uses a good deal less RAM however.
At index-time, this is because KinoSearch uses a different and
theoretically better merge model than Lucene.
I haven't yet measured search-time RAM usage in a controlled
experiment. However, Lucene and CLucene store all text in RAM as
UTF-16, whereas KinoSearch doesn't care what encoding you use. For
text in Latin-1, or in UTF-8 that is mostly ASCII, each character
takes one byte rather than two, so the cache of the term dictionary
ought to take up a lot less space. I plan to exploit this by making it
possible to load the entire term dictionary into RAM, rather than
just a portion as is now the case with both Lucene and KinoSearch.
That will mean fewer disk seeks and, ultimately, faster searching.
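
To make the encoding point concrete, here is a small Python sketch (not
KinoSearch or Lucene code) that totals the bytes needed to hold a handful
of terms in each encoding. The term list is invented for illustration, and
the count covers only the raw character data, not any per-entry overhead a
real term dictionary would carry.

```python
# Hypothetical sample terms; not taken from any real index.
terms = ["search", "engine", "caf\u00e9", "index", "na\u00efve"]


def encoded_size(words, encoding):
    """Total bytes needed to store the words in the given encoding."""
    return sum(len(word.encode(encoding)) for word in words)


latin1_bytes = encoded_size(terms, "latin-1")
utf8_bytes = encoded_size(terms, "utf-8")
utf16_bytes = encoded_size(terms, "utf-16-le")  # little-endian, no BOM

print("latin-1:", latin1_bytes)
print("utf-8:  ", utf8_bytes)
print("utf-16: ", utf16_bytes)
```

For these terms UTF-16 needs exactly twice the bytes of Latin-1, and UTF-8
sits just above Latin-1 because only the two accented letters take a second
byte.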
Looking forward to some healthy competition,
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/