site text search

Dirk Koopman djk at tobit.co.uk
Thu Feb 9 12:10:36 GMT 2006


On Thu, 2006-02-09 at 11:12 +0000, Aaron Crane wrote:
> Dirk Koopman writes:
> > I am, just for now, using http://swish-e.org One of the reasons for this
> > is that I have used it (long ago) in the past and also I find it (much)
> > faster and much easier to have several instances of than htdig. The fact
> > that it does not use a whole 512MB (+ all the swap and then core dump
> > and fail) on a website of about 30MB is also a point in its favour. I
> > have used it on much larger ones.
> 
> I have some familiarity with Swish-e, and, yes, it seems to be pretty
> fast at both indexing and searching.  And the Perl interface is entirely
> good enough.
> 
> But I'd be wary of trying to use it on a new project any time soon:
> 
>   - It has no Unicode support (though you can use any 8-bit character
>     set).  Apparently this is due to be fixed in version 3.0, but I've
>     no idea of how soon that's expected.

I don't think it is under active development and, in any case, if I were
doing it, it would be UTF-8 rather than Unicode. Much easier to deal with
and more generalised, as I think others connected with this list have
found. Since I only speak English and Dutch, and my users have to
communicate in one of those to get any support, the UTF-8/Unicode aspect
doesn't affect me :-)
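
For anyone wondering what the 8-bit versus UTF-8 distinction means in
practice, here is a minimal Python sketch. It is only an illustration: the
sample word is an assumption of mine, and nothing here calls Swish-e itself.
It just shows that a byte-oriented tool expecting an 8-bit character set
will see one accented UTF-8 character as two separate characters.

```python
# Illustrative sketch only; the sample word is an assumption, not from the thread.
text = "co\u00f6rdinatie"  # Dutch word containing o-diaeresis (U+00F6)

latin1_bytes = text.encode("latin-1")  # 8-bit charset: one byte per character
utf8_bytes = text.encode("utf-8")      # UTF-8: the accented letter takes two bytes

print(len(text), len(latin1_bytes), len(utf8_bytes))

# What an 8-bit-only tool would "see" if handed the UTF-8 bytes:
# each byte is treated as its own character.
misread = utf8_bytes.decode("latin-1")
print(ascii(misread))
```

For pure ASCII text the two encodings produce identical bytes, which is
why English-only content is unaffected either way.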

>   - It doesn't support deleting documents from an index, or reindexing
>     changed documents that are already in the index.  The current stable
>     version can be built with ./configure --enable-incremental to do
>     that, but it's described as an experimental feature.  Again, the
>     plan seems to be for version 3.0 to do this properly.
> 

This isn't an issue with the amounts of data I am trying to index. I
have, in the past, used it for databases of several hundred MB, and it
wasn't much of an issue then either.

It indexes the whole 30-odd MB in less than a minute on a 600MHz VIA C3,
once a day. Can't complain really.

Dirk


