site text search
Dirk Koopman
djk at tobit.co.uk
Thu Feb 9 12:10:36 GMT 2006
On Thu, 2006-02-09 at 11:12 +0000, Aaron Crane wrote:
> Dirk Koopman writes:
> > I am, just for now, using http://swish-e.org One of the reasons for this
> > is that I have used it (long ago) in the past and also I find it (much)
> > faster and much easier to have several instances of than htdig. The fact
> > that it does not use a whole 512MB (+ all the swap and then core dump
> > and fail) on a website of about 30MB is also a point in its favour. I
> > have used it on much larger ones.
>
> I have some familiarity with Swish-e, and, yes, it seems to be pretty
> fast at both indexing and searching. And the Perl interface is entirely
> good enough.
>
> But I'd be wary of trying to use it on a new project any time soon:
>
> - It has no Unicode support (though you can use any 8-bit character
> set). Apparently this is due to be fixed in version 3.0, but I've
> no idea of how soon that's expected.
I don't think it is under active development and, in any case, if I were
doing it, it would be UTF8 rather than Unicode. Much easier to deal with
and more generalised, as I think others connected with this list have found. Since
I only speak English and Dutch and my users have to communicate in one
of those to get any support, the UTF8/Unicode aspect doesn't affect
me :-)
> - It doesn't support deleting documents from an index, or reindexing
> changed documents that are already in the index. The current stable
> version can be built with ./configure --enable-incremental to do
> that, but it's described as an experimental feature. Again, the
> plan seems to be for version 3.0 to do this properly.
>
This isn't an issue with the amounts of data I am trying to index,
although I have, in the past, used it for databases of several hundred
MB and it wasn't much of an issue then either.

It indexes the whole 30-odd MB in less than a minute on a 600MHz Via C3,
once a day. Can't complain really.
Dirk