site text search
andy at hexten.net
Thu Feb 9 10:48:54 GMT 2006
On 9 Feb 2006, at 10:31, Dominic Mitchell wrote:
> On Wed, Feb 08, 2006 at 10:23:10PM +0000, Andy Armstrong wrote:
>> * if you have a site that presents multiple views of the same
>> data (e.g. articles sorted by date, by subject, by keyword)
>> then a crawler based indexer will index each item many times
>> - once for each view in which it appears; MySQL will only have
>> a single copy of the data.
> Then you've got a broken web site -- period. Content should have only
> one, canonical URL.
Sure - but the URLs may have extra parameters appended, e.g. to
indicate which view the content was displayed in, so that the
content page can have prev and next buttons that move to the
adjacent documents in that view. You'll have to configure the spider
to strip those parameters, otherwise it'll index the same page
multiple times.
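
For illustration, the stripping step might look something like the
rough Python sketch below. The parameter names (view, sort, pos) and
the example.com URLs are made up; a real site would list whatever
parameters it actually uses to carry view state, and a real spider
would probably expose this as a configuration option rather than code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters that only carry view state.
VIEW_PARAMS = {"view", "sort", "pos"}


def canonical_url(url, view_params=VIEW_PARAMS):
    """Return url with view-only query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in view_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def dedupe(urls):
    """Canonicalise each URL and keep only the first occurrence of each."""
    seen = set()
    unique = []
    for url in urls:
        canon = canonical_url(url)
        if canon not in seen:
            seen.add(canon)
            unique.append(canon)
    return unique


if __name__ == "__main__":
    crawled = [
        "http://example.com/article?id=42&view=date&pos=3",
        "http://example.com/article?id=42&view=subject&pos=17",
        "http://example.com/article?id=42",
    ]
    for url in dedupe(crawled):
        print(url)
```

All three crawled URLs above reduce to the same canonical form, so the
indexer would only see the article once.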
Andy Armstrong, hexten.net