site text search

Andy Armstrong andy at hexten.net
Thu Feb 9 10:48:54 GMT 2006


On 9 Feb 2006, at 10:31, Dominic Mitchell wrote:
> On Wed, Feb 08, 2006 at 10:23:10PM +0000, Andy Armstrong wrote:
>> * if you have a site that presents multiple views of the same
>>   data (e.g. articles sorted by date, by subject, by keyword)
>>   then a crawler based indexer will index each item many times
>>   - once for each view in which it appears; MySQL will only have
>>   a single copy of the data.
>
> Then you've got a broken web site -- period.  Content should have only
> one, canonical URL.

Sure - but the URLs may have extra parameters appended, e.g. to
indicate which view the content was displayed in, so that the
content page can have prev and next buttons that move to the
adjacent documents in that view. You'll have to configure the spider
to strip those parameters, otherwise it'll index the same page
multiple times.
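
As a rough illustration of that stripping step, here is a minimal
Python sketch of normalising URLs before they are handed to an indexer.
The parameter names (view, sort, pos) and the example.com URLs are
invented for the example; a real site would list whichever parameters
only affect navigation, and a real spider would have its own way of
configuring this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical view-only parameters: they change the prev/next
# navigation on the page but not the content itself.
VIEW_PARAMS = {"view", "sort", "pos"}


def canonical_url(url, view_params=VIEW_PARAMS):
    """Return url with view-only query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in view_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def dedupe(urls):
    """Yield each canonical URL once, in first-seen order."""
    seen = set()
    for url in urls:
        canon = canonical_url(url)
        if canon not in seen:
            seen.add(canon)
            yield canon


if __name__ == "__main__":
    crawled = [
        "http://example.com/article?id=42&view=by_date&pos=3",
        "http://example.com/article?id=42&view=by_subject&pos=7",
        "http://example.com/article?id=43&view=by_date&pos=4",
    ]
    for url in dedupe(crawled):
        print(url)
```

With the list above, the two views of article 42 collapse to a single
URL, so the indexer would only see that page once.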

-- 
Andy Armstrong, hexten.net



More information about the london.pm mailing list