site text search

Dirk Koopman djk at tobit.co.uk
Thu Feb 9 11:08:16 GMT 2006


On Thu, 2006-02-09 at 10:31 +0000, Dominic Mitchell wrote:
> On Wed, Feb 08, 2006 at 10:23:10PM +0000, Andy Armstrong wrote:
> > * if you have a site that presents multiple views of the same
> >   data (e.g. articles sorted by date, by subject, by keyword)
> >   then a crawler based indexer will index each item many times
> >   - once for each view in which it appears; MySQL will only have
> >   a single copy of the data.
> 
> Then you've got a broken web site -- period.  Content should have only
> one, canonical URL.

<rant>

Really? So we are only allowed to hyperlink with absolute names now, are
we? Have you googled for stuff on this mailing list recently?

The real world really does not work like that. In fact, I would go
further: any site that implements the "one canonical URL" paradigm is
likely to be extremely difficult to use. 

</rant>

Of course, making sure that only one copy of the content goes into a
search engine's index is a different problem, and an ideal worth
striving for. The side effect of actually achieving it would be that
the search engine returns just one URI for each piece of content.
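
As a rough illustration of the "one copy in the index" idea, here is a
minimal Python sketch that collapses several views of the same content
into a single index entry by hashing the text. The function name
build_index, the (url, text) input shape and the example URLs are all
made up for illustration; a real crawler would also have to strip
per-view navigation and markup before hashing, which is not attempted
here.

```python
import hashlib


def build_index(pages):
    """Index each distinct piece of content once.

    pages is an iterable of (url, text) pairs (a hypothetical input
    shape). The first URL seen for a given text becomes the single URI
    held in the index; later URLs with the same text are kept as aliases.
    """
    index = {}
    for url, text in pages:
        # Collapse whitespace so trivial layout differences between
        # views do not produce different hashes.
        normalised = " ".join(text.split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        entry = index.get(digest)
        if entry is None:
            index[digest] = {"uri": url, "aliases": [], "text": normalised}
        else:
            entry["aliases"].append(url)
    return index


# Three views of one article plus a second article (example URLs only).
pages = [
    ("http://example.org/by-date/2006/02/09/search", "Site text  search"),
    ("http://example.org/by-subject/search/1", "Site text search"),
    ("http://example.org/by-keyword/index/1", "Site text\nsearch"),
    ("http://example.org/by-date/2006/02/08/other", "Something else"),
]
index = build_index(pages)
```

The site keeps all of its URLs and views; only the index is reduced to
one entry, and so one URI, per piece of content.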

Dirk



