web crawling in perl

Andy Lester andy at petdance.com
Mon May 22 20:10:47 BST 2006


On May 22, 2006, at 1:37 PM, Sam Smith wrote:

>
> What do people think is the "best" perl (or possibly
> otherwise if it's much better) module/script for crawling
> remote websites?
>
> Some of them are relatively complicated dynamic CGI messes,
> and I'm especially interested in things which aren't html
> documents (doc, pdf, ppt etc).

What are you trying to do with what you crawl?  If you're just
mirroring a site, I think that wget from the command line would be
the way to go.  If you're programmatically checking out links and
forms, then use WWW::Mechanize to encapsulate much of your dirty
work.
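
For the non-HTML documents, a rough sketch of the Mechanize route might
look like the following.  The start URL is a made-up placeholder, and it
only looks at the links on a single page rather than recursing through
the site, so treat it as a starting point rather than a finished crawler.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical starting page; substitute the site you actually want to crawl.
my $start = 'http://www.example.com/docs/';

# autocheck => 1 makes failed requests die instead of failing silently.
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get($start);

# Collect the links on this page whose URL ends in .doc, .pdf or .ppt.
my @links = $mech->find_all_links( url_regex => qr/\.(?:doc|pdf|ppt)$/i );

for my $link (@links) {
    my $url = $link->url_abs;                  # absolute URI object
    my ($file) = $url->path =~ m{([^/]+)$};    # last path segment as filename
    next unless defined $file;

    print "Fetching $url -> $file\n";

    # ':content_file' is handed through to LWP::UserAgent, which writes
    # the response body straight to disk instead of holding it in memory.
    $mech->get( $url, ':content_file' => $file );
}
```

Saving with ':content_file' keeps large PDFs and PowerPoint files out of
memory.  Following forms or walking further into the site would build on
the same $mech object.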

xoa

--
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance

More information about the london.pm mailing list