web crawling in perl
andy at petdance.com
Mon May 22 20:10:47 BST 2006
On May 22, 2006, at 1:37 PM, Sam Smith wrote:
> What do people think is the "best" perl (or possibly
> otherwise if it's much better) module/script for crawling
> remote websites?
> Some of them are relatively complicated dynamic CGI messes,
> and I'm especially interested in things which aren't HTML
> documents (doc, pdf, ppt etc).
What are you trying to do with what you crawl? If you're just
mirroring a site, I think that wget from the command line would be
the way to go. If you're programmatically checking out links and
forms, then use WWW::Mechanize to encapsulate much of your dirty work.
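The WWW::Mechanize route for the original question (finding doc, pdf and ppt files on a site) could look roughly like the sketch below. It is a minimal illustration, not code from this thread. It assumes WWW::Mechanize is installed from CPAN. The helper name crawl_for_documents, the 50-page default limit and the list of "document" extensions are hypothetical choices. It does a breadth-first crawl that stays on the starting site and records links to documents without parsing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use URI;

# Extensions treated as "documents" rather than pages. This list is an
# assumption based on the question (doc, pdf, ppt); extend as needed.
my $DOC_RE = qr/\.(?:pdf|docx?|pptx?|xlsx?)\z/i;

# Hypothetical helper: breadth-first crawl from $start, staying on the same
# scheme and authority (host:port) and fetching at most $max_pages HTML pages.
# Returns a sorted list of absolute URLs whose path looks like a document.
# Documents are only recorded, never fetched.
sub crawl_for_documents {
    my ( $start, $max_pages ) = @_;
    $max_pages ||= 50;

    my $mech = WWW::Mechanize->new(
        autocheck   => 0,    # don't die on a single broken link
        stack_depth => 0,    # keep no page history; this crawler never calls back()
    );

    my $root      = URI->new($start)->canonical;
    my $scheme    = $root->scheme;
    my $authority = $root->authority // '';

    my @queue = ( $root->as_string );
    my %seen  = ( $root->as_string => 1 );
    my %docs;
    my $pages = 0;

    while ( @queue && $pages < $max_pages ) {
        my $url = shift @queue;
        $mech->get($url);
        next unless $mech->success && $mech->is_html;
        $pages++;

        for my $link ( $mech->links ) {
            # url_abs resolves relative hrefs against the page's base URL;
            # stringify and rebuild so we always work with a plain URI object.
            my $abs = URI->new( '' . $link->url_abs )->canonical;

            # Skip mailto:, javascript: and anything on another site.
            next unless ( $abs->scheme    // '' ) eq $scheme;
            next unless ( $abs->authority // '' ) eq $authority;

            $abs->fragment(undef);    # page.html#top is the same page as page.html
            my $key = $abs->as_string;

            if ( $abs->path =~ $DOC_RE ) {
                $docs{$key} = 1;      # a document: record it, don't crawl it
            }
            elsif ( !$seen{$key}++ ) {
                push @queue, $key;    # an unseen page on this site: crawl later
            }
        }
    }
    return sort keys %docs;
}

# Example: perl crawl.pl http://www.example.com/ 100
if (@ARGV) {
    print "$_\n" for crawl_for_documents(@ARGV);
}
```

Matching on the file extension is a simplification. A dynamic CGI that serves a PDF from a URL such as report.cgi?id=7 will not be recognised this way. Fetching such candidates and checking `$mech->ct` would be one way to catch them. To save a document locally, `$mech->get($url)` followed by `$mech->save_content($filename)` is one option.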
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance
More information about the london.pm mailing list