web crawling in perl

Sam Smith S at mSmith.net
Mon May 22 19:37:55 BST 2006


What do people think is the "best" Perl module or script for
crawling remote websites? (Something in another language is
fine too, if it's much better.)

Some of the sites are fairly complicated dynamic CGI messes,
and I'm especially interested in content that isn't HTML
(doc, pdf, ppt, etc.).


Google turns up two options: roll my own with LWP::RobotUA
and HTML::SimpleLinkExtor, or use one of the many simple
existing scripts, which don't use those modules and come
with large caveats. What have I missed?
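For concreteness, rolling my own would look roughly like this. It is a rough sketch, assuming LWP::RobotUA, HTML::SimpleLinkExtor and URI are installed. The agent name, contact address, the list of wanted MIME types, the 50-page limit and the helper names same_host_links and crawl are my own illustrative choices, not anything those modules provide.

```perl
#!/usr/bin/perl
# Rough sketch: a polite, same-host, breadth-first crawler built from
# LWP::RobotUA (obeys robots.txt) and HTML::SimpleLinkExtor.
use strict;
use warnings;
use LWP::RobotUA;
use HTML::SimpleLinkExtor;
use URI;

# Non-HTML media types to collect (an assumed list; extend as needed).
my %KEEP = map { $_ => 1 } qw(
    application/pdf
    application/msword
    application/vnd.ms-powerpoint
);

# Hypothetical helper: return the absolute, fragment-free http(s) links
# from <a> tags in $html that stay on $host.
sub same_host_links {
    my ($html, $base, $host) = @_;
    my $extor = HTML::SimpleLinkExtor->new($base);   # a base makes links absolute
    $extor->parse($html);
    $extor->eof;                                     # flush HTML::Parser's buffer
    my @out;
    for my $link ($extor->a) {                       # hrefs from <a> tags only
        my $uri = URI->new($link)->canonical;
        next unless ($uri->scheme || '') =~ /^https?$/;
        next unless lc $uri->host eq lc $host;
        $uri->fragment(undef);                       # page#top and page are one page
        push @out, "$uri";
    }
    return @out;
}

# Hypothetical driver: fetch at most $max_pages URLs starting at $start and
# return the URLs of wanted non-HTML documents found on the way.
sub crawl {
    my ($start, $max_pages) = @_;
    # Hypothetical agent name and contact address.
    my $ua = LWP::RobotUA->new('ExampleCrawler/0.1', 'webmaster@example.com');
    $ua->delay(1/60);    # delay() is in minutes, so this is one second per request

    my $host  = URI->new($start)->host;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my @documents;

    while (@queue && $max_pages-- > 0) {
        my $url = shift @queue;
        my $res = $ua->get($url);
        next unless $res->is_success;

        # Decide by Content-Type, not by URL: CGI URLs rarely end in .pdf.
        my $type = $res->content_type;    # scalar context: lowercased media type
        if ($KEEP{$type}) {
            push @documents, $url;        # a real crawler would save $res->content
            next;
        }
        next unless $type eq 'text/html';

        my $html = $res->decoded_content;
        next unless defined $html;
        for my $link (same_host_links($html, $res->base, $host)) {
            push @queue, $link unless $seen{$link}++;
        }
    }
    return @documents;
}

# Only crawl when given a start URL on the command line.
if (@ARGV) {
    print "$_\n" for crawl($ARGV[0], 50);
}
```

Run as `perl crawl.pl http://www.example.com/`. Checking the Content-Type rather than the file extension is deliberate: a CGI-driven site may well serve a PDF from something like `download.cgi?id=3`.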



Any suggestions welcomed.

thanks
Sam


More information about the london.pm mailing list