web crawling in perl
Sam Smith
S at mSmith.net
Mon May 22 19:37:55 BST 2006
What do people think is the "best" Perl module or script
(or a non-Perl tool, if it's much better) for crawling
remote websites?
Some of the sites are relatively complicated dynamic CGI messes,
and I'm especially interested in things which aren't HTML
documents (doc, pdf, ppt etc).
Google suggests LWP::RobotUA and HTML::SimpleLinkExtor and
rolling my own; it also turns up lots of simple scripts which
don't use those modules and have large caveats. What have I missed?
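For reference, a roll-your-own crawler built on those two modules
might look roughly like the following. This is only a minimal sketch:
the agent name, contact address, start URL, page limit, and the helper
names is_wanted_document and crawl are all placeholders, and it decides
what to keep purely by file extension, which dynamic CGI sites may
well defeat.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::SimpleLinkExtor;
use URI;

# Extensions of the non-HTML documents to collect (doc, pdf, ppt).
my $WANTED = qr/\.(?:doc|pdf|ppt)$/i;

# True if the URL's path (ignoring any query string) ends in a
# wanted extension.
sub is_wanted_document {
    my ($url) = @_;
    return URI->new($url)->path =~ $WANTED ? 1 : 0;
}

# Breadth-first crawl confined to the start URL's host.
# Returns the wanted document URLs it came across.
sub crawl {
    my ($start, $max_pages) = @_;

    # Placeholder agent name and contact address: use your own.
    my $ua = LWP::RobotUA->new('example-crawler/0.1', 'me@example.com');
    $ua->delay(1 / 60);    # delay is given in minutes: one second here

    my $host  = URI->new($start)->host;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my @found;

    while (@queue and $max_pages-- > 0) {
        my $url = shift @queue;
        my $res = $ua->get($url);    # robots.txt is honoured by RobotUA
        next unless $res->is_success and $res->content_type eq 'text/html';

        my $extor = HTML::SimpleLinkExtor->new;
        $extor->parse($res->decoded_content);

        for my $link ($extor->a) {
            # Resolve relative links against the page's base URL.
            my $uri = URI->new_abs($link, $res->base);
            next unless $uri->scheme and $uri->scheme =~ /^https?$/;
            next unless $uri->host eq $host;
            $uri->fragment(undef);
            next if $seen{$uri}++;
            if (is_wanted_document($uri)) { push @found, "$uri" }
            else                          { push @queue, "$uri" }
        }
    }
    return @found;
}

# Only crawl when run as a script; the start URL is a placeholder.
unless (caller) {
    print "$_\n" for crawl('http://www.example.com/', 50);
}
```

Documents served from URLs like download.cgi?id=42 have no useful
extension, so a fuller version would probably need to check the
Content-Type of each response (for example with a HEAD request)
rather than trust the URL.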
Any suggestions welcomed.
thanks
Sam
More information about the london.pm mailing list