web crawling in perl
S at mSmith.net
Mon May 22 19:37:55 BST 2006
What do people think is the "best" Perl (or possibly
otherwise if it's much better) module/script for crawling websites?
Some of them are relatively complicated dynamic CGI messes,
and I'm especially interested in things which aren't HTML
documents (doc, pdf, ppt etc).
Google suggests LWP::RobotUA and HTML::SimpleLinkExtor and
rolling my own; it also turns up lots of simple crawlers which
don't use those modules and have large caveats. What've I missed?
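
For reference, the "roll my own" option with those two modules might look roughly like the sketch below. It is only a sketch, not a recommendation. The agent name, contact address, 10 second delay and the same_host_links/crawl helper names are all made up for illustration. It stays on the starting host and reports anything that isn't text/html, which is where the doc, pdf and ppt files would turn up.

```perl
use strict;
use warnings;

use LWP::RobotUA;
use HTML::SimpleLinkExtor;
use URI;

# Hypothetical helper: return the absolute http(s) links in $html
# that point at $host, with any "#fragment" removed.
sub same_host_links {
    my ($html, $base, $host) = @_;

    # Passing a base URL makes the extractor resolve relative links.
    my $extor = HTML::SimpleLinkExtor->new($base);
    $extor->parse($html);

    my @links;
    for my $link ($extor->a) {                 # href values of <a> tags
        my $uri = URI->new($link);
        next unless ($uri->scheme || '') =~ /^https?$/;   # skips mailto: etc.
        next unless lc($uri->host) eq lc($host);
        $uri->fragment(undef);
        push @links, $uri->canonical->as_string;
    }
    return @links;
}

# Hypothetical helper: breadth-first crawl of a single host.
sub crawl {
    my ($start) = @_;

    # Assumed agent name and contact address; LWP::RobotUA requires both.
    my $ua = LWP::RobotUA->new('example-crawler/0.1', 'you@example.com');
    $ua->delay(10 / 60);    # delay is in minutes, so roughly 10 seconds

    my $host  = URI->new($start)->host;
    my @queue = ($start);
    my %seen  = ($start => 1);

    while (defined(my $url = shift @queue)) {
        my $res = $ua->get($url);   # robots.txt is checked by the agent
        unless ($res->is_success) {
            warn "skipping $url: ", $res->status_line, "\n";
            next;
        }

        my $type = $res->content_type;
        if ($type ne 'text/html') {
            print "document: $url ($type)\n";   # doc, pdf, ppt and so on
            next;
        }

        my $html = $res->decoded_content;
        next unless defined $html;
        for my $link (same_host_links($html, $res->base, $host)) {
            push @queue, $link unless $seen{$link}++;
        }
    }
}

# Only crawl when a start URL is given on the command line.
crawl($ARGV[0]) if @ARGV;
```

Run it as "perl crawl.pl http://www.example.com/" (the file name is arbitrary). The caveats are the usual ones: it follows only <a> links, fetches every non-HTML document in full just to learn its type, treats application/xhtml+xml as a "document", and does nothing clever about CGI URLs that differ only in session parameters.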
Any suggestions welcomed.