Web scraping frameworks?
Joel Bernstein
joel at fysh.org
Fri Mar 7 13:02:54 GMT 2014
On 7 March 2014 12:48, Dave Hodgkinson <davehodg at gmail.com> wrote:
> Installing HTML::TreeBuilder::LibXML seemed like a good idea but didn't
> make any difference.
>
https://metacpan.org/pod/HTML::TreeBuilder::LibXML#BENCHMARK suggests it
ought to increase speed considerably - what does your benchmark look like?
Can you paste your benchmark code? Are you using local HTML so as to
discount network I/O and LWP overhead? Does your scraping perform better if
you use CSS selectors rather than XPath expressions? Does it make much
difference if you scrape more/fewer selectors - that is, are your scrapes
slow due to their complexity or due to fixed overhead in the library?
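
The questions above amount to a benchmark along the following lines. This is only a rough sketch, written in Python with the standard library's xml.etree.ElementTree standing in for the scraper; it is not HTML::TreeBuilder::LibXML, and the document, the selectors and the function names are all made up for illustration. The markup is held in memory so that network I/O is excluded, and the number of selectors is varied so that fixed overhead (the 0-selector case, which only parses) can be told apart from per-selector cost.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical local document: well-formed markup held in memory, so no
# network I/O or HTTP client overhead is included in the timings.
ROWS = "".join(
    f'<li class="item"><a href="/p/{i}">Item {i}</a><span>{i}</span></li>'
    for i in range(500)
)
HTML = f"<html><body><ul>{ROWS}</ul></body></html>"

# Hypothetical selectors, written in ElementTree's limited XPath subset.
SELECTORS = [".//li", ".//a", ".//span", ".//li[@class='item']"]


def scrape(selector_count):
    """Parse the document, then run the first `selector_count` selectors.

    Returns the number of matches per selector. With selector_count=0
    only the parse happens, which approximates the fixed overhead.
    """
    tree = ET.fromstring(HTML)
    return [len(tree.findall(s)) for s in SELECTORS[:selector_count]]


def benchmark(runs=20):
    """Return average seconds per scrape, keyed by number of selectors."""
    return {
        n: timeit.timeit(lambda n=n: scrape(n), number=runs) / runs
        for n in range(len(SELECTORS) + 1)
    }


if __name__ == "__main__":
    for n, seconds in benchmark().items():
        print(f"{n} selectors: {seconds * 1000:.3f} ms per scrape")
```

If the time barely changes between 0 and 4 selectors, the cost is mostly fixed overhead (parsing and setup); if it grows roughly in step with the selector count, the selectors themselves are the expensive part.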
/joel