Web scraping frameworks?

Hernan Lopes hernanlopes at gmail.com
Tue Mar 4 23:42:56 GMT 2014

When someone goes really deep into web scraping, they will encounter the
problems I cited.
I'm not sure how to handle all those situations with Web::Scraper::LibXML.
Examples on that would be great.

On Tue, Mar 4, 2014 at 8:39 PM, Hernan Lopes <hernanlopes at gmail.com> wrote:

> Yeah, it's by Mr Miyagawa.
> I find HTML::TreeBuilder::LibXML more complete than
> HTML::TreeBuilder::XPath, because HTML::TreeBuilder::XPath seems unable
> to parse certain tags by default, and of course being from Miyagawa is
> a huge +++. Good that he took some time to dig into the scraping area.
> On Tue, Mar 4, 2014 at 8:21 PM, Pierre M <piemas25 at gmail.com> wrote:
>> > But remember HTML::TreeBuilder::LibXML will accept more HTML5 tags
>> > than HTML::TreeBuilder::XPath =)
>> Would you say that HTML::TreeBuilder::LibXML is always better than
>> HTML::TreeBuilder::XPath?
>> I notice that HTML::TreeBuilder::LibXML depends on HTML::TreeBuilder::XPath
>> as well as on XML::LibXML and on Web::Scraper, which is surprising.
>> What is the advantage of LibXML over XPath?
>> Oh, I just found Web::Scraper::LibXML - also by Miyagawa.

More information about the london.pm mailing list