Web scraping frameworks?

Hernan Lopes hernanlopes at gmail.com
Tue Mar 4 23:39:32 GMT 2014


Yeah, it's by Mr Miyagawa.

I find HTML::TreeBuilder::LibXML more complete than
HTML::TreeBuilder::XPath, because HTML::TreeBuilder::XPath seems unable to
parse certain tags by default. And of course, being from Miyagawa is a
huge +++. Good that he took some time to dig into the scraper area.


On Tue, Mar 4, 2014 at 8:21 PM, Pierre M <piemas25 at gmail.com> wrote:

> > But remember HTML::TreeBuilder::LibXML will accept more HTML5 tags
> > than HTML::TreeBuilder::XPath =)
> Would you say that HTML::TreeBuilder::LibXML is always better than
> HTML::TreeBuilder::XPath?
> I notice that HTML::TreeBuilder::LibXML depends on HTML::TreeBuilder::XPath
> as well as on XML::LibXML and on Web::Scraper, which is surprising.
>
> What is the advantage of LibXML over XPath?
>
>
> Oh, I just found Web::Scraper::LibXML - also by Miyagawa.
>


More information about the london.pm mailing list