XML::LibXML and HTML (in >=v1.67)
Robin Berjon
robin at berjon.com
Wed Apr 1 19:07:46 BST 2009
On Apr 1, 2009, at 09:11, mirod wrote:
> The only problem I found was with tags like '<table 1>', which get
> output by the as_XML method as '<table 1="1">', which is not quite
> well-formed XML. This doesn't prevent you from using XPath on it
> with HTML::TreeBuilder::XPath though.
It's more than "not quite well-formed"; it's simply not XML at all :)
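If you want to understand HTML documents the way browsers do, à la
HTML5, then you will end up with documents that cannot, in the general
case, be converted to XML, because there are more HTML DOMs than there
are XML DOMs: an HTML parser can build trees (an element with an
attribute named "1", say) that no XML serialisation can express. But
that's fine, because it doesn't prevent you from using any XML tools
so long as you stick to the abstract level, i.e. work on the parsed
tree rather than on serialised XML.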
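
To make both points concrete, here is a minimal sketch, assuming
XML::LibXML is installed. The helper name is_well_formed_xml and the
sample markup are made up for illustration. The first half checks
whether a string parses as XML; the second half parses tag soup with
the HTML parser in recovering mode and runs XPath on the resulting
tree, without ever serialising it as XML. How libxml2 treats the stray
"1" attribute may vary between versions, so no output is claimed for
that part.

```perl
use strict;
use warnings;
use XML::LibXML;

# Returns 1 if $string parses as well-formed XML, 0 otherwise.
# parse_string() dies on a well-formedness error, so eval traps it.
sub is_well_formed_xml {
    my ($string) = @_;
    my $doc = eval { XML::LibXML->new->parse_string($string) };
    return defined $doc ? 1 : 0;
}

# An XML attribute name cannot start with a digit, so the first
# sample is rejected while the second is accepted.
for my $xml ('<table 1="1"/>', '<table border="1"/>') {
    printf "%-22s %s\n", $xml,
        is_well_formed_xml($xml) ? 'well-formed' : 'not well-formed';
}

# Staying at the abstract level: build a DOM with the HTML parser and
# query it with XPath. recover(1) asks the parser to carry on past
# markup errors; it may print warnings to STDERR.
my $parser = XML::LibXML->new;
$parser->recover(1);
my $html = '<html><body><table 1><tr><td>cell</td></tr></table></body></html>';
my $dom  = $parser->parse_html_string($html);
print $_->nodeName, ': ', $_->textContent, "\n" for $dom->findnodes('//td');
```

The XPath query only ever sees the in-memory tree, which is why the
unserialisable attribute does not get in the way.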
--
Robin Berjon - http://berjon.com/
Feel like hiring me? Go to http://robineko.com/