Test::XML not working with UTF-8

Dominic Mitchell dom at happygiraffe.net
Thu Feb 1 15:13:01 GMT 2007


On Thu, Feb 01, 2007 at 01:29:42PM -0000, Robin Barker wrote:
> > Anyway, getting to the point, I wonder if anyone has any ideas
> > why Test::XML fails to recognize UTF-8 characters, or can think
> > of an alternative I might use if Test::XML is no good for UTF-8.
> 
> Test::XML uses XML::SemanticDiff which uses Digest::MD5.
> 
>      Perl 5.8 supports Unicode characters in strings.  Since the
>      MD5 algorithm is only defined for strings of bytes, it cannot
>      be used on strings that contain characters with an ordinal
>      number above 255.  The MD5 functions and methods will croak
>      if you try to feed them such input data.
> 
> There is a workaround in Digest::MD5, which I have implemented in
> XML::SemanticDiff; patch below.
> 
> I don't think Test::XML or XML::SemanticDiff know about encoding="UTF-8".

I'm the Test::XML author -- though I haven't done anything with it in a
couple of years...

There is no specific UTF-8 support in Test::XML.  It wouldn't really be
appropriate, as Test::XML just passes the data on to the lower layers
(such as XML::SemanticDiff).

The patch to XML::SemanticDiff looks like it will work correctly (it
converts from characters to bytes at the correct place).  I wondered how
to go about getting it updated on CPAN, but I've just seen Robin's RT
ticket.
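To illustrate what "converting from characters to bytes" before hashing means, here is a minimal sketch in Python rather than Perl. It is an analogue, not the XML::SemanticDiff patch (which isn't reproduced here): the helper name md5_of_text is hypothetical, and it mirrors the Perl idiom md5_hex(encode_utf8($string)). Python is stricter than Perl here: hashlib refuses every str, not only strings with characters above 255.

```python
import hashlib

def md5_of_text(text: str) -> str:
    """Hypothetical helper: hex MD5 of a character string.

    MD5 is only defined over bytes, so the characters are encoded to
    UTF-8 first, much like Perl's md5_hex(encode_utf8($string)).
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# A sample string containing a character above 255 (the euro sign).
sample = "caf\u00e9 \u20ac"

# Feeding characters straight to MD5 fails, as the Digest::MD5 docs warn
# for Perl; in Python it raises TypeError instead of croaking.
try:
    hashlib.md5(sample)  # type: ignore[arg-type]
except TypeError as exc:
    print("refused:", exc)

# Encoding first gives a stable digest of the UTF-8 byte representation.
print(md5_of_text(sample))
```

The digest is of the UTF-8 bytes, so two documents compare equal only if they encode to the same bytes; that is the property a semantic diff relies on when it hashes text content.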

You also asked about UTF-8 aware editors.  Most recent versions of vim
should support UTF-8 fully.  Emacs 21 has _some_ UTF-8 support, but it's
not as good as it could be.

-Dom


More information about the london.pm mailing list