Test::XML not working with UTF-8
Dominic Mitchell
dom at happygiraffe.net
Thu Feb 1 15:13:01 GMT 2007
On Thu, Feb 01, 2007 at 01:29:42PM -0000, Robin Barker wrote:
> > Anyway, getting to the point, I wonder if anyone has any ideas
> > why Test::XML fails to recognize UTF-8 characters, or can think
> > of an alternative I might use if Test::XML is no good for UTF-8.
>
> Test::XML uses XML::SemanticDiff which uses Digest::MD5.
>
> Perl 5.8 supports Unicode characters in strings. Since the
> MD5 algorithm is only defined for strings of bytes, it cannot
> be used on strings that contain characters with an ordinal
> number above 255. The MD5 functions and methods will croak
> if you try to feed them such input data.
>
> There is a workaround in Digest::MD5, which I have implemented in
> XML::SemanticDiff; patch below.
>
> I don't think Test::XML or XML::SemanticDiff knows about encoding="UTF-8".

I'm the Test::XML author -- though I haven't done anything with it in a
couple of years...

There is no specific UTF-8 support in Test::XML. Adding it there would
not really be appropriate, as Test::XML just passes the data on to the
lower layers.

The patch to XML::SemanticDiff looks like it will work correctly (it
converts from characters to bytes at the correct place, just before the
data is handed to MD5). I wonder how to go about updating it on CPAN?
Aha, I've just seen Robin's RT.
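
To illustrate what "convert from characters to bytes" means here, a
minimal sketch follows. This is not the actual XML::SemanticDiff patch;
the helper name md5_of_text and the sample string are made up for
illustration. It assumes only Digest::MD5's md5_hex and Encode's
encode_utf8.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper (not part of XML::SemanticDiff): MD5 is defined
# over bytes, so encode the character string to UTF-8 bytes first.
sub md5_of_text {
    my ($text) = @_;
    return md5_hex(encode_utf8($text));
}

# Sample string containing U+20AC EURO SIGN, i.e. an ordinal above 255.
my $text = "price: \x{20AC}10";

# Feeding the character string straight to md5_hex is expected to croak,
# so wrap the call in eval and record whether it succeeded.
my $ok = eval { md5_hex($text); 1 };
print "raw string: ", ($ok ? "hashed" : "croaked"), "\n";

# The encoded form is plain bytes, so this call is safe.
print "encoded:    ", md5_of_text($text), "\n";
```

For pure ASCII input the encoding step changes nothing, so existing
digests of ASCII-only documents should stay the same.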

You also asked about UTF-8 aware editors. Most recent versions of vim
should support UTF-8 fully. Emacs 21 has _some_ UTF-8 support, but it's
not as good as it could be.

-Dom