Test::XML not working with UTF-8

Robin Barker Robin.Barker at npl.co.uk
Thu Feb 1 13:29:42 GMT 2007


Hi

> Anyway, getting to the point, I wonder if anyone has any ideas
> why Test::XML fails to recognize UTF-8 characters, or can think
> of an alternative I might use if Test::XML is no good for UTF-8.

Test::XML uses XML::SemanticDiff, which uses Digest::MD5.  The
Digest::MD5 documentation says:

     Perl 5.8 support Unicode characters in strings.  Since the
     MD5 algorithm is only defined for strings of bytes, it can
     not be used on strings that contains chars with ordinal
     number above 255.  The MD5 functions and methods will croak
     if you try to feed them such input data.

There is a workaround in the Digest::MD5 documentation (encode the
string as UTF-8 before digesting it), which I have implemented in
XML::SemanticDiff; patch below.
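
As a minimal sketch of the failure and the workaround, assuming Perl 5.8
or later with Digest::MD5 and Encode available: the variable names here
are illustrative only and are not taken from XML::SemanticDiff.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);
use Encode qw(encode_utf8);

# A character string containing U+263A, whose ordinal is above 255.
my $text = "smile \x{263A}";

# Passing the character string straight to md5_base64 croaks,
# so trap the exception with eval.
my $direct = eval { md5_base64($text) };
my $error  = $@;
print "direct call failed: $error" unless defined $direct;

# Encoding to UTF-8 first hands MD5 a string of bytes it can digest.
my $checksum = md5_base64(encode_utf8($text));
print "checksum of UTF-8 bytes: $checksum\n";
```

The checksum is then of the UTF-8 byte representation, so both
documents being compared need to go through the same encoding step.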

I don't think Test::XML or XML::SemanticDiff know about encoding="UTF-8".


> P.S. Is there a difference between 'use utf8' and 'use encoding utf8'?
> One of my colleagues reckons they are equivalent.

Unless your Perl script file is itself UTF-8 encoded, you don't need either.
Yours is all ASCII: \x{263A} is just 8 ASCII characters.
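
A small sketch of that point, with illustrative variable names: the
escape as typed in the source file is plain ASCII, and Perl turns it
into a single character when it parses the double-quoted string, with
no 'use utf8' needed.

```perl
use strict;
use warnings;

# Single quotes keep the escape exactly as it is typed in the file.
my $escape_as_typed = '\x{263A}';

# Double quotes make Perl interpret the escape as one character.
my $smiley = "\x{263A}";

printf "source text: %d ASCII characters\n", length($escape_as_typed);
printf "parsed string: %d character, ordinal 0x%04X\n",
    length($smiley), ord($smiley);
```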

Robin

--- XML/SemanticDiff.pm.orig    Tue Apr  9 09:57:59 2002
+++ XML/SemanticDiff.pm Thu Feb  1 13:19:17 2007
@@ -136,6 +136,7 @@
 package PathFinder;
 use strict;
 use Digest::MD5  qw(md5_base64);   
+use Encode qw(encode_utf8);
 my $descendents = {};
 my $position_index = {};
 my $char_accumulator = {};
@@ -190,7 +191,7 @@
 #    $ctx->add("$text");
 #    $doc->{"$test_context"}->{TextChecksum} = $ctx->b64digest;

-    $doc->{"$test_context"}->{TextChecksum} = md5_base64("$text");
+    $doc->{"$test_context"}->{TextChecksum} = md5_base64(encode_utf8("$text"));
     if ($opts->{keepdata}) {
         $doc->{"$test_context"}->{CData} = $text;
     }
End of patch



