Test::XML not working with UTF-8

John Ramsden-Developer John.Ramsden2 at bbc.co.uk
Thu Feb 1 11:39:21 GMT 2007

As an IT contractor (hi all - just joined the list), I've been tasked
with making any amendments and tests needed to ensure that a large
web application of my client works reliably with UTF-8.

This has turned out, not surprisingly, to be the Perl equivalent of
climbing the north face of the Eiger with a hundredweight of pots
and pans tied round my waist!

(Not least because none of the editors on the client's Solaris
system is UTF-8 aware - I've been using 'od -xc', the hex dumper,
and recommended they purchase SlickEdit, but any other suggestions
are welcome. Are there UTF-8 compliant versions of vim or emacs
for example?)

Anyway, getting to the point, I wonder if anyone has any ideas
why Test::XML fails to recognize UTF-8 characters, or can think
of an alternative I might use if Test::XML is no good for UTF-8.

The following script fails with an error when $smiley is set to
the Unicode smiley (U+263A), but passes when it is set to ';-)'.

    use strict;
    use warnings;

    use utf8;
    use encoding 'utf8';      # may be the same as 'use utf8' ?!

    my $smiley = "\x{263A}";  # test works fine if smiley is ';-)'

    use Test::XML tests => 1;

    my $xml_found = '<?xml version="0.1234" encoding="UTF-8"?>'
                  . '<hack>smiley ' . $smiley . ' </hack>';

    my $xml_expected = $xml_found;

    if (! Test::XML::is_xml($xml_found, $xml_expected)) {
        print "Test::XML::is_xml() returned false!\n";
    }
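
In case it helps anyone reproduce this without a UTF-8 aware editor, here
is a minimal sketch of checking what the string actually holds, as a
stand-in for 'od -xc'. It is not part of the failing test; it assumes the
core Encode module is available, and the names $bytes and $hex are only
for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{263A}" is one character (WHITE SMILING FACE), not a byte sequence.
my $smiley = "\x{263A}";

# Encode explicitly to get the UTF-8 octets for that character.
my $bytes = encode('UTF-8', $smiley);

# Hex-dump the octets, two hex digits per byte.
my $hex = join ' ', unpack '(H2)*', $bytes;

printf "characters: %d\n", length $smiley;
printf "octets:     %d (%s)\n", length $bytes, $hex;
```

If the character count and the octet count differ, the string is being
held as characters and only becomes UTF-8 bytes once it is encoded.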


John R Ramsden

P.S. Is there a difference between 'use utf8' and "use encoding 'utf8'"?
One of my colleagues reckons they are equivalent.

