Should UTF-8 be a swear word?

Thomas Busch tbusch at cpan.org
Tue Aug 8 14:27:54 BST 2006


Hi all,

Maybe someone can help with the following UTF-8 issue,
which has left a few Perl engineers angry and frustrated.
As a matter of fact, in my office UTF-8 is currently a
swear word.

I'm using Perl 5.8.6 and, for some strange reason, the
following program:

#!/usr/bin/perl

use strict;

my $string = "cl\xe9ment";

utf8::upgrade($string);

if (utf8::is_utf8($string)) {
  print "is utf8\n";
}

if (utf8::valid($string)) {
  print "is valid utf8\n";
}

if ($string =~ m/\xe9/) {
  print "match \\xE9\n";
}

if ($string =~ m/\x{c3a9}/) {
  print "match \\xC3A9\n";
}

yields

is utf8
is valid utf8
match \xE9

instead of

is utf8
is valid utf8
match \xC3A9

Is this a bug? Why is the Latin letter e with acute not
getting upgraded to UTF-8?
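
For reference, here is a minimal sketch of one way to inspect what
the string holds after the upgrade. It assumes the relevant
distinction is between the characters of the string and the UTF-8
octets used to store them. The helper utf8_octets_hex is a
hypothetical name introduced for illustration, not part of any module;
utf8::upgrade and utf8::encode are the built-in utf8:: functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the program above): returns the UTF-8
# octets of a character string as an uppercase hex string.
sub utf8_octets_hex {
    my ($chars) = @_;
    my $octets = $chars;      # copy, because utf8::encode works in place
    utf8::encode($octets);    # characters -> UTF-8 octets
    return uc(unpack('H*', $octets));
}

my $string = "cl\xe9ment";
utf8::upgrade($string);       # changes internal storage, not the characters

# Character-level view: the length and the code point at index 2.
printf "length: %d\n", length($string);
printf "char at index 2: U+%04X\n", ord(substr($string, 2, 1));

# \x{c3a9} names the single code point U+C3A9, not the octet pair C3 A9.
printf "matches \\x{c3a9}: %s\n", ($string =~ /\x{c3a9}/ ? "yes" : "no");

# Octet-level view: only visible once the string is explicitly encoded.
printf "UTF-8 octets: %s\n", utf8_octets_hex($string);
```

Run as an ordinary script, the hex dump on the last line contains
C3A9 in the middle, while the test against \x{c3a9} reports no match.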

Thomas.



