Should UTF-8 be a swear word?
Thomas Busch
tbusch at cpan.org
Tue Aug 8 14:27:54 BST 2006
Hi all,
maybe someone can help with the following UTF-8 issue,
which has left a few Perl engineers angry and frustrated.
As a matter of fact, in my office UTF-8 is currently a
swear word.
I'm using Perl 5.8.6 and for some strange reason the
following program:
#!/usr/bin/perl
use strict;
my $string = "cl\xe9ment";
utf8::upgrade($string);
if (utf8::is_utf8($string)) {
    print "is utf8\n";
}
if (utf8::valid($string)) {
    print "is valid utf8\n";
}
if ($string =~ m/\xe9/) {
    print "match \\xE9\n";
}
if ($string =~ m/\x{c3a9}/) {
    print "match \\xC3A9\n";
}
yields
is utf8
is valid utf8
match \xE9
instead of
is utf8
is valid utf8
match \xC3A9
Is this a bug? Why is the Latin letter e with acute not
getting upgraded to UTF-8?
Thomas.
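A minimal sketch for narrowing this down, assuming the expectation is
that the UTF-8 encoded bytes C3 A9 should become visible to the regex.
It compares the string after utf8::upgrade (which changes only the
internal storage, not the logical characters) with a copy passed
through utf8::encode (which replaces the characters with their UTF-8
bytes). The helper hex_chars and the variable names are illustrative
only. Note also that \x{c3a9} in a pattern denotes the single code
point U+C3A9, not the two bytes C3 and A9.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the code points of a string as space-separated hex.
# (Hypothetical helper, not part of the original program.)
sub hex_chars {
    my ($s) = @_;
    return join ' ', map { sprintf '%X', ord } split //, $s;
}

my $string = "cl\xe9ment";

# upgrade() switches the internal storage to UTF-8 but leaves the
# logical characters alone: the third character is still U+00E9.
utf8::upgrade($string);
my $chars_after_upgrade = hex_chars($string);

# encode() replaces the characters with their UTF-8 encoded bytes,
# so U+00E9 becomes the two bytes C3 A9.
my $bytes = $string;
utf8::encode($bytes);
my $bytes_after_encode = hex_chars($bytes);

print "after upgrade: $chars_after_upgrade\n";
print "after encode:  $bytes_after_encode\n";
print "bytes match \\xC3\\xA9\n" if $bytes =~ m/\xc3\xa9/;
```

Run as written, this should print 63 6C E9 6D 65 6E 74 for the upgraded
string and 63 6C C3 A9 6D 65 6E 74 for the encoded copy, which is
consistent with the original program matching only \xE9.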
More information about the london.pm mailing list