Should UTF-8 be a swear word ?

Thomas Busch tbusch at
Tue Aug 8 14:27:54 BST 2006

Hi all,

Maybe someone can help with the following UTF-8 issue,
which has left a few Perl engineers angry and frustrated.
As a matter of fact, in my office UTF-8 is currently a
swear word.

I'm using Perl 5.8.6 and for some strange reason the
following program:


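use strict;

my $string = "cl\xe9ment";

# Is the internal UTF8 flag set on the string?
if (utf8::is_utf8($string)) {
  print "is utf8\n";
}

# Is the internal representation well-formed?
if (utf8::valid($string)) {
  print "is valid utf8\n";
}

# Does the string contain the character with code point 0xE9?
if ($string =~ m/\xe9/) {
  print "match \\xE9\n";
}

# Does the string contain the character with code point 0xC3A9?
if ($string =~ m/\x{c3a9}/) {
  print "match \\xC3A9\n";
}

prints
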
is utf8
is valid utf8
match \xE9

instead of

is utf8
is valid utf8
match \xC3A9

Is this a bug? Why is the Latin letter e with acute not
getting upgraded to UTF-8?
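
To make the question concrete, here is a minimal sketch that dumps the
string both as characters and as UTF-8 encoded bytes. It rests on
assumptions: the utf8::upgrade call is assumed to be how the UTF8 flag
gets set, and hex_dump is a hypothetical helper written only for this
illustration. It is meant to show what each regex is compared against,
not to settle how the original program behaves.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): hex value of each element
# of the string, separated by spaces.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

my $string = "cl\xe9ment";

# Assumption: the string is upgraded, which sets the UTF8 flag.
utf8::upgrade($string);

# Character view: one element per character of the string.
my $chars = hex_dump($string);

# Byte view: encode a copy in place into UTF-8 octets, then dump those.
my $copy = $string;
utf8::encode($copy);
my $bytes = hex_dump($copy);

print "length:     ", length($string), "\n";
print "characters: $chars\n";
print "bytes:      $bytes\n";

# \xe9 is the single character with code point 0xE9;
# \x{c3a9} is the single character with code point 0xC3A9.
my $matches_e9   = $string =~ /\xe9/     ? 1 : 0;
my $matches_c3a9 = $string =~ /\x{c3a9}/ ? 1 : 0;
print "matches \\xe9: $matches_e9, matches \\x{c3a9}: $matches_c3a9\n";
```

Under these assumptions the character view is 63 6C E9 6D 65 6E 74 and
the byte view is 63 6C C3 A9 6D 65 6E 74. The regex engine works on
the character view, where \x{c3a9} would have to be one character with
code point 0xC3A9 rather than the byte pair C3 A9.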

