Should UTF-8 be a swear word ?

Tatsuhiko Miyagawa miyagawa at
Wed Aug 9 03:10:55 BST 2006

On 8/8/06, Thomas Busch <tbusch at> wrote:
> my $string = "cl\xe9ment";
> utf8::upgrade($string);

1) utf8::upgrade only changes how the string is stored internally; it
*doesn't* change the characters in it. Your string still holds the
single character U+00E9, not the two bytes \xc3\xa9. In practice Perl
stores a string in utf-8 internally if it contains characters above 255
(or has been upgraded) and in latin-1 otherwise. Either way, you
shouldn't rely on the internal encoding.
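
A small sketch of point 1, assuming Perl 5.8 or later. It shows that
upgrading flips the internal flag but leaves the characters alone, so
the byte pair is still not found. The variable names are only for
illustration.

```perl
use strict;
use warnings;

my $before = "cl\xe9ment";       # one character U+00E9, stored as latin-1
my $after  = $before;            # copy, so the original stays untouched
utf8::upgrade($after);           # switch the internal storage to utf-8

# The logical string is unchanged: same length, same characters.
my $same_length = length($before) == length($after) ? 1 : 0;
my $same_string = $before eq $after ? 1 : 0;

# Only the internal flag differs (a debugging aid, not for program logic).
my $flag_before = utf8::is_utf8($before) ? 1 : 0;
my $flag_after  = utf8::is_utf8($after)  ? 1 : 0;

# The pattern works on characters, so the byte pair is not found.
my $byte_match = $after =~ /\xc3\xa9/ ? 1 : 0;
my $char_match = $after =~ /\xe9/     ? 1 : 0;

print "same length: $same_length, same string: $same_string\n";
print "flag before: $flag_before, after: $flag_after\n";
print "byte match: $byte_match, char match: $char_match\n";
```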

2) \x{c3a9} actually refers to the Unicode character U+C3A9, not the
utf-8 bytes \xc3\xa9.
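
To see the difference in point 2, here is a minimal sketch (variable
names are illustrative) that compares the two escapes by length and by
first code point:

```perl
use strict;
use warnings;

my $one_char  = "\x{c3a9}";    # a single character, code point U+C3A9
my $two_bytes = "\xc3\xa9";    # two characters: U+00C3 followed by U+00A9

# Report how many characters each string holds and what the first one is.
printf "%d character(s), first is U+%04X\n", length($one_char),  ord($one_char);
printf "%d character(s), first is U+%04X\n", length($two_bytes), ord($two_bytes);
```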

That said, try this instead:

  my $string = "cl\x{e9}ment";

  if ($string =~ /\xc3\xa9/) {
      print "match \\xc3\\xa9\n";
  }
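
If the goal is to match the utf-8 bytes themselves, one option is to
encode the string explicitly and match against the result. A minimal
sketch, assuming the core Encode module is available; $bytes is an
illustrative name, not something from the quoted code:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $string = "cl\x{e9}ment";          # character string containing U+00E9
my $bytes  = encode_utf8($string);    # explicit utf-8 byte string

# Match the character in the character string.
if ($string =~ /\x{e9}/) {
    print "character string matches \\x{e9}\n";
}

# Match the two utf-8 bytes in the encoded byte string.
if ($bytes =~ /\xc3\xa9/) {
    print "utf-8 bytes match \\xc3\\xa9\n";
}
```

Keeping the character string and the encoded bytes in separate
variables makes it clear which one each pattern is applied to.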

Tatsuhiko Miyagawa
