Should UTF-8 be a swear word ?
Tatsuhiko Miyagawa
miyagawa at gmail.com
Wed Aug 9 03:10:55 BST 2006
On 8/8/06, Thomas Busch <tbusch at cpan.org> wrote:
> my $string = "cl\xe9ment";
>
> utf8::upgrade($string);
1) utf8::upgrade doesn't change which characters the string holds. It
only changes how Perl stores them internally, so "\xe9" is still the
single character U+00E9, not the two bytes \xc3\xa9. In general Perl
stores a string as utf-8 internally once it contains characters above
255 (or has been upgraded) and as latin-1 otherwise. Anyway, you
shouldn't rely on the internal encoding.
2) \x{c3a9} actually refers to the Unicode character U+C3A9, not the utf-8 bytes \xc3\xa9.
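To make both points concrete, here is a minimal sketch. The variable
names are only for illustration, and it assumes a platform where the
native 8-bit encoding is latin-1 (i.e. not EBCDIC):

```perl
use strict;
use warnings;

# Point 1: upgrading changes the internal storage, not the characters.
my $string = "cl\xe9ment";
my $before = length $string;    # counted in characters
utf8::upgrade($string);
my $after  = length $string;    # same count after the upgrade

# The character U+00E9 is still there; the utf-8 byte pair is not visible.
my $still_e9   = ($string =~ /\xe9/)     ? 1 : 0;
my $sees_bytes = ($string =~ /\xc3\xa9/) ? 1 : 0;

# Point 2: \x{c3a9} is ONE character, \xc3\xa9 is TWO.
my $one = "\x{c3a9}";
my $two = "\xc3\xa9";

printf "length before/after upgrade: %d/%d\n", $before, $after;
printf "matches \\xe9: %d, matches \\xc3\\xa9: %d\n", $still_e9, $sees_bytes;
printf "\\x{c3a9}: %d char, code point U+%04X\n", length($one), ord($one);
printf "\\xc3\\xa9: %d chars\n", length($two);
```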
That said, try this instead:
my $string = "cl\x{e9}ment";    # character string containing U+00E9
utf8::encode($string);          # now utf-8 bytes: U+00E9 became \xc3\xa9
if ($string =~ /\xc3\xa9/) {
    print "match \\xc3\\xa9\n";
}
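
If you later need the character string back, utf8::decode should be
the inverse operation. A small sketch of the round trip (variable names
are just for illustration):

```perl
use strict;
use warnings;

my $string = "cl\x{e9}ment";
my $chars  = length $string;          # length in characters

utf8::encode($string);                # characters -> utf-8 bytes, in place
my $bytes  = length $string;          # length in bytes, one more than before

my $ok     = utf8::decode($string);   # utf-8 bytes -> characters, in place
my $back   = length $string;          # length in characters again

printf "%d chars -> %d bytes -> %d chars\n", $chars, $bytes, $back;
```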
--
Tatsuhiko Miyagawa