utf8 oddness
Paul Makepeace
paulm at paulm.com
Wed Jun 10 16:55:06 BST 2009
Can someone explain this,
rpix:~$ perl -le 'print "À"'
À
(Great, my terminal works.)
rpix:~$ perl -le 'print ord("À")'
195
What does 195 refer to? 195 is \xC3 which is another character,
according to http://jeppesn.dk/utf-8.html (A~ versus A`)
rpix:~$ perl -le 'print chr(195)'
##
What's happening here?
rpix:~$ perl -le 'print "\xc3\x80"'
À
(So printing utf8 octets produces something reasonable.)
rpix:~$ perl -MEncode -le 'print decode("iso-8859-1", chr(195))'
##
What's this doing? Presumably chr(195) isn't \xC3 in Latin-1 so what is it?
rpix:~$ perl -MEncode -le '$a = chr(195); print decode("iso-8859-1", $a, Encode::FB_CROAK)'
##
Why no croaking?
rpix:~$ perl -MEncode=from_to -le '$a = chr(195); from_to($a, "iso-8859-1", "utf8", Encode::FB_CROAK); print $a'
Ã
rpix:~$
Ah, from_to works where decode didn't. But why? My understanding is
that from_to does the same thing except that it leaves the utf8 flag
off. Reassuringly at least, the character printed there IS Latin-1's
\xC3 (not the slightly different accent).
rpix:~$ perl -MEncode -le 'print Encode::is_utf8("À")'
How can this not be true?
rpix:~$ perl -MEncode -le 'print Encode::is_utf8("À", Encode::FB_CROAK)'
It's not utf8 but it's not croaking either, ...?
rpix:~$
Paul