utf8 oddness

Paul Makepeace paulm at paulm.com
Wed Jun 10 16:55:06 BST 2009


Can someone explain this?

rpix:~$ perl -le 'print "À"'
À

(Great, my terminal works.)

rpix:~$ perl -le 'print ord("À")'
195

What does 195 refer to? 195 is \xC3, which is a different character
according to http://jeppesn.dk/utf-8.html (Ã, A with tilde, rather
than À, A with grave).

rpix:~$ perl -le 'print chr(195)'
##

What's happening here?

rpix:~$ perl -le 'print "\xc3\x80"'
À

(So printing utf8 octets produces something reasonable.)

rpix:~$ perl -MEncode -le 'print decode("iso-8859-1", chr(195))'
##

What's this doing? Presumably chr(195) isn't \xC3 in Latin-1 so what is it?

rpix:~$ perl -MEncode -le '$a = chr(195); print decode("iso-8859-1",
$a, Encode::FB_CROAK)'
##

Why no croaking?

rpix:~$ perl -MEncode=from_to -le '$a = chr(195); from_to($a,
"iso-8859-1", "utf8", Encode::FB_CROAK); print $a'
Ã
rpix:~$

Ah, from_to works where decode didn't. But why? My understanding is
that from_to does the same thing except that it leaves the utf8 flag
off. Reassuringly, at least, the character printed there IS Latin-1's
\xC3 (A with tilde, not the slightly different grave accent).

rpix:~$ perl -MEncode -le 'print Encode::is_utf8("À")'

How can this not be true?

rpix:~$ perl -MEncode -le 'print Encode::is_utf8("À", Encode::FB_CROAK)'

It's not utf8 but it's not croaking either, ...?

rpix:~$


Paul


