Should UTF-8 be a swear word ?
Phil Pennock
phil.pennock at globnix.org
Wed Aug 9 14:28:05 BST 2006
On 2006-08-08 at 18:18 +0200, Thomas Busch wrote:
> 3) no matter how $string is encoded, binmode STDOUT, ":utf8"
> will force print "..." to always output in UTF-8. There will
> be no double encoding.
Not if { use encoding 'foo' } has been used. As well as declaring the
encoding the script source is written in, that pragma also pushes an
encoding layer onto the STDIN and STDOUT handles.
Tests below in a [ `locale charmap` = UTF-8 ] environment.
-----------------------------< cut here >-------------------------------
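#!/usr/bin/perl
use warnings;
use strict;
use Encode;
use encoding 'iso-8859-1'; # <---- the line under test
# Build the test string as iso-8859-1 octets.
my $string = encode("iso-8859-1", "cl\xe9ment");
# Ask for UTF-8 output, then show which layers STDOUT really has.
binmode(STDOUT, ":utf8");
print join(':', PerlIO::get_layers(STDOUT)) . "\n";
print "$string\n";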
-----------------------------< cut here >-------------------------------
Result is:
stdio:encoding(iso-8859-1):utf8
clXent
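modulo a substitute character where I've put 'X'. The later :utf8 does
not replace the iso-8859-1 layer, so what goes out is evidently still
Latin-1, and a lone 0xE9 byte is not valid UTF-8. Comment out the "use
encoding" line and this works properly.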
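A minimal sketch of the same layer stacking without the pragma, for
anyone who wants to look at the bytes rather than at a terminal. The
in-memory filehandle is a stand-in for STDOUT and the two binmode()
calls only approximate what the pragma plus the script do; both are my
assumptions, not part of the test above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: an in-memory handle stands in for STDOUT, so the bytes
# that would have reached the terminal can be inspected afterwards.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "open: $!";

# Approximation of the pragma's effect, followed by the script's binmode.
binmode($fh, ':encoding(iso-8859-1)');
binmode($fh, ':utf8');

my @layers = PerlIO::get_layers($fh);

print {$fh} "cl\xe9ment";
close($fh) or die "close: $!";

# Hex dump of what was actually written.
my $hex = join(' ', unpack('(H2)*', $buffer));

print join(':', @layers), "\n";
print "$hex\n";
```

If the e-acute comes out as the single byte e9 rather than the pair
c3 a9, the encoding layer is still the one doing the conversion.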
Same effect using "-C" instead of the explicit binmode(); you can't
declare the encoding that the script itself is written in without
messing up IO.
You do need either the -C or the binmode to get working output under
UTF-8.
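To illustrate why one of the two is needed, here is a rough sketch that
writes the same string through a handle with and without a :utf8 layer.
The helper name written_bytes is hypothetical, the in-memory handle is a
stand-in for STDOUT, and it assumes neither -C nor PERL_UNICODE is in
effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: print the test string through a fresh in-memory
# handle, optionally with one layer applied, and return a hex dump.
sub written_bytes {
    my ($layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "open: $!";
    binmode($fh, $layer) if defined $layer;
    print {$fh} "cl\xe9ment\n";
    close($fh) or die "close: $!";
    return join(' ', unpack('(H2)*', $buffer));
}

my $plain = written_bytes();         # no layer at all
my $utf8  = written_bytes(':utf8');  # explicit utf8 layer

print "no layer: $plain\n";
print ":utf8:    $utf8\n";
```

Without the layer the character 0xE9 is written as one byte, which a
UTF-8 terminal cannot display; with it, the two-byte UTF-8 form is
written.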
If you want to write your script in a non-ASCII encoding but use UTF-8
for stdio, using { use encoding 'whichever'; }, then make sure to also
put in { binmode STDOUT, ':raw'; } to clear the IO layers before you
push the utf8 layer on. Do the same for STDIN, etc. And remember that
the :raw will also undo any layers put in place by the -C interpreter
switch.
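A sketch of that :raw-then-:utf8 sequence, again on an in-memory handle
standing in for STDOUT. The first binmode() is my approximation of the
layer the pragma would have pushed; the pragma itself is left out so
the layer handling can be seen in isolation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: in-memory handle as a stand-in for STDOUT.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "open: $!";

# Approximation of the layer left behind by the pragma.
binmode($fh, ':encoding(iso-8859-1)');

# Clear the transforming layers first, then ask for UTF-8.
binmode($fh, ':raw');
binmode($fh, ':utf8');

my @layers = PerlIO::get_layers($fh);

print {$fh} "cl\xe9ment";
close($fh) or die "close: $!";

# Hex dump of what was actually written.
my $hex = join(' ', unpack('(H2)*', $buffer));

print join(':', @layers), "\n";
print "$hex\n";
```

The layer list should no longer mention an encoding(...) layer, and the
e-acute should come out as the two bytes c3 a9.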
All to the best of my understanding, which will probably now be ripped
to shreds.
--
VISTA: Viruses, Infections, Spyware, Trojans & Adware