Should UTF-8 be a swear word ?
Paul Makepeace
paulm at paulm.com
Tue Aug 8 20:57:27 BST 2006
Thomas,
Take a look at the Encode module, in particular encode(), decode(),
and the notes on the CHECK parameter for your downgrade-to-Latin-1 question.
That ought to answer pretty much everything you ask here.
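
For question (a), something along these lines might do. It is only a rough
sketch: latin1_or_utf8 is a name made up for illustration, and it assumes
$text is already a decoded character string. With FB_CROAK as the CHECK
value, encode() dies on any character that has no Latin-1 equivalent, so
the eval lets us fall back to UTF-8. LEAVE_SRC stops encode() from
modifying its argument.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns (encoding name, encoded bytes).
# Latin-1 if every character fits, UTF-8 otherwise.
sub latin1_or_utf8 {
    my ($text) = @_;

    # FB_CROAK makes encode() die on unmappable characters;
    # LEAVE_SRC keeps $text untouched.
    my $bytes = eval { encode('ISO-8859-1', $text, FB_CROAK | LEAVE_SRC) };
    return ('ISO-8859-1', $bytes) if defined $bytes;

    # At least one character is outside Latin-1, so stay with UTF-8.
    return ('UTF-8', encode('UTF-8', $text));
}

for my $text ("caf\x{e9}", "caf\x{263A}") {
    my ($encoding, $bytes) = latin1_or_utf8($text);
    printf "%s, %d bytes\n", $encoding, length $bytes;
}
```

The caller has to keep track of which encoding came back, since the bytes
alone do not say.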
I also found this page pretty useful,
http://www.ahinea.com/en/tech/perl-unicode-struggle.html
HTH - Paul
On 8/8/06, Thomas Busch <tbusch at cpan.org> wrote:
> Hi Nicolas,
>
> I get it now. Can you confirm the following:
>
> 1) $string =~ m/\w/ will match any European accented character,
> including the German sz (also called scharfes S), if $string
> has the UTF8 flag on.
>
> 2) \xE9 actually means U+00E9. What I mean by this is that
> \x{...} refers to a Unicode code point and not to UTF-8 bytes.
>
> 3) no matter how $string is encoded, binmode STDOUT, ":utf8"
> will force print "..." to always output in UTF-8. There will
> be no double encoding.
>
> Also this triggers two new questions.
>
> a) Is there an efficient way to tell perl, "please downgrade
> this string to Latin-1 if possible, otherwise leave it in UTF-8"?
>
> b) What happens in the case of $s1 =~ m/$s2/ if $s2 has the
> UTF8 flag on and $s1 hasn't? Does this work as expected?
>
> Thomas.