Should UTF-8 be a swear word ?

Paul Makepeace paulm at paulm.com
Tue Aug 8 20:57:27 BST 2006


Thomas,

Take a look at the Encode module, in particular encode(), decode(),
and the notes on the CHECK parameter for your question about
downgrading to Latin-1. That ought to answer pretty much everything
you ask here.
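
For question (a), something along these lines might do it. This is only
a rough sketch: latin1_or_utf8() is a made-up helper name, not part of
Encode, and it assumes the input is already a decoded character string.
It tries a strict Latin-1 encode with CHECK set to FB_CROAK and falls
back to UTF-8 if any character does not fit.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Encode): returns the encoding name and
# the encoded bytes. Latin-1 is used when every character fits, otherwise
# UTF-8.
sub latin1_or_utf8 {
    my ($chars) = @_;

    # encode() with a CHECK value may modify its source argument,
    # so work on a copy.
    my $copy  = $chars;
    my $bytes = eval { encode('ISO-8859-1', $copy, Encode::FB_CROAK()) };

    return ('ISO-8859-1', $bytes) if defined $bytes;
    return ('UTF-8', encode('UTF-8', $chars));
}

# UTF-8 bytes for "cafe" with e-acute, decoded into a character string.
my $text = decode('UTF-8', "caf\xC3\xA9");

my ($enc1, $out1) = latin1_or_utf8($text);        # fits in Latin-1
my ($enc2, $out2) = latin1_or_utf8("\x{263A}");   # U+263A does not fit

printf "%s: %d byte(s)\n", $enc1, length $out1;
printf "%s: %d byte(s)\n", $enc2, length $out2;
```

Whether FB_CROAK is the right CHECK value depends on what you want to
happen to characters that do not map; the Encode docs list the other
fallback options.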

I also found this page pretty useful,
http://www.ahinea.com/en/tech/perl-unicode-struggle.html

HTH - Paul

On 8/8/06, Thomas Busch <tbusch at cpan.org> wrote:
> Hi Nicolas,
>
> I get it now. Can you confirm the following:
>
> 1) $string =~ m/\w/ will match any European accented character,
>    including the German sz (also called scharfes S), if $string
>    has the UTF8 flag on.
>
> 2) \xE9 actually means U+00E9. What I mean by this is that
>    \x{...} refers to a Unicode code point and not to UTF-8 bytes.
>
> 3) no matter how $string is encoded, binmode STDOUT, ":utf8"
>    will force print "..." to always output in UTF-8. There will
>    be no double encoding.
>
> Also this triggers two new questions.
>
> a) Is there an efficient way to tell Perl, "please downgrade
>    this string to Latin-1 if possible, otherwise leave it in UTF-8"?
>
> b) What happens in the case of $s1 =~ m/$s2/ if $s2 has the
>    UTF8 flag on and $s1 hasn't? Does this work as expected?
>
> Thomas.
