Should UTF-8 be a swear word?

Paul Makepeace paulm at
Tue Aug 8 20:57:27 BST 2006


Take a look at the Encode module, in particular encode(), decode(),
and the notes on the CHECK parameter for your question about
converting to Latin-1. That ought to answer pretty much everything
you ask here.
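Here is a rough sketch of the CHECK approach: tell encode() to croak on an unmappable character, catch the error, and fall back to keeping the string as it is. The helper name `to_latin1_or_undef` is hypothetical. The sketch assumes the input is already a decoded character string and uses the `Encode::FB_CROAK` and `Encode::LEAVE_SRC` check flags described in the Encode documentation.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the Latin-1 encoding of a character
# string, or undef if some character has no Latin-1 equivalent.
sub to_latin1_or_undef {
    my ($chars) = @_;
    # FB_CROAK makes encode() die on the first unmappable character;
    # LEAVE_SRC stops encode() from modifying its input in place.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $bytes = eval { encode('iso-8859-1', $chars, $check) };
    return $bytes;    # undef if eval caught an encoding error
}

# "caf\x{E9}" fits in Latin-1; U+20AC EURO SIGN does not.
for my $s ("caf\x{E9}", "5 \x{20AC}") {
    my $latin1 = to_latin1_or_undef($s);
    print defined $latin1 ? "Latin-1 bytes\n" : "keep as UTF-8\n";
}
```

This returns a separate byte string and leaves the caller's string untouched, so the caller can simply keep the original whenever the helper returns undef.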

I also found this page pretty useful,

HTH - Paul

On 8/8/06, Thomas Busch <tbusch at> wrote:
> Hi Nicolas,
> I get it now. Can you confirm the following:
> 1) $string =~ m/\w/ will match any European accented character,
>    including the German sz (also called scharfes S), if $string
>    has the UTF8 flag on.
> 2) \xE9 actually means U+00E9. What I mean by this is that
>    \x{...} refers to Unicode code point notation and not to UTF-8.
> 3) no matter how $string is encoded, binmode STDOUT, ":utf8"
>    will force print "..." to always output in UTF-8. There will
>    be no double encoding.
> This also raises two new questions.
> a) Is there an efficient way to tell perl, "please downgrade
>    this string to Latin-1 if possible, otherwise leave it in UTF-8"?
> b) What happens in the case of $s1 =~ m/$s2/ if $s2 has the
>    UTF8 flag on and $s1 doesn't? Does this work as expected?
> Thomas.
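The points in the quoted message can be checked with a short script like the one below. It is a sketch that assumes perl 5.8 or later running with the default regex rules, that is, without `use locale` or `use feature 'unicode_strings'`, since both change how `\w` treats a non-ASCII string whose UTF8 flag is off. It also shows that point 3 holds only for character strings: bytes that are already UTF-8 get encoded a second time by the `:utf8` layer. For question a), it uses the core `utf8::downgrade` function with its optional FAIL_OK argument.

```perl
use strict;
use warnings;
use Encode qw(encode);

# 2) "\xE9" is the code point U+00E9: one character, not UTF-8 bytes.
my $e_acute = "\xE9";
printf "length %d, ord U+%04X\n", length $e_acute, ord $e_acute;
my $utf8_bytes = encode('UTF-8', $e_acute);    # "\xC3\xA9", two bytes
printf "UTF-8 form has %d bytes\n", length $utf8_bytes;

# 1) Under the default rules, \w on a non-ASCII character depends on
#    whether the string's UTF8 flag is on.
my $flag_off = "\xDF";          # LATIN SMALL LETTER SHARP S, flag off
my $flag_on  = "\xDF";
utf8::upgrade($flag_on);        # same character, flag now on
print "flag off: ", ($flag_off =~ /\w/ ? "match" : "no match"), "\n";
print "flag on:  ", ($flag_on  =~ /\w/ ? "match" : "no match"), "\n";

# 3) A :utf8 layer encodes characters. Bytes that are already UTF-8
#    are treated as Latin-1 characters and encoded a second time.
open my $fh, '>', \my $buf or die "cannot open in-memory handle: $!";
binmode $fh, ':utf8';
print {$fh} $e_acute;           # writes C3 A9
print {$fh} $utf8_bytes;        # double-encoded: writes C3 83 C2 A9
close $fh;
print unpack('H*', $buf), "\n";

# a) utf8::downgrade with a true FAIL_OK argument converts in place
#    when every character fits in Latin-1, and returns false otherwise.
my $fits = "caf\x{E9}";
utf8::upgrade($fits);
my $wide = "5 \x{20AC}";
print "fits: ", (utf8::downgrade($fits, 1) ? "downgraded" : "kept UTF-8"), "\n";
print "wide: ", (utf8::downgrade($wide, 1) ? "downgraded" : "kept UTF-8"), "\n";

# b) Matching works on characters, so mixed flags are fine.
my $s1 = "caf\xE9";             # flag off
my $s2 = "\xE9";
utf8::upgrade($s2);             # flag on
print "mixed: ", ($s1 =~ /$s2/ ? "match" : "no match"), "\n";
```

The in-memory filehandle stands in for STDOUT so the bytes the `:utf8` layer produces can be inspected directly.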
