UTF-8 + HTML::Template + CGI::Fast
Dirk Koopman
djk at tobit.co.uk
Fri Dec 4 14:48:38 GMT 2009
James Laver wrote:
> This is one of the fun things about character sets. There are three
> ways to determine character set:
>
<snip>
> 3. Checking if it looks like a given character set (very lossy). E.g.
> the is_utf8() function only checks if it *could* be utf-8. If you pass
> it ASCII text, it'll pass. Subsets of some other character sets will
> also pass. There are no guarantees, just percentage chances. Not
> exactly the world's best fallback.
>
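To make point 3 concrete, here is a minimal Perl sketch of a "could this be UTF-8?" check. It assumes the core Encode module and does a strict decode with FB_CROAK. `could_be_utf8` is a hypothetical helper name, not a library function, and this is a validity check, which is not what Perl's is_utf8 does. Plain ASCII passes, and so does the ISO-8859-1 text "Ã©", because its bytes (0xC3 0xA9) happen to be valid UTF-8 for "é".

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if the byte string decodes cleanly as UTF-8.
# This only says the bytes *could* be UTF-8, never that they are.
sub could_be_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK flag may consume its input, so use a copy
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

my @samples = (
    [ 'ASCII hello'             => "hello" ],         # passes
    [ 'UTF-8 cafe (e-acute)'    => "caf\xc3\xa9" ],   # passes
    [ 'Latin-1 cafe (e-acute)'  => "caf\xe9" ],       # fails: truncated multi-byte sequence
    [ 'Latin-1 A-tilde + (c)'   => "\xc3\xa9" ],      # passes, although it is Latin-1 text
);

for my $sample (@samples) {
    my ($label, $bytes) = @$sample;
    printf "%-24s %s\n", $label,
        could_be_utf8($bytes) ? 'could be UTF-8' : 'not valid UTF-8';
}
```

The last sample is the lossy case: the same two bytes are a perfectly good Latin-1 string and a perfectly good UTF-8 string, so no check on the bytes alone can tell you which one was meant.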
When I asked a related question on this list and then read the docs with
more educated eyes, I got the impression that the is_utf8 function
merely tells you whether the string is stored in Perl's internal utf8
format, which has nothing to do with the encoding the string arrived in.
It is very confusing.
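A minimal Perl sketch of that distinction, assuming only the core Encode module (utf8::is_utf8 and utf8::upgrade are built in; no `use utf8` is needed). The flag is off for valid UTF-8 bytes that have not been decoded, on after decoding, and also on for an ISO-8859-1 string that has merely been upgraded. So the flag reflects how Perl stores the string, not where it came from.

```perl
use strict;
use warnings;
use Encode qw(decode);

# utf8::is_utf8 reports Perl's internal UTF8 flag on a scalar,
# i.e. how the string is stored, not what encoding the input arrived in.
my $ascii = "hello";                    # byte string, flag off
my $bytes = "caf\xc3\xa9";              # valid UTF-8 bytes, but still an undecoded byte string
my $chars = decode('UTF-8', $bytes);    # decoded: 4 characters, flag on
my $latin = "caf\xe9";                  # ISO-8859-1 bytes for the same word
utf8::upgrade($latin);                  # same 4 characters, now stored internally as UTF-8

for my $pair ([ascii => $ascii], [bytes => $bytes], [chars => $chars], [latin => $latin]) {
    my ($name, $value) = @$pair;
    printf "%-5s length=%d is_utf8=%d\n",
        $name, length($value), utf8::is_utf8($value) ? 1 : 0;
}
# ascii length=5 is_utf8=0
# bytes length=5 is_utf8=0
# chars length=4 is_utf8=1
# latin length=4 is_utf8=1
```

Note that $chars and $latin compare equal even though one came from UTF-8 input and the other from ISO-8859-1 input, which is exactly why the flag cannot tell you the original encoding.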
Because I have mixed input coming into my app, and I can't tell reliably
enough (for my purposes) what encoding it is in (it could be any of the
ISO-8859 variants or utf8), I don't bother with any of it and have removed
all attempts to decode it. I just treat it all as strings. As it is a
message switch, the encoding becomes SEP or a UAP.
Dirk
More information about the london.pm mailing list