Encode::Mangled?
Richard Huxton
dev at archonet.com
Fri May 29 11:30:42 BST 2009
Robin Berjon wrote:
> If the page claims to be in ISO-8859-15 then the chances are that
> whoever is sending it to you knows what they're doing, and you can just
> use the real thing.
>
> Or am I missing something?
That's exactly it. The pages in question are claiming 8859-1, but
they're not (well, not wholly). Presumably someone pastes content in
from an MS-Word document and it contains bullet-points with invalid
code-points. Your web-browser copes fine, of course.
Now I could just convert from win-1252 every time the page claims
8859-1. That's not going to work for 8859-15, where you might have a Euro
char on the page that sits at a different code-point in win-1252 (0xA4 in
8859-15, 0x80 in win-1252).
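
To make the clash concrete, here is a small sketch (in Python purely for illustration, not the code under discussion) that reads the same single byte under each of the three charsets. The helper name `readings` is made up for this example; the codec names are Python's standard ones.

```python
# Illustrative only: how one byte is interpreted under each charset.
CHARSETS = ("iso-8859-1", "iso-8859-15", "cp1252")

def readings(raw: bytes) -> dict:
    """Return {charset: decoded text} for the given bytes."""
    return {name: raw.decode(name) for name in CHARSETS}

if __name__ == "__main__":
    # 0xA4 is the Euro sign in 8859-15; 0x95 is the win-1252 bullet.
    for label, raw in (("0xA4", b"\xa4"), ("0x95", b"\x95")):
        for name, text in readings(raw).items():
            print(label, name, ascii(text))
```

Byte 0xA4 comes out as the Euro sign only under 8859-15, and byte 0x95 comes out as a bullet only under win-1252, so neither charset alone gets a mixed page right.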
So - what I've got at the moment is an ugly* tr/// to map the 20 or so
chars in question. However, Dave H's suggestion looks like it might do
the trick in a more transparent way.
* It's not the tr/// that's the problem, it's the fact that you need
eight lines of documentation to explain it, and if I've got a typo
somewhere in the hex-codes I'll probably never notice, which means
writing test cases which means...
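
One way round the typo worry is to generate the mapping from the win-1252 codec instead of typing the hex codes by hand. The following is only a sketch of that idea, in Python for illustration; it is not the tr/// above, nor Dave H's suggestion. The names `build_c1_table`, `C1_TABLE` and `decode_lenient` are hypothetical. It assumes the stray bytes all fall in 0x80-0x9F, which the 8859 charsets decode to the C1 control characters U+0080-U+009F.

```python
def build_c1_table() -> dict:
    """Map U+0080..U+009F to the characters win-1252 assigns to those bytes."""
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            table[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
            # undefined; those are left untouched.
            continue
    return table

C1_TABLE = build_c1_table()

def decode_lenient(raw: bytes, declared: str) -> str:
    """Decode with the declared charset, then repair stray win-1252 bytes."""
    return raw.decode(declared).translate(C1_TABLE)
```

Because the page is decoded with whatever charset it declares first, a Euro at 0xA4 in an 8859-15 page survives, and only the C1 range is reinterpreted as win-1252.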
--
Richard Huxton
Archonet Ltd