Encode::Mangled?

Richard Huxton dev at archonet.com
Fri May 29 11:30:42 BST 2009


Robin Berjon wrote:
> If the page claims to be in ISO-8859-15 then the chances are that 
> whoever is sending it to you knows what they're doing, and you can just 
> use the real thing.
> 
> Or am I missing something?

That's exactly it. The pages in question are claiming 8859-1, but 
they're not (well, not wholly). Presumably someone pastes content in 
from an MS-Word document and it contains bullet-points and the like as 
win-1252 bytes in the 0x80-0x9F range, which 8859-1 reserves for 
control characters. Your web-browser copes fine, of course.

Now I could just convert from win-1252 every time the page claims 
8859-1. That's not going to work for 8859-15, though, where you might 
have a Euro char on the page that's at a different code-point in 
win-1252 (0xA4 in 8859-15, 0x80 in win-1252).
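
To make the clash concrete, here is a minimal sketch using the core
Encode module. The helper name codepoint_of is made up for this
illustration; it decodes a single byte under a named charset and
returns the resulting Unicode code-point.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode one byte under the given charset and
# return the Unicode code-point it maps to.
sub codepoint_of {
    my ($charset, $byte) = @_;
    return ord decode($charset, $byte);
}

# Byte 0xA4 is the Euro sign in 8859-15 but the generic currency
# sign in win-1252, where the Euro sits at 0x80 instead.
printf "0xA4 as iso-8859-15: U+%04X\n", codepoint_of('iso-8859-15', "\xA4");
printf "0xA4 as cp1252:      U+%04X\n", codepoint_of('cp1252',      "\xA4");
printf "0x80 as cp1252:      U+%04X\n", codepoint_of('cp1252',      "\x80");

# The MS-Word bullet: printable in win-1252, a C1 control in 8859-1.
printf "0x95 as cp1252:      U+%04X\n", codepoint_of('cp1252',     "\x95");
printf "0x95 as iso-8859-1:  U+%04X\n", codepoint_of('iso-8859-1', "\x95");
```

So decoding an 8859-15 page as win-1252 would turn every Euro sign
into U+00A4, which is why a blanket "treat it all as win-1252" rule
only suits pages that claim 8859-1.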

So - what I've got at the moment is an ugly* tr/// to map the 20 or so 
chars in question. However, Dave H's suggestion looks like it might do 
the trick in a more transparent way.

* It's not the tr/// that's the problem, it's the fact that you need 
eight lines of documentation to explain it, and if I've got a typo 
somewhere in the hex-codes I'll probably never notice, which means 
writing test cases which means...
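
One way to sidestep the hand-typed hex codes, sketched here as an
assumption rather than as the tr/// above or Dave H's suggestion, is to
build the mapping from Encode's own cp1252 table. The page is decoded
with whatever charset it claims, and only the C1 control characters
(U+0080 to U+009F) that result are remapped. The names %c1_fix and
decode_lenient are invented for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Build the 0x80-0x9F remapping from Encode's cp1252 table instead of
# typing it by hand. Bytes that cp1252 leaves undefined come back
# either as U+FFFD or unchanged, depending on the table; skip both.
my %c1_fix;
for my $n (0x80 .. 0x9F) {
    my $byte = chr $n;
    my $char = decode('cp1252', $byte);
    next if $char eq "\x{FFFD}" or ord($char) == $n;
    $c1_fix{$byte} = $char;
}

# Hypothetical helper: decode with the claimed charset, then replace
# any C1 control character with its win-1252 interpretation.
sub decode_lenient {
    my ($octets, $charset) = @_;
    my $text = decode($charset, $octets);
    $text =~ s/([\x{80}-\x{9F}])/exists $c1_fix{$1} ? $c1_fix{$1} : $1/ge;
    return $text;
}

# A stray win-1252 bullet on a page claiming 8859-1 becomes U+2022,
# while a genuine 8859-15 Euro at 0xA4 is left as the Euro.
my $bullet = decode_lenient("item \x95 one", 'iso-8859-1');
my $euro   = decode_lenient("\xA4 5",        'iso-8859-15');
printf "U+%04X U+%04X\n", ord(substr $bullet, 5, 1), ord(substr $euro, 0, 1);
```

This relies on 8859-1 and 8859-15 both decoding bytes 0x80-0x9F to the
C1 controls, which have no business in web text, so remapping them
should not disturb legitimate content in either charset.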

-- 
   Richard Huxton
   Archonet Ltd

