Ben Evans ben at bpfh.net
Fri May 29 11:21:26 BST 2009

Richard Huxton wrote:
> I'm dealing with data from a web-page that claims to be ISO-8859-1 but 
> actually has some Win-1252 embedded in it. I can convert it to UTF-8 
> and all seems well, however the characters need mapping. It's 
> straightforward enough to handle the dozen or so chars I know about 
> but I can't believe there isn't something on cpan for this.
> Now the *correct* solution is to track down the people responsible for 
> this travesty and beat them with sticks. Failing that, are people just 
> rolling their own three-line function each time?


I've heard the standard management argument that "it'll take longer to 
fix it upstream and cost more than working around it, and anyway the 
broken data source will be going away real soon now..." more times than 
I care to think about.

Not only has it never been correct, it has never been within an order of 
magnitude of being correct. Sadly, the bleed and wastage that this kind 
of idiocy incurs are not easily tracked separately - they just fall 
into the noise of "general development entropy".

So push back hard, and get the damn thing fixed upstream, where it 
should be done. If the managers ultimately refuse then use Dave's 
solution and just aggressively trim errant crap out of the feed - and 
include clear documentation as comments in your code as to what you're 
doing and why - that way, if people whinge, you (or the next guy) know 
where to point them.
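
For what it's worth, here is a rough sketch of the "trim it and document 
why" fallback. It is in Python purely for illustration (the same idea 
carries over to Perl), and it is not a reconstruction of anyone's actual 
code from this thread. The function name clean_feed and the choice to drop 
the characters rather than map them are assumptions of mine. It relies on 
one fact: Win-1252 bytes in the 0x80-0x9F range, when decoded as 
ISO-8859-1, turn into C1 control characters (U+0080-U+009F), so those are 
what gets stripped.

```python
import re

# WORKAROUND (hypothetical example): the upstream feed is labelled
# ISO-8859-1 but contains Windows-1252 bytes in the 0x80-0x9F range
# (curly quotes, dashes and so on). Decoded as ISO-8859-1, those bytes
# become C1 control characters (U+0080-U+009F), which are never
# legitimate text in this feed. Until the source is fixed upstream we
# drop them. If anyone complains about missing quotes, point them here.
_C1_CONTROLS = re.compile(r"[\u0080-\u009f]")


def clean_feed(raw: bytes) -> str:
    """Decode the feed as ISO-8859-1 and strip stray C1 control characters."""
    # ISO-8859-1 decoding cannot fail: every byte maps to one code point.
    text = raw.decode("iso-8859-1")
    return _C1_CONTROLS.sub("", text)


if __name__ == "__main__":
    # 0x93 and 0x94 are Win-1252 curly quotes; 0xE9 is a genuine Latin-1 e-acute.
    print(clean_feed(b"caf\xe9 \x93quoted\x94"))
```

The comment block is the important part: it records what is being thrown 
away and why, so the workaround can be deleted once the feed is fixed.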

