ben at bpfh.net
Fri May 29 11:21:26 BST 2009
Richard Huxton wrote:
> I'm dealing with data from a web-page that claims to be ISO-8859-1 but
> actually has some Win-1252 embedded in it. I can convert it to UTF-8
> and all seems well, however the characters need mapping. It's
> straightforward enough to handle the dozen or so chars I know about
> but I can't believe there isn't something on cpan for this.
> Now the *correct* solution is to track down the people responsible for
> this travesty and beat them with sticks. Failing that, are people just
> rolling their own three-line function each time?
I've heard the standard management argument that "it'll take longer to
fix it upstream and cost more than working around it, and anyway the
broken data source will be going away real soon now..." more times than
I care to think about.
Not only has it never been correct, it has never been within an order of
magnitude of being correct. Sadly, the bleed and wastage that this kind
of idiocy incurs is not easily tracked separately - it just falls into
the noise of "general development entropy".
So push back hard, and get the damn thing fixed upstream, where it
should be done. If the managers ultimately refuse then use Dave's
solution and just aggressively trim errant crap out of the feed - and
include clear documentation as comments in your code as to what you're
doing and why - that way, if people whinge, you (or the next guy) know
where to point them.
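
For what it's worth, here is a rough sketch of what such a documented
workaround might look like. It is in Python purely for illustration, and
the name clean_feed is made up; it is not Dave's code, just one way of
doing it, assuming the feed arrives as raw bytes. The idea: decode as
ISO-8859-1 as the page claims, then treat anything that lands in the C1
control range (U+0080 to U+009F) as a stray Win-1252 byte, remapping it
where Win-1252 defines a character and trimming it where it doesn't.

```python
import re

# WORKAROUND: the upstream feed claims ISO-8859-1 but embeds Win-1252
# bytes (curly quotes, dashes, the euro sign). Decoded as ISO-8859-1,
# those bytes end up in the C1 control range U+0080..U+009F, which
# genuine ISO-8859-1 text should essentially never contain.
# Remove this once the feed is fixed upstream.
C1_RANGE = re.compile("[\u0080-\u009f]")


def _remap(match):
    """Reinterpret one C1 character as the Win-1252 byte it came from."""
    byte = match.group(0).encode("latin-1")
    try:
        return byte.decode("cp1252")
    except UnicodeDecodeError:
        # Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in
        # Win-1252, so there is nothing sensible to map them to: trim.
        return ""


def clean_feed(raw: bytes) -> str:
    """Decode a mislabelled feed and repair or trim the errant bytes."""
    text = raw.decode("latin-1")  # never fails: every byte is defined
    return C1_RANGE.sub(_remap, text)
```

Something like clean_feed(b"\x93quoted\x94") would then give you proper
curly quotes rather than control characters, and ordinary ISO-8859-1
text passes through untouched. The comment block at the top is the
important part: it says what is being papered over and why.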
More information about the london.pm