Character set sucktitude
Aaron Crane
perl at aaroncrane.co.uk
Tue May 22 01:00:13 BST 2007
Randy J. Ray writes:
> David Cantrell writes:
> > I have some data that is unfortunately in "Western (Mac OS Roman)",
> > whatever the fuck that is. I need to turn it into ISO-8859-1.
>
> http://search.cpan.org/~dankogai/Encode-2.21/
>
> I don't know for certain that it covers "Western (Mac OS Roman)", but I
> would be surprised if it didn't.

It does; the name is 'MacRoman'.

    $ perl -MEncode -le'print for grep { /mac.*rom/i } Encode->encodings(":all")'
    MacCentralEurRoman
    MacRoman
    MacRomanian

And one of Encode's allowable names for ISO-8859-1 is "latin-1".
Encode should work -- subject to the characters in your MacRoman data
actually being present in Latin-1, that is. By my reckoning, there are 48
MacRoman characters that might cause you problems; I can produce a list of
them on request. Encode's default in this situation is to use a question
mark as a substitution character. If you want something more clever, see
the "Handling Malformed Data" section of the Encode pod.
--
Aaron Crane