character set detection?
Dominic Mitchell
dom at happygiraffe.net
Sun Jan 7 11:15:42 GMT 2007
Dirk Koopman wrote:
> Is there a way of, reasonably reliably, determining what the character
> set of a lump of text is?
Not really, no. Like Jesse said, Encode::Guess might be a good start.
If you want to do what the browser does, the algorithm is described here:
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
There's a Python implementation of it as well:
http://chardet.feedparser.org/
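
For what it's worth, here is a rough sketch of the "try a list of suspects" idea in Python, using only the standard library. It is not the Mozilla algorithm and not how Encode::Guess or chardet are actually implemented. The name guess_encoding and the default suspect list are made up for illustration. It only tells you which suspects decode cleanly, so it cannot tell the single-byte charsets (latin-1, cp1252 and friends) apart; that needs the statistical approach described at the Mozilla link.

```python
import codecs

# Byte-order marks to look for. The UTF-32 marks come first because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, suspects=("ascii", "utf-8")):
    """Hypothetical helper: return the first plausible encoding, or None.

    `data` must be bytes. `suspects` is an assumed, caller-supplied list of
    encodings to try in order; the first one that decodes without error wins.
    """
    # A byte-order mark is the strongest hint available, so check it first.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Otherwise try each suspect strictly and keep the first that decodes.
    for name in suspects:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # Nothing fitted: the caller has to fall back to a default or a real detector.
    return None
```

Order matters in the suspect list: put the strict encodings (ascii, utf-8) before anything like latin-1, which accepts every byte sequence and so would always "match".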
-Dom