Language recognition

Matt Lawrence matt.lawrence at
Mon Oct 8 17:44:54 BST 2007

Peter Hickman wrote:
> Looking at the public Twitter feeds, I note that although they are in
> UTF-8 they do not indicate which language they are in. I realise
> that detecting this would be somewhat difficult, but just how difficult?
> Given the UTF-8 entities (is that the correct term?), is there an easy
> way to tell which language a feed might be in, or at least which script?
> I'm sure something could be hacked up, but rather than some ad hoc rules
> it would appear that this could be derived from the Unicode data.
> Any pointers?
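See "Scripts" under man perlunicode. The script properties described
there (for example \p{Latin}, \p{Greek}, \p{Cyrillic}) match characters
by script, which answers the "which script" part of the question.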
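
For anyone not working in Perl, here is a rough sketch of the same idea
in Python. Python's standard library does not expose the Unicode Script
property directly, so this takes the first word of each character's
Unicode name (LATIN, CYRILLIC, GREEK, CJK, ...) as a stand-in for the
script. That is an approximation, not the real Script property, and the
function names are made up for illustration. It only guesses the script,
not the language.

```python
import unicodedata
from collections import Counter


def guess_script(ch):
    """Approximate a character's script by the first word of its Unicode name.

    Assumption: this is a proxy for the Unicode Script property, which the
    standard library does not provide. Non-letters return None.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def script_counts(text):
    """Count how many letters in text fall under each guessed script."""
    return Counter(s for s in map(guess_script, text) if s)


def dominant_script(text):
    """Return the most common guessed script in text, or None if no letters."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    for sample in ("Hello world", "Привет, мир", "γειά σου", "日本語"):
        print(sample, "->", dominant_script(sample))
```

Knowing the script narrows the candidate languages but does not pick
one: Latin script alone covers English, French, German and many more,
so telling those apart would need something beyond character
properties, such as word or letter-frequency statistics.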

