Language recognition
Peter Hickman
peter.hickman at semantico.com
Mon Oct 8 17:14:18 BST 2007
Looking at the public Twitter feeds, I note that although they are in
UTF-8 they do not indicate which language they are in. I realise that
working it out would be somewhat difficult. But just how difficult?
Given the UTF-8 entities (is that the correct term?), is there an easy way
to tell which language the text might be in, or at least which script?
I'm sure something could be hacked up, but rather than relying on some
ad hoc rules it would appear that this could be derived from the Unicode
data itself.
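To illustrate the sort of thing I have in mind, here is a rough sketch in
Python. It assumes that the first word of each character's Unicode name
(LATIN, CYRILLIC, HIRAGANA and so on) is a good enough stand-in for the
script, which is only an approximation of the real Unicode Script property.
The function name guess_script is made up for this example.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Return the most common script-like name prefix among the letters
    in text, or None if the text contains no letters.

    Assumption: the first word of a character's Unicode name (for
    example "LATIN" in "LATIN SMALL LETTER A") approximates its script.
    """
    counts = Counter()
    for ch in text:
        # Ignore digits, punctuation, whitespace and symbols.
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Hello world", "Привет мир", "こんにちは"]:
        print(sample, "->", guess_script(sample))
```

This only gets as far as the script. Telling English from French, both of
which are written in Latin letters, would need something more than the
character data alone.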
Any pointers?
--
Peter Hickman.
Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 358223
f: 01273 723232
e: peter.hickman at semantico.com
w: www.semantico.com
More information about the london.pm mailing list