deduping mangled UTF-8 strings

Dirk Koopman djk at tobit.co.uk
Mon Jun 18 10:11:31 BST 2007

Consider these strings. They are all the same string, but it has been
mangled by various pieces of software that don't understand UTF-8. The
original is obviously the last one (a shame it didn't arrive first, but
that is part of the problem).

Radio H�licopt�re combats
Radio Hilicopthre combats
Radio Hélicoptère combats
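
For illustration only, since the software that did the mangling is not
known: the first two forms are consistent with two common mishandlings
of a Latin-1 encoding of the third. Clearing the eighth bit of each byte
turns é (0xE9) into i (0x69) and è (0xE8) into h (0x68), and decoding
Latin-1 bytes as UTF-8 turns each accented letter into U+FFFD. A minimal
Python sketch under that assumption (the function names are hypothetical):

```python
# Assumption: the mangled forms started life as Latin-1 bytes of the original.
ORIGINAL = "Radio Hélicoptère combats"


def strip_high_bit(text: str) -> str:
    """Encode as Latin-1, clear bit 7 of every byte, read the result as ASCII."""
    raw = text.encode("latin-1")
    return bytes(b & 0x7F for b in raw).decode("ascii")


def misread_latin1_as_utf8(text: str) -> str:
    """Encode as Latin-1, then decode as UTF-8, replacing invalid bytes with U+FFFD."""
    raw = text.encode("latin-1")
    return raw.decode("utf-8", errors="replace")


print(strip_high_bit(ORIGINAL))          # the "Hilicopthre" form
print(misread_latin1_as_utf8(ORIGINAL))  # the form with replacement characters
```

Neither transformation is reversible on its own: the replacement
character discards the original byte entirely, and the stripped form
cannot be told apart from a string that really contained "i" and "h".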

I would like to deduplicate them. The variants of a given string can
arrive in any order. Any suggestions?

