[OT] Encode woes
djk at tobit.co.uk
Fri Sep 25 08:54:19 BST 2009
Dirk Koopman wrote:
> It appears that, with the increasing prevalence of 5.10, the usage of
> utf8 or not is getting more picky.
> I have a well established, networked, app that has upwards of 250 nodes
> and about 4000 users at one time (on certain weekends double that) all
> over the world. These users are running mainly windows based clients
> (which may include quite a lot of windows telnet). The nominal character
> set is ascii, as interpreted by the client's host operating system.
> To date, I have managed to avoid the tribulations of Encode and utf8 et
> al. But I am now getting occasional errors, on 5.10 perl, of the ilk:-
> Wide character in null operation at /spider/perl/DXDupe.pm line 47.
> at /spider/perl/DXDupe.pm line 47
> DXDupe::find('X14163|UA0KEF|RZ6HV|������� �������') called at
> /spider/perl/Spot.pm line 420
> And also something similar on print or syswrite.
> Studying the data, what I am receiving is a mixture of utf8 and
> iso-8859-*, the reason for this being that older perls happily take what
> they are given and just pass it along. Some clients are emitting utf8
> and other iso-8859 and yet others (running Win95/8) some kind of
> codepage. In addition, there are older, usually windows based, packages
> acting as nodes, together with yet more clients that are also adding
> data to this network in who knows what character set.
> Up until recently, this has not been a problem because the important
> stuff is in 7 bit ascii and the remarks section (the usual source of
> problems), if it is unreadable, doesn't matter 'cos you can't translate
> it anyway.
> Now, is there a reasonably reliable way of determining what we have, on
> a string by string basis, to at least tell whether we are dealing with
> utf8 or iso-8859 (not caring which variant) so that I can drive Encode
> appropriately to avoid crashes of the above type? Or how do I
> completely switch off utf8 encoding/decoding - everywhere - in an 80,000
> line perl app?
As no-one seems interested in this, or maybe no-one else has had these
problems themselves, can anyone suggest a better mailing list to poll?
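
To make the question concrete, here is a minimal sketch of the kind of
per-string check being asked about. It is not code from the app. It assumes
that each string arrives as raw bytes, that anything which is not valid
UTF-8 can be treated as ISO-8859-1 (a guess, since codepage data cannot be
told apart from it), and that the sub names bytes_to_chars and
chars_to_wire are purely hypothetical. Short ISO-8859 strings that happen
to be valid UTF-8 will be misread, so this is a heuristic only.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn raw bytes from a client into a Perl
# character string. Strict UTF-8 is tried first; if the bytes are
# not well-formed UTF-8, they are assumed to be ISO-8859-1.
sub bytes_to_chars {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops it from modifying $bytes in place.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $chars if defined $chars;
    # Assumption: every non-UTF-8 string is treated as ISO-8859-1.
    return decode('ISO-8859-1', $bytes);
}

# Hypothetical helper: encode a character string back to bytes
# before print or syswrite, so that no wide characters reach a
# byte-oriented filehandle.
sub chars_to_wire {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}
```

Used this way, every remark would be passed through bytes_to_chars on
input and through chars_to_wire just before output, so the rest of the
code only ever sees character strings. Whether emitting UTF-8 to every
client is acceptable is a separate decision that the sketch does not
settle.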