Character encodings and databases
Andrew Hill
london.pm at welikegoats.com
Thu Jun 19 15:58:03 BST 2014
My code has an extremely annoying bug that I can't quite solve.
The concept is simple - read some text from a text file; update a database
table based on that text.
The text file is UTF8 and the database is Oracle 11g.
I am reading the file with a normal
open FILE, "<blah";
while (<FILE>) {
    chomp;
    $foo = $_;
}
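For comparison, here is a sketch of the same read loop with an explicit decoding layer. It assumes the file really is UTF-8; whether the missing layer is the cause of the problem is not established here. The helper name read_lines and the throwaway temporary file are invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a UTF-8 file and return its lines as
# decoded character strings rather than raw bytes.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# Demonstration with a throwaway file holding the UTF-8 bytes of "Zürich".
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;                        # write raw bytes, no layer
print {$tmp_fh} "Z\xC3\xBCrich\n";      # C3 BC is the UTF-8 encoding of U+00FC
close $tmp_fh;

my ($foo) = read_lines($tmp_path);
printf "%d characters\n", length $foo;  # 6 when decoded; 7 if read as raw bytes
```

With the two-argument open shown above, the same line would arrive as seven bytes, so its length is a quick way to tell which kind of string $foo holds.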
Then I select the VARCHAR2 field from the table into $bar, do a straight
string comparison between $foo and $bar, and if they are different, I
update the table with the value of $foo and output a debugging line to say
that, for example, Z<splodge>rich has been updated to Zürich.
However, the next time I read Zürich from the file, I get exactly the same
behaviour, i.e. $bar is again Z<splodge>rich, therefore $foo ne $bar and it
updates the table again. I don't understand why $foo ne $bar, given I've
just set the field to $foo.
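One way such a mismatch can arise, offered as an illustration rather than a diagnosis: a string of undecoded UTF-8 bytes and the decoded character string for the same text can look alike on a terminal yet compare as different. The sketch below prints the code points of both so the difference is visible. The helper name codepoints is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list each code point of a string in hex, so two
# values that look alike when printed can be compared unambiguously.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf '%04X', ord } split //, $str;
}

my $bytes = "Z\xC3\xBCrich";           # raw UTF-8 bytes, as read without a layer
my $chars = decode('UTF-8', $bytes);   # the same text as decoded characters

print codepoints($bytes), "\n";   # 005A 00C3 00BC 0072 0069 0063 0068
print codepoints($chars), "\n";   # 005A 00FC 0072 0069 0063 0068
print $bytes eq $chars ? "equal\n" : "different\n";   # different
```

Running codepoints on both $foo and $bar in the debugging line would show whether they differ in this way or in some other.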
So, as I see it, these are the possible causes:
1. Data is not being stored in the database as UTF8 - not sure how to
check when Perl is the only tool available to query it
2. Conversion is occurring in the DBD driver
3. Something else because I've been staring at it for so long
FWIW, NLS_CHARACTERSET is AL32UTF8 and $ENV{NLS_LANG} is
AMERICAN_AMERICA.AL32UTF8
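For cause 1, a possible check that needs only Perl: Oracle's DUMP function reports the bytes actually stored in a column, and format 1016 requests hex output along with the character set name. The sketch below only builds the query text. The table and column names (cities, name) are placeholders, since the post does not give the real ones, and the DBI call in the comment assumes an already connected $dbh.

```perl
use strict;
use warnings;

# Hypothetical helper: build a query asking Oracle to describe the stored
# bytes of a column. 16 selects hex output; adding 1000 includes the
# character set name in the result.
sub dump_query {
    my ($table, $column) = @_;
    return "SELECT $column, DUMP($column, 1016) FROM $table";
}

# 'cities' and 'name' are placeholder identifiers, not from the post.
my $sql = dump_query('cities', 'name');
print "$sql\n";

# With a connected DBI handle (assumed, not shown), the rows could be
# fetched with something like:
#   my $rows = $dbh->selectall_arrayref($sql);
#   print "$_->[0]: $_->[1]\n" for @$rows;
```

In an AL32UTF8 database a correctly stored ü should show up as the two bytes c3,bc. A sequence such as c3,83,c2,bc would suggest the text was encoded twice on the way in.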
Cheers,
Andrew