Character encodings and databases

Sam Kington sam at illuminated.co.uk
Fri Jun 20 16:29:18 BST 2014


Cutting and pasting to try and get some clarity. As a reminder, this is what Mac OS says about the problematic character in question:
ü
LATIN SMALL LETTER U WITH DIAERESIS
Unicode: U+00FC, UTF-8: C3 BC

On 20 Jun 2014, at 09:37, Andrew Hill <london.pm at welikegoats.com> wrote:

> my $zurich = "Zürich";
> print HexDump $zurich;
> 00000000  5A C3 BC 72 69 63 68                               Z..rich

So your Perl string holds the UTF-8 encoded bytes (seven of them, with C3 BC standing in for the ü) rather than six decoded characters, which is almost certainly your bug.
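
To make the byte/character difference concrete, here is a minimal sketch using the core Encode module. It assumes the literal was read as raw bytes (for example a UTF-8 source file without "use utf8"); the variable names and the hex_of helper are mine, not Andrew's code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: hex-dump each element of a string.
sub hex_of { join ' ', map { sprintf('%02X', ord($_)) } split //, $_[0] }

# The seven bytes from the first hex dump, written with escapes so the
# example does not depend on how this file itself is encoded.
my $bytes = "Z\xC3\xBCrich";

# Decoding turns the UTF-8 bytes into a six-character Perl string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, %s\n", length($bytes), hex_of($bytes);
printf "chars: length %d, %s\n", length($chars), hex_of($chars);
# bytes: length 7, 5A C3 BC 72 69 63 68
# chars: length 6, 5A FC 72 69 63 68

# The two strings are not equal until one of them is converted.
print "equal before decoding? ", ($bytes eq $chars ? "yes" : "no"), "\n";
```

The second line of output matches what came back from the database, which is why comparing or concatenating the two strings without decoding first goes wrong.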

> $sth = $dbh->prepare("select bar from foo");
> print HexDump $foo;
> 00000000  5A FC 72 69 63 68                                  Z.rich

Reading from the database returns a proper Unicode character string (the ü is the single code point FC), so this is correct.

> SQL> select dump(bar) from foo;
> 
> DUMP(BAR)
> --------------------------------------------------------------------------------
> Typ=1 Len=7: 90,195,188,114,105,99,104

In the table it's stored in UTF-8 format (195,188 is C3 BC in decimal), which again might be an error, depending on whether the dump function is Unicode-aware or not.
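
One way to check, sketched under the assumption that DUMP is reporting the raw stored bytes: pack the decimal values back into a string and see whether a single strict decode gives the expected six characters. The double-encoded value is computed only for comparison; nothing here talks to the database, and the variable names are mine.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decimal byte values copied from the DUMP(BAR) output above.
my @dumped = (90, 195, 188, 114, 105, 99, 104);
my $stored = pack 'C*', @dumped;

# Strict decode: croaks if the bytes are not well-formed UTF-8.
my $once = decode('UTF-8', $stored, Encode::FB_CROAK | Encode::LEAVE_SRC);
printf "after one decode: %d characters, second is U+%04X\n",
    length($once), ord(substr($once, 1, 1));
# after one decode: 6 characters, second is U+00FC

# For comparison, what the column would hold if the text had been
# encoded twice on the way in (the classic double-encoding bug).
my $double = encode('UTF-8', encode('UTF-8', $once));
printf "double-encoded would be %d bytes: %s\n",
    length($double), join ',', unpack 'C*', $double;
# double-encoded would be 9 bytes: 90,195,131,194,188,114,105,99,104
```

If those seven values really are the stored bytes, they are plain single-encoded UTF-8; a double-encoded column would show 195,131,194,188 in the middle instead.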

Sam
-- 
Website: http://www.illuminated.co.uk/



