Character encodings and databases

Mark Fowler mark at
Thu Jun 19 20:23:30 BST 2014

On Thu, Jun 19, 2014 at 10:58 AM, Andrew Hill wrote:

> 1. Data is not being stored in the database as UTF8 - not sure how to check
> when Perl is the only tool available to query it

There are many ways, but this is my standard simple test: store
"\N{SNOWMAN}" (U+2603) in your database, then ask the database how long
the stored value is.  If it reports three characters instead of one, the
database is treating each of the three bytes of the UTF-8 encoding
(E2 98 83) as a separate character.
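
As a rough sketch of the two answers you might get back, here is the
same comparison done in plain Perl with the core Encode module.  It does
not talk to a database at all; the variable names are mine, and you
would compare these numbers against whatever length your own database
reports for the stored value.

```perl
use strict;
use warnings;
use charnames ':full';            # needed for \N{SNOWMAN} on older perls
use Encode qw(encode decode);

my $snowman = "\N{SNOWMAN}";                  # one character, U+2603
my $bytes   = encode('UTF-8', $snowman);      # three bytes: E2 98 83

# Length if each UTF-8 byte is (wrongly) counted as a character.
my $as_bytes = length($bytes);

# Length if the same bytes are understood as UTF-8 text.
my $as_chars = length(decode('UTF-8', $bytes));

printf "bytes: %d, characters: %d\n", $as_bytes, $as_chars;
# prints "bytes: 3, characters: 1"
```

A database that answers 3 is behaving like the first case, and one that
answers 1 is behaving like the second.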

Note: I'm deliberately using a character with no decomposition, like
\N{SNOWMAN}, rather than a friendlier string like "Léon".  "Léon" can
legitimately be five characters long in Unicode: in NFD, for example,
the é is stored as a plain "e" followed by a combining acute accent, so
a length of five would not prove anything about the encoding.
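
To see the normalization effect for yourself, something like the
following should do, using the core Unicode::Normalize module.  This is
only an illustration with variable names I made up; it just counts
characters in each form.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $name = "L\x{E9}on";                 # "Léon" with a precomposed é (U+00E9)

my $nfc_len = length(NFC($name));       # L, é, o, n
my $nfd_len = length(NFD($name));       # L, e, U+0301, o, n

# The snowman has no decomposition, so NFD leaves it alone.
my $snowman_len = length(NFD("\x{2603}"));

printf "NFC: %d, NFD: %d, snowman in NFD: %d\n",
    $nfc_len, $nfd_len, $snowman_len;
# prints "NFC: 4, NFD: 5, snowman in NFD: 1"
```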

