Character encodings and databases

Mark Fowler mark at twoshortplanks.com
Thu Jun 19 20:23:30 BST 2014


On Thu, Jun 19, 2014 at 10:58 AM, Andrew Hill <london.pm at welikegoats.com> wrote:

> 1. Data is not being stored in the database as UTF8 - not sure how to check
> when Perl is the only tool available to query it

There are many ways, but this is my standard simple test: store
"\N{SNOWMAN}" in your database. If the database thinks it's three
characters long instead of one, then it is storing each byte of your
UTF-8 encoding as a separate character.
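
For illustration only, here is a minimal sketch of that check in Python
against an in-memory SQLite database. The choice of Python and SQLite is
an assumption; the original question is about Perl and an unspecified
database. It stores the snowman once correctly and once "double encoded"
(its three UTF-8 bytes reinterpreted as Latin-1 characters), then asks
the database for the length of each:

```python
import sqlite3

SNOWMAN = "\N{SNOWMAN}"  # U+2603: a single code point, three bytes in UTF-8

conn = sqlite3.connect(":memory:")  # illustrative database, not the poster's
conn.execute("CREATE TABLE t (s TEXT)")

# Stored correctly: the character itself.
conn.execute("INSERT INTO t VALUES (?)", (SNOWMAN,))

# Stored wrongly: each UTF-8 byte turned into its own (Latin-1) character.
mangled = SNOWMAN.encode("utf-8").decode("latin-1")
conn.execute("INSERT INTO t VALUES (?)", (mangled,))

# SQLite's LENGTH() on TEXT counts characters, not bytes.
lengths = [row[0] for row in conn.execute("SELECT LENGTH(s) FROM t ORDER BY rowid")]
print(lengths)  # [1, 3]
```

The second row is what you see when the database is holding your encoded
bytes as characters.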

Note: I'm deliberately using a character with no decomposition, like
\N{SNOWMAN}, rather than a nicer string like "Léon", because "Léon" can
legitimately be five characters long in Unicode (if it was stored in
NFD, for example).
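
A short Python sketch (again an assumption about language; the thread is
about Perl) shows why: under NFD the precomposed "é" splits into "e" plus
a combining acute accent, while the snowman has no decomposition and
stays a single character:

```python
import unicodedata

name = "L\u00e9on"  # "Léon" with precomposed é (U+00E9)
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)  # é -> e + U+0301 COMBINING ACUTE ACCENT
print(len(nfc), len(nfd))  # 4 5

snowman = "\N{SNOWMAN}"
print(len(unicodedata.normalize("NFD", snowman)))  # 1: nothing to decompose
```

So a length of five for "Léon" doesn't by itself prove the data is
mis-encoded, whereas a length of three for the snowman does.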

Mark.


