Character encodings and databases
Mark Fowler
mark at twoshortplanks.com
Thu Jun 19 20:23:30 BST 2014
On Thu, Jun 19, 2014 at 10:58 AM, Andrew Hill <london.pm at welikegoats.com> wrote:
> 1. Data is not being stored in the database as UTF8 - not sure how to check
> when Perl is the only tool available to query it
There are many ways, but this is my standard simple test: store
"\N{SNOWMAN}" in your database, then ask the database how long that
value is. If it says three characters instead of one, the database is
treating the individual bytes of the UTF-8 encoding as characters.
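To see where "three" comes from without touching a database at all, here
is a minimal sketch using only core Perl modules. The variable names are
mine, and decoding the bytes as ISO-8859-1 is only a stand-in for what a
misconfigured database might be doing; your setup may differ.

```perl
use strict;
use warnings;
use charnames ':full';            # needed for \N{...} on perls older than 5.16
use Encode qw(encode decode);

my $snowman = "\N{SNOWMAN}";                  # U+2603, a single character
my $bytes   = encode('UTF-8', $snowman);      # its UTF-8 encoding as a byte string

# Assumption: a store that treats each byte as its own character behaves
# roughly like decoding the UTF-8 bytes as a single-byte encoding.
my $misread = decode('ISO-8859-1', $bytes);

printf "characters in original string: %d\n", length $snowman;
printf "bytes in UTF-8 encoding:       %d\n", length $bytes;
printf "characters if bytes misread:   %d\n", length $misread;
```

If your database reports the last number rather than the first, it is
counting bytes, not characters.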
Note: I'm deliberately using a character with no decomposition, like
\N{SNOWMAN}, rather than some nice string like "Léon". Technically
"Léon" can be five characters long in Unicode instead of four (if it
is stored in NFD, for example, where é becomes e plus a combining
acute accent).
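A quick way to convince yourself of the NFD point, again as a rough
sketch with core modules only (Unicode::Normalize ships with Perl; the
variable names are just for illustration):

```perl
use strict;
use warnings;
use charnames ':full';            # needed for \N{...} on perls older than 5.16
use Unicode::Normalize qw(NFC NFD);

my $name = "L\N{U+00E9}on";       # "Léon" written with a precomposed e-acute

my $nfc = NFC($name);             # composed form: L, e-acute, o, n
my $nfd = NFD($name);             # decomposed form: L, e, combining acute, o, n

printf "NFC length: %d\n", length $nfc;
printf "NFD length: %d\n", length $nfd;

# The snowman has no decomposition, so its length is the same either way.
printf "snowman NFD length: %d\n", length NFD("\N{SNOWMAN}");
```

That is why the snowman makes a cleaner test: one character in, and any
answer other than one means bytes are being counted.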
Mark.