Unicode/charnames.pm and regular expressions

Peter Corlett abuse at cabal.org.uk
Sun Mar 27 15:50:01 BST 2011


Hi,

I've found a rather odd interaction between charnames.pm and regular expressions. I ran into it while building up a complex regex incrementally from smaller qr// fragments. I'm running Debian's vendor Perl, i.e. 5.10.1.

This code:

perl -Mcharnames=:full -e 'my $foo = qr/\N{EM DASH}/; my $bar = qr/$foo$foo/; "whatever" =~ $bar'

keels over at run time with 'Constant(\N{EM DASH}) unknown: (possibly a missing "use charnames ...") in regex'. I am of course using charnames.

A quick play with Data::Dumper tells me that $foo contains qr/(?-xism:\N{EM DASH})/ - i.e. the \N{...} conversion hasn't taken place. I can't dump $bar as I get the Constant unknown error.

The variant:

perl -Mcharnames=:full -e 'my $foo = qq/\N{EM DASH}/; my $bar = qr/$foo$foo/; "whatever" =~ $bar'

works. The difference here is that $foo is set to a literal em-dash character so the \N{...} conversion *has* taken place. This is interpolated into the regex and works as expected.

I don't think it's unreasonable for me to expect the first version to work. Have I tripped over an actual bug in Perl, or is there something I misunderstand about Perl regexes and Unicode?




