These aren't the characters you're looking for...
Andy Wardley
abw at wardley.org
Tue Aug 19 10:46:45 BST 2008
I mistakenly wrote this the other day:
[\s^\n]
What I wanted was to match a whitespace character that wasn't a newline.
Of course, it doesn't work. The '^' only works as a character class
negatorificator when it's the first character after the '['. Anywhere else
it's just a literal '^'. And you can't mix "inny" classes with "outy"
classes. That's just not allowed.
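To make the failure concrete, here is a small sketch of what that class ends up matching. The sample strings and the %samples / %matches names are made up purely for illustration.

```perl
use strict;
use warnings;

# Inside [...] a '^' that is not the first character is a literal caret,
# so this class means "whitespace, or '^', or newline".
my $class = qr/[\s^\n]/;

# Hypothetical sample inputs, keyed by a readable label.
my %samples = (
    space   => " ",
    tab     => "\t",
    newline => "\n",
    caret   => "^",
    letter  => "a",
);

# Record which samples the class matches.
my %matches;
for my $name (keys %samples) {
    $matches{$name} = $samples{$name} =~ $class ? 1 : 0;
}

printf "%-8s %s\n", $_, $matches{$_} ? "matches" : "no match"
    for sort keys %matches;
```

The newline and the caret both match, which is the opposite of what was wanted.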
Of course, I could just write this:
[ \t]
But that doesn't include the Unicode whitespace characters which \s would
normally match. So I ended up writing this:
[ \t\x{85}\x{2028}\x{2029}]
First question: is it safe to match a regex containing Unicode code points
against a non-Unicode string? I'm sure it is, and it seems to work OK, but my
subconscious woke me up at 3am this morning to remind me to check. My Camel
is a little old (3rd ed - 5.6.0) and talks of problems in Unicode processing
that "will probably be fixed by the time you read this". Can I tell my
subconscious to stop worrying and go back to snuggle-bunny land?
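One quick way to put the 3am worry to the test is a sketch like the one below, which matches the same class against a plain byte string and against a string holding a wide character. The variable names are invented for the example, and it only shows what happens on the perl it is run on. It says nothing about old releases such as 5.6.0.

```perl
use strict;
use warnings;

# The class from above, compiled once.
my $hspace = qr/[ \t\x{85}\x{2028}\x{2029}]/;

# A plain byte string: every character is below 0x80.
my $bytes = "foo bar";

# A character string containing U+2028 LINE SEPARATOR.
my $wide = "foo\x{2028}bar";

# A string with no whitespace at all, as a negative control.
my $none = "foobar";

my $bytes_ok = $bytes =~ $hspace ? 1 : 0;
my $wide_ok  = $wide  =~ $hspace ? 1 : 0;
my $none_ok  = $none  =~ $hspace ? 1 : 0;

print "byte string: $bytes_ok, wide string: $wide_ok, no whitespace: $none_ok\n";
```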
Second: am I missing something obvious? Is there a better way to do it?
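One candidate, offered as a sketch rather than a definitive answer, is the double negative: negate a class containing \S and \n, which leaves "whitespace that isn't a newline" without listing code points by hand. The is_ws_not_nl helper is a hypothetical name used only for this example.

```perl
use strict;
use warnings;

# Double negative: "not (non-whitespace or newline)", which leaves
# whitespace other than "\n".
my $ws_not_nl = qr/[^\S\n]/;

# Hypothetical helper: true if the single character given is
# whitespace but not a newline.
sub is_ws_not_nl {
    my ($char) = @_;
    return $char =~ /\A$ws_not_nl\z/ ? 1 : 0;
}

for my $case ([space => " "], [tab => "\t"], [newline => "\n"], [letter => "a"]) {
    my ($name, $char) = @$case;
    printf "%-8s %s\n", $name, is_ws_not_nl($char) ? "matches" : "no match";
}
```

Note that this still lets through "\r" and "\f", since \s matches them. They can be added to the negated class if they aren't wanted. On Perl 5.10 or later, \h (horizontal whitespace) may also be worth a look, though it does not match \x{85}, \x{2028} or \x{2029}.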
A