These aren't the characters you're looking for...
Abigail
abigail at abigail.be
Tue Aug 19 11:29:13 BST 2008
On Tue, Aug 19, 2008 at 11:17:01AM +0100, Robin Barker wrote:
>
> From: Andy Wardley
> > I mistakenly wrote this the other day:
> >
> > [\s^\n]
> >
> > What I wanted was to match a whitespace character that wasn't a newline.
> >
> > Of course, I could just write this:
> >
> > [ \t]
> >
> > But that doesn't include the Unicode whitespace characters which \s would
> > normally match. So I ended up writing this:
> >
> > [ \t\x{85}\x{2028}\x{2029}]
> >
> > Second: am I missing something obvious? Is there a better way to do it?
>
> You could use
> [[:blank:]]
> (see perlre), but my experience is that [:...:] does not behave as I expect with unicode (maybe my expectations are wrong).
[:blank:] is equivalent to [ \t].
And I wouldn't use any of the POSIX classes, as their behaviour depends on
whether Unicode semantics are in effect when doing the matching. It's better
to use Unicode properties; they always match the same set of characters.
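To make that dependence concrete, here is a minimal sketch using \s, which is
sensitive to the same thing. It assumes a perl where the unicode_strings
feature is not enabled (so no "use v5.12" or later in scope); the variable
names are only illustrative.

```perl
use strict;
use warnings;

# U+00A0 (NO-BREAK SPACE) held as a single byte, without the UTF-8 flag.
my $nbsp = "\xA0";

# Default semantics on a non-UTF-8 string: only ASCII whitespace counts.
my $as_bytes = $nbsp =~ /\s/ ? 1 : 0;

# Same character, same pattern, but the string is now stored as UTF-8.
utf8::upgrade($nbsp);
my $as_utf8 = $nbsp =~ /\s/ ? 1 : 0;

# A Unicode property in the pattern applies Unicode rules either way.
my $via_property = "\xA0" =~ /\p{Space}/ ? 1 : 0;

print "bytes=$as_bytes utf8=$as_utf8 property=$via_property\n";
```

Under those assumptions I would expect the first match to fail and the other
two to succeed, which is exactly the kind of surprise the property form avoids.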
>
> You could also do a negative look ahead
> (?!\n)\s
Or define your own property:
sub IsMySpace {<<'--'}
+utf8::IsSpace
-000A
--
/\p{IsMySpace}/
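For comparison, a small runnable sketch that puts the look-ahead and the
user-defined property side by side. The helper names is_my_space and
is_space_not_nl and the sample characters are hypothetical, picked only for
illustration.

```perl
use strict;
use warnings;

# User-defined property: everything in IsSpace except LINE FEED (U+000A).
sub IsMySpace { return <<'END' }
+utf8::IsSpace
-000A
END

# True if the single character matches the user-defined property.
sub is_my_space     { return $_[0] =~ /\A\p{IsMySpace}\z/ ? 1 : 0 }

# True if the single character is whitespace but not a newline.
sub is_space_not_nl { return $_[0] =~ /\A(?!\n)\s\z/ ? 1 : 0 }

for my $char (" ", "\t", "\n", "\x{2028}", "a") {
    printf "U+%04X property=%d lookahead=%d\n",
        ord $char, is_my_space($char), is_space_not_nl($char);
}
```

Both tests should agree on these samples: space, tab and U+2028 match, while
the newline and the letter do not.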
Abigail