These aren't the characters you're looking for...
Andy Wardley
abw at wardley.org
Tue Aug 19 12:24:17 BST 2008
Abigail wrote:
> You are also missing quite a number of characters that would match \s, but
> aren't included in [ \t\x{85}\x{2028}\x{2029}]. \s matches 25 characters,
> including \r and \cL. NEXT LINE (\x{85}) and NO-BREAK SPACE (\x{A0}) only
> match with Unicode semantics.
I thought that might be the case. I was working from my old Camel book, which
listed only those three, but then I found references to a whole class of
whitespace-ish characters in the 5.10 Unicode/perlre docs. Thanks for the
clarification.
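A small sketch of the difference Abigail describes. It assumes Perl 5.14 or later, because it uses the /a and /u regex modifiers (which 5.10 didn't have) to make the two sets of rules explicit. /a is only an approximation of how \s behaves on a byte string without Unicode semantics. The sub names are made up for illustration.

```perl
use strict;
use warnings;

# Characters mentioned in the thread, keyed by name.
my %chars = (
    'SPACE'           => ' ',
    'TAB'             => "\t",
    'CARRIAGE RETURN' => "\r",
    'FORM FEED'       => "\cL",
    'NEXT LINE'       => "\x{85}",
    'NO-BREAK SPACE'  => "\x{A0}",
);

# /a limits \s to ASCII whitespace (roughly what a byte string gets
# without Unicode semantics); /u forces Unicode rules.
sub is_space_ascii   { return $_[0] =~ /\A\s\z/a ? 1 : 0 }
sub is_space_unicode { return $_[0] =~ /\A\s\z/u ? 1 : 0 }

for my $name (sort keys %chars) {
    printf "%-16s /a: %d  /u: %d\n",
        $name, is_space_ascii($chars{$name}), is_space_unicode($chars{$name});
}
```

NEXT LINE and NO-BREAK SPACE match only under /u, while the ASCII characters match under both.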
> You might want to use:
>
> (?!\n)[\h\v]
In this case, I specifically want to exclude the vertical tab (not that I'm
ever likely to come across it).
What does \h match?
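For reference: \h (added in Perl 5.10 alongside \v) matches horizontal whitespace. That means tab, space and the Unicode space separators such as NO-BREAK SPACE and EM SPACE. \v matches vertical whitespace: LF, VT, FF, CR, NEXT LINE, LINE SEPARATOR and PARAGRAPH SEPARATOR. Unlike \s, both match the same characters whether or not the string is UTF-8. That is presumably why they were suggested. The sample below is illustrative rather than exhaustive, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# A sample of characters; not the full \h / \v lists.
my @samples = (
    [ 'TAB'             => "\t"       ],
    [ 'SPACE'           => ' '        ],
    [ 'NO-BREAK SPACE'  => "\x{A0}"   ],
    [ 'EM SPACE'        => "\x{2003}" ],
    [ 'LINE FEED'       => "\n"       ],
    [ 'VERTICAL TAB'    => "\cK"      ],
    [ 'CARRIAGE RETURN' => "\r"       ],
    [ 'NEXT LINE'       => "\x{85}"   ],
    [ 'LINE SEPARATOR'  => "\x{2028}" ],
);

sub is_h { return $_[0] =~ /\A\h\z/ ? 1 : 0 }    # horizontal whitespace
sub is_v { return $_[0] =~ /\A\v\z/ ? 1 : 0 }    # vertical whitespace

# Abigail's suggestion: any \h or \v character except a newline.
sub is_ws_not_lf { return $_[0] =~ /\A(?!\n)[\h\v]\z/ ? 1 : 0 }

for my $s (@samples) {
    my ($name, $c) = @$s;
    printf "%-16s \\h:%d  \\v:%d  (?!\\n)[\\h\\v]:%d\n",
        $name, is_h($c), is_v($c), is_ws_not_lf($c);
}
```

Because \v includes VT, CR and NEXT LINE, (?!\n)[\h\v] accepts all three.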
> Alternatively, you can use:
>
> [^\S\n]
>
> but that suffers from the problem with code points \x{85} and \x{A0}.
I think that's the simplest solution that's Good Enough. If I add \r and
\x{85} (which I want to exclude) to the negated class, then NO-BREAK SPACE is
the only character it won't accept but should. I think I can live with that.
[^\S\n\r\x{85}]
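A quick check of that class, assuming Perl 5.14+ for the /u and /a modifiers, which are used here to show both sets of rules side by side. The sub names are hypothetical.

```perl
use strict;
use warnings;

# The class from above: anything \s matches, except LF, CR and NEXT LINE.
# Each sub returns 1 if the string is exactly one such character.
# /u applies Unicode rules to \S; /a restricts \s/\S to ASCII.
sub inline_ws_unicode { return $_[0] =~ /\A[^\S\n\r\x{85}]\z/u ? 1 : 0 }
sub inline_ws_ascii   { return $_[0] =~ /\A[^\S\n\r\x{85}]\z/a ? 1 : 0 }

for my $c (' ', "\t", "\cL", "\n", "\r", "\x{85}", "\x{A0}") {
    printf "U+%04X  /u: %d  /a: %d\n",
        ord($c), inline_ws_unicode($c), inline_ws_ascii($c);
}
```

NO-BREAK SPACE (U+00A0) is accepted only under Unicode rules, which is the "won't accept but should" case. On Perl 5.18 and later, \s also matches the vertical tab. To keep excluding it there, \cK would have to be added to the class as well.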
Thanks everyone.
A