Regexp capture group list
Paul LeoNerd Evans
leonerd at leonerd.org.uk
Tue Nov 10 13:11:03 GMT 2009
I'm writing an attempt at a simple recursive-descent parser with no
backtracking or alternation, for parsing a really simple grammar.
My usual method is to write a collection of functions that eat a prefix
from the string they're passed as $_[0] (mutably so), and return any
interesting data. A basic primitive to start with is something like:
sub parse
{
   my ( $text, $re ) = @_;
   $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
}
sub parse_idspec
{
   parse $_[0], qr/ID\s+(\d+)/ and return $1;
}
I was rather annoyed to find that the regexp capture buffers $1, $2,
etc. are in fact dynamically scoped. This means that $1 can't escape
from parse(). It behaves as if 'local $1' was present in parse(); $1 in
parse_idspec() still contains whatever it held before the call.
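To illustrate the scoping, here is a minimal standalone sketch. The names
inner_match() and $outer are made up for this demonstration and are not part
of the parser above. The match inside the sub sets its own $1, and once the
sub returns, the caller sees its own earlier $1 again.

```perl
use strict;
use warnings;

# Hypothetical helper: its captures are only visible while it is running.
sub inner_match
{
   my ( $str ) = @_;
   $str =~ m/ID\s+(\d+)/ or die "No ID found\n";
   return $1;   # copy the value out while it is still in scope
}

my $outer = "outer 7";
$outer =~ m/(\d+)/ or die "No digits found\n";   # caller's $1 is now "7"

my $inner = inner_match( "ID 42" );   # sets a different $1 inside the sub
my $outer_capture = $1;               # back in the caller: still "7"

print "inner: $inner\n";              # inner: 42
print "outer: $outer_capture\n";      # outer: 7
```

Returning $1 by value from the sub that performed the match works, because
the copy is taken before the dynamic scope ends.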
After some headscratching I decided instead to have parse() return a
list of the capture groups. So far I haven't found a neater expression
than
sub parse
{
   my ( $text, $re ) = @_;
   $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
   return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+;
}
This seems a common-enough idiom that perhaps there's a neater solution
- I find there's no @{^MATCHGROUPS} or similar present in perl...
Can anyone offer any neater suggestions?
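One possible direction, offered only as a sketch and not as a known idiom:
a match in list context already returns its capture groups, so parse() could
match first and then cut the prefix off by hand. This assumes it is
acceptable to wrap $re in one extra capturing group, which would shift any
numbered backreferences used inside $re by one.

```perl
use strict;
use warnings;

# Sketch of an alternative parse(). The extra outer group guarantees that the
# list-context match returns the whole prefix first, then $re's own groups.
sub parse
{
   my $re = $_[1];
   my ( $prefix, @groups ) = $_[0] =~ m/^($re)/
      or die "Expected $re in $_[0]...\n";
   substr( $_[0], 0, length $prefix, "" );   # eat the prefix in place
   return @groups;
}

my $input = "ID 123 rest";
my ( $id ) = parse( $input, qr/ID\s+(\d+)/ );

print "id: $id\n";              # id: 123
print "remaining: '$input'\n";  # remaining: ' rest'
```

Groups that did not take part in the match come back as undef in the
returned list.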
--
Paul "LeoNerd" Evans
leonerd at leonerd.org.uk
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/