Regexp capture group list

Paul LeoNerd Evans leonerd at leonerd.org.uk
Tue Nov 10 13:11:03 GMT 2009


I'm writing an attempt at a simple recursive-descent parser with no
backtracking or alternation, for parsing a really simple grammar.

My usual method is to write a collection of functions that each eat a
prefix from the string passed as $_[0] (modifying it in place), and
return any interesting data. A basic primitive to start with is
something like:


 sub parse
 {
    my ( $text, $re ) = @_;
    $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
 }

 sub parse_idspec
 {
    parse $_[0], qr/ID\s+(\d+)/ and return $1;
 }


I was rather annoyed to find that the regexp capture buffers $1, $2,
etc... are in fact dynamically scoped. This means that $1 can't escape
from parse(). It behaves as if 'local $1' was present in parse(); once
parse() returns, $1 in parse_idspec() still contains whatever it held
before the call.
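
To make the scoping concrete, here is a minimal sketch. inner_match()
and the sample strings are made up purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: performs its own match inside the sub's scope.
sub inner_match
{
   "ID 42" =~ m/ID\s+(\d+)/ or die "no match\n";
   return $1;   # copying the value out while still in scope works
}

"outer 7" =~ m/(\d+)/ or die "no match\n";   # sets $1 in this scope
my $copied = inner_match();

# The match inside inner_match() did not disturb the caller's $1
print "copied=$copied outer=$1\n";   # prints "copied=42 outer=7"
```

So the value has to be copied out by the sub that did the match; the
caller never sees that sub's $1.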

After some head-scratching I decided instead to have parse() return a
list of the capture groups. So far I haven't found a neater expression
than


 sub parse
 {
    my ( $text, $re ) = @_;
    $_[0] =~ s/^$re// or die "Expected $re in $text...\n";

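    # @- and @+ are offsets into the string as it was before the
    # substitution, so slice the saved copy in $text rather than $_[0]
    return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+;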
 }
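
As a usage sketch, parse_idspec() can then take the capture from the
return value instead of from $1. parse() is repeated here only so the
snippet runs on its own, and the input string is made up:

```perl
use strict;
use warnings;

sub parse
{
   my ( $text, $re ) = @_;
   $_[0] =~ s/^$re// or die "Expected $re in $text...\n";

   # Slice each capture group out of the pre-substitution copy
   return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+;
}

sub parse_idspec
{
   # List assignment picks up the first (and only) capture group
   my ( $id ) = parse $_[0], qr/ID\s+(\d+)/;
   return $id;
}

my $input = "ID 123 rest";
my $id = parse_idspec $input;

# The matched prefix has been eaten from $input
print "id=$id remaining='$input'\n";   # prints "id=123 remaining=' rest'"
```

Note this assumes every group in the pattern takes part in the match;
an optional group that didn't match would have undefined offsets.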


This seems a common-enough idiom that perhaps there's a neater solution
- I find there's no @{^MATCHGROUPS} or similar present in perl...

Can anyone offer any neater suggestions?

-- 
Paul "LeoNerd" Evans

leonerd at leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/