Regex help

Simon Wistow simon at
Fri Apr 20 16:29:05 BST 2012

My Regex-fu has failed me ... I don't do much heavy duty data munging 
any more and apparently the skills of The Old Ways[tm] have forsaken me. 
Or at least - I'm now lazy. After spending 5 minutes trying to figure it 
out I just mailed the list.

Plus my copy of Mr Friedl's book is currently packed up in a box 
awaiting lugging to my new place.

Anyway, the problem I have is that I'm trying to parse a line of 
DCPU16[*] assembler. It's possible the issue lies in trying to parse 
using regexps but that's a debate for another day. Anyway this regex

    my ($label, $op, $a, $b) = $line =~ m!
        (?::(\w+)      \s+)? # optional label
        ([A-Za-z]+)    \s+   # opcode
        ([^,\s]+) (?:, \s+   # operand
        ([^,\s]+))?    \s*   # optional second operand
    !x;

currently parses lines such as

  :label SET A, 2
         JSR label 
         SET [0x1000+I], [PC]

just fine. However I'm trying to add a new opcode DAT which can take any 
number of operands

   DAT 0x170, "Hello ", 0x2e1 (, ....)

and it fails there. 

Running this

    my ($label, $op, @operands) = $line =~ m!
        (?::(\w+)      \s+)? # optional label
        ([A-Za-z]+)    \s+   # opcode
        ([^,\s]+) (?:, \s+   # operand
        ([^,\s]+))*    \s*   # zero or more further operands
    !x;

against the line

    FOO A, B, C

results in @operands being ('A', 'C').
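
For anyone who wants to poke at it, here is a self-contained reproduction. It assumes the regex is closed with !x (the whitespace and comments in the pattern need the /x flag). The final capture group sits inside a repeated (?: ... )* group, and Perl keeps only the last repetition of a repeated capture, which is where 'B' goes missing.

```perl
use strict;
use warnings;

my $line = 'FOO A, B, C';

my ($label, $op, @operands) = $line =~ m!
    (?::(\w+)      \s+)? # optional label
    ([A-Za-z]+)    \s+   # opcode
    ([^,\s]+) (?:, \s+   # first operand
    ([^,\s]+))*    \s*   # repeated group, only the last repetition is kept
!x;

# The pattern has four capture groups in total, so at most two
# operands can ever come back, however many the line contains.
print join(', ', map { "'$_'" } @operands), "\n";   # prints 'A', 'C'
```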

So, rather than attempting to actually work this out myself, I'm asking 
the lazyweb.
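
For what it's worth, one avenue that looks plausible is to stop asking a single regex for a variable number of captures: grab everything after the opcode as one string, then pull the operands out of it with a second, global match. The sketch below is only that. The parse_line name is made up for illustration, and the operand pattern assumes double-quoted strings with no escaped quotes inside them.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any existing assembler code.
sub parse_line {
    my ($line) = @_;

    my ($label, $op, $rest) = $line =~ m{
        ^ \s*
        (?: : (\w+) \s+ )?   # optional label
        ([A-Za-z]+)          # opcode
        (?: \s+ (.*?) )?     # everything after the opcode, unparsed
        \s* \z
    }x or return;

    # Each operand is either a double-quoted string or a run of
    # characters that are neither commas nor whitespace.
    my @operands = defined $rest
        ? $rest =~ m{ ( "[^"]*" | [^,\s]+ ) }gx
        : ();

    return ($label, $op, @operands);
}

my ($label, $op, @operands) = parse_line('DAT 0x170, "Hello ", 0x2e1');
print "$op: ", join(' | ', @operands), "\n";
```

Note that this does not check that the operands are actually separated by commas, so it is more lenient than the original pattern.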



[*] Notch (of Minecraft fame)'s virtual CPU for his new game. I've been 
noodling with an Assembler/Disassembler/Emulator.

