Matt Lawrence matt.lawrence at
Wed Mar 1 13:59:53 GMT 2006

Jonathan Peterson wrote:
>>>Our server has no modules so I would have to do it like this:
>>>/<a [^<]*href=["|\']?([^ "\']*)["|\']?[^>].*>([^<]*)</a>/i
>>Ooh! Ooh! Can I be the first to go "Don't use a regex, use an actual
>>parser as indicated in the FAQ"? Huh?? Can I??
> I notice is down. But the FAQ is here too:
> What the faq doesn't say is that if you have a good knowledge of, and 
> perhaps even control over, the data you are dealing with, regex solutions 
> are often acceptable.
> Looking at your regex above, it might be that you are unaware of 
> 'non-greedy quantifiers'. These are very useful (especially in your 
> situation) and can often remove the need for complicated negated character 
> classes and such. Here's a little program that I think does what you want:
> #!/usr/bin/perl
> # warning flag and use strict deliberately omitted
> # to wind people up
> my $str = qq! This is an <a href=""> elephant 
> </a> I
> think.!;
> $str =~ s!<a .*?>(.*?)</a>!$1!i;
> print $str;
> There are many kinds of HTML that will not be correctly modified by this 
> simple regex. You'll have to try it and see if it's good enough.
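To illustrate the point about non-greedy quantifiers, here is a small sketch comparing a greedy .* with the non-greedy .*? on one line that contains two links. The input string is made up for illustration. The greedy pattern runs on to the last </a> and swallows the first link along with the text between the links. The non-greedy pattern, combined with /g, strips each link separately:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: two links on a single line.
my $html = q{<a href="x.html">one</a> and <a href="y.html">two</a>};

# Greedy: ".*" grabs as much as possible, so the match spans from the
# first <a to the last </a>, and only "two" survives.
(my $greedy = $html) =~ s!<a .*>(.*)</a>!$1!i;

# Non-greedy: ".*?" stops at the first ">" and the first "</a>", and /g
# repeats the substitution for every link on the line.
(my $lazy = $html) =~ s!<a .*?>(.*?)</a>!$1!gi;

print "greedy:     $greedy\n";   # greedy:     two
print "non-greedy: $lazy\n";     # non-greedy: one and two
```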

It'll catch more if you give it the s flag, which lets . match newlines as well, so link text that spans a line break is also handled:

s!<a .*?>(.*?)</a>!$1!is
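A minimal sketch of the difference, assuming link text broken across lines as in the example quoted above. The string is a simplified version of that example. Without /s, "." cannot cross the "\n" inside the link, so the substitution finds no match and the string comes back unchanged. With /s, the tag pair is removed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified version of the quoted example: the link text contains "\n".
my $str = qq!This is an <a href=""> elephant\n</a> I think.!;

# Without /s, "." does not match "\n", so this substitution fails
# and $without_s is left identical to $str.
(my $without_s = $str) =~ s!<a .*?>(.*?)</a>!$1!i;

# With /s, "." also matches "\n", so the <a ...> and </a> tags are stripped.
(my $with_s = $str) =~ s!<a .*?>(.*?)</a>!$1!is;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```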

