matt.lawrence at virgin.net
Wed Mar 1 13:59:53 GMT 2006
Jonathan Peterson wrote:
>>>Our server has no modules so I would have to do it like this:
>>>/<a [^<]*href=["|\']?([^ "\']*)["|\']?[^>].*>([^<]*)</a>/i
>>Ooh! Ooh! Can I be the first to go "Don't use a regex, use an actual
>>parser as indicated in the FAQ"? Huh?? Can I??
> I notice perldoc.com is down. But the FAQ is here too:
> What the FAQ doesn't say is that if you have a good knowledge of, and
> perhaps even control over, the data you are dealing with, regex solutions
> are often acceptable.
> Looking at your regex above, it might be that you are unaware of
> 'non-greedy quantifiers'. These are very useful (especially in your
> situation) and can often remove the need for complicated negated character
> classes and such. Here's a little program that I think does what you want:
> # warning flag and use strict deliberately omitted
> # to wind people up
> my $str = qq! This is an <a href="http://www.foo.com/bar.html"> elephant
> </a> I
> $str =~ s!<a .*?>(.*?)</a>!$1!i;
> print $str;
> There are many kinds of HTML that will not be correctly modified by this
> simple regex. You'll have to try it and see if it's good enough.
It'll catch more if you give it the s flag, since . won't match a
newline without it.
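
Here's a rough sketch of the difference, assuming the link text wraps onto a second line. The strip_links helper and the sample string are made up for illustration; the pattern is the one quoted above with only the s flag added.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: replace the first
# <a ...>text</a> with just its text. The s flag lets . match newlines.
sub strip_links {
    my ($html) = @_;
    $html =~ s!<a .*?>(.*?)</a>!$1!is;
    return $html;
}

# Sample input (an assumption): the closing tag sits on the next line.
my $str = qq{This is an <a href="http://www.foo.com/bar.html"> elephant\n</a> link};

# Same substitution without the s flag, applied to a copy of the string.
(my $without_s = $str) =~ s!<a .*?>(.*?)</a>!$1!i;

my $with_s = strip_links($str);

print "without s: $without_s\n";
print "with s:    $with_s\n";
```

Without the s flag the (.*?) can't get past the newline, so the substitution never matches and the tag stays in. It's still only a regex, so the earlier caveats about odd HTML apply.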