regex

Wed Mar 1 01:25:09 GMT 2006

On 01/03/06, David H. Adler <dha at panix.com> wrote:
> On Tue, Feb 28, 2006 at 06:50:36PM -0500, Thomas Castonzo wrote:
> > Hello All:
> >
> > I'm trying to write a regular expression that will strip <a href> </
> > a> tags from documents
> > but want to leave the linked text alone so the document still makes
> > sense.
> > Our server has no modules so I would have to do it like this:
> >
> > /<a [^<]*href=["|\']?([^ "\']*)["|\']?[^>].*>([^<]*)</a>/i
>
> Ooh! Ooh! Can I be the first to go "Don't use a regex, use an actual
> parser as indicated in the FAQ"? Huh?? Can I??
>

Indeed you can! (And you are.)  Of course, the original post already
raises the objection that the server has no modules (is that even
possible?).  Some appropriate responses would be:

  - persuade your sysadmin to install the modules
  - get a better server/sysadmin
  - install the modules yourself

Most modules can be installed as a non-privileged user.  Try reading
`perldoc perlmodinstall`, which has some notes on the arguments to
pass to Makefile.PL (for example, to install into a directory you
own).  Of course, getting support from the sysadmin is a rather good
idea: HTML parsing isn't a particularly wacky or controversial thing
to ask for.
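For illustration, here is a rough sketch of David's "use an actual parser" suggestion. It assumes HTML::Parser (which is not a core module) has been installed somewhere perl can find it, for instance under your home directory as described above. The subroutine name `strip_links` and the commented-out `use lib` path are hypothetical. The sketch drops only the `<a ...>` start tags and `</a>` end tags and copies everything else, including the link text, through verbatim.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# If HTML::Parser was installed into a directory you own, tell perl where
# it is. Hypothetical path; adjust it to wherever you installed the module:
# use lib "/home/you/perl5/lib/perl5";
use HTML::Parser ();

# strip_links (hypothetical name): return $html with every <a ...> start
# tag and </a> end tag removed. All other markup and text, including the
# text inside the links, is copied through exactly as it appeared.
sub strip_links {
    my ($html) = @_;
    my $out = '';

    # Keep a start or end tag unless it belongs to an <a> element.
    # HTML::Parser reports tag names in lowercase, so <A HREF=...> is caught too.
    my $keep_unless_anchor = sub {
        my ($tagname, $text) = @_;
        $out .= $text unless $tagname eq 'a';
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $keep_unless_anchor, 'tagname, text' ],
        end_h       => [ $keep_unless_anchor, 'tagname, text' ],
        # Every other event (plain text, comments, declarations, ...) is
        # passed through using its original source text.
        default_h   => [ sub { my $t = shift; $out .= $t if defined $t }, 'text' ],
    );
    $p->parse($html);
    $p->eof;    # flush any buffered text
    return $out;
}
```

You would call it on a whole slurped document, e.g. `print strip_links($html);`. Unlike the regex in the original question, this still works when the link text contains markup of its own, such as `<a href="x.html"><b>bold</b> link</a>`. Note that named anchors (`<a name="...">`) are unwrapped as well, since they are also `<a>` tags.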