Advice on HTML editing
Peter Corlett
abuse at cabal.org.uk
Thu Apr 29 11:31:25 BST 2010
On 29 Apr 2010, at 11:14, Victoria Conlan wrote:
[...]
> I have <bunch of rather complicated web pages>.
> I want to
>
> loop through them
> pick out all the image tags
> change the path of some (particular ones, not arbitrary!)
> save the new html (possibly with a backup, but not essential)
> (then some other stuff)
For this sort of thing, HTML::Parser should suffice. Give it a default handler that just prints the HTML it has just parsed, and it's then a case of writing a suitable start handler to look for the appropriate IMG tags, mutate them and print them.
By way of example of such a filter, below is one of my old throwaway scripts. I used it to annotate each closing </div> tag with a comment listing the attributes of the corresponding <div> tag, to have some chance of understanding the excessive div-itis that a web designer had just handed me.
#!/usr/bin/env perl
use warnings;
use strict;
use HTML::Parser;

# Edit the files named on the command line in place, keeping .bak backups.
local $^I = '.bak';

# Stack of attribute hashes, one per currently open <div>.
my @divs;

sub start_h {
    my ($text, $tagname, $attr) = @_;
    print $text;
    return unless $tagname eq 'div';
    push @divs, $attr;
}

sub end_h {
    my ($text, $tagname) = @_;
    print $text;
    return unless $tagname eq 'div';
    my $attr = pop @divs;
    return unless defined $attr and scalar keys %$attr;
    # Annotate the closing tag with the attributes of its opening <div>.
    print "[%# ";
    print join " ", map { sprintf '%s="%s"', $_, $attr->{$_} } sort keys %$attr;
    print " %]";
}

my $p = HTML::Parser->new(
    default_h => [sub { print shift }, 'text'],    # print by default
    start_h   => [\&start_h, 'text, tagname, attr'],
    end_h     => [\&end_h, 'text, tagname'],
    comment_h => [sub { printf '[%%# %s %%]', shift }, 'text'],
);

while (my $line = <>) {
    $p->parse($line);
    # Flush at the end of each file, while output still goes to that file.
    if (eof) {
        $p->eof;
        @divs = ();
    }
}
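The same handler approach could be adapted to the original IMG question. What follows is only a rough sketch under stated assumptions: the prefix mapping in %prefix_map and the name rewrite_img_paths are made up for illustration, and the real rule for which paths to change would need to replace the simple prefix test. It substitutes the new path into the original tag text rather than rebuilding the tag, so other attributes and their formatting are left alone.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical mapping from old image path prefix to new one.
my %prefix_map = (
    '/images/old/' => '/static/img/',
);

# Return a copy of $html with the src of matching <img> tags rewritten.
sub rewrite_img_paths {
    my ($html, $map) = @_;
    my $out = '';

    my $start_h = sub {
        my ($text, $tagname, $attr) = @_;
        if ($tagname eq 'img' and defined $attr->{src}) {
            my $src = $attr->{src};
            for my $old (sort keys %$map) {
                # Only touch paths that begin with a known prefix.
                next unless index($src, $old) == 0;
                my $new = $map->{$old} . substr($src, length $old);
                # Swap the path inside the original tag text. $src is the
                # entity-decoded value, so a src containing entities will
                # not match and the tag is then left unchanged.
                $text =~ s/(\bsrc\s*=\s*["']?)\Q$src\E/$1$new/i;
                last;
            }
        }
        $out .= $text;
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        default_h   => [sub { $out .= shift }, 'text'],    # copy by default
        start_h     => [$start_h, 'text, tagname, attr'],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}

# Rewrite each file named on the command line, keeping a .bak copy.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };
    close $in;

    rename $file, "$file.bak" or die "Cannot back up $file: $!";
    open my $out_fh, '>', $file or die "Cannot write $file: $!";
    print {$out_fh} rewrite_img_paths($html, \%prefix_map);
    close $out_fh or die "Cannot close $file: $!";
}
```

Run it with the HTML files as arguments; each file is rewritten and the original is kept alongside it with a .bak suffix. Image tags whose src does not start with one of the listed prefixes pass through untouched.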