Parse-text-from-HTML CPAN module ?
Ovid
publiustemp-londonpm at yahoo.com
Fri Dec 9 17:33:07 GMT 2005
--- Stephen Collyer <scollyer at netspinner.co.uk> wrote:
> > <http://search.cpan.org/~ovid/HTML-TokeParser-Simple-3.15/lib/HTML/TokeParser/Simple/Token/Text.pm>
>
> Thanks. Still rather more low level than what I'd like ideally.
> Maybe I should stop looking and start coding - it may be quicker.
Agreed that it's lower level than what you want, but it does make
extracting text pretty quick:
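    use strict;
    use warnings;
    use HTML::TokeParser::Simple;
    my $file   = shift or die "usage: $0 file.html\n";   # HTML file to read
    my $parser = HTML::TokeParser::Simple->new( file => $file );
    my $text   = '';
    while (my $token = $parser->get_token) {
        $text .= $token->as_is if $token->is_text;       # keep only text tokens
    }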
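Two things to watch with that loop: as_is returns the raw text, so entities such as &amp; come through undecoded, and the contents of <script> and <style> elements are also reported as text tokens. Below is a minimal sketch of a hypothetical html_to_text helper (my name, not part of HTML::TokeParser::Simple) that handles both. It assumes HTML::Entities from the HTML-Parser distribution is installed and that collapsing all whitespace to single spaces is acceptable.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
use HTML::Entities qw(decode_entities);

# Hypothetical helper: returns the visible text of an HTML string,
# skipping <script>/<style> contents and decoding entities like &amp;.
sub html_to_text {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html );
    my $text   = '';
    my $skip   = 0;    # > 0 while inside <script> or <style>
    while ( my $token = $parser->get_token ) {
        if ( $token->is_start_tag('script') || $token->is_start_tag('style') ) {
            $skip++;
            next;
        }
        if ( $token->is_end_tag('script') || $token->is_end_tag('style') ) {
            $skip-- if $skip;
            next;
        }
        $text .= $token->as_is if $token->is_text && !$skip;
    }
    decode_entities($text);       # void context: decodes $text in place
    $text =~ s/\s+/ /g;           # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;      # trim leading/trailing whitespace
    return $text;
}

print html_to_text('<p>Fish &amp; chips</p><script>var x = 1;</script><p>  here </p>'), "\n";
```

Note that adjacent block elements with no whitespace between them (e.g. </p><p>) will run their text together; inserting a space on block-level end tags would be the obvious next tweak.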
Cheers,
Ovid
--
If this message is a response to a question on a mailing list, please send
follow up questions to the list.
Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/