Parse-text-from-HTML CPAN module ?

Ovid publiustemp-londonpm at yahoo.com
Fri Dec 9 17:33:07 GMT 2005


--- Stephen Collyer <scollyer at netspinner.co.uk> wrote:

> > <http://search.cpan.org/~ovid/HTML-TokeParser-Simple-3.15/lib/HTML/TokeParser/Simple/Token/Text.pm>
> >
> Thanks. Still rather more low level than what I'd like ideally.
> Maybe I should stop looking and start coding - it may be quicker.

Agreed that it's lower level than what you want, but it does make
extracting text pretty quick:

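  use HTML::TokeParser::Simple;

  # $file is the path to the HTML document to read
  my $parser = HTML::TokeParser::Simple->new( file => $file );
  my $text   = '';
  while ( my $token = $parser->get_token ) {
      # append only text tokens; tags, comments, etc. are skipped
      $text .= $token->as_is if $token->is_text;
  }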

Cheers,
Ovid

-- 
If this message is a response to a question on a mailing list, please send
follow up questions to the list.

Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/


More information about the london.pm mailing list