Parse-text-from-HTML CPAN module?
scollyer at netspinner.co.uk
Fri Dec 9 11:10:32 GMT 2005
I have a search-related requirement to take some arbitrary HTML,
parse out the text, stem it, apply stop words and so on. Now,
I can cook something up myself with the usual set of modules, but
this sounds like such a common requirement that someone will
already have done it and packaged it up in a nice reusable form.
Does anyone know if there's a nice pure-Perl implementation of
this that I can pick up and use with no further brain-power required?
(I'm wondering if there's something in the WWW::Mechanize area that
is suitable, as that seems to have grown a lot since I last looked.)
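
For concreteness, the pipeline in question (HTML in, text out, stop words
dropped, remainder stemmed) might look roughly like the sketch below. It
assumes HTML::Parser, Lingua::StopWords and Lingua::Stem::En are installed
from CPAN; html_to_terms is a made-up name, the tokenising regex is a
guess at what "words" should mean, and HTML::Parser is XS-based, so this
is not the pure-Perl packaged solution being asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser;                        # XS-based, so not pure Perl
use Lingua::StopWords qw(getStopWords);  # stop-word lists as a hash ref
use Lingua::Stem::En;                    # Porter stemmer for English

# Hypothetical helper (not from any CPAN module): takes an HTML string,
# returns a list of stemmed terms with English stop words removed.
sub html_to_terms {
    my ($html) = @_;

    # 1. Collect the text chunks, with entities decoded ('dtext').
    my @chunks;
    my $parser = HTML::Parser->new(
        api_version => 3,
        text_h      => [ sub { push @chunks, shift }, 'dtext' ],
    );
    $parser->ignore_elements(qw(script style));   # skip non-content text
    $parser->parse($html);
    $parser->eof;

    # 2. Lower-case and split into words. The character class here is an
    #    assumption; adjust it to suit the documents being indexed.
    my @words = grep { length } split /[^a-z0-9']+/, lc join(' ', @chunks);

    # 3. Drop stop words.
    my $stop = getStopWords('en');
    @words = grep { !$stop->{$_} } @words;

    # 4. Stem what is left; stem() returns an array reference.
    my $stems = Lingua::Stem::En::stem({ -words => \@words, -locale => 'en' });
    return @$stems;
}

print join(' ', html_to_terms('<h1>Searching</h1><p>The cats are running.</p>')), "\n";
```

Text chunks are joined with a space so that words from adjacent elements
do not run together, and stop words are removed before stemming so that
the list is matched against the original word forms.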