wiki scraping

Struan Donald lpm at
Fri Feb 29 12:05:55 GMT 2008

* at 28/02 16:26 +0000 Chris Benson said:
> On Thu, Feb 28, 2008 at 02:46:55PM +0000, Nic Gibson wrote:
> > Does that sound sane? Is there some little tool lurking somewhere that can
> > do any of this for me? Have I missed an obvious solution?
> If you don't mind them looking like web-pages-printed-out then what I
> use is the venerable htmldoc:

It is worth noting that:

    HTMLDOC supports most HTML 3.2 elements, some HTML 4.0 elements, and
    can generate title and table of contents pages. The 1.8.x releases
    do not support stylesheets.

so if your HTML uses anything newfangled, stylesheets included, the PDF
is unlikely to bear much resemblance to the original.
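One rough way to tell in advance whether a wiki page depends on stylesheets is to scan it for them before feeding it to HTMLDOC. This is a minimal sketch using only Python's standard html.parser. The names StylesheetDetector and css_features are hypothetical. It checks for just three things: <link rel="stylesheet">, <style> blocks and inline style attributes. It will not catch every HTML 4.0 feature HTMLDOC might mishandle.

```python
from html.parser import HTMLParser


class StylesheetDetector(HTMLParser):
    """Hypothetical helper: records CSS usage that a renderer without
    stylesheet support (such as HTMLDOC 1.8.x) would ignore."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs; a value may be None
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            self.findings.append("external stylesheet: " + (attrs.get("href") or "?"))
        elif tag == "style":
            self.findings.append("embedded <style> block")
        if "style" in attrs:
            self.findings.append(f"inline style on <{tag}>")


def css_features(html):
    """Return a list of CSS features found in an HTML string."""
    detector = StylesheetDetector()
    detector.feed(html)
    detector.close()
    return detector.findings


if __name__ == "__main__":
    page = ('<html><head><link rel="stylesheet" href="wiki.css"></head>'
            '<body><p style="color:red">Hi</p></body></html>')
    for finding in css_features(page):
        print(finding)
```

An empty result only means none of those three patterns turned up. It does not guarantee that HTMLDOC's output will match what a browser shows.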


More information about the mailing list