wiki scraping
Struan Donald
lpm at exo.org.uk
Fri Feb 29 12:05:55 GMT 2008
* at 28/02 16:26 +0000 Chris Benson said:
> On Thu, Feb 28, 2008 at 02:46:55PM +0000, Nic Gibson wrote:
> > Does that sound sane? Is there some little tool lurking somewhere that can
> > do any of this for me? Have I missed an obvious solution?
>
> If you don't mind them looking like web-pages-printed-out then what I
> use is the venerable htmldoc: http://www.easysw.com/htmldoc/
It is worth noting that:
    HTMLDOC supports most HTML 3.2 elements, some HTML 4.0 elements, and
    can generate title and table of contents pages. The 1.8.x releases
    do not support stylesheets.
So if you have anything newfangled in your HTML, CSS in particular, then
the PDF isn't likely to bear much resemblance to the original.
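
If you want a rough idea in advance of whether a page leans on stylesheets,
something like the following might do. It is only a sketch using Python's
standard html.parser module; the names StylesheetDetector and css_usage are
made up for illustration and it has nothing to do with HTMLDOC itself. It
just lists the places where a page uses CSS, which is the part the 1.8.x
releases are documented as not supporting.

```python
from html.parser import HTMLParser


class StylesheetDetector(HTMLParser):
    """Collect signs that a page relies on CSS (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "style":
            # Embedded stylesheet in the document itself.
            self.findings.append("<style> block")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            # External stylesheet pulled in via <link rel="stylesheet">.
            self.findings.append("linked stylesheet: %s" % attrs.get("href"))
        if attrs.get("style"):
            # Inline style="..." attribute on any element.
            self.findings.append("inline style on <%s>" % tag)


def css_usage(html_text):
    """Return a list of descriptions of CSS use found in html_text."""
    detector = StylesheetDetector()
    detector.feed(html_text)
    detector.close()
    return detector.findings


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="stylesheet" href="wiki.css"></head>'
        '<body><p style="color:red">hi</p></body></html>'
    )
    for finding in css_usage(sample):
        print(finding)
```

An empty list suggests the page is plain enough that the PDF should look
much like the browser rendering; anything else is a hint that it may not.
It does not look at which HTML elements are used, only at stylesheets.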
s
More information about the london.pm mailing list