wiki scraping

IvorW combobulus at
Thu Feb 28 15:42:10 GMT 2008

Nic Gibson wrote:
> Afternoon all
> I'm after a bit of advice (and some knocking-down of a rough plan) and it's sort of
> perlish. Well, I plan to use perl to do it...
> I've been asked to generate some pdf docs for one of our projects. Not too
> hard. The problem is that the docs are currently in a trac wiki. I don't
> have access to the database (assuming trac keeps the wiki in a db) or the
> server (big internationals being what they are) so I'm going to have to grab
> it in some sort of mirroring manner. Now, iirc, trac lets you append
> 'format=text' to the url and get the content so I plan to do it that way.
> I'm planning to put together a little script using LWP::UserAgent and so on,
> convert the wiki markup to xml, feed it through FOP and hand over a pdf.
> Does that sound sane? Is there some little tool lurking somewhere that can
> do any of this for me? Have I missed an obvious solution?
Reasonably sane. If any feeds are available, such as RDF, RSS or
Atom, they may help you get at the raw data and skip the formatting.
Still, format=text may give you this anyway.
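
Untested sketch of the fetching end with LWP::UserAgent. The base URL
and the page names are made up, so substitute your own trac instance
(and note some trac versions want format=txt rather than format=text):

    use strict;
    use warnings;
    use LWP::UserAgent;

    # Hypothetical base URL and page list -- substitute your own
    # trac instance.
    my $base  = 'http://example.com/trac/wiki';
    my @pages = qw(WikiStart ProjectDocs);

    my $ua = LWP::UserAgent->new;

    for my $page (@pages) {
        # Appending format=txt asks trac for the raw wiki markup.
        my $res = $ua->get("$base/$page?format=txt");
        die "Failed to fetch $page: " . $res->status_line . "\n"
            unless $res->is_success;

        # Save each page's raw markup locally for the next stage.
        open my $fh, '>', "$page.txt"
            or die "Can't write $page.txt: $!";
        print {$fh} $res->decoded_content;
        close $fh;
    }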

See in particular the og_mirror script that comes with it, which is
where I used this approach.
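
For the markup-to-XML-to-PDF leg, here's a toy, untested sketch. The
conversion handles only '= Heading =' blocks and plain paragraphs
(real trac markup -- lists, links, tables -- wants a proper parser),
and the file names and style.xsl are hypothetical, but the
-xml/-xsl/-pdf switches are Apache FOP's standard command line:

    use strict;
    use warnings;

    # Hypothetical file names for this sketch.
    my ($in, $out) = ('WikiStart.txt', 'doc.xml');

    # Slurp the raw wiki markup saved by the fetch script.
    open my $rfh, '<', $in or die "Can't read $in: $!";
    my $markup = do { local $/; <$rfh> };
    close $rfh;

    open my $wfh, '>', $out or die "Can't write $out: $!";
    print {$wfh} trac_to_xml($markup);
    close $wfh;

    # style.xsl (hypothetical) would map this XML to XSL-FO.
    system('fop', '-xml', $out, '-xsl', 'style.xsl',
           '-pdf', 'out.pdf') == 0
        or die "fop failed: $?";

    # Toy conversion: '= Heading =' blocks become <title>,
    # everything else becomes <para>.
    sub trac_to_xml {
        my ($text) = @_;
        my $xml = "<doc>\n";
        for my $block (split /\n{2,}/, $text) {
            if ($block =~ /^(=+)\s*(.+?)\s*=+\s*$/) {
                $xml .= '  <title level="' . length($1) . '">'
                      . xml_escape($2) . "</title>\n";
            }
            else {
                $xml .= '  <para>' . xml_escape($block) . "</para>\n";
            }
        }
        return $xml . "</doc>\n";
    }

    sub xml_escape {
        my ($s) = @_;
        $s =~ s/&/&amp;/g;
        $s =~ s/</&lt;/g;
        $s =~ s/>/&gt;/g;
        return $s;
    }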

