wiki scraping

Nic Gibson nicg at noslogan.org
Thu Feb 28 14:46:55 GMT 2008


Afternoon all

I'm after a bit of advice (and some knocking down of a rough plan), and it's
sort of perlish. Well, I plan to use perl to do it...

I've been asked to generate some PDF docs for one of our projects. Not too
hard. The problem is that the docs currently live in a Trac wiki. I don't
have access to the database (assuming Trac keeps the wiki in a db) or the
server (big internationals being what they are), so I'm going to have to grab
it in some sort of mirroring manner. Now, iirc, Trac lets you append
'format=txt' to the URL and get the raw content, so I plan to do it that way.
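Something along these lines is what I have in mind for the grabbing end (the
base URL and page names below are made up, obviously, and the whole thing
assumes my memory of the format=txt trick is right):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Made-up base URL and page list - substitute the real ones.
my $base  = 'http://trac.example.com/projects/foo/wiki';
my @pages = qw(WikiStart UserGuide InstallNotes);

my $ua = LWP::UserAgent->new( agent => 'wiki-mirror/0.1' );

for my $page (@pages) {
    # asking Trac for format=txt should give back the raw wiki source
    my $res = $ua->get("$base/$page?format=txt");
    die "couldn't fetch $page: " . $res->status_line . "\n"
        unless $res->is_success;

    open my $fh, '>', "$page.wiki" or die "open $page.wiki: $!";
    print {$fh} $res->decoded_content;
    close $fh;
}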

I'm planning to put together a little script using LWP::UserAgent and so on
to grab the pages, convert the wiki markup to XML, feed that through FOP (via
an XSL-FO stylesheet) and hand over a PDF.
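As a first stab at the markup conversion, something as crude as this might do
(the element names are invented, and it only copes with '= Heading =' lines
and plain paragraphs - lists, tables, links and the rest would need adding
before the XSL-FO stylesheet and FOP get anywhere near it):

#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters XML cares about.
sub escape {
    my $t = shift;
    $t =~ s/&/&amp;/g;
    $t =~ s/</&lt;/g;
    $t =~ s/>/&gt;/g;
    return $t;
}

print qq{<?xml version="1.0" encoding="UTF-8"?>\n<doc>\n};
while (my $line = <>) {
    chomp $line;
    next if $line =~ /^\s*$/;
    if ($line =~ /^(=+)\s*(.*?)\s*=+\s*$/) {
        # Trac headings look like '= Title =', '== Subtitle ==' and so on
        printf qq{<title level="%d">%s</title>\n}, length($1), escape($2);
    }
    else {
        printf qq{<para>%s</para>\n}, escape($line);
    }
}
print "</doc>\n";

With a stylesheet to hand, FOP can then be run straight from the shell, along
the lines of 'fop -xml doc.xml -xsl doc2fo.xsl -pdf doc.pdf'.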

Does that sound sane? Is there some little tool lurking somewhere that can
do any of this for me? Have I missed an obvious solution?

nic
-- 
Nic Gibson
Director, Corbas Consulting
Editorial and Technical Consultancy
http://www.corbas.co.uk/

