Present web page as image file
Tom Insam
tom at jerakeen.org
Wed Aug 16 13:20:25 BST 2006
Jacqui Caren wrote:
> I need to machine generate thumbnails of a heap of web pages
> (over 100,000).
>
> Anyone here done this before?
I do it in an automated fashion, but it's a _pain_.
http://jerakeen.org/svn/tomi/Config/bin/web2png
http://jerakeen.org/svn/tomi/Config/bin/web2png_raw
They take a URL and produce some PNG files. The way it works is: it starts a
headless X server, starts an app using GtkMozEmbed inside it, opens the
URL, then screenshots the server. Extremely horrible. It would translate
into Perl fairly easily, if you care about that sort of thing. It won't
thumbnail sites with invalid SSL certs, because you can't script a way
around the Mozilla 'bad cert!' dialog.
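A minimal sketch of that pipeline in Python, assuming Xvfb, xwd and
ImageMagick's convert are installed. It is not the web2png scripts
themselves: the viewer command (e.g. "my-mozembed-viewer") is a
hypothetical stand-in for web2png_raw, display :99 and the screen size
are arbitrary choices, and the fixed sleep is only a guess at page-load
time. The optional thumb geometry uses convert's -thumbnail option to
shrink the grab.

```python
import os
import subprocess
import time


def build_commands(url, outfile, viewer_cmd, display=":99",
                   screen="1024x768x24", thumb=None):
    """Return the command lines for X server, viewer, screen grab and conversion.

    viewer_cmd is a hypothetical list (e.g. ["my-mozembed-viewer"]) naming a
    program that opens a URL in a window; it stands in for web2png_raw.
    """
    xvfb = ["Xvfb", display, "-screen", "0", screen]
    viewer = list(viewer_cmd) + [url]
    # Dump the whole root window of the headless display in XWD format.
    grab = ["xwd", "-root", "-silent", "-display", display]
    # ImageMagick reads the XWD dump from stdin and writes a PNG,
    # optionally shrinking it to fit the given thumbnail geometry.
    convert = ["convert", "xwd:-"]
    if thumb:
        convert += ["-thumbnail", thumb]
    convert.append("png:" + outfile)
    return xvfb, viewer, grab, convert


def screenshot(url, outfile, viewer_cmd, display=":99", load_wait=10.0,
               thumb=None):
    """Render url on a headless X server and save a PNG of the screen."""
    xvfb_cmd, viewer_full, grab_cmd, convert_cmd = build_commands(
        url, outfile, viewer_cmd, display=display, thumb=thumb)
    xvfb = subprocess.Popen(xvfb_cmd)
    try:
        time.sleep(1.0)  # give the X server a moment to come up
        env = dict(os.environ, DISPLAY=display)
        viewer = subprocess.Popen(viewer_full, env=env)
        try:
            # Crude assumption: the page has finished rendering by now.
            time.sleep(load_wait)
            raw = subprocess.run(grab_cmd, check=True,
                                 stdout=subprocess.PIPE).stdout
            subprocess.run(convert_cmd, input=raw, check=True)
        finally:
            viewer.terminate()
            viewer.wait()
    finally:
        xvfb.terminate()
        xvfb.wait()
```

Usage would be something like
screenshot("http://example.com/", "example.png", ["my-mozembed-viewer"], thumb="200x150").
For 100,000 pages you'd want a timeout around each call and a way to skip
pages that hang, which this sketch doesn't attempt.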
What I _suggest_ you do is use the Alexa service (new!)
http://www.amazon.com/b/ref=sc_fe_l_2/102-4858571-5842500?ie=UTF8&node=236156011&no=15879911&me=A36L942TSJ2AJA
It's not very good, partly because you can't store the generated thumbs
locally, but mostly because it will only thumbnail the root page of a
given site, not deep pages (I challenge you to find the bit of the docs
that _tells_ you this, though). But it's fast to use and doesn't require
insane X server pain.
There are other solutions, for instance using WebKit (requiring a Mac)
or IE magic (requiring Windows). For a one-off effort, they might be better.
My links on this topic are at http://del.icio.us/jerakeen/thumbnail
--
Tom Insam
tom at jerakeen.org
More information about the london.pm mailing list