Present web page as image file

Tom Insam tom at jerakeen.org
Wed Aug 16 13:20:25 BST 2006


Jacqui Caren wrote:
> I need to machine generate thumbnails of a heap of web pages
> (over 100,000).
> 
> Anyone here done this before?

I do it in an automated fashion, but it's a _pain_.

http://jerakeen.org/svn/tomi/Config/bin/web2png

http://jerakeen.org/svn/tomi/Config/bin/web2png_raw

This takes a URL and produces some PNG files. The way it works is, it 
starts a headless X server, starts an app using GtkMozEmbed inside it, 
opens the URL, then screenshots the server. Extremely horrible. It would 
translate into Perl fairly easily, if you care about that sort of thing. 
It won't thumbnail sites with invalid SSL certs, because you can't 
script a way around the Mozilla 'bad cert!' dialog.
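
For anyone who wants the shape of that pipeline without reading the 
scripts, here is a rough Python sketch. It is not what web2png does 
internally; it only mirrors the steps above under some assumptions: Xvfb 
as the headless X server, ImageMagick's `import` for the screenshot, and 
a plain `firefox` command standing in for the GtkMozEmbed app. The 
function names, display number, and sleep times are all made up for 
illustration.

```python
import os
import subprocess
import time


def build_commands(url, out_png, display=":99", size="1024x768",
                   browser_cmd=("firefox",)):
    """Return the argv lists for each step, without running anything.

    browser_cmd is a stand-in (assumption) for whatever app renders the
    page; the original scripts use a GtkMozEmbed-based app instead.
    """
    # Headless X server with one 24-bit screen of the requested size.
    xvfb = ["Xvfb", display, "-screen", "0", size + "x24"]
    # The browser is pointed at the URL; DISPLAY is set by the caller.
    browser = list(browser_cmd) + [url]
    # ImageMagick's `import` grabs the root window of that display.
    shot = ["import", "-display", display, "-window", "root", out_png]
    return xvfb, browser, shot


def screenshot(url, out_png, settle_seconds=10, **kwargs):
    """Run the three steps and clean up the processes afterwards."""
    xvfb_cmd, browser_cmd, shot_cmd = build_commands(url, out_png, **kwargs)
    env = dict(os.environ, DISPLAY=xvfb_cmd[1])
    xvfb = subprocess.Popen(xvfb_cmd)
    browser = None
    try:
        time.sleep(1)  # crude wait for the X server to come up
        browser = subprocess.Popen(browser_cmd, env=env)
        time.sleep(settle_seconds)  # crude wait for the page to render
        subprocess.run(shot_cmd, check=True)
    finally:
        # Stop the browser first, then the X server it was drawing on.
        for proc in (browser, xvfb):
            if proc is not None:
                proc.terminate()
                proc.wait()
```

Calling `screenshot("http://example.com/", "example.png")` would leave a 
full-size PNG of the virtual screen; scaling it down to a thumbnail is a 
separate step. The fixed sleeps are the weak point: over 100,000 pages 
you would want a real "page finished loading" signal instead.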

What I _suggest_ you do is use the Alexa service (new!):

http://www.amazon.com/b/ref=sc_fe_l_2/102-4858571-5842500?ie=UTF8&node=236156011&no=15879911&me=A36L942TSJ2AJA

It's not very good, partly because you can't store the generated thumbs 
locally, but mostly because it will only thumbnail the root page of a 
given site, not deep pages (I challenge you to find the bit of the docs 
that _tells_ you this, though). But it's fast to use and doesn't require 
insane X server pain.

There are other solutions, using WebKit, for instance (requiring a Mac), 
or IE magic (requiring Windows). For a one-off effort, they might be better.

My links on this topic are at http://del.icio.us/jerakeen/thumbnail

-- 
Tom Insam
tom at jerakeen.org

