On Mon, Jun 30, 2008 at 04:24:39PM +0100, Roger Burton West wrote:
>Extracting the text one page at a time is easy: pdftotext.

Or do you mean that the text is embedded into the bitmap rather than
being stored as text within the PDF? In which case pdfimages and, I
guess, gocr or similar.

R
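
For what it's worth, here is a rough sketch of both routes in Python. It assumes pdftotext, pdfimages and gocr are installed and on the PATH. The function names, the page-N file naming and the idea of passing the page count in by hand are all made up for illustration. -f and -l are the first-page and last-page options of pdftotext and pdfimages.

```python
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf, page, out_txt):
    """Command that extracts the text layer of a single page."""
    # -f / -l select the first and last page; same value means one page.
    return ["pdftotext", "-f", str(page), "-l", str(page), str(pdf), str(out_txt)]


def pdfimages_cmd(pdf, page, image_root):
    """Command that dumps the images embedded in a single page."""
    # Output files are named <image_root>-NNN.pbm or .ppm by default.
    return ["pdfimages", "-f", str(page), "-l", str(page), str(pdf), str(image_root)]


def gocr_cmd(image):
    """Command that OCRs one PNM image; gocr prints the text on stdout."""
    return ["gocr", "-i", str(image)]


def text_by_page(pdf, pages, workdir="."):
    """Text stored as text: return {page number: text} via pdftotext."""
    result = {}
    for page in range(1, pages + 1):
        out = Path(workdir) / f"page-{page}.txt"
        subprocess.run(pdftotext_cmd(pdf, page, out), check=True)
        result[page] = out.read_text(errors="replace")
    return result


def ocr_by_page(pdf, pages, workdir="."):
    """Text baked into bitmaps: pdfimages, then gocr on each image."""
    result = {}
    for page in range(1, pages + 1):
        root = Path(workdir) / f"page-{page}"
        subprocess.run(pdfimages_cmd(pdf, page, root), check=True)
        chunks = []
        # Pick up whichever PNM variants pdfimages wrote for this page.
        for image in sorted(Path(workdir).glob(f"page-{page}-*.p[bgp]m")):
            done = subprocess.run(gocr_cmd(image), check=True,
                                  capture_output=True, text=True)
            chunks.append(done.stdout)
        result[page] = "\n".join(chunks)
    return result
```

Something like text_by_page("scan.pdf", 12) would cover the first case and ocr_by_page("scan.pdf", 12) the second. If the text route comes back empty for every page, that is a fair hint the PDF only contains scanned bitmaps. OCR quality from gocr will depend heavily on the scans.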