OCRing PDFs

Roger Burton West roger at firedrake.org
Mon Jun 30 16:39:53 BST 2008


On Mon, Jun 30, 2008 at 04:24:39PM +0100, Roger Burton West wrote:

>Extracting the text one page at a time is easy: pdftotext.

Or do you mean that the text is embedded in the page bitmaps rather
than being stored as text within the PDF? In which case, pdfimages to
pull the images out and, I guess, gocr or similar to OCR them.
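
A rough sketch of how the two routes might be glued together, one page
at a time. Assumptions, none of which come from the original question:
pdftotext, pdfimages and gocr are all on the PATH; pdfimages writes
PBM/PGM/PPM files that gocr can read; and a page with no extractable
text is a scanned page. The function names and the workdir layout are
made up for illustration.

```python
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf, page):
    # -f/-l select the first and last page; "-" sends the text to stdout.
    return ["pdftotext", "-f", str(page), "-l", str(page), str(pdf), "-"]


def pdfimages_cmd(pdf, page, image_root):
    # Writes the page's images as <image_root>-NNN.pbm/.pgm/.ppm.
    return ["pdfimages", "-f", str(page), "-l", str(page), str(pdf), str(image_root)]


def gocr_cmd(image):
    # gocr prints the recognised text on stdout by default.
    return ["gocr", "-i", str(image)]


def run(cmd):
    # Run a command and return its stdout, raising if it exits non-zero.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def page_text(pdf, page, workdir):
    """Return the text of one page, falling back to OCR if there is none."""
    text = run(pdftotext_cmd(pdf, page))
    if text.strip():
        return text  # the PDF stores real text for this page

    # Assumption: an empty result means the page is only a bitmap.
    root = Path(workdir) / f"page{page}"
    run(pdfimages_cmd(pdf, page, root))
    images = sorted(Path(workdir).glob(f"page{page}-*"))
    return "".join(run(gocr_cmd(image)) for image in images)
```

Something like page_text("scan.pdf", 1, "/tmp/ocr") would then give the
text of page 1 either way, provided /tmp/ocr already exists. OCR
quality will depend heavily on the scan, so treat the fallback output
as a first draft.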

R

