OCRing PDFs
Roger Burton West
roger at firedrake.org
Mon Jun 30 16:24:39 BST 2008
On Mon, Jun 30, 2008 at 03:57:49PM +0100, David Cantrell wrote:
>I have a very large PDF document. It's something like 600 bitmaps, one
>per page. Most pages contain both text and diagrams, and the layout is
>quite important. Does anyone know of any software which will read the
>text and build an index in the PDF file, so that it's easily searchable?
Extracting the text one page at a time is easy: pdftotext.
Building the index sounds like a job for Perl.
Appending a new page or two to a PDF can be done by PDF::API2.
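
A rough sketch of the first two steps, in Python here for illustration rather than the Perl suggested above. It assumes pdftotext (from xpdf or poppler) is on the PATH and that the PDF already carries a text layer; pdftotext does not OCR bitmaps, so image-only pages would need an OCR pass first. The names page_text, build_index and format_index and the min_length cutoff are made up for this example.

```python
import re
import subprocess
from collections import defaultdict


def page_text(pdf_path, page):
    """Return the text of a single page by running pdftotext on that page only."""
    result = subprocess.run(
        # -f/-l select the first and last page; "-" sends the text to stdout.
        ["pdftotext", "-f", str(page), "-l", str(page), "-layout", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def build_index(pages, min_length=4):
    """Map each lowercased word to the sorted list of pages it appears on.

    pages is a dict of page number -> page text. Words shorter than
    min_length are skipped as a crude stop-word filter (an assumption).
    """
    index = defaultdict(set)
    for number, text in pages.items():
        for word in re.findall(r"[A-Za-z]+", text):
            if len(word) >= min_length:
                index[word.lower()].add(number)
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}


def format_index(index):
    """Render the index as plain lines such as 'gearbox: 1, 2'."""
    return "\n".join(
        f"{word}: {', '.join(str(n) for n in numbers)}"
        for word, numbers in index.items()
    )
```

With the page count known, something like build_index({n: page_text("big.pdf", n) for n in range(1, 601)}) would give the word-to-pages mapping, and format_index turns it into text that could then be drawn onto the appended pages.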
R