OCRing PDFs

David Cantrell david at cantrell.org.uk
Mon Jun 30 15:57:49 BST 2008


I have a very large PDF document.  It's something like 600 bitmaps, one
per page.  Most pages contain both text and diagrams, and the layout is
quite important.  Does anyone know of any software which will read the
text and build an index in the PDF file, so that it's easily searchable?

All I need is for it to be able to spot that the word "frobnitz" occurs
on pages 13, 200, 255 and 432, not to attempt to convert the file to
Wyrd or anything like that.
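
The requirement above amounts to an inverted index from words to page numbers. A minimal sketch in Python, assuming some OCR step has already produced one string of text per page (the OCR itself, and writing the result back into the PDF, are not shown). The function name build_page_index and the sample page texts are made up for illustration.

```python
import re
from collections import defaultdict


def build_page_index(page_texts):
    """Map each lowercased word to the sorted 1-based page numbers it occurs on."""
    index = defaultdict(set)
    for page_number, text in enumerate(page_texts, start=1):
        # Treat runs of letters, digits and apostrophes as words.
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(pages) for word, pages in index.items()}


# Hypothetical OCR output: one string per page.
pages = [
    "The frobnitz is shown in the diagram.",
    "Nothing relevant here.",
    "Adjust the Frobnitz carefully.",
]
index = build_page_index(pages)
print(index["frobnitz"])  # [1, 3]
```

Lookups are case-insensitive because every page is lowercased before it is split into words. Real OCR output would probably also need some tolerance for misread characters.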

-- 
David Cantrell | Enforcer, South London Linguistic Massive

Just because it is possible to do this sort of thing
in the English language doesn't mean it should be done

