Re: Reading math formulas in PDF files

You should be able to run ocropus from inside emacspeak without
too much trouble. Eventually I will check in a script that lets
emacspeak-ocr use ocropus. At present the quality of ocropus
output is highly variable, but it looks promising.

>>>>> "Jason" == Jason White <jasonw@ariel.its.unimelb.edu.au> writes:
    Jason> On Wed, Apr 18, 2007 at 10:15:02AM +0200, Lukas
    Jason> Loehrer wrote:
    >> At least for some pdf files, google does an excellent job
    >> at preserving math formulas in their "View as HTML" view.
    Jason> Interesting. There are also PDF files that contain
    Jason> only scanned images of text. To read these, you need
    Jason> OCR software, and it now appears that quality, free as
    Jason> in freedom, OCR solutions are coming down the
    Jason> pipeline:
    Jason> http://code.google.com/p/ocropus/
    Jason> and it shouldn't be difficult for the Emacs Lisp
    Jason> enthusiasts on the mailing list to write a function
    Jason> that will run OCRopus on a set of image files, or
    Jason> even scan a page, and then read the output into an
    Jason> Emacs buffer. Ideally this would be an Emacs mode that
    Jason> lets you set scanning parameters.
    Jason> The OCR software itself isn't expected to be ready for
    Jason> release until late next year, but I'm sure members of
    Jason> this list will be helping with the beta testing along
    Jason> the way. XPDF can extract image files from PDF
    Jason> documents, which could then be converted to whatever
    Jason> format the OCR software accepts.
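
The pipeline Jason describes (extract the page images from a PDF,
run OCR on each one, collect the text) might look roughly like the
sketch below. It is only an illustration under stated assumptions:
pdfimages is the image extractor that ships with xpdf, but the OCR
command line is a guess, since ocropus has no released interface
yet. The function names and the "page" file prefix are made up for
this example, and the OCR program is assumed to accept the image
format that pdfimages writes and to print text on standard output.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root):
    """Build the xpdf `pdfimages` call that extracts embedded images.

    pdfimages writes files named image_root-000.ppm (or .pbm),
    image_root-001.ppm, and so on.
    """
    return ["pdfimages", str(pdf_path), str(image_root)]


def ocr_command(image_path, ocr_program="ocropus"):
    """Build the OCR call for one image.

    ASSUMPTION: the OCR program takes an image path as its only
    argument and prints recognized text on standard output. Adjust
    this to whatever interface the OCR tool really offers.
    """
    return [ocr_program, str(image_path)]


def ocr_pdf(pdf_path, work_dir, ocr_program="ocropus"):
    """Extract images from a PDF, OCR each one, return the joined text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: pull the scanned page images out of the PDF.
    subprocess.run(pdfimages_command(pdf_path, work_dir / "page"), check=True)

    # Step 2: OCR each image in page order (names sort by number).
    pages = []
    for image in sorted(work_dir.glob("page-*")):
        result = subprocess.run(
            ocr_command(image, ocr_program),
            check=True,
            capture_output=True,
            text=True,
        )
        pages.append(result.stdout)

    # Separate pages with a form feed so a reader can move page by page.
    return "\n\f\n".join(pages)
```

Calling ocr_pdf("paper.pdf", "/tmp/paper-ocr") would return the
recognized text as one string, which an Emacs command could then
insert into a buffer. Keeping the two command builders separate
makes it easy to swap in a different OCR engine.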
