
Re: Reading math formulas in PDF files

You should be able to run ocropus from inside emacspeak without
too much trouble. Eventually I will check in an appropriate
script so that emacspeak-ocr can use ocropus. At present the
quality of ocropus output is highly variable, but it looks promising.

>>>>> "Jason" == Jason White <jasonw@ariel.its.unimelb.edu.au> writes:
    Jason> On Wed, Apr 18, 2007 at 10:15:02AM +0200, Lukas
    Jason> Loehrer wrote:
    >> At least for some pdf files, google does an excellent job
    >> at preserving math formulas in their "View as HTML" view.
    Jason> Interesting. There are also PDF files that contain
    Jason> only scanned images of text. To read these, you need
    Jason> OCR software, and it now appears that quality, free as
    Jason> in freedom, OCR solutions are coming down the
    Jason> pipeline:
    Jason> http://code.google.com/p/ocropus/
    Jason> and it shouldn't be difficult for the Emacs Lisp
    Jason> enthusiasts on the mailing list to write a function
    Jason> that will run OCR Opus on a set of image files, or
    Jason> even scan a page, and then read the output into an
    Jason> Emacs buffer. Ideally this would be an Emacs mode that
    Jason> lets you set scanning parameters.
    Jason> The OCR software itself isn't expected to be ready for
    Jason> release until late next year, but I'm sure members of
    Jason> this list will be helping with the beta testing along
    Jason> the way. XPDF can extract image files from PDF
    Jason> documents, which could then be converted to whatever
    Jason> format the OCR software accepts.
    Jason> -----------------------------------------------------------------------------
    Jason> To unsubscribe from the emacspeak list or change your
    Jason> address on the emacspeak list send mail to
    Jason> "emacspeak-request@cs.vassar.edu" with a subject of
    Jason> "unsubscribe" or "help"
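
The pipeline described in the quoted message (extract the page images
from a PDF, run OCR on each one, collect the text) could be sketched
roughly as follows. This is a minimal sketch in Python rather than
Emacs Lisp, not a tested emacspeak-ocr script. It assumes that xpdf's
pdfimages is on the PATH and that the OCR program takes an image file
name as its last argument and prints the recognized text on standard
output. The ocropus command line is not given in this thread, so the
OCR command is passed in as a parameter; the function names are
hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def extract_images(pdf_path, out_dir):
    """Dump the images embedded in a PDF into out_dir using xpdf's pdfimages."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pdfimages writes files named <root>-NNN.<ext>; "page" is our chosen root.
    subprocess.run(["pdfimages", str(pdf_path), str(out_dir / "page")], check=True)
    return sorted(p for p in out_dir.iterdir() if p.name.startswith("page-"))


def ocr_images(image_paths, ocr_command):
    """Run ocr_command once per image and return the collected text.

    ocr_command is a list such as ["some-ocr-tool", "--some-flag"]
    (placeholder values; substitute whatever the OCR program needs).
    """
    chunks = []
    for image in image_paths:
        result = subprocess.run(
            list(ocr_command) + [str(image)],
            check=True,
            capture_output=True,
            text=True,
        )
        chunks.append(result.stdout)
    return "\n".join(chunks)


if __name__ == "__main__":
    # Usage: script.py FILE.pdf WORKDIR OCR-COMMAND [OCR-ARGS...]
    pdf, workdir, command = sys.argv[1], sys.argv[2], sys.argv[3:]
    sys.stdout.write(ocr_images(extract_images(pdf, workdir), command))
```

Because the script writes the text to standard output, it can be run
from Emacs with C-u M-! (shell-command with a prefix argument), which
inserts the output into the current buffer. If the OCR program does
not accept the image format that pdfimages produces, a conversion
step would be needed between the two functions.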

Best Regards,

Email:  raman@users.sf.net
WWW:    http://emacspeak.sf.net/raman/
AIM:    emacspeak       GTalk: tv.raman.tv@gmail.com
PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc
Google: tv+raman 
IRC:    irc://irc.freenode.net/#emacs

