Re: [opensuse] graphical ocr for linux/opensuse - a report



Istvan Gabor wrote:
Hello:

This is only a report about an OCR (optical character recognition)
program for Linux / openSUSE. For a long time I had trouble finding a
reliable, well-working OCR program for Linux that can recognize
Hungarian accented characters. Recently I found the 'cuneiform' OCR
program and a graphical frontend for it called 'yagf'. The two work
very well together, and usage is straightforward. Cuneiform has
several language modules and reliably recognizes both "normal" and
accented characters. It accepts several image types, including jpg,
tiff, png and bmp. The recognition options are easily configurable in
yagf. If HTML output is chosen, it can even distinguish between
smaller and larger fonts and identify section titles and boldface
text. Of course it does not do this without some errors, but the
result is acceptable. yagf can invoke xsane directly and use the
scanned image from it.
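For anyone who prefers to script the recognition step instead of going through yagf, here is a minimal Python sketch of calling cuneiform directly. It rests on assumptions: that the command-line tool is installed as `cuneiform`, that it takes `-l` (language), `-f` (output format) and `-o` (output file) as the cuneiform-linux builds I know of do, and that `hun` is the code for the Hungarian module. The helper names and the suffix list are made up for illustration; check `cuneiform --help` on your own system before relying on any of it.

```python
import shutil
import subprocess
from pathlib import Path

# Image types named in the report above; what a given build actually
# accepts may differ, so treat this set as an assumption.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".bmp"}


def build_cuneiform_command(image, output, language="hun", fmt="html"):
    """Return the argv list for one cuneiform run (hypothetical helper).

    The -l / -f / -o flags are assumed from the cuneiform-linux CLI.
    """
    image = Path(image)
    if image.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported image type: {image.suffix!r}")
    return ["cuneiform", "-l", language, "-f", fmt, "-o", str(output), str(image)]


def run_ocr(image, output, language="hun", fmt="html"):
    """Run cuneiform on one image; raises if the tool is missing or fails."""
    if shutil.which("cuneiform") is None:
        raise FileNotFoundError("cuneiform was not found on PATH")
    subprocess.run(build_cuneiform_command(image, output, language, fmt), check=True)


if __name__ == "__main__":
    # Only print the command here, so the sketch is safe to try
    # even on a machine without cuneiform installed.
    print(" ".join(build_cuneiform_command("page.png", "page.html")))
```

Calling `run_ocr("page.png", "page.html")` would then write the HTML result next to the scan, provided the assumptions above hold for your installation.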

Both cuneiform and yagf are available in openSUSE Build Service (OBS)
repositories (cuneiform 0.9.0 and yagf 0.8.1).

There is, or used to be, a GPL-licensed conversion app that can produce .bmp files from .pdf documents.

Thought it might be worth mentioning, despite my not having a link handy.

BR,
Gudmund
--
This message and any replies to it is scanned by http://www.fra.se.
Please direct any complaints about this to them.


