Re: Text in images to a text file
- From: Dances With Crows <danSPANceswitTRAPhcrows@xxxxxxxxx>
- Date: Thu, 29 Jun 2006 15:59:57 -0500
On 29 Jun 2006 13:21:31 -0700, bobblebob@xxxxxxxxx staggered into the
Black Sun and said:
> Does anyone know of any software for Linux which converts text in
> images to a text file?
What you're looking for is called "Optical Character Recognition" and is
usually abbreviated to "OCR". There are 2 Free packages called gocr and
ocrad, but they both suck compared to the commercial stuff available for
Windows. Seriously. My test image was a 300 DPI black-and-white TIFF:
very well scanned, no skew, professionally typeset, ~2800 chars on the
page. Number of mis-recognized chars:
Old Typereader: 0
Old Omnipage: 2
gocr 0.39: >50
ocrad 0.10: >100
...and that's for text that's as good as it gets. If you have skew,
blotch, curl, weird fonts, or anything like that, performance goes down
*sharply*.
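If you want to run this kind of comparison on your own scans, one rough
way to count mis-recognized chars is the edit distance between a
hand-typed copy of the page and whatever the engine spits out. The
Python below is only a sketch of that idea, not necessarily how the
numbers above were counted; the function names are made up for
illustration, and it assumes you have a known-good transcription to
compare against.

```python
def edit_distance(truth: str, ocr: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning `truth` into `ocr`."""
    # prev[j] holds the distance between the processed prefix of `truth`
    # and the first j characters of `ocr`.
    prev = list(range(len(ocr) + 1))
    for i, t in enumerate(truth, start=1):
        curr = [i]
        for j, o in enumerate(ocr, start=1):
            cost = 0 if t == o else 1
            curr.append(min(prev[j] + 1,          # drop a truth char
                            curr[j - 1] + 1,      # extra char in OCR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(truth: str, ocr: str) -> float:
    """Errors per ground-truth character."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr) / len(truth)


print(edit_distance("Murgatroyd", "IVIurgatroycl"))  # prints 5
```

Treat the result as a rough score: a single misread glyph that comes
out as several characters (an "M" read as "IVI") counts as several
errors.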
> I have loads of paper documents that I intend to scan. The contents of
> these documents need to eventually end up on a web site as text.
Even the best OCR engine available (Finereader? Latest Omnipage?) is
not perfect. If you require perfection, you'll need to proof every
single page by hand to catch the problems. If you can't proof
everything, you'll need to store the scanned images so people can still
read the text when your engine recognizes "Murgatroyd" as
"IVIurgatroycl". HTH anyway,
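
For the "store the scanned images" part, here is a minimal Python sketch
of one way to do it: put the OCR text on the page and link each page
back to its scan. The function name, the CSS class, and the file path
are all hypothetical; adapt them to however your site is built.

```python
from html import escape


def render_page(ocr_text: str, scan_href: str) -> str:
    """Return an HTML fragment holding the OCR text plus a link to the
    original scan, so readers can check the image when the text is wrong."""
    return (
        '<div class="ocr-page">\n'
        f"<pre>{escape(ocr_text)}</pre>\n"
        f'<p><a href="{escape(scan_href, quote=True)}">'
        "View the original scan</a></p>\n"
        "</div>\n"
    )


# Hypothetical file name, for illustration only.
print(render_page("Dear Mr. IVIurgatroycl, ...", "scans/page-001.png"))
```

Escaping the text matters here because OCR output can contain stray
"<" or "&" characters that would otherwise break the page.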
--
Matt G|There is no Darkness in Eternity/But only Light too dim for us to see
Brainbench MVP for Linux Admin / mail: TRAP + SPAN don't belong
http://www.brainbench.com / "He is a rhythmic movement of the
-----------------------------/ penguins, is Tux." --MegaHAL