Red Hat Bugzilla – Bug 220979
Review Request: tesseract - Raw OCR Engine
Last modified: 2012-12-18 14:48:15 EST
Spec URL: http://mutebox.net/~karlik/tesseract.spec
SRPM URL: http://mutebox.net/~karlik/tesseract-1.02-1.src.rpm
A commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.
This is my first RPM and I am looking for a sponsor. I want to add this package (and hope to add more in the future) because Extras has no OCR software and this program is on the wishlist.
This has been discussed before IIRC. tesseract might give good results, but
it is hopeless to compile with 64bit. IMHO, this indicates very poor code
quality. I wouldn't want to volunteer to maintain something like that.
I fixed one rpmlint warning (mixed space and tabs).
new SRPM: http://mutebox.net/~karlik/tesseract-1.02-2.src.rpm
The new spec file has the same URL as the old one.
I had to change the URLs; here are the correct ones:
Removing NEEDSPONSOR (bug 221349)
"Tesseract can also make use of the libtiff library. (www.libtiff.org)
Without libtiff, Tesseract can only read uncompressed and G3 compressed
You seem not to use libtiff, is this intentional?
No, it was not intentional.
- Update BRs
- Fix x86_64 compile
I did not write the x86_64-fix patch; I found it in the Debian package.
Update to v1.03:
(In reply to comment #6)
> No, it was not intentional.
Eh, from your specfile 1.03-1:
You configure it not to use libtiff.
It was not intentional in the 1.02 package, but v1.03 does not work correctly
with the libtiff library.
Fixed sources are in CVS, so I have prepared a patch based on them. libtiff is
now enabled again.
Please take a look at this link:
They said that the address of Tesseract OCR was changed to
* dist tag present
* valid Apache license
W: tesseract-devel no-documentation
* good md5 (c39bd7b465c37a3863140e88d51cd839)
* the newest version packaged
* package owns all directories well
* proper %clean section
* proper buildroot
* nothing wrong with %files
* no scriptlets needed
* devel subpackage present and looks good
* no .la files
* no GUI
I wasn't able to test it in mock, but I hope it's working.
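The "good md5" item in the checklist above means that the source tarball shipped in the SRPM hashes to the same value as the upstream download. A minimal sketch of that comparison in Python follows; the helper names and the tarball file name in the usage note are hypothetical, and only the checksum itself comes from this review.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True when the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("tesseract-1.03.tar.gz", "c39bd7b465c37a3863140e88d51cd839")` would be run once against the upstream tarball and once against the copy extracted from the SRPM; the file name here is an assumption, not something stated in the thread.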
It does build fine in mock. The only rpmlint warning I saw from that besides
the one you noted is
W: tesseract mixed-use-of-spaces-and-tabs (spaces: line 16, tab: line 1)
which is not really an issue.
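The mixed-use-of-spaces-and-tabs warning above reports one line that uses spaces and one that uses a tab. A rough sketch of how such lines could be located in a spec file is shown below. This is not rpmlint's actual implementation: treating a run of two or more spaces as "space alignment" is an assumption made for illustration, and the function names are hypothetical.

```python
import re


def first_space_and_tab_lines(spec_text):
    """Return (space_line, tab_line) as 1-based line numbers.

    space_line: first line containing a run of two or more spaces
                (assumed here to indicate space-based alignment).
    tab_line:   first line containing a tab character.
    Either value is None when no such line exists.
    """
    space_line = None
    tab_line = None
    for number, line in enumerate(spec_text.splitlines(), start=1):
        if tab_line is None and "\t" in line:
            tab_line = number
        if space_line is None and re.search(r" {2,}", line):
            space_line = number
    return space_line, tab_line


def has_mixed_whitespace(spec_text):
    """True when the text uses both tabs and space runs."""
    space_line, tab_line = first_space_and_tab_lines(spec_text)
    return space_line is not None and tab_line is not None
```

Passing the spec file's contents to `first_space_and_tab_lines` gives the two line numbers to look at. Converting one style to the other silences the warning, although, as noted above, it is cosmetic.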
New Package CVS Request
Package Name: tesseract
Short Description: Raw OCR Engine
Built for f-7 and devel. Closed.
Karol, can you please provide tesseract for EPEL-6?
Maybe you won't maintain tesseract on EPEL-6, so I can do it. Just add my FAS name (cassmodiah) to the request for the new branch.
It's currently a missing dep for tucan.
Package Change Request
Package Name: tesseract
New Branches: el6
Owners: karlik cassmodiah
Git done (by process-git-requests).
Just today I tested a music OCR program named Audiveris (http://audiveris.kenai.com/). Clicking the download button shows that you can download an RPM package as well as its sources.
One piece of software this Java-based music OCR program requires is Tesseract, to handle text in images. I am currently running Fedora 17 i686, and I could yum install tesseract from the Fedora repository. However, when I query rpm for the actual package, it says tesseract-3.00-2.fc15.i686: fc15, not fc17. This Tesseract package also turns out to be missing a shared library named libtesseract.so.3, which Audiveris requires for its JNI.
My first question to you: why has this RPM package not been upgraded?
Second, I happen to have worked on the Tesseract source code and still have a libtesseract.so.3 in my /usr/local/lib directory.
My second question to you: why does Fedora not offer a more up-to-date version of the software?
My third question to you: what would Red Hat need for its RHEL/CentOS 6/Fedora distributions so that I can offer up-to-date tesseract code, provided tesseract successfully passes all my tests? My tests will also include tests using Audiveris.
Please note that I classified Tesseract as highly unstable the last time I worked on it, because the tesseract developers were changing the code and committing their changes without any regression tests. So I stabilized my own tesseract version. Everyone in the tesseract project appeared totally unresponsive to any mail from outside. All this past work is documented at http://vouters.dyndns.org/tima/Linux-Java-Tesseract-ocr-Porting_Tesseract_from_android_Graphics_to_Sun_Graphics.html