Bug 220979 - Review Request: tesseract - Raw OCR Engine
Product: Fedora
Classification: Fedora
Component: Package Review
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Michał Bentkowski
QA Contact: Fedora Package Reviews List
Reported: 2006-12-29 17:01 EST by Karol Trzcionka
Modified: 2012-12-18 14:48 EST
Last Closed: 2007-06-17 15:30:07 EDT
mr.ecik: fedora‑review+
tibbs: fedora‑cvs+
Description Karol Trzcionka 2006-12-29 17:01:39 EST
Spec URL: http://mutebox.net/~karlik/tesseract.spec
SRPM URL: http://mutebox.net/~karlik/tesseract-1.02-1.src.rpm
A commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

This is my first RPM and I am looking for a sponsor. I want to add this package (and I hope to add more in the future) because Extras does not have any OCR software and this program is on the wishlist.
Comment 1 Neal Becker 2006-12-29 17:27:21 EST
This has been discussed before IIRC.  tesseract might give good results, but 
it is hopeless to compile with 64bit.  IMHO, this indicates very poor code 
quality.  I wouldn't want to volunteer to maintain something like that.
Comment 2 Karol Trzcionka 2006-12-30 14:32:06 EST
I fixed one rpmlint warning (mixed space and tabs).
new SRPM: http://mutebox.net/~karlik/tesseract-1.02-2.src.rpm
The new spec file has the same URL as the old one.
Comment 3 Karol Trzcionka 2006-12-31 12:52:43 EST
I had to change the URLs; here are the correct ones:
SPEC: http://karlik.nonlogic.org/tesseract/tesseract.spec
SRPM: http://karlik.nonlogic.org/tesseract/tesseract-1.02-2.src.rpm
Comment 4 Mamoru TASAKA 2007-01-06 08:01:53 EST
Removing NEEDSPONSOR (bug 221349)
Comment 5 Till Maas 2007-01-26 15:24:40 EST
from README:

"Tesseract can also make use of the libtiff library. (www.libtiff.org)
Without libtiff, Tesseract can only read uncompressed and G3 compressed
TIFF files."

You seem not to use libtiff, is this intentional?
Comment 6 Karol Trzcionka 2007-01-27 09:52:49 EST
No, it was not intentional.
New release:
- Update BRs
- Fix x86_64 compile

The x86_64-fix patch was not written by me; I found it in the Debian package.

New URLs:
Comment 8 Till Maas 2007-03-07 16:02:51 EST
(In reply to comment #6)
> No, it was not intentional.

Eh, from your specfile 1.03-1:
%configure --without-libtiff

You configure it not to use libtiff.
Comment 9 Karol Trzcionka 2007-03-20 16:58:32 EDT
It was not intentional in the 1.02 package, but v1.03 does not work correctly
with the libtiff library.
Comment 10 Karol Trzcionka 2007-03-22 16:41:53 EDT
The sources in CVS are fixed, so I have prepared a patch based on them. libtiff
is now enabled again.
New URLs:
Comment 11 Peter Lemenkov 2007-04-19 08:38:51 EDT
Please take a look at this link:


They said that the address of Tesseract OCR has changed to


Comment 13 Michał Bentkowski 2007-06-09 14:54:55 EDT
 * dist tag present
 * valid Apache license
 * rpmlint:
W: tesseract-devel no-documentation
 * good md5 (c39bd7b465c37a3863140e88d51cd839)
 * the newest version packaged
 * package owns all directories well
 * proper %clean section
 * proper buildroot
 * nothing wrong with %files
 * no scriptlets needed
 * devel subpackage present and looks good
 * no .la files
 * no GUI

I wasn't able to test it in mock, but I hope it's working.
Comment 14 Jason Tibbitts 2007-06-09 15:01:50 EDT
It does build fine in mock.  The only rpmlint warning I saw from that besides
the one you noted is
  W: tesseract mixed-use-of-spaces-and-tabs (spaces: line 16, tab: line 1)
which is not really an issue.
Comment 15 Karol Trzcionka 2007-06-10 08:33:15 EDT
New Package CVS Request
Package Name: tesseract
Short Description: Raw OCR Engine
Owners: karlikt@gmail.com
Branches: F-7
Comment 16 Kevin Fenzi 2007-06-10 23:33:43 EDT
cvs done. 
Comment 17 Karol Trzcionka 2007-06-17 15:30:07 EDT
Built for f-7 and devel. Closed.
Comment 18 Simon 2011-01-12 06:57:23 EST
Karol, can you please provide tesseract for EPEL-6?
If you would rather not maintain tesseract on EPEL-6, I can do it. Just add my FAS name (cassmodiah) to the request for the new branch.
It's currently a missing dep for tucan.
Comment 19 Karol Trzcionka 2011-01-19 15:21:01 EST
Package Change Request
Package Name: tesseract
New Branches: el6
Owners: karlik cassmodiah
Comment 20 Jason Tibbitts 2011-01-19 16:56:30 EST
Git done (by process-git-requests).
Comment 21 Philippe Vouters 2012-12-18 14:48:15 EST
Just today I tested a music OCR program named Audiveris (http://audiveris.kenai.com/). Clicking the download button shows that you can download an RPM package as well as its sources.

One piece of software that this Java-based music OCR program requires is Tesseract, which it uses to handle text in images. I am currently running Fedora 17 i686. I could yum install tesseract from the Fedora repository. However, when I query rpm for the actual package, it reports tesseract-3.00-2.fc15.i686: fc15, not fc17. This Tesseract package turns out to be missing a shared library named libtesseract.so.3, which Audiveris requires for its JNI.

My first question to you: why has this RPM package not been upgraded?

Also, I happen to have worked on the Tesseract source code and still have a libtesseract.so.3 in my /usr/local/lib directory.

My second question to you: why does Fedora not offer a more up-to-date version of the software?

My third question to you: what would Red Hat need for its RHEL/CentOS 6/Fedora distributions so that I can offer up-to-date tesseract code, provided tesseract successfully passes all my tests? My tests would also include tests using Audiveris.

Please note that I classified Tesseract as highly unstable the last time I worked on it: the tesseract developers were changing the code and committing their changes without any regression tests. So I stabilized my own tesseract version. Everyone in the tesseract project appeared totally unresponsive to any mail from outside. All this past work is duly documented at http://vouters.dyndns.org/tima/Linux-Java-Tesseract-ocr-Porting_Tesseract_from_android_Graphics_to_Sun_Graphics.html
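
The missing-library symptom described in this comment can be checked independently of rpm. What follows is only a minimal sketch, assuming a Linux system with Python 3; the helper name `shared_library_loadable` is hypothetical and is not part of Tesseract, Audiveris, or any Fedora tooling. It simply asks the dynamic loader whether a given soname such as libtesseract.so.3 can be opened.

```python
import ctypes
import ctypes.util


def shared_library_loadable(soname: str) -> bool:
    """Return True if the dynamic loader can open `soname`.

    Hypothetical helper for illustration: it only tests whether the
    library can be opened, not whether it exports the symbols that a
    particular JNI wrapper expects.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    # The soname comes from the comment above; whether it is present
    # depends on which tesseract package (or local build) is installed.
    print("libtesseract.so.3 loadable:",
          shared_library_loadable("libtesseract.so.3"))
    # find_library returns the name of a matching library, or None
    # when nothing is found on the standard search path.
    print("find_library('tesseract'):",
          ctypes.util.find_library("tesseract"))
```

A library that exists only under /usr/local/lib may still be reported as not loadable if that directory is not on the loader's search path, so a negative result here does not by itself prove that the file is absent.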
