Bug 220979

Summary: Review Request: tesseract - Raw OCR Engine
Product: [Fedora] Fedora
Reporter: Karol Trzcionka <karlikt>
Component: Package Review
Assignee: Michał Bentkowski <mr.ecik>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Package Reviews List <fedora-package-review>
Severity: medium
Priority: medium
Version: rawhide
CC: andrewz, bnocera, cassmodiah, lemenkov, mgarski, opensource, Philippe.Vouters
Target Milestone: ---
Flags: mr.ecik: fedora-review+, j: fedora-cvs+
Target Release: ---
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-06-17 19:30:07 UTC

Description Karol Trzcionka 2006-12-29 22:01:39 UTC
Spec URL: http://mutebox.net/~karlik/tesseract.spec
SRPM URL: http://mutebox.net/~karlik/tesseract-1.02-1.src.rpm
Description: 
A commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

This is my first RPM and I am looking for a sponsor. I want to add this package (and hope to add more in the future) because Extras has no OCR package yet and this program is on the wishlist.

Comment 1 Neal Becker 2006-12-29 22:27:21 UTC
This has been discussed before IIRC.  tesseract might give good results, but 
it is hopeless to compile with 64bit.  IMHO, this indicates very poor code 
quality.  I wouldn't want to volunteer to maintain something like that.

Comment 2 Karol Trzcionka 2006-12-30 19:32:06 UTC
I fixed one rpmlint warning (mixed spaces and tabs).
New SRPM: http://mutebox.net/~karlik/tesseract-1.02-2.src.rpm
The new spec is at the same URL as the old one.


Comment 3 Karol Trzcionka 2006-12-31 17:52:43 UTC
I had to change the URLs; here are the correct ones:
SPEC: http://karlik.nonlogic.org/tesseract/tesseract.spec
SRPM: http://karlik.nonlogic.org/tesseract/tesseract-1.02-2.src.rpm

Comment 4 Mamoru TASAKA 2007-01-06 13:01:53 UTC
Removing NEEDSPONSOR (bug 221349)

Comment 5 Till Maas 2007-01-26 20:24:40 UTC
from README:

"Tesseract can also make use of the libtiff library. (www.libtiff.org)
Without libtiff, Tesseract can only read uncompressed and G3 compressed
TIFF files."

You seem not to use libtiff, is this intentional?

Comment 6 Karol Trzcionka 2007-01-27 14:52:49 UTC
No, it was not intentional.
New release:
ChangeLog:
- Update BRs
- Fix x86_64 compile

The x86_64-fix patch was not written by me; I found it in the Debian package.

New URLs:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.02-3.src.rpm

Comment 8 Till Maas 2007-03-07 21:02:51 UTC
(In reply to comment #6)
> No, it was not intentional.

Eh, from your specfile 1.03-1:
%configure --without-libtiff

You configure it not to use libtiff.

Comment 9 Karol Trzcionka 2007-03-20 20:58:32 UTC
It was not intentional in the 1.02 package, but v1.03 does not work correctly
with the libtiff library.

Comment 10 Karol Trzcionka 2007-03-22 20:41:53 UTC
Fixed sources are available in CVS, so I have prepared a patch based on them.
libtiff is now enabled again.
New URLs:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.03-2.src.rpm

Comment 11 Peter Lemenkov 2007-04-19 12:38:51 UTC
Please take a look at this link:

http://sourceforge.net/forum/forum.php?forum_id=672344

They say that the address of Tesseract OCR has changed to

http://code.google.com/p/tesseract-ocr/



Comment 13 Michał Bentkowski 2007-06-09 18:54:55 UTC
REVIEW:
 * dist tag present
 * valid Apache license
 * rpmlint:
W: tesseract-devel no-documentation
 * good md5 (c39bd7b465c37a3863140e88d51cd839)
 * the newest version packaged
 * package owns all directories well
 * proper %clean section
 * proper buildroot
 * nothing wrong with %files
 * no scriptlets needed
 * devel subpackage present and looks good
 * no .la files
 * no GUI

I wasn't able to test it in mock, but I hope it's working.
Approved.
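
The "good md5" item in the review above means that the source tarball shipped in the SRPM hashes to the same value as the upstream tarball. A minimal sketch of that comparison in Python, assuming both files have already been downloaded locally; the function names `md5_of_file` and `sources_match` and the file paths are hypothetical and not part of any Fedora tool:

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sources_match(upstream_path, srpm_source_path):
    """True if the upstream tarball and the SRPM's tarball have equal digests."""
    return md5_of_file(upstream_path) == md5_of_file(srpm_source_path)
```

The reviewer would then compare the result against the value recorded in the review (here c39bd7b465c37a3863140e88d51cd839).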

Comment 14 Jason Tibbitts 2007-06-09 19:01:50 UTC
It does build fine in mock.  The only rpmlint warning I saw from that besides
the one you noted is
  W: tesseract mixed-use-of-spaces-and-tabs (spaces: line 16, tab: line 1)
which is not really an issue.
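
For illustration, the kind of inconsistency behind a mixed-use-of-spaces-and-tabs warning can be found with a few lines of Python. This is a simplified sketch, not rpmlint's actual implementation: it assumes the check only looks at the whitespace between a spec tag (such as `Name:`) and its value, and the name `find_mixed_separators` is made up for this example.

```python
import re

# Matches a spec tag such as "Name:" followed by the whitespace before its value.
TAG_SEPARATOR = re.compile(r"^[A-Za-z][\w()]*:([ \t]+)\S")


def find_mixed_separators(spec_text):
    """Return the first line using spaces and the first using a tab, or None."""
    first_space = None
    first_tab = None
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        match = TAG_SEPARATOR.match(line)
        if match is None:
            continue  # not a "Tag: value" line
        separator = match.group(1)
        if "\t" in separator:
            if first_tab is None:
                first_tab = lineno
        elif first_space is None:
            first_space = lineno
    if first_space is not None and first_tab is not None:
        return {"spaces": first_space, "tab": first_tab}
    return None
```

A spec that uses only tabs, or only spaces, after its tags makes the function return None.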


Comment 15 Karol Trzcionka 2007-06-10 12:33:15 UTC
New Package CVS Request
=======================
Package Name: tesseract
Short Description: Raw OCR Engine
Owners: karlikt
Branches: F-7
InitialCC: 

Comment 16 Kevin Fenzi 2007-06-11 03:33:43 UTC
cvs done. 

Comment 17 Karol Trzcionka 2007-06-17 19:30:07 UTC
Built for f-7 and devel. Closed.

Comment 18 Simon 2011-01-12 11:57:23 UTC
Karol, can you please provide tesseract for EPEL-6?
If you would rather not maintain tesseract on EPEL-6, I can do it. Just add my FAS name (cassmodiah) to the request for the new branch.
It's currently a missing dep for tucan.

Comment 19 Karol Trzcionka 2011-01-19 20:21:01 UTC
Package Change Request
======================
Package Name: tesseract
New Branches: el6
Owners: karlik cassmodiah

Comment 20 Jason Tibbitts 2011-01-19 21:56:30 UTC
Git done (by process-git-requests).

Comment 21 Philippe Vouters 2012-12-18 19:48:15 UTC
Today I tested a music OCR program named Audiveris (http://audiveris.kenai.com/). Its download page offers an RPM package as well as the sources.

Audiveris, which is written in Java, requires Tesseract to handle text in images. I am currently running Fedora 17 i686 and could yum install tesseract from the Fedora repository. However, when I query rpm for the installed package, it reports tesseract-3.00-2.fc15.i686: fc15, not fc17. This Tesseract package also lacks libtesseract.so.3, a shared library that Audiveris requires for its JNI.
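
Whether the dynamic loader can resolve a shared library such as libtesseract.so.3 can be checked by simply trying to load it. A minimal sketch using Python's standard `ctypes` module; the helper name `shared_library_available` is hypothetical, and the result depends on the loader search path of the machine it runs on:

```python
import ctypes


def shared_library_available(soname):
    """Return True if the dynamic loader can load the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library cannot be found or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    print(shared_library_available("libtesseract.so.3"))
```

A copy in /usr/local/lib is only found this way if that directory is on the loader's search path.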

My first question: why has this RPM package not been upgraded?

Second, I have worked on the Tesseract source code and still have a libtesseract.so.3 in my /usr/local/lib directory.

My second question: why does Fedora not offer a more up-to-date version of this software?

My third question: what would Red Hat need for its RHEL/CentOS 6/Fedora distributions so that I can offer up-to-date tesseract code, provided tesseract passes all my tests? My tests would also include tests using Audiveris.

Please note that I classified Tesseract as highly unstable the last time I worked on it: the tesseract developers were changing the code and committing their changes without any regression tests. So I stabilized my own tesseract version. Everyone in the tesseract project appeared totally unresponsive to any mail from outside. All this past work is documented at http://vouters.dyndns.org/tima/Linux-Java-Tesseract-ocr-Porting_Tesseract_from_android_Graphics_to_Sun_Graphics.html