Spec URL: http://mutebox.net/~karlik/tesseract.spec
SRPM URL: http://mutebox.net/~karlik/tesseract-1.02-1.src.rpm
Description: A commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.
This is my first RPM and I am looking for a sponsor. I want to add this package (and hope to add more in the future) because Extras has no OCR software and this program is on the wishlist.
This has been discussed before, IIRC. tesseract might give good results, but it is hopeless to compile on 64-bit. IMHO, this indicates very poor code quality. I wouldn't want to volunteer to maintain something like that.
I fixed one rpmlint warning (mixed spaces and tabs).
New SRPM: http://mutebox.net/~karlik/tesseract-1.02-2.src.rpm
The spec file is at the same URL as before.
I had to change the URLs; here are the correct ones:
SPEC: http://karlik.nonlogic.org/tesseract/tesseract.spec
SRPM: http://karlik.nonlogic.org/tesseract/tesseract-1.02-2.src.rpm
Removing NEEDSPONSOR (bug 221349)
from README: "Tesseract can also make use of the libtiff library. (www.libtiff.org) Without libtiff, Tesseract can only read uncompressed and G3 compressed TIFF files." You seem not to use libtiff, is this intentional?
No, it was not intentional. New release:
ChangeLog:
- Update BRs
- Fix x86_64 compile
The x86_64-fix patch was not written by me; I found it in the Debian package.
New URLs:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.02-3.src.rpm
Update to v1.03: http://karlik.nonlogic.org/tesseract/tesseract.spec http://karlik.nonlogic.org/tesseract/tesseract-1.03-1.src.rpm
(In reply to comment #6)
> No, it was not intentional.
Eh, from your specfile 1.03-1:
%configure --without-libtiff
You configure it not to use libtiff.
It was not intentional in the 1.02 package, but v1.03 does not work correctly with the libtiff library.
The sources are fixed in CVS, so I have prepared a patch based on them. libtiff is now enabled again.
New URLs:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.03-2.src.rpm
Please take a look at this link: http://sourceforge.net/forum/forum.php?forum_id=672344 They said that the address of Tesseract OCR has changed to http://code.google.com/p/tesseract-ocr/
New URLs: http://karlik.nonlogic.org/tesseract/tesseract.spec http://karlik.nonlogic.org/tesseract/tesseract-1.04-1.src.rpm
REVIEW:
* dist tag present
* valid Apache license
* rpmlint: W: tesseract-devel no-documentation
* good md5 (c39bd7b465c37a3863140e88d51cd839)
* the newest version packaged
* package owns all directories well
* proper %clean section
* proper buildroot
* nothing wrong with %files
* no scriptlets needed
* devel subpackage present and looks good
* no .la files
* no GUI
I wasn't able to test it in mock, but I hope it's working.
Approved.
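The "good md5" item in the review above means the tarball inside the SRPM matches the upstream release. A minimal Python sketch of such a comparison follows; the function name and any file path passed to it are hypothetical, and only the checksum value itself comes from the review.

```python
import hashlib


def md5_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the MD5 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large tarball does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

It could be called as md5_matches("tesseract-1.04.tar.gz", "c39bd7b465c37a3863140e88d51cd839"), where the tarball file name is an assumption.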
It does build fine in mock. The only rpmlint warning I saw from that besides the one you noted is W: tesseract mixed-use-of-spaces-and-tabs (spaces: line 16, tab: line 1) which is not really an issue.
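To locate the lines behind a mixed-use-of-spaces-and-tabs warning like the one above, something along these lines could help. This is a simplified approximation and not rpmlint's actual heuristic: it assumes that any tab, or any run of two or more spaces, counts as alignment whitespace, and the function name is hypothetical.

```python
import re


def first_space_and_tab_lines(text):
    """Return (first_space_line, first_tab_line), 1-based; None where absent."""
    first_spaces = None
    first_tabs = None
    for number, line in enumerate(text.splitlines(), start=1):
        # Any tab character counts as tab-based alignment (an assumption).
        if first_tabs is None and "\t" in line:
            first_tabs = number
        # A run of two or more spaces counts as space-based alignment (an assumption).
        if first_spaces is None and re.search(r" {2,}", line):
            first_spaces = number
    return first_spaces, first_tabs
```

Passing the contents of the spec file returns the first line of each kind, which is roughly the pair of line numbers rpmlint prints in its message.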
New Package CVS Request
=======================
Package Name: tesseract
Short Description: Raw OCR Engine
Owners: karlikt
Branches: F-7
InitialCC:
cvs done.
Built for f-7 and devel. Closed.
Karol, can you please provide tesseract for EPEL-6? If you would rather not maintain tesseract on EPEL-6, I can do it. Just add my FAS name (cassmodiah) to the request for the new branch. It's currently a missing dep for tucan.
Package Change Request
======================
Package Name: tesseract
New Branches: el6
Owners: karlik cassmodiah
Git done (by process-git-requests).
Today I tested a music OCR program named Audiveris (http://audiveris.kenai.com/). Its download page offers an RPM package as well as the sources. One thing this Java-based music OCR program requires is Tesseract, to handle text in images.
I am currently running Fedora 17 i686 and could yum install tesseract from the Fedora repository. However, when I query rpm for the installed package, it reports tesseract-3.00-2.fc15.i686, so fc15 and not fc17. This Tesseract package is missing a shared library that Audiveris requires for its JNI, named libtesseract.so.3.
My first question: why has this RPM package not been upgraded? As it happens, I have worked on the Tesseract source code and still have a libtesseract.so.3 in my /usr/local/lib directory.
My second question: why does Fedora not offer a more up-to-date version of the software?
My third question: what would Red Hat need, for its RHEL/CentOS 6/Fedora distributions, so that I can offer up-to-date tesseract code, provided it passes all my tests? My tests would also include tests using Audiveris.
Please note that I classified Tesseract as highly unstable the last time I worked on it, because the Tesseract developers were changing the code and committing their changes without any regression tests. So I stabilized my own Tesseract version. Everyone in the Tesseract project appeared completely unresponsive to any mail from outside. All this past work is documented at http://vouters.dyndns.org/tima/Linux-Java-Tesseract-ocr-Porting_Tesseract_from_android_Graphics_to_Sun_Graphics.html
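A quick way to confirm whether libtesseract.so.3 is visible to the dynamic loader is sketched below in Python. The helper name is hypothetical, and the result only reflects the search path of the process running it; the JNI lookup done by Audiveris may use different paths, so treat this as a rough check rather than a diagnosis.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    print("libtesseract.so.3 loadable:", can_load("libtesseract.so.3"))
```

A copy under /usr/local/lib is only found this way if that directory is on the loader's search path (for example through ld.so.conf or LD_LIBRARY_PATH).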