Bug 220979
Summary: | Review Request: tesseract - Raw OCR Engine | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Karol Trzcionka <karlikt> |
Component: | Package Review | Assignee: | Michał Bentkowski <mr.ecik> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Fedora Package Reviews List <fedora-package-review> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | rawhide | CC: | andrewz, bnocera, cassmodiah, lemenkov, mgarski, opensource, Philippe.Vouters |
Target Milestone: | --- | Flags: | mr.ecik: fedora-review+, j: fedora-cvs+ |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-06-17 19:30:07 UTC | Type: | --- |
Description
Karol Trzcionka
2006-12-29 22:01:39 UTC
This has been discussed before, IIRC. tesseract might give good results, but it is hopeless to compile on 64-bit. IMHO, this indicates very poor code quality. I wouldn't want to volunteer to maintain something like that.

I fixed one rpmlint warning (mixed spaces and tabs).
New SRPM: http://mutebox.net/~karlik/tesseract-1.02-2.src.rpm
The new spec file has the same URL as the old one.

I had to change the URLs; here are the correct ones:
SPEC: http://karlik.nonlogic.org/tesseract/tesseract.spec
SRPM: http://karlik.nonlogic.org/tesseract/tesseract-1.02-2.src.rpm

Removing NEEDSPONSOR (bug 221349)

From the README:
"Tesseract can also make use of the libtiff library. (www.libtiff.org) Without libtiff, Tesseract can only read uncompressed and G3 compressed TIFF files."
You seem not to use libtiff; is this intentional?

No, it was not intentional.
New release:
ChangeLog:
- Update BRs
- Fix x86_64 compile
The x86_64-fix patch was not written by me. I found it in the Debian package.
New URLs:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.02-3.src.rpm

Update to v1.03:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.03-1.src.rpm

(In reply to comment #6)
> No, it was not intentional.
Eh, from your specfile 1.03-1:
%configure --without-libtiff
You configure it not to use libtiff.

It was not intentional in the 1.02 package, but v1.03 does not work correctly with the libtiff library. The sources in CVS are fixed, so I have prepared a patch based on them. libtiff is now enabled again.
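The libtiff question above comes down to which options the spec file passes to %configure. A minimal Python sketch of that check follows, assuming the spec text is available as a string. The helper names `configure_flags` and `libtiff_disabled` are hypothetical and not part of rpm or any Fedora tooling, and the sketch does not follow backslash line continuations or expand macros.

```python
def configure_flags(spec_text):
    """Return the options on the first %configure line of a spec file.

    Simplification: only the %configure line itself is inspected;
    continuation lines and macro expansion are ignored.
    """
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("%configure"):
            # Drop the macro name itself, keep the remaining options.
            return stripped.split()[1:]
    return []


def libtiff_disabled(spec_text):
    """True if the spec explicitly configures the build without libtiff."""
    return "--without-libtiff" in configure_flags(spec_text)


# Illustrative spec fragment modelled on the 1.03-1 line quoted above.
example_spec = "%build\n%configure --without-libtiff\nmake\n"
print(libtiff_disabled(example_spec))
```

A reviewer could run something like this against each spec revision to spot when an option such as --without-libtiff appears or disappears between releases.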
New URLs:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.03-2.src.rpm

Please take a look at this link: http://sourceforge.net/forum/forum.php?forum_id=672344
They say that the address of Tesseract OCR has changed to http://code.google.com/p/tesseract-ocr/

New URLs:
http://karlik.nonlogic.org/tesseract/tesseract.spec
http://karlik.nonlogic.org/tesseract/tesseract-1.04-1.src.rpm

REVIEW:
* dist tag present
* valid Apache license
* rpmlint: W: tesseract-devel no-documentation
* good md5 (c39bd7b465c37a3863140e88d51cd839)
* the newest version packaged
* package owns all directories well
* proper %clean section
* proper buildroot
* nothing wrong with %files
* no scriptlets needed
* devel subpackage present and looks good
* no .la files
* no GUI

I wasn't able to test it in mock, but I hope it's working.
Approved.

It does build fine in mock. The only rpmlint warning I saw from that, besides the one you noted, is
W: tesseract mixed-use-of-spaces-and-tabs (spaces: line 16, tab: line 1)
which is not really an issue.

New Package CVS Request
=======================
Package Name: tesseract
Short Description: Raw OCR Engine
Owners: karlikt
Branches: F-7
InitialCC:

cvs done.

Built for f-7 and devel. Closed.

Karol, can you please provide tesseract for EPEL-6? If you don't want to maintain tesseract on EPEL-6, I can do it. Just add my FAS name (cassmodiah) to the request for the new branch. It's currently a missing dependency for tucan.

Package Change Request
======================
Package Name: tesseract
New Branches: el6
Owners: karlik cassmodiah

Git done (by process-git-requests).

Just today I tested a music OCR program named Audiveris (http://audiveris.kenai.com/). Its download page offers an RPM package as well as the sources. This Java-based program requires Tesseract to handle text in images. I am currently running Fedora 17 i686.
I could yum install tesseract from the Fedora repository. However, when I query rpm for the actual package, it says tesseract-3.00-2.fc15.i686. So fc15 and not fc17. This Tesseract package appears to be missing a shared library named libtesseract.so.3, which Audiveris requires for its JNI.

My first question to you: why has this RPM package not been upgraded?

Second, it happens that I have worked on the Tesseract source code and still have a libtesseract.so.3 in my /usr/local/lib directory. My second question to you: why does Fedora not offer a more up-to-date version of the software?

My third question to you: what would Red Hat need for its RHEL/CentOS 6/Fedora distributions so that I can offer up-to-date tesseract code, provided tesseract successfully passes all my tests? My tests would also include tests using Audiveris.

Please do note that I classified Tesseract as highly unstable the last time I worked on it: the tesseract developers were changing the code and committing their changes without any regression tests. So I stabilized my own tesseract version. Nobody in the tesseract project responded to any mail from outside. All this past work is documented at http://vouters.dyndns.org/tima/Linux-Java-Tesseract-ocr-Porting_Tesseract_from_android_Graphics_to_Sun_Graphics.html
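On the fc15 versus fc17 point: in a package string such as tesseract-3.00-2.fc15.i686, the fc15 part is the dist tag at the end of the release field. It typically records which Fedora release the package was built for, so a build can carry an older tag if it was never rebuilt. A minimal Python sketch of splitting such a string follows. It assumes the usual name-version-release.arch layout with no epoch, and `parse_nvra` and `dist_tag` are hypothetical helper names, not rpm APIs.

```python
def parse_nvra(package):
    """Split 'name-version-release.arch' into its four parts.

    Assumes no epoch prefix and that the arch follows the last dot.
    """
    nvr, arch = package.rsplit(".", 1)
    # The name may itself contain hyphens, so split from the right.
    name, version, release = nvr.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


def dist_tag(release):
    """Return a trailing Fedora/EPEL-style dist tag such as 'fc15', or None."""
    _, sep, tail = release.rpartition(".")
    if sep and tail.startswith(("fc", "el")):
        return tail
    return None


parts = parse_nvra("tesseract-3.00-2.fc15.i686")
print(parts["version"], parts["release"], dist_tag(parts["release"]))
```

The version field (3.00 here), not the dist tag, is what shows how far the packaged software lags behind upstream.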