Bug 1068910 - tesseract is missing OSD training data.
Summary: tesseract is missing OSD training data.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: tesseract
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Karol Trzcionka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-23 05:07 UTC by Elliott Sales de Andrade
Modified: 2014-05-08 10:09 UTC

Fixed In Version: tesseract-3.02.02-3.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-05-08 10:09:51 UTC
Type: Bug
Embargoed:



Description Elliott Sales de Andrade 2014-02-23 05:07:52 UTC
Description of problem:
Running tesseract with "Orientation and script detection" enabled does not work, because a file is missing. It does not appear to be packaged in any of the tesseract* packages.

Version:
tesseract-3.02.02-2.fc20.x86_64

Results for "Orientation and script detection (OSD) only.":

$ tesseract <input> <output> -psm 0
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error opening data file /usr/share/tesseract/tessdata/osd.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load
set_count == gridheight():Error:Assert failed:in file colfind.cpp, line 648
Segmentation fault

Results for "Automatic page segmentation with OSD.":

$ tesseract <input> <output> -psm 1
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error opening data file /usr/share/tesseract/tessdata/osd.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load

Trying to install that file:

$ sudo yum install /usr/share/tesseract/tessdata/osd.traineddata
Loaded plugins: auto-update-debuginfo, langpacks, refresh-packagekit    
No package /usr/share/tesseract/tessdata/osd.traineddata available.
Error: Nothing to do

Comment 1 Fedora Update System 2014-03-28 10:47:46 UTC
tesseract-3.02.02-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/tesseract-3.02.02-3.fc20

Comment 2 Fedora Update System 2014-03-30 06:10:48 UTC
Package tesseract-3.02.02-3.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing tesseract-3.02.02-3.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-4504/tesseract-3.02.02-3.fc20
then log in and leave karma (feedback).

Comment 3 Richard Marko 2014-04-29 14:50:38 UTC
It still doesn't seem to contain the required file:

$ rpm -q tesseract
tesseract-3.02.02-3.fc20.x86_64
$ rpm -ql tesseract | grep osd | wc -l
0

Comment 4 Karol Trzcionka 2014-04-29 17:46:49 UTC
It is in the -osd subpackage ;)
su -c 'yum install tesseract-osd'
The default description from the Fedora Update System is misleading.
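
A quick way to confirm that the OSD data file is actually in place after installing the subpackage might look like the sketch below. The default directory is taken from the error message in the description; the check_osd_data helper name is made up for illustration and is not part of tesseract or its packaging.

```shell
#!/bin/sh
# check_osd_data DIR: succeed only if DIR contains osd.traineddata.
# (Hypothetical helper, named here for illustration only.)
check_osd_data() {
    [ -f "$1/osd.traineddata" ]
}

# Default directory comes from the error message in the description;
# pass a different tessdata directory as the first argument to override it.
tessdata_dir="${1:-/usr/share/tesseract/tessdata}"

if check_osd_data "$tessdata_dir"; then
    echo "OSD data found in $tessdata_dir"
else
    echo "OSD data missing from $tessdata_dir; -psm 0 and -psm 1 will fail" >&2
fi
```

If the file is still reported missing after installing tesseract-osd, the data may live in a different tessdata directory on that system, in which case passing that directory as the argument is the thing to try.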

Comment 5 Fedora Update System 2014-05-08 10:09:51 UTC
tesseract-3.02.02-3.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

