Spec Name or Url: http://www.cora.nwra.com/~orion/fedora/gocr.spec
SRPM Name or Url: http://www.cora.nwra.com/~orion/fedora/gocr-0.41-1.fc6.src.rpm
Description: GOCR is an OCR (Optical Character Recognition) program, developed under the GNU Public License. It converts scanned images of text back to text files. Joerg Schulenburg started the program, and now leads a team of developers. GOCR can be used with different front-ends, which makes it very easy to port to different OSes and architectures. It can open many different image formats, and its quality has been improving on a daily basis.
In files, the de file could be marked as:
%lang(de) %doc READMEde.txt

There is a gtk frontend, maybe it could be shipped in a sub-package?

There is a missing dependency on wish. I also think that maybe it could make sense to have gocr-tcl for gocr.tcl, because of that requires?

There are many Requires missing. At least (in pnm.c),
gzip, bzip2, transfig, netpbm-progs, libjpeg
Maybe upstream could use convert...
(In reply to comment #1)
> In files, the de file could be marked as:
> %lang(de) %doc READMEde.txt

Done.

> There is a gtk frontend, maybe it could be shipped in a sub-package?

Done.

> There is a missing dependency on wish. I also think that maybe it
> could make sense to have gocr-tcl for gocr.tcl, because of that
> requires?

Done.

> There are many Requires missing. At least (in pnm.c),
> gzip, bzip2, transfig, netpbm-progs, libjpeg
> Maybe upstream could use convert...

I'm wondering whether to make these hard Requires or not. Obviously you need some to get extra functionality, but only for the image types you need to process. Perhaps just a note in the description?

The problem at the moment with using convert is that you would need a command like: convert <file> pnm:-, but the code expects to append the filename to the end of the command.

The more I look at the code, the less I like it, but I suppose it's developing and may be useful.

Just need the new spec file:
http://www.cora.nwra.com/~orion/fedora/gocr.spec
http://www.cora.nwra.com/~orion/fedora/gocr-0.41-2.fc6.src.rpm
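One possible way around the "filename is appended to the end of the command" limitation is a small wrapper that accepts the file name as its last argument and moves it in front of the output spec. This is only a sketch: the function name any2pnm and the CONVERT override are hypothetical and are not part of gocr or ImageMagick.

```shell
# Hypothetical wrapper: the caller appends the file name as the only
# argument, and the wrapper places it before the "pnm:-" output spec.
# CONVERT is an illustrative override; it defaults to ImageMagick's convert.
any2pnm() {
    if [ "$#" -ne 1 ]; then
        echo "usage: any2pnm <file>" >&2
        return 2
    fi
    "${CONVERT:-convert}" "$1" pnm:-
}
```

With such a wrapper installed as a script, a command table that appends the file name would end up running convert <file> pnm:- as intended. Whether upstream would accept this approach is an open question.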
(In reply to comment #2)
> > There are many Requires missing. At least (in pnm.c),
> > gzip, bzip2, transfig, netpbm-progs, libjpeg
>
> I'm wondering whether to make these hard Requires or not. Obviously you need
> some to get extra functionality, but only for the image types you need to
> process. Perhaps just a note in the description?

It depends how it fails. But given what those deps are, except maybe for transfig, I can't see why they couldn't be hard requires. png, jpeg, gif and eps support seems to be a must to me.

I tested a bit, but I get only segfaults on non-pnm files (tried png and eps):

$ gocr ex.pcx
Special chars: àá__åæç À Å Æ ß &$Xgo ØØ44t>¢µ
Special chars= àáâãäåæç À Å Æ ß &$XO_o øØ44 _>_µ
Special chars : àáâăäåæç À Å _ _ G_#9o 0Ø44>>tµ
$ convert ex.pcx ex.png
$ gocr ex.png
pngtopnm: warning - non-square pixels; to fix do a 'pamscale -yscale 4.28479'
Erreur de segmentation
$ gdb --args gocr ex.png
....
(gdb) run
Starting program: /usr/bin/gocr ex.png
pngtopnm: warning - non-square pixels; to fix do a 'pamscale -yscale 4.28479'
Program received signal SIGSEGV, Segmentation fault.
0x00aebf18 in pnm_readpaminit () from /usr/lib/libnetpbm.so.10
(gdb) bt
#0 0x00aebf18 in pnm_readpaminit () from /usr/lib/libnetpbm.so.10
#1 0x080a0560 in readpgm (name=0xbfbc6a06 "ex.png", p=0xbfbbc2cc, vvv=0) at pnm.c:149
#2 0x080493b9 in main (argn=2, argv=0xbfbc5464) at gocr.c:272
#3 0x0082ce5c in __libc_start_main () from /lib/libc.so.6
#4 0x08048e01 in _start ()

With gzipped or bzip2ed files, things are not better:

$ gzip ex.pcx
$ gocr ex.pcx.gz
ERROR pcx.c L28: no ZSoft sign

Another issue is that in gocr.tcl, the show button seems to invoke a program which isn't installed. There is an error:
couldn't execute "xli": no such file or directory
Similarly with spell:
couldn't execute "tkispell": no such file or directory
And scan starts xsane, so there is a missing dependency.
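The manual testing above could be repeated over several sample files with a small smoke test. This is a sketch under assumptions: the function names and the GOCR override are hypothetical, and it relies on the common shell convention that a process killed by signal N is reported as exit status 128+N (so a segfault, signal 11 on Linux, shows up as 139).

```shell
# Classify an exit status; shells such as bash and dash report
# "killed by signal N" as 128+N.
classify_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif [ "$rc" -gt 128 ]; then
        echo "killed by signal $((rc - 128))"
    else
        echo "failed with status $rc"
    fi
}

# Run the OCR command on each file and report one line per file.
# GOCR is an illustrative override; it defaults to gocr.
smoke_test() {
    for f in "$@"; do
        rc=0
        "${GOCR:-gocr}" "$f" >/dev/null 2>&1 || rc=$?
        echo "$f: $(classify_status "$rc")"
    done
}
```

For example, smoke_test ex.pcx ex.png ex.eps would print one result line per input file.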
Added Requires for those things we ship. We don't ship xli or tkispell though. So, change to equivalent apps we do ship or forget about them? Worry about the segfault, or just report upstream?
Looking at tkispell on the web, it doesn't seem to be maintained, and it is not obvious where upstream is. Looking at the gocr.tcl code, spellchecking involves putting a file named out01.txt in the current directory; this file is not cleaned up, and it is the same file the output text is saved to in the default case... My opinion would be to disable this functionality. I did it simply by commenting out
pack .abar.spell -side left

I spotted another issue: the config file is read from and written to the current directory, and not $HOME! This is bad... Maybe we shouldn't ship gocr.tcl? It hasn't really been changed in 4 years.

Testing gtk-ocr a bit, I found at least 2 bugs (a crash, and at another point the files appeared but I couldn't convert them). It is saner with regard to the handling of the config file; however, the converted text is saved in a file with the same name as the input file with .txt appended, without any possibility to override this, nor any explanation of where the converted file is saved... The default image viewer here is display from ImageMagick. Looking at the cvs, it seems that it hasn't been changed in 6 years.

My personal opinion is that those 2 frontends are too buggy and unmaintained to be shipped.

Now regarding the segfault, I think it is problematic, since it seems to me that support for widely used image formats (png, eps, jpeg) should be working in a shipped package. For devel it is not problematic, but for FC-6 and below I think this should be a must. Not supporting compressed images is not an issue in my opinion.
Is this here just for FuzzyOcr? If so, the developers recommend gocr-0.40 now according to: http://fuzzyocr.own-hero.net/wiki/Installation-3.x "preferably version 0.40 (some people reported bad recognition with 0.41)"
There is a new version available, maybe it fixes some of the issues?
Maybe it does, but relating to use with FuzzyOcr, I read this comment on the FuzzyOcr mailing list: (it is a private archive) http://lists.own-hero.net/mailman/private/devel-spam/2006-December/001091.html Someone wrote: "I can confirm that - on large images, scanning times can go through the roof (over 30 secs on a pic i had... gocr0.40 needed 1 sec, 0.41 8 secs and 0.42 35 secs) And I already found 3 of 10 images which crash gocr 0.42 with Error in ocr0.c L208: idx out of range" Granted, it's just one person's comment, but it seems gocr is heading in a different direction than what is good for scanning possible spam.
Good news, the FuzzyOcr developers are recommending gocr 0.43 now.
Just need the new spec file:
http://www.cora.nwra.com/~orion/fedora/gocr.spec
http://www.cora.nwra.com/~orion/fedora/gocr-0.43-1.fc6.src.rpm

This disables the front-ends as they seem unmaintained. No segfaults. gzip/bzip2 only supported for:

src/pnm.c: ".pnm.gz", "gzip -cd", /* compressed pnm-files, gzip package */
src/pnm.c: ".pbm.gz", "gzip -cd",
src/pnm.c: ".pgm.gz", "gzip -cd",
src/pnm.c: ".ppm.gz", "gzip -cd",

But this is in the source and I'm not really interested in adding features.
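Since only the compressed pnm suffixes are handled, a user with other compressed images (such as the ex.pcx.gz case from the earlier test) would have to decompress them first. A minimal sketch of that workaround, outside of gocr itself; the function names are hypothetical:

```shell
# Print the decompression command for a file name, or nothing if the
# name has no known compression suffix.
decompress_cmd() {
    case "$1" in
        *.gz)  echo "gzip -cd" ;;
        *.bz2) echo "bzip2 -cd" ;;
    esac
}

# Decompress "$1" into directory "$2", keeping the inner extension
# (ex.pcx.gz -> ex.pcx), and print the path of the file to hand to gocr.
# Uncompressed inputs are passed through unchanged.
unpack_for_ocr() {
    cmd=$(decompress_cmd "$1")
    if [ -z "$cmd" ]; then
        echo "$1"
        return 0
    fi
    base=$(basename "$1")
    out="$2/${base%.*}"
    # $cmd is deliberately unquoted so "gzip -cd" splits into command + flag.
    $cmd "$1" > "$out" && echo "$out"
}
```

A call such as gocr "$(unpack_for_ocr ex.pcx.gz /tmp)" would then run gocr on the decompressed copy.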
* rpmlint says:
E: gocr explicit-lib-dependency libjpeg
I posted a comment above asking for that Requires; I guess there is an executable from libjpeg used for conversion.

* follow guidelines

X License is GPL, not included. You should ask upstream to include the license file, otherwise he may not be able to defend his license.

Some files have an author but no license. This should be investigated and certainly corrected upstream. Except for otsu.c, the author is the upstream author, so there shouldn't be much trouble. Maybe the upstream author thinks that no license means public domain, but that is not the case; he should either remove the author notice or explicitly license it in the public domain.

otsu.c has no license but an author (in fact 2, as shown by looking at the comments):
  the following code was send by Ryan Dibble <dibbler>
pnm.c has no license but an author:
  /* (c) Joerg Schulenburg 2000-2006
pcx.c and tga.c have no license but an author:
  // Joerg Schulenburg Mai99
  // Joerg Schulenburg Mai99

* builds and runs fine
* right Requires and BuildRequires. Maybe a comment explaining the need for the Requires could be in order.
* %files section right
* sane provides
* match upstream
f989fe8e24f82d19c8ce55df15784e15 gocr-0.43.tar.gz

The only remaining blocker is the license issue. A statement from upstream and a promise to fix things for the next release would be enough for me.
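The "match upstream" item amounts to comparing the MD5 digest of the tarball in the SRPM with the one downloaded from upstream. A minimal sketch of that check, assuming md5sum from coreutils is available; the helper name verify_md5 is hypothetical:

```shell
# Compare a file's MD5 digest against an expected value.
# Returns 0 on match, non-zero otherwise.
verify_md5() {
    expected=$1
    file=$2
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

For this review the call would be verify_md5 f989fe8e24f82d19c8ce55df15784e15 gocr-0.43.tar.gz, run once on the tarball from the SRPM and once on a fresh upstream download.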
(In reply to comment #11)
> X License is GPL, not included. You should ask upstream to include the
> license file, otherwise he may not be able to defend his license.

Ooops, sorry it is included. The only issue is with files with author and no license.
(In reply to comment #12)
> Ooops, sorry it is included. The only issue is with files with author
> and no license.

0.44 has been released with the licenses added.

http://www.cora.nwra.com/~orion/fedora/gocr.spec
http://www.cora.nwra.com/~orion/fedora/gocr-0.44-1.fc6.src.rpm
The way the license has been added to otsu.c is a bit dubious: it looks like Ryan Dibble's copyright was taken away, since there is no evidence that he transferred it. Anyway, this was the only blocker, so it is APPROVED.
Need initial import and FC-5/6 branches.
Set, but please use the template in the future as described on CVSAdminProcedure.
This was approved over two weeks ago and still is not imported. Is it a dead package?
Just got busy. Checked in and built.
If there is no CVS request then please do not change the fedora-cvs flag.