Created attachment 703876 [details] Nonblocking read of worker output Description of problem: 'createrepo -v' deadlocks on large repositories (fedora-release i a good candidate) due to fixed order of reading from workers stdout and stderr Version-Release number of selected component (if applicable): createrepo-0.9.9-11.fc17.noarch How reproducible: Steps to Reproduce: 1. createrepo -v on fedora-release repo Actual results: program never terminates Expected results: A working repo Additional info: Attached patch makes worker output nonblocking, adding reader threads is another possibility. See also: https://bugzilla.redhat.com/show_bug.cgi?id=734510 https://bugzilla.redhat.com/show_bug.cgi?id=856363
Created attachment 703907 [details] Threaded version of patch Since readline did not handle non-blocking reads gracefulle (truncated lines), I suggest this patch instead.
Created attachment 703910 [details] Threaded version of patch (without debug printout) Patch also contains a few sys.stdout.flush() calls to make feedback more continuous (no maing for pipes to fill up).
Hi, thanks for the patch, but I've already backported upstream commit 71cd4c45a84366ae66eb1b6e619f1b3987249c2d (select based) that fixes this issue.
Created attachment 703925 [details] Small exaple on how select and readline can still deadlock Ok, I'll wait for it to turn up in the fedora-17 updates repo then. Unfortunately the solution in the patch can still deadlock since readline will wait until there is a newline read from the pipe, this attachment shows how this can be forced to happen The stdout.flush() calls in my patch would probably make this a non-issu and makes the feedback more continuous as well.
> can still deadlock since readline will wait until there is a newline read from the pipe Yes.. but your producer prints 64k+ lines... This does not seem to be the case here: $ grep print worker.py print >> sys.stderr, "File not found: %s" % pkgpath print "reading %s" % (pkgfile) print >> sys.stderr, "Error: %s" % e Turning off buffering on stdout and stderr in worker.py should fix that for good, much easier and simpler than using threads, IMO.
Created attachment 704161 [details] Better example of why mixing select and readline is bad This small demo is (hopefully) a better demonstration of why mixing select and readline is not such a great idea (no deadlocks, but long delays and non-obvious order of messages).
> print>>f1,'Lock consumer on read 1', In worker.py, all output is newline-terminated. When stdout buffering is turned off (it isn't and that's a bug, I must admit), readline() can't block, right?
Well, consider the case when the process with the select loop gets swapped out for some time, then the incoming pipes will all be filled, but only the first line in each pipe will be printed, and then each time something is written to the pipe (i.e. its filedescriptor is returned by select), the next line will be printed, but the pipe will not be emptied. My feeling is that doing it right in the consumer is the way to go. Should I prepare a patch based on select/os.read, as per the example?
Created attachment 704222 [details] Select based low-level polling OK, here comes a select based patch with flush in the worker threads as well (easier to understand than non-blocking IMHO).
Yes, it's possible that pipe merges multiple lines, and readline() returns only the first one. But the remaining data are not lost, but buffered in the reader. Writer does not block and produces more data (or exits and closes the write end). Both conditions trigger select(), and since now readline() does not issue low-level read(), we loop until the buffer is empty again. I don't think the small latency justifies reimplementing readline().
OK, I would rather be safe than sorry, and reimplementing readline works better when the producer is buffered (i.e. not having 64 Kbytes pending in all input pipes [each write of 4096 bytes will result in one line read, quickly filling up the input pipe]). But the risk for deadlock is small (probably nonexistent) when producer is unbuffered/doing flush. I guess that my influence on this is NIL :-)
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior to Fedora 17's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.