Bug 916562 - 'createrepo -v' deadlocks on large repositories
Summary: 'createrepo -v' deadlocks on large repositories
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: createrepo
Version: 17
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-02-28 10:31 UTC by Anders Blomdell
Modified: 2013-08-01 18:47 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-08-01 18:47:18 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Nonblocking read of worker output (2.15 KB, patch)
2013-02-28 10:31 UTC, Anders Blomdell
no flags Details | Diff
Threaded version of patch (3.43 KB, patch)
2013-02-28 14:15 UTC, Anders Blomdell
no flags Details | Diff
Threaded version of patch (without debug printout) (3.37 KB, patch)
2013-02-28 14:58 UTC, Anders Blomdell
no flags Details | Diff
Small exaple on how select and readline can still deadlock (942 bytes, text/x-python)
2013-02-28 16:40 UTC, Anders Blomdell
no flags Details
Better example of why mixing select and readline is bad (2.53 KB, text/plain)
2013-03-01 13:50 UTC, Anders Blomdell
no flags Details
Select based low-level polling (3.36 KB, patch)
2013-03-01 15:25 UTC, Anders Blomdell
no flags Details | Diff

Description Anders Blomdell 2013-02-28 10:31:07 UTC
Created attachment 703876 [details]
Nonblocking read of worker output

Description of problem:

'createrepo -v' deadlocks on large repositories (fedora-release i a good candidate) due to fixed order of reading from workers stdout and stderr

Version-Release number of selected component (if applicable):

createrepo-0.9.9-11.fc17.noarch

How reproducible:


Steps to Reproduce:
1. createrepo -v on fedora-release repo
 
Actual results:

program never terminates 

Expected results:

A working repo

Additional info:

Attached patch makes worker output nonblocking, adding reader threads is another possibility.

See also:

https://bugzilla.redhat.com/show_bug.cgi?id=734510
https://bugzilla.redhat.com/show_bug.cgi?id=856363

Comment 1 Anders Blomdell 2013-02-28 14:15:05 UTC
Created attachment 703907 [details]
Threaded version of patch

Since readline did not handle non-blocking reads gracefulle (truncated lines), I suggest this patch instead.

Comment 2 Anders Blomdell 2013-02-28 14:58:27 UTC
Created attachment 703910 [details]
Threaded version of patch (without debug printout)

Patch also contains a few sys.stdout.flush() calls to make feedback more continuous (no maing for pipes to fill up).

Comment 3 Zdeněk Pavlas 2013-02-28 15:13:56 UTC
Hi, thanks for the patch, but I've already backported upstream commit
71cd4c45a84366ae66eb1b6e619f1b3987249c2d (select based) that fixes this issue.

Comment 4 Anders Blomdell 2013-02-28 16:40:14 UTC
Created attachment 703925 [details]
Small exaple on how select and readline can still deadlock

Ok, I'll wait for it to turn up in the fedora-17 updates repo then.

Unfortunately the solution in the patch can still deadlock since readline will wait until there is a newline read from the pipe, this attachment shows how this can be forced to happen

The stdout.flush() calls in my patch would probably make this a non-issu and makes the feedback more continuous as well.

Comment 5 Zdeněk Pavlas 2013-03-01 10:59:49 UTC
> can still deadlock since readline will wait until there is a newline read from the pipe

Yes.. but your producer prints 64k+ lines...  This does not seem to be the case here:

$ grep print worker.py
            print >> sys.stderr, "File not found: %s" % pkgpath
                print "reading %s" % (pkgfile)
            print >> sys.stderr, "Error: %s" % e

Turning off buffering on stdout and stderr in worker.py should fix that for good, much easier and simpler than using threads, IMO.

Comment 6 Anders Blomdell 2013-03-01 13:50:42 UTC
Created attachment 704161 [details]
Better example of why mixing select and readline is bad

This small demo is (hopefully) a better demonstration of why mixing select and
readline is not such a great idea (no deadlocks, but long delays and non-obvious order of messages).

Comment 7 Zdeněk Pavlas 2013-03-01 14:10:24 UTC
>        print>>f1,'Lock consumer on read 1',

In worker.py, all output is newline-terminated.  When stdout buffering is turned off (it isn't and that's a bug, I must admit), readline() can't block, right?

Comment 8 Anders Blomdell 2013-03-01 14:33:09 UTC
Well, consider the case when the process with the select loop gets swapped out for some time, then the incoming pipes will all be filled, but only the first line in each pipe will be printed, and then each time something is written to the pipe (i.e. its filedescriptor is returned by select), the next line will be printed, but the pipe will not be emptied.

My feeling is that doing it right in the consumer is the way to go.

Should I prepare a patch based on select/os.read, as per the example?

Comment 9 Anders Blomdell 2013-03-01 15:25:01 UTC
Created attachment 704222 [details]
Select based low-level polling

OK, here comes a select based patch with flush in the worker threads as well (easier to understand than non-blocking IMHO).

Comment 10 Zdeněk Pavlas 2013-03-04 15:04:11 UTC
Yes, it's possible that pipe merges multiple lines, and readline() returns only the first one.  But the remaining data are not lost, but buffered in the reader.  Writer does not block and produces more data (or exits and closes the write end).  Both conditions trigger select(), and since now readline() does not issue low-level read(), we loop until the buffer is empty again.  I don't think the small latency justifies reimplementing readline().

Comment 11 Anders Blomdell 2013-03-04 16:02:10 UTC
OK, I would rather be safe than sorry, and reimplementing readline works better when the producer is buffered (i.e. not having 64 Kbytes pending in all input pipes [each write of 4096 bytes will result in one line read, quickly filling up the input pipe]). 

But the risk for deadlock is small (probably nonexistent) when producer is unbuffered/doing flush. 

I guess that my influence on this is NIL :-)

Comment 12 Fedora End Of Life 2013-07-04 07:03:42 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like  to see this bug fixed and are able to reproduce it 
against a later version  of Fedora, you are encouraged  change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2013-08-01 18:47:23 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.