Description of problem:
Every so often, with scriptlets that return quickly (i.e. the path of execution in the script is short), rpm will deadlock waiting for the script to return. I have been seeing this with my rpm test harness for about a year now, and have recently reproduced it on RHEL 3.

Version-Release number of selected component (if applicable):
4.2.3-10

How reproducible:
Very rare, but it is reproducible if you wait long enough (no pun intended).

Steps to Reproduce:
1. Create two small spec files that contain %pre, %post, %preun and %postun scriptlets which create a semaphore file and then exit 0. The first should be named something like x-1-1 and the second x-1-2.
2. Build the packages from the spec files.
3. Install the first, upgrade to the second, erase them, and start all over in a loop:

while :
do
    rpm -Uvh x-1-1.noarch.rpm
    rpm -Uvh x-1-2.noarch.rpm
    rpm -e x
done

Actual results:
Eventually, after perhaps thousands of passes, it will lock up. When it does, strace will reveal it is hanging on a futex, and gdb will show it is running pthread_cond_wait().

Expected results:
It should never hang.

Additional info:
I spent a lot of time trying to figure out what was going on (probably more than I should have). The first thing I found was that rpm was using pthread_cond_signal() in its signal handler and pthread_cond_wait() in rpmsqWaitUnregister() without proper locking between the signaler and the waiter (it was using the condition's mutex with the wait, but not with the signaler). I initially created a patch to fix this, only to find that the signal handler (and thus the caller of pthread_cond_signal()) was in the same thread of execution, so my initial patch actually created a far easier-to-hit deadlock. Once I realized that pthread_cond_signal() and pthread_cond_wait() were in the same thread of execution, I arrived at what I think is a better solution.
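For reference, a minimal spec file along the lines of step 1 might look like the following (a sketch; the package name, semaphore file paths, and summary are invented for illustration — build a second copy with Release: 2 for the upgrade):

```spec
Name: x
Version: 1
Release: 1
Summary: rpm scriptlet deadlock reproducer
License: GPL
BuildArch: noarch

%description
Scriptlet-only package used to reproduce the rpm scriptlet deadlock.

%pre
touch /var/tmp/x.pre; exit 0

%post
touch /var/tmp/x.post; exit 0

%preun
touch /var/tmp/x.preun; exit 0

%postun
touch /var/tmp/x.postun; exit 0

%files
```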
Ultimately, you can't use the pthread_cond machinery within a single thread of execution (a quick read of the man page will show this), so instead I thought through a way of using a single mutex to accomplish the synchronization the pthread_conds were being used to provide. It's pretty simple:

- Before the fork: create and lock the rpmsq object's mutex.
- In the signal handler: unlock the mutex.
- In rpmsqWaitUnregister(): try to lock the mutex. Since it starts off locked, rpmsqWaitUnregister() will not be able to move forward (i.e. will block) until the signal handler unlocks the mutex.

Also, I would understand if this were not fixed in RHEL 3; I am merely logging this as a bug with a possible patch. What you guys do with it is up to you (I would do nothing until it has been tested out in Fedora land, if I were you).
Created attachment 110381 [details] Patch to fix the problem in rpmsq.c
*** Bug 117620 has been marked as a duplicate of this bug. ***
Fixed (by applying patch) in rpm-4.4.3-0.20.
(In reply to comment #3)
> Fixed (by applying patch) in rpm-4.4.3-0.20.

RPM 4.4.3-0.20 doesn't seem to be out yet - is there a typo here? Also, will the patch be applied to RHEL 3 U7? Thanks!
rpm-4.4.5 was released several weeks ago. Ask RHEL support what is in U7.
There clearly was a bug here, which was fixed both upstream and in the 3 current RHEL releases, so the NOTABUG resolution is incorrect here. This bug # appears to cover the upstream patch, so should seemingly be resolved "UPSTREAM"?

According to the SRPM, this is fixed in RHEL 3u8 (apparently as part of Bug 185322, which is not visible externally).
According to the SRPM, this is fixed in RHEL 4u4 (apparently as part of Bug 185324, which is not visible externally).
According to the SRPM, this is fixed in RHEL 5u0 (which only references this bug number as the place for the fix).