From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 (BDP)

Description of problem:
I was told by jbj to open a new bug for this issue because it was NOT the same as #74726. Originally, I reported the same issue with the default version of rpm that ships with 8.0. I upgraded to the version 4.1-9 test rpm packages (per jbj), but rpm is still hanging. I managed to successfully remove six packages with rpm -e, but then immediately tried to remove two more and it hung again.

If I just kill the process with kill -9, rpm will not function. Once I remove the __db* files, rpm will function again.

(strace follows)
...
open("/var/lib/rpm/Packages", O_RDONLY|O_LARGEFILE) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=10727424, ...}) = 0
brk(0x8260000) = 0x8260000
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 64000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 128000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 256000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 512000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[continues]

-Pat

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Just try removing packages using rpm -e. It seems to happen every 4th or 5th package I try to remove, but note that sometimes it happens sooner (after 1 or 2) and sometimes later (after 10 or 12).

Additional info:
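A note on reading the strace: the select() calls pass no file descriptors, so they act purely as sleeps. The timeouts double from 1000 microseconds up to 512000 and then stay at one second, which looks like a retry-with-backoff loop waiting on something that never becomes available. The report does not establish which lock is involved or whether the loop lives in rpm or in its database library. A minimal Python sketch that reproduces only the delay sequence seen in the trace (the function name and its parameters are made up for illustration):

```python
def backoff_delays_us(start_us=1000, cap_us=1_000_000, count=14):
    """Yield sleep intervals (microseconds) that double until they reach cap_us."""
    delay = start_us
    for _ in range(count):
        yield delay
        # Double the wait each round, but never exceed the cap (one second here).
        delay = min(delay * 2, cap_us)


# The first 14 delays, for comparison with the 14 select() calls in the trace.
delays = list(backoff_delays_us())
```

If a process shows this pattern indefinitely, it is most likely waiting rather than working, which fits the observation that the prompt never returns.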
OK, you appear to have a new problem. However, I have the following questions first: 1) Does "immediately" mean simultaneously? 2) If you "kill -9" a running rpm, you *will* have to do "rm -f /var/lib/rpm/__db*" to fix. Are you terminating rpm through exceptional (e.g. kill -9) intervention frequently?
No. Immediately does not mean simultaneously. I allowed the first command to completely finish removing the first six rpms. Then, after those six were successfully removed, I tried to remove the next two. I used something similar to the following:

# rpm -e pkg1 pkg2 pkg3 pkg4 pkg5 pkg6
[successful]
# rpm -e pkg1 pkg2
[hang - prompt does not return]

Yes, I am having to use kill -9. A lower priority kill does not work. After I kill the process with kill -9, I remove the __db* files and then rpm will function again until it hangs the next time. Then I go through the same process over again. This is happening quite often. I've been able to reproduce the issue on three completely separate installations of RH 8.0, all i386.

-Pat
The aftermath of "kill -9" is less interesting (to me & rpm) than the initial hang; please adjust your comments accordingly. What packages were involved in the initial hang?
rpm -e ypbind ypserv nfs-utils fam portmap yp-tools
[successful]
rpm -e gdk-pixbuf-gnome gdk-pixbuf-devel
[hang]

I was trying to remove completely different packages from the other two systems, so I don't think the issue is tied to any particular package being removed.

-Pat
If you can reproduce, could you add -vv and append output here? Apologies for having you do the heavy lifting.
Created attachment 79711: Results of rpm -e -vv
Please note that rpm hung during the above attached output - it did not finish. I used:

# rpm -e -vv cups-libs qt samba-common samba-client unixODBC

and just picked four packages at random to remove.

-Pat
Sanity check: Did you "rm -f /var/lib/rpm/__db*" before attempting the erase? I've tried several variants of erase from a chroot install and have seen no hang yet, certainly not an easily reproduced one. Caveat: my box is SMP, and that may make a difference.
No, I did not do a rm -f /var/lib/rpm/__db* because the previous rpm -e was successful. There should not have been any __db* files there. If there _were_ any __db* files left behind from the previous rpm -e, they were not dealt with correctly when rpm completed successfully. Surely, you're not gonna tell me I have to manually check and remove these files after every successful use of rpm. Right?

-Pat
The __db files are persistent in rpm-4.1. They should always be present after creation and can be manually removed at any time that rpm is not active, but they cannot be removed by rpm itself because that opens up lock race windows. No, I'm not telling you that you have to remove those files after every successful execution of rpm. I'm asking whether you removed those files before attempting a reproducible test case. If not, I can't interpret the results of your test.
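The rule stated above (the __db files may be removed by hand, but only while rpm is not active) can be sketched as a small guard. This is an illustration rather than anything rpm ships: the function names are hypothetical, and the process check assumes a Linux /proc that exposes per-process comm files, which older kernels may not provide. The check is also inherently racy, since an rpm process could start between the scan and the removal.

```python
import glob
import os


def rpm_is_running(proc_root="/proc"):
    """Return True if any process under proc_root reports the command name 'rpm'."""
    for comm_path in glob.glob(os.path.join(proc_root, "[0-9]*", "comm")):
        try:
            with open(comm_path) as fh:
                if fh.read().strip() == "rpm":
                    return True
        except OSError:
            continue  # the process exited while we were scanning
    return False


def remove_db_region_files(dbdir="/var/lib/rpm", proc_root="/proc"):
    """Delete dbdir/__db* only when no rpm process is seen; return the removed paths."""
    if rpm_is_running(proc_root):
        raise RuntimeError("rpm is active; leaving __db* files alone")
    removed = []
    for path in sorted(glob.glob(os.path.join(dbdir, "__db*"))):
        os.remove(path)
        removed.append(path)
    return removed
```

Calling remove_db_region_files() with its defaults would need root privileges on a real system; passing scratch directories makes it safe to try out.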
Like I said, I did not delete those files before I ran my test case because I had already deleted them 10 minutes before that, when rpm hung up. rpm -e was working fine until I tried to run the test case you asked for. That's why I'm asking: do you want me to manually remove those files after EVERY rpm -evv I do in order to reproduce the issue for you?

-Pat
I need to know that there aren't stale locks from something else. The following sequence should isolate:

0) rm __db files
1) run "rpm -evv" that succeeds (__db files will exist after)
2) run "rpm -evv" that hangs, send me this log if different
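For anyone who wants to script the sequence above, here is a rough Python sketch. It is not an rpm tool: run_step and isolate are hypothetical helper names, the five-minute timeout is an arbitrary assumption about what counts as a hang, and the package lists are placeholders to be supplied by the tester.

```python
import glob
import os
import subprocess


def run_step(cmd, timeout_s):
    """Run cmd; return ('ok' | 'failed' | 'hung', combined stdout+stderr text)."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                              timeout=timeout_s, text=True)
    except subprocess.TimeoutExpired as exc:
        out = exc.output or ""
        if isinstance(out, bytes):  # partial output may arrive undecoded
            out = out.decode(errors="replace")
        return "hung", out
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout


def isolate(first_pkgs, second_pkgs, dbdir="/var/lib/rpm", timeout_s=300):
    """Step 0: remove __db files. Steps 1 and 2: two 'rpm -evv' runs, each logged."""
    for path in glob.glob(os.path.join(dbdir, "__db*")):
        os.remove(path)
    steps = [["rpm", "-evv", *first_pkgs], ["rpm", "-evv", *second_pkgs]]
    return [run_step(cmd, timeout_s) for cmd in steps]
```

Note that when the timeout fires, subprocess.run kills the child outright, which is the equivalent of kill -9, so the __db files would need to be removed again before the next attempt.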
Ok. I spent the better part of the afternoon trying to reproduce this per your instructions. I'm using RH 8.0 and rpm version 4.1-9.

1. rpm -e pkg1 pkg2 pkg3 ... pkg[n] [completes successfully]
2. rm -rf /var/lib/rpm/__db* [completes successfully]
(repeat steps 1 and 2)

I continued this process until, well... (laughing) the system doesn't have much left on it anymore. I doubt it will even reboot. However, rpm did NOT freeze or hang, so I have no other rpm -evv report to attach. But I HAVE to delete those __db* files after every use of rpm, otherwise it will hang. Looks like those lock files are causing it.

-Pat
See also Bug 68056.
Well, I suggested that you remove the __db* files once, not each and every time. Apologies if that wasn't crystal clear. I've tried to reproduce this bug in a chroot and cannot, so I'm going to close.
Could you please try on a UP machine? If you cannot reproduce the bug on SMP, isn't that the logical next step? What glibc are you using in your chroot environment? Besides, the attachment from preich comes from exactly the situation you wanted:

0) rm __db files - he did that, as otherwise he could not rpm -e.
1) run "rpm -evv" that succeeds (__db files will exist after) - he did that.
2) run "rpm -evv" that hangs, send me this log if different - he did that.
I can reproduce this 100% reliably on my freshly-kickstarted Dell 2650 PowerEdge. Steps to reproduce (the specific packages don't seem to matter):

rm -fr /var/lib/rpm/__rpm*
rpm -e somepackage
rpm -qa|less
rpm -e anotherpackage

It begins select() cycling as described at the second rpm -e. I'm attaching -vv output from the two -e's I've used as a test case. Capturing -evv out of the -qa|less seems tricky though. Suggestions for that?

Reopening this. I'm more than willing to help in debugging, 'cause this is driving me up a wall.
Err, oops, I guess I can't reopen it. Silly me. :)
Created attachment 82424: rpm -evv that succeeds
Created attachment 82425: rpm -evv that stalls
Thought I'd chime in again. This issue remains open for me because it's still happening. I've tested on three separate test systems, all with the same rpm hang. Because a fix doesn't appear to be forthcoming yet, I'm abandoning my plan to migrate to 8.0. I'll keep monitoring for additional information and reports.
Just as an aside, I've also seen this with RedHat 8. On my home machine, my work machine, and one other machine at work (which is a perfect 3 for 3, as those are the only machines that I have seen with RedHat 8).
I cannot tell anything meaningful from "me too" reports. All I can get from the above is that there may be a different problem with -e than with -U (which I will try to reproduce). So I'm gonna close this bug. Feel free to reopen with Yet More Reports, but *please* try to a) report the exact version of rpm you are using and b) supply a reproducible test case.
Huh? My case was both more than "me too" *and* was reproducible, at least on my end, *every* *single* *time*. The rpm version I'm using (and was using, for those rpm -evv reports that you asked for and I provided) is rpm-4.1-1.06, the stock version that ships with RedHat 8.0. What other information do you need?
I agree. Closing this bug is utterly ridiculous (except possibly for closing it as a dupe of the other RPM hang bugs). The exact steps to reproduce on EVERY system are not yet clear, but it is clear that this is affecting a LOT of people in a LOT of different environments, and that it has NOT been fixed. I think the bug should stay open until someone can show that it does NOT exist.