From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.5 (X11; Linux i686; U;) Gecko/20020625
Description of problem:
Using "rpm -e package" will randomly hang. Ctrl-C and Ctrl-Z don't have any effect. "ps ax" shows that the process is in state "S". kill "pid of rpm" doesn't work. kill -9 "pid of rpm" does work. Trying "rpm -e package" again after killing it hangs in the same way. Querying the rpm database via "rpm -qa" after the failed "rpm -e package" is killed also hangs. If the system is rebooted, "rpm -e package" works.
Version-Release number of selected component (if applicable):
How reproducible: Sometimes
Steps to Reproduce:
1. rpm -e package
Actual Results: Process hangs
Expected Results: Process to finish and return to the command prompt
Additional info: I have seen this at least 3-4 times over the last few days.
You can ctrl-c out of rpm commands executed after the first hung rpm process is killed.
I can reproduce this bug reliably. rpm starts spinning in an infinite select() loop with an empty file descriptor set.
Created attachment 63924: rpm hanging in a select() loop.
Version of rpm, please:
    rpm -q rpm
Try doing
    rm -rf /var/lib/rpm/__db*
rpm-4.1-0.34. Removing the __db files allows 'rpm -e' to finish.
That's the workaround until there's a useful pthread_mutexattr_setpshared() (ideal) or I get a chance to factor db open permissions onto a setgid helper.
FWIW, I'm sometimes seeing what appears to be the same bug with rpm -Uvh. There, it also loops over a null select, and killing it, removing the __db* files and trying again has so far worked around it. This is also with rpm-4.1-0.34.
I think there's another bug somewhere that leaves __db* files behind, which trigger this bug on a subsequent rpm transaction, but I haven't been able to coax rpm into leaving the garbage files behind.
Hmm, if it's pre-existing __db* files causing the hangs, then they are being created during install: I've been seeing rpm -Uvh hangs on systems where the rpm -Uvh command is the first transaction I've run against the database after a clean install.
This is a known problem. The current implementation is adequate, but not perfect. The __db files are used to share locks. A ^C will leave the file around, but the next execution as root, or next reboot, removes the __db files. Depending on the exact moment when ^C is hit, there may or may not be a lock held that another process may stumble upon. The problem that cannot be solved without a setgid helper is, if root does ^C, then non-root cannot remove the file, and can hang on dead locks. The setgid helper will be added, but not to rpm-4.1.
Ok, so when will rpm-4.2 be out?
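Below is a script to put in /usr/local/bin, after you put /usr/local/bin at the beginning of your path, to work around this bug.
#!/bin/bash
# Remove stale lock files, then run the real rpm with all original arguments.
rm -f /var/lib/rpm/__db* >/dev/null 2>&1
# "$@" replaces $1 ... $25: in bash, $10 expands as ${1}0, not the tenth argument.
# If this wrapper is itself named rpm, call the real binary by absolute path
# (assumed here to be /bin/rpm) so the wrapper does not call itself.
exec /bin/rpm "$@"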
Hmmm, signals are now trapped when an rpmdb is opened, so the stale lock avoidance hack is not only unnecessary now, but can also be harmful, as removing /var/lib/rpm/__db* can/will open lock races on the /var/lib/rpm/__db* files.
Fixed since rpm-4.1-0.59.
Still see this problem with rpm-4.1-0.66
Have had the problem with rpm-4.1-0.69
I need a reproducible case, not repeated confirmations, if you want a fix.
I saw this bug today with 4.1-0.81 while running 'rpm -e kernel-2.4.18-10.98'. Again, Ctrl-C didn't do anything, and attaching strace to the pid shows that it is stuck in a select loop. I don't think that there were existing /var/lib/rpm/__db.0* files, but I can't be sure. Leaving as NEEDINFO as I don't think this is enough info to find the bug.
I am going to close this since I can't seem to reproduce it on demand, I haven't seen it in a week, and I might have been confusing it with rpm taking excessively long because of high hard disk load after it was supposed to have been fixed. If I do see it again I will reopen this bug.
Still seeing this bug in latest rawhide (rpm-4.1-0.84). strace shows an infinite loop on:
    select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
kill <pid> doesn't work, but kill -9 <pid> does. I was running rpm -Fvh *.rpm for about 50 rpms, and it hung after the first one (popt). After killing the process and removing /var/lib/rpm/__db.0*, running rpm -Fvh again works. I don't think there were any __db.0* files there prior to the first run, but I can't be sure.
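For anyone trying to tell this hang apart from rpm merely being slow, the strace signature quoted above can be checked mechanically. The following is only a rough sketch: the function name is_select_spin, the default threshold of 5 and the log path in the usage note are made up for illustration, and the match string is the exact line format reported in this thread (other strace versions may print the timeval differently).

```shell
#!/bin/bash
# Sketch: decide whether a saved strace log shows the empty select() spin
# described in this report. The function name and threshold are hypothetical.

is_select_spin() {
    local log="$1"          # file containing strace output
    local min="${2:-5}"     # how many timeouts count as "spinning" (assumption)
    local hits

    # -F: match the line literally, -c: print the number of matching lines
    hits=$(grep -cF 'select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)' "$log")

    [ "$hits" -ge "$min" ]
}
```

A log could be captured with something like "strace -p <pid> -o /tmp/rpm.strace", interrupted after a few seconds, and then checked with "is_select_spin /tmp/rpm.strace".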
You will *always* need to do rm -f /var/lib/rpm/__db* after "kill -9". I still don't see any reproducible problem here ...
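A slightly more careful form of that cleanup might look like the sketch below. It assumes the concern raised earlier in this report, that removing the __db* files while another rpm is still alive can itself open lock races, so it refuses to delete anything while a process with the given name exists. The function name, its parameters and the use of pgrep for the check are illustrative choices, not anything rpm itself provides.

```shell
#!/bin/bash
# Sketch: remove the __db* files from an rpm database directory, but only
# when no process with the given name is running. Hypothetical helper.

clear_stale_rpm_locks() {
    local dbdir="${1:-/var/lib/rpm}"   # database directory
    local proc="${2:-rpm}"             # process name to look for (assumption)

    # Do not pull the lock files out from under a live process
    if pgrep -x "$proc" >/dev/null 2>&1; then
        echo "$proc is still running; not touching $dbdir" >&2
        return 1
    fi

    # -f keeps rm quiet when there are no __db* files to remove
    rm -f -- "$dbdir"/__db*
}
```

Run as root after the hung process has been killed, for example "clear_stale_rpm_locks /var/lib/rpm". Only the __db* files are removed; Packages and the index files are left alone.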
OK, I found another piece of data: redhat-config-packages was running on the machine in question _while_ I was trying to do the upgrade. So, hazarding a guess, rpm 4.1 doesn't recognize librpm404's lock on the database. Once I cleared the __db.0* files it worked, because 4.0.4's locking is partially broken, as mentioned on rpm-list.
reopening due to midair collision
No, rpm-4.1 sets the lock to keep rpm-4.0.4 based apps happy. This is getting purty far off topic, so I'm gonna (again) close this bug. Feel free to reopen Yet Another Bug.
I am using rpm 4.1 from 8.0 and I experience the same bug. If I never had to abort rpm while it's running, then I guess the response to this bug report would be acceptable. However, since there's a bug (that might be similar) in rpm that causes it to lock up in the first place, I find it strange that this bug is so easily glossed over, especially now with the new release of 8.0. This bug will likely affect a lot of users.
I just made a fresh chroot environment, installed lots of base packages in it, then installed a second set of base packages, and it locked up after completing e2fsprogs (so far there is no apparent pattern to which package triggers the problem; I've had it on kernel-source and glibc-common as well). It is currently locked up. Attaching to it with strace shows it's in pause, and a backtrace with gdb shows this:
Attaching to program: /mnt/music/B/bitches/root/bin/rpm, process 730
0x08184847 in __libc_pause ()
(gdb) bt
#0  0x08184847 in __libc_pause ()
#1  0x0814267f in pause ()
#2  0x0805fccc in psmWait ()
#3  0x08060236 in runScript ()
#4  0x08060838 in runInstScript ()
#5  0x08062a83 in rpmpsmStage ()
#6  0x0806240b in rpmpsmStage ()
#7  0x080628d5 in rpmpsmStage ()
#8  0x0807d085 in rpmtsRun ()
#9  0x0806dbf1 in rpmInstall ()
#10 0x08048e4d in main ()
#11 0x0815ad62 in __libc_start_main ()
It's been like that for five minutes, and I'm now going to go to the store to give it at least 15 more minutes, just to prove that it is really sitting idle in some sort of race condition, and it's not me being trigger-happy with the kill command.
This bug has been encountered by a few users according to this bug report, and I fourth this. Just because a bug doesn't happen for you doesn't mean it's not valid, and I find it hard to believe that you're unable to experience it. I would suggest trying a few fresh installs in a chroot manually to see it for yourself. I can't imagine this not being a very critical bug for Red Hat.
I have the exact same strace & gdb results as thomas.ac.be. In my working with Red Hat 8.0 + RPM 4.1 today for the first time, approximately half of all 'rpm -e' and 'rpm -U' commands I've executed have hung (in all I've seen 30+ hangs), requiring a 'kill -9' of rpm each time, followed by the obligatory 'rm -f /var/lib/rpm/__db*; rpm --rebuilddb'. The hangs appear to always occur in between packages; I haven't seen any hangs occur in the middle of erasing/upgrading a package. Since I'm the second one to experience the 'rpm -e' hangs with RH 8.0, shouldn't this report be reopened?
I have experienced a lockup with rpm -e on Red Hat 8.0. I may be completely off and misinterpreting what I saw, but it appeared that it might be related to having the application running while trying to remove its package. If I remember right, I had some application running, say Mozilla; I tried rpm -e and it was in S state. I used kill -9 and then closed the application. Then I ran rpm -e again and it worked.
I am having the same problem as the others here on Red Hat 8.0: it just hangs, and I have to delete those __* files to get rpm working without having to restart.
My first rpm command was rpm -e httpd and it didn't hang, but did tell me about dependencies. So I was going to --force a re-install of the rpm and it hung. I kill -9'd this two more times before coming here, and rm'd three __db.00* files. My rpm -ivh --force worked fine after this. So even an attempt with -e will leave the __db files behind!
Red Hat 8.0, my RPM hung right after it installed a single RPM package with -ivh. I attached strace to the pid and it repeats the following message forever:
    select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
I've experienced the same or a similar problem several times after I've just upgraded from RH73 to RH80.
[root@zeus /]# rpm -q rpm
rpm-4.1-1.06
[root@zeus /]# ls /var/lib/rpm/
Basenames     __db.003  Installtid   Provideversion  Sha1header
Conflictname  Dirnames  Name         Pubkeys         Sigmd5
__db.001      Filemd5s  Packages     Requirename     Triggername
__db.002      Group     Providename  Requireversion
[root@zeus /]# rm -f /var/lib/rpm/__db.00*
[root@zeus /]# service lpd status
lpd is stopped
[root@zeus /]# rpm -e LPRng
error: Failed dependencies:
        LPRng >= 3.7.4-9 is needed by (installed) redhat-config-printer-0.4.24-1
[root@zeus /]# rpm -e LPRng redhat-config-printer
error: Failed dependencies:
        redhat-config-printer = 0.4.24-1 is needed by (installed) redhat-config-printer-gui-0.4.24-1
[root@zeus /]# rpm -e LPRng redhat-config-printer redhat-config-printer-gui
warning: /etc/alchemist/namespace/printconf/local.adl saved as /etc/alchemist/namespace/printconf/local.adl.rpmsave
[root@zeus /]# ls /var/lib/rpm/
Basenames     __db.003  Installtid   Provideversion  Sha1header
Conflictname  Dirnames  Name         Pubkeys         Sigmd5
__db.001      Filemd5s  Packages     Requirename     Triggername
__db.002      Group     Providename  Requireversion
Unfortunately, rpm can get its database so inconsistent during the hang that "rm /var/lib/rpm/__db*" does not help. For example, I did "rpm -e at" and it hung, with strace -p showing
    select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
Now, running "rpm -e at" produces:
    error reading information on service atd: No such file or directory
    error: %preun(at-3.1.8-31) scriptlet failed, exit status 1
If it really fails just because it races with a script subprocess, one is left to wonder why it was too hard to use fork, exec, and wait4. psmWait, indeed.
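To illustrate the fork/exec/wait point in the document's own scripting language: waiting on a specific child blocks until that child has exited and returns its status, no matter how quickly the child finishes, whereas sleeping until a signal arrives can in general miss a child that exits before the sleep starts. The sketch below is a shell analogy only, not rpm's code, and run_and_wait is a made-up name.

```shell
#!/bin/bash
# Sketch: start a command as a child process and block until exactly that
# child has exited. Shell analogy for fork + exec + wait; hypothetical helper.

run_and_wait() {
    "$@" &              # start the command in the background (fork + exec)
    local pid=$!        # PID of the child just started

    # wait returns the child's exit status, even if it has already exited
    wait "$pid"
}
```

For example, "run_and_wait sh -c 'exit 3'" returns 3 to the caller, which is the kind of status a failing scriptlet would report.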
I'd like to suggest that everyone experiencing this bug keep their RPM database backed up. One of my "killall -9 rpm; rm -f /var/lib/rpm/__db*; rpm --rebuilddb" cycles left me with a database that has forgotten about 3/4 of the packages on my system.
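One possible way to take such a backup is sketched below. The function name, the default destination /var/tmp and the archive naming are assumptions for illustration. The __db* files are skipped on the assumption that they hold only lock and region state rather than package data, and the copy is best taken while no rpm process is running.

```shell
#!/bin/bash
# Sketch: archive an rpm database directory before attempting a
# kill / rm / --rebuilddb cycle. Hypothetical helper, default paths assumed.

backup_rpmdb() {
    local dbdir="${1:-/var/lib/rpm}"   # database directory to save
    local dest="${2:-/var/tmp}"        # where to write the archive (assumption)
    local archive

    archive="$dest/rpmdb-$(date +%Y%m%d-%H%M%S).tar.gz"

    # Skip the __db* files and store paths relative to the database directory
    tar -czf "$archive" --exclude='__db*' -C "$dbdir" . && echo "$archive"
}
```

Calling "backup_rpmdb" as root prints the path of the archive it wrote. Restoring would mean unpacking that archive back into the database directory, again with no rpm process running.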
Hi Jeff! If you use apt (http://psyche.freshrpms.net/rpm.html?id=243) to install an RPM the __db.00? files are left behind on every occasion. Perhaps looking at this can give you a clue.
I ran into a hang on a fresh system today running rpm-4.1-9 while removing the kernel-2.4.18-14 package. strace showed those timeouts forever. This system had run apt-get a few times. Are you saying that apt-get may be triggering this rpm bug? Maybe this is why Red Hat was unable to reproduce this after 4.1-9 was released.
I've seen this several times too, with -U, -F and -e, and I *think* it has only happened with packages that have some %pre or %post (or the corresponding -un) scripts, possibly if'ing on the $1 argument to determine whether an upgrade or erase is going on. $ rpm -q rpm rpm-4.1-1.06
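For readers unfamiliar with the $1 convention mentioned above: rpm passes each scriptlet a count of the package instances that will remain installed, so in %preun a value of 0 means a real erase and 1 or more means an upgrade. The sketch below only mimics that branching in a standalone function. The name preun_action is made up, and the commented-out service commands are a guess at what a typical scriptlet does, not taken from any particular package.

```shell
#!/bin/bash
# Sketch: the kind of branch a %preun scriptlet makes on its first argument.
# Standalone illustration with a hypothetical function name.

preun_action() {
    case "$1" in
        0)
            # Last instance is being removed: a real scriptlet might stop
            # and unregister its service here, e.g.
            #   service SOMESERVICE stop; chkconfig --del SOMESERVICE
            echo "erase"
            ;;
        *)
            # Another instance remains afterwards: this is an upgrade
            echo "upgrade"
            ;;
    esac
}
```

"preun_action 0" prints erase and "preun_action 1" prints upgrade.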
ville.skytta: You need/want the missed SIGCHLD fix in rpm-4.1-9 at ftp://people.redhat.com/jbj/test-4.1. There are far too many different problems here for me to solve any of them efficiently, so I'm gonna close this bug. Feel free to open individual bugs and I'll try to get you sorted out.
Could bug 77857 be related? The two systems I saw this bug on were both systems I had upgraded from 7.2/7.3. I wasn't aware I had to manually delete symlinks and reinstall rpm when upgrading to RH8.0. It isn't mentioned in the release notes either.
Symlinks are unlikely to be the problem, other than that rpm may not upgrade at all unless symlinks are removed.
If you are very rarely seeing lockup problems with rpm-4.1-9 and you use apt-get, please read Bug 77988 and help us figure out this problem.
WOW! This is just closed as WORKSFORME? Well, FWIW, this DOESN'T work for me. It seems that whenever (or nearly whenever) I use the Package Manager, or up2date when su'd to root, rpm locks up on me. Removing the __db.? files from /var/lib/rpm clears it up. I've since only run the Package Manager and Up2Date while logged in as root and have managed to avoid the problem. Last time it happened, I recall a python process was still hanging around that *may* have actually been the problem. Since I need to run up2date again soon, I'll do it su'd to root and will hopefully be able to reproduce the problem. Is there another bug somewhere that I should be looking at? (A bugzilla search didn't find anything, but I may not have provided the right search terms...)
I am seeing this issue as well, and I do not use apt-get. I had this issue just today after removing old kernel packages one after another when I updated to 2.4.18-24. Usually only rebooting the system helps (for some reason), and sometimes even then I have to do an rpm --rebuilddb before installs and removals will work again. This is very frustrating.
I have been experiencing this issue on Red Hat Linux 8.0 systems with rpm-4.1-1.06. This occurs a significant amount of the time - like 30-50%! rpm sits in pause(2) waiting for signals, but never receives any, as reported above. Please advise if you'd like a fresh bug report opened. I think it will add to the confusion though, and could be considered a duplicate!
One of my RH9 servers seems to have gotten hosed, and I don't know how to get it on its feet again. The symptoms seem similar to this bug, but perhaps worse (I also can't run up2date).
Redhat 9
% rpm -e at-3.1.8-33
error reading information on service atd: No such file or directory
error: %preun(at-3.1.8-33) scriptlet failed, exit status 1
# rpm -q rpm
rpm-4.2-0.69
# rm -rf /var/lib/rpm/__db*
zsh: no matches found: /var/lib/rpm/__db*
# up2date -l
zsh: 20499 segmentation fault  up2date -l
Ignore that last bit about up2date not working for me... I reinstalled Python and it's better now. But I'm still getting that error from rpm -e.
Is there a new bug report for this? It still isn't fixed, and I'd say enough people have commented to say it's reproducible. I just had this problem again on Red Hat 9 trying to remove (rpm -e) proftpd. I can't remove proftpd, and rpm always locks up. I tried the db thing, and after a few tries, it worked. This is a persistent problem that has existed in all distributions. I have apt-get along with rpm version 4.2.