After a fresh install of Psyche, rpm hung during the installation of an RPM.

# rpm -Uvh ~dave/rpmbuild/RPMS/qstat-2.5b-1.i386.rpm
Preparing... ########################################### [100%]
1:qstat ########################################### [100%]
<hung for hours>

This is all I got from stracing it:

select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
...

Here's the backtrace from the hung rpm process:

(gdb) bt
#0 0x08196bbe in select ()
#1 0x082105f4 in _GLOBAL_OFFSET_TABLE_ ()
#2 0x080fc2a1 in __os_yield_rpmdb ()
#3 0x080c2f6f in __db_tas_mutex_lock_rpmdb ()
#4 0x080f3a87 in __lock_get_internal ()
#5 0x080f32bf in __lock_get_rpmdb ()
#6 0x080dd2a2 in __db_c_put_rpmdb ()
#7 0x0809285f in db3cput ()
#8 0x0808fadc in rpmdbAdd ()
#9 0x08062cba in rpmpsmStage ()
#10 0x080623c0 in rpmpsmStage ()
#11 0x080628d5 in rpmpsmStage ()
#12 0x0807d085 in rpmtsRun ()
#13 0x0806dbf1 in rpmInstall ()
#14 0x08048e4d in main ()
#15 0x0815ad62 in __libc_start_main ()

Had to SIGKILL and rm /var/lib/rpm/__* to use rpm again.

How Reproducible: Don't know. This bug was quite common in null, but I don't have a reliable method of reproducing it.

Additional info: rpm-4.1-1.06
Somebody pointed out bug https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=68056 as a reference, but I don't see it added here, so I'll add it. I too had the problem, but it's not readily repeatable. Which sucks for the developer/QA folk at RH trying to track it down. Cheers, -Ali
I've had this problem too, after a clean install of 8.0. I cannot install abiword's RPM, for instance. % rpm -Uvh foo.rpm Preparing... ########################################### [100%] just hangs -benjamin
I have the exact same problem on a vanilla install of Psyche. Now that everything is installed, I cannot upgrade or install new packages, as RPM hangs on a select() call that continuously times out ... After killing rpm (with -9), I have to remove /var/lib/rpm/__db* manually so that I can successfully query the database again. Rebuilding the database does not seem to solve the problem, either.
Me too! One note: while rpm hung as root, I could still "rpm -qa" as me.
Same problem here. I have to delete the __* files to get it to work. Red Hat 8.0.
I think I've found a temporary workaround. I'm not sure how reliable it is, but I've been able to install five or six packages now without any problems. Instead of doing a "rpm -Uvh blah", I used a more verbose output "rpm -Uvvvh blah", and haven't had any problems. I'm guessing that it's a race condition, and by having rpm display a longer debugging trace, the race doesn't manifest itself. That said, I tried upgrading a package with just "rpm -U blah", and the crash occurred. After killing rpm and deleting the locks, I issued a "rpm -Uvvvh blah" and it installed without any troubles. HTH, -Kris
I've also had this bug. At first I thought that it was an issue with my $HOME being on an NFS-mounted filesystem (I typically download the RPMs as me into my $HOME, su - and install), but I've verified that this has nothing to do with it. I also noticed that rpm began working again after a reboot (I had to reboot for other reasons ;) the reboot was not an attempt to fix rpm). It's also possible that this is not an rpm problem. I've seen evolution and WineX hang in the exact same way (I didn't do any tracing at the time, though). WineX was so severely hung that not even kill -9 managed to kill it. It just sat there eating up my CPU.
Red Hat 8.0, my RPM hung right after it installed a single RPM package with -ivh. I attached strace to the pid and it repeats the following message forever: select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
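For anyone who wants to check whether a hung rpm shows this same signature, here is a minimal sketch. The function name count_lock_spins is made up for illustration; it simply counts the one-second select() timeout line quoted above in whatever strace output is fed to it.

```shell
# Hypothetical helper: count occurrences of the one-second select() timeout
# line reported in this bug. Reads strace output on stdin, prints the count.
count_lock_spins() {
    # -F: match the line as a fixed string, -c: print only the match count
    grep -c -F 'select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)'
}

# Possible usage (strace writes to stderr, hence 2>&1; interrupt with Ctrl-C):
#   strace -p <pid of hung rpm> 2>&1 | tee /tmp/rpm.strace
#   count_lock_spins < /tmp/rpm.strace
```

A steadily growing count with nothing else in the trace would match the reports in this bug; it does not by itself say which lock is being waited on.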
Created attachment 78965: strace of rpm -qa
I'm reporting the same issue with RH 8.0 and the default version of rpm that ships with it, version 4.1-1.06. I run rpm -e to remove all the unnecessary software on systems that will act as a server. After removal of a package or [2|3|4|...|n], rpm hangs. The strace eventually shows <snipped> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout) looping over and over (rinse, lather repeat) As the others are reporting, if I remove the __db* files after killing the process, I can use rpm again. If I do not remove them, I have to restart the system. Taking it to single user and back again does not fix it. -Pat
*** Bug 75393 has been marked as a duplicate of this bug. ***
I've been getting the exact same strace output and the same problems. The hangs are quite, quite frequent for me, however, more or less hangs every second time. Any news on when/if this particular issue might be resolved, or if it's indeed being worked on? I do realize that it might not be as high priority since not all users are experiencing it, but it seems to be serious enough a problem worthy of more attention :)
Try rpm-4.1-9 packages from ftp://people.redhat.com/jbj/test-4.1 Please give me explicit WORKSFORME to expedite errata release.
rpm-4.1-9 WORKSFORME. I haven't encountered a single hang while upgrading hundreds of packages.
Updated to the rpm-4.1-9 test packages and I haven't had the problem since... although I've only updated a few packages since and I wouldn't call my experience a full test... but so far so good.
I also updated to version 4.1-9 test packages but I'm still seeing the same problem (see previous post above, RH 8.0). I managed to successfully remove six packages with rpm -e but then immediately tried to remove two more and it hung again. If I just kill the proc with kill -9, rpm will not function. Once I remove the __db* files, rpm will function again.

(strace follows)

...
open("/var/lib/rpm/Packages", O_RDONLY|O_LARGEFILE) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=10727424, ...}) = 0
brk(0x8260000) = 0x8260000
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 64000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 128000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 256000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 512000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[continues]

I guess I'd have to say version 4.1-9 WORKSFORME_NOT_

-Pat
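The select() timeouts in that strace double from 1000 microseconds up to 512000 and then stay at one second, which looks like a capped exponential backoff while waiting for a lock. The sketch below only reproduces that sequence of intervals; the function name backoff_intervals is made up, and this is an illustration of the observed pattern, not Berkeley DB's actual code.

```shell
# Illustrative only: print the wait intervals (in microseconds) of a doubling
# backoff that starts at 1000 and is capped at 1000000 (one second), as seen
# in the strace above. $1 is the number of intervals to print.
backoff_intervals() {
    steps=$1
    delay=1000
    max=1000000
    i=0
    while [ "$i" -lt "$steps" ]; do
        echo "$delay"
        delay=$((delay * 2))
        # Once doubling would exceed one second, stay at one second.
        if [ "$delay" -gt "$max" ]; then
            delay=$max
        fi
        i=$((i + 1))
    done
}
```

Running backoff_intervals 12 prints the ten doubling values followed by two one-second waits. A waiter that never gets the lock keeps sleeping one second at a time, which would explain why the trace never ends.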
preich: you have a different problem, please open a different bug
I think I spoke too soon. :\ On a different system also running 4.1-9, this rpm command just hung: "rpm -ivh squid-2.4.STABLE7-4.i386.rpm". strace reports:

select(0, NULL, NULL, NULL, {0, 20000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
...

I deleted the __* files before upgrading to 4.1-9, but not after. Maybe I should have?
Still not working for me, same problem as the last poster (strace and all). I did delete those particular files beforehand, however, since RPM hung while trying to install the new version of RPM, if that makes any sense ;>
Go read #73097, and figger out which type of hang you have. I'm trying to sort out the missed SIGCHLD at the moment, other "Me too" hang reports are not gonna help.
I have observed this lockup several times, on two separate Psyche installs (a desktop Athlon machine and a Dell laptop). I didn't know about deleting the __* files (how many users actually would?), and so have had to resort to restarting the machine each time. I have *not* killed RPM before this happened, which would have resulted in a stale lock. I don't think I have ever killed RPM before it has finished, for fear of something like this happening. Thus if there is a stale lock problem, RPM is leaving the stale locks itself. If stale locks are the problem, it would be nice if RPM at least (a) reported that it was waiting for locks to be freed, and/or (b) didn't sit in an un-killable state while waiting, and/or (c) checked to see if other instances of RPM were running, to see if locks were actually valid. (There are lots of potential problems with (c) though.) It may not be a problem with this at all; perhaps for example when RPM does an ldconfig, that is locking-up instead.
kill -9 doesn't seem to want to do anything.
Cannot rebuild the database (hangs).
Deleting the _* files doesn't help.

All of the above both before and after installing the above-mentioned RPM rpm-4.1-9 (which did NOT hang).
bkoz WORKSFORME thanks. After updating to all the new rpm's, I tried installing glibc-2.3.1-1 and everything worked. -benjamin
Just in case people think this problem is fixed, I ran into it several times on my laptop and a dual processor Pentium3. Fresh install + this test package. Does not work for me.
Bad news: after no problems for quite a while, I just got a hang with 4.1-9. I did a kill -9; rm /var/lib/rpm/__* and then reran the hung command which updates to small noarch rpm with no problems. Note that this was NOT on a slow processor -- dual 933MHz PIII with 1GB ram.
I have seen this behaviour before in other applications using BerkeleyDB. BerkeleyDB uses on disk memory regions for IPC, when lock state gets out of sync, and there is no _detectable_ deadlock, it is possible for new processes to deadlock on a single held lock of a program that is no longer running (i.e. crashed hard and did not clean up). It _should_ be safe to remove the __db.00? files as long as no other copies of RPM are running. This is my experience anyway.
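A minimal sketch of that cleanup, under the condition stated above (no other copies of rpm running). The function name remove_stale_regions and the RPM_PROC_CHECK override are made up for illustration; the default check uses pgrep -x rpm, and the __db.* pattern follows the __db.00? file names mentioned in this comment.

```shell
# Hypothetical helper: remove Berkeley DB region files (__db.*) from an rpm
# database directory, but only if no rpm process appears to be running.
# $1 is the database directory (defaults to /var/lib/rpm).
# RPM_PROC_CHECK is an assumed hook: a command that succeeds when rpm is busy.
remove_stale_regions() {
    dbdir=${1:-/var/lib/rpm}
    check=${RPM_PROC_CHECK:-"pgrep -x rpm"}
    # Deliberately unquoted so "pgrep -x rpm" splits into command + arguments.
    if $check >/dev/null 2>&1; then
        echo "rpm appears to be running; leaving $dbdir alone" >&2
        return 1
    fi
    for f in "$dbdir"/__db.*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        rm -f -- "$f" && echo "removed $f"
    done
}
```

Nothing runs until the function is called, and it has to be called as root to touch /var/lib/rpm. It only looks for processes named rpm, so other programs holding the database open would still need to be stopped by hand.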
I've just reproduced the bug with the test-4.1 RPMs. Same details as my initial report - stuck on select(), backtrace is the same.
I have seen this with the 4.1-9 test rpms and the newer rpm-4.2-0.5+glibc 2.3.1. The more I upgrade the rarer it seems to become. I currently have two boxes running RedHat 8.0 with rpm-4.2-0.5 and glibc 2.3.1. My personal box has had the combination for a week and hasn't seen a hang yet. On the other, a server, I just installed the new packages yesterday and I have seen a hang today. Is there any hope in sight for this bug? I have started recommending people stay with RedHat 7.3 till this bug is fixed. I am also regretting upgrading myself, because of this bug and other problems I have had.
We are seeing exactly the same thing. We've not tried any of the work arounds, but will be. This bug is annoying enough to cause us to delay our 8.0 workstation rollout until it is resolved.
we are seeing exactly the same bug on nearly every 8.0 machine we have. our 8.0 deployment is halted until this bug gets resolved...
Again, lest it be lost in the noise: Try rpm-4.1-9 packages from ftp://people.redhat.com/jbj/test-4.1 Please give me explicit WORKSFORME to expedite errata release. There are far too many bugs (with different root causes) here to sort out. Feel free to reopen individual reports.
Er, there are several reports that the 4.1-9 packages do not, in fact, fix this bug.
Erm...where do you get the idea that we're all reporting different bugs here? I'm experiencing the problem *exactly* as djdave.au originally reported it (same strace, same backtrace). And it has already been confirmed by djdave.au and myself that rpm-4.1-9 does NOT fix it.
A few hours after I last commented the machine that hadn't experienced a hang in a week, did.
Again, There are far too many bugs (with different root causes) here to sort out. Feel free to reopen individual reports.
Again, what makes you think people are discussing different/multiple bugs here? I'm not, and scanning over the comments I don't see anyone else doing so either. (And even if someone did post a comment that wasn't related to the original problem, how can that possibly render the original report NOTABUG?) Also, what does "Feel free to reopen individual reports" mean? This (djdave.au's) report looks like an "individual report" to me, and it describes the problem I've been experiencing perfectly.
I think he means open another Bugzilla bug # specific to 4.1-9. I'll check if it has already been done, and if not I'm opening it myself. All of my systems are experiencing this same problem, although more rarely, with 4.1-9. This is the only thing preventing me from telling people "RH 8.0 is ready."
I've just opened a fresh bugreport for this - bug #77562.
Just an FYI: if anyone is using Synaptic while trying to run rpm via the CLI, try stopping Synaptic/apt-get or red-carpet. I have tested this scenario and had several problems, and in most cases the hang was due to multiple calls to the db lock files, which, as all of us know, do not like to play with other kids. Once I had stopped all apps trying to access the db and removed all the _* files out of /var/lib/rpm, I was able to successfully remove/install whatever I wanted. Again, this is one person's evaluation and may or may not shed any light on the issue at hand, but it's definitely worth a try. Peace!
Yes Rusty, that is good advice. I turned off the red-carpet daemon, and the problem stopped. I think it is related to multiple clients querying the rpm database at the same time. Thank you!
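A quick way to see whether any of the programs named in the last two comments is running before starting rpm, as a rough sketch. The function name find_rpmdb_clients and the RPMDB_CLIENTS list are made up for illustration; the list only contains names mentioned in this thread and the real process names on a given system may differ.

```shell
# Names taken from the comments above; illustrative, not exhaustive.
RPMDB_CLIENTS="rpm synaptic apt-get red-carpet"

# Hypothetical helper: read one command name per line on stdin and print the
# ones that appear in RPMDB_CLIENTS.
find_rpmdb_clients() {
    while IFS= read -r name; do
        case " $RPMDB_CLIENTS " in
            *" $name "*) echo "$name" ;;
        esac
    done
}

# Possible usage:
#   ps -e -o comm= | find_rpmdb_clients
```

If this prints anything, stop those programs first, as suggested above, and only then run rpm.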
a reboot is a nice workaround, but if you remove the stale lock files /var/lib/rpm/__db* you will also get rpm to work again!