Red Hat Bugzilla – Bug 206275
rpmq running as root gets stuck
Last modified: 2007-11-30 17:11:43 EST
Description of problem:
Ran "rpm -q redhat-artwork" after updating today (for BZ), and it didn't come
back. After placing it into the background, rpmq was still running. From another
gnome-terminal as a normal user it returned immediately. After killing off the rpmq
process (had to -KILL it, it wouldn't respond otherwise), it now hangs:
[root@laptop13 ~]# ps -l -p 3712
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 3712 3094 0 75 0 - 3532 futex pts/0 00:00:00 rpmq
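The check done by eye on the ps output above can be sketched in Python. This is only an illustration: the function name find_futex_sleepers and the SAMPLE_PS constant are made up for this example, and it assumes whitespace-separated "ps -l" output with the header shown in this report.

```python
# Hypothetical helper: find processes that are sleeping on a futex,
# based on `ps -l` style output as pasted in this report.
SAMPLE_PS = """\
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 3712 3094 0 75 0 - 3532 futex pts/0 00:00:00 rpmq
"""


def find_futex_sleepers(ps_output, command="rpmq"):
    """Return PIDs of `command` processes in state S whose WCHAN starts with 'futex'."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    for name in ("S", "PID", "WCHAN", "CMD"):
        if name not in header:
            raise ValueError("unexpected ps header: %r" % lines[0])
    pids = []
    for line in lines[1:]:
        # Split into as many fields as the header has; CMD takes the remainder.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # row does not line up with the header, skip it
        row = dict(zip(header, fields))
        if (row["CMD"].split()[0] == command
                and row["S"] == "S"
                and row["WCHAN"].startswith("futex")):
            pids.append(int(row["PID"]))
    return pids


if __name__ == "__main__":
    print(find_futex_sleepers(SAMPLE_PS))
```

A process found this way is only a candidate; sleeping on a futex is normal for short periods, so the same PID should show up repeatedly before it is treated as stuck.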
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Look for stale locks by running (as root)
Otherwise just do
rm -f /var/lib/rpm/__db*
Not sure if this is related, but an FC5 box that I just noticed is running very slowly
has the following line in top:
9639 root 25 0 11516 1128 1000 R 41.3 0.4 12667:18 rpmq
Ouch... must have been running for weeks. kill SIGINT won't kill it. -9 did.
I'm not sure if this has messed up the rpm db or not -- I will cross that bridge
when I come to it.
Segfaults and loss of data are likely due to removing an rpmdb environment
without correcting other problems in the rpmdb.
FYI: Most rpmdb "hangs" are now definitely fixed by purging stale read locks when opening
a database environment in rpm-4.4.8-0.4. There's more to do, but I'm quite sure that a
large class of problems with symptoms of "hang" are now corrected.
Detecting damage by verifying when needed is well automated in rpm-4.4.8-0.4. Automatically
correcting all possible damage is going to take more work, but a large class of problems is likely
already fixed in rpm-4.4.8-0.8 as well.
rpmq from rpm-4.4.2-38.fc7 got stuck (shown as running, but no CPU usage IIRC)
when trying to run makewhatis (man-1.6e-1.fc7) recently (apropos(1) didn't know a
thing, so this might have happened a few times before). After rebooting and
successfully updating openmotif-->lesstif, makewhatis went through.
I frequently see fc5 and fc6 machines with rpmq hung as described above, which
also blocks any other rpm operations (yum, rpm, etc.), and I hear
from colleagues that it's common.
Is it possible to backport the stale read lock purge bugfix from 4.4.8 to 4.4.2?
The other day I had a yum update hang on a box that had 192MB of RAM and for
some reason had the swap space disabled (fstab labelling issue). I'm sure the
above problems are something else, as I've had rpm/yum hangs on boxes with 2GB
of RAM and 5GB swap. But if you have a crappy box, check if your swap is
enabled! And maybe the tools should nicely die with "out of mem" errors rather
The above info about stale locks was helpful. Running 'rpmdb_stat -CA' shows,
during the period when nothing rpm-related works:
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
36 READ 2 HELD 0x353b8 len: 20 data:
35 READ 1 HELD (64c11 304 bef22e2b 10f1 0) handle 0
Then when I kill the stuck processes and run db_recover in /var/lib/rpm,
'rpmdb_stat -CA' reports:
db_stat: DB_ENV->open: No such file or directory
and then rpm transactions succeed as one would expect.
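For anyone who wants to watch these locks over time, the lock rows quoted above can be summarised with a short Python sketch. It assumes the row layout exactly as pasted in this comment (locker, mode, count, status, then the object); the names LOCK_ROW, SAMPLE_LOCKS and held_read_locks are invented for the example, and other Berkeley DB versions may print the table differently.

```python
import re
from collections import defaultdict

# Matches rows such as "36 READ 2 HELD 0x353b8 len: 20 data:".
# The layout is taken from the output pasted in this bug, not from a spec.
LOCK_ROW = re.compile(r"^\s*(\w+)\s+([A-Z_]+)\s+(\d+)\s+([A-Z]+)\b")

SAMPLE_LOCKS = """\
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
36 READ 2 HELD 0x353b8 len: 20 data:
35 READ 1 HELD (64c11 304 bef22e2b 10f1 0) handle 0
"""


def held_read_locks(stat_output):
    """Return {locker_id: total count} for rows with mode READ and status HELD."""
    totals = defaultdict(int)
    for line in stat_output.splitlines():
        match = LOCK_ROW.match(line)
        if not match:
            continue  # headings and unrelated lines do not match the row pattern
        locker, mode, count, status = match.groups()
        if mode == "READ" and status == "HELD":
            totals[locker] += int(count)
    return dict(totals)


if __name__ == "__main__":
    print(held_read_locks(SAMPLE_LOCKS))
```

Locker IDs are kept as strings because the output does not say which number base they are printed in.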
Technically, if there was a running process holding a lock, then the lock was not stale.
Stale locks is the term for locks that are not held by current processes.
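That definition can be written down as a small Python sketch. It assumes the PID of each lock owner is already known, which the rpmdb_stat output quoted in this bug does not show, so owner_pids here is a hypothetical input; process_exists and stale_owners are names made up for the illustration, and the os.kill(pid, 0) probe is POSIX-only.

```python
import os


def process_exists(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def stale_owners(owner_pids, exists=process_exists):
    """Return the owner PIDs whose process is gone, i.e. whose locks are stale."""
    return [pid for pid in owner_pids if not exists(pid)]


if __name__ == "__main__":
    # The current process is alive, so a lock it owned would not be stale.
    print(stale_owners([os.getpid()]))
```

By this test, a lock held by a hung but still running rpmq is not stale; it only becomes stale once that process has been killed.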
Jeff - you point out an error in my previous comment. Thanks.
The true order of operations above was: find the stuck process, kill (-9) it,
view the locks with rpmdb_stat (still there), run db_recover (no longer there),
run the next process.
Please excuse the brainfart, comment #7 as written was inaccurate and non-useful.
*** Bug 213892 has been marked as a duplicate of this bug. ***
Considering the timing of these hangs and crashes, this is most likely yet another
manifestation of the kernel mmap() bug; see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=213963#c65 for details.
Feel free to reopen if this still happens with current kernels.