Bug 206275
Summary: | rpmq running as root gets stuck | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Horst H. von Brand <vonbrand> |
Component: | rpm | Assignee: | Panu Matilainen <pmatilai> |
Status: | CLOSED WORKSFORME | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | rawhide | CC: | amk, bill-bugzilla.redhat.com, panagopoulosalexandrou, trevor |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-08-10 11:00:10 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Horst H. von Brand
2006-09-13 13:22:06 UTC
Look for stale locks by running (as root):
    cd /var/lib/rpm
    /usr/lib/rpm/rpmdb_stat -CA
Otherwise, just do:
    rm -f /var/lib/rpm/__db*

Not sure if this is related, but an FC5 box I just noticed running very slowly has the following line in top:
    9639 root 25 0 11516 1128 1000 R 41.3 0.4 12667:18 rpmq
Ouch... it must have been running for weeks. kill -SIGINT wouldn't kill it; kill -9 did. I'm not sure whether this has messed up the rpm db or not. I will cross that bridge when I come to it.

Segfaults and loss of data are likely due to removing an rpmdb environment without correcting other problems in the rpmdb.

FYI: Most rpmdb "hangs" are now definitely fixed by purging stale read locks when opening a database environment in rpm-4.4.8-0.4. There's more to do, but I'm quite sure that a large class of problems with symptoms of "hang" is now corrected. Detecting damage by verifying when needed is well automated in rpm-4.4.8-0.4. Automatically correcting all possible damage is going to take more work, but a large class of problems is likely already fixed in rpm-4.4.8-0.8 as well.

UPSTREAM

rpmq from rpm-4.4.2-38.fc7 got stuck (shown as running, but with no CPU usage IIRC) when trying to run makewhatis (man-1.6e-1.fc7) recently. apropos(1) didn't know a thing, so this might have happened a few times before. After rebooting and successfully updating openmotif --> lesstif, makewhatis went through.

I frequently see FC5 and FC6 machines with rpmq hung as described above, which also blocks all other rpm operations (yum, rpm, etc.), and I hear from colleagues that it's common. Is it possible to backport the stale read lock purge fix from 4.4.8 to 4.4.2?

The other day I had a yum update hang on a box that had 192MB of RAM and, for some reason, had swap disabled (an fstab labelling issue). I'm sure the above problems are something else, as I've had rpm/yum hangs on boxes with 2GB of RAM and 5GB of swap. But if you have a crappy box, check if your swap is enabled!
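One quick way to act on the swap advice: on Linux, active swap areas are listed in /proc/swaps as a header line ("Filename Type Size Used Priority") followed by one row per active area, so an empty body means no swap is in use. The following is a minimal Python sketch, assuming that layout; `swap_areas` and `swap_enabled` are illustrative names, not part of rpm, yum, or any Fedora tool.

```python
import os
from typing import List


def swap_areas(proc_swaps_text: str) -> List[str]:
    """Return the device/file names of active swap areas.

    Assumes the /proc/swaps layout: a header line starting with
    "Filename", then one whitespace-separated row per active area.
    """
    areas = []
    for line in proc_swaps_text.splitlines():
        fields = line.split()
        # Skip blank lines and the column header.
        if not fields or fields[0] == "Filename":
            continue
        areas.append(fields[0])
    return areas


def swap_enabled(path: str = "/proc/swaps") -> bool:
    """True if at least one swap area is active (Linux only)."""
    if not os.path.exists(path):
        return False  # not Linux, or /proc is not mounted
    with open(path) as f:
        return bool(swap_areas(f.read()))


if __name__ == "__main__":
    if swap_enabled():
        print("swap is enabled")
    else:
        # Matches the failure mode described above: swap silently
        # missing because of an fstab labelling problem.
        print("no active swap: check the swap entries in /etc/fstab")
```

Running this before a large yum update on a low-memory box would have flagged the disabled swap described above; it does not address the hangs seen on machines with plenty of RAM and swap.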
And maybe the tools should die cleanly with "out of mem" errors rather than hanging?

The above info about stale locks was helpful. Running 'rpmdb_stat -CA' during the period when nothing rpm-related works shows:
    Locks grouped by object:
    Locker Mode Count Status ----------------- Object ---------------
    36 READ 2 HELD 0x353b8 len: 20 data: 0x11L0x06000x040x030000+.0xf20xbe0xf10x10000000000000
    35 READ 1 HELD (64c11 304 bef22e2b 10f1 0) handle 0
Then, when I kill the stuck processes and run db_recover in /var/lib/rpm, 'rpmdb_stat -CA' reports:
    db_stat: DB_ENV->open: No such file or directory
and rpm transactions succeed as one would expect.

Technically, if there was a running process holding a lock, then the lock was not stale. "Stale locks" is the term for locks that are not held by any current process.

Jeff, you point out an error in my previous comment. Thanks. The true order of operations above was:
1. find the stuck process and kill it (-9);
2. view the locks with rpmdb_stat (still there);
3. run db_recover (no longer there);
4. run the next process.
Please excuse the brainfart; comment #7 as written was inaccurate and not useful.

*** Bug 213892 has been marked as a duplicate of this bug. ***

Considering the timing of these hangs and crashes, this is most likely yet another manifestation of the kernel mmap() bug; see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=213963#c65 for details. Feel free to reopen if this still happens with current kernels.
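The thread separates two questions: which locks `rpmdb_stat -CA` lists as HELD, and whether those locks are stale, i.e. not held by any current process. Below is a minimal Python sketch of both checks, assuming a POSIX system and the column layout of the sample output quoted in this report (Locker, Mode, Count, Status, Object). `held_locks`, `pid_alive` and `stale_holders` are illustrative names, not rpm or Berkeley DB APIs. Note that the Locker column is a Berkeley DB locker ID, not a process ID, so the sample output alone does not say which process holds each lock; `stale_holders` assumes you have obtained the holder PIDs some other way (for example, by finding the stuck rpmq with ps).

```python
import os
import re
from dataclasses import dataclass
from typing import List


@dataclass
class LockRow:
    locker: str  # Berkeley DB locker ID as printed (NOT a process ID)
    mode: str    # e.g. READ
    count: int
    status: str  # e.g. HELD


# Column layout assumed from the sample quoted in this report:
#   Locker Mode Count Status Object...
_ROW = re.compile(
    r"^\s*(?P<locker>[0-9a-fA-F]+)\s+(?P<mode>[A-Z_]+)\s+"
    r"(?P<count>\d+)\s+(?P<status>[A-Z]+)\b"
)


def held_locks(stat_output: str) -> List[LockRow]:
    """Extract HELD lock rows from `rpmdb_stat -CA` output (assumed layout)."""
    rows = []
    for line in stat_output.splitlines():
        m = _ROW.match(line)
        if m and m.group("status") == "HELD":
            rows.append(LockRow(m.group("locker"), m.group("mode"),
                                int(m.group("count")), m.group("status")))
    return rows


def pid_alive(pid: int) -> bool:
    """True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        raise ValueError("expected a positive PID")
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stale_holders(holder_pids: List[int]) -> List[int]:
    """Hypothetical helper: holder PIDs that are no longer running."""
    return [pid for pid in holder_pids if not pid_alive(pid)]
```

In the situation described above, `held_locks` would still return both READ rows right after the stuck rpmq was killed with -9, while `pid_alive` on its old PID would return False; that combination is exactly what makes the locks stale. Per the report, running db_recover in /var/lib/rpm then cleared them.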