Bug 206275 - rpmq running as root gets stuck
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: rpm
Version: rawhide
Hardware: i386 Linux
Priority: medium Severity: high
Assigned To: Panu Matilainen
Duplicates: 213892
Reported: 2006-09-13 09:22 EDT by Horst H. von Brand
Modified: 2007-11-30 17:11 EST
Last Closed: 2007-08-10 07:00:10 EDT
Description Horst H. von Brand 2006-09-13 09:22:06 EDT
Description of problem:
Ran "rpm -q redhat-artwork" as root after updating today (for BZ), and it didn't
come back. After placing it into the background, rpmq was still running. From
another gnome-terminal, as a normal user, the same query returned immediately.
After killing off the rpmq process (had to -KILL it; it wouldn't respond
otherwise), it now hangs:

[root@laptop13 ~]# ps -l -p 3712
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0  3712  3094  0  75   0 -  3532 futex  pts/0    00:00:00 rpmq

Version-Release number of selected component (if applicable):
rpm-4.4.2-32

How reproducible:

Comment 1 Jeff Johnson 2006-09-13 14:05:03 EDT
Look for stale locks by running (as root)
    cd /var/lib/rpm
    /usr/lib/rpm/rpmdb_stat -CA

Otherwise just do
    rm -f /var/lib/rpm/__db*
Comment 2 Trevor Cordes 2006-11-21 11:26:18 EST
Not sure if this is related, but an FC5 box that I just noticed is running very
slowly has the following line in top:

 9639 root      25   0 11516 1128 1000 R 41.3  0.4  12667:18 rpmq

Ouch... must have been running for weeks.  kill SIGINT won't kill it.  -9 did.

I'm not sure if this has messed up the rpm db or not -- I will cross that bridge
when I come to it.
Comment 3 Jeff Johnson 2006-12-03 13:37:33 EST
Segfaults and loss of data are likely due to removing an rpmdb environment
without correcting other problems in the rpmdb.

FYI: Most rpmdb "hangs" are now definitely fixed by purging stale read locks when opening
a database environment in rpm-4.4.8-0.4. There's more to do, but I'm quite sure that a
large class of problems with symptoms of "hang" are now corrected.

Detecting damage by verifying when needed is well automated in rpm-4.4.8-0.4. Automatically
correcting all possible damage is going to take more work, but a large class of problems is likely
already fixed in rpm-4.4.8-0.8 as well.

UPSTREAM
Comment 4 Horst H. von Brand 2007-01-02 12:47:48 EST
rpmq from rpm-4.4.2-38.fc7 got stuck (shown as running, but no CPU usage IIRC)
when trying to run makewhatis (man-1.6e-1.fc7) recently (apropos(1) didn't know a
thing, so this might have happened a few times before). After rebooting and
successfully updating openmotif-->lesstif, makewhatis went through.
Comment 5 Bill McGonigle 2007-02-16 14:50:50 EST
I frequently see fc5 and fc6 machines with rpmq hung as described above, which
also blocks any other rpm operations (yum, rpm, etc.), and I hear from
colleagues that it's common.

Is it possible to backport the stale read lock purge bugfix from 4.4.8 to 4.4.2?
Comment 6 Trevor Cordes 2007-02-16 14:58:26 EST
The other day I had a yum update hang on a box that had 192MB of RAM and for
some reason had the swap space disabled (fstab labelling issue).  I'm sure the
above problems are something else, as I've had rpm/yum hangs on boxes with 2GB
of RAM and 5GB swap.  But if you have a crappy box, check if your swap is
enabled!  And maybe the tools should nicely die with "out of mem" errors rather
than hanging?
Comment 7 Bill McGonigle 2007-02-16 15:15:15 EST
The above info about stale locks was helpful.  Running 'rpmdb_stat -CA' shows,
during the period when nothing rpm-related works:

Locks grouped by object:
Locker   Mode      Count Status  ----------------- Object ---------------
      36 READ          2 HELD    0x353b8 len:  20 data:
0x11L0x06000x040x030000+.0xf20xbe0xf10x10000000000000

      35 READ          1 HELD    (64c11 304 bef22e2b 10f1 0) handle        0

Then when I kill the stuck processes and run db_recover in /var/lib/rpm,
'rpmdb_stat -CA' reports:

  db_stat: DB_ENV->open: No such file or directory

and then rpm transactions succeed as one would expect.
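
For anyone comparing lock tables before and after recovery, the "Locks grouped by object" listing above can be summarized mechanically. The following is a minimal sketch, assuming the column layout is exactly as pasted in this comment (locker, mode, count, status); `LockEntry` and `parse_lock_table` are hypothetical names, not part of rpm or Berkeley DB, and the locker IDs are kept as text rather than interpreted.

```python
import re
from typing import List, NamedTuple


class LockEntry(NamedTuple):
    locker: str  # locker ID exactly as printed (kept as text, not interpreted)
    mode: str    # e.g. READ
    count: int
    status: str  # e.g. HELD


# Assumed layout: <locker> <MODE> <count> <STATUS> <object description...>
# Header lines and wrapped "data:" continuation lines do not match this pattern.
_LOCK_LINE = re.compile(r"^\s*([0-9a-fA-F]+)\s+([A-Z_]+)\s+(\d+)\s+([A-Z_]+)\b")


def parse_lock_table(text: str) -> List[LockEntry]:
    """Extract one LockEntry per lock line from rpmdb_stat -CA style output."""
    entries = []
    for line in text.splitlines():
        m = _LOCK_LINE.match(line)
        if m:
            entries.append(
                LockEntry(m.group(1), m.group(2), int(m.group(3)), m.group(4))
            )
    return entries


# Sample copied from the output quoted in this comment.
SAMPLE = """\
Locks grouped by object:
Locker   Mode      Count Status  ----------------- Object ---------------
      36 READ          2 HELD    0x353b8 len:  20 data:
0x11L0x06000x040x030000+.0xf20xbe0xf10x10000000000000

      35 READ          1 HELD    (64c11 304 bef22e2b 10f1 0) handle        0
"""

if __name__ == "__main__":
    for entry in parse_lock_table(SAMPLE):
        print(entry)
```

An empty result after recovery would match the observation that the held read locks are gone; a non-empty one only says that locks are listed, not whether their owners are still alive.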
Comment 8 Jeff Johnson 2007-02-16 15:57:07 EST
Technically, if there was a running process holding a lock, then the lock was not stale.

Stale locks is the term for locks that are not held by current processes.
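
The distinction can be sketched in a few lines of Python. This is an illustration only, assuming a POSIX system and assuming the PID of the process that took the lock is already known; how to map a Berkeley DB locker ID to a PID is not covered here, and `process_exists` and `is_stale` are hypothetical helper names.

```python
import os


def process_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        # Signal 0 performs only the existence and permission checks;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def is_stale(owner_pid: int) -> bool:
    """A lock is stale when the process that took it no longer exists."""
    return not process_exists(owner_pid)


if __name__ == "__main__":
    # The current process is alive, so a lock it held would not be stale.
    print(is_stale(os.getpid()))
```

By this definition, a lock held by a hung but still running rpmq is not stale; it only becomes stale once that process has been killed.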

Comment 9 Bill McGonigle 2007-02-16 18:01:25 EST
Jeff - you point out an error in my previous comment.  Thanks.

The true order of operations above was: find the stuck process, kill (-9) it,
view the locks with rpmdb_stat (still there), run db_recover (no longer there),
run the next process.  
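
The corrected order can be written down as a dry-run checklist. This is a minimal sketch that only builds and prints the command lines and executes nothing; it assumes the paths from comment 1, the bare `db_recover` invocation described in comment 7, and the PID and query from the original description as examples. `recovery_steps` is a hypothetical helper name, and the warning in comment 3 about tearing down an rpmdb environment still applies.

```python
import shlex
from typing import List, Sequence

RPMDB_DIR = "/var/lib/rpm"               # database directory, as in comment 1
RPMDB_STAT = "/usr/lib/rpm/rpmdb_stat"   # path as given in comment 1


def recovery_steps(
    stuck_pids: Sequence[int],
    dbdir: str = RPMDB_DIR,
    retry_command: str = "rpm -q redhat-artwork",  # example query from the description
) -> List[str]:
    """Return the command lines in the order described above, without running them."""
    quoted_dir = shlex.quote(dbdir)
    # 1. Kill each stuck process (they did not respond to weaker signals).
    steps = [f"kill -9 {int(pid)}" for pid in stuck_pids]
    # 2. View the locks; at this point they are still listed.
    steps.append(f"cd {quoted_dir} && {RPMDB_STAT} -CA")
    # 3. Run recovery in the database directory.
    steps.append(f"cd {quoted_dir} && db_recover")
    # 4. Run the next rpm operation.
    steps.append(retry_command)
    return steps


if __name__ == "__main__":
    for number, step in enumerate(recovery_steps([3712]), start=1):
        print(f"{number}. {step}")
```

Keeping this as a printed plan rather than an executor is deliberate: each step should be checked by hand on a live system before the next one is run.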

Please excuse the brainfart; comment #7 as written was inaccurate and not useful.
Comment 10 Panu Matilainen 2007-07-17 15:57:39 EDT
*** Bug 213892 has been marked as a duplicate of this bug. ***
Comment 11 Panu Matilainen 2007-08-10 07:00:10 EDT
Considering the timing of these hangs and crashes, this is most likely yet another
manifestation of the kernel mmap() bug - see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=213963#c65 for details.
Feel free to reopen if this still happens with current kernels.
