Red Hat Bugzilla – Bug 133464
rpm hangs on DB modifying ops if RPC errors are present (not using NFS for RPMs)
Last modified: 2007-11-30 17:10:49 EST
Description of problem:
Today, I experienced what appeared to be RPM DB corruption (like what we saw
a lot of during RHL9). After deleting the lock files, rebuilding the DB
and rebooting, things were fine.
The second time it happened, I was performing some file operations via
NFS in the background (unrelated to the rpm command I ran). Again, it
looked like RPM DB corruption. I noticed that my "background" NFS
operations had stopped and that I could neither resume them nor umount
the NFS mount. I also could not run "cardctl eject", "ifdown eth1" or
"iwconfig eth1"; each hung indefinitely and required root to "kill"
them.
At this point, I thought I was having a problem with my wireless NIC
on my notebook. Three times, I had to reboot to "fix" the issue.
Now, here at home on my Dual-Opteron, non-wireless workstation, I am
seeing the exact same thing.
None of the systems involved in the first round of incidents
throughout the day is the one involved here at home (I do
not even have my notebook out or powered up here). The ONLY things
they have in common are:
1. The problem only manifests itself when I try to run an RPM command.
At home this evening, it first happened via up2date. Once, it
gave me the message "warning: waiting for transaction lock"
immediately after the "Testing package set / solving RPM
inter-dependencies..." message, then hung there. I did not see any
CPU activity associated with this.
2. Both networks use RHEL3 ES servers as NFS servers.
3. Not once, in any of these incidents, was I trying to access RPM
files via NFS. Sometimes I was doing some NFS operations at the time I
ran the RPM command; other times, I was not.
Version-Release number of selected component (if applicable):
FC2 on x86 (notebook), updated with up2date through 2004-09-22 @ ~5pm MDT
FC2 on AMD64 (workstation), updated with up2date through 2004-09-20 @ ~3am MDT
RHEL3-ES at the office (NFS server for the notebook), automatic up2date
RHEL3-ES at home (NFS server for the workstation), automatic up2date
Not exactly sure, but here is what I saw.
Steps to Reproduce:
1. Have an NFS share mounted, in use or not (I was using autofs)
2. Try to perform an rpm command that changes the database; it hangs (I
never see the errors until after I try to run an "rpm" command).
3. Looking in /var/log/messages, I find this message (several times):
kernel: RPC: error 5 connecting to server xx.domain.dom
NFS is now broken, too. The NFS mount is hard to umount (even forcibly),
and its files are completely inaccessible.
Other networking is completely unaffected, though I could not "cardctl
eject" or "iwconfig eth1" my wireless NIC.
RPM hangs on anything that tries to alter the DB; -q, -V and -K work.
None of that goo.
If you would like, I can give you the complete output of "rpm -qa" on
each of the four systems. If there are debugging options or particular
versions you want me to run to capture more information, let me know
and it will be done.
rpm uses statvfs(2) to identify the free blocks/inodes on each mount
point, so "stale NFS mounts" can and will hang it. This is no different
from the behavior of /bin/df.
Add --ignoresize to skip this check if necessary.
And this problem has nothing whatsoever to do with the rpmdb.
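To illustrate why a stale NFS mount can stall rpm even when no package lives on NFS, here is a minimal Python sketch of the same kind of check df performs: read the mount table and call statvfs on every mount point. It is not rpm's code. It assumes a Linux system where the mount table is in /proc/mounts, and the names parse_mount_points, probe_free_space, report and the 5-second timeout are hypothetical, added only so the probe reports a hang instead of blocking forever on it.

```python
import os
import re
import threading

# Octal escapes used by the kernel in /proc/mounts (e.g. "\040" = space).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def parse_mount_points(mounts_text):
    """Return the mount point field from text in /proc/mounts format.

    Each line is: device mountpoint fstype options dump pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Decode octal escapes back into the real characters.
            points.append(
                _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), fields[1])
            )
    return points


def probe_free_space(path, timeout=5.0):
    """Return (free_bytes, free_inodes) for the file system holding `path`.

    os.statvfs() runs in a worker thread. If it has not returned after
    `timeout` seconds (as with an unresponsive NFS server), return None
    instead of waiting. The stuck thread itself cannot be cancelled.
    """
    result = {}

    def worker():
        try:
            st = os.statvfs(path)
            # Space and inodes available to an unprivileged user.
            result["value"] = (st.f_bavail * st.f_frsize, st.f_favail)
        except OSError as exc:
            result["error"] = exc

    # daemon=True: the interpreter will not join this thread at shutdown.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None
    if "error" in result:
        raise result["error"]
    return result["value"]


def report(mounts_path="/proc/mounts", timeout=5.0):
    """Print free space for every mount point, flagging ones that hang."""
    with open(mounts_path) as f:
        mount_points = parse_mount_points(f.read())
    for mp in mount_points:
        try:
            info = probe_free_space(mp, timeout)
        except OSError as exc:
            print(f"{mp}: statvfs failed: {exc}")
            continue
        if info is None:
            print(f"{mp}: no answer after {timeout}s (hung mount?)")
        else:
            print(f"{mp}: {info[0]} bytes free, {info[1]} inodes free")
```

Calling report() prints one line per mount table entry; a mount point flagged as not answering there is the kind that would also stall rpm and /bin/df. The worker thread is used because a statvfs call blocked on a hard NFS mount generally cannot be interrupted, so the timeout only lets the caller stop waiting; the blocked thread stays stuck in the kernel.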