Bug 133464

Summary: rpm hangs on DB modifying ops if RPC errors are present (not using NFS for RPMs)
Product: [Fedora] Fedora Reporter: Lamont Peterson <peregrine>
Component: rpm Assignee: Jeff Johnson <jbj>
Status: CLOSED NOTABUG QA Contact: Mike McLean <mikem>
Severity: medium Docs Contact:
Priority: medium    
Version: 2 CC: nobody+pnasrat
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-09-24 12:44:08 UTC Type: ---

Description Lamont Peterson 2004-09-24 07:57:03 UTC
Description of problem:
Today, I experienced what appeared to be RPM DB corruption (like we saw
a lot of during RHL9).  After deleting lock files, rebuilding the DB
and rebooting, things were fine.

The second time it happened, I was performing some file operations via
NFS (unrelated to the rpm command I ran) in the background.  Again, it
looked like RPM DB corruption.  I noticed that my "background" NFS
operations had ceased and that I could not get them back, nor umount
the NFS mount.  I also could not "cardctl eject", "ifdown eth1" or
"iwconfig eth1"; each hung indefinitely and required root to "kill"
it to exit.

At this point, I thought I was having a problem with my wireless NIC
on my notebook.  Three times, I had to reboot to "fix" the issue.

Now, here at home on my Dual-Opteron, non-wireless workstation, I am
seeing the exact same thing.

The systems involved in the first round of incidents throughout the
day and this one here at home are all different machines (I do not
even have my notebook out or powered up here).  The ONLY things that
are common are:

1.  The problem only manifests itself if I try to run an RPM command.
 This happened at first at home this evening via up2date.  Once, it
gave me the message: "warning: waiting for transaction lock"
immediately after the "Testing package set / solving RPM
inter-dependencies..." message, then hung there.  I did not see any
cpu activity associated with this.
2.  Both networks have RHEL3 ES NFS servers.
3.  Not once, in all of these incidents, was I trying to access RPM
files via NFS.  Sometimes I was doing some NFS work at the time I ran
the RPM command.  Other times, I was not.

Version-Release number of selected component (if applicable):
Clients:
FC2 on x86 (notebook) up2date up through 2004-09-22 @ ~5pm MDT
FC2 on AMD64 (workstation) up2date up through 2004-09-20 @ ~3am MDT

Servers:
RHEL3-ES at the office (for the notebook) auto up2date
RHEL3-ES at home (for the workstation) auto up2date

How reproducible:
Not exactly sure, but here is what I saw.

Steps to Reproduce:
1.  Have an NFS share mounted, in use or not (I was using autofs)
2.  Try to perform an rpm command that changes the database; it hangs
(I never see the errors until after I try to run an "rpm" command)
3.  Looking in /var/log/messages, I find this message (several times):
kernel: RPC: error 5 connecting to server xx.domain.dom
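
For anyone trying to confirm step 3 on their own system, here is a minimal Python sketch that counts these kernel RPC messages per server. The function name count_rpc_errors and the syslog timestamp/host prefix in the sample lines are assumptions for illustration; only the "kernel: RPC: error N connecting to server NAME" part comes from the message quoted above.

```python
import re
from collections import Counter

# Matches the kernel message quoted in step 3. Anything before "kernel:"
# (timestamp, host name) is ignored, since that prefix is an assumption.
RPC_ERROR = re.compile(r"kernel: RPC: error (\d+) connecting to server (\S+)")


def count_rpc_errors(lines):
    """Count RPC connection errors, keyed by (server, error number)."""
    counts = Counter()
    for line in lines:
        match = RPC_ERROR.search(line)
        if match:
            error_number = int(match.group(1))
            server = match.group(2)
            counts[(server, error_number)] += 1
    return counts


# Hypothetical sample lines; the prefix format is illustrative only.
sample = [
    "Sep 24 01:00:00 host kernel: RPC: error 5 connecting to server xx.domain.dom",
    "Sep 24 01:00:05 host kernel: RPC: error 5 connecting to server xx.domain.dom",
    "Sep 24 01:00:09 host some unrelated message",
]
for (server, error_number), count in count_rpc_errors(sample).items():
    print(f"{server}: error {error_number} seen {count} time(s)")
```

To run it against a real log, pass an open file object instead of the sample list, for example count_rpc_errors(open("/var/log/messages")), which usually requires root.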
  
Actual results:
NFS is now broken, too.  It is hard to umount (even forcibly) an NFS
mount, and files on it are completely inaccessible.

Other networking is completely unaffected, though I could not "cardctl
eject" or "iwconfig eth1" my wireless NIC.

RPM hangs on anything that tries to alter the DB.  -q, -V and -K work
fine.

Expected results:
None of that goo.

Additional info:
If you would like, I can give you the complete output of "rpm -qa" on
each of the four systems.  If there are some debugging options or
versions of things you want me to run and see if I can capture some
stuff for you, let me know and it will be done.

Comment 1 Jeff Johnson 2004-09-24 12:44:08 UTC
rpm uses statvfs(2) to identify free blocks/inodes on each
mount point.

statvfs stats each mount point, so "Stale NFS mounts" can/will
hang. This is no different than the behavior of /bin/df.

Add --ignoresize to the rpm command line to avoid this if necessary.

And this problem has nothing whatsoever to do with an rpmdb.
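
The statvfs behavior described in this comment can be observed outside of rpm. The following is a rough Python sketch, not rpm's actual code: it calls os.statvfs (Python's wrapper around statvfs) on each mount point from a helper thread and reports any mount that does not answer within a timeout. The names parse_mount_points and probe_mount, the use of /proc/mounts, and the 5 second timeout are all assumptions chosen for illustration.

```python
import os
import threading


def parse_mount_points(mounts_text):
    """Return the mount points (second field) from /proc/mounts-style text.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does
    not decode those escapes.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.append(fields[1])
    return points


def probe_mount(path, timeout=5.0):
    """Return "ok", "error" or "timeout" for a statvfs call on path."""
    outcome = {}

    def worker():
        try:
            outcome["stat"] = os.statvfs(path)
        except OSError as exc:
            outcome["error"] = exc

    # A daemon thread is used because a statvfs call blocked on a stale
    # NFS mount cannot be cancelled; the thread is simply abandoned.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return "timeout"
    return "ok" if "stat" in outcome else "error"


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        for mount_point in parse_mount_points(handle.read()):
            print(mount_point, probe_mount(mount_point))
```

A mount reported as "timeout" is a likely candidate for the stale mount that makes rpm (and df) hang. The abandoned thread may stay blocked in the kernel until the NFS server responds again, so this is a diagnostic aid rather than a workaround.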