Description of problem:
Today I experienced what appeared to be RPM DB corruption (like we saw a lot of during RHL9). After deleting the lock files, rebuilding the DB and rebooting, things were fine.

The second time it happened, I was performing some file operations via NFS in the background (unrelated to the rpm command I ran). Again, it looked like RPM DB corruption. I noticed that my "background" NFS operations had stopped and that I could neither get them going again nor umount the NFS mount. I also could not run "cardctl eject", "ifdown eth1" or "iwconfig eth1"; each hung indefinitely and had to be killed by root. At this point, I thought I was having a problem with the wireless NIC on my notebook. Three times, I had to reboot to "fix" the issue.

Now, here at home on my dual-Opteron, non-wireless workstation, I am seeing exactly the same thing. None of the systems involved in the first round of incidents during the day is involved in this one here at home (I do not even have my notebook out or powered up here). The ONLY things in common are:

1. The problem only manifests itself if I try to run an RPM command. At home this evening it first happened via up2date. Once, it gave me the message "warning: waiting for transaction lock" immediately after the "Testing package set / solving RPM inter-dependencies..." message, then hung there. I did not see any CPU activity associated with this.
2. Both networks have RHEL3 ES NFS servers.
3. Not once, in all of these incidents, was I trying to access RPM files via NFS. Sometimes I was doing some NFS work at the time I ran the RPM command; other times I was not.
Version-Release number of selected component (if applicable):
Clients:
  FC2 on x86 (notebook), up2date'd through 2004-09-22 @ ~5pm MDT
  FC2 on AMD64 (workstation), up2date'd through 2004-09-20 @ ~3am MDT
Servers:
  RHEL3-ES at the office (for the notebook), auto up2date
  RHEL3-ES at home (for the workstation), auto up2date

How reproducible:
Not exactly sure, but here is what I saw.

Steps to Reproduce:
1. Have an NFS share mounted, in use or not (I was using autofs).
2. Try to perform an rpm command that changes the database; it hangs. (I never see any errors until after I try to run an "rpm" command.)
3. Look in /var/log/messages; I find this message (several times):
   kernel: RPC: error 5 connecting to server xx.domain.dom

Actual results:
NFS is now broken, too. It is hard to umount the NFS mount (even forcibly), and files on it are completely inaccessible. Other networking is completely unaffected, though I could not "cardctl eject" or "iwconfig eth1" my wireless NIC. RPM hangs on anything that tries to alter the DB; -q, -V and -K work fine.

Expected results:
None of that goo.

Additional info:
If you would like, I can give you the complete output of "rpm -qa" on each of the four systems. If there are debugging options or versions of things you want me to run to capture more information for you, let me know and it will be done.
rpm uses statvfs(2) to find the free blocks and inodes on each mount point. Because statvfs is called on each mount point, a stale NFS mount can (and will) hang it. This is exactly the same behavior as /bin/df. Add --ignoresize to skip the disk-space check if necessary. This problem has nothing whatsoever to do with the rpmdb.
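A minimal Python sketch of the per-mount-point free-space check described above, assuming a Linux system with the kernel's /proc/mounts table. It is not rpm's actual code (rpm does this in C), and the helper names `parse_mounts`, `mount_points` and `free_space` are hypothetical. `os.statvfs` is a thin wrapper around statvfs(2), so on a hard-mounted NFS filesystem whose server is unreachable the loop would be expected to block at that entry, just as df does.

```python
import os
import re

# /proc/mounts writes space, tab, newline and backslash in paths as octal escapes.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def parse_mounts(lines):
    """Return mount-point paths from lines in /proc/mounts format.

    Each line is: device mount_point fstype options dump pass.
    Malformed lines (fewer than two fields) are skipped.
    """
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            points.append(_OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), fields[1]))
    return points


def mount_points(mounts_file="/proc/mounts"):
    """Read the mount table (assumption: Linux /proc/mounts as the default)."""
    with open(mounts_file) as f:
        return parse_mounts(f)


def free_space(path):
    """Call statvfs(2) on one mount point.

    On a stale, hard-mounted NFS filesystem this call can block
    instead of returning or raising an error.
    """
    st = os.statvfs(path)
    return {
        "free_bytes": st.f_bavail * st.f_frsize,  # blocks available to non-root
        "free_inodes": st.f_favail,               # inodes available to non-root
    }


if __name__ == "__main__":
    try:
        points = mount_points()
    except OSError as err:  # e.g. not a Linux system
        print(f"cannot read mount table: {err}")
        points = []
    # Each mount point is queried in turn, so a single stale NFS mount
    # stalls the whole scan at that entry.
    for mp in points:
        try:
            info = free_space(mp)
        except OSError as err:
            print(f"{mp}: statvfs failed: {err}")
            continue
        print(f"{mp}: {info['free_bytes']} bytes, {info['free_inodes']} inodes available")
```

Run on an affected client, the last line printed before the stall would point to the mount that is hanging, which is the same symptom df shows.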