The kernel included in Wolverine causes all NFS mounts to hang. Syslog shows: ... kernel: nfs: task xxxxx can't get a request slot. I'm using amd as well. The NFS servers are primarily Solaris, mounted as NFSv2/UDP as specified in the automounter map.
Did you set up a firewall in the install?
This defect is considered MUST-FIX for Florence Release-Candidate #2
No firewall. It works fine for a while, then the kernel emits the error about no request slot, and it's all over. No NFS anymore, which generally makes the system unusable for me (automounted home directories, and X hangs on a blocking NFS operation at some point as well, so the console goes away.) I'm going to try to reproduce this on my home machine tonight -- much simpler setup, not using amd.
Another clue suggesting this may be more related to amd. Some background on the local environment: the automounter setup is a little bit funky; it's been around for many years here. Basically, all exported disks are automounted on /NFS like /NFS/zeus/d1 (host zeus, disk d1). Those filesystems are accessed as /nfs/zeus/d1, with a symbolic link from /nfs/zeus -> /NFS/zeus. So, my home directory might be /nfs/zeus/d1/home/buysse. There are also mappings for /home/username. In this case, that's mapped to the same amd mount from zeus, with a sublink of home/buysse. At this point, I can access my home directory as /nfs/zeus/d1/home/buysse, but not as /home/buysse. Attempting to access /home/buysse gives "bash: cd: /home/buysse: Input/Output error". The kernel is also generating an error: "nfs_stat_to_errno: bad nfs status return value: 116". Is this an amd error or kernel NFS code? I can't tell -- everything on this box is either local or automounted. I may have misfiled this -- should it be am-utils?
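For anyone triaging the same kernel message, here is a minimal sketch (my own, not taken from the kernel or am-utils sources) that checks a status value against the NFSv2 status codes listed in RFC 1094. 116 is not among them. It does match the Linux errno ESTALE on x86, which would be consistent with a userspace NFS server such as amd handing back a raw errno instead of the protocol status 70 (NFSERR_STALE). That reading is an assumption, not something confirmed in this report. The names NFSV2_STATUS, LINUX_ESTALE and classify_status are hypothetical.

```python
# NFSv2 status codes as listed in RFC 1094.
NFSV2_STATUS = {
    0: "NFS_OK",
    1: "NFSERR_PERM",
    2: "NFSERR_NOENT",
    5: "NFSERR_IO",
    6: "NFSERR_NXIO",
    13: "NFSERR_ACCES",
    17: "NFSERR_EXIST",
    19: "NFSERR_NODEV",
    20: "NFSERR_NOTDIR",
    21: "NFSERR_ISDIR",
    27: "NFSERR_FBIG",
    28: "NFSERR_NOSPC",
    30: "NFSERR_ROFS",
    63: "NFSERR_NAMETOOLONG",
    66: "NFSERR_NOTEMPTY",
    69: "NFSERR_DQUOT",
    70: "NFSERR_STALE",
    99: "NFSERR_WFLUSH",
}

# Assumption: Linux errno numbering on x86; other architectures may differ.
LINUX_ESTALE = 116


def classify_status(value):
    """Describe a status value seen in an NFSv2 reply (hypothetical helper)."""
    if value in NFSV2_STATUS:
        return "valid NFSv2 status: " + NFSV2_STATUS[value]
    if value == LINUX_ESTALE:
        return "not an NFSv2 status; equals Linux errno ESTALE on x86"
    return "not an NFSv2 status"


if __name__ == "__main__":
    for status in (70, 116):
        print(status, "->", classify_status(status))
```

If that guess is right, the bad value originates on the server side of the NFS conversation (here amd itself, since it serves the automount points over NFS), and the kernel is only reporting that it could not translate it.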
There may be more than one problem... Which ethernet card are you using?
Ethernet card is an eepro100. lspci output:

  00:09.0 Class 0200: 8086:1229 (rev 08)
  00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
          Subsystem: Intel Corporation EtherExpress PRO/100+ Management Adapter

It's really looking likely that am-utils is the culprit -- I can catch it "early", when only one or two filesystems are hung, kill amd, manually umount -f the automount points, and start amd with the -r flag (restart mounts), and things will work well again for a while. I've got a fat pipe -- I can upgrade to newer revs of anything easily on this box.
Your analysis looks right to me; am-utils looks more likely to be the home for this bug report, so I'm moving it there. It is still possible that it's a kernel bug or that it's a bad interaction between the kernel and am-utils.
Using current rawhide kernel (2.4.2-0.1.28), am-utils-6.0.4-7, and glibc-2.2.2-6, this problem still exists. Input/Output error.
I'll keep updating this as I test: qa0322; problem still exists. I've backed glibc off to 2.2.2-7 due to bug 32749 (glibc-2.2.2-8 breaks am-utils completely). kernel-2.4.2-0.1.32, am-utils-6.0.4-7. Falling back to the version of am-utils shipped with RH7 corrects the problem.
And again... kernel-2.4.2-0.1.35, glibc-2.2.2-9. Still broken. Does anyone want a strace or ltrace of the failure?
Yes please. Also, to make sure that I understand the setup, do you have automount config files I could use to reproduce the problem?
I didn't find time to grab the traces from this package, but based on testing tonight, the problem is fixed in seawolf. 6.0.5-1 fixes the bug in am-utils.