Red Hat Bugzilla – Bug 29139
kernel hangs using nfs
Last modified: 2005-10-31 17:00:50 EST
The kernel included in Wolverine causes all NFS mounts to hang. Syslog
shows: ... kernel: nfs: task xxxxx can't get a request slot. I'm using
amd as well. The NFS servers are primarily Solaris, mounted as
NFSv2/UDP as specified in the automounter map.
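For anyone trying to spot the same symptom, a minimal Python sketch that pulls the affected task identifiers out of syslog lines might look like the following. The pattern is built only from the message quoted above; the function name `find_stuck_tasks` is made up for illustration, and the syslog prefix (timestamp, hostname) is deliberately not matched because its format is not shown in this report.

```python
import re

# Pattern taken from the kernel message quoted in this report. The task
# identifier is captured as any run of non-whitespace characters, since the
# report only shows it as a placeholder.
SLOT_RE = re.compile(r"nfs: task (\S+) can't get a request slot")


def find_stuck_tasks(lines):
    """Return the task identifiers from lines reporting a missing request slot."""
    tasks = []
    for line in lines:
        match = SLOT_RE.search(line)
        if match:
            tasks.append(match.group(1))
    return tasks
```

Feeding it the lines of the system log (wherever syslog writes kernel messages on the machine in question) returns one entry per matching line, in order.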
Did you set up a firewall in the install?
This defect is considered MUST-FIX for Florence Release-Candidate #2
It works fine for a while, then the kernel emits the error about no request
slot, and it's all over. No NFS anymore, which generally makes the system
unusable for me (automounted home directories, and X hangs on a blocking NFS
operation at some point as well, so the console goes away.)
I'm going to try to reproduce this on my home machine tonight -- much simpler
setup, not using amd.
Another clue, suggesting that this may be more related to amd...
(background information about local env.)
The automounter setup is a little bit funky; it has been around for many years
here. Basically, all exported disks are automounted on /NFS like /NFS/zeus/d1
(host zeus, disk d1). Those filesystems are accessed as /nfs/zeus/d1, with a
symbolic link from /nfs/zeus -> /NFS/zeus. So, my home directory might
be /nfs/zeus/d1/home/buysse. There are also mappings for /home/username. In
this case, that's mapped to the same amd mount from zeus, with a sublink of
At this point, I can access my home directory as /nfs/zeus/d1/home/buysse, but
not as /home/buysse. Attempting to access /home/buysse gives "bash:
cd: /home/buysse: Input/Output error". The kernel is also generating an
error: "nfs_stat_to_errno: bad nfs status return value: 116". Is this an amd
error or kernel nfs code? I can't tell -- everything on this box is either
local or automounted.
I may have misfiled this -- should it be am-utils?
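One way to narrow down the amd-versus-kernel question is to check the reported value against the status codes that NFSv2 defines. The sketch below lists the codes from RFC 1094 and looks up a given value; `describe_status` is a hypothetical helper name. The value 116 is not among them, which suggests that whatever answered the request returned something other than a protocol status. Since amd itself acts as an NFS server for its mount points, that could plausibly be amd, but this is an inference and not something confirmed in this report. It may also be relevant that 116 is the numeric value of ESTALE on Linux.

```python
# NFSv2 status codes as defined in RFC 1094.
NFSV2_STATUS = {
    0: "NFS_OK",
    1: "NFSERR_PERM",
    2: "NFSERR_NOENT",
    5: "NFSERR_IO",
    6: "NFSERR_NXIO",
    13: "NFSERR_ACCES",
    17: "NFSERR_EXIST",
    19: "NFSERR_NODEV",
    20: "NFSERR_NOTDIR",
    21: "NFSERR_ISDIR",
    27: "NFSERR_FBIG",
    28: "NFSERR_NOSPC",
    30: "NFSERR_ROFS",
    63: "NFSERR_NAMETOOLONG",
    66: "NFSERR_NOTEMPTY",
    69: "NFSERR_DQUOT",
    70: "NFSERR_STALE",
    99: "NFSERR_WFLUSH",
}


def describe_status(status):
    """Return the NFSv2 name for a status value, or None if it is not defined."""
    return NFSV2_STATUS.get(status)
```

Here `describe_status(116)` returns `None`, while `describe_status(70)` returns `"NFSERR_STALE"`.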
There may be more than one problem... Which ethernet card are you
The Ethernet card is an eepro100; lspci output:
00:09.0 Class 0200: 8086:1229 (rev 08)
00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
Subsystem: Intel Corporation EtherExpress PRO/100+ Management Adapter
It's really looking likely that am-utils is the culprit -- I can catch
it "early" -- when only one or two filesystems are hung, kill amd, manually
umount -f the automount points, and start amd with the -r flag (restart
mounts), and things will work well again for a while.
I've got a fat pipe -- I can upgrade to newer revs of anything easily on this
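The manual recovery described above (kill amd, force-unmount the hung automount points, restart amd with -r) can be written down as a small Python sketch that only builds the command list and does not run anything. The function name, the process ID, and the mount point in the usage note are illustrative; any map or option arguments that amd normally needs at a given site are omitted, which is an assumption of this sketch.

```python
def recovery_commands(amd_pid, hung_mounts):
    """Build (but do not execute) the commands for the manual recovery."""
    # Step 1: stop the running amd process.
    commands = [["kill", str(amd_pid)]]
    # Step 2: force-unmount each automount point that is already hung.
    for mount_point in hung_mounts:
        commands.append(["umount", "-f", mount_point])
    # Step 3: start amd again with -r (restart mounts). Site-specific
    # arguments are left out here and would have to be added.
    commands.append(["amd", "-r"])
    return commands
```

For example, `recovery_commands(1234, ["/NFS/zeus/d1"])` yields the kill, umount -f, and amd -r steps in that order, which can be reviewed before being run by hand as root.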
Your analysis looks right to me; am-utils looks more likely to be
the home for this bug report, so I'm moving it there. It is still
possible that it's a kernel bug or that it's a bad interaction
between the kernel and am-utils.
Using the current rawhide kernel (2.4.2-0.1.28), am-utils-6.0.4-7, and
glibc-2.2.2-6, this problem still exists: Input/Output error.
I'll keep updating this as I test:
qa0322; problem still exists. I've backed glibc off to 2.2.2-7 due to bug
32749 (glibc-2.2.2-8 breaks am-utils completely). kernel-2.4.2-0.1.32, am-
Falling back to the version of am-utils shipped with RH7 corrects the problem.
And again... kernel-2.4.2-0.1.35, glibc-2.2.2-9. Still broken. Does anyone
want a strace or ltrace of the failure?
Yes please. Also, to make sure that I understand the setup, do you have
automount config files I could use to reproduce the problem?
I didn't find time to grab the traces from this package, but based on testing
tonight, the problem is fixed in seawolf. 6.0.5-1 fixes the bug in am-utils.