Bug 29139 - kernel hangs using nfs
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: am-utils
Version: 7.1
Hardware: i386 Linux
Priority: high
Severity: high
Assigned To: Nalin Dahyabhai
QA Contact: Brock Organ
Reported: 2001-02-23 16:09 EST by Joshua Buysse
Modified: 2005-10-31 17:00 EST
Doc Type: Bug Fix
Last Closed: 2001-03-28 04:59:22 EST
Description Joshua Buysse 2001-02-23 16:09:34 EST
The kernel included in Wolverine causes all NFS mounts to hang.  Syslog 
shows: ... kernel: nfs: task xxxxx can't get a request slot.  I'm using 
amd as well.  The NFS servers are primarily Solaris, mounted as 
NFSv2/UDP as specified in the automounter map.
Comment 1 Bill Nottingham 2001-02-23 16:42:15 EST
Did you set up a firewall in the install?
Comment 2 Glen Foster 2001-02-23 16:49:50 EST
This defect is considered MUST-FIX for Florence Release-Candidate #2
Comment 3 Joshua Buysse 2001-02-23 16:57:35 EST
No firewall.  

It works fine for a while, then the kernel emits the error about no request 
slot, and it's all over.  No NFS anymore, which generally makes the system 
unusable for me (automounted home directories, and X hangs on a blocking NFS 
operation at some point as well, so the console goes away.)

I'm going to try to reproduce this on my home machine tonight -- much simpler 
setup, not using amd.
Comment 4 Joshua Buysse 2001-02-23 17:54:59 EST
Another clue that this might be more related to amd...

(background information about local env.)
The automounter setup is a little bit funky; it's been around for many years 
here.  Basically, all exported disks are automounted on /NFS like /NFS/zeus/d1 
(host zeus, disk d1).  Those filesystems are accessed as /nfs/zeus/d1, with a 
symbolic link from /nfs/zeus -> /NFS/zeus.  So, my home directory might 
be /nfs/zeus/d1/home/buysse.  There are also mappings for /home/username.  In 
this case, that's mapped to the same amd mount from zeus, with a sublink of 
home/buysse.

At this point, I can access my home directory as /nfs/zeus/d1/home/buysse, but 
not as /home/buysse.  Attempting to access /home/buysse gives "bash: 
cd: /home/buysse: Input/Output error".  The kernel is also generating an 
error: "nfs_stat_to_errno: bad nfs status return value: 116".  Is this an amd 
error or kernel NFS code?  I can't tell -- everything on this box is either 
local or automounted.  
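
As a rough illustration of why an unexpected status could surface as an Input/Output error, here is a minimal sketch of an NFS-status-to-errno lookup.  It is not the kernel source: the table is a small subset of the NFSv2 status codes from RFC 1094, the function name stat_to_errno is made up for this example, and the errno numbers are the Linux/i386 values.  The fallback to EIO for unknown values is an assumption that matches the symptom described above.  Note that 116 is not an NFSv2 status code, but it happens to equal ESTALE on Linux; whether that means a raw errno is being returned somewhere in place of NFSERR_STALE (70) is only a guess.

```python
# Illustrative model only; NOT the kernel's nfs_stat_to_errno implementation.
EIO = 5             # Linux errno for "Input/output error"
ESTALE_LINUX = 116  # Linux/i386 errno for "Stale NFS file handle"

# Subset of NFSv2 status codes (RFC 1094) mapped to Linux errno values.
NFS_STATUS_TO_ERRNO = {
    0: 0,               # NFS_OK
    1: 1,               # NFSERR_PERM  -> EPERM
    2: 2,               # NFSERR_NOENT -> ENOENT
    5: EIO,             # NFSERR_IO    -> EIO
    13: 13,             # NFSERR_ACCES -> EACCES
    70: ESTALE_LINUX,   # NFSERR_STALE -> ESTALE
}


def stat_to_errno(status):
    """Return (errno, warning) for an NFS status code.

    Unknown codes fall back to EIO with a warning string; this fallback
    is an assumption made for the sketch.
    """
    if status in NFS_STATUS_TO_ERRNO:
        return NFS_STATUS_TO_ERRNO[status], None
    return EIO, "bad nfs status return value: %d" % status


if __name__ == "__main__":
    print(stat_to_errno(70))    # a valid NFSv2 "stale handle" status
    print(stat_to_errno(116))   # not a valid NFSv2 status
```

Under this model, a reply carrying 116 is not recognized as a stale-handle status, so the caller would see a generic I/O error rather than ESTALE.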

I may have misfiled this -- should it be am-utils?
Comment 5 Michael K. Johnson 2001-02-27 18:11:56 EST
There may be more than one problem...  Which ethernet card are you
using?
Comment 6 Joshua Buysse 2001-02-27 19:20:29 EST
Ethernet card is an eepro100, lspci output:

00:09.0 Class 0200: 8086:1229 (rev 08)
00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corporation EtherExpress PRO/100+ Management Adapter

It's really looking likely that am-utils is the culprit -- I can catch 
it "early" -- when only one or two filesystems are hung, kill amd, manually 
umount -f the automount points, and start amd with the -r flag (restart 
mounts), and things will work well again for a while.
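
The workaround above could be scripted roughly as follows.  This is only a sketch under stated assumptions: "killall amd" is one way to stop the daemon (the report does not say how it was killed), the helper names recovery_commands and run_recovery are hypothetical, and amd_args stands in for whatever map and option arguments the local amd startup normally uses.  Only "umount -f" and the amd -r flag come from the description.

```python
import subprocess


def recovery_commands(mount_points, amd_args=()):
    """Build the command sequence: stop amd, force-unmount, restart with -r."""
    commands = [["killall", "amd"]]  # assumption: amd is stopped via killall
    for mount_point in mount_points:
        commands.append(["umount", "-f", mount_point])
    # -r asks amd to restart (take over) existing mounts.
    commands.append(["amd", "-r"] + list(amd_args))
    return commands


def run_recovery(mount_points, amd_args=(), dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for argv in recovery_commands(mount_points, amd_args):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=False: keep going even if one umount fails.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    # Mount point taken from the layout described earlier in this report.
    run_recovery(["/NFS/zeus/d1"])
```

By default the script only prints the commands, so the sequence can be reviewed before anything is killed or unmounted.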

I've got a fat pipe -- I can upgrade to newer revs of anything easily on this 
box.
Comment 7 Michael K. Johnson 2001-03-01 00:16:50 EST
Your analysis looks right to me; am-utils looks more likely to be
the home for this bug report, so I'm moving it there.  It is still
possible that it's a kernel bug or that it's a bad interaction
between the kernel and am-utils.
Comment 8 Joshua Buysse 2001-03-16 14:45:15 EST
Using current rawhide kernel (2.4.2-0.1.28), am-utils-6.0.4-7, and 
glibc-2.2.2-6, this problem still exists.  Input/Output error.
Comment 9 Joshua Buysse 2001-03-22 19:45:17 EST
I'll keep updating this as I test:

qa0322; problem still exists.  I've backed glibc off to 2.2.2-7 due to bug 
32749 (glibc-2.2.2-8 breaks am-utils completely).  kernel-2.4.2-0.1.32, 
am-utils-6.0.4-7.

Falling back to the version of am-utils shipped with RH7 corrects the problem.
Comment 10 Joshua Buysse 2001-03-27 17:15:32 EST
And again... kernel-2.4.2-0.1.35, glibc-2.2.2-9.  Still broken.  Does anyone 
want a strace or ltrace of the failure?
Comment 11 Tim Waugh 2001-03-28 04:59:18 EST
Yes please.  Also, to make sure that I understand the setup, do you have 
automount config files I could use to reproduce the problem?
Comment 12 Joshua Buysse 2001-04-11 02:58:48 EDT
I didn't find time to grab the traces from this package, but based on testing 
tonight, the problem is fixed in seawolf.  6.0.5-1 fixes the bug in am-utils.
