Bug 29139

Summary: kernel hangs using nfs
Product: [Retired] Red Hat Linux
Reporter: Joshua Buysse <buysse>
Component: am-utils
Assignee: Nalin Dahyabhai <nalin>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: high
Docs Contact:
Priority: high
Version: 7.1
CC: twaugh
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2001-03-28 09:59:22 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Joshua Buysse 2001-02-23 21:09:34 UTC
The kernel included in Wolverine causes all NFS mounts to hang.  Syslog 
shows: ... kernel: nfs: task xxxxx can't get a request slot.  I'm using 
amd as well.  The NFS servers are primarily Solaris, mounted as 
NFSv2 over UDP as specified in the automounter map.
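
To see how often and when the message quoted above appears, a small log scan can help. The following is a minimal sketch, assuming the syslog line contains the text exactly as quoted in this report with a numeric task ID; the function name and the sample lines are hypothetical.

```python
import re

# Pattern built from the message quoted in this report; the numeric
# task ID stands in for the "xxxxx" placeholder.
SLOT_RE = re.compile(r"kernel: nfs: task (\d+) can't get a request slot")


def find_slot_errors(lines):
    """Return the task IDs from every matching log line, in order."""
    tasks = []
    for line in lines:
        match = SLOT_RE.search(line)
        if match:
            tasks.append(int(match.group(1)))
    return tasks


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the reporter's machine.
    sample = [
        "Feb 23 15:01:02 host kernel: nfs: task 12345 can't get a request slot",
        "Feb 23 15:01:03 host amd[411]: some unrelated message",
    ]
    print(find_slot_errors(sample))
```

Passing the lines of the system log file to find_slot_errors would give a rough count of affected RPC tasks before the mounts hang completely.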

Comment 1 Bill Nottingham 2001-02-23 21:42:15 UTC
Did you set up a firewall in the install?

Comment 2 Glen Foster 2001-02-23 21:49:50 UTC
This defect is considered MUST-FIX for Florence Release-Candidate #2


Comment 3 Joshua Buysse 2001-02-23 21:57:35 UTC
No firewall.  

It works fine for a while, then the kernel emits the error about no request 
slot, and it's all over.  No NFS anymore, which generally makes the system 
unusable for me (automounted home directories, and X hangs on a blocking NFS 
operation at some point as well, so the console goes away.)

I'm going to try to reproduce this on my home machine tonight -- much simpler 
setup, not using amd.

Comment 4 Joshua Buysse 2001-02-23 22:54:59 UTC
Another clue, suggesting that this might be more related to amd...

(background information about local env.)
The automounter setup is a little bit funky; it's been around for many years 
here.  Basically, all exported disks are automounted on /NFS like /NFS/zeus/d1 
(host zeus, disk d1).  Those filesystems are accessed as /nfs/zeus/d1, with a 
symbolic link from /nfs/zeus -> /NFS/zeus.  So, my home directory might 
be /nfs/zeus/d1/home/buysse.  There are also mappings for /home/username.  In 
this case, that's mapped to the same amd mount from zeus, with a sublink of 
home/buysse.

At this point, I can access my home directory as /nfs/zeus/d1/home/buysse, but 
not as /home/buysse.  Attempting to access /home/buysse gives "bash: 
cd: /home/buysse: Input/Output error".  The kernel is also generating an 
error: "nfs_stat_to_errno: bad nfs status return value: 116".  Is this an amd 
error or kernel NFS code?  I can't tell -- everything on this box is either 
local or automounted.  

I may have misfiled this -- should it be am-utils?
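
One way to narrow down the "bad nfs status return value: 116" message is to check the value against the NFSv2 status codes defined in RFC 1094. The sketch below does only that lookup; the function name is hypothetical. On Linux, errno 116 is ESTALE, so one possibility (an assumption, not confirmed anywhere in this report) is that a local errno value was passed back where an NFS status was expected.

```python
import errno

# NFSv2 status codes as enumerated in RFC 1094.
NFSV2_STATUS = {
    0: "NFS_OK",
    1: "NFSERR_PERM",
    2: "NFSERR_NOENT",
    5: "NFSERR_IO",
    6: "NFSERR_NXIO",
    13: "NFSERR_ACCES",
    17: "NFSERR_EXIST",
    19: "NFSERR_NODEV",
    20: "NFSERR_NOTDIR",
    21: "NFSERR_ISDIR",
    27: "NFSERR_FBIG",
    28: "NFSERR_NOSPC",
    30: "NFSERR_ROFS",
    63: "NFSERR_NAMETOOLONG",
    66: "NFSERR_NOTEMPTY",
    69: "NFSERR_DQUOT",
    70: "NFSERR_STALE",
    99: "NFSERR_WFLUSH",
}


def describe(value):
    """Say whether a number is a valid NFSv2 status code."""
    name = NFSV2_STATUS.get(value)
    if name is not None:
        return f"{value}: valid NFSv2 status {name}"
    # The errno name depends on the platform this runs on.
    local = errno.errorcode.get(value, "unknown")
    return f"{value}: not an NFSv2 status (local errno name: {local})"


if __name__ == "__main__":
    print(describe(116))
    print(describe(70))
```

Since 116 is not in the NFSv2 table, the value would have to come from whichever side produced the NFS reply; this lookup alone cannot say whether that is amd or something else.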


Comment 5 Michael K. Johnson 2001-02-27 23:11:56 UTC
There may be more than one problem...  Which ethernet card are you
using?

Comment 6 Joshua Buysse 2001-02-28 00:20:29 UTC
Ethernet card is an eepro100; lspci output:

00:09.0 Class 0200: 8086:1229 (rev 08)
00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corporation EtherExpress PRO/100+ Management Adapter

It's really looking likely that am-utils is the culprit -- I can catch 
it "early" -- when only one or two filesystems are hung, kill amd, manually 
umount -f the automount points, and start amd with the -r flag (restart 
mounts), and things will work well again for a while.

I've got a fat pipe -- I can upgrade to newer revs of anything easily on this 
box.
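
The recovery steps described above (kill amd, force-unmount the hung automount points, restart amd with -r) can be sketched as a command list. This is a dry-run sketch only: it builds and prints the commands without running them. The use of killall to stop amd, the function name, and the example mount point are assumptions; any further amd arguments are site-specific and are passed in unchanged.

```python
def recovery_commands(hung_mounts, amd_args=()):
    """Build the command sequence for the manual recovery, in order.

    hung_mounts: automount points to force-unmount.
    amd_args: extra, site-specific arguments for amd (hypothetical).
    """
    # Stopping amd with killall is an assumption; the report only
    # says "kill amd".
    commands = [["killall", "amd"]]
    # Force-unmount each hung automount point.
    for mount_point in hung_mounts:
        commands.append(["umount", "-f", mount_point])
    # Restart amd with -r so that it restarts existing mounts.
    commands.append(["amd", "-r", *amd_args])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in recovery_commands(["/NFS/zeus/d1"]):
        print(" ".join(argv))
```

Printing rather than executing keeps the sketch safe to run on any machine; the commands themselves would need root on the affected host.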

Comment 7 Michael K. Johnson 2001-03-01 05:16:50 UTC
Your analysis looks right to me; am-utils looks more likely to be
the home for this bug report, so I'm moving it there.  It is still
possible that it's a kernel bug or that it's a bad interaction
between the kernel and am-utils.

Comment 8 Joshua Buysse 2001-03-16 19:45:15 UTC
Using current rawhide kernel (2.4.2-0.1.28), am-utils-6.0.4-7, and 
glibc-2.2.2-6, this problem still exists.  Input/Output error.

Comment 9 Joshua Buysse 2001-03-23 00:45:17 UTC
I'll keep updating this as I test:

qa0322; problem still exists.  I've backed glibc off to 2.2.2-7 due to bug 
32749 (glibc-2.2.2-8 breaks am-utils completely).  kernel-2.4.2-0.1.32, 
am-utils-6.0.4-7.

Falling back to the version of am-utils shipped with RH7 corrects the problem.

Comment 10 Joshua Buysse 2001-03-27 22:15:32 UTC
And again... kernel-2.4.2-0.1.35, glibc-2.2.2-9.  Still broken.  Does anyone 
want a strace or ltrace of the failure?

Comment 11 Tim Waugh 2001-03-28 09:59:18 UTC
Yes please.  Also, to make sure that I understand the setup, do you have 
automount config files I could use to reproduce the problem?


Comment 12 Joshua Buysse 2001-04-11 06:58:48 UTC
I didn't find time to grab the traces from this package, but based on testing 
tonight, the problem is fixed in seawolf.  6.0.5-1 fixes the bug in am-utils.