
Bug 29139 - kernel hangs using nfs

Summary: kernel hangs using nfs

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	am-utils
Sub Component:
Version:	7.1
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Nalin Dahyabhai
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-02-23 21:09 UTC by Joshua Buysse
Modified:	2005-10-31 22:00 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2001-03-28 09:59:22 UTC
Embargoed:

Attachments

Description Joshua Buysse 2001-02-23 21:09:34 UTC

The kernel included in Wolverine causes all NFS mounts to hang. Syslog shows 
... kernel: nfs: task xxxxx can't get a request slot.  I'm using amd as well. 
The NFS servers are primarily Solaris, mounted as NFSv2/UDP as specified in 
the automounter map.

Comment 1 Bill Nottingham 2001-02-23 21:42:15 UTC

Did you set up a firewall in the install?

Comment 2 Glen Foster 2001-02-23 21:49:50 UTC

This defect is considered MUST-FIX for Florence Release-Candidate #2

Comment 3 Joshua Buysse 2001-02-23 21:57:35 UTC

No firewall.  

It works fine for a while; then the kernel emits the "can't get a request 
slot" error, and it's all over.  No more NFS, which generally makes the system 
unusable for me (home directories are automounted, and X also hangs on a 
blocking NFS operation at some point, so the console goes away).

I'm going to try to reproduce this on my home machine tonight -- much simpler 
setup, not using amd.

Comment 4 Joshua Buysse 2001-02-23 22:54:59 UTC

Another clue, possibly suggesting that this is more related to amd...

(Background information about the local environment.)
The automounter setup is a little funky; it has been around here for many 
years.  Basically, all exported disks are automounted under /NFS, e.g. 
/NFS/zeus/d1 (host zeus, disk d1).  Those filesystems are accessed as 
/nfs/zeus/d1, via a symbolic link /nfs/zeus -> /NFS/zeus.  So my home 
directory might be /nfs/zeus/d1/home/buysse.  There are also mappings for 
/home/username; in this case, that's mapped to the same amd mount from zeus, 
with a sublink of home/buysse.

At this point, I can access my home directory as /nfs/zeus/d1/home/buysse, but 
not as /home/buysse.  Attempting to access /home/buysse gives "bash: cd: 
/home/buysse: Input/Output error".  The kernel also logs an error: 
"nfs_stat_to_errno: bad nfs status return value: 116".  Is this an amd error 
or a problem in the kernel NFS code?  I can't tell, since everything on this 
box is either local or automounted.

I may have misfiled this -- should it be am-utils?

Comment 5 Michael K. Johnson 2001-02-27 23:11:56 UTC

There may be more than one problem...  Which ethernet card are you
using?

Comment 6 Joshua Buysse 2001-02-28 00:20:29 UTC

The Ethernet card is an eepro100; lspci output:

00:09.0 Class 0200: 8086:1229 (rev 08)
00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corporation EtherExpress PRO/100+ Management Adapter

It's looking increasingly likely that am-utils is the culprit.  If I catch it 
"early" (when only one or two filesystems are hung), I can kill amd, manually 
umount -f the automount points, and restart amd with the -r flag (restart 
mounts), and things work well again for a while.

I've got a fat pipe -- I can upgrade to newer revs of anything easily on this 
box.

Comment 7 Michael K. Johnson 2001-03-01 05:16:50 UTC

Your analysis looks right to me; am-utils looks like the more likely home for 
this bug report, so I'm moving it there.  It is still possible that it's a 
kernel bug, or a bad interaction between the kernel and am-utils.

Comment 8 Joshua Buysse 2001-03-16 19:45:15 UTC

Using the current Rawhide kernel (2.4.2-0.1.28), am-utils-6.0.4-7, and 
glibc-2.2.2-6, this problem still exists.  Input/Output error.

Comment 9 Joshua Buysse 2001-03-23 00:45:17 UTC

I'll keep updating this as I test:

qa0322: the problem still exists.  I've backed glibc off to 2.2.2-7 because 
of bug 32749 (glibc-2.2.2-8 breaks am-utils completely).  Versions: 
kernel-2.4.2-0.1.32, am-utils-6.0.4-7.

Falling back to the version of am-utils shipped with RH7 corrects the problem.

Comment 10 Joshua Buysse 2001-03-27 22:15:32 UTC

And again... kernel-2.4.2-0.1.35, glibc-2.2.2-9.  Still broken.  Does anyone 
want a strace or ltrace of the failure?

Comment 11 Tim Waugh 2001-03-28 09:59:18 UTC

Yes please.  Also, to make sure that I understand the setup, do you have 
automount config files I could use to reproduce the problem?

Comment 12 Joshua Buysse 2001-04-11 06:58:48 UTC

I didn't find time to grab the traces from this package, but based on testing 
tonight, the problem is fixed in Seawolf: am-utils 6.0.5-1 fixes the bug.
