56665 – nfsd fails to serve exports after a few minutes uptime in 2.4.9-12smp

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.1
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Steve Dickson
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-11-23 18:17 UTC by Paul Raines
Modified:	2007-04-18 16:38 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-08-11 10:54:09 UTC
Embargoed:

Description Paul Raines 2001-11-23 18:17:21 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4)
Gecko/20011019 Netscape6/6.2

Description of problem:
I just installed RH7.1 on a machine, upgraded to the latest errata
versions, and installed the 2.4.9-12smp kernel.  The machine has an
internal SCSI disk and an external IDE->SCSI RAID.  I converted all
partitions except / to ext3, and they are exported.  I start nfs and can
mount the exports just fine from a couple of clients.  I then start a
script that loops over many tens of clients to mount an export off
the server.  After about ten or so, mounts stop working and I get
I/O errors.  Unmounting from a previously successful client and
retrying the mount also fails.

I can '/etc/init.d/nfs restart' and things start working again for
another few minutes.

If I downgrade to the RH 2.4.7-2.9 kernel (which I patched for ext3),
the problem goes away.
Mounts of the ext2 root volume also fail, so I don't think it is an
ext3 problem.

I tried upgrading nfs-utils and mount to the rawhide versions, but the
problem did not go away.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Upgrade to 2.4.9-12 kernel
2. Export several mount points
3. Mount exports from several clients until you get an I/O error
   and the server refuses to grant more mounts
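
Step 3 can be scripted along the following lines. This is only a sketch and not the script used in this report: the client host names, the use of ssh, the mount point /mnt/test, and the DRY_RUN switch are all assumptions. Only the server name (monte) and the export path come from the report.

```shell
#!/bin/sh
# Hypothetical reproduction loop: ask each client in turn to mount one export.
# SERVER and EXPORT come from the report; everything else is assumed.
SERVER=${SERVER:-monte}
EXPORT=${EXPORT:-/local_mount/homes/monte/1}
MOUNTPOINT=${MOUNTPOINT:-/mnt/test}          # assumed to exist on every client
CLIENTS=${CLIENTS:-"node01 node02 node03"}   # placeholder host names
DRY_RUN=${DRY_RUN:-1}                        # 1 = only print the commands

mount_all() {
    for client in $CLIENTS; do
        cmd="mount -t nfs $SERVER:$EXPORT $MOUNTPOINT"
        if [ "$DRY_RUN" = 1 ]; then
            echo "ssh $client $cmd"
        elif ! ssh "$client" "$cmd"; then
            # Stop at the first client that can no longer mount.
            echo "mount failed on $client" >&2
            return 1
        fi
    done
}

mount_all
```

With DRY_RUN left at 1 the loop only prints what it would run; setting DRY_RUN=0 executes the mounts and stops at the first client that fails.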
	

Actual Results:  Could no longer mount NFS exported volumes off server

Expected Results:  Should have been able to mount exported NFS volumes

Additional info:

When the problem first starts, you will see messages like this in
syslog:

Nov 23 12:36:46 monte rpc.mountd: authenticated mount request from
132.183.203.39:1022 for /local_mount/homes/monte/1
(/local_mount/homes/monte/1) 
Nov 23 12:36:46 monte last message repeated 19 times

However, soon no messages appear from other attempted mounts so
rpc.mountd is probably completely locked up.

Here is the /etc/exports file:
===============
/local_mount/homes/monte/1 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/1 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/2 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/3 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/4 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/5 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/6 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/7 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/8 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/local_mount/space/monte/9 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

/export/redhat-7.1 \
  @all(rw) \
  192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)

===========
The machine is the master of a batch (Beowulf) cluster, so it has
two network devices, with the batch nodes on 192.168.100.*

Comment 1 Paul Raines 2002-01-18 00:23:38 UTC

I discovered this problem was related to iptables. The machine serves as
a bridge between a private network and the main network and is set up to
masquerade using iptables.  Below is how it is configured.  As soon as
I turn off iptables, the NFS problem goes away.  So it looks like the
IP filtering is breaking NFS somehow.

# /etc/init.d/iptables status
Table: nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
MASQUERADE  all  --  anywhere             anywhere           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
Table: filter
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Comment 2 Pete Zaitcev 2002-01-18 02:23:16 UTC

Make sure "-o <public_ethN>" is used in iptables.
Don't let it masquerade what goes inside, or else
the connection tracker chokes.

The output of "iptables -L -t nat" does not show
additional options such as -o.

Comment 3 Paul Raines 2002-01-18 13:47:24 UTC

I tried adding the "-o <public_ethN>" option and it still broke NFS.
It made no difference.  Specifically, I added
"-o 192.168.100.0/255.255.255.0"
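
For reference, the iptables "-o" option matches on the name of the outgoing network interface, not on an address range; address ranges are matched with "-s" (source) or "-d" (destination). A minimal sketch of the restriction suggested in Comment 2 follows, assuming eth0 is the public interface (the report does not say which interface is which). The masq_rules helper is hypothetical and only prints the commands for review. Whether this change resolves the problem is not confirmed anywhere in this report.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that replace an
# unrestricted MASQUERADE rule with one limited to the public interface.
masq_rules() {
    public_if=${1:-eth0}   # assumption: eth0 is the public side
    # Remove the rule that masquerades traffic on every interface.
    echo "iptables -t nat -D POSTROUTING -j MASQUERADE"
    # Masquerade only what leaves through the public interface.
    echo "iptables -t nat -A POSTROUTING -o $public_if -j MASQUERADE"
    # -v adds the in/out interface columns, so the -o match is visible.
    echo "iptables -t nat -L POSTROUTING -n -v"
}

masq_rules eth0
```

After checking the printed commands, they can be applied by running them as root, for example by piping the output to sh.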
