From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2

Description of problem:
I just installed RH7.1 on a machine, upgraded to the latest errata versions, and installed the 2.4.9-12smp kernel. The machine has an internal SCSI disk and an external IDE->SCSI RAID. I converted all partitions except / to ext3, and they are exported. I start nfs and can mount the exports just fine from a couple of clients. I then start a script that loops over many tens of clients, each mounting an export off the server. After about ten or so, mounts stop working and I get I/O errors. Unmounting on a previously successful client and retrying the mount also fails. I can run '/etc/init.d/nfs restart' and things start working again for another few minutes. When I downgrade to the RH 2.4.7-2.9 kernel (which I patched for ext3), the problem goes away. Mounts of the ext2 root volume also fail, so I don't think it is an ext3 problem. I tried upgrading nfs-utils and mount to the rawhide versions, but the problem did not go away.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Upgrade to the 2.4.9-12 kernel
2. Export several mount points
3. Mount exports from several clients until you get an I/O error and the server refuses to give more mounts

Actual Results:  Could no longer mount NFS exported volumes off the server

Expected Results:  Should have been able to mount exported NFS volumes

Additional info:
When the problem first starts, you will see messages like this in syslog:

Nov 23 12:36:46 monte rpc.mountd: authenticated mount request from 132.183.203.39:1022 for /local_mount/homes/monte/1 (/local_mount/homes/monte/1)
Nov 23 12:36:46 monte last message repeated 19 times

However, messages soon stop appearing for further mount attempts, so rpc.mountd is probably completely locked up.
Here is the /etc/exports file:

===============
/local_mount/homes/monte/1 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/1 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/2 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/3 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/4 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/5 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/6 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/7 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/8 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/local_mount/space/monte/9 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
/export/redhat-7.1 \
    @all(rw) \
    192.168.100.0/255.255.255.0(rw,no_root_squash,insecure)
===========

The machine is the master of a batch (Beowulf) cluster, so it has two network devices, with the batch nodes on 192.168.100.*
I discovered that this problem is related to iptables. The machine serves as a bridge between a private network and the main network and is set up to masquerade using iptables. Below is how it is configured. As soon as I turn off iptables, the NFS problem goes away, so the IP filtering looks like it is breaking NFS somehow.

# /etc/init.d/iptables status
Table: nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Table: filter
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Make sure "-o <public_ethN>" is used in the iptables MASQUERADE rule. Don't let it masquerade traffic that goes to the inside network, or else the connection tracker chokes. Note that the output of "iptables -L -t nat" does not show additional options such as -o.
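
For illustration, a minimal sketch of such a rule and of a listing that does show the interface columns. The interface name eth0 is an assumption standing in for <public_ethN>, and the commands are only printed rather than executed, since running them requires root:

```shell
# Hypothetical name for the public-facing interface; substitute the real one.
PUBLIC_IF="eth0"

# MASQUERADE only the traffic that leaves through the public interface.
MASQ_RULE="iptables -t nat -A POSTROUTING -o ${PUBLIC_IF} -j MASQUERADE"

# Verbose listing; unlike the plain -L output it includes in/out interface columns.
LIST_CMD="iptables -t nat -L POSTROUTING -v -n"

# Dry run: print the commands for review instead of executing them.
echo "${MASQ_RULE}"
echo "${LIST_CMD}"
```

An existing unrestricted MASQUERADE rule would presumably have to be removed first (for example with "iptables -t nat -F POSTROUTING"), otherwise it would still match the internal traffic.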
I tried adding the "-o <public_ethN>" option, but it made no difference: NFS still broke. Specifically, I added "-o 192.168.100.0/255.255.255.0".
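
One thing that may be worth checking: the -o option of iptables matches an outgoing interface by name, while a network/netmask is matched with -s (source) or -d (destination). A sketch of a rule that uses both, assuming eth0 is the public interface (a hypothetical name) and that the private network is the 192.168.100.0 one from /etc/exports. The command is only printed, not executed:

```shell
# Hypothetical public interface name; the network is taken from the exports above.
PUBLIC_IF="eth0"
PRIVATE_NET="192.168.100.0/255.255.255.0"

# -s matches the source network, -o matches the outgoing interface by name.
MASQ_RULE="iptables -t nat -A POSTROUTING -s ${PRIVATE_NET} -o ${PUBLIC_IF} -j MASQUERADE"

# Dry run: print the command for review instead of executing it (it needs root).
echo "${MASQ_RULE}"
```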