Red Hat Bugzilla – Bug 102438
nfs hangs under higher load / traffic
Last modified: 2008-08-02 19:40:33 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.4) Gecko/20030703
Description of problem:
1. Description of the infrastructure
One Tiger 4 acts as a frontend for 12 Tiger 2 machines, which form a cluster.
The Tiger 4 has 4 processors, 16 GB RAM, one onboard 1Gb Ethernet interface and
an additional QLogic QLA 2340 FC adapter, which is connected to an EMC Clariion 4500.
The Tiger 2 machines have 2 processors, 4 GB RAM and two onboard 1Gb Ethernet
interfaces, of which only one is in use.
The machines are connected via a Nortel BayStack 380-T24 Switch.
All machines run Red Hat Linux Advanced Server release 2.1AS (Derry) with kernel
2.4.18-e.31smp on ia64.
We configured two different IP addresses on the network interface of the Tiger
4, so that this computer acts as a router to a network different from the Cluster.
We installed some additional Ethernet interfaces in the Tiger 4 but could not get
them configured; we also did not succeed in configuring a second QLA 2340 for the
connection to the Clariion.
The Tiger 4 is meant to act as an NFS server. The exported filesystems reside on
the internal disk and on the Clariion.
2. NFS instability
NFS is not working properly on the Tiger 4. Regardless of which mount options
are set,
the NFS server stops working as soon as some load occurs - currently 32 nfsd are
The STAR jobs reside on NFS filesystems, so if the NFS server stops working,
the calculation stops as well. Sometimes it continues, sometimes not, depending on
the mount options (soft or hard). At the moment, NFS on the Tiger 4 cannot be
used for STAR calculations!
Version-Release number of selected component (if applicable):
Steps to Reproduce:
The problem usually occurs when the application STAR starts to write to the
NFS share, but it also occurs if the following script is started on e.g. 12
for i in 1 2 3 4 5 6 7 8 9 10; do
( time dd if=/dev/zero of="$FILE" bs=8k count=12800 2>&1 ) 2>&1 | grep real
done
Excerpt from /etc/fstab:
n0:/home /home nfs bg,soft,intr,retry=100,notcp,udp,rsize=8192,wsize=8192 0 0
n0:/work/data /work/data nfs bg,hard,intr,retry=100,notcp,udp,rsize=8192,wsize=8192 0 0
Excerpt from /var/log/messages:
Jun 26 15:50:21 n1 kernel: nfs: server n0 not responding, still trying
Jun 26 15:50:22 n1 kernel: nfs: server n0 OK
Jun 26 16:06:19 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:06:50 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:07:57 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:10:24 n1 kernel: nfs: server n0 not responding, timed out
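To compare runs with different mount options, it may help to count how often each kind of message appears on a client. The following is only a rough sketch: the function name nfs_log_summary is made up for illustration, and the patterns are taken from the excerpt above. As far as I know, "timed out" is what a soft mount logs when it gives up on a request, while "still trying" comes from a hard mount that keeps retrying.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NFS package): summarize the NFS
# client messages found in a syslog-style file such as /var/log/messages.
nfs_log_summary() {
    logfile=$1
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function usable under "set -e".
    retrying=$(grep -c 'nfs: server .* not responding, still trying' "$logfile" || true)
    timed_out=$(grep -c 'nfs: server .* not responding, timed out' "$logfile" || true)
    recovered=$(grep -c 'nfs: server .* OK$' "$logfile" || true)
    echo "still_trying=$retrying timed_out=$timed_out ok=$recovered"
}
```

Running nfs_log_summary /var/log/messages on each client after a test gives one line per machine that can be compared between the soft and hard mount cases.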
Works fine in SLES 8.0 ...
It's not possible to add the customer, Mr. Tschuchranin, to the cc list - I always
receive an error message. His email address: Frank.Tschuchranin@toyota-f1.com
Unfortunately I reported the bug with my private account - could you please assign
this to my Red Hat account email@example.com? Thanks
It looks like I have the same problem with our RHEL3ES. The NFS
server is a 2x Xeon 2.4G, but the load sometimes goes over 8. Then some
of the automounter clients cannot mount the NFS share, with the same
errors. There was no such error with RH8 before.
What kernel version are you using?
Is this still a problem with more recent RHEL3 or AS21 kernels?
This bug is filed against RHEL2.1, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.