From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.4) Gecko/20030703

Description of problem:

1. Description of the infrastructure

One Tiger 4 acts as a frontend for 12 Tiger 2 machines, which form a cluster. The Tiger 4 has 4 processors, 16 GB RAM, one onboard 1 Gb Ethernet interface, and an additional QLogic QLA 2340 FC adapter, which is connected to an EMC-Clariion 4500. Each Tiger 2 has 2 processors, 4 GB RAM, and two onboard 1 Gb Ethernet interfaces, of which only one is in use. The machines are connected via a Nortel BayStack 380-T24 switch. All machines run Red Hat Linux Advanced Server release 2.1AS (Derry), kernel 2.4.18-e.31smp, on ia64.

We configured two different IP addresses on the network interface of the Tiger 4, so that this computer acts as a router to a network other than the cluster network. We installed some additional Ethernet interfaces in the Tiger 4, but could not get them configured; we also did not succeed in configuring a second QLA 2340 for the connection to the Clariion. The Tiger 4 is meant to act as an NFS server. The exported filesystems reside on the internal disk and on the Clariion.

2. NFS instability

NFS is not working properly on the Tiger 4. Regardless of which mount options are set, the NFS server stops serving requests as soon as some load occurs; currently 32 nfsd are running. The STAR jobs reside on NFS filesystems, so if the NFS server stops working, the calculation stops as well. Sometimes it continues, sometimes not, depending on the mount options (soft or hard). At the moment, NFS on the Tiger 4 cannot be used for STAR calculations!

Version-Release number of selected component (if applicable): kernel-2.4.18-e.31smp

How reproducible: Sometimes

Steps to Reproduce: The problem usually occurs when the application STAR starts to write to the NFS share, but it also occurs if the following script is started on, e.g., 12 clients.
    #!/bin/bash
    export LANG=C
    FILE=/work/data/fratsch/$HOSTNAME
    for i in 1 2 3 4 5 6 7 8 9 10
    do
        ( time dd if=/dev/zero of=$FILE bs=8k count=12800 2>&1 ) 2>&1 | grep real
        rm $FILE
    done

Excerpt from /etc/fstab:

    n0:/home /home nfs bg,soft,intr,retry=100,notcp,udp,rsize=8192,wsize=8192 0 0
    n0:/work/data /work/data nfs bg,hard,intr,retry=100,notcp,udp,rsize=8192,wsize=8192 0 0

Excerpt from /var/log/messages:

    Jun 26 15:50:21 n1 kernel: nfs: server n0 not responding, still trying
    Jun 26 15:50:22 n1 kernel: nfs: server n0 OK
    Jun 26 16:06:19 n1 kernel: nfs: server n0 not responding, timed out
    Jun 26 16:06:50 n1 kernel: nfs: server n0 not responding, timed out
    Jun 26 16:07:57 n1 kernel: nfs: server n0 not responding, timed out
    Jun 26 16:10:24 n1 kernel: nfs: server n0 not responding, timed out

Additional info: Works fine in SLES 8.0 ...
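To compare clients or mount options, it may help to tally the NFS client messages per server instead of reading the log by eye. The following is only a sketch: the function name summarize_nfs_events is made up for illustration, and it assumes the kernel messages have exactly the format shown in the log excerpt above ("still trying" from hard mounts, "timed out" from soft mounts, "OK" on recovery).

```shell
#!/bin/bash
# Hypothetical helper (not part of the original report): count NFS client
# events per server from syslog-style lines read on stdin.
# Assumes lines shaped like:
#   Jun 26 16:06:19 n1 kernel: nfs: server n0 not responding, timed out
summarize_nfs_events() {
    awk '
        /kernel: nfs: server / {
            # The server name is the field right after the word "server".
            srv = ""
            for (i = 1; i < NF; i++) {
                if ($i == "server") { srv = $(i + 1); break }
            }
            if (srv == "") next

            seen[srv] = 1
            if ($0 ~ /not responding, timed out/)         timed[srv]++
            else if ($0 ~ /not responding, still trying/) trying[srv]++
            else if ($0 ~ / OK$/)                         ok[srv]++
        }
        END {
            # One summary line per server; missing counters print as 0.
            for (s in seen)
                printf "%s timed_out=%d still_trying=%d ok=%d\n",
                       s, timed[s], trying[s], ok[s]
        }
    '
}
```

It could be run on a client as `summarize_nfs_events < /var/log/messages`. A rising timed_out count on the soft mounts would correspond to I/O errors returned to the application, while still_trying on the hard mounts corresponds to stalls that may later recover.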
It's not possible to add the customer, Mr. Tschuchranin, to the cc list - I always receive an error message. His email address: Frank.Tschuchranin Unfortunately I reported the bug with my private account - can you please assign this to my Red Hat account jkunze? Thanks.
It looks like I have the same problem with our RHEL3ES. The NFS server is a 2x Xeon 2.4G, but the load sometimes goes over 8. Then some of the automounter clients cannot mount the NFS share and show the same errors. There were no such errors with RH8 before.
What kernel version are you using?
2.4.18-e.31smp
Is this still a problem with more recent RHEL3 or AS21 kernels?
This bug is filed against RHEL2.1, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.