Bug 102438

Summary: nfs hangs under higher load / traffic
Product: Red Hat Enterprise Linux 2.1
Reporter: Joachim Kunze <joachim>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Version: 2.1
CC: drajsajtl, harald, herrmann, jkunze, nhappel, raimondi, riel
Target Milestone: ---
Target Release: ---
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-10-19 19:24:17 UTC

Description Joachim Kunze 2003-08-15 03:09:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.4) Gecko/20030703

Description of problem:
1. Description of the infrastructure
One Tiger 4 acts as a front end for 12 Tiger 2 machines, which form a cluster.
The Tiger 4 has 4 processors, 16 GB RAM, one onboard 1Gb Ethernet interface, and
an additional QLogic QLA 2340 FC adapter connected to an EMC Clariion 4500.
The Tiger 2 machines each have 2 processors, 4 GB RAM, and two onboard 1Gb
Ethernet interfaces, of which only one is in use.
The machines are connected via a Nortel BayStack 380-T24 switch.

All machines run Red Hat Linux Advanced Server release 2.1AS (Derry) with
kernel 2.4.18-e.31smp on ia64.

We configured two different IP addresses on the network interface of the Tiger
4, so that this computer acts as a router to a network other than the cluster
network.
We installed additional Ethernet interfaces in the Tiger 4 but could not get
them configured. We also did not succeed in configuring a second QLA 2340 for
the connection to the Clariion.
The Tiger 4 is supposed to act as the NFS server. The exported filesystems
reside on the internal disk and on the Clariion.

2. NFS instability
NFS does not work properly on the Tiger 4. Regardless of which mount options
are set, the NFS server stops responding as soon as some load occurs. Currently
32 nfsd instances are running.
The STAR jobs reside on NFS filesystems, so when the NFS server stops working,
the calculation stops as well. Sometimes it continues afterwards, sometimes
not; this depends on the mount options (soft or hard). At the moment, NFS on
the Tiger 4 cannot be used for STAR calculations.


Version-Release number of selected component (if applicable):
kernel-2.4.18-e.31smp

How reproducible:
Sometimes

Steps to Reproduce:
The problem usually occurs when the STAR application starts to write to the
NFS share, but it also occurs when the following script is started on, e.g.,
12 clients.

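#!/bin/bash
# Write a 100 MiB file (12800 blocks of 8k) to the NFS share ten times
# and print the elapsed wall-clock time of each write.
export LANG=C

# One file per client, named after the client's hostname.
FILE="/work/data/fratsch/$HOSTNAME"
for i in 1 2 3 4 5 6 7 8 9 10
do
  # "time" reports on stderr, so redirect it into the pipe for grep.
  ( time dd if=/dev/zero of="$FILE" bs=8k count=12800 2>&1 ) 2>&1 | grep real
  rm "$FILE"
done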
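
The report says the script was started on about 12 clients at the same time but does not show how. The following is only a sketch of one way to do that. It assumes passwordless ssh to the clients, client names n1 to n12 (only n0 and n1 appear in this report), and a script path that is purely illustrative. The RUNNER variable is a hypothetical switch for a dry run.

```shell
#!/bin/bash
# Hypothetical helper: start the same command on several clients in parallel.
# RUNNER defaults to ssh; set RUNNER=echo for a dry run that only prints
# the host and command instead of executing anything remotely.
run_on_clients() {
  local cmd=$1
  shift
  local host
  for host in "$@"; do
    # stdin comes from /dev/null so background ssh sessions do not compete
    # for the terminal.
    ${RUNNER:-ssh} "$host" "$cmd" </dev/null &
  done
  wait   # block until every client has finished
}

# Example call (assumed host names and script path):
# run_on_clients /usr/local/bin/nfs-write-test.sh n1 n2 n3 n4 n5 n6 n7 n8 n9 n10 n11 n12
```

Starting all clients from one shell keeps the writes roughly simultaneous, which is the load pattern under which the hang is reported.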

excerpt from /etc/fstab:
n0:/home                /home                   nfs    bg,soft,intr,retry=100,notcp,udp,rsize=8192,wsize=8192  0 0
n0:/work/data           /work/data              nfs    bg,hard,intr,retry=100,notcp,udp,rsize=8192,wsize=8192  0 0
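
Since the behaviour after a hang depends on whether a filesystem is mounted soft or hard, it can help to list the mode of each NFS entry. The function below is a minimal sketch, not part of the original report. It reads fstab-formatted lines with one entry per line from stdin, and the name nfs_mount_mode is made up for illustration. An entry that names neither option is reported as hard, because hard is the NFS default.

```shell
# Print "<mountpoint> hard|soft" for every NFS entry read from stdin.
# Input is expected in fstab format, one entry per line.
nfs_mount_mode() {
  awk '$1 !~ /^#/ && $3 == "nfs" {
         mode = "hard"                       # NFS default when "soft" is absent
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++)
           if (opts[i] == "soft") mode = "soft"
         print $2, mode
       }'
}

# Usage: nfs_mount_mode < /etc/fstab
```

With a soft mount the client gives up after its retries and returns an error to the application. With a hard mount it keeps retrying until the server answers.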

excerpt from /var/log/messages:
Jun 26 15:50:21 n1 kernel: nfs: server n0 not responding, still trying
Jun 26 15:50:22 n1 kernel: nfs: server n0 OK
Jun 26 16:06:19 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:06:50 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:07:57 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:10:24 n1 kernel: nfs: server n0 not responding, timed out
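
To compare clients or test runs, the kernel messages above can be condensed into counts per server and state. This is a sketch only. It assumes the message format shown in the excerpt, and the function name nfs_log_summary is hypothetical.

```shell
# Count NFS client kernel messages per server and state.
# Reads syslog lines from stdin and prints "<count> <server>: <state>".
nfs_log_summary() {
  sed -n 's/.*kernel: nfs: server \([^ ]*\) \(.*\)$/\1: \2/p' |
    LC_ALL=C sort |
    uniq -c |
    sed 's/^ *//'      # drop the padding that uniq -c puts before the count
}

# Usage: nfs_log_summary < /var/log/messages
```

Applied to the six lines above, this reports one "OK", one "not responding, still trying", and four "not responding, timed out" messages for server n0.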


Additional info:

Works fine in SLES 8.0 ...

Comment 1 Joachim Kunze 2003-08-15 04:07:12 UTC
It is not possible to add the customer, Mr. Tschuchranin, to the CC list; I
always receive an error message. His email address: Frank.Tschuchranin

Unfortunately I reported the bug with my private account. Can you please
assign this to my Red Hat account jkunze? Thanks

Comment 2 Tomas Drajsajtl 2004-03-11 08:44:10 UTC
It looks like I have the same problem with our RHEL3ES. The NFS
server is a 2x Xeon 2.4G, but the load sometimes goes over 8. Then some
of the automounter clients cannot mount the NFS share and show the same
errors. There was no such error with RH8 before.

Comment 3 Steve Dickson 2004-07-30 17:46:12 UTC
What kernel version are you using?

Comment 4 Niels Happel 2004-07-31 01:13:36 UTC
2.4.18-e.31smp

Comment 5 Steve Dickson 2004-10-14 20:47:35 UTC
Is this still a problem with more recent RHEL3 or AS21 kernels?

Comment 6 RHEL Program Management 2007-10-19 19:24:17 UTC
This bug is filed against RHEL2.1, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products.  Since
this bug does not meet these criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your
support representative.  You may be asked to provide detailed
information on how this bug is affecting you.