Bug 102438 - nfs hangs under higher load / traffic
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: ia64 Linux
Priority: medium
Severity: high
Assigned To: Steve Dickson
Reported: 2003-08-14 23:09 EDT by Joachim Kunze
Modified: 2008-08-02 19:40 EDT
Doc Type: Bug Fix
Last Closed: 2007-10-19 15:24:17 EDT


Description Joachim Kunze 2003-08-14 23:09:45 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.4) Gecko/20030703

Description of problem:
1. Description of the infrastructure
One Tiger 4 acts as the frontend for 12 Tiger 2 machines, which form a cluster.
The Tiger 4 has 4 processors, 16 GB RAM, one onboard 1Gb Ethernet interface, and
an additional QLogic QLA 2340 FC adapter, which is connected to an EMC Clariion 4500.
Each Tiger 2 has 2 processors, 4 GB RAM, and two onboard 1Gb Ethernet interfaces,
of which only one is in use.
The machines are connected via a Nortel BayStack 380-T24 switch.

All machines run Red Hat Linux Advanced Server release 2.1AS (Derry) with kernel
2.4.18-e.31smp on ia64.

We configured two different IP addresses on the network interface of the Tiger
4, so that this machine acts as a router to a network separate from the cluster
network.
We installed some additional Ethernet interfaces in the Tiger 4, but we could not
get them configured. We also did not succeed in configuring a second QLA 2340 for
the connection to the Clariion.
The Tiger 4 is meant to act as the NFS server. The exported filesystems reside on
the internal disk and on the Clariion.

2. NFS instability
NFS does not work properly on the Tiger 4. Regardless of which mount options are
set, the NFS server stops responding as soon as some load occurs. Currently 32
nfsd threads are running.
The STAR jobs reside on NFS filesystems, so when the NFS server stops working, the
calculation stops as well. Sometimes it continues, sometimes not, depending on
the mount options (soft or hard). At the moment, NFS on the Tiger 4 cannot be
used for STAR calculations!
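To check whether the 32 nfsd threads are actually saturated under load, the server's thread statistics can be inspected. The following is only a sketch: it assumes the `th` line format that 2.4-era kernels expose in /proc/net/rpc/nfsd (thread count, then the number of times all threads were busy), and the function name is made up for illustration.

```shell
# Print the nfsd thread count and how often all threads were busy at once.
# Assumption: the stats file contains a line of the form
#   th <threads> <times-all-busy> <histogram...>
# as 2.4-era kernels write to /proc/net/rpc/nfsd.
nfsd_thread_stats() {
  local statfile=${1:-/proc/net/rpc/nfsd}
  awk '$1 == "th" { print "threads=" $2, "all_busy_count=" $3 }' "$statfile"
}

# Usage on the NFS server (run before and after a load test and compare):
#   nfsd_thread_stats
```

If the second value grows quickly while the test is running, all nfsd threads were busy at those moments, which would be worth mentioning in the report.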


Version-Release number of selected component (if applicable):
kernel-2.4.18-e.31smp

How reproducible:
Sometimes

Steps to Reproduce:
The problem usually occurs when the STAR application starts writing to the
NFS share, but it also occurs when the following script is started on, e.g., 12
clients.

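#!/bin/bash
export LANG=C

# One test file per client, named after the client's hostname.
FILE=/work/data/fratsch/$HOSTNAME
for i in 1 2 3 4 5 6 7 8 9 10
do
  # Write 100 MiB (12800 blocks of 8 KiB) to the NFS share; print only the elapsed time.
  ( time dd if=/dev/zero of="$FILE" bs=8k count=12800 2>&1 ) 2>&1 | grep real
  rm "$FILE"
done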
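
Since the hang only shows up when several clients write at the same time, it may help to start the script on all clients at once. This is a minimal sketch, not part of the original report: the use of ssh, the client names n1 to n12, and the script path are all assumptions.

```shell
# Start one command on every client at the same time and wait for all of them.
# Assumptions: clients are reachable with ssh; REMOTE_SH can be overridden.
REMOTE_SH=${REMOTE_SH:-ssh}

run_on_clients() {
  local cmd=$1
  shift
  local host
  for host in "$@"; do
    "$REMOTE_SH" "$host" "$cmd" &   # one background job per client
  done
  wait                              # block until every client has finished
}

# Usage (hypothetical host names and script path):
#   run_on_clients /root/nfs-write-test.sh n1 n2 n3 n4 n5 n6 n7 n8 n9 n10 n11 n12
```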

Excerpt from /etc/fstab:
n0:/home                /home                   nfs     bg,soft,intr,retry=100,notcp,udp,rsize=8192,wsize=8192  0 0
n0:/work/data           /work/data              nfs     bg,hard,intr,retry=100,notcp,udp,rsize=8192,wsize=8192  0 0
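
Because the behaviour depends on whether a share is mounted soft or hard, a quick per-mount summary can be useful when comparing clients. The sketch below assumes every fstab entry sits on a single line (as /etc/fstab requires); the function name is hypothetical.

```shell
# List the NFS entries of an fstab-style file as "<mount point> <soft|hard> <udp|tcp|default>".
# "hard" is reported when neither soft nor hard is given, since hard is the NFS default.
nfs_mount_summary() {
  local fstab=$1
  awk '
    $1 !~ /^#/ && $3 == "nfs" {
      mode = "hard"; proto = "default"
      n = split($4, opt, ",")
      for (i = 1; i <= n; i++) {
        if (opt[i] == "soft" || opt[i] == "hard") mode = opt[i]
        if (opt[i] == "udp" || opt[i] == "tcp") proto = opt[i]
      }
      print $2, mode, proto
    }
  ' "$fstab"
}

# Usage:
#   nfs_mount_summary /etc/fstab
```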

Excerpt from /var/log/messages:
Jun 26 15:50:21 n1 kernel: nfs: server n0 not responding, still trying
Jun 26 15:50:22 n1 kernel: nfs: server n0 OK
Jun 26 16:06:19 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:06:50 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:07:57 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:10:24 n1 kernel: nfs: server n0 not responding, timed out
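
To quantify how often a client hits these messages, the log can be summarised by message type. This is a sketch that assumes the kernel message format shown in the excerpt; the function name is made up. As far as I know, the Linux client logs "timed out" for soft mounts and "still trying" for hard mounts, so the two counts roughly separate the two kinds of mount.

```shell
# Count "server ... not responding" messages in a syslog-style file,
# grouped by the text that follows "not responding, ".
count_nfs_timeouts() {
  local logfile=$1
  awk -F'not responding, ' '
    /kernel: nfs: server .* not responding, / { count[$2]++ }
    END { for (k in count) print count[k], k }
  ' "$logfile" | LC_ALL=C sort -k2
}

# Usage on a client:
#   count_nfs_timeouts /var/log/messages
```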


Additional info:

Works fine in SLES 8.0 ...
Comment 1 Joachim Kunze 2003-08-15 00:07:12 EDT
It is not possible to add the customer, Mr. Tschuchranin, to the CC list; I always
receive an error message. His email address: Frank.Tschuchranin@toyota-f1.com

Unfortunately, I reported the bug with my private account. Could you please assign
it to my Red Hat account, jkunze@redhat.com? Thanks.
Comment 2 Tomas Drajsajtl 2004-03-11 03:44:10 EST
It looks like I have the same problem with our RHEL3ES. The NFS
server is a 2x Xeon 2.4 GHz machine, but the load sometimes goes over 8.
When it does, some of the automounter clients cannot mount the NFS share
and report the same errors. There were no such errors with RH8 before.
Comment 3 Steve Dickson 2004-07-30 13:46:12 EDT
What kernel version are you using?
Comment 4 Niels Happel 2004-07-30 21:13:36 EDT
2.4.18-e.31smp
Comment 5 Steve Dickson 2004-10-14 16:47:35 EDT
Is this still a problem with more recent RHEL3 or AS21 kernels?
Comment 6 RHEL Product and Program Management 2007-10-19 15:24:17 EDT
This bug is filed against RHEL2.1, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products.  Since
this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your
support representative.  You may be asked to provide detailed
information on how this bug is affecting you.
