102438 – nfs hangs under higher load / traffic

Summary: nfs hangs under higher load / traffic

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	ia64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Steve Dickson
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-08-15 03:09 UTC by Joachim Kunze
Modified:	2008-08-02 23:40 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 19:24:17 UTC
Target Upstream Version:
Embargoed:

Description Joachim Kunze 2003-08-15 03:09:45 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.4) Gecko/20030703

Description of problem:
1. Description of the infrastructure
One Tiger 4 acts as a frontend for 12 Tiger 2 machines, which form a cluster.
The Tiger 4 has 4 processors, 16 GB RAM, one onboard 1Gb Ethernet interface and
an additional QLogic QLA 2340 FC adapter, which is connected to an EMC-Clariion
4500.
The Tiger 2 machines have 2 processors, 4 GB RAM and two onboard 1Gb Ethernet
interfaces, of which only one is in use.
The machines are connected via a Nortel BayStack 380-T24 Switch.

All machines run Red Hat Linux Advanced Server release 2.1AS (Derry), kernel
2.4.18-e.31smp, on ia64.

We configured two different IP addresses on the network interface of the
Tiger 4, so that this machine acts as a router to a network other than the
cluster network.
We installed additional Ethernet interfaces in the Tiger 4 but could not get
them configured; we also did not succeed in configuring a second QLA 2340 for
the connection to the Clariion.
The Tiger 4 is meant to act as an NFS server. The exported filesystems reside
on the internal disk and on the Clariion.
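
Because the Tiger 4 also has to route between the cluster network and the second network, IPv4 forwarding must be enabled in its kernel. A minimal check, assuming the standard /proc/sys/net/ipv4/ip_forward switch; the function name ip_forwarding_enabled is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if IPv4 forwarding is switched on.
# Reads /proc/sys/net/ipv4/ip_forward by default ("1" = forwarding enabled);
# a different file can be passed as the first argument, e.g. for testing.
ip_forwarding_enabled() {
  local flag_file=${1:-/proc/sys/net/ipv4/ip_forward}
  [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}
```

If the check fails, `sysctl -w net.ipv4.ip_forward=1` enables forwarding at runtime, and `net.ipv4.ip_forward = 1` in /etc/sysctl.conf keeps it across reboots.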

2. NFS instability
NFS is not working properly on the Tiger 4. Regardless of which mount options
are set, the NFS server stops responding as soon as some load occurs
(currently 32 nfsd threads are running).
The STAR jobs reside on NFS filesystems, so when the NFS server stops working,
the calculations stop as well. Sometimes they continue afterwards, sometimes
not; this depends on the mount options (soft or hard). At the moment NFS on the
Tiger 4 cannot be used for STAR calculations!
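
To confirm how many nfsd threads the server kernel is actually running (the report states 32), the thread count can be read from the NFS server statistics. A minimal sketch, assuming the "th" line of /proc/net/rpc/nfsd, whose first number is the thread count; the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the number of nfsd threads reported by the kernel.
# Assumes a line of the form "th <threads> ..." in /proc/net/rpc/nfsd;
# a different file can be passed as the first argument, e.g. for testing.
nfsd_thread_count() {
  local stats_file=${1:-/proc/net/rpc/nfsd}
  # Print the second field of the first "th" line and stop reading.
  awk '$1 == "th" { print $2; exit }' "$stats_file"
}
```

Run on the Tiger 4, `nfsd_thread_count` should print the configured thread count if the statistics file has the assumed layout.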


Version-Release number of selected component (if applicable):
kernel-2.4.18-e.31smp

How reproducible:
Sometimes

Steps to Reproduce:
The problem usually occurs when the STAR application starts writing to the
NFS share, but it also occurs when the following script is run on, for
example, 12 clients.

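#!/bin/bash
# Write a 100 MiB file (12800 blocks of 8 KB) to the NFS share ten times
# and print the elapsed ("real") time of each run.
export LANG=C

FILE=/work/data/fratsch/$HOSTNAME   # one file per client
for i in 1 2 3 4 5 6 7 8 9 10
do
  ( time dd if=/dev/zero of="$FILE" bs=8k count=12800 2>&1 ) 2>&1 | grep real
  rm -f "$FILE"
done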
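
The load in this test presumably comes from many clients writing at once. A sketch of one way to start the script on all cluster nodes in parallel; it assumes passwordless ssh, node names n1 to nN (only n0 and n1 appear in this report, the rest of the naming is a guess) and a hypothetical script path, and the functions cluster_hosts and run_on_all are illustrative only:

```shell
#!/bin/bash
# Hypothetical launcher: start the dd test script on every cluster node at once.
# Assumptions (not from the report): passwordless ssh, nodes named n1..nN,
# and the test script stored at the path in TEST_SCRIPT.
RUNNER=${RUNNER:-ssh}
TEST_SCRIPT=${TEST_SCRIPT:-/work/data/fratsch/nfs-dd-test.sh}

# Print the node names n1..nN, one per line.
cluster_hosts() {
  local count=$1 i
  for (( i = 1; i <= count; i++ )); do
    echo "n$i"
  done
}

# Start the test on every node in the background, write each node's output
# to /tmp/nfs-dd-<node>.log, then wait until all nodes have finished.
run_on_all() {
  local host
  for host in $(cluster_hosts "${1:-12}"); do
    "$RUNNER" "$host" "$TEST_SCRIPT" > "/tmp/nfs-dd-$host.log" 2>&1 &
  done
  wait
}
```

For example, `run_on_all 12` followed by `grep real /tmp/nfs-dd-n*.log` collects the timings from all nodes in one place.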

Excerpt from /etc/fstab:
n0:/home       /home       nfs  bg,soft,intr,retry=100,notcp,udp,rsize=8192,wsize=8192  0 0
n0:/work/data  /work/data  nfs  bg,hard,intr,retry=100,notcp,udp,rsize=8192,wsize=8192  0 0
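
The two entries differ only in soft versus hard, the option the report says decides whether the jobs recover. A small sketch that lists the mode of every NFS entry in an fstab-format file; it treats an entry with neither option as hard, the Linux NFS default, and the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print "<mount point> soft|hard" for each NFS entry
# in an fstab-format file (default /etc/fstab). Comment lines are skipped;
# if both options appear, the last one wins; neither means hard.
nfs_mount_mode() {
  local fstab=${1:-/etc/fstab}
  awk '$1 !~ /^#/ && $3 == "nfs" {
    mode = "hard"
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++) {
      if (opts[i] == "soft") mode = "soft"
      if (opts[i] == "hard") mode = "hard"
    }
    print $2, mode
  }' "$fstab"
}
```

For the excerpt above this prints "/home soft" and "/work/data hard".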

Excerpt from /var/log/messages:
Jun 26 15:50:21 n1 kernel: nfs: server n0 not responding, still trying
Jun 26 15:50:22 n1 kernel: nfs: server n0 OK
Jun 26 16:06:19 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:06:50 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:07:57 n1 kernel: nfs: server n0 not responding, timed out
Jun 26 16:10:24 n1 kernel: nfs: server n0 not responding, timed out
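
To compare test runs (for example soft versus hard mounts), it can help to count these client messages. A minimal sketch, assuming the message wording shown in the excerpt; the function name nfs_log_summary is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: count NFS client kernel messages in a syslog file
# (default /var/log/messages): how often a server stopped responding,
# how often a request timed out, and how often the server came back.
nfs_log_summary() {
  local log_file=${1:-/var/log/messages}
  awk '/kernel: nfs: server .* not responding/            { down++ }
       /kernel: nfs: server .* not responding, timed out/ { timeout++ }
       /kernel: nfs: server .* OK$/                       { ok++ }
       END { printf "not_responding=%d timed_out=%d ok=%d\n", down, timeout, ok }' "$log_file"
}
```

Applied to the six lines above, it prints `not_responding=5 timed_out=4 ok=1`.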


Additional info:

The same setup works fine with SLES 8.0.

Comment 1 Joachim Kunze 2003-08-15 04:07:12 UTC

It's not possible to add the customer, Mr. Tschuchranin, to the CC list; I
always receive an error message. His e-mail address: Frank.Tschuchranin

Unfortunately I reported the bug with my private account. Could you please
assign it to my Red Hat account jkunze? Thanks.

Comment 2 Tomas Drajsajtl 2004-03-11 08:44:10 UTC

It looks like I have the same problem with our RHEL3 ES. The NFS
server is a 2x Xeon 2.4 GHz machine, but the load sometimes goes
above 8. Then some of the automounter clients cannot mount the NFS
share and report the same errors. There were no errors with RH8 before.

Comment 3 Steve Dickson 2004-07-30 17:46:12 UTC

What kernel version are you using?

Comment 4 Niels Happel 2004-07-31 01:13:36 UTC

2.4.18-e.31smp

Comment 5 Steve Dickson 2004-10-14 20:47:35 UTC

Is this still a problem with more recent RHEL3 or AS2.1 kernels?

Comment 6 RHEL Program Management 2007-10-19 19:24:17 UTC

This bug is filed against RHEL2.1, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products.  Since
this bug does not meet these criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your
support representative.  You may be asked to provide detailed
information on how this bug is affecting you.
