Bug 114021 - NFS read stalls in e.34 and e.35 kernels
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Arjan van de Ven
Reported: 2004-01-21 10:03 EST by Need Real Name
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2004-04-15 11:00:58 EDT

Description Need Real Name 2004-01-21 10:03:33 EST
Description of problem:

Copying a file from an NFS mount to local disk exhibits stalls as long
as 5 seconds:

[relevant output from strace -tt cp /path/to/nfs/file /tmp]
11:44:01.195024 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
11:44:06.789237 fstat64(4, {st_mode=S_IFREG|0644, st_size=1347584, ...}) = 0

These stalls ultimately cause the copy of a 1.44MB floppy disk image
to take about a minute and a half to complete.  The strace -r output
did not show any unusual timings for the system calls, but the -tt
wall-clock output did show these long delays between read() and the
subsequent fstat64()/_llseek()/fcntl64()/write() calls.
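
For anyone trying to locate the same stalls in a longer trace, the
wall-clock gaps can be picked out mechanically. The following is only a
rough sketch, not something the reporter used: the function name
find_stalls and the 1-second threshold are assumptions, and the sample
lines are the two strace -tt lines quoted above with the read buffer
abbreviated.

```python
import re
from datetime import datetime

# Matches the leading HH:MM:SS.microseconds timestamp that strace -tt emits.
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s+(.*)$")


def find_stalls(lines, threshold=1.0):
    """Return (gap_seconds, previous_call, next_call) for each gap >= threshold.

    find_stalls and threshold are hypothetical names chosen for this sketch.
    """
    stalls = []
    prev_time = None
    prev_call = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # wrapped continuation line or other non-timestamped output
        t = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if prev_time is not None:
            gap = (t - prev_time).total_seconds()
            if gap < 0:
                gap += 86400  # the trace crossed midnight
            if gap >= threshold:
                stalls.append((gap, prev_call, m.group(2)))
        prev_time, prev_call = t, m.group(2)
    return stalls


# The two lines from the report (read buffer shortened for readability).
sample = [
    '11:44:01.195024 read(3, "\\0\\0\\0"..., 4096) = 4096',
    "11:44:06.789237 fstat64(4, {st_mode=S_IFREG|0644, st_size=1347584, ...}) = 0",
]

for gap, before, after in find_stalls(sample):
    print(f"{gap:.6f}s between {before.split('(')[0]} and {after.split('(')[0]}")
```

On the two sample lines this reports a single gap of roughly 5.6
seconds between read and fstat64, which matches the stall described
above.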

Version-Release number of selected component (if applicable):

kernel-2.4.9-e.34smp
kernel-2.4.9-e.35smp

How reproducible:

In our testing, this was 100% reproducible with both kernels, over UDP
mounts with the default [rw]size as well as with 4096, 8192, and 32768
set.  This was against a NetApp 960 running OnTap 6.4.2P6.  During
these tests, the filer was showing ~10% cpu utilization, and the
interface we were testing against was not heavily used.  We did not
test tcp transport.

Steps to Reproduce:
1. mount filer:/volX/export /mnt/filer
2. time cp /mnt/filer/file /tmp
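
To compare kernels with a number rather than by eye, the copy in step 2
can be timed and turned into a throughput figure. This is a minimal
sketch under assumptions: timed_copy and throughput_kib_s are
hypothetical helper names, and the 90-second figure is only the "about
a minute and a half" from the description, not a measured value.

```python
import os
import shutil
import time


def timed_copy(src, dst):
    """Copy src to dst and return (bytes_copied, elapsed_seconds)."""
    start = time.monotonic()
    shutil.copyfile(src, dst)
    elapsed = time.monotonic() - start
    return os.path.getsize(dst), elapsed


def throughput_kib_s(nbytes, seconds):
    """Throughput in KiB/s; infinite if the copy was too fast to measure."""
    if seconds <= 0:
        return float("inf")
    return nbytes / 1024 / seconds


# Figure implied by the report: a 1347584-byte image (st_size in the
# strace output) taking roughly 90 seconds (assumed from "about a minute
# and a half").
print(f"{throughput_kib_s(1347584, 90):.1f} KiB/s")

# Against the real mount this would be, using the paths from the steps above:
# nbytes, secs = timed_copy("/mnt/filer/file", "/tmp/file")
# print(f"{throughput_kib_s(nbytes, secs):.1f} KiB/s")
```

Under those assumptions the affected kernels deliver about 14.6 KiB/s,
far below what even a slow network link should manage.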
  
Actual results:

Observe the throughput being much lower than it should be, and the
above strace results.

Expected results:

The file should copy in a time that is reasonable given the networking
speeds of the two hosts.

Additional info:

Falling back to the e.30smp kernel allows us to get consistent timings
in this simple test.
Comment 1 Mike Cooling 2004-01-29 16:33:51 EST
I too am seeing atrocious performance with the e.35smp kernel. I too am
running a NetApp at release 6.4.2P6 (but it's an F740). Load average
reaches bursts of up to 140, and Apache response times are bad. I am
reverting to the e.30 kernel tomorrow.
Comment 2 Jason Baron 2004-04-15 11:00:58 EDT
This has long been fixed in the current erratum, e.38.
Comment 3 Mike Cooling 2004-04-15 12:08:48 EDT
I disagree. It is much better but performance is still much worse than
e.30, and I have already opened up support issue #312835 and returned
to e.30 once again.
