Bug 114021 - NFS read stalls in e.34 and e.35 kernels
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i686 Linux
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact:
Depends On:
Reported: 2004-01-21 15:03 UTC by Need Real Name
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2004-04-15 15:00:58 UTC

Description Need Real Name 2004-01-21 15:03:33 UTC
Description of problem:

Copying a file from an NFS mount to local disk exhibits stalls as long
as 5 seconds:

[relevant output from strace -tt cp /path/to/nfs/file /tmp]
11:44:01.195024 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
11:44:06.789237 fstat64(4, {st_mode=S_IFREG|0644, st_size=1347584, ...}) = 0

These stalls ultimately cause the copy of a 1.44MB floppy disk image
to take about a minute and a half to complete.  The strace -r output
did not show any unusual timings for the system calls, but the -tt
wall-clock output did show these long delays between read() and the
subsequent fstat64()/_llseek()/fcntl64()/write() calls.
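
For anyone wanting to locate the stalls without reading the trace by eye, the following is a rough sketch (not part of the original report) that scans strace -tt output and lists wall-clock gaps between consecutive system calls. The function name find_stalls and the 1-second threshold are assumptions made for illustration, and the sketch assumes the trace does not cross midnight.

```python
import re
from datetime import datetime

# Matches lines that start with a strace -tt timestamp followed by a syscall name.
TS = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s+(\w+)\(")

def find_stalls(lines, threshold=1.0):
    """Return (previous_call, next_call, gap_seconds) for gaps >= threshold.

    Hypothetical helper: the name and the default threshold are illustrative.
    Assumes timestamps do not wrap past midnight.
    """
    stalls = []
    prev_time = prev_call = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # wrapped continuation line or other non-syscall output
        t = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        call = m.group(2)
        if prev_time is not None:
            gap = (t - prev_time).total_seconds()
            if gap >= threshold:
                stalls.append((prev_call, call, gap))
        prev_time, prev_call = t, call
    return stalls

# The excerpt quoted in this report, including its wrapped lines.
sample = r'''11:44:01.195024 read(3,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
11:44:06.789237 fstat64(4, {st_mode=S_IFREG|0644, st_size=1347584,
...}) = 0'''

for prev_call, next_call, gap in find_stalls(sample.splitlines()):
    print(f"{gap:.3f}s between {prev_call}() and {next_call}()")
```

On the excerpt above this reports a gap of roughly 5.6 seconds between read() and fstat64(), which matches the stall described.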

Version-Release number of selected component (if applicable):


How reproducible:

In our testing, this was 100% reproducible with both kernels, using
the default UDP mount [rw]size as well as 4096, 8192, and 32768.  This
was against a NetApp 960 running OnTap 6.4.2P6.  During these tests,
the filer was showing ~10% CPU utilization, and the interface we were
testing against was not heavily used.  We did not test TCP transport.
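
As a sketch only, the set of mount configurations described above could be written out as follows. The helper name mount_commands is hypothetical, and the exact option strings used in the original tests are not recorded in this report; udp, rsize, and wsize are standard NFS mount options, but their combination here is an assumption. The code only builds the command lines and does not run them.

```python
def mount_commands(export, mountpoint, sizes=(None, 4096, 8192, 32768)):
    """Build one mount command per tested [rw]size (None = mount default).

    Hypothetical helper for illustration; it returns argument lists and
    never invokes mount itself.
    """
    commands = []
    for size in sizes:
        opts = ["udp"]
        if size is not None:
            opts += [f"rsize={size}", f"wsize={size}"]
        commands.append(["mount", "-o", ",".join(opts), export, mountpoint])
    return commands

# Export and mount point taken from the steps to reproduce in this report.
for cmd in mount_commands("filer:/volX/export", "/mnt/filer"):
    print(" ".join(cmd))
```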

Steps to Reproduce:
1. mount filer:/volX/export /mnt/filer
2. time cp /mnt/filer/file /tmp

Actual results:

Observe the throughput being much lower than it should be, and the
above strace results.
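
To put a number on the throughput, here is a minimal sketch that times a copy and converts the result to KiB/s. The helper names are hypothetical, and the 1,474,560-byte size is an assumption based on a standard 1.44MB floppy image; the exact size of the file used in the test is not stated in this report.

```python
import shutil
import time
from pathlib import Path

def timed_copy(src, dst):
    """Copy src to dst and return (bytes_copied, elapsed_seconds)."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    return Path(dst).stat().st_size, elapsed

def throughput_kib_per_s(size_bytes, seconds):
    """Effective throughput in KiB/s."""
    return size_bytes / seconds / 1024

# Assumed figures: a 1440 KiB floppy image copied in about 90 seconds.
print(f"{throughput_kib_per_s(1440 * 1024, 90):.1f} KiB/s")  # prints "16.0 KiB/s"
```

Under those assumptions, the copy described in this report runs at about 16 KiB/s. Calling timed_copy("/mnt/filer/file", "/tmp/file") on an affected host would give the measured equivalent.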

Expected results:

The file should copy in a time that is reasonable given the networking
speeds of the two hosts.

Additional info:

Falling back to the e.30smp kernel allows us to get consistent timings
in this simple test.

Comment 1 Mike Cooling 2004-01-29 21:33:51 UTC
I too am seeing atrocious performance with the e.35smp kernel. I am
also running a NetApp at release 6.4.2P6 (but it's an F740). Load
average bursts as high as 140, and Apache response times are bad. I am
reverting to the e.30 kernel tomorrow.

Comment 2 Jason Baron 2004-04-15 15:00:58 UTC
This has long been fixed in the current erratum, e.38.

Comment 3 Mike Cooling 2004-04-15 16:08:48 UTC
I disagree. It is much better but performance is still much worse than
e.30, and I have already opened up support issue #312835 and returned
to e.30 once again.
