Bug 785854

Summary: NFS is 40x slower when using 32K I/O size to copy files from a Solaris 10 NFS client to a RHEL 6.2 NFS server
Product: Red Hat Enterprise Linux 6
Reporter: roland.teague
Component: kernel
Assignee: nfs-maint
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: unspecified
Version: 6.2
CC: bengland, jlayton, perfbz, steved
Target Milestone: rc
Hardware: x86_64
OS: Unspecified
Whiteboard: NFS performance
Doc Type: Bug Fix
Last Closed: 2012-02-16 11:06:16 UTC

Attachments:
badperf 32k I/O size tcpdump
goodperf default I/O size tcpdump

Description roland.teague 2012-01-30 19:09:25 UTC
Description of problem:

NFS is 40x slower when using 32K I/O size to copy files from a Solaris 10 NFS client to a RHEL 6.2 NFS server. Performance issue is also seen with 8K and 16K I/O sizes. Performance is only acceptable with 4K I/O size.

Version-Release number of selected component (if applicable):

Tested and reproduced with NFS versions 3 and 4 on the following RHEL releases:

RHEL 5.3
RHEL 5.5
RHEL 5.6
RHEL 6.2

Solaris 10 client (8/11 release)

bash-3.2# uname -a
SunOS ib50-110 5.10 Generic_147441-01 i86pc i386 i86pc
bash-3.2#


How reproducible:

Very reproducible.

Mount the RHEL NFS server from a Solaris 10 client with a 32K I/O size and cp/dd a file to the NFS mount point. Redo the test with a 4K NFS I/O size.

Steps to Reproduce:
1. Mount the RHEL NFS server export from a Solaris 10 client with rsize=32k,wsize=32k.
2. Copy a file to the mount point with cp or dd and note the elapsed time.
3. Remount with a 4K I/O size and repeat the copy.
  
Actual results:

bash-3.2# mount -F nfs -o vers=4,rsize=32k,wsize=32k 10.10.138.134:/home /ibfs1
bash-3.2# time dd if=/dev/zero of=/ibfs1/dd_test_32k bs=1000000 count=1000
1000+0 records in
1000+0 records out

real    10m53.497s
user    0m0.005s
sys     0m1.330s
bash-3.2#


Expected results:

bash-3.2# mount -F nfs -o vers=4 10.10.138.134:/home /ibfs1
bash-3.2# time dd if=/dev/zero of=/ibfs1/dd_test_4k bs=1000000 count=1000
1000+0 records in
1000+0 records out

real    0m17.010s
user    0m0.003s
sys     0m1.093s
bash-3.2#
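For reference, each dd run above transfers 1,000,000,000 bytes, so the elapsed times translate into roughly the following throughputs. This is a back-of-the-envelope check derived from the numbers above, not measurements from the original report:

```python
# Back-of-the-envelope throughput from the two dd runs above:
# 1000 records x 1,000,000 bytes = 1e9 bytes transferred in each case.
bytes_total = 1000 * 1_000_000

bad_secs = 10 * 60 + 53.497   # real time with rsize/wsize=32k
good_secs = 17.010            # real time with default mount options

bad_mb_s = bytes_total / bad_secs / 1e6
good_mb_s = bytes_total / good_secs / 1e6

print(f"32k mount: {bad_mb_s:.2f} MB/s")   # ~1.5 MB/s
print(f"default:   {good_mb_s:.2f} MB/s")  # ~59 MB/s
print(f"slowdown:  {bad_secs / good_secs:.0f}x")
```

The ratio works out to roughly 38x, consistent with the "40x slower" figure in the summary.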


Additional info:

Bruce Fields has asked that a bug be filed. There is nothing notable in the messages file on either the client or the server. Oracle has told the end customer that the issue is not with Solaris. I can provide a tcpdump if necessary.

Re: Intermittent performance issues with Solaris 10 NFS V3 client to RHEL 5.5 NFS server
Monday, December 12, 2011 10:13 AM
From: 
"J. Bruce Fields" <bfields>
To: 
"John Simon" <tzzhc4>
Cc: 
linux-nfs.org

On Sun, Dec 11, 2011 at 12:00:45PM -0800, John Simon wrote:
> I recently attached a Solaris 10 8/07 client (6900 with ce gigabit
> interface) to our NFS server which runs RHEL 5.5 (kernel
> 2.6.18-194.el5).

Could you file a bug against Red Hat and/or Solaris?

> Performance typically is good running around
> 25-50MB/s but sometimes seemingly without reason the performance drops
> to abysmal levels and will stay like that until NFS is unmounted and
> remounted. I have tested this after hours when there is no load on
> either server, no traffic on the network and using a 1GB test file.
> Our other 300 Linux clients have no performance issues, I have ruled
> out network issues by isolating the server to a switch dedicated to it
> and an additional port on the NFS server and the tests I performed
> were with the file cache in memory.
> 
> $ time cp /var/tmp/1g.TEST.new /mnt/
> real    25m1.456s
> user    0m0.276s
> sys     0m6.699s
> 
> After an unmount, wait 5 minutes and remount:
> 
> $ time cp /var/tmp/1g.TEST.new /mnt/
> real    0m26.767s
> user    0m0.277s
> sys     0m6.589s
> 
> Mount options I am using on Solaris:
> 
>   Flags:
>   vers=3,proto=tcp,sec=none,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
>   Attr cache:    acregmin=120,acregmax=120,acdirmin=120,acdirmax=120

Some ideas:

    - Anything interesting in the logs on client or server?
    - If you look at a small part of the network traffic in
      wireshark, in the bad case, is there any obvious problem?
      (Lots of retransmissions, errors returned from the server, ?)
    - Can you get any rpc statistics out of the client?  (Average
      time to respond to an rpc, mix of rpc's sent, etc.?)

--b.

Comment 2 Jeff Layton 2012-02-05 21:14:25 UTC
I think the first place to start is with some network captures that show the traffic between client and server.

nfsstat info might also be interesting if there's a large difference in the
number and/or type of calls between the two test runs.

Comment 3 Ben England 2012-02-06 16:01:26 UTC
Hi Roland, we worked together about 5 years ago at IBRIX.

How do you know that you are using a 4-KB I/O size (paragraph 1 of your original
problem report)? It appears to me that in the "expected results" section, the
test run with good performance didn't specify rsize and wsize. The Linux NFS
server in RHEL 6.2 will negotiate up to a 1 MB RPC size (the default with NFS
v4 at least). If network round-trip time is high, then a larger RPC size should
help. In both the expected and actual cases above, the dd I/O size is 1000000
bytes.

I don't know Solaris well but is there a /proc/mounts file or equivalent on the
Solaris client and does the mountpoint appear in it?  This would tell us what
parameters were negotiated for the NFS mount in each case.

I don't see this kind of drop-off when I do the test with a Linux NFS client,
nor do I see any evidence here of a regression in NFS, since you reproduced it
on all of those RHEL versions.

Can you run this on the RHEL server before the test starts, then compress the
capture and post it as an attachment?

# tcpdump -w /tmp/a.tcpdump -s 1500 -c 100000

Also, can you run this on the RHEL server before the test starts and post the
log? This will tell us whether there is a stall or whether this is steady-state
behavior.

# while [ 1 ] ; do nfsstat -s ; sleep 5 ; done > nfsstat.log

Does NFS v3 behave differently than NFS v4?

Is it possible that the network path is the cause of this problem? There is a
way to use netperf to simulate NFS behavior, to some extent, at the network
level. You might want to try this and see what different RPC sizes do (vary
the -r parameter below to simulate reads vs. writes and different RPC sizes,
and run multiple netperf processes to simulate multiple threads).

[root@perf56 ~]# netperf -v 5 -l 5 -H perf36 -t TCP_RR -- -r 512,32768
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
perf36.lab.bos.redhat.com (10.16.41.6) port 0 AF_INET : spin interval : demo :
first burst 0
Alignment      Offset         RoundTrip  Trans    Throughput
Local  Remote  Local  Remote  Latency    Rate     10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec  Outbound   Inbound
    8      0       0      0   611.323   1635.797 6.700     428.814

Comment 4 roland.teague 2012-02-06 19:31:14 UTC
Hi Ben, glad to hear you're at Red Hat. :-)

I assumed that the default rsize and wsize values were 4k. Perhaps I was mistaken, because when I specify an rsize and wsize of 4k I get performance that is worse than with the 32k block size. I'm not sure what I/O size Solaris 10 uses when the rsize and wsize values are not specified, but I tried setting rsize and wsize to 1m and I still get horrible performance. I see the issue with both NFS versions 3 and 4.

Here are the results when not specifying the rsize and wsize values.

bash-3.2# mount -F nfs -o vers=4 10.10.138.134:/home /ibfs1
bash-3.2# mount
/ibfs1 on 10.10.138.134:/home remote/read/write/setuid/devices/rstchown/vers=4/xattr/dev=4d40024 on Mon Feb  6 14:19:15 2012
bash-3.2# time dd if=/dev/zero of=/ibfs1/dd_test_4k bs=1000000 count=1000
1000+0 records in
1000+0 records out

real    0m16.791s
user    0m0.003s
sys     0m1.052s
bash-3.2#

I will work on getting the tcpdumps. nfsstat is showing a 50% split between
putfh and write calls on both the client and server for both 4K and 32K I/O
sizes.

Comment 5 Steve Dickson 2012-02-06 19:42:58 UTC
(In reply to comment #4)
> I will work on getting the tcpdumps. nfsstat is showing a 50% split between
> putfh and write calls on both the client and server for both 4K and 32K I/O
> sizes.
Please use tshark to capture the traces, since tcpdump does not have
v4 support. Something similar to:
    tshark -w /tmp/data.pcap host <server>
    bzip2 /tmp/data.pcap

tia,

Comment 6 roland.teague 2012-02-06 21:26:48 UTC
Created attachment 559758 [details]
badperf 32k I/O size tcpdump

The end customer is using NFS version 3 so I have included a tcpdump using NFS version 3. These are the results when I use a rsize/wsize of 32k.

Comment 7 roland.teague 2012-02-06 21:28:58 UTC
Created attachment 559759 [details]
goodperf default I/O size tcpdump

This is the good performing tcpdump of NFS version 3 using the default rsize/wsize.

Comment 8 Steve Dickson 2012-02-07 15:35:30 UTC
(In reply to comment #6)
> Created attachment 559758 [details]
> badperf 32k I/O size tcpdump
> 
> The end customer is using NFS version 3 so I have included a tcpdump using NFS
> version 3. These are the results when I use a rsize/wsize of 32k.

With this trace you are not getting 32k writes; you are getting 32-byte writes...
How are you setting the rsize/wsize values?
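To put that observation in perspective, a 32-byte write size means roughly a thousand times more WRITE RPCs (and round trips) to move the same amount of data. A rough illustration for the 1 GB dd test, not figures taken from the trace itself:

```python
# Approximate WRITE RPC counts for the 1 GB (1e9 byte) dd test
# at the intended vs. the apparently negotiated write size.
data_bytes = 1_000_000_000

intended_wsize = 32768   # 32 KiB, what the customer meant by "32k"
actual_wsize = 32        # what the trace shows per write

rpcs_intended = data_bytes // intended_wsize  # ~30.5 thousand RPCs
rpcs_actual = data_bytes // actual_wsize      # ~31.3 million RPCs

print(rpcs_intended)
print(rpcs_actual)
print(rpcs_actual // rpcs_intended)  # ~1024x more round trips
```

With per-RPC overhead dominating at 32 bytes, a slowdown on the order of the observed 40x is unsurprising.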

Comment 9 Jeff Layton 2012-02-07 16:23:22 UTC
Also, the "goodperf" capture shows that Solaris is defaulting to a 32k wsize, not a 4k one. I have to wonder whether Solaris understands the 'k' suffix you're using. I suggest specifying rsize/wsize in bytes and redoing your test.

For instance: wsize=32768
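Putting that together, the reproduction command from the original report rewritten with byte values would look something like this (a sketch only, reusing the server address and mount point from the report):

```shell
# Same mount as in the original report, but with rsize/wsize
# spelled out in bytes so the Solaris client cannot misparse a "k" suffix.
mount -F nfs -o vers=4,rsize=32768,wsize=32768 10.10.138.134:/home /ibfs1
```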

Comment 10 roland.teague 2012-02-07 16:30:05 UTC
I also noticed the same write sizes in the tcpdumps, which would explain the performance difference. I suspect that Solaris does not understand the "k", so I am retesting. I'm also confirming with the end customer whether they were indeed using "32k" for the rsize/wsize mount options and not 32768.

Comment 12 Jeff Layton 2012-02-16 11:06:16 UTC
Ok, given that we think we understand the problem, I'm going to go ahead and
close this as NOTABUG. Roland, please feel free to reopen the bug if our
analysis turns out to be incorrect or if you want to discuss it further.

Comment 13 roland.teague 2012-02-16 14:56:51 UTC
So it turns out that I could not reproduce the performance issue as easily as I thought, due to the syntax issue on the Solaris side. The customer is still seeing the performance issue with any block size greater than 4096, but we haven't been able to reproduce it and the customer cannot reproduce it at will. I have asked them to capture tcpdumps when they hit the issue again. They are running RHEL 5.5.