Bug 502884

Summary:

NFSv4 Issue/slowdown when testing against the NetApp server

Product:

Red Hat Enterprise Linux 4

Reporter:

Vivek Goyal <vgoyal>

Component:

kernel

Assignee:

Jeff Layton <jlayton>

Status:

CLOSED ERRATA

QA Contact:

yanfu,wang <yanwang>

Severity:

medium

Docs Contact:

Priority:

high

Version:

4.9

CC:

bikash.choudhury, bikash, dhoward, jburke, lwang, nhorman, pbunyan, ricardo.labiaga, rwheeler, sager, staubach, steved, yanwang

Target Milestone:

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2011-02-16 15:51:29 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

589293

Attachments:

Description	Flags
Network capture	none
test patch -- backport of all 3 patches for 441884	none
capture of rhel4 creating file	none
capture of rhel5 creating file	none
test patch 2 -- backport 3 patches for 441884 and 2 patches to fix setattr handling	none

Description Vivek Goyal 2009-05-27 15:51:22 UTC

Description of problem:

RHEL4 rhts connectathon tests have suddenly started throwing out warnings against NetApp server that tests took too long to finish. Previously we were not seeing these issues and have started seeing these now. Looks like upgrade of netapp nfs server has led to issues.

Looks like, The NFSv4 test in RHEL4 against the NetApp are running 3X slower then
before.

Version-Release number of selected component (if applicable):

RHEL4 U9 (89.1.EL)

How reproducible:
Saw it on all the machines across architecture. So it is easily reproducible.

Steps to Reproduce:
1. Run connectathon tests.
2.
3.
  
Actual results:

Slow finish
Expected results:

Should get back to faster finish as with older versions of server.

Additional info:

Couple of links to rhts warnings.

http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/kernel/filesystems/nfs/connectathon&result=Warn&rwhiteboard=kernel%202.6.9-89.1.EL%20smp&arch=i386&jobids=59909
http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/kernel/filesystems/nfs/connectathon&result=Warn&rwhiteboard=kernel%202.6.9-89.1.EL&arch=x86_64&jobids=59909

Comment 1 Peter Staubach 2009-05-27 16:08:00 UTC

Created attachment 345636 [details]
Network capture

Attaching a network capture of the traffic while the slowness is occuring.

It may or may not be useful, but did look different than the network
traffic during the faster sections of the test run.

Comment 2 Neil Horman 2009-05-27 18:32:34 UTC

Peter, as per your email I took a look at the tcpdump, and as you noted there is a connection which is constantly getting reset (well, closed with a FIN from the server).  I'm not sure if it the source of your problems, but it certainly looks suspicious.  I compared a good NFS v4 call (frame 22) with a data block from a connection that the server closes (frame 20).

If you look at them the first difference is the destination port on the server they are sending to, the good connection uses the nfs program port (733) while the connection that is perpetually getting closed is going to port 59216 on the server.  Theres not enough of the trace to tell, and its possible that a program registered with the portmapper there, but its been my experience that nfs ports tend to be fairly regular and well known.  Thats not to say a different service isn't registered there, although I couldn't say what it is.

Continuing, if you look at the data block in frame 20 and compare it to the good frame, we can see that frame 20 holds what looks like an rpc message (the rpc header is almost identical to that of 22 with obvious differences for xid, etc).  Wireshark seems unable to decode frame 20 propoerly, but a hadn decode shows that it looks like a 148 byte rpc message contained in 1 fragment with an xid of 0x3d46a3af containing a call command in rpc v2 format.  AFter that the wheels come off the wagon, though.  The RPC program number is 0x40000000, which I can not find defined anywhere in any well known rpc program list.  Of course that makes the rest of the frame meaningless, since we don't know what the rest of it should decode as (the program version and call number seem equally garbage-like).  The netapp responds to this in frame 23 with a response containing the correct/same xid as in frame 20, but it appears to be some sort of rpc error (I need to read up on the rpc frame format for a full decode)

So in summary:
1) The RHEL4 client is attempting to connect to an odd port that doesn't appear to belong to the NFS suite

2) The Netapp Server is accepting the connection

3) The RHEL4 client is pusing what appears to be valid rpc data for an unknown RPC program number/version/call

4) The Netapp server responds with what appears to be a valid rpc response containing (I think) some error data

5) The netapp closes the connection

Given that all these closures are occuring on the same port, I think a next logical step would be to run an rpcinfo on the netapp, and figure out what program 0x40000000 and what program is listening on port 59216.  That'll probably give us a better idea of what exactly this exchange is about, and likely give us more information about the change in behavior.

Comment 3 Jeff Layton 2009-05-28 13:53:59 UTC

That unknown RPC call is the server calling back the client to recall a delegation. If you look you can see the client almost always sends an immediate DELEGRETURN call to the server after that exachange completes.

Unfortunately wireshark is really bad at decoding these callbacks. They're pretty simple however and I don't think it's uncommon for the server to close the socket after doing the call.

It looks like the delegation callbacks are pretty universally being done just after a SETATTR call. The server responds with NFS4ERR_DELAY and then recalls the delegation.

The problem here though is that RHEL4 has this:

    #define NFS4_POLL_RETRY_MIN     (1*HZ)

...so it waits a minimum of 1s before retrying the SETATTR call. If we're getting a lot of these sorts of calls then that may explain the problem. In RHEL5 we added a patch a while back that brought down this value to .1s, which is a little more reasonable. Perhaps we should experiment with that for RHEL4 too?

Comment 4 Jeff Layton 2009-05-28 14:04:33 UTC

The RHEL5 case for that problem was bug 441884.

Comment 5 Jeff Layton 2009-05-28 17:56:15 UTC

Created attachment 345808 [details]
test patch -- backport of all 3 patches for 441884

This patch is a rollup of all 3 patches from 441884. I've lightly tested it and it seems to basically work. I don't have a good way to reproduce this here however, so it might be good to have someone who has been able to reproduce it test this patch.

Comment 8 Jeff Layton 2009-06-01 13:09:14 UTC

From my own testing against filerb (note that my clients are much farther away on the network than Peter's):

Unpatched -89.EL:
created 155 files 62 directories 5 levels deep in 183.14 seconds

Patched -89.EL with 3 patch set from above:
created 155 files 62 directories 5 levels deep in 42.6 seconds

RHEL-5 2.6.18-150.el5debug:
created 155 files 62 directories 5 levels deep in 15.61 seconds

...so I'm also seeing a significant difference between RHEL4 and 5 even with the 3 patch set so far. I'll get some captures and see if I can determine the cause.

Comment 9 Jeff Layton 2009-06-01 13:40:45 UTC

Nothing really stands out at me in the post-patch RHEL4 capture. RHEL5 is a bit different, the server never seems to issue a NFS4ERR_DELAY and delegation recall after the setattr from a RHEL5 client, even though the clients are setting the same attribute (FATTR4_TIME_MODIFY_SET to SET_TO_SERVER_TIME4).

We may need some help from netapp to explain why this is.

Comment 10 Jeff Layton 2009-06-01 13:46:12 UTC

cc'ing Bikash and Ricardo...

Can either of you shed any light on the difference I see in comment #9? I can provide captures if they'll help...

Comment 11 Jeff Layton 2009-06-01 19:20:05 UTC

Created attachment 346127 [details]
capture of rhel4 creating file

Comment 12 Jeff Layton 2009-06-01 19:20:34 UTC

Created attachment 346128 [details]
capture of rhel5 creating file

Comment 13 Jeff Layton 2009-06-01 19:25:05 UTC

Ok, here are 2 captures of files being created...

The first thing I notice is that RHEL4 involves a lot more calls. That in and of itself probably accounts for some of the performance differential.

I still don't see exactly why the filer decides to recall the delegation from the RHEL4 client, but not the RHEL5 one on the setattr. The open and setattr calls pretty much look identical. Even though we have a patch to retry in .1s after getting NFS4ERR_DELAY, it would still probably be a big win if we never got such a delay at all.

Comment 14 Ricardo Labiaga 2009-06-12 06:50:45 UTC

Can you run the following commands on the filer to determine OS version and options?

version
options nfs

Comment 15 Jeff Layton 2009-06-12 10:13:03 UTC

Thanks Ricardo, here's the info:

na3070b> version
NetApp Release 7.3.1: Thu Jan  8 01:17:51 PST 2009
na3070b> options nfs
nfs.acache.persistence.enabled on         (value might be overwritten in takeover)
nfs.assist.queue.limit       40         (value might be overwritten in takeover)
nfs.export.allow_provisional_access on         (value might be overwritten in takeover)
nfs.export.auto-update       on         (value might be overwritten in takeover)
nfs.export.exportfs_comment_on_delete on         (value might be overwritten in takeover)
nfs.export.harvest.timeout   1800       (value might be overwritten in takeover)
nfs.export.neg.timeout       3600       (value might be overwritten in takeover)
nfs.export.pos.timeout       36000      (value might be overwritten in takeover)
nfs.export.resolve.timeout   6          (value might be overwritten in takeover)
nfs.hide_snapshot            off        
nfs.ifc.rcv.high             66340      
nfs.ifc.rcv.low              33170      
nfs.ifc.xmt.high             16         
nfs.ifc.xmt.low              8          
nfs.ipv6.enable              off        
nfs.kerberos.enable          off        
nfs.locking.check_domain     on         (value might be overwritten in takeover)
nfs.max_num_aux_groups       32         
nfs.mount_rootonly           on         
nfs.mountd.trace             off        
nfs.netgroup.strict          off        
nfs.notify.carryover         on         
nfs.ntacl_display_permissive_perms off        (value might be overwritten in takeover)
nfs.per_client_stats.enable  off        
nfs.require_valid_mapped_uid off        
nfs.response.trace           off        (value might be overwritten in takeover)
nfs.response.trigger         60         (value might be overwritten in takeover)
nfs.rpcsec.ctx.high          0          
nfs.rpcsec.ctx.idle          360        
nfs.tcp.enable               on         
nfs.thin_prov.ejuke          off        (value might be overwritten in takeover)
nfs.udp.enable               on         
nfs.udp.xfersize             32768      (value might be overwritten in takeover)
nfs.v2.df_2gb_lim            off        (value might be overwritten in takeover)
nfs.v3.enable                on         (value might be overwritten in takeover)
nfs.v4.acl.enable            on         (value might be overwritten in takeover)
nfs.v4.enable                on         (value might be overwritten in takeover)
nfs.v4.id.domain             redhat.com 
nfs.v4.read_delegation       on         (value might be overwritten in takeover)
nfs.v4.setattr_acl_preserve  off        (value might be overwritten in takeover)
nfs.v4.write_delegation      on         (value might be overwritten in takeover)
nfs.webnfs.enable            off        
nfs.webnfs.rootdir           XXX        
nfs.webnfs.rootdir.set       off

Comment 16 Ricardo Labiaga 2009-06-12 20:30:29 UTC

Jeff,

Is RHEL4 using the delegation stateid in the setattr call?  ONTAP 7.3.1 will
actually check for this.

Comment 17 Jeff Layton 2009-06-13 01:17:28 UTC

Good catch, Ricardo. RHEL4 is sending a zeroed out stateid. I'll poke at the code and see if I can figure out why. Thanks!

Comment 18 Jeff Layton 2009-06-18 02:04:56 UTC

Created attachment 348370 [details]
test patch 2 -- backport 3 patches for 441884 and 2 patches to fix setattr handling

Here's an updated test patch.

This adds 2 patches backported from upstream to make setattrs use the delegation stateid when there is one. My results with this are much better -- ~23s to create the files for cthon basic test 1 and it looks like it eliminates the frequent delegation recalls.

Still not quite as fast as RHEL5, but definitely getting into the same ballpark (only about an 8s delta now in my testing). I think the main difference now is that RHEL5 seems to do fewer calls overall on a create. That probably accounts for the difference here.

Peter, when you get time can you test this patch too and let me know if it looks better on your end?

Comment 19 Peter Staubach 2009-06-18 12:44:23 UTC

I will include it into the kernel that I run.

Even at ~23 seconds, this is still a far cry from the NFSv3 times of .3 seconds.
I wonder if this is required or whether there is still something amiss.

Comment 20 Jeff Layton 2009-06-18 12:53:44 UTC

My client is quite a bit farther from the server. NFSv4 + RHEL5 takes ~15s to do the same test.

From a look at traces, RHEL4 definitely makes a lot more calls on the wire for each create. I'm pretty sure some of those could be optimized away, but given that we discourage using NFSv4 on RHEL4, I'm not sure it's really worthwhile.

I've also included these patches in my RHEL4 test kernels:

http://people.redhat.com/jlayton

Comment 21 Jeff Burke 2009-08-24 14:44:04 UTC

After reading Jeff's Comment #20 and thinking about the amount of time this is wasting. On every kernel, every cthon test we have to manually review the WARN. Just to make sure it is this issue we are aware of.

If we are not going to fix this. I would like to disable RHEL4 NFS tests on the NetApp servers.

Comment 22 Bikash 2009-08-24 16:32:34 UTC

I am advocating RHEL5.x to all our field personnel and customers who are attempting to use NFSv4 in production.

Comment 26 RHEL Program Management 2009-08-24 17:41:41 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 27 Vivek Goyal 2010-09-17 17:50:55 UTC

Committed in 89.36.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 30 yanfu,wang 2011-01-07 05:57:33 UTC

I run connectathon testsuite on 2.6.9-89.1.EL which is the initial problem report and run on the new kernel 2.6.9-94.EL again in beaker, the NFSv4 test in RHEL4 against the NetApp nfs server all pass now, pls check the below links:
run against old kernel 2.6.9-89.1.EL:
https://beaker.engineering.redhat.com/jobs/43731
https://beaker.engineering.redhat.com/jobs/43732
https://beaker.engineering.redhat.com/jobs/43734
https://beaker.engineering.redhat.com/jobs/43735
https://beaker.engineering.redhat.com/jobs/43736
https://beaker.engineering.redhat.com/jobs/43742

against new kernel version 2.6.9-94.EL:
i386: https://beaker.engineering.redhat.com/jobs/43750
x86_64: https://beaker.engineering.redhat.com/jobs/43751, https://beaker.engineering.redhat.com/jobs/44029
ppc64: https://beaker.engineering.redhat.com/jobs/43753
s390x： https://beaker.engineering.redhat.com/jobs/43755
s390: https://beaker.engineering.redhat.com/jobs/44028
ia64: https://beaker.engineering.redhat.com/jobs/43759

Comment 31 errata-xmlrpc 2011-02-16 15:51:29 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html