Description of problem:

Current versions of Linux have an issue when you use file locking: they can
end up using a lot of stateids if one or several applications do a lot of
lock/unlock cycles but the file stateid never gets CLOSEd (which can happen
if at least one application has the file open at any point in time). That is
a bug that the above patch set aims to fix by adding client-side support for
the RELEASE_LOCKOWNER operation.

Some versions of Linux also had a problem with delegation management: instead
of returning delegations for files that had seen no recent activity, they
would continue to hoard them until either the server recalled them, or the
entire inode dropped out of cache, or the application performed some action
that required the delegation return. This has been fixed since Linux 2.6.29:
we now return delegations that have seen no activity in the last lease
period.

Can we backport the fix to the 2.6.18 kernel?

Version-Release number of selected component (if applicable):
RHEL5.3

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
It appears that Linux clients constantly request a new owner ID when new
locks are requested. There appears to be an issue in Linux 2.6.18.x in
releasing NFSv4 LOCK state. I set up a comparison test between Solaris and
Linux. Data is in the attachment.

solaris_nfsv4_lock_ulock_packet.txt

No.  Time      delta     Source        Destination   Protocol Info
144  2.863063  0.000063  10.61.75.210  10.61.69.240  NFS      V4 COMPOUND Call (Reply In 145) lock PUTFH;LOCK
       Opcode: LOCK (12)
         locktype: WRITEW_LT (4)
         reclaim?: No
         offset: 0
         length: 18446744073709551615
         new lock owner?: No        <-- NOT requesting a new lock owner, reusing the same owner ID
         stateid seqid: 0x00000039  Data: A298574C0400000000000000
         lock_seqid: 0x0000003a

REQUESTING ANOTHER LOCK

No.  Time      delta     Source        Destination   Protocol Info
148  2.863563  0.000063  10.61.75.210  10.61.69.240  NFS      V4 COMPOUND Call (Reply In 149) lock PUTFH;LOCK
       Opcode: LOCK (12)
         locktype: WRITEW_LT (4)
         reclaim?: No
         offset: 0
         length: 18446744073709551615
         new lock owner?: No        <----
         stateid seqid: 0x0000003b  Data: A298574C0400000000000000
         lock_seqid: 0x0000003c

Whereas the Linux RHEL 5u5 packet trace:

No.   Time      delta     Source       Destination   Protocol Info
1261  4.828593  0.000055  10.61.73.40  10.61.69.240  NFS      V4 COMPOUND Call (Reply In 1262) <EMPTY> PUTFH;LOCK
        Opcode: LOCK (12)
          locktype: WRITEW_LT (4)
          reclaim?: No
          offset: 0
          length: 18446744073709551615
          new lock owner?: Yes      <--- requesting new owner
          seqid: 0x00000134
          stateid seqid: 0x00000000  Data: 4C0B574C0800000000000000
          lock_seqid: 0x00000000
          Owner
            clientid: 0x030001004c0b574c
            Data: <DATA>
              length: 4
              contents: <DATA>

This stress lock test program ultimately causes the NetApp filer to run out
of resources and send back NFS4ERR_RESOURCE (10018).
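For illustration, the stress pattern described above can be sketched in a few lines of C: the file stays open the whole time while whole-file POSIX locks are taken and dropped in a loop, which is exactly the situation where a client without RELEASE_LOCKOWNER support accumulates lock-owner state on the server. This is a minimal sketch, not the attached reproducer; the path and cycle count are placeholders.

```c
#include <fcntl.h>
#include <unistd.h>

/* Run n whole-file write-lock/unlock cycles on path while keeping the
 * file open; returns the number of completed cycles (n on success). */
static int run_lock_cycles(const char *path, int n)
{
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return 0;

    struct flock fl = {
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,   /* 0 == whole file, matching the traces above */
    };

    int done = 0;
    for (; done < n; done++) {
        fl.l_type = F_WRLCK;
        if (fcntl(fd, F_SETLKW, &fl) < 0)   /* take the write lock */
            break;
        fl.l_type = F_UNLCK;
        if (fcntl(fd, F_SETLK, &fl) < 0)    /* drop it again */
            break;
    }
    close(fd);
    return done;
}
```

Pointing the path at a file on an NFSv4 mount and watching the server's lock-owner table (as in the filer statistics later in this bug) would show the owner-ID growth on an affected client.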
Created attachment 436190 [details] test data for both solaris and rhel5u5
The original description of this bug is from one of Trond's emails
(http://article.gmane.org/gmane.linux.nfs/33685)

The discussion thread(s):
* remote I/O error nfs4err_resource
  (http://thread.gmane.org/gmane.linux.nfsv4/11583/)
* Re: remote I/O error nfs4err_resource
  (http://thread.gmane.org/gmane.linux.nfsv4/11630/)

There seem to be two sets of issues here when referring to RHEL 5, one
involving locking and the other delegation.

Locking
-------
* kernel.org Bug 14501 - NFSv4 mount: Open,fork,Lock,Read: Read denied
  because it uses the OPEN state instead of the LOCK state
  (https://bugzilla.kernel.org/show_bug.cgi?id=14501)
* Trond's patches to fix kernel.org Bug 14501:
  [PATCH 0/5] NFSv4 Lock owner fixes
  (http://thread.gmane.org/gmane.linux.nfs/33353)

Delegation
----------
* Presumably the relevant patches were a part of these:
  [PATCH 00/69] NFS client patches queued for the 2.6.29 merge window
  (http://thread.gmane.org/gmane.linux.nfs/23822/)

Regards,
Harshula
Adding some context to this bug report. This bugzilla was a result of a
customer-reported problem. A large number of RHEL 5.3 NFS clients mounting an
NFSv4 share from a NetApp server suddenly started showing the following log
message repeatedly and could no longer access the share from the NetApp:

NFS: v4 server returned a bad sequence-id error!

We have the following information posted by a NetApp engineer:

--------<snip>---------
OnTAP has a fixed-size table of stateids: that is a known limitation, but it
is also a deliberate choice in order to ensure that the server resources can
be managed in a predictable manner. OnTAP also has a bug which can affect you
when you turn on delegations: if the server runs out of free stateid table
slots, then the natural thing would be for it to start asking clients to
return delegations. This is not yet implemented in OnTAP and so Burt 388695
is tracking this issue.

Current versions of Linux have an issue when you use file locking: they can
end up using a lot of stateids if one or several applications do a lot of
lock/unlock cycles but the file stateid never gets CLOSEd (which can happen
if at least one application has the file open at any point in time). That is
a bug that the above patch set aims to fix by adding client-side support for
the RELEASE_LOCKOWNER operation.

Older versions of Linux also had a problem with delegation management:
instead of returning delegations for files that had seen no recent activity,
they would continue to hoard them until either the server recalled them, or
the entire inode dropped out of cache, or the application performed some
action that required the delegation return. This has been fixed since Linux
2.6.29: we now return delegations that have seen no activity in the last
lease period.
--------<snip>---------
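The delegation-management fix described in the quote (voluntarily return delegations that have been idle for a full lease period, instead of waiting for a recall or inode eviction) boils down to a timestamp check. The sketch below is illustrative only; the structure and names are not the actual Linux implementation, which lives in fs/nfs/delegation.c.

```c
#include <stdbool.h>
#include <time.h>

/* Illustrative stand-in for the client's per-file delegation record. */
struct delegation {
    time_t last_use;   /* updated on each open/read/write under the delegation */
};

/* Return true when the delegation has seen no activity for a whole
 * lease period and should be handed back voluntarily, rather than
 * hoarded until the server recalls it or the inode leaves the cache. */
static bool delegation_is_idle(const struct delegation *d,
                               time_t now, time_t lease_period)
{
    return now - d->last_use >= lease_period;
}
```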
A correction to the note in c#6. The clients were all RHEL 5.4.
NetApp does have a workaround implemented by BURT 354058
(http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=354058). This
changes the reply from NetApp: instead of ERR_RESOURCE we now send
ERR_JUKEBOX. However, the customer wants to understand the root cause of why
we have so many owner IDs associated with their production NetApp filer. At
one time during their production load they had 70+ Linux 5.4 clients using
NFSv4.
> instead of returning delegations for files that had seen no recent
> activity, they would continue to hoard them until either the server
> recalled them, or the entire inode dropped out of cache, ...

If the locks and delegations are returned when the inodes are freed from the
inode slab cache, perhaps a "sysctl vm.drop_caches=2" or a "mount -o remount"
of the NFSv4 volumes would help return them earlier and serve as a workaround
to clear up resources at the filer?
NetApp provided additional information:

* The customer is currently using ONTAP 7.3.2P6D1 without BURT 354058
  (http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=354058).
* The customer's NetApp filers have nfs.v4.read_delegation and
  nfs.v4.write_delegation turned off. So we need to focus on Trond's locking
  patches (http://thread.gmane.org/gmane.linux.nfs/33353).
* Samuel Li will add to the BZ a 64-bit build of the reproducers.
Created attachment 436465 [details] reproducer (includes 32 bit & 64 bit binaries) provided by Samuel Li
A question regarding the error returned. I may be wrong here so would like
some clarification.

The error

NFS: v4 server returned a bad sequence-id error!

is returned in:

static struct nfs4_state *nfs4_do_open(struct inode *dir, struct path *path,
                                       int flags, struct iattr *sattr,
                                       struct rpc_cred *cred)
{
        ..
        do {
                status = _nfs4_do_open(dir, path, flags, sattr, cred, &res);
                if (status == 0)
                        break;
                /* NOTE: BAD_SEQID means the server and client disagree about the
                 * book-keeping w.r.t. state-changing operations
                 * (OPEN/CLOSE/LOCK/LOCKU...)
                 * It is actually a sign of a bug on the client or on the server.
                 *
                 * If we receive a BAD_SEQID error in the particular case of
                 * doing an OPEN, we assume that nfs_increment_open_seqid() will
                 * have unhashed the old state_owner for us, and that we can
                 * therefore safely retry using a new one. We should still warn
                 * the user though...
                 */
                if (status == -NFS4ERR_BAD_SEQID) {
                        printk(KERN_WARNING "NFS: v4 server returned a bad sequence-id error!\n");
                        exception.retry = 1;
                        continue;
                }
        ..
}

At the point when the OPEN call is made, the stateid is not yet set. In this
case, the NFS4ERR_BAD_SEQID return means that the client-set seqid, used to
maintain proper ordering of the OPEN request, used a wrong sequence number
for this lockowner. This should not be related to the sequence id in the
stateid at all.

Also, from the mail threads referenced in c#5, on exhaustion of the stateids
an NFS4ERR_RESOURCE is returned. Are we sure we are seeing the same problem
here?
I guess the RESOURCE errors on the NetApp server could also lead to the
server returning the BAD_SEQID error.

Do we also see the

NFS: v4 server returned a bad sequence-id error!

errors on the test systems when the reproducer is run to generate the problem
with RESOURCE errors?
From 2045648_NFSv4_request_new_owner/rhel5u5_64_nfsv4_lock_unlock.trc :

No.     Time                        Source        Destination  Protocol Info
131330  2010-08-03 13:48:46.284749  10.61.69.240  10.61.73.40  NFS      V4 COMP Reply (Call In 131327) <EMPTY> PUTFH;CLOSE CLOSE(10026)
...
Network File System, Ops(2): PUTFH(0) CLOSE(10026)
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_BAD_SEQID (10026)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 2)
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: CLOSE (4)
            Status: NFS4ERR_BAD_SEQID (10026)
As Hash explained, we have a tcpdump within the attachment "test data for
both solaris and rhel5u5" which confirms that we do indeed see this error
returned by the server.

rhel5u5_64_nfsv4_lock_unlock.trc

65596  COMP Call  (Reply In 65597)  <EMPTY> PUTFH;LOCK   LOCK - seqid: 0x00004000
65597  COMP Reply (Call In 65596)   <EMPTY> PUTFH;LOCK   LOCK(10018)   NFS4ERR_RESOURCE
65598  COMP Call  (Reply In 65601)  <EMPTY> PUTFH;CLOSE  CLOSE;GETATTR GETATTR - seqid: 0x00004001
65601  COMP Reply (Call In 65598)   <EMPTY> PUTFH;CLOSE  CLOSE(10026)  NFS4ERR_BAD_SEQID

This could be an unrelated bug. From RFC 3530:
--
The client MUST monotonically increment the sequence number for the CLOSE,
LOCK, LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE operations. This is true
even in the event that the previous operation that used the sequence number
received an error. The only exception to this rule is if the previous
operation received one of the following errors: NFS4ERR_STALE_CLIENTID,
NFS4ERR_STALE_STATEID, NFS4ERR_BAD_STATEID, NFS4ERR_BAD_SEQID,
NFS4ERR_BADXDR, NFS4ERR_RESOURCE, NFS4ERR_NOFILEHANDLE.
--

The sequence id should not be incremented when NFS4ERR_RESOURCE is returned.
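The RFC 3530 rule quoted above reduces to a simple predicate: the seqid is bumped after every state-changing operation, even on error, unless the error is one of the seven listed exceptions. A sketch of that check (error-code values taken from RFC 3530; the function name is illustrative, not the kernel's):

```c
#include <stdbool.h>

/* NFSv4 error codes relevant to the seqid rule (values per RFC 3530). */
enum {
    NFS4ERR_RESOURCE       = 10018,
    NFS4ERR_NOFILEHANDLE   = 10020,
    NFS4ERR_STALE_CLIENTID = 10022,
    NFS4ERR_STALE_STATEID  = 10023,
    NFS4ERR_BAD_STATEID    = 10025,
    NFS4ERR_BAD_SEQID      = 10026,
    NFS4ERR_BADXDR         = 10036,
};

/* The client must increment the seqid after CLOSE, LOCK, LOCKU, OPEN,
 * OPEN_CONFIRM, and OPEN_DOWNGRADE even on error, EXCEPT when the
 * previous operation received one of these errors. */
static bool seqid_should_increment(int nfs_error)
{
    switch (nfs_error) {
    case NFS4ERR_STALE_CLIENTID:
    case NFS4ERR_STALE_STATEID:
    case NFS4ERR_BAD_STATEID:
    case NFS4ERR_BAD_SEQID:
    case NFS4ERR_BADXDR:
    case NFS4ERR_RESOURCE:
    case NFS4ERR_NOFILEHANDLE:
        return false;
    default:
        return true;
    }
}
```

In the trace above, the client bumped the seqid (0x00004000 to 0x00004001) after LOCK failed with NFS4ERR_RESOURCE, which this predicate forbids, and the server's BAD_SEQID on the following CLOSE is consistent with that.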
Created attachment 436599 [details] Test data for rhel6beta2
I just uploaded another test I performed on RHEL 6 Beta 2. There I don't get
the NFS4ERR_BAD_SEQID. Does it matter?

From the filer side, in RHEL 5u5 when we reached an NFS4ERR_RESOURCE it
appears the owner IDs allocated were never cleaned up correctly.

linux_nfsv4_lock_c_being  <-- this displays the NetApp filer lock data
Unconfirmed client count      0
Clients with valid callbacks  2
Free client count             8189
Free owners 8190; In-Use Owners 2

When we hit ERR_RESOURCE:

linux_nfsv4_lock_c_err_resource
Clients with valid callbacks  3
Free client count             8188
Free owners 510; In-Use Owners 16386  <-- NetApp still honors the owner IDs

In RHEL 6 Beta 2:

lock_c_rhel6b2_begin
Clients with valid callbacks  2
Free client count             8188
Free owners 16885; In-Use Owners 11   <-- we begin

lock_c_rhel6b2_after
Clients with valid callbacks  2
Free client count             8188
Free owners 16884; In-Use Owners 12   <--- after ERR_RESOURCE, cleanup happened

Snippet of the trace added to the attachment
linux_rhel6b2.nfsv4_locl_unlock_end.trc:

37  COMP call   LOCK
39  COMP reply  LOCK   NFS4ERR_RESOURCE 10018
40  COMP call   CLOSE
42  COMP reply  CLOSE  NFS4_OK
This looks like the issue that was fixed by commit 52567b03ca38b6e556ced450d64dba8d66e23b0e (NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE).
Besides Trond's patch for NFS4ERR_BAD_SEQID when ERR_RESOURCE has been
returned, are there plans for RELEASE_LOCKOWNER to be implemented in the
RHEL 5 codeline?

Operation 39: RELEASE_LOCKOWNER - Release Lockowner State

DESCRIPTION

This operation is used to notify the server that the lock_owner is no longer
in use by the client. This allows the server to release cached state related
to the specified lock_owner. If file locks, associated with the lock_owner,
are held at the server, the error NFS4ERR_LOCKS_HELD will be returned and no
further action will be taken.

IMPLEMENTATION

The client may choose to use this operation to ease the amount of server
state that is held. Depending on behavior of applications at the client, it
may be important for the client to use this operation since the server has
certain obligations with respect to holding a reference to a lock_owner as
long as the associated file is open. Therefore, if the client knows for
certain that the lock_owner will no longer be used under the context of the
associated open_owner4, it should use RELEASE_LOCKOWNER.
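The server-side semantics quoted above can be sketched as a small state check: release the cached lock_owner state only when no file locks remain, otherwise refuse with NFS4ERR_LOCKS_HELD and take no further action. The struct and function names here are illustrative, not from any real server implementation; the error value is from RFC 3530.

```c
/* Status codes (values per RFC 3530). */
enum { NFS4_OK = 0, NFS4ERR_LOCKS_HELD = 10037 };

/* Illustrative stand-in for the server's per-lock_owner state. */
struct lock_owner_state {
    int locks_held;   /* byte-range locks still held under this lock_owner */
    int released;     /* set once the server has freed the cached state */
};

/* Handle a RELEASE_LOCKOWNER request for one lock_owner, as described
 * in the DESCRIPTION text above. */
static int release_lockowner(struct lock_owner_state *lo)
{
    if (lo->locks_held > 0)
        return NFS4ERR_LOCKS_HELD;  /* no further action is taken */
    lo->released = 1;               /* server frees the cached state */
    return NFS4_OK;
}
```

The client-side half of the backport is then to issue this operation once it knows a lock_owner will no longer be used, which is what keeps the filer's owner table from filling up.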
Created attachment 436817 [details] Backport of Trond's NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE
A backport of RELEASE_LOCKOWNER has been built for RHEL 5.x by Steve and it
is working. I have been testing the new kernel on 6 x RHEL 5.x boxes and a
NetApp cluster filer.

Tests performed:
* lock tests, approx. 1000 locks per client (6 x clients)
* additional load added
* NetApp CFO every 5 min on both filer heads
* fibre loop saturation
* disk issues (2 reconstructs, double degraded, and zeroing drives)
* NFS traffic using sio transferring 100 MiB/sec (2 x clients)
* internal NetApp read/write test

After 12 hours all programs are still active and running smoothly. There are
some longer latencies, around 40 sec. This is expected due to the NetApp
cluster failover/giveback procedure.

[root@rhel5u5-64rtp nfsv4]# rsh 10.61.69.240 "priv set diag; lock status -c "
Warning: These diagnostic commands are for use by NetApp personnel only.
Free StateID count            32743
New StateID count             0
Revoked StateID count         0
Closed StateID count          0
Confirm-Wait StateID count    0
Normal StateID count          24      <-----
Expired StateID count         0
Dlg_Revoked StateID count     0
Idle client count             1
Leasing client count          6
Expired client count          0
Unconfirmed client count      0
Clients with valid callbacks  6
Free client count             8185
Free owners 8178; In-Use Owners 14    <--- 6 x RHEL clients

systat from the NetApp filer:

 CPU    NFS   CIFS   HTTP   Total     Net kB/s      Disk kB/s
                                     in     out    read   write
 59%  14561      0      0   14561  26457   38088   92492  47328
 89%  12120      0      0   12120  24793  121865   95588  38480  <--- 121 MiB/sec net out
 45%  12983      0      0   12983   6786  121720   86184  57344
 45%  13126      0      0   13126   4683  122097   92188  53256
 39%  18857      0      0   18857   5593   70040   92320  38928
 25%  19540      0      0   19540   3204    1524   89644  54016
 24%  19241      0      0   19241   3155    1501   89279  48863
 23%  18188      0      0   18188   2987    1424   94438  36186
 76%  15281      0      0   15281 102219    3361   92084  53599
 23%  18247      0      0   18247   2992    1424   86176  52872
 23%  18760      0      0   18760   3076    1463   89424  38672
*** Bug 537151 has been marked as a duplicate of this bug. ***
in kernel-2.6.18-222.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
*** Bug 510980 has been marked as a duplicate of this bug. ***
Do not have hardware to test. And according to
https://bugzilla.redhat.com/show_bug.cgi?id=620502#c11, we can rely on the
positive result provided by the NetApp engineer at
https://bugzilla.redhat.com/show_bug.cgi?id=620502#c34.

Did a code review and confirmed the patch apply is sane.

fsstress job links:
https://beaker.engineering.redhat.com/jobs/26703
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=177847
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=177848
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=177850
Technical note added. If any revisions are required, please edit the
"Technical Notes" field accordingly. All revisions will be proofread by the
Engineering Content Services team.

New Contents:
Previously, a large number of Red Hat Enterprise Linux NFS clients mounting
an NFSv4 share from a server would show the following log message repeatedly
and could no longer access the share from the server:

NFS: v4 server returned a bad sequence-id error!

With this update, the error is no longer returned.
An advisory has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of
ERRATA. For more information on the solution and/or where to find the
updated files, please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
Technical note updated. If any revisions are required, please edit the
"Technical Notes" field accordingly. All revisions will be proofread by the
Engineering Content Services team.

Diffed Contents:
@@ -1,5 +1,3 @@
-Previously, a large number of Red Hat Enterprise Linux NFS clients mounting a NFSv4 share from a server would show the following log messages repeatedly and could no longer access the share from the server:
+The RELEASE_LOCKOWNER operation has been implemented for NFSv4 in order to avoid an exhaustion of NFS server state IDs, which could result in an NFS4ERR_RESOURCE error. Furthermore, if the NFS4ERR_RESOURCE error is returned by the NFS server, it is now handled correctly, thus preventing a possible reason for the following error:
 
-NFS: v4 server returned a bad sequence-id error!
+NFS: v4 server returned a bad sequence-id error!
-
-With this update, the error is no longer returned.