Description of problem:

Current versions of Linux have an issue when you use file locking: they can
end up using a lot of stateids if one or several applications do a lot of
lock/unlock cycles but the file stateid never gets CLOSEd (which can happen
if at least one application has the file open at any point in time). That is
a bug that the above patch set aims to fix by adding client-side support for
the RELEASE_LOCKOWNER operation.

Some versions of Linux also had a problem with delegation management: instead
of returning delegations for files that had seen no recent activity, they
would continue to hoard them until either the server recalled them, or the
entire inode dropped out of cache, or the application performed some action
that required the delegation return. This has been fixed since Linux 2.6.29:
we now return delegations that have seen no activity in the last lease
period.

Can we backport the fix to the 2.6.18 kernel?

Version-Release number of selected component (if applicable):
RHEL5.3

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
It appears that Linux clients constantly request a new owner ID when new
locks are requested. There appears to be an issue in Linux 2.6.18.x in
releasing NFSv4 LOCK state. I set up a comparison test between Solaris and
Linux. Data is in the attachment.

solaris_nfsv4_lock_ulock_packet.txt

No.  Time      delta     Source        Destination   Protocol Info
144  2.863063  0.000063  10.61.75.210  10.61.69.240  NFS      V4 COMPOUND Call (Reply In 145) lock PUTFH;LOCK
       Opcode: LOCK (12)
         locktype: WRITEW_LT (4)
         reclaim?: No
         offset: 0
         length: 18446744073709551615
         new lock owner?: No        <-- NOT requesting a new lock owner, reusing the same owner ID
         stateid seqid: 0x00000039  Data: A298574C0400000000000000
         lock_seqid: 0x0000003a

REQUESTING ANOTHER LOCK

No.  Time      delta     Source        Destination   Protocol Info
148  2.863563  0.000063  10.61.75.210  10.61.69.240  NFS      V4 COMPOUND Call (Reply In 149) lock PUTFH;LOCK
       Opcode: LOCK (12)
         locktype: WRITEW_LT (4)
         reclaim?: No
         offset: 0
         length: 18446744073709551615
         new lock owner?: No        <----
         stateid seqid: 0x0000003b  Data: A298574C0400000000000000
         lock_seqid: 0x0000003c

Whereas the Linux RHEL 5u5 packet trace:

No.   Time      delta     Source       Destination   Protocol Info
1261  4.828593  0.000055  10.61.73.40  10.61.69.240  NFS      V4 COMPOUND Call (Reply In 1262) <EMPTY> PUTFH;LOCK
        Opcode: LOCK (12)
          locktype: WRITEW_LT (4)
          reclaim?: No
          offset: 0
          length: 18446744073709551615
          new lock owner?: Yes      <--- requesting new owner
          seqid: 0x00000134
          stateid seqid: 0x00000000  Data: 4C0B574C0800000000000000
          lock_seqid: 0x00000000
          Owner
            clientid: 0x030001004c0b574c
            Data: <DATA>
              length: 4
              contents: <DATA>

This stress lock test program ultimately causes the NetApp filer to run out
of resources and send back NFS4ERR_RESOURCE (10018).
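For illustration, the stress pattern described above can be sketched in a few lines of C: the file stays open the whole time while whole-file POSIX locks are taken and dropped in a loop, which is exactly the situation where a client without RELEASE_LOCKOWNER support accumulates lock-owner state on the server. This is a minimal sketch, not the attached reproducer; the path and cycle count are placeholders.

```c
#include <fcntl.h>
#include <unistd.h>

/* Run n whole-file write-lock/unlock cycles on path while keeping the
 * file open; returns the number of completed cycles (n on success). */
static int run_lock_cycles(const char *path, int n)
{
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return 0;

    struct flock fl = {
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,   /* 0 == whole file, matching the traces above */
    };

    int done = 0;
    for (; done < n; done++) {
        fl.l_type = F_WRLCK;
        if (fcntl(fd, F_SETLKW, &fl) < 0)   /* take the write lock */
            break;
        fl.l_type = F_UNLCK;
        if (fcntl(fd, F_SETLK, &fl) < 0)    /* drop it again */
            break;
    }
    close(fd);
    return done;
}
```

Pointing the path at a file on an NFSv4 mount and watching the server's lock-owner table (as in the filer statistics later in this bug) would show the owner-ID growth on an affected client.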
Created attachment 436190 [details] test data for both solaris and rhel5u5
The original description of this bug is from one of Trond's emails
(http://article.gmane.org/gmane.linux.nfs/33685)

The discussion thread(s):
* remote I/O error nfs4err_resource
  (http://thread.gmane.org/gmane.linux.nfsv4/11583/)
* Re: remote I/O error nfs4err_resource
  (http://thread.gmane.org/gmane.linux.nfsv4/11630/)

There seem to be two sets of issues here when referring to RHEL 5, one
involving locking and the other delegation.

Locking
-------
* kernel.org Bug 14501 - NFSv4 mount: Open,fork,Lock,Read: Read denied
  because it uses the OPEN state instead of the LOCK state
  (https://bugzilla.kernel.org/show_bug.cgi?id=14501)
* Trond's patches to fix kernel.org Bug 14501:
  [PATCH 0/5] NFSv4 Lock owner fixes
  (http://thread.gmane.org/gmane.linux.nfs/33353)

Delegation
----------
* Presumably the relevant patches were a part of these:
  [PATCH 00/69] NFS client patches queued for the 2.6.29 merge window
  (http://thread.gmane.org/gmane.linux.nfs/23822/)

Regards,
Harshula
Adding some context to this bug report. This bugzilla was a result of a
customer-reported problem. A large number of RHEL 5.3 NFS clients mounting an
NFSv4 share from a NetApp server suddenly started showing the following log
message repeatedly and could no longer access the share from the NetApp:

NFS: v4 server returned a bad sequence-id error!

We have the following information posted by a NetApp engineer:

--------<snip>---------
OnTAP has a fixed-size table of stateids: that is a known limitation, but it
is also a deliberate choice in order to ensure that the server resources can
be managed in a predictable manner. OnTAP also has a bug which can affect you
when you turn on delegations: if the server runs out of free stateid table
slots, then the natural thing would be for it to start asking clients to
return delegations. This is not yet implemented in OnTAP and so Burt 388695
is tracking this issue.

Current versions of Linux have an issue when you use file locking: they can
end up using a lot of stateids if one or several applications do a lot of
lock/unlock cycles but the file stateid never gets CLOSEd (which can happen
if at least one application has the file open at any point in time). That is
a bug that the above patch set aims to fix by adding client-side support for
the RELEASE_LOCKOWNER operation.

Older versions of Linux also had a problem with delegation management:
instead of returning delegations for files that had seen no recent activity,
they would continue to hoard them until either the server recalled them, or
the entire inode dropped out of cache, or the application performed some
action that required the delegation return. This has been fixed since Linux
2.6.29: we now return delegations that have seen no activity in the last
lease period.
--------<snip>---------
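The delegation-management fix described in the quote (voluntarily return delegations that have been idle for a full lease period, instead of waiting for a recall or inode eviction) boils down to a timestamp check. The sketch below is illustrative only; the structure and names are not the actual Linux implementation, which lives in fs/nfs/delegation.c.

```c
#include <stdbool.h>
#include <time.h>

/* Illustrative stand-in for the client's per-file delegation record. */
struct delegation {
    time_t last_use;   /* updated on each open/read/write under the delegation */
};

/* Return true when the delegation has seen no activity for a whole
 * lease period and should be handed back voluntarily, rather than
 * hoarded until the server recalls it or the inode leaves the cache. */
static bool delegation_is_idle(const struct delegation *d,
                               time_t now, time_t lease_period)
{
    return now - d->last_use >= lease_period;
}
```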
A correction to the note in c#6. The clients were all RHEL 5.4.
NetApp does have a workaround implemented by BURT 354058
(http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=354058). This
changes the reply from NetApp: instead of ERR_RESOURCE we now send
ERR_JUKEBOX. However, the customer wants to understand the root cause of why
we have so many owner IDs associated with their production NetApp filer. At
one time during their production load they had 70+ Linux 5.4 clients using
NFSv4.
> instead of returning delegations for files that had seen no recent
> activity, they would continue to hoard them until either the server
> recalled them, or the entire inode dropped out of cache, ...

If the locks and delegations are returned when the inodes are freed from the
inode slab cache, perhaps a "sysctl vm.drop_caches=2" or a "mount -o remount"
of the NFSv4 volumes would help return them earlier and serve as a workaround
to clear up resources at the filer?
NetApp provided additional information:

* The customer is currently using ONTAP 7.3.2P6D1 without BURT 354058
  (http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=354058).
* The customer's NetApp filers have nfs.v4.read_delegation and
  nfs.v4.write_delegation turned off. So we need to focus on Trond's locking
  patches (http://thread.gmane.org/gmane.linux.nfs/33353).
* Samuel Li will add to the BZ a 64-bit build of the reproducers.
Created attachment 436465 [details] reproducer (includes 32 bit & 64 bit binaries) provided by Samuel Li
A question regarding the error returned. I may be wrong here so would like
some clarification.

The error

NFS: v4 server returned a bad sequence-id error!

is returned in:

static struct nfs4_state *nfs4_do_open(struct inode *dir, struct path *path,
                                       int flags, struct iattr *sattr,
                                       struct rpc_cred *cred)
{
        ..
        do {
                status = _nfs4_do_open(dir, path, flags, sattr, cred, &res);
                if (status == 0)
                        break;
                /* NOTE: BAD_SEQID means the server and client disagree about the
                 * book-keeping w.r.t. state-changing operations
                 * (OPEN/CLOSE/LOCK/LOCKU...)
                 * It is actually a sign of a bug on the client or on the server.
                 *
                 * If we receive a BAD_SEQID error in the particular case of
                 * doing an OPEN, we assume that nfs_increment_open_seqid() will
                 * have unhashed the old state_owner for us, and that we can
                 * therefore safely retry using a new one. We should still warn
                 * the user though...
                 */
                if (status == -NFS4ERR_BAD_SEQID) {
                        printk(KERN_WARNING "NFS: v4 server returned a bad sequence-id error!\n");
                        exception.retry = 1;
                        continue;
                }
        ..
}

At the point when the OPEN call is made, the stateid is not yet set. In this
case, the NFS4ERR_BAD_SEQID return means that the client-set seqid, used to
maintain proper ordering of the OPEN request, used a wrong sequence number
for this lockowner. This should not be related to the sequence id in the
stateid at all.

Also, from the mail threads referenced in c#5, on exhaustion of the stateids
an NFS4ERR_RESOURCE is returned. Are we sure we are seeing the same problem
here?
I guess the RESOURCE errors on the NetApp server could also lead to the
server returning the BAD_SEQID error.

Do we also see the

NFS: v4 server returned a bad sequence-id error!

errors on the test systems when the reproducer is run to generate the problem
with RESOURCE errors?
From 2045648_NFSv4_request_new_owner/rhel5u5_64_nfsv4_lock_unlock.trc :

No.     Time                        Source        Destination  Protocol Info
131330  2010-08-03 13:48:46.284749  10.61.69.240  10.61.73.40  NFS      V4 COMP Reply (Call In 131327) <EMPTY> PUTFH;CLOSE CLOSE(10026)
...
Network File System, Ops(2): PUTFH(0) CLOSE(10026)
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_BAD_SEQID (10026)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 2)
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: CLOSE (4)
            Status: NFS4ERR_BAD_SEQID (10026)
As Hash explained, we have a tcpdump within the attachment "test data for
both solaris and rhel5u5" which confirms that we do indeed see this error
returned by the server.

rhel5u5_64_nfsv4_lock_unlock.trc

65596  COMP Call  (Reply In 65597)  <EMPTY> PUTFH;LOCK   LOCK - seqid: 0x00004000
65597  COMP Reply (Call In 65596)   <EMPTY> PUTFH;LOCK   LOCK(10018)   NFS4ERR_RESOURCE
65598  COMP Call  (Reply In 65601)  <EMPTY> PUTFH;CLOSE  CLOSE;GETATTR GETATTR - seqid: 0x00004001
65601  COMP Reply (Call In 65598)   <EMPTY> PUTFH;CLOSE  CLOSE(10026)  NFS4ERR_BAD_SEQID

This could be an unrelated bug. From RFC 3530:
--
The client MUST monotonically increment the sequence number for the CLOSE,
LOCK, LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE operations. This is true
even in the event that the previous operation that used the sequence number
received an error. The only exception to this rule is if the previous
operation received one of the following errors: NFS4ERR_STALE_CLIENTID,
NFS4ERR_STALE_STATEID, NFS4ERR_BAD_STATEID, NFS4ERR_BAD_SEQID,
NFS4ERR_BADXDR, NFS4ERR_RESOURCE, NFS4ERR_NOFILEHANDLE.
--

The sequence id should not be incremented when NFS4ERR_RESOURCE is returned.
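The RFC 3530 rule quoted above reduces to a simple predicate: the seqid is bumped after every state-changing operation, even on error, unless the error is one of the seven listed exceptions. A sketch of that check (error-code values taken from RFC 3530; the function name is illustrative, not the kernel's):

```c
#include <stdbool.h>

/* NFSv4 error codes relevant to the seqid rule (values per RFC 3530). */
enum {
    NFS4ERR_RESOURCE       = 10018,
    NFS4ERR_NOFILEHANDLE   = 10020,
    NFS4ERR_STALE_CLIENTID = 10022,
    NFS4ERR_STALE_STATEID  = 10023,
    NFS4ERR_BAD_STATEID    = 10025,
    NFS4ERR_BAD_SEQID      = 10026,
    NFS4ERR_BADXDR         = 10036,
};

/* The client must increment the seqid after CLOSE, LOCK, LOCKU, OPEN,
 * OPEN_CONFIRM, and OPEN_DOWNGRADE even on error, EXCEPT when the
 * previous operation received one of these errors. */
static bool seqid_should_increment(int nfs_error)
{
    switch (nfs_error) {
    case NFS4ERR_STALE_CLIENTID:
    case NFS4ERR_STALE_STATEID:
    case NFS4ERR_BAD_STATEID:
    case NFS4ERR_BAD_SEQID:
    case NFS4ERR_BADXDR:
    case NFS4ERR_RESOURCE:
    case NFS4ERR_NOFILEHANDLE:
        return false;
    default:
        return true;
    }
}
```

In the trace above, the client bumped the seqid (0x00004000 to 0x00004001) after LOCK failed with NFS4ERR_RESOURCE, which this predicate forbids, and the server's BAD_SEQID on the following CLOSE is consistent with that.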
Created attachment 436599 [details] Test data for rhel6beta2
I just uploaded another test I performed on RHEL 6 Beta 2. There I don't get
the NFS4ERR_BAD_SEQID. Does it matter?

From the filer side, in RHEL 5u5 when we reached an NFS4ERR_RESOURCE it
appears the owner IDs allocated were never cleaned up correctly.

linux_nfsv4_lock_c_being  <-- this displays the NetApp filer lock data
Unconfirmed client count      0
Clients with valid callbacks  2
Free client count             8189
Free owners 8190; In-Use Owners 2

When we hit ERR_RESOURCE:

linux_nfsv4_lock_c_err_resource
Clients with valid callbacks  3
Free client count             8188
Free owners 510; In-Use Owners 16386  <-- NetApp still honors the owner IDs

In RHEL 6 Beta 2:

lock_c_rhel6b2_begin
Clients with valid callbacks  2
Free client count             8188
Free owners 16885; In-Use Owners 11   <-- we begin

lock_c_rhel6b2_after
Clients with valid callbacks  2
Free client count             8188
Free owners 16884; In-Use Owners 12   <--- after ERR_RESOURCE, cleanup happened

Snippet of the trace added to the attachment
linux_rhel6b2.nfsv4_locl_unlock_end.trc:

37  COMP call   LOCK
39  COMP reply  LOCK   NFS4ERR_RESOURCE 10018
40  COMP call   CLOSE
42  COMP reply  CLOSE  NFS4_OK
This looks like the issue that was fixed by commit 52567b03ca38b6e556ced450d64dba8d66e23b0e (NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE).
Besides Trond's patch for NFS4ERR_BAD_SEQID when ERR_RESOURCE has been
returned, are there plans for RELEASE_LOCKOWNER to be implemented in the
RHEL 5 codeline?

Operation 39: RELEASE_LOCKOWNER - Release Lockowner State

DESCRIPTION

This operation is used to notify the server that the lock_owner is no longer
in use by the client. This allows the server to release cached state related
to the specified lock_owner. If file locks, associated with the lock_owner,
are held at the server, the error NFS4ERR_LOCKS_HELD will be returned and no
further action will be taken.

IMPLEMENTATION

The client may choose to use this operation to ease the amount of server
state that is held. Depending on behavior of applications at the client, it
may be important for the client to use this operation since the server has
certain obligations with respect to holding a reference to a lock_owner as
long as the associated file is open. Therefore, if the client knows for
certain that the lock_owner will no longer be used under the context of the
associated open_owner4, it should use RELEASE_LOCKOWNER.
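The server-side semantics quoted above can be sketched as a small state check: release the cached lock_owner state only when no file locks remain, otherwise refuse with NFS4ERR_LOCKS_HELD and take no further action. The struct and function names here are illustrative, not from any real server implementation; the error value is from RFC 3530.

```c
/* Status codes (values per RFC 3530). */
enum { NFS4_OK = 0, NFS4ERR_LOCKS_HELD = 10037 };

/* Illustrative stand-in for the server's per-lock_owner state. */
struct lock_owner_state {
    int locks_held;   /* byte-range locks still held under this lock_owner */
    int released;     /* set once the server has freed the cached state */
};

/* Handle a RELEASE_LOCKOWNER request for one lock_owner, as described
 * in the DESCRIPTION text above. */
static int release_lockowner(struct lock_owner_state *lo)
{
    if (lo->locks_held > 0)
        return NFS4ERR_LOCKS_HELD;  /* no further action is taken */
    lo->released = 1;               /* server frees the cached state */
    return NFS4_OK;
}
```

The client-side half of the backport is then to issue this operation once it knows a lock_owner will no longer be used, which is what keeps the filer's owner table from filling up.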
Created attachment 436817 [details] Backport of Trond's NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE
A backport of RELEASE_LOCKOWNER has been built for RHEL 5.x by Steve and it
is working. I have been testing the new kernel on 6 x RHEL 5.x boxes and a
NetApp cluster filer.

Tests performed:
* lock tests, approx. 1000 locks per client (6 x clients)
* additional load added
* NetApp CFO every 5 min on both filer heads
* fibre loop saturation
* disk issues (2 reconstructs, double degraded, and zeroing drives)
* NFS traffic using sio transferring 100 MiB/sec (2 x clients)
* internal NetApp read/write test

After 12 hours all programs are still active and running smoothly. There are
some longer latencies, around 40 sec. This is expected due to the NetApp
cluster failover/giveback procedure.

[root@rhel5u5-64rtp nfsv4]# rsh 10.61.69.240 "priv set diag; lock status -c "
Warning: These diagnostic commands are for use by NetApp personnel only.
Free StateID count            32743
New StateID count             0
Revoked StateID count         0
Closed StateID count          0
Confirm-Wait StateID count    0
Normal StateID count          24      <-----
Expired StateID count         0
Dlg_Revoked StateID count     0
Idle client count             1
Leasing client count          6
Expired client count          0
Unconfirmed client count      0
Clients with valid callbacks  6
Free client count             8185
Free owners 8178; In-Use Owners 14    <--- 6 x RHEL clients

systat from the NetApp filer:

 CPU    NFS   CIFS   HTTP   Total     Net kB/s      Disk kB/s
                                     in     out    read   write
 59%  14561      0      0   14561  26457   38088   92492  47328
 89%  12120      0      0   12120  24793  121865   95588  38480  <--- 121 MiB/sec net out
 45%  12983      0      0   12983   6786  121720   86184  57344
 45%  13126      0      0   13126   4683  122097   92188  53256
 39%  18857      0      0   18857   5593   70040   92320  38928
 25%  19540      0      0   19540   3204    1524   89644  54016
 24%  19241      0      0   19241   3155    1501   89279  48863
 23%  18188      0      0   18188   2987    1424   94438  36186
 76%  15281      0      0   15281 102219    3361   92084  53599
 23%  18247      0      0   18247   2992    1424   86176  52872
 23%  18760      0      0   18760   3076    1463   89424  38672
*** Bug 537151 has been marked as a duplicate of this bug. ***
in kernel-2.6.18-222.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
*** Bug 510980 has been marked as a duplicate of this bug. ***
Do not have hardware to test. And according to
https://bugzilla.redhat.com/show_bug.cgi?id=620502#c11, we can rely on the
positive result provided by the NetApp engineer at
https://bugzilla.redhat.com/show_bug.cgi?id=620502#c34.

Did a code review and confirmed the patch apply is sane.

fsstress job links:
https://beaker.engineering.redhat.com/jobs/26703
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=177847
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=177848
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=177850
Technical note added. If any revisions are required, please edit the
"Technical Notes" field accordingly. All revisions will be proofread by the
Engineering Content Services team.

New Contents:
Previously, a large number of Red Hat Enterprise Linux NFS clients mounting
an NFSv4 share from a server would show the following log message repeatedly
and could no longer access the share from the server:

NFS: v4 server returned a bad sequence-id error!

With this update, the error is no longer returned.
An advisory has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of
ERRATA. For more information on the solution and/or where to find the
updated files, please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
Technical note updated. If any revisions are required, please edit the
"Technical Notes" field accordingly. All revisions will be proofread by the
Engineering Content Services team.

Diffed Contents:
@@ -1,5 +1,3 @@
-Previously, a large number of Red Hat Enterprise Linux NFS clients mounting a NFSv4 share from a server would show the following log messages repeatedly and could no longer access the share from the server:
+The RELEASE_LOCKOWNER operation has been implemented for NFSv4 in order to avoid an exhaustion of NFS server state IDs, which could result in an NFS4ERR_RESOURCE error. Furthermore, if the NFS4ERR_RESOURCE error is returned by the NFS server, it is now handled correctly, thus preventing a possible reason for the following error:
 
-NFS: v4 server returned a bad sequence-id error!
+NFS: v4 server returned a bad sequence-id error!
-
-With this update, the error is no longer returned.