Bug 1552203 - RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations
Summary: RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUE...
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
: ---
Assignee: Dave Wysochanski
QA Contact: Zhi Li
Depends On:
Blocks: 1477664 1577173 1594286 1687360 1689811
TreeView+ depends on / blocked
Reported: 2018-03-06 16:46 UTC by Dave Wysochanski
Modified: 2022-03-13 14:45 UTC (History)
25 users (show)

Fixed In Version: kernel-3.10.0-1025.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1687360 1689811 (view as bug list)
Last Closed: 2019-08-06 12:11:02 UTC
Target Upstream Version:

Attachments (Terms of Use)
nfs4 client script to detect loop condition and issue a message to stop a tcpdump (1.95 KB, application/octet-stream)
2019-03-08 11:01 UTC, Dave Wysochanski
no flags Details
nfs4 server script to inject the SEQ4_STATUS_RECALLABLE_STATE_REVOKED bit on SEQUENCE reply (902 bytes, text/plain)
2019-03-08 11:02 UTC, Dave Wysochanski
no flags Details
nfs4 client script to detect loop condition and issue a message to stop a tcpdump (1.96 KB, text/plain)
2019-03-08 17:18 UTC, Dave Wysochanski
no flags Details
tcpdump script to be used with nfs4_detect_protocol_loop_sequence_flags.stp (3.48 KB, text/plain)
2019-03-08 17:18 UTC, Dave Wysochanski
no flags Details

System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3915571 0 None None None 2019-02-16 17:45:35 UTC
Red Hat Product Errata RHSA-2019:2029 0 None None None 2019-08-06 12:12:11 UTC

Description Dave Wysochanski 2018-03-06 16:46:51 UTC
Description of problem:
The RHEL7.4 NFS4.1 client can get into a state where it is streaming TEST_STATEIDs for the same stateid over and over and receiving NFS4_OK back from a Linux NFS server, but with the SEQUENCE reply containing SEQ4_STATUS_RECALLABLE_STATE_REVOKED bit set.

Version-Release number of selected component (if applicable):
Seen on 3.10.0-693.2.1.el7 (client and server)

How reproducible:
TBD but customer can reproduce fairly regularly

Steps to Reproduce:

Actual results:
NFS client sends the same TEST_STATEID over and over

Expected results:
NFS client should only send TEST_STATEID one time and then handle the reply or error.

Additional info:
This customer is an openshift customer so they are having problems with openshift as a result.
We have tcpdumps showing this but not sure if we have what leads up to this.  Also customer has been complaining about MySQL being unable to start due to unable to lock files but we're not sure if this is a separate problem or related to this bug.

Comment 2 Dave Wysochanski 2018-03-06 16:51:57 UTC
This may be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1459733 due to the kernel version but it looks different enough for now I filed a separate bug.

Comment 3 Dave Wysochanski 2018-08-17 18:25:30 UTC
It appears likely this commit yet to be backported to RHEL7 may explain this issue:

commit 95da1b3a5aded124dd1bda1e3cdb876184813140
Author: Andrew Elble <aweits@rit.edu>
Date:   Fri Nov 3 14:06:31 2017 -0400

    nfsd: deal with revoked delegations appropriately
    If a delegation has been revoked by the server, operations using that
    delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1
    case, and NFS4ERR_BAD_STATEID otherwise.
    The server needs NFSv4.1 clients to explicitly free revoked delegations.
    If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
    otherwise it may just forget about the delegation and be unable to
    recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
    SEQUENCE reply.  That can cause the Linux 4.1 client to loop in its
    stage manager.
    Signed-off-by: Andrew Elble <aweits@rit.edu>
    Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ecbc7b0..b8281776 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4016,7 +4016,8 @@ static struct nfs4_delegation *find_deleg_stateid(struct nfs4_client *cl, statei
        struct nfs4_stid *ret;
-       ret = find_stateid_by_type(cl, s, NFS4_DELEG_STID);
+       ret = find_stateid_by_type(cl, s,
+                               NFS4_DELEG_STID|NFS4_REVOKED_DELEG_STID);
        if (!ret)
                return NULL;
        return delegstateid(ret);
@@ -4039,6 +4040,12 @@ static bool nfsd4_is_deleg_cur(struct nfsd4_open *open)
        deleg = find_deleg_stateid(cl, &open->op_delegate_stateid);
        if (deleg == NULL)
                goto out;
+       if (deleg->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID) {
+               nfs4_put_stid(&deleg->dl_stid);
+               if (cl->cl_minorversion)
+                       status = nfserr_deleg_revoked;
+               goto out;
+       }
        flags = share_access_to_flags(open->op_share_access);
        status = nfs4_check_delegmode(deleg, flags);
        if (status) {
@@ -4908,6 +4915,16 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
                     struct nfs4_stid **s, struct nfsd_net *nn)
        __be32 status;
+       bool return_revoked = false;
+       /*
+        *  only return revoked delegations if explicitly asked.
+        *  otherwise we report revoked or bad_stateid status.
+        */
+       if (typemask & NFS4_REVOKED_DELEG_STID)
+               return_revoked = true;
+       else if (typemask & NFS4_DELEG_STID)
+               typemask |= NFS4_REVOKED_DELEG_STID;
        if (ZERO_STATEID(stateid) || ONE_STATEID(stateid))
                return nfserr_bad_stateid;
@@ -4922,6 +4939,12 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
        *s = find_stateid_by_type(cstate->clp, stateid, typemask);
        if (!*s)
                return nfserr_bad_stateid;
+       if (((*s)->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) {
+               nfs4_put_stid(*s);
+               if (cstate->minorversion)
+                       return nfserr_deleg_revoked;
+               return nfserr_bad_stateid;
+       }
        return nfs_ok;

Comment 4 Dave Wysochanski 2018-08-17 18:32:24 UTC
I am marking for RHEL7.7 for now and we can do backports to z-streams as needed or even reconsider for RHEL7.6 if an urgent need is demonstrated and engineering agrees.  Probably too late for RHEL7.6 but possibly not out of the question due to stable CC.

Comment 24 Dave Wysochanski 2019-03-07 23:01:01 UTC
Note: The upstream thread refers to the following commit fixing a related NFS client bug, and was added in RHEL7.4

commit 26d36301bd653df6481fd38f3e1435a1f15e56d1
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Thu Sep 22 13:39:05 2016 -0400

    NFSv4.1: Ensure we call FREE_STATEID if needed on close/delegreturn/locku
    or NFS4ERR_EXPIRED on a call to close, open_downgrade, delegreturn, or
    locku, we should call FREE_STATEID before attempting to recover.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Oleg Drokin <green@linuxhacker.ru>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Comment 25 Dave Wysochanski 2019-03-07 23:05:10 UTC
TODO: Check what happens during a CB_RECALL error - does the NFS client just drop the delegation?  Could we have missed another location in the NFS client that needs to call nfs4_free_revoked_stateid()?

Comment 26 Dave Wysochanski 2019-03-08 11:01:18 UTC
Created attachment 1542088 [details]
nfs4 client script to detect loop condition and issue a message to stop a tcpdump

Comment 27 Dave Wysochanski 2019-03-08 11:02:04 UTC
Created attachment 1542089 [details]
nfs4 server script to inject the SEQ4_STATUS_RECALLABLE_STATE_REVOKED bit on SEQUENCE reply

Comment 28 Dave Wysochanski 2019-03-08 17:18:04 UTC
Created attachment 1542182 [details]
nfs4 client script to detect loop condition and issue a message to stop a tcpdump

Comment 29 Dave Wysochanski 2019-03-08 17:18:48 UTC
Created attachment 1542183 [details]
tcpdump script to be used with nfs4_detect_protocol_loop_sequence_flags.stp

Comment 31 Dave Wysochanski 2019-03-14 21:12:07 UTC
I have been attempting to reproduce this with various methods but so far no luck.

Comment 32 Bruno Meneguele 2019-03-15 19:18:49 UTC
Patch(es) committed on kernel-3.10.0-1025.el7

Comment 36 Dave Wysochanski 2019-04-04 20:17:40 UTC
FWIW, I have reproduced this issue with the following error injection to cause igrab to fail inside nfs_delegation_find_inode_server

When igrab fails in the NFS client while handling a recall the client will not send a DELEGRETURN.
As a result of not sending a DELEGRETURN, the NFS server's cl_revoked list will remain non-empty until the nfs4_client is destroyed on the server (the client unmounts the share)

     71 __be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy,
     72                             struct cb_process_state *cps)
     73 {
     74         struct inode *inode;
     75         __be32 res;
     77         res = htonl(NFS4ERR_OP_NOT_IN_SESSION);
     78         if (!cps->clp) /* Always set for v4.0. Set in cb_sequence for v4.1 */
     79                 goto out;
     81         dprintk_rcu("NFS: RECALL callback request from %s\n",
     82                 rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));
     84         res = htonl(NFS4ERR_BADHANDLE);
     85         inode = nfs_delegation_find_inode(cps->clp, &args->fh);
     86         if (inode == NULL) {  <--------------------------------------- inode == NULL when igrab() fails
     87                 trace_nfs4_cb_recall(cps->clp, &args->fh, NULL,
     88                                 &args->stateid, -ntohl(res));
     89                 goto out;
     90         }
     91         /* Set up a helper thread to actually return the delegation */
     92         switch (nfs_async_inode_return_delegation(inode, &args->stateid)) {  <-------- skips this which sends DELEGRETURN
     93         case 0:
     94                 res = 0;
     95                 break;
     96         case -ENOENT:
     97                 res = htonl(NFS4ERR_BAD_STATEID);
     98                 break;
     99         default:
    100                 res = htonl(NFS4ERR_RESOURCE);
    101         }
    102         trace_nfs4_cb_recall(cps->clp, &args->fh, inode,
    103                         &args->stateid, -ntohl(res));
    104         iput(inode);
    105 out:
    106         dprintk("%s: exit with status = %d\n", __func__, ntohl(res));
    107         return res;
    108 }

This may be more what happened in the real world to trigger this issue.  What I see on my NFS client after I run that stap is every 2/3*nfs4leasetime when the SEQUENCE is sent, the reply comes with SEQ4_STATUS_RECALLABLE_STATE_REVOKED and then a series of OPEN / TEST_STATEIDs again.  This goes on forever until the NFS client unmounts the share.

Comment 39 Zhi Li 2019-07-09 06:35:59 UTC
 Didn't find any new nfs regression issues, so set sanityonly+verify for this bug now.

Comment 41 errata-xmlrpc 2019-08-06 12:11:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.