Hide Forgot
Description of problem: The RHEL7.4 NFS4.1 client can get into a state where it is streaming TEST_STATEIDs for the same stateid over and over and receiving NFS4_OK back from a Linux NFS server, but with the SEQUENCE reply containing SEQ4_STATUS_RECALLABLE_STATE_REVOKED bit set. Version-Release number of selected component (if applicable): Seen on 3.10.0-693.2.1.el7 (client and server) NFSv4.1 How reproducible: TBD but customer can reproduce fairly regularly Steps to Reproduce: TBD Actual results: NFS client sends the same TEST_STATEID over and over Expected results: NFS client should only send TEST_STATEID one time and then handle the reply or error. Additional info: This customer is an openshift customer so they are having problems with openshift as a result. We have tcpdumps showing this but not sure if we have what leads up to this. Also customer has been complaining about MySQL being unable to start due to unable to lock files but we're not sure if this is a separate problem or related to this bug.
This may be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1459733 due to the kernel version but it looks different enough for now I filed a separate bug.
It appears likely this commit yet to be backported to RHEL7 may explain this issue: commit 95da1b3a5aded124dd1bda1e3cdb876184813140 Author: Andrew Elble <aweits@rit.edu> Date: Fri Nov 3 14:06:31 2017 -0400 nfsd: deal with revoked delegations appropriately If a delegation has been revoked by the server, operations using that delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1 case, and NFS4ERR_BAD_STATEID otherwise. The server needs NFSv4.1 clients to explicitly free revoked delegations. If the server returns NFS4ERR_DELEG_REVOKED, the client will do that; otherwise it may just forget about the delegation and be unable to recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a SEQUENCE reply. That can cause the Linux 4.1 client to loop in its stage manager. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index ecbc7b0..b8281776 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4016,7 +4016,8 @@ static struct nfs4_delegation *find_deleg_stateid(struct nfs4_client *cl, statei { struct nfs4_stid *ret; - ret = find_stateid_by_type(cl, s, NFS4_DELEG_STID); + ret = find_stateid_by_type(cl, s, + NFS4_DELEG_STID|NFS4_REVOKED_DELEG_STID); if (!ret) return NULL; return delegstateid(ret); @@ -4039,6 +4040,12 @@ static bool nfsd4_is_deleg_cur(struct nfsd4_open *open) deleg = find_deleg_stateid(cl, &open->op_delegate_stateid); if (deleg == NULL) goto out; + if (deleg->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID) { + nfs4_put_stid(&deleg->dl_stid); + if (cl->cl_minorversion) + status = nfserr_deleg_revoked; + goto out; + } flags = share_access_to_flags(open->op_share_access); status = nfs4_check_delegmode(deleg, flags); if (status) { @@ -4908,6 +4915,16 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid) struct nfs4_stid **s, struct nfsd_net *nn) { __be32 status; + bool return_revoked = false; + + /* + * only return revoked delegations if explicitly asked. + * otherwise we report revoked or bad_stateid status. + */ + if (typemask & NFS4_REVOKED_DELEG_STID) + return_revoked = true; + else if (typemask & NFS4_DELEG_STID) + typemask |= NFS4_REVOKED_DELEG_STID; if (ZERO_STATEID(stateid) || ONE_STATEID(stateid)) return nfserr_bad_stateid; @@ -4922,6 +4939,12 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid) *s = find_stateid_by_type(cstate->clp, stateid, typemask); if (!*s) return nfserr_bad_stateid; + if (((*s)->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) { + nfs4_put_stid(*s); + if (cstate->minorversion) + return nfserr_deleg_revoked; + return nfserr_bad_stateid; + } return nfs_ok; }
I am marking for RHEL7.7 for now and we can do backports to z-streams as needed or even reconsider for RHEL7.6 if an urgent need is demonstrated and engineering agrees. Probably too late for RHEL7.6 but possibly not out of the question due to stable CC.
Note: The upstream thread refers to the following commit fixing a related NFS client bug, and was added in RHEL7.4 commit 26d36301bd653df6481fd38f3e1435a1f15e56d1 Author: Trond Myklebust <trond.myklebust@primarydata.com> Date: Thu Sep 22 13:39:05 2016 -0400 NFSv4.1: Ensure we call FREE_STATEID if needed on close/delegreturn/locku If a server returns NFS4ERR_ADMIN_REVOKED, NFS4ERR_DELEG_REVOKED or NFS4ERR_EXPIRED on a call to close, open_downgrade, delegreturn, or locku, we should call FREE_STATEID before attempting to recover. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
TODO: Check what happens during a CB_RECALL error - does the NFS client just drop the delegation? Could we have missed another location in the NFS client that needs to call nfs4_free_revoked_stateid()?
Created attachment 1542088 [details] nfs4 client script to detect loop condition and issue a message to stop a tcpdump
Created attachment 1542089 [details] nfs4 server script to inject the SEQ4_STATUS_RECALLABLE_STATE_REVOKED bit on SEQUENCE reply
Created attachment 1542182 [details] nfs4 client script to detect loop condition and issue a message to stop a tcpdump
Created attachment 1542183 [details] tcpdump script to be used with nfs4_detect_protocol_loop_sequence_flags.stp
I have been attempting to reproduce this with various methods but so far no luck.
Patch(es) committed on kernel-3.10.0-1025.el7
FWIW, I have reproduced this issue with the following error injection to cause igrab to fail inside nfs_delegation_find_inode_server http://git.engineering.redhat.com/git/users/dwysocha/archived-cases.git/plain/bugzilla/1687360/nfs_delegation_find_inode_server_fail_igrab.stp When igrab fails in the NFS client while handling a recall the client will not send a DELEGRETURN. As a result of not sending a DELEGRETURN, the NFS server's cl_revoked list will remain non-empty until the nfs4_client is destroyed on the server (the client unmounts the share) 71 __be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy, 72 struct cb_process_state *cps) 73 { 74 struct inode *inode; 75 __be32 res; 76 77 res = htonl(NFS4ERR_OP_NOT_IN_SESSION); 78 if (!cps->clp) /* Always set for v4.0. Set in cb_sequence for v4.1 */ 79 goto out; 80 81 dprintk_rcu("NFS: RECALL callback request from %s\n", 82 rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR)); 83 84 res = htonl(NFS4ERR_BADHANDLE); 85 inode = nfs_delegation_find_inode(cps->clp, &args->fh); 86 if (inode == NULL) { <--------------------------------------- inode == NULL when igrab() fails 87 trace_nfs4_cb_recall(cps->clp, &args->fh, NULL, 88 &args->stateid, -ntohl(res)); 89 goto out; 90 } 91 /* Set up a helper thread to actually return the delegation */ 92 switch (nfs_async_inode_return_delegation(inode, &args->stateid)) { <-------- skips this which sends DELEGRETURN 93 case 0: 94 res = 0; 95 break; 96 case -ENOENT: 97 res = htonl(NFS4ERR_BAD_STATEID); 98 break; 99 default: 100 res = htonl(NFS4ERR_RESOURCE); 101 } 102 trace_nfs4_cb_recall(cps->clp, &args->fh, inode, 103 &args->stateid, -ntohl(res)); 104 iput(inode); 105 out: 106 dprintk("%s: exit with status = %d\n", __func__, ntohl(res)); 107 return res; 108 } This may be more what happened in the real world to trigger this issue. What I see on my NFS client after I run that stap is every 2/3*nfs4leasetime when the SEQUENCE is sent, the reply comes with SEQ4_STATUS_RECALLABLE_STATE_REVOKED and then a series of OPEN / TEST_STATEIDs again. This goes on forever until the NFS client unmounts the share.
Didn't find any new nfs regression issues, so set sanityonly+verify for this bug now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2029