Bug 638269
Summary: NFS4 clients cannot reclaim locks after server reboot
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Sachin Prabhu <sprabhu>
Assignee: Sachin Prabhu <sprabhu>
QA Contact: yanfu,wang <yanwang>
CC: bfields, dhoward, jiali, jlayton, rwheeler, steved, tscofield, yanwang
Target Milestone: rc
Keywords: ZStream
Fixed In Version: kernel-2.6.32-84.el6
Doc Type: Bug Fix
Doc Text: The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation which resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by the improper use of the state flags. While investigating this bug, a second bug was discovered in the state recovery operation which caused a reclaim thread to loop in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.
Last Closed: 2011-05-23 20:53:49 UTC
Bug Blocks: 661730
Description: Sachin Prabhu, 2010-09-28 16:02:31 UTC
Created attachment 450232 [details]
Reproducer
Easy reproducer:
Usage:
1) Mount an NFS4 share on a RHEL 6 client.
2) ./fl_test <filename>
The reproducer opens a file, obtains a lock, and writes to the file every second (a minimal sketch of such a program appears after the steps below).
3) Reboot the server.
Once the server is back up, the client attempts to reclaim locks but is never successful.
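The attachment itself is not reproduced inline here; the following is a minimal sketch of a program matching the usage text above (the whole-file write lock and one-byte writes are assumptions, not the verbatim attachment):

```c
/* Hypothetical reconstruction of the fl_test reproducer: open a file on
 * an NFSv4 mount, take a POSIX write lock, then write once per second.
 * After the server reboots, the writes show whether reclaim succeeded.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,		/* 0 length = lock the whole file */
	};
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		perror("fcntl(F_SETLKW)");
		return 1;
	}

	for (;;) {
		/* keep writing so reclaim failures surface as write errors */
		if (write(fd, "x", 1) < 0)
			perror("write");
		sleep(1);
	}
}
```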
The tcpdump shows that the client makes a WRITE request and a RENEW request.
It receives a STALE_CLIENTID for the RENEW call and a STALE_STATEID for the WRITE call.
At this point the client re-establishes its client ID using SETCLIENTID.
However, it never sends a request to OPEN the file; it instead proceeds to reclaim the locks using the stale stateid.
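(As an aside, a capture like the one referenced above can be taken on the client with a standard tcpdump invocation, for example:)

```
# capture full NFS packets (port 2049) on the client for later analysis
tcpdump -s 0 -w /tmp/nfs4-reclaim.pcap port 2049
```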
There seem to be two issues here.

Issue 1:

The problem appears to happen because a STALE_STATEID is received around the same time as the STALE_CLIENTID. When a STALE_STATEID is received, the flag NFS_STATE_RECLAIM_REBOOT is set on the open state which is now stale. For the STALE_CLIENTID, the following code path is taken:

nfs4_state_manager() -> nfs4_check_lease() -> ops->renew_lease(clp, cred)

which returns -NFS4ERR_STALE_CLIENTID. This is handled by the error handler:

nfs4_state_manager() -> nfs4_check_lease() -> nfs4_recovery_handle_error(clp, status)

```c
static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
{
..
	case -NFS4ERR_STALE_CLIENTID:
	case -NFS4ERR_LEASE_MOVED:
		set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
		nfs4_state_end_reclaim_reboot(clp);
		nfs4_state_start_reclaim_reboot(clp);
..
}
```

nfs4_recovery_handle_error() in turn calls nfs4_state_end_reclaim_reboot(clp), which walks the list of state owners and then all the states owned by each state owner:

```c
static void nfs4_state_end_reclaim_reboot(struct nfs_client *clp)
{
..
	for (pos = rb_first(&clp->cl_state_owners); pos != NULL; pos = rb_next(pos)) {
		sp = rb_entry(pos, struct nfs4_state_owner, so_client_node);
		spin_lock(&sp->so_lock);
		list_for_each_entry(state, &sp->so_states, open_states) {
			if (!test_and_clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags))
				continue;
			nfs4_state_mark_reclaim_nograce(clp, state);
		}
		spin_unlock(&sp->so_lock);
	}
..
}
```

It encounters the state with NFS_STATE_RECLAIM_REBOOT set from the failed WRITE command, and nfs4_state_mark_reclaim_nograce() ends up swapping the NFS_STATE_RECLAIM_REBOOT flag for NFS_STATE_RECLAIM_NOGRACE. This is wrong since we are still in the grace period; that flag is meant to be set only when a state could not be reclaimed during the grace period. We then end up in another part of nfs4_state_manager(), where we fail because of issue 2.

By commenting out the call to nfs4_state_end_reclaim_reboot() in nfs4_recovery_handle_error(), I was able to fix the lock reclaim behaviour:

```diff
--- nfs4state.c.orig	2010-09-21 12:26:53.890007302 +0100
+++ nfs4state.c	2010-09-28 12:31:41.000000000 +0100
@@ -1161,7 +1161,7 @@ static int nfs4_recovery_handle_error(st
 	case -NFS4ERR_STALE_CLIENTID:
 	case -NFS4ERR_LEASE_MOVED:
 		set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
-		nfs4_state_end_reclaim_reboot(clp);
+		//nfs4_state_end_reclaim_reboot(clp);
 		nfs4_state_start_reclaim_reboot(clp);
 		break;
 	case -NFS4ERR_EXPIRED:
```

Issue 2:

After issue 1, the flag NFS_STATE_RECLAIM_NOGRACE is set on the affected state. To reclaim the state, the following code path is taken:

nfs4_state_manager() -> nfs4_do_reclaim() -> nfs4_reclaim_open_state() -> ops->recover_open()

This calls nfs4_open_expired(), which attempts to reopen the file. However, the following code in nfs4_open_prepare() prevents the new OPEN call from actually being sent out:

```c
static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
{
..
	if (can_open_cached(data->state, data->o_arg.fmode, data->o_arg.open_flags))
		goto out_no_action; /* <-- HERE */
..
}
```

nfs4_reclaim_locks() is subsequently called from nfs4_reclaim_open_state(). It ends up using the old stateid, for which the server returns STALE_STATEID, and the client ends up looping within nfs4_reclaim_open_state().
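For reference, the cached-open check that triggers issue 2 looks roughly like the following in 2.6.32-era fs/nfs/nfs4proc.c (a simplified paraphrase, not a verbatim quote): if the state still carries an open-mode bit for the requested mode, the open is satisfied from cache and nothing goes on the wire.

```c
/* Paraphrased, simplified sketch of can_open_cached(); the real function
 * also consults the per-mode open counters. Because recovery never clears
 * the NFS_O_*_STATE bits, this returns true and nfs4_open_prepare() takes
 * the out_no_action path, so no OPEN is ever sent to the rebooted server.
 */
static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode)
{
	int ret = 0;

	if (open_mode & O_EXCL)
		goto out;
	switch (mode & (FMODE_READ|FMODE_WRITE)) {
	case FMODE_READ:
		ret = test_bit(NFS_O_RDONLY_STATE, &state->flags) != 0;
		break;
	case FMODE_WRITE:
		ret = test_bit(NFS_O_WRONLY_STATE, &state->flags) != 0;
		break;
	case FMODE_READ|FMODE_WRITE:
		ret = test_bit(NFS_O_RDWR_STATE, &state->flags) != 0;
	}
out:
	return ret;
}
```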
Created attachment 452098 [details]
Test patch

This is a combination of two separate bugs hit when attempting to reclaim locks.

---
http://article.gmane.org/gmane.linux.nfs/35846

NFSv4: Fix open recovery

From: Trond Myklebust <Trond.Myklebust>

NFSv4 open recovery is currently broken: since we do not clear the state->flags states before attempting recovery, we end up with the 'can_open_cached()' function triggering. This again leads to no OPEN call being put on the wire.

Reported-by: Sachin Prabhu <sprabhu>
Signed-off-by: Trond Myklebust <Trond.Myklebust>
---

---
http://article.gmane.org/gmane.linux.nfs/35847

NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers

From: Trond Myklebust <Trond.Myklebust>

In the case of a server reboot, the state recovery thread starts by calling nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when the server reboots while the client is in the middle of recovery. However, if the client has already marked the nfs4_state as requiring reboot recovery, then the above behaviour will cause the recovery thread to treat the open as if it was part of such an edge condition: the open will be recovered as if it was part of a lease expiration (and all the locks will be lost).

The fix is to remove the call to nfs4_state_mark_reclaim_reboot() from nfs4_async_handle_error() and nfs4_handle_exception(). Instead we leave it to the recovery thread to do this for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust>
---

Signed-off-by: Sachin Prabhu <sprabhu>

Test packages based on the patch above successfully fix the problem for me.

Both patches have been committed upstream, with the same descriptions as above:

- b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c "NFSv4: Fix open recovery"
- ae1007d37e00144b72906a4bdc47d517ae91bcc1 "NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers"

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

We have another case reported. In this case:

1) The client holds a lock.
2) The server reboots.
3) The client attempts to take the lock again while still holding the old lock. The lock request returns EIO.

This patch also seems to fix that particular situation.
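To illustrate the approach of the first upstream patch (a sketch of the idea described in its commit message, not the verbatim upstream diff): before recovering an open, the state's cached open-mode bits are cleared, so can_open_cached() fails and a real OPEN is put on the wire.

```c
/* Illustration of the "NFSv4: Fix open recovery" approach; not the
 * verbatim upstream change. With the cached open-mode bits cleared
 * before recovery, nfs4_open_prepare() can no longer short-circuit
 * through can_open_cached() and must send a real OPEN, which gives
 * the client a fresh stateid to reclaim locks against.
 */
static void nfs4_clear_open_state(struct nfs4_state *state)
{
	clear_bit(NFS_DELEGATED_STATE, &state->flags);
	clear_bit(NFS_O_RDONLY_STATE, &state->flags);
	clear_bit(NFS_O_WRONLY_STATE, &state->flags);
	clear_bit(NFS_O_RDWR_STATE, &state->flags);
}
```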
Patch(es) available on kernel-2.6.32-89.el6

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation which resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by the improper use of the state flags. While investigating this bug, a second bug was discovered in the state recovery operation which caused a reclaim thread to loop in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.

Hi Sachin,
I verified on the RHEL 6.1 Snapshot4 kernel and got the results below; please confirm whether this is the expected behaviour, thanks in advance.

```
...
123.678798 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 3069) <EMPTY> PUTFH;COMMIT;GETATTR
124.679037 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
124.896339 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
125.332313 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
126.204308 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
127.948326 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
131.436334 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
138.412330 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
152.364326 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
180.268331 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
184.678385 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.678682 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
610.678709 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.680200 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11719) <EMPTY> RENEW(10022)
610.680341 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.680549 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11724) <EMPTY> RENEW(10022)
610.680606 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> SETCLIENTID
610.680798 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11726) <EMPTY> SETCLIENTID
610.680833 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> SETCLIENTID_CONFIRM;PUTROOTFH;GETATTR
610.682097 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11718) <EMPTY> PUTFH;WRITE(10023)
610.686359 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11728) <EMPTY> SETCLIENTID_CONFIRM;PUTROOTFH;GETATTR
610.686412 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;OPEN;GETATTR
610.686525 10.16.42.210 -> 10.16.64.158 NFS V1 CB_NULL Call
610.686580 10.16.64.158 -> 10.16.42.210 NFS V1 CB_NULL Reply (Call In 11736)
610.686641 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11734) <EMPTY> PUTFH;OPEN(10033)
610.686760 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.686939 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11741) <EMPTY> RENEW
610.686988 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;SAVEFH;OPEN;GETFH;GETATTR;RESTOREFH;GETATTR
610.689236 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11743) <EMPTY> PUTFH;SAVEFH;OPEN;GETFH;GETATTR;RESTOREFH;GETATTR
610.689328 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;OPEN_CONFIRM
610.730572 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11745) <EMPTY> PUTFH;OPEN_CONFIRM
610.730648 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;LOCK
610.730871 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11748) <EMPTY> PUTFH;LOCK
610.730944 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
610.744400 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11751) <EMPTY> PUTFH;WRITE;GETATTR
610.744513 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;COMMIT;GETATTR
610.780891 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11753) <EMPTY> PUTFH;COMMIT;GETATTR
611.781120 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
...
```

Per the comment above, with the newer kernel the client now sends the OPEN call which resets the stateid, followed by successful WRITE calls.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html