Bug 638269
Summary: NFS4 clients cannot reclaim locks after server reboot
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Sachin Prabhu <sprabhu>
Assignee: Sachin Prabhu <sprabhu>
QA Contact: yanfu,wang <yanwang>
CC: bfields, dhoward, jiali, jlayton, rwheeler, steved, tscofield, yanwang
Target Milestone: rc
Keywords: ZStream
Fixed In Version: kernel-2.6.32-84.el6
Doc Type: Bug Fix
Doc Text: The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation which resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by the improper use of the state flags. While investigating this bug, a second bug was discovered in the state recovery operation which caused a reclaim thread to loop in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.
Last Closed: 2011-05-23 20:53:49 UTC
Bug Blocks: 661730
Description: Sachin Prabhu, 2010-09-28 16:02:31 UTC
Created attachment 450232 [details]
Reproducer
Easy reproducer:
Usage:
1) Mount an NFS4 share on a RHEL 6 client.
2) ./fl_test <filename>
The reproducer opens a file, obtains a lock, and writes to the file every second (a minimal sketch of such a program appears after the steps below).
3) Reboot the server.
Once the server is back up, the client attempts to reclaim locks but is never successful.
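The attachment itself is not reproduced inline here; the following is a minimal sketch of a program matching the usage text above (the whole-file write lock and one-byte writes are assumptions, not the verbatim attachment):

```c
/* Hypothetical reconstruction of the fl_test reproducer: open a file on
 * an NFSv4 mount, take a POSIX write lock, then write once per second.
 * After the server reboots, the writes show whether reclaim succeeded.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,		/* 0 length = lock the whole file */
	};
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		perror("fcntl(F_SETLKW)");
		return 1;
	}

	for (;;) {
		/* keep writing so reclaim failures surface as write errors */
		if (write(fd, "x", 1) < 0)
			perror("write");
		sleep(1);
	}
}
```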
The tcpdump shows that the client makes a WRITE request and a RENEW request.
It receives a STALE_CLIENTID for the RENEW call and a STALE_STATEID for the WRITE call.
At this point the client re-establishes its client ID using SETCLIENTID.
However, it never sends a request to OPEN the file; it instead proceeds to reclaim the locks using the stale stateid.
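(As an aside, a capture like the one referenced above can be taken on the client with a standard tcpdump invocation, for example:)

```
# capture full NFS packets (port 2049) on the client for later analysis
tcpdump -s 0 -w /tmp/nfs4-reclaim.pcap port 2049
```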
There seem to be two issues here.

Issue 1:

The problem appears to happen because a STALE_STATEID is received around the same time as the STALE_CLIENTID. When a STALE_STATEID is received, the flag NFS_STATE_RECLAIM_REBOOT is set on the open state which is now stale. For the STALE_CLIENTID, the following code path is taken:

nfs4_state_manager() -> nfs4_check_lease() -> ops->renew_lease(clp, cred)

which returns -NFS4ERR_STALE_CLIENTID. This is handled by the error handler:

nfs4_state_manager() -> nfs4_check_lease() -> nfs4_recovery_handle_error(clp, status)

```c
static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
{
..
	case -NFS4ERR_STALE_CLIENTID:
	case -NFS4ERR_LEASE_MOVED:
		set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
		nfs4_state_end_reclaim_reboot(clp);
		nfs4_state_start_reclaim_reboot(clp);
..
}
```

nfs4_recovery_handle_error() in turn calls nfs4_state_end_reclaim_reboot(clp), which walks the list of state owners and then all the states owned by each state owner:

```c
static void nfs4_state_end_reclaim_reboot(struct nfs_client *clp)
{
..
	for (pos = rb_first(&clp->cl_state_owners); pos != NULL; pos = rb_next(pos)) {
		sp = rb_entry(pos, struct nfs4_state_owner, so_client_node);
		spin_lock(&sp->so_lock);
		list_for_each_entry(state, &sp->so_states, open_states) {
			if (!test_and_clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags))
				continue;
			nfs4_state_mark_reclaim_nograce(clp, state);
		}
		spin_unlock(&sp->so_lock);
	}
..
}
```

It encounters the state with NFS_STATE_RECLAIM_REBOOT set from the failed WRITE command, and nfs4_state_mark_reclaim_nograce() ends up swapping the NFS_STATE_RECLAIM_REBOOT flag for NFS_STATE_RECLAIM_NOGRACE. This is wrong since we are still in the grace period; that flag is meant to be set only when a state could not be reclaimed during the grace period. We then end up in another part of nfs4_state_manager(), where we fail because of issue 2.

By commenting out the call to nfs4_state_end_reclaim_reboot() in nfs4_recovery_handle_error(), I was able to fix the lock reclaim behaviour:

```diff
--- nfs4state.c.orig	2010-09-21 12:26:53.890007302 +0100
+++ nfs4state.c	2010-09-28 12:31:41.000000000 +0100
@@ -1161,7 +1161,7 @@ static int nfs4_recovery_handle_error(st
 	case -NFS4ERR_STALE_CLIENTID:
 	case -NFS4ERR_LEASE_MOVED:
 		set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
-		nfs4_state_end_reclaim_reboot(clp);
+		//nfs4_state_end_reclaim_reboot(clp);
 		nfs4_state_start_reclaim_reboot(clp);
 		break;
 	case -NFS4ERR_EXPIRED:
```

Issue 2:

After issue 1, the flag NFS_STATE_RECLAIM_NOGRACE is set on the affected state. To reclaim the state, the following code path is taken:

nfs4_state_manager() -> nfs4_do_reclaim() -> nfs4_reclaim_open_state() -> ops->recover_open()

This calls nfs4_open_expired(), which attempts to reopen the file. However, the following code in nfs4_open_prepare() prevents the new OPEN call from actually being sent out:

```c
static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
{
..
	if (can_open_cached(data->state, data->o_arg.fmode, data->o_arg.open_flags))
		goto out_no_action; /* <-- HERE */
..
}
```

nfs4_reclaim_locks() is subsequently called from nfs4_reclaim_open_state(). It ends up using the old stateid, for which the server returns STALE_STATEID, and the client ends up looping within nfs4_reclaim_open_state().
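For reference, the cached-open check that triggers issue 2 looks roughly like the following in 2.6.32-era fs/nfs/nfs4proc.c (a simplified paraphrase, not a verbatim quote): if the state still carries an open-mode bit for the requested mode, the open is satisfied from cache and nothing goes on the wire.

```c
/* Paraphrased, simplified sketch of can_open_cached(); the real function
 * also consults the per-mode open counters. Because recovery never clears
 * the NFS_O_*_STATE bits, this returns true and nfs4_open_prepare() takes
 * the out_no_action path, so no OPEN is ever sent to the rebooted server.
 */
static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode)
{
	int ret = 0;

	if (open_mode & O_EXCL)
		goto out;
	switch (mode & (FMODE_READ|FMODE_WRITE)) {
	case FMODE_READ:
		ret = test_bit(NFS_O_RDONLY_STATE, &state->flags) != 0;
		break;
	case FMODE_WRITE:
		ret = test_bit(NFS_O_WRONLY_STATE, &state->flags) != 0;
		break;
	case FMODE_READ|FMODE_WRITE:
		ret = test_bit(NFS_O_RDWR_STATE, &state->flags) != 0;
	}
out:
	return ret;
}
```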
Created attachment 452098 [details]
Test patch

This is a combination of two separate bugs hit when attempting to reclaim locks.

---
http://article.gmane.org/gmane.linux.nfs/35846

NFSv4: Fix open recovery

From: Trond Myklebust <Trond.Myklebust>

NFSv4 open recovery is currently broken: since we do not clear the state->flags states before attempting recovery, we end up with the 'can_open_cached()' function triggering. This again leads to no OPEN call being put on the wire.

Reported-by: Sachin Prabhu <sprabhu>
Signed-off-by: Trond Myklebust <Trond.Myklebust>
---

---
http://article.gmane.org/gmane.linux.nfs/35847

NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers

From: Trond Myklebust <Trond.Myklebust>

In the case of a server reboot, the state recovery thread starts by calling nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when the server reboots while the client is in the middle of recovery. However, if the client has already marked the nfs4_state as requiring reboot recovery, then the above behaviour will cause the recovery thread to treat the open as if it was part of such an edge condition: the open will be recovered as if it was part of a lease expiration (and all the locks will be lost).

The fix is to remove the call to nfs4_state_mark_reclaim_reboot() from nfs4_async_handle_error() and nfs4_handle_exception(). Instead we leave it to the recovery thread to do this for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust>
---

Signed-off-by: Sachin Prabhu <sprabhu>

Test packages based on the patch above successfully fix the problem for me.

Both patches have been committed upstream, with the same descriptions as above:

- b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c "NFSv4: Fix open recovery"
- ae1007d37e00144b72906a4bdc47d517ae91bcc1 "NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers"

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

We have another case reported. In this case:

1) The client holds a lock.
2) The server reboots.
3) The client attempts to take the lock again while still holding the old lock. The lock request returns EIO.

This patch also seems to fix that particular situation.
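To illustrate the approach of the first upstream patch (a sketch of the idea described in its commit message, not the verbatim upstream diff): before recovering an open, the state's cached open-mode bits are cleared, so can_open_cached() fails and a real OPEN is put on the wire.

```c
/* Illustration of the "NFSv4: Fix open recovery" approach; not the
 * verbatim upstream change. With the cached open-mode bits cleared
 * before recovery, nfs4_open_prepare() can no longer short-circuit
 * through can_open_cached() and must send a real OPEN, which gives
 * the client a fresh stateid to reclaim locks against.
 */
static void nfs4_clear_open_state(struct nfs4_state *state)
{
	clear_bit(NFS_DELEGATED_STATE, &state->flags);
	clear_bit(NFS_O_RDONLY_STATE, &state->flags);
	clear_bit(NFS_O_WRONLY_STATE, &state->flags);
	clear_bit(NFS_O_RDWR_STATE, &state->flags);
}
```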
Patch(es) available on kernel-2.6.32-89.el6

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation which resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by the improper use of the state flags. While investigating this bug, a second bug was discovered in the state recovery operation which caused a reclaim thread to loop in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.

Hi Sachin,
I verified on the RHEL 6.1 Snapshot4 kernel and got the results below; please confirm whether this is the expected behaviour, thanks in advance.

```
...
123.678798 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 3069) <EMPTY> PUTFH;COMMIT;GETATTR
124.679037 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
124.896339 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
125.332313 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
126.204308 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
127.948326 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
131.436334 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
138.412330 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
152.364326 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
180.268331 10.16.64.158 -> 10.16.42.210 NFS [RPC retransmission of #3086][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
184.678385 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.678682 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
610.678709 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.680200 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11719) <EMPTY> RENEW(10022)
610.680341 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.680549 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11724) <EMPTY> RENEW(10022)
610.680606 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> SETCLIENTID
610.680798 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11726) <EMPTY> SETCLIENTID
610.680833 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> SETCLIENTID_CONFIRM;PUTROOTFH;GETATTR
610.682097 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11718) <EMPTY> PUTFH;WRITE(10023)
610.686359 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11728) <EMPTY> SETCLIENTID_CONFIRM;PUTROOTFH;GETATTR
610.686412 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;OPEN;GETATTR
610.686525 10.16.42.210 -> 10.16.64.158 NFS V1 CB_NULL Call
610.686580 10.16.64.158 -> 10.16.42.210 NFS V1 CB_NULL Reply (Call In 11736)
610.686641 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11734) <EMPTY> PUTFH;OPEN(10033)
610.686760 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> RENEW
610.686939 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11741) <EMPTY> RENEW
610.686988 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;SAVEFH;OPEN;GETFH;GETATTR;RESTOREFH;GETATTR
610.689236 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11743) <EMPTY> PUTFH;SAVEFH;OPEN;GETFH;GETATTR;RESTOREFH;GETATTR
610.689328 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;OPEN_CONFIRM
610.730572 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11745) <EMPTY> PUTFH;OPEN_CONFIRM
610.730648 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;LOCK
610.730871 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11748) <EMPTY> PUTFH;LOCK
610.730944 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
610.744400 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11751) <EMPTY> PUTFH;WRITE;GETATTR
610.744513 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;COMMIT;GETATTR
610.780891 10.16.42.210 -> 10.16.64.158 NFS V4 COMP Reply (Call In 11753) <EMPTY> PUTFH;COMMIT;GETATTR
611.781120 10.16.64.158 -> 10.16.42.210 NFS V4 COMP Call <EMPTY> PUTFH;WRITE;GETATTR
...
```

Per the comment above, with the newer kernel the client now sends the OPEN call which resets the stateid, followed by successful WRITE calls.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html