Bug 1267722 - If the NFS server is restarted while in the process of reclaiming locks from a previous restart, some locks are never reclaimed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: nfs-maint
QA Contact: JianHong Yin
URL:
Whiteboard:
Duplicates: 1282998
Depends On:
Blocks: 1201346 1268411
 
Reported: 2015-09-30 19:14 UTC by Frank Sorenson
Modified: 2019-11-14 07:00 UTC
CC List: 7 users

Fixed In Version: kernel-2.6.32-586.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-01 15:16:24 UTC
Target Upstream Version:
Embargoed:


Attachments
program used with reproducer (1.42 KB, text/x-csrc)
2015-09-30 19:14 UTC, Frank Sorenson
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1315555 1 None None None 2021-01-20 06:05:38 UTC

Internal Links: 1315555

Description Frank Sorenson 2015-09-30 19:14:00 UTC
Created attachment 1078779 [details]
program used with reproducer

Description of problem:

If the NFSv4 server is restarted while the client is still reclaiming locks from a previous restart (before all reclaims complete), some locks are lost and never get reclaimed.


Version-Release number of selected component (if applicable):

RHEL 6.6 - kernel 2.6.32-504.3.3.el6.x86_64 (both nfs client and nfs server)
nfs v4

How reproducible:

see below


Steps to Reproduce:

test program attached

on nfs client:
	# gcc -Wall test_reclaim.c -o test_reclaim
	# mount -o vers=4 server:/exports/locktest /mnt/locktest

	# test_reclaim /mnt/locktest 5000
	    (test program will open and lock files, then wait for ENTER as the signal to clean up; let it sit until testing is complete)

on nfs server terminal 1:
	monitor nfs lock count:
		# while true; do echo $(date;grep -cf <(pgrep nfsd|sed 's@.*@ & @') /proc/locks); sleep 1; done

on nfs server terminal 2:
	restart nfs server
		# /etc/init.d/nfs restart

(monitor the first terminal: the lock count drops to zero, then begins climbing back toward 5000; before it reaches 5000)

	restart nfs server
		# /etc/init.d/nfs restart

Actual results:

NFS-related locks will be at 5000, then drop to 0 after the restart.  They will then begin climbing back toward 5000.  The second restart drops the locks back to 0 until reclaim begins again.  If the second restart occurs before all 5000 have been reclaimed, some will be missed, and the count never reaches 5000.

Wed Sep 30 11:27:18 CDT 2015 5000   <<<< nfs client has locks on 5000 files
Wed Sep 30 11:27:19 CDT 2015 5000
Wed Sep 30 11:27:20 CDT 2015 5000
Wed Sep 30 11:27:21 CDT 2015 0      <<<< nfs server restarted
Wed Sep 30 11:27:22 CDT 2015 0
Wed Sep 30 11:27:23 CDT 2015 0
...
Wed Sep 30 11:28:02 CDT 2015 0
Wed Sep 30 11:28:03 CDT 2015 1848   <<<< reclaim begins
Wed Sep 30 11:28:04 CDT 2015 4169
Wed Sep 30 11:28:05 CDT 2015 5000   <<<< locks reclaimed
Wed Sep 30 11:28:06 CDT 2015 0      <<<< nfs server restarted
...
Wed Sep 30 11:29:02 CDT 2015 0
Wed Sep 30 11:29:03 CDT 2015 1556   <<<< reclaim begins
Wed Sep 30 11:29:04 CDT 2015 0      <<<< nfs server restarted (interrupts reclaim)
Wed Sep 30 11:29:05 CDT 2015 0
Wed Sep 30 11:29:06 CDT 2015 0
Wed Sep 30 11:29:07 CDT 2015 942    <<<< reclaim begins
Wed Sep 30 11:29:08 CDT 2015 2777
Wed Sep 30 11:29:09 CDT 2015 2781
Wed Sep 30 11:29:10 CDT 2015 2781
Wed Sep 30 11:29:11 CDT 2015 2781   <<<< never increases

(the nfs client will still show locks on all 5000 files)

Expected results:

locks are reclaimed


Additional info:

This can also be reproduced with NFSv3 by restarting 'nfslock' repeatedly in a fast loop.


test_reclaim program does the following:
	fd[0] = open(/mnt/locktest/test_file_1, RW) ; fcntl(fd[0], F_SETLKW, F_WRLCK, offset 0, length 1)
	fd[1] = open(/mnt/locktest/test_file_2, RW) ; fcntl(fd[1], F_SETLKW, F_WRLCK, offset 0, length 1)
	...
	open(/mnt/locktest/test_file_4999, RW) ; fcntl(fd[4999], F_SETLKW, F_WRLCK, offset 0, length 1)
	 (wait for input)
	close() all fds
	unlink() all files
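
For reference, a minimal sketch of that loop (the attached test_reclaim.c is the authoritative program; the exact file-naming scheme, error handling, and cleanup here are assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char path[4096];
	int *fd, i, count;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <mountpoint> <count>\n", argv[0]);
		return 1;
	}
	count = atoi(argv[2]);
	fd = malloc(count * sizeof(int));
	if (fd == NULL)
		return 1;

	for (i = 0; i < count; i++) {
		/* one write lock on byte 0 of each file */
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
				    .l_start = 0, .l_len = 1 };

		snprintf(path, sizeof(path), "%s/test_file_%d", argv[1], i + 1);
		fd[i] = open(path, O_RDWR | O_CREAT, 0644);
		if (fd[i] < 0 || fcntl(fd[i], F_SETLKW, &fl) < 0) {
			perror(path);
			return 1;
		}
	}

	getchar();			/* wait for ENTER before cleaning up */

	for (i = 0; i < count; i++) {
		close(fd[i]);
		snprintf(path, sizeof(path), "%s/test_file_%d", argv[1], i + 1);
		unlink(path);
	}
	free(fd);
	return 0;
}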

Comment 2 Frank Sorenson 2015-09-30 19:27:53 UTC
These two upstream commits appear to address this issue (see also http://thread.gmane.org/gmane.linux.nfs/66787/focus=66788)

commit df817ba35736db2d62b07de6f050a4db53492ad8
Author: Trond Myklebust <trond.myklebust>
Date:   2014-09-27 17:41:51 -0400

    NFSv4: fix open/lock state recovery error handling
    
    The current open/lock state recovery unfortunately does not handle errors
    such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
    just proceeds as if the state manager is finished recovering.
    This patch ensures that we loop back, handle higher priority errors
    and complete the open/lock state recovery.
    
    Cc: stable.org
    Signed-off-by: Trond Myklebust <trond.myklebust>

commit a4339b7b686b4acc8b6de2b07d7bacbe3ae44b83
Author: Trond Myklebust <trond.myklebust>
Date:   2014-09-27 17:02:26 -0400

    NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
    
    If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
    CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
    a second time, then the client will currently take this to mean that it must
    declare all locks to be stale, and hence ineligible for reboot recovery.
    
    RFC3530 and RFC5661 both suggest that the client should instead rely on the
    server to respond to inelegible open share, lock and delegation reclaim
    requests with NFS4ERR_NO_GRACE in this situation.
    
    Cc: stable.org
    Signed-off-by: Trond Myklebust <trond.myklebust>


The customer has confirmed that df817ba35736db2d62b07de6f050a4db53492ad8 resolves the issue (but has not tested the other)

Comment 6 J. Bruce Fields 2015-10-21 00:31:51 UTC
(In reply to Frank Sorenson from comment #2)
> These two upstream commits appear to address this issue (see also
> http://thread.gmane.org/gmane.linux.nfs/66787/focus=66788)

Makes sense to me.  Will you post these, or would you like me to?

Comment 8 RHEL Program Management 2015-11-02 00:13:06 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 9 Aristeu Rozanski 2015-11-13 14:33:52 UTC
Patch(es) available on kernel-2.6.32-586.el6

Comment 15 JianHong Yin 2016-01-14 14:32:28 UTC
Test fails with the latest kernel-2.6.32-601.el6.x86_64:
  https://beaker.engineering.redhat.com/jobs/1191294

If the second restart occurs before all 5000 have been reclaimed, some will be missed, and it will never reach 5000

Comment 16 Dave Wysochanski 2016-01-29 22:14:23 UTC
*** Bug 1282998 has been marked as a duplicate of this bug. ***

Comment 17 JianHong Yin 2016-02-24 04:11:21 UTC
(In reply to Yin.JianHong from comment #15)
> Test fails with the latest kernel-2.6.32-601.el6.x86_64:
>   https://beaker.engineering.redhat.com/jobs/1191294
> 
> If the second restart occurs before all 5000 have been reclaimed, some will
> be missed, and it will never reach 5000

Test still fails with the latest kernel-2.6.32-616.el6:

[10:58:46 root@ ~~]# cat term1.log
2016-02-24_10:51:50 0
2016-02-24_10:51:51 0
2016-02-24_10:51:52 0
2016-02-24_10:51:53 0
2016-02-24_10:51:54 0
2016-02-24_10:51:55 0
..snip..
2016-02-24_10:53:05 0
2016-02-24_10:53:06 0
2016-02-24_10:53:07 0
2016-02-24_10:53:08 0
2016-02-24_10:53:09 1784
2016-02-24_10:53:10 4051
2016-02-24_10:53:11 5000  <<<< nfs client has locks on 5000 files
2016-02-24_10:53:12 5000
2016-02-24_10:53:13 5000
2016-02-24_10:53:14 0     <<<< nfs server restarted
2016-02-24_10:53:15 0
2016-02-24_10:53:16 0
2016-02-24_10:53:17 0
..snip..
2016-02-24_10:53:38 0
2016-02-24_10:53:39 0
2016-02-24_10:53:40 0
2016-02-24_10:53:41 0
2016-02-24_10:53:42 0
2016-02-24_10:53:43 1387  <<<< reclaim begins
2016-02-24_10:53:44 3989
2016-02-24_10:53:45 5000  <<<< locks reclaimed
2016-02-24_10:53:46 5000
2016-02-24_10:53:47 5000
2016-02-24_10:53:48 0     <<<< nfs server restarted
2016-02-24_10:53:49 0
..snip..
2016-02-24_10:54:41 0
2016-02-24_10:54:42 0
2016-02-24_10:54:43 0
2016-02-24_10:54:44 128   <<<< reclaim begins
2016-02-24_10:54:45 0     <<<< nfs server restarted (interrupts reclaim)
2016-02-24_10:54:46 0
2016-02-24_10:54:48 0
2016-02-24_10:54:49 0
..snip..
2016-02-24_10:55:42 0
2016-02-24_10:55:43 0
2016-02-24_10:55:44 0
2016-02-24_10:55:45 0
2016-02-24_10:55:46 877    <<<< reclaim begins
2016-02-24_10:55:47 2680
2016-02-24_10:55:48 2680
2016-02-24_10:55:49 2680
2016-02-24_10:55:50 2680
2016-02-24_10:55:51 2680
2016-02-24_10:55:52 2680
2016-02-24_10:55:53 2680
2016-02-24_10:55:54 2680
2016-02-24_10:55:55 2680
2016-02-24_10:55:56 2680
2016-02-24_10:55:57 2680
2016-02-24_10:55:58 2680
2016-02-24_10:55:59 2680
2016-02-24_10:56:01 2680
2016-02-24_10:56:02 2680  <<<< never increases
..snip..
2016-02-24_10:58:46 2680

Comment 18 Frank Sorenson 2016-03-04 22:54:37 UTC
Yes, I see the same result:

Fri Mar 4 16:08:57 CST 2016 5000
Fri Mar 4 16:08:58 CST 2016 5000
Fri Mar 4 16:08:59 CST 2016 5000
Fri Mar 4 16:09:00 CST 2016 0
...
Fri Mar 4 16:09:15 CST 2016 0
Fri Mar 4 16:09:16 CST 2016 1711
Fri Mar 4 16:09:18 CST 2016 3894
Fri Mar 4 16:09:19 CST 2016 5000
Fri Mar 4 16:09:20 CST 2016 0
...
Fri Mar 4 16:10:17 CST 2016 0
Fri Mar 4 16:10:18 CST 2016 990
Fri Mar 4 16:10:20 CST 2016 3277
Fri Mar 4 16:10:21 CST 2016 0
...
Fri Mar 4 16:11:19 CST 2016 0
Fri Mar 4 16:11:20 CST 2016 526
Fri Mar 4 16:11:21 CST 2016 2797
Fri Mar 4 16:11:22 CST 2016 4576
Fri Mar 4 16:11:23 CST 2016 4576
Fri Mar 4 16:11:25 CST 2016 4576
Fri Mar 4 16:11:26 CST 2016 4576
Fri Mar 4 16:11:27 CST 2016 4576
Fri Mar 4 16:11:28 CST 2016 4576
Fri Mar 4 16:11:29 CST 2016 4576

Looking at the other commit mentioned, the description appears relevant, but the nfs4_handle_reclaim_lease_error() function is not in the RHEL 6 kernel base.  The patch does not apply as-is, and I am not familiar enough with the code to try to find the right way to fix it myself.  Here's the other patch that looks relevant:

commit a4339b7b686b4acc8b6de2b07d7bacbe3ae44b83
Author: Trond Myklebust <trond.myklebust>
Date:   2014-09-27 17:02:26 -0400

    NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
    
    If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
    CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
    a second time, then the client will currently take this to mean that it must
    declare all locks to be stale, and hence ineligible for reboot recovery.
    
    RFC3530 and RFC5661 both suggest that the client should instead rely on the
    server to respond to inelegible open share, lock and delegation reclaim
    requests with NFS4ERR_NO_GRACE in this situation.
    
    Cc: stable.org
    Signed-off-by: Trond Myklebust <trond.myklebust>

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 22fe351..26d510d 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1761,7 +1761,6 @@ static int nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
                break;
        case -NFS4ERR_STALE_CLIENTID:
                clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
-               nfs4_state_clear_reclaim_reboot(clp);
                nfs4_state_start_reclaim_reboot(clp);
                break;
        case -NFS4ERR_CLID_INUSE:

Comment 19 JianHong Yin 2016-03-09 08:44:06 UTC
Hi J. Bruce,
  Could you take a look at Comment 18 and help find the right way to fix this issue?

thanks!

Comment 20 J. Bruce Fields 2016-03-10 18:45:06 UTC
Now that I look at this again:

> nfs-related locks will be at 5000, then drop to 0 after restart.  They will
> then begin climbing back toward 5000.  The second restart will drop the locks
> back to 0 until it begins reclaiming again.  If the second restart occurs
> before all 5000 have been reclaimed, some will be missed, and it will never
> reach 5000

Apologies for not noticing this at the start, but: I'm pretty certain this behavior is correct in the NFSv4.0 case.  See http://tools.ietf.org/html/rfc7530#section-9.6.3.4.4 for a detailed discussion.  In particular, "a client MUST reclaim only those locks that it successfully acquired from the previous server instance, omitting any that it failed to reclaim before a new reboot."

Comment 21 J. Bruce Fields 2016-03-10 18:54:14 UTC
Also, I wonder about this:

> (the nfs client will still show locks on all 5000 files)

What happens if you try to do IO on a file descriptor that was used for one of the unreclaimed locks?  After the fix for bug 963785 I think that such IO should fail with EIO.

It's possible those locks still show up in the client's /proc/locks until unlock or close, though, I'm not sure.
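
One way to test that from the reproducer side would be something like the following sketch (a hypothetical helper, not part of the attached test_reclaim.c; it assumes the descriptors opened by the reproducer are still open):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical check: given a file descriptor test_reclaim still holds open,
 * attempt a 1-byte write at offset 0 (the locked byte).  Per the fix for
 * bug 963785, I/O through a descriptor whose lock was not reclaimed is
 * expected to fail with EIO. */
static int check_lost_lock(int fd)
{
	if (pwrite(fd, "x", 1, 0) < 0 && errno == EIO) {
		fprintf(stderr, "fd %d: I/O failed with EIO (lock not reclaimed)\n", fd);
		return 1;
	}
	return 0;
}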

Comment 22 Fujitsu kernel engineers 2016-03-11 02:41:20 UTC
Dear Bruce Fields-san,

Thank you for the reply.

I would like to confirm whether my understanding regarding comment#20 is right.
I have described two patterns as below.

  (1) : After getting all locks, the second NFS server's reboot occurs.
  (2) : While trying to get the locks, the second NFS server's reboot occurs.

I think that RFC 7530 (9.6.3.4.4) is referring to (2).
That is, if the second NFS server reboot occurs while the client is still trying to reclaim its locks,
only the locks that were successfully reclaimed from the previous server instance are reclaimed again.
So the problem I reported in this bug (Bug 1267722) is expected behavior, in accordance with (2).
Is my understanding right?


 (1) After getting all locks, the second NFS server's reboot occurs.

       NFS client      NFS server
         |                |
   lock1 |                | The lock1,lock2 and lock3 are gotten.
   lock2 |                |
   lock3 |                |
         |                + reboot(1st)
         |                |
         |                |
   lock1 |--- reclaim --->| The lock1,lock2 and lock3 are gotten within grace period.
   lock2 |--- reclaim --->|
   lock3 |--- reclaim --->|
         |                |
         |                + reboot(2nd)
         |                |
   lock1 |--- reclaim --->| The lock1,lock2 and lock3 are gotten within grace period.
   lock2 |--- reclaim --->|
   lock3 |--- reclaim --->|



 (2) While trying to get the locks, the second NFS server's reboot occurs.

       NFS client      NFS server
         |                |
   lock1 |                | The lock1,lock2 and lock3 are gotten.
   lock2 |                |
   lock3 |                |
         |                + reboot(1st)
         |                |
         |                |
   lock1 |--- reclaim --->| Only the lock1 and the lock2 are gotten.
   lock2 |--- reclaim --->|
         |                |
         |                + reboot(2nd)
         |                |
   lock1 |--- reclaim --->| Only the lock1 and the lock2 are gotten.
   lock2 |--- reclaim --->| But, the lock3 isn't successfully gotten.
   lock3 |--- reclaim --->|
         |                |

Best Regards,
K.Kakiuchi

Comment 23 J. Bruce Fields 2016-03-11 17:18:46 UTC
(In reply to Fujitsu kernel engineers from comment #22)
> I would like to confirm whether my understanding regarding comment#20 is
> right.
> I have described two patterns as below.
> 
>   (1) : After getting all locks, the second NFS server's reboot occurs.
>   (2) : While trying to get the locks, the second NFS server's reboot occurs.
> 
> I think that RFC 7530 (9.6.3.4.4) is referring to (2).
> That is, if the second NFS server reboot occurs while the client is still
> trying to reclaim its locks, only the locks that were successfully reclaimed
> from the previous server instance are reclaimed again.
> So the problem I reported in this bug (Bug 1267722) is expected behavior,
> in accordance with (2).
> Is my understanding right?

Thanks for the careful writeup!  I believe that's correct, with one minor clarification:

>          |                + reboot(2nd)
>          |                |
>    lock1 |--- reclaim --->| Only the lock1 and the lock2 are gotten.
>    lock2 |--- reclaim --->| But, the lock3 isn't successfully gotten.
>    lock3 |--- reclaim --->|

I agree that lock3 isn't successfully acquired in this case. But, let's be careful: note that the failure is because the client doesn't send the reclaim request for lock3, not because the server would return an error if it did.  (The server might well allow the reclaim.  In fact, it might allow the reclaim even in some situations where that would be incorrect--the NFSv4.0 protocol is insufficient to allow the server to tell the difference in every case.  Thus the RFC requires the client to be careful and not send the lock3 reclaim request.  The addition of RECLAIM_COMPLETE to the NFSv4.1 protocol changes this; see RFC 5661.)

