Bug 1506382

Summary: deadlock in nfs v4 client init
Product: Red Hat Enterprise Linux 7 Reporter: Frank Sorenson <fsorenso>
Component: kernelAssignee: Scott Mayhew <smayhew>
kernel sub component: NFS QA Contact: Zhi Li <yieli>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: agaikwad, alexander.hass, bcodding, chorn, devin.bougie, dhoward, dwysocha, jbyrd, jenander, jiyin, kim-thomas.rehmann, knakayam, linux, malcolm.thoreau.frazier, mkardeh, mlinden, muneda.takahiro, nfs-maint, nmurray, rkieley, rrueck, smayhew, toracat, tumeya, xzhou, yieli, yoyang
Version: 7.4Keywords: Reproducer, ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-3.10.0-825.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1530134 1530135 (view as bug list) Environment:
Last Closed: 2018-04-10 22:41:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1522983, 1530134, 1530135    

Description Frank Sorenson 2017-10-25 20:29:28 UTC
Description of problem:

A race condition between processes attempting to mount can lead to a deadlock between tasks in the nfs client init code


Version-Release number of selected component (if applicable):

kernel 3.10.0-693.10.1.el7.x86_64


How reproducible:

reproducer below


Steps to Reproduce:

1 nfs client, 2 nfs servers

on nfs servers:
    # for i in $(seq 1 25) ; do mkdir /exports/dir$i ; done
    # echo "/exports *(rw,no_root_squash)" >> /etc/exports
    # exportfs -ar

on nfs client:
    # for i in $(seq 1 25) ; do mkdir -p /mnt/test$i ; done

    # hosts=(server1 server2)
    # for i in $(seq 1 25) ; do
        hostnum=$(($i % 2))
        mount.nfs ${hosts[$hostnum]}:/exports/dir$i /mnt/test$i -o vers=4,minorversion=1 &
      done


Actual results:

hang with a process waiting on nfs_client_active_wq while holding nfs_clid_init_mutex and process trying to take nfs_clid_init_mutex


Expected results:

no hang


Additional info:

vmcore details to follow

Comment 3 Frank Sorenson 2017-10-25 21:15:39 UTC
crash> ps -S
  RU: 6
  IN: 156
  UN: 25

the mount.nfs processes are all blocked:

crash> ps -m mount.nfs | grep -c UN
25

crash> ps -m mount.nfs | tail
[0 00:00:49.473] [UN]  PID: 12061  TASK: ffff88005c43af70  CPU: 3   COMMAND: "mount.nfs"
[0 00:00:49.514] [UN]  PID: 12066  TASK: ffff88001e148000  CPU: 0   COMMAND: "mount.nfs"
[0 00:00:49.542] [UN]  PID: 12063  TASK: ffff88001e14bf40  CPU: 2   COMMAND: "mount.nfs"
[0 00:00:49.478] [UN]  PID: 12059  TASK: ffff88005c438fd0  CPU: 1   COMMAND: "mount.nfs"
[0 00:00:49.478] [UN]  PID: 12053  TASK: ffff880036439fa0  CPU: 1   COMMAND: "mount.nfs"
[0 00:00:49.477] [UN]  PID: 12054  TASK: ffff88003643dee0  CPU: 3   COMMAND: "mount.nfs"
[0 00:00:49.543] [UN]  PID: 12060  TASK: ffff88005c43dee0  CPU: 2   COMMAND: "mount.nfs"
[0 00:00:49.543] [UN]  PID: 12056  TASK: ffff88005c43eeb0  CPU: 2   COMMAND: "mount.nfs"
[0 00:00:49.517] [UN]  PID: 12065  TASK: ffff88001e14af70  CPU: 0   COMMAND: "mount.nfs"
[0 00:00:49.482] [UN]  PID: 12055  TASK: ffff88005c43cf10  CPU: 1   COMMAND: "mount.nfs"



PID: 12055  TASK: ffff88005c43cf10  CPU: 1   COMMAND: "mount.nfs"
 #0 [ffff880036453a60] __schedule at ffffffff816a9015
 #1 [ffff880036453ac8] schedule at ffffffff816a9599
 #2 [ffff880036453ad8] nfs_wait_client_init_complete at ffffffffc059b333 [nfs]
 #3 [ffff880036453b28] nfs_get_client at ffffffffc059c5af [nfs]
 #4 [ffff880036453b80] nfs4_set_client at ffffffffc0661de8 [nfsv4]
 #5 [ffff880036453c00] nfs4_create_server at ffffffffc06635de [nfsv4]
 #6 [ffff880036453c88] nfs4_remote_mount at ffffffffc065a7ce [nfsv4]
 #7 [ffff880036453cb0] mount_fs at ffffffff81205389
 #8 [ffff880036453cf8] vfs_kern_mount at ffffffff81221e57
 #9 [ffff880036453d30] nfs_do_root_mount at ffffffffc065a6f6 [nfsv4]
#10 [ffff880036453d70] nfs4_try_mount at ffffffffc065aaf4 [nfsv4]
#11 [ffff880036453da0] nfs_fs_mount at ffffffffc05a9a95 [nfs]
#12 [ffff880036453e28] mount_fs at ffffffff81205389
#13 [ffff880036453e70] vfs_kern_mount at ffffffff81221e57
#14 [ffff880036453ea8] do_mount at ffffffff81224363
#15 [ffff880036453f28] sys_mount at ffffffff81224fa6
#16 [ffff880036453f80] system_call_fastpath at ffffffff816b5089

this task is in fs/nfs/client.c:
int nfs_wait_client_init_complete(const struct nfs_client *clp) {                       
        return wait_event_killable(nfs_client_active_wq,
                        nfs_client_init_is_complete(clp));
}               


the most recently-blocked process:

PID: 12051  TASK: ffff880036438fd0  CPU: 0   COMMAND: "mount.nfs"
 #0 [ffff88005d9c78a0] __schedule at ffffffff816a9015
 #1 [ffff88005d9c7908] schedule at ffffffff816a9599
 #2 [ffff88005d9c7918] nfs_wait_client_init_complete at ffffffffc059b333 [nfs]
 #3 [ffff88005d9c7968] nfs41_walk_client_list at ffffffffc066310d [nfsv4]
 #4 [ffff88005d9c79b0] nfs41_discover_server_trunking at ffffffffc06597a6 [nfsv4]
 #5 [ffff88005d9c79e0] nfs4_discover_server_trunking at ffffffffc0659d8d [nfsv4]
 #6 [ffff88005d9c7a30] nfs4_init_client at ffffffffc0662906 [nfsv4]
 #7 [ffff88005d9c7b28] nfs_get_client at ffffffffc059c6d6 [nfs]
 #8 [ffff88005d9c7b80] nfs4_set_client at ffffffffc0661de8 [nfsv4]
 #9 [ffff88005d9c7c00] nfs4_create_server at ffffffffc06635de [nfsv4]
#10 [ffff88005d9c7c88] nfs4_remote_mount at ffffffffc065a7ce [nfsv4]
#11 [ffff88005d9c7cb0] mount_fs at ffffffff81205389
#12 [ffff88005d9c7cf8] vfs_kern_mount at ffffffff81221e57
#13 [ffff88005d9c7d30] nfs_do_root_mount at ffffffffc065a6f6 [nfsv4]
#14 [ffff88005d9c7d70] nfs4_try_mount at ffffffffc065aaf4 [nfsv4]
#15 [ffff88005d9c7da0] nfs_fs_mount at ffffffffc05a9a95 [nfs]
#16 [ffff88005d9c7e28] mount_fs at ffffffff81205389
#17 [ffff88005d9c7e70] vfs_kern_mount at ffffffff81221e57
#18 [ffff88005d9c7ea8] do_mount at ffffffff81224363
#19 [ffff88005d9c7f28] sys_mount at ffffffff81224fa6
#20 [ffff88005d9c7f80] system_call_fastpath at ffffffff816b5089

this is waiting for the nfs_client 'pos' to complete init in fs/nfs/nfs4client.c
int nfs41_walk_client_list(struct nfs_client *new,
                           struct nfs_client **result,
                           struct rpc_cred *cred) {
...
                        status = nfs_wait_client_init_complete(pos);
where:
bool nfs_client_init_is_complete(const struct nfs_client *clp)
{
        return clp->cl_cons_state <= NFS_CS_READY;
}

 #2 [ffff88005d9c7918] nfs_wait_client_init_complete at ffffffffc059b333 [nfs]
...
    ffff88005d9c7950: ffff88001dc62400 ffff88001d8a7400  << %rbx and %r12
    ffff88005d9c7960: ffff88005d9c79a8 ffffffffc066310d  << %rbp and return address
 #3 [ffff88005d9c7968] nfs41_walk_client_list at ffffffffc066310d [nfsv4]

nfs_client is ffff88001dc62400

crash> struct nfs_client.cl_cons_state ffff88001dc62400
  cl_cons_state = 0x2

include/linux/nfs_fs_sb.h:#define NFS_CS_READY		0		/* ready to be used */


PID: 12052  TASK: ffff88003643cf10  CPU: 2   COMMAND: "mount.nfs"
 #0 [ffff88005ceeb8f0] __schedule at ffffffff816a9015
 #1 [ffff88005ceeb958] schedule_preempt_disabled at ffffffff816aa4b9
 #2 [ffff88005ceeb968] __mutex_lock_slowpath at ffffffff816a83e7
 #3 [ffff88005ceeb9c8] mutex_lock at ffffffff816a77ff
 #4 [ffff88005ceeb9e0] nfs4_discover_server_trunking at ffffffffc0659d58 [nfsv4]
 #5 [ffff88005ceeba30] nfs4_init_client at ffffffffc0662906 [nfsv4]
 #6 [ffff88005ceebb28] nfs_get_client at ffffffffc059c6d6 [nfs]
 #7 [ffff88005ceebb80] nfs4_set_client at ffffffffc0661de8 [nfsv4]
 #8 [ffff88005ceebc00] nfs4_create_server at ffffffffc06635de [nfsv4]
 #9 [ffff88005ceebc88] nfs4_remote_mount at ffffffffc065a7ce [nfsv4]
#10 [ffff88005ceebcb0] mount_fs at ffffffff81205389
#11 [ffff88005ceebcf8] vfs_kern_mount at ffffffff81221e57
#12 [ffff88005ceebd30] nfs_do_root_mount at ffffffffc065a6f6 [nfsv4]
#13 [ffff88005ceebd70] nfs4_try_mount at ffffffffc065aaf4 [nfsv4]
#14 [ffff88005ceebda0] nfs_fs_mount at ffffffffc05a9a95 [nfs]
#15 [ffff88005ceebe28] mount_fs at ffffffff81205389
#16 [ffff88005ceebe70] vfs_kern_mount at ffffffff81221e57
#17 [ffff88005ceebea8] do_mount at ffffffff81224363
#18 [ffff88005ceebf28] sys_mount at ffffffff81224fa6
#19 [ffff88005ceebf80] system_call_fastpath at ffffffff816b5089

crash> p &nfs_client_active_wq
$4 = (wait_queue_head_t *) 0xffffffffc05c6070 <nfs_client_active_wq>

crash> wait_queue_head_t 0xffffffffc05c6070 -o
typedef struct __wait_queue_head {
  [ffffffffc05c6070] spinlock_t lock;
  [ffffffffc05c6078] struct list_head task_list;

crash> list -H ffffffffc05c6078 | wc -l
24



this task is in fs/nfs/nfs4state.c:
int nfs4_discover_server_trunking(struct nfs_client *clp,
                                  struct nfs_client **result) {       
        const struct nfs4_state_recovery_ops *ops =
                                clp->cl_mvops->reboot_recovery_ops;
        struct rpc_clnt *clnt;
        struct rpc_cred *cred;
        int i, status;  
                        
        dprintk("NFS: %s: testing '%s'\n", __func__, clp->cl_hostname);
                        
        clnt = clp->cl_rpcclient;
        i = 0;

        mutex_lock(&nfs_clid_init_mutex);  <<<<

crash> p &nfs_clid_init_mutex
$2 = (struct mutex *) 0xffffffffc06a7e00 <nfs_clid_init_mutex>

crash> crashinfo --mutex=0xffffffffc06a7e00
 ---<struct mutex 0xffffffffc06a7e00>----
    Waiters on this mutex:
           12052  mount.nfs
    Owner of this mutex: pid=12051 cmd=mount.nfs

Comment 7 Benjamin Coddington 2017-10-27 14:28:27 UTC
So we hold the nfs_clid_init_mutex while walking the client list, and possibly sleeping waiting for a matching client to initialize, but that matching client is waiting for the nfs_clid_mutex in order to complete initialization.

I think we'll find the problem upstream too, unfortunately.  We're not that far from upstream right now, and the changes we do have are mostly cosmetic.  I'm marking this one as needing work and attention.

The problem seems to be that for v4.1, we set NFS_CS_SESSION_INITING on a client, but then we're blocked from setting up the session by the mutex.  The process holding the mutex then sleeps indefinitely waiting for that client to be set up.

One fix might be to move the setting of NFS_CS_SESSION_INITING into the mutex protected region.  Here's a completely untested stab at that.

From e795e5f7523e08b17ef5ced761312c557149a0dc Mon Sep 17 00:00:00 2001
Message-Id: <e795e5f7523e08b17ef5ced761312c557149a0dc.1509114333.git.bcodding@redhat.com>
From: Benjamin Coddington <bcodding@redhat.com>
Date: Fri, 27 Oct 2017 10:13:21 -0400
Subject: [PATCH 1/2] Move client ready set into the v4.0 path

---
 fs/nfs/nfs4client.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index f08b7f95dabd..ec0f4eca2477 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -293,6 +293,7 @@ int nfs40_init_client(struct nfs_client *clp)
        }
 
        clp->cl_slot_tbl = tbl;
+       nfs_mark_client_ready(clp, NFS_CS_READY);
        return 0;
 }
 
@@ -409,9 +410,6 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
        if (error < 0)
                goto error;
 
-       if (!nfs4_has_session(clp))
-               nfs_mark_client_ready(clp, NFS_CS_READY);
-
        error = nfs4_discover_server_trunking(clp, &old);
        if (error < 0)
                goto error;
-- 
2.9.3


From c609e77410f3b7e155553de4cbce0c5258a46493 Mon Sep 17 00:00:00 2001
Message-Id: <c609e77410f3b7e155553de4cbce0c5258a46493.1509114333.git.bcodding@redhat.com>
In-Reply-To: <e795e5f7523e08b17ef5ced761312c557149a0dc.1509114333.git.bcodding@redhat.com>
References: <e795e5f7523e08b17ef5ced761312c557149a0dc.1509114333.git.bcodding@redhat.com>
From: Benjamin Coddington <bcodding@redhat.com>
Date: Fri, 27 Oct 2017 10:20:23 -0400
Subject: [PATCH 2/2] nfs: Fix a deadlock where racing mounts wait and hold
 nfs_clid_init_mutex

We must set NFS_CS_SESSION_INITING within nfs_clid_init_mutex region, or
racing mounts can wait for a client to be ready while holding
nfs_clid_init_mutex, but this mutex blocks the client from setting up the
session.
---
 fs/nfs/nfs4client.c | 12 ++++++++++--
 fs/nfs/nfs4state.c  |  6 ++----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index ec0f4eca2477..7a9d3df6bea3 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -20,6 +20,8 @@
 
 #define NFSDBG_FACILITY                NFSDBG_CLIENT
 
+static DEFINE_MUTEX(nfs_clid_init_mutex);
+
 /*
  * Get a unique NFSv4.0 callback identifier which will be used
  * by the V4.0 callback service to lookup the nfs_client struct
@@ -406,19 +408,25 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
        }
        __set_bit(NFS_CS_IDMAP, &clp->cl_res_state);
 
+       mutex_lock(&nfs_clid_init_mutex);
+
        error = nfs4_init_client_minor_version(clp);
        if (error < 0)
-               goto error;
+               goto error_unlock;
 
        error = nfs4_discover_server_trunking(clp, &old);
        if (error < 0)
-               goto error;
+               goto error_unlock;
+
+       mutex_unlock(&nfs_clid_init_mutex);
 
        if (clp != old)
                clp->cl_preserve_clid = true;
        nfs_put_client(clp);
        return old;
 
+error_unlock:
+       mutex_unlock(&nfs_clid_init_mutex);
 error:
        nfs_mark_client_ready(clp, error);
        nfs_put_client(clp);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e98c688d07ca..715b0d627d01 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2102,12 +2102,11 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
        clnt = clp->cl_rpcclient;
        i = 0;
 
-       mutex_lock(&nfs_clid_init_mutex);
 again:
        status  = -ENOENT;
        cred = nfs4_get_clid_cred(clp);
        if (cred == NULL)
-               goto out_unlock;
+               goto out;
 
        status = ops->detect_trunking(clp, result, cred);
        put_rpccred(cred);


I'll brew this up and let's test it.

Comment 26 Frank Sorenson 2017-11-30 22:17:32 UTC
Please be aware that this bugzilla is now public, due to expanding interest.

Comment 28 Scott Mayhew 2017-12-01 13:03:42 UTC
The current status is that I'm still working on a fix that will be accepted upstream.
https://www.spinics.net/lists/linux-nfs/msg66321.html

Comment 43 Rafael Aquini 2017-12-27 04:13:35 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 46 Rafael Aquini 2017-12-27 12:35:37 UTC
Patch(es) available on kernel-3.10.0-825.el7

Comment 50 Zhi Li 2018-01-08 02:39:40 UTC
Moving to VERIFIED according to the Comment #45.

Comment 51 Dave Wysochanski 2018-01-16 14:51:34 UTC
*** Bug 1428939 has been marked as a duplicate of this bug. ***

Comment 52 errata-xmlrpc 2018-04-10 22:41:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062