Bug 1506382 - deadlock in nfs v4 client init
deadlock in nfs v4 client init
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel (Show other bugs)
7.4
Unspecified Unspecified
urgent Severity high
: rc
: ---
Assigned To: Scott Mayhew
Zhi Li
: Reproducer, ZStream
: 1428939 (view as bug list)
Depends On:
Blocks: 1522983 1530134 1530135
  Show dependency treegraph
 
Reported: 2017-10-25 16:29 EDT by Frank Sorenson
Modified: 2018-05-23 04:13 EDT (History)
26 users (show)

See Also:
Fixed In Version: kernel-3.10.0-825.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1530134 1530135 (view as bug list)
Environment:
Last Closed: 2018-04-10 18:41:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3262061 None None None 2018-01-16 09:51 EST
Red Hat Product Errata RHSA-2018:1062 None None None 2018-04-10 18:43 EDT

  None (edit)
Description Frank Sorenson 2017-10-25 16:29:28 EDT
Description of problem:

A race condition between processes attempting to mount can lead to a deadlock between tasks in the nfs client init code


Version-Release number of selected component (if applicable):

kernel 3.10.0-693.10.1.el7.x86_64


How reproducible:

reproducer below


Steps to Reproduce:

1 nfs client, 2 nfs servers

on nfs servers:
    # for i in $(seq 1 25) ; do mkdir /exports/dir$i ; done
    # echo "/exports *(rw,no_root_squash)" >> /etc/exports
    # exportfs -ar

on nfs client:
    # for i in $(seq 1 25) ; do mkdir -p /mnt/test$i ; done

    # hosts=(server1 server2)
    # for i in $(seq 1 25) ; do
        hostnum=$(($i % 2))
        mount.nfs ${hosts[$hostnum]}:/exports/dir$i /mnt/test$i -o vers=4,minorversion=1 &
      done


Actual results:

hang with a process waiting on nfs_client_active_wq while holding nfs_clid_init_mutex and process trying to take nfs_clid_init_mutex


Expected results:

no hang


Additional info:

vmcore details to follow
Comment 3 Frank Sorenson 2017-10-25 17:15:39 EDT
crash> ps -S
  RU: 6
  IN: 156
  UN: 25

the mount.nfs processes are all blocked:

crash> ps -m mount.nfs | grep -c UN
25

crash> ps -m mount.nfs | tail
[0 00:00:49.473] [UN]  PID: 12061  TASK: ffff88005c43af70  CPU: 3   COMMAND: "mount.nfs"
[0 00:00:49.514] [UN]  PID: 12066  TASK: ffff88001e148000  CPU: 0   COMMAND: "mount.nfs"
[0 00:00:49.542] [UN]  PID: 12063  TASK: ffff88001e14bf40  CPU: 2   COMMAND: "mount.nfs"
[0 00:00:49.478] [UN]  PID: 12059  TASK: ffff88005c438fd0  CPU: 1   COMMAND: "mount.nfs"
[0 00:00:49.478] [UN]  PID: 12053  TASK: ffff880036439fa0  CPU: 1   COMMAND: "mount.nfs"
[0 00:00:49.477] [UN]  PID: 12054  TASK: ffff88003643dee0  CPU: 3   COMMAND: "mount.nfs"
[0 00:00:49.543] [UN]  PID: 12060  TASK: ffff88005c43dee0  CPU: 2   COMMAND: "mount.nfs"
[0 00:00:49.543] [UN]  PID: 12056  TASK: ffff88005c43eeb0  CPU: 2   COMMAND: "mount.nfs"
[0 00:00:49.517] [UN]  PID: 12065  TASK: ffff88001e14af70  CPU: 0   COMMAND: "mount.nfs"
[0 00:00:49.482] [UN]  PID: 12055  TASK: ffff88005c43cf10  CPU: 1   COMMAND: "mount.nfs"



PID: 12055  TASK: ffff88005c43cf10  CPU: 1   COMMAND: "mount.nfs"
 #0 [ffff880036453a60] __schedule at ffffffff816a9015
 #1 [ffff880036453ac8] schedule at ffffffff816a9599
 #2 [ffff880036453ad8] nfs_wait_client_init_complete at ffffffffc059b333 [nfs]
 #3 [ffff880036453b28] nfs_get_client at ffffffffc059c5af [nfs]
 #4 [ffff880036453b80] nfs4_set_client at ffffffffc0661de8 [nfsv4]
 #5 [ffff880036453c00] nfs4_create_server at ffffffffc06635de [nfsv4]
 #6 [ffff880036453c88] nfs4_remote_mount at ffffffffc065a7ce [nfsv4]
 #7 [ffff880036453cb0] mount_fs at ffffffff81205389
 #8 [ffff880036453cf8] vfs_kern_mount at ffffffff81221e57
 #9 [ffff880036453d30] nfs_do_root_mount at ffffffffc065a6f6 [nfsv4]
#10 [ffff880036453d70] nfs4_try_mount at ffffffffc065aaf4 [nfsv4]
#11 [ffff880036453da0] nfs_fs_mount at ffffffffc05a9a95 [nfs]
#12 [ffff880036453e28] mount_fs at ffffffff81205389
#13 [ffff880036453e70] vfs_kern_mount at ffffffff81221e57
#14 [ffff880036453ea8] do_mount at ffffffff81224363
#15 [ffff880036453f28] sys_mount at ffffffff81224fa6
#16 [ffff880036453f80] system_call_fastpath at ffffffff816b5089

this task is in fs/nfs/client.c:
int nfs_wait_client_init_complete(const struct nfs_client *clp) {                       
        return wait_event_killable(nfs_client_active_wq,
                        nfs_client_init_is_complete(clp));
}               


the most recently-blocked process:

PID: 12051  TASK: ffff880036438fd0  CPU: 0   COMMAND: "mount.nfs"
 #0 [ffff88005d9c78a0] __schedule at ffffffff816a9015
 #1 [ffff88005d9c7908] schedule at ffffffff816a9599
 #2 [ffff88005d9c7918] nfs_wait_client_init_complete at ffffffffc059b333 [nfs]
 #3 [ffff88005d9c7968] nfs41_walk_client_list at ffffffffc066310d [nfsv4]
 #4 [ffff88005d9c79b0] nfs41_discover_server_trunking at ffffffffc06597a6 [nfsv4]
 #5 [ffff88005d9c79e0] nfs4_discover_server_trunking at ffffffffc0659d8d [nfsv4]
 #6 [ffff88005d9c7a30] nfs4_init_client at ffffffffc0662906 [nfsv4]
 #7 [ffff88005d9c7b28] nfs_get_client at ffffffffc059c6d6 [nfs]
 #8 [ffff88005d9c7b80] nfs4_set_client at ffffffffc0661de8 [nfsv4]
 #9 [ffff88005d9c7c00] nfs4_create_server at ffffffffc06635de [nfsv4]
#10 [ffff88005d9c7c88] nfs4_remote_mount at ffffffffc065a7ce [nfsv4]
#11 [ffff88005d9c7cb0] mount_fs at ffffffff81205389
#12 [ffff88005d9c7cf8] vfs_kern_mount at ffffffff81221e57
#13 [ffff88005d9c7d30] nfs_do_root_mount at ffffffffc065a6f6 [nfsv4]
#14 [ffff88005d9c7d70] nfs4_try_mount at ffffffffc065aaf4 [nfsv4]
#15 [ffff88005d9c7da0] nfs_fs_mount at ffffffffc05a9a95 [nfs]
#16 [ffff88005d9c7e28] mount_fs at ffffffff81205389
#17 [ffff88005d9c7e70] vfs_kern_mount at ffffffff81221e57
#18 [ffff88005d9c7ea8] do_mount at ffffffff81224363
#19 [ffff88005d9c7f28] sys_mount at ffffffff81224fa6
#20 [ffff88005d9c7f80] system_call_fastpath at ffffffff816b5089

this is waiting for the nfs_client 'pos' to complete init in fs/nfs/nfs4client.c
int nfs41_walk_client_list(struct nfs_client *new,
                           struct nfs_client **result,
                           struct rpc_cred *cred) {
...
                        status = nfs_wait_client_init_complete(pos);
where:
bool nfs_client_init_is_complete(const struct nfs_client *clp)
{
        return clp->cl_cons_state <= NFS_CS_READY;
}

 #2 [ffff88005d9c7918] nfs_wait_client_init_complete at ffffffffc059b333 [nfs]
...
    ffff88005d9c7950: ffff88001dc62400 ffff88001d8a7400  << %rbx and %r12
    ffff88005d9c7960: ffff88005d9c79a8 ffffffffc066310d  << %rbp and return address
 #3 [ffff88005d9c7968] nfs41_walk_client_list at ffffffffc066310d [nfsv4]

nfs_client is ffff88001dc62400

crash> struct nfs_client.cl_cons_state ffff88001dc62400
  cl_cons_state = 0x2

include/linux/nfs_fs_sb.h:#define NFS_CS_READY		0		/* ready to be used */


PID: 12052  TASK: ffff88003643cf10  CPU: 2   COMMAND: "mount.nfs"
 #0 [ffff88005ceeb8f0] __schedule at ffffffff816a9015
 #1 [ffff88005ceeb958] schedule_preempt_disabled at ffffffff816aa4b9
 #2 [ffff88005ceeb968] __mutex_lock_slowpath at ffffffff816a83e7
 #3 [ffff88005ceeb9c8] mutex_lock at ffffffff816a77ff
 #4 [ffff88005ceeb9e0] nfs4_discover_server_trunking at ffffffffc0659d58 [nfsv4]
 #5 [ffff88005ceeba30] nfs4_init_client at ffffffffc0662906 [nfsv4]
 #6 [ffff88005ceebb28] nfs_get_client at ffffffffc059c6d6 [nfs]
 #7 [ffff88005ceebb80] nfs4_set_client at ffffffffc0661de8 [nfsv4]
 #8 [ffff88005ceebc00] nfs4_create_server at ffffffffc06635de [nfsv4]
 #9 [ffff88005ceebc88] nfs4_remote_mount at ffffffffc065a7ce [nfsv4]
#10 [ffff88005ceebcb0] mount_fs at ffffffff81205389
#11 [ffff88005ceebcf8] vfs_kern_mount at ffffffff81221e57
#12 [ffff88005ceebd30] nfs_do_root_mount at ffffffffc065a6f6 [nfsv4]
#13 [ffff88005ceebd70] nfs4_try_mount at ffffffffc065aaf4 [nfsv4]
#14 [ffff88005ceebda0] nfs_fs_mount at ffffffffc05a9a95 [nfs]
#15 [ffff88005ceebe28] mount_fs at ffffffff81205389
#16 [ffff88005ceebe70] vfs_kern_mount at ffffffff81221e57
#17 [ffff88005ceebea8] do_mount at ffffffff81224363
#18 [ffff88005ceebf28] sys_mount at ffffffff81224fa6
#19 [ffff88005ceebf80] system_call_fastpath at ffffffff816b5089

crash> p &nfs_client_active_wq
$4 = (wait_queue_head_t *) 0xffffffffc05c6070 <nfs_client_active_wq>

crash> wait_queue_head_t 0xffffffffc05c6070 -o
typedef struct __wait_queue_head {
  [ffffffffc05c6070] spinlock_t lock;
  [ffffffffc05c6078] struct list_head task_list;

crash> list -H ffffffffc05c6078 | wc -l
24



this task is in fs/nfs/nfs4state.c:
int nfs4_discover_server_trunking(struct nfs_client *clp,
                                  struct nfs_client **result) {       
        const struct nfs4_state_recovery_ops *ops =
                                clp->cl_mvops->reboot_recovery_ops;
        struct rpc_clnt *clnt;
        struct rpc_cred *cred;
        int i, status;  
                        
        dprintk("NFS: %s: testing '%s'\n", __func__, clp->cl_hostname);
                        
        clnt = clp->cl_rpcclient;
        i = 0;

        mutex_lock(&nfs_clid_init_mutex);  <<<<

crash> p &nfs_clid_init_mutex
$2 = (struct mutex *) 0xffffffffc06a7e00 <nfs_clid_init_mutex>

crash> crashinfo --mutex=0xffffffffc06a7e00
 ---<struct mutex 0xffffffffc06a7e00>----
    Waiters on this mutex:
           12052  mount.nfs
    Owner of this mutex: pid=12051 cmd=mount.nfs
Comment 7 Benjamin Coddington 2017-10-27 10:28:27 EDT
So we hold the nfs_clid_init_mutex while walking the client list, and possibly sleeping waiting for a matching client to initialize, but that matching client is waiting for the nfs_clid_mutex in order to complete initialization.

I think we'll find the problem upstream too, unfortunately.  We're not that far from upstream right now, and the changes we do have are mostly cosmetic.  I'm marking this one as needing work and attention.

The problem seems to be that for v4.1, we set NFS_CS_SESSION_INITING on a client, but then we're blocked from setting up the session by the mutex.  The process holding the mutex then sleeps indefinitely waiting for that client to be set up.

One fix might be to move the setting of NFS_CS_SESSION_INITING into the mutex protected region.  Here's a completely untested stab at that.

From e795e5f7523e08b17ef5ced761312c557149a0dc Mon Sep 17 00:00:00 2001
Message-Id: <e795e5f7523e08b17ef5ced761312c557149a0dc.1509114333.git.bcodding@redhat.com>
From: Benjamin Coddington <bcodding@redhat.com>
Date: Fri, 27 Oct 2017 10:13:21 -0400
Subject: [PATCH 1/2] Move client ready set into the v4.0 path

---
 fs/nfs/nfs4client.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index f08b7f95dabd..ec0f4eca2477 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -293,6 +293,7 @@ int nfs40_init_client(struct nfs_client *clp)
        }
 
        clp->cl_slot_tbl = tbl;
+       nfs_mark_client_ready(clp, NFS_CS_READY);
        return 0;
 }
 
@@ -409,9 +410,6 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
        if (error < 0)
                goto error;
 
-       if (!nfs4_has_session(clp))
-               nfs_mark_client_ready(clp, NFS_CS_READY);
-
        error = nfs4_discover_server_trunking(clp, &old);
        if (error < 0)
                goto error;
-- 
2.9.3


From c609e77410f3b7e155553de4cbce0c5258a46493 Mon Sep 17 00:00:00 2001
Message-Id: <c609e77410f3b7e155553de4cbce0c5258a46493.1509114333.git.bcodding@redhat.com>
In-Reply-To: <e795e5f7523e08b17ef5ced761312c557149a0dc.1509114333.git.bcodding@redhat.com>
References: <e795e5f7523e08b17ef5ced761312c557149a0dc.1509114333.git.bcodding@redhat.com>
From: Benjamin Coddington <bcodding@redhat.com>
Date: Fri, 27 Oct 2017 10:20:23 -0400
Subject: [PATCH 2/2] nfs: Fix a deadlock where racing mounts wait and hold
 nfs_clid_init_mutex

We must set NFS_CS_SESSION_INITING within nfs_clid_init_mutex region, or
racing mounts can wait for a client to be ready while holding
nfs_clid_init_mutex, but this mutex blocks the client from setting up the
session.
---
 fs/nfs/nfs4client.c | 12 ++++++++++--
 fs/nfs/nfs4state.c  |  6 ++----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index ec0f4eca2477..7a9d3df6bea3 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -20,6 +20,8 @@
 
 #define NFSDBG_FACILITY                NFSDBG_CLIENT
 
+static DEFINE_MUTEX(nfs_clid_init_mutex);
+
 /*
  * Get a unique NFSv4.0 callback identifier which will be used
  * by the V4.0 callback service to lookup the nfs_client struct
@@ -406,19 +408,25 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
        }
        __set_bit(NFS_CS_IDMAP, &clp->cl_res_state);
 
+       mutex_lock(&nfs_clid_init_mutex);
+
        error = nfs4_init_client_minor_version(clp);
        if (error < 0)
-               goto error;
+               goto error_unlock;
 
        error = nfs4_discover_server_trunking(clp, &old);
        if (error < 0)
-               goto error;
+               goto error_unlock;
+
+       mutex_unlock(&nfs_clid_init_mutex);
 
        if (clp != old)
                clp->cl_preserve_clid = true;
        nfs_put_client(clp);
        return old;
 
+error_unlock:
+       mutex_unlock(&nfs_clid_init_mutex);
 error:
        nfs_mark_client_ready(clp, error);
        nfs_put_client(clp);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e98c688d07ca..715b0d627d01 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2102,12 +2102,11 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
        clnt = clp->cl_rpcclient;
        i = 0;
 
-       mutex_lock(&nfs_clid_init_mutex);
 again:
        status  = -ENOENT;
        cred = nfs4_get_clid_cred(clp);
        if (cred == NULL)
-               goto out_unlock;
+               goto out;
 
        status = ops->detect_trunking(clp, result, cred);
        put_rpccred(cred);


I'll brew this up and let's test it.
Comment 26 Frank Sorenson 2017-11-30 17:17:32 EST
Please be aware that this bugzilla is now public, due to expanding interest.
Comment 28 Scott Mayhew 2017-12-01 08:03:42 EST
The current status is that I'm still working on a fix that will be accepted upstream.
https://www.spinics.net/lists/linux-nfs/msg66321.html
Comment 43 Rafael Aquini 2017-12-26 23:13:35 EST
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Comment 46 Rafael Aquini 2017-12-27 07:35:37 EST
Patch(es) available on kernel-3.10.0-825.el7
Comment 50 Zhi Li 2018-01-07 21:39:40 EST
Moving to VERIFIED according to the Comment #45.
Comment 51 Dave Wysochanski 2018-01-16 09:51:34 EST
*** Bug 1428939 has been marked as a duplicate of this bug. ***
Comment 52 errata-xmlrpc 2018-04-10 18:41:36 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062

Note You need to log in before you can comment on or make changes to this bug.