Here are the logs which confirmed the race:

Connection starts here:
==============================================================================================================================================================
[2016-10-17 07:07:05.164177] I [MSGID: 114020] [client.c:2356:notify] 4-mdcache-client-1: parent translators are ready, attempting connect on transport
[2016-10-17 07:07:05.170943] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 4-mdcache-client-1: changing port to 49152 (from 0)
[2016-10-17 07:07:05.176348] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 4-mdcache-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-10-17 07:07:05.177828] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 4-mdcache-client-1: Connected to mdcache-client-1, attached to remote volume '/bricks/brick0/mdcache'.
[2016-10-17 07:07:05.177849] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 4-mdcache-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-10-17 07:07:05.178187] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 4-mdcache-client-1: Server lk version = 1
[2016-10-17 07:39:41.137951] W [rpc-clnt.c:717:rpc_clnt_handle_cbk] 4-mdcache-client-1: RPC call decoding failed

Seems like the connection went down here:
==============================================================================================================================================================
[2016-10-17 07:39:41.138350] E [MSGID: 114031] [client-rpc-fops.c:1654:client3_3_entrylk_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.138564] W [MSGID: 114031] [client-rpc-fops.c:1102:client3_3_getxattr_cbk] 4-mdcache-client-1: remote operation failed. Path: <gfid:00000000-0000-0000-0000-000000000001> (00000000-0000-0000-0000-000000000001). Key: trusted.glusterfs.pathinfo [Transport endpoint is not connected]
[2016-10-17 07:39:41.138720] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.142014] I [socket.c:3391:socket_submit_request] 4-mdcache-client-1: not connected (priv->connected = 0)
.....

We see logs of the new connection here:
==============================================================================================================================================================
[2016-10-17 07:39:41.145226] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 4-mdcache-client-1: changing port to 49152 (from 0)
[2016-10-17 07:39:41.152533] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 4-mdcache-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-10-17 07:39:41.152761] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 4-mdcache-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2016-10-17 07:39:41.154742] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.155498] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 4-mdcache-client-1: Connected to mdcache-client-1, attached to remote volume '/bricks/brick0/mdcache'.
[2016-10-17 07:39:41.155531] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 4-mdcache-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-10-17 07:39:41.155546] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 4-mdcache-client-1: 22 fds open - Delaying child_up until they are re-opened
[2016-10-17 07:39:41.156306] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 4-mdcache-client-1: reopendir on <gfid:00000000-0000-0000-0000-000000000001> succeeded (fd = 0)
[2016-10-17 07:39:41.156467] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 4-mdcache-client-1: reopendir on <gfid:00000000-0000-0000-0000-000000000001> succeeded (fd = 1)
....
[2016-10-17 07:39:41.161125] I [MSGID: 114041] [client-handshake.c:675:client_child_up_reopen_done] 4-mdcache-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2016-10-17 07:39:41.161445] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 4-mdcache-client-1: Server lk version = 1

We see the disconnect notification for the first connection here:
==============================================================================================================================================================
[2016-10-17 07:39:41.167534] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
The message "W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]" repeated 20 times between [2016-10-17 07:39:41.167534] and [2016-10-17 07:39:41.345856]
[2016-10-17 07:39:41.352243] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 4-mdcache-client-1: disconnected from mdcache-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-10-17 07:39:41.354562] W [MSGID: 114031] [client-rpc-fops.c:630:client3_3_unlink_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.355605] E [MSGID: 114031] [client-rpc-fops.c:1766:client3_3_xattrop_cbk] 4-mdcache-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001)
The message "E [MSGID: 114031] [client-rpc-fops.c:1654:client3_3_entrylk_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]" repeated 2 times between [2016-10-17 07:39:41.138350] and [2016-10-17 07:39:41.356994]

So what ends up happening is that even when there is a good connection, the client xlator thinks it is disconnected, because the disconnect notification for the old connection arrived after the connect notification of the newer one.

Pranith.
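To make the ordering concrete, here is a minimal, runnable C sketch of the failure mode. All names in it (notify(), the connected flag) are invented stand-ins, not the actual client_rpc_notify() code: because the handler is last-writer-wins and has no way to tell which socket an event belongs to, a stale DISCONNECT for the old socket delivered after the CONNECT of the new one leaves the client with the wrong view.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* toy stand-in for the protocol/client "connected" flag */
static bool connected = false;

/* toy stand-in for the notification handler: last writer wins,
 * with no check of which socket the event came from */
static void notify(const char *event)
{
    if (strcmp(event, "CONNECT") == 0)
        connected = true;
    else if (strcmp(event, "DISCONNECT") == 0)
        connected = false;
}

int main(void)
{
    notify("DISCONNECT"); /* old socket drops (07:39:41.138) */
    notify("CONNECT");    /* reconnect + setvolume succeed (07:39:41.155) */
    notify("DISCONNECT"); /* stale event for the OLD socket is only
                             delivered now (07:39:41.352) */
    printf("connected = %d\n", connected); /* prints 0, although the
                                              new connection is up */
    return 0;
}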
REVIEW: http://review.gluster.org/15717 (rpc: Fix the race between notification and reconnection) posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/15717 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu)
------
commit e6c38ae1d3f3c53f8739ab2db7c4ecfdbc58fc44
Author: Pranith Kumar K <pkarampu>
Date: Wed Oct 19 15:50:50 2016 +0530

rpc: Fix the race between notification and reconnection

Problem:
There was a hang because an unlock on an entry failed with ENOTCONN. The client thinks the connection is down, whereas the server thinks the connection is up. This is the race we are seeing:
1) The connection from client to brick disconnects.
2) Saved-frames unwind is called, which unwinds all frames that were wound before the disconnect.
3) The connection from client to brick is re-established and setvolume completes.
4) The disconnect notification for the connection in 1) arrives only now and calls client_rpc_notify(), which marks the connection as offline even though the connection is up. This happens because I/O can retrigger a connection before the disconnect notification is sent to the higher layers in rpc.

Fix:
Notify the higher layers that a disconnect happened, and only then go ahead with the reconnect logic.

For the logs which point to the information above, check:
https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1

Thanks to Raghavendra G for suggesting the correct fix.

>BUG: 1386626
>Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9
>Signed-off-by: Pranith Kumar K <pkarampu>
>Reviewed-on: http://review.gluster.org/15681
>NetBSD-regression: NetBSD Build System <jenkins.org>
>Reviewed-by: Niels de Vos <ndevos>
>CentOS-regression: Gluster Build System <jenkins.org>
>Smoke: Gluster Build System <jenkins.org>
>Reviewed-by: Raghavendra G <rgowdapp>
>(cherry picked from commit a6b63e11b7758cf1bfcb67985e25ec02845f0995)

Change-Id: Ifa721193c26b70e26b47b7698c077da0ad5f2e1d
BUG: 1388323
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/15717
NetBSD-regression: NetBSD Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
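A minimal sketch of the ordering the fix describes, with invented type and function names (this is not the actual rpc-lib code): the transport's disconnect handler notifies the higher layers first and only then arms reconnection, so the new connection's CONNECT can never be delivered before the old connection's DISCONNECT.

#include <stdio.h>

/* toy rpc object; 'notify' is the callback into the higher layers
 * (protocol/client in the real code) */
typedef struct rpc {
    void (*notify)(struct rpc *rpc, const char *event);
} rpc_t;

static void schedule_reconnect(rpc_t *rpc)
{
    /* the real code arms a reconnect timer; any CONNECT it produces
     * now necessarily follows the DISCONNECT delivered below */
    rpc->notify(rpc, "CONNECT");
}

/* transport-level disconnect handler with the fixed ordering */
static void on_transport_disconnect(rpc_t *rpc)
{
    rpc->notify(rpc, "DISCONNECT"); /* step 1: notify upward first */
    schedule_reconnect(rpc);        /* step 2: only then reconnect */
}

static void print_event(struct rpc *rpc, const char *event)
{
    (void)rpc;
    printf("higher layer sees: %s\n", event);
}

int main(void)
{
    rpc_t rpc = { .notify = print_event };
    on_transport_disconnect(&rpc); /* prints DISCONNECT, then CONNECT */
    return 0;
}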
REVIEW: http://review.gluster.org/15825 (Revert "rpc: Fix the race between notification and reconnection") posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/15825 (Revert "rpc: Fix the race between notification and reconnection") posted (#2) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/15825 (Revert "rpc: Fix the race between notification and reconnection") posted (#3) for review on release-3.9 by Raghavendra Talur (rtalur)
COMMIT: http://review.gluster.org/15825 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu)
------
commit 72ce732fa5c9979ea6352ea98451471b721410c1
Author: Pranith Kumar K <pkarampu>
Date: Fri Nov 11 11:48:51 2016 +0530

Revert "rpc: Fix the race between notification and reconnection"

This reverts commit e6c38ae1d3f3c53f8739ab2db7c4ecfdbc58fc44. Mounts are intermittently failing with this patch, which means it introduced another race. So until we fix that, we are reverting this patch and making the release.

BUG: 1388323
Change-Id: I6e110caf38fcc6a309b2abdc864bc4fbdb3a7588
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/15825
Reviewed-by: Raghavendra Talur <rtalur>
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
REVIEW: http://review.gluster.org/16026 (rpc: fix for race between rpc and protocol/client) posted (#1) for review on release-3.9 by Rajesh Joseph (rjoseph)
COMMIT: http://review.gluster.org/16026 committed in release-3.9 by Raghavendra Talur (rtalur)
------
commit 63152c4037218ca9b89282547d6fe38399fb111a
Author: Rajesh Joseph <rjoseph>
Date: Sat Dec 3 01:10:51 2016 +0530

rpc: fix for race between rpc and protocol/client

It is possible that the notification thread which notifies the protocol/client layer about the disconnection is put to sleep, and meanwhile a fuse thread or a timer thread initiates and completes reconnection to the brick. The notification thread is then woken up, and the protocol/client layer updates its flags to indicate that the network is disconnected. No reconnection is initiated, because reconnection is the rpc-lib layer's responsibility and its flags indicate that the connection is up.

Fix: Serialize connect and disconnect notify.

> Credit: Raghavendra Talur <rtalur>
> Reviewed-on: http://review.gluster.org/15916
> Reviewed-by: Raghavendra G <rgowdapp>
> Smoke: Gluster Build System <jenkins.org>
> Reviewed-by: Raghavendra Talur <rtalur>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
(cherry picked from commit aa22f24f5db7659387704998ae01520708869873)

Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d
BUG: 1388323
Signed-off-by: Rajesh Joseph <rjoseph>
Reviewed-on: http://review.gluster.org/16026
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Atin Mukherjee <amukherj>
Reviewed-by: Raghavendra Talur <rtalur>
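As a toy illustration of "serialize connect and disconnect notify" (all names invented; the real patch takes a lock inside rpc-clnt around notification delivery), the sketch below runs each notification, including the flag update in the higher layer, under one mutex. Even if the notification thread is descheduled in the middle of delivering the DISCONNECT, modelled here by the usleep(), the CONNECT from the reconnecting thread cannot interleave with it:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* one lock serializing delivery of connect/disconnect notifications */
static pthread_mutex_t notify_lock = PTHREAD_MUTEX_INITIALIZER;
static bool connected = false;

static void deliver(const char *event, bool up)
{
    pthread_mutex_lock(&notify_lock);
    usleep(1000);  /* model the delivering thread being put to sleep;
                      the lock keeps the other notification out */
    connected = up;
    printf("%s -> connected = %d\n", event, connected);
    pthread_mutex_unlock(&notify_lock);
}

static void *notification_thread(void *arg) /* delivers the DISCONNECT */
{
    (void)arg;
    deliver("DISCONNECT", false);
    return NULL;
}

static void *reconnect_thread(void *arg)    /* delivers the CONNECT */
{
    (void)arg;
    deliver("CONNECT", true);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, notification_thread, NULL);
    pthread_create(&t2, NULL, reconnect_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}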
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.1, please open a new bug report.

glusterfs-3.9.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html
[2] https://www.gluster.org/pipermail/gluster-users/