Here are the logs which confirmed the race.

Connection starts here:
==============================================================================
[2016-10-17 07:07:05.164177] I [MSGID: 114020] [client.c:2356:notify] 4-mdcache-client-1: parent translators are ready, attempting connect on transport
[2016-10-17 07:07:05.170943] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 4-mdcache-client-1: changing port to 49152 (from 0)
[2016-10-17 07:07:05.176348] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 4-mdcache-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-10-17 07:07:05.177828] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 4-mdcache-client-1: Connected to mdcache-client-1, attached to remote volume '/bricks/brick0/mdcache'.
[2016-10-17 07:07:05.177849] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 4-mdcache-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-10-17 07:07:05.178187] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 4-mdcache-client-1: Server lk version = 1
[2016-10-17 07:39:41.137951] W [rpc-clnt.c:717:rpc_clnt_handle_cbk] 4-mdcache-client-1: RPC call decoding failed

Seems like the connection went down here:
==============================================================================
[2016-10-17 07:39:41.138350] E [MSGID: 114031] [client-rpc-fops.c:1654:client3_3_entrylk_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.138564] W [MSGID: 114031] [client-rpc-fops.c:1102:client3_3_getxattr_cbk] 4-mdcache-client-1: remote operation failed. Path: <gfid:00000000-0000-0000-0000-000000000001> (00000000-0000-0000-0000-000000000001). Key: trusted.glusterfs.pathinfo [Transport endpoint is not connected]
[2016-10-17 07:39:41.138720] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.142014] I [socket.c:3391:socket_submit_request] 4-mdcache-client-1: not connected (priv->connected = 0)
.....

We see logs of the new connection here:
==============================================================================
[2016-10-17 07:39:41.145226] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 4-mdcache-client-1: changing port to 49152 (from 0)
[2016-10-17 07:39:41.152533] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 4-mdcache-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-10-17 07:39:41.152761] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 4-mdcache-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2016-10-17 07:39:41.154742] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.155498] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 4-mdcache-client-1: Connected to mdcache-client-1, attached to remote volume '/bricks/brick0/mdcache'.
[2016-10-17 07:39:41.155531] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 4-mdcache-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-10-17 07:39:41.155546] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 4-mdcache-client-1: 22 fds open - Delaying child_up until they are re-opened
[2016-10-17 07:39:41.156306] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 4-mdcache-client-1: reopendir on <gfid:00000000-0000-0000-0000-000000000001> succeeded (fd = 0)
[2016-10-17 07:39:41.156467] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 4-mdcache-client-1: reopendir on <gfid:00000000-0000-0000-0000-000000000001> succeeded (fd = 1)
....
[2016-10-17 07:39:41.161125] I [MSGID: 114041] [client-handshake.c:675:client_child_up_reopen_done] 4-mdcache-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2016-10-17 07:39:41.161445] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 4-mdcache-client-1: Server lk version = 1

We see the disconnect notification for the first connection here:
==============================================================================
[2016-10-17 07:39:41.167534] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
The message "W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]" repeated 20 times between [2016-10-17 07:39:41.167534] and [2016-10-17 07:39:41.345856]
[2016-10-17 07:39:41.352243] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 4-mdcache-client-1: disconnected from mdcache-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-10-17 07:39:41.354562] W [MSGID: 114031] [client-rpc-fops.c:630:client3_3_unlink_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]
[2016-10-17 07:39:41.355605] E [MSGID: 114031] [client-rpc-fops.c:1766:client3_3_xattrop_cbk] 4-mdcache-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001)
The message "E [MSGID: 114031] [client-rpc-fops.c:1654:client3_3_entrylk_cbk] 4-mdcache-client-1: remote operation failed [Transport endpoint is not connected]" repeated 2 times between [2016-10-17 07:39:41.138350] and [2016-10-17 07:39:41.356994]

So what ends up happening is that even when there is a good connection, the client xlator thinks it is disconnected, because the disconnect notification for the old connection arrived after the connect notification for the newer one.

Pranith.
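The out-of-order delivery described above can be sketched as follows. This is a minimal illustration (all names are hypothetical, not actual GlusterFS code) of a notify handler that blindly applies events in arrival order:

```c
#include <stdbool.h>

typedef enum { EV_CONNECT, EV_DISCONNECT } event_t;

/* protocol/client's view of the link, updated on each notification */
static bool client_connected;

/* Naive handler: applies events strictly in the order they arrive. */
static void client_notify(event_t ev)
{
    client_connected = (ev == EV_CONNECT);
}

/* Replay the delivery order seen in the logs above: the reconnect's
 * CONNECT notification arrives before the old connection's DISCONNECT. */
static bool replay_racy_order(void)
{
    client_notify(EV_CONNECT);    /* 07:07:05, first connection     */
    client_notify(EV_CONNECT);    /* 07:39:41.155, reconnect first  */
    client_notify(EV_DISCONNECT); /* 07:39:41.352, stale disconnect */
    return client_connected;      /* false: marked down while up    */
}
```

After this sequence the transport is actually healthy, but the handler's state says disconnected, so every subsequent fop fails with ENOTCONN, matching the "remote operation failed" storm in the logs.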
REVIEW: http://review.gluster.org/15681 (rpc: Fix the race between notification and reconnection) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/15681 (rpc: Fix the race between notification and reconnection) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/15681 committed in master by Raghavendra G (rgowdapp)
------
commit a6b63e11b7758cf1bfcb67985e25ec02845f0995
Author: Pranith Kumar K <pkarampu>
Date: Wed Oct 19 15:50:50 2016 +0530

    rpc: Fix the race between notification and reconnection

    Problem:
    There was a hang because an unlock on an entry failed with ENOTCONN.
    The client thinks the connection is down whereas the server thinks the
    connection is up. This is the race we are seeing:
    1) The connection from client to the brick disconnects.
    2) Saved-frames unwind is called, which unwinds all frames that were
       wound before the disconnect.
    3) The connection from client to the brick is re-established and
       setvolume happens.
    4) The disconnect notification for the connection in 1) arrives only
       now and calls client_rpc_notify(), which marks the connection
       offline even though the connection is up.

    This happens because I/O can retrigger a connection before the
    disconnect notification is sent to the higher layers in rpc.

    Fix:
    Notify the higher layers that a disconnect happened, and then go
    ahead with the reconnect logic.

    For the logs which point to the information above check:
    https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1

    Thanks to Raghavendra G for suggesting the correct fix.

    BUG: 1386626
    Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15681
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
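The reordering this commit describes can be sketched as below. This is an illustrative model only (hypothetical names, not the actual rpc-clnt code), assuming for simplicity that the reconnect attempt succeeds immediately:

```c
#include <stdbool.h>

typedef enum { EV_CONNECT, EV_DISCONNECT } event_t;

static bool client_connected;

static void client_notify(event_t ev)
{
    client_connected = (ev == EV_CONNECT);
}

/* Assume the reconnect attempt succeeds immediately. */
static void reconnect(void)
{
    client_notify(EV_CONNECT);
}

/* Before the fix: I/O retriggers the reconnect before the disconnect
 * notification reaches protocol/client, so the stale DISCONNECT lands
 * last and the layer is marked offline on a live connection. */
static bool disconnect_before_fix(void)
{
    reconnect();
    client_notify(EV_DISCONNECT);
    return client_connected;
}

/* After the fix: notify the higher layers of the disconnect first,
 * then go ahead with the reconnect logic, so the final notification
 * always reflects the newest connection. */
static bool disconnect_after_fix(void)
{
    client_notify(EV_DISCONNECT);
    reconnect();
    return client_connected;
}
```

The fix does not prevent the extra DISCONNECT/CONNECT pair; it only guarantees the higher layer sees them in an order that ends in the correct state.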
REVIEW: http://review.gluster.org/15838 (Revert "rpc: Fix the race between notification and reconnection") posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/15838 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 922af37e33acb59c6f8beca40a9314593b484d1a
Author: Pranith Kumar Karampuri <pkarampu>
Date: Sun Nov 13 06:53:15 2016 -0700

    Revert "rpc: Fix the race between notification and reconnection"

    This reverts commit a6b63e11b7758cf1bfcb67985e25ec02845f0995.
    Nithya and Rajesh found that the mount sometimes fails after this
    patch was merged, so reverting it.

    BUG: 1386626
    Change-Id: I959a5b6c7da61368cf4c67c98193c6e8fdd1755d
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15838
    Reviewed-by: N Balachandran <nbalacha>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
REVIEW: http://review.gluster.org/15911 (rpc: test for race between rpc and p/c) posted (#1) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15911 (rpc: test for race between rpc and p/c) posted (#2) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15911 (rpc: test for race between rpc and p/c) posted (#3) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15915 (rpc: test and fix for race between rpc and p/c) posted (#1) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15911 (rpc: test for race between rpc and p/c) posted (#4) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#1) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#2) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#3) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#4) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#5) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#6) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#7) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#8) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#9) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#10) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#11) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#12) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#13) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#14) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#15) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#16) for review on master by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/16017 (rpc: Fix race between disconnect and connect notify) posted (#1) for review on master by Rajesh Joseph (rjoseph)
REVIEW: http://review.gluster.org/16017 (rpc: Fix race between disconnect and connect notify) posted (#2) for review on master by Rajesh Joseph (rjoseph)
REVIEW: http://review.gluster.org/15916 (rpc: fix for race between rpc and protocol/client) posted (#17) for review on master by Rajesh Joseph (rjoseph)
COMMIT: http://review.gluster.org/15916 committed in master by Atin Mukherjee (amukherj)
------
commit aa22f24f5db7659387704998ae01520708869873
Author: Rajesh Joseph <rjoseph>
Date: Sat Dec 3 01:10:51 2016 +0530

    rpc: fix for race between rpc and protocol/client

    It is possible that the notification thread which notifies the
    protocol/client layer about the disconnection is put to sleep and,
    meanwhile, a fuse thread or a timer thread initiates and completes
    reconnection to the brick. The notification thread is then woken up
    and the protocol/client layer updates its flags to indicate that the
    network is disconnected. No reconnection is initiated because
    reconnection is the rpc-lib layer's responsibility and its flags
    indicate that the connection is connected.

    Fix: Serialize connect and disconnect notify

    Credit: Raghavendra Talur <rtalur>
    Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d
    BUG: 1386626
    Signed-off-by: Rajesh Joseph <rjoseph>
    Reviewed-on: http://review.gluster.org/15916
    Reviewed-by: Raghavendra G <rgowdapp>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Talur <rtalur>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
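The serialization this commit describes can be sketched with a mutex around notification delivery. The mutex is the essence of the fix; the generation counter below is an extra illustrative guard for detecting a stale disconnect, and all names are hypothetical rather than taken from the actual rpc-clnt code:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical connection object modelling the fix described above. */
typedef struct {
    pthread_mutex_t notify_lock; /* serializes connect/disconnect notify */
    bool connected;              /* higher layer's view of the link      */
    int generation;              /* bumped on every (re)connect          */
} conn_t;

static void conn_init(conn_t *c)
{
    pthread_mutex_init(&c->notify_lock, NULL);
    c->connected = false;
    c->generation = 0;
}

/* A disconnect notification carries the generation of the socket it
 * belongs to; under the lock, an event from an old socket is ignored. */
static void notify_disconnect(conn_t *c, int gen)
{
    pthread_mutex_lock(&c->notify_lock);
    if (gen == c->generation)
        c->connected = false;
    pthread_mutex_unlock(&c->notify_lock);
}

static void notify_connect(conn_t *c)
{
    pthread_mutex_lock(&c->notify_lock);
    c->generation++;
    c->connected = true;
    pthread_mutex_unlock(&c->notify_lock);
}

/* Replay the problematic sequence: reconnect completes before the old
 * socket's disconnect notification is processed. */
static bool stale_disconnect_is_ignored(void)
{
    conn_t c;
    conn_init(&c);
    notify_connect(&c);       /* first connection:  generation 1 */
    notify_connect(&c);       /* reconnect:         generation 2 */
    notify_disconnect(&c, 1); /* stale event from the gen-1 socket */
    return c.connected;       /* true: the live connection survives */
}
```

Because the notification thread and the reconnecting fuse/timer threads must all take the same lock, a sleeping notification thread can no longer interleave its state update with a completed reconnection.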
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report. glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html [2] https://www.gluster.org/pipermail/gluster-users/