Description of problem: ====================== I am seeing regular RPC errors in client logs as below (grep'ed for 10 line before this error) cat mnt-sysvol.log|grep -10 "rpc-clnt.c:200:call_bail" [2017-01-02 03:00:28.886513] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 03:00:28.889351] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 03:00:32.887765] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 03:00:32.890911] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 03:00:32.892896] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 03:00:32.895635] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 03:00:36.894053] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 03:00:36.897258] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 03:00:36.899105] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 03:00:36.901886] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 03:00:37.896300] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6707913 sent = 2017-01-02 02:30:33.219921. timeout = 1800 for 10.70.37.154:49155 [2017-01-02 03:00:37.896322] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 03:00:37.896338] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event [2017-01-02 03:00:40.900401] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 03:00:40.903765] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 03:00:40.905593] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 03:00:40.908382] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 03:00:44.906826] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 03:00:44.910119] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 03:00:44.911881] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 03:00:44.914766] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) -- [2017-01-02 04:19:46.360402] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 04:19:46.363308] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 04:19:50.361718] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 04:19:50.364437] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 04:19:50.366821] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 04:19:50.369633] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 04:19:54.368068] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 04:19:54.370783] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 04:19:54.373184] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 04:19:54.375991] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 04:19:57.370500] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x670ec09 sent = 2017-01-02 03:49:54.458390. timeout = 1800 for 10.70.37.154:49155 The message "I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up" repeated 4 times between [2017-01-02 04:19:17.893158] and [2017-01-02 04:19:18.004731] [2017-01-02 04:19:57.370523] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 04:19:57.370548] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event [2017-01-02 04:19:58.374478] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 04:19:58.377145] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 04:19:58.379687] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 04:19:58.382411] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 04:20:02.380655] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 04:20:02.383576] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 04:20:02.385874] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) -- [2017-01-02 06:45:17.109852] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 06:45:17.113002] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 06:45:19.113075] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 06:45:19.116245] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 06:45:21.116340] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 06:45:21.119498] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 06:45:23.119372] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 06:45:23.122532] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 06:45:25.122665] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 06:45:25.125873] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 06:45:26.121781] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6723362 sent = 2017-01-02 06:15:24.006274. timeout = 1800 for 10.70.37.154:49155 [2017-01-02 06:45:26.121804] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 06:45:26.121819] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event [2017-01-02 06:45:27.125656] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 06:45:27.128996] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 06:45:29.129153] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 06:45:29.132390] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 06:45:31.132021] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 06:45:31.135186] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 06:45:33.134970] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 06:45:33.138151] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) -- [2017-01-02 09:08:16.759057] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 09:08:16.761948] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 09:08:20.760621] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 09:08:20.764021] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 09:08:20.765694] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 09:08:20.768422] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 09:08:24.767113] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 09:08:24.769769] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 09:08:24.772095] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 09:08:24.774729] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 09:08:25.769182] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x672b56f sent = 2017-01-02 08:38:23.790728. timeout = 1800 for 10.70.37.154:49155 [2017-01-02 09:08:25.769219] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 09:08:25.769235] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event [2017-01-02 09:08:28.773454] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 09:08:28.776500] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 09:08:28.778978] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 09:08:28.781829] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 09:08:32.780050] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 09:08:32.783239] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 09:08:32.785385] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 09:08:32.788273] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) -- [2017-01-02 10:11:51.781997] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 10:11:51.784901] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 10:11:55.783229] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 10:11:55.786308] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 10:11:55.788335] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 10:11:55.791174] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 10:11:59.789794] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 10:11:59.792624] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 10:11:59.794900] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 10:11:59.797707] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 10:12:01.791985] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x672f4d9 sent = 2017-01-02 09:41:54.111627. timeout = 1800 for 10.70.37.154:49155 [2017-01-02 10:12:01.792009] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 10:12:01.792026] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event [2017-01-02 10:12:03.795819] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 10:12:03.799015] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 10:12:03.801042] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 10:12:03.803973] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 10:12:07.802268] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 10:12:07.805305] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 10:12:07.807326] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 10:12:07.810240] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) -- [2017-01-02 11:42:31.120356] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569089: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:31.175620] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569098: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:31.495298] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 11:42:31.498369] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 11:42:31.628664] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569218: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:31.786351] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569245: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:31.826784] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569257: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:32.074433] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569370: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:32.134013] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569399: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:32.179185] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569407: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:32.494446] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6736e00 sent = 2017-01-02 11:12:23.467531. timeout = 1800 for 10.70.37.154:49155 The message "I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up" repeated 1362 times between [2017-01-02 11:41:28.520117] and [2017-01-02 11:42:32.179608] [2017-01-02 11:42:32.494470] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 11:42:32.494496] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758" repeated 144 times between [2017-01-02 11:42:24.980818] and [2017-01-02 11:42:32.495501] [2017-01-02 11:42:32.495583] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up [2017-01-02 11:42:32.500994] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758 [2017-01-02 11:42:32.501046] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569525: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error) [2017-01-02 11:42:32.501302] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758 [2017-01-02 11:42:32.501380] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up [2017-01-02 11:42:32.593385] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758 -- [2017-01-02 12:35:19.053243] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up [2017-01-02 12:35:19.092150] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255 [2017-01-02 12:35:19.092280] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up [2017-01-02 12:35:19.105660] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255 [2017-01-02 12:35:19.105699] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97141943: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Input/output error) [2017-01-02 12:35:19.105986] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255 [2017-01-02 12:35:19.106079] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up [2017-01-02 12:35:19.107014] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97141944: LOOKUP() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Stale file handle) [2017-01-02 12:35:19.182980] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255 [2017-01-02 12:35:19.183100] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up [2017-01-02 12:35:19.575401] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6739ef3 sent = 2017-01-02 12:05:15.424404. timeout = 1800 for 10.70.37.154:49155 [2017-01-02 12:35:19.575425] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 12:35:19.575441] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event [2017-01-02 12:35:19.607235] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97142053: LOOKUP() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Stale file handle) [2017-01-02 12:35:19.607529] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255 [2017-01-02 12:35:19.607574] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97142055: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Input/output error) [2017-01-02 12:35:19.607921] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255 [2017-01-02 12:35:19.608029] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up [2017-01-02 12:35:19.609989] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97142056: LOOKUP() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Stale file handle) [2017-01-02 12:35:19.629936] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255 [2017-01-02 12:35:19.630049] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up -- [2017-01-02 13:26:52.480512] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 13:26:52.483345] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 13:26:56.481517] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 13:26:56.484584] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 13:26:56.486870] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 13:26:56.489610] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 13:27:00.487878] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 13:27:00.491545] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 13:27:00.493363] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 13:27:00.496340] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 13:27:02.490664] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x673c554 sent = 2017-01-02 12:56:56.336913. timeout = 1800 for 10.70.37.154:49155 [2017-01-02 13:27:02.490687] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected] [2017-01-02 13:27:02.490704] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event [2017-01-02 13:27:04.494457] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 13:27:04.502622] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) [2017-01-02 13:27:08.501615] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0) [2017-01-02 13:27:08.503727] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 13:27:08.506014] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) [2017-01-02 13:27:08.508811] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused) On my systemic setup, I am doing same path directory creation simultaneously from 3 different clients. Each client used different server IP to mount the volume using fuse protocol Also, Each client were dumping sosreports every 5min into the volume mount in a screen session, along with top output being appended to a file every minute The dir-creations were happening from different users Eg: client1(el 7.2) was running the dir-creation using pavan@rhs-client23 client2(el 6.7) as root@rhs-client24 client3(el 7.3) as cli21@rhs-client21 Note: these logs are wrt client1 ie rhs-client23 Also, however note that I am able to access the mount sosreports available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.lab.eng.blr.redhat.com/ test execution details available at https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609 Version-Release number of selected component (if applicable): ============ 3.8.4-10 other BZs for reference(raised wrt issues on same setup) 1409472 - brick crashed on systemic setup 1397907 - seeing frequent kernel hangs when doing operations both on fuse client and gluster nodes on replica volumes (edit) [NEEDINFO] 1409568 - seeing socket disconnects and transport endpoint not connected frequently on systemic setup 1409572 - In fuse mount logs:seeing input/output error with split-brain observed logs and failing GETXATTR and STAT 1409580 - seeing stale file handle errors in fuse mount logs in systemic testing
client sosreports are available at scp -r /var/tmp/$HOSTNAME qe@rhsqe-repo:/var/www/html/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/
[2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0) [2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused) msgs like these indicate that brick is not running on the port given by portmapper. Either, * Brick has crashed * Portmapper has given a wrong entry.
(In reply to Raghavendra G from comment #5) > [2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig] > 0-sysvol-client-2: changing port to 49154 (from 0) > [2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish] > 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection > refused) > > msgs like these indicate that brick is not running on the port given by > portmapper. Either, > > * Brick has crashed > * Portmapper has given a wrong entry. Had discussed an issue where portmapper gave stale ports with Atin earlier. @Atin/@samikshan, Can this be a stale port issue? regards, Raghavendra
(In reply to Raghavendra G from comment #6) > (In reply to Raghavendra G from comment #5) > > [2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig] > > 0-sysvol-client-2: changing port to 49154 (from 0) > > [2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish] > > 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection > > refused) > > > > msgs like these indicate that brick is not running on the port given by > > portmapper. Either, > > > > * Brick has crashed > > * Portmapper has given a wrong entry. > > Had discussed an issue where portmapper gave stale ports with Atin earlier. > > @Atin/@samikshan, > > Can this be a stale port issue? > > regards, > Raghavendra We have a case where if a brick process is killed through SIGKILL and then brought back where the port gets assigned to is lesser than the old port then in portmap traversal glusterd will pick up the old port. To confirm this case, we need to have gluster volume status output. Do we have that data in place?
Milind, Can you check sosreports and get the volume status? I tried extracting sos-reports. But for some reason couldn't find the volume-status.
looked at sos report at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.lab.eng.blr.redhat.com/ but didn't find output of gluster volume status. sos_logs/sos.log reports that it can't find the gluster cli to run the gluster volume status command
(In reply to Milind Changire from comment #9) > looked at sos report at: > > http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/ > systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23. > lab.eng.blr.redhat.com/ > > but didn't find output of gluster volume status. > > sos_logs/sos.log reports that it can't find the gluster cli to run the > gluster volume status command Please check the corresponding brick log file and see what port was passed to it at the last init
(In reply to Milind Changire from comment #9) > looked at sos report at: > > http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/ > systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23. > lab.eng.blr.redhat.com/ > > but didn't find output of gluster volume status. > > sos_logs/sos.log reports that it can't find the gluster cli to run the > gluster volume status command My mistake; these are client side sos reports.
I see no updates on this bug for more than a year. Did we have any conclusion on this issue? Was this seen in latest releases? What'd it take to close this bug?
requesting re-validaiton of BZ to Nag
see comment #14
Nag, feel free to open it if seen. Considering the issue is not seen from last 18+ months, would like to close for now.