Bug 1409583 - seeing RPC status error messages and timeouts due to RPC (rpc-clnt.c:200:call_bail)
Summary: seeing RPC status error messages and timeouts due to RPC (rpc-clnt.c:200:call...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rpc
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Raghavendra G
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-02 14:28 UTC by Nag Pavan Chilakam
Modified: 2019-07-24 11:17 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-19 07:04:10 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Nag Pavan Chilakam 2017-01-02 14:28:25 UTC
Description of problem:
======================
I am seeing regular RPC errors in client logs as below
(grep'ed for 10 line before this error)
cat mnt-sysvol.log|grep -10 "rpc-clnt.c:200:call_bail"


[2017-01-02 03:00:28.886513] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 03:00:28.889351] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 03:00:32.887765] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 03:00:32.890911] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 03:00:32.892896] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 03:00:32.895635] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 03:00:36.894053] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 03:00:36.897258] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 03:00:36.899105] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 03:00:36.901886] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 03:00:37.896300] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6707913 sent = 2017-01-02 02:30:33.219921. timeout = 1800 for 10.70.37.154:49155
[2017-01-02 03:00:37.896322] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 03:00:37.896338] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
[2017-01-02 03:00:40.900401] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 03:00:40.903765] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 03:00:40.905593] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 03:00:40.908382] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 03:00:44.906826] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 03:00:44.910119] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 03:00:44.911881] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 03:00:44.914766] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
--
[2017-01-02 04:19:46.360402] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 04:19:46.363308] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 04:19:50.361718] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 04:19:50.364437] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 04:19:50.366821] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 04:19:50.369633] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 04:19:54.368068] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 04:19:54.370783] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 04:19:54.373184] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 04:19:54.375991] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 04:19:57.370500] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x670ec09 sent = 2017-01-02 03:49:54.458390. timeout = 1800 for 10.70.37.154:49155
The message "I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up" repeated 4 times between [2017-01-02 04:19:17.893158] and [2017-01-02 04:19:18.004731]
[2017-01-02 04:19:57.370523] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 04:19:57.370548] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
[2017-01-02 04:19:58.374478] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 04:19:58.377145] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 04:19:58.379687] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 04:19:58.382411] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 04:20:02.380655] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 04:20:02.383576] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 04:20:02.385874] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
--
[2017-01-02 06:45:17.109852] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 06:45:17.113002] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 06:45:19.113075] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 06:45:19.116245] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 06:45:21.116340] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 06:45:21.119498] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 06:45:23.119372] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 06:45:23.122532] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 06:45:25.122665] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 06:45:25.125873] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 06:45:26.121781] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6723362 sent = 2017-01-02 06:15:24.006274. timeout = 1800 for 10.70.37.154:49155
[2017-01-02 06:45:26.121804] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 06:45:26.121819] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
[2017-01-02 06:45:27.125656] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 06:45:27.128996] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 06:45:29.129153] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 06:45:29.132390] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 06:45:31.132021] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 06:45:31.135186] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 06:45:33.134970] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 06:45:33.138151] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
--
[2017-01-02 09:08:16.759057] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 09:08:16.761948] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 09:08:20.760621] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 09:08:20.764021] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 09:08:20.765694] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 09:08:20.768422] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 09:08:24.767113] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 09:08:24.769769] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 09:08:24.772095] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 09:08:24.774729] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 09:08:25.769182] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x672b56f sent = 2017-01-02 08:38:23.790728. timeout = 1800 for 10.70.37.154:49155
[2017-01-02 09:08:25.769219] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 09:08:25.769235] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
[2017-01-02 09:08:28.773454] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 09:08:28.776500] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 09:08:28.778978] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 09:08:28.781829] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 09:08:32.780050] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 09:08:32.783239] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 09:08:32.785385] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 09:08:32.788273] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
--
[2017-01-02 10:11:51.781997] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 10:11:51.784901] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 10:11:55.783229] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 10:11:55.786308] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 10:11:55.788335] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 10:11:55.791174] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 10:11:59.789794] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 10:11:59.792624] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 10:11:59.794900] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 10:11:59.797707] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 10:12:01.791985] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x672f4d9 sent = 2017-01-02 09:41:54.111627. timeout = 1800 for 10.70.37.154:49155
[2017-01-02 10:12:01.792009] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 10:12:01.792026] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
[2017-01-02 10:12:03.795819] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 10:12:03.799015] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 10:12:03.801042] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 10:12:03.803973] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 10:12:07.802268] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 10:12:07.805305] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 10:12:07.807326] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 10:12:07.810240] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
--
[2017-01-02 11:42:31.120356] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569089: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:31.175620] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569098: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:31.495298] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 11:42:31.498369] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 11:42:31.628664] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569218: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:31.786351] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569245: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:31.826784] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569257: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:32.074433] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569370: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:32.134013] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569399: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:32.179185] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569407: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:32.494446] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6736e00 sent = 2017-01-02 11:12:23.467531. timeout = 1800 for 10.70.37.154:49155
The message "I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up" repeated 1362 times between [2017-01-02 11:41:28.520117] and [2017-01-02 11:42:32.179608]
[2017-01-02 11:42:32.494470] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 11:42:32.494496] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758" repeated 144 times between [2017-01-02 11:42:24.980818] and [2017-01-02 11:42:32.495501]
[2017-01-02 11:42:32.495583] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
[2017-01-02 11:42:32.500994] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758
[2017-01-02 11:42:32.501046] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 96569525: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.73/level4.71 => -1 (Input/output error)
[2017-01-02 11:42:32.501302] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758
[2017-01-02 11:42:32.501380] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
[2017-01-02 11:42:32.593385] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 4265027758
--
[2017-01-02 12:35:19.053243] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
[2017-01-02 12:35:19.092150] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255
[2017-01-02 12:35:19.092280] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
[2017-01-02 12:35:19.105660] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255
[2017-01-02 12:35:19.105699] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97141943: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Input/output error)
[2017-01-02 12:35:19.105986] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255
[2017-01-02 12:35:19.106079] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
[2017-01-02 12:35:19.107014] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97141944: LOOKUP() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Stale file handle)
[2017-01-02 12:35:19.182980] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255
[2017-01-02 12:35:19.183100] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
[2017-01-02 12:35:19.575401] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x6739ef3 sent = 2017-01-02 12:05:15.424404. timeout = 1800 for 10.70.37.154:49155
[2017-01-02 12:35:19.575425] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 12:35:19.575441] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
[2017-01-02 12:35:19.607235] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97142053: LOOKUP() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Stale file handle)
[2017-01-02 12:35:19.607529] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255
[2017-01-02 12:35:19.607574] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97142055: MKDIR() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Input/output error)
[2017-01-02 12:35:19.607921] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255
[2017-01-02 12:35:19.608029] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
[2017-01-02 12:35:19.609989] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 97142056: LOOKUP() /data/same-dir-creation/level1.1/level2.1/level3.74/level4.9 => -1 (Stale file handle)
[2017-01-02 12:35:19.629936] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-sysvol-dht: no subvolume for hash (value) = 795904255
[2017-01-02 12:35:19.630049] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 0-sysvol-replicate-1: no subvolumes up
--
[2017-01-02 13:26:52.480512] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 13:26:52.483345] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 13:26:56.481517] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 13:26:56.484584] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 13:26:56.486870] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 13:26:56.489610] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 13:27:00.487878] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 13:27:00.491545] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 13:27:00.493363] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 13:27:00.496340] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 13:27:02.490664] E [rpc-clnt.c:200:call_bail] 0-sysvol-client-7: bailing out frame type(GlusterFS Handshake) op(SETVOLUME(1)) xid = 0x673c554 sent = 2017-01-02 12:56:56.336913. timeout = 1800 for 10.70.37.154:49155
[2017-01-02 13:27:02.490687] W [MSGID: 114032] [client-handshake.c:1095:client_setvolume_cbk] 0-sysvol-client-7: received RPC status error [Transport endpoint is not connected]
[2017-01-02 13:27:02.490704] I [MSGID: 114051] [client-handshake.c:1266:client_setvolume_cbk] 0-sysvol-client-7: sending CHILD_CONNECTING event
[2017-01-02 13:27:04.494457] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 13:27:04.502622] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)
[2017-01-02 13:27:08.501615] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-5: changing port to 49155 (from 0)
[2017-01-02 13:27:08.503727] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 13:27:08.506014] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)
[2017-01-02 13:27:08.508811] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-2: connection to 10.70.35.156:49154 failed (Connection refused)





On my systemic setup, I am doing same path directory creation simultaneously from 3 different clients.
Each client used different server IP to mount the volume using fuse protocol
Also, Each client were dumping sosreports every 5min into the volume mount in a screen session, along with top output being appended to a file every minute
The dir-creations were happening from different users
Eg:
client1(el 7.2) was running the dir-creation using pavan@rhs-client23
client2(el 6.7) as root@rhs-client24
client3(el 7.3) as cli21@rhs-client21


Note: these logs are wrt client1 ie rhs-client23
Also, however note that I am able to access the mount




sosreports available at 
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.lab.eng.blr.redhat.com/

test execution details available at https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609

Version-Release number of selected component (if applicable):
============
3.8.4-10






other BZs for reference(raised wrt issues on same setup)
1409472 - brick crashed on systemic setup
1397907 - seeing frequent kernel hangs when doing operations both on fuse client and gluster nodes on replica volumes (edit) [NEEDINFO]
1409568 - seeing socket disconnects and transport endpoint not connected frequently on systemic setup 
1409572 - In fuse mount logs:seeing input/output error with split-brain observed logs and failing GETXATTR and STAT

1409580 - seeing stale file handle errors in fuse mount logs in systemic testing

Comment 2 Nag Pavan Chilakam 2017-01-04 07:12:45 UTC
client sosreports are available at scp -r /var/tmp/$HOSTNAME qe@rhsqe-repo:/var/www/html/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/

Comment 5 Raghavendra G 2017-09-01 09:18:14 UTC
[2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-sysvol-client-2: changing port to 49154 (from 0)
[2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish] 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection refused)

msgs like these indicate that brick is not running on the port given by portmapper. Either,

* Brick has crashed
* Portmapper has given a wrong entry.

Comment 6 Raghavendra G 2017-09-05 05:07:39 UTC
(In reply to Raghavendra G from comment #5)
> [2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
> 0-sysvol-client-2: changing port to 49154 (from 0)
> [2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish]
> 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection
> refused)
> 
> msgs like these indicate that brick is not running on the port given by
> portmapper. Either,
> 
> * Brick has crashed
> * Portmapper has given a wrong entry.

Had discussed an issue where portmapper gave stale ports with Atin earlier.

@Atin/@samikshan,

Can this be a stale port issue?

regards,
Raghavendra

Comment 7 Atin Mukherjee 2017-09-05 05:19:22 UTC
(In reply to Raghavendra G from comment #6)
> (In reply to Raghavendra G from comment #5)
> > [2017-01-02 13:27:04.497549] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
> > 0-sysvol-client-2: changing port to 49154 (from 0)
> > [2017-01-02 13:27:04.499842] E [socket.c:2309:socket_connect_finish]
> > 0-sysvol-client-5: connection to 10.70.37.86:49155 failed (Connection
> > refused)
> > 
> > msgs like these indicate that brick is not running on the port given by
> > portmapper. Either,
> > 
> > * Brick has crashed
> > * Portmapper has given a wrong entry.
> 
> Had discussed an issue where portmapper gave stale ports with Atin earlier.
> 
> @Atin/@samikshan,
> 
> Can this be a stale port issue?
> 
> regards,
> Raghavendra

We have a case where if a brick process is killed through SIGKILL and then brought back where the port gets assigned to is lesser than the old port then in portmap traversal glusterd will pick up the old port. To confirm this case, we need to have gluster volume status output. Do we have that data in place?

Comment 8 Raghavendra G 2017-09-05 05:30:51 UTC
Milind,

Can you check sosreports and get the volume status? I tried extracting sos-reports. But for some reason couldn't find the volume-status.

Comment 9 Milind Changire 2017-09-05 07:45:02 UTC
looked at sos report at:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.lab.eng.blr.redhat.com/

but didn't find output of gluster volume status.

sos_logs/sos.log reports that it can't find the gluster cli to run the gluster volume status command

Comment 10 Atin Mukherjee 2017-09-05 08:40:40 UTC
(In reply to Milind Changire from comment #9)
> looked at sos report at:
> 
> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/
> systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.
> lab.eng.blr.redhat.com/
> 
> but didn't find output of gluster volume status.
> 
> sos_logs/sos.log reports that it can't find the gluster cli to run the
> gluster volume status command

Please check the corresponding brick log file and see what port was passed to it at the last init

Comment 11 Milind Changire 2017-09-05 09:02:22 UTC
(In reply to Milind Changire from comment #9)
> looked at sos report at:
> 
> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/
> systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.
> lab.eng.blr.redhat.com/
> 
> but didn't find output of gluster volume status.
> 
> sos_logs/sos.log reports that it can't find the gluster cli to run the
> gluster volume status command

My mistake; these are client side sos reports.

Comment 13 Atin Mukherjee 2018-11-10 07:03:21 UTC
I see no updates on this bug for more than a year. Did we have any conclusion on this issue? Was this seen in latest releases? What'd it take to close this bug?

Comment 14 Milind Changire 2018-11-19 11:00:16 UTC
requesting re-validaiton of BZ to Nag

Comment 15 Milind Changire 2018-11-19 11:01:12 UTC
see comment #14

Comment 18 Amar Tumballi 2019-07-19 07:04:10 UTC
Nag, feel free to open it if seen. Considering the issue is not seen from last 18+ months, would like to close for now.


Note You need to log in before you can comment on or make changes to this bug.