Hide Forgot
Description of problem: ====================== In an idle steady-state cluster, with eventing enabled and webhook configured as a listener, multiple CLIENT_CONNECT and CLIENT_DISCONNECT events are seen. And they repeat every minute (when timed). I do not have any bricks down, nor am I stopping/starting any service of any volume. An untouched cluster is generating CLIENT_CONNECT/DISCONNECT events at every heartbeat-of-a-minute. There must be some service that is getting restarted internally, but I have been unable to figure out more related to that service, as the pid that is shown in the event when checked on the corresponding node results in no match. After enabling debug level logs, we do see 'EOF on socket' in the brick logs. This behaviour looks to be on similar lines of BZ 1397314 and BZ 1401162, as per Rafi. Presently, I am seeing these messages for distribute-replicate and arbiter volumes, but not on disperse. Further testing is required to confidently claim the same. Version-Release number of selected component (if applicable): =========================================================== 3.8.4-7 How reproducible: ================ I have 3 setups, and all of them are showing this behaviour Additional info: ================ [root@dhcp47-60 ~]# rpm -qa | grep gluster glusterfs-server-3.8.4-7.el7rhgs.x86_64 glusterfs-3.8.4-7.el7rhgs.x86_64 gluster-nagios-common-0.2.4-1.el7rhgs.noarch vdsm-gluster-4.17.33-1.el7rhgs.noarch glusterfs-libs-3.8.4-7.el7rhgs.x86_64 glusterfs-fuse-3.8.4-7.el7rhgs.x86_64 python-gluster-3.8.4-7.el7rhgs.noarch glusterfs-events-3.8.4-7.el7rhgs.x86_64 glusterfs-api-3.8.4-7.el7rhgs.x86_64 glusterfs-geo-replication-3.8.4-7.el7rhgs.x86_64 glusterfs-client-xlators-3.8.4-7.el7rhgs.x86_64 gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64 glusterfs-cli-3.8.4-7.el7rhgs.x86_64 [root@dhcp47-60 ~]# [root@dhcp47-60 ~]# gluster peer status Number of Peers: 3 Hostname: 10.70.47.61 Uuid: f4b259db-7add-4d01-bb5e-3c7f9c077bb4 State: Peer in Cluster (Connected) Hostname: 10.70.47.26 Uuid: 95c24075-02aa-49c1-a1e4-c7e0775e7128 State: Peer in Cluster (Connected) Hostname: 10.70.47.27 Uuid: 8d1aaf3a-059e-41c2-871b-6c7f5c0dd90b State: Peer in Cluster (Connected) [root@dhcp47-60 ~]# [root@dhcp47-60 ~]# Brick logs: ---------- [2016-12-12 09:51:29.896808] D [socket.c:559:__socket_rwv] 0-tcp.testvol-server: EOF on socket [2016-12-12 09:51:29.896859] D [socket.c:2403:socket_event_handler] 0-transport: disconnecting now [2016-12-12 09:51:29.896898] D [logging.c:1953:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk [2016-12-12 09:51:29.829919] D [MSGID: 0] [io-threads.c:353:iot_schedule] 0-testvol-io-threads: LOOKUP scheduled as fast fop [2016-12-12 09:51:29.896896] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-testvol-server: disconnecting connection from dhcp47-27.lab.eng.blr.redhat.com-23712-2016/12/12-09:51:29:685953-testvol-client-0-0-0 [2016-12-12 09:51:29.896987] D [client_t.c:328:gf_client_ref] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_disconnect+0x10f) [0x7f1643c1c4cf] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x7bce) [0x7f162f11dbce] -->/lib64/libglusterfs.so.0(gf_client_ref+0x6b) [0x7f1643eb124b] ) 0-client_t: dhcp47-27.lab.eng.blr.redhat.com-23712-2016/12/12-09:51:29:685953-testvol-client-0-0-0: ref-count 2 [2016-12-12 09:51:29.897151] D [client_t.c:305:gf_client_put] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_disconnect+0x10f) [0x7f1643c1c4cf] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x7bdb) [0x7f162f11dbdb] -->/lib64/libglusterfs.so.0(gf_client_put+0x93) [0x7f1643eb17d3] ) 0-client_t: dhcp47-27.lab.eng.blr.redhat.com-23712-2016/12/12-09:51:29:685953-testvol-client-0-0-0: bind_ref: 0, ref: 2, unref: 1 [2016-12-12 09:51:29.897270] D [client_t.c:412:gf_client_unref] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_disconnect+0x10f) [0x7f1643c1c4cf] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x7bdb) [0x7f162f11dbdb] -->/lib64/libglusterfs.so.0(gf_client_unref+0x7c) [0x7f1643eb13cc] ) 0-client_t: dhcp47-27.lab.eng.blr.redhat.com-23712-2016/12/12-09:51:29:685953-testvol-client-0-0-0: ref-count 1 [2016-12-12 09:51:29.897314] D [MSGID: 0] [server-helpers.c:367:server_connection_cleanup] 0-testvol-server: Performing cleanup on 128 fdentries [2016-12-12 09:51:29.897628] D [client_t.c:412:gf_client_unref] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_disconnect+0x10f) [0x7f1643c1c4cf] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x7bef) [0x7f162f11dbef] -->/lib64/libglusterfs.so.0(gf_client_unref+0x7c) [0x7f1643eb13cc] ) 0-client_t: dhcp47-27.lab.eng.blr.redhat.com-23712-2016/12/12-09:51:29:685953-testvol-client-0-0-0: ref-count 0 [2016-12-12 09:51:29.897664] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-testvol-server: Shutting down connection dhcp47-27.lab.eng.blr.redhat.com-23712-2016/12/12-09:51:29:685953-testvol-client-0-0-0 [2016-12-12 09:51:29.897750] D [MSGID: 0] [event-epoll.c:587:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=14 from gen=649 to slot->gen=650, fd=23, slot->fd=23 [2016-12-12 09:51:59.676123] D [logging.c:1829:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present [2016-12-12 09:51:59.676272] D [logging.c:1791:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
This was not seen on the earlier builds, < 3.8.4-5 .
As requested (to confirm that this was not seen before), I downgraded my cluster setup to a build <3.8.4-5 (3.8.4-3 to be precise). Enabled eventing on all the nodes, and performed some random user driven operations on the volume. Events are seen as expected. And when the cluster is left idle, no CLIENT_CONNECT/DISCONNECT events are seen at all. This confirms that the issue raised in this BZ seems to have gotten in, in one of the interim builds. Further testing would be required to ascertain the exact build.