Bug 1377386

Summary: glusterd experiencing repeated connect/disconnect messages when shd is down
Product: [Community] GlusterFS
Component: protocol
Version: 3.9
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Atin Mukherjee <amukherj>
Assignee: Atin Mukherjee <amukherj>
QA Contact:
Docs Contact:
CC: amukherj, bugs, jdarcy, kaushal, ravishankar, rgowdapp
Target Milestone: ---
Target Release: ---
Keywords: Triaged
Whiteboard:
Fixed In Version: glusterfs-3.9.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1372356
Clones: 1378130 (view as bug list)
Environment:
Last Closed: 2016-12-06 05:59:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1372356, 1377387
Bug Blocks: 1373723, 1378130

Description Atin Mukherjee 2016-09-19 14:31:59 UTC
+++ This bug was initially created as a clone of Bug #1372356 +++

Description of problem:
Found this while testing the afr eventing framework patch:

On a 1x2 replica volume in a single-node setup, when I kill glustershd, glusterd repeatedly emits events toggling between connected and disconnected:

2016-09-01 16:35:20  SVC_DISCONNECTED e3876d55-89ab-406b-9a49-e32899f70e5d svc_name=glustershd
2016-09-01 16:35:21  SVC_CONNECTED e3876d55-89ab-406b-9a49-e32899f70e5d svc_name=glustershd
2016-09-01 16:35:21  SVC_DISCONNECTED e3876d55-89ab-406b-9a49-e32899f70e5d svc_name=glustershd
2016-09-01 16:35:24  SVC_CONNECTED e3876d55-89ab-406b-9a49-e32899f70e5d svc_name=glustershd
2016-09-01 16:35:24  SVC_DISCONNECTED e3876d55-89ab-406b-9a49-e32899f70e5d svc_name=glustershd
2016-09-01 16:35:27  SVC_CONNECTED e3876d55-89ab-406b-9a49-e32899f70e5d svc_name=glustershd
2016-09-01 16:35:27  SVC_DISCONNECTED e3876d55-89ab-406b-9a49-e32899f70e5d svc_name=glustershd


How reproducible:
Always.

Steps to Reproduce:
1. Start glusterd in debug mode
2. Create a 1x2 replica volume and start it.
3. Kill the self-heal daemon
4. From the glusterd logs:


[2016-09-01 13:37:59.915866] D [socket.c:3058:socket_connect] 0-glustershd: connection attempt on /var/run/gluster/628f74c872e682fe7c5003e222cab86a.socket failed, (Connection refused)
[2016-09-01 13:37:59.916053] D [MSGID: 0] [glusterd-svc-mgmt.c:318:glusterd_svc_common_rpc_notify] 0-management: glustershd has connected with glusterd.
[2016-09-01 13:37:59.916268] D [socket.c:2384:socket_event_handler] 0-transport: disconnecting now
[2016-09-01 13:37:59.917535] D [rpc-clnt-ping.c:93:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1e9)[0x7f9e23ab4191] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc5)[0x7f9e2387fe02] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xcb)[0x7f9e23879ba9] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x1aa)[0x7f9e2387a689] (--> /usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x10f)[0x7f9e23876b70] ))))) 0-: /var/run/gluster/628f74c872e682fe7c5003e222cab86a.socket: ping timer event already removed
[2016-09-01 13:37:59.917649] I [MSGID: 106006] [glusterd-svc-mgmt.c:327:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
[2016-09-01 13:37:59.917802] D [MSGID: 0] [event-epoll.c:587:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=2 from gen=13 to slot->gen=14, fd=6, slot->fd=6

[2016-09-01 13:38:02.916508] D [socket.c:3058:socket_connect] 0-glustershd: connection attempt on /var/run/gluster/628f74c872e682fe7c5003e222cab86a.socket failed, (Connection refused)
[2016-09-01 13:38:02.916567] D [MSGID: 0] [glusterd-svc-mgmt.c:318:glusterd_svc_common_rpc_notify] 0-management: glustershd has connected with glusterd.
[2016-09-01 13:38:02.916667] D [socket.c:2384:socket_event_handler] 0-transport: disconnecting now
[2016-09-01 13:38:02.916811] D [rpc-clnt-ping.c:93:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1e9)[0x7f9e23ab4191] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc5)[0x7f9e2387fe02] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xcb)[0x7f9e23879ba9] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x1aa)[0x7f9e2387a689] (--> /usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x10f)[0x7f9e23876b70] ))))) 0-: /var/run/gluster/628f74c872e682fe7c5003e222cab86a.socket: ping timer event already removed
[2016-09-01 13:38:02.916830] I [MSGID: 106006] [glusterd-svc-mgmt.c:327:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
[2016-09-01 13:38:02.916888] D [MSGID: 0] [event-epoll.c:587:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=2 from gen=16 to slot->gen=17, fd=6, slot->fd=6

[2016-09-01 13:38:05.917660] D [socket.c:3058:socket_connect] 0-glustershd: connection attempt on /var/run/gluster/628f74c872e682fe7c5003e222cab86a.socket failed, (Connection refused)
[2016-09-01 13:38:05.917720] D [MSGID: 0] [glusterd-svc-mgmt.c:318:glusterd_svc_common_rpc_notify] 0-management: glustershd has connected with glusterd.
[2016-09-01 13:38:05.917940] D [socket.c:2384:socket_event_handler] 0-transport: disconnecting now
[2016-09-01 13:38:05.918274] D [rpc-clnt-ping.c:93:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1e9)[0x7f9e23ab4191] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc5)[0x7f9e2387fe02] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xcb)[0x7f9e23879ba9] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x1aa)[0x7f9e2387a689] (--> /usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x10f)[0x7f9e23876b70] ))))) 0-: /var/run/gluster/628f74c872e682fe7c5003e222cab86a.socket: ping timer event already removed
[2016-09-01 13:38:05.918299] I [MSGID: 106006] [glusterd-svc-mgmt.c:327:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
[2016-09-01 13:38:05.918467] D [MSGID: 0] [event-epoll.c:587:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=2 from gen=19 to slot->gen=20, fd=6, slot->fd=6

--- Additional comment from Raghavendra G on 2016-09-06 01:35:17 EDT ---

Analysis from Atin:

<analysis>

I am seeing glusterd receive RPC_CLNT_CONNECT in glusterd_svc_common_rpc_notify () even though the shd service is not up (note: this applies to all daemon services, including brick processes). On further debugging, I found that socket_connect_finish () calls __socket_connect_finish (), which returns 0, meaning it believes it was able to connect to the socket, and hence the RPC_CLNT_CONNECT event is sent up the transport. Is this right? My understanding is that __socket_connect_finish () shouldn't be returning 0 here.

</analysis>

--- Additional comment from Raghavendra G on 2016-09-06 01:40:15 EDT ---

(In reply to Raghavendra G from comment #1)
> Analysis from Atin:
> 
> <analysis>
> 
> I am seeing glusterd is receiving RPC_CLNT_CONNECT in
> glusterd_svc_common_rpc_notify () even though shd service is not up (note -
> this is applicable for all daemon services including brick process). In
> further debugging what I found is socket_connect_finish () calls
> __socket_connect_finish () which returns 0 which means its able to connect
> to the socket and hence RPC_CLNT_CONNECT event is sent to the transport. Is
> this right? My understanding is __socket_connect_finish () shouldn't be
> returning 0 here.

Yes. __socket_connect_finish (or, specifically, getsockopt (socket, SOL_SOCKET, SO_ERROR)) should've returned ECONNREFUSED since shd is down. I am not sure why that is not the case. I wonder whether the issue is in the TCP/IP (socket) layer or in the way we are using getsockopt.

> 
> </analysis>

--- Additional comment from Raghavendra G on 2016-09-06 01:42:47 EDT ---

> 
> Yes. __socket_connect_finish (or getsockopt (socket, SOL_SOCKET, SO_ERROR) 
> specifically) should've returned ECONNREFUSED as shd is down. I am not sure
> why this is not the case. Wondering whether there is an issue in TCP/IP
> (socket) layer or the way we are using getsockopt.

Is it possible to confirm that no other process is listening on the same address (IP:PORT) as shd?

--- Additional comment from Ravishankar N on 2016-09-06 04:29:09 EDT ---

Looked into the issue with Du; the problem is in the gluster RPC code (no other process is listening on the same IP/port).

--- Additional comment from Raghavendra G on 2016-09-06 05:45:55 EDT ---

Following is the control flow we observed:

* connect(2) returned -1 with ECONNREFUSED.
* A shutdown is called on the socket, with the intention of processing the failure in the poller thread (why does the error have to be processed in the poller thread when connect has failed with an error other than EAGAIN/EBUSY/EINPROGRESS?).
* The socket is added for polling.
* socket_event_handle is called with all three events - pollin, pollout and pollerr - set.
* Since the pollin event is received while priv->connected is 0, socket_connect_finish is called, which in turn doesn't find any error set on the socket. getsockopt not returning an error might be valid here, as connect had _failed_ with an errno other than EINPROGRESS. The connection is deemed successful and higher layers are notified with an RPC_CONNECT event.
* On further execution of socket_event_handle, pollerr is processed and an RPC_DISCONNECT event is sent to the higher layer.

Some things to ponder:
* Why did we receive a pollin/pollout event on the socket when connect had clearly failed?
* Why does the error have to be processed in the poller thread when connect has failed with an error other than EAGAIN/EBUSY/EINPROGRESS?

--- Additional comment from Kaushal on 2016-09-07 03:37:53 EDT ---

(In reply to Raghavendra G from comment #5)
> Following is the control flow we observed:
> 
> * connect(2) returned -1 with ECONNREFUSED.
> * A shutdown is called on socket, with the intention of processing the
> failure in poller thread (why error has to be processed in poller thread
> when connect has failed with an error other than EAGAIN/EBUSY/EINPROGRESS?).
> * socket is added for polling.
> * socket_event_handle is called with all three - pollin, pollout and pollerr
> - events set.
> * As this is the pollin event received when priv->connected is 0,
> socket_connect_finish is called, which in turn doesn't find any error set on
> socket. getsockopt not returning any error might be valid as connect had
> _failed_ with errno other than EINPROGRESS. connection is deemed successful
> and higher layers are notified with RPC_CONNECT event.
> * On further execution of socket_event_handle, pollerr is processed and a
> RPC_DISCONNECT event is sent to higher layer.
> 
> Some things to ponder about:
> * why did we receive a pollin/pollout event on socket when connect has
> failed clearly?

This is surprising to me as well. Is it possible that we got these events because some other process is already using this socket?

> * why error has to be processed in poller thread when connect has failed
> with an error other than EAGAIN/EBUSY/EINPROGRESS?

--- Additional comment from Raghavendra G on 2016-09-07 08:23:22 EDT ---

Maybe reading through http://www.faqs.org/faqs/unix-faq/socket/ will help us answer some of the questions raised above.

--- Additional comment from Worker Ant on 2016-09-09 03:49:10 EDT ---

REVIEW: http://review.gluster.org/15440 (socket: pollerr event shouldn't trigger socket_connnect_finish) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2016-09-09 03:51:58 EDT ---

REVIEW: http://review.gluster.org/15440 (socket: pollerr event shouldn't trigger socket_connnect_finish) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2016-09-11 12:09:53 EDT ---

REVIEW: http://review.gluster.org/15440 (socket: pollerr event shouldn't trigger socket_connnect_finish) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Jeff Darcy on 2016-09-13 21:02:25 EDT ---

How about if we pretend that SO_ERROR doesn't exist?  (As a general rule, something that can't be trusted might as well not exist.)  There must be some other way we can tell whether the socket is actually connected, and generate the correct events or state changes.  Does POLLIN|POLLERR occur in any other cases?  Would an immediate non-blocking read give us ENOTCONN vs. EWOULDBLOCK to distinguish these cases?

--- Additional comment from Worker Ant on 2016-09-15 14:26:29 EDT ---

REVIEW: http://review.gluster.org/15440 (socket: pollerr event shouldn't trigger socket_connnect_finish) posted (#4) for review on master by Shyamsundar Ranganathan (srangana)

--- Additional comment from Worker Ant on 2016-09-16 00:19:48 EDT ---

REVIEW: http://review.gluster.org/15440 (socket: pollerr event shouldn't trigger socket_connnect_finish) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2016-09-19 09:51:14 EDT ---

COMMIT: http://review.gluster.org/15440 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 12ae3c448c8435decdff27643d5785698ac19dff
Author: Atin Mukherjee <amukherj>
Date:   Thu Sep 8 11:33:59 2016 +0530

    socket: pollerr event shouldn't trigger socket_connnect_finish
    
    If connect fails with any other error than EINPROGRESS we cannot get
    the error status using getsockopt (... SO_ERROR ... ). Hence we need
    to remember the state of connect and take appropriate action in the
    event_handler for the same.
    
    As an added note, a event can come where poll_err is HUP and we have
    poll_in as well (i.e some status was written to the socket), so for
    such cases we need to finish the connect, process the data and then
    the poll_err as is the case in the current code.
    
    Special thanks to Kaushal M & Raghavendra G for figuring out the issue.
    
    Change-Id: Ic45ad59ff8ab1d0a9d2cab2c924ad940b9d38528
    BUG: 1372356
    Signed-off-by: Atin Mukherjee <amukherj>
    Signed-off-by: Shyam <srangana>
    Reviewed-on: http://review.gluster.org/15440
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 1 Worker Ant 2016-09-19 14:32:19 UTC
REVIEW: http://review.gluster.org/15533 (socket: pollerr event shouldn't trigger socket_connnect_finish) posted (#1) for review on release-3.9 by Atin Mukherjee (amukherj)

Comment 2 Worker Ant 2016-09-21 15:21:21 UTC
COMMIT: http://review.gluster.org/15533 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu) 
------
commit b4ed6127675eeed74264a151132bc1bce8dd921c
Author: Atin Mukherjee <amukherj>
Date:   Thu Sep 8 11:33:59 2016 +0530

    socket: pollerr event shouldn't trigger socket_connnect_finish
    
    If connect fails with any other error than EINPROGRESS we cannot get
    the error status using getsockopt (... SO_ERROR ... ). Hence we need
    to remember the state of connect and take appropriate action in the
    event_handler for the same.
    
    As an added note, a event can come where poll_err is HUP and we have
    poll_in as well (i.e some status was written to the socket), so for
    such cases we need to finish the connect, process the data and then
    the poll_err as is the case in the current code.
    
    Special thanks to Kaushal M & Raghavendra G for figuring out the issue.
    
    >Signed-off-by: Shyam <srangana>
    >Reviewed-on: http://review.gluster.org/15440
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Raghavendra G <rgowdapp>
    
    Change-Id: Ic45ad59ff8ab1d0a9d2cab2c924ad940b9d38528
    BUG: 1377386
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/15533
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 3 Aravinda VK 2016-10-27 05:22:59 UTC
glusterfs-3.9.0rc2 has been released [1], and packages are available for different distributions [2] for testing.

[1] http://www.gluster.org/pipermail/maintainers/2016-October/001601.html
[2] http://www.gluster.org/pipermail/maintainers/2016-October/001605.html and http://www.gluster.org/pipermail/maintainers/2016-October/001606.html

Comment 4 Aravinda VK 2016-12-06 05:59:41 UTC
Gluster 3.9 GA has been released: http://blog.gluster.org/2016/11/announcing-gluster-3-9/