Bug 1537362
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | glustershd/glusterd is not using right port when connecting to glusterfsd process | | |
| Product: | [Community] GlusterFS | Reporter: | Atin Mukherjee <amukherj> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | bugs, zz.sh.cynthia |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-v4.1.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1537346 | | |
| | 1543711 (view as bug list) | Environment: | |
| Last Closed: | 2018-06-20 17:57:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1537346, 1543711 | | |
Description: Atin Mukherjee, 2018-01-23 02:43:59 UTC
REVIEW: https://review.gluster.org/19263 (glusterd: process pmap sign in only when port is marked as free) posted (#4) for review on master by Atin Mukherjee

COMMIT: https://review.gluster.org/19263 committed in master by "Atin Mukherjee" <amukherj> with a commit message:

glusterd: process pmap sign in only when port is marked as free

Because of some crazy race in the volume start code path, caused by friend handshaking with volumes with quorum enabled, we might end up in a situation where glusterd would start a brick, get a disconnect, and then immediately try to start the same brick instance based on another friend update request. Then, even if the very first brick process doesn't come up, a sign-in event still gets sent at the end and we end up having two duplicate portmap entries for the same brick. Since in brick start we mark the previous port as free, it is better to consider a sign-in request as a no-op if the corresponding port type is marked as free.

Change-Id: I995c348c7b6988956d24b06bf3f09ab64280fc32
BUG: 1537362
Signed-off-by: Atin Mukherjee <amukherj>
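To illustrate the idea in the commit message (a sign-in becomes a no-op once its port has been marked free), here is a minimal, self-contained C sketch. It is not the glusterd code; `pmap_entry_t`, `brick_start()`, and `pmap_signin()` are hypothetical names standing in for the real portmap registry in glusterd-pmap.c.

```c
/* Minimal sketch of "treat a pmap sign-in as a no-op when the port is
 * already marked free". Hypothetical types/names, not the glusterd API. */
#include <stdio.h>

#define MAX_PORT 65536

typedef enum { PORT_FREE = 0, PORT_LEASED, PORT_BRICKSERVER } port_state_t;

typedef struct {
    port_state_t state;
    char         brickname[128];
} pmap_entry_t;

static pmap_entry_t registry[MAX_PORT];

/* Brick start: lease a fresh port and mark the previously assigned one free. */
static void brick_start(int new_port, int old_port, const char *brick)
{
    if (old_port > 0)
        registry[old_port].state = PORT_FREE;      /* old slot is stale now */
    registry[new_port].state = PORT_LEASED;
    snprintf(registry[new_port].brickname,
             sizeof(registry[new_port].brickname), "%s", brick);
}

/* Sign-in from a brick process: ignore it if the slot was already freed,
 * so a late sign-in cannot recreate a duplicate/stale portmap entry. */
static int pmap_signin(int port, const char *brick)
{
    if (registry[port].state == PORT_FREE) {
        printf("ignoring sign-in for %s on port %d (port marked free)\n",
               brick, port);
        return 0;
    }
    registry[port].state = PORT_BRICKSERVER;
    snprintf(registry[port].brickname,
             sizeof(registry[port].brickname), "%s", brick);
    return 1;
}

int main(void)
{
    const char *brick = "/mnt/bricks/log/brick";

    brick_start(49154, 0, brick);     /* first start leases 49154            */
    brick_start(49157, 49154, brick); /* racing restart leases 49157 and     */
                                      /* marks 49154 free                    */
    pmap_signin(49154, brick);        /* late sign-in: ignored (no-op)       */
    pmap_signin(49157, brick);        /* current instance: registered        */
    return 0;
}
```

The key point is the early return in `pmap_signin()`: a late sign-in from a brick whose port was already recycled by a newer start no longer creates a second, stale portmap entry.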
(In reply to Worker Ant from comment #2)
> COMMIT: https://review.gluster.org/19263 committed in master by "Atin Mukherjee" <amukherj> with a commit message: glusterd: process pmap sign in only when port is marked as free

This issue appeared again in my local test environment:

```
gluster volume heal log info
Brick sn-0.local:/mnt/bricks/log/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick sn-1.local:/mnt/bricks/log/brick
/master/fsaudit/internal-ssh.log
/
/master/fsaudit/auth.log
/blackboxes
/master/fsaudit/alarms
/rcploglib/sn-0/DHAAgent.level
/rcploglib/sn-0/DVMAgent.level
/master/syslog
Status: Connected
Number of entries: 8

[root@sn-0:/root]
# gluster volume heal mstate info
Brick sn-0.local:/mnt/bricks/mstate/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick sn-1.local:/mnt/bricks/mstate/brick
/
Status: Connected
Number of entries: 1
```

The glustershd process exists:

```
[root@sn-0:/root]
# ps aux | grep glustershd | grep -v grep | wc -l
1
```

```
[root@sn-0:/root]
# gluster volume status log
Status of volume: log
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/log/brick       49157     0          Y       1402
Brick sn-1.local:/mnt/bricks/log/brick       49155     0          Y       3817
Self-heal Daemon on localhost                N/A       N/A        Y       1428
Self-heal Daemon on sn-1.local               N/A       N/A        Y       20316
Self-heal Daemon on sn-2.local               N/A       N/A        Y       15565

Task Status of Volume log
------------------------------------------------------------------------------
There are no active volume tasks

[root@sn-0:/root]
# gluster volume status mstate
Status of volume: mstate
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/mstate/brick    49158     0          Y       1403
Brick sn-1.local:/mnt/bricks/mstate/brick    49153     0          Y       2967
Self-heal Daemon on localhost                N/A       N/A        Y       1428
Self-heal Daemon on sn-1.local               N/A       N/A        Y       20316
Self-heal Daemon on sn-2.local               N/A       N/A        Y       15565

Task Status of Volume mstate
------------------------------------------------------------------------------
There are no active volume tasks

[root@sn-0:/root]
```

On the sn-0 node, the glusterfsd processes for the mstate and log volumes were started twice. The port actually in use is the first one, but glustershd keeps connecting using the latter one.

log volume:

```
[2018-04-15 18:06:00.280807] I [MSGID: 100030] [glusterfsd.c:2521:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.12.3 (args: /usr/sbin/glusterfsd -s sn-0.local --volfile-id log.sn-0.local.mnt-bricks-log-brick -p /var/run/gluster/vols/log/sn-0.local-mnt-bricks-log-brick.pid -S /var/run/gluster/0af76d9541921cccad34e5fd3041a44d.socket --brick-name /mnt/bricks/log/brick -l /var/log/glusterfs/bricks/mnt-bricks-log-brick.log --xlator-option *-posix.glusterd-uuid=ac000adb-afb9-4826-8dba-9e4a3b641068 --brick-port 49154 --xlator-option log-server.listen-port=49154 --xlator-option transport.socket.bind-address=sn-0.local)
[2018-04-15 18:06:00.767856] I [MSGID: 100030] [glusterfsd.c:2521:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.12.3 (args: /usr/sbin/glusterfsd -s sn-0.local --volfile-id log.sn-0.local.mnt-bricks-log-brick -p /var/run/gluster/vols/log/sn-0.local-mnt-bricks-log-brick.pid -S /var/run/gluster/0af76d9541921cccad34e5fd3041a44d.socket --brick-name /mnt/bricks/log/brick -l /var/log/glusterfs/bricks/mnt-bricks-log-brick.log --xlator-option *-posix.glusterd-uuid=ac000adb-afb9-4826-8dba-9e4a3b641068 --brick-port 49157 --xlator-option log-server.listen-port=49157 --xlator-option transport.socket.bind-address=sn-0.local)
```

The real working port is 49154, but glustershd keeps trying to connect to 49157, so it never succeeds:

```
[2018-04-15 18:21:25.854249] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-log-client-0: changing port to 49157 (from 0)
[2018-04-15 18:21:25.854597] E [socket.c:2369:socket_connect_finish] 0-log-client-0: connection to 192.168.1.12:49157 failed (Connection refused); disconnecting socket
```

mstate volume:
```
[2018-04-15 18:06:00.383103] I [MSGID: 100030] [glusterfsd.c:2521:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.12.3 (args: /usr/sbin/glusterfsd -s sn-0.local --volfile-id mstate.sn-0.local.mnt-bricks-mstate-brick -p /var/run/gluster/vols/mstate/sn-0.local-mnt-bricks-mstate-brick.pid -S /var/run/gluster/79d00394b1085adcf7dc1fd4c57f71bc.socket --brick-name /mnt/bricks/mstate/brick -l /var/log/glusterfs/bricks/mnt-bricks-mstate-brick.log --xlator-option *-posix.glusterd-uuid=ac000adb-afb9-4826-8dba-9e4a3b641068 --brick-port 49155 --xlator-option mstate-server.listen-port=49155 --xlator-option transport.socket.bind-address=sn-0.local)
[2018-04-15 18:06:00.869951] I [MSGID: 100030] [glusterfsd.c:2521:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.12.3 (args: /usr/sbin/glusterfsd -s sn-0.local --volfile-id mstate.sn-0.local.mnt-bricks-mstate-brick -p /var/run/gluster/vols/mstate/sn-0.local-mnt-bricks-mstate-brick.pid -S /var/run/gluster/79d00394b1085adcf7dc1fd4c57f71bc.socket --brick-name /mnt/bricks/mstate/brick -l /var/log/glusterfs/bricks/mnt-bricks-mstate-brick.log --xlator-option *-posix.glusterd-uuid=ac000adb-afb9-4826-8dba-9e4a3b641068 --brick-port 49158 --xlator-option mstate-server.listen-port=49158 --xlator-option transport.socket.bind-address=sn-0.local)
```

The real listening port for glusterfsd is 49155, but glustershd keeps trying to connect through 49158, so it always fails:

```
[2018-04-15 18:21:33.855829] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-mstate-client-0: changing port to 49158 (from 0)
[2018-04-15 18:21:33.856229] E [socket.c:2369:socket_connect_finish] 0-mstate-client-0: connection to 192.168.1.12:49158 failed (Connection refused); disconnecting socket
```

What branch of glusterfs are you running? If it is a private branch, have you pulled in this fix?

The version I am using is glusterfs 3.12.3. I have added https://review.gluster.org/#/c/19263/ but occasionally this issue still exists. I think the real root cause is that the glusterfsd process is started twice, as per the previous analysis.

What I can see is that every time I restart all the gluster processes (simulating a reboot scenario), the respective brick disconnect events are received at glusterd:

```
#0  __glusterd_brick_rpc_notify (rpc=rpc@entry=0x7fffe8008810, mydata=mydata@entry=0x7fffe800e0d0, event=event@entry=RPC_CLNT_DISCONNECT, data=data@entry=0x0) at glusterd-handler.c:6007
#1  0x00007ffff2936169 in glusterd_big_locked_notify (rpc=0x7fffe8008810, mydata=0x7fffe800e0d0, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7ffff2932440 <__glusterd_brick_rpc_notify>) at glusterd-handler.c:68
#2  0x00007ffff78c0343 in rpc_clnt_handle_disconnect (conn=0x7fffe8008840, clnt=0x7fffe8008810) at rpc-clnt.c:876
#3  rpc_clnt_notify (trans=0x7fffe8008ae0, mydata=0x7fffe8008840, event=RPC_TRANSPORT_DISCONNECT, data=<optimized out>) at rpc-clnt.c:939
#4  0x00007ffff78bca73 in rpc_transport_notify (this=this@entry=0x7fffe8008ae0, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7fffe8008ae0) at rpc-transport.c:545
#5  0x00007fffefc252bf in socket_event_poll_err (idx=<optimized out>, gen=<optimized out>, this=0x7fffe8008ae0) at socket.c:1210
#6  socket_event_handler (fd=9, idx=<optimized out>, gen=<optimized out>, data=0x7fffe8008ae0, poll_in=0, poll_out=<optimized out>, poll_err=16) at socket.c:2488
#7  0x00007ffff7b5097c in event_dispatch_epoll_handler (event=0x7fffee9dce84, event_pool=0x64f040) at event-epoll.c:583
#8  event_dispatch_epoll_worker (data=0x70b820) at event-epoll.c:659
#9  0x00007ffff693f36d in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff61ecbbf in clone () from /lib64/libc.so.6
```

The question I have for the RPC team is: why do we see a disconnect event in this case? Any pointers, Mohit/Raghu/Milind? The side effect of this disconnect event is what caused this issue.
In __glusterd_brick_rpc_notify we set brickinfo->start_triggered to false. So when two glusterd_brick_start threads were racing with each other, a disconnect arrived in between, this flag got reset, and we ended up trying to spawn the same brick twice.

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/
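To make the interleaving described in the __glusterd_brick_rpc_notify comment above concrete, here is a minimal, hypothetical C sketch (not glusterd code): a start_triggered flag that the disconnect handler clears while a start is still in flight, and one possible guard that skips the reset while the spawn is in progress. The struct and function names are placeholders.

```c
/* Illustrative sketch (hypothetical names, not actual glusterd code) of the
 * start_triggered race described above and one possible guard. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool start_triggered;   /* a brick start has been triggered           */
    bool starting;          /* the spawn is still in progress (the guard) */
} brickinfo_t;

static brickinfo_t brick;

/* First half of a brick start: claim the brick and fork/exec glusterfsd. */
static bool brick_start_begin(const char *who)
{
    if (brick.start_triggered) {
        printf("%s: start already triggered, not spawning again\n", who);
        return false;
    }
    brick.start_triggered = true;
    brick.starting = true;
    printf("%s: spawning glusterfsd\n", who);
    return true;
}

/* Second half: the spawn finished (sign-in received, port registered). */
static void brick_start_finish(void)
{
    brick.starting = false;
}

/* RPC_CLNT_DISCONNECT handler. Clearing start_triggered unconditionally
 * here is what re-opens the window for a duplicate spawn; skipping the
 * reset while a start is still in flight is one possible guard. */
static void brick_disconnect(void)
{
    if (!brick.starting)
        brick.start_triggered = false;
}

int main(void)
{
    brick_start_begin("friend-update-1"); /* first start begins            */
    brick_disconnect();                   /* disconnect arrives mid-start  */
    brick_start_begin("friend-update-2"); /* second update backs off       */
    brick_start_finish();                 /* first start completes         */
    return 0;
}
```

With the guard in place, the second friend update backs off instead of spawning a duplicate glusterfsd; without it, the sequence in main() would start the same brick twice, which matches the duplicate brick processes seen in the logs above.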