Bug 1574298

Summary: on glusterd initial start, brick start always logs an error with "EPOLLERR - disconnecting now"
Product: [Community] GlusterFS
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: urgent
Priority: medium
Reporter: George <george.lian>
Assignee: Vishal Pandey <vpandey>
CC: amukherj, bugs, george.lian, moagrawa
Type: Bug
Last Closed: 2019-09-20 09:42:26 UTC

Description George 2018-05-03 03:15:45 UTC
Description of problem:
During glusterd's initial startup, brick start always logs the error "EPOLLERR - disconnecting now". This error occasionally causes glusterfsd to be started twice for the same brick, which in turn prevents glustershd from working normally.
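
For reference, a minimal sketch of how to spot this symptom in the glusterd log (assuming the default log location /var/log/glusterfs/glusterd.log; the brick paths match the excerpt under "Actual results"):

    # The spurious disconnects themselves:
    grep 'EPOLLERR - disconnecting now' /var/log/glusterfs/glusterd.log

    # Count how many times each brick was (re)started during one glusterd boot;
    # any brick appearing twice matches the double start described above.
    grep 'starting a fresh brick process' /var/log/glusterfs/glusterd.log \
        | grep -oE '/mnt/bricks/[^ ]+' | sort | uniq -c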

Version-Release number of selected component (if applicable):


How reproducible:

Set up a GlusterFS cluster with 4 replicated volumes named "log", "export", "service", and "ccs" across 2 storage nodes (SN),
then start the 4 volumes, and stop glusterd and all related processes/services.

Steps to Reproduce:
1. Start the glusterd service, or run the command "glusterd -p /somewhere/glusterd.pid"
2. Check the log "glusterd.log" (a scripted version of these steps follows below)
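
The whole reproduction as one sketch (hedged: the node names sn-0.local/sn-1.local and the brick paths /mnt/bricks/<vol>/brick are inferred from the log excerpt below; adjust to your environment):

    # Create and start the four replica-2 volumes from "How reproducible".
    # "force" suppresses the replica-2 confirmation prompt.
    for vol in log export service ccs; do
        gluster volume create "$vol" replica 2 \
            sn-0.local:/mnt/bricks/$vol/brick \
            sn-1.local:/mnt/bricks/$vol/brick force
        gluster volume start "$vol"
    done

    # Stop glusterd and all related processes/services.
    systemctl stop glusterd
    pkill glusterfsd
    pkill glusterfs

    # Step 1: restart glusterd.
    glusterd -p /somewhere/glusterd.pid

    # Step 2: check the log.
    grep EPOLLERR /var/log/glusterfs/glusterd.log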

Actual results:
    Line 625: [2018-04-15 18:05:59.964454] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/ccs/brick
    Line 629: [2018-04-15 18:06:00.117043] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/export/brick
    Line 632: [2018-04-15 18:06:00.277399] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/log/brick
    Line 635: [2018-04-15 18:06:00.379539] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/mstate/brick
    Line 639: [2018-04-15 18:06:00.481778] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/services/brick
              [2018-04-15 18:06:00.651825] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
    Line 647: [2018-04-15 18:06:00.652182] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/export/brick has disconnected from glusterd.
              [2018-04-15 18:06:00.652545] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
    Line 649: [2018-04-15 18:06:00.652850] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/log/brick has disconnected from glusterd.
              [2018-04-15 18:06:00.656017] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
    Line 658: [2018-04-15 18:06:00.763176] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/mstate/brick has disconnected from glusterd.
    Line 660: [2018-04-15 18:06:00.763845] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/services/brick has disconnected from glusterd.
    Line 661: [2018-04-15 18:06:00.764239] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/log/brick
    Line 664: [2018-04-15 18:06:00.866569] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/mstate/brick
              [2018-04-15 18:06:00.763558] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
    Line 670: [2018-04-15 18:06:00.980737] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/log/brick has disconnected from glusterd.
              [2018-04-15 18:06:00.980434] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
    Line 677: [2018-04-15 18:06:01.088354] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/mstate/brick has disconnected from glusterd.

Expected results:
There should be no "EPOLLERR - disconnecting now" messages, and bricks should not be disconnected from glusterd.

Additional info:

Comment 1 Atin Mukherjee 2018-07-02 02:53:45 UTC
Wouldn't https://review.gluster.org/#/c/20197/ fix this problem?

Comment 2 George 2018-07-10 01:14:41 UTC
The root cause of this issue should be different; it is not fixed by the patch above.

There should be no "EPOLLERR - disconnecting now" message when gluster starts. It is a risk: it can lead to the brick's glusterfsd process being started twice, and finally to glustershd not working correctly.
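
A hedged way to confirm the double start from the process table (this assumes glusterfsd carries the brick path on its command line, as it does in default deployments):

    # Each brick path should map to exactly one glusterfsd process;
    # a count of 2 for any path confirms the brick was started twice.
    ps -C glusterfsd -o args= | grep -oE '/mnt/bricks/[^ ]+/brick' | sort | uniq -c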

Comment 3 Atin Mukherjee 2018-10-05 02:33:39 UTC
Mohit - I think you had a root cause for this problem, which we saw in-house while analyzing an issue on a setup with brick multiplexing configured. Could you update this bug with the root cause once you get some time?

Comment 4 Shyamsundar 2018-10-23 14:54:17 UTC
Release 3.12 has been EOLed and this bug was still in the NEW state, hence moving the version to mainline to triage it and take appropriate action.

Comment 5 Amar Tumballi 2019-06-17 11:04:35 UTC
Any update?

Comment 7 Vishal Pandey 2019-07-22 07:51:33 UTC
Testing on the latest upstream version, I no longer see "0-transport: EPOLLERR - disconnecting now" logs.

Thanks,
Vishal Pandey

Comment 8 Vishal Pandey 2019-07-22 07:53:28 UTC
George, can you try to reproduce this on the latest upstream version?

Thanks,
Vishal Pandey

Comment 9 Vishal Pandey 2019-08-21 09:34:50 UTC
Can we make a decision on this issue?

Comment 10 Vishal Pandey 2019-08-27 07:51:29 UTC
George, can you try to reproduce this on the latest upstream version?

Comment 11 Vishal Pandey 2019-09-10 13:20:39 UTC
@George Can you address the needinfo? Otherwise I will have to close the bug, considering that it's no longer reproducible.

Comment 12 Vishal Pandey 2019-09-18 08:06:13 UTC
@George Can you address the needinfo? Otherwise I will have to close the bug, considering that it's no longer reproducible.

Comment 13 Vishal Pandey 2019-09-20 09:42:26 UTC
As it's no longer reproducible, I'm closing the bug. Please feel free to reopen it if the issue persists.

Thanks,
Vishal