Description of problem:
During glusterd initialization, brick startup always logs the message "EPOLLERR - disconnecting now". This error occasionally causes glusterfsd to be started twice, after which glustershd cannot work normally.

Version-Release number of selected component (if applicable):

How reproducible:
Set up a glusterfs with 4 volumes named "log", "export", "service" and "ccs", replicated across 2 SN. Start the 4 volumes, then stop glusterd and all related processes/services.

Steps to Reproduce:
1. Start the glusterd service, or run the command "glusterd -p /somewhere/glusterd.pid".
2. Check the log in "glusterd.log".

Actual results:
Line 625: [2018-04-15 18:05:59.964454] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/ccs/brick
Line 629: [2018-04-15 18:06:00.117043] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/export/brick
Line 632: [2018-04-15 18:06:00.277399] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/log/brick
Line 635: [2018-04-15 18:06:00.379539] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/mstate/brick
Line 639: [2018-04-15 18:06:00.481778] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/services/brick
[2018-04-15 18:06:00.651825] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
Line 647: [2018-04-15 18:06:00.652182] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/export/brick has disconnected from glusterd.
[2018-04-15 18:06:00.652545] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
Line 649: [2018-04-15 18:06:00.652850] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/log/brick has disconnected from glusterd.
[2018-04-15 18:06:00.656017] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
Line 658: [2018-04-15 18:06:00.763176] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/mstate/brick has disconnected from glusterd.
Line 660: [2018-04-15 18:06:00.763845] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/services/brick has disconnected from glusterd.
Line 661: [2018-04-15 18:06:00.764239] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/log/brick
Line 664: [2018-04-15 18:06:00.866569] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/mstate/brick
[2018-04-15 18:06:00.763558] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
Line 670: [2018-04-15 18:06:00.980737] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/log/brick has disconnected from glusterd.
[2018-04-15 18:06:00.980434] I [socket.c:2478:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
Line 677: [2018-04-15 18:06:01.088354] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/mstate/brick has disconnected from glusterd.

Expected results:
There should be no "EPOLLERR - disconnecting now" message, and bricks should always stay connected to glusterd.

Additional info:
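To make the repeated brick starts easier to spot in glusterd.log, here is a minimal Python sketch that counts "starting a fresh brick process" and "has disconnected from glusterd" messages per brick path. The function names are hypothetical, and the regular expressions are derived only from the log lines quoted in this report, so other GlusterFS versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns taken from the glusterd.log lines quoted in this report (assumption:
# the message wording is the same in the log being checked).
START_RE = re.compile(r"starting a fresh brick process for brick (\S+)")
DISCONNECT_RE = re.compile(r"Brick \S+?:(/\S+) has disconnected from glusterd")

# Four lines copied from the report, without the "Line N:" prefixes.
SAMPLE = "\n".join([
    "[2018-04-15 18:06:00.277399] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/log/brick",
    "[2018-04-15 18:06:00.379539] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/mstate/brick",
    "[2018-04-15 18:06:00.652850] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick sn-0.local:/mnt/bricks/log/brick has disconnected from glusterd.",
    "[2018-04-15 18:06:00.764239] I [glusterd-utils.c:5928:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/log/brick",
])


def summarize(log_text):
    """Return (starts, disconnects) counters keyed by brick path."""
    starts, disconnects = Counter(), Counter()
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            starts[match.group(1)] += 1
        match = DISCONNECT_RE.search(line)
        if match:
            disconnects[match.group(1)] += 1
    return starts, disconnects


def restarted_bricks(log_text):
    """Return the sorted brick paths that logged more than one fresh start."""
    starts, _ = summarize(log_text)
    return sorted(path for path, count in starts.items() if count > 1)


if __name__ == "__main__":
    print(restarted_bricks(SAMPLE))
```

Running the same functions over the full glusterd.log text would list every brick that glusterd tried to start more than once during initialization.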
Wouldn't https://review.gluster.org/#/c/20197/ fix this problem?
The root cause should be different for this issue; the patch above does not fix it. There should be no "EPOLLERR - disconnecting now" message when gluster starts. It is a risk: it can lead to the glusterfsd for a brick being started twice, and finally to glustershd not working correctly.
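One way to check for the "started twice" symptom on a node is to look for two glusterfsd processes serving the same brick. The sketch below is only an illustration: it assumes the input lines come from "ps -eo pid,args" and that each glusterfsd command line carries a "--brick-name <path>" argument, which is not shown anywhere in this report. The function name and the sample command lines are made up, and the check does not apply when brick multiplexing puts several bricks into one process.

```python
import re
from collections import defaultdict

# Assumption: the brick path follows a "--brick-name" argument on the
# glusterfsd command line. Verify this on the affected system first.
BRICK_ARG_RE = re.compile(r"--brick-name\s+(\S+)")


def duplicate_bricks(ps_lines):
    """Map brick path -> list of PIDs when more than one glusterfsd serves it.

    ps_lines: lines in "PID ARGS" form, e.g. from `ps -eo pid,args`.
    """
    pids_by_brick = defaultdict(list)
    for line in ps_lines:
        fields = line.split(None, 1)  # [pid, full command line]
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skips the header line and blank lines
        if "glusterfsd" not in fields[1]:
            continue  # not a brick process
        match = BRICK_ARG_RE.search(fields[1])
        if match:
            pids_by_brick[match.group(1)].append(int(fields[0]))
    return {brick: pids for brick, pids in pids_by_brick.items() if len(pids) > 1}


if __name__ == "__main__":
    # Made-up sample input, for illustration only.
    sample = [
        "  PID COMMAND",
        "  101 glusterfsd --brick-name /mnt/bricks/log/brick",
        "  202 glusterfsd --brick-name /mnt/bricks/log/brick",
        "  303 glusterfsd --brick-name /mnt/bricks/ccs/brick",
    ]
    print(duplicate_bricks(sample))
```

An empty result means no brick path appears on more than one glusterfsd command line in the given input.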
Mohit - I think you had a root cause for this problem, which we saw in-house while analyzing an issue on a setup with brick multiplexing configured. Could you update this bug with the root cause once you get some time?
Release 3.12 has been EOLd and this bug was still found to be in the NEW state, hence moving the version to mainline, to triage the same and take appropriate actions.
Any update?
On testing with the latest upstream version, I no longer see "0-transport: EPOLLERR - disconnecting now" logs. Thanks, Vishal Pandey
George, can you try to reproduce this on the latest upstream version? Thanks, Vishal Pandey
Can we make a decision on this issue?
George, can you try to reproduce this on the latest upstream version?
@George Can you address the needinfo? Otherwise I will have to close the bug, considering that it is no longer reproducible.
As it is no longer reproducible, I'm closing the bug. Please feel free to reopen the bug if the issue persists. Thanks, Vishal