Description of problem:
Brick processes do not start automatically after a reboot of a node. Starting the bricks using "start force" also failed; it only succeeded after running it 2-3 times.

Setup: 4-node cluster running 120 dist-rep [2 x 2] volumes. I have seen this issue with only 3 volumes (vol4, vol31, vol89), even though other volumes have bricks running on the same node.

Version-Release number of selected component (if applicable):
3.7.5-19

How reproducible:
Not always; did not hit this issue until about 100 volumes.

Steps to Reproduce:
1. Reboot a Gluster node
2. Check the status of the gluster volumes

Actual results:
Brick processes are not running

Expected results:
Brick processes should come up after the node reboot

Additional info:
Will attach sosreport and setup details
Neha, could you attach sosreports from both nodes for analysis?
Here is the RCA: when a volume is started, glusterd allocates a port for each brick from its portmap, communicates it to the brick process, and persists the port into the store. That same port is then reused for the brick for the entire life cycle of the volume. Now, if glusterd restarts, it first brings up all the daemons and only then the brick processes. By the time a brick process comes up with its persisted port, there is no guarantee that the port has not already been consumed by another client process, and that is exactly what happened here. When we hit this and ran netstat -nap | grep <port number>, it showed the port was being used by the gluster NFS process. I have a patch [1] with a short-term solution to this problem.

[1] http://review.gluster.org/#/c/13865/2
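For anyone following along, a minimal sketch (Python, not glusterd code) of the kind of check involved here: is the persisted brick port still free by the time the brick starts? The port number and function name below are made-up examples, used only to illustrate the clash.

# Hypothetical illustration of the port-clash check, not actual glusterd code.
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if nothing is currently bound to the given TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:          # EADDRINUSE and friends
            return False

persisted_brick_port = 49153     # example value read back from the brick's store
if not port_is_free(persisted_brick_port):
    print("persisted port %d already in use (e.g. by the gluster NFS process)"
          % persisted_brick_port)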
The solution mentioned in #c4 breaks production with respect to the existing port configuration, since admins tend to be strict about it. We have discussed and iterated over a solution where the client port range is redefined so that it does not clash with the brick ports, and retry logic is added at volume start so that the brick is restarted if the previous attempt fails with "port already in use". The following patches track the changes:
http://review.gluster.org/#/c/13998
http://review.gluster.org/#/c/14043
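To make the retry idea concrete, here is a rough sketch assuming a brick-only port range starting at 49152; the range boundaries and helper names are illustrative, not the actual glusterd values. If binding the persisted port fails with EADDRINUSE, the brick falls back to the next free port in the brick range, which by design no longer overlaps the client port range.

# Illustrative sketch of the retry-on-clash idea; ranges and names are assumptions.
import errno
import socket

BRICK_PORT_BASE = 49152   # assumed start of the brick-only port range
BRICK_PORT_MAX  = 65535

def try_bind(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("0.0.0.0", port))
        return s                  # caller keeps the bound socket
    except OSError as e:
        s.close()
        if e.errno == errno.EADDRINUSE:
            return None           # port clash: retry with another port
        raise

def start_brick_listener(persisted_port):
    """Prefer the persisted port; otherwise walk the brick-only range."""
    sock = try_bind(persisted_port)
    if sock:
        return persisted_port, sock
    for port in range(BRICK_PORT_BASE, BRICK_PORT_MAX + 1):
        sock = try_bind(port)
        if sock:
            return port, sock     # the new port must be re-persisted/advertised
    raise RuntimeError("no free port in the brick range")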
*** Bug 1332561 has been marked as a duplicate of this bug. ***
Verified this bug using the build "glusterfs-3.7.9-4".

The fix solves the port clash, but an issue still exists: when the number of volumes is scaled to 400 and the node is rebooted, some of the volume bricks do not start because of the lack of multi-threaded e-poll support in GlusterD.

Comment from dev team:
----------------------
This looks like a different issue and is because of the lack of multi-threaded e-poll support in GlusterD.

[2016-05-13 08:16:04.924247] E [socket.c:2393:socket_connect_finish] 0-glusterfs: connection to 10.70.36.45:24007 failed (Connection timed out)
[2016-05-13 08:16:05.128728] E [glusterfsd-mgmt.c:1907:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: rhs-client21.lab.eng.blr.redhat.com (Transport endpoint is not connected)
[2016-05-13 08:16:05.340730] I [glusterfsd-mgmt.c:1913:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile

The above log (especially the first message) indicates that the brick process failed to connect to glusterd and the connection timed out. This can happen when there is a lot of back pressure on the other side. Since GlusterD is limited to a single-threaded e-poll, communication with the brick processes happens over a single path; while glusterd was trying to start 400-odd brick processes, there were 400 RPC connections to handle, so only some of the brick processes got to hear from GlusterD and came up, while the others did not.

Given that this bug targets the port clash, I'd take it that the fix works and would like to see the bug marked as Verified. You can file a different bug with the log snippet; I will update that bugzilla with a comment marking it as a known issue and take it out of 3.1.3. The priority of that bug can be really low, since technically we have memory limitations at this scale. Do let me know if that makes sense.

Based on the above details, moving the bug to Verified; I will file a new bug for the remaining issue.
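As a rough back-of-the-envelope illustration of that back pressure (the per-connection handling time below is an assumed figure, not a measurement): with a single-threaded e-poll the RPC connections are effectively handled one after another, so the last bricks in the queue can end up waiting longer than their connect timeout.

# Toy arithmetic only; the 0.5 s per-connection cost and timeout are assumed values.
bricks = 400                 # brick processes reconnecting after the reboot
handling_time_s = 0.5        # assumed time glusterd spends per RPC connection
connect_timeout_s = 120      # assumed order of magnitude for the connect timeout

# With a single-threaded event loop, the Nth brick waits behind the N-1 before it.
worst_case_wait_s = bricks * handling_time_s
print("worst-case wait: %ds, timeout: %ds -> some bricks time out: %s"
      % (worst_case_wait_s, connect_timeout_s, worst_case_wait_s > connect_timeout_s))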
New bug raised as per the dev comment to track the remaining issue: https://bugzilla.redhat.com/show_bug.cgi?id=1336267
IMHO, this doesn't need a doc text.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240