https://code.engineering.redhat.com/gerrit/143268 glusterd: show brick online after port registration
https://code.engineering.redhat.com/gerrit/143269 glusterd: show brick online after port registration even in brick-mux
After node reboot, the "gluster-blockd" and "tcmu-runner" services are not coming up.

Setup details:

-> Created a block setup with a 3-node gluster cluster.
-> Created 300 volumes.
-> For the first 50 volumes, created 11 block devices per volume; for another 50 volumes, created one block device each. That means around 600 block devices were created in total.
-> Rebooted one node of the 3-node cluster. Once the node came back up, the "gluster-blockd" and "tcmu-runner" services did not come up.

Please let me know your inputs to resolve this bug.
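For reference, a minimal sketch of how such a setup could be scripted. Host names, brick paths, block sizes, and loop bounds are placeholders rather than details from this report, and the gluster-block invocation assumes the usual "create <vol>/<block> ha <count> <servers> <size>" form:

# Create and start 300 replica-3 volumes (placeholder hosts and brick paths).
for i in $(seq 1 300); do
    gluster volume create blockvol$i replica 3 \
        node1:/bricks/brick$i node2:/bricks/brick$i node3:/bricks/brick$i force
    gluster volume start blockvol$i
done

# 11 block devices on each of the first 50 volumes, one on each of the next 50.
for i in $(seq 1 50); do
    for j in $(seq 1 11); do
        gluster-block create blockvol$i/block$j ha 3 node1,node2,node3 1GiB
    done
done
for i in $(seq 51 100); do
    gluster-block create blockvol$i/block1 ha 3 node1,node2,node3 1GiB
done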
(In reply to Rajesh Madaka from comment #7)
> After node reboot, the "gluster-blockd" and "tcmu-runner" services are not
> coming up.
>
> Setup details:
>
> -> Created a block setup with a 3-node gluster cluster.
> -> Created 300 volumes.
> -> For the first 50 volumes, created 11 block devices per volume; for
>    another 50 volumes, created one block device each. That means around 600
>    block devices were created in total.
> -> Rebooted one node of the 3-node cluster. Once the node came back up, the
>    "gluster-blockd" and "tcmu-runner" services did not come up.
>
> Please let me know your inputs to resolve this bug.

From the gluster-blockd perspective, this bug can only be verified once https://bugzilla.redhat.com/show_bug.cgi?id=1598322 is also moved to the ON_QA state; it is very difficult to verify it on the filesystem alone. My advice would be to wait for the other bug to be fixed as well.
Build version: glusterfs-server-3.8.4-54.15

I have verified this bug by blocking the brick process in gdb. Below are the steps followed:

-> Enabled brick multiplexing on a 3-node cluster.
-> Created and started some 300 volumes.
-> Created one more volume (the 301st) but did not start it.
-> On one node, blocked the already-running brick process in gdb with the following commands:
   -> gdb -p PID (pid of the already-started volume's brick process)
   -> b glusterfs_handle_attach
   -> c (to continue)
-> Started the 301st volume; the volume started successfully.
-> Checked the volume status: the brick of that volume is reported offline on the node whose brick process is blocked in gdb.

A transcript-style sketch of this procedure is shown below.
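The following is a minimal sketch of the gdb-blocking procedure above. The PID and volume name are placeholders, and the expected output is paraphrased rather than copied from the verification run:

# On the node under test, attach gdb to the multiplexed brick process and hold
# it inside the attach handler so the newly started brick cannot register its
# port with glusterd.
gdb -p 12345                      # placeholder PID of the running brick process
(gdb) b glusterfs_handle_attach
(gdb) c                           # continue; gdb stops when the attach request arrives

# From any node, start the 301st volume and check its status.
gluster volume start vol301
gluster volume status vol301
# Expected with the fix: the Online column for the brick on the gdb-blocked
# node shows "N" until the breakpoint is released and the port is registered.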
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2222