+++ This bug was initially created as a clone of Bug #1760261 +++

Description of problem:
On a three-node cluster with server quorum enabled on a replicated volume, performed add-brick, stopped glusterd on one node, and then started rebalance on the volume:

gluster vol rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 86cfc8b1-1e24-4244-b8e0-6941f4684234

Rebalance start succeeds even though server quorum is not met.

Version-Release number of selected component (if applicable):
glusterfs-server-6.0-15.el7rhgs.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. On a three-node cluster, create a 1x3 replicate volume.
2. Set "cluster.server-quorum-type" to server and set the quorum ratio to 90.
3. Perform add-brick (3 bricks).
4. Stop glusterd on one node.
5. Run rebalance start.

Actual results:
gluster vol rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 86cfc8b1-1e24-4244-b8e0-6941f4684234

Rebalance start is successful when quorum is not met.

Expected results:
Rebalance start should fail when quorum is not met.

Additional info:

#### gluster vol info

[root@dhcp35-11 ~]# gluster vol info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: c9822762-7dac-47bd-8645-9cfee3d02b00
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.11:/bricks/brick4/testvol
Brick2: 10.70.35.7:/bricks/brick4/testvol
Brick3: 10.70.35.73:/bricks/brick4/testvol
Brick4: 10.70.35.73:/bricks/brick4/ht
Brick5: 10.70.35.11:/bricks/brick4/ht
Brick6: 10.70.35.7:/bricks/brick4/ht
Options Reconfigured:
cluster.server-quorum-type: server
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 90

#### gluster vol status

[root@dhcp35-11 ~]# gluster vol status
Status of volume: testvol
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.11:/bricks/brick4/testvol              49152     0          Y       11039
Brick 10.70.35.7:/bricks/brick4/testvol               49152     0          Y       27266
Brick 10.70.35.73:/bricks/brick4/testvol              49152     0          Y       10746
Brick 10.70.35.73:/bricks/brick4/ht                   49153     0          Y       11028
Brick 10.70.35.11:/bricks/brick4/ht                   49153     0          Y       11338
Brick 10.70.35.7:/bricks/brick4/ht                    49153     0          Y       27551
Self-heal Daemon on localhost                         N/A       N/A        Y       11363
Self-heal Daemon on 10.70.35.73                       N/A       N/A        Y       11053
Self-heal Daemon on dhcp35-7.lab.eng.blr.redhat.com   N/A       N/A        Y       27577

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

#### Volume status after stopping glusterd on one node

[root@dhcp35-11 ~]# gluster vol status
Status of volume: testvol
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.11:/bricks/brick4/testvol              N/A       N/A        N       N/A
Brick 10.70.35.7:/bricks/brick4/testvol               N/A       N/A        N       N/A
Brick 10.70.35.11:/bricks/brick4/ht                   N/A       N/A        N       N/A
Brick 10.70.35.7:/bricks/brick4/ht                    N/A       N/A        N       N/A
Self-heal Daemon on localhost                         N/A       N/A        Y       11363
Self-heal Daemon on dhcp35-7.lab.eng.blr.redhat.com   N/A       N/A        Y       27577

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-11 ~]# gluster vol rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 86cfc8b1-1e24-4244-b8e0-6941f4684234

[root@dhcp35-11 ~]# gluster vol rebalance testvol status
                               Node  Rebalanced-files       size  scanned  failures  skipped  status   run time in h:m:s
                          ---------  ----------------  ---------  -------  --------  -------  ------   -----------------
    dhcp35-7.lab.eng.blr.redhat.com                 0     0Bytes        0         0        0  failed   0:00:00
                          localhost                 0     0Bytes        0         0        0  failed   0:00:00
volume rebalance: testvol: success

#### glusterd log after stopping glusterd on one of the nodes

[2019-10-10 09:19:00.361314] I [MSGID: 106004] [glusterd-handler.c:6521:__glusterd_peer_rpc_notify] 0-management: Peer <10.70.35.73> (<53117ee2-5182-42c6-8c74-26f43b075a0c>), in state <Peer in Cluster>, has disconnected from glusterd.
[2019-10-10 09:19:00.361553] W [glusterd-locks.c:807:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24f6a) [0x7fe6a4b4df6a] -->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x2f790) [0x7fe6a4b58790] -->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xf3883) [0x7fe6a4c1c883] ) 0-management: Lock for vol testvol not held
[2019-10-10 09:19:00.361570] W [MSGID: 106117] [glusterd-handler.c:6542:__glusterd_peer_rpc_notify] 0-management: Lock not released for testvol
[2019-10-10 09:19:00.361607] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume testvol. Stopping local bricks.
[2019-10-10 09:19:00.361825] I [MSGID: 106542] [glusterd-utils.c:8775:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 11039
[2019-10-10 09:19:01.362068] I [socket.c:871:__socket_shutdown] 0-management: intentional socket shutdown(16)
[2019-10-10 09:19:01.362680] I [MSGID: 106542] [glusterd-utils.c:8775:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 11338
[2019-10-10 09:19:02.362982] I [socket.c:871:__socket_shutdown] 0-management: intentional socket shutdown(20)
[2019-10-10 09:19:02.363239] I [MSGID: 106143] [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick /bricks/brick4/testvol on port 49152
[2019-10-10 09:19:02.368590] I [MSGID: 106143] [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick /bricks/brick4/ht on port 49153
[2019-10-10 09:19:02.375567] I [MSGID: 106499] [glusterd-handler.c:4502:__glusterd_handle_status_volume] 0-management: Received status volume req for volume testvol
[2019-10-10 09:19:25.717254] I [MSGID: 106539] [glusterd-utils.c:12461:glusterd_generate_and_set_task_id] 0-management: Generated task-id 86cfc8b1-1e24-4244-b8e0-6941f4684234 for key rebalance-id
[2019-10-10 09:19:30.751060] I [rpc-clnt.c:1014:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-10-10 09:19:30.751284] E [MSGID: 106061] [glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index from rsp dict
[2019-10-10 09:19:35.761694] E [MSGID: 106061] [glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index from rsp dict
[2019-10-10 09:19:35.767505] I [MSGID: 106172] [glusterd-handshake.c:1085:__server_event_notify] 0-glusterd: received defrag status updated
[2019-10-10 09:19:35.773243] I [MSGID: 106007] [glusterd-rebalance.c:153:__glusterd_defrag_notify] 0-management: Rebalance process for volume testvol has disconnected.
[2019-10-10 09:19:39.436119] E [MSGID: 106061] [glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index from rsp dict
[2019-10-10 09:19:39.436978] E [MSGID: 106061] [glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index from rsp dict
[2019-10-10 09:31:36.682991] I [MSGID: 106488] [glusterd-handler.c:1564:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2019-10-10 09:31:36.684006] I [MSGID: 106488] [glusterd-handler.c:1564:__glusterd_handle_cli_get_volume] 0-management: Received get vol req

--- Additional comment from RHEL Product and Program Management on 2019-10-10 15:06:22 IST ---

This bug is automatically being proposed for the next minor release of Red Hat Gluster Storage by setting the release flag 'rhgs-3.5.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Bala Konda Reddy M on 2019-10-10 15:16:01 IST ---

The setup is in the same state for further debugging.
IP: 10.70.35.11
Credentials: root/1

Regards,
Bala
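For reference, the arithmetic behind the "Server quorum lost" message in the log above can be sketched as follows. This is a simplified model, not glusterd's actual C implementation; the function name is illustrative:

```python
def server_quorum_met(active_peers, total_peers, quorum_ratio=51.0):
    """Model of the server-quorum check: the percentage of peers whose
    glusterd is reachable must be at least cluster.server-quorum-ratio."""
    return (100.0 * active_peers / total_peers) >= quorum_ratio

# With cluster.server-quorum-ratio set to 90 on a three-node cluster,
# stopping glusterd on one node leaves 2/3 (~66.7%) of peers active:
print(server_quorum_met(2, 3, quorum_ratio=90))  # False: quorum lost
print(server_quorum_met(3, 3, quorum_ratio=90))  # True: all peers up
```

This matches the observed behavior: once quorum was lost, glusterd correctly stopped the local bricks (signal 15 in the log), but the rebalance-start code path did not perform the same check, which is the bug being reported.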
REVIEW: https://review.gluster.org/23536 (glusterd: rebalance start should fail when quorum is not met) posted (#1) for review on master by Sanju Rakonde
REVIEW: https://review.gluster.org/23536 (glusterd: rebalance start should fail when quorum is not met) merged (#1) on master by Sanju Rakonde
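The merged change makes the rebalance-start path enforce the same server-quorum validation that other volume operations already perform. A minimal sketch of that gate, as a Python model rather than the actual C change (names and the exact error text are assumptions for illustration):

```python
class QuorumNotMet(Exception):
    """Raised when a volume operation is attempted without server quorum."""

def start_rebalance(volume, active_peers, total_peers, quorum_type, quorum_ratio):
    # After the fix, quorum is validated up front, before any
    # rebalance task-id is generated or the defrag daemon is spawned.
    if quorum_type == "server":
        if 100.0 * active_peers / total_peers < quorum_ratio:
            raise QuorumNotMet(
                "volume rebalance: %s: failed: quorum is not met" % volume)
    return "rebalance on %s started" % volume

# 2 of 3 peers up with a 90% ratio: the command is now rejected.
try:
    start_rebalance("testvol", 2, 3, "server", 90)
except QuorumNotMet as e:
    print(e)
```

With this gate in place, the scenario in the Steps to Reproduce fails at step 5 instead of generating a task-id and then having the rebalance processes fail on each node.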