Provide the version-release number of the selected component (if applicable):
=====================================================================
glusterfs-server-6.0-45.el7rhgs.x86_64 (3.5.3) and the HOTFIX build
glusterfs-server-3.12.2-40.el7rhgs.2.HOTFIX.sfdc02762348.BZ1884244.x86_64

Have you searched the Bugzilla archives for same/similar issues reported?
=========================================================================
yes

Describe the issue (please be as detailed as possible and provide log snippets)
[Provide the timestamp when the issue is seen]
==============================================================================
Steps:
1. Create an arbiter volume.
2. Add bricks and rebalance.
3. Wait for the rebalance to complete (check that it shows as completed in the volume status).
4. Replace a brick.
5. The volume status now shows the rebalance as "not started".

Logs
====
[root@dhcp35-100 ~]# gluster v create arbiter-vol replica 3 arbiter 1 dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test
volume create: arbiter-vol: success: please start the volume to access data

[root@dhcp35-100 ~]# gluster v start arbiter-vol
volume start: arbiter-vol: success

[root@dhcp35-100 ~]# gluster v status arbiter-vol
Status of volume: arbiter-vol
Gluster process                                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test   49175     0          Y       7908
Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test    49175     0          Y       29058
Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test   49175     0          Y       18812
Self-heal Daemon on localhost                                  N/A       N/A        Y       3306
Self-heal Daemon on 10.70.35.31                                N/A       N/A        Y       29201
Self-heal Daemon on 10.70.35.104                               N/A       N/A        Y       18969
Self-heal Daemon on dhcp35-202.lab.eng.blr.redhat.com          N/A       N/A        Y       7979

Task Status of Volume arbiter-vol
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-100 ~]# gluster v add-brick arbiter-vol dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-add dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test-add dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test-add
volume add-brick: success

[root@dhcp35-100 ~]# gluster v rebalance arbiter-vol start
volume rebalance: arbiter-vol: success: Rebalance on arbiter-vol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: ad76c0ac-1914-4df4-99d0-5bcb273beb65

[root@dhcp35-100 ~]# gluster v rebalance arbiter-vol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
       dhcp35-202.lab.eng.blr.redhat.com                0        0Bytes             0             0             0            completed        0:00:01
                             10.70.35.31                0        0Bytes             0             0             0            completed        0:00:01
                            10.70.35.104                0        0Bytes             0             0             0            completed        0:00:01
volume rebalance: arbiter-vol: success

[root@dhcp35-100 ~]# gluster v status arbiter-vol
Status of volume: arbiter-vol
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test       49175     0          Y       7908
Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test        49175     0          Y       29058
Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test       49175     0          Y       18812
Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-add   49176     0          Y       10504
Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test-add    49176     0          Y       31636
Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test-add   49176     0          Y       21198
Self-heal Daemon on localhost                                      N/A       N/A        Y       5664
Self-heal Daemon on 10.70.35.104                                   N/A       N/A        Y       21227
Self-heal Daemon on 10.70.35.31                                    N/A       N/A        Y       31657
Self-heal Daemon on dhcp35-202.lab.eng.blr.redhat.com              N/A       N/A        Y       10540

Task Status of Volume arbiter-vol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : ad76c0ac-1914-4df4-99d0-5bcb273beb65
Status               : completed

[root@dhcp35-100 ~]# gluster v replace-brick arbiter-vol dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-replace commit force
volume replace-brick: success: replace-brick commit force operation successful

[root@dhcp35-100 ~]# gluster v status arbiter-vol
Status of volume: arbiter-vol
Gluster process                                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-replace   49175     0          Y       12915
Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test            49175     0          Y       29058
Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test           49175     0          Y       18812
Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-add       49176     0          Y       10504
Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test-add        49176     0          Y       31636
Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test-add       49176     0          Y       21198
Self-heal Daemon on localhost                                          N/A       N/A        Y       7738
Self-heal Daemon on 10.70.35.31                                        N/A       N/A        Y       1301
Self-heal Daemon on 10.70.35.104                                       N/A       N/A        Y       23385
Self-heal Daemon on dhcp35-202.lab.eng.blr.redhat.com                  N/A       N/A        Y       12926

Task Status of Volume arbiter-vol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : ad76c0ac-1914-4df4-99d0-5bcb273beb65
Status               : not started
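For quick checks while reproducing, the rebalance task fields can be pulled straight out of the volume status output (same volume name as in the logs above):

    # Show only the rebalance task fields of the volume status
    gluster volume status arbiter-vol | grep -A 2 'Task *: Rebalance'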
Is this issue reproducible? If yes, share more details:
-------------------------------------------------------------
Yes.

Actual results:
===============
After the replace-brick operation, the volume status shows the rebalance task as "not started".

Expected results:
=================
The volume status should not report the completed rebalance task as "not started" after a replace-brick.

[root@dhcp35-100 ~]# gluster v heal arbiter-vol info
Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-replace
Status: Connected
Number of entries: 0

Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test
Status: Connected
Number of entries: 0

Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test
Status: Connected
Number of entries: 0

Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-add
Status: Connected
Number of entries: 0

Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test-add
Status: Connected
Number of entries: 0

Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test-add
Status: Connected
Number of entries: 0

The issue is easily reproducible, with or without I/O.
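Condensed, the reproduction amounts to the short sequence below (a sketch reusing the hostnames, brick paths, and volume name from the logs above; adjust them for the local environment):

    # Create and start an arbiter volume
    gluster volume create arbiter-vol replica 3 arbiter 1 \
        dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test \
        dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test \
        dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test
    gluster volume start arbiter-vol

    # Expand the volume and rebalance; wait until the status shows 'completed'
    gluster volume add-brick arbiter-vol \
        dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-add \
        dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test-add \
        dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test-add
    gluster volume rebalance arbiter-vol start
    gluster volume rebalance arbiter-vol status

    # Replace one brick, then re-check: on the affected builds the rebalance
    # task in 'gluster volume status' flips from 'completed' to 'not started'
    gluster volume replace-brick arbiter-vol \
        dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test \
        dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-replace commit force
    gluster volume status arbiter-vol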
Verified with glusterfs-6.0-59.el8rhgs using the following steps:

1. Created a 3-node trusted storage pool (gluster cluster) on the RHEL 8.4 platform (layered installation).
2. Created a 1x3 replicate volume and started the volume.
3. Fuse-mounted the volume and created a few files (around 100 files, each of size 15MB).
4. Expanded the volume to 2x3.
5. Triggered rebalance; it completed successfully.
6. Verified the rebalance status in the 'gluster volume status' output.
7. Performed the 'replace brick' step to replace one of the faulty bricks with another brick. After this 'replace brick' step, the 'gluster volume status' output changed the rebalance status to 'reset due to replace-brick'.

<snip>
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.129:/gluster/brick1/newbrick  49154     0          Y       2380
Brick 10.70.35.165:/gluster/brick1/b1        49152     0          Y       1778
Brick 10.70.35.206:/gluster/brick1/b1        49152     0          Y       1805
Brick 10.70.35.129:/gluster/brick2/b2        49153     0          Y       2211
Brick 10.70.35.165:/gluster/brick2/b2        49153     0          Y       1879
Brick 10.70.35.206:/gluster/brick2/b2        49153     0          Y       1905
Self-heal Daemon on localhost                N/A       N/A        Y       2387
Self-heal Daemon on 10.70.35.165             N/A       N/A        Y       1975
Self-heal Daemon on 10.70.35.206             N/A       N/A        Y       1999

Task Status of Volume repvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : None
Status               : reset due to replace-brick
</snip>

8. To cover the other scenario ('reset brick'), expanded the same volume to 3x3 by adding 3 more bricks.
9. Triggered rebalance, waited for it to complete, and confirmed that 'gluster volume status' reports the rebalance status as 'completed'.
10. Reset one of the bricks in this 3x3 volume:
    # gluster volume reset-brick <vol> <old_brick> start
    # gluster volume reset-brick <vol> <old_brick> <same_brick> commit
    After this, 'gluster volume status' reports the rebalance status as 'reset due to reset-brick'.

<snip>
[root@ ]# gluster volume reset-brick repvol 10.70.35.129:/gluster/brick2/b2 10.70.35.129:/gluster/brick2/b2 commit
volume reset-brick: success: reset-brick commit operation successful

[root@ ]# gluster v status
Status of volume: repvol
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.129:/gluster/brick1/newbrick  49154     0          Y       2380
Brick 10.70.35.165:/gluster/brick1/b1        49152     0          Y       1778
Brick 10.70.35.206:/gluster/brick1/b1        49152     0          Y       1805
Brick 10.70.35.129:/gluster/brick2/b2        49153     0          Y       2667
Brick 10.70.35.165:/gluster/brick2/b2        49153     0          Y       1879
Brick 10.70.35.206:/gluster/brick2/b2        49153     0          Y       1905
Brick 10.70.35.129:/gluster/brick1/b3        49152     0          Y       2480
Brick 10.70.35.165:/gluster/brick2/B3        49154     0          Y       2015
Brick 10.70.35.206:/gluster/brick2/b3        49154     0          Y       2039
Self-heal Daemon on localhost                N/A       N/A        Y       2674
Self-heal Daemon on 10.70.35.206             N/A       N/A        Y       2140
Self-heal Daemon on 10.70.35.165             N/A       N/A        Y       2112

Task Status of Volume repvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : None
Status               : reset due to reset-brick
</snip>

11. Restarting glusterd, or restarting all the nodes, retains this status.
12. Repeated the test with a distributed-arbitrated replicate volume using the same steps as above, and the results were as expected.

Based on the above observations, marking this bug as verified.
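For reference, the two status checks above condense to roughly the following (a sketch: the reset-brick path is taken from the snippet, while the replace-brick bricks are placeholders in the same <old_brick>/<new_brick> style as step 10):

    # After replace-brick, the rebalance task in 'gluster volume status'
    # should read 'reset due to replace-brick'
    gluster volume replace-brick repvol <old_brick> <new_brick> commit force
    gluster volume status repvol

    # After reset-brick (same brick reused), the task should read
    # 'reset due to reset-brick'
    gluster volume reset-brick repvol 10.70.35.129:/gluster/brick2/b2 start
    gluster volume reset-brick repvol 10.70.35.129:/gluster/brick2/b2 \
        10.70.35.129:/gluster/brick2/b2 commit
    gluster volume status repvol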
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHGS 3.5.z Batch Update 5 glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3729