Description of problem:
=======================
With a data set containing files and directories, I triggered a rebalance on a 6-node setup. Rebalance completed on 3 of the 6 nodes in 7 minutes. On the other nodes it has been running for more than 5 hours and is still running; it looks like the rebalance process is not going to end. The rebalance logs show no fix-layout changes or file migration messages, and in the rebalance status the values for rebalanced files, size, scanned, failures, and skipped have not changed in the past 5 hours.

I have left the system in the same state so that it can be used for live debugging.

Version-Release number of selected component (if applicable):
3.12.2-7.el7rhgs.x86_64

How reproducible:
Reporting at first occurrence

Steps to Reproduce:
===================
Not sure of the exact reproducer, but the steps below led to this situation.
1) Create a x3 volume and start it.
2) FUSE mount the volume on multiple clients.
3) From the mount point, create files and directories.
4) From one client, start renaming the data set created in step 3, and from the other clients start lookups.
5) Add bricks and start rebalance with the force option.
6) Wait until the rebalance completes.

Actual results:
===============
Rebalance does not seem to complete on a few of the nodes.

Expected results:
=================
Rebalance should complete without any issues.
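
For anyone trying to reproduce this, the gluster CLI side of the steps above (steps 1, 5 and 6) could be sketched roughly as follows. This is only a dry-run helper that prints the commands and executes nothing. The volume name, host names, and brick paths are hypothetical placeholders, not the ones from the affected setup, and "x3" is assumed to mean replica 3. Steps 2 to 4 (FUSE mounts, file creation, renames and lookups from the clients) are not covered.

```python
import shlex


def build_repro_commands(volume, hosts, brick_dir, new_brick_dir):
    """Return the gluster CLI calls for steps 1, 5 and 6 as argv lists.

    All arguments are illustrative placeholders. One brick per host is
    assumed, so the host count must be a multiple of the replica count (3).
    """
    if not hosts or len(hosts) % 3 != 0:
        raise ValueError("replica 3 needs the brick count to be a multiple of 3")

    initial_bricks = [f"{host}:{brick_dir}" for host in hosts]
    added_bricks = [f"{host}:{new_brick_dir}" for host in hosts]

    return [
        # Step 1: create a replica 3 volume and start it.
        ["gluster", "volume", "create", volume, "replica", "3", *initial_bricks],
        ["gluster", "volume", "start", volume],
        # Step 5: add bricks, then start rebalance with the force option.
        ["gluster", "volume", "add-brick", volume, *added_bricks],
        ["gluster", "volume", "rebalance", volume, "start", "force"],
        # Step 6: poll this until every node reports completion.
        ["gluster", "volume", "rebalance", volume, "status"],
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute the real nodes and brick paths.
    commands = build_repro_commands(
        "testvol",
        ["node1", "node2", "node3"],
        "/bricks/brick1/testvol",
        "/bricks/brick2/testvol",
    )
    for argv in commands:
        print(shlex.join(argv))
```

The commands are returned as argv lists rather than shell strings so they can be passed to `subprocess.run` without extra quoting once the placeholders are replaced. The rename and lookup load from step 4 has to be running on the clients while the add-brick and rebalance commands are issued.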
Verified this BZ on glusterfs version 3.12.2-13.el7rhgs.x86_64. Ran the tests mentioned in the description and in Comment 9 and did not see the rebalance hang. Moving this BZ to Verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607