Verified the bug on build: glusterfs-server-3.7.9-1.el7rhgs.x86_64

Steps followed to verify:

Test 1:
1) Created a 4 x 2 distributed-replicate volume (say brick-1 through brick-8)
2) Created a directory and, under it, 10k files
3) Added 4 more bricks
4) Initiated the rebalance process
5) Killed brick-1

The rebalance process halted on the replica pair of brick-1 and brick-2, while rebalance on the other bricks ran on to completion. There was no inconsistency in the rebalance status. This is expected behavior: when readdirp fails on a directory, rebalance of all files under that directory fails. To validate this, performed Test 2.

Test 2:
1) Created a 4 x 2 distributed-replicate volume (say brick-1 through brick-8)
2) Created 100 directories: dir-{1..100}
3) Created 1k files under each directory
4) Added 4 more bricks
5) Initiated the rebalance process
6) Killed brick-1

The rebalance process continued on all replica pairs: when readdirp failed on one directory, it moved on to subsequent directories. This is as expected, and the rebalance status was consistent across all nodes.

Hence, marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240