Description of problem:
The dht remove-brick operation is expected to treat skipped files as failures as they are left behind on the removed bricks. If a file could not be migrated because there was no subvolume that could accommodate it, the error is ignored because of an incorrect loop counter.

This is a regression from previous releases.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 2x1 distribute volume with 500 MB bricks and create enough files so that a single brick cannot accommodate all of them
2. Remove the 2nd brick
3. Check the logs and the remove-brick status.

Actual results:
The remove-brick status shows no failures. However the rebalance logs show messages:

[2017-07-24 09:56:20.191412] W [MSGID: 109033] [dht-rebalance.c:1021:__dht_check_free_space] 0-vol1-dht: Could not find any subvol with space accomodating the file - <filename>. Consider adding bricks

Expected results:
The remove-brick status should display non-zero failures as some files cannot be moved.

Additional info:
The counter used to iterate over the decommissioned bricks array is incorrect in __dht_check_free_space ().

        if (conf->decommission_subvols_cnt) {
                *ignore_failure = _gf_true;
                for (i = 0; i < conf->decommission_subvols_cnt; i++) {
                        if (conf->decommissioned_bricks[i] == from) {
                                *ignore_failure = _gf_false;
                                break;
                        }
                }

should be

        if (conf->decommission_subvols_cnt) {
                *ignore_failure = _gf_true;
                for (i = 0; i < conf->subvolume_cnt; i++) {
                        if (conf->decommissioned_bricks[i] == from) {
                                *ignore_failure = _gf_false;
                                break;
                        }
                }
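
To illustrate why the loop bound matters, here is a minimal standalone sketch. It is not the glusterfs code: struct demo_conf and both helper functions are hypothetical stand-ins, and it assumes that decommissioned_bricks has one slot per subvolume (subvolume_cnt entries) with a non-NULL entry only at the index of a brick being removed. Under that assumption, removing the 2nd brick of a 2x1 volume puts the entry in slot 1, which a loop bounded by decommission_subvols_cnt (1) never reaches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few dht_conf_t fields used in the report. */
struct demo_conf {
    int          subvolume_cnt;            /* total number of subvolumes */
    int          decommission_subvols_cnt; /* number of bricks being removed */
    const void **decommissioned_bricks;    /* ASSUMPTION: subvolume_cnt slots,
                                              NULL unless that brick is removed */
};

/* Scan the first `limit` slots for `from`; true means "from is being removed". */
static bool is_decommissioned(const struct demo_conf *conf,
                              const void *from, int limit)
{
    assert(limit <= conf->subvolume_cnt); /* never read past the array */
    for (int i = 0; i < limit; i++) {
        if (conf->decommissioned_bricks[i] == from)
            return true;
    }
    return false;
}

/* Build the scenario from the steps to reproduce (2 subvolumes, 2nd one
 * removed) and report whether the source brick is found with each bound. */
static bool scan_second_brick(bool use_subvolume_cnt)
{
    int brick0 = 0, brick1 = 1;            /* dummy objects standing in for subvolumes */
    const void *slots[2] = { NULL, &brick1 };
    struct demo_conf conf = {
        .subvolume_cnt            = 2,
        .decommission_subvols_cnt = 1,
        .decommissioned_bricks    = slots,
    };
    (void)brick0;                          /* brick0 stays in the volume */

    int limit = use_subvolume_cnt ? conf.subvolume_cnt
                                  : conf.decommission_subvols_cnt;
    return is_decommissioned(&conf, &brick1, limit);
}
```

In this sketch, scan_second_brick(false) models the old bound and does not find the removed brick, which corresponds to *ignore_failure staying _gf_true and the failure being dropped. scan_second_brick(true) models the corrected bound and finds it.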
Verified this bug on glusterfs-3.8.4-36. Following the same steps as in the description, the remove-brick status now displays a failure count for the files that are not migrated because of no space. Moving this BZ to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774