Bug 1474284 - dht remove-brick status does not indicate failures for files not migrated because of a lack of space
dht remove-brick status does not indicate failures for files not migrated bec...
Status: VERIFIED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute (Show other bugs)
3.3
Unspecified Unspecified
unspecified Severity unspecified
: ---
: RHGS 3.3.0
Assigned To: Nithya Balachandran
Prasad Desala
: Regression
Depends On:
Blocks: 1417151 1474318 1475181
  Show dependency treegraph
 
Reported: 2017-07-24 06:07 EDT by Nithya Balachandran
Modified: 2017-08-16 23:31 EDT (History)
5 users (show)

See Also:
Fixed In Version: glusterfs-3.8.4-36
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1474318 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Nithya Balachandran 2017-07-24 06:07:01 EDT
Description of problem:

The dht remove-brick operation is expected to treat skipped files as failures as they are left behind on the removed bricks.

If a file could not be migrated because there was no subvolume that could accommodate it, the error is ignored because of an incorrect loop counter.

This is a regression from previous releases.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 2x1 distribute volume with 500 MB bricks and create enough files so that a single brick cannot accommodate all of them
2. Remove the 2nd brick
3. Check the logs and the remove-brick status.

Actual results:
The remove-brick status shows no failures. However the rebalance logs show messages :

[2017-07-24 09:56:20.191412] W [MSGID: 109033] [dht-rebalance.c:1021:__dht_check_free_space] 0-vol1-dht: Could not find any subvol with space accomodating the file - <filename>. Consider adding bricks



Expected results:
The remove-brick status should display non-zero failures as some files cannot be moved.


Additional info:

The counter used to iterate over the decommissioned bricks array is incorrect in __dht_check_free_space ().


                if (conf->decommission_subvols_cnt) {
                        *ignore_failure = _gf_true;
                        for (i = 0; i < conf->decommission_subvols_cnt; i++) {
                                if (conf->decommissioned_bricks[i] == from) {
                                        *ignore_failure = _gf_false;
                                         break;
                                }
                        }



should be 


                if (conf->decommission_subvols_cnt) {
                        *ignore_failure = _gf_true;
                        for (i = 0; i < conf->subvolume_cnt; i++) {
                                if (conf->decommissioned_bricks[i] == from) {
                                        *ignore_failure = _gf_false;
                                         break;
                                }
                        }
Comment 9 Prasad Desala 2017-07-31 15:20:33 EDT
Verified this bug on glusterfs version glusterfs-3.8.4-36, followed the same steps as in the description and could see that the remove-brick status is displaying the failure count for the files which are not migrated because of no space.

Moving this BZ to Verified.

Note You need to log in before you can comment on or make changes to this bug.