Bug 1474284

Summary: dht remove-brick status does not indicate failures for files not migrated because of a lack of space
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Nithya Balachandran <nbalacha>
Component: distributeAssignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA QA Contact: Prasad Desala <tdesala>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rhgs-3.3CC: amukherj, rcyriac, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---Keywords: Regression
Target Release: RHGS 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-36 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1474318 (view as bug list) Environment:
Last Closed: 2017-09-21 05:04:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1417151, 1474318, 1475181    

Description Nithya Balachandran 2017-07-24 10:07:01 UTC
Description of problem:

The dht remove-brick operation is expected to treat skipped files as failures as they are left behind on the removed bricks.

If a file could not be migrated because there was no subvolume that could accommodate it, the error is ignored because of an incorrect loop counter.

This is a regression from previous releases.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 2x1 distribute volume with 500 MB bricks and create enough files so that a single brick cannot accommodate all of them
2. Remove the 2nd brick
3. Check the logs and the remove-brick status.

Actual results:
The remove-brick status shows no failures. However the rebalance logs show messages :

[2017-07-24 09:56:20.191412] W [MSGID: 109033] [dht-rebalance.c:1021:__dht_check_free_space] 0-vol1-dht: Could not find any subvol with space accomodating the file - <filename>. Consider adding bricks



Expected results:
The remove-brick status should display non-zero failures as some files cannot be moved.


Additional info:

The counter used to iterate over the decommissioned bricks array is incorrect in __dht_check_free_space ().


                if (conf->decommission_subvols_cnt) {
                        *ignore_failure = _gf_true;
                        for (i = 0; i < conf->decommission_subvols_cnt; i++) {
                                if (conf->decommissioned_bricks[i] == from) {
                                        *ignore_failure = _gf_false;
                                         break;
                                }
                        }



should be 


                if (conf->decommission_subvols_cnt) {
                        *ignore_failure = _gf_true;
                        for (i = 0; i < conf->subvolume_cnt; i++) {
                                if (conf->decommissioned_bricks[i] == from) {
                                        *ignore_failure = _gf_false;
                                         break;
                                }
                        }

Comment 9 Prasad Desala 2017-07-31 19:20:33 UTC
Verified this bug on glusterfs version glusterfs-3.8.4-36, followed the same steps as in the description and could see that the remove-brick status is displaying the failure count for the files which are not migrated because of no space.

Moving this BZ to Verified.

Comment 11 errata-xmlrpc 2017-09-21 05:04:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774