Bug 1474284 - dht remove-brick status does not indicate failures for files not migrated because of a lack of space
Summary: dht remove-brick status does not indicate failures for files not migrated bec...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: RHGS 3.3.0
Assignee: Nithya Balachandran
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks: 1417151 1474318 1475181
Reported: 2017-07-24 10:07 UTC by Nithya Balachandran
Modified: 2017-09-21 05:04 UTC

Fixed In Version: glusterfs-3.8.4-36
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1474318
Environment:
Last Closed: 2017-09-21 05:04:21 UTC
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2017:2774
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: glusterfs bug fix and enhancement update
Last Updated: 2017-09-21 08:16:29 UTC

Description Nithya Balachandran 2017-07-24 10:07:01 UTC
Description of problem:

The dht remove-brick operation is expected to treat skipped files as failures, because they are left behind on the removed bricks.

If a file cannot be migrated because no subvolume has enough space to accommodate it, the error is ignored because of an incorrect loop bound in __dht_check_free_space ().

This is a regression from previous releases.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 2x1 distribute volume with 500 MB bricks and create enough files so that a single brick cannot accommodate all of them
2. Remove the 2nd brick
3. Check the logs and the remove-brick status.

Actual results:
The remove-brick status shows no failures. However, the rebalance logs show messages such as:

[2017-07-24 09:56:20.191412] W [MSGID: 109033] [dht-rebalance.c:1021:__dht_check_free_space] 0-vol1-dht: Could not find any subvol with space accomodating the file - <filename>. Consider adding bricks



Expected results:
The remove-brick status should display non-zero failures as some files cannot be moved.


Additional info:

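The loop bound used to iterate over the decommissioned bricks array in __dht_check_free_space () is incorrect. The loop stops at conf->decommission_subvols_cnt, so it can miss the entry for the brick being removed and leave *ignore_failure set to _gf_true. It needs to run up to conf->subvolume_cnt instead.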


                if (conf->decommission_subvols_cnt) {
                        *ignore_failure = _gf_true;
                        for (i = 0; i < conf->decommission_subvols_cnt; i++) {
                                if (conf->decommissioned_bricks[i] == from) {
                                        *ignore_failure = _gf_false;
                                        break;
                                }
                        }



It should be:


                if (conf->decommission_subvols_cnt) {
                        *ignore_failure = _gf_true;
                        for (i = 0; i < conf->subvolume_cnt; i++) {
                                if (conf->decommissioned_bricks[i] == from) {
                                        *ignore_failure = _gf_false;
                                        break;
                                }
                        }
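
The effect of the two loop bounds can be reproduced in isolation. The following is only a minimal sketch, not the glusterfs code: struct conf_sketch and both helper functions are hypothetical names, and it assumes that decommissioned_bricks has one slot per subvolume (NULL unless that subvolume is being removed), which the report does not state explicitly. It also assumes decommission_subvols_cnt is non-zero, as in the guarded block above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the DHT configuration. Only the fields that
 * appear in the snippet from the report are modeled. */
struct conf_sketch {
        int          subvolume_cnt;            /* total number of subvolumes */
        int          decommission_subvols_cnt; /* number being removed (> 0) */
        /* ASSUMPTION: one slot per subvolume, NULL unless decommissioned */
        const void **decommissioned_bricks;
};

/* Loop bound from the "before" snippet: scans only the first
 * decommission_subvols_cnt slots. */
static bool
ignore_failure_old (const struct conf_sketch *conf, const void *from)
{
        bool ignore = true;
        int  i;

        for (i = 0; i < conf->decommission_subvols_cnt; i++) {
                if (conf->decommissioned_bricks[i] == from) {
                        ignore = false;
                        break;
                }
        }
        return ignore;
}

/* Loop bound from the "after" snippet: scans every subvolume slot. */
static bool
ignore_failure_fixed (const struct conf_sketch *conf, const void *from)
{
        bool ignore = true;
        int  i;

        for (i = 0; i < conf->subvolume_cnt; i++) {
                if (conf->decommissioned_bricks[i] == from) {
                        ignore = false;
                        break;
                }
        }
        return ignore;
}
```

Under these assumptions, take two subvolumes with only the second one decommissioned, as in the steps to reproduce. The slot array is then { NULL, second } with decommission_subvols_cnt equal to 1. ignore_failure_old () inspects only slot 0 and returns true for the second brick, so the failure is ignored, while ignore_failure_fixed () reaches slot 1 and returns false.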

Comment 9 Prasad Desala 2017-07-31 19:20:33 UTC
Verified this bug on glusterfs-3.8.4-36. Following the same steps as in the description, the remove-brick status now displays a failure count for the files that were not migrated because of a lack of space.

Moving this BZ to Verified.

Comment 11 errata-xmlrpc 2017-09-21 05:04:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

