Bug 1258875

Summary: DHT: If remove-brick start fails partway through, remove-brick commit should not be allowed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: RajeshReddy <rmekala>
Component: distribute
Assignee: Sakshi <sabansal>
Status: CLOSED ERRATA
QA Contact: krishnaram Karthick <kramdoss>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.1
CC: amukherj, bsrirama, mzywusko, nbalacha, rcyriac, rhinduja, sabansal, sasundar, smohan, spalai
Keywords: ZStream
Target Release: RHGS 3.1.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.7.9-4
Doc Type: Bug Fix
Doc Text:
Previously, remove-brick commit was allowed even when the remove-brick operation had failed. Because data migration from the decommissioned bricks was incomplete, committing the removal resulted in data loss. With this fix, a failed remove-brick start operation prevents the commit, and therefore prevents this data loss. If the data on the bricks is not important, the force option removes the bricks regardless.
Clones: 1278325
Last Closed: 2016-06-23 04:54:58 UTC
Type: Bug
Bug Blocks: 1278325, 1299184, 1332370, 1333237    

Description RajeshReddy 2015-09-01 12:47:16 UTC
DHT: If remove-brick start fails partway through, remove-brick commit should not be allowed

Steps:
========
1. Create a distributed volume with three bricks and mount it on a client using FUSE.
2. From the mount point, create many directories and one directory containing 30k files.
3. Start removing one of the bricks from the volume. While the rebalance (data migration) is in progress, delete all directories and files from the mount point; as a result, the remove-brick operation fails.
4. Although the remove-brick operation failed, remove-brick commit succeeds.
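The reproduction steps above can be sketched as a POSIX shell script. This is a minimal sketch, not the reporter's script: the volume name (testvol), hosts (server1 to server3), brick paths, mount point and directory count are hypothetical placeholders, and it assumes the three servers already form one trusted storage pool and the brick directories sit on dedicated filesystems. Because step 3 deletes everything on the mount, it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the reproduction steps. VOL, H1..H3, the brick paths, MNT and
# NDIRS are hypothetical placeholders. Commands are only printed unless
# DRY_RUN=0, because step 3 deletes everything on the mounted volume.
set -eu

VOL=${VOL:-testvol}
H1=${H1:-server1}
H2=${H2:-server2}
H3=${H3:-server3}
MNT=${MNT:-/mnt/testvol}
NDIRS=${NDIRS:-100}
NFILES=${NFILES:-30000}
DRY_RUN=${DRY_RUN:-1}

# Execute a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Create entries PREFIX1..PREFIXCOUNT with TOOL (mkdir or touch).
# The loop runs inside one `sh -c`, so dry-run prints a single line.
make_entries() {  # $1 = tool, $2 = path prefix, $3 = count
    run sh -c 'i=1; while [ "$i" -le "$3" ]; do "$1" "$2$i"; i=$((i + 1)); done' \
        sh "$1" "$2" "$3"
}

reproduce() {
    brick="$H3:/bricks/brick1/$VOL"   # the brick that will be removed

    # Step 1: distributed volume with three bricks, mounted on a client via FUSE.
    run gluster volume create "$VOL" "$H1:/bricks/brick1/$VOL" \
        "$H2:/bricks/brick1/$VOL" "$brick"
    run gluster volume start "$VOL"
    run mkdir -p "$MNT"
    run mount -t glusterfs "$H1:/$VOL" "$MNT"

    # Step 2: many directories plus one directory holding NFILES files.
    make_entries mkdir "$MNT/dir" "$NDIRS"
    run mkdir "$MNT/bigdir"
    make_entries touch "$MNT/bigdir/f" "$NFILES"

    # Step 3: start the remove-brick, then delete everything while it migrates.
    run gluster volume remove-brick "$VOL" "$brick" start
    run rm -rf "$MNT"/*

    # Step 4: wait until no node reports "in progress" (real runs only),
    # show the status, then attempt the commit without the y/n prompt.
    if [ "$DRY_RUN" = 0 ]; then
        while gluster volume remove-brick "$VOL" "$brick" status | grep -q 'in progress'; do
            sleep 10
        done
    fi
    run gluster volume remove-brick "$VOL" "$brick" status
    run gluster --mode=script volume remove-brick "$VOL" "$brick" commit
}

reproduce
```

Run it with DRY_RUN=0 only on a disposable test cluster. Whether the delete actually overlaps with the migration depends on timing, so check the status output; if a node reports failed, a build with the fix should reject the final commit with the "use 'force' option as migration has failed" error shown in comment 7.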

Expected Result:
================
Remove-brick commit should be allowed only when the remove-brick operation has completed successfully.
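The expected behaviour can also be checked on the admin side before committing. A minimal sketch in POSIX shell, assuming the glusterfs 3.7 status layout shown in comment 7 (two header lines, then one row per node with the status in the next-to-last column); the function names migration_completed and safe_remove_brick_commit are hypothetical and are not part of the gluster CLI.

```shell
#!/bin/sh
# Sketch: only commit a remove-brick when every node reports "completed".
# Assumes the glusterfs 3.7 status layout from comment 7: two header lines,
# then one row per node with the status in the next-to-last column.

# Reads remove-brick status output on stdin. Exits 0 only if there is at
# least one node row and every row says "completed"; "failed", "stopped"
# or multi-word states such as "in progress" make it exit 1.
migration_completed() {
    awk 'NR > 2 && NF > 0 {
             rows++
             if ($(NF - 1) != "completed") bad = 1
         }
         END { exit (bad || rows == 0) }'
}

# Hypothetical wrapper: safe_remove_brick_commit VOLUME BRICK...
safe_remove_brick_commit() {
    vol=$1
    shift
    # If the status command itself fails, awk sees no rows and the check fails.
    if gluster volume remove-brick "$vol" "$@" status | migration_completed; then
        gluster --mode=script volume remove-brick "$vol" "$@" commit
    else
        echo "remove-brick migration did not complete on every node; not committing" >&2
        return 1
    fi
}
```

With the status output from comment 7, the check fails because the 10.70.47.128 row reports failed, so no commit is attempted. Parsing the table is fragile across gluster versions; the server-side check added by the fix remains the authoritative guard.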

Comment 2 Sakshi 2016-05-02 15:09:18 UTC
*** Bug 1330484 has been marked as a duplicate of this bug. ***

Comment 5 Atin Mukherjee 2016-05-05 13:37:09 UTC
Mainline upstream : http://review.gluster.org/#/c/12513/
release-3.7 : http://review.gluster.org/#/c/12513/
Downstream patch : https://code.engineering.redhat.com/gerrit/#/c/73466/

Comment 7 krishnaram Karthick 2016-05-20 14:33:37 UTC
Verified the bug in glusterfs-3.7.9-5

Remove-brick commit now fails with an error message:

[root@dhcp46-103 ~]# gluster v remove-brick supernova 10.70.47.128:/bricks/brick1/sn 10.70.47.171:/bricks/brick1/sn 10.70.47.187:/bricks/brick1/sn 10.70.46.103:/bricks/brick1/sn status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0            completed        0:27:19
                            10.70.47.187                0        0Bytes             0             0             0            completed        0:29:12
                            10.70.47.171                0        0Bytes             0             0             0            completed        0:29:8
                            10.70.47.128                0        0Bytes             0             3             0               failed        0:7:17
[root@dhcp46-103 ~]# 
[root@dhcp46-103 ~]# gluster v remove-brick supernova 10.70.47.128:/bricks/brick1/sn 10.70.47.171:/bricks/brick1/sn 10.70.47.187:/bricks/brick1/sn 10.70.46.103:/bricks/brick1/sn commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: Staging failed on 10.70.47.128. Error: use 'force' option as migration has failed

 - Using the force option did allow the bricks to be removed:


[root@dhcp46-103 ~]# gluster v remove-brick supernova 10.70.47.128:/bricks/brick1/sn 10.70.47.171:/bricks/brick1/sn 10.70.47.187:/bricks/brick1/sn 10.70.46.103:/bricks/brick1/sn force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success

Moving the bug to verified.

Comment 11 errata-xmlrpc 2016-06-23 04:54:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

Comment 12 krishnaram Karthick 2016-06-24 08:17:27 UTC
*** Bug 1288448 has been marked as a duplicate of this bug. ***