Bug 1030932

Summary: [rfe] DHT: Remove-brick - Data is migrating even from non-decommissioned bricks
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED NOTABUG
Reporter: shylesh <shmohan>
Assignee: Nithya Balachandran <nbalacha>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: rwheeler, spalai, vagarwal, vbellur
Keywords: FutureFeature
Target Milestone: ---
Target Release: ---
Type: Bug
Doc Type: Enhancement
Last Closed: 2015-11-30 09:54:57 UTC

Description shylesh 2013-11-15 11:10:46 UTC
Description of problem:
While decommissioning bricks, data is also migrated from non-decommissioned bricks, which can sometimes lead to data loss.

Version-Release number of selected component (if applicable):
3.4.0.44rhs-1.el6rhs.x86_64

How reproducible:
Not always

Steps to Reproduce:
1. On a distributed-replicate volume in an 11x2 configuration, remove a pair of bricks using remove-brick start (see the command sketch below).

2. Observe that data is also migrated from the non-decommissioned bricks.
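
A minimal sketch of the commands for step 1, using the decommissioned pair named in the volume info below (the start invocation is assumed, not quoted from this report; the status output appears under "command" further down):

# start the decommission; DHT begins migrating data off the named pair
gluster volume remove-brick dist-rep \
    rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10 \
    rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11 start

# poll the migration until every node reports 'completed'
gluster volume remove-brick dist-rep \
    rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10 \
    rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11 status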
 

More info
----------
Volume Name: dist-rep   
Type: Distributed-Replicate
Volume ID: f93775df-84c4-4c3a-8883-185e94acafe4
Status: Started
Number of Bricks: 11 x 2 = 22
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep0
Brick2: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep1
Brick3: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep2
Brick4: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep3
Brick5: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep4
Brick6: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep5
Brick7: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep6
Brick8: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep7
Brick9: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep8
Brick10: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep9
Brick11: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10   ----> decommissioned pair (dist-rep-replicate-5)
Brick12: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11  ----> decommissioned pair (dist-rep-replicate-5)
Brick13: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep12
Brick14: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep13
Brick15: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep14
Brick16: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep15
Brick17: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep16
Brick18: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep17
Brick19: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep18
Brick20: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep19
Brick21: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep20
Brick22: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep21
Options Reconfigured:
features.quota: off


command
--------
[root@rhs-client4 mnt]# gluster v remove-brick dist-rep rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10 rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             0             0             0    not started             0.00
      rhs-client9.lab.eng.blr.redhat.com             1518       759.0MB          9759             0             0      completed           404.00
     rhs-client39.lab.eng.blr.redhat.com              961       480.5MB          9330             0             0      completed           386.00
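
For context, the standard next step once every node reports 'completed' would be to finalize the decommission (assumed standard gluster workflow; this step is not shown in the report):

# finalize: remove the decommissioned pair from the volume
gluster volume remove-brick dist-rep \
    rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10 \
    rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11 commit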


Looking at the rebalance logs from node rhs-client39.lab.eng.blr.redhat.com
----------------------------
[2013-11-15 09:25:23.281339] I [dht-rebalance.c:672:dht_migrate_file] 0-dist-rep-dht: /5/5/4/1/file.0: attempting to move from dist-rep-replicate-10 to dist-rep-replicate-1

[2013-11-15 09:25:24.399435] I [dht-rebalance.c:881:dht_migrate_file] 0-dist-rep-dht: completed migration of /5/5/4/5/file.0 from subvolume dist-rep-replicate-1 to dist-rep-replicate-0

[2013-11-15 09:25:25.252144] I [dht-rebalance.c:881:dht_migrate_file] 0-dist-rep-dht: completed migration of /5/5/5/2/file.0 from subvolume dist-rep-replicate-10 to dist-rep-replicate-1
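
Note that all three entries involve only non-decommissioned subvolumes: the decommissioned pair is dist-rep-replicate-5, yet files move between dist-rep-replicate-10, dist-rep-replicate-1 and dist-rep-replicate-0. A quick way to filter for such migrations (a sketch; the rebalance log path is the conventional location, assumed rather than quoted from the sosreport):

# list completed migrations whose source is not the decommissioned subvolume
grep 'completed migration' /var/log/glusterfs/dist-rep-rebalance.log |
    grep -v 'from subvolume dist-rep-replicate-5'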


Cluster info
------------
rhs-client9.lab.eng.blr.redhat.com
rhs-client39.lab.eng.blr.redhat.com
rhs-client4.lab.eng.blr.redhat.com


Mounted on 
----------
rhs-client4.lab.eng.blr.redhat.com:/mnt


attached the sosreports

Comment 3 Amar Tumballi 2013-12-02 10:02:48 UTC
Because the layout of existing directories is recalculated across the remaining subvolumes after a remove-brick, data does get migrated even from the non-decommissioned bricks.

If this is not expected, then the way we handle remove-brick should change.
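
To illustrate the layout change described above (a sketch with an example path, not taken from this report): each brick stores the hash range it owns for a directory in the trusted.glusterfs.dht extended attribute. With 11 subvolumes each range covers roughly 1/11 of the 32-bit hash space; once one pair is decommissioned the remaining ten cover roughly 1/10 each, so nearly every range boundary shifts, and files whose name hashes fall near the old boundaries migrate between non-decommissioned bricks.

# dump the hash range a brick owns for one directory (example brick path);
# comparing the hex value before and after remove-brick shows the shift
getfattr -n trusted.glusterfs.dht -e hex /home/dist-rep0/5/5/4/1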