Description of problem:
While decommissioning bricks, data is also migrated from non-decommissioned bricks, which sometimes leads to data loss.

Version-Release number of selected component (if applicable):
3.4.0.44rhs-1.el6rhs.x86_64

How reproducible:
Not always

Steps to Reproduce:
1. From a distributed-replicate volume with an 11x2 configuration, removed a pair of bricks using remove-brick start.
2. Data is also migrated from the non-decommissioned bricks.

More info
----------
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: f93775df-84c4-4c3a-8883-185e94acafe4
Status: Started
Number of Bricks: 11 x 2 = 22
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep0
Brick2: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep1
Brick3: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep2
Brick4: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep3
Brick5: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep4
Brick6: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep5
Brick7: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep6
Brick8: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep7
Brick9: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep8
Brick10: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep9
Brick11: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10 ---->
Brick12: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11 ----> decommissioned pair --> dist-rep-replicate-5
Brick13: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep12
Brick14: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep13
Brick15: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep14
Brick16: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep15
Brick17: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep16
Brick18: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep17
Brick19: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep18
Brick20: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep19
Brick21: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep20
Brick22: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep21
Options Reconfigured:
features.quota: off

command
--------
[root@rhs-client4 mnt]# gluster v remove-brick dist-rep rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10 rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11 status
                               Node  Rebalanced-files      size   scanned  failures   skipped       status  run-time in secs
                          ---------       -----------  --------  --------  --------  --------  -----------  ----------------
                          localhost                 0    0Bytes         0         0         0  not started              0.00
 rhs-client9.lab.eng.blr.redhat.com              1518   759.0MB      9759         0         0    completed            404.00
rhs-client39.lab.eng.blr.redhat.com               961   480.5MB      9330         0         0    completed            386.00

Looking at the rebalance logs from node rhs-client39.lab.eng.blr.redhat.com
----------------------------
[2013-11-15 09:25:23.281339] I [dht-rebalance.c:672:dht_migrate_file] 0-dist-rep-dht: /5/5/4/1/file.0: attempting to move from dist-rep-replicate-10 to dist-rep-replicate-1
[2013-11-15 09:25:24.399435] I [dht-rebalance.c:881:dht_migrate_file] 0-dist-rep-dht: completed migration of /5/5/4/5/file.0 from subvolume dist-rep-replicate-1 to dist-rep-replicate-0
[2013-11-15 09:25:25.252144] I [dht-rebalance.c:881:dht_migrate_file] 0-dist-rep-dht: completed migration of /5/5/5/2/file.0 from subvolume dist-rep-replicate-10 to dist-rep-replicate-1

Cluster info
------------
rhs-client9.lab.eng.blr.redhat.com
rhs-client39.lab.eng.blr.redhat.com
rhs-client4.lab.eng.blr.redhat.com

Mounted on
----------
rhs-client4.lab.eng.blr.redhat.com:/mnt

Attached the sosreports.

Because the layout of existing directories changes after remove-brick, the rebalance also migrates data from the non-decommissioned bricks. If this is not expected, then the way we handle remove-brick should change.
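
To illustrate why a layout change can move files that never lived on the decommissioned pair, here is a toy model in Python. It is only a sketch under stated assumptions, not the actual DHT code: it assumes the hash space is split into equal contiguous ranges in subvolume order, and it uses zlib.crc32 as a stand-in hash. The helper names (build_layout, owner, toy_hash, plan_moves) and the sample paths are hypothetical.

```python
import zlib

HASH_SPACE = 2**32  # toy 32-bit hash space (assumption for this sketch)


def build_layout(subvols):
    """Split the hash space into equal contiguous ranges, one per subvolume."""
    n = len(subvols)
    return [
        (i * HASH_SPACE // n, (i + 1) * HASH_SPACE // n, name)  # end is exclusive
        for i, name in enumerate(subvols)
    ]


def owner(layout, h):
    """Return the subvolume whose range contains hash value h."""
    for start, end, name in layout:
        if start <= h < end:
            return name
    raise ValueError("hash value outside the hash space")


def toy_hash(path):
    """Stand-in hash; the real translator uses its own hash function."""
    return zlib.crc32(path.encode("utf-8")) & 0xFFFFFFFF


def plan_moves(old_layout, new_layout, paths):
    """List (path, source, destination) for files whose owner changes."""
    moves = []
    for p in paths:
        h = toy_hash(p)
        src, dst = owner(old_layout, h), owner(new_layout, h)
        if src != dst:
            moves.append((p, src, dst))
    return moves


subvols = [f"dist-rep-replicate-{i}" for i in range(11)]
removed = "dist-rep-replicate-5"
old_layout = build_layout(subvols)
new_layout = build_layout([s for s in subvols if s != removed])

# Hypothetical file names, used only to exercise the model.
paths = [f"/dir{i}/file.{j}" for i in range(20) for j in range(20)]
moves = plan_moves(old_layout, new_layout, paths)
from_other_subvols = [m for m in moves if m[1] != removed]

print("files that change owner:", len(moves))
print("of those, not on the removed subvolume:", len(from_other_subvols))
```

In this model every range boundary shifts when the number of subvolumes drops from 11 to 10, so a hash value near the start of a surviving subvolume's old range can fall into its neighbour's new range. Whether the real layout recalculation behaves this way for a given directory depends on how the layout is rewritten during remove-brick, which this sketch does not attempt to reproduce.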