Description of problem:
gluster volume remove-brick on a distribute-replicate volume ends up in data loss for clients.

Version-Release number of selected component (if applicable): 3.4.0, 3.4.1

How reproducible: always

Steps to Reproduce:

1. yes | gluster volume create test replica 2 servserv.generals.ea.com:/mnt/gluster/test1 servserv.generals.ea.com:/mnt/gluster/test2 servserv.generals.ea.com:/mnt/gluster/test3 servserv.generals.ea.com:/mnt/gluster/test4 servserv.generals.ea.com:/mnt/gluster/test5 servserv.generals.ea.com:/mnt/gluster/test6 ; gluster volume start test

2. mount -t glusterfs servserv.generals.ea.com:test /media/test/

3. cd /media/test ; git clone https://git.fedorahosted.org/git/freeipa.git

4. find /media/test | wc -l

5. gluster volume info

Volume Name: test
Type: Distributed-Replicate
Volume ID: 5467b9fe-9a3c-4850-8449-280ee9789c11
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: servserv.generals.ea.com:/mnt/gluster/test1
Brick2: servserv.generals.ea.com:/mnt/gluster/test2
Brick3: servserv.generals.ea.com:/mnt/gluster/test3
Brick4: servserv.generals.ea.com:/mnt/gluster/test4
Brick5: servserv.generals.ea.com:/mnt/gluster/test5
Brick6: servserv.generals.ea.com:/mnt/gluster/test6

6. gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 start

7. gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 status

Node       Rebalanced-files  size     scanned  failures  skipped  status     run-time in secs
---------  ----------------  -------  -------  --------  -------  ---------  ----------------
localhost  341               17.3MB   1331     0                  completed  678.00

8. gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 commit

Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y

9. find /media/test | wc -l
977

Actual results: I just lost files.

Expected results: No data loss when shrinking Gluster, as this makes Gluster quite unusable for any production setup.
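A sanity check one can run between steps 7 and 8, before committing, is to compare what the client sees on the mount with what is still sitting only on the bricks being removed. This is a minimal sketch based on the paths above; the sticky-bit filter is an assumption meant to skip DHT link files and is not part of the original report.

  # Files visible to clients through the FUSE mount.
  find /media/test -type f | wc -l

  # Regular files still present on the bricks being decommissioned,
  # skipping GlusterFS internal metadata (.glusterfs) and zero-byte
  # sticky-bit entries, which are DHT link files rather than real data.
  find /mnt/gluster/test5 /mnt/gluster/test6 \
      -path '*/.glusterfs' -prune -o \
      -type f ! -perm -1000 -print | wc -l

  # If the second count is non-zero after "remove-brick ... status" reports
  # completed, those files were never migrated off the removed replica pair
  # and will vanish from the mount after "commit".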
It's migrating the wrong dht subvolume! See attachment.
Created attachment 817145: test-rebalance.log
Meh, I jumped to conclusions again. It's not. It's just allowing the migration of files TO the decommissioned subvolume.
Notice the make-doc file: in all the tests I ran, make-doc was on the to-be-decommissioned replica pair, and the migration didn't even mention it.
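One way to check this kind of placement from the client side is the trusted.glusterfs.pathinfo virtual extended attribute, which the FUSE mount exposes to report which bricks back a given file. A rough sketch, assuming the mount from the reproduction steps and that make-doc sits at the top of the freeipa clone (the exact path inside the tree is an assumption):

  # Ask the client which bricks hold the file (requires the attr package).
  getfattr -n trusted.glusterfs.pathinfo /media/test/freeipa/make-doc

  # Cross-check directly on the replica pair that is about to be removed.
  ls -l /mnt/gluster/test5/freeipa/make-doc /mnt/gluster/test6/freeipa/make-doc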
Confirming that I have the same issue. Files appear to be migrated to the decommissioned subvolume, and then after I run "commit", the files are gone.
Note that one can reproduce issues from bug #966848, bug #1025404 by running rm -rf /media/test/freeipa after step 9.
Does the same behavior exist on upstream master? There have been several related DHT fixes in master, and I would like to determine whether master has the same problem.
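For anyone who wants to repeat that check themselves, a rough outline of building throwaway packages from upstream master is below. This is a sketch, not the exact procedure used in the following comment; it assumes a Fedora/EL host with the usual autotools and rpm-build dependencies, and that the extras/LinuxRPM helper shipped in the source tree of that era is available.

  # Build test RPMs from the current upstream master.
  git clone https://github.com/gluster/glusterfs.git
  cd glusterfs
  ./autogen.sh
  ./configure
  # The source tree carries an RPM helper; this builds packages from the
  # current checkout so they can be installed on a clean test machine.
  make -C extras/LinuxRPM glusterrpms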
I built gluster from master and I can confirm that the issue is not there. It would be nice to track down the relevant commits and backport them to the 3.4 branch, as 3.5 is far away.

Test:

[root@potwora test]# rpm -qa | grep gluster
glusterfs-fuse-3git-1.fc20.x86_64
glusterfs-cli-3git-1.fc20.x86_64
glusterfs-rdma-3git-1.fc20.x86_64
glusterfs-api-3git-1.fc20.x86_64
glusterfs-3git-1.fc20.x86_64
glusterfs-api-devel-3git-1.fc20.x86_64
glusterfs-libs-3git-1.fc20.x86_64
glusterfs-server-3git-1.fc20.x86_64
glusterfs-devel-3git-1.fc20.x86_64
glusterfs-debuginfo-3git-1.fc20.x86_64
glusterfs-geo-replication-3git-1.fc20.x86_64
glusterfs-regression-tests-3git-1.fc20.x86_64

[root@potwora ~]# yes | gluster volume stop test force ; yes | gluster volume del test ; rm -rf /mnt/gluster/test* ; yes | gluster volume create test replica 2 servserv.generals.ea.com:/mnt/gluster/test1 servserv.generals.ea.com:/mnt/gluster/test2 servserv.generals.ea.com:/mnt/gluster/test3 servserv.generals.ea.com:/mnt/gluster/test4 servserv.generals.ea.com:/mnt/gluster/test5 servserv.generals.ea.com:/mnt/gluster/test6 force ; gluster volume start test
volume stop: test: failed: Volume test does not exist
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
volume delete: test: failed: Volume test does not exist
Deleting volume will erase all information about the volume. Do you want to continue? (y/n)
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Do you still want to continue creating the volume? (y/n)
volume create: test: success: please start the volume to access data
volume start: test: success

[root@potwora ~]# gluster volume info
Volume Name: test
Type: Distributed-Replicate
Volume ID: a0883ff4-a6b0-4caa-9fce-928616ca362e
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: servserv.generals.ea.com:/mnt/gluster/test1
Brick2: servserv.generals.ea.com:/mnt/gluster/test2
Brick3: servserv.generals.ea.com:/mnt/gluster/test3
Brick4: servserv.generals.ea.com:/mnt/gluster/test4
Brick5: servserv.generals.ea.com:/mnt/gluster/test5
Brick6: servserv.generals.ea.com:/mnt/gluster/test6

[root@potwora ~]# gluster volume status
Status of volume: test
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick servserv.generals.ea.com:/mnt/gluster/test1       49152   Y       20680
Brick servserv.generals.ea.com:/mnt/gluster/test2       49153   Y       20692
Brick servserv.generals.ea.com:/mnt/gluster/test3       49154   Y       20703
Brick servserv.generals.ea.com:/mnt/gluster/test4       49155   Y       20714
Brick servserv.generals.ea.com:/mnt/gluster/test5       49156   Y       20725
Brick servserv.generals.ea.com:/mnt/gluster/test6       49157   Y       20736
NFS Server on localhost                                 2049    Y       20750
Self-heal Daemon on localhost                           N/A     Y       20754

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks

[root@potwora ~]# mkdir /media/test
[root@potwora ~]# cd /media/test/
[root@potwora test]# git clone https://git.fedorahosted.org/git/freeipa.git
Cloning into 'freeipa'...
remote: Counting objects: 58018, done.
remote: Compressing objects: 100% (18280/18280), done.
remote: Total 58018 (delta 47755), reused 48634 (delta 39585)
Receiving objects: 100% (58018/58018), 12.92 MiB | 805.00 KiB/s, done.
Resolving deltas: 100% (47755/47755), done.
Checking connectivity... done
Checking out files: 100% (1171/1171), done.
[root@potwora test]# find ./ | wc -l
1323
[root@potwora test]# cd
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 start
volume remove-brick start: success
ID: 36624241-2555-4c4a-9109-21b4ecae98fc
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 status
Node       Rebalanced-files  size     scanned  failures  skipped  status       run-time in secs
---------  ----------------  -------  -------  --------  -------  -----------  ----------------
localhost  173               16.2MB   432      0         0        in progress  3.00
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 status
Node       Rebalanced-files  size     scanned  failures  skipped  status       run-time in secs
---------  ----------------  -------  -------  --------  -------  -----------  ----------------
localhost  339               18.7MB   870      0         0        in progress  5.00
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 status
Node       Rebalanced-files  size     scanned  failures  skipped  status       run-time in secs
---------  ----------------  -------  -------  --------  -------  -----------  ----------------
localhost  413               19.8MB   1084     0         0        in progress  6.00
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 status
Node       Rebalanced-files  size     scanned  failures  skipped  status       run-time in secs
---------  ----------------  -------  -------  --------  -------  -----------  ----------------
localhost  489               20.6MB   1234     0         0        completed    7.00
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 status
Node       Rebalanced-files  size     scanned  failures  skipped  status       run-time in secs
---------  ----------------  -------  -------  --------  -------  -----------  ----------------
localhost  489               20.6MB   1234     0         0        completed    7.00
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 status
Node       Rebalanced-files  size     scanned  failures  skipped  status       run-time in secs
---------  ----------------  -------  -------  --------  -------  -----------  ----------------
localhost  489               20.6MB   1234     0         0        completed    7.00
[root@potwora ~]# gluster volume remove-brick test servserv.generals.ea.com:/mnt/gluster/test6 servserv.generals.ea.com:/mnt/gluster/test5 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
[root@potwora ~]# cd /media/test/
[root@potwora test]# find ./ | wc -l
1323
Have backported DHT/remove-brick related fixes to release-3.4:

http://review.gluster.org/#/c/6461/
http://review.gluster.org/#/c/6468/  <-- Most likely this will fix the issue
http://review.gluster.org/#/c/6469/
http://review.gluster.org/#/c/6470/
http://review.gluster.org/#/c/6471/
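To try these before they merge, one possible approach is to fetch the changes from Gerrit onto a local release-3.4 checkout, roughly as sketched below. The fetch URL, the refs/changes layout, and the trailing patchset number (/1) are assumptions about the review.gluster.org setup and may need adjusting per change.

  # Start from the release-3.4 branch of a local clone.
  git checkout release-3.4

  # Pull one backport candidate straight from Gerrit and apply it; repeat
  # for each change. The "68/6468" directory follows the usual Gerrit
  # refs/changes layout and the final "/1" is a placeholder patchset number.
  git fetch https://review.gluster.org/glusterfs refs/changes/68/6468/1
  git cherry-pick FETCH_HEAD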
Lukas, can you please verify whether 3.4.2qa3 fixes this problem?
No change, the issue is still there. Should I try to bisect it?

[root@glusterkluster:~] rpm -qa | grep gluster
glusterfs-libs-3.4.2qa3-1.el6.x86_64
glusterfs-server-3.4.2qa3-1.el6.x86_64
glusterfs-fuse-3.4.2qa3-1.el6.x86_64
glusterfs-api-devel-3.4.2qa3-1.el6.x86_64
glusterfs-cli-3.4.2qa3-1.el6.x86_64
glusterfs-rdma-3.4.2qa3-1.el6.x86_64
glusterfs-devel-3.4.2qa3-1.el6.x86_64
glusterfs-debuginfo-3.4.2qa3-1.el6.x86_64
glusterfs-3.4.2qa3-1.el6.x86_64
glusterfs-geo-replication-3.4.2qa3-1.el6.x86_64
glusterfs-api-3.4.2qa3-1.el6.x86_64

[root@glusterkluster:~] gluster volume info test
Volume Name: test
Type: Distributed-Replicate
Volume ID: e31ec436-d6dd-4cce-ae57-54408aa1f620
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: glusterkluster:/mnt/gluster/test1
Brick2: glusterkluster:/mnt/gluster/test2
Brick3: glusterkluster:/mnt/gluster/test3
Brick4: glusterkluster:/mnt/gluster/test4
Brick5: glusterkluster:/mnt/gluster/test5
Brick6: glusterkluster:/mnt/gluster/test6

[root@glusterkluster:/media/test] find ./ | wc -l
1313

[root@glusterkluster:~] gluster volume remove-brick test glusterkluster:/mnt/gluster/test5 glusterkluster:/mnt/gluster/test6 start
volume remove-brick start: success
ID: 6cf1cac1-8469-4674-a488-277a7d611dcc

gluster volume remove-brick test glusterkluster:/mnt/gluster/test5 glusterkluster:/mnt/gluster/test6 status
Node       Rebalanced-files  size     scanned  failures  skipped  status     run-time in secs
---------  ----------------  -------  -------  --------  -------  ---------  ----------------
localhost  337               19.1MB   1319     0                  completed  670.00

[root@glusterkluster:~] gluster volume remove-brick test glusterkluster:/mnt/gluster/test5 glusterkluster:/mnt/gluster/test6 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success

[root@glusterkluster:~] find /media/test/ | wc -l
975
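On the bisect question: since master works and release-3.4 does not, one way to track down the fixing commit is to bisect on master between the point where the branches diverged and current HEAD, rerunning the remove-brick reproduction at each step. A sketch of the idea, with "good" and "bad" deliberately inverted because we are hunting for the commit that fixes the bug rather than the one that introduced it (branch names are assumed to match the upstream repo):

  git checkout master
  git bisect start
  # "bad" here means "bug no longer reproduces" (current master); "good"
  # means "files are still lost" (the common ancestor with release-3.4,
  # assumed to still show the bug).
  git bisect bad HEAD
  git bisect good "$(git merge-base master release-3.4)"
  # At each step: build, reinstall, rerun the reproduction, then mark it:
  #   git bisect good   # files were lost
  #   git bisect bad    # files survived
  # The commit bisect finally reports is the one that introduced the fix.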
See https://bugzilla.redhat.com/show_bug.cgi?id=966845: please backport 4f63b631dce4cb97525ee13fab0b2a789bcf6b15.
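If that commit applies cleanly, the backport itself would just be a cherry-pick from master onto the release branch, along the lines of the sketch below (branch name assumed; conflicts, if any, resolved by hand):

  # In a clone that has both master and the 3.4 release branch available.
  git checkout release-3.4
  git cherry-pick 4f63b631dce4cb97525ee13fab0b2a789bcf6b15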
Backported - http://review.gluster.org/6517. Does this fix the issue?
glusterfs-3.4.2qa4-1.el6.x86_64 works fine, thank you!