Description of problem:
DHT - remove-brick - data loss in remove-brick

In DHT, 'remove-brick start' sets the hash layout to 0000000000000000 on a brick other than the one named in the command, migrates no files from the brick that is to be removed, and data written after the start operation still goes to that brick, so the commit ends in data loss.

Version-Release number of selected component (if applicable):
3.4.0.8rhs-1.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a DHT volume, start it and mount it.

[root@mia ~]# gluster volume create r1 fred.lab.eng.blr.redhat.com:/rhs/brick1/r1 cutlass.lab.eng.blr.redhat.com:/rhs/brick1/r1 fan.lab.eng.blr.redhat.com:/rhs/brick1/r1 mia.lab.eng.blr.redhat.com:/rhs/brick1/r1
volume create: r1: success: please start the volume to access data
[root@mia ~]# gluster volume start r1
volume start: r1: success
[root@mia ~]# gluster volume status r1
Status of volume: r1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/r1        49155   Y       6944
Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/r1     49155   Y       6920
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/r1         49155   Y       4183
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/r1         49153   Y       3508
NFS Server on localhost                                 2049    Y       3518
NFS Server on a37ff566-da82-4ae4-90c6-17763466fd36      2049    Y       4193
NFS Server on c5154da1-be15-40e2-b5f3-9be6dadafd43      2049    Y       6930
NFS Server on ad0337ac-1756-4e04-aa6f-d9c46a24130d      2049    Y       6954

There are no active volume tasks

mount:-
[root@rhsauto037 mnt]# mount -t glusterfs fan.lab.eng.blr.redhat.com:/r1 /mnt/rtest

2. Create some files and directories inside it.

[root@rhsauto037 mnt]# cd /mnt/rtest
[root@rhsauto037 rtest]# for i in {1..20}; do mkdir d$i; touch f"$i" ; touch d1/f"$i"; done
[root@rhsauto037 rtest]# ls
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f1 f11 f13 f15 f17 f19 f20 f4 f6 f8
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f10 f12 f14 f16 f18 f2 f3 f5 f7 f9

3. Verify the hash layout and file distribution on the backend.

on cutlass:-
[root@cutlass ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@cutlass ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f13 f6 f9
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f16 f7
[root@cutlass ~]# ls -l /rhs/brick1/r1 | grep T

on mia:-
[root@mia ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@mia ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f1 f18 f2
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f10 f19 f20
[root@mia ~]# ls -l /rhs/brick1/r1 | grep T

on fred:-
[root@fred ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@fred ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f12 f4
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f17 f8
[root@fred ~]# ls -l /rhs/brick1/r1 | grep T

on fan:-
[root@fan ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@fan ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f11 f15 f5
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f14 f3
[root@fan ~]# ls -l /rhs/brick1/r1 | grep T

4. Now remove one brick using the start option.

[root@mia ~]# gluster volume remove-brick r1 mia.lab.eng.blr.redhat.com:/rhs/brick1/r1 start
volume remove-brick start: success
ID: db943e14-85e4-44f1-ae10-4182f14c3995
[root@mia ~]# gluster volume remove-brick r1 mia.lab.eng.blr.redhat.com:/rhs/brick1/r1 status
Node                        Rebalanced-files  size         scanned      failures     status        run-time in secs
---------                   -----------       -----------  -----------  -----------  ------------  --------------
localhost                   0                 0Bytes       20           0            completed     0.00
10.70.34.80                 0                 0Bytes       0            0            not started   0.00
10.70.34.116                0                 0Bytes       0            0            not started   0.00
fan.lab.eng.blr.redhat.com  0                 0Bytes       0            0            not started   0.00

The status says completed, but no files were migrated.

5. Verify on the backend. It changes the hash layout of a brick other than mia: here it changes the layout on fan to trusted.glusterfs.dht=0x00000001000000000000000000000000, and no files are migrated from mia.

[root@cutlass ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@cutlass ~]# ls /rhs/brick1/r1
d1 d12 d15 d18 d20 d5 d8 f13 f16 f7
d10 d13 d16 d19 d3 d6 d9 f14 f3 f9
d11 d14 d17 d2 d4 d7 f11 f15 f6
[root@cutlass ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 09:56 f11
---------T 2 root root 0 May 16 09:56 f14
---------T 2 root root 0 May 16 09:56 f15
---------T 2 root root 0 May 16 09:56 f3

[root@mia ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@mia ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f1 f18 f2 f5
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f10 f19 f20
[root@mia ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 02:46 f5

[root@fred ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@fred ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f12 f17 f8
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f13 f4
[root@fred ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 07:32 f13

[root@fan ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000000000000
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@fan ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f11 f15 f5
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f14 f3
[root@fan ~]# ls -l /rhs/brick1/r1 | grep T

6. Now create new files from the mount point and verify on the backend that they are going to mia.

mount:-
[root@rhsauto037 rtest]# for i in {1..20}; do touch new"$i" ; touch d1/new"$i"; done
[root@rhsauto037 rtest]# ls
d1 d12 d15 d18 d20 d5 d8 f10 f13 f16 f19 f3 f6 f9 new11 new14 new17 new2 new4 new7
d10 d13 d16 d19 d3 d6 d9 f11 f14 f17 f2 f4 f7 new1 new12 new15 new18 new20 new5 new8
d11 d14 d17 d2 d4 d7 f1 f12 f15 f18 f20 f5 f8 new10 new13 new16 new19 new3 new6 new9

server:-
[root@cutlass ~]# ls /rhs/brick1/r1
d1 d12 d15 d18 d20 d5 d8 f13 f16 f7 new15 new7
d10 d13 d16 d19 d3 d6 d9 f14 f3 f9 new2 new9
d11 d14 d17 d2 d4 d7 f11 f15 f6 new14 new5
[root@cutlass ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 09:56 f11
---------T 2 root root 0 May 16 09:56 f14
---------T 2 root root 0 May 16 09:56 f15
---------T 2 root root 0 May 16 09:56 f3

[root@mia ~]# ls /rhs/brick1/r1
d1 d12 d15 d18 d20 d5 d8 f10 f2 new10 new19 new4
d10 d13 d16 d19 d3 d6 d9 f18 f20 new12 new20 new6
d11 d14 d17 d2 d4 d7 f1 f19 f5 new13 new3 new8
[root@mia ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 02:46 f5

[root@fred ~]# ls /rhs/brick1/r1
d1 d12 d15 d18 d20 d5 d8 f13 f8 new16
d10 d13 d16 d19 d3 d6 d9 f17 new1 new17
d11 d14 d17 d2 d4 d7 f12 f4 new11 new18
[root@fred ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 07:32 f13

[root@fan ~]# ls /rhs/brick1/r1
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f11 f15 f5
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f14 f3
[root@fan ~]# ls -l /rhs/brick1/r1 | grep T

7. Now commit the remove-brick and check on the mount point that files are missing. Verify in the backend and in gluster volume info that mia has been removed but data has been lost.

server:-
[root@mia ~]# gluster volume remove-brick r1 mia.lab.eng.blr.redhat.com:/rhs/brick1/r1 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success

before commit on mount:-
[root@rhsauto037 rtest]# ls
d1 d12 d15 d18 d20 d5 d8 f10 f13 f16 f19 f3 f6 f9 new11 new14 new17 new2 new4 new7
d10 d13 d16 d19 d3 d6 d9 f11 f14 f17 f2 f4 f7 new1 new12 new15 new18 new20 new5 new8
d11 d14 d17 d2 d4 d7 f1 f12 f15 f18 f20 f5 f8 new10 new13 new16 new19 new3 new6 new9

after commit on mount:-
[root@rhsauto037 rtest]# ls
d1 d11 d13 d15 d17 d19 d20 d4 d6 d8 f11 f13 f15 f17 f4 f6 f8 new1 new14 new16 new18 new5 new9
d10 d12 d14 d16 d18 d2 d3 d5 d7 d9 f12 f14 f16 f3 f5 f7 f9 new11 new15 new17 new2 new7

server:-
[root@cutlass ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@cutlass ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 09:56 f11
---------T 2 root root 0 May 16 09:58 f12
---------T 2 root root 0 May 16 09:56 f14
---------T 2 root root 0 May 16 09:56 f15
---------T 2 root root 0 May 16 09:58 f17
---------T 2 root root 0 May 16 09:56 f3
---------T 2 root root 0 May 16 09:58 f4
---------T 2 root root 0 May 16 09:58 f8
---------T 2 root root 0 May 16 09:58 new1
---------T 2 root root 0 May 16 09:58 new11
---------T 2 root root 0 May 16 09:58 new16
---------T 2 root root 0 May 16 09:58 new17
---------T 2 root root 0 May 16 09:58 new18
[root@cutlass ~]# ls /rhs/brick1/r1
d1 d13 d17 d20 d6 f11 f15 f4 f9 new15 new2
d10 d14 d18 d3 d7 f12 f16 f6 new1 new16 new5
d11 d15 d19 d4 d8 f13 f17 f7 new11 new17 new7
d12 d16 d2 d5 d9 f14 f3 f8 new14 new18 new9

[root@mia ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@mia ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 02:46 f5
[root@mia ~]# ls /rhs/brick1/r1
d1 d12 d15 d18 d20 d5 d8 f10 f2 new10 new19 new4
d10 d13 d16 d19 d3 d6 d9 f18 f20 new12 new20 new6
d11 d14 d17 d2 d4 d7 f1 f19 f5 new13 new3 new8

[root@fred ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@fred ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 07:32 f13
---------T 2 root root 0 May 16 07:34 f5
[root@fred ~]# ls /rhs/brick1/r1
d1 d12 d15 d18 d20 d5 d8 f13 f5 new11 new18
d10 d13 d16 d19 d3 d6 d9 f17 f8 new16
d11 d14 d17 d2 d4 d7 f12 f4 new1 new17

[root@fan ~]# getfattr -d -m . -e hex /rhs/brick1/r1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/r1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
trusted.glusterfs.volume-id=0x7be735a3a5e9437086505841351bc419

[root@fan ~]# ls -l /rhs/brick1/r1 | grep T
---------T 2 root root 0 May 16 02:48 f16
---------T 2 root root 0 May 16 02:48 f6
---------T 2 root root 0 May 16 02:48 f7
---------T 2 root root 0 May 16 02:48 f9
---------T 2 root root 0 May 16 02:48 new14
---------T 2 root root 0 May 16 02:48 new15
---------T 2 root root 0 May 16 02:48 new2
---------T 2 root root 0 May 16 02:48 new5
---------T 2 root root 0 May 16 02:48 new7
---------T 2 root root 0 May 16 02:48 new9
[root@fan ~]# ls /rhs/brick1/r1
d1 d12 d15 d18 d20 d5 d8 f14 f3 f7 new15 new7
d10 d13 d16 d19 d3 d6 d9 f15 f5 f9 new2 new9
d11 d14 d17 d2 d4 d7 f11 f16 f6 new14 new5

[root@fan ~]# gluster volume status r1
Status of volume: r1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/r1        49155   Y       6944
Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/r1     49155   Y       6920
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/r1         49155   Y       4183
NFS Server on localhost                                 2049    Y       4330
NFS Server on 8675332f-f033-4800-aa2c-9291fc868fbf      2049    Y       3665
NFS Server on ad0337ac-1756-4e04-aa6f-d9c46a24130d      2049    Y       7094
NFS Server on c5154da1-be15-40e2-b5f3-9be6dadafd43      2049    Y       7062

There are no active volume tasks

Actual results:
Data loss in remove-brick. In DHT, 'remove-brick start' sets the hash layout to 0000000000000000 on a brick other than the one named in the command, migrates no files from the brick that is to be removed, and data written after the start operation still goes to that brick, so the commit ends in data loss.

Expected results:

Additional info:
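The layout xattrs captured in step 5 can be decoded to make the problem easier to see. What follows is a minimal Python sketch, not GlusterFS code. It assumes the 16-byte trusted.glusterfs.dht value is four big-endian 32-bit words whose last two words are the start and end of the hash range; that reading is an assumption, although the same numeric ranges appear in the dht-selfheal log attached to this report. The helper name decode_dht_layout is hypothetical.

```python
import struct


def decode_dht_layout(hex_value):
    """Return (start, stop) of the hash range stored in a trusted.glusterfs.dht value.

    Assumption: the value is 16 bytes, four big-endian uint32 words, and the
    last two words are the start and stop of the range.
    """
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected a 16-byte layout value")
    _word0, _word1, start, stop = struct.unpack(">IIII", raw)
    return start, stop


# Values copied from the getfattr output in step 5 (after 'remove-brick start').
xattrs_after_start = {
    "cutlass": "0x000000010000000055555555aaaaaaa9",
    "mia": "0x0000000100000000aaaaaaaaffffffff",
    "fred": "0x00000001000000000000000055555554",
    "fan": "0x00000001000000000000000000000000",
}

layout_after_start = {
    brick: decode_dht_layout(value) for brick, value in xattrs_after_start.items()
}

# Bricks whose range was zeroed out, i.e. start == stop == 0.
zeroed = [brick for brick, rng in layout_after_start.items() if rng == (0, 0)]

for brick, (start, stop) in layout_after_start.items():
    print(f"{brick:8s} 0x{start:08x} - 0x{stop:08x}")
print("zeroed-out bricks:", zeroed)
```

Under that assumption, the brick named in the remove-brick command (mia) still owns the top third of the hash space while fan has been zeroed out, which is consistent with new files landing on mia in step 6.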
[2013-05-19 11:12:40.218890] C [dht-selfheal.c:559:dht_get_layout_count] 0-shishir: brick2: sng1-client-2 <=== subvolume being decommissioned
[2013-05-19 11:12:40.219014] C [dht-selfheal.c:781:dht_selfheal_layout_new_directory] 0-shishir: gave fix: 0 - 1431655764 on sng1-client-0 for /
[2013-05-19 11:12:40.219051] C [dht-selfheal.c:781:dht_selfheal_layout_new_directory] 0-shishir: gave fix: 1431655765 - 2863311529 on sng1-client-1 for /
[2013-05-19 11:12:40.219075] C [dht-selfheal.c:781:dht_selfheal_layout_new_directory] 0-shishir: gave fix: 2863311530 - 4294967294 on sng1-client-3 for / <=== no layout given for subvolume sng1-client-2 (this is the correct op)
[2013-05-19 11:12:40.219099] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 0 - 1431655764 on sng1-client-0 for /
[2013-05-19 11:12:40.219122] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 0 - 0 on sng1-client-1 for / <==== layout zeroed out for sng1-client-1 (incorrect)
[2013-05-19 11:12:40.219145] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 1431655765 - 2863311529 on sng1-client-2 for / <=== overlap op gives layout for subvolume sng1-client-2 (incorrect)
[2013-05-19 11:12:40.219168] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 2863311530 - 4294967295 on sng1-client-3 for /
[2013-05-19 11:12:40.219201] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 0 - 1431655764 (type 0) on subvolume sng1-client-0 for /
[2013-05-19 11:12:40.219544] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 0 - 0 (type 0) on subvolume sng1-client-1 for /
[2013-05-19 11:12:40.219677] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 1431655765 - 2863311529 (type 0) on subvolume sng1-client-2 for /
[2013-05-19 11:12:40.219996] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 2863311530 - 4294967295 (type 0) on subvolume sng1-client-3 for / <=== layouts written to the disk

dht_selfheal_layout_maximize_overlap, called from dht_fix_layout_of_directory, overwrites the layouts as an optimization without considering decommissioned nodes. As a result the wrong subvolume gets the zeroed-out range, and the subvolume being decommissioned keeps a live range.

Suspect this is a regression caused by:

commit 4f87fd0ae2ce629576ca5f647a99888d31a46815
Author: Anand Avati <avati>
Date:   Thu Aug 30 13:15:39 2012 -0700

    dht: improve dht_fix_layout_of_directory for better re-assignment
    .....
    Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f
    BUG: 853258
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/3883
    Tested-by: Gluster Build System <jenkins.com>
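The invariant that the log shows being broken can be sketched in a few lines. The Python below is an illustration only, not the dht-selfheal.c implementation: assign_ranges and misplaced_ranges are hypothetical names, and the even split of the 32-bit hash space across non-decommissioned subvolumes is an assumption chosen because it matches the ranges in the log above (with the last range extended to 4294967295, as in the values written to disk).

```python
HASH_MAX = 0xFFFFFFFF  # top of the 32-bit hash space, 4294967295


def assign_ranges(subvols, decommissioned):
    """Split the hash space evenly over subvolumes that are not decommissioned.

    Decommissioned subvolumes get the zeroed-out range (0, 0).
    """
    active = [s for s in subvols if s not in decommissioned]
    if not active:
        raise ValueError("no active subvolume left to hold the layout")
    chunk = HASH_MAX // len(active)
    layout = {s: (0, 0) for s in subvols}
    start = 0
    for index, subvol in enumerate(active):
        last = index == len(active) - 1
        stop = HASH_MAX if last else start + chunk - 1
        layout[subvol] = (start, stop)
        start = stop + 1
    return layout


def misplaced_ranges(layout, decommissioned):
    """Return decommissioned subvolumes that still hold a non-zero range."""
    return [s for s in sorted(decommissioned) if layout[s] != (0, 0)]


subvols = ["sng1-client-0", "sng1-client-1", "sng1-client-2", "sng1-client-3"]
decommissioned = {"sng1-client-2"}

expected = assign_ranges(subvols, decommissioned)

# Layout written to disk according to the "setting hash range" log lines.
logged = {
    "sng1-client-0": (0, 1431655764),
    "sng1-client-1": (0, 0),
    "sng1-client-2": (1431655765, 2863311529),
    "sng1-client-3": (2863311530, 4294967295),
}

print("expected:", expected)
print("decommissioned subvolumes still holding a range:",
      misplaced_ranges(logged, decommissioned))
```

A check of this kind, run after any step that reorders ranges, would flag the layout in the log because sng1-client-2 still holds a range while sng1-client-1 has lost its own.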
Verified on 3.4.0.9rhs-1.el6.x86_64. Working as expected, hence moving it to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html