Description of problem:
Remove-brick start on a distributed-replicate volume of 6x2 configuration doesn't migrate any data.

Version-Release number of selected component (if applicable):
[root@rhs1-bb rpm]# rpm -qa | grep gluster
glusterfs-fuse-3.4.0.5rhs-1.el6rhs.x86_64
glusterfs-3.4.0.5rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.5rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.5rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Created a 6x2 distributed-replicate volume.
2. Created 4 VMs on this volume.
3. Selected the first pair of bricks for removal:
   gluster volume remove-brick <vol> brick1 brick2 start
4. Upon checking the status, it reports "completed" within 2 seconds; no data migration happens.

Additional info:
=================
1. RHEVM hostname
=============
buzz.lab.eng.blr.redhat.com

2. RHEL (hypervisor) hostname
===============================
rhs-gp-srv4.lab.eng.blr.redhat.com

3. RHS nodes (hostname and IP address)
======================================
10.70.37.76
10.70.37.59
10.70.37.133
10.70.37.134

4. RHS node from where the gluster commands were executed
====================================================
10.70.37.76

6. Volume name
=================
Volume Name: drep
Type: Distributed-Replicate
Volume ID: 678f4caa-84b5-4c0d-8df3-87479520ed14
Status: Started
Number of Bricks: 6 x 2 = 12

7. Mount point on the clients
============================
rhs-gp-srv4.lab.eng.blr.redhat.com:/rhev/data-center/mnt/10.70.37.76:drep

8.
Tentative date and time when the issue was hit
=================================================
2013-05-10 06:08 UTC

[root@rhs1-bb rpm]# gluster v info
Volume Name: drep
Type: Distributed-Replicate
Volume ID: 678f4caa-84b5-4c0d-8df3-87479520ed14
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.76:/brick1/drr1   --> decommissioned
Brick2: 10.70.37.59:/brick1/drr1   --> decommissioned
Brick3: 10.70.37.133:/brick1/drr2
Brick4: 10.70.37.134:/brick1/drr2
Brick5: 10.70.37.76:/brick2/drr3
Brick6: 10.70.37.59:/brick2/drr3
Brick7: 10.70.37.133:/brick2/drr4
Brick8: 10.70.37.134:/brick2/drr4
Brick9: 10.70.37.76:/brick3/drr5
Brick10: 10.70.37.59:/brick3/drr5
Brick11: 10.70.37.133:/brick4/drr6
Brick12: 10.70.37.134:/brick4/drr6
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

Command executed on 10.70.37.76.

rebalance logs
=============
[2013-05-10 06:08:10.218311] I [client-handshake.c:450:client_set_lk_version_cbk] 0-drep-client-7: Server lk version = 1
[2013-05-10 06:08:10.218579] I [client-handshake.c:450:client_set_lk_version_cbk] 0-drep-client-11: Server lk version = 1
[2013-05-10 06:08:12.054031] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2013-05-10 06:08:12.058344] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2013-05-10 06:08:12.058481] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2013-05-10 06:08:12.059017] I [glusterfsd-mgmt.c:1544:mgmt_getspec_cbk] 0-drep-client-1: No change in volfile, continuing
[2013-05-10 06:08:12.059084] I [glusterfsd-mgmt.c:1544:mgmt_getspec_cbk] 0-drep-client-5: No change in volfile, continuing
[2013-05-10 06:08:12.059200] I [glusterfsd-mgmt.c:1544:mgmt_getspec_cbk] 0-drep-client-9: No change in volfile, continuing
[2013-05-10 06:08:17.388607] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-drep-client-1: changing port to 49152 (from 0)
[2013-05-10 06:08:17.388895] W [socket.c:515:__socket_rwv] 0-drep-client-1: readv on 10.70.37.59:24007 failed (No data available)
[2013-05-10 06:08:17.400763] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-drep-client-5: changing port to 49153 (from 0)
[2013-05-10 06:08:17.400848] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-drep-client-9: changing port to 49154 (from 0)
[2013-05-10 06:08:17.400888] W [socket.c:515:__socket_rwv] 0-drep-client-5: readv on 10.70.37.59:24007 failed (No data available)
[2013-05-10 06:08:17.407824] W [socket.c:515:__socket_rwv] 0-drep-client-9: readv on 10.70.37.59:24007 failed (No data available)
[2013-05-10 06:08:17.415060] I [client-handshake.c:1658:select_server_supported_programs] 0-drep-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-05-10 06:08:17.415910] I [client-handshake.c:1658:select_server_supported_programs] 0-drep-client-9: Using Program GlusterFS 3.3

fuse volfile has decommissioned entry
===================================
volume drep-dht
    type cluster/distribute
    option decommissioned-bricks drep-replicate-0
    subvolumes drep-replicate-0 drep-replicate-1 drep-replicate-2 drep-replicate-3 drep-replicate-4 drep-replicate-5
end-volume

Attaching the sosreport.
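The `decommissioned-bricks` option in the volfile is what tells the DHT translator which subvolumes to drain. When triaging this kind of bug, the option can be pulled out of a client volfile with a few lines of Python — a hypothetical inspection helper, not part of any gluster tooling, assuming the `option decommissioned-bricks <subvol> ...` syntax shown above:

```python
def decommissioned_subvols(volfile_text: str) -> list[str]:
    """Return the subvolume names a volfile marks as decommissioned.

    Scans for an 'option decommissioned-bricks ...' line inside the
    DHT volume block; returns [] if no such option is present.
    """
    for line in volfile_text.splitlines():
        parts = line.split()
        if parts[:2] == ["option", "decommissioned-bricks"]:
            return parts[2:]
    return []


# The drep-dht fragment captured in this report:
volfile = """\
volume drep-dht
    type cluster/distribute
    option decommissioned-bricks drep-replicate-0
    subvolumes drep-replicate-0 drep-replicate-1 drep-replicate-2
end-volume
"""
print(decommissioned_subvols(volfile))  # ['drep-replicate-0']
```

Here the volfile correctly names `drep-replicate-0`, so the decommission marking itself reached the client; the failure was in the migration step.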
From the logs, it looks like no files needed to be migrated in this instance:

rhs1, rhs2:
[2013-05-10 06:08:17.629345] I [dht-common.c:2563:dht_setxattr] 0-drep-dht: fixing the layout of /
[2013-05-10 06:08:17.647246] I [dht-rebalance.c:1106:gf_defrag_migrate_data] 0-drep-dht: migrate data called on /
[2013-05-10 06:08:17.659776] I [dht-rebalance.c:1311:gf_defrag_migrate_data] 0-drep-dht: Migration operation on dir / took 0.01 secs
[2013-05-10 06:08:17.672548] I [dht-rebalance.c:1733:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 0.00 secs
[2013-05-10 06:08:17.672574] I [dht-rebalance.c:1736:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0

rhs4:
[2013-05-10 06:08:19.879331] I [dht-rebalance.c:1733:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 2.00 secs
[2013-05-10 06:08:19.879371] I [dht-rebalance.c:1736:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 21, failures: 0

Can you please provide ls -l output (before and after migration) from the brick being decommissioned, showing that no files were migrated from subvolume-0 even after the remove-brick command was issued?
(In reply to comment #5)
> From the logs, it looks like no files needed to be migrated in this instance:
>
> rhs1, rhs2:
> [2013-05-10 06:08:17.629345] I [dht-common.c:2563:dht_setxattr] 0-drep-dht: fixing the layout of /
> [2013-05-10 06:08:17.647246] I [dht-rebalance.c:1106:gf_defrag_migrate_data] 0-drep-dht: migrate data called on /
> [2013-05-10 06:08:17.659776] I [dht-rebalance.c:1311:gf_defrag_migrate_data] 0-drep-dht: Migration operation on dir / took 0.01 secs
> [2013-05-10 06:08:17.672548] I [dht-rebalance.c:1733:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 0.00 secs
> [2013-05-10 06:08:17.672574] I [dht-rebalance.c:1736:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0
>
> rhs4:
> [2013-05-10 06:08:19.879331] I [dht-rebalance.c:1733:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 2.00 secs
> [2013-05-10 06:08:19.879371] I [dht-rebalance.c:1736:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 21, failures: 0
>
> Can you please provide ls -l output (before and after migration) from the
> brick being decommissioned, showing that no files were migrated from
> subvolume-0 even after the remove-brick command was issued?

1. RHEVM hostname
=============
buzz.lab.eng.blr.redhat.com

2. RHEL (hypervisor) hostname
===============================
rhs-gp-srv4.lab.eng.blr.redhat.com

3. RHS nodes (hostname and IP address)
======================================
10.70.37.76
10.70.37.59
10.70.37.133
10.70.37.134

4. RHS node from where the gluster commands were executed
====================================================
10.70.37.76

5.
Mount point on the clients
============================
rhs-gp-srv4.lab.eng.blr.redhat.com:/rhev/data-center/mnt/10.70.37.76:drep

volume info
===========
Volume Name: drep
Type: Distributed-Replicate
Volume ID: 678f4caa-84b5-4c0d-8df3-87479520ed14
Status: Started
Number of Bricks: 16 x 2 = 32
Transport-type: tcp
Bricks:
Brick1: 10.70.37.76:/brick1/drr1
Brick2: 10.70.37.59:/brick1/drr1
Brick3: 10.70.37.133:/brick1/drr2
Brick4: 10.70.37.134:/brick1/drr2
Brick5: 10.70.37.76:/brick2/drr3
Brick6: 10.70.37.59:/brick2/drr3
Brick7: 10.70.37.133:/brick2/drr4
Brick8: 10.70.37.134:/brick2/drr4
Brick9: 10.70.37.76:/brick3/drr5
Brick10: 10.70.37.59:/brick3/drr5
Brick11: 10.70.37.133:/brick4/drr6
Brick12: 10.70.37.134:/brick4/drr6
Brick13: 10.70.37.133:/brick5/drr9
Brick14: 10.70.37.134:/brick5/drr9
Brick15: 10.70.37.76:/brick1/drr10
Brick16: 10.70.37.59:/brick1/drr10
Brick17: 10.70.37.133:/brick1/drr11
Brick18: 10.70.37.134:/brick1/drr11
Brick19: 10.70.37.76:/brick5/drr12
Brick20: 10.70.37.59:/brick5/drr12
Brick21: 10.70.37.133:/brick6/drr13
Brick22: 10.70.37.134:/brick6/drr13
Brick23: 10.70.37.133:/brick7/drr14
Brick24: 10.70.37.134:/brick7/drr14
Brick25: 10.70.37.133:/brick6/drr15
Brick26: 10.70.37.134:/brick6/drr15
Brick27: 10.70.37.133:/brick6/drr7
Brick28: 10.70.37.134:/brick6/drr7
Brick29: 10.70.37.76:/brick6/drr18
Brick30: 10.70.37.59:/brick6/drr18
Brick31: 10.70.37.76:/brick6/drr19   ===> decommissioned bricks
Brick32: 10.70.37.59:/brick6/drr19   ===> decommissioned bricks
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36

Before remove-brick
==================
[root@rhs1-bb drr19]# ls -l
total 0
drwxr-xr-x 5 vdsm kvm 45 May 16 15:22 e8a9faf9-439a-47e6-a38c-27d23b8c976f
[root@rhs1-bb drr19]# du -sh *
4.0G    e8a9faf9-439a-47e6-a38c-27d23b8c976f
[root@rhs1-bb drr19]# pwd
/brick6/drr19
[root@rhs1-bb drr19]# du -sh .
[root@rhs1-bb drr19]# du -h *
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/dom_md
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/a3329d3c-0842-4710-bd2a-47335400a94f
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/a4222b66-18f7-4182-8893-9cbc1be00e18
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/fdf2a3ed-f477-4445-8b5c-098270778aed
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/cf2a98a9-ec3c-45a1-a3bc-ce90b5585e97
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/158c117e-71f4-43a2-8faf-a215d39363ce
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/8bb569b9-b7ad-4724-b14d-1df02f7d86bf
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/tasks
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master
1.0M    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/32697760-8ae2-424e-87f8-296f6827ae3a
2.1G    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/2d7b0c6e-ec09-4226-9097-34fc57dbc85d
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/7e596c1c-82ac-410c-b64f-c7b1199e274c
1.9G    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/44671929-6924-48a4-a86f-bd253e4a8303
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/75cc0479-be7f-4310-bc30-f97311d7242b
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/b4e46d10-64dd-48b7-b4f2-8a9354c8b40e
4.0G    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images
4.0G    e8a9faf9-439a-47e6-a38c-27d23b8c976f
[root@rhs1-bb drr19]# getfattr -d -m . -e hex .
# file: .
trusted.afr.drep-client-30=0x000000000000000000000000
trusted.afr.drep-client-31=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000bbbbbbbbcccccccb
trusted.glusterfs.volume-id=0x678f4caa84b54c0d8df387479520ed14

[root@rhs1-bb drr19]# gluster v remove-brick drep 10.70.37.76:/brick6/drr19 10.70.37.59:/brick6/drr19 start
volume remove-brick start: success
ID: 9ac04502-34f2-4523-8035-bafeb345d0c5

[root@rhs1-bb drr19]# gluster v remove-brick drep 10.70.37.76:/brick6/drr19 10.70.37.59:/brick6/drr19 status
        Node  Rebalanced-files         size      scanned     failures        status  run-time in secs
   ---------  ----------------  -----------  -----------  -----------  ------------  ----------------
   localhost                 0       0Bytes            0            0     completed              0.00
   localhost                 0       0Bytes            0            0     completed              0.00
   localhost                 0       0Bytes            0            0     completed              0.00
   localhost                 0       0Bytes            0            0     completed              0.00
10.70.37.134                 0       0Bytes            0            0   not started              0.00

(localhost is 10.70.37.76)

Output from another peer (where we have the replica pair): 10.70.37.59
[root@rhs4-bb brick6]# gluster v remove-brick drep 10.70.37.76:/brick6/drr19 10.70.37.59:/brick6/drr19 status
        Node  Rebalanced-files         size      scanned     failures        status  run-time in secs
   ---------  ----------------  -----------  -----------  -----------  ------------  ----------------
   localhost                 0       0Bytes           29            0     completed              2.00
   localhost                 0       0Bytes           29            0     completed              2.00
   localhost                 0       0Bytes           29            0     completed              2.00
   localhost                 0       0Bytes           29            0     completed              2.00
10.70.37.134                 0       0Bytes            0            0   not started              0.00

After remove-brick start
==================
[root@rhs1-bb drr19]# getfattr -d -m . -e hex `pwd`
getfattr: Removing leading '/' from absolute path names
# file: brick6/drr19
trusted.afr.drep-client-30=0x000000000000000000000000
trusted.afr.drep-client-31=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000000000000
trusted.glusterfs.volume-id=0x678f4caa84b54c0d8df387479520ed14
[root@rhs1-bb drr19]# ls -l
total 0
drwxr-xr-x 5 vdsm kvm 45 May 16 15:22 e8a9faf9-439a-47e6-a38c-27d23b8c976f
[root@rhs1-bb drr19]# du -h *
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/dom_md
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/a3329d3c-0842-4710-bd2a-47335400a94f
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/a4222b66-18f7-4182-8893-9cbc1be00e18
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/fdf2a3ed-f477-4445-8b5c-098270778aed
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/cf2a98a9-ec3c-45a1-a3bc-ce90b5585e97
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/158c117e-71f4-43a2-8faf-a215d39363ce
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms/8bb569b9-b7ad-4724-b14d-1df02f7d86bf
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/vms
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master/tasks
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/master
1.0M    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/32697760-8ae2-424e-87f8-296f6827ae3a
2.1G    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/2d7b0c6e-ec09-4226-9097-34fc57dbc85d
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/7e596c1c-82ac-410c-b64f-c7b1199e274c
1.9G    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/44671929-6924-48a4-a86f-bd253e4a8303
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/75cc0479-be7f-4310-bc30-f97311d7242b
0       e8a9faf9-439a-47e6-a38c-27d23b8c976f/images/b4e46d10-64dd-48b7-b4f2-8a9354c8b40e
4.0G    e8a9faf9-439a-47e6-a38c-27d23b8c976f/images
4.0G    e8a9faf9-439a-47e6-a38c-27d23b8c976f

Attaching the latest sosreport.
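The before/after `trusted.glusterfs.dht` values above can be read as four big-endian 32-bit fields; the interpretation as layout count, hash type, and the start/stop of the subvolume's hash range is an assumption about the on-disk layout format, not something stated in this report. A small Python sketch decoding the two captured values:

```python
import struct

def decode_dht_layout(hex_value: str) -> tuple[int, int, int, int]:
    """Decode a trusted.glusterfs.dht xattr value into four big-endian
    32-bit fields (assumed meaning: count, hash type, range start, range stop)."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    return struct.unpack(">IIII", raw)


# Values captured on the decommissioned brick in this report:
before = decode_dht_layout("0x0000000100000000bbbbbbbbcccccccb")
after = decode_dht_layout("0x00000001000000000000000000000000")

print([hex(f) for f in before])  # ['0x1', '0x0', '0xbbbbbbbb', '0xcccccccb']
print([hex(f) for f in after])   # ['0x1', '0x0', '0x0', '0x0']
```

Under that assumed field layout, the zeroed start/stop range after remove-brick shows the layout fix did run on the drained subvolume, while the unchanged `du` output shows the data itself was never migrated — consistent with the reported bug.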
Looks like a duplicate of bug 963896 (fix merged downstream), where an incorrect brick/subvolume was marked as decommissioned.
Verified that the remove-brick operation now migrates the data as expected.

Test environment versions:
RHS: glusterfs-server-3.4.0.21rhs-1.el6rhs.x86_64
6x2 Distributed-Replicate volume used as a Storage Domain
Red Hat Enterprise Virtualization Manager version: 3.2.2-0.41.el6ev
RHEV-H 6.4 hypervisor with glusterfs-3.4.0.21rhs-1.el6_4.x86_64
RHEL 6.4 hypervisor with glusterfs-3.4.0.21rhs-1.el6_4.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html