+++ This bug was initially created as a clone of Bug #1264441 +++

When detach tier is in progress, detach tier commit must not be allowed. This was working well previously, which means it is a regression; it could be due to the fix for bz#1259694.

glusterfs-server-3.7.4-0.33.git1d02d4b.el7.centos.x86_64

Steps to Reproduce:
====================
1. Create a volume, start it, attach a tier to the volume, and create a lot of data on the hot tier.
2. Now issue a detach-tier <vname> start.
3. While it is in progress, issue a detach-tier <vname> commit.
(A shell sketch of these steps follows the comments below.)

--- Additional comment from nchilaka on 2015-09-18 09:21:39 EDT ---

This was found during automation of QE scripts.

--- Additional comment from Dan Lambright on 2015-09-30 15:36:21 EDT ---

I have done an RCA on this and will begin working on a fix.

--- Additional comment from Dan Lambright on 2015-10-01 11:14:45 EDT ---

Submitted fix 12272.

--- Additional comment from Vijay Bellur on 2015-10-28 12:44:33 EDT ---

REVIEW: http://review.gluster.org/12272 (cluster/tier: migration daemon incorrectly signals detach tier done) posted (#3) for review on release-3.7 by Dan Lambright (dlambrig)

--- Additional comment from Vijay Bellur on 2015-10-28 16:31:24 EDT ---

REVIEW: http://review.gluster.org/12272 (cluster/tier: Disallow detach commit when detach in progress) posted (#4) for review on release-3.7 by Dan Lambright (dlambrig)
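A minimal shell sketch of the reproduction sequence, assuming hypothetical hosts and brick paths and the 3.7-era tiering CLI (attach-tier/detach-tier); the exact volume layout is not significant:

    # Hypothetical hosts and bricks; any tiered volume with data on the hot tier will do.
    gluster volume create testvol host1:/bricks/cold1 host2:/bricks/cold2
    gluster volume start testvol
    gluster volume attach-tier testvol replica 2 host1:/bricks/hot1 host2:/bricks/hot2
    # ...write a lot of data through a client mount so it lands on the hot tier...
    gluster volume detach-tier testvol start
    # While the detach scan is still "in progress" on at least one node:
    gluster volume detach-tier testvol commit    # should be rejected; the bug is that it succeeds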
Checked this on the 3.7.5-6 build. If the detach-tier commit is issued from the node where the scan is complete, the commit succeeds even though the other node is still in progress. If the commit is issued from the node where the scan is in progress, the correct error message is shown and detach-tier fails. Moving back the bug.

[root@transformers yum.repos.d]# gluster v detach-tier vol1 status
     Node   Rebalanced-files       size     scanned   failures   skipped        status   run time in secs
---------   ----------------   --------   ---------  ---------  --------   -----------   ----------------
localhost                  0     0Bytes      116983          0         0     completed             600.00
    ninja                  0     0Bytes           0          0         0   in progress               0.00
[root@transformers yum.repos.d]# gluster v detach-tier vol1 commit
Removing tier can result in data loss. Do you want to Continue? (y/n) y
volume detach-tier commit: success
Check the detached bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.
[root@transformers yum.repos.d]#
Bhaskar,

Some background before I get into the real business on this: detach start and commit follow the sync-op framework in GlusterD, where in every phase GlusterD accumulates the responses of all the nodes and then decides whether to proceed or fail. The validation we are talking about here is in the staging phase, so even if the detach operation is complete on the local node but not on the other nodes, the command as a whole should fail; that is the guarantee the sync-op framework brings in.

I tried to simulate this problem by setting the rebalance status to complete on the originator node (where the CLI command is run) and the rebalance status to started on the other nodes, but I couldn't reproduce it, as the CLI throws an error message in that case. Here are the steps I performed before taking gdb control:

1. Create a dist volume (one brick only) in a 2-node cluster.
2. Perform attach-tier with one more brick.
3. detach start
4. detach commit

Could you provide the steps you performed to reproduce this issue?
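For illustration only, the gate that the staging phase is expected to provide, expressed as a rough shell sketch (the real check lives inside glusterd, not in a script; the volume name is hypothetical and --mode=script is used only to skip the y/n confirmation prompt):

    #!/bin/bash
    # Illustration of the intended staging-phase semantics: refuse commit while
    # any node still reports the detach scan as "in progress".
    VOL=vol1
    pending=$(gluster volume detach-tier "$VOL" status | grep -c "in progress")
    if [ "$pending" -gt 0 ]; then
        echo "detach-tier still in progress on $pending node(s); commit must be refused"
    else
        gluster --mode=script volume detach-tier "$VOL" commit
    fi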
I am not sure if it gets hit with one brick. The setup I configured is an 8+4 EC volume with a 2x2 dist-rep tier volume for it.

1. Create an 8+4 EC volume and attach a 2x2 dist-rep tier volume.
2. Start a Linux untar and some parallel writes.
3. Start detach-tier.
4. Once the status shows as completed on any of the nodes, issue the detach-tier commit command; it passes.

If the commit is run on any of the nodes which have in-progress status, it fails with the correct message.
4. Once the status shows as completed on any of the nodes, issue the detach-tier commit command from that node, and it passes. (A shell sketch of this setup follows below.)
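A rough shell sketch of the setup described above, using the brick paths visible in the volume status output later in this bug; the exact disperse/attach-tier create syntax here is an assumption based on the 3.7 CLI:

    # 8+4 EC cold volume (disperse-data 8, redundancy 4) across transformers/interstellar
    gluster volume create vol1 disperse-data 8 redundancy 4 \
        transformers:/rhs/brick1/b1  interstellar:/rhs/brick1/b2 \
        transformers:/rhs/brick2/b3  interstellar:/rhs/brick2/b4 \
        transformers:/rhs/brick3/b5  interstellar:/rhs/brick3/b6 \
        transformers:/rhs/brick4/b7  interstellar:/rhs/brick4/b8 \
        transformers:/rhs/brick5/b9  interstellar:/rhs/brick5/b10 \
        transformers:/rhs/brick6/b11 interstellar:/rhs/brick6/b12
    gluster volume start vol1
    # 2x2 dist-rep hot tier on vertigo/ninja
    gluster volume attach-tier vol1 replica 2 \
        vertigo:/rhs/brick1/vol1-tier1 ninja:/rhs/brick1/vol1-tier2 \
        vertigo:/rhs/brick2/vol1-tier3 ninja:/rhs/brick2/vol1-tier4
    # Run a Linux untar plus parallel writes through a client mount, then:
    gluster volume detach-tier vol1 start
    # As soon as one node's detach-tier status shows "completed", from that node:
    gluster volume detach-tier vol1 commit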
(In reply to Bhaskarakiran from comment #5)
> I am not sure if it gets hit with one brick. The setup i configured is 8+4
> ec volume and 2x2 dist-rep tier volume for it.

Well, I do not think the number of bricks matters here, as the validation is in staging and we do not really check how many bricks are configured for the volume at this point. However, I'll try to execute the same steps and see whether the issue persists.

> 1. Create 8+4 ec volume and attach 2x2 dist-rep tier volume
> 2. Start linux untar and some parallel writes
> 3. Start detach-tier
> 4. Once the status shows as completed on any of the node, issue detach-tier
> commit command and it passes.

Were you sure that at the time you performed the detach commit, the other nodes hadn't finished the ongoing scan? How did you guarantee that? It could very well be that you executed the command on the other nodes to check the status and it showed at that time that the detach was in progress, but by the time you executed detach commit, the scan had completed.

> If the commit is run on any of the node which has in-progress status, it
> would fail with correct message.
To check this behaviour, I executed the status and commit commands one after another without any delay. I tried this again on the 3.7.5-6 build and saw the same behaviour.

[root@transformers ~]# gluster v status vol1
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick ninja:/rhs/brick2/vol1-tier4          49158     0          Y       29569
Brick vertigo:/rhs/brick2/vol1-tier3        49158     0          Y       849
Brick ninja:/rhs/brick1/vol1-tier2          49157     0          Y       29551
Brick vertigo:/rhs/brick1/vol1-tier1        49157     0          Y       831
Cold Bricks:
Brick transformers:/rhs/brick1/b1           49152     0          Y       40735
Brick interstellar:/rhs/brick1/b2           49152     0          Y       30530
Brick transformers:/rhs/brick2/b3           49153     0          Y       40753
Brick interstellar:/rhs/brick2/b4           49153     0          Y       30548
Brick transformers:/rhs/brick3/b5           49154     0          Y       40771
Brick interstellar:/rhs/brick3/b6           49154     0          Y       30566
Brick transformers:/rhs/brick4/b7           49155     0          Y       40789
Brick interstellar:/rhs/brick4/b8           49155     0          Y       30584
Brick transformers:/rhs/brick5/b9           49156     0          Y       40807
Brick interstellar:/rhs/brick5/b10          49156     0          Y       30602
Brick transformers:/rhs/brick6/b11          49157     0          Y       40825
Brick interstellar:/rhs/brick6/b12          49157     0          Y       30622
Snapshot Daemon on localhost                49158     0          Y       41271
NFS Server on localhost                     2049      0          Y       50798
Self-heal Daemon on localhost               N/A       N/A        Y       50806
Quota Daemon on localhost                   N/A       N/A        Y       50814
Snapshot Daemon on vertigo                  49152     0          Y       31813
NFS Server on vertigo                       2049      0          Y       868
Self-heal Daemon on vertigo                 N/A       N/A        Y       876
Quota Daemon on vertigo                     N/A       N/A        Y       885
Snapshot Daemon on ninja                    49152     0          Y       26163
NFS Server on ninja                         2049      0          Y       29588
Self-heal Daemon on ninja                   N/A       N/A        Y       29596
Quota Daemon on ninja                       N/A       N/A        Y       29604
Snapshot Daemon on interstellar.lab.eng.blr
.redhat.com                                 49158     0          Y       30712
NFS Server on interstellar.lab.eng.blr.redh
at.com                                      2049      0          Y       40062
Self-heal Daemon on interstellar.lab.eng.bl
r.redhat.com                                N/A       N/A        Y       40070
Quota Daemon on interstellar.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       40078

Task Status of Volume vol1
------------------------------------------------------------------------------
Task                 : Detach tier
ID                   : 5cc013cd-8002-4716-8d52-469f85053afd
Status               : in progress

[root@transformers ~]#
[root@transformers ~]# gluster v detach-tier vol1 status
     Node   Rebalanced-files       size     scanned   failures   skipped        status   run time in secs
---------   ----------------   --------   ---------  ---------  --------   -----------   ----------------
localhost                  0     0Bytes         107          0         0   in progress            8785.00
    ninja                  0     0Bytes           0          0         0   in progress               0.00
  vertigo                  0     0Bytes           0          0         0   in progress               0.00
[root@transformers ~]# gluster v detach-tier vol1 status
     Node   Rebalanced-files       size     scanned   failures   skipped        status   run time in secs
---------   ----------------   --------   ---------  ---------  --------   -----------   ----------------
localhost                  0     0Bytes         110          0         0   in progress            8859.00
    ninja                  0     0Bytes           0          0         0   in progress               0.00
  vertigo                  0     0Bytes           0          0         0   in progress               0.00
[root@transformers ~]# gluster v detach-tier vol1 commi
Usage: volume detach-tier <VOLNAME> <start|stop|status|commit|force>
Tier command failed
[root@transformers ~]# gluster v detach-tier vol1 commit
Removing tier can result in data loss. Do you want to Continue? (y/n) y
volume detach-tier commit: success
Check the detached bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.
[root@transformers ~]#
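To remove any human delay between observing the status and issuing the commit, the two commands can also be chained (a hypothetical one-liner; --mode=script only suppresses the y/n confirmation prompt):

    # Status immediately followed by commit; any node still "in progress" in the
    # status output should cause the commit to be rejected.
    gluster volume detach-tier vol1 status
    gluster --mode=script volume detach-tier vol1 commit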
Volume status after the commit:

[root@transformers ~]# gluster v status vol1
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick transformers:/rhs/brick1/b1           49152     0          Y       40735
Brick interstellar:/rhs/brick1/b2           49152     0          Y       30530
Brick transformers:/rhs/brick2/b3           49153     0          Y       40753
Brick interstellar:/rhs/brick2/b4           49153     0          Y       30548
Brick transformers:/rhs/brick3/b5           49154     0          Y       40771
Brick interstellar:/rhs/brick3/b6           49154     0          Y       30566
Brick transformers:/rhs/brick4/b7           49155     0          Y       40789
Brick interstellar:/rhs/brick4/b8           49155     0          Y       30584
Brick transformers:/rhs/brick5/b9           49156     0          Y       40807
Brick interstellar:/rhs/brick5/b10          49156     0          Y       30602
Brick transformers:/rhs/brick6/b11          49157     0          Y       40825
Brick interstellar:/rhs/brick6/b12          49157     0          Y       30622
Snapshot Daemon on localhost                49158     0          Y       41271
NFS Server on localhost                     2049      0          Y       51071
Self-heal Daemon on localhost               N/A       N/A        Y       51079
Quota Daemon on localhost                   N/A       N/A        Y       51099
Snapshot Daemon on vertigo                  49152     0          Y       31813
NFS Server on vertigo                       2049      0          Y       3099
Self-heal Daemon on vertigo                 N/A       N/A        Y       3107
Quota Daemon on vertigo                     N/A       N/A        Y       3115
Snapshot Daemon on ninja                    49152     0          Y       26163
NFS Server on ninja                         2049      0          Y       6418
Self-heal Daemon on ninja                   N/A       N/A        Y       6426
Quota Daemon on ninja                       N/A       N/A        Y       6434
Snapshot Daemon on interstellar.lab.eng.blr
.redhat.com                                 49158     0          Y       30712
NFS Server on interstellar.lab.eng.blr.redh
at.com                                      2049      0          Y       40202
Self-heal Daemon on interstellar.lab.eng.bl
r.redhat.com                                N/A       N/A        Y       40210
Quota Daemon on interstellar.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       40220

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@transformers ~]#
So here is the latest update on the bug: QE tested this behaviour with 3.7.5-6, where the fix https://code.engineering.redhat.com/gerrit/61589 was not pulled in. I am not sure whether this was supposed to be tested on the latest bits or the previous bits. The behaviour is reproducible in 3.7.5-6 but not in 3.7.5-7. A "Fixed In Version" could have saved us from all this confusion. Hence moving it to ON_QA with the fixed-in version.
Checked on 3.7.5-7 and it works correctly. Marking as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html