Bug 1278390
| Summary: | Data Tiering:Regression:Detach tier commit is passing when detach tier is in progress | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vivek Agarwal <vagarwal> |
| Component: | tier | Assignee: | Dan Lambright <dlambrig> |
| Status: | CLOSED ERRATA | QA Contact: | Bhaskarakiran <byarlaga> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | rhgs-3.1 | CC: | amukherj, asrivast, byarlaga, dlambrig, mzywusko, nchilaka, rhs-bugs, sankarshan, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.1.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-3.7.5-7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1264441 | Environment: | |
| Last Closed: | 2016-03-01 05:52:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1264441, 1279637 | ||
| Bug Blocks: | 1260783, 1260923 | ||
Description
Vivek Agarwal
2015-11-05 11:39:09 UTC
Checked this on the 3.7.5-6 build. If detach-tier commit is run from a node where the scan is complete, the commit succeeds even though another node is still in progress. If detach-tier commit is run from a node where the scan is in progress, the correct error message is shown and detach-tier fails. Moving the bug back.
[root@transformers yum.repos.d]# gluster v detach-tier vol1 status
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 0 0Bytes 116983 0 0 completed 600.00
ninja 0 0Bytes 0 0 0 in progress 0.00
[root@transformers yum.repos.d]# gluster v detach-tier vol1 commit
Removing tier can result in data loss. Do you want to Continue? (y/n) y
volume detach-tier commit: success
Check the detached bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.
[root@transformers yum.repos.d]#
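As an illustration of the condition this report is about, the status table above can be checked mechanically before a commit is issued. The following is a minimal sketch in Python, assuming the column layout shown in the pasted output (node, five counter/size columns, a status of one or more words, then the run time). The function names `parse_detach_status` and `all_nodes_completed` are hypothetical helpers, not part of the gluster CLI, and this plain-text output is not a stable interface.

```python
def parse_detach_status(text):
    """Map node name -> status string from pasted `detach-tier status` output.

    Assumes the layout seen in this report: node, rebalanced-files, size,
    scanned, failures, skipped, status (one or more words), run time.
    """
    statuses = {}
    for line in text.splitlines():
        tokens = line.split()
        # Skip prompts, the header row and the dashed separator row.
        if len(tokens) < 8 or not tokens[1].isdigit():
            continue
        # Everything between the five counters and the run time is the status.
        statuses[tokens[0]] = " ".join(tokens[6:-1])
    return statuses


def all_nodes_completed(text):
    """True only if at least one node is listed and every node is 'completed'."""
    statuses = parse_detach_status(text)
    return bool(statuses) and all(s == "completed" for s in statuses.values())


# Sample copied from the status output in this report.
SAMPLE = """\
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 0 0Bytes 116983 0 0 completed 600.00
ninja 0 0Bytes 0 0 0 in progress 0.00
"""

print(parse_detach_status(SAMPLE))
print(all_nodes_completed(SAMPLE))
```

A check like this only describes the state at the moment the status was read; a node can still change state between the status call and the commit, so it does not replace the server-side validation.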
Bhaskar,

Some background before I get into the real business on this: detach start and commit follow the sync-op framework in GlusterD, where in every phase GlusterD accumulates the responses of all the nodes and then decides whether to proceed or fail. The validation we are talking about here is in the staging phase. Even if the detach operation is complete on the local node, the command as a whole would fail if it is not complete on the other nodes; that is the guarantee the sync-op framework brings in.

I tried to simulate this problem by setting the rebalance status to complete on the originator node (where the CLI command is run) and to started on the other nodes, but couldn't reproduce it, as the CLI throws an error message in that case. Here are the steps I did before taking the gdb control:

1. Create a dist volume (one brick only) in a 2 node cluster.
2. Perform attach-tier with one more brick.
3. detach start
4. detach commit

Could you provide the steps you performed to reproduce this issue?

I am not sure if it gets hit with one brick. The setup I configured is an 8+4 ec volume with a 2x2 dist-rep tier volume for it.

1. Create 8+4 ec volume and attach 2x2 dist-rep tier volume
2. Start linux untar and some parallel writes
3. Start detach-tier
4. Once the status shows as completed on any of the nodes, issue the detach-tier commit command and it passes.

If the commit is run on any node which has in-progress status, it fails with the correct message.

4. Once the status shows as completed on any of the nodes, issue the detach-tier commit command from that node and it passes.

(In reply to Bhaskarakiran from comment #5)
> I am not sure if it gets hit with one brick. The setup i configured is 8+4
> ec volume and 2x2 dist-rep tier volume for it.

Well, I do not think the number of bricks matters here, as the validation is in staging and we do not really check how many bricks are configured for the volume at this point. However, I'll try to execute the same steps and see whether the issue persists.

> 1. Create 8+4 ec volume and attach 2x2 dist-rep tier volume
> 2. Start linux untar and some parallel writes
> 3. Start detach-tier
> 4. Once the status shows as completed on any of the node, issue detach-tier
> commit command and it passes.

Were you sure that, at the time you performed the detach commit, the other nodes hadn't finished the ongoing scan? How did you guarantee that? It could very well be that you executed the command on other nodes to check the status and at that time it showed that detach was in progress, but by the time you executed detach commit the scan had completed.

> If the commit is run on any of the node which has in-progress status, it
> would fail with correct message.

To check this behaviour I executed the status and commit commands one after another without any delay.

I again tried this on the 3.7.5-6 build and did see the same behaviour.
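For readers following the sync-op discussion in the comments above, here is a toy Python model of the staging guarantee as it is described there: every node validates its own state, the originator collects all responses, and a single failure fails the whole command. This is only a sketch under that description. `StageResponse`, `stage_on_node` and `run_staging` are hypothetical names, and none of this is GlusterD code.

```python
from dataclasses import dataclass


@dataclass
class StageResponse:
    node: str
    ok: bool
    message: str = ""


def stage_on_node(node, rebalance_status):
    """Hypothetical per-node staging check for a detach-tier commit."""
    if rebalance_status == "completed":
        return StageResponse(node, True)
    return StageResponse(node, False, f"detach-tier is {rebalance_status} on {node}")


def run_staging(status_by_node):
    """Collect every node's response; proceed only if all of them succeeded."""
    responses = [stage_on_node(n, s) for n, s in status_by_node.items()]
    failures = [r for r in responses if not r.ok]
    if failures:
        # One failing node is enough to reject the command as a whole.
        return False, failures[0].message
    return True, ""
```

In this model, `run_staging({"transformers": "completed", "ninja": "in progress"})` rejects the commit regardless of which node originated it, which is the behaviour the framework is expected to give. The report describes a build where a commit issued from a completed node was accepted instead.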
[root@transformers ~]# gluster v status vol1
Status of volume: vol1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick ninja:/rhs/brick2/vol1-tier4 49158 0 Y 29569
Brick vertigo:/rhs/brick2/vol1-tier3 49158 0 Y 849
Brick ninja:/rhs/brick1/vol1-tier2 49157 0 Y 29551
Brick vertigo:/rhs/brick1/vol1-tier1 49157 0 Y 831
Cold Bricks:
Brick transformers:/rhs/brick1/b1 49152 0 Y 40735
Brick interstellar:/rhs/brick1/b2 49152 0 Y 30530
Brick transformers:/rhs/brick2/b3 49153 0 Y 40753
Brick interstellar:/rhs/brick2/b4 49153 0 Y 30548
Brick transformers:/rhs/brick3/b5 49154 0 Y 40771
Brick interstellar:/rhs/brick3/b6 49154 0 Y 30566
Brick transformers:/rhs/brick4/b7 49155 0 Y 40789
Brick interstellar:/rhs/brick4/b8 49155 0 Y 30584
Brick transformers:/rhs/brick5/b9 49156 0 Y 40807
Brick interstellar:/rhs/brick5/b10 49156 0 Y 30602
Brick transformers:/rhs/brick6/b11 49157 0 Y 40825
Brick interstellar:/rhs/brick6/b12 49157 0 Y 30622
Snapshot Daemon on localhost 49158 0 Y 41271
NFS Server on localhost 2049 0 Y 50798
Self-heal Daemon on localhost N/A N/A Y 50806
Quota Daemon on localhost N/A N/A Y 50814
Snapshot Daemon on vertigo 49152 0 Y 31813
NFS Server on vertigo 2049 0 Y 868
Self-heal Daemon on vertigo N/A N/A Y 876
Quota Daemon on vertigo N/A N/A Y 885
Snapshot Daemon on ninja 49152 0 Y 26163
NFS Server on ninja 2049 0 Y 29588
Self-heal Daemon on ninja N/A N/A Y 29596
Quota Daemon on ninja N/A N/A Y 29604
Snapshot Daemon on interstellar.lab.eng.blr
.redhat.com 49158 0 Y 30712
NFS Server on interstellar.lab.eng.blr.redh
at.com 2049 0 Y 40062
Self-heal Daemon on interstellar.lab.eng.bl
r.redhat.com N/A N/A Y 40070
Quota Daemon on interstellar.lab.eng.blr.re
dhat.com N/A N/A Y 40078
Task Status of Volume vol1
------------------------------------------------------------------------------
Task : Detach tier
ID : 5cc013cd-8002-4716-8d52-469f85053afd
Status : in progress
[root@transformers ~]#
[root@transformers ~]# gluster v detach-tier vol1 status
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 0 0Bytes 107 0 0 in progress 8785.00
ninja 0 0Bytes 0 0 0 in progress 0.00
vertigo 0 0Bytes 0 0 0 in progress 0.00
[root@transformers ~]# gluster v detach-tier vol1 status
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 0 0Bytes 110 0 0 in progress 8859.00
ninja 0 0Bytes 0 0 0 in progress 0.00
vertigo 0 0Bytes 0 0 0 in progress 0.00
[root@transformers ~]# gluster v detach-tier vol1 commi
Usage: volume detach-tier <VOLNAME> <start|stop|status|commit|force>
Tier command failed
[root@transformers ~]# gluster v detach-tier vol1 commit
Removing tier can result in data loss. Do you want to Continue? (y/n) y
volume detach-tier commit: success
Check the detached bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.
[root@transformers ~]#
volume status:
[root@transformers ~]# gluster v status vol1
Status of volume: vol1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick transformers:/rhs/brick1/b1 49152 0 Y 40735
Brick interstellar:/rhs/brick1/b2 49152 0 Y 30530
Brick transformers:/rhs/brick2/b3 49153 0 Y 40753
Brick interstellar:/rhs/brick2/b4 49153 0 Y 30548
Brick transformers:/rhs/brick3/b5 49154 0 Y 40771
Brick interstellar:/rhs/brick3/b6 49154 0 Y 30566
Brick transformers:/rhs/brick4/b7 49155 0 Y 40789
Brick interstellar:/rhs/brick4/b8 49155 0 Y 30584
Brick transformers:/rhs/brick5/b9 49156 0 Y 40807
Brick interstellar:/rhs/brick5/b10 49156 0 Y 30602
Brick transformers:/rhs/brick6/b11 49157 0 Y 40825
Brick interstellar:/rhs/brick6/b12 49157 0 Y 30622
Snapshot Daemon on localhost 49158 0 Y 41271
NFS Server on localhost 2049 0 Y 51071
Self-heal Daemon on localhost N/A N/A Y 51079
Quota Daemon on localhost N/A N/A Y 51099
Snapshot Daemon on vertigo 49152 0 Y 31813
NFS Server on vertigo 2049 0 Y 3099
Self-heal Daemon on vertigo N/A N/A Y 3107
Quota Daemon on vertigo N/A N/A Y 3115
Snapshot Daemon on ninja 49152 0 Y 26163
NFS Server on ninja 2049 0 Y 6418
Self-heal Daemon on ninja N/A N/A Y 6426
Quota Daemon on ninja N/A N/A Y 6434
Snapshot Daemon on interstellar.lab.eng.blr
.redhat.com 49158 0 Y 30712
NFS Server on interstellar.lab.eng.blr.redh
at.com 2049 0 Y 40202
Self-heal Daemon on interstellar.lab.eng.bl
r.redhat.com N/A N/A Y 40210
Quota Daemon on interstellar.lab.eng.blr.re
dhat.com N/A N/A Y 40220
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
[root@transformers ~]#

So here is the latest update on the bug: QE tested this behaviour with 3.7.5-6, where the fix https://code.engineering.redhat.com/gerrit/61589 was not pulled in. I am not sure whether this was supposed to be tested on the latest bits or the previous bits. The behaviour is reproducible in 3.7.5-6 but not in 3.7.5-7. A Fixed In Version could have saved us from all this confusion. Hence moving it to ON_QA with the Fixed In Version set.

Checked on 3.7.5-7 and it works correctly. Marking as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html