Bug 1501885
Summary: | "replace-brick" operation on a distribute volume kills all the glustershd daemon processes in a cluster | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vijay Avuthu <vavuthu> |
Component: | replicate | Assignee: | Atin Mukherjee <amukherj> |
Status: | CLOSED ERRATA | QA Contact: | Vijay Avuthu <vavuthu> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.3 | CC: | amukherj, rhinduja, rhs-bugs, sheggodu, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | RHGS 3.4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | rebase | ||
Fixed In Version: | glusterfs-3.12.2-1 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-09-04 06:36:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Bug Blocks: | 1503134 |
Description Vijay Avuthu 2017-10-13 12:16:16 UTC
This is a negative test case, as replace-brick is supposed to be done only on bricks of a replica subvolume; replacing a brick of a plain distribute volume can lead to data loss. That said, what I think is happening is that, as part of replace-brick, glusterd kills the self-heal daemon on the assumption that once the replace-brick succeeds it will restart shd with the new graph (containing the new brick path), but it is probably not doing so because this is a distribute volume. Need to check in the code though.

In RHGS-3.4 replace-brick will not be allowed for distribute-only volumes. I have a patch in the 3.12 branch now for the same. We can actually target this bug for 3.4.0 then?

https://review.gluster.org/18334 is the patch.

(In reply to Atin Mukherjee from comment #3)
> In RHGS-3.4 replace-brick will not be allowed for distribute-only volumes. I have a
> patch in the 3.12 branch now for the same. We can actually target this bug for
> 3.4.0 then?

Makes sense. Feel free to assign the bug to yourself and mark the bug for 3.4.0 in the internal whiteboard.

Update:
==========

Verified the scenario below.

1. Created a replicate volume and a distribute volume.

2. Tried a replace-brick on the distribute volume; it is rejected, as expected per the patch mentioned in comment 5:

       # gluster vol replace-brick dist 10.70.35.61:/bricks/brick1/b1 10.70.35.61:/bricks/brick1/b1_1 commit force
       volume replace-brick: failed: replace-brick is not permitted on distribute only volumes. Please use add-brick and remove-brick operations instead.

3. Checked the glustershd pid:

       # ps -eaf | grep -i glustershd
       root 25630 1 0 05:29 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/ed8ce427ce1f40d0b8a8c3c5b162e9b7.socket --xlator-option *replicate*.node-uuid=be801d54-d39e-40cb-967c-0987cfd4f5f7
       root 25871 20220 0 05:32 pts/0 00:00:00 grep --color=auto -i glustershd

4. Removed a brick from the distribute volume (commit after rebalance completed):

       # gluster vol remove-brick dist 10.70.35.61:/bricks/brick1/b1 start
       volume remove-brick start: success
       ID: 4ede625b-1643-4100-b89b-27d322e63856

5. Added a brick to the distribute volume:

       # gluster vol add-brick dist 10.70.35.61:/bricks/brick1/b1_new
       volume add-brick: success

6. Checked the glustershd pid again; the daemon is still running with the same pid:

       # ps -eaf | grep -i glustershd | grep -v grep
       root 25630 1 0 05:29 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/ed8ce427ce1f40d0b8a8c3c5b162e9b7.socket --xlator-option *replicate*.node-uuid=be801d54-d39e-40cb-967c-0987cfd4f5f7

Changing status to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
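As the rejection message in step 2 indicates, the supported way to swap a brick on a distribute-only volume is add-brick followed by remove-brick. A minimal sketch of that workflow is below; the volume name `dist`, host, and brick paths are just the illustrative values from the verification steps above, not a prescription.

```
# Add the replacement brick first (host and paths are placeholders from the steps above).
gluster volume add-brick dist 10.70.35.61:/bricks/brick1/b1_new

# Start migrating data off the brick being retired.
gluster volume remove-brick dist 10.70.35.61:/bricks/brick1/b1 start

# Check progress; wait until the migration reports completed for the brick.
gluster volume remove-brick dist 10.70.35.61:/bricks/brick1/b1 status

# Commit only after migration has completed, so no data is lost.
gluster volume remove-brick dist 10.70.35.61:/bricks/brick1/b1 commit
```

This mirrors steps 4 and 5 of the verification, with the status check and commit spelled out; unlike replace-brick on a replica subvolume, data has to be migrated off the old brick before it can be removed.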