Bug 1303125
| Summary: | After GlusterD restart, Remove-brick commit happening even though data migration not completed. | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama> | |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> | |
| Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | rhgs-3.1 | CC: | amukherj, asrivast, byarlaga, lbailey, rcyriac, rhs-bugs, sankarshan, smohan, storage-qa-internal, vbellur | |
| Target Milestone: | --- | Keywords: | ZStream | |
| Target Release: | RHGS 3.1.3 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.7.9-1 | Doc Type: | Bug Fix | |
| Doc Text: | Previously, when glusterd was restarted on a node while rebalance was still in progress, remove-brick commits succeeded even though rebalance was not yet complete. This resulted in data loss. This update ensures that remove-brick commits fail with appropriate log messages when rebalance is in progress. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1303269 (view as bug list) | Environment: | ||
| Last Closed: | 2016-06-23 05:05:52 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1268895, 1299184, 1302968, 1303269, 1310972 | |||
|
Description
Byreddy
2016-01-29 15:52:08 UTC
RCA: Whether a remove-brick operation is in progress is tracked by a flag, 'decommission_is_in_progress', in the volume. This flag is not persisted, so on a glusterd restart the information is lost and all the validations that block remove-brick commit while rebalance is in progress are skipped. I agree with QE that this is a potential data loss situation and should be considered a *blocker*. I've posted a fix upstream: http://review.gluster.org/#/c/13323/

Workaround for this bug: after restarting glusterd and before performing remove-brick commit, the user should check the remove-brick status. If the remove-brick status is "in progress", the user should not perform the remove-brick commit operation.

I don't think #comment 5 is valid unless we pull in https://bugzilla.redhat.com/show_bug.cgi?id=1302968 . With the current code, glusterd can never reconnect to the ongoing rebalance daemon after a restart, which means the statistics are stale. So executing remove-brick status after a glusterd restart cannot indicate the rebalance completion status of all the nodes.

Yes Atin, #comment 5 is valid only when https://bugzilla.redhat.com/show_bug.cgi?id=1302968 is pulled in.

Looks good now :)

The fix is now available in the rhgs-3.1.3 branch, hence moving the state to Modified.

Verified this bug using the build "glusterfs-3.7.9-1". Repeated the reproduction steps mentioned in the description section. The fix is working properly: after a glusterd restart, it does not allow the remove-brick operation to be committed while data migration is in progress, and rebalance continues after the glusterd restart as well. With these details, moving this bug to the next state.

LGTM :) but why was the flag moved to '?'

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240
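
For readers who want to see the root cause from the RCA in isolation, here is a minimal toy model in Python. It is not glusterd code: the `Volume` class, the JSON "info" file, and all function names are hypothetical, and only the flag name 'decommission_is_in_progress' comes from this report. Persisting the flag is used here purely to illustrate the gap; the upstream patch may restore the state differently.

```python
import json
import os
import tempfile


class Volume:
    """Toy stand-in for glusterd's in-memory volume info (hypothetical)."""

    def __init__(self, name, decommission_is_in_progress=False):
        self.name = name
        self.decommission_is_in_progress = decommission_is_in_progress


def store_volume(vol, path, persist_flag):
    """Write volume state to disk; persist_flag=False mimics the reported bug."""
    state = {"name": vol.name}
    if persist_flag:
        state["decommission_is_in_progress"] = vol.decommission_is_in_progress
    with open(path, "w") as fh:
        json.dump(state, fh)


def restore_volume(path):
    """Rebuild in-memory state after a simulated daemon restart."""
    with open(path) as fh:
        state = json.load(fh)
    # A flag that was never written comes back as False, which is the bug.
    return Volume(state["name"], state.get("decommission_is_in_progress", False))


def can_commit_remove_brick(vol):
    """Commit validation: refuse while data migration is still running."""
    return not vol.decommission_is_in_progress


def simulate_restart(persist_flag):
    """Start a remove-brick, 'restart' the daemon, then check the commit gate."""
    vol = Volume("testvol", decommission_is_in_progress=True)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "info")
        store_volume(vol, path, persist_flag)
        restored = restore_volume(path)
    return can_commit_remove_brick(restored)


if __name__ == "__main__":
    print("commit allowed, flag not persisted:", simulate_restart(persist_flag=False))
    print("commit allowed, flag persisted:", simulate_restart(persist_flag=True))
```

Run as a script, the first line reports `True` (the commit slips through after the restart, as in this bug) and the second reports `False` (the commit is blocked while migration is still marked in progress).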