Bug 1063830
Summary: | remove-brick/add-brick: remove-brick or add-brick can lead to data loss if there are pending self-heals on any of the subvolumes. | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | spandura
Component: | replicate | Assignee: | Ravishankar N <ravishankar>
Status: | CLOSED EOL | QA Contact: | storage-qa-internal <storage-qa-internal>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 2.1 | CC: | asriram, nlevinki, ravishankar, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | Performing add-brick or remove-brick operations on a volume that has replica pairs while there are pending self-heals can cause data loss. Workaround: Ensure that all bricks of the volume are up and that there are no pending self-heals before running either operation. Pending heal information can be viewed with `gluster volume heal <volname> info` (see the command-line sketch after this table). | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2015-12-03 17:16:45 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
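
As an illustration of the workaround in the Doc Text field, here is a minimal command-line sketch. The volume name `repvol` is a placeholder, not taken from the report; only the two `gluster` commands themselves come from the documented workaround and standard Gluster CLI usage.

```sh
# Placeholder volume name; substitute the real volume.
VOL=repvol

# Confirm every brick process is up: the "Online" column of `volume status`
# should show Y for each brick before any add-brick/remove-brick operation.
gluster volume status $VOL

# Confirm there are no pending self-heals: each brick reported by `heal info`
# should show "Number of entries: 0".
gluster volume heal $VOL info
```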
Description
spandura
2014-02-11 13:45:05 UTC
Case 2:
=======
Consider a 3 x 2 distribute-replicate volume. If any brick is offline when a distribute subvolume is removed, there will be pending self-heals on the offline brick for the files/directories migrated from the removed bricks. After a successful remove-brick operation, even though the graph has changed and brick5 and brick6 now get the client-2 and client-3 AFR change-log attributes, opendir still refers to the previous, stale client-4 and client-5 change-log attributes of brick5 and brick6 and self-heals all the data.

Steps to Reproduce:
===================
1. Create a 3 x 2 distribute-replicate volume. Start the volume.
2. Create a FUSE mount.
3. Bring brick5 offline.
4. Create 10 files from the mount point.
5. Bring brick5 back online.
6. Wait for self-heal to complete.
7. Bring brick6 offline.
8. Remove distribute sub-volume-1 from the volume.
9. Wait for the migration to complete, then commit the remove-brick operation.
10. Bring brick6 back online.
11. Perform "ls -l" from the mount point. Self-heal happens from brick5 to brick6 and the stale change-logs are still referred to.

(A rough shell sketch of these steps appears at the end of this report.)

Case 1 will be fixed with the persistent changelog implementation planned for Denali, but it needs to be documented as a known issue for Corbett; hence adding to BZ 1035040. The doc text needs to be set.

Case 2 happens if, before self-heal, afr_opendir happens for the first time on an inode, which triggers a conservative merge. This is the expected behaviour by design.

Please review the edited doc text and sign off.

Updated as suggested. Please sign off.

Looks good to me.

Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/ If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
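
For readers trying to reproduce this, the following is a rough shell sketch of the steps above. It is a sketch under stated assumptions, not the reporter's exact procedure: the host name `server1`, brick paths `/bricks/b1` through `/bricks/b6`, volume name `repvol`, mount point `/mnt/repvol`, and the `BRICK5_PID`/`BRICK6_PID` placeholders are all hypothetical; only the Gluster commands themselves are standard CLI.

```sh
#!/bin/bash
# Hypothetical names throughout: server1, /bricks/b1..b6, repvol, /mnt/repvol.
VOL=repvol
MNT=/mnt/repvol

# 1. Create and start a 3 x 2 distribute-replicate volume (three replica pairs,
#    grouped in the order given: b1+b2, b3+b4, b5+b6). "force" is needed here
#    only because all bricks of this test setup live on one host.
gluster volume create $VOL replica 2 \
    server1:/bricks/b1 server1:/bricks/b2 \
    server1:/bricks/b3 server1:/bricks/b4 \
    server1:/bricks/b5 server1:/bricks/b6 force
gluster volume start $VOL

# 2. Create a FUSE mount.
mkdir -p $MNT
mount -t glusterfs server1:/$VOL $MNT

# 3. Bring brick5 offline by stopping its glusterfsd process.
#    BRICK5_PID is a placeholder; brick PIDs are listed by `gluster volume status`.
kill -TERM $BRICK5_PID

# 4. Create 10 files from the mount point.
for i in $(seq 1 10); do dd if=/dev/urandom of=$MNT/file$i bs=1M count=1; done

# 5.-6. Bring brick5 back online and wait until heal info shows no pending entries.
gluster volume start $VOL force
gluster volume heal $VOL info

# 7. Bring brick6 offline (BRICK6_PID is again a placeholder).
kill -TERM $BRICK6_PID

# 8.-9. Remove the first distribute subvolume (brick1 + brick2): start the
#       migration, poll its status, then commit once it completes.
gluster volume remove-brick $VOL server1:/bricks/b1 server1:/bricks/b2 start
gluster volume remove-brick $VOL server1:/bricks/b1 server1:/bricks/b2 status
gluster volume remove-brick $VOL server1:/bricks/b1 server1:/bricks/b2 commit

# 10. Bring brick6 back online.
gluster volume start $VOL force

# 11. Trigger lookup/opendir from the mount point; per the report, self-heal
#     then runs from brick5 to brick6 using the stale change-log attributes.
ls -l $MNT
```

Note that the workaround in the Doc Text would stop this sequence at step 7: with brick6 down, `gluster volume heal $VOL info` is not clean, so the remove-brick in step 8 should not be issued on a production volume.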