Bug 1154491
Summary: | split-brain reported on files whose change-logs are all zeros | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Pranith Kumar K <pkarampu> |
Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | mainline | CC: | a.ghoshal, bugs, kdhananj, ksubrahm, spandura, storage-qa-internal |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1139599 | Environment: | |
Last Closed: | 2017-08-24 12:02:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1139599, 1141750, 1292379, 1293239, 1293240, 1293265 | ||
Bug Blocks: |
Comment 1
Pranith Kumar K
2014-10-20 04:05:04 UTC
RCA from https://bugzilla.redhat.com/show_bug.cgi?id=1141750#c3, which has the same root cause but a different manifestation.

Simpler test to re-create the bug:
0) Create a 1x2 replicate volume, start it, and mount it.
1) Open a file 'a' from the mount and keep writing to it.
2) Bring one of the bricks down.
3) Rename the file '<mnt>/a' to '<mnt>/b'.
4) Wait for at least one write to complete while the brick is still down.
5) Restart the brick.
6) Wait until self-heal completes, then stop the writing from the mount point.

Root cause:
When a rename happens while a brick is down, entry self-heal is triggered on the parent directory of the rename once the brick comes back up; in this case that is <mnt>. As part of this entry self-heal:
1) file 'a' is deleted, and
2) file 'b' is re-created.
0) In parallel to this, the writing fd needs to be re-opened on the file from the mount point.

If the re-opening of the file in step 0) happens before step 1) of self-heal, this issue is observed. Writes from the mount keep going to the file that was deleted, whereas self-heal happens on the file created in step 2). So the checksums mismatch.

One more manifestation of this issue is https://bugzilla.redhat.com/show_bug.cgi?id=1139599, where writes from the mount only grow the file on the 'always up' brick while the file on the other brick does not grow. This leads to split-brain because of the size mismatch, even though the pending changelog is all zeros.

It is a day-1 bug in link-self-heal of entry-self-heal. The bug has existed since 2012.
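The race in the root cause can be illustrated with a purely local analogy. The sketch below does not use GlusterFS at all: a temporary directory stands in for the brick that was down, and the function name simulate_heal_race and the returned keys are made up for illustration. It only shows the POSIX behaviour the RCA relies on, namely that an fd opened before the unlink keeps writing to the deleted inode, while the re-created file is a different inode that never sees those writes.

```python
import os
import tempfile


def simulate_heal_race():
    """Local analogy only: a temp directory stands in for the brick that was down."""
    with tempfile.TemporaryDirectory() as brick:
        path_a = os.path.join(brick, "a")
        path_b = os.path.join(brick, "b")

        # The brick missed the rename, so it still holds the file as 'a'.
        with open(path_a, "wb") as f:
            f.write(b"old-data\n")

        # Step 0: the writer's fd is re-opened on 'a' before heal deletes it.
        fd = os.open(path_a, os.O_WRONLY | os.O_APPEND)
        try:
            # Heal step 1: 'a' is deleted.
            os.unlink(path_a)

            # Heal step 2: 'b' is re-created as a brand-new file.
            with open(path_b, "wb") as f:
                f.write(b"old-data\n")

            # Writes from the mount keep going to the old fd.
            os.write(fd, b"new-write\n")

            fd_stat = os.fstat(fd)
            b_stat = os.stat(path_b)
            return {
                "fd_links": fd_stat.st_nlink,   # link count of the inode behind the fd
                "fd_size": fd_stat.st_size,     # size of the deleted file
                "b_size": b_stat.st_size,       # size of the re-created file
                "same_inode": fd_stat.st_ino == b_stat.st_ino,
            }
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(simulate_heal_race())
```

On a POSIX filesystem the deleted file keeps growing while 'b' stays at its healed size, which mirrors the size mismatch seen between the bricks.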
This is the bug in implementation of the following commit:

commit 1936e29c3ac3d6466d391545d761ad8e60ae2e03
Author: Pranith Kumar K <pranithk>
Date:   Wed Feb 29 16:31:18 2012 +0530

    cluster/afr: Hardlink Self-heal

    Change-Id: Iea0b38011edbc7fc9d75a95d1775139a5234e514
    BUG: 765391
    Signed-off-by: Pranith Kumar K <pranithk>
    Reviewed-on: http://review.gluster.com/2841
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Vijay Bellur <vijay>

Very good testing shwetha! Thanks a lot for the help in re-creating and condensing the test case kritika!

Pranith