Bug 1142601

Summary: files with open fds getting into split-brain when bricks go offline and come back online
Product: [Community] GlusterFS
Reporter: Pranith Kumar K <pkarampu>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, spandura, ssamanta
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: Regression
Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1131466
Clones: 1142612 (view as bug list)
Environment:
Last Closed: 2015-05-14 17:27:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1131466
Bug Blocks: 1142612, 1142614

Comment 1 Anand Avati 2014-09-17 06:12:32 UTC
REVIEW: http://review.gluster.org/8755 (cluster/afr: Launch self-heal only when all the brick status is known) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2014-09-17 06:26:35 UTC
REVIEW: http://review.gluster.org/8755 (cluster/afr: Launch self-heal only when all the brick status is known) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2014-09-18 11:22:39 UTC
REVIEW: http://review.gluster.org/8755 (cluster/afr: Launch self-heal only when all the brick status is known) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2014-09-18 11:44:23 UTC
REVIEW: http://review.gluster.org/8755 (cluster/afr: Launch self-heal only when all the brick status is known) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2014-09-25 11:54:13 UTC
REVIEW: http://review.gluster.org/8755 (cluster/afr: Launch self-heal only when all the brick status is known) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2014-09-29 10:25:25 UTC
REVIEW: http://review.gluster.org/8755 (cluster/afr: Launch self-heal only when all the brick status is known) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 7 Anand Avati 2014-09-29 10:49:00 UTC
REVIEW: http://review.gluster.org/8755 (cluster/afr: Launch self-heal only when all the brick status is known) posted (#7) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 8 Anand Avati 2014-09-29 12:46:59 UTC
COMMIT: http://review.gluster.org/8755 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 94045e4ae779b1bde54ad1dd0ed87981a6872125
Author: Pranith Kumar K <pkarampu>
Date:   Wed Sep 17 11:33:23 2014 +0530

    cluster/afr: Launch self-heal only when all the brick status is known
    
    Problem:
    A file goes into split-brain because its changelog xattrs are erased
    incorrectly.
    
    RCA:
    Index self-heal is triggered before all the bricks are up. So while
    erasing xattrs, only the xattr for the brick the heal believes is up
    is erased on the sink brick, leading to split-brain.
    
    Example:
    Let's say the xattrs before the heal started are:
    brick 2:
    trusted.afr.vol1-client-2=0x000000020000000000000000
    trusted.afr.vol1-client-3=0x000000020000000000000000
    
    brick 3:
    trusted.afr.vol1-client-2=0x000010040000000000000000
    trusted.afr.vol1-client-3=0x000000000000000000000000
    
    If only brick-2 was up at the time the self-heal was triggered, only
    'trusted.afr.vol1-client-2' is erased, leading to the following xattrs:
    
    brick 2:
    trusted.afr.vol1-client-2=0x000000000000000000000000
    trusted.afr.vol1-client-3=0x000000020000000000000000
    
    brick 3:
    trusted.afr.vol1-client-2=0x000010040000000000000000
    trusted.afr.vol1-client-3=0x000000000000000000000000
    
    So the file goes into split-brain.
    
    Change-Id: I1185713c688e0f41fd32bf2a5953c505d17a3173
    BUG: 1142601
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8755
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Tested-by: Gluster Build System <jenkins.com>
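For readers decoding the hex values in the commit message: each trusted.afr.* value here is 12 bytes, read as three big-endian 32-bit counters of pending data, metadata, and entry operations against the named brick. A minimal sketch (hypothetical helper names, not GlusterFS code) of why the post-erase state is a data split-brain, where each brick holds a non-zero pending count accusing the other:

```python
import struct

def decode_afr_xattr(value: bytes) -> dict:
    """Split a trusted.afr.* value into its three pending-op counters
    (one big-endian 32-bit integer each: data, metadata, entry)."""
    data, metadata, entry = struct.unpack(">III", value)
    return {"data": data, "metadata": metadata, "entry": entry}

def is_data_split_brain(on_b2_for_b3: bytes, on_b3_for_b2: bytes) -> bool:
    """Data split-brain: each brick carries a non-zero pending data
    count against the other, so neither can be chosen as the source."""
    return (decode_afr_xattr(on_b2_for_b3)["data"] > 0
            and decode_afr_xattr(on_b3_for_b2)["data"] > 0)

# Values from the commit message, after the faulty partial erase:
on_b2_for_b3 = bytes.fromhex("000000020000000000000000")  # data = 2
on_b3_for_b2 = bytes.fromhex("000010040000000000000000")  # data = 0x1004
print(is_data_split_brain(on_b2_for_b3, on_b3_for_b2))  # True
```

Before the buggy erase, brick 3's xattr for itself was all zeroes while brick 2's was not, so brick 2 unambiguously blamed brick 3; erasing only 'trusted.afr.vol1-client-2' on brick 2 destroyed that asymmetry.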

Comment 9 Niels de Vos 2015-05-14 17:27:43 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
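The guard the patch title describes ("Launch self-heal only when all the brick status is known") amounts to deferring index self-heal until every brick has reported either up or down. A hedged sketch of that condition, using hypothetical names rather than the actual AFR implementation:

```python
from enum import Enum

class ChildStatus(Enum):
    UNKNOWN = 0  # no CHILD_UP/CHILD_DOWN notification received yet
    UP = 1
    DOWN = 2

def can_launch_index_selfheal(children: list) -> bool:
    # Erasing xattrs based on a partial view of the bricks is what
    # produced the split-brain, so wait until every status is known.
    return all(status is not ChildStatus.UNKNOWN for status in children)

print(can_launch_index_selfheal([ChildStatus.UP, ChildStatus.UNKNOWN]))  # False
print(can_launch_index_selfheal([ChildStatus.UP, ChildStatus.DOWN]))     # True
```

A brick that is known to be down is fine: the heal then correctly treats it as a sink. Only an as-yet-unknown status must block the heal.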
