Bug 1717819

Summary: Changes to self-heal logic w.r.t. detecting metadata split-brains
Product: [Community] GlusterFS Reporter: Karthik U S <ksubrahm>
Component: replicate Assignee: Karthik U S <ksubrahm>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1805097 1806931 Environment:
Last Closed: 2019-06-10 14:48:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1805097, 1806931    

Description Karthik U S 2019-06-06 09:21:37 UTC
Description of problem:

We currently have no rollback (undo) of post-ops when quorum is not met. Although the FOP is still unwound with failure, the xattrs remain on disk. Because of these partial post-ops and partial heals (heals that run when only 2 bricks are up), we can end up in metadata split-brain purely from the AFR xattrs point of view, i.e., each brick is blamed by at least one of the others for metadata. These scenarios are hit when the client/shd frequently connects to and disconnects from the bricks.
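
The condition described above ("each brick is blamed by at least one of the others") can be illustrated with a small sketch. This is not the AFR code: the `pending` matrix layout and both function names are hypothetical, chosen only to show how a purely xattr-based check ends up with no clean brick on a 3-brick replica.

```python
def blamed_bricks(pending):
    """Return the set of brick indices blamed by at least one other brick.

    pending[i][j] > 0 means brick i holds a non-zero metadata pending
    counter against brick j (hypothetical layout, for illustration only).
    """
    n = len(pending)
    return {
        j
        for j in range(n)
        for i in range(n)
        if i != j and pending[i][j] > 0
    }


def looks_like_metadata_split_brain(pending):
    """True when every brick is blamed by at least one of the others."""
    return len(blamed_bricks(pending)) == len(pending)


# Brick 0 blames 1, brick 1 blames 2, brick 2 blames 0: no brick is clean.
cyclic = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

# Only brick 2 is blamed, so bricks 0 and 1 can still act as sources.
normal = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]

print(looks_like_metadata_split_brain(cyclic))  # True
print(looks_like_metadata_split_brain(normal))  # False
```

In the `cyclic` case the blame comes only from leftover pending counters, which is the situation this bug describes: the xattrs alone suggest split-brain even though no conflicting metadata change was necessarily made.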

Comment 1 Worker Ant 2019-06-06 09:34:30 UTC
REVIEW: https://review.gluster.org/22831 (Cluster/afr: Don't treat all bricks having metadata pending as split-brain) posted (#1) for review on master by Karthik U S

Comment 2 Worker Ant 2019-06-10 14:48:46 UTC
REVIEW: https://review.gluster.org/22831 (Cluster/afr: Don't treat all bricks having metadata pending as split-brain) merged (#5) on master by Amar Tumballi