Bug 1608158

Summary: split brain resolution regression tests fail sporadically
Product: [Community] GlusterFS
Reporter: Raghavendra G <rgowdapp>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED UPSTREAM
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: mainline
CC: bugs, pkarampu
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-12 12:47:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Raghavendra G 2018-07-25 04:58:47 UTC
Description of problem:
I was trying to debug regression failures on [1] and observed that split-brain-resolution.t was failing consistently.

=========================
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
-------------------
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact: in most of the failures, stat was not served from md-cache but was instead wound down to afr, which failed the stat with EIO because the file was in split-brain. So I ran another test:
* disabled md-cache
* mount glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So I think the test relied on stat requests being absorbed by either the kernel attribute cache or md-cache. When that does not happen, the stats reach afr and cause commands like getfattr to fail. Thoughts?
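
The two changes in that second test could be applied roughly as in the sketch below. This is not taken from the test itself: the host and mount point are assumed placeholders, the `run` and `repro_setup` helpers are hypothetical, and it assumes md-cache is switched off through the `performance.stat-prefetch` volume option. Only the volume name `patchy` comes from the test output above. Commands are printed rather than executed unless `RUN=1` is set.

```shell
#!/bin/bash
# Sketch only: host and mount point are assumed placeholders.
V0=${V0:-patchy}              # volume name seen in the test output
H0=${H0:-localhost}           # assumed server host
M0=${M0:-/mnt/glusterfs/0}    # assumed mount point

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Hypothetical helper that applies both changes from the list above.
repro_setup() {
    # Assumption: turning off performance.stat-prefetch disables md-cache.
    run gluster volume set "$V0" performance.stat-prefetch off
    # Zero timeouts keep the kernel from caching attributes and entries,
    # so every stat is sent down to the client stack.
    run glusterfs --volfile-server="$H0" --volfile-id="$V0" \
        --attribute-timeout=0 --entry-timeout=0 "$M0"
}

repro_setup
```

With both caches out of the way, each stat from the test should reach afr, which is the condition under which the test was seen to fail every time.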

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t: 
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel under the subject "regression failures on afr/split-brain-resolution".


Comment 1 Ravishankar N 2018-07-25 10:17:43 UTC
Note: gluster-devel thread: https://lists.gluster.org/pipermail/gluster-devel/2018-July/055018.html

Comment 2 Raghavendra G 2018-07-26 03:55:09 UTC
Added one more test as bad:
./tests/bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t

Comment 3 Ravishankar N 2018-08-01 10:15:39 UTC
Pranith, could you respond here or on the mailing list on what you think about allowing fstats in case of split-brain?

Comment 4 Pranith Kumar K 2018-08-01 12:50:01 UTC
I am of the opinion that we should remove the user-choice-based split-brain resolution feature and the .t files that go along with it. These were fine with replica-2, but all new development and testing is happening with replica-3 or arbiter, and in the future with thin-arbiter. So it is better to sunset these kinds of features one by one.

Comment 5 Worker Ant 2020-03-12 12:47:08 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/948 and will be tracked there from now on. Visit the GitHub issue URL for further details.