Bug 1608158

Summary:	split brain resolution regression tests fail sporadically
Product:	[Community] GlusterFS	Reporter:	Raghavendra G <rgowdapp>
Component:	replicate	Assignee:	Ravishankar N <ravishankar>
Status:	CLOSED UPSTREAM	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	mainline	CC:	bugs, pkarampu
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Last Closed:	2020-03-12 12:47:08 UTC	Type:	Bug

Description Raghavendra G 2018-07-25 04:58:47 UTC

Description of problem:
I was trying to debug regression failures on [1] and observed that split-brain-resolution.t was failing consistently.

=========================
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
-------------------
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact: in most of the failures, stat was not served from md-cache but was instead wound down to afr, which failed the stat with EIO because the file was in split-brain. So, I did another test:
* disabled md-cache
* mounted glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat requests being absorbed by either the kernel attribute cache or md-cache. When that does not happen, the stats reach afr and cause commands like getfattr to fail. Thoughts?
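
The hypothesis above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and function names (SplitBrainAFR, AttrCache, try_stat) are hypothetical and are not GlusterFS code, a single cache layer stands in for both the kernel attribute cache and md-cache, and time is passed in explicitly instead of being read from a clock.

```python
import errno


class SplitBrainAFR:
    """Hypothetical stand-in for afr: stat fails with EIO while in split-brain."""

    def __init__(self, in_split_brain=True):
        self.in_split_brain = in_split_brain
        self.stat_calls = 0  # how many stats actually reached this layer

    def stat(self, path):
        self.stat_calls += 1
        if self.in_split_brain:
            raise OSError(errno.EIO, "Input/output error", path)
        return {"size": 0}


class AttrCache:
    """Hypothetical cache layer: answers stat from memory while the entry is fresh."""

    def __init__(self, lower, timeout):
        self.lower = lower
        self.timeout = timeout
        self.entries = {}  # path -> (attrs, time the entry was cached)

    def prime(self, path, attrs, now):
        self.entries[path] = (attrs, now)

    def stat(self, path, now):
        cached = self.entries.get(path)
        if cached is not None and now - cached[1] < self.timeout:
            return cached[0]  # absorbed by the cache, afr never sees it
        attrs = self.lower.stat(path)  # miss or expired: wind down to afr
        self.entries[path] = (attrs, now)
        return attrs


def try_stat(cache, path, now):
    """Return "ok" on success, or the symbolic errno name on failure."""
    try:
        cache.stat(path, now)
        return "ok"
    except OSError as exc:
        return errno.errorcode[exc.errno]


afr = SplitBrainAFR()
cache = AttrCache(afr, timeout=1.0)
cache.prime("/file", {"size": 0}, now=0.0)
print(try_stat(cache, "/file", now=0.5))  # fresh entry: ok
print(try_stat(cache, "/file", now=2.0))  # expired entry: EIO

no_cache = AttrCache(afr, timeout=0)
no_cache.prime("/file", {"size": 0}, now=0.0)
print(try_stat(no_cache, "/file", now=0.0))  # timeout 0 never hits: EIO
```

In this model, whether a stat succeeds depends only on whether the cached entry is still fresh when the call arrives, which would match a test that fails sporadically with caching enabled and always with the timeouts set to 0.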

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t: 
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel under the subject "regression failures on afr/split-brain-resolution".


Comment 1 Ravishankar N 2018-07-25 10:17:43 UTC

Note: gluster-devel thread: https://lists.gluster.org/pipermail/gluster-devel/2018-July/055018.html

Comment 2 Raghavendra G 2018-07-26 03:55:09 UTC

Marked one more test as bad:
./tests/bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t

Comment 3 Ravishankar N 2018-08-01 10:15:39 UTC

Pranith, could you respond here or on the mailing list with what you think about allowing fstat calls when a file is in split-brain?

Comment 4 Pranith Kumar K 2018-08-01 12:50:01 UTC

I am of the opinion that we should remove the user-choice-based split-brain resolution feature and the .t files that go along with it. These were fine with replica-2. But all new development and testing is happening with replica-3 or arbiter, and in the future with thin-arbiter. So it is better to sunset these kinds of features one by one.

Comment 5 Worker Ant 2020-03-12 12:47:08 UTC

This bug has been moved to https://github.com/gluster/glusterfs/issues/948 and will be tracked there from now on. Visit the GitHub issue URL for further details.