Bug 1417426 - inconsistent behavior with automatic split brain resolution policy as "Majority" on a x2 volume
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Iraj Jamali
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: 1304593
 
Reported: 2017-01-29 07:56 UTC by Nag Pavan Chilakam
Modified: 2018-11-16 05:48 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-16 05:48:07 UTC
Embargoed:



Description Nag Pavan Chilakam 2017-01-29 07:56:58 UTC
Description of problem:
===================
Firstly, I don't see a real reason for having a "majority" policy on a two-way replica.
The definition given is as below (from the patch information on review.gluster.org):
https://review.gluster.org/#/c/14535/

The majority policy will not pick a source if there is no majority. The other three policies pick the first brick with a valid reply and non-zero ctime/mtime/size as source.
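
For context, a minimal sketch of how this policy is toggled and inspected on a volume (the volume name dhtafr is taken from comment 2; the value names are the ones documented for cluster.favorite-child-policy):

  # Enable automatic split-brain resolution by majority
  gluster volume set dhtafr cluster.favorite-child-policy majority

  # Check the currently configured policy
  gluster volume get dhtafr cluster.favorite-child-policy

  # Revert to the default, i.e. no automatic resolution
  gluster volume set dhtafr cluster.favorite-child-policy none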

But when I create a data split-brain situation that also has a metadata heal pending, the split brain sometimes gets resolved and sometimes does not.

However, over the long run, i.e. by disabling and re-enabling the self-heal daemon (shd), the file does get healed.

The problem is as below:
1) Firstly, what is the purpose of the "majority" option on an x2 volume? With only two bricks, a majority would require more than half of the replies, i.e. both bricks agreeing, which by definition cannot happen for a file that is in split brain.
2) If the "majority" policy is required at all, it must have no effect on an x2 volume.





Version-Release number of selected component (if applicable):
==========
3.8.4-12

How reproducible:
=====
mostly

Steps to Reproduce:
=================
1. Create a 2x2 volume spanning two nodes (b1 on n1, b2 on n2, b3 on n1, b4 on n2; b1-b2 and b3-b4 are the replica pairs).
2. Mount the volume on clients as below:
   c1 sees only n1
   c2 sees only n2
   c3 sees both n1 and n2
3. Set the favorite-child policy to "majority".
4. Create a file f1 from c3.
5. Disable the self-heal daemon.
6. From c1, echo its hostname into f1, and from c2, echo its hostname into f1 ==> this results in a data split brain.
7. From c2, chmod f1 to, say, 0000 (so a metadata heal is also pending).
8. Check heal info; it must show f1 as in split brain.
9. Now re-enable the self-heal daemon and trigger a heal (a command-level sketch of these steps is given below).
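
A command-level sketch of the above steps. Hostnames n1/n2, brick paths under /rhs, and the mount point /mnt/dhtafr are placeholders (the actual setup in comment 2 uses 10.70.35.37 and 10.70.35.116); how c1 and c2 are isolated so that each can reach only one node is environment-specific and not shown here:

  # On one of the servers: create and start the 2x2 volume
  gluster volume create dhtafr replica 2 \
      n1:/rhs/brick1/dhtafr n2:/rhs/brick1/dhtafr \
      n1:/rhs/brick2/dhtafr n2:/rhs/brick2/dhtafr
  gluster volume start dhtafr

  # Set the favorite-child policy under test
  gluster volume set dhtafr cluster.favorite-child-policy majority

  # On each client: mount the volume (c1 via n1, c2 via n2, c3 via either)
  mount -t glusterfs n1:/dhtafr /mnt/dhtafr

  # From c3: create the file
  touch /mnt/dhtafr/f1

  # Disable the self-heal daemon before introducing the conflict
  gluster volume heal dhtafr disable

  # From c1, and then from c2: write each client's own hostname into f1
  # (with the clients partitioned from each other's node -> data split brain)
  echo "$(hostname)" > /mnt/dhtafr/f1

  # From c2: change the mode so a metadata heal is also pending
  chmod 0000 /mnt/dhtafr/f1

  # Verify the split brain, then re-enable shd and trigger the heal
  gluster volume heal dhtafr info
  gluster volume heal dhtafr enable
  gluster volume heal dhtafr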


The file sometimes fails to heal out of the split brain, with the below shd logs:


[2017-01-29 07:41:44.017813] W [MSGID: 108042] [afr-self-heal-common.c:828:afr_mark_split_brain_source_sinks_by_policy] 60-dhtafr-replicate-0: Source dhtafr-client-0 selected as authentic to resolve conflicting data in file (gfid:5c510dc1-b210-4c56-bea5-07448d22dee9) by SIZE (34 bytes @ 2017-01-29 13:05:04 mtime, 2017-01-29 13:11:44 ctime).
[2017-01-29 07:41:44.018377] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 60-dhtafr-replicate-0: performing metadata selfheal on 5c510dc1-b210-4c56-bea5-07448d22dee9




However, if we disable and re-enable the self-heal daemon and trigger the heal again, the split brain gets resolved.
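
A sketch of that disable/re-enable workaround cycle (volume name taken from comment 2):

  gluster volume heal dhtafr disable
  gluster volume heal dhtafr enable
  gluster volume heal dhtafr
  gluster volume heal dhtafr info    # after a retry the file should no longer be listed in split brain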

Comment 2 Nag Pavan Chilakam 2017-01-29 07:57:53 UTC
Volume Name: dhtafr
Type: Distributed-Replicate
Volume ID: dc74c0b6-eb4c-402e-b6af-b38d3fddb3c1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick1/dhtafr
Brick2: 10.70.35.116:/rhs/brick1/dhtafr
Brick3: 10.70.35.37:/rhs/brick2/dhtafr
Brick4: 10.70.35.116:/rhs/brick2/dhtafr
Options Reconfigured:
cluster.self-heal-daemon: disable
cluster.favorite-child-policy: majority
performance.readdir-ahead: on
nfs.disable: on

Comment 6 Atin Mukherjee 2018-11-11 21:43:16 UTC
This BZ hasn't received any updates in a long time. What's the plan for this bug?

