Bug 1384983

Summary: split-brain observed with arbiter & replica 3 volume.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: RamaKasturi <knarra>
Component: arbiter
Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.2
CC: amukherj, bkunal, ksubrahm, pkarampu, psony, ravishankar, rcyriac, rhinduja, rhs-bugs, sasundar, ssaha, storage-qa-internal, vavuthu
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:29:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1506140, 1539358, 1541458, 1542380, 1542382, 1597120, 1597123    
Bug Blocks: 1503134    

Description RamaKasturi 2016-10-14 13:43:54 UTC
Description of problem:
Did a forced volume start (gluster volume start <volname> force) to bring up one of the bricks, but it had no effect on the stopped brick. I then stopped all my volumes and rebooted my nodes, and now I see that files in the engine volume are in split-brain.
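
For reference, a minimal sketch of the commands this corresponds to (volume name taken from the engine volume info below; illustrative only, not necessarily the exact commands that were run):

# Force-start the volume so that any offline brick process is respawned
gluster volume start engine force

# Verify that all brick processes are online
gluster volume status engine

# After the reboot, list any files reported in split-brain
gluster volume heal engine info split-brain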

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.el7rhgs.x86_64

How reproducible:
Seen it once.

Steps to Reproduce:
1. 
2.
3.

Actual results:
Files in the engine volume are in a split-brain state.

Expected results:
Files should not be in split-brain.

Additional info:

Comment 2 RamaKasturi 2016-10-14 13:47:18 UTC
Files in split-brain
===================================================
[root@rhsqa-grafton2 ~]# gluster volume heal engine info split-brain
Brick 10.70.36.79:/rhgs/brick1/engine
/__DIRECT_IO_TEST__
Status: Connected
Number of entries in split-brain: 1

Brick 10.70.36.80:/rhgs/brick1/engine
/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
/__DIRECT_IO_TEST__
Status: Connected
Number of entries in split-brain: 2

Brick 10.70.36.81:/rhgs/brick1/engine
/__DIRECT_IO_TEST__
/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
Status: Connected
Number of entries in split-brain: 2

getfattrs on the file which are in split-brain
==================================================
[root@rhsqa-grafton1 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/__DIRECT_IO_TEST__
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-1=0x0000000000000b5e00000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x9202d90daed441a69b7538d4d6eae1b1
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000000000000000000000000000000000000000000000000000000000

[root@rhsqa-grafton2 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.afr.engine-client-0=0x0000000500000000000000e5
trusted.gfid=0x4da13f61cc0b4d46ae303f2676866f06
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@rhsqa-grafton2 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/__DIRECT_IO_TEST__
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000000000000200000000
trusted.afr.engine-client-2=0x000000000000000100000000
trusted.gfid=0x9202d90daed441a69b7538d4d6eae1b1
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000000000000000000000000000000000000000000000000000000000

[root@rhsqa-grafton3 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000000000000000000014f3
trusted.afr.engine-client-0=0x0000000500000000000000b1
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.gfid=0x4da13f61cc0b4d46ae303f2676866f06
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@rhsqa-grafton3 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/__DIRECT_IO_TEST__
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000000000000000000000
trusted.afr.engine-client-1=0x0000000000000b5e00000000
trusted.gfid=0x9202d90daed441a69b7538d4d6eae1b1
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000000000000000000000000000000000000000000000000000000000
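
The trusted.afr.<volname>-client-N changelog values above pack three big-endian 32-bit counters: pending data, metadata and entry operations, in that order. A minimal bash sketch to decode them (the sample value is trusted.afr.engine-client-1 from the first getfattr output above):

# Split an AFR changelog xattr value into its data/metadata/entry counters
xattr=0x0000000000000b5e00000000
hex=${xattr#0x}
echo "pending data=$((16#${hex:0:8})) metadata=$((16#${hex:8:8})) entry=$((16#${hex:16:8}))"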

Comment 3 RamaKasturi 2016-10-14 13:48:43 UTC
sosreports can be found in the link below:
==================================================

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/split_brain/

Comment 4 RamaKasturi 2016-10-14 13:55:50 UTC
volume info for engine vol:
==========================
[root@rhsqa-grafton1 ~]# gluster volume info engine
 
Volume Name: engine
Type: Replicate
Volume ID: 03c68517-4be1-45e3-b788-87e10d73f3ee
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.79:/rhgs/brick1/engine
Brick2: 10.70.36.80:/rhgs/brick1/engine
Brick3: 10.70.36.81:/rhgs/brick1/engine (arbiter)
Options Reconfigured:
server.ssl: on
client.ssl: on
auth.ssl-allow: 10.70.36.79,10.70.36.80,10.70.36.81
performance.strict-o-direct: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.granular-entry-heal: on
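
For context, a 1 x (2 + 1) arbiter volume of this shape is typically created along the following lines (hosts and brick paths taken from the volume info above; the reconfigured options would then be applied with gluster volume set):

gluster volume create engine replica 3 arbiter 1 \
    10.70.36.79:/rhgs/brick1/engine \
    10.70.36.80:/rhgs/brick1/engine \
    10.70.36.81:/rhgs/brick1/engine
gluster volume start engine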

Comment 8 SATHEESARAN 2017-06-16 07:37:33 UTC
Resetting all the acks that were previously present on this bug.

Comment 9 SATHEESARAN 2017-06-16 07:40:03 UTC
I could consistently hit this issue with an arbiter volume on RHGS 3.3.0 ( interim build )
- glusterfs-3.8.4-28.el7rhgs

Comment 10 SATHEESARAN 2017-06-16 07:56:15 UTC
Tested with the RHGS 3.3.0 interim build ( glusterfs-3.8.4-28.el7rhgs ) and I could consistently hit this issue, along with the other issue of the arbiter becoming the source of heal ( BZ 1401969 ).

A very simple test is:
1. Create an arbiter volume 1 x (2+1) with bricks - brick1, brick2, arbiter
2. Fuse mount it on any RHEL 7 client
3. Run some app ( dd, truncate, etc. ) on a single file
4. Kill brick2
5. sleep for 3 seconds
6. Bring up brick2, sleep for 3 seconds, kill arbiter
7. sleep for 3 seconds
8. Bring up arbiter, sleep for 3 seconds, kill brick1
9. sleep for 3 seconds
10. continue with step 4

When the above steps were repeated, I observed that I ended up either in a split-brain or with the arbiter becoming the source of heal ( BZ 1401969 ).
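
A minimal bash sketch of this cycle (volume and brick names are placeholders; it assumes the script runs where the bricks are local, matches each glusterfsd process by its brick path, and relies on a forced volume start to respawn the killed brick):

VOL=arbvol
BRICKS=(/bricks/brick1 /bricks/brick2 /bricks/arbiter)

while true; do
    for b in "${BRICKS[1]}" "${BRICKS[2]}" "${BRICKS[0]}"; do   # brick2, arbiter, brick1
        pkill -f "$b"                        # kill the glusterfsd serving this brick
        sleep 3
        gluster volume start "$VOL" force    # respawn the killed brick
        sleep 3
    done
done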

Comment 19 SATHEESARAN 2017-07-13 05:59:49 UTC
Additional information: I am able to hit the split-brain issue with a replica 3 volume as well.

Here are the steps to reproduce.

Setup details
--------------
1. 3 Node Gluster cluster ( node1, node2, node3 )
2. Create replica 3 volume
3. Mount it on node1

Steps
------

There are 2 scripts run in parallel to reproduce this issue.

Script1 kills and then restarts the bricks in a cyclic fashion across all the bricks, in such a way that at any given instant there are always 2 bricks alive.

while true; do 
    kill node2-brick2
    kill node2-glusterd
    sleep 3
    start node2-glusterd # This also starts the brick on this node
    sleep 1

    kill node3-brick3
    kill node3-glusterd
    sleep 3
    start node3-glusterd # This also starts the brick on this node
    sleep 1

    kill node1-brick1
    kill node1-glusterd
    sleep 3
    start node1-glusterd # This also starts the brick on this node
    sleep 1
done
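
One concrete way to realise the above pseudocode, assuming one brick per node, passwordless ssh from the test driver, and glusterd managed by systemd (illustrative only):

while true; do
    for node in node2 node3 node1; do
        ssh "$node" 'pkill glusterfsd; systemctl stop glusterd'   # brick and glusterd go down on this node
        sleep 3
        ssh "$node" 'systemctl start glusterd'                    # glusterd respawns the brick on this node
        sleep 1
    done
done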

Script2 does I/O on the fuse mount while the bricks are killed and started by script1.

MOUNTPATH=/mnt/test
while true; do  
    echo "dd if=/dev/urandom of=$MOUNTPATH/FILE bs=128k count=10" >> /var/log/glusterfs/mnt-test.log
    dd if=/dev/urandom of=$MOUNTPATH/FILE bs=128k count=10
    echo "truncate $MOUNTPATH/FILE --size 5K" >> /var/log/glusterfs/mnt-test.log
    truncate $MOUNTPATH/FILE --size 5K
    echo "cat /home/template > $MOUNTPATH/FILE" >> /var/log/glusterfs/mnt-test.log
    cat /home/template > $MOUNTPATH/FILE
    echo "truncate $MOUNTPATH/FILE --size 100k" >> /var/log/glusterfs/mnt-test.log
    truncate $MOUNTPATH/FILE --size 100k
done


When I ran the above scripts with replica 3, I could still see the file on the fuse mount ending up in a split-brain state.

Comment 20 SATHEESARAN 2017-07-13 06:19:30 UTC
I could hit this split-brain issue very consistently with the scripts described in comment 19.

Comment 21 SATHEESARAN 2017-07-13 07:57:24 UTC
Tested with Ravi's fix and also with cluster.eager-lock=off,
but I could still end up in a split-brain scenario.

Here are the changelogs from all the bricks:

Brick1
-------
# getfattr -d -m. -ehex /gluster/brick1/b1/FILE 
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick1/b1/FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.arbvol-client-1=0x00006ba80000000000000000
trusted.afr.arbvol-client-2=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xbadce32eff854b928546c7fff5a63b30

Brick2
-------
# getfattr -d -m. -ehex /gluster/brick1/b1/FILE
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick1/b1/FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.arbvol-client-0=0x000000010000000000000000
trusted.afr.arbvol-client-2=0x000036070000000000000000
trusted.afr.dirty=0x000005c70000000000000000
trusted.gfid=0xbadce32eff854b928546c7fff5a63b30

Brick3 ( arbiter )
-------------------
# getfattr -d -m. -ehex /gluster/brick1/b1/FILE
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick1/b1/FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.arbvol-client-1=0x00015e040000000000000000
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0xbadce32eff854b928546c7fff5a63b30

Comment 55 SATHEESARAN 2018-08-23 19:25:32 UTC
Tested with the RHGS 3.4.0 nightly build - glusterfs-3.12.2-16.el7rhgs - using the steps in comment 42. No issues found.

Comment 57 errata-xmlrpc 2018-09-04 06:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607