Description of problem:
Did a forced volume start ("gluster volume start force") to bring up one of the bricks, and it had no effect on the stopped brick. Stopped all my volumes and rebooted my nodes, and I see that files in the engine volume are in split-brain.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.el7rhgs.x86_64

How reproducible:
Seen it once

Steps to Reproduce:
1.
2.
3.

Actual results:
Files in the engine volume are in a split-brain state.

Expected results:
Files should not be in split-brain.

Additional info:
Files in split-brain
====================

[root@rhsqa-grafton2 ~]# gluster volume heal engine info split-brain
Brick 10.70.36.79:/rhgs/brick1/engine
/__DIRECT_IO_TEST__
Status: Connected
Number of entries in split-brain: 1

Brick 10.70.36.80:/rhgs/brick1/engine
/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
/__DIRECT_IO_TEST__
Status: Connected
Number of entries in split-brain: 2

Brick 10.70.36.81:/rhgs/brick1/engine
/__DIRECT_IO_TEST__
/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
Status: Connected
Number of entries in split-brain: 2

getfattrs on the files which are in split-brain
===============================================

[root@rhsqa-grafton1 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/__DIRECT_IO_TEST__
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-1=0x0000000000000b5e00000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x9202d90daed441a69b7538d4d6eae1b1
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000000000000000000000000000000000000000000000000000000000

[root@rhsqa-grafton2 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.afr.engine-client-0=0x0000000500000000000000e5
trusted.gfid=0x4da13f61cc0b4d46ae303f2676866f06
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@rhsqa-grafton2 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/__DIRECT_IO_TEST__
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000000000000200000000
trusted.afr.engine-client-2=0x000000000000000100000000
trusted.gfid=0x9202d90daed441a69b7538d4d6eae1b1
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000000000000000000000000000000000000000000000000000000000

[root@rhsqa-grafton3 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/53c84f1e-3643-45aa-805e-8c9e92ee3098/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000000000000000000014f3
trusted.afr.engine-client-0=0x0000000500000000000000b1
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.gfid=0x4da13f61cc0b4d46ae303f2676866f06
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@rhsqa-grafton3 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine/__DIRECT_IO_TEST__
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000000000000000000000
trusted.afr.engine-client-1=0x0000000000000b5e00000000
trusted.gfid=0x9202d90daed441a69b7538d4d6eae1b1
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000000000000000000000000000000000000000000000000000000000
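For reference, each trusted.afr.<volname>-client-N value above packs three pending-operation counters, which AFR documents as three big-endian 32-bit integers in the order data, metadata, entry. A minimal Python sketch (my helper, not GlusterFS code) to decode the values shown above:

```python
import struct

def decode_afr(xattr_hex):
    """Split a trusted.afr.* changelog value into its (data, metadata,
    entry) pending-operation counters: three big-endian 32-bit ints."""
    raw = bytes.fromhex(xattr_hex[2:] if xattr_hex.startswith("0x") else xattr_hex)
    return struct.unpack(">III", raw[:12])

# Values observed on the bricks above:
print(decode_afr("0x0000000000000b5e00000000"))  # (0, 2910, 0)
print(decode_afr("0x0000000500000000000000e5"))  # (5, 0, 229)
```

A non-zero counter in trusted.afr.<volname>-client-N on one brick means that brick holds pending operations against (i.e. blames) brick N.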
sosreports can be found in the link below:
==========================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/split_brain/
Volume info for the engine vol:
===============================

[root@rhsqa-grafton1 ~]# gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: 03c68517-4be1-45e3-b788-87e10d73f3ee
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.79:/rhgs/brick1/engine
Brick2: 10.70.36.80:/rhgs/brick1/engine
Brick3: 10.70.36.81:/rhgs/brick1/engine (arbiter)
Options Reconfigured:
server.ssl: on
client.ssl: on
auth.ssl-allow: 10.70.36.79,10.70.36.80,10.70.36.81
performance.strict-o-direct: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.granular-entry-heal: on
Resetting all the acks which were previously present on this bug.
I could consistently hit this issue with an arbiter volume with RHGS 3.3.0 (interim build) - glusterfs-3.8.4-28.el7rhgs.
Tested with the RHGS 3.3.0 interim build (glusterfs-3.8.4-28.el7rhgs), and I could consistently hit this issue along with the other issue of the arbiter becoming the source of heal, BZ 1401969.

A very simple test is to:

1. Create an arbiter volume 1 x (2+1) with bricks - brick1, brick2, arbiter
2. Fuse mount it on any RHEL 7 client
3. Run some app (dd, truncate, etc.) on a single file
4. Kill brick2
5. Sleep for 3 seconds
6. Bring up brick2, sleep for 3 seconds, kill the arbiter
7. Sleep for 3 seconds
8. Bring up the arbiter, sleep for 3 seconds, kill brick1
9. Sleep for 3 seconds
10. Continue with step 4

When the above steps are repeated, I landed up in either a split-brain or the arbiter becoming the source of heal (BZ 1401969).
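The kill/restart loop in steps 4-10 can be sketched as a small script. The volume name (arbvol) and brick paths here are hypothetical placeholders; a brick is stopped by killing its glusterfsd process, and "gluster volume start <vol> force" restarts bricks that are down. DRY_RUN defaults to on, so the sketch only prints the actions it would take instead of touching a real cluster.

```shell
#!/bin/sh
# Sketch of the brick kill/restart cycle; names/paths are placeholders.
DRY_RUN=${DRY_RUN:-1}
VOL=arbvol
BRICK_PATHS="/rhgs/brick1/b1 /rhgs/brick1/b2 /rhgs/brick1/arb"

run() {
    # In dry-run mode, print the command instead of executing it.
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

cycle_once() {
    for path in $BRICK_PATHS; do
        run pkill -f "glusterfsd.*$path"       # kill this brick's process
        run sleep 3
        run gluster volume start "$VOL" force  # bring the dead brick back up
        run sleep 3
    done
}

cycle_once
```

Looping cycle_once, with the I/O from step 3 running in parallel on the fuse mount, reproduces the sequence in the steps above.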
The additional information that I have is that I am able to hit the split-brain issue with a replica 3 volume as well. Here are the steps to reproduce.

Setup details
-------------
1. 3-node Gluster cluster (node1, node2, node3)
2. Create a replica 3 volume
3. Mount it on node1

Steps
-----
Two scripts are run in parallel to reproduce this issue.

Script1 kills and then starts bricks in a cyclic fashion across all the bricks, in such a way that there are always 2 bricks alive at any instant:

while true; do
    kill node2-brick2
    kill node2-glusterd
    sleep 3
    start node2-glusterd    # This also starts the brick on this node
    sleep 1
    kill node3-brick3
    kill node3-glusterd
    sleep 3
    start node3-glusterd    # This also starts the brick on this node
    sleep 1
    kill node1-brick1
    kill node1-glusterd
    sleep 3
    start node1-glusterd    # This also starts the brick on this node
    sleep 1
done

Script2 does I/O on the fuse mount while the bricks are killed and started by script1:

MOUNTPATH=/mnt/test
while true; do
    echo "dd if=/dev/urandom of=$MOUNTPATH/FILE bs=128k count=10" >> /var/log/glusterfs/mnt-test.log
    dd if=/dev/urandom of=$MOUNTPATH/FILE bs=128k count=10
    echo "truncate $MOUNTPATH/FILE --size 5K" >> /var/log/glusterfs/mnt-test.log
    truncate $MOUNTPATH/FILE --size 5K
    echo "cat /home/template > $MOUNTPATH/FILE" >> /var/log/glusterfs/mnt-test.log
    cat /home/template > $MOUNTPATH/FILE
    echo "truncate $MOUNTPATH/FILE --size 100k" >> /var/log/glusterfs/mnt-test.log
    truncate $MOUNTPATH/FILE --size 100k
done

When I ran the above scripts with replica 3, I could still see the file on the fuse mount in a split-brain state.
I could hit this split-brain issue very consistently with the scripts described in comment 19.
Tested with Ravi's fix and also with cluster.eager-lock=off, but I could still land up in a split-brain scenario.

Here are the changelogs from all the bricks:

Brick1
------
# getfattr -d -m. -ehex /gluster/brick1/b1/FILE
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick1/b1/FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.arbvol-client-1=0x00006ba80000000000000000
trusted.afr.arbvol-client-2=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xbadce32eff854b928546c7fff5a63b30

Brick2
------
# getfattr -d -m. -ehex /gluster/brick1/b1/FILE
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick1/b1/FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.arbvol-client-0=0x000000010000000000000000
trusted.afr.arbvol-client-2=0x000036070000000000000000
trusted.afr.dirty=0x000005c70000000000000000
trusted.gfid=0xbadce32eff854b928546c7fff5a63b30

Brick3 (arbiter)
----------------
# getfattr -d -m. -ehex /gluster/brick1/b1/FILE
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick1/b1/FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.arbvol-client-1=0x00015e040000000000000000
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0xbadce32eff854b928546c7fff5a63b30
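Reading just the first 32-bit (data) field of the trusted.afr values above shows why self-heal cannot pick a source. A minimal sketch (the data_pending helper is mine, not GlusterFS code), assuming the usual AFR layout where the leading 8 hex digits count pending data operations:

```python
def data_pending(xattr_hex):
    """First big-endian 32-bit field of a trusted.afr.* value: the count of
    pending data operations the local brick holds against a peer brick."""
    return int(xattr_hex[2:10], 16)

# From the changelogs above:
brick1_blames_brick2 = data_pending("0x00006ba80000000000000000")  # arbvol-client-1 on brick1
brick2_blames_brick1 = data_pending("0x000000010000000000000000")  # arbvol-client-0 on brick2

# Each data brick accuses the other of unfinished writes, so neither can be
# chosen as the heal source: a data split-brain.
print(brick1_blames_brick2, brick2_blames_brick1)  # 27560 1
```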
Tested with the RHGS 3.4.0 nightly build (glusterfs-3.12.2-16.el7rhgs) with the steps in comment 42. No issues found.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607