Created attachment 1472059 [details]
gluster-health-report

Description of problem:
Split-brain observed on the parent directory while verifying bug 1566336.

Version-Release number of selected component (if applicable):
Build used: glusterfs-3.12.2-15.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1) Create a 1 x 3 volume and start it.
2) Disable all client-side heals and create a directory from the client.
3) Fill two of the bricks (b1 and b2) from the back-end.
4) From the mount point, create a file inside the directory; it should fail with "No space left on device", but the name entry is still created on b0.
5) Check heal info; it should list the file created above.
6) Check the changelogs of the parent directory; the dirty bit should be set.
7) Make space on b1 and b2 by removing the previously created files from the back-end.
8) Trigger heal; the file created in step 4 should be healed.
9) The dirty bit should be cleared from the parent directory.
(A scripted sketch of these steps follows at the end of this comment.)

Actual results:
At step 5, split-brain is observed on the parent directory.

Expected results:
The parent directory should not be in split-brain.

Additional info:

5)
# gluster vol heal 13 info
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
/test/test1
/test - Is in split-brain

Status: Connected
Number of entries: 2

Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0

Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
#

6)
# getfattr -d -m . -e hex /bricks/brick0/b0/test/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# getfattr -d -m . -e hex /bricks/brick0/b0/test/test1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/test1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.13-client-1=0x000000010000000100000000
trusted.afr.13-client-2=0x000000010000000100000000
trusted.gfid=0xfa4dad2a2e3e47ada319c5bc2ca9e2b1
trusted.gfid2path.8c8c0ebfbd194a1f=0x37366230306434372d343337352d343735332d613939662d3862356637346632663662642f7465737431
#

# gluster vol info 13

Volume Name: 13
Type: Replicate
Volume ID: 620301ee-9a31-4320-85cd-1beedcd93cdf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
Brick2: rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Brick3: rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Options Reconfigured:
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
cluster.data-self-heal: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
#

SOS Report: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/split_brain_on_bricks_full/
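The reproduction can be approximated with the shell sketch below. The volume name, hostnames, and brick paths are taken from the "gluster vol info 13" output above; the client mount point /mnt/13 and the filler file size and names are assumptions added for illustration, not details from the original report.

# 1) Create a 1 x 3 replicate volume and start it.
gluster volume create 13 replica 3 \
    rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0 \
    rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1 \
    rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
gluster volume start 13

# 2) Disable all client-side heals, then create a directory from the client.
gluster volume set 13 cluster.data-self-heal off
gluster volume set 13 cluster.metadata-self-heal off
gluster volume set 13 cluster.entry-self-heal off
mount -t glusterfs rhsauto025.lab.eng.blr.redhat.com:/13 /mnt/13
mkdir /mnt/13/test

# 3) Fill b1 and b2 from the back-end (run on the respective brick nodes;
#    <size> is whatever exhausts the brick filesystem).
fallocate -l <size> /bricks/brick0/b1/filler    # on rhsauto024
fallocate -l <size> /bricks/brick0/b2/filler    # on rhsauto026

# 4) Create a file inside the directory from the mount point; expect ENOSPC,
#    although the name entry still lands on b0.
touch /mnt/13/test/test1

# 5) heal info should list the file (per this bug, it also wrongly reports
#    split-brain on the parent directory).
gluster vol heal 13 info

# 6) Inspect the changelog xattrs of the parent directory on b0; the dirty
#    bit should be set.
getfattr -d -m . -e hex /bricks/brick0/b0/test/

# 7) Free space on b1 and b2 by removing the filler files from the back-end.
rm -f /bricks/brick0/b1/filler    # on rhsauto024
rm -f /bricks/brick0/b2/filler    # on rhsauto026

# 8) Trigger heal; the file from step 4 should be healed.
gluster vol heal 13

# 9) The dirty bit should now be cleared on the parent directory.
getfattr -d -m . -e hex /bricks/brick0/b0/test/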
Attempted an upstream fix via https://review.gluster.org/21135 (BZ 1626994).
Verified the fix; details below.

Build used: glusterfs-3.12.2-21.el7rhgs.x86_64

At step 5 from the bug description, no split-brain is reported by heal info:

# gluster vol heal replicate_bug info
Brick 10.70.47.133:/bricks/brick3/day4
Status: Connected
Number of entries: 0

Brick 10.70.46.168:/bricks/brick3/day4
/dir1/300mbfile
/dir1
Status: Connected
Number of entries: 2

Brick 10.70.47.102:/bricks/brick3/day4
Status: Connected
Number of entries: 0

The dirty bit is also visible at step 6, as expected:

# getfattr -d -m . -e hex /bricks/brick3/day4/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/day4/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0xbd6c8c8584b7476c9eba9c8d128e5765
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# getfattr -d -m . -e hex /bricks/brick3/day4/dir1/300mbfile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/day4/dir1/300mbfile
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.replicate_bug-client-0=0x000000010000000100000000
trusted.afr.replicate_bug-client-2=0x000000010000000100000000
trusted.gfid=0x1c851d3ae1df4abb93f204761a156d03
trusted.gfid2path.a63e0b78c611e2f2=0x62643663386338352d383462372d343736632d396562612d3963386431323865353736352f3330306d6266696c65

Moving the bug to VERIFIED.
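As an aside for readers decoding the trusted.afr.* values above: each AFR changelog xattr is twelve bytes, interpreted as three big-endian 32-bit counters of pending data, metadata, and entry operations. The bash helper below is a hypothetical convenience for illustration only; it is not part of this report or of the gluster tooling.

# Decode an AFR changelog xattr into its data/metadata/entry counters.
decode_afr() {
    local hex=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}

decode_afr 0x000000010000000100000000    # the file above: data=1 metadata=1 entry=0
decode_afr 0x000000000000000000000001    # trusted.afr.dirty on dir1: entry=1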
Looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432