Previously, if a file creation succeeded on only one brick of a replica 3 volume, that brick had the pending changelog set on the file as part of new-entry marking. Because the entry transaction failed on a quorum number of bricks, the parent of that file did not get any entry pending changelog set for the transaction. As a result, the entry was listed in the heal info output but was never healed by the SHD crawl or the index heal.
With this fix, if an entry transaction fails on a quorum number of bricks, a dirty marking is set on the parent of the file on the brick where the transaction succeeded. This allows the entry to be healed as part of the next SHD crawl or the index heal.
Update:
========
Build used: glusterfs-3.12.2-15.el7rhgs.x86_64
Scenario:
1) Create a 1 * 3 volume and start it.
2) Disable all client-side heals and create dir from the client.
3) Fill 2 of the bricks from the back end (b1 and b2).
4) From the mount point, create a file inside dir; it should fail with "No Space", but the name entry is created on b0.
5) Check heal info; it should list the above file.
6) Check the changelogs of dir (parent); the dirty bit should be set.
7) Make space on b1 and b2 by removing the previously created files from the back end.
8) Trigger heal; the file created in step 4 should be healed.
9) The dirty bit should be cleared from dir.
Output:
5)
# gluster vol heal 13 info
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
/test/test1
/test - Is in split-brain
Status: Connected
Number of entries: 2
Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0
Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
#
6)
# getfattr -d -m . -e hex /bricks/brick0/b0/test/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
# getfattr -d -m . -e hex /bricks/brick0/b0/test/test1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/test1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.13-client-1=0x000000010000000100000000
trusted.afr.13-client-2=0x000000010000000100000000
trusted.gfid=0xfa4dad2a2e3e47ada319c5bc2ca9e2b1
trusted.gfid2path.8c8c0ebfbd194a1f=0x37366230306434372d343337352d343735332d613939662d3862356637346632663662642f7465737431
#
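For reading the step 6 values: each trusted.afr.* changelog xattr is 12 bytes, which AFR treats as three big-endian 32-bit counters for pending data, metadata and entry operations, in that order. A minimal Python sketch to decode them is below; the helper name decode_afr_changelog is only illustrative and is not part of any gluster tooling.

```python
import struct

def decode_afr_changelog(hex_value):
    """Split a 12-byte AFR changelog xattr into its three pending counters."""
    # Accept the value with or without the leading "0x" printed by getfattr -e hex.
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# trusted.afr.dirty on the parent dir (step 6)
print(decode_afr_changelog("0x000000000000000000000001"))
# trusted.afr.13-client-1 on test1 (step 6)
print(decode_afr_changelog("0x000000010000000100000000"))
```

Read this way, the parent dir on b0 carries a dirty marking for an entry operation, while test1 carries pending data and metadata changelogs against client-1 and client-2.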
8)
# gluster vol heal 13 info
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
Status: Connected
Number of entries: 0
Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0
Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
#
9)
# getfattr -d -m . -e hex /bricks/brick0/b0/test/test1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/test1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.13-client-1=0x000000000000000000000000
trusted.afr.13-client-2=0x000000000000000000000000
trusted.gfid=0xfa4dad2a2e3e47ada319c5bc2ca9e2b1
trusted.gfid2path.8c8c0ebfbd194a1f=0x37366230306434372d343337352d343735332d613939662d3862356637346632663662642f7465737431
[root@rhsauto025 ~]# getfattr -d -m . -e hex /bricks/brick0/b0/test
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
#
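The step 9 check can also be done mechanically. The following is a rough Python sketch, assuming the text printed by getfattr -d -m . -e hex has been captured into a string; pending_afr_xattrs is a hypothetical helper name. It returns every trusted.afr.* xattr whose value is not all zeroes, so an empty result means nothing is pending on that path.

```python
def pending_afr_xattrs(getfattr_output):
    """Return the trusted.afr.* xattrs in a getfattr hex dump that are non-zero."""
    pending = {}
    for line in getfattr_output.splitlines():
        line = line.strip()
        # Skip comments, other xattrs and anything that is not name=value.
        if not line.startswith("trusted.afr.") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # int() with base 16 accepts the "0x" prefix used by getfattr -e hex.
        if int(value, 16) != 0:
            pending[name] = value
    return pending

before_heal = """# file: bricks/brick0/b0/test/
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
"""

after_heal = """# file: bricks/brick0/b0/test
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
"""

print(pending_afr_xattrs(before_heal))
print(pending_afr_xattrs(after_heal))
```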
> Observation: at step 5 the parent dir is reported as being in split-brain, but it shouldn't be. Discussed with Karthik; this bug is about healing, hence moving this bug to verified and raising a new bz for the split-brain issue.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:2607