Bug 1566336

Summary: [GSS] Pending heals are not getting completed in CNS environment
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Abhishek Kumar <abhishku>
Component: replicate Assignee: Karthik U S <ksubrahm>
Status: CLOSED ERRATA QA Contact: Vijay Avuthu <vavuthu>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: abhishku, aflierl, amukherj, atripath, atumball, dwojslaw, ksubrahm, madam, psony, ravishankar, rhs-bugs, sheggodu, srmukher, storage-qa-internal, sunnikri, vdas
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-14 Doc Type: Bug Fix
Doc Text:
Previously, if a file creation was successful on only one brick in a replica 3 volume, that brick would have the pending changelog set as part of new entry marking for that file. Since the entry transaction failed on a quorum number of bricks, the parent of that file would not have any entry pending changelog set for this transaction. As a result, the entry would be listed in the heal info output but would never get healed by the SHD crawl or index heal. With this fix, if an entry transaction fails on a quorum number of bricks, a dirty marking is set on the parent of the file on the brick where the transaction was successful. This allows the entry to be healed as part of the next SHD crawl or as part of the index heal.
Story Points: ---
Clone Of:
: 1586020 Environment:
Last Closed: 2018-09-04 06:46:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1586020    
Bug Blocks: 1503138    

Comment 52 Vijay Avuthu 2018-08-01 11:13:30 UTC
Update:
========

Build used: glusterfs-3.12.2-15.el7rhgs.x86_64

Scenario:

1) Create a 1 * 3 volume and start it.
2) Disable all client-side heals and create dir from the client.
3) Fill two of the bricks ( b1 and b2 ) from the back end.
4) From the mount point, create a file inside dir. The create should fail with "No Space", but the name entry is still created on b0.
5) Check the heal info; it should list the above file.
6) Check the changelogs of dir ( parent ); the dirty bit should be set.
7) Make space in b1 and b2 by removing the previously created files from the back end.
8) Trigger a heal; the file created in step 4 should be healed.
9) The dirty bit should be cleared from dir.


o/p:

5)
# gluster vol heal 13 info 
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
/test/test1 
/test - Is in split-brain

Status: Connected
Number of entries: 2

Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0

Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
# 
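
When this check is repeated across runs, the heal info text can be summarised per brick instead of being read by eye. The following is only a minimal sketch: the function name parse_heal_info and the SAMPLE string are made up for illustration, and the line format is assumed to be exactly what the output above shows (a "Brick" line, entry lines, then "Status:" and "Number of entries:").

```python
def parse_heal_info(text):
    """Group `gluster vol heal <vol> info` output by brick.

    Assumes the plain-text layout shown in this comment; not an official parser.
    """
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = {"status": None, "count": None,
                       "entries": [], "split_brain": []}
            bricks[line[len("Brick "):]] = current
        elif current is None or not line:
            continue
        elif line.startswith("Status:"):
            current["status"] = line.split(":", 1)[1].strip()
        elif line.startswith("Number of entries:"):
            current["count"] = int(line.split(":", 1)[1])
        elif line.startswith(("/", "<gfid:")):
            # Entry line, optionally flagged as being in split-brain.
            path, flagged, _ = line.partition(" - Is in split-brain")
            current["entries"].append(path)
            if flagged:
                current["split_brain"].append(path)
    return bricks


# Abridged copy of the step 5 output (first two bricks only).
SAMPLE = """\
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
/test/test1
/test - Is in split-brain

Status: Connected
Number of entries: 2

Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0
"""

if __name__ == "__main__":
    for brick, info in parse_heal_info(SAMPLE).items():
        print(brick, info["count"], info["split_brain"])
```

For the step 5 output this separates the pending file entry from the parent dir that is reported in split-brain.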

6) 
# getfattr -d -m . -e hex /bricks/brick0/b0/test/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# getfattr -d -m . -e hex /bricks/brick0/b0/test/test1 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/test1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.13-client-1=0x000000010000000100000000
trusted.afr.13-client-2=0x000000010000000100000000
trusted.gfid=0xfa4dad2a2e3e47ada319c5bc2ca9e2b1
trusted.gfid2path.8c8c0ebfbd194a1f=0x37366230306434372d343337352d343735332d613939662d3862356637346632663662642f7465737431
#
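
The trusted.afr.* values above are easier to read once split into counters. The following is a minimal sketch, assuming the usual AFR changelog layout of three big-endian 32-bit counters in the order data, metadata, entry; the helper name decode_afr_changelog is made up for illustration.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* xattr (as printed by getfattr -e hex)
    into its three pending counters.

    Assumed layout: data, metadata, entry, each a big-endian uint32.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # trusted.afr.dirty on the parent dir at step 6
    print(decode_afr_changelog("0x000000000000000000000001"))
    # trusted.afr.13-client-1 on test1 at step 6
    print(decode_afr_changelog("0x000000010000000100000000"))
```

Under that assumption, the parent dir on b0 carries an entry dirty marking, while test1 on b0 records pending data and metadata changes against client-1 and client-2 (b1 and b2).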

8) 
# gluster vol heal 13 info 
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
Status: Connected
Number of entries: 0

Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0

Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
#


9) 
# getfattr -d -m . -e hex /bricks/brick0/b0/test/test1 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/test1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.13-client-1=0x000000000000000000000000
trusted.afr.13-client-2=0x000000000000000000000000
trusted.gfid=0xfa4dad2a2e3e47ada319c5bc2ca9e2b1
trusted.gfid2path.8c8c0ebfbd194a1f=0x37366230306434372d343337352d343735332d613939662d3862356637346632663662642f7465737431

[root@rhsauto025 ~]# getfattr -d -m . -e hex /bricks/brick0/b0/test
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
#
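
To confirm that test1 really belongs to the dir whose dirty bit was checked, the trusted.gfid2path value can be decoded and compared with the trusted.gfid of the parent. This is only a sketch, assuming the value is ASCII text of the form <parent-gfid>/<basename>; the helper names are made up for illustration.

```python
import uuid


def _strip_0x(hex_value):
    """Drop the 0x prefix that getfattr -e hex prints."""
    return hex_value[2:] if hex_value.lower().startswith("0x") else hex_value


def decode_gfid2path(hex_value):
    """Return (parent_gfid, basename) from a trusted.gfid2path.* value.

    Assumes the value is ASCII "<parent-gfid>/<basename>".
    """
    text = bytes.fromhex(_strip_0x(hex_value)).decode("ascii")
    parent_gfid, _, basename = text.partition("/")
    return parent_gfid, basename


def gfid_to_uuid(hex_value):
    """Format a trusted.gfid value as a hyphenated UUID string."""
    return str(uuid.UUID(hex=_strip_0x(hex_value)))


if __name__ == "__main__":
    parent, name = decode_gfid2path(
        "0x37366230306434372d343337352d343735332d613939662d"
        "3862356637346632663662642f7465737431"
    )
    # Compare against trusted.gfid of /bricks/brick0/b0/test
    same = parent == gfid_to_uuid("0x76b00d4743754753a99f8b5f74f2f6bd")
    print(parent, name, same)
```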

> Observation: at step 5 the parent dir is reported as being in split-brain, but it should not be. After discussing with Karthik, this bug is about healing, hence moving this bug to verified and raising a new bz for the split-brain issue.

Comment 56 errata-xmlrpc 2018-09-04 06:46:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607