Bug 1792821
| Field | Value |
| --- | --- |
| Summary: | Heal pending on brick post upgrading from RHV 4.2.8 or RHV 4.3.7 to RHV 4.3.8 |
| Product: | [Red Hat Storage] Red Hat Gluster Storage |
| Component: | rhhi |
| Version: | rhhiv-1.7 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | unspecified |
| Reporter: | milind <mwaykole> |
| Assignee: | Ravishankar N <ravishankar> |
| QA Contact: | milind <mwaykole> |
| Docs Contact: | |
| CC: | godas, mmuench, pasik, ravishankar, rcyriac, rhs-bugs, sasundar, smitra, swachira |
| Target Milestone: | --- |
| Target Release: | RHHI-V 1.8 |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | Previously, healing of entries in directories could be triggered when only the heal source (and not the heal target) was available. This led to replication extended attributes being reset and resulted in a GFID split-brain condition when the heal target became available again. Entry healing is now triggered only when all bricks in a replicated set are available, to avoid this issue. |
| Story Points: | --- |
| Clone Of: | |
| Clones: | 1801624 1804164 (view as bug list) |
| Environment: | |
| Last Closed: | 2020-08-04 14:51:32 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | 1848893 |
| Bug Blocks: | 1779975 |
Description milind 2020-01-20 05:07:22 UTC
Ravi, what's the next step on this bug?

On looking at the setup, we found that the entry was not getting healed because the parent dir did not have any entry pending xattrs. The test (thanks Sas for the info) that writes to the prob file apparently unlinks the file before continuing to write to it, so maybe the expected result is that the file be _removed_ from all bricks, not that it is present on them:

```python
# Snippet from the test that exercises the prob file (imports added for context);
# 'path' is the prob file path on the gluster mount, defined elsewhere in the test.
import mmap
import os
import stat

f = os.open(path, os.O_WRONLY | os.O_DIRECT | os.O_DSYNC | os.O_CREAT | os.O_EXCL,
            stat.S_IRUSR | stat.S_IWUSR)
#time.sleep(20)
os.unlink(path)          # the file is unlinked while the fd is still open
#time.sleep(20)
m = mmap.mmap(-1, 1024)  # page-aligned buffer, as required for O_DIRECT writes
s = b' ' * 1024
m.write(s)
os.write(f, m)
os.close(f)
```

So it looks like one of the bricks (engine-client-0) was killed at the time of the unlink of the prob file, so the unlink did not go through on it. But AFR should have marked pending xattrs during the post-op on the good bricks (so that self-heal later removes the prob file from this brick as well). I do not see any network errors in the client log that could explain a post-op failure, so I'm not sure what happened here. We need to see if this can be consistently recreated; leaving a need-info on Milind for the same. We need the exact times at which the bricks were killed and restarted in order to correlate them with the logs.

I have also seen the same behavior when upgrading from RHV 4.2.8 to RHV 4.3.8, and also from RHV 4.3.7 to RHV 4.3.8. During this upgrade, one of the bricks was killed and the gluster software was upgraded from RHGS 3.4.4 (gluster-3.12.2-47.5) to RHGS 3.5.1 (gluster-6.0-29). After upgrading one of the nodes, the he.metadata and he.lockspace files were shown as pending heal, and that continued forever. On checking their GFIDs, they mismatched with the same files on the other 2 bricks, but self-heal was not happening because the changelog entry was missing in the parent directory.

So I am able to reproduce the issue fairly consistently (see the command sketch further below):

1. Create a 1x3 volume with RHHI options enabled.
2. Create and write to a file from the mount.
3. Bring one brick down, then delete and re-create the file so that there is a pending (granular) entry heal.
4. With the brick still down, launch the index heal.

Even though there is nothing to be healed (since the sink brick is still down), index heal seems to be doing a no-op and resetting the parent dir's AFR changelog xattrs, which is why the entry never gets healed. In the QE setup, the same race is happening: even before the upgraded node comes online, the shd does the entry heal described above. We can see messages like these in the shd log where there is no 'source' and the good bricks are 'sinks':

```
[2020-02-10 05:57:55.847756] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-testvol-replicate-0: Completed entry selfheal on 77dd5a45-dbf5-4592-b31b-b440382302e9. sources= sinks=0 2
```

I need to check where the bug is in the code, whether it is specific to granular entry heal, and how to fix it.

(In reply to Ravishankar N from comment #8)
> I need to check where the bug is in the code, if it is specific to granular
> entry heal and how to fix it.

So the GFID split-brain will happen only if granular-entry heal is enabled, but even otherwise, even if only the two good bricks are up, spurious entry heals are triggered continuously, leading to many unnecessary network operations. I'm sending a fix upstream for review.
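As a rough illustration of the reproduction steps above, here is a minimal command sketch. It is not taken from the original report: the volume name testvol, the hosts host1/host2/host3, the brick path /bricks/testvol, and the mount point /mnt/testvol are all hypothetical placeholders, and only granular entry heal is enabled explicitly rather than the full RHHI option set.

```bash
# Hypothetical 1x3 replica volume; hostnames, brick paths and mount point are placeholders.
gluster volume create testvol replica 3 \
    host1:/bricks/testvol host2:/bricks/testvol host3:/bricks/testvol force
gluster volume start testvol
gluster volume heal testvol granular-entry-heal enable   # granular entry heal, relevant to this bug
mount -t glusterfs host1:/testvol /mnt/testvol

# Create and write to a file from the mount.
echo data > /mnt/testvol/prob

# Bring one brick down (kill its brick process), then delete and re-create the file
# so a pending (granular) entry heal is recorded against the parent directory.
gluster volume status testvol        # note the PID of the brick process on host3
ssh host3 kill <brick-pid>
rm /mnt/testvol/prob
echo data > /mnt/testvol/prob

# With the brick still down, launch the index heal and then inspect the parent
# directory's AFR changelog xattrs on a good brick.
gluster volume heal testvol
getfattr -d -m . -e hex /bricks/testvol/
```

On builds without the fix, this index heal can reset the parent directory's trusted.afr.* changelog xattrs even though the sink brick is still down; with the fix described in the Doc Text above, entry heal is attempted only when all bricks of the replicated set are available, so the pending xattrs remain until the sink brick returns.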
Upstream patch: https://review.gluster.org/#/c/glusterfs/+/24109/

```
[node1]# rpm -qa | grep -i glusterfs
glusterfs-libs-6.0-37.1.el8rhgs.x86_64
glusterfs-geo-replication-6.0-37.1.el8rhgs.x86_64
glusterfs-rdma-6.0-37.1.el8rhgs.x86_64
glusterfs-api-6.0-37.1.el8rhgs.x86_64
glusterfs-server-6.0-37.1.el8rhgs.x86_64
glusterfs-fuse-6.0-37.1.el8rhgs.x86_64
glusterfs-cli-6.0-37.1.el8rhgs.x86_64
glusterfs-events-6.0-37.1.el8rhgs.x86_64
glusterfs-6.0-37.1.el8rhgs.x86_64
glusterfs-client-xlators-6.0-37.1.el8rhgs.x86_64

[node1]# imgbase w
You are on rhvh-4.4.1.1-0.20200713.0+1

[node1]# rpm -qa | grep -i ansible
gluster-ansible-maintenance-1.0.1-9.el8rhgs.noarch
gluster-ansible-cluster-1.0-1.el8rhgs.noarch
ansible-2.9.10-1.el8ae.noarch
gluster-ansible-features-1.0.5-7.el8rhgs.noarch
gluster-ansible-roles-1.0.5-17.el8rhgs.noarch
ovirt-ansible-engine-setup-1.2.4-1.el8ev.noarch
gluster-ansible-infra-1.0.4-11.el8rhgs.noarch
ovirt-ansible-hosted-engine-setup-1.1.6-1.el8ev.noarch
gluster-ansible-repositories-1.0.1-2.el8rhgs.noarch
```

As I don't see any pending heals in the RHHI-V setup, I am marking this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHHI for Virtualization 1.8 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3314
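For context on the verification above, this is a minimal sketch of the kind of checks that confirm there are no pending heals or GFID mismatches after the upgrade. The volume name engine, the brick root, the hostnames, and the file path are placeholders and do not come from the verification output.

```bash
# Hypothetical volume name and brick layout; adjust to the actual RHHI-V setup.
VOL=engine
BRICK_ROOT=/gluster_bricks/engine/engine
FILE=ha_agent/hosted-engine.metadata         # placeholder; actual path of the he.metadata file under the brick

gluster volume heal "$VOL" info              # should list 0 entries on every brick
gluster volume heal "$VOL" info split-brain  # should list no split-brain entries

# The GFID of the file must be identical on all three bricks; compare trusted.gfid per brick.
for host in host1 host2 host3; do
    ssh "$host" getfattr -n trusted.gfid -e hex "$BRICK_ROOT/$FILE"
done
```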