1858197 – Pending self-heal on the volume, post the bricks are online


Summary: Pending self-heal on the volume, post the bricks are online

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	rhhi
Sub Component:
Version:	rhhiv-1.8
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Gobinda Das
QA Contact:	SATHEESARAN
Docs Contact:
URL:
Whiteboard:
Depends On:	1858201
Blocks:	RHHI-V_1.8_Release_Notes

Reported:	2020-07-17 07:57 UTC by SATHEESARAN
Modified:	2020-09-16 08:30 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	In dual-network configurations (one network for Gluster and the other for oVirt management), Automatic File Replication (AFR) healer threads are not spawned when the self-heal daemon is restarted, which leaves pending self-heal entries on the volume. To work around this issue, change the hostname to the FQDN of the other network:
# hostnamectl set-hostname <other-network-FQDN>
Then start the heal:
# gluster volume heal <volname>
Clone Of:
Clones:	1858201
Environment:
Last Closed:	2020-09-16 07:34:58 UTC
Embargoed:
Dependent Products:
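The two-step workaround from the Doc Text can be wrapped in a small shell function so that both steps are always run together and in order. This is a minimal sketch, assuming bash and root access on the affected host; the function name apply_heal_workaround is hypothetical, and only the two commands quoted in the Doc Text are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the Known Issue workaround (assumes bash, run as root
# on the affected host). It only runs the two commands from the Doc Text.

apply_heal_workaround() {
  local other_fqdn="$1" volname="$2"
  if [ -z "$other_fqdn" ] || [ -z "$volname" ]; then
    echo "usage: apply_heal_workaround <other-network-FQDN> <volname>" >&2
    return 2
  fi
  # Step 1: switch the hostname to the FQDN of the other network.
  hostnamectl set-hostname "$other_fqdn" || return 1
  # Step 2: trigger the heal on the volume.
  gluster volume heal "$volname"
}
```

For example, `apply_heal_workaround host1.gluster.example.com engine` would rename the host and then heal the engine volume (the FQDN here is a placeholder). Stopping on a failed hostnamectl call avoids triggering a heal under the hostname that is known to leave entries pending.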


Description SATHEESARAN 2020-07-17 07:57:21 UTC

Description of problem:
-------------------------
With RHHI-V 1.8, when each host uses two network interfaces with two FQDNs, self-heal remains pending on the volumes.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHHI-V 1.8
RHGS 3.5.2-async (glusterfs-6.0-37.1.el8rhgs)

How reproducible:
-----------------
Always

Steps to Reproduce:
---------------------
1. Create an RHHI-V deployment with 3 hosts, using dedicated networks for gluster and ovirtmgmt.
2. Kill one of the engine bricks, make sure there are pending heal entries, and force-start the engine volume to bring the brick back up.
3. Check the self-heal status.
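Steps 2 and 3 can be sketched as two shell functions. This is a minimal sketch under stated assumptions: bash, run as root on one RHHI-V host, and the brick PID read by hand from the "Pid" column of `gluster volume status <volname>`. The function names are hypothetical, and the report does not say which signal was used to kill the brick, so the default SIGTERM is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reproduction steps 2-3 (assumes bash, run as root).

# Step 2a: take one engine brick down. The signal is an assumption: plain
# `kill` sends SIGTERM by default.
stop_brick() {
  local brick_pid="$1"
  kill "$brick_pid"
}

# Step 2b (manual): while the brick is down, write to the volume (for example
# through normal Hosted Engine VM activity) so that heal entries accumulate.

# Steps 2c and 3: force-start the volume to bring the brick back up, then list
# the entries that still need healing.
restart_and_check() {
  local volname="$1"
  gluster volume start "$volname" force || return 1
  gluster volume heal "$volname" info
}
```

A run would look like `stop_brick <brick-pid>`, then some I/O on the engine volume, then `restart_and_check engine`.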

Actual results:
---------------
Self-heal entries remain pending on the node.

Expected results:
-----------------
No pending heals; if any entries remain, running the 'heal' command should heal them.
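The "no pending heals" expectation can be checked by summing the per-brick counts in the heal info output. This sketch assumes that `gluster volume heal <volname> info` prints one "Number of entries: N" line per brick, as glusterfs 6 does; the function name pending_heal_count is hypothetical. A disconnected brick reports "-" instead of a number, which awk treats as 0, so the "Status:" lines should still be checked by eye.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gluster volume heal <volname> info` output on stdin
# and print the total of all "Number of entries:" lines (0 means fully healed).
pending_heal_count() {
  awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
```

Usage: `gluster volume heal engine info | pending_heal_count` should print 0 once the brick that was brought back up has been healed.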

Comment 1 SATHEESARAN 2020-07-17 08:03:37 UTC

A few issues were seen. Ravi and I debugged this issue, and
Ravi came up with the following observations:

1. The AFR healer threads were not present on the host.
They should always be running on the node; it is not yet clear why they were missing.

2. Restarting glustershd should have started the afr_healer threads, but even that
did not happen.

3. Changing the host's hostname to the FQDN of the other network,
and then triggering a heal, resolves the problem.

Thanks, Ravi.
This issue was not seen with RHHI-V 1.7 on RHEL 7 Server.

This bug can be marked as a known_issue for RHHI-V 1.8, as the initial suspicion is that it relates
to RHEL 8 networking changes and how glusterd resolves the network names.

Comment 5 Gobinda Das 2020-09-16 07:34:58 UTC

Closing this, as the dependent bug is already closed.

Comment 6 SATHEESARAN 2020-09-16 08:30:25 UTC

This issue occurs with two network interfaces when the FQDNs correspond to the backend network.
It is not being fixed in RHGS, so it is better to close this bug as WONTFIX.

The known_issue for this bug still holds true, as the bug is not being fixed.
