Bug 2026833

Summary:	[Ganesha][RHEL 8.5] HA status is in FAILOVER when configuring NFS ganesha with RHEL 8.5 platform
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	SATHEESARAN <sasundar>
Component:	common-ha	Assignee:	Kaleb KEITHLEY <kkeithle>
Status:	CLOSED ERRATA	QA Contact:	Manisha Saini <msaini>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.5	CC:	agantony, kkeithle, nyancey, rhs-bugs, sheggodu, smulay, sselvan, tshacked, vdas
Target Milestone:	---	Keywords:	Regression, ZStream
Target Release:	RHGS 3.5.z Batch Update 7
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-6.0-62	Doc Type:	Bug Fix
Doc Text:	Previously, the `crmadmin` command waited forever or for 83 mins instead of timing out at 5 s, and glusterd waited for 2 mins for the setup command to complete before its own timeout. This is because `pacemaker-2.1.x` changed the semantics of the `--timeout` command line parameter for the `crmadmin` utility. The value was an integer that specified a timeout in milliseconds. With this update, the value is time specific, for example, 5 s, and defaults to seconds if the value is an integer. Now, the `crmadmin` command times out after 5 s as it did with the previous version of pacemaker.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-05-31 12:37:31 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2033272
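
Note on the Doc Text figures: the 5 s and 83 min values are consistent with one bare integer being read in two different units. The sketch below is only an illustration of that arithmetic. It assumes the argument passed to `--timeout` was 5000, which is inferred from the figures and not stated in this report, and both function names are hypothetical models, not pacemaker code.

```python
def legacy_timeout_seconds(value: str) -> float:
    """Model of the older reading: a bare integer is a count of milliseconds."""
    return int(value) / 1000.0


def timespec_timeout_seconds(value: str) -> float:
    """Model of the newer reading: a time specification, bare integers are seconds.

    Only the "ms" and "s" suffixes are handled here; this is an
    illustration, not pacemaker's actual parser.
    """
    text = value.replace(" ", "").lower()
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    return float(text)


# Assumed value: 5000 is inferred from the 5 s and 83 min figures in Doc Text.
ASSUMED_ARGUMENT = "5000"

old_wait = legacy_timeout_seconds(ASSUMED_ARGUMENT)
new_wait = timespec_timeout_seconds(ASSUMED_ARGUMENT)
minutes, seconds = divmod(int(new_wait), 60)

print(f"integer read as milliseconds: {old_wait:g} s")
print(f"integer read as seconds: {minutes} min {seconds} s")
print(f"explicit '5s': {timespec_timeout_seconds('5s'):g} s")
```

Under that assumption, the same argument yields 5 s in the first reading and 83 min 20 s in the second, while an explicit unit such as `5s` stays at 5 s.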

Description SATHEESARAN 2021-11-26 05:27:58 UTC

Description of problem:
-----------------------
Three RHGS 3.5.5 nodes are installed with the RHGS 3.5.5 ISO, which is based on RHEL 8.4. The nodes are subscribed to the baseos, appstream, and high-availability repos, and are then upgraded to RHEL 8.5.

The nfs-ganesha deployment fails at the step 'gluster nfs-ganesha enable',
and the cluster HA status is FAILOVER.
<snip>
TASK [Enable nfs-ganesha] ********************************************************************************************************************************************************************
fatal: [dhcp35-137.lab.eng.blr.redhat.com]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/libexec/platform-python"}, "changed": true, "cmd": "gluster nfs-ganesha enable --mode=script", "delta": "0:10:00.111919", "end": "2021-11-26 00:15:17.630279", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-11-26 00:05:17.518360", "stderr": "", "stderr_lines": [], "stdout": "This will take a few minutes to complete. Please wait ..\nError : Request timed out", "stdout_lines": ["This will take a few minutes to complete. Please wait ..", "Error : Request timed out"]}
...ignoring

</snip>

Version-Release number of selected component (if applicable):
---------------------------------------------------------------
RHGS 3.5.5 ( glusterfs-6.0-59.el8rhgs )
RHEL 8.5 ( 4.18.0-348.2.1.el8_5.x86_64 )

pacemaker-cli-2.1.0-8.el8.x86_64
pacemaker-schemas-2.1.0-8.el8.noarch
pacemaker-2.1.0-8.el8.x86_64
pacemaker-cluster-libs-2.1.0-8.el8.x86_64
pacemaker-libs-2.1.0-8.el8.x86_64

corosynclib-3.1.5-1.el8.x86_64
corosync-3.1.5-1.el8.x86_64

pcs-0.10.10-4.el8.x86_64

nfs-ganesha-3.4-8.el8rhgs.x86_64
nfs-ganesha-gluster-3.4-8.el8rhgs.x86_64
nfs-ganesha-selinux-3.4-8.el8rhgs.noarch
resource-agents-4.1.1-98.el8.x86_64

How reproducible:
------------------
Always

Steps to Reproduce:
-------------------
1. Create a 3-node cluster with RHGS 3.5.5 on the RHEL 8.5 platform
2. Create a volume
3. Deploy NFS ganesha using gdeploy

Actual results:
---------------
NFS ganesha deployment fails, and the HA status is FAILOVER

Expected results:
-----------------
NFS ganesha deployment should succeed with HA status as HEALTHY

Additional info:
-----------------
I have tested the same with RHEL 8.4 and RHGS 3.5.5, and everything works well.
But it fails with RHEL 8.5, which indicates a regression that is either platform-specific or related to the HA RPMs. So I am adding the keyword 'Regression'.

Another observation is that this error appears during the execution of 'gluster nfs-ganesha enable', and from that point on, all gluster commands on the node (where the ganesha deployment is attempted) are stuck until they time out.

Comment 8 Kaleb KEITHLEY 2021-12-01 15:22:19 UTC

https://github.com/gluster/glusterfs/pull/2999

Comment 9 Kaleb KEITHLEY 2021-12-21 12:36:00 UTC

*** Bug 2033272 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2022-05-31 12:37:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4840