Previously, the `crmadmin` command either waited indefinitely or waited about 83 minutes instead of timing out after 5 seconds. Meanwhile, glusterd waited 2 minutes for the setup command to complete before hitting its own timeout. This happened because `pacemaker-2.1.x` changed the semantics of the `--timeout` command-line parameter of the `crmadmin` utility. In earlier versions, the value was an integer that specified the timeout in milliseconds.
In `pacemaker-2.1.x`, the value is a time specification, for example `5s`, and a bare integer is interpreted as seconds. A value of 5000 that was intended as 5000 milliseconds is therefore read as 5000 seconds, which is about 83 minutes. With this update, the `crmadmin` command times out after 5 seconds, as it did with the previous version of pacemaker.
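The following Python sketch makes the arithmetic behind the 83-minute wait concrete by modelling the two interpretations of `--timeout` described above. The function names, the regular expression, and the small set of accepted units (`ms`, `s`, `min`, `h`) are illustrative assumptions. This is not pacemaker's actual parser, which may accept a different set of unit spellings.

```python
import re

# Hypothetical, simplified model of the two --timeout interpretations described
# above. This is NOT pacemaker's parser; the unit table is an assumption.
_UNIT_TO_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000}


def old_timeout_ms(value: str) -> int:
    """Pre-2.1 behaviour as described: a bare integer in milliseconds."""
    return int(value)


def new_timeout_ms(value: str) -> int:
    """2.1.x behaviour as described: a time spec; a bare integer means seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|min|h)?\s*", value)
    if match is None:
        raise ValueError(f"unrecognised time specification: {value!r}")
    number, unit = match.groups()
    # No unit suffix defaults to seconds under the new semantics.
    return int(number) * _UNIT_TO_MS[unit or "s"]


if __name__ == "__main__":
    legacy = "5000"  # written for the old semantics: 5000 ms = 5 s
    print(old_timeout_ms(legacy) / 1000, "s under the old semantics")
    print(round(new_timeout_ms(legacy) / 60_000, 1), "min under the new semantics")
    print(new_timeout_ms("5s") / 1000, "s for an explicit 5s")
```

Run as a script, this prints 5.0 s for the old reading of `5000`, 83.3 min for the new reading of the same string, and 5.0 s for an explicit `5s`. Passing an explicit unit avoids depending on which default unit the installed pacemaker version applies to a bare integer.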
Description of problem:
-----------------------
Three RHGS 3.5.5 nodes are installed with the RHGS 3.5.5 ISO based on RHEL 8.4. The nodes are subscribed to the baseos, appstream, and high-availability repos, and are then upgraded to RHEL 8.5.
The nfs-ganesha deployment fails at the 'gluster nfs-ganesha enable' step, and the cluster HA status is FAILOVER.
<snip>
TASK [Enable nfs-ganesha] ********************************************************************************************************************************************************************
fatal: [dhcp35-137.lab.eng.blr.redhat.com]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/libexec/platform-python"}, "changed": true, "cmd": "gluster nfs-ganesha enable --mode=script", "delta": "0:10:00.111919", "end": "2021-11-26 00:15:17.630279", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-11-26 00:05:17.518360", "stderr": "", "stderr_lines": [], "stdout": "This will take a few minutes to complete. Please wait ..\nError : Request timed out", "stdout_lines": ["This will take a few minutes to complete. Please wait ..", "Error : Request timed out"]}
...ignoring
</snip>
Version-Release number of selected component (if applicable):
---------------------------------------------------------------
RHGS 3.5.5 ( glusterfs-6.0-59.el8rhgs )
RHEL 8.5 ( 4.18.0-348.2.1.el8_5.x86_64 )
pacemaker-cli-2.1.0-8.el8.x86_64
pacemaker-schemas-2.1.0-8.el8.noarch
pacemaker-2.1.0-8.el8.x86_64
pacemaker-cluster-libs-2.1.0-8.el8.x86_64
pacemaker-libs-2.1.0-8.el8.x86_64
corosynclib-3.1.5-1.el8.x86_64
corosync-3.1.5-1.el8.x86_64
pcs-0.10.10-4.el8.x86_64
nfs-ganesha-3.4-8.el8rhgs.x86_64
nfs-ganesha-gluster-3.4-8.el8rhgs.x86_64
nfs-ganesha-selinux-3.4-8.el8rhgs.noarch
resource-agents-4.1.1-98.el8.x86_64
How reproducible:
------------------
Always
Steps to Reproduce:
-------------------
1. Create a 3-node RHGS 3.5.5 cluster on the RHEL 8.5 platform
2. Create a volume
3. Deploy NFS-Ganesha using gdeploy
Actual results:
---------------
NFS-Ganesha deployment fails, and the HA status is FAILOVER
Expected results:
-----------------
NFS-Ganesha deployment should succeed, and the HA status should be HEALTHY
Additional info:
-----------------
I tested the same setup with RHEL 8.4 and RHGS 3.5.5, and everything works.
However, it fails with RHEL 8.5, which indicates a regression in the platform or in the HA RPMs, so I am adding the keyword 'Regression'.
Another observation: this error appears during the execution of 'gluster nfs-ganesha enable', and from that point on, all gluster commands on the node where the ganesha deployment is attempted are stuck until they time out.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (glusterfs bug fix update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:4840