Bug 2167528 - Fix stonith watchdog timeout
Summary: Fix stonith watchdog timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: rhel-system-roles
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 9.2
Assignee: Tomas Jelinek
QA Contact: michal novacek
Docs Contact: Steven J. Levine
URL:
Whiteboard: role:ha_cluster
Depends On:
Blocks: 2167941
Reported: 2023-02-06 22:48 UTC by Rich Megginson
Modified: 2023-05-09 08:42 UTC

Fixed In Version: rhel-system-roles-1.21.0-0.19.el9
Doc Type: Bug Fix
Doc Text:
.Setting `stonith-watchdog-timeout` property with the `ha_cluster` System Role now works in a stopped cluster

Previously, when you set the `stonith-watchdog-timeout` property with the `ha_cluster` System Role in a stopped cluster, the property reverted to its previous value and the role failed. With this fix, configuring the `stonith-watchdog-timeout` property by using the `ha_cluster` System Role works properly.
Clone Of:
Clones: 2167941
Environment:
Last Closed: 2023-05-09 07:38:27 UTC
Type: ---
Target Upstream Version:
Embargoed:




Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHELPLAN-147683  0        None      None    None     2023-02-06 22:50:42 UTC
Red Hat Product Errata  RHEA-2023:2246   0        None      None    None     2023-05-09 07:38:44 UTC

Description Rich Megginson 2023-02-06 22:48:38 UTC
https://github.com/linux-system-roles/ha_cluster/pull/105 

Fix a corner-case bug occurring with pcs 0.10.15 and newer and pcs 0.11.4 and newer:
When stonith-watchdog-timeout is set in a stopped cluster, pcs does not modify cib.xml.sig. As a result, pacemaker ignores the new cib.xml content and reads the configuration from a previous version of the CIB, effectively reverting the stonith-watchdog-timeout update made by the role. Pacemaker then exits with an error and the role fails, unable to proceed with configuring the cluster.

With old pcs versions, removing cib.xml.sig has no adverse effects. There is no need to check for pcs version when removing the file.
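
The description implies that the workaround is to remove cib.xml.sig after cib.xml has been edited offline. A minimal sketch of that idea follows. It is not the role's actual task: the helper name `remove_cib_signature` and the scratch directory are assumptions made for illustration, and the demo deliberately runs against a temporary directory rather than /var/lib/pacemaker/cib.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the ha_cluster role): delete the
# signature file that sits next to an offline-edited cib.xml.
remove_cib_signature() {
    cib_dir=$1
    # -f: do not fail if the signature is already absent
    rm -f -- "$cib_dir/cib.xml.sig"
}

# Demo on a scratch directory so that no real cluster state is touched.
demo_dir=$(mktemp -d)
printf '<cib/>\n' > "$demo_dir/cib.xml"
printf '0123456789abcdef0123456789abcdef' > "$demo_dir/cib.xml.sig"

remove_cib_signature "$demo_dir"
ls "$demo_dir"    # only cib.xml remains
```

Because the removal is unconditional, the sketch needs no pcs version check, which matches the note above about old pcs versions.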

Comment 5 michal novacek 2023-02-17 12:41:07 UTC
I have verified that `stonith-watchdog-timeout` is correctly set with `rhel-system-roles-1.21.0-0.19.el9` when cib.xml is modified while the cluster is stopped.

---

> [root@virt-566 ~]# cat test.yml
- hosts: all
  vars:

    ha_cluster_hacluster_password: password
    ha_cluster_enable_repos: False

    ha_cluster_cluster_properties:
      - attrs:
          - name: stonith-watchdog-timeout
            value: 1201

  tasks:
    - import_role:
        name: rhel-system-roles.ha_cluster

> [root@virt-565 playbooks]# rpm -q rhel-system-roles pcs
rhel-system-roles-1.21.0-0.19.el9.noarch
pcs-0.11.4-4.el9.x86_64

> Check that, with the cluster running, the cib and its signature are "synced".
[root@virt-566 ~]# ls -l /var/lib/pacemaker/cib/cib.xml*
-rw-------. 1 hacluster haclient 2503 Feb 17 12:26 /var/lib/pacemaker/cib/cib.xml
-rw-------. 1 hacluster haclient   32 Feb 17 12:26 /var/lib/pacemaker/cib/cib.xml.sig

> Observe the value of stonith-watchdog-timeout
> [root@virt-566 ~]# pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: my-cluster
 dc-version: 2.1.5-5.el9-a3f44794f94
 have-watchdog: true
>  stonith-watchdog-timeout: 1202

> Stop the cluster.
> [root@virt-566 ~]# pcs cluster stop --all
virt-566: Stopping Cluster (pacemaker)...
virt-567: Stopping Cluster (pacemaker)...
virt-567: Stopping Cluster (corosync)...
virt-566: Stopping Cluster (corosync)...

> Modify the cib by setting stonith-watchdog-timeout.
> [root@virt-566 ~]# pcs -f  /var/lib/pacemaker/cib/cib.xml property set stonith-watchdog-timeout=1203

> Observe signature not being updated.
> [root@virt-566 ~]# ls -l /var/lib/pacemaker/cib/cib.xml*
-rw-------. 1 hacluster haclient 2515 Feb 17 12:29 /var/lib/pacemaker/cib/cib.xml
-rw-------. 1 hacluster haclient   32 Feb 17 12:26 /var/lib/pacemaker/cib/cib.xml.sig

> Run the playbook test.yml.

> Observe that the value is correctly set to the stonith-watchdog-timeout value from the playbook.
> [root@virt-566 ~]# pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: my-cluster
 dc-version: 2.1.5-5.el9-a3f44794f94
 have-watchdog: true
>  stonith-watchdog-timeout: 1201
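
The "signature not being updated" step above was checked by eye from the `ls -l` timestamps. The same comparison could be scripted roughly as follows. This is only a sketch: the function name `signature_is_stale` is hypothetical, the modification-time comparison is a heuristic rather than the way pacemaker itself validates the file, and the demo recreates the 12:26 and 12:29 timestamps in a scratch directory instead of reading the real CIB directory.

```shell
#!/bin/sh
# Hypothetical check: succeeds (exit 0) when cib.xml has a newer
# modification time than cib.xml.sig. The -nt test is not in POSIX but is
# supported by bash, dash and ksh.
signature_is_stale() {
    [ "$1/cib.xml" -nt "$1/cib.xml.sig" ]
}

# Recreate the timestamps from the transcript in a scratch directory.
scratch=$(mktemp -d)
: > "$scratch/cib.xml"
: > "$scratch/cib.xml.sig"
touch -t 202302171226 "$scratch/cib.xml.sig"   # signature: Feb 17 12:26
touch -t 202302171229 "$scratch/cib.xml"       # cib.xml:   Feb 17 12:29

if signature_is_stale "$scratch"; then
    echo "stale"
else
    echo "in sync"
fi
```

With the timestamps above the script prints "stale". When both files carry the same modification time, as in the running-cluster listing, it prints "in sync".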

Comment 14 errata-xmlrpc 2023-05-09 07:38:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rhel-system-roles bug fix and
enhancement update), and where to find the updated files, follow
the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:2246

