Bug 2022463 - Enabling sbd before starting the cluster sets an incorrect `validate-with` value in /var/lib/pacemaker/cib/cib.xml
Summary: Enabling sbd before starting the cluster sets an incorrect `validate-with` value in /var/lib/pacemaker/cib/cib.xml
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pcs
Version: 8.5
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 8.6
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 2042433
 
Reported: 2021-11-11 17:16 UTC by Chad Newsom
Modified: 2022-05-10 15:24 UTC
CC List: 11 users

Fixed In Version: pcs-0.10.12-3.el8
Doc Type: Bug Fix
Doc Text:
Cause: The user sets up a cluster without starting it, then sets up SBD, and then starts the cluster.
Consequence: Pcs creates a default empty CIB in the Pacemaker 1.x format. Various pcs commands or their options do not work until the CIB is manually upgraded to the Pacemaker 2.x format.
Fix: Make pcs create the empty CIB in the Pacemaker 2.x format.
Result: Pcs works with no need to upgrade the CIB manually.
Clone Of:
Clones: 2040420 2042433
Environment:
Last Closed: 2022-05-10 14:50:48 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-102549 0 None None None 2021-11-11 17:21:23 UTC
Red Hat Knowledge Base (Solution) 6805041 0 None None None 2022-03-09 18:39:34 UTC
Red Hat Product Errata RHEA-2022:1978 0 None None None 2022-05-10 14:51:23 UTC

Description Chad Newsom 2021-11-11 17:16:57 UTC
Description of problem:

If a cluster is created and sbd is then enabled before the cluster is started, `validate-with` in /var/lib/pacemaker/cib/cib.xml is set to `pacemaker-1.2`. This causes failures when creating Pacemaker alerts. If the cluster is instead started *before* sbd is enabled, `validate-with` is set to `pacemaker-3.7`.
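
Condensed, the failing sequence looks like this (a sketch assembled from the transcript below; the node names and SBD device path are the ones from that transcript, so adjust them for your environment):

# Sketch of the failing sequence; names and paths are taken from the
# transcript below.
pcs cluster setup rhel8-cluster-p1 rhel8-node-1.clust rhel8-node-2.clust  # note: no --start
pcs stonith sbd enable device=/dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187
grep validate-with /var/lib/pacemaker/cib/cib.xml   # shows validate-with="pacemaker-1.2"
pcs cluster start --all
pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert     # fails: schema error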


Version-Release number of selected component (if applicable):

pcs-0.10.10-4.el8.x86_64

How reproducible:


Steps to Reproduce:


 - Set up cluster:

[root@rhel8-node-1 ~]# pcs cluster setup rhel8-cluster-p1 rhel8-node-1.clust rhel8-node-2.clust
No addresses specified for host 'rhel8-node-1.clust', using 'rhel8-node-1.clust'
No addresses specified for host 'rhel8-node-2.clust', using 'rhel8-node-2.clust'
Destroying cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...
rhel8-node-2.clust: Successfully destroyed cluster
rhel8-node-1.clust: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful removal of the file 'pcsd settings'
rhel8-node-2.clust: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync authkey'
rhel8-node-1.clust: successful distribution of the file 'pacemaker authkey'
rhel8-node-2.clust: successful distribution of the file 'corosync authkey'
rhel8-node-2.clust: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync.conf'
rhel8-node-2.clust: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 0

 - Set up sbd:

[root@rhel8-node-1 ~]# sbd -d /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187 create
Initializing device /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187
Creating version 2.1 header on device 3 (uuid: cbd6599b-48e7-4bd5-ad40-ce5f64ead0eb)
Initializing 255 slots on device 3
Device /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187 is initialized.
Did you check sbd service down on all nodes before? If not do so now and restart afterwards.

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 0

[root@rhel8-node-1 ~]# pcs stonith sbd enable device=/dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187
Running SBD pre-enabling checks...
rhel8-node-2.clust: SBD pre-enabling checks done
rhel8-node-1.clust: SBD pre-enabling checks done
Distributing SBD config...
rhel8-node-1.clust: SBD config saved
rhel8-node-2.clust: SBD config saved
Enabling sbd...
rhel8-node-2.clust: sbd enabled
rhel8-node-1.clust: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 16
-rw-r--r-- 1 hacluster haclient 239 Nov 11 11:58 cib-0.raw
-rw-r----- 1 hacluster haclient   1 Nov 11 11:58 cib.last
-rw------- 1 hacluster haclient 307 Nov 11 11:58 cib.xml
-rw------- 1 hacluster haclient  32 Nov 11 11:58 cib.xml.sig


 - `validate-with` is set to `pacemaker-1.2`:

[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="2" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 11:58:48 2021">
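
For context, `validate-with` names the RNG schema version that Pacemaker validates the CIB against; the `<alerts>` section is not part of the old `pacemaker-1.2` schema, which is why alert creation fails below. To see which schema versions the installed Pacemaker ships (an aside; the path below is the usual RHEL 8 location and is an assumption, not something shown in this report):

# List the CIB schema versions available to the installed Pacemaker
# (assumed standard location on RHEL 8; not shown in the original report).
ls /usr/share/pacemaker/pacemaker-*.rng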

 - Start cluster:

[root@rhel8-node-1 ~]# pcs cluster start --all
rhel8-node-1.clust: Starting Cluster...
rhel8-node-2.clust: Starting Cluster...

[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 12:01:48 2021" update-origin="rhel8-node-2.clust" update-client="crmd" update-user="hacluster">


 - Alert creation fails:


[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
Error: Unable to update cib
Call cib_apply_diff failed (-203): Update does not conform to the configured schema

<cib admin_epoch="0" epoch="8" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 12:03:26 2021" update-origin="rhel8-node-1.clust" update-client="cibadmin" update-user="root" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="true"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.0-8.el8-7c3f660707"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="rhel8-cluster-p1"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="rhel8-node-1.clust"/>
      <node id="2" uname="rhel8-node-2.clust"/>
    </nodes>
    <resources/>
    <constraints/>
    <alerts>
      <alert id="test_alert" path="/var/lib/pacemaker/alert_file.sh"/>
    </alerts>
  </configuration>
  <status>
    <node_state id="1" uname="rhel8-node-1.clust" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
    </node_state>
    <node_state id="2" uname="rhel8-node-2.clust" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="2">
        <lrm_resources/>
      </lrm>
    </node_state>
  </status>
</cib>
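
The failure can be cross-checked outside of pcs with Pacemaker's own validator (an aside; `crm_verify` ships with Pacemaker but is not used anywhere in the original report):

# Validate the on-disk CIB against the schema named in its validate-with
# attribute (aside, not a step from the original report).
crm_verify --xml-file /var/lib/pacemaker/cib/cib.xml
# or against the running cluster:
crm_verify --live-check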





 - Starting the cluster prior to sbd setup sets `validate-with` to `pacemaker-3.7`, which allows alerts to be created without issue:

[root@rhel8-node-1 ~]# pcs cluster setup rhel8-cluster-p1 --start rhel8-node-1.clust rhel8-node-2.clust
No addresses specified for host 'rhel8-node-1.clust', using 'rhel8-node-1.clust'
No addresses specified for host 'rhel8-node-2.clust', using 'rhel8-node-2.clust'
Destroying cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...
rhel8-node-2.clust: Successfully destroyed cluster
rhel8-node-1.clust: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful removal of the file 'pcsd settings'
rhel8-node-2.clust: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync authkey'
rhel8-node-1.clust: successful distribution of the file 'pacemaker authkey'
rhel8-node-2.clust: successful distribution of the file 'corosync authkey'
rhel8-node-2.clust: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync.conf'
rhel8-node-2.clust: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...


 - The CIB is created and "validate-with" is pacemaker-3.7:

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 20
-rw------- 1 hacluster haclient 258 Nov 11 10:56 cib-1.raw
-rw------- 1 hacluster haclient  32 Nov 11 10:56 cib-1.raw.sig
-rw-r----- 1 hacluster haclient   1 Nov 11 10:56 cib.last
-rw------- 1 hacluster haclient 446 Nov 11 10:56 cib.xml
-rw------- 1 hacluster haclient  32 Nov 11 10:56 cib.xml.sig


[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib crm_feature_set="3.11.0" validate-with="pacemaker-3.7" epoch="5" num_updates="0" admin_epoch="0" cib-last-written="Thu Nov 11 10:57:05 2021" update-origin="rhel8-node-2.clust" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">



Actual results:

"validate-with" is only set to a proper value if the cluster is started *before* enabling sbd

Expected results:

"validate-with" should be 3.7 or a value that allows alerts to be created 

Additional info:
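
A quick check for an affected cluster (an aside added for illustration, not part of the original report):

# Flags a CIB still stamped with a Pacemaker 1.x schema; such a cluster needs
# 'pcs cluster cib-upgrade' (or the fixed pcs build) before alerts can be created.
grep -q 'validate-with="pacemaker-1\.' /var/lib/pacemaker/cib/cib.xml \
  && echo "CIB uses a pacemaker-1.x schema; run: pcs cluster cib-upgrade"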

Comment 1 Tomas Jelinek 2021-11-12 08:41:28 UTC
Chad, can you try running 'pcs cluster cib-upgrade' before creating alerts as a workaround and report back? Thanks.

Comment 2 Chad Newsom 2021-12-13 21:39:08 UTC
Hi,

Thank you for the suggestion. I tried `pcs cluster cib-upgrade`. Interestingly, it did not update the `validate-with` value, but I was able to create an alert afterwards:


[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:32:44 2021" update-origin="rhel8-node-1.priv" update-client="crmd" update-user="hacluster">
[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
Error: Unable to update cib
Call cib_apply_diff failed (-203): Update does not conform to the configured schema

<cib admin_epoch="0" epoch="4" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:33:00 2021" update-origin="rhel8-node-1.priv" update-client="cibadmin" update-user="root">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options"/>
    </crm_config>
    <nodes>
      <node id="1" uname="rhel8-node-1.priv"/>
      <node id="2" uname="rhel8-node-2.priv"/>
    </nodes>
    <resources/>
    <constraints/>
    <alerts>
      <alert id="test_alert" path="/var/lib/pacemaker/alert_file.sh"/>
    </alerts>
  </configuration>
  <status/>
</cib>

[root@rhel8-node-1 ~]# 
[root@rhel8-node-1 ~]# pcs cluster cib-upgrade
Cluster CIB has been upgraded to latest version
[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="7" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:33:06 2021" update-origin="rhel8-node-2.priv" update-client="crmd" update-user="hacluster" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="2">
[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
[root@rhel8-node-1 ~]# 
[root@rhel8-node-1 ~]# pcs config | grep Alert
Alerts:
 Alert: test_alert (path=/var/lib/pacemaker/alert_file.sh)

Comment 3 Chad Newsom 2021-12-21 15:46:46 UTC
Hi,

The customer followed up and confirmed that the behavior is the same in their environment:

 - pcs cluster cib-upgrade returns the same message that you have ("Cluster CIB has been upgraded to latest version")

 - cib.xml on disk is NOT changed after running the 'cib-upgrade' command

 - pcs alert create does succeed if it is run after the pcs cluster cib-upgrade command (so this is a reasonable workaround)

 - cib.xml on disk IS changed after the pcs alert create command (validate-with changes to "pacemaker-3.7", admin_epoch increments from 0 to 1, epoch decreases from 7 to 2)
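
Taken together, the confirmed workaround is (a consolidated sketch of the sequence verified above):

# Workaround confirmed above: upgrade the live CIB, then create the alert.
# The on-disk cib.xml is only rewritten on the next CIB change.
pcs cluster cib-upgrade
pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
grep validate-with /var/lib/pacemaker/cib/cib.xml   # now shows pacemaker-3.7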

Comment 6 Tomas Jelinek 2022-01-14 09:30:35 UTC
Upstream fix + tests:
https://github.com/ClusterLabs/pcs/commit/9d6375aef6c05763269494623e62c769c189399f

Reproducer / test in comment 0
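
For a loose illustration of what the fix changes (this is not the actual pcs code path): Pacemaker's own cibadmin can emit an empty CIB stamped with a current schema, and the fixed pcs similarly writes its default empty CIB in a Pacemaker 2.x format (pacemaker-3.1 in the verification below) instead of pacemaker-1.2.

# Illustration only, not the pcs implementation: cibadmin generates an empty
# CIB carrying a current schema version in validate-with.
cibadmin --empty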

Comment 7 Miroslav Lisik 2022-01-14 14:26:04 UTC
DevTestResults:

[root@r8-node-01 pcs]# rpm -q pcs
pcs-0.10.12-3.el8.x86_64

[root@r8-node-01 pcs]# pcs host auth -u hacluster -p $PASSWORD r8-node-0{1..3}
r8-node-03: Authorized
r8-node-01: Authorized
r8-node-02: Authorized
[root@r8-node-01 pcs]# pcs cluster setup HACluster r8-node-0{1..3}
No addresses specified for host 'r8-node-01', using 'r8-node-01'
No addresses specified for host 'r8-node-02', using 'r8-node-02'
No addresses specified for host 'r8-node-03', using 'r8-node-03'
Destroying cluster on hosts: 'r8-node-01', 'r8-node-02', 'r8-node-03'...
r8-node-01: Successfully destroyed cluster
r8-node-03: Successfully destroyed cluster
r8-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r8-node-01', 'r8-node-02', 'r8-node-03'
r8-node-01: successful removal of the file 'pcsd settings'
r8-node-03: successful removal of the file 'pcsd settings'
r8-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r8-node-01', 'r8-node-02', 'r8-node-03'
r8-node-01: successful distribution of the file 'corosync authkey'
r8-node-01: successful distribution of the file 'pacemaker authkey'
r8-node-03: successful distribution of the file 'corosync authkey'
r8-node-03: successful distribution of the file 'pacemaker authkey'
r8-node-02: successful distribution of the file 'corosync authkey'
r8-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r8-node-01', 'r8-node-02', 'r8-node-03'
r8-node-01: successful distribution of the file 'corosync.conf'
r8-node-03: successful distribution of the file 'corosync.conf'
r8-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
[root@r8-node-01 pcs]# ls -l /var/lib/pacemaker/cib/
total 0

[root@r8-node-01 pcs]# pcs stonith sbd device setup device=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
WARNING: All current content on device(s) '/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063' will be overwritten. Are you sure you want to continue? [y/N] y
Initializing device '/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063'...
Device initialized successfully
[root@r8-node-01 pcs]# pcs stonith sbd enable device=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
Running SBD pre-enabling checks...
r8-node-01: SBD pre-enabling checks done
r8-node-03: SBD pre-enabling checks done
r8-node-02: SBD pre-enabling checks done
Distributing SBD config...
r8-node-01: SBD config saved
r8-node-03: SBD config saved
r8-node-02: SBD config saved
Enabling sbd...
r8-node-03: sbd enabled
r8-node-01: sbd enabled
r8-node-02: sbd enabled
Warning: Cluster restart is required in order to apply these changes.
[root@r8-node-01 pcs]# ls -l /var/lib/pacemaker/cib/
total 16
-rw-r--r--. 1 hacluster haclient 278 Jan 14 13:20 cib-0.raw
-rw-r-----. 1 hacluster haclient   1 Jan 14 13:20 cib.last
-rw-------. 1 hacluster haclient 307 Jan 14 13:20 cib.xml
-rw-------. 1 hacluster haclient  32 Jan 14 13:20 cib.xml.sig
[root@r8-node-01 pcs]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="2" num_updates="0" validate-with="pacemaker-3.1" cib-last-written="Fri Jan 14 13:20:09 2022">
[root@r8-node-01 pcs]# pcs cluster start --all
r8-node-03: Starting Cluster...
r8-node-02: Starting Cluster...
r8-node-01: Starting Cluster...
[root@r8-node-01 pcs]# pcs cluster cib | grep validate-with
<cib admin_epoch="0" epoch="7" num_updates="5" validate-with="pacemaker-3.1" cib-last-written="Fri Jan 14 13:22:08 2022" update-origin="r8-node-02" update-client="crmd" update-user="hacluster" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="2">
[root@r8-node-01 pcs]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
[root@r8-node-01 pcs]# echo $?
0

Comment 15 errata-xmlrpc 2022-05-10 14:50:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1978

