Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 2040420

Summary: Enabling sbd before starting the cluster sets an incorrect `validate-with` value in /var/lib/pacemaker/cib/cib.xml
Product: Red Hat Enterprise Linux 9
Reporter: Tomas Jelinek <tojeline>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 9.0
CC: cluster-maint, cluster-qe, cnewsom, idevat, kmalyjur, mlisik, mmazoure, mpospisi, nhostako, omular, svalasti, tojeline, troy.engel
Target Milestone: rc
Keywords: EasyFix, Regression, Triaged
Target Release: 9.0
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: pcs-0.11.1-8.el9
Doc Type: Bug Fix
Doc Text:
Cause: A user sets up a cluster without starting it, then sets up SBD, and then starts the cluster. Consequence: Pcs creates a default empty CIB in the Pacemaker 1.x format. Various pcs commands or their options do not work until the CIB is manually upgraded to the Pacemaker 2.x format. Fix: Make pcs create the empty CIB in the Pacemaker 2.x format. Result: Pcs works with no need to upgrade the CIB manually.
Story Points: ---
Clone Of: 2022463
Environment:
Last Closed: 2022-05-17 12:19:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Tomas Jelinek 2022-01-13 16:46:07 UTC
+++ This bug was initially created as a clone of Bug #2022463 +++

Description of problem:

If a cluster is created and then sbd is enabled before starting the cluster, `validate-with` is set to `pacemaker-1.2`. This prevents pacemaker alerts from being created. By contrast, if the cluster is started *before* sbd is enabled, `validate-with` is set to `pacemaker-3.7`.
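The stale schema can be spotted directly in the on-disk CIB. A minimal sketch of such a check (the sample file and path below are illustrative, not a real cluster's CIB; the `pacemaker-1.2` value is the one observed in this bug):

```shell
# Illustrative only: write a sample CIB header like the one this bug produces.
cat > /tmp/cib-sample.xml <<'EOF'
<cib admin_epoch="0" epoch="2" num_updates="0" validate-with="pacemaker-1.2"/>
EOF

# Extract the schema version from the validate-with attribute.
schema=$(sed -n 's/.*validate-with="pacemaker-\([0-9.]*\)".*/\1/p' /tmp/cib-sample.xml)
echo "CIB schema: pacemaker-$schema"

# Flag 0.x/1.x schemas, which predate newer CIB features such as alerts.
case "$schema" in
  0.*|1.*) echo "schema predates pacemaker-2.0; 'pcs cluster cib-upgrade' needed" ;;
esac
```

Running this against a freshly created (but not yet started) cluster with sbd enabled reproduces the outdated-schema warning.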


Version-Release number of selected component (if applicable):

pcs-0.10.10-4.el8.x86_64

How reproducible:


Steps to Reproduce:


 - Set up cluster:

[root@rhel8-node-1 ~]# pcs cluster setup rhel8-cluster-p1 rhel8-node-1.clust rhel8-node-2.clust
No addresses specified for host 'rhel8-node-1.clust', using 'rhel8-node-1.clust'
No addresses specified for host 'rhel8-node-2.clust', using 'rhel8-node-2.clust'
Destroying cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...
rhel8-node-2.clust: Successfully destroyed cluster
rhel8-node-1.clust: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful removal of the file 'pcsd settings'
rhel8-node-2.clust: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync authkey'
rhel8-node-1.clust: successful distribution of the file 'pacemaker authkey'
rhel8-node-2.clust: successful distribution of the file 'corosync authkey'
rhel8-node-2.clust: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync.conf'
rhel8-node-2.clust: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 0

 - Set up sbd:

[root@rhel8-node-1 ~]# sbd -d /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187 create
Initializing device /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187
Creating version 2.1 header on device 3 (uuid: cbd6599b-48e7-4bd5-ad40-ce5f64ead0eb)
Initializing 255 slots on device 3
Device /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187 is initialized.
Did you check sbd service down on all nodes before? If not do so now and restart afterwards.

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 0

[root@rhel8-node-1 ~]# pcs stonith sbd enable device=/dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187
Running SBD pre-enabling checks...
rhel8-node-2.clust: SBD pre-enabling checks done
rhel8-node-1.clust: SBD pre-enabling checks done
Distributing SBD config...
rhel8-node-1.clust: SBD config saved
rhel8-node-2.clust: SBD config saved
Enabling sbd...
rhel8-node-2.clust: sbd enabled
rhel8-node-1.clust: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 16
-rw-r--r-- 1 hacluster haclient 239 Nov 11 11:58 cib-0.raw
-rw-r----- 1 hacluster haclient   1 Nov 11 11:58 cib.last
-rw------- 1 hacluster haclient 307 Nov 11 11:58 cib.xml
-rw------- 1 hacluster haclient  32 Nov 11 11:58 cib.xml.sig


 - `validate-with` is set to `pacemaker-1.2`

[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="2" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 11:58:48 2021">

 - Start cluster:

[root@rhel8-node-1 ~]# pcs cluster start --all
rhel8-node-1.clust: Starting Cluster...
rhel8-node-2.clust: Starting Cluster...

[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 12:01:48 2021" update-origin="rhel8-node-2.clust" update-client="crmd" update-user="hacluster">


 - Alert creation fails:


[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
Error: Unable to update cib
Call cib_apply_diff failed (-203): Update does not conform to the configured schema

<cib admin_epoch="0" epoch="8" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 12:03:26 2021" update-origin="rhel8-node-1.clust" update-client="cibadmin" update-user="root" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="true"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.0-8.el8-7c3f660707"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="rhel8-cluster-p1"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="rhel8-node-1.clust"/>
      <node id="2" uname="rhel8-node-2.clust"/>
    </nodes>
    <resources/>
    <constraints/>
    <alerts>
      <alert id="test_alert" path="/var/lib/pacemaker/alert_file.sh"/>
    </alerts>
  </configuration>
  <status>
    <node_state id="1" uname="rhel8-node-1.clust" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
    </node_state>
    <node_state id="2" uname="rhel8-node-2.clust" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="2">
        <lrm_resources/>
      </lrm>
    </node_state>
  </status>
</cib>





 - Starting the cluster prior to sbd setup sets `validate-with` to `pacemaker-3.7`, which allows alerts to be created without issue:

[root@rhel8-node-1 ~]# pcs cluster setup rhel8-cluster-p1 --start rhel8-node-1.clust rhel8-node-2.clust
No addresses specified for host 'rhel8-node-1.clust', using 'rhel8-node-1.clust'
No addresses specified for host 'rhel8-node-2.clust', using 'rhel8-node-2.clust'
Destroying cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...
rhel8-node-2.clust: Successfully destroyed cluster
rhel8-node-1.clust: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful removal of the file 'pcsd settings'
rhel8-node-2.clust: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync authkey'
rhel8-node-1.clust: successful distribution of the file 'pacemaker authkey'
rhel8-node-2.clust: successful distribution of the file 'corosync authkey'
rhel8-node-2.clust: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync.conf'
rhel8-node-2.clust: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...


 - cib is created and "validate-with" is 3.7:

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 20
-rw------- 1 hacluster haclient 258 Nov 11 10:56 cib-1.raw
-rw------- 1 hacluster haclient  32 Nov 11 10:56 cib-1.raw.sig
-rw-r----- 1 hacluster haclient   1 Nov 11 10:56 cib.last
-rw------- 1 hacluster haclient 446 Nov 11 10:56 cib.xml
-rw------- 1 hacluster haclient  32 Nov 11 10:56 cib.xml.sig


[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib crm_feature_set="3.11.0" validate-with="pacemaker-3.7" epoch="5" num_updates="0" admin_epoch="0" cib-last-written="Thu Nov 11 10:57:05 2021" update-origin="rhel8-node-2.clust" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">



Actual results:

"validate-with" is set to a proper value only if the cluster is started *before* enabling sbd.

Expected results:

"validate-with" should be "pacemaker-3.7" or another value that allows alerts to be created.

Additional info:

--- Additional comment from Tomas Jelinek on 2021-11-12 09:41:28 CET ---

Chad, can you try running 'pcs cluster cib-upgrade' before creating alerts as a workaround and report back? Thanks.

--- Additional comment from Chad Newsom on 2021-12-13 22:39:08 CET ---

Hi,

Thank you for the suggestion. I tried `pcs cluster cib-upgrade`. Interestingly, it did not update the `validate-with` value, but I was able to create an alert afterwards:


[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:32:44 2021" update-origin="rhel8-node-1.priv" update-client="crmd" update-user="hacluster">
[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
Error: Unable to update cib
Call cib_apply_diff failed (-203): Update does not conform to the configured schema

<cib admin_epoch="0" epoch="4" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:33:00 2021" update-origin="rhel8-node-1.priv" update-client="cibadmin" update-user="root">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options"/>
    </crm_config>
    <nodes>
      <node id="1" uname="rhel8-node-1.priv"/>
      <node id="2" uname="rhel8-node-2.priv"/>
    </nodes>
    <resources/>
    <constraints/>
    <alerts>
      <alert id="test_alert" path="/var/lib/pacemaker/alert_file.sh"/>
    </alerts>
  </configuration>
  <status/>
</cib>

[root@rhel8-node-1 ~]# 
[root@rhel8-node-1 ~]# pcs cluster cib-upgrade
Cluster CIB has been upgraded to latest version
[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="7" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:33:06 2021" update-origin="rhel8-node-2.priv" update-client="crmd" update-user="hacluster" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="2">
[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
[root@rhel8-node-1 ~]# 
[root@rhel8-node-1 ~]# pcs config | grep Alert
Alerts:
 Alert: test_alert (path=/var/lib/pacemaker/alert_file.sh)

--- Additional comment from Chad Newsom on 2021-12-21 16:46:46 CET ---

Hi,

The customer followed up and confirmed that the behavior is the same in their environment:

 - pcs cluster cib-upgrade returns the same message that you have ("Cluster CIB has been upgraded to latest version")

 - cib.xml on disk is NOT changed after running the 'cib-upgrade' command

 - pcs alert create does succeed if it is run after the pcs cluster cib-upgrade command (so this is a reasonable workaround)

 - cib.xml on disk IS changed after the pcs alert create command (validate-with changes to "pacemaker-3.7", admin_epoch increments from 0 to 1, epoch decreases from 7 to 2)
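The observations above suggest the workaround is only needed when the on-disk schema predates the one that introduced alerts. A hedged sketch of that version comparison (the `pacemaker-2.5` threshold is an assumption about when `<alerts>` entered the CIB schema; the `pacemaker-1.2` value is the one seen in this bug):

```shell
current="pacemaker-1.2"   # value observed in cib.xml in this bug
required="pacemaker-2.5"  # assumption: first schema with <alerts> support

# sort -V orders version strings numerically; if the current schema sorts
# first and differs from the required one, an upgrade is needed first.
oldest=$(printf '%s\n%s\n' "$current" "$required" | sort -V | head -n 1)
if [ "$oldest" = "$current" ] && [ "$current" != "$required" ]; then
  echo "run 'pcs cluster cib-upgrade' before 'pcs alert create'"
fi
```

With `current` set to `pacemaker-3.7`, the check passes silently and no upgrade is suggested.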

Comment 2 Tomas Jelinek 2022-01-14 09:29:57 UTC
Upstream fix + tests:
https://github.com/ClusterLabs/pcs/commit/2b9562296e4f42d31544459d44097c2145005b67

Reproducer / test in comment 0
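Conceptually, the fix changes the template pcs uses for the empty CIB it writes before the cluster first starts. A sketch of what such a template looks like (the exact XML the upstream commit writes may differ; `validate-with="pacemaker-3.1"` matches the verified build in the DevTestResults below):

```shell
# Illustrative empty-CIB template in a current (Pacemaker 2.x-era) schema.
cat > /tmp/empty-cib.xml <<'EOF'
<cib admin_epoch="0" epoch="0" num_updates="0" validate-with="pacemaker-3.1">
  <configuration>
    <crm_config/>
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>
EOF

# Confirm the schema declared by the template.
grep -o 'validate-with="[^"]*"' /tmp/empty-cib.xml
```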

Comment 3 Miroslav Lisik 2022-01-14 17:47:15 UTC
DevTestResults:

[root@r90-node-01 ~]# rpm -q pcs
pcs-0.11.1-8.el9.x86_64

[root@r90-node-01 ~]# pcs host auth -u hacluster -p password r90-node-0{1..2}
r90-node-01: Authorized
r90-node-02: Authorized
[root@r90-node-01 ~]# pcs cluster setup HACLuster r90-node-0{1..2}
No addresses specified for host 'r90-node-01', using 'r90-node-01'
No addresses specified for host 'r90-node-02', using 'r90-node-02'
Destroying cluster on hosts: 'r90-node-01', 'r90-node-02'...
r90-node-01: Successfully destroyed cluster
r90-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r90-node-01', 'r90-node-02'
r90-node-01: successful removal of the file 'pcsd settings'
r90-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r90-node-01', 'r90-node-02'
r90-node-01: successful distribution of the file 'corosync authkey'
r90-node-01: successful distribution of the file 'pacemaker authkey'
r90-node-02: successful distribution of the file 'corosync authkey'
r90-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r90-node-01', 'r90-node-02'
r90-node-01: successful distribution of the file 'corosync.conf'
r90-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
[root@r90-node-01 ~]# ls -l /var/lib/pacemaker/cib/
total 0
[root@r90-node-01 ~]# pcs stonith sbd enable
Running SBD pre-enabling checks...
r90-node-01: SBD pre-enabling checks done
r90-node-02: SBD pre-enabling checks done
Warning: auto_tie_breaker quorum option will be enabled to make SBD fencing effective. Cluster has to be offline to be able to make this change.
Checking corosync is not running on nodes...
r90-node-02: corosync is not running
r90-node-01: corosync is not running
Sending updated corosync.conf to nodes...
r90-node-01: Succeeded
r90-node-02: Succeeded
Distributing SBD config...
r90-node-01: SBD config saved
r90-node-02: SBD config saved
Enabling sbd...
r90-node-01: sbd enabled
r90-node-02: sbd enabled
Warning: Cluster restart is required in order to apply these changes.
[root@r90-node-01 ~]# ls -l /var/lib/pacemaker/cib/
total 16
-rw-r--r--. 1 hacluster haclient 278 Jan 14 15:51 cib-0.raw
-rw-r-----. 1 hacluster haclient   1 Jan 14 15:51 cib.last
-rw-------. 1 hacluster haclient 307 Jan 14 15:51 cib.xml
-rw-------. 1 hacluster haclient  32 Jan 14 15:51 cib.xml.sig
[root@r90-node-01 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="2" num_updates="0" validate-with="pacemaker-3.1" cib-last-written="Fri Jan 14 15:51:43 2022">
[root@r90-node-01 ~]# pcs cluster start --all
r90-node-01: Starting Cluster...
r90-node-02: Starting Cluster...
[root@r90-node-01 ~]# pcs cluster cib | grep validate-with
<cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-3.1" cib-last-written="Fri Jan 14 15:52:18 2022" update-origin="r90-node-01" update-client="crmd" update-user="hacluster">
[root@r90-node-01 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
[root@r90-node-01 ~]# echo $?
0

Comment 10 errata-xmlrpc 2022-05-17 12:19:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: pcs), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2290