Bug 2022463
| Summary: | Enabling sbd before starting the cluster sets an incorrect `validate-with` value in /var/lib/pacemaker/cib/cib.xml | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Chad Newsom <cnewsom> |
| Component: | pcs | Assignee: | Tomas Jelinek <tojeline> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 8.5 | CC: | cluster-maint, idevat, kmalyjur, mlisik, mmazoure, mpospisi, nhostako, omular, svalasti, tojeline, troy.engel |
| Target Milestone: | rc | Keywords: | EasyFix, Regression, Triaged, ZStream |
| Target Release: | 8.6 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | pcs-0.10.12-3.el8 | Doc Type: | Bug Fix |
| Story Points: | --- | Clone Of: | |
| : | 2040420 2042433 (view as bug list) | Environment: | |
| Last Closed: | 2022-05-10 14:50:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2042433 | | |

Doc Text:

Cause: The user sets up a cluster without starting it, then sets up SBD, and then starts the cluster.
Consequence: Pcs creates a default empty CIB in the Pacemaker 1.x format. Various pcs commands or their options do not work until the CIB is manually upgraded to the Pacemaker 2.x format.
Fix: Make pcs create the empty CIB in the Pacemaker 2.x format.
Result: Pcs works with no need to upgrade the CIB manually.
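The fix described in the Doc Text (create the empty CIB in a Pacemaker 2.x+ format) amounts to stamping a newer schema onto the skeleton document pcs writes. The following is a simplified illustrative sketch, not the actual pcs code; the element layout and the `pacemaker-3.7` default are assumptions for the example.

```python
import xml.etree.ElementTree as ET

def make_empty_cib(schema="pacemaker-3.7"):
    """Build a minimal empty CIB document (illustrative sketch only)."""
    cib = ET.Element("cib", {
        "admin_epoch": "0",
        "epoch": "1",
        "num_updates": "0",
        # The bug: pcs used to write "pacemaker-1.2" here, so later
        # commands (e.g. alert creation) failed schema validation.
        "validate-with": schema,
    })
    conf = ET.SubElement(cib, "configuration")
    for tag in ("crm_config", "nodes", "resources", "constraints"):
        ET.SubElement(conf, tag)
    ET.SubElement(cib, "status")
    return ET.tostring(cib, encoding="unicode")
```

With a 2.x+ `validate-with`, later configuration updates such as `<alerts>` conform to the configured schema without a manual `pcs cluster cib-upgrade`.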
Chad, can you try running 'pcs cluster cib-upgrade' before creating alerts as a workaround and report back? Thanks.

Hi,
Thank you for the suggestion. I tried `pcs cluster cib-upgrade`. Interestingly, it did not update the `validate-with` value, but I was able to create an alert afterwards:
[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:32:44 2021" update-origin="rhel8-node-1.priv" update-client="crmd" update-user="hacluster">
[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
Error: Unable to update cib
Call cib_apply_diff failed (-203): Update does not conform to the configured schema
<cib admin_epoch="0" epoch="4" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:33:00 2021" update-origin="rhel8-node-1.priv" update-client="cibadmin" update-user="root">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options"/>
</crm_config>
<nodes>
<node id="1" uname="rhel8-node-1.priv"/>
<node id="2" uname="rhel8-node-2.priv"/>
</nodes>
<resources/>
<constraints/>
<alerts>
<alert id="test_alert" path="/var/lib/pacemaker/alert_file.sh"/>
</alerts>
</configuration>
<status/>
</cib>
[root@rhel8-node-1 ~]#
[root@rhel8-node-1 ~]# pcs cluster cib-upgrade
Cluster CIB has been upgraded to latest version
[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="7" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Mon Dec 13 16:33:06 2021" update-origin="rhel8-node-2.priv" update-client="crmd" update-user="hacluster" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="2">
[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
[root@rhel8-node-1 ~]#
[root@rhel8-node-1 ~]# pcs config | grep Alert
Alerts:
Alert: test_alert (path=/var/lib/pacemaker/alert_file.sh)
Hi,
The customer followed up and confirmed that the behavior is the same in their environment:
- pcs cluster cib-upgrade returns the same message that you have ("Cluster CIB has been upgraded to latest version")
- cib.xml on disk is NOT changed after running the 'cib-upgrade' command
- pcs alert create does succeed if it is run after the pcs cluster cib-upgrade command (so this is a reasonable workaround)
- cib.xml on disk IS changed after the pcs alert create command (validate-with changes to "pacemaker-3.7", admin_epoch increments from 0 to 1, epoch decreases from 7 to 2)
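The before/after comparison above can be done mechanically by parsing the attributes off the `<cib>` line that `grep validate-with /var/lib/pacemaker/cib/cib.xml` prints. A minimal sketch, assuming that input shape; `cib_version_info` is a hypothetical helper, not part of pcs:

```python
import re

def cib_version_info(cib_line):
    """Extract the version-related attributes from a serialized <cib> tag."""
    attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', cib_line))
    return {
        "validate-with": attrs.get("validate-with"),
        "admin_epoch": int(attrs.get("admin_epoch", "0")),
        "epoch": int(attrs.get("epoch", "0")),
    }
```

Feeding it the grep output from before and after `pcs alert create` shows exactly the changes reported: `validate-with` moving to "pacemaker-3.7" and `admin_epoch` incrementing from 0 to 1.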
Upstream fix + tests: https://github.com/ClusterLabs/pcs/commit/9d6375aef6c05763269494623e62c769c189399f

Reproducer / test in comment 0.

DevTestResults:
[root@r8-node-01 pcs]# rpm -q pcs
pcs-0.10.12-3.el8.x86_64
[root@r8-node-01 pcs]# pcs host auth -u hacluster -p $PASSWORD r8-node-0{1..3}
r8-node-03: Authorized
r8-node-01: Authorized
r8-node-02: Authorized
[root@r8-node-01 pcs]# pcs cluster setup HACluster r8-node-0{1..3}
No addresses specified for host 'r8-node-01', using 'r8-node-01'
No addresses specified for host 'r8-node-02', using 'r8-node-02'
No addresses specified for host 'r8-node-03', using 'r8-node-03'
Destroying cluster on hosts: 'r8-node-01', 'r8-node-02', 'r8-node-03'...
r8-node-01: Successfully destroyed cluster
r8-node-03: Successfully destroyed cluster
r8-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r8-node-01', 'r8-node-02', 'r8-node-03'
r8-node-01: successful removal of the file 'pcsd settings'
r8-node-03: successful removal of the file 'pcsd settings'
r8-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r8-node-01', 'r8-node-02', 'r8-node-03'
r8-node-01: successful distribution of the file 'corosync authkey'
r8-node-01: successful distribution of the file 'pacemaker authkey'
r8-node-03: successful distribution of the file 'corosync authkey'
r8-node-03: successful distribution of the file 'pacemaker authkey'
r8-node-02: successful distribution of the file 'corosync authkey'
r8-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r8-node-01', 'r8-node-02', 'r8-node-03'
r8-node-01: successful distribution of the file 'corosync.conf'
r8-node-03: successful distribution of the file 'corosync.conf'
r8-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
[root@r8-node-01 pcs]# ls -l /var/lib/pacemaker/cib/
total 0
[root@r8-node-01 pcs]# pcs stonith sbd device setup device=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
WARNING: All current content on device(s) '/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063' will be overwritten. Are you sure you want to continue? [y/N] y
Initializing device '/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063'...
Device initialized successfully
[root@r8-node-01 pcs]# pcs stonith sbd enable device=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
Running SBD pre-enabling checks...
r8-node-01: SBD pre-enabling checks done
r8-node-03: SBD pre-enabling checks done
r8-node-02: SBD pre-enabling checks done
Distributing SBD config...
r8-node-01: SBD config saved
r8-node-03: SBD config saved
r8-node-02: SBD config saved
Enabling sbd...
r8-node-03: sbd enabled
r8-node-01: sbd enabled
r8-node-02: sbd enabled
Warning: Cluster restart is required in order to apply these changes.
[root@r8-node-01 pcs]# ls -l /var/lib/pacemaker/cib/
total 16
-rw-r--r--. 1 hacluster haclient 278 Jan 14 13:20 cib-0.raw
-rw-r-----. 1 hacluster haclient 1 Jan 14 13:20 cib.last
-rw-------. 1 hacluster haclient 307 Jan 14 13:20 cib.xml
-rw-------. 1 hacluster haclient 32 Jan 14 13:20 cib.xml.sig
[root@r8-node-01 pcs]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="2" num_updates="0" validate-with="pacemaker-3.1" cib-last-written="Fri Jan 14 13:20:09 2022">
[root@r8-node-01 pcs]# pcs cluster start --all
r8-node-03: Starting Cluster...
r8-node-02: Starting Cluster...
r8-node-01: Starting Cluster...
[root@r8-node-01 pcs]# pcs cluster cib | grep validate-with
<cib admin_epoch="0" epoch="7" num_updates="5" validate-with="pacemaker-3.1" cib-last-written="Fri Jan 14 13:22:08 2022" update-origin="r8-node-02" update-client="crmd" update-user="hacluster" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="2">
[root@r8-node-01 pcs]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
[root@r8-node-01 pcs]# echo $?
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pcs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1978
Description of problem:

If a cluster is created and then sbd is enabled before starting the cluster, `validate-with` is set to `pacemaker-1.2`. This causes issues creating pacemaker alerts. Alternatively, if the cluster is started *before* sbd is enabled, `validate-with` is set to `pacemaker-3.7`.

Version-Release number of selected component (if applicable):
pcs-0.10.10-4.el8.x86_64

How reproducible:

Steps to Reproduce:

- Set up cluster:

[root@rhel8-node-1 ~]# pcs cluster setup rhel8-cluster-p1 rhel8-node-1.clust rhel8-node-2.clust
No addresses specified for host 'rhel8-node-1.clust', using 'rhel8-node-1.clust'
No addresses specified for host 'rhel8-node-2.clust', using 'rhel8-node-2.clust'
Destroying cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...
rhel8-node-2.clust: Successfully destroyed cluster
rhel8-node-1.clust: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful removal of the file 'pcsd settings'
rhel8-node-2.clust: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync authkey'
rhel8-node-1.clust: successful distribution of the file 'pacemaker authkey'
rhel8-node-2.clust: successful distribution of the file 'corosync authkey'
rhel8-node-2.clust: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync.conf'
rhel8-node-2.clust: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 0

- Set up sbd:

[root@rhel8-node-1 ~]# sbd -d /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187 create
Initializing device /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187
Creating version 2.1 header on device 3 (uuid: cbd6599b-48e7-4bd5-ad40-ce5f64ead0eb)
Initializing 255 slots on device 3
Device /dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187 is initialized.
Did you check sbd service down on all nodes before? If not do so now and restart afterwards.
[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 0
[root@rhel8-node-1 ~]# pcs stonith sbd enable device=/dev/disk/by-id/scsi-36001405fc9745aec6a1408baa65f8187
Running SBD pre-enabling checks...
rhel8-node-2.clust: SBD pre-enabling checks done
rhel8-node-1.clust: SBD pre-enabling checks done
Distributing SBD config...
rhel8-node-1.clust: SBD config saved
rhel8-node-2.clust: SBD config saved
Enabling sbd...
rhel8-node-2.clust: sbd enabled
rhel8-node-1.clust: sbd enabled
Warning: Cluster restart is required in order to apply these changes.
[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 16
-rw-r--r-- 1 hacluster haclient 239 Nov 11 11:58 cib-0.raw
-rw-r----- 1 hacluster haclient   1 Nov 11 11:58 cib.last
-rw------- 1 hacluster haclient 307 Nov 11 11:58 cib.xml
-rw------- 1 hacluster haclient  32 Nov 11 11:58 cib.xml.sig

- `validate-with` is set to `pacemaker-1.2`:

[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="2" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 11:58:48 2021">

- Start cluster:

[root@rhel8-node-1 ~]# pcs cluster start --all
rhel8-node-1.clust: Starting Cluster...
rhel8-node-2.clust: Starting Cluster...
[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 12:01:48 2021" update-origin="rhel8-node-2.clust" update-client="crmd" update-user="hacluster">

- Alert creation fails:

[root@rhel8-node-1 ~]# pcs alert create path=/var/lib/pacemaker/alert_file.sh id=test_alert
Error: Unable to update cib
Call cib_apply_diff failed (-203): Update does not conform to the configured schema
<cib admin_epoch="0" epoch="8" num_updates="0" validate-with="pacemaker-1.2" cib-last-written="Thu Nov 11 12:03:26 2021" update-origin="rhel8-node-1.clust" update-client="cibadmin" update-user="root" crm_feature_set="3.11.0" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="true"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.0-8.el8-7c3f660707"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="rhel8-cluster-p1"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="rhel8-node-1.clust"/>
      <node id="2" uname="rhel8-node-2.clust"/>
    </nodes>
    <resources/>
    <constraints/>
    <alerts>
      <alert id="test_alert" path="/var/lib/pacemaker/alert_file.sh"/>
    </alerts>
  </configuration>
  <status>
    <node_state id="1" uname="rhel8-node-1.clust" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
    </node_state>
    <node_state id="2" uname="rhel8-node-2.clust" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="2">
        <lrm_resources/>
      </lrm>
    </node_state>
  </status>
</cib>

- Starting the cluster prior to sbd setup sets `validate-with` to `pacemaker-3.7`, which
allows alerts to be created without issue:

[root@rhel8-node-1 ~]# pcs cluster setup rhel8-cluster-p1 --start rhel8-node-1.clust rhel8-node-2.clust
No addresses specified for host 'rhel8-node-1.clust', using 'rhel8-node-1.clust'
No addresses specified for host 'rhel8-node-2.clust', using 'rhel8-node-2.clust'
Destroying cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...
rhel8-node-2.clust: Successfully destroyed cluster
rhel8-node-1.clust: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful removal of the file 'pcsd settings'
rhel8-node-2.clust: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync authkey'
rhel8-node-1.clust: successful distribution of the file 'pacemaker authkey'
rhel8-node-2.clust: successful distribution of the file 'corosync authkey'
rhel8-node-2.clust: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'rhel8-node-1.clust', 'rhel8-node-2.clust'
rhel8-node-1.clust: successful distribution of the file 'corosync.conf'
rhel8-node-2.clust: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'rhel8-node-1.clust', 'rhel8-node-2.clust'...
- cib is created and "validate-with" is 3.7:

[root@rhel8-node-1 ~]# ll /var/lib/pacemaker/cib/
total 20
-rw------- 1 hacluster haclient 258 Nov 11 10:56 cib-1.raw
-rw------- 1 hacluster haclient  32 Nov 11 10:56 cib-1.raw.sig
-rw-r----- 1 hacluster haclient   1 Nov 11 10:56 cib.last
-rw------- 1 hacluster haclient 446 Nov 11 10:56 cib.xml
-rw------- 1 hacluster haclient  32 Nov 11 10:56 cib.xml.sig
[root@rhel8-node-1 ~]# grep validate-with /var/lib/pacemaker/cib/cib.xml
<cib crm_feature_set="3.11.0" validate-with="pacemaker-3.7" epoch="5" num_updates="0" admin_epoch="0" cib-last-written="Thu Nov 11 10:57:05 2021" update-origin="rhel8-node-2.clust" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">

Actual results:
"validate-with" is only set to a proper value if the cluster is started *before* enabling sbd.

Expected results:
"validate-with" should be 3.7 or a value that allows alerts to be created.

Additional info:
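The expected behavior above boils down to a schema-version comparison on the `validate-with` string. A minimal sketch of that check (the `supports_alerts` helper and the (2, 5) minimum are illustrative assumptions for this example, not an API that pcs or Pacemaker exposes):

```python
def schema_version(validate_with):
    """Parse 'pacemaker-X.Y' into a comparable (X, Y) tuple."""
    major, minor = validate_with.split("-", 1)[1].split(".")
    return (int(major), int(minor))

def supports_alerts(validate_with, minimum=(2, 5)):
    # <alerts> requires a 2.x-era schema; the exact floor used here
    # is an assumption for illustration.
    return schema_version(validate_with) >= minimum
```

Under this check, the `pacemaker-1.2` CIB written in the reproducer fails, while the `pacemaker-3.7` CIB from the start-first ordering passes, matching the observed alert-creation behavior.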