Bug 1843079

Summary: Upgrade CIB if user specifies on-fail=demote
Product: Red Hat Enterprise Linux 8
Reporter: Ken Gaillot <kgaillot>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Docs Contact: Steven J. Levine <slevine>
Priority: high
Version: 8.3
CC: cfeist, cluster-maint, idevat, lmanasko, mlisik, mmazoure, mpospisi, nhostako, omular, tojeline
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.10.6-3.el8
Doc Type: No Doc Update
Last Closed: 2020-11-04 02:28:18 UTC
Type: Bug
Bug Depends On: 1837747
Attachments: proposed fix + tests (flags: none)

Description Ken Gaillot 2020-06-02 17:25:16 UTC
Description of problem: For 8.3 (Bug 1837747), Pacemaker is adding a new on-fail value, "demote", for operations. This required a schema change, so if a user specifies on-fail="demote" while the CIB uses a schema version below 3.4, pcs needs to upgrade the CIB automatically first.


Version-Release number of selected component (if applicable): 8.3
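
The version comparison implied by the description could be sketched roughly as follows. This is only an illustration, not pcs code: the helper names `schema_version` and `needs_upgrade` are hypothetical, and the sketch assumes the `validate-with` attribute always has the form `pacemaker-X.Y`, as it does in the transcripts in this report.

```python
import re
import xml.etree.ElementTree as ET
from typing import Tuple

# Minimum schema version that accepts on-fail="demote", per this report.
ON_FAIL_DEMOTE_MIN_SCHEMA = (3, 4)


def schema_version(cib_xml: str) -> Tuple[int, int]:
    """Return (major, minor) parsed from the validate-with attribute of <cib>.

    Hypothetical helper; assumes the value looks like "pacemaker-3.3".
    """
    root = ET.fromstring(cib_xml)
    value = root.get("validate-with", "")
    match = re.fullmatch(r"pacemaker-(\d+)\.(\d+)", value)
    if match is None:
        raise ValueError(f"unexpected validate-with value: {value!r}")
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(cib_xml: str,
                  required: Tuple[int, int] = ON_FAIL_DEMOTE_MIN_SCHEMA) -> bool:
    """Return True when the CIB schema is older than the required version."""
    # Compare as integer tuples so that 3.10 sorts after 3.4.
    return schema_version(cib_xml) < required
```

Comparing integer tuples rather than strings is a deliberate choice here: a plain string comparison would rank "3.10" below "3.4".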

Comment 5 Tomas Jelinek 2020-07-02 14:33:29 UTC
Created attachment 1699643
proposed fix + tests

Affected commands:
* pcs resource create
* pcs resource update
* pcs resource op add
* pcs cluster node add-remote

When adding a resource operation with the 'on-fail' option set to 'demote', check that the CIB schema version is 3.4 or higher. If it is currently lower, try to upgrade it to version 3.4.
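
The check-then-upgrade flow described above might look something like the sketch below. It is not taken from the attached patch: `required_schema`, `ensure_schema`, and the `upgrade` callable are hypothetical names, operations are modelled as plain dictionaries, and the actual upgrade mechanism is left to the caller.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Version = Tuple[int, int]

# Schema version that introduced on-fail="demote", per this report.
DEMOTE_MIN_SCHEMA: Version = (3, 4)


def required_schema(operations: Iterable[Dict[str, str]]) -> Optional[Version]:
    """Return the minimum schema the operations need, or None if no constraint."""
    for op in operations:
        if op.get("on-fail") == "demote":
            return DEMOTE_MIN_SCHEMA
    return None


def ensure_schema(current: Version,
                  operations: Iterable[Dict[str, str]],
                  upgrade: Callable[[], Version]) -> Version:
    """Upgrade only when an operation needs a newer schema than the current one.

    `upgrade` is a caller-supplied function (hypothetical) that performs the
    CIB upgrade and returns the resulting schema version.
    """
    required = required_schema(operations)
    if required is None or current >= required:
        return current  # nothing to do, leave the CIB untouched
    upgraded = upgrade()
    if upgraded < required:
        raise RuntimeError(
            f"schema {upgraded} is still below required {required}"
        )
    return upgraded
```

For example, calling `ensure_schema((3, 3), [{"name": "monitor", "on-fail": "demote"}], upgrade)` would invoke `upgrade` once, while the same call with `on-fail` set to `restart` would return `(3, 3)` without upgrading.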

Comment 7 Miroslav Lisik 2020-07-17 17:06:34 UTC
Test:

[root@r8-node-01 pcs]# rpm -q pacemaker pcs
pacemaker-2.0.4-3.el8.x86_64
pcs-0.10.6-3.el8.x86_64

Edit the cluster CIB with 'pcs cluster edit' and set validate-with="pacemaker-3.3":

[root@r8-node-01 pcs]# pcs cluster cib | xmllint --xpath /cib/@validate-with -
 validate-with="pacemaker-3.3"
[root@r8-node-01 pcs]# pcs resource create TestStateful ocf:pacemaker:Stateful promotable
[root@r8-node-01 pcs]# pcs resource config
 Clone: TestStateful-clone
  Meta Attrs: promotable=true
  Resource: TestStateful (class=ocf provider=pacemaker type=Stateful)
   Operations: demote interval=0s timeout=10s (TestStateful-demote-interval-0s)
               monitor interval=10s role=Master timeout=20s (TestStateful-monitor-interval-10s)
               monitor interval=11s role=Slave timeout=20s (TestStateful-monitor-interval-11s)
               notify interval=0s timeout=5s (TestStateful-notify-interval-0s)
               promote interval=0s timeout=10s (TestStateful-promote-interval-0s)
               start interval=0s timeout=20s (TestStateful-start-interval-0s)
               stop interval=0s timeout=20s (TestStateful-stop-interval-0s)

[root@r8-node-01 pcs]# pcs resource update TestStateful op monitor interval=10s role=Master timeout=20s on-fail=demote
Cluster CIB has been upgraded to latest version
[root@r8-node-01 pcs]# echo $?
0
[root@r8-node-01 pcs]# pcs cluster cib | xmllint --xpath /cib/@validate-with -
 validate-with="pacemaker-3.4"

Comment 10 Nina Hostakova 2020-07-22 15:16:18 UTC
BEFORE_FIX
==========
[root@virt-023 ~]# rpm -q pcs
pcs-0.10.4-6.el8.x86_64

[root@virt-023 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.3.0" validate-with="pacemaker-3.2" epoch="12" num_updates="30" admin_epoch="0" cib-last-written="Wed Jul 22 16:27:33 2020" update-origin="virt-023" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">

[root@virt-023 ~]# pcs resource create promotable_test ocf:pacemaker:Stateful promotable op monitor interval=10 role=Master timeout=20s on-fail=demote
Error: 'demote' is not a valid on-fail value, use 'block', 'fence', 'ignore', 'restart', 'restart-container', 'standby', 'stop'
[root@virt-023 ~]# echo $?
1
[root@virt-023 ~]# pcs resource create promotable_test ocf:pacemaker:Stateful promotable
[root@virt-023 ~]# pcs resource update promotable_test op monitor interval=10 role=Master timeout=20s on-fail=demote
Error: Unable to update cib
...
[root@virt-023 ~]# echo $?
1


AFTER_FIX
=========
[root@virt-158 ~]# rpm -q pcs
pcs-0.10.6-3.el8.x86_64


> It is possible to specify 'on-fail=demote' option for CIB schema version 3.4

[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.4" epoch="12" num_updates="3" admin_epoch="3" cib-last-written="Tue Jul 21 12:32:45 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">


# CREATE RESOURCE

[root@virt-158 ~]# pcs resource create promotable_test ocf:pacemaker:Stateful promotable op monitor interval=10 role=Master timeout=20s on-fail=demote
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs resource config
 Clone: promotable_test-clone
  Meta Attrs: promotable=true
  Resource: promotable_test (class=ocf provider=pacemaker type=Stateful)
   Operations: demote interval=0s timeout=10s (promotable_test-demote-interval-0s)
               monitor interval=10 on-fail=demote role=Master timeout=20s (promotable_test-monitor-interval-10)
               notify interval=0s timeout=5s (promotable_test-notify-interval-0s)
               promote interval=0s timeout=10s (promotable_test-promote-interval-0s)
               start interval=0s timeout=20s (promotable_test-start-interval-0s)
               stop interval=0s timeout=20s (promotable_test-stop-interval-0s)


# UPDATE RESOURCE

[root@virt-158 ~]# pcs resource update promotable_test op monitor interval=10 role=Master timeout=20s on-fail=restart
[root@virt-158 ~]# pcs resource update promotable_test op monitor interval=10 role=Master timeout=20s on-fail=demote
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs resource config
 Clone: promotable_test-clone
  Meta Attrs: promotable=true
  Resource: promotable_test (class=ocf provider=pacemaker type=Stateful)
   Operations: demote interval=0s timeout=10s (promotable_test-demote-interval-0s)
               notify interval=0s timeout=5s (promotable_test-notify-interval-0s)
               promote interval=0s timeout=10s (promotable_test-promote-interval-0s)
               start interval=0s timeout=20s (promotable_test-start-interval-0s)
               stop interval=0s timeout=20s (promotable_test-stop-interval-0s)
               monitor interval=10 on-fail=demote role=Master timeout=20s (promotable_test-monitor-interval-10)


# ADD OPERATION

[root@virt-158 ~]# pcs resource op delete promotable_test monitor
[root@virt-158 ~]# pcs resource op add promotable_test monitor interval=10s role=Master timeout=20s on-fail=demote
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs resource config
 Clone: promotable_test-clone
  Meta Attrs: promotable=true
  Resource: promotable_test (class=ocf provider=pacemaker type=Stateful)
   Operations: demote interval=0s timeout=10s (promotable_test-demote-interval-0s)
               notify interval=0s timeout=5s (promotable_test-notify-interval-0s)
               promote interval=0s timeout=10s (promotable_test-promote-interval-0s)
               start interval=0s timeout=20s (promotable_test-start-interval-0s)
               stop interval=0s timeout=20s (promotable_test-stop-interval-0s)
               monitor interval=10s on-fail=demote role=Master timeout=20s (promotable_test-monitor-interval-10s)


# ADD REMOTE NODE

[root@virt-158 ~]# pcs cluster node add-remote virt-160 op monitor interval=10 role=Master timeout=20 on-fail=demote
No addresses specified for host 'virt-160', using 'virt-160'
Sending 'pacemaker authkey' to 'virt-160'
virt-160: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-160'
virt-160: successful run of 'pacemaker_remote enable'
virt-160: successful run of 'pacemaker_remote start'
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs resource config
 Resource: virt-160 (class=ocf provider=pacemaker type=remote)
  Attributes: server=virt-160
  Operations: migrate_from interval=0s timeout=60s (virt-160-migrate_from-interval-0s)
              migrate_to interval=0s timeout=60s (virt-160-migrate_to-interval-0s)
              monitor interval=10 on-fail=demote role=Master timeout=20 (virt-160-monitor-interval-10)
              reload interval=0s timeout=60s (virt-160-reload-interval-0s)
              start interval=0s timeout=60s (virt-160-start-interval-0s)
              stop interval=0s timeout=60s (virt-160-stop-interval-0s)


> Edit version to 3.3 using 'pcs cluster edit' and check if it upgrades automatically

# The CIB cannot be edited down to a version below 3.4 while an operation still uses 'on-fail=demote'
[root@virt-158 ~]# pcs cluster edit
Error: unable to push cib
....

[root@virt-158 ~]# pcs cluster node remove-remote virt-160
Requesting 'pacemaker_remote disable', 'pacemaker_remote stop' on 'virt-160'
virt-160: successful run of 'pacemaker_remote disable'
virt-160: successful run of 'pacemaker_remote stop'
Requesting remove 'pacemaker authkey' from 'virt-160'
virt-160: successful removal of the file 'pacemaker authkey'
Deleting Resource - virt-160

[root@virt-158 ~]# pcs cluster edit
CIB updated
[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.3" epoch="15" num_updates="53" admin_epoch="0" cib-last-written="Mon Jul 20 18:50:29 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">


# CREATE RESOURCE

[root@virt-158 ~]# pcs resource create promotable_test ocf:pacemaker:Stateful promotable op monitor interval=10 role=Master timeout=20s on-fail=demote
CIB has been upgraded to the latest schema version.
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.4" epoch="2" num_updates="20" admin_epoch="4" cib-last-written="Tue Jul 21 14:31:26 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@virt-158 ~]# pcs resource config
 Clone: promotable_test-clone
  Meta Attrs: promotable=true
  Resource: promotable_test (class=ocf provider=pacemaker type=Stateful)
   Operations: demote interval=0s timeout=10s (promotable_test-demote-interval-0s)
               monitor interval=10 on-fail=demote role=Master timeout=20s (promotable_test-monitor-interval-10)
               notify interval=0s timeout=5s (promotable_test-notify-interval-0s)
               promote interval=0s timeout=10s (promotable_test-promote-interval-0s)
               start interval=0s timeout=20s (promotable_test-start-interval-0s)
               stop interval=0s timeout=20s (promotable_test-stop-interval-0s)


# UPDATE RESOURCE

[root@virt-158 ~]# pcs resource update promotable_test op monitor interval=10 role=Master timeout=20s on-fail=restart
[root@virt-158 ~]# pcs cluster edit
CIB updated
[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.3" epoch="3" num_updates="21" admin_epoch="4" cib-last-written="Tue Jul 21 14:33:01 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@virt-158 ~]# pcs resource update promotable_test op monitor interval=10 role=Master timeout=20s on-fail=demote
Cluster CIB has been upgraded to latest version
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.4" epoch="2" num_updates="10" admin_epoch="5" cib-last-written="Tue Jul 21 14:34:58 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@virt-158 ~]# pcs resource config
 Clone: promotable_test-clone
  Meta Attrs: promotable=true
  Resource: promotable_test (class=ocf provider=pacemaker type=Stateful)
   Operations: demote interval=0s timeout=10s (promotable_test-demote-interval-0s)
               notify interval=0s timeout=5s (promotable_test-notify-interval-0s)
               promote interval=0s timeout=10s (promotable_test-promote-interval-0s)
               start interval=0s timeout=20s (promotable_test-start-interval-0s)
               stop interval=0s timeout=20s (promotable_test-stop-interval-0s)
               monitor interval=10 on-fail=demote role=Master timeout=20s (promotable_test-monitor-interval-10)


# ADD OPERATION

[root@virt-158 ~]# pcs resource op delete promotable_test monitor
[root@virt-158 ~]# pcs cluster edit
CIB updated
[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.3" epoch="3" num_updates="22" admin_epoch="6" cib-last-written="Tue Jul 21 14:43:00 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@virt-158 ~]# pcs resource op add promotable_test monitor interval=10s role=Master timeout=20s on-fail=demote
Cluster CIB has been upgraded to latest version
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.4" epoch="2" num_updates="12" admin_epoch="7" cib-last-written="Tue Jul 21 14:44:08 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@virt-158 ~]# pcs resource config
 Clone: promotable_test-clone
  Meta Attrs: promotable=true
  Resource: promotable_test (class=ocf provider=pacemaker type=Stateful)
   Operations: demote interval=0s timeout=10s (promotable_test-demote-interval-0s)
               notify interval=0s timeout=5s (promotable_test-notify-interval-0s)
               promote interval=0s timeout=10s (promotable_test-promote-interval-0s)
               start interval=0s timeout=20s (promotable_test-start-interval-0s)
               stop interval=0s timeout=20s (promotable_test-stop-interval-0s)
               monitor interval=10s on-fail=demote role=Master timeout=20s (promotable_test-monitor-interval-10s)


# ADD REMOTE NODE

[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.3" epoch="9" num_updates="13" admin_epoch="9" cib-last-written="Wed Jul 22 10:11:03 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@virt-158 ~]# pcs cluster node add-remote virt-160 op monitor interval=10 role=Master timeout=20 on-fail=demote
CIB has been upgraded to the latest schema version.
No addresses specified for host 'virt-160', using 'virt-160'
Sending 'pacemaker authkey' to 'virt-160'
virt-160: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-160'
virt-160: successful run of 'pacemaker_remote enable'
virt-160: successful run of 'pacemaker_remote start'
[root@virt-158 ~]# echo $?
0
[root@virt-158 ~]# pcs resource config
 Resource: virt-160 (class=ocf provider=pacemaker type=remote)
  Attributes: server=virt-160
  Operations: migrate_from interval=0s timeout=60s (virt-160-migrate_from-interval-0s)
              migrate_to interval=0s timeout=60s (virt-160-migrate_to-interval-0s)
              monitor interval=10 on-fail=demote role=Master timeout=20 (virt-160-monitor-interval-10)
              reload interval=0s timeout=60s (virt-160-reload-interval-0s)
              start interval=0s timeout=60s (virt-160-start-interval-0s)
              stop interval=0s timeout=60s (virt-160-stop-interval-0s)
[root@virt-158 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.4" epoch="2" num_updates="8" admin_epoch="10" cib-last-written="Wed Jul 22 10:17:13 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">

# Check that the remote node has the upgraded version as well
[root@virt-160 ~]# pcs cluster cib | grep validate-with
<cib crm_feature_set="3.4.0" validate-with="pacemaker-3.4" epoch="2" num_updates="8" admin_epoch="10" cib-last-written="Wed Jul 22 10:17:13 2020" update-origin="virt-158" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">



Marking verified for pcs-0.10.6-3.el8.

Comment 13 errata-xmlrpc 2020-11-04 02:28:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4617