Bug 1872376 - Add command line option to calculate Pacemaker resource operation digest
Summary: Add command line option to calculate Pacemaker resource operation digest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.3
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1872378 1894575 2023845 2024522
Reported: 2020-08-25 15:38 UTC by Chris Feist
Modified: 2021-11-18 09:04 UTC
CC List: 3 users

Fixed In Version: pacemaker-2.0.5-5.el8
Doc Type: No Doc Update
Doc Text:
This new feature exists primarily for pcs's use; end users will not find it useful.
Clone Of:
Clones: 1872378 (view as bug list)
Environment:
Last Closed: 2021-05-18 15:26:40 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Links:
  Red Hat Knowledge Base (Solution) 4526971 (last updated 2021-01-04 18:32:07 UTC)
  Red Hat Product Errata RHEA-2021:1782 (last updated 2021-05-18 15:46:50 UTC)

Description Chris Feist 2020-08-25 15:38:34 UTC
Provide a way to add a SCSI fencing device to a cluster without requiring a restart of all cluster resources.

Comment 1 Ken Gaillot 2020-08-25 20:59:09 UTC
Pacemaker restarts resources when unfencing device configuration changes because the resources have a dependency (both in code and in reality) on unfencing with the current parameters.

Pacemaker detects configuration changes by saving a hash of the resource parameters in the operation history recorded for any action. When checking current conditions, it compares that recorded hash with a re-calculated hash of the current parameters.
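
Conceptually, the check works like the following sketch (this is only an illustration, not Pacemaker's real parameter canonicalization, so the hash it produces will not match actual op-digest values):

    # Recompute a digest from the current parameter set and compare it with the
    # digest recorded in the operation history; a mismatch means the
    # configuration changed. Illustration only.
    current=$(printf '%s\n' 'delay=10' 'pcmk_host_check=static-list' 'pcmk_host_list=virt-148' | md5sum | awk '{print $1}')
    recorded='<op-digest value stored in the CIB operation history>'
    if [ "$current" != "$recorded" ]; then
        echo "resource parameters changed since the recorded operation"
    fi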

Unfortunately, neither Pacemaker nor the fence agent has enough information to know when a parameter change is "safe" to perform without restarting resources. Such safety depends on the particular capabilities of a given fence device. Neither Pacemaker nor the agent knows what the previous parameter values were (though Pacemaker knows the hash).

Therefore, there cannot be a general, automated solution to the problem.

However, I can think of a higher-level workaround. Pacemaker could provide a new command-line option

    crm_resource -r <rsc> --digest <op>

that would show what the operation digests would be for the given resource and operation, if run at that moment.

pcs could use the existing "stonith_admin --unfence" command to execute the unfencing directly, without going through the usual cluster management. It could then make two CIB changes simultaneously, the desired change in the fence device configuration plus updates to the operation hashes and node attributes that pacemaker uses to detect changes, so that pacemaker does not see or react to the change (other than rescheduling any recurring monitor on the device, which is still desirable).

This would be a dangerous ability since it would disable pacemaker's response to changes, leaving it totally on the caller to ensure that it is safe to do so. It would also be somewhat brittle since it would involve changes to pacemaker's status section, which does not have a guaranteed schema. In short, it's a bad idea, but I don't see a better way.

Comment 2 Ken Gaillot 2020-08-25 21:08:32 UTC
> pcs could use the existing "stonith_admin --unfence" command to execute the unfencing directly

Correction: that would use the originally configured parameters. Instead, pcs would most likely have to run the agent directly, with the new parameters. We might be able to come up with some "direct execute" option in stonith_admin to abstract that, but I'm not keen on that idea. Also, fence agents are executed by the cluster as root, but stonith_admin runs as whatever user runs it, which could complicate some situations.

Comment 3 Ken Gaillot 2020-08-25 21:11:41 UTC
> crm_resource -r <rsc> --digest <op>

This command would take either -r <rsc> (to use the existing parameters) or --class/--agent/--option (same as --validate currently) to hash arbitrary parameters.

Comment 5 Ken Gaillot 2021-01-11 23:29:54 UTC
Implemented upstream as of commit 4e726eb

Example of how to change a resource parameter without causing a restart:

1. The following configuration items are needed (examples):
- resource ID (rsc1)
- parameter name (param1)
- desired parameter value (value1)
- resource's monitor interval (10s)
- resource's monitor timeout if specified (20s)

2. Determine where rsc1 is running with "crm_resource --locate -r rsc1" (example: node1).

3. Show what the new digests would be for a one-time operation:

    crm_resource --digests --output-as=xml -r rsc1 -N node1 param1=value1

Example output:

    <pacemaker-result api-version="2.3" request="crm_resource --digests --output-as=xml -r rsc1 -N node1 param1=value1">
      <digests resource="rsc1" node="node1" task="monitor" interval="0ms">
        <digest type="all" hash="f2317cad3d54cec5d7d7aa7d0bf35cf8">
           <parameters/>
        </digest>
        <digest type="nonprivate" hash="f2317cad3d54cec5d7d7aa7d0bf35cf8">
          <parameters/>
        </digest>
        <digest type="nonreloadable" hash="f2317cad3d54cec5d7d7aa7d0bf35cf8">
          <parameters/>
        </digest>
      </digests>
      <status code="0" message="OK"/>
    </pacemaker-result>
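
If scripting this, the digests can be pulled out of the XML with any XPath tool; for example, assuming the output above has been saved to digests.xml and xmllint (from libxml2) is available:

    xmllint --xpath 'string(//digest[@type="all"]/@hash)' digests.xml            # for op-digest
    xmllint --xpath 'string(//digest[@type="nonprivate"]/@hash)' digests.xml     # for op-secure-digest
    xmllint --xpath 'string(//digest[@type="nonreloadable"]/@hash)' digests.xml  # for op-restart-digest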

4. Repeat for the recurring monitor, passing its interval and timeout as CRM_meta_interval and CRM_meta_timeout in milliseconds:

    crm_resource --digests --output-as=xml -r rsc1 -N node1 param1=value1 CRM_meta_interval=10000 CRM_meta_timeout=20000

Output will be similar.

5. Update the CIB:
5a. Dump the entire CIB to a file.
5b. Edit the resource configuration to have the new parameter value.
5c. Find the resource history on the appropriate node (the section starting <lrm_resource id="rsc1"> inside <node_state id="node1">), and look at each <lrm_rsc_op> entry in it. If "operation" is "monitor", use the digests obtained with the monitor parameters; otherwise use the digests obtained from the first digest command. Replace any "op-digest" with the type="all" digest, any "op-secure-digest" with the type="nonprivate" digest, and any "op-restart-digest" with the type="nonreloadable" digest.
5d. Load the file back into the cluster (a command-level sketch of these steps follows at the end of this comment).

The risks of doing this are that it is easy to make a mistake and cause restarts anyway, and that changing the configuration without separately ensuring the new values are in effect for the service means the service will keep running with the old values. Also, the status section syntax is not guaranteed to stay the same across Pacemaker releases.
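
Putting steps 5a-5d together as a command-level sketch (the sed expressions and hash values are placeholders; in practice the file should be edited carefully in an editor, as in the verification run later in this bug):

    # 5a. Dump the CIB to a file and keep an unmodified copy for the diff.
    pcs cluster cib > cib-original.xml
    cp cib-original.xml cib-new.xml

    # 5b/5c. Edit cib-new.xml: change the parameter value and replace the recorded
    # op-digest/op-secure-digest/op-restart-digest values in the resource history
    # with the hashes reported by "crm_resource --digests".
    # (value0, OLD_HASH and NEW_HASH are placeholders.)
    sed -i -e 's/name="param1" value="value0"/name="param1" value="value1"/' \
           -e 's/op-digest="OLD_HASH"/op-digest="NEW_HASH"/' cib-new.xml

    # 5d. Load only the difference back into the cluster.
    pcs cluster cib-push cib-new.xml diff-against=cib-original.xml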

Comment 6 Ken Gaillot 2021-01-12 18:35:37 UTC
Support for this feature can be detected by checking that Pacemaker's CRM feature set is >= 3.6.4.
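
For example, a shell check could parse the feature set reported by "pacemakerd --features" and compare versions with sort -V (a sketch; the "Supporting vX.Y.Z:" line format appears in the verification output later in this bug):

    # Extract the CRM feature set (e.g. 3.7.0) and require it to be >= 3.6.4.
    feature_set=$(pacemakerd --features | sed -n 's/.*Supporting v\([0-9.]*\):.*/\1/p')
    if [ "$(printf '%s\n' 3.6.4 "$feature_set" | sort -V | head -n1)" = "3.6.4" ]; then
        echo "crm_resource --digests is supported (feature set $feature_set)"
    fi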

Comment 10 Markéta Smazová 2021-02-03 16:45:25 UTC
>   [root@virt-148 ~]# rpm -q pacemaker
>   pacemaker-2.0.5-5.el8.x86_64


Check that Pacemaker's CRM feature set is >= 3.6.4:

>   [root@virt-148 ~]# pacemakerd --features
>   Pacemaker 2.0.5-5.el8 (Build: ba59be7122)
>    Supporting v3.7.0:  generated-manpages agent-manpages ncurses libqb-logging libqb-ipc systemd nagios  corosync-native atomic-attrd acls cibsecrets


Check man/help:

>   [root@virt-148 ~]# man crm_resource  | grep -A 4 digests 
>          --digests
>                 (Advanced) Show parameter hashes that Pacemaker uses to detect configuration changes
>                 (only  accurate  if  there  is  resource  history  on the specified node). Required:
>                 --resource, --node.  Optional: any NAME=VALUE parameters will be  used  to  override
>                 the configuration (to see what the hash would be with those changes).

>   [root@virt-148 ~]# crm_resource --help-all | grep -A 5 digests 
>     --digests                         (Advanced) Show parameter hashes that Pacemaker uses to detect
>                                       configuration changes (only accurate if there is resource
>                                       history on the specified node). Required: --resource, --node.
>                                       Optional: any NAME=VALUE parameters will be used to override
>                                       the configuration (to see what the hash would be with those
>                                       changes).




>   [root@virt-148 ~]# pcs status
>   Cluster name: STSRHTS31149
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-149 (version 2.0.5-5.el8-ba59be7122) - partition with quorum
>     * Last updated: Wed Feb  3 16:21:48 2021
>     * Last change:  Wed Feb  3 16:18:50 2021 by root via cibadmin on virt-148
>     * 2 nodes configured
>     * 2 resource instances configured

>   Node List:
>     * Online: [ virt-148 virt-149 ]

>   Full List of Resources:
>     * fence-virt-148	(stonith:fence_xvm):	 Started virt-148
>     * fence-virt-149	(stonith:fence_xvm):	 Started virt-149

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled


Check the resource configuration for the parameter that will be updated. In this reproducer, the "delay" attribute will be changed from 20 to 10.

>   [root@virt-148 ~]# pcs stonith config fence-virt-148
>    Resource: fence-virt-148 (class=stonith type=fence_xvm)
>     Attributes: delay=20 pcmk_host_check=static-list pcmk_host_list=virt-148 pcmk_host_map=virt-148:virt-148.cluster-qe.lab.eng.brq.redhat.com
>     Operations: monitor interval=60s (fence-virt-148-monitor-interval-60s)


List all fence-virt-148 operations:

>   [root@virt-148 ~]# crm_resource --list-all-operations --resource fence-virt-148
>   fence-virt-148	(stonith:fence_xvm):	 Started: fence-virt-148_monitor_0 (node=virt-149, call=5, rc=7, last-rc-change=Mon Feb  1 11:21:55 2021, exec=24ms): complete
>   fence-virt-148	(stonith:fence_xvm):	 Started: fence-virt-148_start_0 (node=virt-148, call=1363, rc=0, last-rc-change=Tue Feb  2 15:54:51 2021, exec=363ms): complete
>   fence-virt-148	(stonith:fence_xvm):	 Started: fence-virt-148_monitor_60000 (node=virt-148, call=1365, rc=0, last-rc-change=Tue Feb  2 15:54:51 2021, exec=401ms): complete

Determine where the resource is running:

>   [root@virt-148 ~]# crm_resource --locate --resource fence-virt-148
>   resource fence-virt-148 is running on: virt-148


Show what the new digests would be for a one-time operation:

>   [root@virt-148 ~]# crm_resource --digests --output-as=xml --resource fence-virt-148 --node virt-148 delay=10
>   <pacemaker-result api-version="2.3" request="crm_resource --digests --output-as=xml --resource fence-virt-148 --node virt-148 delay=10">
>     <digests resource="fence-virt-148" node="virt-148" task="start" interval="0ms">
>       <digest type="all" hash="80ac5753667128b490ba6fb7b4001d67">
>         <parameters delay="10" pcmk_host_check="static-list" pcmk_host_map="virt-148:virt-148.cluster-qe.lab.eng.brq.redhat.com" pcmk_host_list="virt-148"/>
>       </digest>
>       <digest type="nonprivate" hash="bfd73b9a2527a3b3944d09490855b2f2">
>         <parameters delay="10"/>
>       </digest>
>     </digests>
>     <status code="0" message="OK"/>
>   </pacemaker-result>

Show what the new digests would be for the recurring monitor:

>   [root@virt-148 ~]# crm_resource --digests --output-as=xml --resource fence-virt-148 --node virt-148 delay=10 CRM_meta_interval=10000 CRM_meta_timeout=20000
>   <pacemaker-result api-version="2.3" request="crm_resource --digests --output-as=xml --resource fence-virt-148 --node virt-148 delay=10 CRM_meta_interval=10000 CRM_meta_timeout=20000">
>     <digests resource="fence-virt-148" node="virt-148" task="start" interval="10000ms">
>       <digest type="all" hash="aa308fdc91fcd3c64737036870fd86f4">
>         <parameters delay="10" pcmk_host_check="static-list" pcmk_host_map="virt-148:virt-148.cluster-qe.lab.eng.brq.redhat.com" pcmk_host_list="virt-148" CRM_meta_timeout="20000"/>
>       </digest>
>       <digest type="nonprivate" hash="bfd73b9a2527a3b3944d09490855b2f2">
>         <parameters delay="10"/>
>       </digest>
>     </digests>
>     <status code="0" message="OK"/>
>   </pacemaker-result>

Dump the entire CIB to a file:

>   [root@virt-148 ~]# pcs cluster cib > cib-original.xml
>   [root@virt-148 ~]# cp cib-original.xml cib-new.xml
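
Before editing, the digests currently recorded in the operation history can be listed from the dump for comparison, for example:

    grep -oE 'op-(secure-|restart-)?digest="[^"]*"' cib-original.xml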

Edit the resource configuration as described in Comment 5 (steps 5b and 5c):

>   [root@virt-148 ~]# vim cib-new.xml

>   [root@virt-148 ~]# diff cib-original.xml cib-new.xml
>   20c20
>   <           <nvpair id="fence-virt-148-instance_attributes-delay" name="delay" value="20"/>
>   ---
>   >           <nvpair id="fence-virt-148-instance_attributes-delay" name="delay" value="10"/>
>   71,72c71,72
>   <             <lrm_rsc_op id="fence-virt-148_last_0" operation_key="fence-virt-148_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.7.0" transition-key="4:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" transition-magic="0:0;4:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" exit-reason="" on_node="virt-148" call-id="1363" rc-code="0" op-status="0" interval="0" last-rc-change="1612277691" last-run="1612277691" exec-time="363" queue-time="0" op-digest="bd02e4f8cfe532fb1c9d5807b72b193b"/>
>   <             <lrm_rsc_op id="fence-virt-148_monitor_60000" operation_key="fence-virt-148_monitor_60000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.7.0" transition-key="2:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" transition-magic="0:0;2:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" exit-reason="" on_node="virt-148" call-id="1365" rc-code="0" op-status="0" interval="60000" last-rc-change="1612277691" exec-time="401" queue-time="0" op-digest="811d822164fd020e36ef256380f696d8"/>
>   ---
>   >             <lrm_rsc_op id="fence-virt-148_last_0" operation_key="fence-virt-148_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.7.0" transition-key="4:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" transition-magic="0:0;4:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" exit-reason="" on_node="virt-148" call-id="1363" rc-code="0" op-status="0" interval="0" last-rc-change="1612277691" last-run="1612277691" exec-time="363" queue-time="0" op-digest="80ac5753667128b490ba6fb7b4001d67"/>
>   >             <lrm_rsc_op id="fence-virt-148_monitor_60000" operation_key="fence-virt-148_monitor_60000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.7.0" transition-key="2:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" transition-magic="0:0;2:751:0:d9791968-ac53-4d94-9d23-b3ede3e40c27" exit-reason="" on_node="virt-148" call-id="1365" rc-code="0" op-status="0" interval="60000" last-rc-change="1612277691" exec-time="401" queue-time="0" op-digest="aa308fdc91fcd3c64737036870fd86f4"/>



Load the file back into the cluster: 

>   [root@virt-148 ~]# pcs cluster cib-push cib-new.xml diff-against=cib-original.xml
>   CIB updated


Check that the resource did not restart (the call IDs and last-rc-change timestamps are unchanged):

>   [root@virt-148 ~]# crm_resource --list-all-operations --resource fence-virt-148
>   fence-virt-148	(stonith:fence_xvm):	 Started: fence-virt-148_monitor_0 (node=virt-149, call=5, rc=7, last-rc-change=Mon Feb  1 11:21:55 2021, exec=24ms): complete
>   fence-virt-148	(stonith:fence_xvm):	 Started: fence-virt-148_start_0 (node=virt-148, call=1363, rc=0, last-rc-change=Tue Feb  2 15:54:51 2021, exec=363ms): complete
>   fence-virt-148	(stonith:fence_xvm):	 Started: fence-virt-148_monitor_60000 (node=virt-148, call=1365, rc=0, last-rc-change=Tue Feb  2 15:54:51 2021, exec=401ms): complete


>   [root@virt-148 ~]# pcs status
>   Cluster name: STSRHTS31149
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-149 (version 2.0.5-5.el8-ba59be7122) - partition with quorum
>     * Last updated: Wed Feb  3 16:29:26 2021
>     * Last change:  Wed Feb  3 16:28:19 2021 by root via cibadmin on virt-148
>     * 2 nodes configured
>     * 2 resource instances configured

>   Node List:
>     * Online: [ virt-148 virt-149 ]

>   Full List of Resources:
>     * fence-virt-148	(stonith:fence_xvm):	 Started virt-148
>     * fence-virt-149	(stonith:fence_xvm):	 Started virt-149

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled


Check that the resource attribute "delay" is updated with the new value:

>   [root@virt-148 ~]# pcs stonith config fence-virt-148
>    Resource: fence-virt-148 (class=stonith type=fence_xvm)
>     Attributes: delay=10 pcmk_host_check=static-list pcmk_host_list=virt-148 pcmk_host_map=virt-148:virt-148.cluster-qe.lab.eng.brq.redhat.com
>     Operations: monitor interval=60s (fence-virt-148-monitor-interval-60s)


Marking verified in pacemaker-2.0.5-5.el8.

Comment 12 errata-xmlrpc 2021-05-18 15:26:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1782


