
Bug 1872483

Summary: Only some nodes get re-unfenced after unfencing device configuration change
Product: Red Hat Enterprise Linux 8
Component: pacemaker
Version: 8.3
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Target Release: 8.7
Hardware: All
OS: All
Reporter: Ken Gaillot <kgaillot>
Assignee: Klaus Wenninger <kwenning>
QA Contact: cluster-qe <cluster-qe>
Docs Contact:
CC: cluster-maint, lmiksik, msmazova
Keywords: Reopened, Triaged
Flags: pm-rhel: mirror+
Whiteboard:
Fixed In Version: pacemaker-2.1.4-4.el8
Doc Type: Bug Fix
Doc Text:
    Cause: Pacemaker tracked the device configuration used to unfence a node only for Pacemaker Remote nodes.
    Consequence: If the configuration is changed for a fencing device that supports unfencing, only Pacemaker Remote nodes and the cluster node running the device monitor will be unfenced again.
    Fix: Pacemaker now tracks unfencing configuration for all nodes.
    Result: If the configuration is changed for a fencing device that supports unfencing, all nodes will be unfenced again.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-08 09:42:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ken Gaillot 2020-08-25 21:26:59 UTC
Description of problem: When the configuration of a fence device resource capable of unfencing is changed, some nodes are not re-unfenced; for example, those nodes do not gain access to new disks added to a fence_scsi device.


Version-Release number of selected component (if applicable): 8.3


How reproducible: consistently


Steps to Reproduce:
1. Configure a cluster of at least two cluster nodes and an unfencing-capable fence device. Optionally create one or more remote nodes (not guest nodes).
2. Change a configuration parameter for the fence device.
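For illustration, a minimal version of these steps might look like the following sketch (the device path and node names are placeholders, not taken from this report):

    # Step 1: create an unfencing-capable fence device on an existing cluster
    pcs stonith create fence-scsi fence_scsi \
        devices=/dev/disk/by-id/<shared-disk> \
        pcmk_host_check=static-list pcmk_host_list="node1 node2" \
        pcmk_reboot_action=off meta provides=unfencing

    # Step 2: change a configuration parameter, e.g. add a second disk
    pcs stonith update fence-scsi \
        devices=/dev/disk/by-id/<shared-disk>,/dev/disk/by-id/<second-disk>

    # Then observe which nodes get re-unfenced
    pcs status --full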

Actual results: Only the cluster node running the fence device, and any remote nodes, are re-unfenced.


Expected results: All nodes are re-unfenced.


Additional info: It is expected that non-fencing resources running on a node will be stopped before unfencing is attempted, and started again afterward.
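One way to check whether a resource was in fact stopped and restarted around re-unfencing (the same check used in the verification below, assuming a test resource named dummy1) is to compare its recorded start operation before and after the device update:

    # call= and last-rc-change= for dummy1_start_0 change only if the resource was restarted
    crm_resource --list-all-operations --resource dummy1 | grep start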

Comment 1 Ken Gaillot 2020-08-25 21:36:28 UTC
When Pacemaker records the result of an action, it saves a hash of the resource/action parameters in the operation history. When checking current conditions, it recalculates the hash and compares it to the stored one; if they differ, it restarts the resource. This is what causes the node running the device to be re-unfenced.
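For reference, these stored digests are visible in the CIB status section as attributes of the recorded operations; a rough way to list them (a sketch, assuming the op-digest attribute name used in Pacemaker 2.x CIBs):

    # Each recorded operation (lrm_rsc_op) carries the parameter digest it was run with
    cibadmin --query | grep -o 'op-digest="[^"]*"' | sort -u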

Pacemaker also sets a #node-unfenced node attribute on all cluster and remote nodes, indicating when the node was last unfenced. In addition, it copies the unfencing op hash to the #digests-all node attribute for all remote nodes. When comparing the calculated hash to the stored hash, Pacemaker also compares it against these attribute values, which is what causes the remote nodes to be re-unfenced.
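These attributes can be inspected roughly as follows (a sketch; the node name is a placeholder, and whether a given attribute is written to the CIB or held only by pacemaker-attrd is an assumption here):

    # Timestamp of the node's last unfencing (transient node attribute)
    crm_attribute --query --node <node> --name '#node-unfenced' --lifetime reboot

    # Unfencing digest attribute; if it is kept only in the attribute manager,
    # attrd_updater is needed instead of crm_attribute
    attrd_updater --query --all --name '#digests-all'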

The fix will be to extend the special node attributes to all nodes.

Comment 7 RHEL Program Management 2022-02-25 07:27:21 UTC
After evaluating this issue, we have no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, the bug can be reopened.

Comment 8 Ken Gaillot 2022-02-25 15:04:46 UTC
This is a priority for 8.7 / 9.1

Comment 13 Markéta Smazová 2022-08-08 15:24:35 UTC
before fix:
-----------

>   [root@virt-032 ~]# rpm -q pacemaker
>   pacemaker-2.1.2-4.el8.x86_64

Configure a cluster of at least two cluster nodes and an unfencing-capable fence device:

>   [root@virt-032 ~]# export DISK1=/dev/disk/by-id/scsi-360014059b30c84d885b4dd28e53b99b5
>   [root@virt-032 ~]# export DISK2=/dev/disk/by-id/scsi-36001405a62d8d74eda6497b9392b29da
>   [root@virt-032 ~]# export DISK3=/dev/disk/by-id/scsi-360014057bdd1a8cf9a54d3a8313ac04f

>   [root@virt-032 ~]# pcs stonith create fence-scsi fence_scsi devices=$DISK1 pcmk_host_check=static-list pcmk_host_list="virt-032 virt-033 virt-034" pcmk_reboot_action=off meta provides=unfencing

>   [root@virt-032 ~]# pcs stonith config
>    Resource: fence-scsi (class=stonith type=fence_scsi)
>     Attributes: devices=/dev/disk/by-id/scsi-360014059b30c84d885b4dd28e53b99b5 pcmk_host_check=static-list pcmk_host_list="virt-032 virt-033 virt-034" pcmk_reboot_action=off
>     Meta Attrs: provides=unfencing
>     Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

>   [root@virt-032 ~]# for d in $DISK{1,2,3}; do sg_persist -n -i -k -d $d ; done
>     PR generation=0x9, 3 registered reservation keys follow:
>       0x78460000
>       0x78460001
>       0x78460002
>     PR generation=0x6, there are NO registered reservation keys
>     PR generation=0x6, there are NO registered reservation keys

>   [root@virt-032 ~]# pcs status --full
>   Cluster name: STSRHTS28411
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-034 (3) (version 2.1.2-4.el8-ada5c3b36e2) - partition with quorum
>     * Last updated: Mon Aug  8 10:07:06 2022
>     * Last change:  Mon Aug  8 10:05:33 2022 by root via cibadmin on virt-032
>     * 3 nodes configured
>     * 1 resource instance configured

>   Node List:
>     * Online: [ virt-032 (1) virt-033 (2) virt-034 (3) ]

>   Full List of Resources:
>     * fence-scsi	(stonith:fence_scsi):	 Started virt-032

>   Migration Summary:

>   Fencing History:
>     * unfencing of virt-034 successful: delegate=virt-034, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
>     * unfencing of virt-032 successful: delegate=virt-032, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
>     * unfencing of virt-033 successful: delegate=virt-033, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'

>   Tickets:

>   PCSD Status:
>     virt-032: Online
>     virt-033: Online
>     virt-034: Online

>   Daemon Status:
>     corosync: active/enabled
>     pacemaker: active/enabled
>     pcsd: active/enabled

Create Dummy resource:

>   [root@virt-032 ~]# pcs resource create dummy1 ocf:pacemaker:Dummy
>   [root@virt-032 ~]# pcs resource
>     * dummy1	(ocf::pacemaker:Dummy):	 Started virt-033
>   [root@virt-032 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
>   dummy1	(ocf::pacemaker:Dummy):	 Started: dummy1_start_0 (node=virt-033, call=41, rc=0, last-rc-change=Mon Aug  8 10:09:32 2022, exec=13ms): complete

Update configuration of the fence device:

>   [root@virt-032 ~]# pcs stonith update fence-scsi devices=$DISK1,$DISK2
>   [root@virt-032 ~]# pcs stonith config
>    Resource: fence-scsi (class=stonith type=fence_scsi)
>     Attributes: devices=/dev/disk/by-id/scsi-360014059b30c84d885b4dd28e53b99b5,/dev/disk/by-id/scsi-36001405a62d8d74eda6497b9392b29da pcmk_host_check=static-list pcmk_host_list="virt-032 virt-033 virt-034" pcmk_reboot_action=off
>     Meta Attrs: provides=unfencing
>     Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

Only the cluster node running the fence device (virt-032) is re-unfenced:

>   [root@virt-032 ~]# crm_mon -1m
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-034 (version 2.1.2-4.el8-ada5c3b36e2) - partition with quorum
>     * Last updated: Mon Aug  8 10:16:57 2022
>     * Last change:  Mon Aug  8 10:16:34 2022 by root via cibadmin on virt-032
>     * 3 nodes configured
>     * 2 resource instances configured

>   Node List:
>     * Online: [ virt-032 virt-033 virt-034 ]

>   Active Resources:
>     * fence-scsi	(stonith:fence_scsi):	 Started virt-032
>     * dummy1	(ocf::pacemaker:Dummy):	 Started virt-033

>   Fencing History:
>     * unfencing of virt-032 successful: delegate=virt-032, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:16:36 +02:00'
>     * unfencing of virt-034 successful: delegate=virt-034, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
>     * unfencing of virt-032 successful: delegate=virt-032, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
>     * unfencing of virt-033 successful: delegate=virt-033, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'

Dummy resource did not restart:

>   [root@virt-032 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
>   dummy1	(ocf::pacemaker:Dummy):	 Started: dummy1_start_0 (node=virt-033, call=41, rc=0, last-rc-change=Mon Aug  8 10:09:32 2022, exec=13ms): complete
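A quick cross-check at this point (reusing the sg_persist check from above, and assuming fence_scsi registers one reservation key per unfenced node) would be to re-read the registrations on the newly added disk; only the single re-unfenced node would be expected to show up on DISK2:

    # Re-check registrations on the disk that was just added to the device list
    sg_persist -n -i -k -d $DISK2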



after fix:
----------

>   [root@virt-245 ~]# rpm -q pacemaker
>   pacemaker-2.1.4-4.el8.x86_64

Configure a cluster of at least two cluster nodes and an unfencing-capable fence device:

>   [root@virt-245 ~]# export DISK1=/dev/disk/by-id/scsi-360014052414c35e879e4d1f8f53e12c0
>   [root@virt-245 ~]# export DISK2=/dev/disk/by-id/scsi-36001405b755af8cd519497a96ef77a3d
>   [root@virt-245 ~]# export DISK3=/dev/disk/by-id/scsi-36001405fbdbf026cc2d49e8ad582c9e1

>   [root@virt-245 ~]# pcs stonith create fence-scsi fence_scsi devices=$DISK1 pcmk_host_check=static-list pcmk_host_list="virt-245 virt-246 virt-248" pcmk_reboot_action=off meta provides=unfencing

>   [root@virt-245 ~]# pcs stonith config
>   Resource: fence-scsi (class=stonith type=fence_scsi)
>     Attributes: fence-scsi-instance_attributes
>       devices=/dev/disk/by-id/scsi-360014052414c35e879e4d1f8f53e12c0
>       pcmk_host_check=static-list
>       pcmk_host_list="virt-245 virt-246 virt-248"
>       pcmk_reboot_action=off
>     Meta Attributes: fence-scsi-meta_attributes
>       provides=unfencing
>     Operations:
>       monitor: fence-scsi-monitor-interval-60s
>         interval=60s

>   [root@virt-245 ~]# for d in $DISK{1,2,3}; do sg_persist -n -i -k -d $d ; done
>     PR generation=0xf, 3 registered reservation keys follow:
>       0xb3bc0001
>       0xb3bc0000
>       0xb3bc0002
>     PR generation=0x8, there are NO registered reservation keys
>     PR generation=0x0, there are NO registered reservation keys

>   [root@virt-245 ~]# pcs status --full
>   Cluster name: STSRHTS15434
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-246 (2) (version 2.1.4-4.el8-dc6eb4362e) - partition with quorum
>     * Last updated: Mon Aug  8 11:01:25 2022
>     * Last change:  Mon Aug  8 11:01:17 2022 by root via cibadmin on virt-245
>     * 3 nodes configured
>     * 1 resource instance configured

>   Node List:
>     * Online: [ virt-245 (1) virt-246 (2) virt-248 (3) ]

>   Full List of Resources:
>     * fence-scsi	(stonith:fence_scsi):	 Started virt-245

>   Migration Summary:

>   Fencing History:
>     * unfencing of virt-245 successful: delegate=virt-245, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
>     * unfencing of virt-246 successful: delegate=virt-246, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
>     * unfencing of virt-248 successful: delegate=virt-248, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'

>   Tickets:

>   PCSD Status:
>     virt-245: Online
>     virt-246: Online
>     virt-248: Online

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled

Create Dummy resource:

>   [root@virt-245 ~]# pcs resource create dummy1 ocf:pacemaker:Dummy
>   [root@virt-245 ~]# pcs resource
>     * dummy1	(ocf::pacemaker:Dummy):	 Started virt-246
>   [root@virt-245 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
>   dummy1	(ocf::pacemaker:Dummy):	 Started: dummy1_start_0 (node=virt-246, call=36, rc=0, last-rc-change=Mon Aug  8 11:01:38 2022, exec=21ms): complete

Update configuration of the fence device:

>   [root@virt-245 ~]# pcs stonith update fence-scsi devices=$DISK1,$DISK2
>   [root@virt-245 ~]# pcs stonith config
>   Resource: fence-scsi (class=stonith type=fence_scsi)
>     Attributes: fence-scsi-instance_attributes
>       devices=/dev/disk/by-id/scsi-360014052414c35e879e4d1f8f53e12c0,/dev/disk/by-id/scsi-36001405b755af8cd519497a96ef77a3d
>       pcmk_host_check=static-list
>       pcmk_host_list="virt-245 virt-246 virt-248"
>       pcmk_reboot_action=off
>     Meta Attributes: fence-scsi-meta_attributes
>       provides=unfencing
>     Operations:
>       monitor: fence-scsi-monitor-interval-60s
>         interval=60s

All cluster nodes are re-unfenced:

>   [root@virt-245 ~]# pcs status --full
>   Cluster name: STSRHTS15434
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-246 (2) (version 2.1.4-4.el8-dc6eb4362e) - partition with quorum
>     * Last updated: Mon Aug  8 11:02:48 2022
>     * Last change:  Mon Aug  8 11:02:34 2022 by root via cibadmin on virt-245
>     * 3 nodes configured
>     * 2 resource instances configured

>   Node List:
>     * Online: [ virt-245 (1) virt-246 (2) virt-248 (3) ]

>   Full List of Resources:
>     * fence-scsi	(stonith:fence_scsi):	 Started virt-245
>     * dummy1	(ocf::pacemaker:Dummy):	 Started virt-246

>   Migration Summary:

>   Fencing History:
>     * unfencing of virt-245 successful: delegate=virt-245, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:02:34 +02:00'
>     * unfencing of virt-248 successful: delegate=virt-248, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:02:34 +02:00'
>     * unfencing of virt-246 successful: delegate=virt-246, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:02:34 +02:00'
>     * unfencing of virt-245 successful: delegate=virt-245, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
>     * unfencing of virt-246 successful: delegate=virt-246, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
>     * unfencing of virt-248 successful: delegate=virt-248, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'

>   Tickets:

>   PCSD Status:
>     virt-245: Online
>     virt-246: Online
>     virt-248: Online

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled

Dummy resource restarted:

>   [root@virt-245 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
>   dummy1	(ocf::pacemaker:Dummy):	 Started: dummy1_start_0 (node=virt-246, call=45, rc=0, last-rc-change=Mon Aug  8 11:02:34 2022, exec=142ms): complete


Marking verified in pacemaker-2.1.4-4.el8.

Comment 15 errata-xmlrpc 2022-11-08 09:42:25 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7573