Bug 1872483
| Summary: | Only some nodes get re-unfenced after unfencing device configuration change | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Ken Gaillot <kgaillot> |
| Component: | pacemaker | Assignee: | Klaus Wenninger <kwenning> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.3 | CC: | cluster-maint, lmiksik, msmazova |
| Target Milestone: | rc | Keywords: | Reopened, Triaged |
| Target Release: | 8.7 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-2.1.4-4.el8 | Doc Type: | Bug Fix |
| Doc Text: | Cause: Pacemaker tracked the device configuration used to unfence a node only for Pacemaker Remote nodes. Consequence: If the configuration is changed for a fencing device that supports unfencing, only Pacemaker Remote nodes and the cluster node running the device monitor will be unfenced again. Fix: Pacemaker now tracks unfencing configuration for all nodes. Result: If the configuration is changed for a fencing device that supports unfencing, all nodes will be unfenced again. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-11-08 09:42:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ken Gaillot
2020-08-25 21:26:59 UTC
When performing an action, Pacemaker saves a hash of the resource/action parameters in the operation history when recording the result. When checking current conditions, Pacemaker recalculates the hash and compares it to the stored hash, and if they're different, it restarts the resource. This is what causes the node running the device to be re-unfenced.

Pacemaker also sets a #node-unfenced node attribute on all cluster and remote nodes indicating when the node was last unfenced. In addition, it copies the unfencing op hash to the #digests-all node attribute for all remote nodes. When comparing the calculated hash to the stored hash, it also compares it against these hashes, which is what causes the remote nodes to be re-unfenced.

The fix will be to extend the special node attributes to all nodes.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

This is a priority for 8.7 / 9.1
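As a hedged illustration of the mechanism described above (not part of the original report), the stored digest and the special node attributes can be inspected directly on a running cluster. The resource id and node name below are taken from the before-fix cluster shown below, the commands use standard Pacemaker tools, and the exact output format may vary by version:

# Stored parameter digests for the fence device's recorded operations, per node (CIB status section)
cibadmin --query --xpath "//lrm_resource[@id='fence-scsi']/lrm_rsc_op" | grep -o 'op-digest="[^"]*"'

# When the node was last unfenced (transient node attribute mentioned above)
crm_attribute --query --node virt-032 --name '#node-unfenced' --lifetime reboot

# Unfencing digests; before the fix these were recorded only for Pacemaker Remote nodes
crm_attribute --query --node virt-032 --name '#digests-all' --lifetime reboot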
before fix:
-----------

> [root@virt-032 ~]# rpm -q pacemaker
> pacemaker-2.1.2-4.el8.x86_64

Configure a cluster of at least two cluster nodes and an unfencing-capable fence device:

> [root@virt-032 ~]# export DISK1=/dev/disk/by-id/scsi-360014059b30c84d885b4dd28e53b99b5
> [root@virt-032 ~]# export DISK2=/dev/disk/by-id/scsi-36001405a62d8d74eda6497b9392b29da
> [root@virt-032 ~]# export DISK3=/dev/disk/by-id/scsi-360014057bdd1a8cf9a54d3a8313ac04f
> [root@virt-245 ~]# pcs stonith create fence-scsi fence_scsi devices=$DISK1 pcmk_host_check=static-list pcmk_host_list="virt-032 virt-033 virt-034" pcmk_reboot_action=off meta provides=unfencing
> [root@virt-032 ~]# pcs stonith config
> Resource: fence-scsi (class=stonith type=fence_scsi)
> Attributes: devices=/dev/disk/by-id/scsi-360014059b30c84d885b4dd28e53b99b5 pcmk_host_check=static-list pcmk_host_list="virt-032 virt-033 virt-034" pcmk_reboot_action=off
> Meta Attrs: provides=unfencing
> Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
> [root@virt-032 ~]# for d in $DISK{1,2,3}; do sg_persist -n -i -k -d $d ; done
> PR generation=0x9, 3 registered reservation keys follow:
> 0x78460000
> 0x78460001
> 0x78460002
> PR generation=0x6, there are NO registered reservation keys
> PR generation=0x6, there are NO registered reservation keys
> [root@virt-032 ~]# pcs status --full
> Cluster name: STSRHTS28411
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-034 (3) (version 2.1.2-4.el8-ada5c3b36e2) - partition with quorum
> * Last updated: Mon Aug 8 10:07:06 2022
> * Last change: Mon Aug 8 10:05:33 2022 by root via cibadmin on virt-032
> * 3 nodes configured
> * 1 resource instance configured
> Node List:
> * Online: [ virt-032 (1) virt-033 (2) virt-034 (3) ]
> Full List of Resources:
> * fence-scsi (stonith:fence_scsi): Started virt-032
> Migration Summary:
> Fencing History:
> * unfencing of virt-034 successful: delegate=virt-034, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
> * unfencing of virt-032 successful: delegate=virt-032, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
> * unfencing of virt-033 successful: delegate=virt-033, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
> Tickets:
> PCSD Status:
> virt-032: Online
> virt-033: Online
> virt-034: Online
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled

Create Dummy resource:

> [root@virt-032 ~]# pcs resource create dummy1 ocf:pacemaker:Dummy
> [root@virt-032 ~]# pcs resource
> * dummy1 (ocf::pacemaker:Dummy): Started virt-033
> [root@virt-032 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
> dummy1 (ocf::pacemaker:Dummy): Started: dummy1_start_0 (node=virt-033, call=41, rc=0, last-rc-change=Mon Aug 8 10:09:32 2022, exec=13ms): complete

Update configuration of the fence device:

> [root@virt-032 ~]# pcs stonith update fence-scsi devices=$DISK1,$DISK2
> [root@virt-032 ~]# pcs stonith config
> Resource: fence-scsi (class=stonith type=fence_scsi)
> Attributes: devices=/dev/disk/by-id/scsi-360014059b30c84d885b4dd28e53b99b5,/dev/disk/by-id/scsi-36001405a62d8d74eda6497b9392b29da pcmk_host_check=static-list pcmk_host_list="virt-032 virt-033 virt-034" pcmk_reboot_action=off
> Meta Attrs: provides=unfencing
> Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

Only cluster node running the fence device (virt-032) is re-unfenced:

> [root@virt-032 ~]# crm_mon -1m
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-034 (version 2.1.2-4.el8-ada5c3b36e2) - partition with quorum
> * Last updated: Mon Aug 8 10:16:57 2022
> * Last change: Mon Aug 8 10:16:34 2022 by root via cibadmin on virt-032
> * 3 nodes configured
> * 2 resource instances configured
> Node List:
> * Online: [ virt-032 virt-033 virt-034 ]
> Active Resources:
> * fence-scsi (stonith:fence_scsi): Started virt-032
> * dummy1 (ocf::pacemaker:Dummy): Started virt-033
> Fencing History:
> * unfencing of virt-032 successful: delegate=virt-032, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:16:36 +02:00'
> * unfencing of virt-034 successful: delegate=virt-034, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
> * unfencing of virt-032 successful: delegate=virt-032, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'
> * unfencing of virt-033 successful: delegate=virt-033, client=pacemaker-controld.618423, origin=virt-034, completed='2022-08-08 10:05:34 +02:00'

Dummy resource did not restart:

> [root@virt-032 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
> dummy1 (ocf::pacemaker:Dummy): Started: dummy1_start_0 (node=virt-033, call=41, rc=0, last-rc-change=Mon Aug 8 10:09:32 2022, exec=13ms): complete
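An extra check, not in the original transcript, would be to re-run the sg_persist query from the setup step at this point. Since fence_scsi unfencing registers a node's key on the configured devices, and only virt-032 was re-unfenced with the updated device list, one would expect DISK1 to still show all three keys but DISK2 to show only virt-032's key (this expectation is an assumption, not captured output):

# Hypothetical verification step (before the fix): compare registrations on the old and new device
for d in $DISK{1,2}; do sg_persist -n -i -k -d $d ; done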
after fix:
----------

> [root@virt-245 ~]# rpm -q pacemaker
> pacemaker-2.1.4-4.el8.x86_64

Configure a cluster of at least two cluster nodes and an unfencing-capable fence device:

> [root@virt-245 ~]# export DISK1=/dev/disk/by-id/scsi-360014052414c35e879e4d1f8f53e12c0
> [root@virt-245 ~]# export DISK2=/dev/disk/by-id/scsi-36001405b755af8cd519497a96ef77a3d
> [root@virt-245 ~]# export DISK3=/dev/disk/by-id/scsi-36001405fbdbf026cc2d49e8ad582c9e1
> [root@virt-245 ~]# pcs stonith create fence-scsi fence_scsi devices=$DISK1 pcmk_host_check=static-list pcmk_host_list="virt-245 virt-246 virt-248" pcmk_reboot_action=off meta provides=unfencing
> [root@virt-245 ~]# pcs stonith config
> Resource: fence-scsi (class=stonith type=fence_scsi)
> Attributes: fence-scsi-instance_attributes
> devices=/dev/disk/by-id/scsi-360014052414c35e879e4d1f8f53e12c0
> pcmk_host_check=static-list
> pcmk_host_list="virt-245 virt-246 virt-248"
> pcmk_reboot_action=off
> Meta Attributes: fence-scsi-meta_attributes
> provides=unfencing
> Operations:
> monitor: fence-scsi-monitor-interval-60s
> interval=60s
> [root@virt-245 ~]# for d in $DISK{1,2,3}; do sg_persist -n -i -k -d $d ; done
> PR generation=0xf, 3 registered reservation keys follow:
> 0xb3bc0001
> 0xb3bc0000
> 0xb3bc0002
> PR generation=0x8, there are NO registered reservation keys
> PR generation=0x0, there are NO registered reservation keys
> [root@virt-245 ~]# pcs status --full
> Cluster name: STSRHTS15434
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-246 (2) (version 2.1.4-4.el8-dc6eb4362e) - partition with quorum
> * Last updated: Mon Aug 8 11:01:25 2022
> * Last change: Mon Aug 8 11:01:17 2022 by root via cibadmin on virt-245
> * 3 nodes configured
> * 1 resource instance configured
> Node List:
> * Online: [ virt-245 (1) virt-246 (2) virt-248 (3) ]
> Full List of Resources:
> * fence-scsi (stonith:fence_scsi): Started virt-245
> Migration Summary:
> Fencing History:
> * unfencing of virt-245 successful: delegate=virt-245, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
> * unfencing of virt-246 successful: delegate=virt-246, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
> * unfencing of virt-248 successful: delegate=virt-248, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
> Tickets:
> PCSD Status:
> virt-245: Online
> virt-246: Online
> virt-248: Online
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: active/enabled

Create Dummy resource:

> [root@virt-245 ~]# pcs resource create dummy1 ocf:pacemaker:Dummy
> [root@virt-245 ~]# pcs resource
> * dummy1 (ocf::pacemaker:Dummy): Started virt-246
> [root@virt-245 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
> dummy1 (ocf::pacemaker:Dummy): Started: dummy1_start_0 (node=virt-246, call=36, rc=0, last-rc-change=Mon Aug 8 11:01:38 2022, exec=21ms): complete

Update configuration of the fence device:

> [root@virt-245 ~]# pcs stonith update fence-scsi devices=$DISK1,$DISK2
> [root@virt-245 ~]# pcs stonith config
> Resource: fence-scsi (class=stonith type=fence_scsi)
> Attributes: fence-scsi-instance_attributes
> devices=/dev/disk/by-id/scsi-360014052414c35e879e4d1f8f53e12c0,/dev/disk/by-id/scsi-36001405b755af8cd519497a96ef77a3d
> pcmk_host_check=static-list
> pcmk_host_list="virt-245 virt-246 virt-248"
> pcmk_reboot_action=off
> Meta Attributes: fence-scsi-meta_attributes
> provides=unfencing
> Operations:
> monitor: fence-scsi-monitor-interval-60s
> interval=60s
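Once every node has been re-unfenced (confirmed by the status output just below), each node's key should also have been registered on the newly added device. As a complement to the check suggested after the before-fix run, one could verify this with the same sg_persist query; the expectation of three keys on DISK2 is an assumption based on fence_scsi behavior, not captured output:

# Hypothetical verification step (after the fix): all three keys should now appear on DISK2 as well
for d in $DISK{1,2}; do sg_persist -n -i -k -d $d ; done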
All cluster nodes are re-unfenced:

> [root@virt-245 ~]# pcs status --full
> Cluster name: STSRHTS15434
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-246 (2) (version 2.1.4-4.el8-dc6eb4362e) - partition with quorum
> * Last updated: Mon Aug 8 11:02:48 2022
> * Last change: Mon Aug 8 11:02:34 2022 by root via cibadmin on virt-245
> * 3 nodes configured
> * 2 resource instances configured
> Node List:
> * Online: [ virt-245 (1) virt-246 (2) virt-248 (3) ]
> Full List of Resources:
> * fence-scsi (stonith:fence_scsi): Started virt-245
> * dummy1 (ocf::pacemaker:Dummy): Started virt-246
> Migration Summary:
> Fencing History:
> * unfencing of virt-245 successful: delegate=virt-245, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:02:34 +02:00'
> * unfencing of virt-248 successful: delegate=virt-248, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:02:34 +02:00'
> * unfencing of virt-246 successful: delegate=virt-246, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:02:34 +02:00'
> * unfencing of virt-245 successful: delegate=virt-245, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
> * unfencing of virt-246 successful: delegate=virt-246, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
> * unfencing of virt-248 successful: delegate=virt-248, client=pacemaker-controld.512439, origin=virt-246, completed='2022-08-08 11:01:18 +02:00'
> Tickets:
> PCSD Status:
> virt-245: Online
> virt-246: Online
> virt-248: Online
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: active/enabled

Dummy resource restarted:

> [root@virt-245 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
> dummy1 (ocf::pacemaker:Dummy): Started: dummy1_start_0 (node=virt-246, call=45, rc=0, last-rc-change=Mon Aug 8 11:02:34 2022, exec=142ms): complete

marking verified in pacemaker-2.1.4-4.el8

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7573