Bug 2179010 - Need a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pcs
Version: 8.7
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 8.9
Assignee: Miroslav Lisik
QA Contact: cluster-qe
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On: 2177996
Blocks: 2180706 2180707
 
Reported: 2023-03-16 12:34 UTC by Tomas Jelinek
Modified: 2023-11-14 15:53 UTC
CC List: 12 users

Fixed In Version: pcs-0.10.16-1.el8
Doc Type: Bug Fix
Doc Text:
.`pcs` command to update multipath SCSI devices now works correctly

Due to changes in the Pacemaker CIB file, the `pcs stonith update-scsi-devices` command stopped working as designed, causing an unwanted restart of some cluster resources. With this fix, this command works correctly and updates SCSI devices without requiring a restart of other cluster resources running on the same node.
Clone Of: 2177996
Clones: 2180706 2180707
Environment:
Last Closed: 2023-11-14 15:22:35 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CLUSTERQE-6459 0 None None None 2023-03-16 17:02:24 UTC
Red Hat Issue Tracker RHELPLAN-152060 0 None None None 2023-03-16 12:37:05 UTC
Red Hat Knowledge Base (Solution) 4526971 0 None None None 2023-03-16 12:54:46 UTC
Red Hat Product Errata RHBA-2023:6903 0 None None None 2023-11-14 15:23:15 UTC

Description Tomas Jelinek 2023-03-16 12:34:43 UTC
+++ This bug was initially created as a clone of Bug #2177996 +++

Description of problem:

###Similar to Bug 2024522; the only difference is that we are using the fence_scsi agent to build our cluster:

"cmd": "pcs stonith create emc_fence fence_scsi \"devices=/dev/mapper/368ccf09800de023572e4a63b259687d1,/dev/mapper/368ccf098003e693c025c28dd5cdf69d9,/dev/mapper/368ccf0980076ee9888c1334880ea05d1\" \"pcmk_host_list=e2e-l4-236128,e2e-l4-236129\" pcmk_monitor_action=\"metadata\" pcmk_reboot_action=\"off\" pcmk_host_check=\"static-list\" meta provides=\"unfencing --force\"\n",

[root@e2e-l4-236129 ~]# pcs stonith config
Resource: emc_fence (class=stonith type=fence_scsi)
  Attributes: emc_fence-instance_attributes
    devices=/dev/mapper/368ccf098003e693c025c28dd5cdf69d9,/dev/mapper/368ccf0980076ee9888c1334880ea05d1,/dev/mapper/368ccf09800de023572e4a63b259687d1
    pcmk_host_check=static-list
    pcmk_host_list=e2e-l4-236128,e2e-l4-236129
    pcmk_monitor_action=metadata
    pcmk_reboot_action=off
  Meta Attributes: emc_fence-meta_attributes
    provides=unfencing
  Operations:
    monitor: emc_fence-monitor-interval-60s
      interval=60s

####Create several resources on the cluster:

[root@e2e-l4-236129 ~]# pcs status
Cluster name: RHCS
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-03-14 03:51:09 -04:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: e2e-l4-236128 (version 2.1.5-7.el9-a3f44794f94) - partition with quorum
  * Last updated: Tue Mar 14 03:51:10 2023
  * Last change:  Tue Mar 14 03:27:11 2023 by root via cibadmin on e2e-l4-236128
  * 2 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ e2e-l4-236128 e2e-l4-236129 ]

Full List of Resources:
  * emc_fence   (stonith:fence_scsi):    Started e2e-l4-236128
  * Clone Set: dlm-clone [dlm]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]
  * Clone Set: lvmlockd-clone [lvmlockd]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]
  * Clone Set: alua_1_vg_1678695957-clone [alua_1_vg_1678695957]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]
  * Clone Set: alua_0_vg_1678695957-clone [alua_0_vg_1678695957]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


####Then we use update-scsi-devices to add or remove a device from the stonith device list:

[root@e2e-l4-236128 ~]# pcs stonith update-scsi-devices emc_fence remove /dev/mapper/368ccf09800b6f39e1227c2c42c3fb481
[root@e2e-l4-236128 ~]# pcs stonith update-scsi-devices emc_fence add /dev/mapper/368ccf09800b6f39e1227c2c42c3fb481

####We can observe resources stopping and restarting across the cluster:

Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Unfencing Pacemaker Remote node e2e-l4-236129 because the definition of emc_fence changed
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Unfencing Pacemaker Remote node e2e-l4-236128 because the definition of emc_fence changed
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice:  * Fence (on) e2e-l4-236128
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice:  * Fence (on) e2e-l4-236129
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Actions: Restart    emc_fence      ( e2e-l4-236128 )  due to required stonith
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Actions: Restart    dlm:0          ( e2e-l4-236128 )  due to required stonith

####Is there any way we can avoid restarting fencing when we modify the stonith devices?

--- Additional comment from Miroslav Lisik on 2023-03-14 14:18:12 CET ---

Ken, could you look at this?

It does not work with latest pacemaker-2.1.5-7.el9 in RHEL-9.3. The last working version is pacemaker-2.1.4-2.el9 in RHEL-9.1. Is it possible that something has changed in pacemaker-2.1.4-3.el9 regarding digests calculation? (bz1872376)

--- Additional comment from Ken Gaillot on 2023-03-14 17:21:18 CET ---

(In reply to Miroslav Lisik from comment #1)
> Ken, could you look at this?
> 
> It does not work with latest pacemaker-2.1.5-7.el9 in RHEL-9.3. The last working version is pacemaker-2.1.4-2.el9 in RHEL-9.1. Is it possible that something has changed in pacemaker-2.1.4-3.el9 regarding digests calculation? (bz1872376)

Not that I know of, and our regression test for it passes. Can you give me the CIB and commands you're using to test?

--- Additional comment from Miroslav Lisik on 2023-03-14 18:08:44 CET ---

Here are commands:

export disk1=/dev/disk/by-id/scsi-SLIO-ORG_r91-disk-01_7ad95d75-3cf3-448e-a591-42b9ba690b22
export disk2=/dev/disk/by-id/scsi-SLIO-ORG_r91-disk-02_e9a0c17d-c631-41cf-a135-e3453ce0c501
pcs host auth -u hacluster -p password r91-1 r91-2
pcs cluster setup HACluster r91-1 r91-2 --start --wait
pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list 'pcmk_host_list=r91-1 r91-2' pcmk_reboot_action=off meta provides=unfencing
pcs resource create d1 ocf:pacemaker:Dummy
pcs resource create d2 ocf:pacemaker:Dummy
pcs cluster cib > cib_before.xml
pcs stonith update-scsi-devices fence-scsi add $disk2
pcs cluster cib > cib_after.xml

Logs after executing the 'update-scsi-devices' command follow.

With latest rhel-9.1.z versions, see cib_before.xml, cib_after.xml:

[root@r91-1 ~]# rpm -q pcs pacemaker
pcs-0.11.3-4.el9_1.2.x86_64
pacemaker-2.1.4-5.el9_1.2.x86_64

[root@r91-1 ~]# journalctl -n0 -f
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of stop operation for fence-scsi on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of stop operation for fence-scsi on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of stop operation for d2 on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of stop operation for d2 on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: fence-scsi is eligible to fence (on) r91-1: static-list
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: fence-scsi is eligible to fence (on) r91-1: static-list
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Operation 'on' [2379] targeting r91-1 using fence-scsi returned 0
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Operation 'on' targeting r91-1 by r91-1 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: r91-1 was unfenced by r91-1 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #node-unfenced[r91-1]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-all[r91-1]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-secure[r91-1]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Operation 'on' targeting r91-2 by r91-2 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: r91-2 was unfenced by r91-2 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #node-unfenced[r91-2]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-all[r91-2]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-secure[r91-2]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of start operation for d2 on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of start operation for d2 on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of start operation for fence-scsi on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of start operation for fence-scsi on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of monitor operation for fence-scsi on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of monitor operation for d2 on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of monitor operation for d2 on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of monitor operation for fence-scsi on r91-1: ok


[root@r91-2 ~]# journalctl -n0 -f
Mar 14 17:36:38 r91-2 systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Unfencing r91-2 (remote): because the definition of fence-scsi changed
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Unfencing r91-1 (remote): because the definition of fence-scsi changed
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice:  * Fence (on) r91-1
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice:  * Fence (on) r91-2
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Restart    fence-scsi     ( r91-1 )  due to required stonith
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Restart    d1             ( r91-2 )  due to required stonith
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Restart    d2             ( r91-1 )  due to required stonith
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-205.bz2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating stop operation fence-scsi_stop_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating stop operation d1_stop_0 locally on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting local execution of stop operation for d1 on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating stop operation d2_stop_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Result of stop operation for d1 on r91-2: ok
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting fencing (on) of node r91-2
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Client pacemaker-controld.1631 wants to fence (on) r91-2 using any device
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting peer fencing (on) targeting r91-2
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: fence-scsi is eligible to fence (on) r91-2: static-list
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting that r91-2 perform 'on' action targeting r91-2
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: fence-scsi is eligible to fence (on) r91-2: static-list
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting fencing (on) of node r91-1
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Client pacemaker-controld.1631 wants to fence (on) r91-1 using any device
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting peer fencing (on) targeting r91-1
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting that r91-1 perform 'on' action targeting r91-1
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Operation 'on' targeting r91-1 by r91-1 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Fence operation 10 for r91-1 passed
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: r91-1 was unfenced by r91-1 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating start operation fence-scsi_start_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating start operation d2_start_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #node-unfenced[r91-1]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-all[r91-1]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-secure[r91-1]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Transition 5 aborted by status-1-.node-unfenced doing modify #node-unfenced=1678811811: Transient attribute change
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Operation 'on' [2266] targeting r91-2 using fence-scsi returned 0
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Operation 'on' targeting r91-2 by r91-2 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Fence operation 9 for r91-2 passed
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: r91-2 was unfenced by r91-2 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #node-unfenced[r91-2]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-all[r91-2]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-secure[r91-2]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Transition 5 (Complete=7, Pending=0, Fired=0, Skipped=3, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-205.bz2): Stopped
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Start      d1             ( r91-2 )
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Calculated transition 6, saving inputs in /var/lib/pacemaker/pengine/pe-input-206.bz2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating monitor operation fence-scsi_monitor_60000 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating start operation d1_start_0 locally on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating monitor operation d2_monitor_10000 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting local execution of start operation for d1 on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Result of start operation for d1 on r91-2: ok
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating monitor operation d1_monitor_10000 locally on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting local execution of monitor operation for d1 on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Result of monitor operation for d1 on r91-2: ok
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Transition 6 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-206.bz2): Complete
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

With the last working version, see cib_before2.xml, cib_after2.xml:

[root@r91-1 ~]# rpm -q pcs pacemaker
pcs-0.11.3-4.el9_1.2.x86_64
pacemaker-2.1.4-2.el9.x86_64

[root@r91-1 ~]# journalctl -n0 -f
Mar 14 17:53:30 r91-1 pacemaker-fenced[4320]:  notice: Added 'fence-scsi' to device list (1 active device)

[root@r91-2 ~]# journalctl -n0 -f
Mar 14 17:53:30 r91-2 pacemaker-controld[4061]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 14 17:53:30 r91-2 pacemaker-fenced[4057]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 14 17:53:30 r91-2 pacemaker-schedulerd[4060]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-215.bz2
Mar 14 17:53:30 r91-2 pacemaker-controld[4061]:  notice: Transition 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-215.bz2): Complete
Mar 14 17:53:30 r91-2 pacemaker-controld[4061]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

--- Additional comment from Ken Gaillot on 2023-03-15 02:58:17 CET ---

I need to investigate some more, but I think this is not a new problem so much as an expansion of a case we neglected with the original bz.

We're changing the digests in the resource history, but there is a separate copy of the digest stored in the "#digests-all" and "#digests-secure" node attributes, which is used to detect whether a node needs to be re-unfenced after a change in a stonith device definition.

Before 2.1.4-3, this only had an effect on Pacemaker Remote nodes, which is why we missed it. Since 2.1.4-3, it applies to all nodes.

That was an important fix, so I think the only way around this will be for pcs to replace those node attribute values in addition to the digests in the resource history.

The node attribute values look like: "stonith-fence_compute-fence-nova:fence_compute:ad312d85623cdb0a792e6fbd5e91a820,"

That is a comma-separated list of DEVICE_NAME:AGENT_NAME:DIGEST for each stonith device that has unfenced the node. "#digests-all" uses the all-parameter digest, and "#digests-secure" (which is only relevant to crm_simulate) uses the non-private-parameter digest.

It's possible that's not the only issue, but that's definitely part of it.
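To make that attribute format concrete, here is a minimal parsing sketch (illustration only, not pcs or Pacemaker code; the function name is hypothetical):

```python
def parse_digest_attr(value):
    """Parse a #digests-all / #digests-secure node attribute value.

    The value is a comma-separated list (typically with a trailing comma)
    of DEVICE_NAME:AGENT_NAME:DIGEST entries, one per stonith device that
    has unfenced the node.
    """
    entries = []
    for item in value.split(","):
        if not item:
            continue  # skip the empty field left by the trailing comma
        device, agent, digest = item.split(":")
        entries.append((device, agent, digest))
    return entries
```

For example, the value logged above, "fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087,", decomposes into the device name, the agent name, and the digest that pcs would need to replace alongside the resource-history digests.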

Comment 9 Miroslav Lisik 2023-03-29 12:27:13 UTC
Upstream commit: https://github.com/ClusterLabs/pcs/commit/bf7d33bdd41f6e51321ae66cd521cefc93acb3a4
Updated commands:
  * pcs stonith update-scsi-devices

Test:

Set up a cluster with shared storage and fence_scsi fencing.
Create enough resources so that each node runs at least one resource.
Use the `pcs stonith update-scsi-devices` command to modify the SCSI devices of the fence_scsi stonith device.
Check that resources did not restart (journalctl, crm_resource --list-operations).
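The "resources did not restart" check can be automated by extracting the call id of each resource's start operation from `crm_resource --list-operations` output; if the call ids are unchanged after the update, no resource was stopped and started again. A small sketch (illustrative only, assuming the line format shown in this bug's transcripts):

```python
import re

# Matches crm_resource --list-operations lines such as:
#   fence-scsi  (stonith:fence_scsi):  Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, ...): complete
START_RE = re.compile(
    r"^(\S+)\s+\(\S+\):\s+Started:\s+\S+_start_0\s+\(node=[^,]+,\s+call=(\d+)"
)

def start_call_ids(listing):
    """Map each resource name to the call id of its recorded start operation.

    Monitor and probe operations are ignored; only *_start_0 entries count.
    """
    calls = {}
    for line in listing.splitlines():
        m = START_RE.match(line.strip())
        if m:
            calls[m.group(1)] = int(m.group(2))
    return calls
```

Comparing `start_call_ids(before)` with `start_call_ids(after)` for equality is the programmatic equivalent of the `diff -u o1.txt o2.txt` check performed during verification.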

Comment 12 Michal Pospisil 2023-05-29 10:10:11 UTC
DevTestResults:

[root@r08-09-a ~]# rpm -q pcs
pcs-0.10.16-1.el8.x86_64

(pcs) [root@r08-09-a pcs]# pcs_test/suite --installed --traditional-verbose pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value ... OK

----------------------------------------------------------------------
Ran 17 tests in 0.096s

OK

Additional testing on a real cluster by mlisik:

[root@r8-node-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
[root@r8-node-01 ~]# rpm -q pcs pacemaker
pcs-0.10.16-1.el8.x86_64
pacemaker-2.1.6-1.el8.x86_64
[root@r8-node-01 ~]# export disk1=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
[root@r8-node-01 ~]# export disk2=/dev/disk/by-id/scsi-36001405fb15e3edf2994db380037abac
[root@r8-node-01 ~]# export NODELIST=(r8-node-01 r8-node-02)
[root@r8-node-01 ~]# pcs host auth -u hacluster -p password ${NODELIST[*]}
r8-node-01: Authorized
r8-node-02: Authorized
[root@r8-node-01 ~]# pcs cluster setup HACluster ${NODELIST[*]} --start --wait
No addresses specified for host 'r8-node-01', using 'r8-node-01'
No addresses specified for host 'r8-node-02', using 'r8-node-02'
Destroying cluster on hosts: 'r8-node-01', 'r8-node-02'...
r8-node-01: Successfully destroyed cluster
r8-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r8-node-01', 'r8-node-02'
r8-node-01: successful removal of the file 'pcsd settings'
r8-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync authkey'
r8-node-01: successful distribution of the file 'pacemaker authkey'
r8-node-02: successful distribution of the file 'corosync authkey'
r8-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync.conf'
r8-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'r8-node-01', 'r8-node-02'...
Waiting for node(s) to start: 'r8-node-01', 'r8-node-02'...
r8-node-01: Cluster started
r8-node-02: Cluster started
[root@r8-node-01 ~]# pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list pcmk_host_list="${NODELIST[*]}" pcmk_reboot_action=off meta provides=unfencing
[root@r8-node-01 ~]# for i in $(seq 1 ${#NODELIST[@]}); do pcs resource create "d$i" ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs resource
  * d1  (ocf::pacemaker:Dummy):  Started r8-node-02
  * d2  (ocf::pacemaker:Dummy):  Started r8-node-01
[root@r8-node-01 ~]# pcs stonith
  * fence-scsi  (stonith:fence_scsi):	Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: fence-scsi-instance_attributes
	devices=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
	pcmk_host_check=static-list
	pcmk_host_list="r8-node-01 r8-node-02"
	pcmk_reboot_action=off
  Meta Attributes: fence-scsi-meta_attributes
	provides=unfencing
  Operations:
	monitor: fence-scsi-monitor-interval-60s
  	interval=60s
[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o1.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=87ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=88ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Fri May 26 15:34:39 2023', exec=2ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Fri May 26 15:34:40 2023', exec=14ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=18ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=11ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=16ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=13ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Fri May 26 15:34:41 2023', exec=20ms): complete
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi add $disk2
[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o2.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=87ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=88ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Fri May 26 15:34:39 2023', exec=2ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Fri May 26 15:34:40 2023', exec=14ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=18ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=11ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=16ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=13ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Fri May 26 15:34:41 2023', exec=20ms): complete
[root@r8-node-01 ~]# diff -u o1.txt o2.txt
[root@r8-node-01 ~]# echo $?
0

[root@r8-node-01 ~]# journalctl -n0 -f
-- Logs begin at Fri 2023-05-26 15:09:03 CEST. --
May 26 15:39:47 r8-node-01 pacemaker-fenced[4373]:  notice: Added 'fence-scsi' to device list (1 active device)

[root@r8-node-02 ~]# journalctl -n0 -f
-- Logs begin at Fri 2023-05-26 15:09:05 CEST. --
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
May 26 15:39:47 r8-node-02 pacemaker-fenced[3699]:  notice: Added 'fence-scsi' to device list (1 active device)
May 26 15:39:47 r8-node-02 pacemaker-schedulerd[3702]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-400.bz2
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]:  notice: Transition 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-400.bz2): Complete
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: fence-scsi-instance_attributes
	devices=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d,/dev/disk/by-id/scsi-36001405fb15e3edf2994db380037abac
	pcmk_host_check=static-list
	pcmk_host_list="r8-node-01 r8-node-02"
	pcmk_reboot_action=off
  Meta Attributes: fence-scsi-meta_attributes
	provides=unfencing
  Operations:
	monitor: fence-scsi-monitor-interval-60s
  	interval=60s

RESULT: SCSI device was added to the stonith configuration and resources were not restarted.

Comment 22 errata-xmlrpc 2023-11-14 15:22:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6903

