Bug 2177996 - Need a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2023-05-29
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: pcs
Version: 9.1
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 9.3
Assignee: Miroslav Lisik
QA Contact: cluster-qe
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks: 2179010 2180704 2180705
Reported: 2023-03-14 07:55 UTC by wilson.hua
Modified: 2023-11-07 09:00 UTC (History)

Fixed In Version: pcs-0.11.5-1.el9
Doc Type: Bug Fix
Doc Text:
.`pcs` command to update multipath SCSI devices now works correctly Due to changes in the Pacemaker CIB file, the `pcs stonith update-scsi-devices` command stopped working as designed, causing an unwanted restart of some cluster resources. With this fix, this command works correctly and updates SCSI devices without requiring a restart of other cluster resources running on the same node.
Clone Of:
Clones: 2179010 2180704 2180705
Environment:
Last Closed: 2023-11-07 08:23:11 UTC
Type: Bug
Target Upstream Version:
Embargoed:
wilson.hua: needinfo?


Attachments
full log of the two nodes (29.54 KB, application/zip), 2023-03-14 07:55 UTC, wilson.hua
cib file (10.63 KB, text/plain), 2023-03-14 17:09 UTC, Miroslav Lisik
cib_after.xml (10.71 KB, text/plain), 2023-03-14 17:10 UTC, Miroslav Lisik
cib_before2.xml (10.63 KB, text/plain), 2023-03-14 17:10 UTC, Miroslav Lisik
cib_after2.xml (10.18 KB, text/plain), 2023-03-14 17:11 UTC, Miroslav Lisik
cib_before.xml (10.63 KB, text/plain), 2023-03-14 17:12 UTC, Miroslav Lisik


Links
System | ID | Last Updated
Red Hat Issue Tracker | CLUSTERQE-6458 | 2023-03-16 15:06:42 UTC
Red Hat Issue Tracker | RHELPLAN-151683 | 2023-03-14 07:57:10 UTC
Red Hat Issue Tracker | RHELPLAN-151684 | 2023-03-14 07:57:13 UTC
Red Hat Knowledge Base (Solution) | 4526971 | 2023-03-14 15:39:36 UTC
Red Hat Product Errata | RHSA-2023:6316 | 2023-11-07 08:24:42 UTC

Description wilson.hua 2023-03-14 07:55:43 UTC
Created attachment 1950477 [details]
full log of the two nodes

Description of problem:

###Similar to Bug 2024522; the only difference is that we are using the fence_scsi agent to build our cluster:

"cmd": "pcs stonith create emc_fence fence_scsi \"devices=/dev/mapper/368ccf09800de023572e4a63b259687d1,/dev/mapper/368ccf098003e693c025c28dd5cdf69d9,/dev/mapper/368ccf0980076ee9888c1334880ea05d1\" \"pcmk_host_list=e2e-l4-236128,e2e-l4-236129\" pcmk_monitor_action=\"metadata\" pcmk_reboot_action=\"off\" pcmk_host_check=\"static-list\" meta provides=\"unfencing --force\"\n",

[root@e2e-l4-236129 ~]# pcs stonith config
Resource: emc_fence (class=stonith type=fence_scsi)
  Attributes: emc_fence-instance_attributes
    devices=/dev/mapper/368ccf098003e693c025c28dd5cdf69d9,/dev/mapper/368ccf0980076ee9888c1334880ea05d1,/dev/mapper/368ccf09800de023572e4a63b259687d1
    pcmk_host_check=static-list
    pcmk_host_list=e2e-l4-236128,e2e-l4-236129
    pcmk_monitor_action=metadata
    pcmk_reboot_action=off
  Meta Attributes: emc_fence-meta_attributes
    provides=unfencing
  Operations:
    monitor: emc_fence-monitor-interval-60s
      interval=60s

####Create several resources on the cluster:

[root@e2e-l4-236129 ~]# pcs status
Cluster name: RHCS
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-03-14 03:51:09 -04:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: e2e-l4-236128 (version 2.1.5-7.el9-a3f44794f94) - partition with quorum
  * Last updated: Tue Mar 14 03:51:10 2023
  * Last change:  Tue Mar 14 03:27:11 2023 by root via cibadmin on e2e-l4-236128
  * 2 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ e2e-l4-236128 e2e-l4-236129 ]

Full List of Resources:
  * emc_fence   (stonith:fence_scsi):    Started e2e-l4-236128
  * Clone Set: dlm-clone [dlm]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]
  * Clone Set: lvmlockd-clone [lvmlockd]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]
  * Clone Set: alua_1_vg_1678695957-clone [alua_1_vg_1678695957]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]
  * Clone Set: alua_0_vg_1678695957-clone [alua_0_vg_1678695957]:
    * Started: [ e2e-l4-236128 e2e-l4-236129 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


####We then use update-scsi-devices to remove a device from, or add a device to, the stonith device list:

[root@e2e-l4-236128 ~]# pcs stonith update-scsi-devices emc_fence remove /dev/mapper/368ccf09800b6f39e1227c2c42c3fb481
[root@e2e-l4-236128 ~]# pcs stonith update-scsi-devices emc_fence add /dev/mapper/368ccf09800b6f39e1227c2c42c3fb481

####We can then observe a resource lockdown and a restart of cluster resources:

Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Unfencing Pacemaker Remote node e2e-l4-236129 because the definition of emc_fence changed
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Unfencing Pacemaker Remote node e2e-l4-236128 because the definition of emc_fence changed
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice:  * Fence (on) e2e-l4-236128
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice:  * Fence (on) e2e-l4-236129
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Actions: Restart    emc_fence      ( e2e-l4-236128 )  due to required stonith
Mar 14 03:25:29 e2e-l4-236128 pacemaker-schedulerd[2594]: notice: Actions: Restart    dlm:0          ( e2e-l4-236128 )  due to required stonith

####Is there any way to avoid restarting fencing when we modify the stonith devices?



Comment 1 Miroslav Lisik 2023-03-14 13:18:12 UTC
Ken, could you look at this?

It does not work with latest pacemaker-2.1.5-7.el9 in RHEL-9.3. The last
working version is pacemaker-2.1.4-2.el9 in RHEL-9.1. Is it possible
that something has changed in pacemaker-2.1.4-3.el9 regarding digests
calculation? (bz1872376)

Comment 2 Ken Gaillot 2023-03-14 16:21:18 UTC
(In reply to Miroslav Lisik from comment #1)
> Ken, could you look at this?
> 
> It does not work with latest pacemaker-2.1.5-7.el9 in RHEL-9.3. The last
> working version is pacemaker-2.1.4-2.el9 in RHEL-9.1. Is it possible
> that something has changed in pacemaker-2.1.4-3.el9 regarding digests
> calculation? (bz1872376)

Not that I know of, and our regression test for it passes. Can you give me the CIB and commands you're using to test?

Comment 3 Miroslav Lisik 2023-03-14 17:08:44 UTC
Here are commands:

export disk1=/dev/disk/by-id/scsi-SLIO-ORG_r91-disk-01_7ad95d75-3cf3-448e-a591-42b9ba690b22
export disk2=/dev/disk/by-id/scsi-SLIO-ORG_r91-disk-02_e9a0c17d-c631-41cf-a135-e3453ce0c501
pcs host auth -u hacluster -p password r91-1 r91-2
pcs cluster setup HACluster r91-1 r91-2 --start --wait
pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list 'pcmk_host_list=r91-1 r91-2' pcmk_reboot_action=off meta provides=unfencing
pcs resource create d1 ocf:pacemaker:Dummy
pcs resource create d2 ocf:pacemaker:Dummy
pcs cluster cib > cib_before.xml
pcs stonith update-scsi-devices fence-scsi add $disk2
pcs cluster cib > cib_after.xml

Logs after executing the 'update-scsi-devices' command follow.

With latest rhel-9.1.z versions, see cib_before.xml, cib_after.xml:

[root@r91-1 ~]# rpm -q pcs pacemaker
pcs-0.11.3-4.el9_1.2.x86_64
pacemaker-2.1.4-5.el9_1.2.x86_64

[root@r91-1 ~]# journalctl -n0 -f
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of stop operation for fence-scsi on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of stop operation for fence-scsi on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of stop operation for d2 on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of stop operation for d2 on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: fence-scsi is eligible to fence (on) r91-1: static-list
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: fence-scsi is eligible to fence (on) r91-1: static-list
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Operation 'on' [2379] targeting r91-1 using fence-scsi returned 0
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Operation 'on' targeting r91-1 by r91-1 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: r91-1 was unfenced by r91-1 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #node-unfenced[r91-1]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-all[r91-1]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-secure[r91-1]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-1 pacemaker-fenced[1663]:  notice: Operation 'on' targeting r91-2 by r91-2 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: r91-2 was unfenced by r91-2 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #node-unfenced[r91-2]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-all[r91-2]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-1 pacemaker-attrd[1665]:  notice: Setting #digests-secure[r91-2]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of start operation for d2 on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of start operation for d2 on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of start operation for fence-scsi on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of start operation for fence-scsi on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of monitor operation for fence-scsi on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Requesting local execution of monitor operation for d2 on r91-1
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of monitor operation for d2 on r91-1: ok
Mar 14 17:36:51 r91-1 pacemaker-controld[1667]:  notice: Result of monitor operation for fence-scsi on r91-1: ok


[root@r91-2 ~]# journalctl -n0 -f
Mar 14 17:36:38 r91-2 systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Unfencing r91-2 (remote): because the definition of fence-scsi changed
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Unfencing r91-1 (remote): because the definition of fence-scsi changed
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice:  * Fence (on) r91-1
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice:  * Fence (on) r91-2
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Restart    fence-scsi     ( r91-1 )  due to required stonith
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Restart    d1             ( r91-2 )  due to required stonith
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Restart    d2             ( r91-1 )  due to required stonith
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-205.bz2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating stop operation fence-scsi_stop_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating stop operation d1_stop_0 locally on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting local execution of stop operation for d1 on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating stop operation d2_stop_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Result of stop operation for d1 on r91-2: ok
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting fencing (on) of node r91-2
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Client pacemaker-controld.1631 wants to fence (on) r91-2 using any device
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting peer fencing (on) targeting r91-2
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: fence-scsi is eligible to fence (on) r91-2: static-list
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting that r91-2 perform 'on' action targeting r91-2
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: fence-scsi is eligible to fence (on) r91-2: static-list
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting fencing (on) of node r91-1
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Client pacemaker-controld.1631 wants to fence (on) r91-1 using any device
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting peer fencing (on) targeting r91-1
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Requesting that r91-1 perform 'on' action targeting r91-1
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Operation 'on' targeting r91-1 by r91-1 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Fence operation 10 for r91-1 passed
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: r91-1 was unfenced by r91-1 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating start operation fence-scsi_start_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating start operation d2_start_0 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #node-unfenced[r91-1]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-all[r91-1]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-secure[r91-1]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Transition 5 aborted by status-1-.node-unfenced doing modify #node-unfenced=1678811811: Transient attribute change
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Operation 'on' [2266] targeting r91-2 using fence-scsi returned 0
Mar 14 17:36:51 r91-2 pacemaker-fenced[1627]:  notice: Operation 'on' targeting r91-2 by r91-2 for pacemaker-controld.1631@r91-2: OK (complete)
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Fence operation 9 for r91-2 passed
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: r91-2 was unfenced by r91-2 at the request of pacemaker-controld.1631@r91-2
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #node-unfenced[r91-2]: 1678811397 -> 1678811811
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-all[r91-2]: fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087, -> fence-scsi:fence_scsi:41f4daa097914fe0b3f6ba8363f28cf9,
Mar 14 17:36:51 r91-2 pacemaker-attrd[1629]:  notice: Setting #digests-secure[r91-2]: fence-scsi:fence_scsi:8a7469d06699bb5cdf0da9304affaf6e, -> fence-scsi:fence_scsi:2467afb1330d3c0048abdc371dea6bc3,
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Transition 5 (Complete=7, Pending=0, Fired=0, Skipped=3, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-205.bz2): Stopped
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Actions: Start      d1             ( r91-2 )
Mar 14 17:36:51 r91-2 pacemaker-schedulerd[1630]:  notice: Calculated transition 6, saving inputs in /var/lib/pacemaker/pengine/pe-input-206.bz2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating monitor operation fence-scsi_monitor_60000 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating start operation d1_start_0 locally on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating monitor operation d2_monitor_10000 on r91-1
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting local execution of start operation for d1 on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Result of start operation for d1 on r91-2: ok
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Initiating monitor operation d1_monitor_10000 locally on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Requesting local execution of monitor operation for d1 on r91-2
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Result of monitor operation for d1 on r91-2: ok
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: Transition 6 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-206.bz2): Complete
Mar 14 17:36:51 r91-2 pacemaker-controld[1631]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

With the last working version, see cib_before2.xml, cib_after2.xml:

[root@r91-1 ~]# rpm -q pcs pacemaker
pcs-0.11.3-4.el9_1.2.x86_64
pacemaker-2.1.4-2.el9.x86_64

[root@r91-1 ~]# journalctl -n0 -f
Mar 14 17:53:30 r91-1 pacemaker-fenced[4320]:  notice: Added 'fence-scsi' to device list (1 active device)

[root@r91-2 ~]# journalctl -n0 -f
Mar 14 17:53:30 r91-2 pacemaker-controld[4061]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 14 17:53:30 r91-2 pacemaker-fenced[4057]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 14 17:53:30 r91-2 pacemaker-schedulerd[4060]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-215.bz2
Mar 14 17:53:30 r91-2 pacemaker-controld[4061]:  notice: Transition 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-215.bz2): Complete
Mar 14 17:53:30 r91-2 pacemaker-controld[4061]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
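
The difference between the two runs above comes down to whether pacemaker-schedulerd logs any "Actions: Restart" lines after the update. As a rough sketch (not part of the original report), the helper below pulls those lines out of journal text. The function name restart_actions is hypothetical, and the field layout is assumed to match the log lines quoted in this bug; other Pacemaker versions may format these messages differently.

```shell
#!/bin/sh
# restart_actions (hypothetical helper): read journal text on stdin and print
# "RESOURCE NODE" for every "Actions: Restart" line from pacemaker-schedulerd.
# Assumes the layout seen in this report:
#   ... notice: Actions: Restart    fence-scsi     ( r91-1 )  due to required stonith
restart_actions() {
    awk '/pacemaker-schedulerd/ {
        for (i = 1; i + 4 <= NF; i++)
            if ($i == "Actions:" && $(i + 1) == "Restart")
                print $(i + 2), $(i + 4)   # resource name, node name
    }'
}

# Possible usage on a cluster node (the time window is only an example):
#   journalctl --since "5 minutes ago" | restart_actions
```

An empty result after running 'pcs stonith update-scsi-devices' would correspond to the "last working version" log above; any output lists the resources that were restarted.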

Comment 4 Miroslav Lisik 2023-03-14 17:09:38 UTC
Created attachment 1950691 [details]
cib file

Comment 5 Miroslav Lisik 2023-03-14 17:10:14 UTC
Created attachment 1950692 [details]
cib_after.xml

Comment 6 Miroslav Lisik 2023-03-14 17:10:57 UTC
Created attachment 1950693 [details]
cib_before2.xml

Comment 7 Miroslav Lisik 2023-03-14 17:11:46 UTC
Created attachment 1950694 [details]
cib_after2.xml

Comment 8 Miroslav Lisik 2023-03-14 17:12:18 UTC
Created attachment 1950695 [details]
cib_before.xml

Comment 9 Ken Gaillot 2023-03-15 01:58:17 UTC
I need to investigate some more, but I think this is not a new problem; rather, it is an expansion of a case we neglected with the original bz.

We're changing the digests in the resource history, but there is a separate copy of the digest stored in the "#digests-all" and "#digests-secure" node attributes, used to detect whether a node needs to be re-unfenced after a change in stonith device definition.

Before 2.1.4-3, this only had an effect on Pacemaker Remote nodes, which is why we missed it. Since 2.1.4-3, it applies to all nodes.

That was an important fix, so I think the only way around this will be for pcs to replace those node attribute values in addition to the digests in the resource history.

The node attribute values look like: "stonith-fence_compute-fence-nova:fence_compute:ad312d85623cdb0a792e6fbd5e91a820,"

That is a comma-separated list of DEVICE_NAME:AGENT_NAME:DIGEST for each stonith device that has unfenced the node. "#digests-all" uses the all-parameter digest, and "#digests-secure" (which is only relevant to crm_simulate) uses the non-private-parameter digest.

It's possible that's not the only issue, but that's definitely part of it.
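
To make the attribute format concrete, here is a minimal sketch of replacing one device's digest inside such a value. It is not the pcs implementation (the real fix is in the upstream Python commit referenced later in this bug). The function name replace_digest is hypothetical, the new digest is assumed to be already known, device and agent names are assumed to contain no ':' or ',', and the sketch always emits a trailing comma, which is an assumption rather than documented behavior.

```shell
#!/bin/sh
# replace_digest VALUE DEVICE AGENT NEW_DIGEST   (hypothetical helper)
# VALUE is a comma-separated list of DEVICE_NAME:AGENT_NAME:DIGEST entries.
# Prints VALUE with the digest of the matching DEVICE:AGENT entry replaced;
# entries for other devices are copied unchanged.
replace_digest() {
    printf '%s\n' "$1" | awk -F',' -v dev="$2" -v agent="$3" -v digest="$4" '
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            entry = $i
            if (entry == "") continue          # empty field after a trailing comma
            n = split(entry, part, ":")
            if (n == 3 && part[1] == dev && part[2] == agent)
                entry = dev ":" agent ":" digest
            out = out entry ","                # assumption: always end with a comma
        }
        print out
    }'
}

# Example using the digests from the journal output in comment 3:
replace_digest \
    "fence-scsi:fence_scsi:ec9ecba84b274fb35effbf2a47226087," \
    fence-scsi fence_scsi 41f4daa097914fe0b3f6ba8363f28cf9
```

The same transformation would apply to both "#digests-all" and "#digests-secure", each with its own digest value.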

Comment 18 wilson.hua 2023-03-24 02:10:31 UTC
Dear Team,

May I know the current progress of this issue?

Thank you!

Comment 19 Tomas Jelinek 2023-03-24 09:42:53 UTC
Hello Wilson Hua,

The engineering team is currently working on a fix for this issue. Pcs packages incorporating the fix are expected to be built and passed to QA for testing next week.

Regards,
Tomas Jelinek

Comment 20 Miroslav Lisik 2023-03-24 17:38:23 UTC
Ken:

Is there a third digest attribute that uses the nonreloadable-parameter digest, or are there only two of them, "#digests-all" and "#digests-secure"?

Comment 21 Ken Gaillot 2023-03-27 15:38:00 UTC
(In reply to Miroslav Lisik from comment #20)
> Ken:
> 
> Is there a third digest attribute that uses the nonreloadable-parameter digest,
> or are there only two of them, "#digests-all" and "#digests-secure"?

Only two, because this solely determines whether to unfence the node, not whether to restart or reload the device.

Comment 22 Miroslav Lisik 2023-03-29 12:29:19 UTC
Upstream commit: https://github.com/ClusterLabs/pcs/commit/b18ba53144b7d2d5e435eab369cc1f2c0680a85f
Updated commands:
  * pcs stonith update-scsi-devices

Test:

Set up a cluster with shared storage and fence_scsi fencing.
Set up enough resources so that each node is running at least one resource.
Use the `pcs stonith update-scsi-devices` command to modify the SCSI devices of the fence_scsi stonith device.
Check that resources did not restart (journalctl, crm_resource --list-operations).
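
One way to carry out the last check is to save the output of crm_resource --list-operations before and after the update and compare the two snapshots. The sketch below assumes that a restart shows up as a changed start operation entry (new call number and last-rc-change); the function name ops_changed and the snapshot file names are hypothetical.

```shell
#!/bin/sh
# ops_changed BEFORE AFTER   (hypothetical helper)
# Compares two operation-history snapshots. Prints a short message and
# returns 0 if they are identical; otherwise prints the diff and returns 1.
ops_changed() {
    if cmp -s "$1" "$2"; then
        echo "no operation history changes"
        return 0
    fi
    diff "$1" "$2"
    return 1
}

# Possible usage on a cluster node (resource names taken from the test setup):
#   for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done > ops_before.txt
#   pcs stonith update-scsi-devices fence-scsi add $disk2
#   for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done > ops_after.txt
#   ops_changed ops_before.txt ops_after.txt
```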

Comment 24 wilson.hua 2023-04-26 09:07:32 UTC
Just to double-check: this issue will be fixed in RHEL 9.3, right?
Once it is fixed, will there be a patch for RHEL 9.1, 9.2, and 8.8? As far as we know, the issue can be reproduced on those versions too.

Comment 25 Tomas Jelinek 2023-04-26 09:27:37 UTC
(In reply to wilson.hua from comment #24)
> Just to double-check: this issue will be fixed in RHEL 9.3, right?
> Once it is fixed, will there be a patch for RHEL 9.1, 9.2, and 8.8? As far as we know, the issue can be reproduced on those versions too.

This issue will be fixed in 9.3, 9.2, 8.9 and 8.8. It won't be fixed in 9.1 and 8.7 as those releases approach their End of Life.

Comment 27 Michal Pospisil 2023-05-26 09:28:46 UTC
DevTestResults:

[root@r09-03-a ~]# rpm -q pcs
pcs-0.11.5-1.el9.x86_64

(pcs) [root@r09-03-a pcs]# pcs_test/suite --installed --traditional-verbose pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi
...
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value ... OK

----------------------------------------------------------------------
Ran 17 tests in 0.090s

OK

[root@r90-node-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 Beta (Plow)

[root@r90-node-01 ~]# rpm -q pcs pacemaker
pcs-0.11.5-1.el9.x86_64
pacemaker-2.1.6-1.el9.x86_64

Additional testing on real cluster by mlisik:

[root@r90-node-01 ~]# export disk1=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2
[root@r90-node-01 ~]# export disk2=/dev/disk/by-id/scsi-36001405c1cf9f31e16e49b6942bf60c7
[root@r90-node-01 ~]# export NODELIST=(r90-node-01 r90-node-02)
[root@r90-node-01 ~]# pcs cluster destroy --all
Warning: It is recommended to run 'pcs cluster stop' before destroying the cluster.
WARNING: This would kill all cluster processes and then PERMANENTLY remove cluster state and configuration
Type 'yes' or 'y' to proceed, anything else to cancel: y
Warning: Unable to load CIB to get guest and remote nodes from it, those nodes will not be deconfigured.
r90-node-01: Stopping Cluster (pacemaker)...
r90-node-02: Stopping Cluster (pacemaker)...
r90-node-01: Successfully destroyed cluster
r90-node-02: Successfully destroyed cluster
[root@r90-node-01 ~]# pcs host auth -u hacluster -p password ${NODELIST[*]}
r90-node-01: Authorized
r90-node-02: Authorized
[root@r90-node-01 ~]# pcs cluster setup HACluster ${NODELIST[*]} --start --wait
No addresses specified for host 'r90-node-01', using 'r90-node-01'
No addresses specified for host 'r90-node-02', using 'r90-node-02'
Destroying cluster on hosts: 'r90-node-01', 'r90-node-02'...
r90-node-01: Successfully destroyed cluster
r90-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r90-node-01', 'r90-node-02'
r90-node-01: successful removal of the file 'pcsd settings'
r90-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r90-node-01', 'r90-node-02'
r90-node-01: successful distribution of the file 'corosync authkey'
r90-node-01: successful distribution of the file 'pacemaker authkey'
r90-node-02: successful distribution of the file 'corosync authkey'
r90-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r90-node-01', 'r90-node-02'
r90-node-01: successful distribution of the file 'corosync.conf'
r90-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'r90-node-01', 'r90-node-02'...
Waiting for node(s) to start: 'r90-node-01', 'r90-node-02'...
r90-node-02: Cluster started
r90-node-01: Cluster started
[root@r90-node-01 ~]# pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list pcmk_host_list="${NODELIST[*]}" pcmk_reboot_action=off meta provides=unfencing
[root@r90-node-01 ~]# for i in $(seq 1 ${#NODELIST[@]}); do pcs resource create "d$i" ocf:pacemaker:Dummy; done
[root@r90-node-01 ~]# pcs resource
  * d1  (ocf:pacemaker:Dummy):   Started r90-node-02
  * d2  (ocf:pacemaker:Dummy):   Started r90-node-01
[root@r90-node-01 ~]# pcs stonith
  * fence-scsi  (stonith:fence_scsi):	Started r90-node-01
[root@r90-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: fence-scsi-instance_attributes
	devices=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2
	pcmk_host_check=static-list
	pcmk_host_list="r90-node-01 r90-node-02"
	pcmk_reboot_action=off
  Meta Attributes: fence-scsi-meta_attributes
	provides=unfencing
  Operations:
	monitor: fence-scsi-monitor-interval-60s
  	interval=60s


[root@r90-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o1.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r90-node-01, call=6, rc=0, last-rc-change='Thu May 25 13:44:21 2023', exec=64ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r90-node-01, call=7, rc=0, last-rc-change='Thu May 25 13:44:21 2023', exec=66ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r90-node-02, call=5, rc=7, last-rc-change='Thu May 25 13:44:21 2023', exec=3ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_0 (node=r90-node-01, call=11, rc=7, last-rc-change='Thu May 25 13:44:56 2023', exec=29ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_start_0 (node=r90-node-02, call=10, rc=0, last-rc-change='Thu May 25 13:44:56 2023', exec=16ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_10000 (node=r90-node-02, call=11, rc=0, last-rc-change='Thu May 25 13:44:56 2023', exec=13ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_start_0 (node=r90-node-01, call=16, rc=0, last-rc-change='Thu May 25 13:44:57 2023', exec=13ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_10000 (node=r90-node-01, call=17, rc=0, last-rc-change='Thu May 25 13:44:57 2023', exec=11ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_0 (node=r90-node-02, call=15, rc=7, last-rc-change='Thu May 25 13:44:57 2023', exec=17ms): complete

[root@r90-node-01 ~]# pcs stonith update-scsi-devices fence-scsi add $disk2


[root@r90-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o2.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r90-node-01, call=6, rc=0, last-rc-change='Thu May 25 13:44:21 2023', exec=64ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r90-node-01, call=7, rc=0, last-rc-change='Thu May 25 13:44:21 2023', exec=66ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r90-node-02, call=5, rc=7, last-rc-change='Thu May 25 13:44:21 2023', exec=3ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_0 (node=r90-node-01, call=11, rc=7, last-rc-change='Thu May 25 13:44:56 2023', exec=29ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_start_0 (node=r90-node-02, call=10, rc=0, last-rc-change='Thu May 25 13:44:56 2023', exec=16ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_10000 (node=r90-node-02, call=11, rc=0, last-rc-change='Thu May 25 13:44:56 2023', exec=13ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_start_0 (node=r90-node-01, call=16, rc=0, last-rc-change='Thu May 25 13:44:57 2023', exec=13ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_10000 (node=r90-node-01, call=17, rc=0, last-rc-change='Thu May 25 13:44:57 2023', exec=11ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_0 (node=r90-node-02, call=15, rc=7, last-rc-change='Thu May 25 13:44:57 2023', exec=17ms): complete

[root@r90-node-01 ~]# diff -u o1.txt o2.txt
[root@r90-node-01 ~]# echo $?
0

[root@r90-node-01 ~]# journalctl -n 0 -f
May 25 13:48:02 r90-node-01 pacemaker-controld[6150]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
May 25 13:48:02 r90-node-01 pacemaker-fenced[6146]:  notice: Added 'fence-scsi' to device list (1 active device)
May 25 13:48:02 r90-node-01 pacemaker-schedulerd[6149]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-2839.bz2
May 25 13:48:02 r90-node-01 pacemaker-controld[6150]:  notice: Transition 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2839.bz2): Complete
May 25 13:48:02 r90-node-01 pacemaker-controld[6150]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

[root@r90-node-02 ~]# journalctl -n 0 -f
May 25 13:48:02 r90-node-02 pacemaker-fenced[100644]:  notice: Added 'fence-scsi' to device list (1 active device)

[root@r90-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: fence-scsi-instance_attributes
	devices=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2,/dev/disk/by-id/scsi-36001405c1cf9f31e16e49b6942bf60c7
	pcmk_host_check=static-list
	pcmk_host_list="r90-node-01 r90-node-02"
	pcmk_reboot_action=off
  Meta Attributes: fence-scsi-meta_attributes
	provides=unfencing
  Operations:
	monitor: fence-scsi-monitor-interval-60s
  	interval=60s

RESULT: SCSI device was added to the stonith configuration and resources were not restarted.
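
The before/after comparison above can also be scripted. The following is only a minimal sketch, not part of pcs or pacemaker: the helper names snapshot_ops, ops_key and ops_unchanged are hypothetical, and it assumes that a restarted resource would show up in the `crm_resource --list-operations` history as a changed operation, node, or call id, which is the same assumption the `diff -u o1.txt o2.txt` check relies on.

```shell
# Hypothetical helpers for scripting the "no restart" check shown above.

# Save the operation history of the given resources to a file.
# Usage: snapshot_ops OUTFILE RESOURCE...
snapshot_ops() {
    out=$1
    shift
    for r in "$@"; do
        crm_resource --resource "$r" --list-operations
    done > "$out" 2>&1
}

# Reduce each history line to "operation node call-id", sorted.
ops_key() {
    sed -n 's/.*: \([^ ]*\) (node=\([^,]*\), call=\([0-9][0-9]*\),.*/\1 \2 \3/p' "$1" | sort
}

# Succeed only if both snapshots contain history and the reduced
# histories are identical, i.e. no operation was re-run in between.
ops_unchanged() {
    before=$(ops_key "$1")
    after=$(ops_key "$2")
    [ -n "$before" ] && [ "$before" = "$after" ]
}
```

With these, the manual steps would become: `snapshot_ops o1.txt fence-scsi d1 d2`, then `pcs stonith update-scsi-devices fence-scsi add $disk2`, then `snapshot_ops o2.txt fence-scsi d1 d2`, and finally `ops_unchanged o1.txt o2.txt && echo "no restart"`.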

Comment 38 errata-xmlrpc 2023-11-07 08:23:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: pcs security, bug fix, and enhancement
update) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6316

