Bug 2180706

Summary: Need a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources [rhel-8.8.0.z]
Product: Red Hat Enterprise Linux 8
Reporter: RHEL Program Management Team <pgm-rhel-tools>
Component: pcs
Assignee: Miroslav Lisik <mlisik>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Priority: urgent
Version: 8.7
CC: cfeist, cluster-maint, cluster-qe, idevat, kgaillot, lmiksik, mlisik, mmazoure, mpospisi, nhostako, omular, sbradley, tojeline, wilson.hua
Target Milestone: rc
Keywords: Regression, Triaged, ZStream
Hardware: Unspecified
OS: Linux
Fixed In Version: pcs-0.10.15-4.el8_8.1
Clone Of: 2179010
Last Closed: 2023-05-16 09:59:09 UTC
Bug Depends On: 2179010

Comment 1 wilson.hua 2023-03-31 06:18:08 UTC
In KB article 4526971, I found the following statement:

Red Hat Enterprise Linux 9
The issue (bugzilla bug: 2024522) has been resolved with the errata RHSA-2022:7935 with the following package(s): pcs-snmp-0.11.3-4.el9, pcs-0.11.3-4.el9 or later.

However, in my environment the installed pcs version is later than that version:

[root@E2E-L4-236148 ~]# rpm -qa | grep pcs
pcs-0.11.4-6.el9.x86_64

Yet we still see resources restart when updating the stonith device list. Is this issue really fixed?
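The comparison made in this comment can be scripted by reading the installed VERSION-RELEASE with `rpm -q --qf` and comparing it with the errata version. This is a minimal sketch, not something from the KB article: `version_at_least` is a hypothetical name, and GNU `sort -V` only approximates RPM's ordering rules (`rpmdev-vercmp` from the rpmdevtools package implements them exactly). Being at or above a fixed version only shows that the earlier fix is included, not that no later regression exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if version $1 >= version $2.
# GNU sort -V is used as an approximation of RPM version ordering.
version_at_least() {
    local have=$1 want=$2
    # The smaller version sorts first; if that is "want", then have >= want.
    [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n1)" = "$want" ]
}

# Example: compare the installed pcs with the RHEL 9 fixed version quoted above.
# rpm -q --qf prints VERSION-RELEASE, e.g. 0.11.4-6.el9; skipped if rpm or pcs is absent.
if installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' pcs 2>/dev/null); then
    if version_at_least "$installed" 0.11.3-4.el9; then
        echo "pcs $installed is at or above 0.11.3-4.el9"
    else
        echo "pcs $installed is older than 0.11.3-4.el9"
    fi
fi
```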

Comment 2 wilson.hua 2023-03-31 06:24:25 UTC
Please ignore the reply above. I did not see the update in the parent bug when I posted my comment; I now know that this issue is being worked on.

Comment 3 Michal Pospisil 2023-03-31 06:34:59 UTC
Thanks for the help with testing the scratch build from mlisik

DevTestResults:

[root@r8-node-01 ~]# rpm -q pcs pacemaker
pcs-0.10.15-4.el8_8.1.x86_64
pacemaker-2.1.5-8.el8.x86_64

[root@r8-node-01 ~]# export disk1=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
[root@r8-node-01 ~]# export disk2=/dev/disk/by-id/scsi-36001405fb15e3edf2994db380037abac
[root@r8-node-01 ~]# export NODELIST=(r8-node-01 r8-node-02)

[root@r8-node-01 ~]# pcs cluster setup HACluster ${NODELIST[*]} --start --wait
No addresses specified for host 'r8-node-01', using 'r8-node-01'
No addresses specified for host 'r8-node-02', using 'r8-node-02'
Destroying cluster on hosts: 'r8-node-01', 'r8-node-02'...
r8-node-01: Successfully destroyed cluster
r8-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r8-node-01', 'r8-node-02'
r8-node-01: successful removal of the file 'pcsd settings'
r8-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync authkey'
r8-node-01: successful distribution of the file 'pacemaker authkey'
r8-node-02: successful distribution of the file 'corosync authkey'
r8-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync.conf'
r8-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'r8-node-01', 'r8-node-02'...
Waiting for node(s) to start: 'r8-node-01', 'r8-node-02'...
r8-node-01: Cluster started
r8-node-02: Cluster started
[root@r8-node-01 ~]# pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list pcmk_host_list="${NODELIST[*]}" pcmk_reboot_action=off meta provides=unfencing
[root@r8-node-01 ~]# for i in $(seq 1 ${#NODELIST[@]}); do pcs resource create "d$i" ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs resource
  * d1  (ocf::pacemaker:Dummy):  Started r8-node-02
  * d2  (ocf::pacemaker:Dummy):  Started r8-node-01
[root@r8-node-01 ~]# pcs stonith
  * fence-scsi  (stonith:fence_scsi):	Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: fence-scsi-instance_attributes
	devices=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
	pcmk_host_check=static-list
	pcmk_host_list="r8-node-01 r8-node-02"
	pcmk_reboot_action=off
  Meta Attributes: fence-scsi-meta_attributes
	provides=unfencing
  Operations:
	monitor: fence-scsi-monitor-interval-60s
  	interval=60s
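To confirm which disks a fence_scsi resource currently uses (for example, after running `update-scsi-devices`), the `devices=` attribute can be extracted from `pcs stonith config` output. This is a sketch that assumes the one-attribute-per-line layout printed by pcs-0.10.15 above and a comma-separated `devices` list when several disks are configured; `scsi_devices_from_config` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pcs stonith config` output on stdin and print
# the configured SCSI devices one per line.
# Assumption: the attribute appears on its own line as "devices=dev1,dev2,...".
scsi_devices_from_config() {
    # Keep only the devices= line, strip the attribute name, split on commas.
    sed -n 's/^[[:space:]]*devices=//p' | tr ',' '\n'
}

# Usage on a cluster node (not run here):
#   pcs stonith config fence-scsi | scsi_devices_from_config
```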

[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o1.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Thu Mar 30 17:52:18 2023', exec=84ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Thu Mar 30 17:52:18 2023', exec=86ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Thu Mar 30 17:52:18 2023', exec=1ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Thu Mar 30 17:52:23 2023', exec=18ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Thu Mar 30 17:52:23 2023', exec=16ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Thu Mar 30 17:52:23 2023', exec=13ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Thu Mar 30 17:52:25 2023', exec=16ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Thu Mar 30 17:52:25 2023', exec=13ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Thu Mar 30 17:52:25 2023', exec=16ms): complete

[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi add $disk2

[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o2.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Thu Mar 30 17:52:18 2023', exec=84ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Thu Mar 30 17:52:18 2023', exec=86ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Thu Mar 30 17:52:18 2023', exec=1ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Thu Mar 30 17:52:23 2023', exec=18ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Thu Mar 30 17:52:23 2023', exec=16ms): complete
d1  	(ocf::pacemaker:Dummy):  Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Thu Mar 30 17:52:23 2023', exec=13ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Thu Mar 30 17:52:25 2023', exec=16ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Thu Mar 30 17:52:25 2023', exec=13ms): complete
d2  	(ocf::pacemaker:Dummy):  Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Thu Mar 30 17:52:25 2023', exec=16ms): complete

[root@r8-node-01 ~]# diff -u o1.txt o2.txt
[root@r8-node-01 ~]# echo $?
0
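The before/after check above (snapshot the operation history, run the update, snapshot again, diff) can be wrapped in a small function. This is a sketch, not a pcs or pacemaker tool: `snapshot_operations` and `check_no_restart` are hypothetical names, and the `crm_resource --wait` call (which waits for the cluster to settle) is an addition that the original test did not use. Identical output means pacemaker recorded no new operations for the listed resources, so nothing was restarted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the recorded operation history of each resource,
# using the same crm_resource call as the test above (stderr included, like |&).
snapshot_operations() {
    local r
    for r in "$@"; do
        crm_resource --resource "$r" --list-operations
    done 2>&1
}

# Hypothetical wrapper: check_no_restart "RES1 RES2 ..." COMMAND [ARGS...]
# Returns 0 if the operation history is identical before and after COMMAND,
# 1 if it changed (a unified diff is printed), 2 if COMMAND or the wait failed.
check_no_restart() {
    local -a resources
    read -r -a resources <<< "$1"    # split the space-separated resource list
    shift
    local before after
    before=$(snapshot_operations "${resources[@]}")
    "$@" || return 2
    crm_resource --wait || return 2  # let the cluster settle before re-reading
    after=$(snapshot_operations "${resources[@]}")
    if [ "$before" = "$after" ]; then
        echo "operation history unchanged"
    else
        diff -u <(printf '%s\n' "$before") <(printf '%s\n' "$after") || true
        return 1
    fi
}

# Usage on a cluster node (not run here):
#   check_no_restart "fence-scsi d1 d2" pcs stonith update-scsi-devices fence-scsi add "$disk2"
```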

[root@r8-node-01 ~]# journalctl -n0 -f
-- Logs begin at Mon 2023-03-27 17:05:22 CEST. --
Mar 30 17:55:26 r8-node-01 pacemaker-controld[151490]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 30 17:55:26 r8-node-01 pacemaker-fenced[151486]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 30 17:55:26 r8-node-01 pacemaker-schedulerd[151489]:  notice: Calculated transition 7, saving inputs in /var/lib/pacemaker/pengine/pe-input-2961.bz2
Mar 30 17:55:26 r8-node-01 pacemaker-controld[151490]:  notice: Transition 7 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2961.bz2): Complete
Mar 30 17:55:26 r8-node-01 pacemaker-controld[151490]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

[root@r8-node-02 ~]# journalctl -n0 -f
-- Logs begin at Mon 2023-03-27 17:05:23 CEST. --
Mar 30 17:55:26 r8-node-02 pacemaker-fenced[139018]:  notice: Added 'fence-scsi' to device list (1 active device)
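The journal excerpt shows that the only transition calculated after the update reported Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, i.e. it contained no resource actions. A sketch for scanning a longer log for transitions that did contain actions follows. It assumes the pacemaker-controld "Transition N (Complete=..., Source=...)" message format shown above, and `transitions_with_actions` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read journal lines on stdin and print every finished
# pacemaker-controld transition whose action counters are not all zero.
transitions_with_actions() {
    awk '
        /pacemaker-controld.*Transition [0-9]+ \(Complete=/ {
            line = $0
            total = 0
            # Sum the Complete/Pending/Fired/Skipped/Incomplete counters.
            while (match(line, /(Complete|Pending|Fired|Skipped|Incomplete)=[0-9]+/)) {
                pair = substr(line, RSTART, RLENGTH)
                sub(/^[A-Za-z]+=/, "", pair)
                total += pair
                line = substr(line, RSTART + RLENGTH)
            }
            if (total > 0) print
        }
    '
}

# Usage on a cluster node (not run here); empty output means no actions ran:
#   journalctl --since "2023-03-30 17:55:00" | transitions_with_actions
```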

Comment 13 errata-xmlrpc 2023-05-16 09:59:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: pcs security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3082