Bug 2180704

Summary: Need a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources [rhel-9.2.0.z]
Product: Red Hat Enterprise Linux 9
Component: pcs
Version: 9.1
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Hardware: Unspecified
OS: Linux
Target Milestone: rc
Keywords: Regression, Triaged, ZStream
Flags: pm-rhel: mirror+
Fixed In Version: pcs-0.11.4-7.el9_2
Last Closed: 2023-05-09 11:35:42 UTC
Reporter: RHEL Program Management Team <pgm-rhel-tools>
Assignee: Miroslav Lisik <mlisik>
QA Contact: cluster-qe <cluster-qe>
CC: cfeist, cluster-maint, idevat, jwboyer, kgaillot, mlisik, mmazoure, mpospisi, nhostako, omular, sbradley, tojeline
Clone Of: 2177996
Bug Depends On: 2177996
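
The fix backports the pcs stonith update-scsi-devices command, which edits the device list of a fence_scsi stonith resource in place, without a restart of the stonith resource or of any other cluster resource. The transcript below exercises the add form; the set and remove keywords are quoted here from the pcs manual and are an assumption beyond what the transcript itself shows:

    pcs stonith update-scsi-devices <stonith-id> set <device-path> [<device-path>...]
    pcs stonith update-scsi-devices <stonith-id> add <device-path> [<device-path>...]
    pcs stonith update-scsi-devices <stonith-id> remove <device-path> [<device-path>...]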

Comment 1 Michal Pospisil 2023-03-31 07:25:13 UTC
Thanks to mlisik for the help with testing the scratch build.

DevTestResults:

[root@r90-node-01 ~]# rpm -q pcs pacemaker
pcs-0.11.4-7.el9.x86_64
pacemaker-2.1.5-7.el9.x86_64

[root@r90-node-01 ~]# export disk1=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2
[root@r90-node-01 ~]# export disk2=/dev/disk/by-id/scsi-36001405c1cf9f31e16e49b6942bf60c7
[root@r90-node-01 ~]# export NODELIST=(r90-node-01 r90-node-02)
[root@r90-node-01 ~]# pcs host auth -u hacluster -p password ${NODELIST[*]}
r90-node-01: Authorized
r90-node-02: Authorized
[root@r90-node-01 ~]# pcs cluster setup HACluster ${NODELIST[*]} --start --wait
No addresses specified for host 'r90-node-01', using 'r90-node-01'
No addresses specified for host 'r90-node-02', using 'r90-node-02'
Destroying cluster on hosts: 'r90-node-01', 'r90-node-02'...
r90-node-01: Successfully destroyed cluster
r90-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r90-node-01', 'r90-node-02'
r90-node-01: successful removal of the file 'pcsd settings'
r90-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r90-node-01', 'r90-node-02'
r90-node-01: successful distribution of the file 'corosync authkey'
r90-node-01: successful distribution of the file 'pacemaker authkey'
r90-node-02: successful distribution of the file 'corosync authkey'
r90-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r90-node-01', 'r90-node-02'
r90-node-01: successful distribution of the file 'corosync.conf'
r90-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'r90-node-01', 'r90-node-02'...
Waiting for node(s) to start: 'r90-node-01', 'r90-node-02'...
r90-node-02: Cluster started
r90-node-01: Cluster started
[root@r90-node-01 ~]# pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list pcmk_host_list="${NODELIST[*]}" pcmk_reboot_action=off meta provides=unfencing
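
A note on the parameters above: pcmk_host_check=static-list together with pcmk_host_list restricts the device to fencing only the listed nodes; pcmk_reboot_action=off maps reboot requests to the agent's off action, since fence_scsi cannot power-cycle a node; and meta provides=unfencing tells Pacemaker that the device performs unfencing, so a node's SCSI registrations must be restored before resources may start on it.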
[root@r90-node-01 ~]# for i in $(seq 1 ${#NODELIST[@]}); do pcs resource create "d$i" ocf:pacemaker:Dummy; done
[root@r90-node-01 ~]# pcs resource
  * d1  (ocf:pacemaker:Dummy):   Started r90-node-02
  * d2  (ocf:pacemaker:Dummy):   Started r90-node-01
[root@r90-node-01 ~]# pcs stonith
  * fence-scsi  (stonith:fence_scsi):	Started r90-node-01
[root@r90-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: fence-scsi-instance_attributes
    devices=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2
    pcmk_host_check=static-list
    pcmk_host_list="r90-node-01 r90-node-02"
    pcmk_reboot_action=off
  Meta Attributes: fence-scsi-meta_attributes
    provides=unfencing
  Operations:
    monitor: fence-scsi-monitor-interval-60s
      interval=60s


[root@r90-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o1.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r90-node-01, call=6, rc=0, last-rc-change='Wed Mar 29 16:59:45 2023', exec=65ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r90-node-01, call=7, rc=0, last-rc-change='Wed Mar 29 16:59:45 2023', exec=65ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r90-node-02, call=5, rc=7, last-rc-change='Wed Mar 29 16:59:45 2023', exec=1ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_0 (node=r90-node-01, call=11, rc=7, last-rc-change='Wed Mar 29 17:00:36 2023', exec=11ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_start_0 (node=r90-node-02, call=10, rc=0, last-rc-change='Wed Mar 29 17:00:36 2023', exec=14ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_10000 (node=r90-node-02, call=11, rc=0, last-rc-change='Wed Mar 29 17:00:36 2023', exec=11ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_start_0 (node=r90-node-01, call=16, rc=0, last-rc-change='Wed Mar 29 17:00:37 2023', exec=13ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_10000 (node=r90-node-01, call=17, rc=0, last-rc-change='Wed Mar 29 17:00:37 2023', exec=11ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_0 (node=r90-node-02, call=15, rc=7, last-rc-change='Wed Mar 29 17:00:36 2023', exec=15ms): complete

[root@r90-node-01 ~]# pcs stonith update-scsi-devices fence-scsi add $disk2
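The update command completes silently; the operation histories are then captured again and compared with the listing taken before the change.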

[root@r90-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o2.txt
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_start_0 (node=r90-node-01, call=6, rc=0, last-rc-change='Wed Mar 29 16:59:45 2023', exec=65ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_60000 (node=r90-node-01, call=7, rc=0, last-rc-change='Wed Mar 29 16:59:45 2023', exec=65ms): complete
fence-scsi  	(stonith:fence_scsi):	Started: fence-scsi_monitor_0 (node=r90-node-02, call=5, rc=7, last-rc-change='Wed Mar 29 16:59:45 2023', exec=1ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_0 (node=r90-node-01, call=11, rc=7, last-rc-change='Wed Mar 29 17:00:36 2023', exec=11ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_start_0 (node=r90-node-02, call=10, rc=0, last-rc-change='Wed Mar 29 17:00:36 2023', exec=14ms): complete
d1  	(ocf:pacemaker:Dummy):   Started: d1_monitor_10000 (node=r90-node-02, call=11, rc=0, last-rc-change='Wed Mar 29 17:00:36 2023', exec=11ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_start_0 (node=r90-node-01, call=16, rc=0, last-rc-change='Wed Mar 29 17:00:37 2023', exec=13ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_10000 (node=r90-node-01, call=17, rc=0, last-rc-change='Wed Mar 29 17:00:37 2023', exec=11ms): complete
d2  	(ocf:pacemaker:Dummy):   Started: d2_monitor_0 (node=r90-node-02, call=15, rc=7, last-rc-change='Wed Mar 29 17:00:36 2023', exec=15ms): complete
[root@r90-node-01 ~]# diff -u o1.txt o2.txt
[root@r90-node-01 ~]# echo $?
0
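
The empty diff (exit status 0) is the key check: every recorded operation, including call numbers and last-rc-change timestamps, is identical before and after the update, so adding the second device triggered no stop, start, or restart of the stonith resource or of the dummy resources.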

[root@r90-node-01 ~]# journalctl -n0 -f
Mar 29 17:03:54 r90-node-01 pacemaker-controld[114031]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 29 17:03:54 r90-node-01 pacemaker-fenced[114027]:  notice: Added 'fence-scsi' to device list (1 active device)
Mar 29 17:03:54 r90-node-01 pacemaker-schedulerd[114030]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-2467.bz2
Mar 29 17:03:54 r90-node-01 pacemaker-controld[114031]:  notice: Transition 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2467.bz2): Complete
Mar 29 17:03:54 r90-node-01 pacemaker-controld[114031]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

[root@r90-node-02 ~]# journalctl -n0 -f
Mar 29 17:03:54 r90-node-02 pacemaker-fenced[67529]:  notice: Added 'fence-scsi' to device list (1 active device)
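
On both nodes the journal shows only a scheduler transition and pacemaker-fenced re-registering fence-scsi with the updated device list; no resource actions are logged, which matches the unchanged operation histories above.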

Comment 13 errata-xmlrpc 2023-05-09 11:35:42 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: pcs security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2652