Bug 1759995
| Summary: | [RFE] Need ability to add/remove storage devices with scsi fencing | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Chris Feist <cfeist> |
| Component: | pcs | Assignee: | Miroslav Lisik <mlisik> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 8.2 | CC: | aperotti, aromito, cfeist, cluster-maint, idevat, mlisik, mmazoure, mpospisi, nhostako, nwahl, oalbrigt, omular, tojeline |
| Target Milestone: | rc | Keywords: | FutureFeature, Reopened, Triaged |
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pcs-0.10.8-4.el8 | Doc Type: | Enhancement |
| Doc Text: | Feature: Provide a way to update SCSI fencing devices in a cluster without needing to restart cluster nodes. Reason: After adding new SCSI fencing devices to a cluster configuration, cluster nodes had to be restarted in order to unfence the newly added devices. Result: The new command 'pcs stonith update-scsi-devices' updates SCSI devices in a running cluster and unfences them on each cluster node. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-09 17:33:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Chris Feist
2019-10-09 15:15:02 UTC
Chris, can you provide more details about what needs to be done in pcs? Based on the original description I cannot remember what the issue is or how you propose to resolve it. Thanks.

Created attachment 1799688 [details]
proposed fix + tests
Added command:
pcs stonith update-scsi-devices
Test:
# pcs stonith update-scsi-devices scsi-fencing set <device-path1> <device-path2>
The SCSI devices should be updated and unfenced, and no resources should be restarted.
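One way to confirm the "no restarts" expectation, mirroring what the verification transcripts below do, is to compare each resource's most recent start operation before and after the update. A minimal sketch, assuming hypothetical resource names r1 and r2 and the disk variables exported in the test below:

# record the last start operation of each resource before the update
for r in r1 r2; do crm_resource --list-all-operations --resource "$r" | grep start; done > /tmp/starts.before

# update the scsi fencing devices
pcs stonith update-scsi-devices scsi-fencing set "$disk1" "$disk2"

# record the start operations again and compare; identical output means no resource was restarted
for r in r1 r2; do crm_resource --list-all-operations --resource "$r" | grep start; done > /tmp/starts.after
diff /tmp/starts.before /tmp/starts.after && echo "no resources restarted"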
Test:
[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.8-3.el8.x86_64
export disk1=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b
export disk2=/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe
export disk3=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4
export disk4=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x23, there are NO registered reservation keys
PR generation=0xf, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# pcs stonith create scsi-fencing fence_scsi devices="$disk1" pcmk_host_check="static-list" pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action="off" meta provides="unfencing"
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x25, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0xf, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# for i in $(seq -w 01 04); do pcs resource create d-$i ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs stonith
* scsi-fencing (stonith:fence_scsi): Started r8-node-01
[root@r8-node-01 ~]# pcs resource
* d-01 (ocf::pacemaker:Dummy): Started r8-node-02
* d-02 (ocf::pacemaker:Dummy): Started r8-node-01
* d-03 (ocf::pacemaker:Dummy): Started r8-node-02
* d-04 (ocf::pacemaker:Dummy): Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
### Updating scsi devices (adding 3 new devices):
[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk1 $disk2 $disk3 $disk4
Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced; keys from each node are present on the devices.
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:22:19 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:22:20 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-163.bz2
Jul 8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: Transition 8 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-163.bz2): Complete
Jul 8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jul 8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul 8 15:22:39 r8-node-01 kernel: sda: sda1
Jul 8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:22:40 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 8 15:22:40 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-164.bz2
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-164.bz2): Complete
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul 8 15:22:39 r8-node-02 kernel: sd 6:0:0:0: Parameters changed
Jul 8 15:22:39 r8-node-02 kernel: sda: sda1
Jul 8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul 8 15:22:40 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x26, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x11, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x2, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x2, 2 registered reservation keys follow:
0x14080000
0x14080001
### Updating scsi devices (removing 2 devices):
[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk3 $disk4
Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced; keys from each node are present on the devices.
NOTE: the command does not undo unfencing of removed devices (their registration keys remain), which matches previous cluster behavior.
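If the stale registrations left on a removed device ever need to be cleaned up, that can be done manually with sg_persist outside of pcs. A minimal sketch, not part of the pcs workflow; the key value is one of the keys shown in the output above and is used here only for illustration:

# clear all registrations (and any reservation) on a device that is no longer
# part of the fence_scsi configuration; --param-rk must be a key currently
# registered on the device (see 'sg_persist -n -i -k -d <device>')
sg_persist -n --out --clear --param-rk=0x14080000 -d "$disk1"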
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:34:48 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 8 15:34:48 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 10, saving inputs in /var/lib/pacemaker/pengine/pe-input-165.bz2
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: Transition 10 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-165.bz2): Complete
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:34:48 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x26, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x11, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x3, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x3, 2 registered reservation keys follow:
0x14080000
0x14080001
### Fence node r8-node-02:
[root@r8-node-01 ~]# pcs stonith fence r8-node-02
Node: r8-node-02 fenced
[root@r8-node-01 ~]# for d in $disk{3,4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x4, 1 registered reservation key follows:
0x14080000
PR generation=0x4, 1 registered reservation key follows:
0x14080000
Created attachment 1804053 [details]
additional fixes + tests
Fixes for 3 special cases.
Modified command:
* pcs stonith update-scsi-devices
Test:
Environment:
A 3-node cluster with configured fence_scsi fencing and resources running on each node.
A)
1. Stop the cluster on one of the cluster nodes.
2. Use the 'pcs stonith update-scsi-devices' command to update the SCSI devices.
3. The command succeeds.
B)
1. Shut down a cluster node or stop pcsd on it.
2. Run 'pcs stonith update-scsi-devices' both without and with the --skip-offline option.
3. The command succeeds when --skip-offline is used.
C)
1. On one cluster node, fail the shared device that is going to be added.
2. Use the 'pcs stonith update-scsi-devices' command to update the SCSI devices.
3. The command fails and the devices are not updated (the three scenarios are condensed in the sketch below).
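A condensed sketch of the three scenarios as shell commands; the node names, disk variables, and the stonith id fence-scsi follow the DevTestResults transcript below and are assumptions used only for illustration:

# A) cluster stopped on one node (pcsd still running): the update succeeds
pcs cluster stop r8-node-03
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2"

# B) a node unreachable (pcsd down): the plain command fails;
#    --skip-offline turns the error into a warning and the update succeeds
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2"
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2" --skip-offline

# C) the device being added is failed on one node: the update is rejected
#    and the configuration is left unchanged (run the echo on the affected node)
echo offline > /sys/block/sda/device/state
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2"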
DevTestResults:

[root@r8-node-01 pcs]# rpm -q pcs
pcs-0.10.8-4.el8.x86_64

A)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Online
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02
Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

B)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Offline
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02
Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused), use --skip-offline to override
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2 --skip-offline
Warning: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused)
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

C)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Online
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02 r8-node-03
Offline:
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

### Failing $disk2 on r8-node-03:
[root@r8-node-03 ~]# echo offline > /sys/block/sda/device/state
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: r8-node-03: Unfencing failed:
2021-07-20 19:40:02,530 ERROR: Cannot get registration keys
2021-07-20 19:40:02,531 ERROR: Please use '-h' for usage
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

AFTER: (first part of verification)
======
[root@virt-015 ~]# rpm -q pcs
pcs-0.10.8-4.el8.x86_64

1. Possibility to update storage devices with working unfencing
----------------------------------------------------------------

## finding usable shared disks
[root@virt-015 ~]# ls -lr /dev/disk/by-id/ | grep -m 3 "sda\|sdb\|sdc"
lrwxrwxrwx. 1 root root 9 Jul 27 09:47 wwn-0x6001405f29b5a9c236b40b594bd7d1d9 -> ../../sdc
lrwxrwxrwx. 1 root root 9 Jul 27 09:47 wwn-0x6001405e0796ba98d9541a284013d803 -> ../../sda
lrwxrwxrwx. 1 root root 9 Jul 27 09:47 wwn-0x60014059ae0b018a58042f999af8dba2 -> ../../sdb

## checking reservation keys on the disks
# sda
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x0, there are NO registered reservation keys
# sdb
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x0, there are NO registered reservation keys
# sdc
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x0, there are NO registered reservation keys
> no reservation keys on the disks

## creating fence_scsi fence agent with one device (sda)
[root@virt-015 ~]# pcs stonith create scsi-fencing fence_scsi devices="/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803" pcmk_host_check="static-list" pcmk_host_list="virt-015 virt-016" pcmk_reboot_action="off" meta provides="unfencing"
[root@virt-015 ~]# echo $?
0
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK

## checking the system log
Jul 27 10:58:25 virt-015 pacemaker-fenced[50435]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 27 10:58:25 virt-015 systemd[1]: Starting Check PMIE instances are running...
Jul 27 10:58:25 virt-015 pacemaker-fenced[50435]: notice: scsi-fencing is eligible to fence (on) virt-015: static-list
Jul 27 10:58:25 virt-015 pacemaker-fenced[50435]: notice: scsi-fencing is eligible to fence (on) virt-015: static-list
Jul 27 10:58:26 virt-015 pacemaker-fenced[50435]: notice: Operation 'on' [62961] (call 23 from pacemaker-controld.50454) targeting virt-015 using scsi-fencing returned 0 (OK)
Jul 27 10:58:26 virt-015 pacemaker-fenced[50435]: notice: Operation 'on' targeting virt-015 by virt-015 for pacemaker-controld.50454@virt-016: OK
Jul 27 10:58:26 virt-015 pacemaker-controld[50439]: notice: virt-015 was successfully unfenced by virt-015 (at the request of virt-016)
Jul 27 10:58:26 virt-015 pacemaker-fenced[50435]: notice: Operation 'on' targeting virt-016 by virt-016 for pacemaker-controld.50454@virt-016: OK
Jul 27 10:58:26 virt-015 pacemaker-controld[50439]: notice: virt-016 was successfully unfenced by virt-016 (at the request of virt-016)
> OK: fence_scsi was successfully added with one device. Nodes were unfenced, as seen in the log.

## checking the reservation keys (unfencing)
# sda
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x2, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
> OK: 2 reservation keys (2 nodes in the cluster) are registered for sda
# sdb and sdc
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x0, there are NO registered reservation keys
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x0, there are NO registered reservation keys
> OK: the other two disks have no registered reservation keys

## checking help for updating the storage devices (new command update-scsi-devices)
[root@virt-015 ~]# pcs stonith update-scsi-devices
Usage: pcs stonith update-scsi-devices...
    update-scsi-devices <stonith id> set <device-path> [<device-path>...]
        Update scsi fencing devices without affecting other resources. Stonith
        resource must be running on one cluster node. Each device will be
        unfenced on each cluster node running cluster. Supported fence agents:
        fence_scsi.
> OK: Usage is present, man page is updated

## updating the storage devices (adding disks by setting all of sda, sdb and sdc for the scsi-fencing agent)
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803 /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
[root@virt-015 ~]# echo $?
0
> OK
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2,/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803,/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK: Update of devices in a running cluster works. All three storage devices are now set for the scsi-fencing agent.

## checking the reservation keys (unfencing)
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x2, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x2, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
> OK: The keys are registered for all devices

## updating the storage devices (removing disks by setting only sdb and sdc for the scsi-fencing agent)
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
[root@virt-015 ~]# echo $?
0
> OK
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2,/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK: Devices were reduced to sdb and sdc

## checking the reservation keys (unfencing)
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
> The keys remain on the removed disk, which is not an issue, as the fencing is no longer configured with that disk

## omitting devices in the update command
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set
Usage: pcs stonith update-scsi-devices...
    update-scsi-devices <stonith id> set <device-path> [<device-path>...]
        Update scsi fencing devices without affecting other resources. Stonith
        resource must be running on one cluster node. Each device will be
        unfenced on each cluster node running cluster. Supported fence agents:
        fence_scsi.
Hint: You must specify set devices to be updated
> OK: the command hints to specify devices

## omitting the stonith id
[root@virt-015 ~]# pcs stonith update-scsi-devices set /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
Usage: pcs stonith update-scsi-devices...
    update-scsi-devices <stonith id> set <device-path> [<device-path>...]
        Update scsi fencing devices without affecting other resources. Stonith
        resource must be running on one cluster node. Each device will be
        unfenced on each cluster node running cluster. Supported fence agents:
        fence_scsi.
> OK

## invalid device id
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/invalidId
Error: virt-016: Unfencing failed:
2021-07-27 13:29:39,937 ERROR: Failed: device "/dev/disk/by-id/invalidId" does not exist
2021-07-27 13:29:39,938 ERROR: Please use '-h' for usage
Error: virt-015: Unfencing failed:
2021-07-27 13:29:39,941 ERROR: Failed: device "/dev/disk/by-id/invalidId" does not exist
2021-07-27 13:29:39,942 ERROR: Please use '-h' for usage
Error: Unable to perform operation on any available node/host, therefore it is not possible to continue
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-015 ~]# echo $?
1
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2,/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK: an invalid device id is recognized by the unfencing action and is not set

## invalid stonith id
[root@virt-015 ~]# pcs stonith update-scsi-devices invalid_scsi set /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
Error: resource 'invalid_scsi' does not exist
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-015 ~]# echo $?
1
> OK

2. Resources are not restarted when devices are updated
--------------------------------------------------------

## Creating resources
[root@virt-015 ~]# pcs resource create r1 ocf:heartbeat:Dummy
[root@virt-015 ~]# pcs resource create r2 ocf:pacemaker:Stateful promotable
[root@virt-015 ~]# pcs resource create r3 ocf:pacemaker:Dummy clone
[root@virt-015 ~]# pcs resource create r4 ocf:pacemaker:Dummy --group g1
[root@virt-015 ~]# pcs constraint colocation add r1 with r4
[root@virt-015 ~]# pcs resource
  * Clone Set: locking-clone [locking]:
    * Started: [ virt-015 virt-016 ]
  * r1 (ocf::heartbeat:Dummy): Started virt-016
  * Clone Set: r2-clone [r2] (promotable):
    * Masters: [ virt-016 ]
    * Slaves: [ virt-015 ]
  * Clone Set: r3-clone [r3]:
    * Started: [ virt-015 virt-016 ]
  * Resource Group: g1:
    * r4 (ocf::pacemaker:Dummy): Started virt-016

## getting the time of the most recent start operation for each resource
[root@virt-015 ~]# crm_resource --list-all-operations --resource r1 | grep start
r1 (ocf::heartbeat:Dummy): Started: r1_start_0 (node=virt-016, call=85, rc=0, last-rc-change=Tue Jul 27 14:03:38 2021, exec=9ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r2 | grep start
r2 (ocf::pacemaker:Stateful): Master: r2_start_0 (node=virt-015, call=126, rc=0, last-rc-change=Tue Jul 27 14:05:08 2021, exec=39ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r3 | grep start
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-016, call=122, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=13ms): complete
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-015, call=135, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=15ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r4 | grep start
r4 (ocf::pacemaker:Dummy): Started: r4_start_0 (node=virt-016, call=130, rc=0, last-rc-change=Tue Jul 27 14:06:32 2021, exec=12ms): complete

## updating the fence_scsi
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
[root@virt-015 ~]# echo $?
0
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)

## getting the time of the most recent start operation for each resource again, to find out if some resource has restarted
[root@virt-015 ~]# crm_resource --list-all-operations --resource r1 | grep start
r1 (ocf::heartbeat:Dummy): Started: r1_start_0 (node=virt-016, call=85, rc=0, last-rc-change=Tue Jul 27 14:03:38 2021, exec=9ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r2 | grep start
r2 (ocf::pacemaker:Stateful): Master: r2_start_0 (node=virt-015, call=126, rc=0, last-rc-change=Tue Jul 27 14:05:08 2021, exec=39ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r3 | grep start
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-016, call=122, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=13ms): complete
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-015, call=135, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=15ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r4 | grep start
r4 (ocf::pacemaker:Dummy): Started: r4_start_0 (node=virt-016, call=130, rc=0, last-rc-change=Tue Jul 27 14:06:32 2021, exec=12ms): complete
> OK: the time of the most recent start operation for each resource stayed exactly the same as before the update, thus the resources didn't restart.

3. Functionality and fencing
-----------------------------

## configuring a clustered lvm volume with a GFS2 filesystem (on sda), to test a node's ability to write to the disk
# lvm
[root@virt-015 ~]# pvcreate /dev/sda
Physical volume "/dev/sda" successfully created.
[root@virt-015 ~]# vgcreate myvg /dev/sda
Volume group "myvg" successfully created
[root@virt-015 ~]# lvcreate -n lv01 -L 500M myvg
Logical volume "lv01" created.
[root@virt-015 ~]# lvs | grep lv01
lv01 myvg -wi-a----- 500.00m
# filesystem
[root@virt-015 ~]# pcs resource config locking-clone
Clone: locking-clone
Meta Attrs: interleave=true
Group: locking
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: monitor interval=30s (dlm-monitor-interval-30s)
            start interval=0s timeout=90s (dlm-start-interval-0s)
            stop interval=0s timeout=100s (dlm-stop-interval-0s)
Resource: lvmlockd (class=ocf provider=heartbeat type=lvmlockd)
Operations: monitor interval=30s (lvmlockd-monitor-interval-30s)
            start interval=0s timeout=90s (lvmlockd-start-interval-0s)
            stop interval=0s timeout=90s (lvmlockd-stop-interval-0s)
[root@virt-015 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t STSRHTS12223:samba /dev/myvg/lv01
It appears to contain an existing filesystem (ext4)
/dev/myvg/lv01 is a symbolic link to /dev/dm-2
This will destroy any data on /dev/dm-2
Are you sure you want to proceed? [y/n] y
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done
Creating quota file: Done
Writing superblock and syncing: Done
Device: /dev/myvg/lv01
Block size: 4096
Device size: 0.49 GB (128000 blocks)
Filesystem size: 0.49 GB (127997 blocks)
Journals: 2
Journal size: 8MB
Resource groups: 4
Locking protocol: "lock_dlm"
Lock table: "STSRHTS12223:samba"
UUID: 0f66a376-28d5-4158-93f7-fb035cd482ff
[root@virt-015 ~]# pcs resource create fs ocf:heartbeat:Filesystem device="/dev/myvg/lv01" directory="/mnt" fstype="gfs2" clone
[root@virt-015 mnt]# pcs constraint order start locking-clone then fs-clone
[root@virt-015 ~]# pcs resource config fs-clone
Clone: fs-clone
Resource: fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/myvg/lv01 directory=/mnt fstype=gfs2
Operations: monitor interval=20s timeout=40s (fs-monitor-interval-20s)
            start interval=0s timeout=60s (fs-start-interval-0s)
            stop interval=0s timeout=60s (fs-stop-interval-0s)
# on the first node
[root@virt-015 ~]# mount | grep /mnt
/dev/mapper/myvg-lv01 on /mnt type gfs2 (rw,relatime,seclabel)
# on the second node
[root@virt-016 mnt]# mount | grep /mnt
/dev/mapper/myvg-lv01 on /mnt type gfs2 (rw,relatime,seclabel)

## updating scsi fencing to use the sda storage device
[root@virt-015 mnt]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
[root@virt-015 mnt]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-015 mnt]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x4, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
# on the first node
[root@virt-015 mnt]# touch test1
# on the second node
[root@virt-016 mnt]# touch test2
[root@virt-015 mnt]# ls
test1 test2
> OK: scsi fencing is ready, both nodes are unfenced and able to write to the disk

## Fencing one of the nodes and checking its ability to write to the device
[root@virt-016 mnt]# pcs stonith fence virt-016
Node: virt-016 fenced
# on the first node
[root@virt-015 mnt]# touch test3
[root@virt-015 mnt]# ls
test1 test2 test3
> OK: The healthy node still has the ability to write to and read from the disk
# on the second (fenced) node
[root@virt-016 mnt]# touch test4
> OK: This action freezes and nothing is written to the disk; the same happens when trying to read from the disk.

## Trying to mount the disk manually, as the cluster on the fenced node is stopped, same as the fs-clone resource
[root@virt-016 myvg]# mount /dev/myvg/lv01 /mnt
> OK: This action freezes as well. The fenced node lost the ability to write to/read from the updated shared storage (sda)

## checking the reservation keys on the disk
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x5, 1 registered reservation key follows:
0xb8fb0000
> OK: One key was removed and only the first node's key is present

## after rebooting the fenced node, checking that it's unfenced again
[root@virt-015 /]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x10, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
[root@virt-016 mnt]# ls
test1 test2 test3
[root@virt-016 mnt]# touch test4
[root@virt-016 mnt]# ls
test1 test2 test3 test4
> OK

## updating the scsi fencing device to use the sdb storage device
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x6, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000

## checking a node's ability to write to the disk
# first node
[root@virt-015 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00321169 s, 326 MB/s
[root@virt-015 ~]# echo $?
0
# second node
[root@virt-016 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00257591 s, 407 MB/s
[root@virt-016 ~]# echo $?
0
> Both nodes can write directly to the disk

## fencing one node
[root@virt-016 ~]# pcs stonith fence virt-016
Node: virt-016 fenced
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x7, 1 registered reservation key follows:
0xb8fb0000
> OK: The fenced node's key is missing from the updated device

## checking a node's ability to write to the disk again
# first node
[root@virt-015 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0024456 s, 429 MB/s
[root@virt-015 ~]# echo $?
0
# second (fenced) node
[root@virt-016 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
dd: error writing '/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2': Invalid exchange
1+0 records in
0+0 records out
0 bytes copied, 0.00658947 s, 0.0 kB/s
[root@virt-016 ~]# echo $?
1
> OK: The fenced node can't write to the disk that was updated for the fence_scsi agent

## checking the other disks that are not configured with fence_scsi
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x10, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
> OK: The keys on the disks that are not configured with fence_scsi are untouched
# first node
[root@virt-015 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00333029 s, 315 MB/s
[root@virt-015 ~]# echo $?
0
# second (fenced) node
[root@virt-016 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00282057 s, 372 MB/s
[root@virt-016 ~]# echo $?
0
> OK: The fenced node is still able to write to the disk that is not in the fence_scsi configuration.

## Keys are back for the configured device after reboot
[root@virt-016 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x8, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
> OK

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: pcs security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4142