Bug 1759995
| Summary: | [RFE] Need ability to add/remove storage devices with scsi fencing | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Chris Feist <cfeist> |
| Component: | pcs | Assignee: | Miroslav Lisik <mlisik> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 8.2 | CC: | aperotti, aromito, cfeist, cluster-maint, idevat, mlisik, mmazoure, mpospisi, nhostako, nwahl, oalbrigt, omular, tojeline |
| Target Milestone: | rc | Keywords: | FutureFeature, Reopened, Triaged |
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pcs-0.10.8-4.el8 | Doc Type: | Enhancement |
| Doc Text: | Feature: Provide a way to update SCSI fencing devices in a cluster without needing to restart cluster nodes. Reason: After adding new SCSI fencing devices to a cluster configuration, cluster nodes had to be restarted in order to unfence the newly added devices. Result: The new command 'pcs stonith update-scsi-devices' updates SCSI devices in a running cluster and unfences them on each cluster node. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-09 17:33:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Chris Feist
2019-10-09 15:15:02 UTC
Chris, can you provide more details about what needs to be done in pcs? Based on the original description I cannot remember what the issue is or how you propose to resolve it. Thanks.

Created attachment 1799688 [details]
proposed fix + tests
Added command:
pcs stonith update-scsi-devices
Test:
# pcs stonith update-scsi-devices scsi-fencing set <device-path1> <device-path2>
The SCSI devices should be updated and unfenced, and no resources should be restarted.
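One way to confirm the "no restarts" expectation, mirroring what the verification transcripts below do, is to compare each resource's most recent start operation before and after the update. A minimal sketch, assuming hypothetical resource names r1 and r2 and the disk variables exported in the test below:

# record the last start operation of each resource before the update
for r in r1 r2; do crm_resource --list-all-operations --resource "$r" | grep start; done > /tmp/starts.before

# update the scsi fencing devices
pcs stonith update-scsi-devices scsi-fencing set "$disk1" "$disk2"

# record the start operations again and compare; identical output means no resource was restarted
for r in r1 r2; do crm_resource --list-all-operations --resource "$r" | grep start; done > /tmp/starts.after
diff /tmp/starts.before /tmp/starts.after && echo "no resources restarted"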
Test:
[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.8-3.el8.x86_64
export disk1=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b
export disk2=/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe
export disk3=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4
export disk4=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x23, there are NO registered reservation keys
PR generation=0xf, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# pcs stonith create scsi-fencing fence_scsi devices="$disk1" pcmk_host_check="static-list" pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action="off" meta provides="unfencing"
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x25, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0xf, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# for i in $(seq -w 01 04); do pcs resource create d-$i ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs stonith
* scsi-fencing (stonith:fence_scsi): Started r8-node-01
[root@r8-node-01 ~]# pcs resource
* d-01 (ocf::pacemaker:Dummy): Started r8-node-02
* d-02 (ocf::pacemaker:Dummy): Started r8-node-01
* d-03 (ocf::pacemaker:Dummy): Started r8-node-02
* d-04 (ocf::pacemaker:Dummy): Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
### Updating scsi devices (adding 3 new devices):
[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk1 $disk2 $disk3 $disk4
Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced; keys from each node are present on the devices.
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:22:19 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:22:20 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-163.bz2
Jul 8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: Transition 8 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-163.bz2): Complete
Jul 8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jul 8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul 8 15:22:39 r8-node-01 kernel: sda: sda1
Jul 8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:22:40 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 8 15:22:40 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-164.bz2
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-164.bz2): Complete
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul 8 15:22:39 r8-node-02 kernel: sd 6:0:0:0: Parameters changed
Jul 8 15:22:39 r8-node-02 kernel: sda: sda1
Jul 8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul 8 15:22:40 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x26, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x11, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x2, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x2, 2 registered reservation keys follow:
0x14080000
0x14080001
### Updating scsi devices (removing 2 devices):
[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk3 $disk4
Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced; keys from each node are present on the devices.
NOTE: the command does not undo unfencing of removed devices (their registration keys remain), which matches previous cluster behavior.
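If the stale registrations left on a removed device ever need to be cleaned up, that can be done manually with sg_persist outside of pcs. A minimal sketch, not part of the pcs workflow; the key value is one of the keys shown in the output above and is used here only for illustration:

# clear all registrations (and any reservation) on a device that is no longer
# part of the fence_scsi configuration; --param-rk must be a key currently
# registered on the device (see 'sg_persist -n -i -k -d <device>')
sg_persist -n --out --clear --param-rk=0x14080000 -d "$disk1"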
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:34:48 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 8 15:34:48 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 10, saving inputs in /var/lib/pacemaker/pengine/pe-input-165.bz2
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: Transition 10 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-165.bz2): Complete
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:34:48 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x26, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x11, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x3, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x3, 2 registered reservation keys follow:
0x14080000
0x14080001
### Fence node r8-node-02:
[root@r8-node-01 ~]# pcs stonith fence r8-node-02
Node: r8-node-02 fenced
[root@r8-node-01 ~]# for d in $disk{3,4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x4, 1 registered reservation key follows:
0x14080000
PR generation=0x4, 1 registered reservation key follows:
0x14080000
Created attachment 1804053 [details]
additional fixes + tests
Fixes for 3 special cases.
Modified command:
* pcs stonith update-scsi-devices
Test:
Environment:
A 3-node cluster with configured fence_scsi fencing and resources running on each node.
A)
1. Stop the cluster on one of the cluster nodes.
2. Use the 'pcs stonith update-scsi-devices' command to update the SCSI devices.
3. The command succeeds.
B)
1. Shut down a cluster node or stop pcsd on it.
2. Run 'pcs stonith update-scsi-devices' both without and with the --skip-offline option.
3. The command succeeds when --skip-offline is used.
C)
1. On one cluster node, fail the shared device that is going to be added.
2. Use the 'pcs stonith update-scsi-devices' command to update the SCSI devices.
3. The command fails and the devices are not updated (the three scenarios are condensed in the sketch below).
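A condensed sketch of the three scenarios as shell commands; the node names, disk variables, and the stonith id fence-scsi follow the DevTestResults transcript below and are assumptions used only for illustration:

# A) cluster stopped on one node (pcsd still running): the update succeeds
pcs cluster stop r8-node-03
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2"

# B) a node unreachable (pcsd down): the plain command fails;
#    --skip-offline turns the error into a warning and the update succeeds
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2"
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2" --skip-offline

# C) the device being added is failed on one node: the update is rejected
#    and the configuration is left unchanged (run the echo on the affected node)
echo offline > /sys/block/sda/device/state
pcs stonith update-scsi-devices fence-scsi set "$disk1" "$disk2"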
DevTestResults:

[root@r8-node-01 pcs]# rpm -q pcs
pcs-0.10.8-4.el8.x86_64

A)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Online
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02
Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

B)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Offline
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02
Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused), use --skip-offline to override
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2 --skip-offline
Warning: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused)
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

C)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Online
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02 r8-node-03
Offline:
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

### Failing $disk2 on r8-node-03:
[root@r8-node-03 ~]# echo offline > /sys/block/sda/device/state
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: r8-node-03: Unfencing failed:
2021-07-20 19:40:02,530 ERROR: Cannot get registration keys
2021-07-20 19:40:02,531 ERROR: Please use '-h' for usage
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

AFTER: (first part of verification)
======
[root@virt-015 ~]# rpm -q pcs
pcs-0.10.8-4.el8.x86_64

1. Possibility to update storage devices with working unfencing
----------------------------------------------------------------

## finding usable shared disks
[root@virt-015 ~]# ls -lr /dev/disk/by-id/ | grep -m 3 "sda\|sdb\|sdc"
lrwxrwxrwx. 1 root root 9 Jul 27 09:47 wwn-0x6001405f29b5a9c236b40b594bd7d1d9 -> ../../sdc
lrwxrwxrwx. 1 root root 9 Jul 27 09:47 wwn-0x6001405e0796ba98d9541a284013d803 -> ../../sda
lrwxrwxrwx. 1 root root 9 Jul 27 09:47 wwn-0x60014059ae0b018a58042f999af8dba2 -> ../../sdb

## checking reservation keys on the disks
# sda
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x0, there are NO registered reservation keys
# sdb
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x0, there are NO registered reservation keys
# sdc
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x0, there are NO registered reservation keys
> no reservation keys on the disks

## creating fence_scsi fence agent with one device (sda)
[root@virt-015 ~]# pcs stonith create scsi-fencing fence_scsi devices="/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803" pcmk_host_check="static-list" pcmk_host_list="virt-015 virt-016" pcmk_reboot_action="off" meta provides="unfencing"
[root@virt-015 ~]# echo $?
0
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK

## checking the system log
Jul 27 10:58:25 virt-015 pacemaker-fenced[50435]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 27 10:58:25 virt-015 systemd[1]: Starting Check PMIE instances are running...
Jul 27 10:58:25 virt-015 pacemaker-fenced[50435]: notice: scsi-fencing is eligible to fence (on) virt-015: static-list
Jul 27 10:58:25 virt-015 pacemaker-fenced[50435]: notice: scsi-fencing is eligible to fence (on) virt-015: static-list
Jul 27 10:58:26 virt-015 pacemaker-fenced[50435]: notice: Operation 'on' [62961] (call 23 from pacemaker-controld.50454) targeting virt-015 using scsi-fencing returned 0 (OK)
Jul 27 10:58:26 virt-015 pacemaker-fenced[50435]: notice: Operation 'on' targeting virt-015 by virt-015 for pacemaker-controld.50454@virt-016: OK
Jul 27 10:58:26 virt-015 pacemaker-controld[50439]: notice: virt-015 was successfully unfenced by virt-015 (at the request of virt-016)
Jul 27 10:58:26 virt-015 pacemaker-fenced[50435]: notice: Operation 'on' targeting virt-016 by virt-016 for pacemaker-controld.50454@virt-016: OK
Jul 27 10:58:26 virt-015 pacemaker-controld[50439]: notice: virt-016 was successfully unfenced by virt-016 (at the request of virt-016)
> OK: fence_scsi was successfully added with one device. Nodes were unfenced, as seen in the log.

## checking the reservation keys (unfencing)
# sda
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x2, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
> OK: 2 reservation keys (2 nodes in the cluster) are registered for sda
# sdb and sdc
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x0, there are NO registered reservation keys
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x0, there are NO registered reservation keys
> OK: the other two disks have no registered reservation keys

## checking help for updating the storage devices (new command update-scsi-devices)
[root@virt-015 ~]# pcs stonith update-scsi-devices
Usage: pcs stonith update-scsi-devices...
    update-scsi-devices <stonith id> set <device-path> [<device-path>...]
        Update scsi fencing devices without affecting other resources. Stonith
        resource must be running on one cluster node. Each device will be
        unfenced on each cluster node running cluster. Supported fence agents:
        fence_scsi.
> OK: Usage is present, man page is updated

## updating the storage devices (adding disks by setting all of sda, sdb and sdc for the scsi-fencing agent)
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803 /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
[root@virt-015 ~]# echo $?
0
> OK
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2,/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803,/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK: Update of devices in a running cluster works. All three storage devices are now set for the scsi-fencing agent.

## checking the reservation keys (unfencing)
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x2, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x2, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
> OK: The keys are registered for all devices

## updating the storage devices (removing disks by setting only sdb and sdc for the scsi-fencing agent)
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
[root@virt-015 ~]# echo $?
0
> OK
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2,/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK: Devices were reduced to sdb and sdc

## checking the reservation keys (unfencing)
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
> The keys remain on the removed disk, which is not an issue, as the fencing is no longer configured with that disk

## omitting devices in the update command
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set
Usage: pcs stonith update-scsi-devices...
    update-scsi-devices <stonith id> set <device-path> [<device-path>...]
        Update scsi fencing devices without affecting other resources. Stonith
        resource must be running on one cluster node. Each device will be
        unfenced on each cluster node running cluster. Supported fence agents:
        fence_scsi.
Hint: You must specify set devices to be updated
> OK: the command hints to specify devices

## omitting the stonith id
[root@virt-015 ~]# pcs stonith update-scsi-devices set /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
Usage: pcs stonith update-scsi-devices...
    update-scsi-devices <stonith id> set <device-path> [<device-path>...]
        Update scsi fencing devices without affecting other resources. Stonith
        resource must be running on one cluster node. Each device will be
        unfenced on each cluster node running cluster. Supported fence agents:
        fence_scsi.
> OK

## invalid device id
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/invalidId
Error: virt-016: Unfencing failed:
2021-07-27 13:29:39,937 ERROR: Failed: device "/dev/disk/by-id/invalidId" does not exist
2021-07-27 13:29:39,938 ERROR: Please use '-h' for usage
Error: virt-015: Unfencing failed:
2021-07-27 13:29:39,941 ERROR: Failed: device "/dev/disk/by-id/invalidId" does not exist
2021-07-27 13:29:39,942 ERROR: Please use '-h' for usage
Error: Unable to perform operation on any available node/host, therefore it is not possible to continue
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-015 ~]# echo $?
1
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2,/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
> OK: an invalid device id is recognized by the unfencing action and is not set

## invalid stonith id
[root@virt-015 ~]# pcs stonith update-scsi-devices invalid_scsi set /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
Error: resource 'invalid_scsi' does not exist
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-015 ~]# echo $?
1
> OK

2. Resources are not restarted when devices are updated
--------------------------------------------------------

## Creating resources
[root@virt-015 ~]# pcs resource create r1 ocf:heartbeat:Dummy
[root@virt-015 ~]# pcs resource create r2 ocf:pacemaker:Stateful promotable
[root@virt-015 ~]# pcs resource create r3 ocf:pacemaker:Dummy clone
[root@virt-015 ~]# pcs resource create r4 ocf:pacemaker:Dummy --group g1
[root@virt-015 ~]# pcs constraint colocation add r1 with r4
[root@virt-015 ~]# pcs resource
  * Clone Set: locking-clone [locking]:
    * Started: [ virt-015 virt-016 ]
  * r1 (ocf::heartbeat:Dummy): Started virt-016
  * Clone Set: r2-clone [r2] (promotable):
    * Masters: [ virt-016 ]
    * Slaves: [ virt-015 ]
  * Clone Set: r3-clone [r3]:
    * Started: [ virt-015 virt-016 ]
  * Resource Group: g1:
    * r4 (ocf::pacemaker:Dummy): Started virt-016

## getting the time of the most recent start operation for each resource
[root@virt-015 ~]# crm_resource --list-all-operations --resource r1 | grep start
r1 (ocf::heartbeat:Dummy): Started: r1_start_0 (node=virt-016, call=85, rc=0, last-rc-change=Tue Jul 27 14:03:38 2021, exec=9ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r2 | grep start
r2 (ocf::pacemaker:Stateful): Master: r2_start_0 (node=virt-015, call=126, rc=0, last-rc-change=Tue Jul 27 14:05:08 2021, exec=39ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r3 | grep start
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-016, call=122, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=13ms): complete
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-015, call=135, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=15ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r4 | grep start
r4 (ocf::pacemaker:Dummy): Started: r4_start_0 (node=virt-016, call=130, rc=0, last-rc-change=Tue Jul 27 14:06:32 2021, exec=12ms): complete

## updating the fence_scsi
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
[root@virt-015 ~]# echo $?
0
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)

## getting the time of the most recent start operation for each resource again, to find out if some resource has restarted
[root@virt-015 ~]# crm_resource --list-all-operations --resource r1 | grep start
r1 (ocf::heartbeat:Dummy): Started: r1_start_0 (node=virt-016, call=85, rc=0, last-rc-change=Tue Jul 27 14:03:38 2021, exec=9ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r2 | grep start
r2 (ocf::pacemaker:Stateful): Master: r2_start_0 (node=virt-015, call=126, rc=0, last-rc-change=Tue Jul 27 14:05:08 2021, exec=39ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r3 | grep start
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-016, call=122, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=13ms): complete
r3 (ocf::pacemaker:Dummy): Started: r3_start_0 (node=virt-015, call=135, rc=0, last-rc-change=Tue Jul 27 14:05:20 2021, exec=15ms): complete
[root@virt-015 ~]# crm_resource --list-all-operations --resource r4 | grep start
r4 (ocf::pacemaker:Dummy): Started: r4_start_0 (node=virt-016, call=130, rc=0, last-rc-change=Tue Jul 27 14:06:32 2021, exec=12ms): complete
> OK: the time of the most recent start operation for each resource stayed exactly the same as before the update, thus the resources didn't restart.

3. Functionality and fencing
-----------------------------

## configuring a clustered lvm volume with a GFS2 filesystem (on sda), to test a node's ability to write to the disk
# lvm
[root@virt-015 ~]# pvcreate /dev/sda
Physical volume "/dev/sda" successfully created.
[root@virt-015 ~]# vgcreate myvg /dev/sda
Volume group "myvg" successfully created
[root@virt-015 ~]# lvcreate -n lv01 -L 500M myvg
Logical volume "lv01" created.
[root@virt-015 ~]# lvs | grep lv01
lv01 myvg -wi-a----- 500.00m
# filesystem
[root@virt-015 ~]# pcs resource config locking-clone
Clone: locking-clone
Meta Attrs: interleave=true
Group: locking
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: monitor interval=30s (dlm-monitor-interval-30s)
            start interval=0s timeout=90s (dlm-start-interval-0s)
            stop interval=0s timeout=100s (dlm-stop-interval-0s)
Resource: lvmlockd (class=ocf provider=heartbeat type=lvmlockd)
Operations: monitor interval=30s (lvmlockd-monitor-interval-30s)
            start interval=0s timeout=90s (lvmlockd-start-interval-0s)
            stop interval=0s timeout=90s (lvmlockd-stop-interval-0s)
[root@virt-015 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t STSRHTS12223:samba /dev/myvg/lv01
It appears to contain an existing filesystem (ext4)
/dev/myvg/lv01 is a symbolic link to /dev/dm-2
This will destroy any data on /dev/dm-2
Are you sure you want to proceed? [y/n] y
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done
Creating quota file: Done
Writing superblock and syncing: Done
Device: /dev/myvg/lv01
Block size: 4096
Device size: 0.49 GB (128000 blocks)
Filesystem size: 0.49 GB (127997 blocks)
Journals: 2
Journal size: 8MB
Resource groups: 4
Locking protocol: "lock_dlm"
Lock table: "STSRHTS12223:samba"
UUID: 0f66a376-28d5-4158-93f7-fb035cd482ff
[root@virt-015 ~]# pcs resource create fs ocf:heartbeat:Filesystem device="/dev/myvg/lv01" directory="/mnt" fstype="gfs2" clone
[root@virt-015 mnt]# pcs constraint order start locking-clone then fs-clone
[root@virt-015 ~]# pcs resource config fs-clone
Clone: fs-clone
Resource: fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/myvg/lv01 directory=/mnt fstype=gfs2
Operations: monitor interval=20s timeout=40s (fs-monitor-interval-20s)
            start interval=0s timeout=60s (fs-start-interval-0s)
            stop interval=0s timeout=60s (fs-stop-interval-0s)
# on the first node
[root@virt-015 ~]# mount | grep /mnt
/dev/mapper/myvg-lv01 on /mnt type gfs2 (rw,relatime,seclabel)
# on the second node
[root@virt-016 mnt]# mount | grep /mnt
/dev/mapper/myvg-lv01 on /mnt type gfs2 (rw,relatime,seclabel)

## updating scsi fencing to use the sda storage device
[root@virt-015 mnt]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
[root@virt-015 mnt]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-015 mnt]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x4, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
# on the first node
[root@virt-015 mnt]# touch test1
# on the second node
[root@virt-016 mnt]# touch test2
[root@virt-015 mnt]# ls
test1 test2
> OK: scsi fencing is ready, both nodes are unfenced and able to write to the disk

## Fencing one of the nodes and checking its ability to write to the device
[root@virt-016 mnt]# pcs stonith fence virt-016
Node: virt-016 fenced
# on the first node
[root@virt-015 mnt]# touch test3
[root@virt-015 mnt]# ls
test1 test2 test3
> OK: The healthy node still has the ability to write to and read from the disk
# on the second (fenced) node
[root@virt-016 mnt]# touch test4
> OK: This action freezes and nothing is written to the disk; the same happens when trying to read from the disk.

## Trying to mount the disk manually, as the cluster on the fenced node is stopped, same as the fs-clone resource
[root@virt-016 myvg]# mount /dev/myvg/lv01 /mnt
> OK: This action freezes as well. The fenced node lost the ability to write to/read from the updated shared storage (sda)

## checking the reservation keys on the disk
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x5, 1 registered reservation key follows:
0xb8fb0000
> OK: One key was removed and only the first node's key is present

## after rebooting the fenced node, checking that it's unfenced again
[root@virt-015 /]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x10, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
[root@virt-016 mnt]# ls
test1 test2 test3
[root@virt-016 mnt]# touch test4
[root@virt-016 mnt]# ls
test1 test2 test3 test4
> OK

## updating the scsi fencing device to use the sdb storage device
[root@virt-015 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
[root@virt-015 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 pcmk_host_check=static-list pcmk_host_list="virt-015 virt-016" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x6, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000

## checking a node's ability to write to the disk
# first node
[root@virt-015 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00321169 s, 326 MB/s
[root@virt-015 ~]# echo $?
0
# second node
[root@virt-016 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00257591 s, 407 MB/s
[root@virt-016 ~]# echo $?
0
> Both nodes can write directly to the disk

## fencing one node
[root@virt-016 ~]# pcs stonith fence virt-016
Node: virt-016 fenced
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x7, 1 registered reservation key follows:
0xb8fb0000
> OK: The fenced node's key is missing from the updated device

## checking a node's ability to write to the disk again
# first node
[root@virt-015 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0024456 s, 429 MB/s
[root@virt-015 ~]# echo $?
0
# second (fenced) node
[root@virt-016 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2 bs=1M count=1 oflag=direct
dd: error writing '/dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2': Invalid exchange
1+0 records in
0+0 records out
0 bytes copied, 0.00658947 s, 0.0 kB/s
[root@virt-016 ~]# echo $?
1
> OK: The fenced node can't write to the disk that was updated for the fence_scsi agent

## checking the other disks that are not configured with fence_scsi
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9
PR generation=0x3, 2 registered reservation keys follow:
0xb8fb0001
0xb8fb0000
[root@virt-015 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e0796ba98d9541a284013d803
PR generation=0x10, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
> OK: The keys on the disks that are not configured with fence_scsi are untouched
# first node
[root@virt-015 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00333029 s, 315 MB/s
[root@virt-015 ~]# echo $?
0
# second (fenced) node
[root@virt-016 ~]# dd if=/dev/zero of=/dev/disk/by-id/wwn-0x6001405f29b5a9c236b40b594bd7d1d9 bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00282057 s, 372 MB/s
[root@virt-016 ~]# echo $?
0
> OK: The fenced node is still able to write to the disk that is not in the fence_scsi configuration.

## Keys are back for the configured device after reboot
[root@virt-016 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014059ae0b018a58042f999af8dba2
PR generation=0x8, 2 registered reservation keys follow:
0xb8fb0000
0xb8fb0001
> OK

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: pcs security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4142