Bug 1872378
| Summary: | [RFE] Provide a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Chris Feist <cfeist> | ||||||
| Component: | pcs | Assignee: | Miroslav Lisik <mlisik> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | ||||||
| Severity: | urgent | Docs Contact: | Steven J. Levine <slevine> | ||||||
| Priority: | urgent | ||||||||
| Version: | 8.3 | CC: | cfeist, cluster-maint, cluster-qe, idevat, kgaillot, mlisik, mmazoure, mpospisi, nhostako, omular, sbradley, slevine, tojeline | ||||||
| Target Milestone: | rc | Keywords: | Triaged, ZStream | ||||||
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ | ||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | pcs-0.10.8-4.el8 | Doc Type: | Enhancement | ||||||
| Doc Text: |
.New `pcs` command to update SCSI fencing device without causing restart of all other resources
Updating a SCSI fencing device with the `pcs stonith update` command causes a restart of all resources running on the same node where the stonith resource was running. The new `pcs stonith update-scsi-devices` command allows you to update SCSI devices without causing a restart of other cluster resources.
|
Story Points: | --- | ||||||
| Clone Of: | 1872376 | ||||||||
| : | 2023845 2035332 (view as bug list) | Environment: | |||||||
| Last Closed: | 2021-11-09 17:33:51 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 1872376 | ||||||||
| Bug Blocks: | 1894575, 2023845, 2024522, 2035332 | ||||||||
| Attachments: |
|
||||||||
|
Description
Chris Feist
2020-08-25 15:40:49 UTC
See Bug 1872376 for how I'm thinking this might work. pcs would unfence each node by directly executing the agent, and make a note of the current timestamp. Then it would call a new crm_resource option to generate a hash of the new resource parameters. Then it could make the parameter changes in the CIB, simultaneously updating the op-digest (and potentially op-secure-digest) attributes of the lrm_rsc_op for the fence device's start operation on the node where it's currently active, as well as the #node-unfenced, #digests-all, and #digests-secure node attributes for any node that has them, using the saved timestamp and generated hash. It's ugly but I can't think of anything better.
FYI, the required Pacemaker support has been added upstream and should land in 8.4. Bug 1872376 has an example and the supported feature set.
Created attachment 1799690 [details]
proposed fix + tests
Added command:
pcs stonith update-scsi-devices
Test:
# pcs stonith update-scsi-devices scsi-fencing set <device-path1> <device-path2>
The SCSI devices should be unfenced and updated, and no resources should be restarted.
Test:
[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.8-3.el8.x86_64
export disk1=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b
export disk2=/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe
export disk3=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4
export disk4=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x23, there are NO registered reservation keys
PR generation=0xf, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# pcs stonith create scsi-fencing fence_scsi devices="$disk1" pcmk_host_check="static-list" pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action="off" meta provides="unfencing"
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x25, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0xf, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# for i in $(seq -w 01 04); do pcs resource create d-$i ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs stonith
* scsi-fencing (stonith:fence_scsi): Started r8-node-01
[root@r8-node-01 ~]# pcs resource
* d-01 (ocf::pacemaker:Dummy): Started r8-node-02
* d-02 (ocf::pacemaker:Dummy): Started r8-node-01
* d-03 (ocf::pacemaker:Dummy): Started r8-node-02
* d-04 (ocf::pacemaker:Dummy): Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
### Updating scsi devices (adding 3 new devices):
[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk1 $disk2 $disk3 $disk4
Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced: keys from each node are registered on the devices.
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:22:19 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:22:20 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-163.bz2
Jul 8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: Transition 8 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-163.bz2): Complete
Jul 8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jul 8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul 8 15:22:39 r8-node-01 kernel: sda: sda1
Jul 8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:22:40 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 8 15:22:40 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-164.bz2
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-164.bz2): Complete
Jul 8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul 8 15:22:39 r8-node-02 kernel: sd 6:0:0:0: Parameters changed
Jul 8 15:22:39 r8-node-02 kernel: sda: sda1
Jul 8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul 8 15:22:40 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x26, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x11, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x2, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x2, 2 registered reservation keys follow:
0x14080000
0x14080001
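One way to double-check the "no resources have been restarted" result is to scan the /var/log/messages excerpts above for transitions that actually executed actions. This is a minimal sketch, assuming the Complete= counter in the pacemaker-controld transition summary is the number of actions that ran in that transition. The busy_transitions helper is hypothetical and not part of pcs or Pacemaker.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs or Pacemaker).
# Reads log lines on stdin and prints the number of every transition
# whose summary reports a non-zero Complete= counter.
# Assumption: Complete=0 means the transition executed no actions.
busy_transitions() {
    # Extract "<transition number> <Complete value>" from summary lines.
    sed -n 's/.*Transition \([0-9][0-9]*\) (Complete=\([0-9][0-9]*\),.*/\1 \2/p' |
        awk '$2 != 0 { print $1 }'
}

# Usage (reads an already captured log excerpt):
#   busy_transitions < messages-excerpt.txt
```

Fed the r8-node-01 excerpt above, this should print nothing, because transitions 8 and 9 both report Complete=0.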
### Updating scsi devices (removing 2 devices):
[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk3 $disk4
Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced: keys from each node are registered on the devices.
NOTE: the command does not undo unfencing on the removed devices, which matches the previous cluster behavior.
[root@r8-node-01 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 8 15:34:48 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul 8 15:34:48 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 10, saving inputs in /var/lib/pacemaker/pengine/pe-input-165.bz2
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: Transition 10 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-165.bz2): Complete
Jul 8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul 8 15:34:48 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x26, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x11, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x3, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x3, 2 registered reservation keys follow:
0x14080000
0x14080001
### Fence node r8-node-02:
[root@r8-node-01 ~]# pcs stonith fence r8-node-02
Node: r8-node-02 fenced
[root@r8-node-01 ~]# for d in $disk{3,4}; do sg_persist -n -i -k -d $d ; done
PR generation=0x4, 1 registered reservation key follows:
0x14080000
PR generation=0x4, 1 registered reservation key follows:
0x14080000
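The sg_persist checks used throughout this test can be turned into a pass/fail check by counting the keys reported per device. This is a rough sketch that only parses the output format shown above. key_counts and all_devices_have are hypothetical helper names, not existing tools.

```shell
#!/bin/sh
# Hypothetical helpers that parse `sg_persist -n -i -k` output on stdin.

# Print one line per device: the number of registered reservation keys.
key_counts() {
    awk '/PR generation=/ {
        if ($3 ~ /^[0-9]+$/)  print $3     # "N registered reservation key(s) ..."
        else if ($5 == "NO")  print 0      # "there are NO registered reservation keys"
    }'
}

# Succeed only if at least one device was listed and every device
# has exactly $1 registered keys.
all_devices_have() {
    key_counts | awk -v want="$1" '
        $1 != want { bad = 1 }
        END        { exit (bad || NR == 0) }'
}

# Usage, e.g. after fencing r8-node-02 in a two-node cluster:
#   for d in $disk{3,4}; do sg_persist -n -i -k -d $d; done | all_devices_have 1
```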
Created attachment 1804075 [details]
additional fixes + tests
Fixes for 3 special cases.
Modified command:
* pcs stonith update-scsi-devices
Test:
Environment:
A 3-node cluster with configured fence_scsi fencing and resources running on each node.
A)
1. Stop the cluster on one of the cluster nodes
2. Use the command 'pcs stonith update-scsi-devices' to update the SCSI devices
3. The command is successful
B)
1. Shut down a cluster node or stop pcsd
2. Use the command 'pcs stonith update-scsi-devices' to update the SCSI devices, both without and with the --skip-offline option
3. The command is successful when --skip-offline is used
C)
1. On one cluster node, fail the shared device that is going to be added
2. Use the command 'pcs stonith update-scsi-devices' to update the SCSI devices
3. The command fails and the devices are not updated
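Cases B and C can be told apart in a script by looking at the pcs error text. This is a sketch only. The wrapper name update_scsi_devices and the PCS override variable are made up for illustration, and the match string is copied from the error message in the DevTestResults for case B. Retrying automatically with --skip-offline is a design choice that may not suit every cluster.

```shell
#!/bin/sh
# PCS can be overridden so the sketch can be exercised without a cluster.
PCS=${PCS:-pcs}

# Hypothetical wrapper: update_scsi_devices <stonith-id> <device>...
# Tries a normal update first. Only if pcs itself suggests
# --skip-offline (case B) is the update repeated with that option.
# Any other failure, such as failed unfencing (case C), is returned as is.
update_scsi_devices() {
    stonith_id=$1
    shift
    if out=$("$PCS" stonith update-scsi-devices "$stonith_id" set "$@" 2>&1); then
        [ -z "$out" ] || printf '%s\n' "$out"
        return 0
    fi
    printf '%s\n' "$out" >&2
    case $out in
        *"use --skip-offline to override"*)
            "$PCS" stonith update-scsi-devices "$stonith_id" set "$@" --skip-offline
            ;;
        *)
            return 1
            ;;
    esac
}

# Usage:
#   update_scsi_devices fence-scsi "$disk1" "$disk2"
```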
DevTestResults:
[root@r8-node-01 pcs]# rpm -q pcs
pcs-0.10.8-4.el8.x86_64
A)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Online
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02
Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
B)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Offline
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02
Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused), use --skip-offline to override
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2 --skip-offline
Warning: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused)
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
C)
[root@r8-node-01 pcs]# pcs status pcsd
r8-node-03: Online
r8-node-02: Online
r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
Online: r8-node-01 r8-node-02 r8-node-03
Offline:
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
### Failing $disk2 on r8-node-03:
[root@r8-node-03 ~]# echo offline > /sys/block/sda/device/state
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: r8-node-03: Unfencing failed:
2021-07-20 19:40:02,530 ERROR: Cannot get registration keys
2021-07-20 19:40:02,531 ERROR: Please use '-h' for usage
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: pcs security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:4142 |