Bug 1991654
Summary: update-scsi-devices command unfence a node without quorum
Product: Red Hat Enterprise Linux 8
Reporter: Michal Mazourek <mmazoure>
Component: pcs
Assignee: Miroslav Lisik <mlisik>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Priority: urgent
Version: 8.5
CC: cfeist, cluster-maint, idevat, kmalyjur, lmiksik, mlisik, mpospisi, nhostako, omular, sbradley, tojeline
Target Milestone: beta
Keywords: Triaged
Target Release: 8.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.10.10-4.el8
Doc Type: Bug Fix
Doc Text: The plan is to get the fix done before the bugged pcs packages are released.
Cloned to: 2003066 (view as bug list)
Last Closed: 2021-11-09 17:34:53 UTC
Type: Bug
Bug Blocks: 2003066
Attachments:
Created attachment 1825859 [details]
proposed fix + tests
Updated command:
* pcs stonith update-scsi-devices
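For orientation, a usage sketch of the command as exercised in the transcripts below; <stonith-id> and <device> are placeholders, and the pcs(8) man page is the authoritative reference for the syntax:

pcs stonith update-scsi-devices <stonith-id> set <device> [<device>...]
pcs stonith update-scsi-devices <stonith-id> add <device> [<device>...]
# add and remove can be combined in one call (as in the verification below)
pcs stonith update-scsi-devices <stonith-id> remove <device-a> add <device-b>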
Test:
* set up a cluster with a fence_scsi stonith resource
* set up resources running on each node
* block corosync traffic on one cluster node and wait until the node is fenced
* add scsi devices using `pcs stonith update-scsi-devices add` or `pcs stonith update-scsi-devices set`
* check the result: devices should be unfenced only on nodes that are not fenced, and resources should not be restarted (a condensed command sketch follows this list)
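A condensed reproduction sketch assembled from the transcripts below; the node names (r8-node-0X) and the $disk1..$disk3 device paths are environment-specific placeholders:

# on any cluster node: fence_scsi stonith resource with unfencing, plus a dummy resource
pcs stonith create fence-scsi fence_scsi devices="$disk1" \
    pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" \
    pcmk_reboot_action=off meta provides=unfencing
pcs resource create d-01 ocf:pacemaker:Dummy

# on the node to be fenced: drop corosync traffic (UDP 5404/5405) and wait for fencing
iptables -A INPUT ! -i lo -p udp --dport 5404 -j DROP
iptables -A INPUT ! -i lo -p udp --dport 5405 -j DROP
iptables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP
iptables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP

# from a quorate node: update the devices, then inspect registration keys and resource state
pcs stonith update-scsi-devices fence-scsi add "$disk2" "$disk3"
for disk in "$disk1" "$disk2" "$disk3"; do sg_persist -n -i -k -d "$disk"; done
pcs resource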
DevTestResults:

[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.10-4.el8.x86_64

Environment: Cluster with a fence_scsi stonith resource and resources running on each node.

[root@r8-node-01 pcs]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 pcs]# pcs resource
  * d-01 (ocf::pacemaker:Dummy): Started r8-node-02
  * d-02 (ocf::pacemaker:Dummy): Started r8-node-03
  * d-03 (ocf::pacemaker:Dummy): Started r8-node-01
  * d-04 (ocf::pacemaker:Dummy): Started r8-node-02
  * d-05 (ocf::pacemaker:Dummy): Started r8-node-03
  * d-06 (ocf::pacemaker:Dummy): Started r8-node-01
[root@r8-node-01 pcs]# echo $disk{1..3}
/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b /dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe /dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4
[root@r8-node-01 pcs]# for disk in $disk{1..3}; do sg_persist -n -i -k -d $disk; done
  PR generation=0x8, 3 registered reservation keys follow: 0x14080000 0x14080001 0x14080002
  PR generation=0x5, there are NO registered reservation keys
  PR generation=0x4, there are NO registered reservation keys

### Block corosync traffic:
[root@r8-node-03 ~]# iptables -A INPUT ! -i lo -p udp --dport 5404 -j DROP && iptables -A INPUT ! -i lo -p udp --dport 5405 -j DROP && iptables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP && iptables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP
[root@r8-node-03 ~]# pcs status nodes
Pacemaker Nodes:
 Online: r8-node-03
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline: r8-node-01 r8-node-02
Pacemaker Remote Nodes:
 Online:
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
[root@r8-node-01 pcs]# for disk in $disk{1..3}; do sg_persist -n -i -k -d $disk; done
  PR generation=0x8, 3 registered reservation keys follow: 0x14080000 0x14080001
  PR generation=0x5, there are NO registered reservation keys
  PR generation=0x4, there are NO registered reservation keys
[root@r8-node-01 pcs]# pcs status nodes
Pacemaker Nodes:
 Online: r8-node-01 r8-node-02
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline: r8-node-03
Pacemaker Remote Nodes:
 Online:
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:

### Add scsi devices
[root@r8-node-01 pcs]# pcs stonith update-scsi-devices fence-scsi add $disk2 $disk3
r8-node-03: Unfencing skipped, device '/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b' is fenced
[root@r8-node-01 pcs]# echo $?
0

### Check registration keys on the disks
[root@r8-node-01 pcs]# for disk in $disk{1..3}; do sg_persist -n -i -k -d $disk; done
  PR generation=0x9, 2 registered reservation keys follow: 0x14080000 0x14080001
  PR generation=0x7, 2 registered reservation keys follow: 0x14080000 0x14080001
  PR generation=0x6, 2 registered reservation keys follow: 0x14080000 0x14080001

There is no key from the fenced node.
AFTER:
======

[root@virt-488 ~]# rpm -q pcs
pcs-0.10.10-4.el8.x86_64

## Environment: cluster with 3 nodes, 3 shared disks, dummy resource and scsi fencing

# nodes
[root@virt-488 ~]# pcs status nodes
Pacemaker Nodes:
 Online: virt-488 virt-489 virt-527
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:

# disks
[root@virt-488 ~]# ls -lr /dev/disk/by-id/ | grep -m 3 "sda\|sdb\|sdc"
lrwxrwxrwx. 1 root root 9 Sep 30 10:54 wwn-0x60014057d85bd87407f4f498e819029a -> ../../sdc
lrwxrwxrwx. 1 root root 9 Sep 30 10:54 wwn-0x60014057d43430762ed4fbfbc895e26e -> ../../sdb
lrwxrwxrwx. 1 root root 9 Sep 30 10:54 wwn-0x600140566f7eadb8310437c8a08d9309 -> ../../sda
[root@virt-488 ~]# export DISK1=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309
[root@virt-488 ~]# export DISK2=/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e
[root@virt-488 ~]# export DISK3=/dev/disk/by-id/wwn-0x60014057d85bd87407f4f498e819029a
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x0, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys

# scsi fencing
[root@virt-488 ~]# pcs stonith create scsi-fencing fence_scsi devices="$DISK1" pcmk_host_check="static-list" pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action="off" meta provides="unfencing"
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith
  * scsi-fencing (stonith:fence_scsi): Started virt-488
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309 pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)

# keys on the disks
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x3, 3 registered reservation keys follow: 0xc5370000 0xc5370002 0xc5370001
  PR generation=0x0, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys

# resource with its start time
[root@virt-488 ~]# pcs resource create dummy1 ocf:heartbeat:Dummy
[root@virt-488 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
dummy1 (ocf::heartbeat:Dummy): Started: dummy1_start_0 (node=virt-489, call=65, rc=0, last-rc-change=Thu Sep 30 14:22:42 2021, exec=20ms): complete

## Fencing one node by blocking corosync traffic
[root@virt-527 ~]# ip6tables -A INPUT ! -i lo -p udp --dport 5404 -j DROP && ip6tables -A INPUT ! -i lo -p udp --dport 5405 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP
[root@virt-527 ~]# echo $?
0
[root@virt-527 ~]# corosync-quorumtool | grep Quorate
Quorate: No

# checking nodes
[root@virt-488 ~]# pcs status nodes
Pacemaker Nodes:
 Online: virt-488 virt-489
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline: virt-527
Pacemaker Remote Nodes:
 Online:
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:

# checking the keys on the devices
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x4, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x0, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys

## Adding scsi devices

# update-scsi-devices add
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing add $DISK2
virt-527: Unfencing skipped, device '/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309' is fenced
[root@virt-488 ~]# echo $?
0
> OK: warning message notifying that unfencing was skipped on the fenced node
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309,/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x4, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x0, there are NO registered reservation keys
> OK: A node without quorum wasn't unfenced

# update-scsi-devices set
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing set $DISK3
virt-527: Unfencing skipped, devices '/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309', '/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e' are fenced
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/wwn-0x60014057d85bd87407f4f498e819029a pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x4, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370001 0xc5370000
> OK: Fenced node's key isn't registered

# combination of add and remove
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing remove $DISK3 add $DISK1
virt-527: Unfencing skipped, device '/dev/disk/by-id/wwn-0x60014057d85bd87407f4f498e819029a' is fenced
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309 pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x5, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370001 0xc5370000
> OK

## Rebooting the fenced node
[root@virt-527 ~]# pcs status nodes
Pacemaker Nodes:
 Online: virt-488 virt-489 virt-527
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x6, 3 registered reservation keys follow: 0xc5370000 0xc5370001 0xc5370002
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370000 0xc5370001
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370001 0xc5370000
> OK: Key of the node was added to the configured disk
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing add $DISK2
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309,/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
  PR generation=0x6, 3 registered reservation keys follow: 0xc5370000 0xc5370001 0xc5370002
  PR generation=0x4, 3 registered reservation keys follow: 0xc5370000 0xc5370001 0xc5370002
  PR generation=0x2, 2 registered reservation keys follow: 0xc5370001 0xc5370000
> OK

## Checking that the resource has not restarted
[root@virt-488 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
dummy1 (ocf::heartbeat:Dummy): Started: dummy1_start_0 (node=virt-489, call=65, rc=0, last-rc-change=Thu Sep 30 14:22:42 2021, exec=20ms): complete
> OK: Start time stayed the same

Marking as VERIFIED for pcs-0.10.10-4.el8

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: pcs security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4142
Description of problem:
The new command 'pcs stonith update-scsi-devices' unfences a node that is not quorate.

Version-Release number of selected component (if applicable):
pcs-0.10.8-4.el8

How reproducible:
always

Steps to Reproduce:

## Having scsi fencing set (3 disks, 3 nodes)
[root@virt-499 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401,/dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c,/dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767 pcmk_host_check=static-list pcmk_host_list="virt-499 virt-504 virt-519" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767
  PR generation=0x27, 3 registered reservation keys follow: 0x2e6e0000 0x2e6e0002 0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c
  PR generation=0x1c, 3 registered reservation keys follow: 0x2e6e0002 0x2e6e0000 0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
  PR generation=0x1a, 3 registered reservation keys follow: 0x2e6e0002 0x2e6e0001 0x2e6e0000

## Fence one node by blocking corosync ports
[root@virt-519 ~]# ip6tables -A INPUT ! -i lo -p udp --dport 5404 -j DROP && ip6tables -A INPUT ! -i lo -p udp --dport 5405 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP
[root@virt-519 ~]# corosync-quorumtool | grep Quorate
Quorate: No

# on other nodes
[root@virt-499 ~]# corosync-quorumtool | grep Quorate
Quorate: Yes
Flags: Quorate
[root@virt-504 ~]# corosync-quorumtool | grep Quorate
Quorate: Yes
Flags: Quorate

## Checking registered keys on the disks
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767
  PR generation=0x28, 2 registered reservation keys follow: 0x2e6e0000 0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c
  PR generation=0x1d, 2 registered reservation keys follow: 0x2e6e0000 0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
  PR generation=0x1b, 2 registered reservation keys follow: 0x2e6e0001 0x2e6e0000

> So far OK, the fence is recognized, the node's registration key was deleted from the disks

## Updating fence_scsi disks, while one node is still fenced
[root@virt-499 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767 /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
[root@virt-499 ~]# echo $?
0
[root@virt-519 ~]# corosync-quorumtool | grep Quorate
Quorate: No
[root@virt-499 ~]# pcs status | grep Node -A 2
Node List:
  * Online: [ virt-499 virt-504 ]
  * OFFLINE: [ virt-519 ]

## Checking registered keys on the disks again
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767
  PR generation=0x2a, 3 registered reservation keys follow: 0x2e6e0000 0x2e6e0001 0x2e6e0002
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c
  PR generation=0x1f, 3 registered reservation keys follow: 0x2e6e0000 0x2e6e0001 0x2e6e0002
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
  PR generation=0x1d, 3 registered reservation keys follow: 0x2e6e0001 0x2e6e0000 0x2e6e0002

Actual results:
The update unfenced the node without quorum.

Expected results:
The update preferably should not unfence a node without quorum.
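For quick re-checking, a small helper sketch (not part of pcs; it reuses the DISK1..DISK3 variables from the verification above, so adjust the paths to your environment). It prints the local quorum state and then the keys registered on each shared device; a missing key for the unquorate node confirms the expected behaviour:

# run on any node that can reach the shared disks
corosync-quorumtool | grep Quorate
for DISK in "$DISK1" "$DISK2" "$DISK3"; do
    echo "== $DISK =="
    sg_persist -n -i -k -d "$DISK"
done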