Bug 1991654
| Summary: | update-scsi-devices command unfence a node without quorum |
|---|---|
| Product: | Red Hat Enterprise Linux 8 |
| Component: | pcs |
| Version: | 8.5 |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | urgent |
| Reporter: | Michal Mazourek <mmazoure> |
| Assignee: | Miroslav Lisik <mlisik> |
| QA Contact: | cluster-qe <cluster-qe> |
| CC: | cfeist, cluster-maint, idevat, kmalyjur, lmiksik, mlisik, mpospisi, nhostako, omular, sbradley, tojeline |
| Target Milestone: | beta |
| Target Release: | 8.5 |
| Keywords: | Triaged |
| Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | pcs-0.10.10-4.el8 |
| Doc Type: | Bug Fix |
| Doc Text: | The plan is to get the fix done before the bugged pcs packages are released. |
| Last Closed: | 2021-11-09 17:34:53 UTC |
| Type: | Bug |
| Cloned As: | 2003066 (view as bug list) |
| Bug Blocks: | 2003066 |
Created attachment 1825859 [details]
proposed fix + tests
Updated command:
* pcs stonith update-scsi-devices
Test:
* set up a cluster with a fence_scsi stonith resource
* set up resources running on each node
* block corosync traffic on one cluster node and wait until the node is fenced
* add scsi devices using `pcs stonith update-scsi-devices add` or `pcs stonith update-scsi-devices set`
* check the result: devices should be unfenced only on nodes that are not fenced, and resources should not be restarted (a consolidated sketch of these steps is shown below)
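A minimal shell sketch of the test procedure above; the device paths, node names and resource names here are hypothetical placeholders, not values from the test environments below:
# create a fence_scsi stonith resource with unfencing enabled
pcs stonith create fence-scsi fence_scsi devices="/dev/disk/by-id/DISK1" pcmk_host_check=static-list pcmk_host_list="node-01 node-02 node-03" pcmk_reboot_action=off meta provides=unfencing
# put resources on the cluster nodes, e.g. Dummy resources
pcs resource create d-01 ocf:pacemaker:Dummy
# on one node, block corosync traffic (udp ports 5404/5405) and wait until the node is fenced
iptables -A INPUT ! -i lo -p udp --dport 5404 -j DROP
iptables -A INPUT ! -i lo -p udp --dport 5405 -j DROP
iptables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP
iptables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP
# from a quorate node, add a device, or replace the whole device list with set
pcs stonith update-scsi-devices fence-scsi add /dev/disk/by-id/DISK2
pcs stonith update-scsi-devices fence-scsi set /dev/disk/by-id/DISK3
# check the registered keys; only nodes that are not fenced should be unfenced on the new devices
sg_persist -n -i -k -d /dev/disk/by-id/DISK2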
DevTestResults:
[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.10-4.el8.x86_64
Environment: Cluster with a fence_scsi stonith resource and resources running on each node.
[root@r8-node-01 pcs]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
[root@r8-node-01 pcs]# pcs resource
* d-01 (ocf::pacemaker:Dummy): Started r8-node-02
* d-02 (ocf::pacemaker:Dummy): Started r8-node-03
* d-03 (ocf::pacemaker:Dummy): Started r8-node-01
* d-04 (ocf::pacemaker:Dummy): Started r8-node-02
* d-05 (ocf::pacemaker:Dummy): Started r8-node-03
* d-06 (ocf::pacemaker:Dummy): Started r8-node-01
[root@r8-node-01 pcs]# echo $disk{1..3}
/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b /dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe /dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4
[root@r8-node-01 pcs]# for disk in $disk{1..3}; do sg_persist -n -i -k -d $disk; done
PR generation=0x8, 3 registered reservation keys follow:
0x14080000
0x14080001
0x14080002
PR generation=0x5, there are NO registered reservation keys
PR generation=0x4, there are NO registered reservation keys
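Note on the sg_persist loops in this log: `$disk{1..3}` works because bash performs brace expansion before parameter expansion, so the expression first becomes `$disk1 $disk2 $disk3` and only then are the variables substituted. A standalone illustration with hypothetical values:
disk1=/dev/sda; disk2=/dev/sdb; disk3=/dev/sdc
echo $disk{1..3}   # prints: /dev/sda /dev/sdb /dev/sdc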
### Block corosync traffic:
[root@r8-node-03 ~]# iptables -A INPUT ! -i lo -p udp --dport 5404 -j DROP && iptables -A INPUT ! -i lo -p udp --dport 5405 -j DROP && iptables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP && iptables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP
[root@r8-node-03 ~]# pcs status nodes
Pacemaker Nodes:
Online: r8-node-03
Standby:
Standby with resource(s) running:
Maintenance:
Offline: r8-node-01 r8-node-02
Pacemaker Remote Nodes:
Online:
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
[root@r8-node-01 pcs]# for disk in $disk{1..3}; do sg_persist -n -i -k -d $disk; done
PR generation=0x8, 3 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x5, there are NO registered reservation keys
PR generation=0x4, there are NO registered reservation keys
[root@r8-node-01 pcs]# pcs status nodes
Pacemaker Nodes:
Online: r8-node-01 r8-node-02
Standby:
Standby with resource(s) running:
Maintenance:
Offline: r8-node-03
Pacemaker Remote Nodes:
Online:
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
### Add scsi devices
[root@r8-node-01 pcs]# pcs stonith update-scsi-devices fence-scsi add $disk2 $disk3
r8-node-03: Unfencing skipped, device '/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b' is fenced
[root@r8-node-01 pcs]# echo $?
0
### Check registration keys on the disks
[root@r8-node-01 pcs]# for disk in $disk{1..3}; do sg_persist -n -i -k -d $disk; done
PR generation=0x9, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x7, 2 registered reservation keys follow:
0x14080000
0x14080001
PR generation=0x6, 2 registered reservation keys follow:
0x14080000
0x14080001
The key of the fenced node is no longer registered on any of the disks.
AFTER:
======
[root@virt-488 ~]# rpm -q pcs
pcs-0.10.10-4.el8.x86_64
## Environment: cluster with 3 nodes, 3 shared disks, dummy resource and scsi fencing
# nodes
[root@virt-488 ~]# pcs status nodes
Pacemaker Nodes:
Online: virt-488 virt-489 virt-527
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
Pacemaker Remote Nodes:
Online:
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
# disks
[root@virt-488 ~]# ls -lr /dev/disk/by-id/ | grep -m 3 "sda\|sdb\|sdc"
lrwxrwxrwx. 1 root root 9 Sep 30 10:54 wwn-0x60014057d85bd87407f4f498e819029a -> ../../sdc
lrwxrwxrwx. 1 root root 9 Sep 30 10:54 wwn-0x60014057d43430762ed4fbfbc895e26e -> ../../sdb
lrwxrwxrwx. 1 root root 9 Sep 30 10:54 wwn-0x600140566f7eadb8310437c8a08d9309 -> ../../sda
[root@virt-488 ~]# export DISK1=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309
[root@virt-488 ~]# export DISK2=/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e
[root@virt-488 ~]# export DISK3=/dev/disk/by-id/wwn-0x60014057d85bd87407f4f498e819029a
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
# scsi fencing
[root@virt-488 ~]# pcs stonith create scsi-fencing fence_scsi devices="$DISK1" pcmk_host_check="static-list" pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action="off" meta provides="unfencing"
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith
* scsi-fencing (stonith:fence_scsi): Started virt-488
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309 pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
# keys on the disks
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x3, 3 registered reservation keys follow:
0xc5370000
0xc5370002
0xc5370001
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
# resource with its start time
[root@virt-488 ~]# pcs resource create dummy1 ocf:heartbeat:Dummy
[root@virt-488 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
dummy1 (ocf::heartbeat:Dummy): Started: dummy1_start_0 (node=virt-489, call=65, rc=0, last-rc-change=Thu Sep 30 14:22:42 2021, exec=20ms): complete
## Fencing one node by blocking corosync traffic
[root@virt-527 ~]# ip6tables -A INPUT ! -i lo -p udp --dport 5404 -j DROP && ip6tables -A INPUT ! -i lo -p udp --dport 5405 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP
[root@virt-527 ~]# echo $?
0
[root@virt-527 ~]# corosync-quorumtool | grep Quorate
Quorate: No
# checking nodes
[root@virt-488 ~]# pcs status nodes
Pacemaker Nodes:
Online: virt-488 virt-489
Standby:
Standby with resource(s) running:
Maintenance:
Offline: virt-527
Pacemaker Remote Nodes:
Online:
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
# checking the keys on the devices
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x4, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x0, there are NO registered reservation keys
PR generation=0x0, there are NO registered reservation keys
## Adding a scsi device
# update-scsi-devices add
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing add $DISK2
virt-527: Unfencing skipped, device '/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309' is fenced
[root@virt-488 ~]# echo $?
0
> OK: Warning message notifies that unfencing was skipped on the fenced node
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309,/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x4, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x2, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x0, there are NO registered reservation keys
> OK: A node without quorum wasn't unfenced
# update-scsi-devices set
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing set $DISK3
virt-527: Unfencing skipped, devices '/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309', '/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e' are fenced
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014057d85bd87407f4f498e819029a pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x4, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x2, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x2, 2 registered reservation keys follow:
0xc5370001
0xc5370000
> OK: Fenced node's key isn't registered
# combination of add and remove
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing remove $DISK3 add $DISK1
virt-527: Unfencing skipped, device '/dev/disk/by-id/wwn-0x60014057d85bd87407f4f498e819029a' is fenced
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309 pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x5, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x2, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x2, 2 registered reservation keys follow:
0xc5370001
0xc5370000
> OK
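Side note (not part of this verification run): instead of rebooting the fenced node, corosync traffic could presumably be restored in place by deleting the same ip6tables rules that were added above, roughly:
ip6tables -D INPUT ! -i lo -p udp --dport 5404 -j DROP
ip6tables -D INPUT ! -i lo -p udp --dport 5405 -j DROP
ip6tables -D OUTPUT ! -o lo -p udp --sport 5404 -j DROP
ip6tables -D OUTPUT ! -o lo -p udp --sport 5405 -j DROP
The verification below reboots the fenced node instead, which clears the rules and lets the node rejoin the cluster.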
## Rebooting the fenced node
[root@virt-527 ~]# pcs status nodes
Pacemaker Nodes:
Online: virt-488 virt-489 virt-527
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
Pacemaker Remote Nodes:
Online:
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x6, 3 registered reservation keys follow:
0xc5370000
0xc5370001
0xc5370002
PR generation=0x2, 2 registered reservation keys follow:
0xc5370000
0xc5370001
PR generation=0x2, 2 registered reservation keys follow:
0xc5370001
0xc5370000
> OK: Key of the node was added to the configured disk
[root@virt-488 ~]# pcs stonith update-scsi-devices scsi-fencing add $DISK2
[root@virt-488 ~]# echo $?
0
[root@virt-488 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x600140566f7eadb8310437c8a08d9309,/dev/disk/by-id/wwn-0x60014057d43430762ed4fbfbc895e26e pcmk_host_check=static-list pcmk_host_list="virt-488 virt-489 virt-527" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-488 ~]# for DISK in $DISK{1..3}; do sg_persist -n -i -k -d $DISK; done
PR generation=0x6, 3 registered reservation keys follow:
0xc5370000
0xc5370001
0xc5370002
PR generation=0x4, 3 registered reservation keys follow:
0xc5370000
0xc5370001
0xc5370002
PR generation=0x2, 2 registered reservation keys follow:
0xc5370001
0xc5370000
> OK
## Checking that the resource has not restarted
[root@virt-488 ~]# crm_resource --list-all-operations --resource dummy1 | grep start
dummy1 (ocf::heartbeat:Dummy): Started: dummy1_start_0 (node=virt-489, call=65, rc=0, last-rc-change=Thu Sep 30 14:22:42 2021, exec=20ms): complete
> OK: Start time stayed the same
Marking as VERIFIED for pcs-0.10.10-4.el8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: pcs security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4142
Description of problem:
The new command 'pcs stonith update-scsi-devices' unfences a node that is not quorate.

Version-Release number of selected component (if applicable):
pcs-0.10.8-4.el8

How reproducible:
always

Steps to Reproduce:
## Having scsi fencing set (3 disks, 3 nodes)
[root@virt-499 ~]# pcs stonith config
Resource: scsi-fencing (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401,/dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c,/dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767 pcmk_host_check=static-list pcmk_host_list="virt-499 virt-504 virt-519" pcmk_reboot_action=off
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767
PR generation=0x27, 3 registered reservation keys follow:
0x2e6e0000
0x2e6e0002
0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c
PR generation=0x1c, 3 registered reservation keys follow:
0x2e6e0002
0x2e6e0000
0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
PR generation=0x1a, 3 registered reservation keys follow:
0x2e6e0002
0x2e6e0001
0x2e6e0000
## Fence one node by blocking corosync ports
[root@virt-519 ~]# ip6tables -A INPUT ! -i lo -p udp --dport 5404 -j DROP && ip6tables -A INPUT ! -i lo -p udp --dport 5405 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5404 -j DROP && ip6tables -A OUTPUT ! -o lo -p udp --sport 5405 -j DROP
[root@virt-519 ~]# corosync-quorumtool | grep Quorate
Quorate: No
# on other nodes
[root@virt-499 ~]# corosync-quorumtool | grep Quorate
Quorate: Yes
Flags: Quorate
[root@virt-504 ~]# corosync-quorumtool | grep Quorate
Quorate: Yes
Flags: Quorate
## Checking registered keys on the disks
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767
PR generation=0x28, 2 registered reservation keys follow:
0x2e6e0000
0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c
PR generation=0x1d, 2 registered reservation keys follow:
0x2e6e0000
0x2e6e0001
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
PR generation=0x1b, 2 registered reservation keys follow:
0x2e6e0001
0x2e6e0000
> So far OK: the fencing is recognized and the node's registration key was deleted from the disks
## Updating fence_scsi disks while one node is still fenced
[root@virt-499 ~]# pcs stonith update-scsi-devices scsi-fencing set /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767 /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
[root@virt-499 ~]# echo $?
0
[root@virt-519 ~]# corosync-quorumtool | grep Quorate
Quorate: No
[root@virt-499 ~]# pcs status | grep Node -A 2
Node List:
* Online: [ virt-499 virt-504 ]
* OFFLINE: [ virt-519 ]
## Checking registered keys on the disks again
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405e9ba8116b7a944cfb4b88b767
PR generation=0x2a, 3 registered reservation keys follow:
0x2e6e0000
0x2e6e0001
0x2e6e0002
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405978d3d55b2f34d3481433377c
PR generation=0x1f, 3 registered reservation keys follow:
0x2e6e0000
0x2e6e0001
0x2e6e0002
[root@virt-499 ~]# sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x60014050ca58fa9f66b488491466c401
PR generation=0x1d, 3 registered reservation keys follow:
0x2e6e0001
0x2e6e0000
0x2e6e0002

Actual results:
The update unfenced the node without quorum.

Expected results:
The update should not unfence a node that is not quorate.