Bug 2033248

Summary: When a fence-scsi storage device becomes unavailable, pcs cannot remove it from configuration and add a running one
Product: Red Hat Enterprise Linux 9 Reporter: Miroslav Lisik <mlisik>
Component: pcs Assignee: Miroslav Lisik <mlisik>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 9.0 CC: cluster-maint, cluster-qe, idevat, kmalyjur, mlisik, mpospisi, nhostako, omular, tojeline
Target Milestone: rc Keywords: Triaged
Target Release: 9.0 Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pcs-0.11.1-8.el9 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 2032997 Environment:
Last Closed: 2022-05-17 12:19:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2032997    
Bug Blocks:    

Description Miroslav Lisik 2021-12-16 10:59:33 UTC
+++ This bug was initially created as a clone of Bug #2032997 +++

Description of problem:
When fence_scsi is configured with a disk that suddenly becomes unavailable, pcs can neither replace it with a new disk nor simply add a new working disk (using the 'pcs stonith update-scsi-devices' add/remove syntax).

Version-Release number of selected component (if applicable):
pcs-0.10.12-1.el8.

How reproducible:
always

Steps to Reproduce:

[root@virt-033 ~]# export DISK1=/dev/disk/by-id/scsi-360014055b5a24c2c777487cb03a19c98
[root@virt-033 ~]# export DISK2=/dev/disk/by-id/scsi-36001405c04d8aed56e04cd1a3b0aabe8
[root@virt-033 ~]# export DISK3=/dev/disk/by-id/scsi-36001405d95ea40f8b2a4a3d86d866e20

1. Configure fence_scsi with sda shared disk
[root@virt-033 ~]# pcs stonith create scsi-fencing fence_scsi devices="$DISK1" pcmk_host_check="static-list" pcmk_host_list="virt-033 virt-034 virt-035" pcmk_reboot_action="off" meta provides="unfencing"

2. Make the disk unavailable
[root@virt-033 ~]# echo "offline" > /sys/block/sda/device/state

3. Try to replace the unavailable disk with a new one, then try to only add a new disk
[root@virt-033 ~]# pcs stonith update-scsi-devices scsi-fencing remove $DISK1 add $DISK3
Error: virt-033: Unfencing failed, unable to check status of device '/dev/disk/by-id/scsi-360014055b5a24c2c777487cb03a19c98': 2021-12-15 12:10:35,797 ERROR: Cannot get registration keys

2021-12-15 12:10:35,798 ERROR: Please use '-h' for usage
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-033 ~]# echo $?
1

[root@virt-033 ~]# pcs stonith update-scsi-devices scsi-fencing add $DISK3
Error: virt-033: Unfencing failed, unable to check status of device '/dev/disk/by-id/scsi-360014055b5a24c2c777487cb03a19c98': 2021-12-15 12:11:06,424 ERROR: Cannot get registration keys

2021-12-15 12:11:06,425 ERROR: Please use '-h' for usage
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-033 ~]# echo $?
1


Expected results:
It should be possible to remove an unavailable disk or add a new available disk 

Additional info:
This bug is also present in 8.5 and was introduced by the fix that prevents unfencing of nodes without quorum (bz1991654).

--- Additional comment from Nina Hostakova on 2021-12-16 10:30:09 CET ---

qa_ack+, see comment0

--- Additional comment from Miroslav Lisik on 2021-12-16 10:43:34 CET ---

There are 2 cases:

1) Adding a new disk and removing the unavailable disk in one command
This can be fixed.
There is a workaround: update the disks in 2 steps. First remove the unavailable disks, then add the new disk.

2) Adding a disk to a cluster that still has unavailable disks
It is OK for this to fail, because it is not possible to check whether the node was fenced.
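
The two-step workaround from case 1 could be scripted roughly as follows. This is only a sketch and was not run against a real cluster: the function name update_scsi_two_step and the PCS override variable are hypothetical, added so the commands can be dry-run with PCS=echo. The 'pcs stonith update-scsi-devices <id> remove/add <device>' syntax is the one used in the description.

```shell
# Sketch of the 2-step workaround: remove the unavailable disk first,
# then add the new disk in a separate pcs call.
# update_scsi_two_step and PCS are hypothetical names, not part of pcs.
update_scsi_two_step() {
    res=$1   # stonith resource id, e.g. scsi-fencing
    old=$2   # unavailable device to remove
    new=$3   # available device to add

    # Step 1: drop the unavailable device; stop if pcs reports an error.
    "${PCS:-pcs}" stonith update-scsi-devices "$res" remove "$old" || return 1

    # Step 2: add the replacement device.
    "${PCS:-pcs}" stonith update-scsi-devices "$res" add "$new"
}
```

For example: PCS=echo update_scsi_two_step scsi-fencing "$DISK1" "$DISK3" prints the two pcs commands without running them.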

Comment 1 Miroslav Lisik 2022-01-13 14:01:02 UTC
Proposed fix:
https://github.com/ClusterLabs/pcs/commit/c78222c335790c9f8a791ce85174fde11e75e766

Test:

[root@r90-node-01 pcs]# export r90_disk1=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2
[root@r90-node-01 pcs]# export r90_disk2=/dev/disk/by-id/scsi-36001405c1cf9f31e16e49b6942bf60c7


[root@r90-node-01 pcs]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2 pcmk_host_check=static-list pcmk_host_list="r90-node-01 r90-node-02" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

[root@r90-node-01 pcs]# echo offline > /sys/block/$(basename $(readlink $r90_disk1))/device/state
[root@r90-node-01 pcs]# cat /sys/block/$(basename $(readlink $r90_disk1))/device/state
offline

[root@r90-node-01 pcs]# pcs stonith update-scsi-devices fence-scsi add $r90_disk2 remove $r90_disk1
[root@r90-node-01 pcs]# echo $?
0
[root@r90-node-01 pcs]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-36001405c1cf9f31e16e49b6942bf60c7 pcmk_host_check=static-list pcmk_host_list="r90-node-01 r90-node-02" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
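
The offline step in the test above derives the kernel device name from the by-id symlink instead of hard-coding it. A small helper for that lookup might look like this. It is a sketch only: scsi_state_path is a hypothetical name, and it uses readlink -f (rather than the plain readlink in the test) so that chained symlinks are also resolved.

```shell
# Print the sysfs state file for a disk given its /dev/disk/by-id path.
# scsi_state_path is a hypothetical helper name.
scsi_state_path() {
    # Resolve the symlink to the real device node, e.g. /dev/sda.
    dev=$(readlink -f "$1") || return 1
    # Keep only the kernel device name and build the sysfs path.
    printf '/sys/block/%s/device/state\n' "$(basename "$dev")"
}
```

It can then be used as: echo offline > "$(scsi_state_path "$r90_disk1")"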

Comment 2 Miroslav Lisik 2022-01-14 17:41:21 UTC
DevTestResults:

[root@r90-node-01 ~]# rpm -q pcs
pcs-0.11.1-8.el9.x86_64

[root@r90-node-01 pcs]# export r90_disk1=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2
[root@r90-node-01 pcs]# export r90_disk2=/dev/disk/by-id/scsi-36001405c1cf9f31e16e49b6942bf60c7

[root@r90-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-36001405ab8c8a45d1794808a8872f1c2 pcmk_host_check=static-list pcmk_host_list="r90-node-01 r90-node-02" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

[root@r90-node-01 ~]# echo offline > /sys/block/$(basename $(readlink $r90_disk1))/device/state
[root@r90-node-01 ~]# cat /sys/block/$(basename $(readlink $r90_disk1))/device/state
offline


[root@r90-node-01 ~]# pcs stonith update-scsi-devices fence-scsi add $r90_disk2 remove $r90_disk1
[root@r90-node-01 ~]# echo $?
0
[root@r90-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-36001405c1cf9f31e16e49b6942bf60c7 pcmk_host_check=static-list pcmk_host_list="r90-node-01 r90-node-02" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

Comment 9 errata-xmlrpc 2022-05-17 12:19:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: pcs), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2290