Bug 2179010
| Summary: | Need a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Tomas Jelinek <tojeline> | |
| Component: | pcs | Assignee: | Miroslav Lisik <mlisik> | |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | urgent | Docs Contact: | Steven J. Levine <slevine> | |
| Priority: | urgent | |||
| Version: | 8.7 | CC: | cfeist, cluster-maint, idevat, kgaillot, mlisik, mmazoure, mpospisi, nhostako, omular, sbradley, slevine, tojeline | |
| Target Milestone: | rc | Keywords: | Regression, Triaged, ZStream | |
| Target Release: | 8.9 | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | pcs-0.10.16-1.el8 | Doc Type: | Bug Fix | |
| Doc Text: |
.`pcs` command to update multipath SCSI devices now works correctly
Due to changes in the Pacemaker CIB file, the `pcs stonith update-scsi-devices` command stopped working as designed, causing an unwanted restart of some cluster resources. With this fix, this command works correctly and updates SCSI devices without requiring a restart of other cluster resources running on the same node.
|
Story Points: | --- | |
| Clone Of: | 2177996 | |||
| : | 2180706 2180707 | Environment: | ||
| Last Closed: | 2023-11-14 15:22:35 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 2177996 | |||
| Bug Blocks: | 2180706, 2180707 | |||
|
Description
Tomas Jelinek
2023-03-16 12:34:43 UTC
Upstream commit: https://github.com/ClusterLabs/pcs/commit/bf7d33bdd41f6e51321ae66cd521cefc93acb3a4

Updated commands:
* pcs stonith update-scsi-devices

Test:
Set up a cluster with shared storage and fence_scsi fencing. Create enough resources so that each node runs at least one resource. Use the `pcs stonith update-scsi-devices` command to modify the SCSI devices of the fence_scsi stonith device. Check that resources did not restart (journalctl, crm_resource --list-operations).

DevTestResults:
[root@r08-09-a ~]# rpm -q pcs
pcs-0.10.16-1.el8.x86_64
(pcs) [root@r08-09-a pcs]# pcs_test/suite --installed --traditional-verbose pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value ... OK
----------------------------------------------------------------------
Ran 17 tests in 0.096s
OK
Additional testing on real cluster by mlisik:
[root@r8-node-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
[root@r8-node-01 ~]# rpm -q pcs pacemaker
pcs-0.10.16-1.el8.x86_64
pacemaker-2.1.6-1.el8.x86_64
[root@r8-node-01 ~]# export disk1=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
[root@r8-node-01 ~]# export disk2=/dev/disk/by-id/scsi-36001405fb15e3edf2994db380037abac
[root@r8-node-01 ~]# export NODELIST=(r8-node-01 r8-node-02)
[root@r8-node-01 ~]# pcs host auth -u hacluster -p password ${NODELIST[*]}
r8-node-01: Authorized
r8-node-02: Authorized
[root@r8-node-01 ~]# pcs cluster setup HACluster ${NODELIST[*]} --start --wait
No addresses specified for host 'r8-node-01', using 'r8-node-01'
No addresses specified for host 'r8-node-02', using 'r8-node-02'
Destroying cluster on hosts: 'r8-node-01', 'r8-node-02'...
r8-node-01: Successfully destroyed cluster
r8-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r8-node-01', 'r8-node-02'
r8-node-01: successful removal of the file 'pcsd settings'
r8-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync authkey'
r8-node-01: successful distribution of the file 'pacemaker authkey'
r8-node-02: successful distribution of the file 'corosync authkey'
r8-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync.conf'
r8-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'r8-node-01', 'r8-node-02'...
Waiting for node(s) to start: 'r8-node-01', 'r8-node-02'...
r8-node-01: Cluster started
r8-node-02: Cluster started
[root@r8-node-01 ~]# pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list pcmk_host_list="${NODELIST[*]}" pcmk_reboot_action=off meta provides=unfencing
[root@r8-node-01 ~]# for i in $(seq 1 ${#NODELIST[@]}); do pcs resource create "d$i" ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs resource
* d1 (ocf::pacemaker:Dummy): Started r8-node-02
* d2 (ocf::pacemaker:Dummy): Started r8-node-01
[root@r8-node-01 ~]# pcs stonith
* fence-scsi (stonith:fence_scsi): Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: fence-scsi-instance_attributes
devices=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
pcmk_host_check=static-list
pcmk_host_list="r8-node-01 r8-node-02"
pcmk_reboot_action=off
Meta Attributes: fence-scsi-meta_attributes
provides=unfencing
Operations:
monitor: fence-scsi-monitor-interval-60s
interval=60s
[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o1.txt
fence-scsi (stonith:fence_scsi): Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=87ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=88ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Fri May 26 15:34:39 2023', exec=2ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Fri May 26 15:34:40 2023', exec=14ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=18ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=11ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=16ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=13ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Fri May 26 15:34:41 2023', exec=20ms): complete
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi add $disk2
[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o2.txt
fence-scsi (stonith:fence_scsi): Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=87ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=88ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Fri May 26 15:34:39 2023', exec=2ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Fri May 26 15:34:40 2023', exec=14ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=18ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=11ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=16ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=13ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Fri May 26 15:34:41 2023', exec=20ms): complete
[root@r8-node-01 ~]# diff -u o1.txt o2.txt
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# journalctl -n0 -f
-- Logs begin at Fri 2023-05-26 15:09:03 CEST. --
May 26 15:39:47 r8-node-01 pacemaker-fenced[4373]: notice: Added 'fence-scsi' to device list (1 active device)
[root@r8-node-02 ~]# journalctl -n0 -f
-- Logs begin at Fri 2023-05-26 15:09:05 CEST. --
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]: notice: State transition S_IDLE -> S_POLICY_ENGINE
May 26 15:39:47 r8-node-02 pacemaker-fenced[3699]: notice: Added 'fence-scsi' to device list (1 active device)
May 26 15:39:47 r8-node-02 pacemaker-schedulerd[3702]: notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-400.bz2
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]: notice: Transition 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-400.bz2): Complete
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
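The journal check above can also be scripted. The following is only a sketch: it assumes that a `Complete=0` count in the pacemaker-controld transition summary means the transition executed no actions, and the function names `transition_actions` and `no_actions_executed` are hypothetical helpers, not part of pcs or Pacemaker.

```shell
# transition_actions: read journal lines on stdin and, for every
# "Transition N (Complete=X, ...)" summary line, print "N X".
transition_actions() {
    sed -n 's/^.*Transition \([0-9]*\) (Complete=\([0-9]*\),.*$/\1 \2/p'
}

# no_actions_executed: succeed only if every transition summary on stdin
# reports Complete=0. Input without any summary line also succeeds.
no_actions_executed() {
    transition_actions | awk '$2 != 0 { bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Possible usage on a cluster node (the time window is an arbitrary choice):
#   journalctl -u pacemaker --since "5 minutes ago" | no_actions_executed \
#       && echo "no transition executed any action"
```

Only the summary lines are inspected, so the check does not depend on the exact wording of per-operation log messages.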
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: fence-scsi-instance_attributes
devices=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d,/dev/disk/by-id/scsi-36001405fb15e3edf2994db380037abac
pcmk_host_check=static-list
pcmk_host_list="r8-node-01 r8-node-02"
pcmk_reboot_action=off
Meta Attributes: fence-scsi-meta_attributes
provides=unfencing
Operations:
monitor: fence-scsi-monitor-interval-60s
interval=60s
RESULT: SCSI device was added to the stonith configuration and resources were not restarted.
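The `diff -u o1.txt o2.txt` comparison in the transcript covers the whole operation history. A narrower variant, sketched below, compares only the recorded start operations. It assumes that a restart would show up as a changed node or call ID on a resource's `_start_0` entry in the `crm_resource --list-operations` output; `start_ops` and `no_restart` are hypothetical helper names.

```shell
# start_ops FILE: print "<operation> <node> <call-id>" for every start
# operation recorded in a saved `crm_resource --list-operations` listing.
start_ops() {
    sed -n 's/^.*: \([^ ]*_start_0\) (node=\([^,]*\), call=\([0-9]*\),.*$/\1 \2 \3/p' "$1"
}

# no_restart BEFORE AFTER: succeed when both snapshots list the same start
# operations with the same node and call ID.
no_restart() {
    [ "$(start_ops "$1")" = "$(start_ops "$2")" ]
}

# Possible usage with the snapshot files from the transcript:
#   no_restart o1.txt o2.txt && echo "no resource was restarted"
```

Comparing only the start entries ignores changes in recurring monitor results, which may or may not be desirable depending on what the test is meant to catch.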
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6903