Bug 2179010
| Summary: | Need a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Tomas Jelinek <tojeline> | |
| Component: | pcs | Assignee: | Miroslav Lisik <mlisik> | |
| Status: | ON_QA --- | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | urgent | Docs Contact: | Steven J. Levine <slevine> | |
| Priority: | urgent | |||
| Version: | 8.7 | CC: | cfeist, cluster-maint, idevat, kgaillot, lmiksik, mlisik, mmazoure, mpospisi, nhostako, omular, sbradley, slevine, tojeline | |
| Target Milestone: | rc | Keywords: | Regression, Triaged, ZStream | |
| Target Release: | 8.9 | |||
| Hardware: | Unspecified | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | pcs-0.10.16-1.el8 | Doc Type: | Bug Fix | |
| Doc Text: |
.`pcs` command to update multipath SCSI devices now works correctly
Due to changes in the Pacemaker CIB file, the `pcs stonith update-scsi-devices` command stopped working as designed, causing an unwanted restart of some cluster resources. With this fix, this command works correctly and updates SCSI devices without requiring a restart of other cluster resources running on the same node.
|
Story Points: | --- | |
| Clone Of: | 2177996 | |||
| : | 2180706 2180707 | Environment: | ||
| Last Closed: | Type: | Bug | ||
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 2177996 | |||
| Bug Blocks: | 2180706, 2180707 | |||
|
Description
Tomas Jelinek
2023-03-16 12:34:43 UTC
Upstream commit: https://github.com/ClusterLabs/pcs/commit/bf7d33bdd41f6e51321ae66cd521cefc93acb3a4

Updated commands:
* pcs stonith update-scsi-devices

Test:
Set up a cluster with shared storage and fence_scsi fencing. Create enough resources so that each node runs at least one resource. Use the `pcs stonith update-scsi-devices` command to modify the SCSI devices of the fence_scsi stonith device. Check that the resources did not restart (journalctl, crm_resource --list-operations).

DevTestResults:
[root@r08-09-a ~]# rpm -q pcs
pcs-0.10.16-1.el8.x86_64
(pcs) [root@r08-09-a pcs]# pcs_test/suite --installed --traditional-verbose pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_nonrecurring_start_op_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_1_monitor_with_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_one_timeout ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_no_monitor_ops ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_default_monitor ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_2_monitor_ops_with_timeouts ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_digests_with_empty_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_all_digest_types ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_no_digest_for_our_stonith_id ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_digests_attrs ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_not_on_all_nodes_multi_value ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_various_start_ops_one_lrm_start_op ... OK
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value (subunit.RemotedTestCase)
pcs_test.tier0.lib.commands.test_stonith_update_scsi_devices.TestUpdateScsiDevicesDigestsSetScsi.test_transient_digests_attrs_without_last_comma_multi_value ... OK
----------------------------------------------------------------------
Ran 17 tests in 0.096s
OK
Additional testing on a real cluster by mlisik:
[root@r8-node-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
[root@r8-node-01 ~]# rpm -q pcs pacemaker
pcs-0.10.16-1.el8.x86_64
pacemaker-2.1.6-1.el8.x86_64
[root@r8-node-01 ~]# export disk1=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
[root@r8-node-01 ~]# export disk2=/dev/disk/by-id/scsi-36001405fb15e3edf2994db380037abac
[root@r8-node-01 ~]# export NODELIST=(r8-node-01 r8-node-02)
[root@r8-node-01 ~]# pcs host auth -u hacluster -p password ${NODELIST[*]}
r8-node-01: Authorized
r8-node-02: Authorized
[root@r8-node-01 ~]# pcs cluster setup HACluster ${NODELIST[*]} --start --wait
No addresses specified for host 'r8-node-01', using 'r8-node-01'
No addresses specified for host 'r8-node-02', using 'r8-node-02'
Destroying cluster on hosts: 'r8-node-01', 'r8-node-02'...
r8-node-01: Successfully destroyed cluster
r8-node-02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'r8-node-01', 'r8-node-02'
r8-node-01: successful removal of the file 'pcsd settings'
r8-node-02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync authkey'
r8-node-01: successful distribution of the file 'pacemaker authkey'
r8-node-02: successful distribution of the file 'corosync authkey'
r8-node-02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'r8-node-01', 'r8-node-02'
r8-node-01: successful distribution of the file 'corosync.conf'
r8-node-02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'r8-node-01', 'r8-node-02'...
Waiting for node(s) to start: 'r8-node-01', 'r8-node-02'...
r8-node-01: Cluster started
r8-node-02: Cluster started
[root@r8-node-01 ~]# pcs stonith create fence-scsi fence_scsi devices=$disk1 pcmk_host_check=static-list pcmk_host_list="${NODELIST[*]}" pcmk_reboot_action=off meta provides=unfencing
[root@r8-node-01 ~]# for i in $(seq 1 ${#NODELIST[@]}); do pcs resource create "d$i" ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs resource
* d1 (ocf::pacemaker:Dummy): Started r8-node-02
* d2 (ocf::pacemaker:Dummy): Started r8-node-01
[root@r8-node-01 ~]# pcs stonith
* fence-scsi (stonith:fence_scsi): Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: fence-scsi-instance_attributes
devices=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d
pcmk_host_check=static-list
pcmk_host_list="r8-node-01 r8-node-02"
pcmk_reboot_action=off
Meta Attributes: fence-scsi-meta_attributes
provides=unfencing
Operations:
monitor: fence-scsi-monitor-interval-60s
interval=60s
[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o1.txt
fence-scsi (stonith:fence_scsi): Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=87ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=88ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Fri May 26 15:34:39 2023', exec=2ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Fri May 26 15:34:40 2023', exec=14ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=18ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=11ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=16ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=13ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Fri May 26 15:34:41 2023', exec=20ms): complete
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi add $disk2
[root@r8-node-01 ~]# for r in fence-scsi d1 d2; do crm_resource --resource $r --list-operations; done |& tee o2.txt
fence-scsi (stonith:fence_scsi): Started: fence-scsi_start_0 (node=r8-node-01, call=6, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=87ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_60000 (node=r8-node-01, call=7, rc=0, last-rc-change='Fri May 26 15:34:39 2023', exec=88ms): complete
fence-scsi (stonith:fence_scsi): Started: fence-scsi_monitor_0 (node=r8-node-02, call=5, rc=7, last-rc-change='Fri May 26 15:34:39 2023', exec=2ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_0 (node=r8-node-01, call=11, rc=7, last-rc-change='Fri May 26 15:34:40 2023', exec=14ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_start_0 (node=r8-node-02, call=10, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=18ms): complete
d1 (ocf::pacemaker:Dummy): Started: d1_monitor_10000 (node=r8-node-02, call=11, rc=0, last-rc-change='Fri May 26 15:34:40 2023', exec=11ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_start_0 (node=r8-node-01, call=16, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=16ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_10000 (node=r8-node-01, call=17, rc=0, last-rc-change='Fri May 26 15:34:41 2023', exec=13ms): complete
d2 (ocf::pacemaker:Dummy): Started: d2_monitor_0 (node=r8-node-02, call=15, rc=7, last-rc-change='Fri May 26 15:34:41 2023', exec=20ms): complete
[root@r8-node-01 ~]# diff -u o1.txt o2.txt
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# journalctl -n0 -f
-- Logs begin at Fri 2023-05-26 15:09:03 CEST. --
May 26 15:39:47 r8-node-01 pacemaker-fenced[4373]: notice: Added 'fence-scsi' to device list (1 active device)
[root@r8-node-02 ~]# journalctl -n0 -f
-- Logs begin at Fri 2023-05-26 15:09:05 CEST. --
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]: notice: State transition S_IDLE -> S_POLICY_ENGINE
May 26 15:39:47 r8-node-02 pacemaker-fenced[3699]: notice: Added 'fence-scsi' to device list (1 active device)
May 26 15:39:47 r8-node-02 pacemaker-schedulerd[3702]: notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-400.bz2
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]: notice: Transition 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-400.bz2): Complete
May 26 15:39:47 r8-node-02 pacemaker-controld[3703]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@r8-node-01 ~]# pcs stonith config
Resource: fence-scsi (class=stonith type=fence_scsi)
Attributes: fence-scsi-instance_attributes
devices=/dev/disk/by-id/scsi-3600140500e2fe60a3eb479bb39ca8d3d,/dev/disk/by-id/scsi-36001405fb15e3edf2994db380037abac
pcmk_host_check=static-list
pcmk_host_list="r8-node-01 r8-node-02"
pcmk_reboot_action=off
Meta Attributes: fence-scsi-meta_attributes
provides=unfencing
Operations:
monitor: fence-scsi-monitor-interval-60s
interval=60s
RESULT: SCSI device was added to the stonith configuration and resources were not restarted.
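
The before/after comparison used above can also be narrowed to start operations only. The following is a minimal sketch, not part of pcs or Pacemaker: the helper names `start_ops` and `no_restarts` are hypothetical, and it assumes that a restart shows up as a changed node or call ID in a resource's `_start_0` record in saved `crm_resource --list-operations` output.

```shell
# Hypothetical helpers for comparing two saved snapshots of
# `crm_resource --resource <id> --list-operations` output.

# Print "<operation> <node> <call-id>" for every start operation in a snapshot file.
start_ops() {
    sed -n 's/.*: \([^ ]*_start_0\) (node=\([^,]*\), call=\([0-9]*\),.*/\1 \2 \3/p' "$1" | sort
}

# Succeed when both snapshots list identical start operations.
# Assumption: a restarted resource gets a new call ID (or node) for its start operation.
no_restarts() {
    before=$(start_ops "$1")
    after=$(start_ops "$2")
    if [ "$before" = "$after" ]; then
        echo "no restarts detected"
    else
        echo "start operations differ"
        return 1
    fi
}
```

With the snapshot files from the transcript, this would be called as `no_restarts o1.txt o2.txt`. Unlike a plain `diff -u`, it ignores monitor records, so it only flags changes in start operations.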
|