Bug 1978013
| Summary: | Pacemaker can select wrong fence device when pcmk_host_map and dynamic-list are combined | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Ken Gaillot <kgaillot> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 9.0 | CC: | cluster-maint, cluster-qe, msmazova, phagara |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 9.0 Beta | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-2.1.0-6.el9 | Doc Type: | Bug Fix |
| Doc Text: | Cause: If a fence device configured with pcmk_host_check="dynamic-list" failed its list action, and also had a pcmk_host_map configured, Pacemaker would wrongly assume the device could fence all the nodes listed in the host map. Consequence: Pacemaker might wrongly select the device to fence a node in the host map that the device could not actually fence. Fix: Pacemaker no longer assumes that a fence device whose list action fails can fence any hosts. Result: The proper device is chosen for a node that requires fencing. | Story Points: | --- |
| Clone Of: | 1978010 | Environment: | |
| Last Closed: | 2021-12-07 21:57:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1978010 | ||
| Bug Blocks: | |||
|
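The Doc Text in the table above describes a change in how a device's eligibility is decided when its list action fails. The following is a minimal sketch of that decision in Python, not Pacemaker's actual C implementation. The function names are hypothetical, and the handling of a successful list (look up the node's alias in the host map, then check whether it appears in the list output) is a simplifying assumption.

```python
from typing import Dict, List, Optional


def can_fence_before_fix(node: str, host_map: Dict[str, str],
                         listed: Optional[List[str]]) -> bool:
    """Simplified model of the buggy behavior (hypothetical helper).

    `listed` is the output of the device's list action, or None if it failed.
    """
    if listed is None:
        # Bug: a failed list action fell back to the host map, so every
        # node named there was treated as fenceable by this device.
        return node in host_map
    # Assumption: map the node to its alias, then check the list output.
    return host_map.get(node, node) in listed


def can_fence_after_fix(node: str, host_map: Dict[str, str],
                        listed: Optional[List[str]]) -> bool:
    """Simplified model of the fixed behavior (hypothetical helper)."""
    if listed is None:
        # Fix: a device whose list action fails is not assumed
        # to be able to fence any host.
        return False
    return host_map.get(node, node) in listed


if __name__ == "__main__":
    # Host map taken from the verification steps in this bug.
    host_map = {"virt-513": "frist", "virt-514": "second"}
    print(can_fence_before_fix("virt-514", host_map, None))  # True
    print(can_fence_after_fix("virt-514", host_map, None))   # False
```

Under this model, the always-failing dummy device is excluded from the candidates, so only devices that are actually eligible remain.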
Description
Ken Gaillot
2021-06-30 22:42:29 UTC
before fix
==========

See https://bugzilla.redhat.com/show_bug.cgi?id=1978010#c5

after fix
=========

> [root@virt-513 ~]# rpm -q pacemaker
> pacemaker-2.1.0-11.el9.x86_64

Starting with:
* a 2-node cluster
* dummy fence agent installed on both nodes as /usr/sbin/fence_bz1978010: https://github.com/ClusterLabs/fence-agents/blob/master/agents/dummy/fence_dummy.py
* per-node real fence device configured

> [root@virt-513 ~]# pcs status
> Cluster name: STSRHTS14461
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-514 (version 2.1.0-11.el9-7c3f660707) - partition with quorum
>   * Last updated: Tue Aug 24 19:55:16 2021
>   * Last change: Tue Aug 24 15:44:18 2021 by root via cibadmin on virt-513
>   * 2 nodes configured
>   * 2 resource instances configured
>
> Node List:
>   * Online: [ virt-513 virt-514 ]
>
> Full List of Resources:
>   * fence-virt-513 (stonith:fence_xvm): Started virt-513
>   * fence-virt-514 (stonith:fence_xvm): Started virt-514
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled

Remove the monitor operation from the second node's real fence device:

> [root@virt-513 ~]# pcs cluster cib scope=resources cib.xml
> [root@virt-513 ~]# cp cib.xml cib-updated.xml
> [root@virt-513 ~]# vim cib-updated.xml
> [root@virt-513 ~]# diff cib.xml cib-updated.xml
> 19,21d18
> < <operations>
> < <op name="monitor" interval="60s" id="fence-virt-514-monitor-interval-60s"/>
> < </operations>
> [root@virt-513 ~]# pcs cluster cib-push scope=resources cib-updated.xml
> CIB updated

Make the second node's real fence device prefer the second node:

> [root@virt-513 ~]# pcs constraint location fence-virt-514 prefers virt-514
> [root@virt-513 ~]# pcs constraint list --full
> Warning: This command is deprecated and will be removed. Please use 'pcs constraint config' instead.
> Location Constraints:
>   Resource: fence-virt-514
>     Enabled on:
>       Node: virt-514 (score:INFINITY) (id:location-fence-virt-514-virt-514-INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:

Create a dynamic fence device that always fails using the dummy fence agent:

> [root@virt-513 ~]# pcs stonith create bz1978013 fence_bz1978013 pcmk_host_check="dynamic-list" pcmk_host_map='virt-513:frist;virt-514:second' type=fail

Trigger fencing of the second node:

> [root@virt-513 ~]# pcs stonith fence virt-514
> Node: virt-514 fenced

Excerpt from the pacemaker-fenced log:

> Aug 24 19:58:11.801 virt-513 pacemaker-fenced [54053] (handle_request) notice: Client stonith_admin.70108 wants to fence (reboot) virt-514 using any device
> Aug 24 19:58:11.802 virt-513 pacemaker-fenced [54053] (initiate_remote_stonith_op) notice: Requesting peer fencing (reboot) targeting virt-514 | id=63dd8345 state=querying base_timeout=120
> Aug 24 19:58:11.807 virt-513 pacemaker-fenced [54053] (can_fence_host_with_device) notice: fence-virt-514 is eligible to fence (reboot) virt-514 (aka. 'virt-514.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:58:11.807 virt-513 pacemaker-fenced [54053] (can_fence_host_with_device) notice: fence-virt-513 is not eligible to fence (reboot) virt-514: static-list
> Aug 24 19:58:11.892 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70109] error output [ 2021-08-24 19:58:11,881 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:11.892 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70109] error output [ ]
> Aug 24 19:58:11.892 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70109] error output [ 2021-08-24 19:58:11,883 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:11.892 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70109] error output [ ]
> Aug 24 19:58:11.892 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70109] stderr: [ 2021-08-24 19:58:11,881 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:11.893 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70109] stderr: [ ]
> Aug 24 19:58:11.893 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70109] stderr: [ 2021-08-24 19:58:11,883 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:11.893 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70109] stderr: [ ]
> Aug 24 19:58:11.893 virt-513 pacemaker-fenced [54053] (internal_stonith_action_execute) info: Attempt 2 to execute fence_bz1978013 (list). remaining timeout is 120
> Aug 24 19:58:12.969 virt-513 pacemaker-fenced [54053] (process_remote_stonith_query) info: Query result 1 of 2 from virt-514 for virt-514/reboot (1 device) 63dd8345-31f3-48d6-ae70-753b3aadab96
> Aug 24 19:58:12.981 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70112] error output [ 2021-08-24 19:58:12,966 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:12.981 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70112] error output [ ]
> Aug 24 19:58:12.981 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70112] error output [ 2021-08-24 19:58:12,967 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:12.981 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70112] error output [ ]
> Aug 24 19:58:12.981 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70112] stderr: [ 2021-08-24 19:58:12,966 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:12.981 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70112] stderr: [ ]
> Aug 24 19:58:12.982 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70112] stderr: [ 2021-08-24 19:58:12,967 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:12.982 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70112] stderr: [ ]
> Aug 24 19:58:12.982 virt-513 pacemaker-fenced [54053] (update_remaining_timeout) info: Attempted to execute agent fence_bz1978013 (list) the maximum number of times (2) allowed
> Aug 24 19:58:12.983 virt-513 pacemaker-fenced [54053] (process_remote_stonith_query) info: Query result 2 of 2 from virt-513 for virt-514/reboot (1 device) 63dd8345-31f3-48d6-ae70-753b3aadab96
> Aug 24 19:58:12.983 virt-513 pacemaker-fenced [54053] (process_remote_stonith_query) info: All query replies have arrived, continuing (2 expected/2 received)
> Aug 24 19:58:12.983 virt-513 pacemaker-fenced [54053] (call_remote_stonith) info: Total timeout set to 120 for peer's fencing targeting virt-514 for stonith_admin.70108|id=63dd8345
> Aug 24 19:58:12.983 virt-513 pacemaker-fenced [54053] (call_remote_stonith) notice: Requesting that virt-513 perform 'reboot' action targeting virt-514 | for client stonith_admin.70108 (144s, 0s)
> Aug 24 19:58:12.984 virt-513 pacemaker-fenced [54053] (can_fence_host_with_device) notice: fence-virt-514 is eligible to fence (reboot) virt-514 (aka. 'virt-514.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:58:12.984 virt-513 pacemaker-fenced [54053] (can_fence_host_with_device) notice: fence-virt-513 is not eligible to fence (reboot) virt-514: static-list
> Aug 24 19:58:13.062 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70113] error output [ 2021-08-24 19:58:13,055 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:13.062 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70113] error output [ ]
> Aug 24 19:58:13.062 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70113] error output [ 2021-08-24 19:58:13,055 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:13.062 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_1[70113] error output [ ]
> Aug 24 19:58:13.062 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70113] stderr: [ 2021-08-24 19:58:13,055 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:13.062 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70113] stderr: [ ]
> Aug 24 19:58:13.062 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70113] stderr: [ 2021-08-24 19:58:13,055 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:13.063 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70113] stderr: [ ]
> Aug 24 19:58:13.063 virt-513 pacemaker-fenced [54053] (internal_stonith_action_execute) info: Attempt 2 to execute fence_bz1978013 (list). remaining timeout is 119
> Aug 24 19:58:14.144 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70114] error output [ 2021-08-24 19:58:14,134 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:14.144 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70114] error output [ ]
> Aug 24 19:58:14.144 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70114] error output [ 2021-08-24 19:58:14,135 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:14.144 virt-513 pacemaker-fenced [54053] (log_op_output) notice: fence_bz1978013_list_2[70114] error output [ ]
> Aug 24 19:58:14.145 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70114] stderr: [ 2021-08-24 19:58:14,134 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:58:14.145 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70114] stderr: [ ]
> Aug 24 19:58:14.145 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70114] stderr: [ 2021-08-24 19:58:14,135 ERROR: Please use '-h' for usage ]
> Aug 24 19:58:14.145 virt-513 pacemaker-fenced [54053] (log_action) warning: fence_bz1978013[70114] stderr: [ ]
> Aug 24 19:58:14.145 virt-513 pacemaker-fenced [54053] (update_remaining_timeout) info: Attempted to execute agent fence_bz1978013 (list) the maximum number of times (2) allowed
> Aug 24 19:58:14.145 virt-513 pacemaker-fenced [54053] (stonith_fence_get_devices_cb) info: Found 1 matching device for target 'virt-514'
> Aug 24 19:58:16.581 virt-513 pacemaker-fenced [54053] (log_operation) notice: Operation 'reboot' [70115] (call 2 from stonith_admin.70108) targeting virt-514 using fence-virt-514 returned 0 (OK)

Result: The dummy fence device is ignored because its list action fails; the fallback real fence device is selected and used without an unnecessary 2-minute delay.
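When checking a similar pacemaker-fenced excerpt, the eligibility and operation lines can be extracted mechanically rather than read by eye. The sketch below assumes the log line layout matches the excerpt above. The regular expressions and the `summarize` helper are illustrative, not part of Pacemaker or pcs, and other Pacemaker versions may word these messages differently.

```python
import re
from typing import Dict, List, Set, Tuple, Union

# Patterns modeled on the log lines quoted in this bug (an assumption about
# the message format, not a stable interface).
ELIGIBILITY_RE = re.compile(
    r"\(can_fence_host_with_device\)\s+notice:\s+"
    r"(?P<device>\S+) is (?P<neg>not )?eligible to fence "
    r"\((?P<action>\w+)\) (?P<target>[^\s:]+)"
)
OPERATION_RE = re.compile(
    r"\(log_operation\)\s+notice:\s+Operation '(?P<action>\w+)' \[\d+\] "
    r"\(call \d+ from [^)]+\) targeting (?P<target>\S+) "
    r"using (?P<device>\S+) returned (?P<rc>-?\d+)"
)

# Three lines copied from the excerpt above.
SAMPLE = """\
Aug 24 19:58:11.807 virt-513 pacemaker-fenced [54053] (can_fence_host_with_device) notice: fence-virt-514 is eligible to fence (reboot) virt-514 (aka. 'virt-514.cluster-qe.lab.eng.brq.redhat.com'): static-list
Aug 24 19:58:11.807 virt-513 pacemaker-fenced [54053] (can_fence_host_with_device) notice: fence-virt-513 is not eligible to fence (reboot) virt-514: static-list
Aug 24 19:58:16.581 virt-513 pacemaker-fenced [54053] (log_operation) notice: Operation 'reboot' [70115] (call 2 from stonith_admin.70108) targeting virt-514 using fence-virt-514 returned 0 (OK)
"""

Summary = Dict[str, Union[Set[str], List[Tuple[str, int]]]]


def summarize(log_text: str, target: str) -> Summary:
    """Collect eligible/ineligible devices and executed operations for target."""
    eligible: Set[str] = set()
    ineligible: Set[str] = set()
    used: List[Tuple[str, int]] = []
    for line in log_text.splitlines():
        match = ELIGIBILITY_RE.search(line)
        if match and match.group("target") == target:
            bucket = ineligible if match.group("neg") else eligible
            bucket.add(match.group("device"))
            continue
        match = OPERATION_RE.search(line)
        if match and match.group("target") == target:
            used.append((match.group("device"), int(match.group("rc"))))
    return {"eligible": eligible, "ineligible": ineligible, "used": used}


if __name__ == "__main__":
    print(summarize(SAMPLE, "virt-514"))
```

For the sample lines, the summary reports fence-virt-514 as the only eligible device and as the device that performed the reboot with return code 0, which matches the result stated above.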