Bug 1978010

Summary: Pacemaker can select wrong fence device when pcmk_host_map and dynamic-list are combined
Product: Red Hat Enterprise Linux 8 Reporter: Ken Gaillot <kgaillot>
Component: pacemaker Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: high    
Version: 8.4 CC: cluster-maint, mezhang, msmazova, phagara, sbradley
Target Milestone: rc Keywords: Triaged
Target Release: 8.5 Flags: pm-rhel: mirror+
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: pacemaker-2.1.0-4.el8 Doc Type: Bug Fix
Doc Text:
Cause: If a fence device configured with pcmk_host_check="dynamic-list" failed its list action, and also had a pcmk_host_map configured, Pacemaker would wrongly assume the device could fence all the nodes listed in the host map.
Consequence: Pacemaker might wrongly select the device to fence one of the nodes in the host map that it couldn't actually fence.
Fix: Pacemaker now does not assume a fence device that fails its list action can fence any hosts.
Result: The proper device will be chosen for a node that requires fencing.
Story Points: ---
Clone Of:
Clones: 1978013 Environment:
Last Closed: 2021-11-09 18:44:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1978013    

Description Ken Gaillot 2021-06-30 22:28:42 UTC
Description of problem: If a fencing device is configured with pcmk_host_check set to "dynamic-list" and also has a pcmk_host_map option, Pacemaker may wrongly select the device to fence a target in the host map when the device's list action fails.


Version-Release number of selected component (if applicable): all


How reproducible: See below


Steps to Reproduce:
1. Modify a fence agent so that its list, off, and reboot actions always fail (its status action should succeed).
2. Configure a cluster of at least 2 nodes.
3. Configure a standard fence device able to target one of the nodes (no topology). This simulates the scenario where this is the only device capable of fencing the node, so Pacemaker should select this device if the node needs fencing.
4. Remove any monitor operation for the standard fence device, and configure a location constraint preferring the target to run the device. This is a trick to make the device less preferred when more than one device is eligible (because there is no successful monitor, and it is available only from the target).
5. Configure a fence device using the modified fencing agent, pcmk_host_check="dynamic-list", and a pcmk_host_map with entries for all nodes (the alias names won't matter since the agent's list action will always fail). This simulates the scenario where pcmk_host_map includes at least one node the device can't fence (which is realistic since the intent of dynamic-list is that the device may sometimes be able to fence a node and sometimes not). The idea is that if the list action did succeed, it would output only the alias of the node that doesn't use the standard fence device.
6. Cause fencing to be required for the node with the standard fence device.
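
Step 1 can also be approximated with a small stand-alone test agent instead of a patched real one. The following is a minimal sketch, assuming the usual fence agent convention that options arrive on standard input as name=value lines. The names FAILING_ACTIONS, parse_options, run_action, and main are hypothetical, the default action of reboot is an assumption, and a usable agent would additionally have to print metadata for the metadata action.

```python
import sys

# Actions that always fail, as described in step 1 (illustrative choice).
FAILING_ACTIONS = {"list", "off", "reboot"}


def parse_options(lines):
    """Parse name=value option lines such as those passed to an agent on stdin."""
    options = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and anything that is not name=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        options[name.strip()] = value.strip()
    return options


def run_action(options):
    """Return the exit status for the requested action (0 means success)."""
    action = options.get("action", "reboot")  # assumed default
    if action in FAILING_ACTIONS:
        print(f"ERROR: action '{action}' always fails", file=sys.stderr)
        return 1
    # status, monitor, and any other action succeed in this sketch.
    return 0


def main():
    """Entry point; an installed agent would call sys.exit(main())."""
    return run_action(parse_options(sys.stdin))
```

With this shape, a status or monitor request exits 0 while list, off, and reboot exit non-zero, which is the combination the reproducer relies on.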

Actual results: When the modified agent's list action fails, Pacemaker wrongly assumes the device can fence every node in pcmk_host_map, and selects it for fencing, which fails.

Expected results: Pacemaker always chooses the standard fencing device for the node that can only be fenced by that device.
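
The difference between the actual and expected behavior can be illustrated with a simplified model. This is only a sketch of the decision described above, not Pacemaker's implementation (which is C code inside pacemaker-fenced). The function name eligible_targets and its fixed flag are hypothetical, and treating listed names that are absent from the host map as node names is an assumption.

```python
def eligible_targets(host_map, list_output, fixed=True):
    """Model which nodes a dynamic-list device is considered able to fence.

    host_map:    dict of node name -> alias (as in pcmk_host_map)
    list_output: aliases printed by the list action, or None if it failed
    fixed:       True models the corrected behavior, False the old one
    """
    if list_output is None:
        if fixed:
            # Corrected behavior: a failed list action proves nothing,
            # so the device is not assumed to fence any host.
            return set()
        # Old behavior: fall back to every node named in the host map.
        return set(host_map)

    # Successful list action: translate aliases back to node names.
    alias_to_node = {alias: node for node, alias in host_map.items()}
    return {alias_to_node.get(name, name) for name in list_output}


host_map = {"node1": "first", "node2": "second"}
old = eligible_targets(host_map, None, fixed=False)  # both nodes
new = eligible_targets(host_map, None, fixed=True)   # no nodes
```

Under the old behavior the failing device competes for every node in the map; under the corrected behavior it drops out and only devices that can actually fence the target remain.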

Comment 1 Ken Gaillot 2021-06-30 22:34:56 UTC
This was fixed in the upstream master branch by commit a29f88f

Comment 5 Patrik Hagara 2021-08-24 17:51:44 UTC
* 2-node cluster
* dummy fence agent installed on both nodes as /usr/sbin/fence_bz1978010: https://github.com/ClusterLabs/fence-agents/blob/master/agents/dummy/fence_dummy.py
* per-node real fence device configured


before fix
==========

> [root@virt-242 ~]# rpm -q pacemaker
> pacemaker-2.0.5-9.el8.x86_64


> [root@virt-242 ~]# pcs status
> Cluster name: STSRHTS6491
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-243 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
>   * Last updated: Tue Aug 24 19:05:28 2021
>   * Last change:  Tue Aug 24 18:51:24 2021 by root via cibadmin on virt-242
>   * 2 nodes configured
>   * 2 resource instances configured
> 
> Node List:
>   * Online: [ virt-242 virt-243 ]
> 
> Full List of Resources:
>   * fence-virt-242	(stonith:fence_xvm):	 Started virt-242
>   * fence-virt-243	(stonith:fence_xvm):	 Started virt-243
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled


Remove the monitor operation from the second node's real fence device:

> [root@virt-242 ~]# pcs cluster cib scope=resources cib.xml
> [root@virt-242 ~]# cp cib.xml cib-updated.xml
> [root@virt-242 ~]# vim cib-updated.xml 
> [root@virt-242 ~]# diff cib.xml cib-updated.xml 
> 19,21d18
> <     <operations>
> <       <op id="fence-virt-243-monitor-interval-60s" interval="60s" name="monitor"/>
> <     </operations>
> [root@virt-242 ~]# pcs cluster cib-push scope=resources cib-updated.xml 
> CIB updated


Make the second node's real fence device prefer the second node:

> [root@virt-242 ~]# pcs constraint location fence-virt-243 prefers virt-243
> [root@virt-242 ~]# pcs constraint list --full
> Location Constraints:
>   Resource: fence-virt-243
>     Enabled on:
>       Node: virt-243 (score:INFINITY) (id:location-fence-virt-243-virt-243-INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:


Create a dynamic fence device that always fails using the dummy fence agent:

> [root@virt-242 ~]# pcs stonith create bz1978010 fence_bz1978010 pcmk_host_check="dynamic-list" pcmk_host_map='virt-242:frist;virt-243:second' type=fail
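
For readers unfamiliar with the pcmk_host_map syntax used in the command above, the following sketch shows how such a value breaks down into node-to-alias pairs. It handles only the semicolon-separated node:alias form seen here; parse_host_map is a hypothetical helper, not a Pacemaker or pcs function.

```python
def parse_host_map(value):
    """Split a 'node:alias;node:alias' string into a dict of node -> alias."""
    mapping = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        node, sep, alias = entry.partition(":")
        if not sep:
            raise ValueError(f"malformed host map entry: {entry!r}")
        mapping[node.strip()] = alias.strip()
    return mapping


host_map = parse_host_map("virt-242:frist;virt-243:second")
```

The alias is the name handed to the agent in place of the node name, which is why the reboot attempt in the log below reports the option port=second.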


Fence the second node:

> [root@virt-242 ~]# pcs stonith fence virt-243
[2 minute delay]
> Node: virt-243 fenced


Examine the logs:

> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (handle_request) 	notice: Client stonith_admin.56426.aeec0082 wants to fence (reboot) 'virt-243' with device '(any)'
> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (initiate_remote_stonith_op) 	notice: Requesting peer fencing (reboot) targeting virt-243 | id=87c28b4c-baf1-421b-99a3-cb66fd6bbf58 state=0
> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-243 is eligible to fence (reboot) virt-243 (aka. 'virt-243.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-242 is not eligible to fence (reboot) virt-243: static-list
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [ 2021-08-24 19:30:32,032 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [ 2021-08-24 19:30:32,033 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [ 2021-08-24 19:30:32,032 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [ 2021-08-24 19:30:32,033 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 119
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [ 2021-08-24 19:30:33,115 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [ 2021-08-24 19:30:33,115 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [ 2021-08-24 19:30:33,115 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [ 2021-08-24 19:30:33,115 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (process_remote_stonith_query) 	info: Query result 1 of 2 from virt-242 for virt-243/reboot (2 devices) 87c28b4c-baf1-421b-99a3-cb66fd6bbf58
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (call_remote_stonith) 	info: Total timeout set to 240 for peer's fencing targeting virt-243 for stonith_admin.56426|id=87c28b4c-baf1-421b-99a3-cb66fd6bbf58
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (call_remote_stonith) 	notice: Requesting that virt-242 perform 'reboot' action targeting virt-243 | for client stonith_admin.56426 (288s, 0s)
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-243 is eligible to fence (reboot) virt-243 (aka. 'virt-243.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-242 is not eligible to fence (reboot) virt-243: static-list
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (process_remote_stonith_query) 	info: Query result 2 of 2 from virt-243 for virt-243/reboot (2 devices) 87c28b4c-baf1-421b-99a3-cb66fd6bbf58
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [ 2021-08-24 19:30:33,207 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [ 2021-08-24 19:30:33,207 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [ 2021-08-24 19:30:33,207 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [ 2021-08-24 19:30:33,207 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 120
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [ 2021-08-24 19:30:34,288 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [ 2021-08-24 19:30:34,289 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [ 2021-08-24 19:30:34,288 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [ 2021-08-24 19:30:34,289 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (stonith_fence_get_devices_cb) 	info: Found 2 matching devices for 'virt-243'
[2 minute delay]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (child_timeout_callback) 	warning: fence_bz1978010_reboot_1 process (PID 56435) timed out
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (operation_finished) 	warning: fence_bz1978010_reboot_1[56435] timed out after 120000ms
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_reboot_1[56435] error output [ WARNING:root:Parse error: Ignoring unknown option 'port=second' ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_reboot_1[56435] error output [  ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56435] stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=second' ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56435] stderr: [  ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_operation) 	error: Operation 'reboot' [56435] (call 2 from stonith_admin.56426) for host 'virt-243' with device 'bz1978010' returned: -62 (Timer expired), retrying with fence-virt-243
> Aug 24 19:32:37 virt-242 pacemaker-fenced    [50076] (log_operation) 	notice: Operation 'reboot' [56495] (call 2 from stonith_admin.56426) for host 'virt-243' with device 'fence-virt-243' returned: 0 (OK)


Result: The cluster incorrectly tries to fence using the bz1978010 device even though its list operation is failing. That does not work, and the fencing operation times out after 2 minutes. The cluster then falls back to the less-preferred real fence device, which succeeds.
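
The delay can be checked against the log timestamps. A small sketch of the arithmetic, using the time-of-day values from the lines above (the helper seconds_between is hypothetical and assumes both timestamps fall on the same day):

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def seconds_between(start, end):
    """Return the number of seconds between two same-day HH:MM:SS strings."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


# "Found 2 matching devices" to the reboot timeout on bz1978010
wasted = seconds_between("19:30:34", "19:32:34")

# Initial fence request to the successful reboot via fence-virt-243
total = seconds_between("19:30:31", "19:32:37")
```

Here wasted evaluates to 120 seconds, matching the "timed out after 120000ms" message, and total to 126 seconds, so nearly all of the fencing time was spent waiting on the wrong device.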



after fix
=========

> [root@virt-128 ~]# rpm -q pacemaker
> pacemaker-2.1.0-6.el8.x86_64


Same setup as before.

Stonith config dump:

> [root@virt-128 ~]# pcs stonith config
>  Resource: fence-virt-128 (class=stonith type=fence_xvm)
>   Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=virt-128 pcmk_host_map=virt-128:virt-128.cluster-qe.lab.eng.brq.redhat.com
>   Operations: monitor interval=60s (fence-virt-128-monitor-interval-60s)
>  Resource: fence-virt-129 (class=stonith type=fence_xvm)
>   Attributes: pcmk_host_check=static-list pcmk_host_list=virt-129 pcmk_host_map=virt-129:virt-129.cluster-qe.lab.eng.brq.redhat.com
>  Resource: bz1978010 (class=stonith type=fence_bz1978010)
>   Attributes: pcmk_host_check=dynamic-list pcmk_host_map=virt-128:frist;virt-129:second type=fail
>   Operations: monitor interval=60s (bz1978010-monitor-interval-60s)


Constraints dump:

> [root@virt-128 ~]# pcs constraint config --full
> Location Constraints:
>   Resource: fence-virt-129
>     Enabled on:
>       Node: virt-129 (score:INFINITY) (id:location-fence-virt-129-virt-129-INFINITY)


Trigger fencing of the second node:

> [root@virt-128 ~]# pcs stonith fence virt-129
> Node: virt-129 fenced


Excerpt from the pacemaker-fenced log:

> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (handle_request) 	notice: Client stonith_admin.69536 wants to fence (reboot) virt-129 using any device
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (initiate_remote_stonith_op) 	notice: Requesting peer fencing (reboot) targeting virt-129 | id=0cbcc4bb state=querying base_timeout=120
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-128 is not eligible to fence (reboot) virt-129: static-list
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-129 is eligible to fence (reboot) virt-129 (aka. 'virt-129.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (process_remote_stonith_query) 	info: Query result 1 of 2 from virt-129 for virt-129/reboot (1 device) 0cbcc4bb-dee8-44cf-8b46-1f56b490cd48
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [ 2021-08-24 19:43:31,331 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [ 2021-08-24 19:43:31,331 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [ 2021-08-24 19:43:31,331 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [ 2021-08-24 19:43:31,331 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 120
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [ 2021-08-24 19:43:32,388 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [ 2021-08-24 19:43:32,388 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [ 2021-08-24 19:43:32,388 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [ 2021-08-24 19:43:32,388 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (process_remote_stonith_query) 	info: Query result 2 of 2 from virt-128 for virt-129/reboot (1 device) 0cbcc4bb-dee8-44cf-8b46-1f56b490cd48
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (process_remote_stonith_query) 	info: All query replies have arrived, continuing (2 expected/2 received) 
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (call_remote_stonith) 	info: Total timeout set to 120 for peer's fencing targeting virt-129 for stonith_admin.69536|id=0cbcc4bb
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (call_remote_stonith) 	notice: Requesting that virt-128 perform 'reboot' action targeting virt-129 | for client stonith_admin.69536 (144s, 0s)
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-128 is not eligible to fence (reboot) virt-129: static-list
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-129 is eligible to fence (reboot) virt-129 (aka. 'virt-129.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [ 2021-08-24 19:43:32,442 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [ 2021-08-24 19:43:32,443 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [ 2021-08-24 19:43:32,442 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [ 2021-08-24 19:43:32,443 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 120
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [ 2021-08-24 19:43:33,493 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [ 2021-08-24 19:43:33,493 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [ 2021-08-24 19:43:33,493 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [ 2021-08-24 19:43:33,493 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (stonith_fence_get_devices_cb) 	info: Found 1 matching device for target 'virt-129'
> Aug 24 19:43:36 virt-128 pacemaker-fenced    [64059] (log_operation) 	notice: Operation 'reboot' [69546] (call 2 from stonith_admin.69536) targeting virt-129 using fence-virt-129 returned 0 (OK)


Result: The dummy fence device is ignored because its list action fails. The real fence device is selected and used without an unnecessary 2-minute delay.
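
A quick way to compare the two runs is to pull the matching-device count out of the stonith_fence_get_devices_cb messages. The wording of that message differs between the two package versions shown here, so this sketch uses one regular expression that accepts both forms; the pattern and the helper matching_devices are illustrative and based only on the two log lines in this comment.

```python
import re

# Accepts both "Found 2 matching devices for 'virt-243'" and
# "Found 1 matching device for target 'virt-129'".
MATCH_RE = re.compile(r"Found (\d+) matching devices? for (?:target )?'([^']+)'")


def matching_devices(line):
    """Return (target, device_count) from a log line, or None if it does not match."""
    match = MATCH_RE.search(line)
    if match is None:
        return None
    return match.group(2), int(match.group(1))


before = matching_devices(
    "(stonith_fence_get_devices_cb) info: Found 2 matching devices for 'virt-243'"
)
after = matching_devices(
    "(stonith_fence_get_devices_cb) info: Found 1 matching device for target 'virt-129'"
)
```

Before the fix the count is 2 (the failing dummy device plus the real one); after the fix it is 1, the real device only.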

Comment 7 errata-xmlrpc 2021-11-09 18:44:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:4267