Bug 1978010 - Pacemaker can select wrong fence device when pcmk_host_map and dynamic-list are combined
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.4
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 8.5
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1978013
 
Reported: 2021-06-30 22:28 UTC by Ken Gaillot
Modified: 2021-11-10 01:04 UTC (History)

Fixed In Version: pacemaker-2.1.0-4.el8
Doc Type: Bug Fix
Doc Text:
Cause: If a fence device configured with pcmk_host_check="dynamic-list" failed its list action, and also had a pcmk_host_map configured, Pacemaker would wrongly assume the device could fence all the nodes listed in the host map. Consequence: Pacemaker might wrongly select the device to fence one of the nodes in the host map that it couldn't actually fence. Fix: Pacemaker now does not assume a fence device that fails its list action can fence any hosts. Result: The proper device will be chosen for a node that requires fencing.
Clone Of:
Clones: 1978013
Environment:
Last Closed: 2021-11-09 18:44:54 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Cluster Labs                       5474            0        None      None    None     2021-06-30 22:28:42 UTC
Red Hat Knowledge Base (Solution)  6337161         0        None      None    None     2021-10-18 14:35:57 UTC
Red Hat Product Errata             RHEA-2021:4267  0        None      None    None     2021-11-09 18:45:16 UTC

Description Ken Gaillot 2021-06-30 22:28:42 UTC
Description of problem: If a fencing device is configured with both pcmk_host_check set to "dynamic-list" and a pcmk_host_map option, Pacemaker may wrongly select the device to fence a target in the host map when the device's list action fails.
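
The decision described above can be illustrated with a toy model. This is only a sketch of the reported behavior, not Pacemaker's code: the function name dynamic_list_eligible and its arguments are hypothetical, and alias translation through pcmk_host_map is assumed to have already happened, so both lists hold plain node names.

```shell
# Toy model only; not Pacemaker's implementation.
# Prints "eligible" or "not eligible" for a dynamic-list device.
#   $1 = behavior to model: "old" (as reported here) or "fixed"
#   $2 = exit status of the device's list action (0 = success)
#   $3 = space-separated node names the list action returned
#   $4 = space-separated node names present in pcmk_host_map
#   $5 = node that needs fencing
dynamic_list_eligible() {
    behavior=$1 list_rc=$2 listed=$3 mapped=$4 target=$5
    if [ "$list_rc" -eq 0 ]; then
        # The list action worked, so trust its output.
        candidates=$listed
    elif [ "$behavior" = "old" ]; then
        # Reported behavior: a failed list falls back to the host map.
        candidates=$mapped
    else
        # Fixed behavior: a failed list means no known targets.
        candidates=""
    fi
    for host in $candidates; do
        if [ "$host" = "$target" ]; then
            echo "eligible"
            return 0
        fi
    done
    echo "not eligible"
}
```

With a failed list action (status 1) and a host map naming both nodes, the "old" mode reports the device as eligible for either node, while the "fixed" mode reports it as not eligible.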


Version-Release number of selected component (if applicable): all


How reproducible: See below


Steps to Reproduce:
1. Modify a fence agent so that its list, off, and reboot actions always fail (its status action should succeed).
2. Configure a cluster of at least 2 nodes.
3. Configure a standard fence device able to target one of the nodes (no topology). This simulates the scenario where this is the only device capable of fencing the node, so Pacemaker should select this device if the node needs fencing.
4. Remove any monitor operation for the standard fence device, and configure a location constraint preferring the target to run the device. This is a trick to make the device less preferred when more than one device is eligible (because there is no successful monitor, and it is available only from the target).
5. Configure a fence device using the modified fencing agent, pcmk_host_check="dynamic-list", and a pcmk_host_map with entries for all nodes (the alias names won't matter since the agent's list action will always fail). This simulates the scenario where pcmk_host_map includes at least one node the device can't fence (which is realistic since the intent of dynamic-list is that the fence device may sometimes be able to fence a node and sometimes not). The idea is that if the list action did succeed, it would output only the alias of the node that doesn't use the standard fence device.
6. Cause fencing to be required for the node with the standard fence device.

Actual results: When the modified agent's list action fails, Pacemaker wrongly assumes the device can fence every node in pcmk_host_map, and selects it for fencing, which fails.

Expected results: Pacemaker always chooses the standard fencing device for the node that can only be fenced by that device.
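
For step 1, a stand-in for the modified agent might look like the sketch below. It relies on the fact that fence agents receive their options as name=value lines on standard input; everything else is an assumption for illustration. The function name failing_fence_agent is hypothetical, and a real agent would also have to print metadata XML, which this sketch omits.

```shell
# Hypothetical stand-in for a fence agent whose list, off, and reboot
# actions always fail while status (and monitor) succeed.
# Options arrive as name=value lines on standard input.
failing_fence_agent() {
    action=""
    # The "|| [ -n "$key" ]" part handles a final line without a newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            action) action="$value" ;;
        esac
    done
    case "$action" in
        status|monitor)
            return 0
            ;;
        list|off|reboot)
            echo "ERROR: action '$action' always fails" >&2
            return 1
            ;;
        *)
            echo "ERROR: unknown action '$action'" >&2
            return 1
            ;;
    esac
}
```

For example, `printf 'action=list\n' | failing_fence_agent` exits with a non-zero status, while the same call with `action=status` exits with 0.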

Comment 1 Ken Gaillot 2021-06-30 22:34:56 UTC
This was fixed in the upstream master branch by commit a29f88f

Comment 5 Patrik Hagara 2021-08-24 17:51:44 UTC
* 2-node cluster
* dummy fence agent installed on both nodes as /usr/sbin/fence_bz1978010: https://github.com/ClusterLabs/fence-agents/blob/master/agents/dummy/fence_dummy.py
* per-node real fence device configured


before fix
==========

> [root@virt-242 ~]# rpm -q pacemaker
> pacemaker-2.0.5-9.el8.x86_64


> [root@virt-242 ~]# pcs status
> Cluster name: STSRHTS6491
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-243 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
>   * Last updated: Tue Aug 24 19:05:28 2021
>   * Last change:  Tue Aug 24 18:51:24 2021 by root via cibadmin on virt-242
>   * 2 nodes configured
>   * 2 resource instances configured
> 
> Node List:
>   * Online: [ virt-242 virt-243 ]
> 
> Full List of Resources:
>   * fence-virt-242	(stonith:fence_xvm):	 Started virt-242
>   * fence-virt-243	(stonith:fence_xvm):	 Started virt-243
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled


Remove the monitor operation from second node's real fence device:

> [root@virt-242 ~]# pcs cluster cib scope=resources cib.xml
> [root@virt-242 ~]# cp cib.xml cib-updated.xml
> [root@virt-242 ~]# vim cib-updated.xml 
> [root@virt-242 ~]# diff cib.xml cib-updated.xml 
> 19,21d18
> <     <operations>
> <       <op id="fence-virt-243-monitor-interval-60s" interval="60s" name="monitor"/>
> <     </operations>
> [root@virt-242 ~]# pcs cluster cib-push scope=resources cib-updated.xml 
> CIB updated


Make the second node's real fence device prefer the second node:

> [root@virt-242 ~]# pcs constraint location fence-virt-243 prefers virt-243
> [root@virt-242 ~]# pcs constraint list --full
> Location Constraints:
>   Resource: fence-virt-243
>     Enabled on:
>       Node: virt-243 (score:INFINITY) (id:location-fence-virt-243-virt-243-INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:


Create a dynamic fence device that always fails using the dummy fence agent:

> [root@virt-242 ~]# pcs stonith create bz1978010 fence_bz1978010 pcmk_host_check="dynamic-list" pcmk_host_map='virt-242:frist;virt-243:second' type=fail


Fence the second node:

> [root@virt-242 ~]# pcs stonith fence virt-243
[2 minute delay]
> Node: virt-243 fenced


Examine the logs:

> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (handle_request) 	notice: Client stonith_admin.56426.aeec0082 wants to fence (reboot) 'virt-243' with device '(any)'
> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (initiate_remote_stonith_op) 	notice: Requesting peer fencing (reboot) targeting virt-243 | id=87c28b4c-baf1-421b-99a3-cb66fd6bbf58 state=0
> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-243 is eligible to fence (reboot) virt-243 (aka. 'virt-243.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:30:31 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-242 is not eligible to fence (reboot) virt-243: static-list
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [ 2021-08-24 19:30:32,032 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [ 2021-08-24 19:30:32,033 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56427] error output [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [ 2021-08-24 19:30:32,032 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [ 2021-08-24 19:30:32,033 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56427] stderr: [  ]
> Aug 24 19:30:32 virt-242 pacemaker-fenced    [50076] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 119
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [ 2021-08-24 19:30:33,115 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [ 2021-08-24 19:30:33,115 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56429] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [ 2021-08-24 19:30:33,115 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [ 2021-08-24 19:30:33,115 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56429] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (process_remote_stonith_query) 	info: Query result 1 of 2 from virt-242 for virt-243/reboot (2 devices) 87c28b4c-baf1-421b-99a3-cb66fd6bbf58
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (call_remote_stonith) 	info: Total timeout set to 240 for peer's fencing targeting virt-243 for stonith_admin.56426|id=87c28b4c-baf1-421b-99a3-cb66fd6bbf58
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (call_remote_stonith) 	notice: Requesting that virt-242 perform 'reboot' action targeting virt-243 | for client stonith_admin.56426 (288s, 0s)
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-243 is eligible to fence (reboot) virt-243 (aka. 'virt-243.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (can_fence_host_with_device) 	notice: fence-virt-242 is not eligible to fence (reboot) virt-243: static-list
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (process_remote_stonith_query) 	info: Query result 2 of 2 from virt-243 for virt-243/reboot (2 devices) 87c28b4c-baf1-421b-99a3-cb66fd6bbf58
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [ 2021-08-24 19:30:33,207 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [ 2021-08-24 19:30:33,207 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_1[56431] error output [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [ 2021-08-24 19:30:33,207 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [ 2021-08-24 19:30:33,207 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56431] stderr: [  ]
> Aug 24 19:30:33 virt-242 pacemaker-fenced    [50076] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 120
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [ 2021-08-24 19:30:34,288 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [ 2021-08-24 19:30:34,289 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_list_2[56433] error output [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [ 2021-08-24 19:30:34,288 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [ 2021-08-24 19:30:34,289 ERROR: Please use '-h' for usage ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56433] stderr: [  ]
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:30:34 virt-242 pacemaker-fenced    [50076] (stonith_fence_get_devices_cb) 	info: Found 2 matching devices for 'virt-243'
[2 minute delay]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (child_timeout_callback) 	warning: fence_bz1978010_reboot_1 process (PID 56435) timed out
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (operation_finished) 	warning: fence_bz1978010_reboot_1[56435] timed out after 120000ms
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_reboot_1[56435] error output [ WARNING:root:Parse error: Ignoring unknown option 'port=second' ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_op_output) 	notice: fence_bz1978010_reboot_1[56435] error output [  ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56435] stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=second' ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_action) 	warning: fence_bz1978010[56435] stderr: [  ]
> Aug 24 19:32:34 virt-242 pacemaker-fenced    [50076] (log_operation) 	error: Operation 'reboot' [56435] (call 2 from stonith_admin.56426) for host 'virt-243' with device 'bz1978010' returned: -62 (Timer expired), retrying with fence-virt-243
> Aug 24 19:32:37 virt-242 pacemaker-fenced    [50076] (log_operation) 	notice: Operation 'reboot' [56495] (call 2 from stonith_admin.56426) for host 'virt-243' with device 'fence-virt-243' returned: 0 (OK)


Result: The cluster incorrectly tries to fence using the bz1978010 device even though its list operation is failing. That attempt does not work, and the fencing operation times out after 2 minutes. The cluster then falls back to the less-preferred real fence device, which succeeds.



after fix
=========

> [root@virt-128 ~]# rpm -q pacemaker
> pacemaker-2.1.0-6.el8.x86_64


Same setup as before.

Stonith config dump:

> [root@virt-128 ~]# pcs stonith config
>  Resource: fence-virt-128 (class=stonith type=fence_xvm)
>   Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=virt-128 pcmk_host_map=virt-128:virt-128.cluster-qe.lab.eng.brq.redhat.com
>   Operations: monitor interval=60s (fence-virt-128-monitor-interval-60s)
>  Resource: fence-virt-129 (class=stonith type=fence_xvm)
>   Attributes: pcmk_host_check=static-list pcmk_host_list=virt-129 pcmk_host_map=virt-129:virt-129.cluster-qe.lab.eng.brq.redhat.com
>  Resource: bz1978010 (class=stonith type=fence_bz1978010)
>   Attributes: pcmk_host_check=dynamic-list pcmk_host_map=virt-128:frist;virt-129:second type=fail
>   Operations: monitor interval=60s (bz1978010-monitor-interval-60s)


Constraints dump:

> [root@virt-128 ~]# pcs constraint config --full
> Location Constraints:
>   Resource: fence-virt-129
>     Enabled on:
>       Node: virt-129 (score:INFINITY) (id:location-fence-virt-129-virt-129-INFINITY)


Trigger fencing of the second node:

> [root@virt-128 ~]# pcs stonith fence virt-129
> Node: virt-129 fenced


Excerpt from the pacemaker-fenced log:

> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (handle_request) 	notice: Client stonith_admin.69536 wants to fence (reboot) virt-129 using any device
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (initiate_remote_stonith_op) 	notice: Requesting peer fencing (reboot) targeting virt-129 | id=0cbcc4bb state=querying base_timeout=120
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-128 is not eligible to fence (reboot) virt-129: static-list
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-129 is eligible to fence (reboot) virt-129 (aka. 'virt-129.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (process_remote_stonith_query) 	info: Query result 1 of 2 from virt-129 for virt-129/reboot (1 device) 0cbcc4bb-dee8-44cf-8b46-1f56b490cd48
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [ 2021-08-24 19:43:31,331 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [ 2021-08-24 19:43:31,331 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69537] error output [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [ 2021-08-24 19:43:31,331 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [ 2021-08-24 19:43:31,331 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69537] stderr: [  ]
> Aug 24 19:43:31 virt-128 pacemaker-fenced    [64059] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 120
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [ 2021-08-24 19:43:32,388 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [ 2021-08-24 19:43:32,388 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69539] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [ 2021-08-24 19:43:32,388 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [ 2021-08-24 19:43:32,388 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69539] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (process_remote_stonith_query) 	info: Query result 2 of 2 from virt-128 for virt-129/reboot (1 device) 0cbcc4bb-dee8-44cf-8b46-1f56b490cd48
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (process_remote_stonith_query) 	info: All query replies have arrived, continuing (2 expected/2 received) 
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (call_remote_stonith) 	info: Total timeout set to 120 for peer's fencing targeting virt-129 for stonith_admin.69536|id=0cbcc4bb
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (call_remote_stonith) 	notice: Requesting that virt-128 perform 'reboot' action targeting virt-129 | for client stonith_admin.69536 (144s, 0s)
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-128 is not eligible to fence (reboot) virt-129: static-list
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (can_fence_host_with_device) 	notice: fence-virt-129 is eligible to fence (reboot) virt-129 (aka. 'virt-129.cluster-qe.lab.eng.brq.redhat.com'): static-list
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [ 2021-08-24 19:43:32,442 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [ 2021-08-24 19:43:32,443 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_1[69542] error output [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [ 2021-08-24 19:43:32,442 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [ 2021-08-24 19:43:32,443 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69542] stderr: [  ]
> Aug 24 19:43:32 virt-128 pacemaker-fenced    [64059] (internal_stonith_action_execute) 	info: Attempt 2 to execute fence_bz1978010 (list). remaining timeout is 120
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [ 2021-08-24 19:43:33,493 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [ 2021-08-24 19:43:33,493 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_op_output) 	notice: fence_bz1978010_list_2[69544] error output [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [ 2021-08-24 19:43:33,493 ERROR: Failed: Unrecognised action 'list' ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [ 2021-08-24 19:43:33,493 ERROR: Please use '-h' for usage ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (log_action) 	warning: fence_bz1978010[69544] stderr: [  ]
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (update_remaining_timeout) 	info: Attempted to execute agent fence_bz1978010 (list) the maximum number of times (2) allowed
> Aug 24 19:43:33 virt-128 pacemaker-fenced    [64059] (stonith_fence_get_devices_cb) 	info: Found 1 matching device for target 'virt-129'
> Aug 24 19:43:36 virt-128 pacemaker-fenced    [64059] (log_operation) 	notice: Operation 'reboot' [69546] (call 2 from stonith_admin.69536) targeting virt-129 using fence-virt-129 returned 0 (OK)


Result: The dummy fence device is ignored because its list action fails, and the real fence device is selected directly and used without an unnecessary 2-minute delay.
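
To compare the two runs without the repeated agent error output, the log can be reduced to the eligibility decisions and the matching-device count. This is a small sketch: the function name fencing_decisions is hypothetical, and the two patterns are the log function names that appear in the excerpts above.

```shell
# Keep only the per-device eligibility decisions and the
# "Found N matching device(s)" line from a pacemaker-fenced log
# read on standard input.
fencing_decisions() {
    grep -E 'can_fence_host_with_device|stonith_fence_get_devices_cb'
}
```

It could be run as `fencing_decisions < /var/log/pacemaker/pacemaker.log`, assuming the log is in that location on the node being examined. In the excerpts above, the count is "Found 2 matching devices" before the fix and "Found 1 matching device" after it.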

Comment 7 errata-xmlrpc 2021-11-09 18:44:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:4267

