As an FYI, this turned out to be due to the stonith device in pacemaker lacking the following settings:
ssl=1
ipport=443
pcmk_host_list="clusternode1 clusternode2 clusternode3"
pcmk_host_map="clusternode1:node1_vm_name clusternode2:node2_vm_name clusternode3:node3_vm_name"
action=reboot
We needed to explicitly set the port, enable SSL, and define the mapping from cluster node name to the VM name as configured in RHEV-M.
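
For illustration only, a full resource definition along those lines might look roughly like the pcs command below. This is a sketch that reuses the placeholder values from the reproduction steps ({user}@{domain}, {password}, {hypervisor ip}) and the node/VM names listed above; substitute the real RHEV-M credentials and names for your environment.

pcs stonith create st-rhevm fence_rhevm \
    login="{user}@{domain}" passwd="{password}" ipaddr="{hypervisor ip}" \
    ssl=1 ipport=443 \
    pcmk_host_list="clusternode1 clusternode2 clusternode3" \
    pcmk_host_map="clusternode1:node1_vm_name clusternode2:node2_vm_name clusternode3:node3_vm_name" \
    action=reboot

Each pcmk_host_map entry takes the form <cluster node name>:<VM name in RHEV-M>, which is what lets pacemaker translate the node it wants to fence into the VM that fence_rhevm should reboot.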
Note - I have raised a query on the pacemaker list in parallel to this.

Description of problem:
fence_rhevm is not fencing when invoked via pacemaker. fence_rhevm is successful from the command line but fails when invoked within pacemaker.

Version-Release number of selected component (if applicable):
fence_rhevm - fence-agents-3.1.5-25.el6_4.2.x86_64
pacemaker - pacemaker-1.1.8-7.el6.x86_64

How reproducible:
Create a basic 2-node cluster with cman/pacemaker/corosync, add a fence_rhevm stonith resource, then kill corosync on the node to be fenced.

Steps to Reproduce:
1. Create a basic cluster using the pacemaker redirect in CMAN, for example with a cluster.conf as follows:

<?xml version="1.0"?>
<cluster config_version="1" name="testcluster">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="defiant" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="defiant"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="enterprise" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="enterprise"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <cman two_node="1" expected_votes="1">
  </cman>
</cluster>

2. Add a fence_rhevm stonith resource:

pcs stonith create st-rhevm fence_rhevm login="{user}@{domain}" passwd="{password}" ssl=1 ipaddr="{hypervisor ip}" verbose=1 debug="/var/log/fence_debug.log"

3. Confirm that the stonith resource has started:

pcs status

4. Kill corosync on the node to be fenced:

ps -ef | grep coro
kill -9 <pid>

Actual results:
/var/log/messages shows a request to fence the node where corosync was killed, but the fence fails. The debug log output from fence_rhevm does not show a reboot request being logged, only the "vms" command to list the available VMs.

Expected results:
Fencing occurs via pacemaker. I am able to fence successfully from the command line with fence_rhevm and with stonith_admin:

stonith_admin --reboot enterprise --tolerance 5s
fence_rhevm -o reboot -a <hypervisor ip> -l <user>@<domain> -p <password> -n enterprise -z

Additional info:

/var/log/messages
-----------------------------
May 21 22:21:32 defiant corosync[1245]: [TOTEM ] A processor failed, forming new configuration.
May 21 22:21:34 defiant corosync[1245]: [QUORUM] Members[1]: 1
May 21 22:21:34 defiant corosync[1245]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 21 22:21:34 defiant kernel: dlm: closing connection to node 2
May 21 22:21:34 defiant corosync[1245]: [CPG ] chosen downlist: sender r(0) ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 defiant corosync[1245]: [MAIN ] Completed service synchronization, ready to provide service.
May 21 22:21:34 defiant crmd[1749]: notice: crm_update_peer_state: cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 defiant crmd[1749]: warning: match_down_event: No match for shutdown action on enterprise
May 21 22:21:34 defiant crmd[1749]: notice: peer_update_callback: Stonith/shutdown of enterprise not matched
May 21 22:21:34 defiant crmd[1749]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 defiant fenced[1302]: fencing node enterprise
May 21 22:21:34 defiant logger: fence_pcmk[2219]: Requesting Pacemaker fence enterprise (reset)
May 21 22:21:34 defiant stonith_admin[2220]: notice: crm_log_args: Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:35 defiant attrd[1747]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
May 21 22:21:35 defiant attrd[1747]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:36 defiant pengine[1748]: notice: unpack_config: On loss of CCM Quorum: Ignore
May 21 22:21:36 defiant pengine[1748]: notice: process_pe_message: Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 defiant crmd[1749]: notice: run_graph: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 defiant crmd[1749]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May 21 22:21:44 defiant logger: fence_pcmk[2219]: Call to fence enterprise (reset) failed with rc=255
May 21 22:21:45 defiant fenced[1302]: fence enterprise dev 0.0 agent fence_pcmk result: error from agent
May 21 22:21:45 defiant fenced[1302]: fence enterprise failed
May 21 22:21:48 defiant fenced[1302]: fencing node enterprise
May 21 22:21:48 defiant logger: fence_pcmk[2239]: Requesting Pacemaker fence enterprise (reset)
May 21 22:21:48 defiant stonith_admin[2240]: notice: crm_log_args: Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:58 defiant logger: fence_pcmk[2239]: Call to fence enterprise (reset) failed with rc=255
May 21 22:21:58 defiant fenced[1302]: fence enterprise dev 0.0 agent fence_pcmk result: error from agent
May 21 22:21:58 defiant fenced[1302]: fence enterprise failed
May 21 22:22:01 defiant fenced[1302]: fencing node enterprise

/var/log/cluster/corosync.log
-----------------------------
May 21 22:21:32 corosync [TOTEM ] A processor failed, forming new configuration.
May 21 22:21:34 corosync [QUORUM] Members[1]: 1
May 21 22:21:34 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 21 22:21:34 [1749] defiant crmd: info: cman_event_callback: Membership 296: quorum retained
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership: Left[5.0] cib.2
May 21 22:21:34 [1744] defiant cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership: Member[5.0] cib.1
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership: Left[5.0] stonith-ng.2
May 21 22:21:34 [1745] defiant stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 corosync [CPG ] chosen downlist: sender r(0) ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 corosync [MAIN ] Completed service synchronization, ready to provide service.
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership: Member[5.0] stonith-ng.1
May 21 22:21:34 [1749] defiant crmd: notice: crm_update_peer_state: cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback: enterprise is now lost (was member)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/150, version=0.22.3): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership: Left[5.0] crmd.2
May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback: Client enterprise/peer now has status [offline] (DC=true)
May 21 22:21:34 [1749] defiant crmd: warning: match_down_event: No match for shutdown action on enterprise
May 21 22:21:34 [1749] defiant crmd: notice: peer_update_callback: Stonith/shutdown of enterprise not matched
May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_expected: peer_update_callback: Node enterprise[2] - expected state is now down
May 21 22:21:34 [1749] defiant crmd: info: abort_transition_graph: peer_update_callback:211 - Triggered transition abort (complete=1) : Node failure
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership: Member[5.0] crmd.1
May 21 22:21:34 [1749] defiant crmd: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info: abort_transition_graph: do_te_invoke:163 - Triggered transition abort (complete=1) : Peer Halt
May 21 22:21:34 [1749] defiant crmd: info: join_make_offer: Making join offers based on membership 296
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_offer_all: join-7: Waiting on 1 outstanding join acks
May 21 22:21:34 [1749] defiant crmd: info: update_dc: Set DC to defiant (3.0.7)
May 21 22:21:34 [1749] defiant crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_finalize: join-7: Syncing the CIB from defiant to the rest of the cluster
May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/154, version=0.22.5): OK (rc=0)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/155, version=0.22.6): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: stonith_action_create: Initiating action metadata for agent fence_rhevm (target=(null))
May 21 22:21:35 [1749] defiant crmd: info: do_dc_join_ack: join-7: Updating node state to member for defiant
May 21 22:21:35 [1749] defiant crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='defiant']/lrm
May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='defiant']/lrm (origin=local/crmd/156, version=0.22.7): OK (rc=0)
May 21 22:21:35 [1749] defiant crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:35 [1749] defiant crmd: info: abort_transition_graph: do_te_invoke:156 - Triggered transition abort (complete=1) : Peer Cancelled
May 21 22:21:35 [1747] defiant attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
May 21 22:21:35 [1747] defiant attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/158, version=0.22.9): OK (rc=0)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/160, version=0.22.11): OK (rc=0)
May 21 22:21:36 [1748] defiant pengine: info: unpack_config: Startup probes: enabled
May 21 22:21:36 [1748] defiant pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
May 21 22:21:36 [1748] defiant pengine: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
May 21 22:21:36 [1748] defiant pengine: info: unpack_domains: Unpacking domains
May 21 22:21:36 [1748] defiant pengine: info: determine_online_status_fencing: Node defiant is active
May 21 22:21:36 [1748] defiant pengine: info: determine_online_status: Node defiant is online
May 21 22:21:36 [1748] defiant pengine: info: native_print: st-rhevm (stonith:fence_rhevm): Started defiant
May 21 22:21:36 [1748] defiant pengine: info: LogActions: Leave st-rhevm (Started defiant)
May 21 22:21:36 [1748] defiant pengine: notice: process_pe_message: Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May 21 22:21:36 [1749] defiant crmd: info: do_te_invoke: Processing graph 64 (ref=pe_calc-dc-1369171296-118) derived from /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: notice: run_graph: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 [1749] defiant crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

Live CIB
--------------------------------
<cib epoch="18" num_updates="88" admin_epoch="0" validate-with="pacemaker-1.2" update-origin="defiant" update-client="cibadmin" cib-last-written="Tue May 21 07:55:31 2013" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="defiant">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="defiant" uname="defiant"/>
      <node id="enterprise" uname="enterprise"/>
    </nodes>
    <resources>
      <primitive class="stonith" id="st-rhevm" type="fence_rhevm">
        <instance_attributes id="st-rhevm-instance_attributes">
          <nvpair id="st-rhevm-instance_attributes-login" name="login" value="<user>@<domain>"/>
          <nvpair id="st-rhevm-instance_attributes-passwd" name="passwd" value="<password>"/>
          <nvpair id="st-rhevm-instance_attributes-debug" name="debug" value="/tmp/debug.log"/>
          <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl" value="1"/>
          <nvpair id="st-rhevm-instance_attributes-verbose" name="verbose" value="1"/>
          <nvpair id="st-rhevm-instance_attributes-ipaddr" name="ipaddr" value="<hypervisor ip>"/>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status>
    <node_state id="defiant" uname="defiant" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <transient_attributes id="defiant">
        <instance_attributes id="status-defiant">
          <nvpair id="status-defiant-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="defiant">
        <lrm_resources>
          <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
            <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" transition-magic="0:0;2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" call-id="14" rc-code="0" op-status="0" interval="0" last-run="1369119332" last-rc-change="0" exec-time="232" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="enterprise" uname="enterprise" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <lrm id="enterprise">
        <lrm_resources>
          <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
            <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" transition-magic="0:7;5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1369170800" last-rc-change="0" exec-time="4" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
      <transient_attributes id="enterprise">
        <instance_attributes id="status-enterprise">
          <nvpair id="status-enterprise-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
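
For comparison with the live CIB above (whose st-rhevm primitive only carries login, passwd, debug, ssl, verbose and ipaddr), the corrected resource would be expected to end up with roughly the following instance attributes once the settings from the FYI at the top are added. This is a sketch only: the nvpair ids follow the naming pattern already used in the CIB, and the values are the same placeholders and node/VM names quoted earlier in this bug.

<primitive class="stonith" id="st-rhevm" type="fence_rhevm">
  <instance_attributes id="st-rhevm-instance_attributes">
    <nvpair id="st-rhevm-instance_attributes-login" name="login" value="<user>@<domain>"/>
    <nvpair id="st-rhevm-instance_attributes-passwd" name="passwd" value="<password>"/>
    <nvpair id="st-rhevm-instance_attributes-ipaddr" name="ipaddr" value="<hypervisor ip>"/>
    <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl" value="1"/>
    <!-- additions relative to the live CIB above -->
    <nvpair id="st-rhevm-instance_attributes-ipport" name="ipport" value="443"/>
    <nvpair id="st-rhevm-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="clusternode1 clusternode2 clusternode3"/>
    <nvpair id="st-rhevm-instance_attributes-pcmk_host_map" name="pcmk_host_map" value="clusternode1:node1_vm_name clusternode2:node2_vm_name clusternode3:node3_vm_name"/>
    <nvpair id="st-rhevm-instance_attributes-action" name="action" value="reboot"/>
  </instance_attributes>
</primitive>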