Description of problem:
If the first fence method fails, fenced doesn't use the second method to fence a node. It probably can't get the necessary information from ccs.

Relevant entries from log:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] entering GATHER state from 11.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] Creating commit token because I am the rep.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] entering COMMIT state.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] entering RECOVERY state.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] position [0] member 192.168.100.51:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] previous ring seq 164 rep 192.168.100.51
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] aru 203 high delivered 203 received flag 0
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] position [1] member 192.168.100.53:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] previous ring seq 164 rep 192.168.100.51
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] aru 203 high delivered 203 received flag 0
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] position [2] member 192.168.100.54:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] previous ring seq 164 rep 192.168.100.51
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] aru 203 high delivered 203 received flag 0
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] Did not need to originate any messages in recovery.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] Storing new sequence id for ring ac
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] Sending initial ORF token
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] New Configuration:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] 	r(0) ip(192.168.100.51)
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] 	r(0) ip(192.168.100.53)
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] 	r(0) ip(192.168.100.54)
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] Members Left:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] 	r(0) ip(192.168.100.52)
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] Members Joined:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [SYNC ] This node is within the primary component and will provide service.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] New Configuration:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] 	r(0) ip(192.168.100.51)
Nov 27 16:47:52 192.168.100.51 fenced[1598]: node2.clean not a cluster member after 0 sec post_fail_delay
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] 	r(0) ip(192.168.100.53)
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] 	r(0) ip(192.168.100.54)
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] Members Left:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] Members Joined:
Nov 27 16:47:52 192.168.100.51 openais[1590]: [SYNC ] This node is within the primary component and will provide service.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [TOTEM] entering OPERATIONAL state.
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] got nodejoin message 192.168.100.51
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] got nodejoin message 192.168.100.53
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CLM  ] got nodejoin message 192.168.100.54
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CPG  ] got joinlist message from node 1
Nov 27 16:47:52 192.168.100.51 fenced[1598]: fencing node "node2.clean"
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CPG  ] got joinlist message from node 3
Nov 27 16:47:52 192.168.100.51 openais[1590]: [CPG  ] got joinlist message from node 4
Nov 27 16:51:32 192.168.100.51 fenced[1598]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:192.168.100.202...ipmilan: Failed to connect after 30 seconds Failed
Nov 27 16:51:32 192.168.100.51 ccsd[1584]: process_get: Invalid connection descriptor received.
Nov 27 16:51:32 192.168.100.51 ccsd[1584]: Error while processing get: Invalid request descriptor
Nov 27 16:51:32 192.168.100.51 fenced[1598]: fence "node2.clean" failed
Nov 27 16:51:40 192.168.100.51 fenced[1598]: fencing node "node2.clean"
Nov 27 16:51:40 192.168.100.51 ccsd[1584]: process_get: Invalid connection descriptor received.
Nov 27 16:51:40 192.168.100.51 ccsd[1584]: Error while processing get: Invalid request descriptor
Nov 27 16:51:40 192.168.100.51 fenced[1598]: fence "node2.clean" failed
Nov 27 16:51:46 192.168.100.51 fenced[1598]: fencing node "node2.clean"
Nov 27 16:51:46 192.168.100.51 ccsd[1584]: process_get: Invalid connection descriptor received.
Nov 27 16:51:46 192.168.100.51 ccsd[1584]: Error while processing get: Invalid request descriptor
Nov 27 16:51:46 192.168.100.51 fenced[1598]: fence "node2.clean" failed
Nov 27 16:51:53 192.168.100.51 fenced[1598]: fencing node "node2.clean"
Nov 27 16:51:53 192.168.100.51 ccsd[1584]: process_get: Invalid connection descriptor received.
Nov 27 16:51:53 192.168.100.51 ccsd[1584]: Error while processing get: Invalid request descriptor

and so on....
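While fenced is looping like this, the membership and fence-domain state can be checked from a surviving node. The commands below are just a sketch of the standard cman diagnostics on this release, not output taken from this report:

  # cluster membership as cman sees it; node2.clean should show up as not a member
  cman_tool nodes
  # openais/cman groups, including the fence domain that keeps retrying node2.clean
  group_tool ls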
cat /etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster alias="clean" config_version="15" name="clean">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="12"/>
    <cman>
        <multicast addr="224.0.0.1"/>
    </cman>
    <clusternodes>
        <clusternode name="node1.clean" nodeid="1" votes="1">
            <multicast addr="224.0.0.1" interface="eth0"/>
            <fence>
                <method name="1">
                    <device name="APC" port="1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node2.clean" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="ipmi_node2"/>
                </method>
                <method name="2">
                    <device name="APC" port="2"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node3.clean" nodeid="3" votes="1">
            <fence>
                <method name="1">
                    <device name="ipmi_node3"/>
                </method>
                <method name="2">
                    <device name="APC" port="3"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node4.clean" nodeid="4" votes="1">
            <fence>
                <method name="1">
                    <device name="ipmi_node4"/>
                </method>
                <method name="2">
                    <device name="APC" port="4"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_apc" ipaddr="192.168.100.250" login="apc" name="APC" passwd="apc"/>
        <fencedevice agent="fence_ipmilan" ipaddr="192.168.100.203" login="Admin" name="ipmi_node3" passwd="ipmi"/>
        <fencedevice agent="fence_ipmilan" ipaddr="192.168.100.204" login="ADMIN" name="ipmi_node4" passwd="ipmi"/>
        <fencedevice agent="fence_ipmilan" ipaddr="192.168.100.202" login="ADMIN" name="ipmi_node2" passwd="ipmi"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="clean_0" ordered="0" restricted="1">
                <failoverdomainnode name="node2.clean" priority="1"/>
                <failoverdomainnode name="node3.clean" priority="1"/>
                <failoverdomainnode name="node4.clean" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <lvm lv_name="shared_test" name="shared_clean_test" vg_name="shared_clean"/>
            <ip address="192.168.100.100" monitor_link="1"/>
            <script file="/etc/init.d/luci" name="luci"/>
        </resources>
        <service autostart="1" domain="clean_0" exclusive="0" name="luci_service" recovery="restart">
            <ip ref="192.168.100.100">
                <script ref="luci"/>
            </ip>
        </service>
    </rm>
</cluster>

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster and give one of the nodes two fence methods (fence_ipmilan and fence_apc); the ipmi method must come before the apc method.
2. Disconnect the ethernet cable from the node that has the two fence methods.
3.

Actual results:
The cluster freezes.

Expected results:
The node gets fenced using fence_apc.

Additional info:
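To rule out the APC device itself, the second method can be exercised by hand from node1 using the values from the cluster.conf above. This is only a sketch and assumes the installed fence_apc accepts the usual -a/-l/-p/-n/-o switches; check fence_apc -h if in doubt:

  # manually power-cycle outlet 2 (node2.clean) on the APC switch defined above
  fence_apc -a 192.168.100.250 -l apc -p apc -n 2 -o reboot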
This problem is caused by the long timeout of the IPMI fence agent. It's a duplicate of an older bug, so I'm closing this one.

*** This bug has been marked as a duplicate of bug 276541 ***
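For anyone hitting the same freeze before picking up the fix for bug 276541: the failing step can be reproduced by hand, and a shorter timeout makes the unreachable IPMI attempt fail fast instead of blocking fenced for minutes. This is only a sketch; it assumes the installed fence_ipmilan accepts a -t <seconds> timeout switch, which may not be true on every build, so verify with fence_ipmilan -h first:

  # retry the same IPMI reboot that fenced attempted, but give up after 10 seconds
  fence_ipmilan -a 192.168.100.202 -l ADMIN -p ipmi -o reboot -t 10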