Description of problem:
While simulating a customer setup, I found that fenced crashes when a fence agent fails to fence a node. Configuration used:

<cluster name="fabbione" config_version="3">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="rhel5-node1" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="xvm" delay="30" domain="rhel5-node1"/>
        </method>
        <method name="2"/>
      </fence>
    </clusternode>
    <clusternode name="rhel5-node2" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="xvm" domain="rhel5-node2"/>
        </method>
        <method name="2"/>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="xvm" agent="fence_xvm"/>
  </fencedevices>
</cluster>

Version-Release number of selected component (if applicable):
cman-2.0.115-96.el5_8.1

How reproducible:
Always

Steps to Reproduce:
1. Start the 2-node cluster and let the nodes join (in this case the nodes are VMs).
2. Stop fence_xvmd on the host (or replace the agent with /bin/false).
3. Run killall -9 aisexec on one of the nodes.

Actual results:
Apr 3 10:53:01 rhel5-node1 openais[2694]: [CPG ] got joinlist message from node 1
Apr 3 10:53:31 rhel5-node1 fenced[2713]: agent "fence_xvm" reports: Could not read /etc/cluster/fence_xvm.key; trying without authentication Timed out waiting for response
Apr 3 10:53:31 rhel5-node1 kernel: fenced[2713]: segfault at 0000000000000018 rip 00002aeb6ecdc53b rsp 00007fffe7e4be40 error 4
Apr 3 10:53:31 rhel5-node1 groupd[2706]: fence daemon appears to be dead
Apr 3 10:53:32 rhel5-node1 openais[2694]: [SERV ] Unloading all openais components
[SNIP]
Extra info: the crash is caused by the empty <method name="2"/> element, i.e. a fence method that contains no devices.
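To illustrate how an empty method can crash a daemon that walks the fence configuration, here is a minimal C sketch. It is not the fenced source or the actual fix; the structures fence_method and fence_device, the function fence_node() and the agent_ok/agent_fail stubs are all hypothetical. It assumes an empty <method/> is represented by a NULL device list, and that such a method should simply be skipped because it fences nothing. The segfault address 0x18 in the log is consistent with reading a field at a small offset from a NULL pointer, but that is an inference, not something confirmed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a node's fence configuration:
 * a list of methods, each holding a (possibly empty) list of devices. */
struct fence_device {
    const char *name;
    int (*run)(const char *victim);   /* returns 0 on success */
    struct fence_device *next;
};

struct fence_method {
    const char *name;
    struct fence_device *devices;     /* NULL for an empty <method/> */
    struct fence_method *next;
};

/* Hypothetical stand-ins for fence agents; agent_fail behaves like an
 * agent that exits 1 (or /bin/false). */
static int agent_ok(const char *victim)   { (void)victim; return 0; }
static int agent_fail(const char *victim) { (void)victim; return 1; }

/* Try each method in order. A method succeeds only if every one of its
 * devices succeeds. Returns 0 if some method fenced the victim, -1 otherwise. */
static int fence_node(const struct fence_method *methods, const char *victim)
{
    for (const struct fence_method *m = methods; m != NULL; m = m->next) {
        /* Guard: an empty method has no devices to run. Code that instead
         * assumed m->devices != NULL and read e.g. m->devices->name would
         * dereference NULL here. */
        if (m->devices == NULL)
            continue;

        int ok = 1;
        for (const struct fence_device *d = m->devices; d != NULL; d = d->next) {
            if (d->run(victim) != 0) {
                ok = 0;               /* one device failed: this method failed */
                break;
            }
        }
        if (ok)
            return 0;
    }
    return -1;                        /* no method succeeded */
}
```

With a first method whose only device fails and a second, empty method (the shape of the configuration above), fence_node() returns -1 instead of crashing, leaving the caller to decide whether to retry.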
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Pushed to the RHEL59 branch:
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=77bb92ad9dbad9c8755e45211214ab1e099cdb0e

Tested this by killing node-03 with this config:

<clusternode name="node-03" nodeid="3">
  <fence>
    <method name="1">
      <device name="f"/>
    </method>
    <method name="2"/>
  </fence>
</clusternode>

<fencedevices>
  <fencedevice name="t" agent="/root/fence_test0"/>
  <fencedevice name="f" agent="/root/fence_test1"/>
</fencedevices>

fence_test1 does exit(1). Without the fix, fenced segfaults; with the fix, it doesn't.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0076.html