Description of problem:
When the first fence action takes more than 30 seconds and then fails, every subsequent fence attempt also fails and the fence agent is not executed again.

E.g.:
1193232648 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232648 node "srvxen1.netcatalyst.it" has not been fenced
1193232648 fencing node srvxen1.netcatalyst.it
agent "fence_ilo" reports: failed to turn off
1193232700 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232700 node "srvxen1.netcatalyst.it" has not been fenced
1193232700 fencing node srvxen1.netcatalyst.it
1193232705 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232705 node "srvxen1.netcatalyst.it" has not been fenced
1193232705 fencing node srvxen1.netcatalyst.it
1193232710 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232710 node "srvxen1.netcatalyst.it" has not been fenced
1193232710 fencing node srvxen1.netcatalyst.it
1193232715 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232715 node "srvxen1.netcatalyst.it" has not been fenced

I instrumented fenced and found that the problem is caused by ccs_get returning a timeout error in count_methods. This problem was fixed for RHEL4 in BZ#219633, but the patch has not been ported to RHEL5. I have ported the patch to RHEL5 and verified that it resolves the problem.

Version-Release number of selected component (if applicable):
cman-2.0.73

How reproducible:
Every time

Steps to Reproduce:
1. Configure a fence agent that takes more than 30 seconds to complete and returns an error.

Actual results:
The fence agent is executed only the first time.

Expected results:
fenced should keep executing the fence agent until fencing succeeds or the node rejoins the cluster.

Additional info:
A patch that fixes the problem on RHEL5 (cman-2.0.70-1.el5) is attached. See Bug #219633 for further information.
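For the reproduction step, a stand-in agent along the following lines may be enough; it is only a sketch and not part of cman. It relies on fenced passing the agent's arguments as name=value lines on stdin. The agent name fence_slowfail, the 35 second delay and the helper names parse_args and run_agent are assumptions made for illustration; any delay above 30 seconds should do.

```python
#!/usr/bin/env python
"""Stand-in fence agent for reproducing the bug: slow, then fails.

Hypothetical helper, not part of cman. Install it under a name such as
fence_slowfail and reference that name as the agent in cluster.conf.
"""
import os
import sys
import time

# Assumed value: anything above the 30 seconds mentioned in the report.
DELAY_SECONDS = 35


def parse_args(stream):
    """Parse the name=value lines that fenced writes to the agent's stdin."""
    args = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            args[name.strip()] = value.strip()
    return args


def run_agent(stream, delay_seconds=DELAY_SECONDS, sleep=time.sleep):
    """Consume the arguments, wait, then signal failure via the exit code."""
    parse_args(stream)      # drain stdin so the caller's write completes
    sleep(delay_seconds)    # simulate a slow management interface
    sys.stderr.write("failed to turn off\n")
    return 1                # non-zero exit status: the fence attempt failed


# Act as an agent only when installed under a fence_* name, so the
# functions above can still be imported or exercised without sleeping.
if os.path.basename(sys.argv[0]).startswith("fence_"):
    sys.exit(run_agent(sys.stdin))
```

With this agent configured for the victim node, the expectation from the report is that the agent is invoked on the first attempt only, and later attempts log "fencing node" without running it.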
Created attachment 239311
patch that fixes the problem
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0347.html