Bug 354421 - fenced fails to execute the fence agent again if the first attempt fails and it takes more than 30 seconds to complete
Summary: fenced fails to execute the fence agent again if the first attempt fails and ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ryan McCabe
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 219633
Blocks:
Reported: 2007-10-26 16:42 UTC by Marco Ceci
Modified: 2009-04-16 22:44 UTC
CC List: 6 users

Fixed In Version: RHBA-2008-0347
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:58:08 UTC
Target Upstream Version:
Embargoed:


Attachments
patch that fix the problem (3.89 KB, patch)
2007-10-26 16:42 UTC, Marco Ceci


Links
Red Hat Product Errata RHBA-2008:0347 (priority: normal; status: SHIPPED_LIVE): cman bug fix and enhancement update (last updated 2008-05-20 12:39:41 UTC)

Description Marco Ceci 2007-10-26 16:42:27 UTC
Description of problem:
When the first fence action takes more than 30 seconds and then fails, every
subsequent fence attempt also fails and the fence agent is never executed again.

Example:

1193232648 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232648 node "srvxen1.netcatalyst.it" has not been fenced
1193232648 fencing node srvxen1.netcatalyst.it
agent "fence_ilo" reports: failed to turn off

1193232700 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232700 node "srvxen1.netcatalyst.it" has not been fenced
1193232700 fencing node srvxen1.netcatalyst.it
1193232705 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232705 node "srvxen1.netcatalyst.it" has not been fenced
1193232705 fencing node srvxen1.netcatalyst.it
1193232710 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232710 node "srvxen1.netcatalyst.it" has not been fenced
1193232710 fencing node srvxen1.netcatalyst.it
1193232715 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232715 node "srvxen1.netcatalyst.it" has not been fenced

I instrumented fenced and found that the problem is caused by ccs_get
returning a timeout error in count_methods. This problem was fixed for
RHEL4 in BZ#219633, but the patch has not been ported to RHEL5. I have ported the
patch to RHEL5 and verified that it resolves the problem.
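To illustrate the failure mode, here is a minimal C sketch of a method-counting loop that tells a transient timeout apart from the genuine end of the method list. It is not the actual fenced code or the BZ#219633 patch. `lookup_method()` is a hypothetical stand-in for a ccs_get() query, and the `-ETIMEDOUT`/`-ENOENT` return convention is an assumption made for the example. If a timeout were treated the same as "no more methods", the count could come back as 0 and no agent would be run, which would match the symptom above.

```c
#include <errno.h>

/* Hypothetical stand-in for a ccs_get()-style query (assumption, not the
 * real libccs API): returns 0 if fence method i exists, -ENOENT if it does
 * not, or -ETIMEDOUT on a transient failure. It simulates 2 configured
 * methods, with the first `simulated_timeouts` calls timing out. */
static int simulated_timeouts = 1;

static int lookup_method(int i)
{
    if (simulated_timeouts > 0) {
        simulated_timeouts--;
        return -ETIMEDOUT;
    }
    return (i < 2) ? 0 : -ENOENT;
}

/* Count configured methods, retrying the same index a bounded number of
 * times on a timeout instead of treating the error as "end of list".
 * Returns the number of methods, or -1 if the lookup keeps failing. */
static int count_methods_with_retry(int max_retries)
{
    int count = 0;
    int retries = 0;

    for (;;) {
        int rv = lookup_method(count);

        if (rv == 0) {
            count++;            /* method `count` exists; probe the next one */
        } else if (rv == -ENOENT) {
            return count;       /* genuine end of the method list */
        } else if (rv == -ETIMEDOUT && retries < max_retries) {
            retries++;          /* transient failure: retry the same index */
        } else {
            return -1;          /* persistent error: report it, do not return 0 */
        }
    }
}
```

With one simulated timeout, `count_methods_with_retry(3)` still returns 2; if the lookup times out more often than the retry budget allows, it returns -1 so the caller can distinguish "lookup failed" from "no methods configured".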

Version-Release number of selected component (if applicable):
cman-2.0.73

How reproducible:
every time

Steps to Reproduce:
1. Configure a fence agent that takes more than 30 seconds to complete and returns
an error.
  
Actual results:
The fence agent is executed only the first time.

Expected results:
fenced should keep executing the fence agent until fencing succeeds or
the node rejoins the cluster.

Additional info:

A patch that fixes the problem on RHEL5 (cman-2.0.70-1.el5) is attached.

See Bug #219633 for further information

Comment 1 Marco Ceci 2007-10-26 16:42:27 UTC
Created attachment 239311
patch that fix the problem

Comment 5 errata-xmlrpc 2008-05-21 15:58:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0347.html


