Bug 354421 - fenced fails to execute the fence agent again if the first attempt fails and it takes more than 30 seconds to complete
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Ryan McCabe
QA Contact: Cluster QE
Depends On: 219633
Reported: 2007-10-26 12:42 EDT by Marco Ceci
Modified: 2009-04-16 18:44 EDT (History)
Fixed In Version: RHBA-2008-0347
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:58:08 EDT


Attachments
patch that fix the problem (3.89 KB, patch)
2007-10-26 12:42 EDT, Marco Ceci


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2008:0347
Priority: normal
Status: SHIPPED_LIVE
Summary: cman bug fix and enhancement update
Last Updated: 2008-05-20 08:39:41 EDT

Description Marco Ceci 2007-10-26 12:42:27 EDT
Description of problem:
When the first fence action takes more than 30 seconds and then fails, every
subsequent fence attempt also fails and the fence agent is not executed again.

Example:

1193232648 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232648 node "srvxen1.netcatalyst.it" has not been fenced
1193232648 fencing node srvxen1.netcatalyst.it
agent "fence_ilo" reports: failed to turn off

1193232700 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232700 node "srvxen1.netcatalyst.it" has not been fenced
1193232700 fencing node srvxen1.netcatalyst.it
1193232705 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232705 node "srvxen1.netcatalyst.it" has not been fenced
1193232705 fencing node srvxen1.netcatalyst.it
1193232710 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232710 node "srvxen1.netcatalyst.it" has not been fenced
1193232710 fencing node srvxen1.netcatalyst.it
1193232715 node "srvxen1.netcatalyst.it" not a cman member, cn 1
1193232715 node "srvxen1.netcatalyst.it" has not been fenced
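
The timing in the log excerpt above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `attempt_gaps` is made up here, and the input is just the four "fencing node" entries copied from the log. The measured gap runs from the start of one attempt to the start of the next, so it includes any retry delay as well as the agent's own run time.

```python
import re

# The four "fencing node" entries copied from the fenced log excerpt above.
LOG = """\
1193232648 fencing node srvxen1.netcatalyst.it
1193232700 fencing node srvxen1.netcatalyst.it
1193232705 fencing node srvxen1.netcatalyst.it
1193232710 fencing node srvxen1.netcatalyst.it
"""


def attempt_gaps(log_text):
    """Return the seconds elapsed between consecutive 'fencing node' entries."""
    stamps = [
        int(match.group(1))
        for match in re.finditer(r"^(\d+) fencing node ", log_text, re.MULTILINE)
    ]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


print(attempt_gaps(LOG))  # [52, 5, 5]
```

The first gap of 52 seconds is consistent with an agent run longer than 30 seconds; the later 5-second gaps, with no agent output between them, match the attempts where the agent is not executed.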

I instrumented fenced and found that the problem is caused by ccs_get
returning a timeout error in count_methods. This problem was fixed for
RHEL4 in BZ#219633, but the patch has not been ported to RHEL5. I have ported the
patch to RHEL5 and verified that it resolves the problem.
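
The following Python sketch is a toy model of the diagnosis above, not the fenced source and not the attached patch. It assumes that the configuration session used by count_methods expires about 30 seconds after it is opened, and that a fix amounts to opening a fresh session for each attempt. `FakeClock`, `ConfigSession`, `run_attempts`, and the 5-second retry delay (taken from the log spacing) are all hypothetical.

```python
class FakeClock:
    """Simulated time source, so the sketch runs instantly."""

    def __init__(self):
        self.now = 0

    def advance(self, seconds):
        self.now += seconds


class ConfigSession:
    """Hypothetical stand-in for a config session that expires after 30 s."""

    LIMIT = 30  # assumed session lifetime in seconds

    def __init__(self, clock):
        self.clock = clock
        self.opened_at = clock.now

    def count_methods(self):
        # Mimics the reported symptom: the lookup times out on a stale session.
        if self.clock.now - self.opened_at > self.LIMIT:
            raise TimeoutError("config session timed out")
        return 1


def run_attempts(clock, attempts, agent_seconds, reconnect):
    """Return how many times the (always failing) agent was actually executed."""
    executed = 0
    session = ConfigSession(clock)
    for _ in range(attempts):
        if reconnect:
            session = ConfigSession(clock)  # assumed fix: fresh session per attempt
        try:
            session.count_methods()
        except TimeoutError:
            clock.advance(5)  # agent skipped; wait for the next retry
            continue
        clock.advance(agent_seconds)  # agent runs, then reports failure
        executed += 1
    return executed


print(run_attempts(FakeClock(), 4, 40, reconnect=False))  # 1
print(run_attempts(FakeClock(), 4, 40, reconnect=True))   # 4
```

With a 40-second agent and a reused session, only the first attempt reaches the agent, which mirrors the reported behaviour. Reconnecting before each attempt lets every retry execute the agent.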

Version-Release number of selected component (if applicable):
cman-2.0.73

How reproducible:
every time

Steps to Reproduce:
1. Configure a fence agent that takes more than 30 seconds to complete and
returns an error.
  
Actual results:
The fence agent is executed only on the first attempt.

Expected results:
fenced should keep executing the fence agent until fencing succeeds or
the node rejoins the cluster.
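
As a rough illustration of the expected behaviour, the loop below keeps calling the agent until it succeeds or the node is a cluster member again. It is a sketch under stated assumptions, not fenced's actual control flow: `fence_until_done`, `run_agent`, `node_is_member`, and `retry_delay` are hypothetical names, and the callbacks stand in for the real agent and membership checks.

```python
def fence_until_done(run_agent, node_is_member, retry_delay=lambda: None):
    """Retry the agent until it succeeds or the node rejoins.

    Returns (outcome, attempts), where outcome is "fenced" or "rejoined"
    and attempts is the number of times the agent was executed.
    """
    attempts = 0
    while True:
        if node_is_member():
            return ("rejoined", attempts)
        attempts += 1
        if run_agent():  # the agent is executed on every retry
            return ("fenced", attempts)
        retry_delay()  # a real caller might sleep a few seconds here


# The agent fails twice, then succeeds; the node never rejoins.
agent_results = iter([False, False, True])
print(fence_until_done(lambda: next(agent_results), lambda: False))  # ('fenced', 3)

# The agent always fails; the node rejoins before the third attempt.
membership = iter([False, False, True])
print(fence_until_done(lambda: False, lambda: next(membership)))  # ('rejoined', 2)
```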

Additional info:

A patch that fixes the problem on RHEL5 (cman-2.0.70-1.el5) is attached.

See Bug #219633 for further information.
Comment 1 Marco Ceci 2007-10-26 12:42:27 EDT
Created attachment 239311 [details]
patch that fix the problem
Comment 5 errata-xmlrpc 2008-05-21 11:58:08 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0347.html
