Bug 472370

Summary:	fence_ipmilan blocks alternative fencing agents when connectivity to IPMI fails.
Product:	[Retired] Red Hat Cluster Suite	Reporter:	Jan Friesse <jfriesse>
Component:	fence	Assignee:	Jim Parsons <jparsons>
Status:	CLOSED ERRATA	QA Contact:	Cluster QE <mspqa-list>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4	CC:	bkahn, bstevens, cfeist, cluster-maint, cmarthal, djansa, edamato, hlawatschek, jfriesse, lhh, zheka
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-05-18 21:15:44 UTC	Type:	---
Bug Depends On:	276541
Bug Blocks:

Description Jan Friesse 2008-11-20 14:45:01 UTC

+++ This bug was initially created as a clone of Bug #276541 +++

Description of problem:
If there is no connection to the IPMI interface that is used as the fencing
device, fence_ipmilan fails and no other fencing device gets the chance to
intervene (perhaps due to a very long timeout).

Version-Release number of selected component (if applicable):
fence-1.32.25-1 - fence_ipmilan

How reproducible:
Every time you disable connectivity to IPMI and then try to use IPMI for fencing
on the cluster, for example with an iptables rule that rejects packets to the
IPMI device's IP address.

Steps to Reproduce:
1. iptables -A OUTPUT -d <ipmi_ip> -j REJECT
2. fence_node <nodename>
3. Watch /var/log/messages and the output of the command
  
Actual results:
The command times out only after a _very_ long period. Alternative fencing
agents are not tried, either while it is waiting or after it finally returns a
failure. So the whole fencing process fails even though other fencing agents are
enabled and verified to work otherwise.


Expected results:
After a reasonable amount of time (much less than now) the agent should report
that it failed to fence the node. Then other fencing agents should get a chance
to fence the failed server successfully.
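
To illustrate the expected behavior, here is a minimal sketch in Python. It is not the actual fenced code. The function name try_fence_methods, the (name, argv) pairs and the 10 second default are assumptions made for this example only. Each agent gets a bounded amount of time, and a timeout or non-zero exit simply moves on to the next agent.

```python
import subprocess


def try_fence_methods(methods, timeout_s=10):
    """Run each fencing command in order, bounded by timeout_s seconds.

    methods is a list of (name, argv) pairs (hypothetical structure).
    Returns the name of the first method that exits with status 0,
    or None if every method fails or times out.
    """
    for name, argv in methods:
        try:
            result = subprocess.run(
                argv,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # The agent hung (e.g. unreachable IPMI); the child has been
            # killed by subprocess.run, so fall through to the next method.
            continue
        except OSError:
            # The agent could not be started at all; treat it as a failure.
            continue
        if result.returncode == 0:
            return name
    return None
```

With this shape, an unreachable IPMI device costs at most timeout_s seconds before the next configured agent is tried.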


Additional info:
# uname -a
Linux axqa02rc_1 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64
x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)

--- Additional comment from jparsons on 2007-09-12 15:44:37 EDT ---

Lon - what are your thoughts about adding/adjusting the ipmilan timeout?

--- Additional comment from jfriesse on 2008-11-20 08:56:48 EDT ---

*** Bug 401481 has been marked as a duplicate of this bug. ***

--- Additional comment from jfriesse on 2008-11-20 09:01:22 EDT ---

*** Bug 452894 has been marked as a duplicate of this bug. ***

--- Additional comment from jfriesse on 2008-11-20 09:18:46 EDT ---

Created an attachment (id=324178)
Patch fixing this bug

The bug was caused by a very long timeout in the IPMI agent.

This patch adjusts the timeout to a default value of 10s, which should be enough for most current IPMI implementations. It also removes retries, because that job is done
by fenced.

Because some devices still need longer timeouts, the timeout is adjustable with the -t parameter (or the timeout option for stdin and XML configuration).
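
As a rough sketch of how the stdin form could be assembled, assuming the agent reads one name=value pair per line: only the timeout option comes from this patch description. The other option names (ipaddr, login, passwd), all the values, and the helper format_stdin_options are placeholders for illustration and should be checked against the agent's man page.

```python
def format_stdin_options(options):
    """Render a dict of agent options as name=value lines for stdin."""
    lines = []
    for key, value in options.items():
        text = str(value)
        if "\n" in text or "\n" in key:
            # One option per line, so embedded newlines cannot be represented.
            raise ValueError(f"option {key!r} contains a newline")
        lines.append(f"{key}={text}\n")
    return "".join(lines)


# Placeholder values; "timeout" is the option described in this comment,
# the remaining names are assumed.
stdin_text = format_stdin_options({
    "ipaddr": "192.0.2.10",
    "login": "admin",
    "passwd": "secret",
    "timeout": 10,
})
```

The resulting text would then be piped to the agent's standard input. Devices that need longer than the 10s default would use a larger timeout value here.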

Comment 4 errata-xmlrpc 2009-05-18 21:15:44 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1050.html