Bug 545682 - fence_ilo fails to reboot, possibly timing problem with ilo2 1.70
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: fence
Version: 4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Marek Grac
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 608500
Depends On:
Blocks: 562261 562263
Reported: 2009-12-09 05:01 UTC by Michael Kearey
Modified: 2018-10-27 15:34 UTC
CC List: 7 users

Fixed In Version: fence-1.32.68-5.el4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-16 16:15:33 UTC
Embargoed:


Attachments
Proposed patch (1.82 KB, patch)
2010-01-20 18:07 UTC, Marek Grac


Links
System: Red Hat Product Errata
ID: RHSA-2011:0266
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: fence security, bug fix, and enhancement update
Last Updated: 2011-02-16 16:14:08 UTC

Description Michael Kearey 2009-12-09 05:01:45 UTC
Description of problem:

When a node in the cluster is fenced, the fencing agent turns the power off but then fails to turn the power back on. The fencing action is still regarded as "successful".

This appears to be the same issue as described in Bug 507514 at:
https://bugzilla.redhat.com/show_bug.cgi?id=507514

This happened with iLO2 firmware versions 1.79 and 1.80.

How reproducible:

Seems to happen every time.

Steps to Reproduce:

Disconnect one node from the cluster by executing e.g. "ifdown bond0" and let the cluster fence the node.

Actual results:

The server is powered off, but it is not powered on again, as a reboot action from the fence agent requires.

Expected results:

The server is powered on after being powered off, i.e. it is rebooted.

Additional info:

I have tested adding a "time.sleep(5)" statement to fencing.py as described in Bug 507514. With sleep(5), fencing works as expected again, and the node is powered back on as it should be.
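The workaround boils down to pausing between the OFF and ON halves of a reboot. Below is a minimal Python sketch of that sequence; it is not the actual fencing.py code, and `reboot`, `power_off`, `power_on` and the injectable `sleep` parameter are hypothetical names standing in for the agent's real iLO commands.

```python
import time
from typing import Callable


def reboot(power_off: Callable[[], None],
           power_on: Callable[[], None],
           power_wait: float = 0.0,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Reboot as OFF, optional pause, ON (hypothetical helper).

    power_wait is the number of seconds to wait after the OFF command so that
    a slow iLO can finish powering down before it receives the ON command.
    """
    power_off()
    if power_wait > 0:
        # With power_wait == 5 this matches the time.sleep(5) workaround.
        sleep(power_wait)
    power_on()


# Record the order of operations instead of talking to real hardware.
calls = []
reboot(lambda: calls.append("off"),
       lambda: calls.append("on"),
       power_wait=5,
       sleep=lambda s: calls.append(("sleep", s)))
print(calls)  # ['off', ('sleep', 5), 'on']
```

Keeping the pause at zero by default means agents and firmware that react quickly are not slowed down.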

Version-Release number of selected component (if applicable):


Additional Information:

Changes have been made to relieve the problem, as per:

https://bugzilla.redhat.com/show_bug.cgi?id=507514#c21

We would like to know whether these changes are possible for RHEL4.

Comment 1 Marek Grac 2009-12-09 15:23:10 UTC
It can be fixed in the next update. Unfortunately, a general solution (as in RHEL5) is not possible, so I will just add a special option for HP iLO. If you need the timing feature for other agents as well, please let me know.

Comment 3 Marek Grac 2010-01-20 18:07:45 UTC
Created attachment 385741
Proposed patch

Option -G (power_wait on stdin or in cluster.conf) was added to fence_ilo. It defines how many seconds the fence agent waits after issuing the power OFF command in the 'reboot' action (the only action used by the fence daemon). Feel free to test this patch. It does not change the default behaviour, and '-G 5' should be equivalent to time.sleep(5). It also won't slow down other fence agents or older iLOs that are fast enough.
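Fence agents take their options either as command-line flags or as name=value lines on stdin, which is how the fence daemon passes cluster.conf attributes. A rough Python sketch of reading the new option under that assumption follows; `read_power_wait` is a hypothetical helper, not code from the attached patch, and it looks only at -G / power_wait while ignoring the agent's other options. Defaulting to 0 keeps the old behaviour of no pause.

```python
from typing import Dict, Iterable, List


def read_power_wait(argv: List[str], stdin: Iterable[str]) -> float:
    """Return the pause after power OFF, in seconds (hypothetical helper)."""
    if argv:
        # Command-line mode: accept "-G <seconds>" or "-G<seconds>";
        # all other agent options are ignored in this sketch.
        for i, arg in enumerate(argv):
            if arg == "-G" and i + 1 < len(argv):
                return float(argv[i + 1])
            if arg.startswith("-G") and len(arg) > 2:
                return float(arg[2:])
        return 0.0

    # stdin mode: one name=value pair per line.
    options: Dict[str, str] = {}
    for raw in stdin:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank, comment and malformed lines
        name, value = line.split("=", 1)
        options[name.strip()] = value.strip()
    # No power_wait given: 0 seconds, i.e. the previous behaviour.
    return float(options.get("power_wait", "0"))


print(read_power_wait(["-G", "5"], []))                               # 5.0
print(read_power_wait([], ["agent=fence_ilo\n", "power_wait=5\n"]))  # 5.0
```

The returned value is what a reboot routine would sleep for between the OFF and ON commands.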

Comment 15 Marek Grac 2010-07-14 12:54:33 UTC
*** Bug 608500 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2011-02-16 16:15:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0266.html

