Bug 479179

Summary: fence_ilo has trouble seeing when a host is back up
Product: Red Hat Enterprise Linux 5
Component: cman
Version: 5.2
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Hardware: All
OS: Linux
Reporter: Guil Barros <gbarros>
Assignee: Marek Grac <mgrac>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, edamato, jparsons, rmccabe
Doc Type: Bug Fix
Last Closed: 2009-01-08 16:54:21 UTC

Description Guil Barros 2009-01-07 19:14:18 UTC
Description of problem:
Fencing a host with fence_ilo (via luci, fence_node, or fence_ilo directly) is unreliable: most of the time the agent fails to see that the host is back up. The cluster then fences the node a second time, which generally fails again, and only after that is the service moved over to another node.
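Seeing that a host is "back up" amounts to polling its power state after each power-button action, with every wait bounded by a timeout. The verbose log in comment 3 shows the new agent doing essentially this: status ON, power button, status OFF, power button, status OFF, status ON. A minimal sketch of that loop, assuming hypothetical callables get_status() (returning "ON" or "OFF") and press_power_button(); it is not the code of fence_ilo.pl or fence_ilo.py. The clock and sleep functions are injectable only so the logic can be exercised without real delays.

```python
import time
from typing import Callable

Clock = Callable[[], float]
Sleep = Callable[[float], None]


def wait_for_power_state(
    get_status: Callable[[], str],
    target: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Clock = time.monotonic,
    sleep: Sleep = time.sleep,
) -> bool:
    """Poll get_status() until it returns target; False once timeout expires."""
    deadline = clock() + timeout
    while True:
        if get_status() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def reboot(
    get_status: Callable[[], str],
    press_power_button: Callable[[], None],
    timeout: float = 60.0,
    interval: float = 1.0,
    **timing,
) -> bool:
    """Power-cycle the host and confirm it reaches OFF and then ON again."""
    if get_status() == "ON":
        press_power_button()
        # Do not power back on until the host is confirmed off.
        if not wait_for_power_state(get_status, "OFF", timeout, interval, **timing):
            return False
    press_power_button()
    # The reboot only counts as successful once the host reports ON again.
    return wait_for_power_state(get_status, "ON", timeout, interval, **timing)
```

Returning False on timeout, rather than looping forever, lets the caller report a failed fence instead of hanging.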

Version-Release number of selected component (if applicable):
cman-2.0.84-2.el5_2.3
kernel-2.6.18-92.1.22.el5
iLO2 firmware 1.70 (12/02/2008)
HP DL360 G5

How reproducible:
every time...

Comment 1 Marek Grac 2009-01-08 12:40:59 UTC
Tested on an HP DL360 G2 with iLO2 (v1.91) and an invalid certificate:

time ./fence_ilo.pl -o reboot -l root -p X -a proliant06-ilo.englab.brq.redhat.com 
real    1m4.605s
user    0m0.477s
sys     0m0.032s

The fence agent for iLO was completely ported to the new infrastructure in RHEL 5.3. If possible, please try it. If you want to take it directly from our git repository (branch RHEL5), you will need ilo/fence_ilo.py, lib/fencing.py and lib/telnet_ssl.py.
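The agent drives iLO with RIBCL XML commands, presumably over the SSL connection provided by lib/telnet_ssl.py; with -v it echoes the exchange, as in comment 3. A minimal sketch of building the two commands that appear in that log, GET_HOST_POWER_STATUS and HOLD_PWR_BTN, each wrapped in LOGIN and SERVER_INFO. The function names are hypothetical and the spacing around "=" simply mirrors the log; this is not the implementation in ilo/fence_ilo.py.

```python
from xml.sax.saxutils import quoteattr


def ribcl_command(login: str, password: str, section: str, mode: str, command: str) -> str:
    """Wrap one RIBCL command in LOGIN and a section element such as SERVER_INFO."""
    # quoteattr() escapes &, < and > and chooses a safe quote character.
    return (
        f"<LOGIN USER_LOGIN = {quoteattr(login)} PASSWORD = {quoteattr(password)}>"
        f"<{section} MODE = {quoteattr(mode)}>{command}"
        f"</{section}></LOGIN>"
    )


def power_status_request(login: str, password: str) -> str:
    """Ask for the host power state (answered with HOST_POWER="ON" or "OFF")."""
    return ribcl_command(login, password, "SERVER_INFO", "read", "<GET_HOST_POWER_STATUS/>")


def power_button_request(login: str, password: str) -> str:
    """Toggle the power button, as the log does for both power-off and power-on."""
    return ribcl_command(login, password, "SERVER_INFO", "write", '<HOLD_PWR_BTN TOGGLE="yes" />')
```

Escaping the attributes matters for credentials: a password containing "&", "<" or a double quote would otherwise produce malformed RIBCL.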

time ./fence_ilo.py -o reboot -l root -p X -a proliant06-ilo.englab.brq.redhat.com
Success: Rebooted

real    0m13.749s
user    0m8.627s
sys     0m1.027s

Comment 3 Guil Barros 2009-01-08 15:17:57 UTC
Seems to work much better, thanks. I take it no testing has been done on upgrading just cman on a RHEL 5.2 box? :)


# time /root/tmp/fence_ilo -v -a 1.2.3.4 -l redhat -p redhat123
<?xml version="1.0"?>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<RIBCL VERSION="2.0">

<LOGIN USER_LOGIN = "redhat" PASSWORD = "redhat123">

<RIB_INFO MODE="read"><GET_FW_VERSION />

</RIB_INFO>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
<INFORM>Scripting utility should be updated to the latest version.</INFORM>
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
<GET_FW_VERSION
   FIRMWARE_VERSION = "1.70"
   FIRMWARE_DATE    = "Dec 02 2008"
   MANAGEMENT_PROCESSOR    = "iLO2"
    />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

</LOGIN>

<LOGIN USER_LOGIN = "redhat" PASSWORD = "redhat123">

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<SERVER_INFO MODE = "read"><GET_HOST_POWER_STATUS/>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
<GET_HOST_POWER
    HOST_POWER="ON"
    />
</RIBCL>

</SERVER_INFO></LOGIN>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<LOGIN USER_LOGIN = "redhat" PASSWORD = "redhat123">

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<SERVER_INFO MODE = "write"><HOLD_PWR_BTN TOGGLE="yes" />

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

</SERVER_INFO></LOGIN>

<LOGIN USER_LOGIN = "redhat" PASSWORD = "redhat123">

<SERVER_INFO MODE = "read"><GET_HOST_POWER_STATUS/>

</SERVER_INFO></LOGIN>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
<GET_HOST_POWER
    HOST_POWER="OFF"
    />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<LOGIN USER_LOGIN = "redhat" PASSWORD = "redhat123">

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<SERVER_INFO MODE = "write"><HOLD_PWR_BTN TOGGLE="yes" />

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

</SERVER_INFO></LOGIN>

<LOGIN USER_LOGIN = "redhat" PASSWORD = "redhat123">

<SERVER_INFO MODE = "read"><GET_HOST_POWER_STATUS/>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

</SERVER_INFO></LOGIN>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
<GET_HOST_POWER
    HOST_POWER="OFF"
    />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<LOGIN USER_LOGIN = "redhat" PASSWORD = "redhat123">

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<SERVER_INFO MODE = "read"><GET_HOST_POWER_STATUS/>

</SERVER_INFO></LOGIN>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
</RIBCL>

<?xml version="1.0"?>
<RIBCL VERSION="2.22"/>
<RESPONSE
    STATUS="0x0000"
    MESSAGE='No error'
     />
<GET_HOST_POWER
    HOST_POWER="ON"
Success: Rebooted

real	0m13.839s
user	0m7.668s
sys	0m1.099s
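The iLO responses above are not well-formed XML: each one opens with a self-closed <RIBCL VERSION="2.22"/> and then ends with a stray </RIBCL>, so a strict XML parser would reject them. A minimal sketch that instead scans the text with regular expressions, assuming only the STATUS and HOST_POWER attribute formats seen in this log; the names are hypothetical and this is not how fence_ilo.py necessarily parses them.

```python
import re
from typing import Optional

# \b keeps STATUS from matching inside longer names such as HOST_POWER_STATUS.
STATUS_RE = re.compile(r'\bSTATUS="(0x[0-9A-Fa-f]+)"')
POWER_RE = re.compile(r'HOST_POWER="(ON|OFF)"')


def check_status(text: str) -> None:
    """Raise RuntimeError if any RESPONSE in text carries a non-zero STATUS."""
    for code in STATUS_RE.findall(text):
        if int(code, 16) != 0:
            raise RuntimeError(f"iLO returned STATUS={code}")


def host_power(text: str) -> Optional[str]:
    """Return the most recent HOST_POWER value in text, or None if there is none."""
    values = POWER_RE.findall(text)
    return values[-1] if values else None
```

Applied to the whole log above, POWER_RE.findall() picks out ON, OFF, OFF, ON: the host was seen going down and then coming back up before the agent printed "Success: Rebooted".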

Comment 4 Marek Grac 2009-01-08 16:54:21 UTC
Please don't even try to do it :)

Using the new fence agent on RHEL 5.2 should work without any problems, as the interface between the fence daemon and the fence agent is fully compatible.
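Part of that interface is how options reach the agent: when the fence daemon runs an agent, it writes the device attributes to the agent's stdin as name=value lines, one per line. A minimal sketch of reading that input, assuming blank lines and lines starting with "#" are ignored; the function name is hypothetical, and the keys in the test values are illustrative only, not checked against fence_ilo's actual option list.

```python
from typing import Dict, Iterable


def parse_stdin_options(lines: Iterable[str]) -> Dict[str, str]:
    """Parse name=value lines as a fence agent would read them from stdin."""
    options: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        # Assumption: blank lines and '#' comment lines carry no options.
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # Assumption: skip malformed lines instead of aborting.
        # Split on the first '=' only and keep the value verbatim, so a
        # password containing '=' or surrounding spaces survives intact.
        options[name.strip()] = value
    return options
```

An agent could call parse_stdin_options(sys.stdin) when it is started without command-line arguments.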

Closing, as it works in RHEL 5.3.