Bug 489809

Summary: Broken device detection for DRAC3 ERA/O in fence_drac
Product: Red Hat Enterprise Linux 5
Component: cman
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Gordan Bobic <gordan>
Assignee: Marek Grac <mgrac>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, cward, djansa, edamato, jkortus
Keywords: OtherQA
Target Milestone: ---
Target Release: ---
Fixed In Version: cman-2.0.115-18.el5
Doc Type: Bug Fix
Last Closed: 2010-03-30 08:41:13 UTC
Attachments:
Patch to fix operation of fence_drac on Dell embedded DRAC3 ERA/O cards (flags: none)

Description Gordan Bobic 2009-03-11 23:12:39 UTC
Created attachment 334870
Patch to fix operation of fence_drac on Dell embedded DRAC3 ERA/O cards

Description of problem:

The fencing agent for the Dell Remote Access Controller (DRAC) ERA/O (a DRAC3 variant) has broken device detection. To identify the device, the agent matches the login banner against the following regular expression:

/Dell Embedded Remote Access Controller \(ERA\)\nFirmware Version/

The actual device string (with the latest firmware) is:

Dell Embedded Remote Access Controller (ERA/O)
Firmware Version 3.37 (Build 08.13)

Thus, the regular expression match should be:
/Dell Embedded Remote Access Controller \(ERA\/O\)\nFirmware Version/
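
The mismatch can be reproduced outside the agent. The following is a minimal Python sketch, not the agent's own Perl code: it applies both expressions quoted above to the banner text from this report. The names `BANNER`, `OLD_PATTERN`, `NEW_PATTERN` and `detects` are made up for illustration.

```python
import re

# Banner text as quoted in this report (DRAC3 ERA/O, firmware 3.37).
BANNER = (
    "Dell Embedded Remote Access Controller (ERA/O)\n"
    "Firmware Version 3.37 (Build 08.13)\n"
)

# The expression the agent currently uses, and the one proposed in the patch.
# Both are transcribed from the report; only the Perl delimiters are dropped.
OLD_PATTERN = re.compile(
    r"Dell Embedded Remote Access Controller \(ERA\)\nFirmware Version"
)
NEW_PATTERN = re.compile(
    r"Dell Embedded Remote Access Controller \(ERA/O\)\nFirmware Version"
)


def detects(pattern, banner):
    """Return True if the pattern is found anywhere in the banner."""
    return pattern.search(banner) is not None


if __name__ == "__main__":
    print("current expression matches: ", detects(OLD_PATTERN, BANNER))
    print("proposed expression matches:", detects(NEW_PATTERN, BANNER))
```

The current expression requires a closing parenthesis immediately after "ERA", so the "/O" suffix in the banner is enough to make detection fail.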

Version-Release number of selected component (if applicable):
All versions of cman up to and including 2.0.98 in RHEL 5.3

How reproducible:

100%

Steps to Reproduce:
1. Set up a cluster of nodes, one of which has a DRAC3 ERA/O in it (e.g. Dell PowerEdge 1650)
2. Pull the plug on the DRAC3 node.
3. Cluster will hang. The surviving nodes will try to fence using DRAC but won't be able to identify the DRAC card, and will keep failing.

Actual results:
Cluster hangs indefinitely waiting for the node to get fenced.

Expected results:
Node gets fenced and cluster resumes operation.

Additional info:
Patch to fix this is attached.

Comment 1 Marek Grac 2009-03-20 11:50:29 UTC
Thanks for the patch.

I would also like to ask whether you could help us write a new fence agent (fence_drac5.py) that supports your device as well. I will try to write it (based on the old agent), but I don't have a device to test it on. I believe we can do that in 2-3 iterations (I will just need the verbose output).

Comment 2 Gordan Bobic 2009-03-20 12:06:27 UTC
Sure, I'll be happy to test it for you and forward any output back to you. Please email me the instructions.

Comment 6 Gordan Bobic 2009-11-09 01:06:04 UTC
I see this patch hasn't made it into RHEL5.4 (cman-2.0.115-1.el5_4.3). Is it likely to get pushed out any time soon? The current fence_drac agent completely fails to work on the DRAC 3 ERA/O management modules without the provided patch.

Comment 7 Marek Grac 2009-11-09 14:47:54 UTC
I changed the patch so that it should not break backward compatibility.

--
if (/Dell Embedded Remote Access Controller \(ERA(\/O)?\)\nFirmware Version/m)

--

If possible, please try the test build: cman-2.0.115-18.el5
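
The effect of the optional `(\/O)?` group can be checked with a short Python sketch. This is an illustration only, not code from the agent: the "(ERA)" banner is inferred from the older expression, its firmware version is left as a placeholder, and `COMPAT_PATTERN` and `variant` are hypothetical names.

```python
import re

# Python transcription of the expression in this comment. Perl's /m flag
# maps to re.MULTILINE; it only changes how ^ and $ behave, and this
# pattern uses neither, so it is kept here purely for fidelity.
COMPAT_PATTERN = re.compile(
    r"Dell Embedded Remote Access Controller \(ERA(/O)?\)\nFirmware Version",
    re.MULTILINE,
)

# ERA/O banner as quoted in the bug description.
ERA_O_BANNER = (
    "Dell Embedded Remote Access Controller (ERA/O)\n"
    "Firmware Version 3.37 (Build 08.13)\n"
)

# Assumed banner for the older ERA card, inferred from the original
# expression; "x.xx" is a placeholder, not a real firmware version.
ERA_BANNER = (
    "Dell Embedded Remote Access Controller (ERA)\n"
    "Firmware Version x.xx\n"
)


def variant(banner):
    """Return "ERA/O", "ERA", or None if the banner is not recognised."""
    match = COMPAT_PATTERN.search(banner)
    if match is None:
        return None
    # Group 1 is set only when the optional "/O" suffix was present.
    return "ERA/O" if match.group(1) else "ERA"


if __name__ == "__main__":
    for banner in (ERA_O_BANNER, ERA_BANNER, "Some other device\n"):
        print(repr(banner.splitlines()[0]), "->", variant(banner))
```

Because the group is optional, cards that report plain "(ERA)" keep matching as before, which is the backward compatibility the change is meant to preserve.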

Comment 8 Gordan Bobic 2009-11-09 17:20:22 UTC
Gladly, where can I get the new package?

Comment 9 Marek Grac 2009-11-11 11:44:26 UTC
Sure, 

http://marx.fedorapeople.org/cman-2.0.115-18.el5.src.rpm

Comment 13 Marek Grac 2010-02-24 15:18:18 UTC
@Gordan: Can you please test the new package and send the results?

Comment 14 Jaroslav Kortus 2010-03-15 17:13:17 UTC
Any feedback on this yet?

Comment 15 Gordan Bobic 2010-03-15 21:01:47 UTC
Sorry, forgot to get back to you about this. The updated version linked above has been working absolutely fine for me.

Comment 16 Jaroslav Kortus 2010-03-16 09:54:51 UTC
Thank you.

Comment 18 errata-xmlrpc 2010-03-30 08:41:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0266.html