Bug 466788

Summary: Fencing support for Dell M1000e CMC
Product: [Retired] Red Hat Cluster Suite
Reporter: Skippy <david>
Component: fence
Assignee: Marek Grac <mgrac>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Docs Contact:
Priority: high
Version: 4
CC: bturner, cfeist, cluster-maint, cmarthal, dchuha, edamato, jruemker, mgrac, mnielsen, praveen_paladugu, syeghiay, vvaldez
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-05-18 21:15:37 UTC
Type: ---
Attachments:
Description                                                          Flags
Fencing library                                                      none
Fencing agent                                                        none
Log of output while testing fence_drac5.py from comment #5 and #6    none
patch to fencing agent fence_drac5.py in comment #6                  none
Fencing agent #2                                                     none
Testing code from comment #12                                        none

Description Skippy 2008-10-13 16:17:02 UTC
Fencing doesn't support the CMC of the Dell M600 (a box of blades) for turning the power off and on. The CMC is a management facility for the M600 chassis.

Fortunately, this is a problem that is already solved. See my patch for fence_drac here:

https://www.redhat.com/archives/linux-cluster/2008-August/msg00028.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00027.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00026.html

This patch modifies the code used for fencing a 1950. I have checked it against a 1950 to confirm that fencing still works there.

Comment 2 mnielsen 2009-01-28 16:36:05 UTC
I've patched my own fence_drac for my environment and verified that it works for the M1000e chassis with M600 blades in it. When is this going to make it into a release?

Comment 3 Marek Grac 2009-02-10 16:40:47 UTC
We are working on integrating these patches into the new infrastructure (drac5.py), the same as in 5.x, with ssh support, better logs, ... Can you test our new fence agent (I'll attach it here or send it by mail)? Unfortunately I don't have direct access to such machines, so I cannot work on it any faster.

Comment 4 mnielsen 2009-02-10 17:08:44 UTC
I'd be happy to test anything. I live in the DEV environment here, so I can get away with breaking stuff if that happens. E-mail or post a link, either way is fine with me. I still have to burn it, v-scan it, and import it to our "disconnected" network.

Comment 5 Marek Grac 2009-02-10 17:57:13 UTC
Created attachment 331445 [details]
Fencing library

Please set PYTHONPATH to the directory containing this library. You will also need
the 'pexpect' package.

Comment 6 Marek Grac 2009-02-10 17:59:30 UTC
Created attachment 331446 [details]
Fencing agent

Example of usage:

./fence_drac5.py -o on -a 1.1.1.1 -l user -p pass -x -v -m server-16 -o status

If possible, please also test -o list (it should print all available devices).
If there are any problems, please be sure to use -v (verbose), which should show everything.

Comment 7 Chris Feist 2009-04-17 21:15:26 UTC
We don't have this unit and haven't heard back about testing, so I'm bumping this to 4.9.

Comment 9 Vinny Valdez 2009-04-20 19:19:42 UTC
Created attachment 340410 [details]
Log of output while testing fence_drac5.py from comment #5 and #6

I have access to this equipment so I can test anything you need.

I tried the agent posted in comment #5/6, but I was using RHEL 5.3 and cman-2.0.98-1.el5.  I can try with RHEL 4 later this week.

The agent successfully powers off a blade, but does not power it back on.  It will also power on a blade that is already off, but the script fails after the first command it issues.

This is because it is looking for a return text of "ON|OFF" after executing a command, but the blade takes a few seconds to complete the action. The script polls every second, so it actually sees "Powering OFF|ON" for about 4 seconds before it sees "ON|OFF". I patched it by adding checks for these states to the re.compile at line 38; see the attached patch (I'm sure this could be done more elegantly, I just hacked it in to work).

Also see attachments showing command results.
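
For readers without access to the attachments, the idea of accepting the transitional states can be sketched roughly as follows. This is only an illustration, not the attached patch: the pattern, the function name parse_power_status and the exact output format are assumptions, and the real expression at line 38 of fence_drac5.py may look quite different.

```python
import re

# Hypothetical pattern: accepts the steady states ("ON", "OFF") as well as
# the transitional ones ("Powering ON", "Powering OFF") described above.
STATUS_RE = re.compile(r"^\s*(Powering\s+)?(ON|OFF)\s*$",
                       re.IGNORECASE | re.MULTILINE)


def parse_power_status(output):
    """Return (state, in_transition), or None if no status line is found.

    state is "on" or "off"; in_transition is True while the blade still
    reports "Powering ...".
    """
    match = STATUS_RE.search(output)
    if match is None:
        return None
    return match.group(2).lower(), match.group(1) is not None
```

With a parser like this, the caller can decide separately whether a transitional state is good enough or whether it wants to keep polling.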

Comment 10 Vinny Valdez 2009-04-20 19:21:19 UTC
Created attachment 340411 [details]
patch to fencing agent fence_drac5.py in comment #6

Patch to successfully fence blades in a Dell CMC DRAC.

Comment 12 Marek Grac 2009-04-22 10:12:54 UTC
Created attachment 340710 [details]
Fencing agent #2

We were working on the fencing agent with Bobby Shepherd outside of bugzilla. The only possible problem is with the operation 'list', which should show you the available machines/modules to work with. As this operation is not part of the Cluster Suite yet, it should not be a problem, but please check whether it works.

Comment 15 Vinny Valdez 2009-04-22 16:47:43 UTC
This worked, thank you.

The only thing I noticed is that when it issues the "powerup" command, it exits as soon as the status returns "Powering On", although the blade is technically not powered on until the status returns "ON".  The expected behavior does occur on "powerdown", in that it issues additional "powerstatus" commands until the return changes from "Powering Off" to "OFF".  Log to follow.

The list command works as long as I specify a module name, and fails if I don't.
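
The "keep polling until the final state" behavior described for "powerdown" could be sketched as below. This is a minimal illustration under stated assumptions, not the agent's code: wait_for_power_state, its parameters and the get_status callable are all hypothetical, and in the real agent the status would come from the "powerstatus" command over the CMC session.

```python
import time


def wait_for_power_state(get_status, wanted, timeout=30.0, interval=1.0,
                         sleep=time.sleep):
    """Poll get_status() until it returns exactly `wanted`.

    get_status is a hypothetical callable returning the raw status text.
    Transitional replies such as "Powering On" do not count as `wanted`.
    Returns True on success and False once `timeout` seconds have been
    spent waiting.
    """
    waited = 0.0
    while True:
        if get_status().strip().upper() == wanted.upper():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Passing the sleep function in as a parameter is only a convenience so the loop can be exercised without real delays.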

Comment 16 Vinny Valdez 2009-04-22 16:51:41 UTC
Created attachment 340770 [details]
Testing code from comment #12

Comment 17 Vinny Valdez 2009-04-22 18:36:03 UTC
I also noticed that Conga writes "modulename=" in the cluster.conf, when fence_drac5 is looking for "module_name=".  See Bug 496724 for details.

Comment 20 Marek Grac 2009-04-24 10:06:52 UTC
(In reply to comment #15)
We have a very similar situation with LPAR (Starting, Running, Shutting Down), where all of these states count as ON. I believe that the same is true for this device. If you use a power fence such as APC, then ON means the machine has power, and it does not matter what the machine itself is doing.

(In reply to comment #17)
I can create a compatibility layer to also accept 'modulename', but this option is new in drac5, so we should not create any regression.
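
Such a compatibility layer might look roughly like this. It is a sketch only: the function name normalize_module_option and the idea of handling the attributes as a plain dictionary are assumptions made for illustration, not how the fencing library actually parses its input.

```python
def normalize_module_option(attrs):
    """Return a copy of attrs in which 'modulename' is mapped to 'module_name'.

    attrs is a hypothetical dict of key=value options as they might arrive
    from cluster.conf. An explicit 'module_name' wins if both are present.
    """
    normalized = dict(attrs)
    if "modulename" in normalized:
        legacy_value = normalized.pop("modulename")
        normalized.setdefault("module_name", legacy_value)
    return normalized
```

Working on a copy keeps the caller's dictionary untouched, so the original spelling is still available for logging.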

(In reply to comment #16)
Monitor and list should work without the name of a module. This patch will fix it:

103c103
< 		if 0 == options.has_key("-m"):
---
> 		if 0 == options.has_key("-m") and 0 == ["monitor", "list"].count(options["-o"].lower()):

The problem is that I'm not sure whether listing works as expected. What is the result of running it without verbose? It should print the available modules.
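
The check in the patch above can be read as "a module name is required unless the action is monitor or list". A standalone sketch of that rule follows; the helper name module_name_required is made up for illustration, and it uses the "in" operator rather than has_key so that it also runs on current Python versions.

```python
def module_name_required(options):
    """True if the module name (-m) is missing but needed for the action.

    options is a dict keyed by command-line switch, as in the patch above.
    'monitor' and 'list' are the only actions allowed without -m.
    """
    action = options.get("-o", "").lower()
    return "-m" not in options and action not in ("monitor", "list")
```

For example, {"-o": "list"} passes without a module name, while {"-o": "off"} would be rejected.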

Comment 22 errata-xmlrpc 2009-05-18 21:15:37 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1050.html