Bug 466788 - Fencing support for Dell M1000e CMC
Summary: Fencing support for Dell M1000e CMC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: fence
Version: 4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Marek Grac
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-10-13 16:17 UTC by Skippy
Modified: 2009-05-28 15:03 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 21:15:37 UTC
Embargoed:


Attachments
Fencing library (17.38 KB, application/octet-stream)
2009-02-10 17:57 UTC, Marek Grac
Fencing agent (3.11 KB, application/octet-stream)
2009-02-10 17:59 UTC, Marek Grac
Log of output while testing fence_drac5.py from comment #5 and #6 (3.32 KB, text/plain)
2009-04-20 19:19 UTC, Vinny Valdez
patch to fencing agent fence_drac5.py in comment #6 (528 bytes, patch)
2009-04-20 19:21 UTC, Vinny Valdez
Fencing agent #2 (3.16 KB, application/octet-stream)
2009-04-22 10:12 UTC, Marek Grac
Testing code from comment #12 (2.66 KB, text/plain)
2009-04-22 16:51 UTC, Vinny Valdez


Links
System: Red Hat Product Errata
ID: RHBA-2009:1050
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Fence bug-fix and enhancement update
Last Updated: 2009-05-18 21:15:16 UTC

Description Skippy 2008-10-13 16:17:02 UTC
Fencing doesn't support the Dell M600 (a box of blades) CMC for turning the power off and on. The CMC is a management facility for the M600 chassis.

Fortunately, this is a problem that is already solved. See my patch for fence_drac here:

https://www.redhat.com/archives/linux-cluster/2008-August/msg00028.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00027.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00026.html

This patch modifies the code for fencing a 1950. It has been checked against a 1950 to confirm that fencing still works there.

Comment 2 mnielsen 2009-01-28 16:36:05 UTC
I've patched my own fence_drac for my environment and verified that this works for the M1000e chassis with M600 blades in it. When is this going to make it into a release?

Comment 3 Marek Grac 2009-02-10 16:40:47 UTC
We are working on integrating these patches into the new infrastructure (drac5.py), the same as in 5.x, with ssh support, better logs, and so on. Can you test our new fence agent (I'll attach it here or send it by mail)? Unfortunately, I don't have direct access to such machines, so I cannot work on it any faster.

Comment 4 mnielsen 2009-02-10 17:08:44 UTC
I'd be happy to test anything. I live in the DEV environment here, so I can get away with breaking stuff if that happens. E-mail or post a link, either way is fine with me. I still have to burn it, v-scan it, and import it to our "disconnected" network.

Comment 5 Marek Grac 2009-02-10 17:57:13 UTC
Created attachment 331445 [details]
Fencing library

Please set PYTHONPATH to the directory containing this library. You will also need
the 'pexpect' package.

Comment 6 Marek Grac 2009-02-10 17:59:30 UTC
Created attachment 331446 [details]
Fencing agent

Example of usage:

./fence_drac5.py -o on -a 1.1.1.1 -l user -p pass -x -v -m server-16 -o status

If possible, please also test -o list (it should print all available devices).
If there are any problems, please be sure to use -v (verbose); everything should be in that output.

Comment 7 Chris Feist 2009-04-17 21:15:26 UTC
We don't have this unit and haven't heard back about testing so I'm bumping this to 4.9.

Comment 9 Vinny Valdez 2009-04-20 19:19:42 UTC
Created attachment 340410 [details]
Log of output while testing fence_drac5.py from comment #5 and #6

I have access to this equipment so I can test anything you need.

I tried the agent posted in comment #5/6, but I was using RHEL 5.3 and cman-2.0.98-1.el5.  I can try with RHEL 4 later this week.

The agent successfully powers off a blade, but does not power it back on.  It will also power on a blade that is already off, but the script fails after the first command it issues.

This is because it is looking for a return text of "ON|OFF" after executing a command, but the blade takes a few seconds to complete the action. The script polls every second, so it actually displays "Powering OFF|ON" for about 4 seconds before displaying "ON|OFF".  I patched it by adding checks for these to the re.compile at line 38; see the attached patch (I'm sure this could be done more elegantly, I just hacked it in to work).
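
For readers without access to the attachment, here is a minimal sketch of the idea behind that change. It is not the attached patch: the real expression at line 38 is not quoted in this report, so both patterns and the parse_status helper below are hypothetical illustrations.

```python
import re

# Hypothetical patterns; the agent's actual expression is not shown in this bug.
# STRICT accepts only the final states, which is what made the script fail
# while the CMC was still reporting a transitional state.
STRICT = re.compile(r"^(?P<state>ON|OFF)$", re.IGNORECASE)

# TOLERANT also accepts "Powering ON" / "Powering OFF".
TOLERANT = re.compile(r"^(?:Powering )?(?P<state>ON|OFF)$", re.IGNORECASE)


def parse_status(line, pattern=TOLERANT):
    """Return 'on' or 'off' if the status line matches the pattern, else None."""
    match = pattern.match(line.strip())
    if match is None:
        return None
    return match.group("state").lower()
```

With the tolerant pattern, a transitional line is treated as the state it is heading towards instead of causing a failure; whether that is the right interpretation for fencing is a separate question.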

Also see attachments showing command results.

Comment 10 Vinny Valdez 2009-04-20 19:21:19 UTC
Created attachment 340411 [details]
patch to fencing agent fence_drac5.py in comment #6

Patch to successfully fence blades in a Dell CMC DRAC.

Comment 12 Marek Grac 2009-04-22 10:12:54 UTC
Created attachment 340710 [details]
Fencing agent #2

We were working on the fencing agent with Bobby Shepherd outside of Bugzilla. The only possible problem is with the 'list' operation, which should show the available machines/modules to work with. As this operation is not part of the Cluster Suite yet, it should not be a problem, but please check whether it works.

Comment 15 Vinny Valdez 2009-04-22 16:47:43 UTC
This worked, thank you.

The only thing I noticed is that when it issues the "powerup" command, it exits after the status returns "Powering On", although the blade is technically not powered on until the status returns "ON".  The "powerdown" path does handle this: it issues additional "powerstatus" commands until the return changes from "Powering Off" to "OFF".  Log to follow.
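
A minimal sketch of the symmetric behaviour described here, assuming a caller-supplied get_status function that returns the current status text. The function name, the timeout, and the interval are hypothetical and are not taken from the agent.

```python
import time


def wait_for_power_state(get_status, wanted, timeout=30.0, interval=1.0,
                         sleep=time.sleep):
    """Poll get_status() until it reports `wanted` or the timeout expires.

    Transitional answers such as "Powering On" do not count as success;
    only an exact (case-insensitive) match on the wanted state does.
    """
    waited = 0.0
    while True:
        status = get_status().strip().upper()
        if status == wanted.upper():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays.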

The list command works as long as I specify a module name, and fails if I don't.

Comment 16 Vinny Valdez 2009-04-22 16:51:41 UTC
Created attachment 340770 [details]
Testing code from comment #12

Comment 17 Vinny Valdez 2009-04-22 18:36:03 UTC
I also noticed that Conga writes "modulename=" in the cluster.conf, when fence_drac5 is looking for "module_name=".  See Bug 496724 for details.

Comment 20 Marek Grac 2009-04-24 10:06:52 UTC
(In reply to comment #15)
We have a very similar situation with LPAR (Starting, Running, Shutting Down), and all of those states count as ON. I believe the same is true for this device. If you use a power fence such as APC, then ON means the machine has power, and it does not matter what the machine itself is doing.

(In reply to comment #17)
I can create a compatibility layer that also accepts 'modulename', but this option in drac5 is new, so we should not create any regression.
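
Such a compatibility layer could look roughly like the sketch below. It assumes the options arrive as a plain dictionary keyed by the cluster.conf attribute names; normalize_options is a hypothetical helper, not part of the fencing library.

```python
def normalize_options(raw):
    """Return a copy of raw with the legacy 'modulename' key mapped to 'module_name'.

    If both spellings are present, the value given as 'module_name' wins.
    """
    options = dict(raw)
    if "modulename" in options:
        legacy = options.pop("modulename")
        options.setdefault("module_name", legacy)
    return options
```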

(In reply to comment #16)
Monitor and list should work without the name of a module. This patch will fix it:

103c103
< 		if 0 == options.has_key("-m"):
---
> 		if 0 == options.has_key("-m") and 0 == ["monitor", "list"].count(options["-o"].lower()):

The problem is that I'm not sure whether listing works as expected. Is there any output when running it without verbose? It should print the available modules.
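
The condition in that patch, restated as a small standalone sketch. It assumes the options are a dictionary keyed by the command-line flags ("-m", "-o"), as in the diff; the helper name and the empty-string default for a missing "-o" are hypothetical.

```python
def module_name_required(options):
    """Return True when the requested action cannot run without a module name (-m)."""
    action = options.get("-o", "").lower()
    # monitor and list operate on the whole chassis, so they need no module name.
    return "-m" not in options and action not in ("monitor", "list")
```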

Comment 22 errata-xmlrpc 2009-05-18 21:15:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1050.html

