466788 – Fencing support for Dell M1000e CMC


Summary: Fencing support for Dell M1000e CMC

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	fence
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Assignee:	Marek Grac
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-10-13 16:17 UTC by Skippy
Modified:	2009-05-28 15:03 UTC
CC List:	12 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-05-18 21:15:37 UTC
Embargoed:

Attachments
Fencing library (17.38 KB, application/octet-stream) 2009-02-10 17:57 UTC, Marek Grac	no flags
Fencing agent (3.11 KB, application/octet-stream) 2009-02-10 17:59 UTC, Marek Grac	no flags
Log of output while testing fence_drac5.py from comment #5 and #6 (3.32 KB, text/plain) 2009-04-20 19:19 UTC, Vinny Valdez	no flags
patch to fencing agent fence_drac5.py in comment #6 (528 bytes, patch) 2009-04-20 19:21 UTC, Vinny Valdez	no flags
Fencing agent #2 (3.16 KB, application/octet-stream) 2009-04-22 10:12 UTC, Marek Grac	no flags
Testing code from comment #12 (2.66 KB, text/plain) 2009-04-22 16:51 UTC, Vinny Valdez	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:1050	0	normal	SHIPPED_LIVE	Fence bug-fix and enhancement update	2009-05-18 21:15:16 UTC

Description Skippy 2008-10-13 16:17:02 UTC

Fencing doesn't support the CMC of the Dell M1000e blade chassis (populated with M600 blades) for turning power off and on. The CMC is the management facility for the chassis.

Fortunately, this is a problem that has already been solved. See my patch for fence_drac here:

https://www.redhat.com/archives/linux-cluster/2008-August/msg00028.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00027.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00026.html

The patch modifies the code used for fencing a 1950, and I have checked against a 1950 that it still works.

Comment 2 mnielsen 2009-01-28 16:36:05 UTC

I've patched my own fence_drac for my environment and verified that this works for the M1000e chassis with M600 blades in it. When is this going to make it into a release?

Comment 3 Marek Grac 2009-02-10 16:40:47 UTC

We are working on integrating these patches into the new infrastructure (drac5.py), the same as in 5.x, which adds ssh support, better logging, etc. Can you test our new fence agent (I'll attach it here or send it by mail)? Unfortunately I don't have direct access to such machines, so I cannot work on it any faster.

Comment 4 mnielsen 2009-02-10 17:08:44 UTC

I'd be happy to test anything. I live in the DEV environment here, so I can get away with breaking stuff if that happens. E-mail it or post a link; either way is fine with me. I still have to burn it, virus-scan it, and import it into our "disconnected" network.

Comment 5 Marek Grac 2009-02-10 17:57:13 UTC

Created attachment 331445 [details]
Fencing library

Please set PYTHONPATH to the directory containing this library. You will also need
the 'pexpect' package.
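Before running the agent, a quick check can confirm that both requirements are met. The sketch below assumes the library is importable under the module name `fencing` (an assumption, not confirmed in this thread) and uses a hypothetical helper, `missing_dependencies`; it only tests imports and never contacts a device.

```python
import importlib
import sys


def missing_dependencies(lib_dir, modules=("fencing", "pexpect")):
    """Return the entries of `modules` that cannot be imported.

    lib_dir is the directory holding the fencing library (assumed module
    name: fencing). Putting it at the front of sys.path has the same effect,
    for this process only, as exporting PYTHONPATH before running the agent.
    """
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    importlib.invalidate_caches()  # pick up files added since startup
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Usage: python check_env.py /path/to/library/dir
    problems = missing_dependencies(sys.argv[1])
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("all dependencies found")
```

Setting PYTHONPATH in the shell remains the approach the comment describes; the script only reports what is still missing.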

Comment 6 Marek Grac 2009-02-10 17:59:30 UTC

Created attachment 331446 [details]
Fencing agent

Example of usage:

./fence_drac5.py -a 1.1.1.1 -l user -p pass -x -v -m server-16 -o status

If possible, please also test -o list (it should print all available devices).
If there are any problems, be sure to use -v (verbose), which should log everything.

Comment 7 Chris Feist 2009-04-17 21:15:26 UTC

We don't have this unit and haven't heard back about testing, so I'm bumping this to 4.9.

Comment 9 Vinny Valdez 2009-04-20 19:19:42 UTC

Created attachment 340410 [details]
Log of output while testing fence_drac5.py from comment #5 and #6

I have access to this equipment so I can test anything you need.

I tried the agent posted in comments #5 and #6, but I was using RHEL 5.3 and cman-2.0.98-1.el5. I can try with RHEL 4 later this week.

The agent successfully powers off a blade, but does not power it back on. It will also power on a blade that is already off, but the script fails after the first command it issues.

This is because it looks for a return text of "ON|OFF" after executing a command. The blade takes a few seconds to complete the action and the script polls every second, so the status actually reads "Powering OFF|ON" for about 4 seconds before it reads "ON|OFF". I patched it by adding checks for these strings to the re.compile at line 38; see the attached patch (I'm sure this could be done more elegantly, I just hacked it in to make it work).

Also see attachments showing command results.
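The failure above comes from matching only the final "ON"/"OFF" text. A minimal sketch of the idea behind the patch, not the agent's actual code: also recognise the transitional "Powering ON"/"Powering OFF" replies, and keep polling once per second until the terminal state appears. `STATUS_RE`, `parse_status`, `wait_for_state`, and the `read_status` callable (standing in for sending the status command over the CMC session) are hypothetical names, and the 20-second timeout is an arbitrary assumption.

```python
import re
import time

# Hypothetical pattern: matches the terminal "ON"/"OFF" as well as the
# transitional "Powering ON"/"Powering OFF" text, case-insensitively.
STATUS_RE = re.compile(r"\b(Powering\s+)?(ON|OFF)\b", re.IGNORECASE)


def parse_status(text):
    """Return (state, transitional), e.g. ("on", True) for "Powering ON"."""
    match = STATUS_RE.search(text)
    if match is None:
        raise ValueError("unrecognised status output: %r" % text)
    return match.group(2).lower(), match.group(1) is not None


def wait_for_state(read_status, target, timeout=20.0, interval=1.0):
    """Poll read_status() until it reports the terminal `target` state.

    read_status is an assumed callable returning the raw status reply.
    Returns True on success, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        state, transitional = parse_status(read_status())
        if state == target and not transitional:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the status reader in as a callable keeps the polling logic testable without a chassis; in the agent itself the reply would come from the pexpect session.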

Comment 10 Vinny Valdez 2009-04-20 19:21:19 UTC

Created attachment 340411 [details]
patch to fencing agent fence_drac5.py in comment #6

Patch to successfully fence blades in a Dell CMC DRAC.

Comment 12 Marek Grac 2009-04-22 10:12:54 UTC

Created attachment 340710 [details]
Fencing agent #2

We have been working on the fencing agent with Bobby Shepherd outside of Bugzilla. The only possible problem is the 'list' operation, which should show you the available machines/modules to work with. As this operation is not part of Cluster Suite yet, it should not be a problem, but please check whether it works.

Comment 15 Vinny Valdez 2009-04-22 16:47:43 UTC

This worked, thank you.

The only thing I noticed is that when it issues the "powerup" command, it exits as soon as the status returns "Powering On", although the blade is not actually powered on until the status returns "ON". The "powerdown" path does handle this: it keeps issuing "powerstatus" until the return changes from "Powering Off" to "OFF". Log to follow.

The list command works as long as I specify a module name, and fails if I don't.

Comment 16 Vinny Valdez 2009-04-22 16:51:41 UTC

Created attachment 340770 [details]
Testing code from comment #12

Comment 17 Vinny Valdez 2009-04-22 18:36:03 UTC

I also noticed that Conga writes "modulename=" in cluster.conf, whereas fence_drac5 looks for "module_name=". See Bug 496724 for details.

Comment 20 Marek Grac 2009-04-24 10:06:52 UTC

comment #15)
We have a very similar situation with LPAR (Starting, Running, Shutting Down), where all of these states count as ON. I believe the same is true for this device. If you use a power fence device such as an APC switch, ON means the machine has power; it does not matter what the machine itself is doing.
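This point can be expressed as a status mapping in which every state where the machine still has power is reported as ON. Only the LPAR state names (Starting, Running, Shutting Down) come from the comment; the helper `power_state`, its parameters, and the choice to raise on an unknown state are assumptions made for this sketch.

```python
# LPAR states that, per the comment above, all mean the partition has power.
LPAR_ON_STATES = frozenset({"starting", "running", "shutting down"})


def power_state(raw, on_states, off_states):
    """Collapse a device-specific status string into "on" or "off".

    on_states / off_states hold lower-case status strings. Unknown strings
    raise ValueError instead of being guessed, so an unexpected firmware
    message is never silently reported as the wrong power state.
    """
    state = raw.strip().lower()
    if state in on_states:
        return "on"
    if state in off_states:
        return "off"
    raise ValueError("unknown status: %r" % raw)
```

Under this reading, the CMC's "Powering On" reply would belong in `on_states`, which is why the agent could return from powerup before the status reads "ON".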

comment #17)
I can add a compatibility layer that also accepts 'modulename', but this option is new in drac5, so changing it should not cause any regression.
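When fenced runs an agent, it passes the device attributes on standard input as name=value lines, so such a compatibility layer could be a simple rename while those lines are parsed. A minimal sketch under that assumption; `OPTION_ALIASES` and `parse_stdin_options` are hypothetical names, not part of the fencing library.

```python
# Hypothetical alias table: accept the attribute name Conga writes as well
# as the one fence_drac5 expects (see Bug 496724).
OPTION_ALIASES = {"modulename": "module_name"}


def parse_stdin_options(lines, aliases=OPTION_ALIASES):
    """Parse "name=value" lines into a dict, renaming aliased keys.

    Blank lines, comment lines, and lines without "=" are skipped.
    """
    options = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        options[aliases.get(name, name)] = value.strip()
    return options
```

If both spellings appear, the later line wins; a stricter version might reject the conflict instead.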

comment #16)
Monitor and list should work without a module name. This patch fixes it:

103c103
< 		if 0 == options.has_key("-m"):
---
> 		if 0 == options.has_key("-m") and 0 == ["monitor", "list"].count(options["-o"].lower()):

The problem is that I'm not sure whether listing works as expected. Does it print anything when run without verbose mode? It should print the available modules.
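On current Python, where `dict.has_key` no longer exists, the patched condition can be written with the `in` operator. This is a sketch using a hypothetical helper, `module_name_missing`, that takes the same kind of dictionary of command-line switches; it is not the shipped code.

```python
# Operations that, per the comment above, should work without a module name.
NO_MODULE_OPS = ("monitor", "list")


def module_name_missing(options):
    """True when -m is absent but the requested operation needs it.

    `options` mirrors the agent's dict of switches, e.g.
    {"-o": "status", "-m": "server-16"}.
    """
    return "-m" not in options and options["-o"].lower() not in NO_MODULE_OPS
```

The agent would report an error and exit when this returns True, exactly where the original `if` sits.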

Comment 22 errata-xmlrpc 2009-05-18 21:15:37 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1050.html
