Fencing doesn't support the CMC of the Dell M600 (a box of blades) for turning the power off and on. The CMC is the management facility for the M600 chassis. Fortunately, this problem has already been solved. See my patch for fence_drac here:
https://www.redhat.com/archives/linux-cluster/2008-August/msg00028.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00027.html
https://www.redhat.com/archives/linux-cluster/2008-August/msg00026.html
This patch modifies the code for fencing a 1950; I have checked it against a 1950 and it still works.
I've patched my own fence_drac for my environment and verified that it works for the M1000e chassis with M600 blades in it. When is this going to make it into a release?
We are working on integrating these patches into the new infrastructure (drac5.py), the same as in 5.x, with SSH support, better logging, etc. Can you test our new fence agent? (I'll attach it here or send it by e-mail.) Unfortunately, I don't have direct access to such machines, so I cannot work on it any faster.
I'd be happy to test anything. I live in the DEV environment here, so I can get away with breaking stuff if that happens. E-mail it or post a link; either way is fine with me. I still have to burn it, virus-scan it, and import it to our "disconnected" network.
Created attachment 331445: Fencing library
Please set PYTHONPATH to the directory containing this library. You will also need the 'pexpect' package.
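Before running the agent, both prerequisites can be checked from Python. This is a minimal sketch for Python 3 (the 2009 agent itself targeted Python 2), assuming the attached library is importable as a module named "fencing"; that module name and the helper check_prereqs are assumptions for illustration, not something stated in this thread.

```python
import importlib
import importlib.util
import sys


def check_prereqs(lib_dir, modules=("fencing", "pexpect")):
    """Report which of the given modules can be found once lib_dir is on the path.

    Adding lib_dir to sys.path has the same effect, inside this process, as
    exporting PYTHONPATH=lib_dir before starting the agent.
    The module name "fencing" is an assumption about the attached library.
    """
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    importlib.invalidate_caches()  # make newly added directories visible
    # find_spec returns None, rather than raising, for a missing top-level module
    return {name: importlib.util.find_spec(name) is not None for name in modules}
```

A result where "fencing" is True but "pexpect" is False would mean the library path is right and only the pexpect package is still missing.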
Created attachment 331446: Fencing agent
Example of usage: ./fence_drac5.py -o on -a 1.1.1.1 -l user -p pass -x -v -m server-16 -o status
If possible, please also test -o list (it should print all available devices). If there are any problems, please be sure to use -v (verbose), which should show everything.
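To run the requested checks (status, on, list) repeatedly, the command line can be assembled in a script. This is a hedged sketch: build_fence_cmd and run_fence are hypothetical helpers, the flag meanings (-o action, -a address, -l login, -p password, -x SSH, -v verbose, -m module) are inferred from the usage example above, and the agent path ./fence_drac5.py is assumed.

```python
import subprocess


def build_fence_cmd(action, address, login, password, module=None,
                    ssh=True, verbose=True, agent="./fence_drac5.py"):
    """Build the argument list for one fence_drac5.py invocation.

    Flags mirror the usage example; -m is optional so that "list" can be
    tried without a module name.
    """
    cmd = [agent, "-o", action, "-a", address, "-l", login, "-p", password]
    if ssh:
        cmd.append("-x")
    if verbose:
        cmd.append("-v")
    if module is not None:
        cmd += ["-m", module]
    return cmd


def run_fence(cmd):
    """Run the agent and return (exit code, combined stdout/stderr text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout
```

For example, looping over ("status", "on", "list") with the placeholder values 1.1.1.1, user, pass and server-16 would exercise each action once. Note that passing -p on the command line makes the password visible in the process list; that comes from the usage example, not from this sketch.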
We don't have this unit and haven't heard back about testing, so I'm bumping this to 4.9.
Created attachment 340410: Log of output while testing fence_drac5.py from comments #5 and #6
I have access to this equipment, so I can test anything you need. I tried the agent posted in comments #5/#6, but I was using RHEL 5.3 and cman-2.0.98-1.el5; I can try with RHEL 4 later this week. The agent successfully powers off a blade but does not power it back on. It will also power on a blade that is already off, but the script fails after the first command it issues. This is because after executing a command it looks for a return text of "ON|OFF", but the blade takes a few seconds to complete the action and the script polls every second, so for about 4 seconds the blade actually reports "Powering OFF|ON" before reporting "ON|OFF". I patched it by adding checks for these strings to the re.compile at line 38; see the attached patch (I'm sure this could be done more elegantly, I just hacked it in to work). Also see the attachments showing the command results.
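The fix described above amounts to letting the status pattern recognize the transitional "Powering ON"/"Powering OFF" text as well as the final "ON"/"OFF". The sketch below is not the attached patch (its line 38 is not reproduced in this thread); the exact reply strings are assumed from the description, and STATUS_RE and parse_status are hypothetical names.

```python
import re

# Matches the final states ("ON", "OFF") and the transitional ones
# ("Powering ON", "Powering OFF"). Case-insensitive because the thread
# quotes both "Powering On" and "Powering OFF".
STATUS_RE = re.compile(r"\b(?:(Powering)\s+)?(ON|OFF)\b", re.IGNORECASE)


def parse_status(output):
    """Return (state, transitional) for one powerstatus reply, or None.

    state is "on" or "off"; transitional is True while the blade still
    reports "Powering ..." instead of the final value.
    """
    match = STATUS_RE.search(output)
    if match is None:
        return None
    return match.group(2).lower(), match.group(1) is not None
```

Returning the transitional flag separately lets the caller decide whether "Powering ON" should count as done, rather than hard-coding that choice into the regular expression.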
Created attachment 340411: patch to fencing agent fence_drac5.py in comment #6
Patch to successfully fence blades in a Dell CMC DRAC.
Created attachment 340710: Fencing agent #2
We have been working on the fencing agent with Bobby Shepherd outside of Bugzilla. The only possible problem is with the 'list' operation, which should show you the available machines/modules to work with. As this operation is not yet part of Cluster Suite, it should not be a problem, but please check whether it works.
This worked, thank you. The only thing I noticed is that when it issues the "powerup" command, it exits as soon as the status returns "Powering On", although the blade is not actually powered on until the status returns "ON". "powerdown" behaves correctly: it issues additional "powerstatus" commands until the return changes from "Powering Off" to "OFF". Log to follow. The list command works as long as I specify a module name, and fails if I don't.
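The asymmetry described above goes away if "Powering On"/"Powering Off" are treated as not yet done in both directions, so the agent keeps polling until the final value appears. A minimal sketch, assuming a caller-supplied get_status function that returns the raw powerstatus reply; wait_for_state, the one-second interval and the 30-second timeout are illustrative choices, not taken from the agent.

```python
import time


def is_final(reply, target):
    """True if the reply reports the final target state ("on" or "off").

    Any reply containing the word "Powering" is still in transition.
    """
    words = reply.upper().split()
    return "POWERING" not in words and target.upper() in words


def wait_for_state(get_status, target, interval=1.0, timeout=30.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports the final target state.

    Returns True on success, False if the timeout expires first.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_final(get_status(), target):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Using the same loop for powerup and powerdown keeps both operations consistent: each returns only after the blade reports "ON" or "OFF".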
Created attachment 340770: Testing code from comment #12
I also noticed that Conga writes "modulename=" in cluster.conf, whereas fence_drac5 looks for "module_name=". See Bug 496724 for details.
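Until Conga and the agent agree on one spelling, an agent could accept both. This is a sketch of that compatibility idea only, not code from fence_drac5 or Conga: ALIASES, parse_stdin_options and normalize_options are hypothetical names, and the input is assumed to be key=value lines such as a fence agent receives on standard input.

```python
# Map alternative spellings onto the name the agent expects (hypothetical table).
ALIASES = {"modulename": "module_name"}


def parse_stdin_options(text):
    """Parse key=value lines into a dict, skipping blanks and comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def normalize_options(options):
    """Return a copy of options with aliased keys renamed.

    If both spellings are present, the canonical key wins.
    """
    result = dict(options)
    for alias, canonical in ALIASES.items():
        if alias in result:
            value = result.pop(alias)
            result.setdefault(canonical, value)
    return result
```

Keeping the alias table separate from the parser means further spellings can be added without touching the parsing code.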
Re comment #15: We have a very similar situation with LPAR (Starting, Running, Shutting Down), and all of those states count as ON. I believe the same is true for this device. If you use a power fence such as APC, ON means the machine has power; it does not matter what the machine itself is doing.
Re comment #17: I can create a compatibility layer that also accepts 'modulename', but because this option is new in drac5, we should not be introducing any regression.
Re comment #16: Monitor and list should work without a module name. This patch will fix it:
103c103
< if 0 == options.has_key("-m"):
---
> if 0 == options.has_key("-m") and 0 == ["monitor", "list"].count(options["-o"].lower()):
The remaining problem is that I'm not sure listing works as expected. Does running it without -v (verbose) produce any output? It should print the available modules.
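The check in that patch can also be written as a small standalone function, which makes its intent (a module name is mandatory except for monitor and list) easy to test. A sketch for Python 3, where dict.has_key no longer exists; require_module and NO_MODULE_ACTIONS are hypothetical names, and raising ValueError stands in for however the agent actually reports usage errors.

```python
# Actions that work on the chassis as a whole, so no blade module is needed.
NO_MODULE_ACTIONS = ("monitor", "list")


def require_module(options):
    """Raise ValueError unless -m is given or the action does not need one."""
    action = options.get("-o", "").lower()
    if "-m" not in options and action not in NO_MODULE_ACTIONS:
        raise ValueError("a module name (-m) is required for action %r" % action)
```

Calling it with {"-o": "list"} passes, while {"-o": "status"} without "-m" raises.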
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1050.html