Description of problem: A customer rep asks: Are there any plans to expand the list of DRACs supported by the fence_drac agent beyond the DRAC III/XT and DRAC/MC? The PowerEdge 1850 (and other newer servers) uses the DRAC 4/I, which is unsupported and does not work with the current fencing agent. First, getmodinfo is an unsupported command on the 4/I. However, a firmware upgrade (to v1.35) restored support for the command, which fixes that problem. Even with the command available, though, the fencing agent still does not work: it establishes the telnet session, gets the "unable to determine power state" error, and quits.
Created attachment 123406 [details] Patch to fence_drac from RHEL4U2 branch
Thanks for the patch. It required one minor amendment and then worked with 4/I firmware 1.20, but the field now reports that it does not work with the latest firmware release (1.30). One other note: this agent will NOT work with the DRAC 4/P, which I think is unfenceable, as it does not provide a way to determine power status.
I'm running Dell PowerEdge 1955 blade servers in a chassis with DRAC/MC firmware 1.3, and I'm seeing a similar problem. fence_drac is able to power the blade off and on, but it does not return the correct status after the power is switched off or on.
Example command issued:
    # fence_drac -a 10.0.0.20 -l username -p password -D debug.txt -m Server-10 -v -o off
    detected drac version 'DRAC/MC'
    failed: telnet returned: pattern match timed-out
Result: The server is shut off harshly (i.e., about 3 services are shut down in init 6, then power is cut to the machine).
Problem: It's great that the server is getting shut down, but fence_node gets thrown into a death loop, repeatedly rebooting the server using fence_drac as the fence program.
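The loop described above can be illustrated with a small simulation. This is a sketch in Python, not code from fence_drac or fence_node: `make_agent`, `fence_with_retries`, and the retry-until-success behaviour of the caller are all assumptions made for illustration. It shows how an agent that performs the power action but cannot confirm it reports failure, so a caller that retries on failure repeats the power action each time.

```python
def make_agent(power_off, confirm_off):
    """Hypothetical agent: exit code 0 only when the power-off is confirmed."""
    def agent():
        power_off()  # the action itself succeeds
        try:
            return 0 if confirm_off() else 1
        except TimeoutError:
            return 1  # action happened, but confirmation never arrived
    return agent


def fence_with_retries(agent, max_attempts):
    """Hypothetical caller: re-run the agent until it exits 0 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if agent() == 0:
            return attempt
    return None


if __name__ == "__main__":
    power_cuts = []

    def power_off():
        power_cuts.append("off")

    def confirm_off():
        # Stands in for the status check that never matches in time.
        raise TimeoutError("pattern match timed-out")

    result = fence_with_retries(make_agent(power_off, confirm_off), max_attempts=3)
    print(result, len(power_cuts))
```

In this sketch the power action runs once per attempt even though the first attempt already cut power, which matches the symptom reported here: the node is fenced, but the failure exit status triggers another round.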
I fixed the 1.3 problem. It was caused by the telnet interface being significantly slower in 1.3 than in 1.2, so the 5-second value of $telnet_timeout was too short.
/sbin/fence_drac, line 33:
    From: my $telnet_timeout = 5; # Seconds to wait for matching telent response
    To:   my $telnet_timeout = 10; # Seconds to wait for matching telent response
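The effect of the timeout change can be shown with a minimal simulation. This is a Python sketch, not the agent's Perl code: `wait_for`, `make_slow_console`, and the `status: OFF` response text are hypothetical stand-ins, and the timings are scaled down so the example runs quickly. It shows the same pattern wait failing with a short timeout and succeeding with a longer one when the device is slow to answer.

```python
import re
import time


def wait_for(read_chunk, pattern, timeout, poll=0.01):
    """Collect output from read_chunk() until pattern matches or timeout (seconds) expires."""
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    buf = ""
    while True:
        buf += read_chunk()
        match = regex.search(buf)
        if match:
            return match
        if time.monotonic() >= deadline:
            raise TimeoutError("pattern match timed-out")
        time.sleep(poll)


def make_slow_console(response, delay):
    """Simulated console (hypothetical) that emits response once, delay seconds after creation."""
    ready_at = time.monotonic() + delay
    sent = False

    def read_chunk():
        nonlocal sent
        if not sent and time.monotonic() >= ready_at:
            sent = True
            return response
        return ""

    return read_chunk


if __name__ == "__main__":
    # Scaled-down timings: the simulated console answers after 0.2 s.
    for timeout in (0.1, 0.4):
        console = make_slow_console("status: OFF\n", delay=0.2)
        try:
            wait_for(console, r"status: (ON|OFF)", timeout)
            print(f"timeout={timeout}: matched")
        except TimeoutError as exc:
            print(f"timeout={timeout}: {exc}")
```

A fixed, larger timeout is the simplest fix, as applied above. The trade-off is that a genuinely unreachable device takes longer to be reported as failed.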
4/I support has been added to the agent, and the telnet timeout has been increased. In addition, DRAC 4/P support has been added and has been tested with 4/P firmware version 1.40.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0138.html