From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Description of problem: We will release HA for Power, but we have no fence script for the HMC of Power, which provides an HTTPS interface to power the server on and off. Customers in China will deploy it as soon as RHCS 4.5 for Power is released.
Version-Release number of selected component (if applicable): RHEL4.5
How reproducible: Always
Steps to Reproduce: See Description of problem.
Actual Results: See Description of problem.
Expected Results: See Description of problem.
Additional info:
Without a specification that describes a method for remote management of this system, labor estimates are meaningless. Can we please have a link to where the HTTPS connection protocol for these systems is documented, rather than having engineers google for the information?
Jim, I put Scott Moser as CC on this bug. He is the IBM system p on-site engineer and can help you out.
Excluding Blade systems, which are managed by the BladeCenter, each Power system has an FSP in it. The FSP in POWER5 systems is addressable via the two ethernet ports labeled HMC1 and HMC2. There are two supported ways of communicating with the FSP:

1. HTTPS to the FSP. This provides the ability to configure the FSP (such as configuring which IP addresses to use) and limited other functions. It cannot be used to manage logical partitions.

2. HMC (Hardware Management Console). An HMC is an x86 machine that manages one or more Power machines by communicating with the FSP. The method of communication is not published. The HMC allows the user to manage the machines in two ways: a) a remote or console GUI (WSM), and b) command-line access to a group of commands. Setting up command-line access, and the commands that are available, are documented at http://www.redbooks.ibm.com/infocenter/eserver/v1r3s/index.jsp?topic=/iphai/usingtheremotecommandline.htm and http://www.redbooks.ibm.com/infocenter/eserver/v1r3s/index.jsp?topic=/iphcx/hmckickoff.htm . This interface is the most suitable for remote control of a managed system or its Logical PARtitions (LPARs).
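As a rough sketch of option 2 above: the HMC remote command line is reached over ssh, and LPAR state can be queried with `lssyscfg` (per the InfoCenter links). The hostname, user, managed-system name, and partition name below are placeholders, and the exact `lssyscfg` flags should be checked against the linked documentation:

```python
# Sketch: querying LPAR state through the HMC remote command line.
# lssyscfg is the HMC command documented at the InfoCenter links above;
# the host, user, and names here are illustrative placeholders only.
import shlex

def hmc_query_cmd(managed_system, lpar):
    """Build an lssyscfg invocation reporting an LPAR's name and state."""
    return ("lssyscfg -r lpar -m %s -F name,state --filter lpar_names=%s"
            % (shlex.quote(managed_system), shlex.quote(lpar)))

def hmc_ssh_argv(hmc_host, user, remote_cmd):
    """argv for running an HMC command remotely, e.g. via subprocess.run()."""
    return ["ssh", "-l", user, hmc_host, remote_cmd]

print(hmc_ssh_argv("hmc1.example.com", "hscroot",
                   hmc_query_cmd("Server-9117", "lpar01")))
```

A fence agent would run this via ssh and parse the `name,state` output to decide whether the partition is powered on.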
The hardware we are using to qualify GFS and Cluster Suite on ppc does not include an HMC. We are trying to use fence_apc but the reboot time for the hardware is much longer when the power is cut and it is causing timeouts during our testing.
Is there any chance we can get access to the "unpublished" protocol for talking to the FSP via the HMC ports? Adding the cost of an HMC would certainly deter customers from choosing RHCS. Also, what is the performance hit from having to talk to the HMC vs direct to the FSP?
(In reply to comment #6)
> Is there any chance we can get access to the "unpublished" protocol for talking
> to the FSP via the HMC ports?

I can't speak for certain, but there are no products that manage a Power system without requiring an HMC. I would highly doubt that anyone would be interested in releasing the protocol of communication between the HMC and the FSP. If nothing else, that would end up requiring backwards compatibility or support.

> Adding the cost of an HMC would certainly deter
> customers from choosing RHCS.

Because there is no way to partition a Power system other than by using an HMC, any higher-end system is almost certainly controlled by an HMC; the cost of the hardware simply wouldn't be justified without partitioning. For lower-end Power systems, it is possible that the machine is configured without an HMC and runs a single OS on "bare metal". I do not know how common this situation is. There, the only way to remotely power cycle the machine would be via the ASM (HTTPS). To do so, you would need some HTTP client that would log in as 'admin' (default password is 'admin') and request a reboot that way.

> Also, what is the performance hit from having to
> talk to the HMC vs direct to the FSP?

I'm sure that control via the HMC is not the absolute fastest mechanism, but I do not believe the overhead is significant when compared to the boot times of the system. Additionally, because there is no "direct to the FSP" option, there really is no comparison to be made.
This feature has not gotten any attention in some time, but is still very much of interest. Any updates? Anything IBM can do to assist?
Created attachment 306521 [details] Fence agent for HMC / LPAR
Created attachment 306522 [details] Fence library
The fence agent for LPAR / HMC is ready. It uses ssh instead of https, but this should not be a problem, as command-line interfaces tend to be more stable over time than parsed web pages.
I took a look at the fencing agent and have a couple of minor comments.

In get_power_status, it appears as if this function expects the state of the LPAR to be either Running or Not Activated. It's unclear what happens if the state is something else, like Error, which would be the state of the partition if it were to panic.

I'm not sure what the expected behavior of set_power_status is when the desired action is to power off the machine, but the current implementation will effectively flip the virtual power switch on the LPAR. If a graceful, OS-directed shutdown is preferred, there is a separate syntax for that.

Other than those two questions/comments, the script looks fine from an HMC command usage perspective.
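A minimal sketch of the state mapping discussed above, assuming the agent classifies every state the HMC can report instead of only Running / Not Activated. Treating Error (and any other non-off state) as "on" is an assumption here, chosen because a fence agent must only report "off" when the partition is definitely powered off:

```python
# Sketch: map an HMC-reported LPAR state to a fencing power status.
# "Running" and "Not Activated" are the two states the agent already
# handles; reporting "on" for anything else (Error, Open Firmware, ...)
# is a conservative assumption, since a panicked partition may still
# hold cluster resources and must be fenced.
def power_status(lpar_state):
    if lpar_state == "Not Activated":
        return "off"
    # Running, Error, Open Firmware, Starting, ...: not safely off.
    return "on"
```

With this mapping, a partition that panicked into the Error state would still be power-cycled rather than causing the agent to fail with an unrecognized state.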
I'll try to fix get_power_status. But set_power_status works the same as on the other fencing devices (e.g. APC): we just cut the power (STONITH). If you think you will also need graceful shutdown, it is not a problem to add it, but IMHO it can't be a supported solution.
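The two behaviors discussed above map to two `chsysstate` variants on the HMC command line. A hedged sketch follows; the managed-system and partition names are placeholders, and the exact flag spelling (`--immed`, `osshutdown`) should be verified against the HMC command reference linked earlier in this bug:

```python
# Sketch: the two chsysstate power-off variants. STONITH-style fencing
# uses the immediate form (pull the virtual power switch); the graceful,
# OS-directed form exists but, as noted above, is not what a fence
# agent should rely on.
def chsysstate_cmd(managed_system, lpar, graceful=False):
    if graceful:
        op = "osshutdown"        # ask the OS to shut down cleanly
    else:
        op = "shutdown --immed"  # flip the virtual power switch
    return "chsysstate -r lpar -m %s -o %s -n %s" % (managed_system, op, lpar)
```

The fence agent attached to this bug would issue the non-graceful form over its ssh session to the HMC.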
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1050.html