Bug 237266 - ask for a fence script for Power server's HMC
Summary: ask for a fence script for Power server's HMC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: fence
Version: 4
Hardware: ppc64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Marek Grac
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 448133
 
Reported: 2007-04-20 15:03 UTC by Leon Li
Modified: 2009-05-18 21:15 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 21:15:52 UTC
Embargoed:


Attachments
Fence agent for HMC / LPAR (2.40 KB, text/x-python)
2008-05-23 16:56 UTC, Marek Grac
no flags Details
Fence library (10.54 KB, text/x-python)
2008-05-23 16:56 UTC, Marek Grac
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1050 0 normal SHIPPED_LIVE Fence bug-fix and enhancement update 2009-05-18 21:15:16 UTC

Description Leon Li 2007-04-20 15:03:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2

Description of problem:
We will release HA for Power, but we have no fence script for the HMC of Power servers, which provides an HTTPS interface to power the server off and on.

Customers in China will deploy it as soon as RHCS 4.5 for Power is released.

Version-Release number of selected component (if applicable):
RHEL4.5

How reproducible:
Always


Steps to Reproduce:
We will release HA for Power, but we have no fence script for the HMC of Power servers, which provides an HTTPS interface to power the server off and on.

Customers in China will deploy it as soon as RHCS 4.5 for Power is released.

Actual Results:
We will release HA for Power, but we have no fence script for the HMC of Power servers, which provides an HTTPS interface to power the server off and on.

Customers in China will deploy it as soon as RHCS 4.5 for Power is released.

Expected Results:
We will release HA for Power, but we have no fence script for the HMC of Power servers, which provides an HTTPS interface to power the server off and on.

Customers in China will deploy it as soon as RHCS 4.5 for Power is released.

Additional info:

Comment 2 Jim Parsons 2007-04-20 15:32:39 UTC
Without a specification that describes a method for remote management of this
system, labor estimates are meaningless. 

Can we please have a link to where the HTTPS connection protocol for these
systems is documented, rather than having engineers google for the information?

Comment 3 Konrad Rzeszutek 2007-04-20 16:21:08 UTC
Jim,
I put Scott Moser as CC on this bug. He is the IBM system p on-site engineer and
can help you out.

Comment 4 Scott Moser 2007-04-20 17:25:51 UTC
Excluding Blade systems, which are managed by the BladeCenter, each power system
has an FSP in it.  The FSP in power5 systems is addressable via the 2 ethernet
ports labeled HMC1 and HMC2.  There are 2 supported ways of communicating with
the FSP:
  1. https to the FSP.  This provides the ability to configure the FSP [such as
configuring which IP addresses to use] and to perform a limited set of other
functions.  It cannot be used to manage logical partitions.
  2. HMC (Hardware Management Console). An HMC is an x86 machine that manages 1
or more power machines by communicating with the FSP.  The method of
communication is not published.  The HMC allows the user to manage the machines
in two ways: a.) a remote or console GUI (WSM) and b.) command-line access to a
group of commands.

  Setting up command-line access, and the commands that are available, are
documented at
http://www.redbooks.ibm.com/infocenter/eserver/v1r3s/index.jsp?topic=/iphai/usingtheremotecommandline.htm
and
http://www.redbooks.ibm.com/infocenter/eserver/v1r3s/index.jsp?topic=/iphcx/hmckickoff.htm.
This interface is the most suitable for remote control of a managed system or
its Logical PARtitions.



Comment 5 Nate Straz 2007-04-20 22:48:33 UTC
The hardware we are using to qualify GFS and Cluster Suite on ppc does not
include an HMC.  We are trying to use fence_apc but the reboot time for the
hardware is much longer when the power is cut and it is causing timeouts during
our testing.

Comment 6 Nate Straz 2007-04-23 18:48:53 UTC
Is there any chance we can get access to the "unpublished" protocol for talking
to the FSP via the HMC ports?  Adding the cost of an HMC would certainly deter
customers from choosing RHCS.  Also, what is the performance hit from having to
talk to the HMC vs direct to the FSP?

Comment 7 Scott Moser 2007-04-23 19:54:00 UTC
(In reply to comment #6)
> Is there any chance we can get access to the "unpublished" protocol for talking
> to the FSP via the HMC ports? 

I can't speak for certain, but there are no products that manage a Power system
without requiring an HMC.  I would highly doubt that anyone would be
interested in releasing the protocol of communication between HMC and FSP.  If
nothing else, that would end up requiring backwards compatibility or support.

> Adding the cost of an HMC would certainly deter
> customers from choosing RHCS.  

Because there is no way to partition a Power system other than by using an HMC,
any higher end system is almost certainly controlled by an HMC.  The cost of the
hardware simply wouldn't be justified without partitioning.

For lower end Power systems, it is possible that the machine is configured
without a HMC and runs a single OS on "bare metal".  I do not know how common
this situation is.  Here, the only way to remotely power cycle the machine would
be via the ASM (https).  To do so, you would need some HTTP client that would
log in as 'admin' (default password is 'admin') and request a reboot that way.

> Also, what is the performance hit from having to
> talk to the HMC vs direct to the FSP?

I'm sure that control via the HMC is not the absolute fastest mechanism, but I
do not believe the overhead is significant when compared to the boot times of
the system.  Additionally, because there is no "direct to the FSP" option, there
really is no comparison to be made.


Comment 8 Brad Peters 2008-03-11 21:13:03 UTC
This feature has not gotten any attention in some time, but is still very much
of interest.  

Any updates?  Anything IBM can do to assist?

Comment 10 Marek Grac 2008-05-23 16:56:25 UTC
Created attachment 306521 [details]
Fence agent for HMC / LPAR

Comment 11 Marek Grac 2008-05-23 16:56:52 UTC
Created attachment 306522 [details]
Fence library

Comment 12 Marek Grac 2008-05-23 17:13:55 UTC
Fence agent for LPAR / HMC is ready. It uses ssh instead of https, but this
should not be a problem, as command-line interfaces tend to be more stable over
time than parsing web pages.

Comment 13 Brian King 2008-07-10 18:49:36 UTC
I took a look at the fencing agent and have a couple minor comments.

In get_power_status, it appears as if this function expects the state of the
LPAR to be either Running or Not Activated. It's unclear what happens if the
state is something else, like Error, which would be the state of the partition
if it were to panic.

I'm not sure what the expected behavior of set_power_status is when the desired
action is to power off the machine, but the current implementation will
effectively flip the virtual power switch on the LPAR. If a graceful, OS
directed shutdown is preferred, there is a separate syntax for that.

Other than those two questions/comments, the script looks fine from an HMC
command usage perspective.
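The state-handling concern above can be sketched as follows. The "Running" and "Not Activated" strings come from the comments in this bug; treating any other state (such as Error) as an explicit failure is the stricter behavior being requested here, not necessarily what the attached agent currently does:

```python
# Map an HMC-reported LPAR state to the fence agent's notion of power.
# "Running" and "Not Activated" are the two states discussed above; any
# other state (e.g. Error after a partition panic) is surfaced as an
# error instead of being silently treated as on or off.

def power_status(lpar_state):
    state = lpar_state.strip()
    if state == "Running":
        return "on"
    if state == "Not Activated":
        return "off"
    raise ValueError("unexpected LPAR state: %r" % state)
```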

Comment 14 Marek Grac 2008-07-11 09:01:04 UTC
I'll try to fix get_power_status. But set_power_status works the same as on the
other fencing devices (e.g. APC): we just cut the power off - STONITH. If you
think you will also need a graceful shutdown, it is not a problem to add it,
but IMHO it can't be a supported solution.
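The distinction being discussed (STONITH-style power cut versus OS-directed shutdown) maps onto two different `chsysstate` operations on the HMC command line. The operation names below come from the HMC documentation as I understand it and should be treated as a sketch, not the attached agent's exact implementation:

```python
# Two shutdown flavours on the HMC command line:
#   -o shutdown --immed : immediate, the virtual power switch (STONITH)
#   -o osshutdown       : graceful, OS-directed shutdown

def shutdown_cmd(managed_system, lpar, graceful=False):
    """Build the chsysstate command for either shutdown flavour."""
    op = "osshutdown" if graceful else "shutdown --immed"
    return "chsysstate -r lpar -m %s -o %s -n %s" % (managed_system, op, lpar)
```

A fence agent would default to the immediate form, since fencing must not depend on the victim's OS cooperating.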

Comment 19 errata-xmlrpc 2009-05-18 21:15:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1050.html

