103721 – segmentation fault in clustonith -v -l with WTI serial power switches

Summary: segmentation fault in clustonith -v -l with WTI serial power switches

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	clumanager
Sub Component:
Version:	2.1
Hardware:	i586
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Lon Hohberger
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	87937

Reported:	2003-09-04 12:10 UTC by Alexander Landgraf
Modified:	2007-11-30 22:06 UTC
CC List:	2 users
Fixed In Version:	1.0.25-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-12-10 20:27:10 UTC
Target Upstream Version:
Embargoed:

Attachments
Patch to rps10.c to fix segmentation fault (353 bytes, patch) 2003-09-04 13:07 UTC, Lon Hohberger
ser output talking to a WTI RPS10 power switch for debugging purposes (382 bytes, text/plain) 2003-10-10 06:59 UTC, Alexander Landgraf
Patch to change expect-text to allow "PRS" (964 bytes, patch) 2003-10-10 12:53 UTC, Lon Hohberger

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2003:330	0	normal	SHIPPED_LIVE	New clumanager package fixes various bugs	2003-12-19 05:00:00 UTC

Description Alexander Landgraf 2003-09-04 12:10:47 UTC

Description of problem:

The rps10.so library always gives me a "segmentation fault" on execution,
even if I only try to query it. Running "clustonith -v -l" gives me the
following output (debugging turned on):

-----------------------------------------------------------------------------

[root@eval13 stonithlib]# clustonith -v -l
Determining Switch Type...st_new entered.
st_new returning.
st_new entered.
st_new returning.
st_status entered (WTI_RPS10)
st_status calling RPSConnect (WTI_RPS10)
RPSConnect entered.
Calling dtrtoggle (WTI_RPS10)
dtrtoggle Complete (WTI_RPS10)
RPSConnect_Ready: Waiting for Ready
RPSConnect_Complete: sending status command
Sending *?
Segmentation fault

Since I added some more debugging code to the sources (printf calls only), I can see
that the problem occurs when the FD_SET calls are executed.

-----------------------------------------------------------------------------

static int
RPSSendCommand (struct WTI_RPS10 *ctx, int outlet, char command, int timeout)
{
        char            writebuf[10]; /* all commands are 9 chars long! */
        int             return_val;  /* system call result */
        fd_set          wfds, xfds;  /* lists of FDs for select() */
        struct timeval  tv;          /* select() timeout */
        FD_ZERO(&wfds);
        FD_ZERO(&xfds);
        if (outlet == 10) {        /* Send to ALL outlets */
          snprintf (writebuf, sizeof(writebuf), "%s*%c\r",
                    WTIpassword, command);
        } else {
          snprintf (writebuf, sizeof(writebuf), "%s%d%c\r",
                    WTIpassword, outlet, command);
        }
        if (gbl_debug) printf ("Sending %s\n", writebuf);
        /* Make sure the serial port won't block on us. use select()  */
        FD_SET(ctx->fd, &wfds);
        if (gbl_debug) printf ("FD_SET on wfds done");
        FD_SET(ctx->fd, &xfds);
        if (gbl_debug) printf ("FD_SET on xfds done");
        tv.tv_sec = timeout;
        tv.tv_usec = 0;
        return_val = select(ctx->fd+1, NULL, &wfds,&xfds, &tv);
        if (gbl_debug) printf ("select done");

-----------------------------------------------------------------------------

Commenting out the two FD_SET calls gives the following output:

[root@eval13 stonithlib]# clustonith -v -l
Determining Switch Type...st_new entered.
st_new returning.
st_new entered.
st_new returning.
st_status entered (WTI_RPS10)
st_status calling RPSConnect (WTI_RPS10)
RPSConnect entered.
Calling dtrtoggle (WTI_RPS10)
dtrtoggle Complete (WTI_RPS10)
RPSConnect_Ready: Waiting for Ready
RPSConnect_Complete: sending status command
Sending *?
FD_SET on wfds done
FD_SET on xfds done
select done
FAILED
Unable to determine power switch type.
Unable to determine default power switch type.

-----------------------------------------------------------------------------

And I get that in syslog:
Aug 15 11:19:02 eval13 clustonith: Did not find string: 'RPS-10 Ready' fromWTI RPS10 Power Switch.
Aug 15 11:19:12 eval13 clustonith: WTI_RPS10: Timeout writing to /dev/ttyS0
Aug 15 11:19:12 eval13 clustonith[2239]: <err> clu_stonith_check: stonith device with IPaddr eval7 has bad status
Aug 15 11:19:12 eval13 clustonith[2239]: <err> clu_stonith_init: failed clu_stonith_check().
Aug 15 11:19:12 eval13 clustonith[2239]: <err> clu_stonith_type: failed init.

-----------------------------------------------------------------------------

But accessing the RPS-10 through /dev/ttyS0 via Minicom works fine.
Any idea?
 
Best regards,
 
Alex.

***************************************************
Alexander Landgraf
Senior System Engineer
Advanced UniByte GmbH
Birnenweg 15
72766 Reutlingen
Voice:  +49 7121/483-281
Fax:    +49 7121/483-289
email:  alexander.landgraf
WWW:    http://www.advanced-unibyte.de
***************************************************


Version-Release number of selected component (if applicable):

clumanager-1.0.19-2


How reproducible: always


Steps to Reproduce:
1. Run clustonith -v -l with WTI serial power switches.
    
Actual results:


Expected results:


Additional info:

Comment 1 Lon Hohberger 2003-09-04 13:05:36 UTC

I bet that it's the same behavior as:

http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=92460
(this one was in cluquorumd though)

There was a nasty bug in RPSConnectComplete where the bounds weren't being
checked on the file descriptor, causing a segmentation fault.

Try the following RPM. Note that it's not an official Red Hat erratum, but it may
solve the problem. Let me know whether it does. Also, remember that the RPS-10
doesn't have the notion of a host list the way that network power switches do.

Comment 2 Lon Hohberger 2003-09-04 13:07:56 UTC

Created attachment 94197
Patch to rps10.c to fix segmentation fault

Comment 3 Alexander Landgraf 2003-09-04 13:14:17 UTC

syslog(LOG_ERR, "%s: device %s is not open!", WTIid, ctx->device);

... and what happened, and what will happen, if "the device is not open"?

Alex.

Comment 4 Lon Hohberger 2003-09-04 13:17:47 UTC

These are unofficial testing RPMs and so are unsupportable by Red Hat Support:

http://people.redhat.com/lhh/.testing/clumanager-1.0.23-0.5.i386.rpm
http://people.redhat.com/lhh/.testing/clumanager-1.0.23-0.5.src.rpm

Comment 5 Alexander Landgraf 2003-09-04 13:23:08 UTC

Well. And do you think that patching rps10.c in clumanager-1.0.19-xx solves
the problem that the cluster service doesn't start when the RPS-10
switches are configured in cluster.conf? In other words, will the cluster
services themselves behave correctly once the rps10.so library is patched
that way? Does it recognize the switches, and is it able to power-cycle
the nodes?

:)

Alex.

Comment 6 Alexander Landgraf 2003-09-04 13:24:13 UTC

... and I can't use clumanager 1.0.23-0.5 as long as it is unsupported. Does the
rps10 module from 1.0.23-xx work in 1.0.19?

Alex.

Comment 7 Lon Hohberger 2003-09-04 13:38:47 UTC

The segfault should be fixed; look into the other funny behavior ;)

Comment 8 Alexander Landgraf 2003-09-04 13:49:56 UTC

The segfault should be fixed, but will it work with the old clumanager? And what
other funny behavior is already known?

Comment 9 Lon Hohberger 2003-09-04 14:26:49 UTC

The patch I sent fixes a segfault which occurs when the STONITH initialization
fails.  Generally, this is caused by the STONITH module's inability to have
exclusive access to the device or the inability to set the proper mode.  These
can be caused by one or more of the following:

- Pointing at the wrong /dev/ttySx in the cluster configuration
- Kernel has serial console enabled on the same device connected to the RPS-10
- minicom, getty, agetty, or other program has the serial port open for some reason
- Port not connected.

The code is fairly well-tested; this might be a configuration problem.

Here's the output I get from 1.0.19-2 with everything configured properly:

[root@wind utils]# clustonith -l -v -v -v -v
Checking cluster state...Not Active
Reading cluster config file
Determining local node id...1
Determining Switch Type...WTI RPS10 Power Switch (#1)
STONITH stonith_new("rps10")...0x8083ce8
wind
STONITH destroy(0x8083ce8)

If I unplug the RPS10 module or have another application open which has access
to the port, I get:

[root@wind utils]# clustonith -l -v -v -v
Checking cluster state...Not Active
Reading cluster config file
Determining local node id...1
Determining Switch Type...Segmentation fault

This segmentation fault is in RPSSendCommand, at the first FD_SET on line 327
(of the unmodified rps10.c from 1.0.19), and is consistent with what you are seeing.

FYI, the other funny behavior fixed in 1.0.23 is the fact that clustonith
reports some operations for the "wrong" host (note that it reported for "wind").
 This isn't really a bug, but it throws some people off.

Comment 10 Alexander Landgraf 2003-09-04 15:09:06 UTC

Well. But I used minicom to verify that the correct port is used, closed
minicom, and only then ran clustonith. I definitely had no configuration error.
All of ...

- Pointing at the wrong /dev/ttySx in the cluster configuration -> NO
- Kernel has serial console enabled on the same device connected to the RPS-10 -> NO
- minicom, getty, agetty, or other program has the serial port open for some reason -> NO (lsof)
- Port not connected -> NOT EVEN

... was properly checked :(. Should the segfault occur anyway? To me the
error seems to occur in the FD_SET call, not in the send. Is that possible?

Alex.

Comment 11 Lon Hohberger 2003-09-04 17:29:46 UTC

With the above fix, the segmentation fault doesn't occur (i.e., the FD_SET never
happens, so there is no segfault). I cannot, however, reproduce your
specific behavior with everything in place as I know it.

As a last-ditch effort, ensure that the DIP switches on your RPS10 are
configured as follows:

1: Down
2: Down
3: Up
4: Down

The STONITH module _only_ runs at 9600bps.
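For reference, here is a sketch of what "9600 bps only" means at the termios level. The function name is hypothetical, and the flags are a typical raw 8N1 setup chosen as an assumption; this report does not show which flags rps10.c actually sets. The open, set-attributes and flush steps roughly correspond to the three kinds of port setup failure the module can log.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the STONITH module's code: open `device` and
 * set it to 9600 bps, 8 data bits, no parity, 1 stop bit, raw mode.
 * Returns the fd, or -1 with errno from the call that failed.
 */
static int open_serial_9600(const char *device)
{
        struct termios tio;
        int saved_errno;
        int fd = open(device, O_RDWR | O_NOCTTY | O_NONBLOCK);

        if (fd < 0)
                return -1;

        /* Fails with ENOTTY if the device is not a terminal */
        if (tcgetattr(fd, &tio) < 0)
                goto fail;

        /* Raw mode: no echo, no line editing, no CR/LF translation */
        tio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                         INLCR | IGNCR | ICRNL | IXON);
        tio.c_oflag &= ~OPOST;
        tio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);

        /* 8N1; CLOCAL ignores modem status lines such as DCD */
        tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
        tio.c_cflag |= CS8 | CLOCAL | CREAD;

        if (cfsetispeed(&tio, B9600) < 0 || cfsetospeed(&tio, B9600) < 0)
                goto fail;
        if (tcsetattr(fd, TCSANOW, &tio) < 0)
                goto fail;
        /* Discard stale data before talking to the switch */
        if (tcflush(fd, TCIOFLUSH) < 0)
                goto fail;
        return fd;

fail:
        saved_errno = errno;    /* close() must not clobber the real error */
        close(fd);
        errno = saved_errno;
        return -1;
}
```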

Failing that, can you file an incident with Red Hat Support for me?

https://enterprise.redhat.com/issue-tracker/

Comment 12 Alexander Landgraf 2003-09-04 19:12:11 UTC

The DIP switches are set up correctly. I'm also communicating at 9600 baud under minicom.

What do you mean by "Failing that, can you file an incident with Red Hat
Support for me?"? By "Failing that", do you mean that if that's also set up
correctly, I should fill out the form? (Sorry, I'm German.)

Alex.

Comment 13 Lon Hohberger 2003-09-05 12:52:13 UTC

Yes, please file an incident with RH Support so this can be properly tracked. 
They're better equipped to isolate the problem and how to reproduce it.  (For
instance, we both have proper setups, but yours doesn't work and mine does.)

http://enterprise.redhat.com/issue-tracker

Please include in the incident:
- output from 'sysreport'
- contents of /etc/cluster.conf from both members (they should be the same, but
just in case...)

Comment 14 Alexander Landgraf 2003-09-09 06:24:20 UTC

Okay, I have the cluster.conf files and the sysreports, but I can't get into
http://enterprise.redhat.com/issue-tracker. What do I need to do to get access? Am I
actually able to?

Best regards,

Alex.

Comment 17 Lon Hohberger 2003-10-08 21:52:27 UTC

I really don't think the segfault is what's causing the power switch problems.     

Here's why: as you know, the segmentation fault occurs in FD_SET when
ctx->fd == -1. Indexing an array at index -1 is just *asking* for trouble.
There are only a few ways the file descriptor could be -1:

- The serial port couldn't be opened.
- The serial port attributes could not be set correctly.
- Data pending on the serial port could not be flushed.
- We did not receive expected data from the RPS10.

If any of those happened, a corresponding message would have appeared in the
system log:

- "WTI_RPS10: Can't open /dev/ttyS1 : <reason>"
- "WTI_RPS10: Can't set attributes /dev/ttyS1 : <reason>"
- "WTI_RPS10: Can't flush /dev/ttyS1 : <reason>"
- "Did not find string 'RPS-10 Ready' from RPS10..."

So anyway, the first three steps succeeded, based on the Bugzilla data and
looking at the code:

  RPSConnect entered.  
  Calling dtrtoggle (WTI_RPS10) <-- port open!
  dtrtoggle Complete (WTI_RPS10) <-- DTR toggle success
  RPSConnect_Ready: Waiting for Ready
  RPSConnect_Complete: sending status command

It should have said "Got Ready", followed by "Got NL", and should never have
proceeded into RPSConnect_Complete(). The fact that it gets a segmentation
fault in RPSConnect_Complete is orthogonal to the fact that it missed the
"RPS-10 Ready" string. Because the DTR toggle succeeded, it looks like you have
a valid configuration (looks can be deceiving, though!).

So, what we need to do is devise a plan for determining whether there's some
sort of difference in the output of the WTI RPS-10 (I doubt it, but I suppose it
is the logical next step).

You'll need a little scriptable serial dumb-terminal program (minicom won't
work for this).  You can get it from here:

http://people.redhat.com/lhh/ser-1.0.2-1.src.rpm
http://people.redhat.com/lhh/ser-1.0.2-1rhel2.1.i386.rpm

- Plug the RPS-10 into the wall (for power).
- Plug the serial port into one of your machines (for this example, 
  I will use "ttyS0" as the serial port), but don't plug any machines into 
  its power port.
- Run: script foo.txt
- Run: ser /dev/ttyS0 9600
  - ser will complain that there's no carrier detect - that's normal,
    the RPS-10 doesn't use DCD.
- Push "Ctrl-A Ctrl-Z".  It should say "[HANGUP]".
  After about 10 seconds, it should say "RPS-10 Ready"
- Issue a few commands to the RPS-10 unit.
  - To issue a command, type the following while holding the "Ctrl" key:
    bxxbxx
  - Issue the "0T" command (type ^B^X^X^B^X^X0T<ENTER>)
  - Issue the "0?" command (type ^B^X^X^B^X^X0?<ENTER>)
- Press Ctrl-A Ctrl-X to quit ser.
- Type exit.  script will now exit, and say:
  "Script done, file is foo.txt"
- Upload unaltered foo.txt into Bugzilla...
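The key sequence typed in the steps above (^B^X^X^B^X^X, then the outlet, then the command, then Enter) has the same shape as the snprintf() format strings quoted from RPSSendCommand() at the top of this report. A minimal sketch of that framing, assuming the password prefix is exactly those six control characters and that single-digit outlets run from 0 to 9; `build_rps_command()` is a hypothetical helper, not a function from rps10.c:

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: the password prefix is the six control characters typed in
 * the steps above, ^B^X^X^B^X^X (0x02 0x18 0x18 0x02 0x18 0x18). */
static const char rps_prefix[] = "\x02\x18\x18\x02\x18\x18";

/*
 * Hypothetical helper modeled on the snprintf() calls quoted from
 * RPSSendCommand(): prefix, then the outlet digit (or '*' when outlet is
 * 10, meaning all outlets), then the command character, then CR.
 * Returns the command length, or -1 for a bad outlet or a short buffer.
 */
static int build_rps_command(char *buf, size_t len, int outlet, char command)
{
        int n;

        if (outlet == 10)
                n = snprintf(buf, len, "%s*%c\r", rps_prefix, command);
        else if (outlet >= 0 && outlet <= 9)    /* assumed outlet range */
                n = snprintf(buf, len, "%s%d%c\r", rps_prefix, outlet, command);
        else
                return -1;

        /* snprintf() returns the length it needed; treat truncation as an error */
        if (n < 0 || (size_t)n >= len)
                return -1;
        return n;
}
```

Six prefix bytes plus outlet, command and CR come to 9 bytes, consistent with the "all commands are 9 chars long!" comment and the 10-byte writebuf in the quoted code.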

Comment 18 Alexander Landgraf 2003-10-09 11:56:11 UTC

Okay, I'm going to install the package and do the tests, but only tonight or
tomorrow morning. All my machines in the lab are currently being used for customer
demos and installations.

Best regards,

Alex.

PS: Why wouldn't that work with minicom, though? As I remember, I got the same
results using minicom. I was able to switch ports off and on, and I
got an "RPS-10 Ready" after sending a hangup :)

Comment 19 Lon Hohberger 2003-10-09 12:52:40 UTC

minicom doesn't capture all the output (including control characters); using
"ser" run from within "script" does.  The idea is that it might be something
really simple - like the way carriage-returns and line-feeds are
generated/handled by the RPS-10 units you have - I'm just trying to cover all
the bases :)
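To illustrate why a raw capture matters, here is a small hypothetical helper (not part of ser, script or clumanager) that renders received bytes with CR, LF and other control characters made visible, so that both line-ending differences and the exact banner letters stand out:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render bytes from the serial port with control
 * characters made visible (CR as \r, LF as \n, others as \xNN).
 * Writes a NUL-terminated string to out; returns 0, or -1 if out is too small.
 */
static int escape_bytes(const unsigned char *in, size_t n,
                        char *out, size_t outlen)
{
        size_t pos = 0;

        for (size_t i = 0; i < n; i++) {
                char tmp[5];            /* longest form is "\xNN" plus NUL */
                unsigned char c = in[i];

                if (c == '\r')
                        strcpy(tmp, "\\r");
                else if (c == '\n')
                        strcpy(tmp, "\\n");
                else if (c == '\\')
                        strcpy(tmp, "\\\\");
                else if (c >= 0x20 && c < 0x7f) {
                        tmp[0] = (char)c;
                        tmp[1] = '\0';
                } else
                        snprintf(tmp, sizeof tmp, "\\x%02x", (unsigned)c);

                size_t tl = strlen(tmp);
                if (pos + tl >= outlen) /* keep room for the NUL */
                        return -1;
                memcpy(out + pos, tmp, tl);
                pos += tl;
        }
        if (pos >= outlen)
                return -1;
        out[pos] = '\0';
        return 0;
}
```

For example, the bytes "PRS-10 Ready" followed by CR LF come out as `PRS-10 Ready\r\n`, which makes any transposed letters or unusual line endings easy to spot.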

Or it could be something even more simple - like the RPS10 driver isn't waiting
long enough for the "RPS-10 Ready" message ... we'll cross that bridge when we
come to it.

Comment 20 Alexander Landgraf 2003-10-09 12:54:35 UTC

okay ... CR and LF. Makes sense. I'll let you know ... certainly tomorrow!

Thanks,

Alex.

Comment 21 Alexander Landgraf 2003-10-10 06:59:35 UTC

Created attachment 95093
ser output talking to a WTI RPS10 power switch for debugging purposes

Comment 22 Alexander Landgraf 2003-10-10 07:00:28 UTC

... here's the output. But it doesn't look so bad :(.

Best regards,

Alex.

PS: where are you located? USA? GB?

Comment 23 Alexander Landgraf 2003-10-10 12:03:28 UTC

... after inserting the "if (... < 0)" checks into the code, I get the following:

console:
[root@eval4 stonith]# clustonith -vS
Determining Switch Type...FAILED
Unable to determine power switch type.
Unable to determine default power switch type.

messages:
Oct 10 13:56:04 eval4 clustonith: Did not find string: 'RPS-10 Ready' fromWTI RPS10 Power Switch.
Oct 10 13:56:04 eval4 clustonith: WTI_RPS10: device /dev/ttyS0 is not open!
Oct 10 13:56:04 eval4 clustonith[3037]: <err> clu_stonith_check: stonith device with IPaddr eval8 has bad status
Oct 10 13:56:04 eval4 clustonith[3037]: <err> clu_stonith_init: failed clu_stonith_check().
Oct 10 13:56:04 eval4 clustonith[3037]: <err> clu_stonith_type: failed init.

Comment 24 Lon Hohberger 2003-10-10 12:33:17 UTC

Ah ha!  Your unit is saying:

"PRS-10 Ready", not
"RPS-10 Ready".

That would definitely break the expect-ish code we use to talk to the power
controller...
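A minimal sketch of an expect-style matcher that tolerates both spellings, assuming the driver reads the port one byte at a time. `expect_feed()` and `struct expect_state` are hypothetical names; the actual change is in the attached rps10 patch, whose code is not reproduced in this report.

```c
#include <stddef.h>
#include <string.h>

/* Banners accepted by this sketch.  "PRS-10 Ready" is what the reporter's
 * unit printed; accepting both mirrors the intent of the fix, but this is
 * not the code from the attached patch. */
static const char *const ready_banners[] = { "RPS-10 Ready", "PRS-10 Ready" };
#define BANNER_LEN 12   /* both banners are exactly 12 characters */

struct expect_state {
        char   window[BANNER_LEN];  /* most recent bytes from the port */
        size_t filled;              /* how many bytes of window are valid */
};

/* Feed one received byte; returns 1 once either banner has been seen. */
static int expect_feed(struct expect_state *st, char c)
{
        if (st->filled < BANNER_LEN) {
                st->window[st->filled++] = c;
        } else {
                /* slide the window left by one and append c */
                memmove(st->window, st->window + 1, BANNER_LEN - 1);
                st->window[BANNER_LEN - 1] = c;
        }
        if (st->filled < BANNER_LEN)
                return 0;
        for (size_t i = 0; i < sizeof ready_banners / sizeof ready_banners[0]; i++)
                if (memcmp(st->window, ready_banners[i], BANNER_LEN) == 0)
                        return 1;
        return 0;
}
```

A sliding window of the banner's length keeps memory constant and does not care what noise (CR, LF, stray characters) arrives before the banner.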

Very strange.  I'll have a test-package pretty soon.  Hold tight.

Comment 25 Lon Hohberger 2003-10-10 12:53:52 UTC

Created attachment 95100
Patch to change expect-text to allow "PRS"

from clumanager-1.0.x/:
  patch -p0 < rps10-prs.patch

from clumanager-1.0.x/src/stonithlib/:
  patch -p2 < rps10-prs.patch

Comment 26 Lon Hohberger 2003-10-10 12:56:06 UTC

1.0.23 with the patch applied:

http://people.redhat.com/lhh/clumanager-1.0.24-0.1.i386.rpm
http://people.redhat.com/lhh/clumanager-1.0.24-0.1.src.rpm

Comment 27 Alexander Landgraf 2003-10-10 13:23:29 UTC

... well. "PRS". That's something I really overlooked. Very, very strange. Might it
be an issue caused by WTI? I will call them and ask how that may have
happened. I'm really sorry, but I obviously never parsed the letters very
thoroughly or carefully. I will report the results to you.

Best regards and have a good weekend,

Alex.

Comment 28 Lon Hohberger 2003-10-10 17:55:33 UTC

I wouldn't worry about the "PRS" vs "RPS" thing.  I think it is more important
whether or not the driver works, and whether you can then get your machines back
online.

Comment 29 Alexander Landgraf 2003-10-13 08:23:22 UTC

Well ...

[root@eval4 root]# clustonith -vr eval4
Determining Switch Type...WTI RPS10 Power Switch (#0)
Successfully power cycled host eval4.
[root@eval4 root]#

messages:
Oct 13 10:09:55 eval4 clustonith: Host eval4 being rebooted.

... and it really switches off and on :)!

Looks much better now :). But what I really don't understand is the name I
have to put after the -r option. eval4 is the current host, but the power switch
attached to it switches the other host, in this case eval8. Is there a good
explanation of which name I have to use under which condition? Is cluquorumd
doing the right thing when running, I mean switching the other node? And how
do I have to use the WTI NPS-230 and APC AP-9212 switches? The NPS-230 is
fully redundant. So can I also use two of those boxes? Or do I have to
use just a single one (one IP)? And what do I have to configure as IP or name
in cluconfig for node4?

Choose one of the following power switches:
  o NONE
  o RPS10
  o BAYTECH
  o APCSERIAL
  o APCMASTER
  o WTI_NPS
  o SW_WATCHDOG
Power switch [RPS10]: WTI_NPS
Enter IP address or hostname used to access the power switch [eval4]: temp0
Looking for host temp0 (may take a few seconds)...
Warning: Host temp0 not responding
Keep your selection? [yes]: yes
Enter the name of the outlet that power cycles member 'eval4' [eval4]:
Enter the password for the power switch [10]:

... well. Do I have to enter the IP and port of the switch that power-cycles eval4
here? And does that work correctly with two power switches? And also with APC switches?

Thanks very much so far, anyway :).

Perhaps you could give me some final input about power switches (questions
above). Is there also a technical white paper?

Best regards,

Alex.

Comment 30 Lon Hohberger 2003-10-14 15:45:46 UTC

First, I need to point out: Cluquorumd + clupowerd work _properly_ when network
power switches are in use.

Clustonith is difficult to operate properly when network power switches are in use.

This is because in the RHEL-2.1 version of clumanager, power switches are
indexed by member #.  Because of this, the "clustonith" utility, when it was
originally written, was designed with "One Power Switch Per Member" - which is
why when network-power switches are in use, it behaves erratically.

It's basically backwards logic: Serial power switches are handled as though
they're wired to the _opposite_ member; Network power controllers are handled as
if wired to the current member.

With the 1.0.24-0.1 version, you have a workaround. There's a command line
option, "-o", which specifies "Function using the 'other' member's assigned ID",
which makes clustonith behave correctly.

Here's when to use "clustonith -o -r <other_member>":
(1) When *two* network-based power switches are in use.
(2) When calling "clustonith" on member #1 when one network power switch is in use.

Here's when to use "clustonith -r <other_member>":
(1) Two serial power switches are in use
(2) Calling "clustonith" on member #0 when one network power switch is in use.

(the two "2" notes above may be backwards...)
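The rules above can be summarized as a small decision function, shown only as a sketch for a two-member cluster. `use_other_id()` and the `switch_kind` enum are hypothetical, and the caveat about the single-network-switch cases applies to the code as well.

```c
#include <stdbool.h>

enum switch_kind { SWITCH_SERIAL, SWITCH_NETWORK };

/*
 * Hypothetical helper encoding the rules of thumb above for a two-member
 * RHEL 2.1 cluster: returns true when "clustonith -o -r <other_member>"
 * should be used, false for plain "clustonith -r <other_member>".
 * Caveat carried over from the comment: the two single-network-switch
 * cases may be reversed.
 */
static bool use_other_id(enum switch_kind kind, int nswitches, int local_member)
{
        if (kind == SWITCH_SERIAL)
                return false;           /* serial switches (two in use): no -o */
        if (nswitches == 2)
                return true;            /* two network switches: -o */
        return local_member == 1;       /* one network switch: -o only on member #1 */
}
```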

FYI, this strange behavior is not present in the RHEL3 beta, and is a
"won't-fix" for RHEL-2.1 (besides the already mentioned "-o" workaround).

Comment 32 Alexander Landgraf 2003-10-20 09:30:00 UTC

Well. Now we have the problem that the APC AP9212s are no longer available.
The successor switch is the AP7920. Were you already aware of that? For our
last project we were no longer able to get AP9212s. But the AP7920
has a much better network module. The old one (AP9112) always hung up on network
broadcasts (e.g. nmbd broadcasts). The newer one doesn't!

Do you have any input for me? When do you plan to support the AP7920s?

Best regards,

Alex.

Comment 33 Lon Hohberger 2003-10-20 13:02:03 UTC

The 1.0.24-x.x RPM also has a driver for the AP9225 called "apcplus", but if you
use the AP9225, you can't use it in Daisy-Chain mode.

We don't have a 7920 for development.

Comment 34 Alexander Landgraf 2003-10-20 13:22:32 UTC

... well. But since the AP9212s are no longer available on the market (you'll
see soon), I'm sure that you will need to evaluate the AP7920s soon. Are there
already any plans to do that? What about support in AS3.0?

Another urgent thing I need is support for software RAID as a block
device in cluster services. Will there be any support for that in RH-AS-3.0?
Or do you already have appropriate scripts to do that?

Best regards,

Alex.

Comment 35 Lon Hohberger 2003-10-20 13:58:20 UTC

We don't currently have a 7920 for development.  You can try using the AP9225
"apcplus" driver.  It may or may not work.  In either case, we can't support it
until we get a 7920 for development.  I'll ask around and see what I can do, bu

Software-RAID is not supported for clustering in RHEL 2.1 or RHEL 3, neither are
host bus adapters with on-board RAID (ex: the Adaptec AAC series RAID controllers).

We recommend fibre-attached RAID arrays for best performance/reliability in HA
clusters.

Comment 36 Alexander Landgraf 2003-10-20 14:15:11 UTC

Well. Isn't the AP9225 a serially attached switch? Due to a distance of about
200 m between the two nodes, I need to use network-based switches.

--

Concerning the software RAID, I just do the following: I write my
own /usr/share/cluster/services/svclib_raid1 script, which I hook into
the /usr/share/cluster/services/service script to start the RAID before
the "device start" function is called during startup and to stop the RAID
after the "device stop" function is called during shutdown. The only problem I
have is that if you provide a newer clumanager RPM, my own scripts will be
overwritten. Do you have any idea how to solve that problem? I would suggest
that you provide more than a single user script per cluster service.
It should be possible to define whether the user script has to be run before
a cluster service is started (prestart) or after a cluster service is
stopped (poststop). The default is poststart and prestop, right?

Best regards,

Alex.

Comment 37 Lon Hohberger 2003-10-20 14:42:58 UTC

The 9225 does have serial ports, but Red Hat Cluster Manager only uses the
network-based capabilities of it, provided by an AP9606 Web/SNMP management
card. Unfortunately, the AP9225 may not be suitable for your environment:
the AP9606 is the same card used in the AP9211 and AP9212 :(

You can submit a feature request via a Bugzilla RFE for the APC AP7920 against
"Red Hat Enterprise Linux Public Beta" / "taroon-rc" / "clumanager", but I can't
make any promises as to when/if it will get done.

WRT Software RAID: Integration w/ the cluster software is *not* the problem. 
Software RAID cannot coordinate access to the RAID set across multiple
machines.  Thus, the risk for data corruption is non-trivial and *always*
present when software RAID is used for clustered services.

For more information:

http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/ch-hardware.html#S2-HARDWARE-SHAREDSTOR

Comment 38 Alexander Landgraf 2003-10-27 06:33:35 UTC

"The 9225 does have serial ports, but Red Hat Cluster Manager only uses the
network-based capabilities of it, provided by an AP9606 Web/SNMP management
card."

Well ... we found out that the 9225+9606 is just the American 110V model's
name for the European AP9212. And these are "end of life" at APC. I would be
really glad if you could find out which APC switches Red Hat plans to support
next, and in which version of clumanager.

Best regards,

Alex.

Comment 39 Larry Troan 2003-11-04 16:48:40 UTC

PER BUG 108148
------ Additional Comments From lhh  2003-10-28 10:26 -------
CVS build of clumanager:

http://people.redhat.com/lhh/.testing/clumanager-1.2.5-0.1.89.2.8.i386.rpm
http://people.redhat.com/lhh/.testing/clumanager-1.2.5-0.1.89.2.8.src.rpm

Fixes: #103721, #106465, #107274, #107276, #108148

Comment 40 Suzanne Hillman 2003-12-10 20:27:10 UTC

This is not something that can be tested in-house, due to not having
power switches which say "PRS-10 Ready", instead of "RPS-10 Ready".
The fact that the submitter seems happy with the fix makes me willing
to mark this closed.

Comment 41 Alexander Landgraf 2003-12-11 05:52:32 UTC

Well. The "PRS-10" <-> "RPS-10" fix worked pretty well. Thanks again. 
So you may close the call.

Best regards,

Alex.
