Bug 134489

Summary: fence_apc fails when outlet number is not printed in outlet control/configuration menu
Product: [Retired] Red Hat Cluster Suite
Reporter: Richard Keech <rkeech>
Component: fence
Assignee: Jim Parsons <jparsons>
Status: CLOSED ERRATA
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium
Version: 4
CC: benmc, cluster-maint, jbacik, nobody, rkenna
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0138
Doc Type: Bug Fix
Last Closed: 2007-05-10 21:23:16 UTC
Attachments:
Description                                  Flags
Output file from manual fence_apc command    none
patch to resolve issue                       none

Description Richard Keech 2004-10-04 01:21:31 UTC
Description of problem:

The query/response between the fence_apc fencing agent and
an "APC Switched Rack PDU" fails with "failed: unrecognised menu
response" message.  This leads to a fencing failure.

Version-Release number of selected component (if applicable):

GFS 6.0.0-15.
APC model AP7921, Software version 2.6.1.

How reproducible:

consistent

Steps to Reproduce:

1.  start with an active two-node GFS cluster; cluster master
is node1, cluster client is node2.  no slaves in cluster.
2.  test fencing with "gulmtool forceexpire node1:core node2"
3.  using a local replacement for fence_apc, the fencing succeeds.
  
Actual results:

   using the standard fence_apc agent the fencing fails.

Expected results:

  fencing should work.

   using a modified fence_apc agent the fencing works OK.


Additional info:

A diff between the original fence_apc and the modified one is 
provided here:

27c27
< my $immediate = 'immediate'; # # Or 'delayed' - action string prefix on menu
---
> my $immediate = 'Immediate'; # # Or 'delayed' - action string prefix on menu
38c38
< $opt_o = 'reboot';           # Default fence action.
---
> $opt_o = 'Reboot';           # Default fence action.
52c52
< $BUILD_DATE="(built Thu Sep 23 12:01:28 EDT 2004)";
---
> $BUILD_DATE="(built Mon Sep 27 12:48:07 EST 2004)";
222c222
<                       /--\s*control console.*(\d+)\s*-\s*device manager/is  ||
---
>                       /--\s*Control Console.*(\d+)\s*-\s*device manager/is  ||
237c237
<                       /--\s*Outlet $switchnum:$opt_n\D.*(\d+)\s*-\s*outlet control\s*$switchnum:?$opt_n\D/ism ||
---
>                       /--\s*Outlet $switchnum:$opt_n\D.*(\d+)\s*-\s*Outlet Control\s*$switchnum:?$opt_n\D/ism ||
248a249,250
>                       /--\s*Outlet.*(\d+)\s*-\s*Control Outlet/is  ||
>
250c252
<                       /--\s*Outlet $opt_n\D.*(\d+)\s*-\s*control outlet\s+$opt_n\D/ism
---
>                       /--\s*Outlet $opt_n\D.*(\d+)\s*-\s*Control Outlet\s+$opt_n\D/ism
282c284
<                       /--\s*control console.*(\d+)\s*-\s*Logout/is
---
>                       /--\s*Control Console.*(\d+)\s*-\s*Logout/is
300c302,303
<       if (! /$immediate $opt_o.*outlet $opt_n\s.*YES.*to continue/si ) {
---
>       #if (! /$immediate $opt_o.*outlet $opt_n\s.*YES.*to continue/si ) {
>       if (! /$immediate $opt_o.*YES.*to continue/si ) {

Comment 1 Adam "mantis" Manthei 2004-10-04 18:40:31 UTC
The only lines that I can see that will have any relevance to changing
the script's behavior are as follows:

300c302,303
<       if (! /$immediate $opt_o.*outlet $opt_n\s.*YES.*to continue/si ) {
---
>       #if (! /$immediate $opt_o.*outlet $opt_n\s.*YES.*to continue/si ) {
>       if (! /$immediate $opt_o.*YES.*to continue/si ) {


The other lines all change the casing of the words inside a regex that
 is case insensitive.
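
The point about casing can be checked with a small sketch. It uses Python's `re` module as a stand-in for the agent's Perl, with `re.IGNORECASE | re.DOTALL` as a rough analogue of the `/si` flags. The helper name `confirm_pattern` and both sample prompts are made up for illustration; they are not taken from a real APC session or from the agent.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL  # rough analogue of Perl's /si


def confirm_pattern(immediate, action, outlet=None):
    """Build an analogue of the confirmation check from the diff.

    With outlet=None the 'outlet N' part is dropped, as in the modified script.
    """
    middle = "" if outlet is None else rf".*outlet {re.escape(outlet)}\s"
    return re.compile(
        rf"{re.escape(immediate)} {re.escape(action)}{middle}.*YES.*to continue",
        FLAGS,
    )


# Hypothetical prompts: one names the outlet number, one does not.
numbered = "Immediate Reboot on Outlet 3 \nEnter 'YES' to continue"
unnumbered = "Immediate Reboot on vega\nEnter 'YES' to continue"

lower = confirm_pattern("immediate", "reboot", "3")
upper = confirm_pattern("Immediate", "Reboot", "3")
relaxed = confirm_pattern("Immediate", "Reboot")

# Casing makes no difference: both variants agree on both prompts.
same_on_numbered = bool(lower.search(numbered)) == bool(upper.search(numbered))
same_on_unnumbered = bool(lower.search(unnumbered)) == bool(upper.search(unnumbered))

# Only dropping the outlet-number requirement changes the outcome.
strict_hits_unnumbered = bool(upper.search(unnumbered))
relaxed_hits_unnumbered = bool(relaxed.search(unnumbered))
```

Under these assumptions the lowercase and capitalised patterns behave identically, and only the relaxed pattern accepts a prompt that does not repeat the outlet number.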

I'd like to see why the menus differ for our two APCs.  Mine works
just fine.  Could you please rerun the test from the command line with
the -v option and then post the resulting /tmp/apclog file?  e.g.
`fence_apc -a 1.2.3.4 -l apc -p apc -n 1 -v`


Comment 2 Ben McConaghy 2004-10-14 04:40:02 UTC
Created attachment 105180 [details]
Output file from manual fence_apc command

Comment 3 Adam "mantis" Manthei 2004-10-14 16:52:32 UTC
The command line that generated that output would have been
*extremely* useful too.  I assume that either outlet 1 or 3 was
specified (i.e. "full" or "vega") on the command line with the -n option.

This version of the APC firmware is one that I have not yet had a
chance to use.  All other APC interfaces that I have logged into
display a menu option, the outlet number and the outlet name in the
"Outlet Control/Configuration" menu.  The output from your test run
only indicates a menu option and an outlet name (i.e. the outlet
number is missing).

For example, my menus appear as follows:
------- Outlet Control/Configuration ---------------------------

     1- Outlet 1: Outlet 1                 ON
     2- Outlet 2: Outlet 2                 ON
     3- Outlet 3: Outlet 3                 ON
     4- Outlet 4: Outlet 4                 ON
     5- Outlet 5: vega                     ON
     6- Outlet 6: Outlet 6                 ON
     7- Outlet 7: Outlet 7                 ON
     8- Outlet 8: Outlet 8                 ON
     9- Master Control/Configuration

     <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log


Yours, on the other hand, is:

------- Outlet Control/Configuration ------------------------------

     1- full                     ON
     2- Outlet 2 (mebsuta1)      ON
     3- vega                     ON
     4- Outlet 4 (mebsuta2)      ON
     5- Outlet 5 (mebsuta3)      ON
     6- Outlet 6 (mebsuta4)      ON
     7- Outlet 7                 ON
     8- Outlet 8                 ON
     9- Master Control/Configuration

     <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log

As a workaround, I would suggest renaming "vega" to "Outlet 3 (vega)"
and "full" to "Outlet 1 (full)".  If this doesn't work for you, please
update this bug with:
1. the command line arguments that produced the above output
2. a test run with the -v option on the modified script, complete with the
command line arguments

If renaming the outlet fixes the problem, we will need to determine
whether an alternative parameter should be added to the script that uses
the outlet name instead of the outlet number, or whether we need to
modify the documentation to require users to adhere to a naming
convention for their outlets.
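
To make the failure and the workaround concrete, here is a minimal Python sketch (not the agent's Perl). It assumes a parser that locates an outlet by looking for an "Outlet N" label in each menu row; the function name `option_for_outlet_label` and its regex are hypothetical and only approximate what the agent does. The menu rows are trimmed from the two menus quoted above.

```python
import re

# Rows trimmed from the menu that prints outlet numbers.
NUMBERED_MENU = """\
     4- Outlet 4: Outlet 4                 ON
     5- Outlet 5: vega                     ON
     9- Master Control/Configuration
"""

# Rows trimmed from the reporter's menu, where some names lack the number.
UNNUMBERED_MENU = """\
     1- full                     ON
     2- Outlet 2 (mebsuta1)      ON
     3- vega                     ON
     9- Master Control/Configuration
"""


def option_for_outlet_label(menu, outlet):
    """Return the menu option whose label contains 'Outlet <outlet>', or None."""
    pattern = re.compile(
        rf"^[ \t]*(\d+)-[ \t]*Outlet[ \t]+{int(outlet)}\b",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(menu)
    return match.group(1) if match else None


found = option_for_outlet_label(NUMBERED_MENU, 5)      # label present
missing = option_for_outlet_label(UNNUMBERED_MENU, 3)  # no "Outlet 3" label

# The suggested workaround: rename "vega" to "Outlet 3 (vega)".
renamed_menu = UNNUMBERED_MENU.replace("3- vega", "3- Outlet 3 (vega)")
after_rename = option_for_outlet_label(renamed_menu, 3)
```

With a label-based lookup like this, the row for "vega" is invisible until the outlet name itself carries the "Outlet 3" text.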

Comment 4 Ben McConaghy 2004-10-18 02:13:05 UTC
Renaming the outlet does allow the script to function, and we have
done this with outlets 2, 4, 5, 6 above as a workaround. However we
would like to be able to name the outlet simply the name of the host
if possible. Would it be possible to use the outlet number at the
start of the line to match with, as this is not to my knowledge user
configurable?
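
As a rough illustration of this suggestion, the following Python sketch matches on the number at the start of each menu row rather than on the outlet name. It assumes that the leading menu option number always equals the outlet number, which this report does not confirm for every firmware version. The function name `row_for_option` is hypothetical, and the "OFF" state is an assumption, since only "ON" appears in the menus above.

```python
import re

# Rows trimmed from the reporter's menu in Comment 3.
MENU = """\
     1- full                     ON
     2- Outlet 2 (mebsuta1)      ON
     3- vega                     ON
     9- Master Control/Configuration
"""


def row_for_option(menu, option):
    """Return (name, state) for the row starting with '<option>-', or None."""
    pattern = re.compile(
        rf"^[ \t]*{int(option)}-[ \t]*(.+?)[ \t]+(ON|OFF)[ \t]*$",
        re.MULTILINE,
    )
    match = pattern.search(menu)
    return (match.group(1), match.group(2)) if match else None


vega_row = row_for_option(MENU, 3)      # matched by position, not by name
full_row = row_for_option(MENU, 1)
master_row = row_for_option(MENU, 9)    # not an outlet row: no ON/OFF state
```

Because the lookup ignores the label entirely, the outlet can be named after the host without affecting the match.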

Comment 5 Adam "mantis" Manthei 2004-10-18 14:19:49 UTC
In all versions of the APC firmware that I have seen, the Outlet
Control/Configuration menu prints the outlet number in the format
"Outlet 0" followed by the outlet name.  I have made the assumption
that all the APCs out there behave the same way.  An
additional feature could perhaps be added to the agent that allows you
to use the outlet name instead of the outlet number.  I will need to
investigate this possibility a little more.
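
One way such a name-based lookup might work is sketched below in Python (not the agent's Perl, and not the patch that was eventually shipped). The names `ROW` and `option_for_name` are hypothetical, the "OFF" state is assumed, and the menu rows are trimmed from the reporter's menu in Comment 3.

```python
import re

# Rows trimmed from the reporter's menu in Comment 3.
MENU = """\
     1- full                     ON
     2- Outlet 2 (mebsuta1)      ON
     3- vega                     ON
     9- Master Control/Configuration
"""

# One outlet row: "<option>- <name>   <state>".
ROW = re.compile(r"^[ \t]*(\d+)-[ \t]*(.+?)[ \t]+(?:ON|OFF)[ \t]*$", re.MULTILINE)


def option_for_name(menu, name):
    """Return the menu option number for the outlet named `name`, or None."""
    # Names are compared case-insensitively; a duplicate name keeps the last row.
    by_name = {label.lower(): int(option) for option, label in ROW.findall(menu)}
    return by_name.get(name.lower())


vega_option = option_for_name(MENU, "vega")
full_option = option_for_name(MENU, "FULL")
unknown_option = option_for_name(MENU, "no-such-host")
```

A real implementation would have to decide how to handle duplicate outlet names; this sketch simply keeps the last one it sees.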

Comment 7 Kiersten (Kerri) Anderson 2006-08-08 14:43:14 UTC
Moving this to be a version 4 bug.  We will not address this in a RHEL3 based
product, but will consider it for the 4.5 release.

Comment 9 Josef Bacik 2006-09-13 15:05:22 UTC
Created attachment 136161 [details]
patch to resolve issue

IIRC you don't have access to IT, so I'm going to attach the patch directly.

Comment 15 Jim Parsons 2007-01-31 19:04:10 UTC
Yee Haa! The new apc agent is checked into the rhel4 tree. Name your outlets and
even group them if you want. we won't be fooled again.

uh...but some additional testing would be good!  :)

Comment 19 Red Hat Bugzilla 2007-05-10 21:23:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0138.html