Bug 169994

Summary: fence_apc does not work with large number of outlets
Product: [Retired] Red Hat Cluster Suite
Reporter: Axel Thimm <axel.thimm>
Component: fence
Assignee: Jim Parsons <jparsons>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint, jbacik, jordan, kgonzale, sfolkwil
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 1.32.25-1
Doc Type: Bug Fix
Last Closed: 2007-02-07 08:45:14 UTC
Type: ---
Bug Blocks: 164915    
Attachments:
A patch to fix this problem (flags: none)
typescript from runtime (flags: none)
slightly more legible typescript -- ran through strings. (flags: none)
snmp apc agent that works on large power strips (flags: none)
fence_apc for many outlets (flags: none)

Description Axel Thimm 2005-10-06 10:16:42 UTC
Description of problem:
If an apc switch has too many outlets to display on one single page it requires
the user to "Press <ENTER> to continue". The script assumes that there is no
such prompt, and will fail with

failed: unrecognised menu response

Version-Release number of selected component (if applicable):
fence-1.32.1-0

How reproducible:
always

Steps to Reproduce:
1. Get a switch with enough outlets (e.g. 25)
2. Try to fence_apc any outlet on this switch
  
Actual results:
fails and returns an error

Expected results:
should fence the outlet

Additional info:
This is the relevant part of /tmp/apclog:

------- Device Manager -------------------------------------------------------
-

     1- Phase Monitor/Configuration
     2- Outlet Restriction Configuration
     3- Outlet Control/Config
     4- Power Supply Status

     <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log
> 3

^M------- Outlet Control/Config ------------------------------------------------
-

     1- Outlet  1: .                       ON
     2- Outlet  2: .                       ON
     3- Outlet  3: .                       ON
     4- Outlet  4: .                       ON
[...]
    21- Outlet 21: junior (atrpms)         ON
    22- Outlet 22: zerberus (secnet)       ON
^M        Press <ENTER> to continue...^M                                    ^M  
  23- Outlet 23: w4 (webserver)          ON
    24- Outlet 24: zs04 (mailserver)       ON
    25- Master Control/Config

     ?- Help, <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log
> 

^M------- Outlet Control/Config ------------------------------------------------
-

     1- Outlet  1: .                       ON
     2- Outlet  2: .                       ON
     3- Outlet  3: .                       ON
     4- Outlet  4: .                       ON
[...]
    21- Outlet 21: junior (atrpms)         ON
    22- Outlet 22: zerberus (secnet)       ON
^M        Press <ENTER> to continue...^M                                    ^M  
  23- Outlet 23: w4 (webserver)          ON
    24- Outlet 24: zs04 (mailserver)       ON
    25- Master Control/Config

     ?- Help, <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log
> 

^M------- Device Manager -------------------------------------------------------
-

     1- Phase Monitor/Configuration
     2- Outlet Restriction Configuration
     3- Outlet Control/Config
     4- Power Supply Status

     <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log
> 
[...]
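
For illustration only: the log above suggests treating the pager prompt as noise inside the menu rather than as a menu response. The Python sketch below is not the Perl agent's code; the names `has_pager_prompt`, `parse_outlets` and `SAMPLE_SCREEN` are made up for this example, and the sample text is abbreviated from the log, with `\r` standing in for the `^M` characters.

```python
import re

# The prompt text is taken from the log; everything else here is illustrative.
PAGER_PROMPT = re.compile(r"Press <ENTER> to continue\.\.\.")

# One menu line: "<option>- Outlet <n>: <name>   ON|OFF"
OUTLET_LINE = re.compile(
    r"^[ \t]*(\d+)-[ \t]+Outlet[ \t]+(\d+):[ \t]*(.*?)[ \t]+(ON|OFF)\*?[ \t]*$",
    re.MULTILINE,
)


def has_pager_prompt(screen):
    """Return True if the captured screen stopped at the pager prompt."""
    return PAGER_PROMPT.search(screen) is not None


def parse_outlets(screen):
    """Map outlet number -> (menu option, name, state) for one captured menu."""
    # Drop the prompt and turn carriage returns into newlines so that
    # every menu entry starts on its own line.
    cleaned = PAGER_PROMPT.sub("", screen).replace("\r", "\n")
    return {
        int(outlet): (int(option), name, state)
        for option, outlet, name, state in OUTLET_LINE.findall(cleaned)
    }


# Abbreviated sample modelled on the log above (assumed byte layout).
SAMPLE_SCREEN = (
    "------- Outlet Control/Config -------\r\n"
    "\r\n"
    "     1- Outlet  1: .                       ON\r\n"
    "    22- Outlet 22: zerberus (secnet)       ON\r\n"
    "\r        Press <ENTER> to continue...\r                    \r"
    "    23- Outlet 23: w4 (webserver)          ON\r\n"
    "    24- Outlet 24: zs04 (mailserver)       ON\r\n"
    "    25- Master Control/Config\r\n"
)
```

Calling `parse_outlets(SAMPLE_SCREEN)` yields entries for outlets on both sides of the prompt, which is the property the failing script lacked.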

Comment 1 Jordan Schweller 2005-10-06 16:34:58 UTC
I've seen this too.  I think there needs to be some logic after the "# Identify
next option" section of subroutine navigate.  This is something I was looking
at, but have not had time to get it working:

                # special mod for AP7941
                # if(in outlet control section)
                #  {
                #     if(outlet match)
                #       {
                #          send space;
                #          choose outlet;
                #          goto next;
                #       }
                #     send space;
                #     if(outlet match)
                #       {
                #          choose outlet;
                #          goto next;
                #       }
                #  }
                if(/--\s*Outlet Control.*(\d+)\s*-\s+Outlet\s+\d{1,2}\D[^\n]*\s(?-i:ON|OFF)\*?\s/ism)
                {
                    print "DEBUGGING: In outlet control section...\n";
                    if( ! /--\s*Outlet Control.*(\d+)\s*-\s+Outlet\s+$opt_n\D[^\n]*\s(?-i:ON|OFF)\*?\s/ism)
                    {
                        print "DEBUGGING: Inside 1-22 section\n";
                        $t->print("");
                        $t->print($1);
                    }
                    print "DEBUGGING: attempting to lookup in 23-25 section\n";
                    $t->print("");
                    if(/--\s*Outlet Control.*(\d+)\s*-\s+Outlet\s+$opt_n\D[^\n]*\s(?-i:ON|OFF)\*?\s/ism)
                    {
                        print "DEBUGGING: Inside 23-25 section\n";
                        $t->print($1);
                        next;
                    }
                }
                # end special mod for AP7941
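
The paging idea in the pseudocode above can also be sketched outside the agent. The Python below is an illustration under stated assumptions, not the fix that eventually shipped: `read_chunk` and `send_line` are hypothetical callables standing in for the telnet session, and `read_chunk` is assumed to return everything up to and including the next prompt (either the pager prompt or the menu's `> `). Instead of hard-coding a 1-22 and a 23-25 section, it presses Enter at every pager prompt, accumulates the pages, and only then looks up the outlet.

```python
import re

PAGER_PROMPT = "Press <ENTER> to continue"  # prompt text as seen in the log


def find_menu_option(menu_text, outlet):
    """Return the menu option number listed for `outlet`, or None."""
    # Carriage returns become newlines so each entry starts a line.
    text = menu_text.replace("\r", "\n")
    pattern = rf"^[ \t]*(\d+)-[ \t]+Outlet[ \t]+{int(outlet)}\D"
    match = re.search(pattern, text, re.MULTILINE)
    return int(match.group(1)) if match else None


def select_outlet(read_chunk, send_line, outlet, max_pages=10):
    """Page through the outlet menu, then send the option for `outlet`.

    read_chunk and send_line are hypothetical session callables
    (assumptions for this sketch, not a real library API).
    """
    pages = []
    for _ in range(max_pages):
        chunk = read_chunk()
        pages.append(chunk)
        if PAGER_PROMPT in chunk:
            send_line("")  # bare Enter shows the next page
            continue
        break  # reached the real menu prompt
    else:
        raise RuntimeError("menu never reached its prompt")

    option = find_menu_option("".join(pages), outlet)
    if option is None:
        raise LookupError(f"outlet {outlet} not found in menu")
    send_line(str(option))
    return option
```

With a fake session that serves two canned pages, `select_outlet` sends one empty line for the pager prompt and then the option number. The `max_pages` bound is only a guard against looping forever on an unexpected screen.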


Comment 2 Josef Bacik 2005-11-02 23:04:05 UTC
Created attachment 120667 [details]
A patch to fix this problem

This patch "should" fix the problem, but as I do not have this kind of hardware
lying around I can't tell if it will.  Let me know if it works for you.  I
patched the RPM directly; you should be able to apply the patch to
/sbin/fence_apc manually.

Comment 3 Jordan Schweller 2005-11-03 14:08:00 UTC
Created attachment 120688 [details]
typescript from runtime

This is the attachment with the full story on what's going on with the new
fence_apc patch.

Comment 4 Jordan Schweller 2005-11-03 14:14:03 UTC
Created attachment 120689 [details]
slightly more legible typescript -- ran through strings.

This is the same typescript as above, but run through strings to make it more
readable.  Sorry for the redundancy.

Comment 6 Jim Parsons 2005-11-22 13:26:12 UTC
The test version of fence_apc_snmp has helped with this issue, but not
eliminated it completely yet. There is an intermittent problem with the snmp
agent which has been tracked down to a build setting in net-snmp. New version of
agent is forthcoming.

Comment 7 Jim Parsons 2005-12-01 14:27:23 UTC
New version of snmp script sent. Had a meeting with APC representatives, and this
issue was raised. We now have a fallback strategy if the latest script is not
completely reliable.

Comment 8 Jim Parsons 2005-12-05 21:18:04 UTC
The root problem here turns out not to be the script. The customer was
intermittently logged into the switch while fencing was occurring, and APC
devices allow only one login session at a time. The snmp script tested for this
issue is a valid solution for APC devices that have too many outlets to fit on
one screen.

Comment 9 Jim Parsons 2006-04-25 11:55:38 UTC
One other note - the standard perl fence agent for APC has been fixed to work
with large outlet strips as well as the snmp agent.

Comment 10 Axel Thimm 2006-04-25 17:37:01 UTC
Where can I get this fixed perl fence agent? I checked both RHEL4 and FC5 and
neither can cope with 24 ports:

fence-1.32.18-0 (RHEL4)
fence-1.32.17-0.FC5.1 (FC5)

Thanks!

Comment 11 Jim Parsons 2006-04-25 18:20:10 UTC
Created attachment 128210 [details]
snmp apc agent that works on large power strips

This first attachment is an snmp version of the agent - the interface is
identical to the current perl apc agent. Untar and check out the readme.

Comment 12 Jim Parsons 2006-04-25 18:23:16 UTC
Created attachment 128211 [details]
fence_apc for many outlets

Please let me know how this works out.

Comment 18 Axel Thimm 2007-02-07 08:45:14 UTC
I tested it on a 25 port switch. I also tested RHEL4's fence-1.32.25-1 and it
worked as well. Thanks!