Bug 241217

Summary: fence_apc 1.32.45 doesn't work
Product: [Retired] Red Hat Cluster Suite
Reporter: Jonny <jschulz>
Component: fence
Assignee: Marek Grac <mgrac>
Status: CLOSED NEXTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: medium
Version: 4
CC: cluster-maint, mgrac, newbery
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-03-04 09:43:52 UTC

Description Jonny 2007-05-24 13:34:58 UTC
Description of problem:

The cluster manager tries to fence a cluster node, and the attempt fails with
the following error (from /var/log/messages):

May 24 14:26:26 hostname fenced[5139]: fencing node "cl-node-74"
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: Traceback 
(most recent call last):   File "/sbin/fence_apc", line 798, in ?     mai
n()   File "/sbin/fence_apc", line 345, in main     do_power_off(sock)   File "/
sbin/fence_apc", line 782, in do_power_off     x = do_power_switch(sock, "o
ff")   File "/sbi
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: n/fence_apc", 
line 590, in do_power_switch     result_code, response = power_off(tx
t + ndbuf)   File "/sbin/fence_apc", line 786, in power_off     x = 
power_switch(buffer, False, "2", "3");   File "/sbin/fence_apc", line 779, in 
tch     raise "un
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: known screen 
encountered in \n" + str(lines) + "\n" unknown screen encountered in  
['3', '', '', '------- Power Supply Status -------------------------------------
--------------', '', '          Primary Power Supply Status: OK', '        
Secondary Power S
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: upply Status: 
OK', '', '', '     <ESC>- Back, <ENTER>- Refresh', '> ']  

I then copied an older version (1.32.25) from another cluster into /sbin, and it
works fine, but that cannot be the real solution.

Version-Release number of selected component (if applicable):


How reproducible:

Run "ifdown bond0" on a node and watch /var/log/messages. The cluster manager
will try to fence the node because it has missed too many heartbeats, and the
fence agent then fails with the traceback shown above.

Additional info:

uname -a
Linux hostname 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 
x86_64 GNU/Linux

Comment 1 Robert Clark 2007-06-05 13:42:49 UTC
I see the same error with fence-1.32.45-1.0.1 on an i686 system. This is
definitely a regression from 1.32.25-1 which I was using before the upgrade to 4U5.

APC details:
Model: AP7920
Manufacture Date: 07/02/2006
Hardware Revision: B2
Network Management Card AOS: 2.7.0
Rack PDU APP: 2.7.3

Comment 2 Robert Clark 2007-06-05 14:56:07 UTC
Here's the relevant piece of code:

elif i.find(DEVICE_MANAGER) != (-1):
  if switchnum != "":
    res = switchnum + "\r"
  else:
    res = "3\r"
  return (NOT_COMPLETE, res)

which operates on the following menu:

------- Device Manager --------------------------------------------------------

     1- Phase Monitor
     2- Outlet Control
     3- Power Supply Status

It looks like the code assumes that "Outlet Control" is always option 3, which
it isn't in this case. The old version uses a regular expression to identify
the correct option, and that succeeds.
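
For illustration, here is a minimal sketch of what a label-based lookup could look like. It is not the code from 1.32.25 or from any released fence_apc; the helper find_menu_option and the pattern MENU_ENTRY are hypothetical names made up for this example, and the menu text is the one quoted above.

```python
import re

# Hypothetical helper, not taken from fence_apc: find the number of a menu
# entry by its label instead of assuming a fixed position in the menu.
MENU_ENTRY = re.compile(r"^\s*(\d+)-\s*(.+?)\s*$")


def find_menu_option(lines, label):
    """Return the option number (as a string) for the given label, or None."""
    for line in lines:
        match = MENU_ENTRY.match(line)
        if match and match.group(2).lower() == label.lower():
            return match.group(1)
    return None


# The Device Manager screen as quoted in this comment.
menu = [
    "------- Device Manager --------------------------------------------------------",
    "",
    "     1- Phase Monitor",
    "     2- Outlet Control",
    "     3- Power Supply Status",
]

option = find_menu_option(menu, "Outlet Control")
# Build the keystroke to send; None means the entry was not on this screen.
res = option + "\r" if option is not None else None
print(repr(res))  # prints '2\r' for the menu above
```

With a lookup like this, the agent would send "2" on this PDU and "3" on a unit where Outlet Control is the third entry, rather than ending up on the Power Supply Status screen.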

Comment 3 Jim Parsons 2008-02-27 14:26:05 UTC
This is a duplicate of 246216

*** This bug has been marked as a duplicate of 246216 ***

Comment 5 Marek Grac 2009-03-04 09:43:52 UTC
There will be a new fence agent for APC in RHCS 4.8 (the same as in 5.3), which includes ssh support, support for non-root accounts, and a fix for this problem.