Description of problem:
The cluster manager tries to fence a cluster node and fails with the following error (from /var/log/messages):

----------------------------------------------------------------------------
May 24 14:26:26 hostname fenced[5139]: fencing node "cl-node-74"
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: Traceback (most recent call last): File "/sbin/fence_apc", line 798, in ? main() File "/sbin/fence_apc", line 345, in main do_power_off(sock) File "/sbin/fence_apc", line 782, in do_power_off x = do_power_switch(sock, "off") File "/sbi
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: n/fence_apc", line 590, in do_power_switch result_code, response = power_off(txt + ndbuf) File "/sbin/fence_apc", line 786, in power_off x = power_switch(buffer, False, "2", "3"); File "/sbin/fence_apc", line 779, in power_switch raise "un
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: known screen encountered in \n" + str(lines) + "\n" unknown screen encountered in ['3', '', '', '------- Power Supply Status ---------------------------------------------------', '', ' Primary Power Supply Status: OK', ' Secondary Power S
May 24 14:26:41 hostname fenced[5139]: agent "fence_apc" reports: upply Status: OK', '', '', ' <ESC>- Back, <ENTER>- Refresh', '> ']
----------------------------------------------------------------------------

I then copied an older version (1.32.25) from another cluster into /sbin and it works fine, but that cannot be the solution.

Version-Release number of selected component (if applicable):
fence-1.32.45-1

How reproducible:
Run "ifdown bond0" on a node and watch /var/log/messages. The cluster manager will try to fence the node because it has missed too many heartbeats from it, and so on.

Additional info:
uname -a
Linux hostname 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
I see the same error with fence-1.32.45-1.0.1 on an i686 system. This is definitely a regression from 1.32.25-1, which I was using before the upgrade to 4U5.

APC details:
Model: AP7920
Manufacture Date: 07/02/2006
Hardware Revision: B2
Network Management Card AOS: 2.7.0
Rack PDU APP: 2.7.3
Here's the relevant piece of code:

    elif i.find(DEVICE_MANAGER) != (-1):
        if switchnum != "":
            res = switchnum + "\r"
        else:
            res = "3\r"
        return (NOT_COMPLETE, res)

which operates on the following menu:

------- Device Manager --------------------------------------------------------
1- Phase Monitor
2- Outlet Control
3- Power Supply Status

It looks like the code assumes "Outlet Control" will be option 3, which it isn't in this case. The old version uses a regular expression to identify the correct option, which succeeds.
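For illustration, here is a rough sketch of the regular-expression approach, looking the menu number up by its label instead of hardcoding "3". This is not the code from the old fence_apc; the helper name find_menu_option and the assumption that each menu entry sits on its own line are mine.

```python
import re


def find_menu_option(lines, label):
    """Return the number printed before `label` in a menu screen, or None.

    Hypothetical helper for illustration only; it assumes one menu entry
    per line in the form "<number>- <label>".
    """
    pattern = re.compile(r"^\s*(\d+)-\s*" + re.escape(label) + r"\s*$")
    for line in lines:
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# The Device Manager menu from this report, split into lines.
screen = [
    "------- Device Manager --------------------------------------------------------",
    "",
    "     1- Phase Monitor",
    "     2- Outlet Control",
    "     3- Power Supply Status",
]

option = find_menu_option(screen, "Outlet Control")
if option is None:
    # Refuse to guess a menu number when the label is not on the screen.
    raise RuntimeError("Outlet Control not found in Device Manager menu")
res = option + "\r"  # keystrokes to send to the switch
```

With the menu above this selects option 2, so the agent would enter Outlet Control rather than Power Supply Status. Failing when the label is missing seems safer for a fence agent than falling back to a fixed number.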
This is a duplicate of 246216 *** This bug has been marked as a duplicate of 246216 ***
There will be a new fence agent for APC in RHCS 4.8 (the same as in 5.3), which includes SSH support, support for non-root accounts, and a fix for this problem.