Bug 460054 - fence_apc fails with pexpect exception
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All  OS: Linux
Priority: medium  Severity: medium
: rc
: ---
Assigned To: Marek Grac
QA Contact: Cluster QE
Depends On:
Blocks: 501586
Reported: 2008-08-25 16:19 EDT by Nate Straz
Modified: 2009-05-19 16:51 EDT
2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 501586
Last Closed: 2009-01-20 16:51:55 EST

Attachments

  None
Description Nate Straz 2008-08-25 16:19:03 EDT
Description of problem:

While running revolver I noticed the following error in the logs of the fencing node:

Aug 25 15:03:53 tank-01 fenced[2419]: agent "fence_apc" reports: Success: Rebooted
Traceback (most recent call last):
  File "/sbin/fence_apc", line 200, in ?
    main()
  File "/sbin/fence_apc", line 197, in main
    conn.close()
  File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close
    raise ExceptionPexpect ('close() could not terminate the child using terminate()')
pexpect.ExceptionPexpect: close() could not terminate the child using terminate()
Exception exceptions.OSError: <exceptions.OSError instance at 0xb7f44b8c> in <bound method fspawn.__del__ of <fencing.fspawn object at 0xb7ccd7ec>> ignored
Aug 25 15:03:53 tank-01 fenced[2419]: fence "tank-03" failed

The fencing action appears to succeed, but the exception causes fenced to see a failure, so the fence agent is run again.
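
Why a successful reboot can still be reported as a failure can be sketched in isolation. This assumes the caller (fenced here) judges the agent by its exit status rather than by what it prints, which matches the behaviour described above. The stand-in agent below is hypothetical: it prints the success message and then lets an exception escape during cleanup, much as conn.close() does in the traceback.

```python
import subprocess
import sys

# Hypothetical stand-in for a fence agent: it reports success, then an
# exception escapes during cleanup (mimicking conn.close() failing).
AGENT_SRC = """
print("Success: Rebooted")
raise RuntimeError("close() could not terminate the child")
"""


def run_agent(source):
    """Run the agent source in a fresh interpreter; return (exit status, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


status, out = run_agent(AGENT_SRC)
print(out.strip())            # the success message is still printed
print("exit status:", status) # non-zero, so a caller checking the status sees a failure
```

An uncaught exception makes the Python interpreter exit with status 1 regardless of what was printed earlier, so the success text alone does not help the caller.
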

The fence agent works some of the time, but once the failure occurs, it continues to fail.

Version-Release number of selected component (if applicable):

How reproducible:
Unknown, but I've seen it on more than one occasion

Steps to Reproduce:
1. Run revolver.

Actual results:

Expected results:

Additional info:
Comment 1 Marek Grac 2008-09-03 10:46:04 EDT
Unable to reproduce on an APC 7951 (tested 500x: on, status, off).

If I'm right that it always fails on line 197, then we can simply ignore this exception: it doesn't matter whether we close the connection cleanly, because it will be closed anyway in the next step. Can you tell me what type of device you have (or, even better, give me access to it for these tests)?
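
A minimal sketch of the proposed approach, ignoring errors raised while closing the connection once the fencing action is done. It is not the actual fence_apc change: ExceptionPexpect is defined locally as a stand-in for pexpect.ExceptionPexpect so the sketch runs without pexpect installed, and FlakyConnection and close_quietly are hypothetical names used only for illustration.

```python
class ExceptionPexpect(Exception):
    """Stand-in for pexpect.ExceptionPexpect so this sketch runs without pexpect."""


class FlakyConnection:
    """Hypothetical connection whose close() fails like the one in the traceback."""

    def __init__(self):
        self.close_attempted = False

    def close(self):
        self.close_attempted = True
        raise ExceptionPexpect("close() could not terminate the child using terminate()")


def close_quietly(conn):
    """Close conn, ignoring errors that only affect cleanup.

    By this point the fencing action has already completed and the
    connection will be closed anyway in the next step, so a failure here
    should not turn a successful fence into a reported failure.
    """
    try:
        conn.close()
    except (ExceptionPexpect, OSError):
        pass  # cleanup-only failure; deliberately ignored


conn = FlakyConnection()
close_quietly(conn)  # no exception escapes
print("close attempted:", conn.close_attempted)
```

Only the two exception types seen in the log are swallowed; anything else raised by close() still propagates, so unrelated bugs are not hidden.
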
Comment 2 Nate Straz 2008-09-03 10:59:18 EDT
About System

Model Number      : AP9606       Serial Number     : WA0124008454
Manufacture Date  : 02/15/2002   Hardware Revision : G9

My guess is that the process it was trying to terminate had already exited, so the kill failed. I'm not sure what the OSError exception was in this case.
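
The guess above can be illustrated on its own, assuming a POSIX system: sending a signal to a child that has already exited and been reaped fails with OSError (errno ESRCH, "No such process"). This does not show what pexpect does internally; kill_after_exit is a hypothetical helper that only demonstrates the underlying OS behaviour.

```python
import errno
import os
import signal
import subprocess
import sys


def kill_after_exit():
    """Start a child, let it exit and reap it, then try to send SIGTERM.

    Returns the errno of the failed kill, or None if the kill succeeded.
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # the child has exited and been reaped, so its PID is gone
    try:
        os.kill(child.pid, signal.SIGTERM)
    except OSError as exc:  # on Python 3 this is ProcessLookupError, an OSError subclass
        return exc.errno
    return None


err = kill_after_exit()
print(errno.errorcode.get(err, err))  # ESRCH ("No such process") on Linux
```
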
Comment 3 Marek Grac 2008-09-11 12:06:05 EDT
Fixed using the proposed solution.
Comment 6 errata-xmlrpc 2009-01-20 16:51:55 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

