Bug 460054

Summary: fence_apc fails with pexpect exception
Product: Red Hat Enterprise Linux 5
Reporter: Nate Straz <nstraz>
Component: cman
Assignee: Marek Grac <mgrac>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 5.3
CC: cluster-maint, edamato
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-01-20 21:51:55 UTC
Bug Blocks: 501586

Description Nate Straz 2008-08-25 20:19:03 UTC
Description of problem:

While running revolver I noticed the following error in the logs of the fencing node:

Aug 25 15:03:53 tank-01 fenced[2419]: agent "fence_apc" reports: Success: Rebooted Traceback (most recent call last):   File "/sbin/fence_apc", line 200, in ?     main()   File "/sbin/fence_apc", line 197, in main     conn.close()   File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close     raise Except
Aug 25 15:03:53 tank-01 fenced[2419]: agent "fence_apc" reports: ionPexpect ('close() could not terminate the child using terminate()') pexpect.ExceptionPexpect: close() could not terminate the child using terminate() Exception exceptions.OSError: <exceptions.OSError instance at 0xb7f44b8c> in <bound method fspawn.__de
Aug 25 15:03:53 tank-01 fenced[2419]: agent "fence_apc" reports: l__ of <fencing.fspawn object at 0xb7ccd7ec>> ignored 
Aug 25 15:03:53 tank-01 fenced[2419]: fence "tank-03" failed

The fencing action seems to succeed, but the exception causes fenced to see a failure, which in turn causes the fence agent to be run again.
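
To illustrate why a successful action can still be counted as a failure, here is a minimal sketch. It assumes that the caller judges the agent by its exit status, which this report does not state explicitly. The inline script is a hypothetical stand-in for the agent, not the real fence_apc code: it prints its success message and then dies with an uncaught exception, so the interpreter exits with a non-zero status.

```python
import subprocess
import sys

# Hypothetical stand-in for a fence agent: the action succeeds and is
# reported, then an exception escapes during cleanup.
AGENT_SOURCE = '''
print("Success: Rebooted")
raise RuntimeError("close() could not terminate the child")
'''

# Run the stand-in agent the way a supervising daemon might: as a child
# process whose output and exit status are collected.
result = subprocess.run(
    [sys.executable, "-c", AGENT_SOURCE],
    capture_output=True,
    text=True,
)

print("stdout:", result.stdout.strip())
print("exit status:", result.returncode)
print("treated as failure:", result.returncode != 0)
```

The success message reaches stdout, but the exit status is non-zero, so a caller that only checks the status would retry the action.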

The fence agent works some of the time, but once the failure occurs, it continues to fail.

Version-Release number of selected component (if applicable):
cman-2.0.87-5.el5

How reproducible:
Unknown, but I've seen it on more than one occasion

Steps to Reproduce:
1. run revolver
  
Actual results:


Expected results:


Additional info:

Comment 1 Marek Grac 2008-09-03 14:46:04 UTC
Unable to reproduce on an APC 7951 (tested 500x: on, status, off).

If I'm right and it always fails on line 197, then we can simply ignore this exception: it doesn't matter whether we close the connection correctly, because it will be closed anyway in the next step. Can you tell me the type of your device (or, even better, give me access to it for these tests)?
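
A rough sketch of what ignoring the exception around the close call could look like is shown below. This is not the actual patch. `FakeConnection`, `close_ignoring_errors`, and the locally defined `ExceptionPexpect` are hypothetical stand-ins so that the example runs without pexpect or an APC device; only the exception name and message are taken from the traceback above.

```python
class ExceptionPexpect(Exception):
    """Stand-in for pexpect.ExceptionPexpect (defined locally for this sketch)."""


class FakeConnection:
    """Hypothetical connection whose close() fails like the one in the log."""

    def close(self):
        raise ExceptionPexpect(
            "close() could not terminate the child using terminate()"
        )


def close_ignoring_errors(conn):
    """Try to close conn; return False instead of raising if that fails."""
    try:
        conn.close()
        return True
    except (ExceptionPexpect, OSError):
        # The fencing action has already completed by this point, so a
        # failed close should not turn a successful run into a failure.
        return False


closed_cleanly = close_ignoring_errors(FakeConnection())
print("closed cleanly:", closed_cleanly)
```

Catching only the two exception types seen in the log, rather than every exception, keeps unrelated errors visible.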

Comment 2 Nate Straz 2008-09-03 14:59:18 UTC
About System

Model Number      : AP9606       Serial Number     : WA0124008454
Manufacture Date  : 02/15/2002   Hardware Revision : G9

My guess was that the process it was trying to terminate had already exited, so the kill failed.  I'm not sure what the OSError exception was in this case.
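
The general mechanism behind that guess can be shown with the standard library alone. This is a sketch assuming a POSIX system, and it does not claim to reproduce what pexpect did internally: once a child has exited and been waited for, sending it a signal raises an OSError.

```python
import os
import signal
import subprocess
import sys

# Start a short-lived child and wait for it, so that it has already
# exited (and been reaped) before we try to signal it.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()

try:
    os.kill(child.pid, signal.SIGTERM)
    kill_error = None
except OSError as err:
    # On Linux this is normally "No such process" (ESRCH).
    kill_error = err

print("kill failed:", kill_error)
```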

Comment 3 Marek Grac 2008-09-11 16:06:05 UTC
Fixed using the proposed solution.

Comment 6 errata-xmlrpc 2009-01-20 21:51:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0189.html