Bug 501586 - fence agents (fence_apc, fence_wti) fails with pexpect exception
Summary: fence agents (fence_apc, fence_wti) fails with pexpect exception
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Marek Grac
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 460054
Blocks: 501890 504589
 
Reported: 2009-05-19 20:51 UTC by Nate Straz
Modified: 2016-04-26 15:56 UTC

Fixed In Version: cman-2.0.112-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of: 460054
Clones: 501890 504589
Environment:
Last Closed: 2009-09-02 11:09:03 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to fix exceptions.OSError - import + VMWare (3.49 KB, patch)
2009-07-23 09:37 UTC, Marek Grac


Links
System: Red Hat Product Errata
ID: RHSA-2009:1341
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: cman security, bug fix, and enhancement update
Last Updated: 2009-09-01 10:43:16 UTC

Description Nate Straz 2009-05-19 20:51:55 UTC
+++ This bug was initially created as a clone of Bug #460054 +++

Description of problem:

I hit this problem again during RHEL 5.4 testing with revolver.  In my three node cluster, dash-01 was continuously fencing dash-02 until I intervened and rebooted dash-01.

Version-Release number of selected component (if applicable):
cman-2.0.101-1.el5

How reproducible:
Unknown

Steps to Reproduce:
1. run revolver
  
Actual results:

Message repeated in /var/log/messages on dash-01:

May 19 14:28:18 dash-01 fenced[8514]: fencing node "dash-02"
May 19 14:28:25 dash-01 fenced[8514]: agent "fence_apc" reports: Success: Rebooted Traceback (most recent call last):   File "/sbin/fence_apc", line 216, in ?     main()   File "/sbin/fence_apc", line 211, in main     conn.close()   File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close     raise Except
May 19 14:28:25 dash-01 fenced[8514]: agent "fence_apc" reports: ionPexpect ('close() could not terminate the child using terminate()') pexpect.ExceptionPexpect: close() could not terminate the child using terminate() Exception exceptions.OSError: <exceptions.OSError instance at 0x2b28b126bc20> in <bound method fspawn.
May 19 14:28:25 dash-01 fenced[8514]: agent "fence_apc" reports: __del__ of <fencing.fspawn object at 0x2b28b0012e90>> ignored
May 19 14:28:25 dash-01 fenced[8514]: fence "dash-02" failed

Which cleans up as:

Success: Rebooted
Traceback (most recent call last):
   File "/sbin/fence_apc", line 216, in ?
     main()
   File "/sbin/fence_apc", line 211, in main
     conn.close()
   File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close
     raise ExceptionPexpect ('close() could not terminate the child using terminate()')
pexpect.ExceptionPexpect: close() could not terminate the child using terminate()
Exception exceptions.OSError: <exceptions.OSError instance at 0x2b28b126bc20> in <bound method fspawn.__del__ of <fencing.fspawn object at 0x2b28b0012e90>> ignored

It looks like the exception fence_apc should actually catch is ExceptionPexpect rather than OSError.

Expected results:


Additional info:

Comment 3 Nate Straz 2009-06-05 16:55:16 UTC
While running regressions on 5.3.z I was able to hit this with the fence_wti agent also.

Jun  5 00:52:52 z1 fenced[5635]: fencing node "z4"
Jun  5 00:52:59 z1 fenced[5635]: agent "fence_wti" reports: Success: Rebooted Traceback (most recent call last):   File "/sbin/fence_wti", line 109, in ?     main()   File "/sbin/fence_wti", line 106, in main     conn.close()   File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close     raise Except
Jun  5 00:52:59 z1 fenced[5635]: agent "fence_wti" reports: ionPexpect ('close() could not terminate the child using terminate()') pexpect.ExceptionPexpect: close() could not terminate the child using terminate() Exception exceptions.OSError: <exceptions.OSError instance at 0xb7eedd0c> in <bound method fspawn.__de
Jun  5 00:52:59 z1 fenced[5635]: agent "fence_wti" reports: l__ of <fencing.fspawn object at 0xb7c7492c>> ignored
Jun  5 00:52:59 z1 fenced[5635]: fence "z4" failed

Eventually z1 was overwhelmed with telnet processes and had to be fenced itself.

All fence agents that use pexpect.py should handle the ExceptionPexpect exception on conn.close().
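
A minimal sketch of what that handling could look like. To keep it runnable without pexpect installed, ExceptionPexpect below is a local stand-in for pexpect.ExceptionPexpect, and StubbornConn and close_quietly are hypothetical names for illustration; none of this is taken from the agents' source.

```python
class ExceptionPexpect(Exception):
    """Local stand-in for pexpect.ExceptionPexpect (assumption, see above)."""


class StubbornConn:
    """Hypothetical connection whose close() fails the way the logs show."""

    def close(self):
        raise ExceptionPexpect(
            "close() could not terminate the child using terminate()"
        )


def close_quietly(conn):
    """Close conn; return True on a clean close, False if close() raised."""
    try:
        conn.close()
    except (ExceptionPexpect, OSError):
        # The power action has already reported success by this point, so a
        # failed close should not turn the agent's result into a failure.
        return False
    return True


closed = close_quietly(StubbornConn())
```

Catching both exception types covers the pexpect failure seen in the traceback as well as the OSError the agents were already written to expect.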

Comment 5 Nate Straz 2009-06-18 19:04:03 UTC
Verified that handling of ExceptionPexpect is included in cman-2.0.108-1.el5.

Comment 6 Nate Straz 2009-07-21 22:01:42 UTC
I hit this during revolver testing:

Jul 20 18:14:10 basic-p2 fenced[1699]: agent "fence_lpar" reports: Success: Rebooted Traceback (most recent call last):   File "/sbin/fence_lpar", line 134, in ?     main()   File "/sbin/fence_lpar", line 128, in main     except exceptions.OSError: NameError: global name 'exceptions' is not defined Exception exceptions.O
Jul 20 18:14:10 basic-p2 fenced[1699]: agent "fence_lpar" reports: SError: <exceptions.OSError instance at 0xf7cfe120> in <bound method fspawn.__del__ of <fencing.fspawn object at 0xf7cf37d0>> ignored

Success: Rebooted
Traceback (most recent call last):
   File "/sbin/fence_lpar", line 134, in ?
     main()
   File "/sbin/fence_lpar", line 128, in main
     except exceptions.OSError:
NameError: global name 'exceptions' is not defined
Exception exceptions.OSError: <exceptions.OSError instance at 0xf7cfe120> in <bound method fspawn.__del__ of <fencing.fspawn object at 0xf7cf37d0>> ignored

It appears that a line was added to check for OSError, but the exceptions module was never imported in any of the fence agents the line was added to.
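
A short sketch of why a missing import like this can stay hidden until an exception actually occurs: Python evaluates the expression in an except clause only when an exception reaches it. The function close_conn is a hypothetical stand-in, not code from the agents, and the sketch is written for current Python, where the exceptions module no longer exists (it was a Python 2 module).

```python
def close_conn(fail):
    """Hypothetical stand-in for an agent's cleanup step."""
    try:
        if fail:
            raise OSError("simulated failure during close")
    except exceptions.OSError:  # 'exceptions' is deliberately not imported
        pass
    return "closed"


# No exception is raised, so the except clause is never evaluated.
ok_result = close_conn(fail=False)

# An OSError is raised, the except clause is evaluated, and the missing
# name replaces the OSError with a NameError.
try:
    close_conn(fail=True)
    error_type = None
except NameError as err:
    error_type = type(err).__name__
```

Since OSError is also available as a builtin name, writing `except OSError:` would avoid the import altogether; importing exceptions is the other option.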

Comment 8 Marek Grac 2009-07-23 09:37:03 UTC
Created attachment 354830
Patch to fix exceptions.OSError - import + VMWare

Proposed patch to fix problem found during tests.

Comment 13 errata-xmlrpc 2009-09-02 11:09:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html

