Bug 501586 - fence agents (fence_apc, fence_wti) fails with pexpect exception
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.3
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Marek Grac
QA Contact: Cluster QE
Depends On: 460054
Blocks: 501890 504589
Reported: 2009-05-19 16:51 EDT by Nate Straz
Modified: 2016-04-26 11:56 EDT
Fixed In Version: cman-2.0.112-1.el5
Doc Type: Bug Fix
Clone Of: 460054
: 501890 504589
Last Closed: 2009-09-02 07:09:03 EDT

Attachments
Patch to fix exceptions.OSError - import + VMWare (3.49 KB, patch)
2009-07-23 05:37 EDT, Marek Grac

Description Nate Straz 2009-05-19 16:51:55 EDT
+++ This bug was initially created as a clone of Bug #460054 +++

Description of problem:

I hit this problem again during RHEL 5.4 testing with revolver.  In my three node cluster, dash-01 was continuously fencing dash-02 until I intervened and rebooted dash-01.

Version-Release number of selected component (if applicable):
cman-2.0.101-1.el5

How reproducible:
Unknown

Steps to Reproduce:
1. run revolver
  
Actual results:

Message repeated in /var/log/messages on dash-01:

May 19 14:28:18 dash-01 fenced[8514]: fencing node "dash-02"
May 19 14:28:25 dash-01 fenced[8514]: agent "fence_apc" reports: Success: Rebooted Traceback (most recent call last):   File "/sbin/fence_apc", line 216, in ?     main()   File "/sbin/fence_apc", line 211, in main     conn.close()   File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close     raise Except
May 19 14:28:25 dash-01 fenced[8514]: agent "fence_apc" reports: ionPexpect ('close() could not terminate the child using terminate()') pexpect.ExceptionPexpect: close() could not terminate the child using terminate() Exception exceptions.OSError: <exceptions.OSError instance at 0x2b28b126bc20> in <bound method fspawn.
May 19 14:28:25 dash-01 fenced[8514]: agent "fence_apc" reports: __del__ of <fencing.fspawn object at 0x2b28b0012e90>> ignored
May 19 14:28:25 dash-01 fenced[8514]: fence "dash-02" failed

Which cleans up as:

Success: Rebooted
Traceback (most recent call last):
  File "/sbin/fence_apc", line 216, in ?
    main()
  File "/sbin/fence_apc", line 211, in main
    conn.close()
  File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close
    raise ExceptionPexpect ('close() could not terminate the child using terminate()')
pexpect.ExceptionPexpect: close() could not terminate the child using terminate()
Exception exceptions.OSError: <exceptions.OSError instance at 0x2b28b126bc20> in <bound method fspawn.__del__ of <fencing.fspawn object at 0x2b28b0012e90>> ignored

It looks like the exception fence_apc should actually catch on conn.close() is ExceptionPexpect rather than OSError.

Expected results:


Additional info:
Comment 3 Nate Straz 2009-06-05 12:55:16 EDT
While running regressions on 5.3.z I was able to hit this with the fence_wti agent also.

Jun  5 00:52:52 z1 fenced[5635]: fencing node "z4"
Jun  5 00:52:59 z1 fenced[5635]: agent "fence_wti" reports: Success: Rebooted Traceback (most recent call last):   File "/sbin/fence_wti", line 109, in ?     main()   File "/sbin/fence_wti", line 106, in main     conn.close()   File "/usr/lib/python2.4/site-packages/pexpect.py", line 666, in close     raise Except
Jun  5 00:52:59 z1 fenced[5635]: agent "fence_wti" reports: ionPexpect ('close() could not terminate the child using terminate()') pexpect.ExceptionPexpect: close() could not terminate the child using terminate() Exception exceptions.OSError: <exceptions.OSError instance at 0xb7eedd0c> in <bound method fspawn.__de
Jun  5 00:52:59 z1 fenced[5635]: agent "fence_wti" reports: l__ of <fencing.fspawn object at 0xb7c7492c>> ignored
Jun  5 00:52:59 z1 fenced[5635]: fence "z4" failed

This eventually led to z1 being overwhelmed with leftover telnet processes, and z1 itself needed to be fenced.

All fence agents that use pexpect.py should handle the ExceptionPexpect exception on conn.close().
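
A minimal sketch of the handling requested above, written in current Python syntax rather than the Python 2.4 syntax the agents use. It is not the code that shipped in cman. The ExceptionPexpect class defined here is a stand-in so the example runs without pexpect installed (a real agent would import it from pexpect), and FakeConn, GoodConn and close_quietly are hypothetical names used only for illustration.

```python
class ExceptionPexpect(Exception):
    """Stand-in for pexpect.ExceptionPexpect (assumption: pexpect is not installed here)."""


class FakeConn:
    """Hypothetical connection whose close() fails the way the traceback shows."""

    def close(self):
        raise ExceptionPexpect(
            "close() could not terminate the child using terminate()"
        )


class GoodConn:
    """Hypothetical connection whose close() succeeds."""

    def close(self):
        return None


def close_quietly(conn):
    """Close conn; return False instead of raising if teardown fails."""
    try:
        conn.close()
    except (ExceptionPexpect, OSError):
        # The power action has already reported success at this point, so a
        # failed teardown should not turn the whole fence operation into a failure.
        return False
    return True


if __name__ == "__main__":
    print(close_quietly(FakeConn()))
    print(close_quietly(GoodConn()))
```

Catching both exception types keeps the earlier OSError handling while covering the ExceptionPexpect case reported here.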
Comment 5 Nate Straz 2009-06-18 15:04:03 EDT
Verified that handling of ExceptionPexpect is included in cman-2.0.108-1.el5.
Comment 6 Nate Straz 2009-07-21 18:01:42 EDT
I hit this during revolver testing:

Jul 20 18:14:10 basic-p2 fenced[1699]: agent "fence_lpar" reports: Success: Rebooted Traceback (most recent call last):   File "/sbin/fence_lpar", line 134, in ?     main()   File "/sbin/fence_lpar", line 128, in main     except exceptions.OSError: NameError: global name 'exceptions' is not defined Exception exceptions.O
Jul 20 18:14:10 basic-p2 fenced[1699]: agent "fence_lpar" reports: SError: <exceptions.OSError instance at 0xf7cfe120> in <bound method fspawn.__del__ of <fencing.fspawn object at 0xf7cf37d0>> ignored

Success: Rebooted
Traceback (most recent call last):
  File "/sbin/fence_lpar", line 134, in ?
    main()
  File "/sbin/fence_lpar", line 128, in main
    except exceptions.OSError:
NameError: global name 'exceptions' is not defined
Exception exceptions.OSError: <exceptions.OSError instance at 0xf7cfe120> in <bound method fspawn.__del__ of <fencing.fspawn object at 0xf7cf37d0>> ignored

It appears that a line was added to check for OSError, but the exceptions module was never imported in any of the fence agents the line was added to.
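
A small sketch of why this only shows up at run time, assuming current Python (where the Python 2 exceptions module no longer exists, so the name is undefined just as it was in the agents). The expression in an except clause is evaluated only when an exception actually reaches it, so the missing import stays hidden until close fails. The function names below are hypothetical and the failure is simulated.

```python
def close_with_missing_import(fail):
    """Mimic the agent code: the except clause names a module that was never imported."""
    try:
        if fail:
            raise OSError("simulated failure while closing")
    except exceptions.OSError:  # 'exceptions' is undefined here, as in the agents
        return "handled"
    return "closed"


def close_with_builtin(fail):
    """Same logic using the built-in OSError name, which needs no import."""
    try:
        if fail:
            raise OSError("simulated failure while closing")
    except OSError:
        return "handled"
    return "closed"


if __name__ == "__main__":
    # No exception is raised, so the broken except clause is never evaluated.
    print(close_with_missing_import(False))
    try:
        close_with_missing_import(True)
    except NameError as err:
        # The OSError reaches the except clause, which then fails on the name lookup.
        print("NameError:", err)
    print(close_with_builtin(True))
```

On Python 2.4 the equivalent fix is either to add "import exceptions" at the top of each agent or to catch the built-in OSError directly.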
Comment 8 Marek Grac 2009-07-23 05:37:03 EDT
Created attachment 354830
Patch to fix exceptions.OSError - import + VMWare

Proposed patch to fix the problem found during testing.
Comment 13 errata-xmlrpc 2009-09-02 07:09:03 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html
