Bug 1401704

Summary: Python exception error when configuring fence_rhevm
Product: Red Hat Enterprise Linux 6
Component: fence-agents
Version: 6.8
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: Sam Yangsao <syangsao>
Assignee: Marek Grac <mgrac>
QA Contact: cluster-qe <cluster-qe>
Docs Contact:
CC: abeekhof, cluster-maint, rbalakri
Target Milestone: rc
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-06 10:40:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Sam Yangsao 2016-12-05 21:56:58 UTC
Description of problem:

An 'unknown error' after trying to configure fence_rhevm with a valid option

Version-Release number of selected component (if applicable):

# uname -a
Linux sap1 2.6.32-642.4.2.el6.x86_64 #1 SMP Mon Aug 15 02:06:41 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa |egrep 'corosync|pacemaker|pcs|fence'
fence-virt-0.2.3-19.el6.x86_64
pacemaker-1.1.14-8.el6_8.1.x86_64
corosync-1.4.7-5.el6.x86_64
pcs-0.9.148-7.el6_8.1.x86_64
pacemaker-cluster-libs-1.1.14-8.el6_8.1.x86_64
pacemaker-cli-1.1.14-8.el6_8.1.x86_64
fence-agents-4.0.15-12.el6.x86_64
pacemaker-libs-1.1.14-8.el6_8.1.x86_64
libxshmfence-1.2-1.el6.x86_64
corosynclib-1.4.7-5.el6.x86_64

How reproducible:

Always

Steps to Reproduce:

1.  Install rhel 6 with the latest pacemaker bits as of 12/05/2016
2.  Configure stonith with the following option:

# pcs stonith create fence_sap1 fence_rhevm port="sap1" ipaddr="10.15.108.21" action="reboot" login="admin@internal" passwd="redhat" pcmk_host_list="sap1" ssl=1

3.  The start operation fails with the results below.

Actual results:

# pcs status
Cluster name: sap_pacemaker
Last updated: Mon Dec  5 15:54:33 2016		Last change: Mon Dec  5 15:03:36 2016 by root via cibadmin on sap1
Stack: cman
Current DC: sap1 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ sap1 sap2 ]

Full list of resources:

 fence_sap2	(stonith:fence_rhevm):	Started sap1
 fence_sap1	(stonith:fence_rhevm):	Stopped

Failed Actions:
* fence_sap1_start_0 on sap2 'unknown error' (1): call=82, status=Error, exitreason='none',
    last-rc-change='Mon Dec  5 15:03:36 2016', queued=0ms, exec=2182ms
* fence_sap1_start_0 on sap1 'unknown error' (1): call=80, status=Error, exitreason='none',
    last-rc-change='Mon Dec  5 15:03:40 2016', queued=0ms, exec=2150ms


PCSD Status:
  sap1: Online
  sap2: Online

# /var/log/messages file

Dec  5 15:00:23 sap1 crmd[20483]:   notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Dec  5 15:00:23 sap1 stonith-ng[20479]:   notice: Added 'fence_sap1' to the device list (2 active devices)
Dec  5 15:00:23 sap1 pengine[20482]:   notice: Start   fence_sap1#011(sap2)
Dec  5 15:00:23 sap1 pengine[20482]:   notice: Calculated Transition 77: /var/lib/pacemaker/pengine/pe-input-77.bz2
Dec  5 15:00:23 sap1 crmd[20483]:   notice: Initiating action 4: monitor fence_sap1_monitor_0 on sap2
Dec  5 15:00:23 sap1 crmd[20483]:   notice: Initiating action 3: monitor fence_sap1_monitor_0 on sap1 (local)
Dec  5 15:00:23 sap1 crmd[20483]:   notice: Operation fence_sap1_monitor_0: not running (node=sap1, call=72, rc=7, cib-update=193, confirmed=true)
Dec  5 15:00:23 sap1 crmd[20483]:   notice: Initiating action 7: start fence_sap1_start_0 on sap2
Dec  5 15:00:26 sap1 crmd[20483]:  warning: Action 7 (fence_sap1_start_0) on sap2 failed (target: 0 vs. rc: 1): Error
Dec  5 15:00:26 sap1 crmd[20483]:   notice: Transition aborted by fence_sap1_start_0 'modify' on sap2: Event failed (magic=4:1;7:77:0:78492043-f970-40c7-a553-cc6a95a6f17e, cib=0.25.3, source=match_graph_event:381, 0)
Dec  5 15:00:26 sap1 crmd[20483]:  warning: Action 7 (fence_sap1_start_0) on sap2 failed (target: 0 vs. rc: 1): Error
Dec  5 15:00:26 sap1 crmd[20483]:   notice: Transition 77 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-77.bz2): Complete
Dec  5 15:00:26 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec  5 15:00:26 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec  5 15:00:26 sap1 pengine[20482]:   notice: Recover fence_sap1#011(Started sap2)
Dec  5 15:00:26 sap1 pengine[20482]:   notice: Calculated Transition 78: /var/lib/pacemaker/pengine/pe-input-78.bz2
Dec  5 15:00:26 sap1 crmd[20483]:   notice: Initiating action 1: stop fence_sap1_stop_0 on sap2
Dec  5 15:00:26 sap1 crmd[20483]:   notice: Transition aborted by status-sap2-fail-count-fence_sap1, fail-count-fence_sap1=INFINITY: Transient attribute change (create cib=0.25.4, source=abort_unless_down:329, path=/cib/status/node_state[@id='sap2']/transient_attributes[@id='sap2']/instance_attributes[@id='status-sap2'], 0)
Dec  5 15:00:26 sap1 crmd[20483]:   notice: Transition 78 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-78.bz2): Stopped
Dec  5 15:00:26 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec  5 15:00:26 sap1 pengine[20482]:  warning: Forcing fence_sap1 away from sap2 after 1000000 failures (max=1000000)
Dec  5 15:00:26 sap1 pengine[20482]:   notice: Start   fence_sap1#011(sap1)
Dec  5 15:00:26 sap1 pengine[20482]:   notice: Calculated Transition 79: /var/lib/pacemaker/pengine/pe-input-79.bz2
Dec  5 15:00:26 sap1 crmd[20483]:   notice: Initiating action 5: start fence_sap1_start_0 on sap1 (local)
Dec  5 15:00:27 sap1 abrt: detected unhandled Python exception in '/usr/sbin/fence_rhevm'
Dec  5 15:00:27 sap1 abrt-server[23380]: Saved Python crash dump of pid 23375 to /var/spool/abrt/pyhook-2016-12-05-15:00:27-23375
Dec  5 15:00:27 sap1 abrtd: Directory 'pyhook-2016-12-05-15:00:27-23375' creation detected
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [ Traceback (most recent call last): ]
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [   File "/usr/sbin/fence_rhevm", line 165, in <module> ]
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [     main() ]
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [   File "/usr/sbin/fence_rhevm", line 160, in main ]
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [     result = fence_action(None, options, set_power_status, get_power_status, get_list) ]
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [   File "/usr/share/fence/fencing.py", line 821, in fence_action ]
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [     status = status.upper() ]
Dec  5 15:00:27 sap1 stonith-ng[20479]:  warning: fence_rhevm[23375] stderr: [ AttributeError: 'NoneType' object has no attribute 'upper' ]
Dec  5 15:00:27 sap1 abrtd: Duplicate: core backtrace
Dec  5 15:00:27 sap1 abrtd: DUP_OF_DIR: /var/spool/abrt/pyhook-2016-12-05-14:09:04-19068
Dec  5 15:00:27 sap1 abrtd: Deleting problem directory pyhook-2016-12-05-15:00:27-23375 (dup of pyhook-2016-12-05-14:09:04-19068)
Dec  5 15:00:27 sap1 abrtd: Sending an email...
Dec  5 15:00:27 sap1 abrtd: Email was sent to: root@localhost
Dec  5 15:00:28 sap1 abrt: detected unhandled Python exception in '/usr/sbin/fence_rhevm'
Dec  5 15:00:28 sap1 abrt-server[23397]: Not saving repeating crash in '/usr/sbin/fence_rhevm'
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [ Traceback (most recent call last): ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [   File "/usr/sbin/fence_rhevm", line 165, in <module> ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [     main() ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [   File "/usr/sbin/fence_rhevm", line 160, in main ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [     result = fence_action(None, options, set_power_status, get_power_status, get_list) ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [   File "/usr/share/fence/fencing.py", line 821, in fence_action ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [     status = status.upper() ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:  warning: fence_rhevm[23383] stderr: [ AttributeError: 'NoneType' object has no attribute 'upper' ]
Dec  5 15:00:28 sap1 stonith-ng[20479]:   notice: Operation 'monitor' [23383] for device 'fence_sap1' returned: -201 (Generic Pacemaker error)
Dec  5 15:00:29 sap1 crmd[20483]:    error: Operation fence_sap1_start_0 (node=sap1, call=73, status=4, cib-update=196, confirmed=true) Error
Dec  5 15:00:29 sap1 crmd[20483]:  warning: Action 5 (fence_sap1_start_0) on sap1 failed (target: 0 vs. rc: 1): Error
Dec  5 15:00:29 sap1 crmd[20483]:   notice: Transition aborted by fence_sap1_start_0 'modify' on sap1: Event failed (magic=4:1;5:79:0:78492043-f970-40c7-a553-cc6a95a6f17e, cib=0.25.7, source=match_graph_event:381, 0)
Dec  5 15:00:29 sap1 crmd[20483]:  warning: Action 5 (fence_sap1_start_0) on sap1 failed (target: 0 vs. rc: 1): Error
Dec  5 15:00:29 sap1 crmd[20483]:   notice: Transition 79 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-79.bz2): Complete
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sending flush op to all hosts for: fail-count-fence_sap1 (INFINITY)
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sent update 187: fail-count-fence_sap1=INFINITY
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sending flush op to all hosts for: last-failure-fence_sap1 (1480971629)
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sent update 189: last-failure-fence_sap1=1480971629
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sending flush op to all hosts for: fail-count-fence_sap1 (INFINITY)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Forcing fence_sap1 away from sap1 after 1000000 failures (max=1000000)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Forcing fence_sap1 away from sap2 after 1000000 failures (max=1000000)
Dec  5 15:00:29 sap1 pengine[20482]:   notice: Stop    fence_sap1#011(sap1)
Dec  5 15:00:29 sap1 pengine[20482]:   notice: Calculated Transition 80: /var/lib/pacemaker/pengine/pe-input-80.bz2
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sent update 191: fail-count-fence_sap1=INFINITY
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sending flush op to all hosts for: last-failure-fence_sap1 (1480971629)
Dec  5 15:00:29 sap1 attrd[20481]:   notice: Sent update 193: last-failure-fence_sap1=1480971629
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Forcing fence_sap1 away from sap1 after 1000000 failures (max=1000000)
Dec  5 15:00:29 sap1 pengine[20482]:  warning: Forcing fence_sap1 away from sap2 after 1000000 failures (max=1000000)
Dec  5 15:00:29 sap1 pengine[20482]:   notice: Stop    fence_sap1#011(sap1)
Dec  5 15:00:29 sap1 pengine[20482]:   notice: Calculated Transition 81: /var/lib/pacemaker/pengine/pe-input-81.bz2
Dec  5 15:00:29 sap1 crmd[20483]:   notice: Initiating action 2: stop fence_sap1_stop_0 on sap1 (local)
Dec  5 15:00:29 sap1 crmd[20483]:   notice: Operation fence_sap1_stop_0: ok (node=sap1, call=74, rc=0, cib-update=200, confirmed=true)
Dec  5 15:00:29 sap1 crmd[20483]:   notice: Transition 81 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-81.bz2): Complete
Dec  5 15:00:29 sap1 crmd[20483]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

Expected results:

Pacemaker should report more information about the error, e.g. whether or not the 'ssl=1' option is valid.

Additional info:

Works fine _without_ the 'ssl=1' option
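
The traceback above shows fence_action() in fencing.py crashing on status.upper() when the get_power_status callback returned None (here, presumably because the SSL connection to the RHEV-M API failed). A minimal sketch of the kind of guard that avoids this class of crash — the function and error message below are illustrative, not the actual fencing.py code:

```python
def fence_action(get_power_status, options):
    """Sketch: tolerate a power-status callback that returns None.

    get_power_status is assumed to return "on", "off", or None when
    communication with the device fails (e.g. an SSL error).
    """
    status = get_power_status(options)
    if status is None:
        # fence-agents 4.0.15 crashed here instead:
        # AttributeError: 'NoneType' object has no attribute 'upper'
        raise RuntimeError("Failed: Unable to obtain correct plug status "
                           "or plug is not available")
    return status.upper()
```

With a guard like this, a communication failure surfaces as a clear fencing error instead of an unhandled Python exception caught by abrt.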

Comment 3 Andrew Beekhof 2016-12-05 22:17:27 UTC
At best this might be something for pcs, but it sounds more like a bug in the fence_rhevm agent.

Re-assigning.

Comment 5 Marek Grac 2016-12-07 12:34:28 UTC
I agree that this is a problem with the fence agents. The issue shown in the traceback (fencing.py line 821) should already be fixed in 6.9 (rhbz#1361623), so I believe this is a duplicate. Can you retest it with the latest build, please?

However, I'm quite surprised that it happens only with SSL. Can you please re-run it with the verbose flag, so I can see the complete communication between the fence agent and the device?

Comment 6 Sam Yangsao 2016-12-07 14:44:33 UTC
(In reply to Marek Grac from comment #5)
> I agree that this is a problem with the fence agents. The issue shown in
> the traceback (fencing.py line 821) should already be fixed in 6.9
> (rhbz#1361623), so I believe this is a duplicate. Can you retest it with
> the latest build, please?

Will test with this build.

> 
> However, I'm quite surprised that it happens only with SSL. Can you please
> re-run it with the verbose flag, so I can see the complete communication
> between the fence agent and the device?

It also occurs with 'ssl_insecure=1'; I have not tried many other options.

Which verbose flag are you referring to? Just -vvv?

Thanks!

Comment 7 Marek Grac 2016-12-07 15:32:46 UTC
One -v is enough for us, or verbose=1 if you are using the fence agent via pcs.

ssl_insecure is the same as ssl in RHEL 6.
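
For reference, the verbose output requested above can be captured either by updating the stonith resource or by running the agent by hand with options fed on stdin (the resource name and addresses come from this report; the parameter names are the standard fence-agent stdin options and should be checked against the fence_rhevm man page):

```shell
# Via pcs, on the cluster resource from this report:
pcs stonith update fence_sap1 verbose=1

# Or run the agent directly, passing options on stdin as name=value pairs:
fence_rhevm <<EOF
ipaddr=10.15.108.21
login=admin@internal
passwd=redhat
port=sap1
ssl=1
action=status
verbose=1
EOF
```

The verbose run prints the HTTP exchange with the RHEV-M API to stderr, which is what is needed to see where the SSL case goes wrong.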

Comment 9 Jan Kurik 2017-12-06 10:40:52 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/