Description of problem:
An 'unknown error' is reported after trying to configure fence_rhevm with a valid option.

Version-Release number of selected component (if applicable):
# uname -a
Linux sap1 2.6.32-642.4.2.el6.x86_64 #1 SMP Mon Aug 15 02:06:41 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
# rpm -qa | egrep 'corosync|pacemaker|pcs|fence'
fence-virt-0.2.3-19.el6.x86_64
pacemaker-1.1.14-8.el6_8.1.x86_64
corosync-1.4.7-5.el6.x86_64
pcs-0.9.148-7.el6_8.1.x86_64
pacemaker-cluster-libs-1.1.14-8.el6_8.1.x86_64
pacemaker-cli-1.1.14-8.el6_8.1.x86_64
fence-agents-4.0.15-12.el6.x86_64
pacemaker-libs-1.1.14-8.el6_8.1.x86_64
libxshmfence-1.2-1.el6.x86_64
corosynclib-1.4.7-5.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 6 with the latest pacemaker bits as of 12/05/2016.
2. Configure stonith with the following option:
# pcs stonith create fence_sap1 fence_rhevm port="sap1" ipaddr="10.15.108.21" action="reboot" login="admin@internal" passwd="redhat" pcmk_host_list="sap1" ssl=1
3. The fence agent crashes with the results below.

Actual results:
# pcs status
Cluster name: sap_pacemaker
Last updated: Mon Dec 5 15:54:33 2016
Last change: Mon Dec 5 15:03:36 2016 by root via cibadmin on sap1
Stack: cman
Current DC: sap1 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ sap1 sap2 ]

Full list of resources:
 fence_sap2 (stonith:fence_rhevm): Started sap1
 fence_sap1 (stonith:fence_rhevm): Stopped

Failed Actions:
* fence_sap1_start_0 on sap2 'unknown error' (1): call=82, status=Error, exitreason='none', last-rc-change='Mon Dec 5 15:03:36 2016', queued=0ms, exec=2182ms
* fence_sap1_start_0 on sap1 'unknown error' (1): call=80, status=Error, exitreason='none', last-rc-change='Mon Dec 5 15:03:40 2016', queued=0ms, exec=2150ms

PCSD Status:
 sap1: Online
 sap2: Online

# /var/log/messages
Dec 5 15:00:23 sap1 crmd[20483]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Dec 5 15:00:23 sap1 stonith-ng[20479]: notice: Added 'fence_sap1' to the device list (2 active devices)
Dec 5 15:00:23 sap1 pengine[20482]: notice: Start fence_sap1#011(sap2)
Dec 5 15:00:23 sap1 pengine[20482]: notice: Calculated Transition 77: /var/lib/pacemaker/pengine/pe-input-77.bz2
Dec 5 15:00:23 sap1 crmd[20483]: notice: Initiating action 4: monitor fence_sap1_monitor_0 on sap2
Dec 5 15:00:23 sap1 crmd[20483]: notice: Initiating action 3: monitor fence_sap1_monitor_0 on sap1 (local)
Dec 5 15:00:23 sap1 crmd[20483]: notice: Operation fence_sap1_monitor_0: not running (node=sap1, call=72, rc=7, cib-update=193, confirmed=true)
Dec 5 15:00:23 sap1 crmd[20483]: notice: Initiating action 7: start fence_sap1_start_0 on sap2
Dec 5 15:00:26 sap1 crmd[20483]: warning: Action 7 (fence_sap1_start_0) on sap2 failed (target: 0 vs. rc: 1): Error
Dec 5 15:00:26 sap1 crmd[20483]: notice: Transition aborted by fence_sap1_start_0 'modify' on sap2: Event failed (magic=4:1;7:77:0:78492043-f970-40c7-a553-cc6a95a6f17e, cib=0.25.3, source=match_graph_event:381, 0)
Dec 5 15:00:26 sap1 crmd[20483]: warning: Action 7 (fence_sap1_start_0) on sap2 failed (target: 0 vs. rc: 1): Error
Dec 5 15:00:26 sap1 crmd[20483]: notice: Transition 77 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-77.bz2): Complete
Dec 5 15:00:26 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec 5 15:00:26 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec 5 15:00:26 sap1 pengine[20482]: notice: Recover fence_sap1#011(Started sap2)
Dec 5 15:00:26 sap1 pengine[20482]: notice: Calculated Transition 78: /var/lib/pacemaker/pengine/pe-input-78.bz2
Dec 5 15:00:26 sap1 crmd[20483]: notice: Initiating action 1: stop fence_sap1_stop_0 on sap2
Dec 5 15:00:26 sap1 crmd[20483]: notice: Transition aborted by status-sap2-fail-count-fence_sap1, fail-count-fence_sap1=INFINITY: Transient attribute change (create cib=0.25.4, source=abort_unless_down:329, path=/cib/status/node_state[@id='sap2']/transient_attributes[@id='sap2']/instance_attributes[@id='status-sap2'], 0)
Dec 5 15:00:26 sap1 crmd[20483]: notice: Transition 78 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-78.bz2): Stopped
Dec 5 15:00:26 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec 5 15:00:26 sap1 pengine[20482]: warning: Forcing fence_sap1 away from sap2 after 1000000 failures (max=1000000)
Dec 5 15:00:26 sap1 pengine[20482]: notice: Start fence_sap1#011(sap1)
Dec 5 15:00:26 sap1 pengine[20482]: notice: Calculated Transition 79: /var/lib/pacemaker/pengine/pe-input-79.bz2
Dec 5 15:00:26 sap1 crmd[20483]: notice: Initiating action 5: start fence_sap1_start_0 on sap1 (local)
Dec 5 15:00:27 sap1 abrt: detected unhandled Python exception in '/usr/sbin/fence_rhevm'
Dec 5 15:00:27 sap1 abrt-server[23380]: Saved Python crash dump of pid 23375 to /var/spool/abrt/pyhook-2016-12-05-15:00:27-23375
Dec 5 15:00:27 sap1 abrtd: Directory 'pyhook-2016-12-05-15:00:27-23375' creation detected
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ Traceback (most recent call last): ]
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ File "/usr/sbin/fence_rhevm", line 165, in <module> ]
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ main() ]
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ File "/usr/sbin/fence_rhevm", line 160, in main ]
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ result = fence_action(None, options, set_power_status, get_power_status, get_list) ]
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ File "/usr/share/fence/fencing.py", line 821, in fence_action ]
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ status = status.upper() ]
Dec 5 15:00:27 sap1 stonith-ng[20479]: warning: fence_rhevm[23375] stderr: [ AttributeError: 'NoneType' object has no attribute 'upper' ]
Dec 5 15:00:27 sap1 abrtd: Duplicate: core backtrace
Dec 5 15:00:27 sap1 abrtd: DUP_OF_DIR: /var/spool/abrt/pyhook-2016-12-05-14:09:04-19068
Dec 5 15:00:27 sap1 abrtd: Deleting problem directory pyhook-2016-12-05-15:00:27-23375 (dup of pyhook-2016-12-05-14:09:04-19068)
Dec 5 15:00:27 sap1 abrtd: Sending an email...
Dec 5 15:00:27 sap1 abrtd: Email was sent to: root@localhost
Dec 5 15:00:28 sap1 abrt: detected unhandled Python exception in '/usr/sbin/fence_rhevm'
Dec 5 15:00:28 sap1 abrt-server[23397]: Not saving repeating crash in '/usr/sbin/fence_rhevm'
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ Traceback (most recent call last): ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ File "/usr/sbin/fence_rhevm", line 165, in <module> ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ main() ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ File "/usr/sbin/fence_rhevm", line 160, in main ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ result = fence_action(None, options, set_power_status, get_power_status, get_list) ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ File "/usr/share/fence/fencing.py", line 821, in fence_action ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ status = status.upper() ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: warning: fence_rhevm[23383] stderr: [ AttributeError: 'NoneType' object has no attribute 'upper' ]
Dec 5 15:00:28 sap1 stonith-ng[20479]: notice: Operation 'monitor' [23383] for device 'fence_sap1' returned: -201 (Generic Pacemaker error)
Dec 5 15:00:29 sap1 crmd[20483]: error: Operation fence_sap1_start_0 (node=sap1, call=73, status=4, cib-update=196, confirmed=true) Error
Dec 5 15:00:29 sap1 crmd[20483]: warning: Action 5 (fence_sap1_start_0) on sap1 failed (target: 0 vs. rc: 1): Error
Dec 5 15:00:29 sap1 crmd[20483]: notice: Transition aborted by fence_sap1_start_0 'modify' on sap1: Event failed (magic=4:1;5:79:0:78492043-f970-40c7-a553-cc6a95a6f17e, cib=0.25.7, source=match_graph_event:381, 0)
Dec 5 15:00:29 sap1 crmd[20483]: warning: Action 5 (fence_sap1_start_0) on sap1 failed (target: 0 vs. rc: 1): Error
Dec 5 15:00:29 sap1 crmd[20483]: notice: Transition 79 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-79.bz2): Complete
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sending flush op to all hosts for: fail-count-fence_sap1 (INFINITY)
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sent update 187: fail-count-fence_sap1=INFINITY
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sending flush op to all hosts for: last-failure-fence_sap1 (1480971629)
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sent update 189: last-failure-fence_sap1=1480971629
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sending flush op to all hosts for: fail-count-fence_sap1 (INFINITY)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Forcing fence_sap1 away from sap1 after 1000000 failures (max=1000000)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Forcing fence_sap1 away from sap2 after 1000000 failures (max=1000000)
Dec 5 15:00:29 sap1 pengine[20482]: notice: Stop fence_sap1#011(sap1)
Dec 5 15:00:29 sap1 pengine[20482]: notice: Calculated Transition 80: /var/lib/pacemaker/pengine/pe-input-80.bz2
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sent update 191: fail-count-fence_sap1=INFINITY
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sending flush op to all hosts for: last-failure-fence_sap1 (1480971629)
Dec 5 15:00:29 sap1 attrd[20481]: notice: Sent update 193: last-failure-fence_sap1=1480971629
Dec 5 15:00:29 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap2: unknown error (1)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Processing failed op start for fence_sap1 on sap1: unknown error (1)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Forcing fence_sap1 away from sap1 after 1000000 failures (max=1000000)
Dec 5 15:00:29 sap1 pengine[20482]: warning: Forcing fence_sap1 away from sap2 after 1000000 failures (max=1000000)
Dec 5 15:00:29 sap1 pengine[20482]: notice: Stop fence_sap1#011(sap1)
Dec 5 15:00:29 sap1 pengine[20482]: notice: Calculated Transition 81: /var/lib/pacemaker/pengine/pe-input-81.bz2
Dec 5 15:00:29 sap1 crmd[20483]: notice: Initiating action 2: stop fence_sap1_stop_0 on sap1 (local)
Dec 5 15:00:29 sap1 crmd[20483]: notice: Operation fence_sap1_stop_0: ok (node=sap1, call=74, rc=0, cib-update=200, confirmed=true)
Dec 5 15:00:29 sap1 crmd[20483]: notice: Transition 81 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-81.bz2): Complete
Dec 5 15:00:29 sap1 crmd[20483]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

Expected results:
pacemaker should give us more information about the error, e.g. whether or not the 'ssl=1' option is valid.

Additional info:
Works fine _without_ the 'ssl=1' option.
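For reference, the traceback shows fence_action() in fencing.py calling status.upper() on the value returned by the agent's get_power_status() callback; when the RHEV-M API response cannot be parsed (as appears to happen here with ssl=1), that callback evidently returns None and the .upper() call raises. The sketch below illustrates the failure mode and a defensive guard; the function names mirror the traceback, but the bodies are illustrative, not the actual fencing.py or fence_rhevm code:

```python
# Illustrative sketch only -- NOT the actual fencing.py/fence_rhevm code.
import re

def get_power_status(conn, options):
    # The real agent parses the RHEV-M REST API response here. If the
    # response body is empty or unparseable (e.g. after a failed SSL
    # exchange), the search matches nothing and None is returned.
    response = ""  # simulate an empty/unparseable API response
    match = re.search(r"<state>(.*?)</state>", response)
    return match.group(1) if match else None

def fence_action_unguarded(get_status):
    status = get_status(None, {})
    # This mirrors the line in the log that blows up:
    return status.upper()  # AttributeError when status is None

def fence_action_guarded(get_status):
    status = get_status(None, {})
    if status is None:
        # Fail with an explicit message instead of an unhandled traceback,
        # so pacemaker can report something better than 'unknown error'.
        return "ERROR: unable to obtain power status from device"
    return status.upper()
```

The unguarded path reproduces exactly the `AttributeError: 'NoneType' object has no attribute 'upper'` seen in the stonith-ng log; the guarded path is one plausible shape for the fix.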
At best this might be something for pcs, but it sounds more like a bug in the fence_rhevm agent. Re-assigning.
I agree that this is a problem with the fence agents. The issue mentioned in the traceback (fencing.py line 821) should already be fixed in 6.9 (rhbz#1361623), so I believe this is a duplicate. Can you retest it with the latest build, please? However, I'm quite surprised that it happens only with SSL. Can you please re-run it with the verbose flag, so I can see the complete communication between the fence agent and the device?
(In reply to Marek Grac from comment #5)
> I agree that this problem with fence agents. Issue mentioned in debug (line
> 821) should be already fixed in 6.9 (rhbz#1361623) - so I believe that it is
> duplicate. Can you retest it with latest build, please?

Will test with this build.

> However, I'm quite surprised that it happends only with SSL. Can you please
> re-run it with verbose flag. So I can see complete communication between
> fence agent and device?

It also occurs with 'ssl_insecure=1'; I have not tried many other options. Which verbose flag are you referring to? Just -vvv? Thanks!
One -v is enough for us, or verbose=1 if you are using the fence agent via pcs. ssl_insecure is the same as ssl in RHEL 6.
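For completeness, the two ways to enable verbose output look roughly like this (the resource name, address, and credentials are taken from the reproducer above; treat the exact invocation as a sketch, not a verified command line):

```shell
# Enable verbose output on the existing stonith resource via pcs:
pcs stonith update fence_sap1 verbose=1

# Or run the agent by hand with -v to capture the full device dialog:
fence_rhevm -a 10.15.108.21 -l 'admin@internal' -p redhat \
    -n sap1 --ssl -o status -v
```

The manual invocation is useful here because it reproduces the crash outside pacemaker, so the full traceback and HTTP exchange land on the terminal instead of being scraped from stonith-ng's stderr capture.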
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation.

Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/