Bug 1128931 - add OCF exit reason string support
Summary: add OCF exit reason string support
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1128933
Blocks:
 
Reported: 2014-08-11 21:23 UTC by David Vossel
Modified: 2015-08-24 06:36 UTC
CC List: 5 users

Fixed In Version: pacemaker-1.1.12-4.el6
Doc Type: Enhancement
Doc Text:
Feature: Allow OCF agents to provide a textual reason for a failure.
Reason: Even standardized return codes required the admin to look into the agent to see what might have caused the failure.
Result: The reason for a failure can be easily presented to admins via the CLI and GUI.
Clone Of:
Clones: 1128933
Environment:
Last Closed: 2015-03-05 10:00:17 UTC


Attachments: none


Links
System:       Red Hat Product Errata
ID:           RHBA-2015:0440
Priority:     normal
Status:       SHIPPED_LIVE
Summary:      pacemaker bug fix and enhancement update
Last Updated: 2015-03-05 14:37:57 UTC

Description David Vossel 2014-08-11 21:23:52 UTC
Description of problem:

A common complaint we receive is that it is very difficult to debug why resource agents fail. OCF scripts have only a limited set of return codes available, and each code can mean any number of things depending on the agent.

To address this, the resource-agents project has introduced the concept of an exit reason string. The string is written to stderr and tells the calling process why a failure occurred.

Pacemaker needs the ability both to parse this string from the OCF output and to present it to the user via crm_mon -1 and crm_mon --as-xml. From there, tools such as pcs can present the user with useful information about exactly why a resource failed rather than just a generic return code.
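
For reference, the agent side of this convention amounts to printing a specially prefixed reason line on stderr just before returning the usual OCF return code. A minimal sketch (not a real agent), assuming the have_binary and ocf_exit_reason helpers from the resource-agents ocf-shellfuncs library:

#!/bin/sh
# Sketch only: how an agent could report *why* it failed along with the OCF code.
: ${OCF_ROOT=/usr/lib/ocf}
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

demo_monitor() {
    if ! have_binary virsh; then
        # ocf_exit_reason writes the reason to stderr, prefixed so the
        # caller can tell it apart from ordinary agent output.
        ocf_exit_reason "Setup problem: couldn't find command: virsh"
        return $OCF_ERR_INSTALLED   # 5, reported as 'not installed'
    fi
    return $OCF_SUCCESS
}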

Comment 2 michal novacek 2014-12-02 16:21:31 UTC
I have verified that it is possible to see the exit reason from resource agents with
pacemaker-1.1.12-13.el7.x86_64.

-----

[root@virt-063 ~]# rpm -q pacemaker 
pacemaker-1.1.12-13.el7.x86_64
[root@virt-063 ~]# pcs resource create vd VirtualDomain config=a
[root@virt-063 ~]# pcs status
Cluster name: STSRHTS31212
Last updated: Tue Dec  2 17:18:41 2014
Last change: Tue Dec  2 17:18:37 2014 via cibadmin on virt-063
Stack: corosync
Current DC: virt-063 (1) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
10 Resources configured

Online: [ virt-063 virt-069 virt-072 ]

Full list of resources:

 fence-virt-063 (stonith:fence_xvm):    Started virt-063 
 fence-virt-069 (stonith:fence_xvm):    Started virt-069 
 fence-virt-072 (stonith:fence_xvm):    Started virt-072 
 Clone Set: dlm-clone [dlm]
     Started: [ virt-063 virt-069 virt-072 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-063 virt-069 virt-072 ]
 vd     (ocf::heartbeat:VirtualDomain): Stopped 

Failed actions:
    vd_monitor_0 on virt-072 'not installed' (5): call=33, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=37ms
    vd_monitor_0 on virt-063 'not installed' (5): call=37, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=44ms
    vd_monitor_0 on virt-069 'not installed' (5): call=33, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=38ms


PCSD Status:
  virt-063: Offline
  virt-069: Online
  virt-072: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: unknown/disabled

[root@virt-063 ~]# crm_mon --as-xml
<?xml version="1.0"?>
<crm_mon version="1.1.12">
    <summary>
        <last_update time="Tue Dec  2 17:18:49 2014" />
        <last_change time="Tue Dec  2 17:18:37 2014" user="" client="cibadmin" origin="virt-063" />
        <stack type="corosync" />
        <current_dc present="true" version="1.1.12-a14efad" name="virt-063" id="1" with_quorum="true" />
        <nodes_configured number="3" expected_votes="unknown" />
        <resources_configured number="10" />
    </summary>
    <nodes>
        <node name="virt-063" id="1" online="true" standby="false" standby_onfail="false" maintenance="false" pending="false" unclean="false" shutdown="false" expected_up="true" is_dc="true" resources_running="3" type="member" />
        <node name="virt-069" id="2" online="true" standby="false" standby_onfail="false" maintenance="false" pending="false" unclean="false" shutdown="false" expected_up="true" is_dc="false" resources_running="3" type="member" />
        <node name="virt-072" id="3" online="true" standby="false" standby_onfail="false" maintenance="false" pending="false" unclean="false" shutdown="false" expected_up="true" is_dc="false" resources_running="3" type="member" />
    </nodes>
    <resources>
        <resource id="fence-virt-063" resource_agent="stonith:fence_xvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
            <node name="virt-063" id="1" cached="false"/>
        </resource>
        <resource id="fence-virt-069" resource_agent="stonith:fence_xvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
            <node name="virt-069" id="2" cached="false"/>
        </resource>
        <resource id="fence-virt-072" resource_agent="stonith:fence_xvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
            <node name="virt-072" id="3" cached="false"/>
        </resource>
        <clone id="dlm-clone" multi_state="false" unique="false" managed="true" failed="false" failure_ignored="false" >
            <resource id="dlm" resource_agent="ocf::pacemaker:controld" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-072" id="3" cached="false"/>
            </resource>
            <resource id="dlm" resource_agent="ocf::pacemaker:controld" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-063" id="1" cached="false"/>
            </resource>
            <resource id="dlm" resource_agent="ocf::pacemaker:controld" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-069" id="2" cached="false"/>
            </resource>
        </clone>
        <clone id="clvmd-clone" multi_state="false" unique="false" managed="true" failed="false" failure_ignored="false" >
            <resource id="clvmd" resource_agent="ocf::heartbeat:clvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-072" id="3" cached="false"/>
            </resource>
            <resource id="clvmd" resource_agent="ocf::heartbeat:clvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-063" id="1" cached="false"/>
            </resource>
            <resource id="clvmd" resource_agent="ocf::heartbeat:clvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-069" id="2" cached="false"/>
            </resource>
        </clone>
    </resources>
    <failures>
        <failure op_key="vd_monitor_0" node="virt-072" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="33" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="37" interval="0" task="monitor" />
        <failure op_key="vd_monitor_0" node="virt-063" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="37" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="44" interval="0" task="monitor" />
        <failure op_key="vd_monitor_0" node="virt-069" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="33" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="38" interval="0" task="monitor" />
    </failures>
</crm_mon>
[root@virt-063 ~]#
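
For reference, a tool consuming this XML does not need to scrape the plain-text status output at all; one hypothetical way to pull the reasons out of the new exitreason attribute (assuming xmllint from libxml2 is installed):

crm_mon --as-xml | xmllint --xpath '//failure/@exitreason' -

This should print the exitreason attribute of each <failure> element shown above.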

Comment 4 errata-xmlrpc 2015-03-05 10:00:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0440.html

