Bug 1128931

Summary: add OCF exit reason string support
Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.1
Reporter: David Vossel <dvossel>
Assignee: Andrew Beekhof <abeekhof>
QA Contact: cluster-qe <cluster-qe>
CC: cluster-maint, djansa, dvossel, fdinitto, mnovacek
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.12-4.el6
Doc Type: Enhancement
Doc Text:
    Feature: Allow OCF agents to provide a textual reason for a failure.
    Reason: Even with standardized return codes, the admin had to look into the agent to see what might have caused the failure.
    Result: The reason for a failure can be easily presented to admins via the CLI and GUI.
Last Closed: 2015-03-05 10:00:17 UTC
Type: Bug
Clones: 1128933 (view as bug list)
Bug Depends On: 1128933

Description David Vossel 2014-08-11 21:23:52 UTC
Description of problem:

A common complaint we receive is that it is very difficult to debug why resource agents fail. OCF scripts have only a limited number of return codes available, and each return code can mean numerous things depending on the agent.

To fix this, the resource-agents package has introduced the concept of an exit reason string. This string is written to stderr and indicates to the calling process why a failure occurred.
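
For illustration only, here is a minimal sketch (not taken from any shipped agent; the function name dummy_start is made up) of how an agent built on the resource-agents shell library could report such a reason. The ocf_exit_reason helper writes the message to stderr so the caller can capture it; have_binary, OCF_ERR_INSTALLED and OCF_SUCCESS come from the same library:

#!/bin/sh
# Minimal sketch of an OCF start action that reports an exit reason.
# ocf_exit_reason() prints the message to stderr for pacemaker to pick up.
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

dummy_start() {
    if ! have_binary virsh; then
        ocf_exit_reason "Setup problem: couldn't find command: virsh"
        return $OCF_ERR_INSTALLED
    fi
    # ... normal start logic would go here ...
    return $OCF_SUCCESS
}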

Pacemaker needs the ability both to parse this string from the OCF output and to present it to the user via crm_mon -1 and crm_mon --as-xml. From there, tools like pcs should be able to show the user exactly why a resource failed rather than just a generic return code.
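
As a rough sketch of what such a consumer could do once this lands (this assumes xmllint from libxml2 is available and is not how pcs actually does it; the attribute name matches the <failure> elements shown in the verification output below):

# Pull the exit reason of the first recorded failure out of crm_mon's XML output.
crm_mon --as-xml | xmllint --xpath 'string(//failures/failure/@exitreason)' -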

Comment 2 michal novacek 2014-12-02 16:21:31 UTC
I have verified that it is possible to see the exit reason reported by resource agents with
pacemaker-1.1.12-13.el7.x86_64

-----

[root@virt-063 ~]# rpm -q pacemaker 
pacemaker-1.1.12-13.el7.x86_64
[root@virt-063 ~]# pcs resource create vd VirtualDomain config=a
[root@virt-063 ~]# pcs status
Cluster name: STSRHTS31212
Last updated: Tue Dec  2 17:18:41 2014
Last change: Tue Dec  2 17:18:37 2014 via cibadmin on virt-063
Stack: corosync
Current DC: virt-063 (1) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
10 Resources configured

Online: [ virt-063 virt-069 virt-072 ]

Full list of resources:

 fence-virt-063 (stonith:fence_xvm):    Started virt-063 
 fence-virt-069 (stonith:fence_xvm):    Started virt-069 
 fence-virt-072 (stonith:fence_xvm):    Started virt-072 
 Clone Set: dlm-clone [dlm]
     Started: [ virt-063 virt-069 virt-072 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-063 virt-069 virt-072 ]
 vd     (ocf::heartbeat:VirtualDomain): Stopped 

Failed actions:
    vd_monitor_0 on virt-072 'not installed' (5): call=33, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=37ms
    vd_monitor_0 on virt-072 'not installed' (5): call=33, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=37ms
    vd_monitor_0 on virt-063 'not installed' (5): call=37, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=44ms
    vd_monitor_0 on virt-063 'not installed' (5): call=37, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=44ms
    vd_monitor_0 on virt-069 'not installed' (5): call=33, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=38ms
    vd_monitor_0 on virt-069 'not installed' (5): call=33, status=complete, exit-reason='Setup problem: couldn't find command: virsh', last-rc-change='Tue Dec  2 17:18:37 2014', queued=0ms, exec=38ms


PCSD Status:
  virt-063: Offline
  virt-069: Online
  virt-072: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: unknown/disabled

[root@virt-063 ~]# crm_mon --as-xml
<?xml version="1.0"?>
<crm_mon version="1.1.12">
    <summary>
        <last_update time="Tue Dec  2 17:18:49 2014" />
        <last_change time="Tue Dec  2 17:18:37 2014" user="" client="cibadmin" origin="virt-063" />
        <stack type="corosync" />
        <current_dc present="true" version="1.1.12-a14efad" name="virt-063" id="1" with_quorum="true" />
        <nodes_configured number="3" expected_votes="unknown" />
        <resources_configured number="10" />
    </summary>
    <nodes>
        <node name="virt-063" id="1" online="true" standby="false" standby_onfail="false" maintenance="false" pending="false" unclean="false" shutdown="false" expected_up="true" is_dc="true" resources_running="3" type="member" />
        <node name="virt-069" id="2" online="true" standby="false" standby_onfail="false" maintenance="false" pending="false" unclean="false" shutdown="false" expected_up="true" is_dc="false" resources_running="3" type="member" />
        <node name="virt-072" id="3" online="true" standby="false" standby_onfail="false" maintenance="false" pending="false" unclean="false" shutdown="false" expected_up="true" is_dc="false" resources_running="3" type="member" />
    </nodes>
    <resources>
        <resource id="fence-virt-063" resource_agent="stonith:fence_xvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
            <node name="virt-063" id="1" cached="false"/>
        </resource>
        <resource id="fence-virt-069" resource_agent="stonith:fence_xvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
            <node name="virt-069" id="2" cached="false"/>
        </resource>
        <resource id="fence-virt-072" resource_agent="stonith:fence_xvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
            <node name="virt-072" id="3" cached="false"/>
        </resource>
        <clone id="dlm-clone" multi_state="false" unique="false" managed="true" failed="false" failure_ignored="false" >
            <resource id="dlm" resource_agent="ocf::pacemaker:controld" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-072" id="3" cached="false"/>
            </resource>
            <resource id="dlm" resource_agent="ocf::pacemaker:controld" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-063" id="1" cached="false"/>
            </resource>
            <resource id="dlm" resource_agent="ocf::pacemaker:controld" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-069" id="2" cached="false"/>
            </resource>
        </clone>
        <clone id="clvmd-clone" multi_state="false" unique="false" managed="true" failed="false" failure_ignored="false" >
            <resource id="clvmd" resource_agent="ocf::heartbeat:clvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-072" id="3" cached="false"/>
            </resource>
            <resource id="clvmd" resource_agent="ocf::heartbeat:clvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-063" id="1" cached="false"/>
            </resource>
            <resource id="clvmd" resource_agent="ocf::heartbeat:clvm" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" >
                <node name="virt-069" id="2" cached="false"/>
            </resource>
        </clone>
    </resources>
    <failures>
        <failure op_key="vd_monitor_0" node="virt-072" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="33" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="37" interval="0" task="monitor" />
        <failure op_key="vd_monitor_0" node="virt-072" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="33" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="37" interval="0" task="monitor" />
        <failure op_key="vd_monitor_0" node="virt-063" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="37" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="44" interval="0" task="monitor" />
        <failure op_key="vd_monitor_0" node="virt-063" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="37" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="44" interval="0" task="monitor" />
        <failure op_key="vd_monitor_0" node="virt-069" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="33" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="38" interval="0" task="monitor" />
        <failure op_key="vd_monitor_0" node="virt-069" exitstatus="not installed" exitreason="Setup problem: couldn&apos;t find command: virsh" exitcode="5" call="33" status="complete" last-rc-change="Tue Dec  2 17:18:37 2014" queued="0" exec="38" interval="0" task="monitor" />
    </failures>
</crm_mon>
[root@virt-063 ~]#

Comment 4 errata-xmlrpc 2015-03-05 10:00:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0440.html