Bug 1955792

Summary: RFE: crm_resource option to pass OCF_CHECK_LEVEL to OCF resource agents
Product: Red Hat Enterprise Linux 8
Reporter: Ken Gaillot <kgaillot>
Component: pacemaker
Assignee: Chris Lumens <clumens>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: low
Priority: high
Version: 8.0
CC: cluster-maint, kgaillot, msmazova
Target Milestone: beta
Keywords: FutureFeature, Triaged
Target Release: 8.5
Hardware: All
OS: All
Fixed In Version: pacemaker-2.1.0-1.el8
Doc Type: No Doc Update
Doc Text: This is more useful to pcs than to end users.
Last Closed: 2021-11-09 18:44:49 UTC
Type: Enhancement
Bug Blocks: 1816852, 2112270

Description Ken Gaillot 2021-04-30 19:15:51 UTC
Description of problem: Pacemaker's crm_resource command-line tool has the --force-check option to directly run an OCF resource agent's monitor action, and --validate to directly run an agent's validate-all action. The recently adopted OCF Resource Agent API 1.1 standard allows those agent actions to behave differently depending on the value of the OCF_CHECK_LEVEL environment variable. A user could explicitly set that variable before calling crm_resource, but it would be more convenient to have a tool option for it.
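
The workaround mentioned above relies on ordinary environment inheritance: a variable set for a command, or exported, is visible to the processes that command starts. The sketch below shows only that shell mechanism. `fake_agent` is a hypothetical stand-in that reports the level it sees; it is not crm_resource or a real OCF agent.

```shell
#!/bin/sh
# Hypothetical stand-in for an agent action: it only reports the check level
# it receives from its environment.
fake_agent='echo "monitor at OCF_CHECK_LEVEL=${OCF_CHECK_LEVEL:-unset}"'

# Start the demo from a clean environment.
unset OCF_CHECK_LEVEL

# Variable not set: the agent sees nothing and would use its default.
sh -c "$fake_agent"

# Variable set for one command only: that child process inherits it.
OCF_CHECK_LEVEL=20 sh -c "$fake_agent"

# Variable exported: later child processes inherit it until it is unset.
export OCF_CHECK_LEVEL=10
sh -c "$fake_agent"
unset OCF_CHECK_LEVEL
```

By the same mechanism, per the Description, a command such as `OCF_CHECK_LEVEL=20 crm_resource --resource <name> --force-check` passes the level through without any new option; the request is simply for a more convenient spelling.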

Comment 1 Ken Gaillot 2021-04-30 19:21:52 UTC
I'm thinking this could be either a new option (e.g. --check-level <N>), or a change to the --validate and --force-check options so that they take the check level as an optional argument (e.g. --validate to leave OCF_CHECK_LEVEL unset, and --validate 10 to set OCF_CHECK_LEVEL to 10).
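
The optional-argument proposal is the form that the man page output in Comment 11 documents (`--force-check=LEVEL`, `--validate=LEVEL`). The sketch below illustrates its intended semantics with a hypothetical front end, `check_front_end`: the bare option leaves OCF_CHECK_LEVEL unset, while `=LEVEL` sets it for the agent. It does not call crm_resource, and the integer check on LEVEL is an assumption of the sketch, not documented crm_resource behavior.

```shell
#!/bin/sh
# Hypothetical front end illustrating the optional-argument semantics:
#   --force-check        -> run the check with OCF_CHECK_LEVEL left unset
#   --force-check=LEVEL  -> run the check with OCF_CHECK_LEVEL=LEVEL
# A stand-in agent echoes the level it sees; crm_resource is not involved.

run_agent() {
    sh -c 'echo "level=${OCF_CHECK_LEVEL:-unset}"'
}

check_front_end() {
    case "$1" in
        --force-check)
            # No argument given: make sure the agent sees no level at all.
            (unset OCF_CHECK_LEVEL; run_agent)
            ;;
        --force-check=*)
            level=${1#--force-check=}
            # Sketch-only sanity check: accept non-negative integers only.
            case "$level" in
                ''|*[!0-9]*)
                    echo "invalid check level: '$level'" >&2
                    return 2
                    ;;
            esac
            # Export the level in a subshell so the caller is unaffected.
            (OCF_CHECK_LEVEL=$level; export OCF_CHECK_LEVEL; run_agent)
            ;;
        *)
            echo "usage: check_front_end --force-check[=LEVEL]" >&2
            return 2
            ;;
    esac
}

check_front_end --force-check       # prints: level=unset
check_front_end --force-check=10    # prints: level=10
```

Running the agent in a subshell keeps the chosen level scoped to a single check, which matches the idea that omitting LEVEL should behave exactly as if the variable had never been set.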

Comment 4 Ken Gaillot 2021-05-20 20:41:03 UTC
Feature added upstream by commit 3905e7ea

Comment 11 Markéta Smazová 2021-06-24 15:14:01 UTC
>   [root@virt-539 ~]# rpm -q pacemaker
>   pacemaker-2.1.0-2.el8.x86_64

Check man/help:

>   [root@virt-539 ~]# man crm_resource | grep validate=LEVEL -A4
>          --validate=LEVEL
>                 Validate resource configuration by calling agent's validate-all action. The configuration may be speci‐
>                 fied  either  by  giving  an  existing  resource  name  with -r, or by specifying --class, --agent, and
>                 --provider arguments, along with any number of --option arguments. An optional LEVEL  argument  can  be
>                 given to control the level of checking performed.

>   [root@virt-539 ~]# man crm_resource | grep force-check=LEVEL -A2
>          --force-check=LEVEL
>                 (Advanced)  Bypass  the  cluster and check the state of a resource on the local node. An optional LEVEL
>                 argument can be given to control the level of checking performed.

>   [root@virt-539 ~]# crm_resource --help-all | grep "validate=LEVEL" -A5
>     --validate=LEVEL                  Validate resource configuration by calling agent's validate-all
>                                       action. The configuration may be specified either by giving an
>                                       existing resource name with -r, or by specifying --class,
>                                       --agent, and --provider arguments, along with any number of
>                                       --option arguments. An optional LEVEL argument can be given
>                                       to control the level of checking performed.

>   [root@virt-539 ~]# crm_resource --help-all | grep "force-check=LEVEL" -A2
>     --force-check=LEVEL               (Advanced) Bypass the cluster and check the state of a resource on
>                                       the local node. An optional LEVEL argument can be given
>                                       to control the level of checking performed.

Have a cluster with a resource:

>   [root@virt-539 ~]# pcs status
>   Cluster name: STSRHTS20356
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-539 (version 2.1.0-2.el8-7c3f660707) - partition with quorum
>     * Last updated: Wed Jun 23 10:51:40 2021
>     * Last change:  Wed Jun 23 10:51:27 2021 by root via cibadmin on virt-539
>     * 2 nodes configured
>     * 4 resource instances configured

>   Node List:
>     * Online: [ virt-539 virt-548 ]

>   Full List of Resources:
>     * fence-virt-539	(stonith:fence_xvm):	 Started virt-539
>     * fence-virt-548	(stonith:fence_xvm):	 Started virt-548
>     * dummy_1	(ocf::pacemaker:Dummy):	 Started virt-539

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled

Try to validate resource configuration:

>   [root@virt-539 ~]# crm_resource --class=ocf --provider=pacemaker --agent=Dummy --validate
>   Operation validate-all for test (ocf:pacemaker:Dummy) returned: 'ok' (0)

>   [root@virt-539 ~]# crm_resource --resource dummy_1 --validate
>   Operation validate-all for dummy_1 (ocf:pacemaker:Dummy) returned: 'ok' (0)

>   [root@virt-539 ~]# crm_resource --resource dummy_1 --validate=10
>   Operation validate-all for dummy_1 (ocf:pacemaker:Dummy) returned: 'ok' (0)
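
Each result line reports both a status name and, in parentheses, a numeric code. When scripting runs like these, that number can be pulled out with a small helper. `parse_result` is hypothetical, and the `Operation ... returned|failed: '<status>' (<code>)` layout it matches is taken from this transcript only; it is treated here as an assumption, not as a documented, stable interface.

```shell
#!/bin/sh
# Hypothetical helper: print the number in parentheses at the end of a
# crm_resource result line such as
#   Operation validate-all for dummy_1 (ocf:pacemaker:Dummy) returned: 'ok' (0)
# Lines that do not match (e.g. "crm_resource: Error ...") print nothing.
# The line format is assumed from this bug's transcript.
parse_result() {
    printf '%s\n' "$1" |
        sed -n "s/^Operation .*: '[^']*' (\([0-9][0-9]*\))\$/\1/p"
}

parse_result "Operation validate-all for dummy_1 (ocf:pacemaker:Dummy) returned: 'ok' (0)"
parse_result "Operation monitor for dummy_1 (ocf:pacemaker:Dummy) failed: 'Timed Out' (2)"
```

The pattern anchors on the last `: '` in the line, so resource names and agent specifications containing colons do not confuse it.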

Try to force-check the state of a resource for various values of the `OCF_CHECK_LEVEL` environment variable:

>   [root@virt-539 ~]# crm_resource --resource dummy_1 --force-check
>   Operation monitor for dummy_1 (ocf:pacemaker:Dummy) returned: 'ok' (0)

>   [root@virt-539 ~]# crm_resource --resource dummy_1 --force-check=10
>   Operation monitor for dummy_1 (ocf:pacemaker:Dummy) failed: 'Timed Out' (2)
>   crm_resource: Error performing operation: Error occurred

>   [root@virt-539 ~]# crm_resource --resource dummy_1 --force-check=20
>   Operation monitor for dummy_1 (ocf:pacemaker:Dummy) returned: 'error' (1)
>   ocf-exit-reason:smoke detected near CPU fan
>   crm_resource: Error performing operation: Error occurred

>   [root@virt-539 ~]# crm_resource --resource dummy_1 --force-check=30
>   Operation monitor for dummy_1 (ocf:pacemaker:Dummy) returned: 'error' (1)
>   ocf-exit-reason:hyperdrive quota reached
>   crm_resource: Error performing operation: Error occurred

>   [root@virt-539 ~]# echo "42" > /run/Dummy-dummy_1.state; crm_resource --resource dummy_1 --force-check=40
>   Operation monitor for dummy_1 (ocf:pacemaker:Dummy) returned: 'unknown' (42)
>   ocf-exit-reason:CPU ejected. Observed leaving the Kronosnet galaxy at 42 times the speed of light.
>   crm_resource: Error performing operation: Unknown exit status

>   [root@virt-539 ~]# pcs resource disable dummy_1
>   [root@virt-539 ~]# pcs resource
>     * dummy_1	(ocf::pacemaker:Dummy):	 Stopped (disabled)

>   [root@virt-539 ~]# crm_resource --resource dummy_1 --force-check
>   Operation monitor for dummy_1 (ocf:pacemaker:Dummy) returned: 'not running' (7)

The exit reasons and statuses returned by the ocf:pacemaker:Dummy agent for the various check levels match those specified in: https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/Dummy#L215-L245
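
To show how a monitor action can branch on OCF_CHECK_LEVEL, here is a simplified, hypothetical `sketch_monitor` that reproduces the outcomes recorded in this transcript. It is modeled on the observed results, not copied from the Dummy agent, whose real logic (at the link above) may differ in detail. `exit_reason` stands in for the `ocf_exit_reason` helper that agents normally get from ocf-shellfuncs, and `SKETCH_SLOW_SECONDS` is an invented knob for the deliberately slow level.

```shell
#!/bin/sh
# Simplified, hypothetical monitor action that branches on OCF_CHECK_LEVEL.
# Modeled on the outcomes recorded in this bug, not on the Dummy source.

# Standard OCF return codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Stand-in for ocf_exit_reason: print the reason on stderr with the
# "ocf-exit-reason:" prefix seen in the transcript.
exit_reason() {
    printf 'ocf-exit-reason:%s\n' "$1" >&2
}

# $1: path of the state file whose presence marks the resource as running.
sketch_monitor() {
    state_file=$1

    # No state file: the resource is cleanly stopped.
    [ -f "$state_file" ] || return $OCF_NOT_RUNNING

    case "${OCF_CHECK_LEVEL:-0}" in
        10)
            # Deliberately slow check, long enough to exceed a short timeout.
            sleep "${SKETCH_SLOW_SECONDS:-30}"
            ;;
        20)
            exit_reason "smoke detected near CPU fan"
            return $OCF_ERR_GENERIC
            ;;
        30)
            exit_reason "hyperdrive quota reached"
            return $OCF_ERR_GENERIC
            ;;
        40)
            # Return whatever code is stored in the state file, if any.
            rc=$(cat "$state_file")
            if [ -n "$rc" ]; then
                exit_reason "CPU ejected. Observed leaving the Kronosnet galaxy at $rc times the speed of light."
                return "$rc"
            fi
            ;;
    esac
    return $OCF_SUCCESS
}

# Usage: each check runs in a subshell so the level does not leak.
unset OCF_CHECK_LEVEL
state=$(mktemp)
rc=0; ( OCF_CHECK_LEVEL=30; sketch_monitor "$state" ) || rc=$?
echo "level 30: rc=$rc"      # 1
echo 42 > "$state"
rc=0; ( OCF_CHECK_LEVEL=40; sketch_monitor "$state" ) || rc=$?
echo "level 40: rc=$rc"      # 42
rm -f "$state"
rc=0; ( sketch_monitor "$state" ) || rc=$?
echo "stopped: rc=$rc"       # 7
```

Keeping the level-specific branches after the basic "is it running?" test means every level still distinguishes a cleanly stopped resource (7) from a failed one, as seen when dummy_1 was disabled.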


Marking verified in pacemaker-2.1.0-2.el8.

Comment 13 errata-xmlrpc 2021-11-09 18:44:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:4267