Bug 1637020

Summary:	RFE: support for value validation of resource agent's parameters before CIB modification
Product:	Red Hat Enterprise Linux 8	Reporter:	Miroslav Lisik <mlisik>
Component:	pacemaker	Assignee:	Chris Lumens <clumens>
Status:	CLOSED ERRATA	QA Contact:	cluster-qe <cluster-qe>
Severity:	low	Docs Contact:
Priority:	medium
Version:	8.0	CC:	abeekhof, cfeist, cluster-maint, cluster-qe, idevat, jpokorny, kgaillot, mnovacek, omular, phagara, tojeline
Target Milestone:	rc	Keywords:	FutureFeature
Target Release:	8.1	Flags:	pm-rhel: mirror+
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	pacemaker-2.0.2-1.el8	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:	1637012
Clones:	1637023 (view as bug list)		Environment:
Last Closed:	2019-11-05 20:57:32 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1637023, 1682116
Bug Blocks:	1637012

Description Miroslav Lisik 2018-10-08 13:16:25 UTC

pcs will need some support implemented in pacemaker utilities like crm_resource in order to implement value validation of resource agent's parameters.

+++ This bug was initially created as a clone of Bug #1637012 +++

Description of problem:

Currently pcs can validate if given parameter is supported by resource agent and if it is required but pcs doesn't check if the given value for the parameter is valid.

Each resource agent validate its parameters but this happens after resource is created in CIB. It would be useful for user if pcs could report that values for resource agent are invalid before modification of CIB.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-10-08 09:04:15 EDT ---

Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-10-08 09:04:15 EDT ---

The keyword FutureFeature has been added. If this bug is not a FutureFeature, please remove from the Summary field any strings containing "RFE, rfe, FutureFeature, FEAT, Feat, feat".

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-10-08 09:07:40 EDT ---

The keyword FutureFeature has been added. If this bug is not a FutureFeature, please remove from the Summary field any strings containing "RFE, rfe, FutureFeature, FEAT, Feat, feat".

Comment 2 Tomas Jelinek 2018-10-08 14:26:50 UTC

This is basically bz1434936 for resource agents.

Comment 3 Ken Gaillot 2018-10-08 14:46:18 UTC

It should be feasible to implement a Bug 1434936 style interface for crm_resource (i.e. allowing the user to specify a list of parameters rather than use the resource configuration from the CIB).

However it would be possible to do this with the existing crm_resource by saving the current CIB to a file and then using CIB_file to add the resource configuration and run crm_resource, without affecting the live CIB. That would actually be two levels of validation, first that the XML validates, and then the agent logic.

Comment 4 Chris Lumens 2019-03-12 14:02:54 UTC

This is fixed by upstream commit 4b2e51d1eaeb5a94869a35eb6a5d61f4f2f1b689.

Comment 7 Patrik Hagara 2019-09-02 11:28:37 UTC

package version used for testing:

> [root@virt-042 ~]# rpm -q pacemaker
> pacemaker-2.0.2-2.el8.x86_64


* option present in help:

> [root@virt-042 ~]# crm_resource --help
[...]
>      --validate		Call the validate-all action.  There are two different things that can be validated.  One
> 				method is to validate the resource given by the -r option.  The other method is to validate
> 				a not-yet-existing resource.  This method requires the --class, --agent, and --provider
> 				arguments, plus at least one --option argument.
[...]


* option present in manual page:

> [root@virt-042 ~]# man crm_resource
[...]
>    Commands:
>        --validate
>               Call  the  validate-all  action.   There  are  two  different  things  that can be validated.  One method is to validate the resource given by the -r option.  The other method is to validate a
>               not-yet-existing resource.  This method requires the --class, --agent, and --provider arguments, plus at least one --option argument.
[...]


* call w/o any args:

> [root@virt-042 ~]# crm_resource --validate
> Must supply a resource id with -r
> Error performing operation: No such device or address
> [root@virt-042 ~]# echo $?
> 105


* validate existing resource:

> [root@virt-042 ~]# pcs resource create delay ocf:heartbeat:Delay startdelay=1
> [root@virt-042 ~]# crm_resource --validate --resource delay
> Operation validate-all for delay (ocf:heartbeat:Delay) returned: 'ok' (0)
> [root@virt-042 ~]# echo $?
> 0


* validate non-existent resource:

> [root@virt-042 ~]# crm_resource --validate --resource unknown
> Resource 'unknown' not found
> Error performing operation: No such device or address
> [root@virt-042 ~]# echo $?
> 105


* validate not-yet-existing resource, invalid usage:

> [root@virt-042 ~]# crm_resource --validate --class ocf
> ocf:: is not a known resource
> Invalid option(s) supplied, use --help for valid usage
> [root@virt-042 ~]# echo $?
> 64
> [root@virt-042 ~]# crm_resource --validate --provider heartbeat
> :heartbeat: is not a known resource
> Invalid option(s) supplied, use --help for valid usage
> [root@virt-042 ~]# echo $?
> 64
> [root@virt-042 ~]# crm_resource --validate --agent Delay
> ::Delay is not a known resource
> Invalid option(s) supplied, use --help for valid usage
> [root@virt-042 ~]# echo $?
> 64

Similar errors for combinations of k=2 from {--class, --provider, --agent} set, skipped here for brevity.

Issue #1: Is there any technical reason for having these 3 separate options instead of using the pcs-esque syntax (ie. a single "<class>:<provider>:<agent>" string, passed to eg. the "--agent" option while getting rid of the others)? It really seems like a bad decision to me, UX-wise.


* validate not-yet-existing resource, no --option used:

> [root@virt-042 ~]# crm_resource --validate --class ocf --provider heartbeat --agent Delay
> Operation validate-all for test (ocf:heartbeat:Delay) returned: 'ok' (0)
> [root@virt-042 ~]# echo $?
> 0

Issue #2: Notice how the help text and manual page lie about the requirement for --option usage.


* validate not-yet-existing resource, valid RA option name/value pair:

> [root@virt-042 ~]# crm_resource --validate --class ocf --provider heartbeat --agent Delay --option startdelay=2
> Operation validate-all for test (ocf:heartbeat:Delay) returned: 'ok' (0)
> [root@virt-042 ~]# echo $?
> 0


* validate not-yet-existing resource, option name unknown to RA:

> [root@virt-042 ~]# crm_resource --validate --class ocf --provider heartbeat --agent Delay --option foodelay=2
> Operation validate-all for test (ocf:heartbeat:Delay) returned: 'ok' (0)
> [root@virt-042 ~]# echo $?
> 0

Issue #3: crm_resource reports 'ok' and successful exit code, even though an option name unknown to the resource agent is used. If administrator decides to actually create a resource with such option, an error is (correctly) raised:

> [root@virt-042 ~]# pcs resource create foodelay ocf:heartbeat:Delay foodelay=1
> Error: invalid resource option 'foodelay', allowed options are: 'mondelay', 'startdelay', 'stopdelay', 'trace_file', 'trace_ra', use --force to override

I see no reason to ever use --force in the above command, and consequently for crm_resource --validate to return success in such a situation.


* validate not-yet-existing resource, invalid RA option value:

> [root@virt-042 ~]# crm_resource --validate --class ocf --provider heartbeat --agent Delay --option startdelay=foo
> Operation validate-all for test (ocf:heartbeat:Delay) returned: 'invalid parameter' (2)
> [root@virt-042 ~]# echo $?
> 1

Comment 9 Ken Gaillot 2019-09-03 21:11:12 UTC

(In reply to Patrik Hagara from comment #7)
<snip>
> > [root@virt-042 ~]# crm_resource --help
> [...]
> >      --validate		Call the validate-all action.  There are two different things that can be validated.  One
> > 				method is to validate the resource given by the -r option.  The other method is to validate
> > 				a not-yet-existing resource.  This method requires the --class, --agent, and --provider
> > 				arguments, plus at least one --option argument.
> [...]
> 
> * option present in manual page:
> 
> > [root@virt-042 ~]# man crm_resource
> [...]
> >    Commands:
> >        --validate
> >               Call  the  validate-all  action.   There  are  two  different  things  that can be validated.  One method is to validate the resource given by the -r option.  The other method is to validate a
> >               not-yet-existing resource.  This method requires the --class, --agent, and --provider arguments, plus at least one --option argument.
> [...]

These should say something like "and may take any number of --option arguments"

<snip>

> Issue #1: Is there any technical reason for having these 3 separate options
> instead of using the pcs-esque syntax (ie. a single
> "<class>:<provider>:<agent>" string, passed to eg. the "--agent" option
> while getting rid of the others)? It really seems like a bad decision to me,
> UX-wise.

Good point -- no, but it's already been released upstream, so it has to be kept for backward compatibility. I suppose we could accept either style. In any case, it's intended more for pcs to call rather than end users, so it won't really affect anyone.
 
> * validate not-yet-existing resource, no --option used:
> 
> > [root@virt-042 ~]# crm_resource --validate --class ocf --provider heartbeat --agent Delay
> > Operation validate-all for test (ocf:heartbeat:Delay) returned: 'ok' (0)
> > [root@virt-042 ~]# echo $?
> > 0
> 
> Issue #2: Notice how the help text and manual page lie about the requirement
> for --option usage.

as mentioned above
 
> * validate not-yet-existing resource, option name unknown to RA:
> 
> > [root@virt-042 ~]# crm_resource --validate --class ocf --provider heartbeat --agent Delay --option foodelay=2
> > Operation validate-all for test (ocf:heartbeat:Delay) returned: 'ok' (0)
> > [root@virt-042 ~]# echo $?
> > 0
> 
> Issue #3: crm_resource reports 'ok' and successful exit code, even though an
> option name unknown to the resource agent is used. If administrator decides
> to actually create a resource with such option, an error is (correctly)
> raised:
> 
> > [root@virt-042 ~]# pcs resource create foodelay ocf:heartbeat:Delay foodelay=1
> > Error: invalid resource option 'foodelay', allowed options are: 'mondelay', 'startdelay', 'stopdelay', 'trace_file', 'trace_ra', use --force to override
> 
> I see no reason to ever use --force in the above command, and consequently
> for crm_resource --validate to return success in such a situation.

This is actually an artifact of how the different components interact. pcs is the one enforcing the allowed options, not pacemaker or the resource agent -- so calling crm_resource directly skips that. When pcs implements its interface to this capability, it should do the same enforcement there.

It would also be possible for resource agents to make their validate action fail if given unrecognized parameters, but at this point, no resource agent does.
 
It's probably too late to get a new build just to update the help/man text, so we can open a separate BZ for that.

Comment 10 Jan Pokorný [poki] 2019-09-04 09:26:05 UTC

> It would also be possible for resource agents to make their validate
> action fail if given unrecognized parameters, but at this point, no
> resource agent does.

It'd be rather tricky given the layers the typical shell agent is
built from, unless the same order of call chaining across the layers
would be applied also to handling validate-all.

Take OCF_RESKEY_trace_ra for instance, it's only known directly to the
common skeleton/library layer, that's not something the agent itself
would be dealing with (and exporting in metadata) per se (shell is not
particularly good for any kind of systemic/structured wrapping, for
instance, less so for structured data, since metadata would then
presumably be tracked in a raw form, not as a final XML snippet).

Also, CGI and similar environment variables passing interfaces were
always looking at positive presence of the values only, not if there's
something superfluous in there.  Ditto random options at kernel
command-line, etc.  This is also how, e.g., systemd unit files
can gain new meaningful directives that will be just ignored when
(yet) unrecognized when verging on backward compatibility.

Comment 11 Patrik Hagara 2019-09-04 09:26:42 UTC

(In reply to Ken Gaillot from comment #9)
> This is actually an artifact of how the different components interact. pcs
> is the one enforcing the allowed options, not pacemaker or the resource
> agent -- so calling crm_resource directly skips that. When pcs implements
> its interface to this capability, it should do the same enforcement there.

Sorry, I was of the impression the latest RFEs (also bz#1572116) were part of some larger process to shift as much validation as possible down the stack, from pcs to pacemaker. And since it seemed like a sensible move (architecture-wise), I didn't even bother to verify that assumption. :)

Having the validation split this way -- when the underlying component is technically able to do the whole thing -- could be a nice stepping stone if there are plans to move it over completely. Are there such plans?

> It would also be possible for resource agents to make their validate action
> fail if given unrecognized parameters, but at this point, no resource agent
> does.

For invalid option values this makes sense, as those cannot be fully validated using the RA-provided metadata XML (as it specifies only the expected type of the value).

But why not make writing RAs simpler and implement option name (and value type) checks in pacemaker?

> It's probably too late to get a new build just to update the help/man text,
> so we can open a separate BZ for that.

Filed bz#1748805 to track that.

Comment 12 Ken Gaillot 2019-09-04 17:44:25 UTC

(In reply to Patrik Hagara from comment #11)
> (In reply to Ken Gaillot from comment #9)
> > This is actually an artifact of how the different components interact. pcs
> > is the one enforcing the allowed options, not pacemaker or the resource
> > agent -- so calling crm_resource directly skips that. When pcs implements
> > its interface to this capability, it should do the same enforcement there.
> 
> Sorry, I was of the impression the latest RFEs (also bz#1572116) were part
> of some larger process to shift as much validation as possible down the
> stack, from pcs to pacemaker. And since it seemed like a sensible move
> (architecture-wise), I didn't even bother to verify that assumption. :)

Actually for validation, we do the opposite -- higher-level tools do more hand-holding, lower-level components are liberal in what they accept.

Functionality on the other hand is always intended to be in pacemaker.

Part of the reason is intentional -- at the lowest level, the CIB is just XML, and allows users a lot of flexibility in how they edit it, even if it doesn't always make sense at the time (the user may be building a configuration in stages, or preparing for expected future changes). Also, pacemaker does not assume that agents provide correct meta-data; it can proceed with most functionality even if the meta-data command fails.

Another part is architectural. Pacemaker strictly segregates major functions; the CIB manager handles editing and synchronizing the CIB, and ensuring it conforms to pacemaker's XML schema, but it doesn't have any intelligence about the contents of the CIB as anything but pure XML, and doesn't have access to agent meta-data, so the only gating for edits is the schema. Also, the scheduler and command-line tools are mostly designed to be able to work on saved CIBs identically as if they were live (for troubleshooting, or testing planned changes), in which case agent meta-data may not be available.

Comment 13 Patrik Hagara 2019-09-05 08:01:06 UTC

Ah, thank you for the explanation, Ken.

Given that issue #1 from comment#7 is pretty much an internal implementation detail, #2 has now its own bz#1748805 and #3 is an architectural decision...

Marking verified in 2.0.2-2.el8.

Sorry for the noise. :)

Comment 15 errata-xmlrpc 2019-11-05 20:57:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3385