Bug 1469801 - pcs does not allow configuring per-operation instance attributes (completing the environment for executing such an action)
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
Keywords: FutureFeature
Depends On:
Blocks: 1469809 1470223
Reported: 2017-07-11 17:07 EDT by Jan Pokorný
Modified: 2017-07-21 07:03 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1469809 1470223
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

None
Description Jan Pokorný 2017-07-11 17:07:49 EDT
Motivating example:

The ocf:heartbeat:apache agent defines, amongst others, a "statusurl"
parameter in its meta-data.  This particular parameter is quite
unusual: it only has an effect when the "monitor" operation is being
carried out.  One may want to leverage this to build an artificial
two-level monitor (akin to using OCF_CHECK_LEVEL with OCF agents
supporting that provision natively).

- define the resource _without_ the statusurl parameter
- define a first monitor operation with interval=X and without
  extra properties (operation options)
- define a second monitor operation with interval=Y (Y != X)
  and with the statusurl property (operation options)

Everything will work as expected:
- every X time units, the first monitor runs, checking only
  the root index page
- every Y time units, the second monitor runs, checking only
  the status page (as backed by mod_status)
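To illustrate why a per-op statusurl yields two distinct checks, here is a minimal shell sketch of a simplified, hypothetical agent (it is not the actual ocf:heartbeat:apache code). It relies only on the fact that pacemaker hands instance attributes to the agent as OCF_RESKEY_<name> environment variables, so only the monitor whose <op> carries statusurl sees it. The function name monitor_url and the fallback URL http://localhost/ are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical, heavily simplified monitor logic; NOT the actual
# ocf:heartbeat:apache agent. Pacemaker passes instance attributes to the
# agent as OCF_RESKEY_<name> environment variables, so statusurl is only
# set for the monitor whose <op> carries it in its <instance_attributes>.
monitor_url() {
    if [ -n "${OCF_RESKEY_statusurl:-}" ]; then
        # second monitor (interval=Y): probe the mod_status page
        printf '%s\n' "$OCF_RESKEY_statusurl"
    else
        # first monitor (interval=X): probe the root index page
        # (assumed URL, for illustration only)
        printf '%s\n' "http://localhost/"
    fi
}

# Simulate both monitor invocations, each in its own subshell.
( unset OCF_RESKEY_statusurl; monitor_url )
( OCF_RESKEY_statusurl=http://localhost/server-status; monitor_url )
```

The first call prints the root URL, the second the status URL: the same agent code behaves differently purely because of which <op> supplied the parameter.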


Achieving this arrangement was effectively prevented by the unfortunate
fix to [bug 1382597]; not even the last-resort --force switch helps.
It also means one cannot mimic the configuration from the Bundle
Walk-Through in the ClusterLabs wiki solely with the help of pcs:

https://wiki.clusterlabs.org/w/index.php?title=Bundle_Walk-Through&oldid=1614#Configure_the_cluster


So what do I propose?

- make the "monitor" operation accept any parameters supported by the
  agent (modulo clashes with the operation's very own options) and treat
  them the same way as OCF_CHECK_LEVEL (put them into <instance_attributes>
  of the <op> in question)

- ditto for other operations, but require --force, because this is asking
  for trouble (it's up to the user to keep start/stop operations logically
  in sync, otherwise the agent may go crazy)

- always check whether a resource agent parameter specified at the
  operation level accidentally overrides:
  - the default value (i.e., the one specified in the agent's meta-data)
  - the same parameter properly set at the resource level
  and in either case, emit a respective message and refuse unless --force
  is given (note that "statusurl" for ocf:heartbeat:apache has no default)

- when allowing resource agent parameters to be passed on, ensure
  the same validation takes place as when they are specified directly
  at the resource level
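The proposal above can be sketched as a small shell classifier. This is a hypothetical illustration, not pcs code: the names classify_op_option and contains, the argument order, and the "yes" convention for --force are all assumptions. The native option list is taken from the pcs error message quoted in comment 2, with OCF_CHECK_LEVEL routed to instance attributes. Refusing names that are neither op options nor agent parameters is also an assumption, since the proposal does not say.

```shell
#!/bin/sh
# Hypothetical sketch of the validation proposed above; not pcs code.
# classify_op_option OP OPTION AGENT_PARAMS SHADOWED FORCE
#   AGENT_PARAMS: space-separated parameter names from the agent's meta-data
#   SHADOWED:     parameters that have a meta-data default or are already
#                 set at the resource level
#   FORCE:        "yes" if --force was given
# Prints "op" (native <op> attribute), "instance" (goes into the op's
# <instance_attributes>) or an error message, and returns 1 on refusal.

# Native operation options, as listed in the pcs error message in comment 2
# (OCF_CHECK_LEVEL is handled separately, as it is an instance attribute).
OP_OPTIONS="description enabled id interval interval-origin name on-fail
record-pending requires role start-delay timeout"

contains() { # contains WORD LIST
    for item in $2; do
        [ "$item" = "$1" ] && return 0
    done
    return 1
}

classify_op_option() {
    op=$1 opt=$2 params=$3 shadowed=$4 force=$5
    if [ "$opt" = OCF_CHECK_LEVEL ]; then
        echo instance
    elif contains "$opt" "$OP_OPTIONS"; then
        echo op
    elif ! contains "$opt" "$params"; then
        echo "error: '$opt' is neither an op option nor an agent parameter"
        return 1
    elif [ "$op" != monitor ] && [ "$force" != yes ]; then
        echo "error: agent parameter '$opt' on '$op' requires --force"
        return 1
    elif contains "$opt" "$shadowed" && [ "$force" != yes ]; then
        echo "error: '$opt' would override a default or resource-level value"
        return 1
    else
        echo instance
    fi
}
```

For example, `classify_op_option monitor statusurl "statusurl port" "" no` yields `instance`, while the same parameter on a `start` operation is refused unless FORCE is `yes`.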
Comment 1 Jan Pokorný 2017-07-11 17:22:06 EDT
Note that this concerns not merely "pcs resource create", but also
the "pcs resource op" family of commands.
Comment 2 Ivan Devat 2017-07-12 03:07:36 EDT
If I understand correctly (taking into account bug 1382597 comment 8), the proposal is:

[vm-rhel72-1 ~] $ pcs resource create webserver ocf:heartbeat:apache op monitor interval=10 op monitor interval=20 statusurl=http://localhost/server-status
[vm-rhel72-1 ~] $ pcs cluster cib|grep 'primitive.*id="webserver"' -A 11
      <primitive class="ocf" id="webserver" provider="heartbeat" type="apache">
        <operations>
          <op id="webserver-monitor-interval-10" interval="10" name="monitor"/>
          <op id="webserver-monitor-interval-20" interval="20" name="monitor">
            <instance_attributes id="webserver-monitor-interval-20-instance_attributes">
              <nvpair id="webserver-monitor-interval-20-instance_attributes-statusurl" name="statusurl" value="http://localhost/server-status"/>
            </instance_attributes>
          </op>
          <op id="webserver-start-interval-0s" interval="0s" name="start" timeout="40s"/>
          <op id="webserver-stop-interval-0s" interval="0s" name="stop" timeout="60s"/>
        </operations>
      </primitive>
      
The 'statusurl' option is accepted because it is among the parameters of the resource agent. When an option is not among the parameters, you propose using the --force flag. Something like this:

[vm-rhel72-1 ~] $ pcs resource create webserver ocf:heartbeat:apache op monitor interval=10 op monitor interval=20 a=b
Error: invalid resource operation option 'a', allowed options are: OCF_CHECK_LEVEL, description, enabled, id, interval, interval-origin, name, on-fail, record-pending, requires, role, start-delay, timeout. It could be used as an instance attribute of operation, use --force to override.
[vm-rhel72-1 ~] $ pcs resource create webserver ocf:heartbeat:apache op monitor interval=10 op monitor interval=20 a=b --force
[vm-rhel72-1 ~] $ pcs cluster cib|grep 'primitive.*id="webserver"' -A 11
      <primitive class="ocf" id="webserver" provider="heartbeat" type="apache">
        <operations>
          <op id="webserver-monitor-interval-10" interval="10" name="monitor"/>
          <op id="webserver-monitor-interval-20" interval="20" name="monitor">
            <instance_attributes id="webserver-monitor-interval-20-instance_attributes">
              <nvpair id="webserver-monitor-interval-20-instance_attributes-a" name="a" value="b"/>
            </instance_attributes>
          </op>
          <op id="webserver-start-interval-0s" interval="0s" name="start" timeout="40s"/>
          <op id="webserver-stop-interval-0s" interval="0s" name="stop" timeout="60s"/>
        </operations>
      </primitive>
      
But how could this be achieved?

<op id="webserver-monitor-interval-20" interval="20" name="monitor">
  <instance_attributes id="webserver-monitor-interval-20-instance_attributes">
    <nvpair id="webserver-monitor-interval-20-instance_attributes-interval" name="role" value="some-specific-value"/>
  </instance_attributes>
</op>

If you try

[vm-rhel72-1 ~] $ pcs resource create webserver3 ocf:heartbeat:apache op monitor interval=10 op monitor interval=20 role=master

you get

<op id="webserver3-monitor-interval-20" interval="20" name="monitor" role="Master"/>

OK, it is artificial. But I guess it would be better to separate operation attributes from instance attributes explicitly. Something like this:

[vm-rhel72-1 ~] $ pcs resource create webserver ocf:heartbeat:apache op monitor interval=10 op monitor interval=20 op-params statusurl=http://localhost/server-status
Comment 3 Jan Pokorný 2017-07-12 07:24:59 EDT
re [comment 2]:

> But how could be achieved this?
> 
> <op id="webserver-monitor-interval-20" interval="20" name="monitor">
>   <instance_attributes id="webserver-monitor-interval-20-instance_attributes">
>     <nvpair id="webserver-monitor-interval-20-instance_attributes-interval" name="interval" value="some-specific-value"/>
>   </instance_attributes>
> </op>

^ changed "role" instance attribute into "interval", which was likely
  intended per the attribute id


First, let me elaborate on how pacemaker itself handles cases like this:

1. clash between the direct op and instance_attributes sources
   - OCF_RESKEY_CRM_meta_X carries the op-sourced counterpart, OCF_RESKEY_X
     carries the instance_attributes-sourced one (agents are expected to
     respond directly only to the OCF_RESKEY_X encoding of parameters)
   - for the purpose of triggering the action, pacemaker only follows
     the op-sourced value (note: "interval" at the op level is mandatory
     to conform to the schema)

2. instance_attributes as the only source
   - both OCF_RESKEY{_CRM_meta,}_X carry the instance_attributes-sourced
     value
   - for the purpose of triggering the action, pacemaker does _not_
     follow this instance_attributes-sourced value

3. op as the only source
   - at least in some cases (start-delay), not even OCF_RESKEY_CRM_meta_X
     carries the op-sourced value
   - apparently, for the purpose of triggering the action, pacemaker does
     follow this op-sourced value
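The environment-variable side of cases 1 and 2 can be condensed into a small shell model. This is a hypothetical reading of the list above, not pacemaker source code: the function name agent_env and the "" convention for "not set" are assumptions, and case 3 is deliberately not modeled because it is reported not to behave uniformly. It also says nothing about action triggering, which comment 7 later revisits for point 2.

```shell
#!/bin/sh
# Hypothetical model of cases 1 and 2 above, i.e., which value reaches the
# agent's environment for a name X; a reading of this comment, not
# pacemaker source code. Case 3 (op-only source) is deliberately left out,
# since it is reported not to behave uniformly (e.g. start-delay).
# agent_env NAME OP_VALUE IA_VALUE   ("" means "not set")
agent_env() {
    name=$1 op_value=$2 ia_value=$3
    # without an instance_attributes value we are in case 3: not modeled
    [ -n "$ia_value" ] || return 0
    if [ -n "$op_value" ]; then
        meta=$op_value    # case 1: the clash; meta carries the op-level value
    else
        meta=$ia_value    # case 2: instance_attributes is the only source
    fi
    printf 'OCF_RESKEY_CRM_meta_%s=%s\n' "$name" "$meta"
    printf 'OCF_RESKEY_%s=%s\n' "$name" "$ia_value"
}

agent_env X from-op from-ia
agent_env X "" from-ia
```

In this model the agent-facing OCF_RESKEY_X always reflects instance_attributes, and only the CRM_meta variant differs between the two cases.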


In other words, pacemaker does _not_ turn the hypothetical conflict into
anything real.


On the other hand, there is indeed a real clash when a resource agent
parameter shadows a native op-level one, in:

A. distinguishing the intended destination when entering such
   parameter
   - pcs resource create (as mentioned at the bottom of [comment 2])
   - "pcs resource op" family of commands

B. distinguishing the source when listing operations
   - pcs resource --full


As no clash is expected most of the time, I'd rather see a special
prefix (a colon?) devised for explicit confirmation that the
instance_attributes level is intended.

Hence, the alternative modification of the command from [comment 2]
could be:

$ pcs resource create webserver3 ocf:heartbeat:apache \
  op monitor interval=10 \
  op monitor interval=20 :role=master
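The routing rule behind this proposed prefix can be sketched in shell. The sketch is hypothetical: pcs has no such syntax, and the function name split_op_options and its "op"/"ia" output format are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed ":" prefix; pcs has no such syntax.
# Splits the name=value tokens of one "op" clause: a leading colon sends
# the pair to the op's <instance_attributes>, anything else stays a native
# operation option. Prints one "op NAME=VALUE" or "ia NAME=VALUE" per token.
split_op_options() {
    for token in "$@"; do
        case $token in
            :?*=*)    printf 'ia %s\n' "${token#:}" ;;
            [!:]*=*)  printf 'op %s\n' "$token" ;;
            *)        printf 'error: not a name=value pair: %s\n' "$token" >&2
                      return 1 ;;
        esac
    done
}

# The second monitor clause from the command above:
split_op_options interval=20 :role=master
```

The demo prints `op interval=20` followed by `ia role=master`, so "role" lands in <instance_attributes> only because the user asked for it explicitly.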
Comment 4 Jan Pokorný 2017-07-12 07:29:57 EDT
... "pcs resource --full" could then be explicit like that, too:

> Resource: webserver3 (class=ocf provider=heartbeat type=apache)
>  Operations: monitor interval=10s
>              monitor interval=20s :role=master
                                    ^
Comment 5 Jan Pokorný 2017-07-12 08:18:34 EDT
re [comment 0]:

> - ditto for other operations, but require --force

As an exception, allow specifying the trace_{ra,file} parameters
(see [bug 1402374]) as instance_attributes regardless; it's
actually quite helpful to focus debugging/tracing on just
a particular operation.
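Combined with the rule from comment 0 (agent parameters on non-monitor operations require --force), this exception can be sketched as a tiny predicate. The function name needs_force is hypothetical and not part of pcs.

```shell
#!/bin/sh
# Hypothetical sketch combining the rule from comment 0 with the exception
# above; not pcs code. Succeeds (returns 0) when placing agent parameter
# PARAM into the <instance_attributes> of operation OP would need --force.
# needs_force OP PARAM
needs_force() {
    case $2 in
        trace_ra|trace_file) return 1 ;;   # tracing: allowed on any op
    esac
    [ "$1" != monitor ]                    # monitor: allowed without --force
}
```

So `needs_force stop trace_ra` fails (no --force needed), whereas `needs_force start statusurl` succeeds (--force required).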
Comment 6 Jan Pokorný 2017-07-12 10:25:47 EDT
Another note:

There's likely a bug in pacemaker leading to different treatment of
meta_attributes per-op vs. op_defaults:

- per-op:
  this only leads to carrying OCF_RESKEY_CRM_meta_X to the agent
  being executed

- op_defaults:
  ditto, _but_ also actually takes those values into account for
  action processing internally

Quick test using "pcs resource restart httpd":

<resources>
  <primitive class="ocf" id="httpd" provider="heartbeat" type="apache">
    <operations>
      <op id="httpd-monitor" name="monitor" interval="30s">
        <meta_attributes id="httpd-monitor-meta">
          <nvpair id="httpd-monitor-meta-start-delay" name="start-delay" value="41s"/>
        </meta_attributes>
      </op>
    </operations>
    <meta_attributes id="httpd-meta_attributes"/>
  </primitive>
</resources>

-->

> httpd successfully restarted

vs.

<op_defaults>
  <meta_attributes id="op_defaults">
    <nvpair id="op_defaults-start-delay" name="start-delay" value="41s"/>
  </meta_attributes>
</op_defaults>
<resources>
  <primitive class="ocf" id="httpd" provider="heartbeat" type="apache">
    <operations>
      <op id="httpd-monitor" name="monitor" interval="30s"/>
    </operations>
    <meta_attributes id="httpd-meta_attributes"/>
  </primitive>
</resources>

-->

> Error: Could not complete shutdown of httpd, 1 resources remaining
> Error performing operation: Timer expired
> 
> Set 'httpd' option: id=httpd-meta_attributes-target-role set=httpd-meta_attributes name=target-role=stopped
> Waiting for 1 resources to stop:
>  * httpd
>  * httpd
> Deleted 'httpd' option: id=httpd-meta_attributes-target-role name=target-role


Hence, I don't see the purpose of per-op meta_attributes.
Comment 7 Jan Pokorný 2017-07-12 10:48:18 EDT
Disregard [comment 6] (see [bug 1470223 comment 1]) and point 2.
from [comment 3]: in fact pacemaker indeed takes instance_attributes
into account for the purpose of triggering the action.

But that's unfortunate, as it allows for ambiguities (is a given parameter
meant for pacemaker's inner workings, is it to be passed to the agent,
or both? point 2. is actually this very case).

Will reopen [bug 1470223] to at least avoid that likely undesired
ambiguity.
