Bug 2092950

Summary: pcs resource manage --monitor enables monitor for all resources in a group
Product: Red Hat Enterprise Linux 9 Reporter: Tomas Jelinek <tojeline>
Component: pcs Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact: Steven J. Levine <slevine>
Priority: medium    
Version: 9.0 CC: cluster-maint, cluster-qe, idevat, mlisik, mmazoure, mpospisi, nhostako, nwahl, omular, sbradley, slevine, tojeline
Target Milestone: rc Keywords: Triaged
Target Release: 9.2   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: pcs-0.11.3-5.el9 Doc Type: Bug Fix
Doc Text:
.Enabling a single resource and monitoring operation no longer enables monitoring operations for all resources in a resource group
Previously, after unmanaging all resources and monitoring operations in a resource group, managing one of the resources in that group along with its monitoring operation re-enabled the monitoring operations for all resources in the resource group. This could trigger unexpected cluster behavior. With this fix, managing a resource and re-enabling its monitoring operation re-enables the monitoring operation for that resource only and not for the other resources in a resource group.
Story Points: ---
Clone Of: 1918527 Environment:
Last Closed: 2023-05-09 07:18:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tomas Jelinek 2022-06-02 15:27:07 UTC
+++ This bug was initially created as a clone of Bug #1918527 +++

Description of problem:

Suppose that `pcs resource unmanage --monitor` is run on two individual resources in a group, and then `pcs resource manage --monitor` is run on one of those resources.

Both resources' monitor operations get re-enabled in this case. However, only the resource where the `manage --monitor` command was run gets re-managed.

Demo (with dummy1 and dummy2 inside dummygrp):
~~~
# pcs resource unmanage --monitor dummy1 && pcs resource unmanage --monitor dummy2

# pcs resource config dummy1 dummy2 | egrep 'Resource:|Meta|monitor'
 Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
  Meta Attrs: is-managed=false
              monitor enabled=false interval=10s timeout=20s (dummy1-monitor-interval-10s)
 Resource: dummy2 (class=ocf provider=heartbeat type=Dummy)
  Meta Attrs: is-managed=false
              monitor enabled=false interval=10s timeout=20s (dummy2-monitor-interval-10s)

# pcs resource manage --monitor dummy1

# pcs resource config dummy1 dummy2 | egrep 'Resource:|Meta|monitor'
 Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
              monitor interval=10s timeout=20s (dummy1-monitor-interval-10s)
 Resource: dummy2 (class=ocf provider=heartbeat type=Dummy)
  Meta Attrs: is-managed=false
              monitor interval=10s timeout=20s (dummy2-monitor-interval-10s)
~~~

IMO, the least surprising behavior would be to both re-manage and re-enable monitor on **only** the resource for which we ran `pcs resource manage --monitor`. Ideally the manage command should perform the inverse of the unmanage command.

However, even if we make the opposite choice (re-managing and re-enabling the monitor on all resources in the group), we should be consistent. I don't see a reason to re-enable the monitor on all resources when we re-manage only one resource.

-----

The cause:

With some debugging added:
~~~
# pcs resource manage --monitor dummy1
Iterating over _find_resources_expand_tags_or_raise:
resource_el is dummy1
Iterating over to_manage_set:
resource_el is dummy1
resource_el is dummygrp
Iterating over primitives_set:
resource_el is dummy1
resource_el is dummy2
~~~

find_resources_to_manage() grabs both the primitive ID and the group ID, in accordance with the comment block below its docstring.
  - (https://github.com/ClusterLabs/pcs/blob/v0.10.7/pcs/lib/cib/resource/common.py#L237-L243)

Then find_primitives() updates primitives_set with all the primitives in dummygrp (dummy1 and dummy2).

We iterate over these primitives and enable all their monitor operations.
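
The expansion described above can be modelled with a small, self-contained sketch. This is not pcs code: the CIB is reduced to a dictionary, and `GROUPS`, `parent_of`, `to_manage_ids` and `primitives_to_enable_monitor` are hypothetical names that only stand in for the roles of find_resources_to_manage() and find_primitives() as described in this report.

```python
# Toy model of the cluster: group ID -> primitives it contains (hypothetical data).
GROUPS = {"dummygrp": ["dummy1", "dummy2"]}


def parent_of(primitive_id):
    """Return the ID of the group containing the primitive, or None."""
    for group_id, members in GROUPS.items():
        if primitive_id in members:
            return group_id
    return None


def to_manage_ids(requested_ids):
    """Stand-in for find_resources_to_manage(): each requested ID plus its parent group."""
    result = []
    for res_id in requested_ids:
        result.append(res_id)
        parent = parent_of(res_id)
        if parent is not None:
            result.append(parent)
    return result


def primitives_to_enable_monitor(manage_ids):
    """Stand-in for find_primitives(): expand every group into all of its members."""
    primitives = set()
    for res_id in manage_ids:
        # A group ID expands to all members; a primitive ID maps to itself.
        primitives.update(GROUPS.get(res_id, [res_id]))
    return sorted(primitives)


manage_ids = to_manage_ids(["dummy1"])
print(manage_ids)                                # ['dummy1', 'dummygrp']
print(primitives_to_enable_monitor(manage_ids))  # ['dummy1', 'dummy2']
```

In this model dummy2 ends up in the monitor set only because dummygrp was added implicitly as the parent of dummy1, which matches the debug output above.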


Note: I acknowledge that specifying the behavior for pcs resource enable/manage is quite difficult and that decisions often have unintended consequences, as we also discussed in BZ 1875632. In this case, I believe we should enable monitors on a particular primitive only if either (a) the primitive was specified explicitly on the command line or (b) the primitive's group/parent ID was specified explicitly on the command line.

In my example, that would look like: "enable monitor on dummy1, but don't enable monitor for dummy2 because neither dummy2 nor dummygrp was passed as a CLI argument."
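
A minimal sketch of rules (a) and (b), assuming a toy group mapping instead of a real CIB. The names `GROUPS` and `monitors_to_enable` are hypothetical and do not exist in pcs; tags are deliberately left out.

```python
# Toy model: group ID -> primitives it contains (hypothetical data, not read from a CIB).
GROUPS = {"dummygrp": ["dummy1", "dummy2"]}


def monitors_to_enable(cli_ids):
    """Select primitives whose monitors get enabled, based only on IDs given on the CLI."""
    selected = set()
    for res_id in cli_ids:
        if res_id in GROUPS:
            # (b) a group was named explicitly: include all of its primitives
            selected.update(GROUPS[res_id])
        else:
            # (a) a primitive was named explicitly: include only that primitive
            selected.add(res_id)
    return sorted(selected)


print(monitors_to_enable(["dummy1"]))    # ['dummy1']
print(monitors_to_enable(["dummygrp"]))  # ['dummy1', 'dummy2']
```

The difference from the current behavior is that a parent group never contributes its other members unless the group ID itself was passed as an argument.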

(Without looking further, I'm not sure how tags would factor into this.)

-----

Version-Release number of selected component (if applicable):

pcs-0.10.6-4.el8

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Configure two resources in a group (e.g., dummy1 and dummy2 in dummygrp).
2. Run `pcs resource unmanage --monitor` on both resources.
3. Run `pcs resource manage --monitor` on only one resource (e.g., dummy1).

-----

Actual results:

The monitor operation is re-enabled for both resources. Only the specified resource is re-managed.

-----

Expected results:

The monitor operation is re-enabled only for the specified resource. Only the specified resource is re-managed.

-----

Additional info:

As a workaround, users can omit the `--monitor` option and instead enable or disable the monitor operation manually.

Comment 1 Tomas Jelinek 2022-06-27 13:29:20 UTC
Upstream fix + tests: https://github.com/ClusterLabs/pcs/commit/ddbd11f255702a171ddbfc84cd219adce4e0ea7b

Test:
See Steps to Reproduce, Actual results, Expected results in comment 0

Comment 2 Miroslav Lisik 2022-10-26 13:10:13 UTC
DevTestResults:

[root@r92-1 ~]# rpm -q pcs
pcs-0.11.3-5.el9.x86_64

[root@r92-1 ~]# pcs resource
  * Resource Group: G:
    * d1        (ocf:pacemaker:Dummy):   Started r92-1
    * d2        (ocf:pacemaker:Dummy):   Started r92-1
[root@r92-1 ~]# pcs resource unmanage --monitor d1
[root@r92-1 ~]# pcs resource unmanage --monitor d2
[root@r92-1 ~]# pcs resource
  * Resource Group: G:
    * d1        (ocf:pacemaker:Dummy):   Started r92-1 (unmanaged)
    * d2        (ocf:pacemaker:Dummy):   Started r92-1 (unmanaged)
[root@r92-1 ~]# pcs resource manage --monitor d1
[root@r92-1 ~]# pcs resource
  * Resource Group: G:
    * d1        (ocf:pacemaker:Dummy):   Started r92-1
    * d2        (ocf:pacemaker:Dummy):   Started r92-1 (unmanaged)

Comment 7 Michal Mazourek 2023-01-17 15:48:11 UTC
The same test as in bz1918527 comment 10 was used to verify this bz.
Marking as VERIFIED for pcs-0.11.4-4.el9.

Comment 10 errata-xmlrpc 2023-05-09 07:18:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2151