Bug 2092950 - pcs resource manage --monitor enables monitor for all resources in a group
Summary: pcs resource manage --monitor enables monitor for all resources in a group
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: pcs
Version: 9.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 9.2
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-02 15:27 UTC by Tomas Jelinek
Modified: 2023-05-16 15:54 UTC

Fixed In Version: pcs-0.11.3-5.el9
Doc Type: Bug Fix
Doc Text:
.Enabling a single resource and monitoring operation no longer enables monitoring operations for all resources in a resource group
Previously, after unmanaging all resources and monitoring operations in a resource group, managing one of the resources in that group along with its monitoring operation re-enabled the monitoring operations for all resources in the resource group. This could trigger unexpected cluster behavior. With this fix, managing a resource and re-enabling its monitoring operation re-enables the monitoring operation for that resource only and not for the other resources in a resource group.
Clone Of: 1918527
Environment:
Last Closed: 2023-05-09 07:18:23 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System                             ID                Last Updated
Red Hat Issue Tracker              CLUSTERQE-6161    2022-11-11 20:35:55 UTC
Red Hat Issue Tracker              RHELPLAN-124088   2022-06-02 15:34:08 UTC
Red Hat Knowledge Base (Solution)  5721201           2023-05-16 15:54:17 UTC
Red Hat Product Errata             RHBA-2023:2151    2023-05-09 07:18:47 UTC

Description Tomas Jelinek 2022-06-02 15:27:07 UTC
+++ This bug was initially created as a clone of Bug #1918527 +++

Description of problem:

Suppose that `pcs resource unmanage --monitor` is run on two individual resources in a group, and then `pcs resource manage --monitor` is run on one of those resources.

Both resources' monitor operations get re-enabled in this case. However, only the resource where the `manage --monitor` command was run gets re-managed.

Demo (with dummy1 and dummy2 inside dummygrp):
~~~
# pcs resource unmanage --monitor dummy1 && pcs resource unmanage --monitor dummy2

# pcs resource config dummy1 dummy2 | egrep 'Resource:|Meta|monitor'
 Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
  Meta Attrs: is-managed=false
              monitor enabled=false interval=10s timeout=20s (dummy1-monitor-interval-10s)
 Resource: dummy2 (class=ocf provider=heartbeat type=Dummy)
  Meta Attrs: is-managed=false
              monitor enabled=false interval=10s timeout=20s (dummy2-monitor-interval-10s)

# pcs resource manage --monitor dummy1

# pcs resource config dummy1 dummy2 | egrep 'Resource:|Meta|monitor'
 Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
              monitor interval=10s timeout=20s (dummy1-monitor-interval-10s)
 Resource: dummy2 (class=ocf provider=heartbeat type=Dummy)
  Meta Attrs: is-managed=false
              monitor interval=10s timeout=20s (dummy2-monitor-interval-10s)
~~~

IMO, the least surprising behavior would be to both re-manage and re-enable monitor on **only** the resource for which we ran `pcs resource manage --monitor`. Ideally the manage command should perform the inverse of the unmanage command.

However, even if we made the opposite case (re-managing and re-enabling monitor on all resources in the group), we should at least be consistent. I don't see a reason to re-enable monitor on all resources when we re-manage only one of them.

-----

The cause:

With some debugging added:
~~~
# pcs resource manage --monitor dummy1
Iterating over _find_resources_expand_tags_or_raise:
resource_el is dummy1
Iterating over to_manage_set:
resource_el is dummy1
resource_el is dummygrp
Iterating over primitives_set:
resource_el is dummy1
resource_el is dummy2
~~~

find_resources_to_manage() grabs both the primitive ID and the group ID in accordance with the comment block below its doc string.
  - (https://github.com/ClusterLabs/pcs/blob/v0.10.7/pcs/lib/cib/resource/common.py#L237-L243)

Then find_primitives() updates primitives_set with all the primitives in dummygrp (dummy1 and dummy2).

We iterate over these primitives and enable all their monitor operations.
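
To make the expansion concrete, here is a minimal, self-contained Python sketch of the flow described above. The function names mirror the pcs internals, but the data structures and bodies are simplified stand-ins, not the actual pcs code:
~~~
def find_resources_to_manage(resource_el):
    # pcs collects the resource itself plus its parents, so that
    # managing a child also re-manages any enclosing group.
    return [resource_el] + resource_el["parents"]

def find_primitives(resource_el):
    # For a group element this expands to *all* member primitives.
    if resource_el["type"] == "group":
        return resource_el["members"]
    return [resource_el]

dummy1 = {"id": "dummy1", "type": "primitive", "parents": []}
dummy2 = {"id": "dummy2", "type": "primitive", "parents": []}
dummygrp = {"id": "dummygrp", "type": "group", "members": [dummy1, dummy2]}
dummy1["parents"] = [dummygrp]
dummy2["parents"] = [dummygrp]

# `pcs resource manage --monitor dummy1` requests only dummy1 ...
to_manage_set = find_resources_to_manage(dummy1)  # [dummy1, dummygrp]
# ... but expanding every managed element back to its primitives also
# pulls in dummy2, whose monitor operation then gets re-enabled.
primitives_set = {p["id"] for el in to_manage_set for p in find_primitives(el)}
print(sorted(primitives_set))  # ['dummy1', 'dummy2']
~~~
The user asked for dummy1 only, yet walking up to the group and back down to its members sweeps dummy2 into the set whose monitor operations get enabled.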


Note: I acknowledge that specifying the behavior for pcs resource enable/manage is quite difficult and that decisions often have unintended consequences, as we also discussed in BZ 1875632. In this case, I believe we should enable monitors on a particular primitive only if either (a) the primitive was specified explicitly on the command line or (b) the primitive's group/parent ID was specified explicitly on the command line.

In my example, that would look like: "enable monitor on dummy1, but don't enable monitor for dummy2 because neither dummy2 nor dummygrp was passed as a CLI argument."

(Without looking further, I'm not sure how tags would factor into this.)
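
The proposed rule could look roughly like the following Python sketch (illustrative names and toy data, not the actual pcs API):
~~~
def monitors_to_enable(primitives, cli_ids):
    # Re-enable a primitive's monitor only if the primitive itself, or
    # one of its parents (its group), was named on the command line.
    enable = []
    for prim in primitives:
        parent_ids = {p["id"] for p in prim["parents"]}
        if prim["id"] in cli_ids or parent_ids & cli_ids:
            enable.append(prim["id"])
    return enable

dummygrp = {"id": "dummygrp"}
dummy1 = {"id": "dummy1", "parents": [dummygrp]}
dummy2 = {"id": "dummy2", "parents": [dummygrp]}

# `pcs resource manage --monitor dummy1` names only dummy1:
print(monitors_to_enable([dummy1, dummy2], {"dummy1"}))    # ['dummy1']
# `pcs resource manage --monitor dummygrp` names the whole group, so
# every member's monitor qualifies:
print(monitors_to_enable([dummy1, dummy2], {"dummygrp"}))  # ['dummy1', 'dummy2']
~~~
Note that passing the group ID itself would still re-enable monitors on all members, which is consistent with the "explicitly specified on the command line" criterion above.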

-----

Version-Release number of selected component (if applicable):

pcs-0.10.6-4.el8

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Configure two resources in a group (e.g., dummy1 and dummy2 in dummygrp).
2. Run `pcs resource unmanage --monitor` on both resources.
3. Run `pcs resource manage --monitor` on only one resource (e.g., dummy1).

-----

Actual results:

The monitor operation is re-enabled for both resources. Only the specified resource is re-managed.

-----

Expected results:

The monitor operation is re-enabled only for the specified resource. Only the specified resource is re-managed.

-----

Additional info:

As a workaround, users can omit the `--monitor` keyword and instead manipulate the monitor operation manually.

Comment 1 Tomas Jelinek 2022-06-27 13:29:20 UTC
Upstream fix + tests: https://github.com/ClusterLabs/pcs/commit/ddbd11f255702a171ddbfc84cd219adce4e0ea7b

Test:
See Steps to Reproduce, Actual results, Expected results in comment 0

Comment 2 Miroslav Lisik 2022-10-26 13:10:13 UTC
DevTestResults:

[root@r92-1 ~]# rpm -q pcs
pcs-0.11.3-5.el9.x86_64

[root@r92-1 ~]# pcs resource
  * Resource Group: G:
    * d1        (ocf:pacemaker:Dummy):   Started r92-1
    * d2        (ocf:pacemaker:Dummy):   Started r92-1
[root@r92-1 ~]# pcs resource unmanage --monitor d1
[root@r92-1 ~]# pcs resource unmanage --monitor d2
[root@r92-1 ~]# pcs resource
  * Resource Group: G:
    * d1        (ocf:pacemaker:Dummy):   Started r92-1 (unmanaged)
    * d2        (ocf:pacemaker:Dummy):   Started r92-1 (unmanaged)
[root@r92-1 ~]# pcs resource manage --monitor d1
[root@r92-1 ~]# pcs resource
  * Resource Group: G:
    * d1        (ocf:pacemaker:Dummy):   Started r92-1
    * d2        (ocf:pacemaker:Dummy):   Started r92-1 (unmanaged)

Comment 7 Michal Mazourek 2023-01-17 15:48:11 UTC
The same test as in bz1918527 comment 10 was used to verify this bz.
Marking as VERIFIED for pcs-0.11.4-4.el9.

Comment 10 errata-xmlrpc 2023-05-09 07:18:23 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2151

