Bug 1298585 - [RFE] pcs status output could be simpler when constraints are in place
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.3
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ivan Devat
QA Contact: cluster-qe@redhat.com
Keywords: FutureFeature
Depends On: 1361533
Blocks:
Reported: 2016-01-14 08:43 EST by Michele Baldessari
Modified: 2017-01-25 03:14 EST (History)
CC List: 9 users

See Also:
Fixed In Version: pcs-0.9.152-7.el7
Doc Type: Enhancement
Doc Text:
Feature: allow hiding inactive resources in pcs status. Reason: ability to show shorter, more readable output. Result: pcs is able to hide inactive resources in pcs status output, so the output is shorter and more readable.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 16:56:09 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
proposed fix 2 (9.43 KB, patch)
2016-08-08 07:39 EDT, Tomas Jelinek

Description Michele Baldessari 2016-01-14 08:43:23 EST
(I realize this one might not be trivial, but it is a bit of a pain point
in bigger openstack installations so here we go ;)

In an OSP installation with Instance HA the compute/hypervisor nodes use
pacemaker_remoted to manage services.

There are two types of nodes in this context: controllers and compute nodes.
Each node has an "osprole" attribute set to "compute" or "controller" depending on its role.

Now certain services will have a constraint to be run only on compute nodes and
other services will have a constraint to make them run only on controllers.

What would be nice is if pcs status did not show the stopped state for services
on nodes where they may *never* run due to the aforementioned constraints.

Here is an example:
Current DC: overcloud-controller-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
5 nodes and 207 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
RemoteOnline: [ overcloud-novacompute-0 overcloud-novacompute-1 ]

Full list of resources:

 ip-192.0.2.14  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 ip-192.0.2.15  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]

....

Now we already know from the constraints that redis-master may never run
on any compute node, so it is a bit confusing to see all the compute nodes
listed there in Stopped state. Moreover, the number of compute nodes might
scale to very large counts (hundreds), which would make the output almost
unreadable.
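The clutter described above can be reproduced in miniature. As a minimal sketch of the kind of post-filtering an admin could do before a dedicated option exists (the file path and saved sample are hypothetical, standing in for real `pcs status` output):

```shell
# Hypothetical sample of the clone-set output above, saved to a file for
# illustration (in practice this would come from `pcs status`).
cat > /tmp/status.txt <<'EOF'
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
EOF

# Crude workaround: drop the "Stopped:" lines entirely.
grep -v 'Stopped:' /tmp/status.txt
```

Note that this hides every "Stopped:" line indiscriminately, with no way to distinguish "stopped by constraint" from "stopped by failure" — which is exactly the distinction discussed in the comments on this report.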
Comment 2 Ken Gaillot 2016-01-19 11:41:12 EST
The requested behavior is the default in upstream pacemaker since 1.1.13; crm_mon should print Stopped clones only if --inactive was specified. Is pcs using --inactive when calling crm_mon?
Comment 3 Tomas Jelinek 2016-01-20 04:59:21 EST
Ken,

You are right, pcs calls crm_mon with -r option specified. I tested crm_mon from pacemaker-1.1.13-10.el7.x86_64 and it works as requested in comment 0.

We will handle this in pcs then.

Thanks.
Comment 4 Ivan Devat 2016-02-02 05:05:12 EST
Michele,

could you run "crm_mon --one-shot" on your cluster and take a look at its output? If you are satisfied with it, we can add an option to pcs to show resource status like this. Note, however, that in this output all stopped resources are omitted, not only those with a constraint to run only on specific nodes.

Thanks
Comment 5 Michele Baldessari 2016-02-03 10:13:15 EST
Hi Ivan,

I think it is definitely an improvement. What is needed is that "Stopped"
entries are not printed when a resource is expected not to run on a node.

So if there is a constraint preventing a resource from running on a node, pcs
should either not print "Stopped" at all, or it should somehow indicate that the
resource is stopped on that node because of constraint rules and not because it
failed there.

We can split this BZ in two if you want:
1) We keep this one to add crm_mon --one-shot to pcs
2) We track via another BZ if we can add some logic to differentiate the display of stopped due to failure vs stopped due to constraints

Does that make sense?
Comment 6 Ken Gaillot 2016-02-03 10:34:12 EST
The output as proposed here would follow your first suggestion, i.e. stopped resources would not be printed at all.

I suspect the output would get too cluttered with the other suggestion, trying to show a reason for every stopped resource. I think that level of detail would be better using a GUI or HTML, where the user could click on a resource for more detail; and/or a separate command-line option, for example "pcs resource why".
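As a stopgap for the "why is this stopped here" question, the existing constraint listing already carries the needed information. A minimal sketch, reusing the resource and node names from the description and operating on a saved sample (the file path and the exact listing format shown are assumptions for illustration, not verbatim pcs output):

```shell
# Assumed sample of a location-constraint listing, saved for illustration.
cat > /tmp/constraints.txt <<'EOF'
Location Constraints:
  Resource: redis-master
    Disabled on: overcloud-novacompute-0 (score:-INFINITY)
    Disabled on: overcloud-novacompute-1 (score:-INFINITY)
EOF

# Show the location rules that explain why redis-master is "Stopped"
# on the compute nodes (-A2 prints the two lines after the match):
grep -A2 'Resource: redis-master' /tmp/constraints.txt
```

This is manual correlation, not the automated "pcs resource why" idea above, but it shows the data is already available to an admin at the CLI.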
Comment 7 Michele Baldessari 2016-02-03 10:55:28 EST
(In reply to Ken Gaillot from comment #6)
> The output as proposed here would follow your first suggestion, i.e. stopped
> resources would not be printed at all.
> 
> I suspect the output would get too cluttered with the other suggestion,
> trying to show a reason for every stopped resource. I think that level of
> detail would be better using a GUI or HTML, where the user could click on a
> resource for more detail; and/or a separate command-line option, for example
> "pcs resource why".

I see, that makes sense. How about we aim for something like:
"Show me stopped only in case of failure" and leave anything else out (e.g. stopped due to constraints)?

I think that would be the most useful output for a sysadmin managing a somewhat
complex cluster via the CLI.
Comment 8 Ken Gaillot 2016-02-03 17:14:06 EST
Any failed actions are already listed in their own section. However, there's not necessarily a direct connection between a failure and the resource being stopped; a resource might fail but then be recovered successfully, or it might be the stop action itself that failed.

I think the "crm_mon --one-shot" output should be sufficient.
Comment 9 Ivan Devat 2016-02-05 09:56:55 EST
proposed fix:
https://github.com/feist/pcs/commit/c4e49ba368cf636337a276f814aac242987c4222

Setup:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs resource create resource-dummy Dummy
[vm-rhel72-1 ~pcs/pcs] $ ./pcs constraint location resource-dummy avoids vm-rhel72-2=INFINITY
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status | grep Stopped:
     Stopped: [ vm-rhel72-2 ]

Test:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status --hide-inactive | grep Stopped:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status --hide-inactive --full
Error: you cannot specify both --hide-inactive and --full

Cleanup:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs resource delete resource-dummy
Attempting to stop: resource-dummy...Stopped
Comment 10 Mike McCune 2016-03-28 18:42:18 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune@redhat.com with any questions.
Comment 11 Ivan Devat 2016-05-31 08:07:19 EDT
Setup:
[vm-rhel72-1 ~] $ pcs resource create resource-dummy Dummy
[vm-rhel72-1 ~] $ pcs constraint location resource-dummy avoids vm-rhel72-3=INFINITY
[vm-rhel72-1 ~] $ pcs resource clone resource-dummy
[vm-rhel72-1 ~] $ pcs status | grep Stopped:
     Stopped: [ vm-rhel72-3 ]


Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64

No way to display status without stopped resources.


After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs status --hide-inactive | grep Stopped:
[vm-rhel72-1 ~] $ pcs status --hide-inactive --full
Error: you cannot specify both --hide-inactive and --full
[vm-rhel72-1 ~] $ pcs resource delete resource-dummy
Attempting to stop: resource-dummy...Stopped
Comment 14 Tomas Jelinek 2016-08-08 07:39 EDT
Created attachment 1188649 [details]
proposed fix 2

Test:

[root@rh72-node1:~]# pcs resource 
 dummy  (ocf::heartbeat:Dummy): Started rh72-node1
 d1     (ocf::heartbeat:Dummy): Started rh72-node2
 d2     (ocf::heartbeat:Dummy): Stopped (disabled)
 d3     (ocf::heartbeat:Dummy): Started rh72-node2
 d4     (ocf::heartbeat:Dummy): Stopped (disabled)
 d5     (ocf::heartbeat:Dummy): Started rh72-node1
[root@rh72-node1:~]# pcs resource --hide-inactive
 dummy  (ocf::heartbeat:Dummy): Started rh72-node1
 d1     (ocf::heartbeat:Dummy): Started rh72-node2
 d3     (ocf::heartbeat:Dummy): Started rh72-node2
 d5     (ocf::heartbeat:Dummy): Started rh72-node1
Comment 15 Ivan Devat 2016-08-19 08:27:25 EDT
Setup:
[vm-rhel72-1 ~] $ pcs resource create d1 Dummy
[vm-rhel72-1 ~] $ pcs resource create d2 Dummy
[vm-rhel72-1 ~] $ pcs resource disable d1
[vm-rhel72-1 ~] $ pcs resource
 d1     (ocf::heartbeat:Dummy): Stopped (disabled)
 d2     (ocf::heartbeat:Dummy): Started vm-rhel72-3


Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-6.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource --hide-inactive
 d1     (ocf::heartbeat:Dummy): Stopped (disabled)
 d2     (ocf::heartbeat:Dummy): Started vm-rhel72-3


After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-7.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource --hide-inactive
 d2     (ocf::heartbeat:Dummy): Started vm-rhel72-3
Comment 21 errata-xmlrpc 2016-11-03 16:56:09 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html
