Bug 1298585 - [RFE] pcs status output could be simpler when constraints are in place
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.3
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ivan Devat
QA Contact: cluster-qe@redhat.com
Keywords: FutureFeature
Depends On: 1361533
Blocks:
Reported: 2016-01-14 08:43 EST by Michele Baldessari
Modified: 2017-01-25 03:14 EST (History)
CC List: 9 users

See Also:
Fixed In Version: pcs-0.9.152-7.el7
Doc Type: Enhancement
Doc Text:
Feature: allow hiding inactive resources in pcs status. Reason: ability to show shorter, more readable output. Result: pcs is able to hide inactive resources in pcs status output, so the output is shorter and more readable.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 16:56:09 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
proposed fix 2 (9.43 KB, patch)
2016-08-08 07:39 EDT, Tomas Jelinek

Description Michele Baldessari 2016-01-14 08:43:23 EST
(I realize this one might not be trivial, but it is a bit of a pain point
in bigger openstack installations so here we go ;)

In an OSP installation with Instance HA the compute/hypervisor nodes use
pacemaker_remoted to manage services.

There are two types of nodes in this context: controllers and compute nodes.
Each node has an "osprole" attribute set to "compute" or "controller" depending on its role.

Now certain services will have a constraint to be run only on compute nodes and
other services will have a constraint to make them run only on controllers.

What would be nice is if pcs status did not show the stopped state for services
on nodes where they may *never* run due to the aforementioned constraints.

Here is an example:
Current DC: overcloud-controller-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
5 nodes and 207 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
RemoteOnline: [ overcloud-novacompute-0 overcloud-novacompute-1 ]

Full list of resources:

 ip-192.0.2.14  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 ip-192.0.2.15  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]

....

Now we already know from the constraints that redis-master may never run
on any compute node, so it is a bit confusing to see all the compute nodes
listed there in Stopped state. Moreover, the number of compute nodes might
scale to very large counts (hundreds), which would make the output almost
unreadable.
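The clutter described above can be reproduced in miniature. As a minimal sketch of the kind of post-filtering an admin could do before a dedicated option exists (the file path and saved sample are hypothetical, standing in for real `pcs status` output):

```shell
# Hypothetical sample of the clone-set output above, saved to a file for
# illustration (in practice this would come from `pcs status`).
cat > /tmp/status.txt <<'EOF'
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
EOF

# Crude workaround: drop the "Stopped:" lines entirely.
grep -v 'Stopped:' /tmp/status.txt
```

Note that this hides every "Stopped:" line indiscriminately, with no way to distinguish "stopped by constraint" from "stopped by failure" — which is exactly the distinction discussed in the comments on this report.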
Comment 2 Ken Gaillot 2016-01-19 11:41:12 EST
The requested behavior is the default in upstream pacemaker since 1.1.13; crm_mon should print Stopped clones only if --inactive was specified. Is pcs using --inactive when calling crm_mon?
Comment 3 Tomas Jelinek 2016-01-20 04:59:21 EST
Ken,

You are right, pcs calls crm_mon with -r option specified. I tested crm_mon from pacemaker-1.1.13-10.el7.x86_64 and it works as requested in comment 0.

We will handle this in pcs then.

Thanks.
Comment 4 Ivan Devat 2016-02-02 05:05:12 EST
Michele,

could you run "crm_mon --one-shot" on your cluster and take a look at its output? If you are satisfied with it, we can add an option to pcs to show resource status like this. Note, however, that in this output all stopped resources are omitted, not only those with a constraint to run only on specific nodes.

Thanks
Comment 5 Michele Baldessari 2016-02-03 10:13:15 EST
Hi Ivan,

I think it is definitely an improvement. What is needed is that "Stopped"
entries are not printed when a resource is expected not to run on a node.

So if there is a constraint preventing a resource from running on a node, pcs
should either not print "Stopped" at all, or it should somehow indicate that the
resource is stopped on that node because of constraint rules and not because it
failed there.

We can split this BZ in two if you want:
1) We keep this one to add crm_mon --one-shot to pcs
2) We track via another BZ if we can add some logic to differentiate the display of stopped due to failure vs stopped due to constraints

Does that make sense?
Comment 6 Ken Gaillot 2016-02-03 10:34:12 EST
The output as proposed here would follow your first suggestion, i.e. stopped resources would not be printed at all.

I suspect the output would get too cluttered with the other suggestion, trying to show a reason for every stopped resource. I think that level of detail would be better using a GUI or HTML, where the user could click on a resource for more detail; and/or a separate command-line option, for example "pcs resource why".
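As a stopgap for the "why is this stopped here" question, the existing constraint listing already carries the needed information. A minimal sketch, reusing the resource and node names from the description and operating on a saved sample (the file path and the exact listing format shown are assumptions for illustration, not verbatim pcs output):

```shell
# Assumed sample of a location-constraint listing, saved for illustration.
cat > /tmp/constraints.txt <<'EOF'
Location Constraints:
  Resource: redis-master
    Disabled on: overcloud-novacompute-0 (score:-INFINITY)
    Disabled on: overcloud-novacompute-1 (score:-INFINITY)
EOF

# Show the location rules that explain why redis-master is "Stopped"
# on the compute nodes (-A2 prints the two lines after the match):
grep -A2 'Resource: redis-master' /tmp/constraints.txt
```

This is manual correlation, not the automated "pcs resource why" idea above, but it shows the data is already available to an admin at the CLI.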
Comment 7 Michele Baldessari 2016-02-03 10:55:28 EST
(In reply to Ken Gaillot from comment #6)
> The output as proposed here would follow your first suggestion, i.e. stopped
> resources would not be printed at all.
> 
> I suspect the output would get too cluttered with the other suggestion,
> trying to show a reason for every stopped resource. I think that level of
> detail would be better using a GUI or HTML, where the user could click on a
> resource for more detail; and/or a separate command-line option, for example
> "pcs resource why".

I see, that makes sense. How about we aim for something like:
"Show me stopped only in case of failure" and leave anything else out (e.g. stopped due to constraints)?

I think that would be the most useful output for a sysadmin managing a somewhat
complex cluster via the CLI.
Comment 8 Ken Gaillot 2016-02-03 17:14:06 EST
Any failed actions are already listed in their own section. However, there's not necessarily a direct connection between a failure and the resource being stopped; a resource might fail but then be recovered successfully, or it might be the stop action itself that failed.

I think the "crm_mon --one-shot" output should be sufficient.
Comment 9 Ivan Devat 2016-02-05 09:56:55 EST
proposed fix:
https://github.com/feist/pcs/commit/c4e49ba368cf636337a276f814aac242987c4222

Setup:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs resource create resource-dummy Dummy
[vm-rhel72-1 ~pcs/pcs] $ ./pcs constraint location resource-dummy avoids vm-rhel72-2=INFINITY
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status | grep Stopped:
     Stopped: [ vm-rhel72-2 ]

Test:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status --hide-inactive | grep Stopped:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status --hide-inactive --full
Error: you cannot specify both --hide-inactive and --full

Cleanup:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs resource delete resource-dummy
Attempting to stop: resource-dummy...Stopped
Comment 10 Mike McCune 2016-03-28 18:42:18 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune@redhat.com with any questions.
Comment 11 Ivan Devat 2016-05-31 08:07:19 EDT
Setup:
[vm-rhel72-1 ~] $ pcs resource create resource-dummy Dummy
[vm-rhel72-1 ~] $ pcs constraint location resource-dummy avoids vm-rhel72-3=INFINITY
[vm-rhel72-1 ~] $ pcs resource clone resource-dummy
[vm-rhel72-1 ~] $ pcs status | grep Stopped:
     Stopped: [ vm-rhel72-3 ]


Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64

No way to display status without stopped resources.


After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs status --hide-inactive | grep Stopped:
[vm-rhel72-1 ~] $ pcs status --hide-inactive --full
Error: you cannot specify both --hide-inactive and --full
[vm-rhel72-1 ~] $ pcs resource delete resource-dummy
Attempting to stop: resource-dummy...Stopped
Comment 14 Tomas Jelinek 2016-08-08 07:39 EDT
Created attachment 1188649 [details]
proposed fix 2

Test:

[root@rh72-node1:~]# pcs resource 
 dummy  (ocf::heartbeat:Dummy): Started rh72-node1
 d1     (ocf::heartbeat:Dummy): Started rh72-node2
 d2     (ocf::heartbeat:Dummy): Stopped (disabled)
 d3     (ocf::heartbeat:Dummy): Started rh72-node2
 d4     (ocf::heartbeat:Dummy): Stopped (disabled)
 d5     (ocf::heartbeat:Dummy): Started rh72-node1
[root@rh72-node1:~]# pcs resource --hide-inactive
 dummy  (ocf::heartbeat:Dummy): Started rh72-node1
 d1     (ocf::heartbeat:Dummy): Started rh72-node2
 d3     (ocf::heartbeat:Dummy): Started rh72-node2
 d5     (ocf::heartbeat:Dummy): Started rh72-node1
Comment 15 Ivan Devat 2016-08-19 08:27:25 EDT
Setup:
[vm-rhel72-1 ~] $ pcs resource create d1 Dummy
[vm-rhel72-1 ~] $ pcs resource create d2 Dummy
[vm-rhel72-1 ~] $ pcs resource disable d1
[vm-rhel72-1 ~] $ pcs resource
 d1     (ocf::heartbeat:Dummy): Stopped (disabled)
 d2     (ocf::heartbeat:Dummy): Started vm-rhel72-3


Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-6.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource --hide-inactive
 d1     (ocf::heartbeat:Dummy): Stopped (disabled)
 d2     (ocf::heartbeat:Dummy): Started vm-rhel72-3


After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-7.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource --hide-inactive
 d2     (ocf::heartbeat:Dummy): Started vm-rhel72-3
Comment 21 errata-xmlrpc 2016-11-03 16:56:09 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html
