Bug 1298585
| Summary: | [RFE] pcs status output could be simpler when constraints are in place | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Michele Baldessari <michele> |
| Component: | pcs | Assignee: | Ivan Devat <idevat> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | low | Priority: | low |
| Version: | 7.3 | Target Milestone: | rc |
| Hardware: | All | OS: | Linux |
| Keywords: | FutureFeature | Doc Type: | Enhancement |
| Type: | Bug | Bug Depends On: | 1361533 |
| Fixed In Version: | pcs-0.9.152-7.el7 | Last Closed: | 2016-11-03 20:56:09 UTC |
| CC: | cfeist, cluster-maint, kgaillot, michele, rmarigny, royoung, rsteiger, tojeline, vcojot | | |

Doc Text:
Feature: allow hiding inactive resources in the pcs status output (see the example below).
Reason: provide the ability to show shorter, more readable output.
Result: pcs can hide inactive resources in the pcs status output, so the output is shorter and more readable.
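As a quick illustration of the Doc Text above, a minimal hedged sketch (hypothetical prompt, output omitted):

[example-node ~] $ pcs status                    # full cluster status, inactive resources included
[example-node ~] $ pcs status --hide-inactive    # the same report with inactive (stopped/disabled) resources hidden
[example-node ~] $ pcs status --hide-inactive --full
Error: you cannot specify both --hide-inactive and --full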
The requested behavior is the default in upstream pacemaker since 1.1.13; crm_mon should print Stopped clones only if --inactive was specified. Is pcs using --inactive when calling crm_mon?

Ken, you are right, pcs calls crm_mon with the -r option specified. I tested crm_mon from pacemaker-1.1.13-10.el7.x86_64 and it works as requested in comment 0. We will handle this in pcs then. Thanks.

Michele, could you run "crm_mon --one-shot" on your cluster and take a look at its output? If you are satisfied with it, we can add an option to pcs to show resource status like this. Note, however, that in this output all stopped resources are omitted, not only those constrained to run only on specific nodes. Thanks.

Hi Ivan, I think it is definitely an improvement. What is needed is that "Stopped" is not printed when a resource is not expected to run on a node. So if there is a constraint preventing it from running, pcs should either not print Stopped at all or it should somehow indicate that the resource is stopped on that node because of constraint rules and not because it failed there. We can split this BZ in two if you want:
1) We keep this one to add crm_mon --one-shot to pcs.
2) We track via another BZ whether we can add some logic to differentiate stopped due to failure from stopped due to constraints.
Does that make sense?

The output as proposed here would follow your first suggestion, i.e. stopped resources would not be printed at all.

I suspect the output would get too cluttered with the other suggestion, trying to show a reason for every stopped resource. I think that level of detail would be better using a GUI or HTML, where the user could click on a resource for more detail; and/or a separate command-line option, for example "pcs resource why".

(In reply to Ken Gaillot from comment #6)
> The output as proposed here would follow your first suggestion, i.e. stopped
> resources would not be printed at all.
>
> I suspect the output would get too cluttered with the other suggestion,
> trying to show a reason for every stopped resource. I think that level of
> detail would be better using a GUI or HTML, where the user could click on a
> resource for more detail; and/or a separate command-line option, for example
> "pcs resource why".

I see, that makes sense. How about we aim for something like "show me stopped only in case of failure" and leave everything else out (e.g. stopped due to constraints)? I think that would be the most useful output for a sysadmin managing a somewhat complex cluster via the CLI.

Any failed actions are already listed in their own section. However, there is not necessarily a direct connection with the resource being stopped; a resource might fail but then be recovered successfully, or it might be the stop action that failed. I think the "crm_mon --one-shot" output should be sufficient.
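For reference, a minimal sketch of the crm_mon calls discussed above (hypothetical prompt, output omitted), assuming pacemaker 1.1.13 or later where stopped clone instances are printed only when inactive resources are requested:

[example-node ~] # crm_mon --one-shot              # print the cluster status once and exit; inactive (stopped) resources are omitted
[example-node ~] # crm_mon --one-shot --inactive   # additionally list inactive resources; per the discussion above, pcs passes this as -r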
proposed fix: https://github.com/feist/pcs/commit/c4e49ba368cf636337a276f814aac242987c4222

Setup:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs resource create resource-dummy Dummy
[vm-rhel72-1 ~pcs/pcs] $ ./pcs constraint location resource-dummy avoids vm-rhel72-2=INFINITY
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status | grep Stopped:
Stopped: [ vm-rhel72-2 ]
Test:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status --hide-inactive | grep Stopped:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs status --hide-inactive --full
Error: you cannot specify both --hide-inactive and --full
Cleanup:
[vm-rhel72-1 ~pcs/pcs] $ ./pcs resource delete resource-dummy
Attempting to stop: resource-dummy...Stopped

This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

Setup:
[vm-rhel72-1 ~] $ pcs resource create resource-dummy Dummy
[vm-rhel72-1 ~] $ pcs constraint location resource-dummy avoids vm-rhel72-3=INFINITY
[vm-rhel72-1 ~] $ pcs resource clone resource-dummy
[vm-rhel72-1 ~] $ pcs status | grep Stopped:
Stopped: [ vm-rhel72-3 ]
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64
There is no way to display the status without stopped resources.
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs status --hide-inactive | grep Stopped:
[vm-rhel72-1 ~] $ pcs status --hide-inactive --full
Error: you cannot specify both --hide-inactive and --full
[vm-rhel72-1 ~] $ pcs resource delete resource-dummy
Attempting to stop: resource-dummy...Stopped
Created attachment 1188649
proposed fix 2
Test:
[root@rh72-node1:~]# pcs resource
dummy (ocf::heartbeat:Dummy): Started rh72-node1
d1 (ocf::heartbeat:Dummy): Started rh72-node2
d2 (ocf::heartbeat:Dummy): Stopped (disabled)
d3 (ocf::heartbeat:Dummy): Started rh72-node2
d4 (ocf::heartbeat:Dummy): Stopped (disabled)
d5 (ocf::heartbeat:Dummy): Started rh72-node1
[root@rh72-node1:~]# pcs resource --hide-inactive
dummy (ocf::heartbeat:Dummy): Started rh72-node1
d1 (ocf::heartbeat:Dummy): Started rh72-node2
d3 (ocf::heartbeat:Dummy): Started rh72-node2
d5 (ocf::heartbeat:Dummy): Started rh72-node1
Setup:
[vm-rhel72-1 ~] $ pcs resource create d1 Dummy
[vm-rhel72-1 ~] $ pcs resource create d2 Dummy
[vm-rhel72-1 ~] $ pcs resource disable d1
[vm-rhel72-1 ~] $ pcs resource
d1 (ocf::heartbeat:Dummy): Stopped (disabled)
d2 (ocf::heartbeat:Dummy): Started vm-rhel72-3
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-6.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource --hide-inactive
d1 (ocf::heartbeat:Dummy): Stopped (disabled)
d2 (ocf::heartbeat:Dummy): Started vm-rhel72-3
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-7.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource --hide-inactive
d2 (ocf::heartbeat:Dummy): Started vm-rhel72-3

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html
(I realize this one might not be trivial, but it is a bit of a pain point in bigger OpenStack installations, so here we go ;)

In an OSP installation with Instance HA, the compute/hypervisor nodes use pacemaker_remoted to manage services. There are two types of nodes in this context: controllers and compute nodes. Each node has an "osprole=compute/controller" attribute depending on its role. Certain services have a constraint to run only on compute nodes, and other services have a constraint to run only on controllers.

What would be nice is if pcs status did not show the Stopped state for services on the nodes where they may *never* run due to the aforementioned constraints. Here is an example:

Current DC: overcloud-controller-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
5 nodes and 207 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
RemoteOnline: [ overcloud-novacompute-0 overcloud-novacompute-1 ]

Full list of resources:

 ip-192.0.2.14 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 ip-192.0.2.15 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-novacompute-0 overcloud-novacompute-1 ]
....

We already know from the constraints that redis-master may never run on any compute node, so it is a bit confusing to see the list of all the compute nodes in the Stopped state there. Also, the number of compute nodes can potentially scale up to very large numbers (hundreds), which would make the output almost unreadable.
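For context, a hedged sketch of the kind of role-based setup described above; the attribute, node, and resource names come from the report itself, but the exact commands used in a real deployment may differ:

# Tag each node with its role via a node attribute:
pcs property set --node overcloud-novacompute-0 osprole=compute
pcs property set --node overcloud-controller-0 osprole=controller
# Ban a controller-only service from every node that is not a controller:
pcs constraint location redis-master rule score=-INFINITY osprole ne controller
# With such constraints in place, the request is to be able to hide the resulting
# "Stopped: [ overcloud-novacompute-* ]" lines, which the option added in this bug provides:
pcs status --hide-inactive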