Bug 1686426
| Summary: | Add option to crm_simulate to display additional info about cluster status, like node attributes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Frank Danapfel <fdanapfe> |
| Component: | pacemaker | Assignee: | Chris Lumens <clumens> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 8.0 | CC: | cfeist, clumens, cluster-maint, kgaillot, lmiksik, msmazova, phagara |
| Target Milestone: | pre-dev-freeze | Keywords: | FutureFeature, Triaged |
| Target Release: | 8.5 | Flags: | pm-rhel:
mirror+
|
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-2.1.0-3.el8 | Doc Type: | Enhancement |
| Doc Text: |
(It is questionable whether we need to document this, since there is no pcs interface.)
Feature: Pacemaker's crm_simulate command-line tool now accepts a --show-attrs option to display node attributes in simulation output, and a --show-failcounts option to display resource fail counts.
Reason: Node attribute and resource fail count information was previously available by running crm_mon separately using a CIB_file environment variable, but that was inconvenient.
Result: Users can easily display additional information that factors into simulation results.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-11-09 18:44:49 UTC | Type: | Feature Request |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Frank Danapfel
2019-03-07 12:53:22 UTC
Moving to RHEL 8 because RHEL 7.7 is the last RHEL 7 feature release.

Related with respect to the user interface, Bug 1330774 covers adding a pcs interface to crm_simulate.

(In reply to Ken Gaillot from comment #2)
> Moving to RHEL 8 because RHEL 7.7 is the last RHEL 7 feature release.
>
> Related with respect to the user interface, Bug 1330774 covers adding a pcs
> interface to crm_simulate.

I don't think these two bugs are really related. The feature requested in Bug 1330774 seems to be to allow predictions about how the cluster will behave when certain pcs commands are called, without actually performing the resulting actions (sort of doing a 'dry-run'). Whereas the intention of this bug is more to improve the capabilities to analyse cluster events that happened in the past (for example to do root cause analysis in support cases).

I just realized that you may not be aware you can do:
pcs -f <cib-file> status --full
to get extended status info from a file. Is that sufficient for what you're interested in, or is being able to incorporate that info in a simulation the main goal?
I'm aware of using the '-f' option with 'pcs status full', but the main goal is to be able to incorporate the additional information into a simulation.

Ken, any chance that this might still get fixed in RHEL8.2?

Definitely not 8.2. There's a good chance for 8.3 but higher-priority items might intervene.

From the initial comment, it looks like node attrs, migration summary, and failed actions are what's being asked for here. I could potentially add anything that crm_mon can output (though I haven't double checked that), though adding everything would be a lot of work that I won't be able to finish for 8.3. If those are the only things, I'll get to adding command line options for them. Is there anything else, though?

(In reply to Chris Lumens from comment #11)
> From the initial comment, it looks like node attrs, migration summary, and
> failed actions are what's being asked for here. I could potentially add
> anything that crm_mon can output (though I haven't double checked that),
> though adding everything would be a lot of work that I won't be able to
> finish for 8.3. If those are the only things, I'll get to adding command
> line options for them. Is there anything else, though?

As mentioned in my initial description for the bug I'd like to see the same information in crm_simulate output that is also provided by 'crm_mon -1Arf' on a live cluster.

(In reply to Frank Danapfel from comment #12)
> (In reply to Chris Lumens from comment #11)
> > From the initial comment, it looks like node attrs, migration summary, and
> > failed actions are what's being asked for here. I could potentially add
> > anything that crm_mon can output (though I haven't double checked that),
> > though adding everything would be a lot of work that I won't be able to
> > finish for 8.3. If those are the only things, I'll get to adding command
> > line options for them. Is there anything else, though?
>
> As mentioned in my initial description for the bug I'd like to see the same
> information in crm_simulate output that is also provided by 'crm_mon -1Arf'
> on a live cluster.

Some possibilities for the user interface:

* We could reuse the existing --verbose/-V option for this. Multiple -V's currently enable debug logging, but a single -V is currently used only to change the action labels on the dot graph if --save-dotfile/-D is used. I don't think combining those features would bother anyone.

* We could add a single new "extended cluster information" option.

* We could borrow the --include/--exclude idea from crm_mon, encompassing this, --show-scores, and --show-utilization (overkill unless existing crm_mon code can be reused without much extra effort).

Due to a typo in the date (2020 vs 2021) in the BRE rule "RHEL SySc Dev ITM-to-Deadline (8.5)", the ITR strip was run incorrectly. Resetting the BZ values back.

I'm getting close to being able to implement this, so it's worth spending some time thinking about the interface.

> * We could reuse the existing --verbose/-V option for this. Multiple -V's
> currently enable debug logging, but a single -V is currently used only to
> change the action labels on the dot graph if --save-dotfile/-D is used. I
> don't think combining those features would bother anyone.
>
> * We could add a single new "extended cluster information" option.
>
> * We could borrow the --include/--exclude idea from crm_mon, encompassing
> this, --show-scores, and --show-utilization (overkill unless existing
> crm_mon code can be reused without much extra effort).

I think it might be possible to make the include/exclude code from crm_mon more generic so it could be shared among all command line tools. The most difficult stuff appears to be using mon_output_format_t for figuring out the default includes and the ban-related stuff in apply_include. But, maybe this could be handled by making the sections type more like how glib command line stuff does - flags for whether the action is a function call or setting a value, etc. That might be vastly overthinking it, but it would also be the most flexible approach and would likely be more useful elsewhere later.

On the other hand, if we think people aren't going to want more than this extra crm_simulate information, we could get away with introducing a new option. I just don't want to add an option we have to support for years, and then later get more requests for controlling the output elsewhere.

I don't especially like just reusing -V.

(In reply to Chris Lumens from comment #24)
> I'm getting close to being able to implement this, so it's worth spending
> some time thinking about the interface.
>
> > * We could reuse the existing --verbose/-V option for this. Multiple -V's
> > currently enable debug logging, but a single -V is currently used only to
> > change the action labels on the dot graph if --save-dotfile/-D is used. I
> > don't think combining those features would bother anyone.
> >
> > * We could add a single new "extended cluster information" option.
> >
> > * We could borrow the --include/--exclude idea from crm_mon, encompassing
> > this, --show-scores, and --show-utilization (overkill unless existing
> > crm_mon code can be reused without much extra effort).
>
> I think it might be possible to make the include/exclude code from crm_mon
> more generic so it could be shared among all command line tools. The most
> difficult stuff appears to be using mon_output_format_t for figuring out the
> default includes and the ban-related stuff in apply_include. But, maybe
> this could be handled by making the sections type more like how glib command
> line stuff does - flags for whether the action is a function call or setting
> a value, etc. That might be vastly overthinking it, but it would also be
> the most flexible approach and would likely be more useful elsewhere later.

I don't see any other tools that would really benefit from it, though a few could be shoehorned into that model. I think it would just be crm_mon and crm_simulate.

> On the other hand, if we think people aren't going to want more than this
> extra crm_simulate information, we could get away with introducing a new
> option. I just don't want to add an option we have to support for years,
> and then later get more requests for controlling the output elsewhere.

Good question. It would only make sense to show things that might affect the simulation, so I would think that dc, stack, times, summary, failures, fencing history, and operations would never be needed. The current display is effectively nodes and resources (including inactive). That leaves attributes, bans, fail counts, options, and tickets as maybes. Most cluster options can affect the simulation, so I could even imagine users wanting to see more options than "options" currently shows in crm_mon, but we don't have to go that far.

We already have --show-utilization and --show-scores that are in line with the idea. We could just add --show-attributes and --show-failcounts for what's requested here, and if more is desired in the future we just add more --show-* options. Or we borrow --include with just nodes,resources,utilization,scores,attributes,failcounts for now and add more later if desired. Either way we can expand pretty easily. I'm fine with either approach.

I just noticed that crm_mon -A doesn't show utilization attributes, so there's no way to show those there currently. (Scores doesn't make sense for crm_mon.)

> I don't especially like just reusing -V.

That makes sense since different users might want different combinations of output.
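The two interface styles weighed in the comments above (one --show-* flag per section versus a single --include list) can be contrasted in a small sketch. This is an illustration in Python's argparse, not Pacemaker's actual C/glib option handling; the function names and the `simulate-sketch` program name are hypothetical, and only the section names come from the discussion.

```python
import argparse

# Section names are taken from the discussion; everything else is hypothetical.
DEFAULT_SECTIONS = {"nodes", "resources"}
OPTIONAL_SECTIONS = {"utilization", "scores", "attributes", "failcounts"}


def parse_show_flags(argv):
    """Style 1: one boolean --show-* flag per optional section."""
    parser = argparse.ArgumentParser(prog="simulate-sketch")
    for name in sorted(OPTIONAL_SECTIONS):
        parser.add_argument(f"--show-{name}", action="store_true")
    args = parser.parse_args(argv)
    # argparse stores "--show-attributes" under the dest "show_attributes".
    chosen = {n for n in OPTIONAL_SECTIONS if getattr(args, f"show_{n}")}
    return DEFAULT_SECTIONS | chosen


def parse_include(argv):
    """Style 2: a single --include option taking a comma-separated list."""
    parser = argparse.ArgumentParser(prog="simulate-sketch")
    parser.add_argument("--include", default="")
    args = parser.parse_args(argv)
    requested = {s for s in args.include.split(",") if s}
    unknown = requested - OPTIONAL_SECTIONS - DEFAULT_SECTIONS
    if unknown:
        # Exits with a usage error, as argparse does for bad options.
        parser.error(f"unknown section(s): {', '.join(sorted(unknown))}")
    return DEFAULT_SECTIONS | requested


if __name__ == "__main__":
    print(sorted(parse_show_flags(["--show-attributes", "--show-failcounts"])))
    print(sorted(parse_include(["--include", "attributes,failcounts"])))
```

Both styles yield the same set of sections here. The difference is in how the interface grows: the first adds a new option per section, while the second adds a new accepted value to one option.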
Fix merged upstream - https://github.com/ClusterLabs/pacemaker/pull/2335

Pacemaker's `crm_simulate` command-line tool now accepts a `--show-attrs` option to display node attributes in simulation output, and a `--show-failcounts` option to display resource fail counts.

> [root@virt-539 ~]# rpm -q pacemaker
> pacemaker-2.1.0-2.el8.x86_64

Check new options in man/help.

> [root@virt-539 ~]# man crm_simulate | grep show-attrs -A4
>        -A, --show-attrs
>               Show node attributes
>        -c, --show-failcounts
>               Show resource fail counts

> [root@virt-539 ~]# crm_simulate --help-operations | grep show-attrs -A1
>   -A, --show-attrs              Show node attributes
>   -c, --show-failcounts         Show resource fail counts

Have a cluster with resources and attributes:

> [root@virt-539 ~]# pcs status --full
> Cluster name: STSRHTS20356
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-548 (2) (version 2.1.0-2.el8-7c3f660707) - partition with quorum
>   * Last updated: Wed Jun 16 13:57:46 2021
>   * Last change: Wed Jun 16 13:57:36 2021 by root via cibadmin on virt-539
>   * 2 nodes configured
>   * 6 resource instances configured
> Node List:
>   * Online: [ virt-539 (1) virt-548 (2) ]
> Full List of Resources:
>   * fence-virt-539 (stonith:fence_xvm): Started virt-539
>   * fence-virt-548 (stonith:fence_xvm): Started virt-548
>   * Resource Group: dummy-group:
>     * dummy1 (ocf::pacemaker:Dummy): Started virt-548
>     * dummy2 (ocf::pacemaker:Dummy): Started virt-548
>   * Clone Set: dummy-clone [dummy]:
>     * dummy (ocf::pacemaker:Dummy): Started virt-539
>     * dummy (ocf::pacemaker:Dummy): Started virt-548
> Node Attributes:
>   * Node: virt-539 (1):
>     * location : office
>     * order : primary
>     * shortname : node1
>   * Node: virt-548 (2):
>     * location : office
>     * order : secondary
>     * shortname : node2
> Migration Summary:
> Tickets:
> PCSD Status:
>   virt-539: Online
>   virt-548: Online
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Fail resource and check cluster status:

> [root@virt-539 ~]# crm_resource --fail --resource dummy1 --node virt-548
> [root@virt-539 ~]# crm_mon -1Arf
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-548 (version 2.1.0-2.el8-7c3f660707) - partition with quorum
>   * Last updated: Wed Jun 16 14:23:24 2021
>   * Last change: Wed Jun 16 13:57:36 2021 by root via cibadmin on virt-539
>   * 2 nodes configured
>   * 6 resource instances configured
> Node List:
>   * Online: [ virt-539 virt-548 ]
> Full List of Resources:
>   * fence-virt-539 (stonith:fence_xvm): Started virt-539
>   * fence-virt-548 (stonith:fence_xvm): Started virt-548
>   * Resource Group: dummy-group:
>     * dummy1 (ocf::pacemaker:Dummy): Started virt-548
>     * dummy2 (ocf::pacemaker:Dummy): Started virt-548
>   * Clone Set: dummy-clone [dummy]:
>     * Started: [ virt-539 virt-548 ]
> Node Attributes:
>   * Node: virt-539:
>     * location : office
>     * order : primary
>     * shortname : node1
>   * Node: virt-548:
>     * location : office
>     * order : secondary
>     * shortname : node2
> Migration Summary:
>   * Node: virt-548:
>     * dummy1: migration-threshold=1000000 fail-count=1 last-failure='Wed Jun 16 14:23:09 2021'
> Failed Resource Actions:
>   * dummy1_asyncmon_0 on virt-548 'error' (1): call=27, status='complete', exitreason='Simulated failure', last-rc-change='2021-06-16 14:23:09 +02:00', queued=0ms, exec=0ms

Save CIB file:

> [root@virt-539 ~]# pcs cluster cib > cib-copy.xml

Run crm_simulate on CIB file:

> [root@virt-539 ~]# crm_simulate -x cib-copy.xml
> Current cluster status:
>   * Node List:
>     * Online: [ virt-539 virt-548 ]
>   * Full List of Resources:
>     * fence-virt-539 (stonith:fence_xvm): Started virt-539
>     * fence-virt-548 (stonith:fence_xvm): Started virt-548
>     * Resource Group: dummy-group:
>       * dummy1 (ocf::pacemaker:Dummy): Started virt-548
>       * dummy2 (ocf::pacemaker:Dummy): Started virt-548
>     * Clone Set: dummy-clone [dummy]:
>       * Started: [ virt-539 virt-548 ]

Run crm_simulate on CIB file with the new options:

> [root@virt-539 ~]# crm_simulate -x cib-copy.xml --show-attrs --show-failcounts
> Current cluster status:
>   * Node List:
>     * Online: [ virt-539 virt-548 ]
>   * Full List of Resources:
>     * fence-virt-539 (stonith:fence_xvm): Started virt-539
>     * fence-virt-548 (stonith:fence_xvm): Started virt-548
>     * Resource Group: dummy-group:
>       * dummy1 (ocf::pacemaker:Dummy): Started virt-548
>       * dummy2 (ocf::pacemaker:Dummy): Started virt-548
>     * Clone Set: dummy-clone [dummy]:
>       * Started: [ virt-539 virt-548 ]
>   * Node Attributes:
>     * Node: virt-539:
>       * location : office
>       * order : primary
>       * shortname : node1
>     * Node: virt-548:
>       * location : office
>       * order : secondary
>       * shortname : node2
>   * Failed Resource Actions:
>     * dummy1_asyncmon_0 on virt-548 'error' (1): call=27, status='complete', exitreason='Simulated failure', last-rc-change='2021-06-16 14:23:09 +02:00', queued=0ms, exec=0ms

Node Attributes and Failed Resource Actions are displayed, but Migration Summary is missing.

I tested those new options and found that the "Migration Summary" section is not displayed when running `crm_simulate -x <cib-file> --show-failcounts --show-attrs`. Please see comment#31 for details.

Could you please attach the CIB file you're using for testing to this bug report? Thanks!

I've made a new PR that adds this functionality. See https://github.com/ClusterLabs/pacemaker/pull/2416. I think I previously assumed the failed-action-list message would cover the needs for this bug, but obviously that is incorrect.
Additional fixes merged upstream

> [root@virt-547 ~]# rpm -q pacemaker
> pacemaker-2.1.0-3.el8.x86_64

> [root@virt-547 ~]# man crm_simulate | grep show-attrs -A4
>        -A, --show-attrs
>               Show node attributes
>        -c, --show-failcounts
>               Show resource fail counts

> [root@virt-547 ~]# crm_simulate --help-operations | grep show-attrs -A1
>   -A, --show-attrs              Show node attributes
>   -c, --show-failcounts         Show resource fail counts

Have a cluster with resources and attributes:

> [root@virt-547 ~]# pcs status --full
> Cluster name: STSRHTS15914
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-548 (2) (version 2.1.0-3.el8-7c3f660707) - partition with quorum
>   * Last updated: Mon Jul 12 17:29:12 2021
>   * Last change: Mon Jul 12 17:28:21 2021 by root via crm_attribute on virt-547
>   * 2 nodes configured
>   * 6 resource instances configured
> Node List:
>   * Online: [ virt-547 (1) virt-548 (2) ]
> Full List of Resources:
>   * fence-virt-547 (stonith:fence_xvm): Started virt-547
>   * fence-virt-548 (stonith:fence_xvm): Started virt-548
>   * Clone Set: stateful-clone [stateful] (promotable):
>     * stateful (ocf::pacemaker:Stateful): Master virt-548
>     * stateful (ocf::pacemaker:Stateful): Slave virt-547
>   * Resource Group: dummy-group:
>     * dummy1 (ocf::pacemaker:Dummy): Started virt-547
>     * dummy2 (ocf::pacemaker:Dummy): Started virt-547
> Node Attributes:
>   * Node: virt-547 (1):
>     * location : office
>     * master-stateful : 10
>     * order : secondary
>   * Node: virt-548 (2):
>     * location : office
>     * master-stateful : 10
>     * order : primary
> Migration Summary:
> Tickets:
> PCSD Status:
>   virt-547: Online
>   virt-548: Online
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Fail resource and check cluster status:

> [root@virt-547 ~]# crm_resource --fail --resource stateful --node virt-547
> Waiting for 1 reply from the controller
> ... got reply (done)
> [root@virt-547 ~]# crm_mon -1Arf
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-548 (version 2.1.0-3.el8-7c3f660707) - partition with quorum
>   * Last updated: Mon Jul 12 17:30:20 2021
>   * Last change: Mon Jul 12 17:28:21 2021 by root via crm_attribute on virt-547
>   * 2 nodes configured
>   * 6 resource instances configured
> Node List:
>   * Online: [ virt-547 virt-548 ]
> Full List of Resources:
>   * fence-virt-547 (stonith:fence_xvm): Started virt-547
>   * fence-virt-548 (stonith:fence_xvm): Started virt-548
>   * Clone Set: stateful-clone [stateful] (promotable):
>     * Masters: [ virt-548 ]
>     * Slaves: [ virt-547 ]
>   * Resource Group: dummy-group:
>     * dummy1 (ocf::pacemaker:Dummy): Started virt-547
>     * dummy2 (ocf::pacemaker:Dummy): Started virt-547
> Node Attributes:
>   * Node: virt-547:
>     * location : office
>     * master-stateful : 5
>     * order : secondary
>   * Node: virt-548:
>     * location : office
>     * master-stateful : 10
>     * order : primary
> Migration Summary:
>   * Node: virt-547:
>     * stateful: migration-threshold=1000000 fail-count=1 last-failure='Mon Jul 12 17:30:11 2021'
> Failed Resource Actions:
>   * stateful_asyncmon_0 on virt-547 'error' (1): call=49, status='complete', exitreason='Simulated failure', last-rc-change='2021-07-12 17:30:11 +02:00', queued=0ms, exec=0ms

Save CIB file:

> [root@virt-547 ~]# pcs cluster cib > cib-copy.xml

Run crm_simulate on the saved CIB file:

> [root@virt-547 ~]# crm_simulate -x cib-copy.xml
> Current cluster status:
>   * Node List:
>     * Online: [ virt-547 virt-548 ]
>   * Full List of Resources:
>     * fence-virt-547 (stonith:fence_xvm): Started virt-547
>     * fence-virt-548 (stonith:fence_xvm): Started virt-548
>     * Clone Set: stateful-clone [stateful] (promotable):
>       * Masters: [ virt-548 ]
>       * Slaves: [ virt-547 ]
>     * Resource Group: dummy-group:
>       * dummy1 (ocf::pacemaker:Dummy): Started virt-547
>       * dummy2 (ocf::pacemaker:Dummy): Started virt-547

Run crm_simulate with the new options on the saved CIB file:

> [root@virt-547 ~]# crm_simulate -x cib-copy.xml --show-attrs --show-failcounts
> Current cluster status:
>   * Node List:
>     * Online: [ virt-547 virt-548 ]
>   * Full List of Resources:
>     * fence-virt-547 (stonith:fence_xvm): Started virt-547
>     * fence-virt-548 (stonith:fence_xvm): Started virt-548
>     * Clone Set: stateful-clone [stateful] (promotable):
>       * Masters: [ virt-548 ]
>       * Slaves: [ virt-547 ]
>     * Resource Group: dummy-group:
>       * dummy1 (ocf::pacemaker:Dummy): Started virt-547
>       * dummy2 (ocf::pacemaker:Dummy): Started virt-547
>   * Node Attributes:
>     * Node: virt-547:
>       * location : office
>       * master-stateful : 5
>       * order : secondary
>     * Node: virt-548:
>       * location : office
>       * master-stateful : 10
>       * order : primary
>   * Migration Summary:
>     * Node: virt-547:
>       * stateful: migration-threshold=1000000 fail-count=1 last-failure='Mon Jul 12 17:30:11 2021'
>   * Failed Resource Actions:
>     * stateful_asyncmon_0 on virt-547 'error' (1): call=49, status='complete', exitreason='Simulated failure', last-rc-change='2021-07-12 17:30:11 +02:00', queued=0ms, exec=0ms

Output of `crm_simulate -x cib-copy.xml --show-attrs --show-failcounts` now also shows Node Attributes, Migration Summary and Failed Resource Actions.

Marking verified in pacemaker-2.1.0-3.el8.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:4267 |
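For scripted root cause analysis, the verified invocation can be wrapped in a small helper. This is a sketch only: the function names are hypothetical, the option names (`-x`, `--show-attrs`, `--show-failcounts`) are the ones shown in the verification above, and the helper returns `None` rather than failing when `crm_simulate` is not installed.

```python
import shutil
import subprocess


def simulate_command(cib_path, show_attrs=True, show_failcounts=True):
    """Build the crm_simulate argument list for a saved CIB file."""
    cmd = ["crm_simulate", "-x", cib_path]
    if show_attrs:
        cmd.append("--show-attrs")
    if show_failcounts:
        cmd.append("--show-failcounts")
    return cmd


def run_simulation(cib_path):
    """Run crm_simulate on the file and return its stdout, or None if unavailable."""
    if shutil.which("crm_simulate") is None:
        return None  # Pacemaker CLI tools are not installed on this host
    result = subprocess.run(
        simulate_command(cib_path),
        capture_output=True,
        text=True,
        check=False,  # leave exit-status handling to the caller
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(simulate_command("cib-copy.xml")))
```

Building the argument list separately from running it keeps the command easy to log or inspect before it touches a support case's CIB file.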