Bug 2065812 - Show node health states in crm_mon
Summary: Show node health states in crm_mon
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.6
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.7
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-18 19:22 UTC by Ken Gaillot
Modified: 2022-11-23 16:48 UTC (History)

Fixed In Version: pacemaker-2.1.3-1.el8
Doc Type: Enhancement
Doc Text:
Feature: If a cluster has a node health strategy configured, then nodes with at least one health attribute in "yellow" or "red" status will be indicated as such in pcs status output.
Reason: Previously, there was no easy way to tell why resources were not running on a node with degraded health.
Result: Degraded node health can be seen at a glance in pcs status output.
Clone Of:
Environment:
Last Closed: 2022-11-08 09:42:25 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:



Links
System                               ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker                RHELPLAN-116155  0        None      None    None     2022-03-18 19:31:18 UTC
Red Hat Knowledge Base (Solution)    6987357          0        None      None    None     2022-11-23 16:48:35 UTC
Red Hat Product Errata               RHBA-2022:7573   0        None      None    None     2022-11-08 09:42:42 UTC

Description Ken Gaillot 2022-03-18 19:22:34 UTC
Description of problem: Pacemaker offers a node health monitoring feature that can be configured to ban resources from a node if its health is degraded. However, crm_mon's default display currently gives no indication of degraded health; users can determine that only by showing all node attributes.


Steps to Reproduce:
1. Configure and start a cluster.
2. Enable node health monitoring, for example: pcs property set node-health-strategy="migrate-on-red"
3. Simulate a node health monitor detecting a degraded condition by setting a node attribute with a name starting with '#health-' and a value of 'red', on any node.
4. pcs status
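
The steps above could be scripted roughly as follows. This is a sketch and not part of the original report: the function name reproduce_degraded_health, the RUN dry-run variable, and the choice of "#health-cpu" as the example attribute are assumptions made for illustration. By default the function only prints the commands; running it with RUN set to an empty value would execute them on a test cluster that is already configured and started.

```sh
#!/bin/sh
# Sketch of reproduction steps 2-4. Dry run by default: commands are only
# printed (without their shell quoting). Run with RUN= (empty) to execute
# them on a test cluster that is already configured and started (step 1).

reproduce_degraded_health() {
    node=$1             # hypothetical argument: name of any cluster node
    run=${RUN-echo}     # "echo" unless RUN is set, even if set to empty

    # Step 2: enable node health monitoring.
    $run pcs property set node-health-strategy=migrate-on-red

    # Step 3: simulate a health monitor reporting a degraded condition.
    # "#health-cpu" is one example of a name starting with "#health-".
    $run attrd_updater --name "#health-cpu" --update red --node "$node"

    # Step 4: inspect the cluster status.
    $run pcs status
}
```

For example, `reproduce_degraded_health node1` prints the three commands for a node called node1 (a placeholder name), and `RUN= reproduce_degraded_health node1` runs them.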

Actual results: Any resources will be moved off the node, and any clone instances there will be stopped, but there is no indication why.


Expected results: The "Node List" indicates the node's condition, maybe something like "Online but node health is red:".


Additional info: The node health feature is highly configurable (see https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#tracking-node-health ) and an important design choice will be what degree of detail we should show. The actual node health score is an integer, and "red", "yellow", and "green" are just convenient aliases for particular values.

My first idea is that we should say "green" if the score is below the yellow value, otherwise "yellow" if the score is below the red value, and "red" otherwise. That could be considered inaccurate if the score is not exactly those values, but I think it's what users would intuitively expect.

Comment 1 Ken Gaillot 2022-03-18 19:48:19 UTC
The message could be more specific, like "Online but node health score is $INTEGER (red):"

Comment 2 Ken Gaillot 2022-04-19 19:30:50 UTC
Feature merged upstream as of commit 398d8aa

With the final design, nodes with degraded health will be shown in pcs status like:

  * Node List:
    * Node node1: online (health is RED)

The indicator will say RED if at least one health attribute is red, otherwise YELLOW if at least one health attribute is yellow, otherwise no health status will be shown.
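
The precedence rule described here can be illustrated with a small shell function. This is only a sketch of the rule as stated in this comment, not Pacemaker's implementation: the function name health_indicator is hypothetical, and it handles only the literal values "red", "yellow", and "green", not integer health scores.

```sh
# health_indicator VALUE...
# Takes the values of a node's health attributes as arguments.
# Prints RED if any value is "red", otherwise YELLOW if any value is
# "yellow", otherwise prints nothing (no health status is shown).
health_indicator() {
    indicator=""
    for value in "$@"; do
        case $value in
            red)    indicator=RED; break ;;   # red wins over everything
            yellow) indicator=YELLOW ;;       # keep looking for a red
        esac
    done
    if [ -n "$indicator" ]; then
        printf '%s\n' "$indicator"
    fi
}
```

For example, `health_indicator green yellow` prints YELLOW, while `health_indicator green green` prints nothing.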

Comment 9 jrehova 2022-07-01 13:51:05 UTC
* 2-node cluster
* dummy fence agent installed on both nodes as /usr/sbin/fence_bz1978010: https://github.com/ClusterLabs/fence-agents/blob/master/agents/dummy/fence_dummy.py
* node-health-strategy: only-green

Testing
=========

> [root@virt-557 ~]# rpm -q pacemaker
> pacemaker-2.1.3-2.el8.x86_64

Pcs status of cluster:

> [root@virt-557 15:24:23 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:24:31 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Online: [ virt-557 virt-558 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-557
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-558
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-557
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Testing the #health-cpu attribute in various combinations:

> [root@virt-557 15:24:32 ~]# attrd_updater --name "#health-cpu" --update "red" --node "virt-557"
> [root@virt-557 15:25:45 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:25:54 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is RED)
>   * Online: [ virt-558 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-558
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-558
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-558
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:25:54 ~]# attrd_updater --name "#health-cpu" --update "red" --node "virt-558"
> [root@virt-557 15:26:23 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:26:30 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is RED)
>   * Node virt-558: online (health is RED)
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Stopped
>   * fence-virt-558	(stonith:fence_xvm):	 Stopped
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Stopped
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:27:34 ~]# attrd_updater --name "#health-cpu" --update "green" --node "virt-557"
> [root@virt-557 15:27:52 ~]# attrd_updater --name "#health-cpu" --update "yellow" --node "virt-558"
> [root@virt-557 15:28:20 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:28:35 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-558: online (health is YELLOW)
>   * Online: [ virt-557 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-557
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-557
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-557
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:28:35 ~]# attrd_updater --name "#health-cpu" --update "green" --node "virt-558"
> [root@virt-557 15:29:07 ~]# attrd_updater --name "#health-cpu" --update "yellow" --node "virt-557"
> [root@virt-557 15:29:18 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:29:23 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is YELLOW)
>   * Online: [ virt-558 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-558
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-558
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-558
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Testing the #health-iowait attribute in various combinations:

> [root@virt-557 15:29:41 ~]# attrd_updater --name "#health-iowait" --update "red" --node "virt-557"
> [root@virt-557 15:31:18 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:31:22 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is RED)
>   * Online: [ virt-558 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-558
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-558
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-558
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:31:23 ~]# attrd_updater --name "#health-iowait" --update "red" --node "virt-558"
> [root@virt-557 15:31:35 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:31:37 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is RED)
>   * Node virt-558: online (health is RED)
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Stopped
>   * fence-virt-558	(stonith:fence_xvm):	 Stopped
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Stopped
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:31:51 ~]# attrd_updater --name "#health-iowait" --update "green" --node "virt-558"
> [root@virt-557 15:32:00 ~]# attrd_updater --name "#health-iowait" --update "yellow" --node "virt-557"
> [root@virt-557 15:32:19 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:32:24 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is YELLOW)
>   * Online: [ virt-558 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-558
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-558
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-558
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:32:25 ~]# attrd_updater --name "#health-iowait" --update "yellow" --node "virt-558"
> [root@virt-557 15:32:43 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:32:47 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is YELLOW)
>   * Node virt-558: online (health is YELLOW)
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Stopped
>   * fence-virt-558	(stonith:fence_xvm):	 Stopped
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Stopped
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Testing the #health-healthsmart attribute in various combinations:

> [root@virt-557 15:33:38 ~]# attrd_updater --name "#health-healthsmart" --update "red" --node "virt-557"
> [root@virt-557 15:34:41 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:34:48 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is RED)
>   * Online: [ virt-558 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-558
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-558
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-558
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:34:48 ~]# attrd_updater --name "#health-healthsmart" --update "red" --node "virt-558"
> [root@virt-557 15:34:55 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:34:57 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is RED)
>   * Node virt-558: online (health is RED)
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Stopped
>   * fence-virt-558	(stonith:fence_xvm):	 Stopped
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Stopped
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:35:21 ~]# attrd_updater --name "#health-healthsmart" --update "yellow" --node "virt-557"
> [root@virt-557 15:35:40 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:35:45 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is YELLOW)
>   * Online: [ virt-558 ]
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Started virt-558
>   * fence-virt-558	(stonith:fence_xvm):	 Started virt-558
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Started virt-558
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:35:45 ~]# attrd_updater --name "#health-healthsmart" --update "yellow" --node "virt-558"
> [root@virt-557 15:35:52 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:35:54 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is YELLOW)
>   * Node virt-558: online (health is YELLOW)
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Stopped
>   * fence-virt-558	(stonith:fence_xvm):	 Stopped
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Stopped
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

> [root@virt-557 15:36:20 ~]# attrd_updater --name "#health-healthsmart" --update "red" --node "virt-557"
> [root@virt-557 15:55:10 ~]# attrd_updater --name "#health-healthsmart" --update "yellow" --node "virt-558"
> [root@virt-557 15:55:21 ~]# pcs status
> Cluster name: STSRHTS20634
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-558 (version 2.1.3-2.el8-da2fd79c89) - partition with quorum
>   * Last updated: Wed Jun 29 15:55:28 2022
>   * Last change:  Wed Jun 29 15:17:05 2022 by root via cibadmin on virt-557
>   * 2 nodes configured
>   * 3 resource instances configured
> 
> Node List:
>   * Node virt-557: online (health is RED)
>   * Node virt-558: online (health is YELLOW)
> 
> Full List of Resources:
>   * fence-virt-557	(stonith:fence_xvm):	 Stopped
>   * fence-virt-558	(stonith:fence_xvm):	 Stopped
>   * resource_dummy	(ocf::pacemaker:Dummy):	 Stopped
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Result: The indicator shows RED if at least one health attribute is red, otherwise YELLOW if at least one health attribute is yellow; otherwise no health status is shown. The cluster behaves as expected.
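
A check like the ones above could be automated by extracting the health indicators from pcs status output. The following is a minimal sketch that assumes the "Node List" line format shown in the transcripts in this comment; the function name degraded_nodes is hypothetical, and the output format may differ in other Pacemaker or pcs versions.

```sh
# degraded_nodes: read "pcs status" output on stdin and print one
# "NODE STATE" pair for each node line that carries a health indicator,
# e.g. "  * Node virt-557: online (health is RED)" becomes "virt-557 RED".
# Lines without an indicator (such as "* Online: [ ... ]") are skipped.
degraded_nodes() {
    sed -n 's/^ *\* Node \([^:]*\): .*(health is \([A-Z]*\)).*$/\1 \2/p'
}
```

Typical usage would be `pcs status | degraded_nodes`; empty output means that no node is reported as degraded.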

Comment 12 errata-xmlrpc 2022-11-08 09:42:25 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7573

