Description of problem:
Pacemaker has a node health feature that allows certain OCF resource agents to set node health attributes. If a node becomes unhealthy, and the appropriate health settings have been configured, Pacemaker moves all resources away from that node. However, the health agent itself is also moved away, so the cluster never learns when the node becomes healthy again, and the relevant node health attributes must be cleared manually before the node can be used again.

Version-Release number of selected component (if applicable):
All

How reproducible:
Trivially

Steps to Reproduce:
1. Create a cluster with at least two nodes, fencing, and a resource.
2. pcs property set node-health-strategy=migrate-on-red
3. Configure a health monitor using a cloned health agent (e.g. ocf:pacemaker:HealthCPU); see the sketch after this comment.
4. On one node, create the relevant condition to trigger the health monitor (e.g. run an infinite loop in a shell to max out the CPU).

Actual results:
All resources, including the health monitor, are banned from the node, and once the condition ends, the resources do not move back.

Expected results:
All resources except the health monitor are banned from the node; once the condition ends, the health monitor detects it and resources are allowed to move back.
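For reference, a minimal sketch of the reproduction steps using pcs. The resource name health-cpu and the monitor interval are illustrative choices, not taken from this report:

    # Step 2: ban all resources from any node whose health turns red
    pcs property set node-health-strategy=migrate-on-red

    # Step 3: clone a CPU health agent so it runs (and reports) on every node
    pcs resource create health-cpu ocf:pacemaker:HealthCPU \
        op monitor interval=10s clone

    # Step 4, on one node: busy loop to max out the CPU
    while true; do echo -n "test_cpu"; done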
Feature merged upstream as of commit cc8ed479. The fix adds a new allow-unhealthy-nodes resource meta-attribute: when set to true on the health agent, the agent is exempt from the health-based ban and can keep monitoring the node.
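As a usage sketch (the resource name matches the verification below), opting the cloned health agent out of the migration looks like:

    # Let the health agent keep running on a red node so it can report
    # when the node turns green again
    pcs resource update resource_cpu meta allow-unhealthy-nodes=true

With this set, migrate-on-red still bans every other resource from the unhealthy node, but the health agent stays behind and can clear the node health attribute once the condition ends.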
Verified with the following setup:

* 2-node cluster
* dummy fence agent installed on one of the nodes as /usr/sbin/fence_bz1978010: https://github.com/ClusterLabs/fence-agents/blob/master/agents/dummy/fence_dummy.py
* node-health-strategy=migrate-on-red
* cloned health agent ocf:pacemaker:HealthCPU on both nodes

Status of cluster:

> [root@virt-537 ~]# pcs status
> Cluster name: STSRHTS21465
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-538 (version 2.1.4-3.el8-dc6eb4362e) - partition with quorum
>   * Last updated: Mon Jul 11 15:39:23 2022
>   * Last change: Mon Jul 11 15:39:17 2022 by root via cibadmin on virt-537
>   * 2 nodes configured
>   * 5 resource instances configured
>
> Node List:
>   * Online: [ virt-537 virt-538 ]
>
> Full List of Resources:
>   * fence-virt-537 (stonith:fence_xvm): Started virt-537
>   * fence-virt-538 (stonith:fence_xvm): Started virt-538
>   * Clone Set: resource_cpu-clone [resource_cpu]:
>     * Started: [ virt-537 virt-538 ]
>   * resource_dummy (ocf::pacemaker:Dummy): Started virt-537
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Version of pacemaker:

> [root@virt-537 ~]# rpm -q pacemaker
> pacemaker-2.1.4-3.el8.x86_64

Setting the cluster health strategy and updating the resource:

> [root@virt-537 ~]# pcs property set node-health-strategy="migrate-on-red"
> [root@virt-537 ~]# pcs resource update resource_cpu meta allow-unhealthy-nodes=true

Condition (busy loop to max out the CPU on virt-537):

> [root@virt-537 ~]# while true; do echo -n "test_cpu"; done
> cputest_cputest_cputest_cputest_cpu [... repeated output trimmed]

Status of cluster after migration to node virt-538:

> [root@virt-537 ~]# pcs status
> Cluster name: STSRHTS21465
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-538 (version 2.1.4-3.el8-dc6eb4362e) - partition with quorum
>   * Last updated: Mon Jul 11 16:34:19 2022
>   * Last change: Mon Jul 11 16:31:04 2022 by root via cibadmin on virt-537
>   * 2 nodes configured
>   * 5 resource instances configured
>
> Node List:
>   * Node virt-537: online (health is RED)
>   * Online: [ virt-538 ]
>
> Full List of Resources:
>   * fence-virt-537 (stonith:fence_xvm): Started virt-538
>   * fence-virt-538 (stonith:fence_xvm): Started virt-538
>   * Clone Set: resource_cpu-clone [resource_cpu]:
>     * Started: [ virt-538 ]
>     * Stopped: [ virt-537 ]
>   * resource_dummy (ocf::pacemaker:Dummy): Started virt-538
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
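To double-check the recovery path after stopping the busy loop, the transient health attribute can be queried directly. A sketch, assuming the agent reports through the node attribute #health-cpu (the usual default for HealthCPU; verify against your agent version):

    # Query the transient (reboot-lifetime) health attribute on the affected node
    crm_attribute --node virt-537 --name "#health-cpu" --query --lifetime reboot

    # One-shot view of the cluster while resources move back
    crm_mon --one-shot

Once the attribute returns to green, the 'health is RED' marker disappears from the node list and resources are again allowed on virt-537.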
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7573