Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will also be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 1983197

Summary: Pacemaker wrongly schedules probes on pending node
Product: Red Hat Enterprise Linux 9
Reporter: Ken Gaillot <kgaillot>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED CURRENTRELEASE
QA Contact: cluster-qe <cluster-qe>
Severity: low
Docs Contact:
Priority: high
Version: 9.0
CC: cherrylegler, cluster-maint, cluster-qe, msmazova, nwahl
Target Milestone: beta
Keywords: Triaged
Target Release: 9.0 Beta
Flags: pm-rhel: mirror+
Hardware: All
OS: All
Whiteboard:
Fixed In Version: pacemaker-2.1.0-6.el9
Doc Type: Bug Fix
Doc Text:
Cause: If a resource is unmanaged, Pacemaker sets actions for it to optional and skips a check that makes actions on pending nodes unrunnable.
Consequence: Probes could wrongly be scheduled on pending nodes, and would time out.
Fix: Make actions for pending nodes unrunnable before checking whether a resource is unmanaged.
Result: Probes are scheduled after the pending node joins, and complete successfully.
Story Points: ---
Clone Of: 1982453
Environment:
Last Closed: 2021-12-07 21:57:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version: 2.1.1
Embargoed:
Bug Depends On: 1982453
Bug Blocks:

Description Ken Gaillot 2021-07-16 19:45:34 UTC
+++ This bug was initially created as a clone of Bug #1982453 +++

Description of problem: A cluster node can't run resources until it has completed the controller join process; until that time it is considered "pending". However, when a resource is unmanaged, Pacemaker can wrongly schedule probes for it on pending nodes.
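The scheduling mistake described above can be illustrated with a toy model (illustrative Python only; the function and parameter names are invented and do not reflect Pacemaker's actual scheduler code). The unmanaged short-circuit returns before the pending-node check is ever reached, so the probe stays runnable:

```python
# Toy model of the bug (NOT Pacemaker's real code; names are invented).
# A probe on a pending node should be unrunnable, but when the resource
# is unmanaged the early return skips the pending-node check entirely.

def probe_runnable_buggy(resource_managed: bool, node_pending: bool) -> bool:
    if not resource_managed:
        # Unmanaged: the action is treated as optional and the
        # pending-node check below is never evaluated.
        return True
    if node_pending:
        # Pending nodes cannot execute resource actions yet.
        return False
    return True

# Managed resource on a pending node: correctly unrunnable.
assert probe_runnable_buggy(True, True) is False
# Unmanaged resource on a pending node: wrongly runnable -- the bug.
assert probe_runnable_buggy(False, True) is True
```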

Version-Release number of selected component (if applicable): any


How reproducible: Timing sensitive


Steps to Reproduce:
1. Configure and start a cluster of at least three nodes, then stop the cluster on one of the nodes.
2. Enable maintenance mode (pcs property set maintenance-mode=true), which will unmanage all resources.
3. Start the stopped node, and add a new resource to the configuration after the node has joined the Corosync membership but before it has completed the Pacemaker controller join sequence (i.e. while it is "pending").


Actual results: If the DC schedules probes for the new resource while the joining node is still pending, the probes will time out (after the probe timeout plus the value of cluster-delay).
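For a rough sense of the delay before the failure surfaces (example figures only: the probe timeout below is hypothetical, and 60 s is the stock default for the cluster-delay property; real values depend on your configuration):

```python
# Example only: how long a lost probe takes to surface as a timeout.
# 20s is a hypothetical probe timeout; 60s is the stock default of the
# cluster-delay property. Check your own cluster for the real values.
probe_timeout_s = 20
cluster_delay_s = 60
time_to_timeout_s = probe_timeout_s + cluster_delay_s
print(time_to_timeout_s)  # 80
```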


Expected results: The DC does not schedule probes on the new node until it has finished joining, and the probes complete successfully.


Additional info: It might be possible to prolong the "pending" time by nice'ing pacemaker-controld on the joining node, or briefly SIGSTOP'ing it.

--- Additional comment from Reid Wahl on 2021-07-16 01:33:19 UTC ---

A few small clarification questions:


> when a resource is unmanaged, Pacemaker can wrongly schedule probes for it on pending nodes.

Is the issue (probes scheduled before the controller join is complete) reproducible only when a resource is unmanaged or the cluster is in maintenance mode? I want to confirm that the scope is limited to these situations, since probes do run when a node joins a cluster (without maintenance mode or unmanaged resources).

-----

> Configure and start a cluster of at least three nodes
> ...
> add a new resource to the configuration

I presume this is for easier/more reliable reproduction. For the record, the initial report occurred when the second node of a two-node cluster joined, with no new resources.

-----

(from email)
> I don't think there are any serious consequences, but if I'm
> overlooking something, let me know and I can raise it.

All I can think of is: If the node is still in pending state when a stop operation is scheduled in response to the failed probe, then the stop operation will also be lost and time out, causing fencing.

But if so, then that's a corner case within a corner case, the node is clearly having problems, and fencing might not be a bad thing.

--- Additional comment from Ken Gaillot on 2021-07-16 15:14:20 UTC ---

(In reply to Reid Wahl from comment #1)
> A few small clarification questions:
> 
> 
> > when a resource is unmanaged, Pacemaker can wrongly schedule probes for it on pending nodes.
> 
> Is the issue (probes scheduled before the controller join is complete)
> reproducible only when a resource is unmanaged or the cluster is in
> maintenance mode? I want to confirm that the scope is limited to these
> situations, since probes do run when a node joins a cluster (without
> maintenance mode or unmanaged resources).

Correct, it can only occur when the resource is unmanaged (whether directly or indirectly via maintenance mode).


> -----
> 
> > Configure and start a cluster of at least three nodes
> > ...
> > add a new resource to the configuration
> 
> I presume this is for easier/more reliable reproduction. For the record, the
> initial report occurred when the second node of a two-node cluster joined,
> with no new resources.
> 
> -----

That's what I was thinking, since it's timing-sensitive, but I'm not sure it will actually help vs. adding the resource beforehand. If we can prolong the joining period, the issue should occur regardless of whether the resource is new or pre-existing.


> (from email)
> > I don't think there are any serious consequences, but if I'm
> > overlooking something, let me know and I can raise it.
> 
> All I can think of is: If the node is still in pending state when a stop
> operation is scheduled in response to the failed probe, then the stop
> operation will also be lost and time out, causing fencing.
> 
> But if so, then that's a corner case within a corner case, the node is
> clearly having problems, and fencing might not be a bad thing.

Right, the probe timeout takes so long that it would be really unusual for a node to be pending that long. Of course, someone could tune the probe timeout and cluster-delay way down, but that would be highly unusual.

--- Additional comment from Ken Gaillot on 2021-07-16 19:32:24 UTC ---

Fixed upstream as of master branch commit b0347f7

Comment 1 Ken Gaillot 2021-07-16 20:05:33 UTC
Fixed upstream as of master branch commit b0347f7
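Per the Doc Text, the fix is a reordering: actions on pending nodes are made unrunnable before the unmanaged check. A minimal sketch of that ordering (illustrative Python with invented names, not the actual C code in the commit):

```python
# Sketch of the fix described in the Doc Text (illustrative only,
# NOT the actual C code): evaluate the pending-node check *before*
# the unmanaged short-circuit, so the probe is deferred either way.

def probe_runnable_fixed(resource_managed: bool, node_pending: bool) -> bool:
    if node_pending:
        # Checked first now: pending nodes cannot run actions,
        # regardless of whether the resource is managed.
        return False
    if not resource_managed:
        # Unmanaged: the action is optional, but we only get here
        # once the node has finished joining.
        return True
    return True

assert probe_runnable_fixed(False, True) is False  # probe deferred until join
assert probe_runnable_fixed(False, False) is True  # runs after the node joins
```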

Comment 8 Markéta Smazová 2021-09-02 17:02:05 UTC
before fix:
------------

Please see bz1982453#c10


after fix:
------------

>   [root@virt-542 16:50:41 ~]# rpm -q pacemaker
>   pacemaker-2.1.0-11.el9.x86_64


Set up a 3-node cluster:

>   [root@virt-542 16:50:46 ~]# pcs status
>   Cluster name: STSRHTS15358
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-543 (version 2.1.0-11.el9-7c3f660707) - partition with quorum
>     * Last updated: Thu Sep  2 16:50:56 2021
>     * Last change:  Thu Sep  2 16:44:18 2021 by root via cibadmin on virt-542
>     * 3 nodes configured
>     * 3 resource instances configured

>   Node List:
>     * Online: [ virt-542 virt-543 virt-558 ]

>   Full List of Resources:
>     * fence-virt-542	(stonith:fence_xvm):	 Started virt-542
>     * fence-virt-543	(stonith:fence_xvm):	 Started virt-543
>     * fence-virt-558	(stonith:fence_xvm):	 Started virt-558

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled


Enable maintenance mode:

>   [root@virt-542 16:50:56 ~]# pcs property set maintenance-mode=true


Stop the cluster on node "virt-558":

>   [root@virt-542 16:51:23 ~]# pcs cluster stop virt-558
>   virt-558: Stopping Cluster (pacemaker)...
>   virt-558: Stopping Cluster (corosync)...


>   [root@virt-542 16:51:44 ~]# pcs status
>   Cluster name: STSRHTS15358
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-543 (version 2.1.0-11.el9-7c3f660707) - partition with quorum
>     * Last updated: Thu Sep  2 16:51:50 2021
>     * Last change:  Thu Sep  2 16:51:14 2021 by root via cibadmin on virt-542
>     * 3 nodes configured
>     * 3 resource instances configured

>                 *** Resource management is DISABLED ***
>     The cluster will not attempt to start, stop or recover services

>   Node List:
>     * Online: [ virt-542 virt-543 ]
>     * OFFLINE: [ virt-558 ]

>   Full List of Resources:
>     * fence-virt-542	(stonith:fence_xvm):	 Started virt-542 (unmanaged)
>     * fence-virt-543	(stonith:fence_xvm):	 Started virt-543 (unmanaged)
>     * fence-virt-558	(stonith:fence_xvm):	 Started virt-558 (unmanaged)

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled


Start the cluster on node "virt-558" again and add a new resource "dummy1" while "virt-558" is joining the cluster:

>   [root@virt-542 16:51:50 ~]# pcs cluster start virt-558 &>/dev/null & sleep 2.3; pcs resource create dummy1 ocf:pacemaker:Dummy
>   [1] 83292
>   [1]+  Done                    pcs cluster start virt-558 &> /dev/null


Maintenance mode is still enabled; the new resource "dummy1" remains stopped but doesn't fail:

>   [root@virt-542 16:52:14 ~]# pcs status
>   Cluster name: STSRHTS15358
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-543 (version 2.1.0-11.el9-7c3f660707) - partition with quorum
>     * Last updated: Thu Sep  2 16:52:18 2021
>     * Last change:  Thu Sep  2 16:52:14 2021 by root via cibadmin on virt-542
>     * 3 nodes configured
>     * 4 resource instances configured

>                 *** Resource management is DISABLED ***
>     The cluster will not attempt to start, stop or recover services

>   Node List:
>     * Online: [ virt-542 virt-543 virt-558 ]

>   Full List of Resources:
>     * fence-virt-542	(stonith:fence_xvm):	 Started virt-542 (unmanaged)
>     * fence-virt-543	(stonith:fence_xvm):	 Started virt-543 (unmanaged)
>     * fence-virt-558	(stonith:fence_xvm):	 Stopped (unmanaged)
>     * dummy1	(ocf:pacemaker:Dummy):	 Stopped (unmanaged)

>   Daemon Status:
>     corosync: active/disabled
>     pacemaker: active/disabled
>     pcsd: active/enabled


The pacemaker.log and /var/log/messages logged the probe request and the notice "Result of probe operation for dummy1 on virt-543: not running",
and did not log any "failed probe" warning:

Log excerpts:

>   [root@virt-543 ~]# tail -f /var/log/pacemaker/pacemaker.log | grep probe
>   Sep 02 16:52:14.493 virt-543.cluster-qe.lab.eng.brq.redhat.com pacemaker-controld  [221480] (do_lrm_rsc_op) 	notice: Requesting local execution of probe operation for dummy1 on virt-543 | transition_key=2:13:7:84a8324c-8d97-4676-b3b7-ce4f65160b8f op_key=dummy1_monitor_0
>   Sep 02 16:52:14.528 virt-543.cluster-qe.lab.eng.brq.redhat.com pacemaker-controld  [221480] (process_lrm_event) 	notice: Result of probe operation for dummy1 on virt-543: not running | rc=7 call=25 key=dummy1_monitor_0 confirmed=true cib-update=125

>   [root@virt-543 ~]# tail -f /var/log/messages | grep probe
>   Sep  2 16:52:14 virt-543 pacemaker-controld[221480]: notice: Requesting local execution of probe operation for dummy1 on virt-543
>   Sep  2 16:52:14 virt-543 pacemaker-controld[221480]: notice: Result of probe operation for dummy1 on virt-543: not running


Since there was no success in reproducing the issue on RHEL 8.4 (for more details see bz1982453#c10), marking verified as SanityOnly in pacemaker-2.1.0-11.el9.