Bug 1983197
| Summary: | Pacemaker wrongly schedules probes on pending node | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Ken Gaillot <kgaillot> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | low | Docs Contact: | |
| Priority: | high | | |
| Version: | 9.0 | CC: | cherrylegler, cluster-maint, cluster-qe, msmazova, nwahl |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 9.0 Beta | Flags: | pm-rhel: mirror+ |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-2.1.0-6.el9 | Doc Type: | Bug Fix |
| Doc Text: | Cause: If a resource is unmanaged, Pacemaker sets actions for it to optional and skips a check that makes actions on pending nodes unrunnable.<br>Consequence: Probes could wrongly be scheduled on pending nodes, and would time out.<br>Fix: Make actions for pending nodes unrunnable before checking whether a resource is unmanaged.<br>Result: Probes are scheduled after the pending node joins, and complete successfully. | Story Points: | --- |
| Clone Of: | 1982453 | Environment: | |
| Last Closed: | 2021-12-07 21:57:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 2.1.1 |
| Embargoed: | | | |
| Bug Depends On: | 1982453 | | |
| Bug Blocks: | | | |
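The ordering problem described in the Doc Text can be illustrated with a minimal C sketch. This is a hypothetical, simplified model of the scheduling decision: the struct and function names below are invented for illustration and do not match Pacemaker's internal scheduler API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a probe action; names are invented and do not
 * match Pacemaker's internal data structures. */
struct probe_action {
    bool node_pending;   /* target node is joining, not yet a full member */
    bool rsc_unmanaged;  /* resource is unmanaged (e.g. maintenance mode) */
    bool runnable;
    bool optional;
};

/* Buggy ordering: for an unmanaged resource, actions are marked optional
 * and the function returns early, so the check that would make actions on
 * pending nodes unrunnable is skipped -- the probe stays runnable and is
 * wrongly scheduled on the pending node (where it then times out). */
static void schedule_buggy(struct probe_action *a)
{
    a->runnable = true;
    a->optional = false;
    if (a->rsc_unmanaged) {
        a->optional = true;
        return;              /* pending-node check never reached */
    }
    if (a->node_pending) {
        a->runnable = false;
    }
}

/* Fixed ordering: mark actions on pending nodes unrunnable *before*
 * considering whether the resource is unmanaged, so the probe is held
 * until the node finishes joining. */
static void schedule_fixed(struct probe_action *a)
{
    a->runnable = true;
    a->optional = false;
    if (a->node_pending) {
        a->runnable = false; /* probe waits until the node has joined */
    }
    if (a->rsc_unmanaged) {
        a->optional = true;
    }
}
```

With a pending node and an unmanaged resource (the maintenance-mode scenario from the verification steps below), the buggy ordering leaves the probe runnable while the fixed ordering holds it back; the unmanaged flag still makes the action optional in both cases.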
Description
Ken Gaillot 2021-07-16 19:45:34 UTC
Fixed upstream as of master branch commit b0347f7

before fix:
------------
Please see bz1982453#c10

after fix:
------------
> [root@virt-542 16:50:41 ~]# rpm -q pacemaker
> pacemaker-2.1.0-11.el9.x86_64

Setup 3 node cluster:

> [root@virt-542 16:50:46 ~]# pcs status
> Cluster name: STSRHTS15358
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-543 (version 2.1.0-11.el9-7c3f660707) - partition with quorum
> * Last updated: Thu Sep 2 16:50:56 2021
> * Last change: Thu Sep 2 16:44:18 2021 by root via cibadmin on virt-542
> * 3 nodes configured
> * 3 resource instances configured
> Node List:
> * Online: [ virt-542 virt-543 virt-558 ]
> Full List of Resources:
> * fence-virt-542 (stonith:fence_xvm): Started virt-542
> * fence-virt-543 (stonith:fence_xvm): Started virt-543
> * fence-virt-558 (stonith:fence_xvm): Started virt-558
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: active/enabled

Enable maintenance mode:

> [root@virt-542 16:50:56 ~]# pcs property set maintenance-mode=true

Stop the cluster on node "virt-558":

> [root@virt-542 16:51:23 ~]# pcs cluster stop virt-558
> virt-558: Stopping Cluster (pacemaker)...
> virt-558: Stopping Cluster (corosync)...
> [root@virt-542 16:51:44 ~]# pcs status
> Cluster name: STSRHTS15358
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-543 (version 2.1.0-11.el9-7c3f660707) - partition with quorum
> * Last updated: Thu Sep 2 16:51:50 2021
> * Last change: Thu Sep 2 16:51:14 2021 by root via cibadmin on virt-542
> * 3 nodes configured
> * 3 resource instances configured
> *** Resource management is DISABLED ***
> The cluster will not attempt to start, stop or recover services
> Node List:
> * Online: [ virt-542 virt-543 ]
> * OFFLINE: [ virt-558 ]
> Full List of Resources:
> * fence-virt-542 (stonith:fence_xvm): Started virt-542 (unmanaged)
> * fence-virt-543 (stonith:fence_xvm): Started virt-543 (unmanaged)
> * fence-virt-558 (stonith:fence_xvm): Started virt-558 (unmanaged)
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: active/enabled

Start the cluster on node "virt-558" again and add a new resource "dummy1" while node "virt-558" is joining the cluster:

> [root@virt-542 16:51:50 ~]# pcs cluster start virt-558 &>/dev/null & sleep 2.3; pcs resource create dummy1 ocf:pacemaker:Dummy
> [1] 83292
> [1]+ Done pcs cluster start virt-558 &> /dev/null

Maintenance mode is still enabled, and the new resource "dummy1" remains stopped but does not fail:

> [root@virt-542 16:52:14 ~]# pcs status
> Cluster name: STSRHTS15358
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-543 (version 2.1.0-11.el9-7c3f660707) - partition with quorum
> * Last updated: Thu Sep 2 16:52:18 2021
> * Last change: Thu Sep 2 16:52:14 2021 by root via cibadmin on virt-542
> * 3 nodes configured
> * 4 resource instances configured
> *** Resource management is DISABLED ***
> The cluster will not attempt to start, stop or recover services
> Node List:
> * Online: [ virt-542 virt-543 virt-558 ]
> Full List of Resources:
> * fence-virt-542 (stonith:fence_xvm): Started virt-542 (unmanaged)
> * fence-virt-543 (stonith:fence_xvm): Started virt-543 (unmanaged)
> * fence-virt-558 (stonith:fence_xvm): Stopped (unmanaged)
> * dummy1 (ocf:pacemaker:Dummy): Stopped (unmanaged)
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: active/enabled

Both pacemaker.log and /var/log/messages logged the probe request and the notice "Result of probe operation for dummy1 on virt-543: not running", and did not log any "failed probe" warning.

Log excerpts:

> [root@virt-543 ~]# tail -f /var/log/pacemaker/pacemaker.log | grep probe
> Sep 02 16:52:14.493 virt-543.cluster-qe.lab.eng.brq.redhat.com pacemaker-controld [221480] (do_lrm_rsc_op) notice: Requesting local execution of probe operation for dummy1 on virt-543 | transition_key=2:13:7:84a8324c-8d97-4676-b3b7-ce4f65160b8f op_key=dummy1_monitor_0
> Sep 02 16:52:14.528 virt-543.cluster-qe.lab.eng.brq.redhat.com pacemaker-controld [221480] (process_lrm_event) notice: Result of probe operation for dummy1 on virt-543: not running | rc=7 call=25 key=dummy1_monitor_0 confirmed=true cib-update=125

> [root@virt-543 ~]# tail -f /var/log/messages | grep probe
> Sep 2 16:52:14 virt-543 pacemaker-controld[221480]: notice: Requesting local execution of probe operation for dummy1 on virt-543
> Sep 2 16:52:14 virt-543 pacemaker-controld[221480]: notice: Result of probe operation for dummy1 on virt-543: not running

Since the issue could not be reproduced in RHEL 8.4 (for more details see bz1982453#c10), marking verified as SanityOnly in pacemaker-2.1.0-11.el9.