Bug 1835717
Summary: | pacemaker never promotes a bundle until another transition unblocks it | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Michele Baldessari <michele> | |
Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> | |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 8.2 | CC: | cfeist, cluster-maint, kgaillot, lmiccini, msmazova, pkomarov | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | 8.4 | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | pacemaker-2.0.5-1.el8 | Doc Type: | Bug Fix | |
Doc Text: |
Cause: When selecting promotable clone instances for promotion on guest nodes, Pacemaker considered whether the guest node itself could run resources, but not whether the guest resource creating it was runnable.
Consequence: An unrunnable guest could be chosen for promotion, unnecessarily leaving some instances unpromoted until the next natural transition.
Fix: Pacemaker now considers whether a guest node's guest resource is runnable when selecting nodes for promotion.
Result: All instances that can be promoted will be.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1935240 1935241 | Environment: | ||
Last Closed: | 2021-05-18 15:26:41 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1885645 | |||
Bug Blocks: | 1935240, 1935241 |
Description
Michele Baldessari
2020-05-14 11:36:28 UTC
This is definitely a pacemaker scheduler bug.

Fixed upstream as of commit 8c9ee257.

A workaround until the fix is available would be waiting a few seconds after the ban, then changing any configuration value (such as a dummy node attribute) to trigger a new transition. For testing purposes I use a bash alias that does

    attrd_updater -N "$(hostname)" -n trigger-transition -v "$(date)"

(my test nodes are named the same as their hostname; you can use any node name).

Verified:

    (undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'pcs cluster status'|grep version
    [WARNING]: Found both group and host with same name: undercloud
     * Current DC: controller-1 (version 2.0.5-2.el8-31aa4f5515) - partition with quorum
     * Current DC: controller-1 (version 2.0.5-2.el8-31aa4f5515) - partition with quorum
     * Current DC: controller-1 (version 2.0.5-2.el8-31aa4f5515) - partition with quorum

     * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
       * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-2
       * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-0
       * ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-1

    [root@controller-2 ~]# pcs resource ban ovn-dbs-bundle controller-2

     * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
       * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Stopped controller-2
       * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-0
    [..]
     * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
       * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Stopped
       * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Master controller-0
       * ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1782
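
For anyone still on a build without the fix, the workaround described in this report (wait a few seconds after the ban, then update a dummy node attribute to trigger a new transition) could be wrapped in a small shell function. This is only a sketch: the function name trigger_transition and the ATTRD_CMD override are hypothetical and not part of pacemaker; the attrd_updater invocation itself is the one quoted in the report.

```shell
# Hypothetical helper (not shipped with pacemaker): nudge the scheduler by
# updating a dummy node attribute, as in the workaround quoted in this report.
# ATTRD_CMD is an assumed override so the function can be dry-run off-cluster.
trigger_transition() {
    # Default to the local hostname, as in the report; any node name works.
    tt_node="${1:-$(hostname)}"
    "${ATTRD_CMD:-attrd_updater}" -N "$tt_node" -n trigger-transition -v "$(date)"
}
```

A possible use after a ban, where the 5 second pause is an arbitrary stand-in for "a few seconds":

    pcs resource ban ovn-dbs-bundle controller-2
    sleep 5
    trigger_transition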