Bug 1876173
| Summary: | A resource in a negatively colocated group can remain stopped if it hits its migration threshold | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Reid Wahl <nwahl> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | Steven J. Levine <slevine> |
| Priority: | high | | |
| Version: | 8.2 | CC: | cfeist, cluster-maint, jrehova, kgaillot, msmazova, slevine |
| Target Milestone: | rc | Keywords: | Reopened, Triaged |
| Target Release: | 8.9 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-2.1.6-1.el8 | Doc Type: | Enhancement |
| Doc Text: | Pacemaker's scheduler now tries to satisfy all mandatory colocation constraints before trying to satisfy optional colocation constraints. Previously, colocation constraints were considered one by one regardless of whether they were mandatory or optional. This meant that certain resources could be unable to run even though a node assignment was possible. Pacemaker's scheduler now tries to satisfy all mandatory colocation constraints, including the implicit constraints between group members, before trying to satisfy optional colocation constraints. As a result, resources with a mix of optional and mandatory colocation constraints are now more likely to be able to run. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-11-14 15:32:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 2.1.6 |
| Embargoed: | | | |
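The Doc Text above distinguishes mandatory from optional colocation constraints. As a hedged illustration of the difference (the resource names rscA and rscB are placeholders), a mandatory constraint uses an INFINITY score, while an optional one uses a finite score that the scheduler may trade off against other placement factors:

```
# Mandatory colocation: rscB may only run on the node where rscA runs.
pcs constraint colocation add rscB with rscA score=INFINITY

# Optional colocation: the scheduler prefers to keep dummyb away from dummya,
# but may place them on the same node if nothing better is available.
pcs constraint colocation add dummyb with dummya score=-5000
```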
I just realized that the cluster is prone to this issue even without migration-threshold, if a start operation fails during recovery. I removed the migration-threshold meta attribute and verified.
```
  * Resource Group: dummya:
    * dummya_1 (ocf::heartbeat:Dummy): Started node1
    * dummya_2 (ocf::heartbeat:Dummy): Started node1
  * Resource Group: dummyb:
    * dummyb_1 (ocf::heartbeat:Dummy): Started node2
    * dummyb_2 (ocf::heartbeat:Dummy): Stopped

Failed Resource Actions:
  * dummyb_2_start_0 on node2 'error' (1): call=245, status='complete', exitreason='', last-rc-change='2020-09-05 20:20:12 -07:00', queued=0ms, exec=10ms
```
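For reference, removing the meta attribute as mentioned above can be done with pcs by passing an empty value; a minimal sketch, assuming the resource names from this report:

```
# An empty value removes the migration-threshold meta attribute from dummyb_2.
pcs resource meta dummyb_2 migration-threshold=
```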
I found a configuration hack that seems to work. This approach uses a placeholder dummy resource, in a technique similar to one that Ken proposed in a separate email thread (http://post-office.corp.redhat.com/archives/cluster-list/2020-May/msg00066.html), except this time to achieve negative colocation with particular behavior. As long as the placeholder resource stays online (which it should, barring user error), I think this will work.

```
[root@fastvm-rhel-8-0-23 pacemaker]# pcs config | egrep '(Group|Resource|Meta Attrs):'
 Group: dummya
  Resource: dummya_1 (class=ocf provider=heartbeat type=Dummy)
  Resource: dummya_2 (class=ocf provider=heartbeat type=Dummy)
   Meta Attrs: migration-threshold=1
 Group: dummyb
  Resource: dummyb_1 (class=ocf provider=heartbeat type=Dummy)
  Resource: dummyb_2 (class=ocf provider=heartbeat type=Dummy)
 Resource: placeholder (class=ocf provider=heartbeat type=Dummy)

[root@fastvm-rhel-8-0-23 pacemaker]# pcs constraint colocation
Colocation Constraints:
  placeholder with dummya (score:-INFINITY)
  dummyb with placeholder (score:5000)

[root@fastvm-rhel-8-0-23 pacemaker]# pcs status
...
  * Resource Group: dummya:
    * dummya_1 (ocf::heartbeat:Dummy): Started node1
    * dummya_2 (ocf::heartbeat:Dummy): Started node1
  * Resource Group: dummyb:
    * dummyb_1 (ocf::heartbeat:Dummy): Started node2
    * dummyb_2 (ocf::heartbeat:Dummy): Started node2
  * placeholder (ocf::heartbeat:Dummy): Started node2

[root@fastvm-rhel-8-0-23 pacemaker]# crm_resource --fail --resource dummyb_2 --node node2
#
# then start operation fails during recovery
#

[root@fastvm-rhel-8-0-23 pacemaker]# pcs status
  * Resource Group: dummya:
    * dummya_1 (ocf::heartbeat:Dummy): Started node1
    * dummya_2 (ocf::heartbeat:Dummy): Started node1
  * Resource Group: dummyb:
    * dummyb_1 (ocf::heartbeat:Dummy): Started node1
    * dummyb_2 (ocf::heartbeat:Dummy): Started node1
  * placeholder (ocf::heartbeat:Dummy): Started node2

Failed Resource Actions:
  * dummyb_2_start_0 on node2 'error' (1): call=269, status='complete', exitreason='', last-rc-change='2020-09-05 20:50:06 -07:00', queued=0ms, exec=9ms
```

Placed this workaround in KB 5374451.

(In reply to Reid Wahl from comment #2)
> I found a configuration hack that seems to work.

After talking to the customer about the config hack with a dummy resource, a lot of the counter-intuitive nature of this comes down to the fact that the following two constraints behave differently when a non-base resource in the group reaches its migration threshold:

(a) A constraint of -5000 with ASCS (the original config)
(b) A constraint of 5000 with "not ASCS" (the config that uses a dummy resource as an intermediate)

The group **cannot** fail over with (a) as long as the base resource is **allowed** to run in its current location. The group **can** fail over with (b) as long as the base resource is **allowed** to run in another location (and a non-base resource is **not allowed** to run in the current location after hitting migration-threshold).

And as noted earlier, that might not be changeable within the current constraints scheme.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
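For clarity, the placeholder workaround shown above could be created with commands along these lines. This is a sketch, not a verbatim excerpt from the report, and it assumes the dummya and dummyb groups already exist:

```
# Placeholder resource that effectively stands for "not dummya".
pcs resource create placeholder ocf:heartbeat:Dummy

# Keep the placeholder strictly away from dummya...
pcs constraint colocation add placeholder with dummya score=-INFINITY

# ...and give dummyb a finite preference for wherever the placeholder runs.
pcs constraint colocation add dummyb with placeholder score=5000

# Contrast with the original configuration (a), which colocated dummyb
# directly with dummya using a finite negative score:
#   pcs constraint colocation add dummyb with dummya score=-5000
```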
Fixed in upstream 2.1 branch as of commit 0eae7d53b

added docs

Version of pacemaker:

> [root@virt-016:~]# rpm -q pacemaker
> pacemaker-2.1.6-2.el8.x86_64

Setting of cluster:

> [root@virt-016:~]# pcs status
> Cluster name: STSRHTS25395
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-016 (version 2.1.6-2.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 10 22:41:08 2023 on virt-016
>   * Last change: Mon Jul 10 22:38:16 2023 by root via cibadmin on virt-016
>   * 2 nodes configured
>   * 6 resource instances configured
>
> Node List:
>   * Online: [ virt-016 virt-018 ]
>
> Full List of Resources:
>   * fence-virt-016 (stonith:fence_xvm): Started virt-016
>   * fence-virt-018 (stonith:fence_xvm): Started virt-018
>   * Resource Group: dummya:
>     * dummya_1 (ocf::heartbeat:Dummy): Started virt-016
>     * dummya_2 (ocf::heartbeat:Dummy): Started virt-016
>   * Resource Group: dummyb:
>     * dummyb_1 (ocf::heartbeat:Dummy): Started virt-018
>     * dummyb_2 (ocf::heartbeat:Dummy): Started virt-018
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Resources in the cluster:

> [root@virt-016:~]# crm_resource --list
> Full List of Resources:
>   * fence-virt-016 (stonith:fence_xvm): Started
>   * fence-virt-018 (stonith:fence_xvm): Started
>   * Resource Group: dummya:
>     * dummya_1 (ocf::heartbeat:Dummy): Started
>     * dummya_2 (ocf::heartbeat:Dummy): Started
>   * Resource Group: dummyb:
>     * dummyb_1 (ocf::heartbeat:Dummy): Started
>     * dummyb_2 (ocf::heartbeat:Dummy): Started

Setting meta attribute migration-threshold=1 for resource dummyb_2:

> [root@virt-016:~]# pcs resource create dummyb_2 ocf:heartbeat:Dummy meta migration-threshold=1
> [root@virt-016:~]# pcs cluster cib
> ...
>       </primitive>
>       <primitive class="ocf" id="dummyb_2" provider="heartbeat" type="Dummy">
>         <meta_attributes id="dummyb_2-meta_attributes">
>           <nvpair id="dummyb_2-meta_attributes-migration-threshold" name="migration-threshold" value="1"/>
>         </meta_attributes>
>         <operations>
>           <op id="dummyb_2-migrate_from-interval-0s" interval="0s" name="migrate_from" timeout="20s"/>
>           <op id="dummyb_2-migrate_to-interval-0s" interval="0s" name="migrate_to" timeout="20s"/>
>           <op id="dummyb_2-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
>           <op id="dummyb_2-reload-interval-0s" interval="0s" name="reload" timeout="20s"/>
>           <op id="dummyb_2-start-interval-0s" interval="0s" name="start" timeout="20s"/>
>           <op id="dummyb_2-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
>         </operations>
>       </primitive>
> ...

Setting colocation constraint to -5000:

> [root@virt-016:~]# pcs constraint colocation add dummyb with dummya score=-5000
> [root@virt-016:~]# pcs constraint colocation
> Colocation Constraints:
>   dummyb with dummya (score:-5000)

Failing resource dummyb_2 on node virt-018:

> [root@virt-016:~]# crm_resource --fail --resource dummyb_2 --node virt-018
> Waiting for 1 reply from the controller
> ... got reply (done)

Checking if group dummyb is moved to another node:

> [root@virt-016:~]# pcs status
> Cluster name: STSRHTS25395
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-016 (version 2.1.6-2.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 10 22:42:34 2023 on virt-016
>   * Last change: Mon Jul 10 22:38:16 2023 by root via cibadmin on virt-016
>   * 2 nodes configured
>   * 6 resource instances configured
>
> Node List:
>   * Online: [ virt-016 virt-018 ]
>
> Full List of Resources:
>   * fence-virt-016 (stonith:fence_xvm): Started virt-016
>   * fence-virt-018 (stonith:fence_xvm): Started virt-018
>   * Resource Group: dummya:
>     * dummya_1 (ocf::heartbeat:Dummy): Started virt-016
>     * dummya_2 (ocf::heartbeat:Dummy): Started virt-016
>   * Resource Group: dummyb:
>     * dummyb_1 (ocf::heartbeat:Dummy): Started virt-016
>     * dummyb_2 (ocf::heartbeat:Dummy): Started virt-016
>
> Failed Resource Actions:
>   * dummyb_2_asyncmon_0 on virt-018 'error' (1): call=66, status='complete', exitreason='Simulated failure', last-rc-change='Mon Jul 10 22:42:17 2023', queued=0ms, exec=0ms
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled

Result: Group dummyb was moved from virt-018 to virt-016.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:6970
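As a side note for anyone retesting this scenario: allocation scores like the ones quoted in the original description below can be inspected without disturbing live resources. A minimal sketch using crm_simulate (the temporary file path is arbitrary):

```
# Show the scheduler's allocation scores against the live CIB.
crm_simulate --live-check --show-scores

# Or work from a saved copy of the CIB.
pcs cluster cib > /tmp/cib.xml
crm_simulate --xml-file /tmp/cib.xml --show-scores
```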
Description of problem:

Assume there are two groups with non-INFINITY negative colocation constraints between them, so that they prefer to run on separate nodes but are allowed to run on the same node. Say "dummyb with dummya -5000". Let a resource in the colocated group (dummyb) that is not at the base of the group have a migration threshold of 1. Call that resource dummyb_2. When that resource fails, it remains in the Stopped state. It wants to migrate due to its migration threshold, but it cannot, because the resources closer to the base of the group prefer to remain on the current node.

```
[root@fastvm-rhel-8-0-23 pacemaker]# pcs config | egrep '(Group|Resource|Meta Attrs):'
 Group: dummya
  Resource: dummya_1 (class=ocf provider=heartbeat type=Dummy)
  Resource: dummya_2 (class=ocf provider=heartbeat type=Dummy)
 Group: dummyb
  Resource: dummyb_1 (class=ocf provider=heartbeat type=Dummy)
  Resource: dummyb_2 (class=ocf provider=heartbeat type=Dummy)
   Meta Attrs: migration-threshold=1

[root@fastvm-rhel-8-0-23 pacemaker]# pcs constraint colocation
Colocation Constraints:
  dummyb with dummya (score:-5000)

[root@fastvm-rhel-8-0-23 pacemaker]# pcs status
...
  * Resource Group: dummya:
    * dummya_1 (ocf::heartbeat:Dummy): Started node1
    * dummya_2 (ocf::heartbeat:Dummy): Started node1
  * Resource Group: dummyb:
    * dummyb_1 (ocf::heartbeat:Dummy): Started node2
    * dummyb_2 (ocf::heartbeat:Dummy): Started node2

[root@fastvm-rhel-8-0-23 pacemaker]# crm_resource --fail --resource dummyb_2 --node node2
Waiting for 1 reply from the controller. OK

[root@fastvm-rhel-8-0-23 pacemaker]# pcs status
  * Resource Group: dummya:
    * dummya_1 (ocf::heartbeat:Dummy): Started node1
    * dummya_2 (ocf::heartbeat:Dummy): Started node1
  * Resource Group: dummyb:
    * dummyb_1 (ocf::heartbeat:Dummy): Started node2
    * dummyb_2 (ocf::heartbeat:Dummy): Stopped
```

This behavior is not reproducible if migration-threshold=1 is on dummya_2 and we cause resource dummya_2 to fail, since group dummya is placed first.

This isn't obviously a bug, as the behavior makes sense given the constraints and the migration-threshold:

```
pcmk__native_allocate: dummyb_1 allocation score on node1: -5000
pcmk__native_allocate: dummyb_1 allocation score on node2: 0
pcmk__native_allocate: dummyb_2 allocation score on node1: -INFINITY
pcmk__native_allocate: dummyb_2 allocation score on node2: -INFINITY
```

So, irrespective of the difficulty of doing so, I don't know that we would even want to change Pacemaker's behavior. However, it would be really nice to configure the cluster so that the dummyb group migrates if dummyb_2 hits its migration-threshold, while still respecting the negative colocation constraint in general. Right now, dummyb_1 blocks that from happening. Maybe there's a way to rig the configuration to do that.

-----

Version-Release number of selected component (if applicable):

master, and pacemaker-1.1.21-4.el7

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Create a configuration like the one in the description.
2. Cause a failure of dummyb_2.

-----

Actual results:

dummyb_2 remains stopped, while dummyb_1 continues running on its original node.

-----

Expected results:

The dummyb group migrates to another node.

-----

Additional info:

This is holding up a customer's SAP NetWeaver deployment. We have a consultant working with them on the configuration. A negative colocation constraint between the ASCS group (placed first) and the ERS group (placed second) is a key part of an SAP NetWeaver Pacemaker configuration.
The new factor introduced in this deployment is the migration-threshold attribute.
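For completeness, the reproduction configuration described above could be created roughly as follows. This is a sketch, not a verbatim excerpt from the report; it assumes a two-node cluster with nodes named node1 and node2, as in the original output:

```
# Two dummy groups; the non-base member of dummyb gets migration-threshold=1.
pcs resource create dummya_1 ocf:heartbeat:Dummy --group dummya
pcs resource create dummya_2 ocf:heartbeat:Dummy --group dummya
pcs resource create dummyb_1 ocf:heartbeat:Dummy --group dummyb
pcs resource create dummyb_2 ocf:heartbeat:Dummy meta migration-threshold=1 --group dummyb

# Finite negative colocation: the groups prefer separate nodes but may share one.
pcs constraint colocation add dummyb with dummya score=-5000

# Trigger a failure of the non-base member of the colocated group.
crm_resource --fail --resource dummyb_2 --node node2
```

Before the fix, dummyb_2 then stays Stopped as shown in the description; with pacemaker-2.1.6, the whole dummyb group moves to the other node, as in the QA verification above.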