Bug 2218232

Summary: Cluster does not move resource group when colocation constraint exists for individual group member
Product: Red Hat Enterprise Linux 8
Reporter: Ken Gaillot <kgaillot>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Priority: urgent
Version: 8.9
CC: cluster-maint, msmazova, phagara
Target Milestone: rc
Keywords: Triaged
Target Release: 8.9
Hardware: All
OS: All
Fixed In Version: pacemaker-2.1.6-3.el8
Doc Type: Bug Fix
Doc Text:
Cause: When assigning groups to a node, Pacemaker did not consider constraints that were configured explicitly with a group member instead of the group itself.
Consequence: A group could be assigned to a node where some of its members were unable to run.
Fix: Pacemaker now considers member colocations when assigning groups.
Result: Groups run on the best available node.
Clone Of: 2218218
Last Closed: 2023-11-14 15:32:36 UTC
Type: Bug
Target Upstream Version: 2.1.7
Bug Depends On: 2218218
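As a rough illustration of the Fix described in the Doc Text, the toy bash model below picks a node for a group only if every member's mandatory colocation with a resource outside the group can be satisfied there. It is a sketch, not Pacemaker's assignment code: the arrays `running_on` and `member_with` and the functions `node_ok` and `pick_node` are hypothetical, the resource and node names are borrowed from the verification in Comment 4, and the model treats the outside resources as fixed in place, whereas Pacemaker weighs scores and may instead move the outside resource to the group. It needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Toy model only, not Pacemaker's assignment code. Requires bash 4+.
# Picks a node for a group only where every member's mandatory colocation
# with a resource outside the group can be satisfied.

# Where resources outside the group currently run (hypothetical data).
declare -A running_on=( [blue1]=virt-521 )

# Mandatory colocations configured on individual members:
# member -> outside resource it must run with (hypothetical data).
declare -A member_with=( [blue2]=blue1 )

group_members=(green1 green2 blue2)
nodes=(virt-521 virt-522)

# Succeed if every member's colocation can be honored on node $1.
node_ok() {
    local node=$1 m dep
    for m in "${group_members[@]}"; do
        dep=${member_with[$m]:-}
        if [[ -z $dep ]]; then
            continue    # this member has no colocation with an outside resource
        fi
        if [[ ${running_on[$dep]:-} != "$node" ]]; then
            return 1    # the member's required partner runs elsewhere
        fi
    done
    return 0
}

# Print the first node that satisfies all member colocations.
pick_node() {
    local n
    for n in "${nodes[@]}"; do
        if node_ok "$n"; then
            echo "$n"
            return 0
        fi
    done
    return 1    # no node satisfies every member
}

pick_node    # prints: virt-521
```

Setting `running_on[blue1]=virt-522` makes `pick_node` choose virt-522 instead; if no node satisfies every member, it prints nothing and returns non-zero.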

Description Ken Gaillot 2023-06-28 13:56:08 UTC
+++ This bug was initially created as a clone of Bug #2218218 +++

Description of problem:
If a resource that already has ordering and colocation constraints with other resources is added to a group whose members are running on a different node from that resource's dependencies, the resource fails to start after it is added to the group.

Version-Release number of selected component (if applicable):
pacemaker-2.1.6-2.el9

How reproducible:
always

Steps to Reproduce:
1. Create two colocated and ordered resources, "vip-dep" and "vip":
> pcs resource create vip-dep ocf:pacemaker:Dummy
> pcs resource create vip ocf:pacemaker:Dummy
> pcs constraint order start vip-dep then vip
> pcs constraint colocation add vip with vip-dep score=INFINITY

2. Create a resource group "grp" containing a few arbitrary resources (members of a group have implicit ordering and colocation constraints):
> pcs resource create foo ocf:pacemaker:Dummy --group grp
> pcs resource create bar ocf:pacemaker:Dummy --group grp

3. Because of resource load balancing, the "grp" group now runs on one node, while the "vip" and "vip-dep" resources run on another node.

4. Add "vip" to the "grp" resource group:
> pcs resource group add grp vip

Actual results:
The "vip" resource fails to start after being added to the group, because the group is already running on a different node than "vip-dep", the resource with which "vip" has colocation and ordering constraints.

Expected results:
Pacemaker should stop the "vip-dep" resource and move it to the node where "grp" is already running (or move "grp" to the node where "vip-dep" is running), so that all constraints are satisfied and all resources can start.
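One way to check the expected result on a live cluster is to compare where the two resources are active. This is a minimal sketch that assumes `crm_resource --locate --resource <id>` reports lines containing `is running on: <node>`; the helper names `locate_node` and `same_node` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check whether two resources are active on the same node.
# Assumes crm_resource is in PATH and that "crm_resource --locate" reports
# lines containing "is running on: <node>".

# Print the node(s) where resource $1 is active, one per line.
locate_node() {
    crm_resource --locate --resource "$1" 2>/dev/null |
        sed -n 's/.*is running on: \([^ ]*\).*/\1/p'
}

# Succeed if resources $1 and $2 are active on the same node(s).
same_node() {
    local a b
    a=$(locate_node "$1")
    b=$(locate_node "$2")
    [[ -n $a && $a == "$b" ]]
}

# Usage on a live cluster node:
#   same_node vip vip-dep && echo "colocated" || echo "not colocated"
```

A resource that is not running yields an empty location, so `same_node` fails rather than reporting a false match.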

Additional info:
Upstream patch: https://github.com/ClusterLabs/pacemaker/pull/3141

Comment 4 Markéta Smazová 2023-07-25 12:13:55 UTC
after fix:
----------

>   [root@virt-521 ~]# rpm -q pacemaker
>   pacemaker-2.1.6-4.el8.x86_64

Create two colocated and ordered resources:
>   [root@virt-521 ~]# pcs resource create blue1 ocf:pacemaker:Dummy
>   [root@virt-521 ~]# pcs resource create blue2 ocf:pacemaker:Dummy
>   [root@virt-521 ~]# pcs constraint order start blue1 then blue2
>   Adding blue1 blue2 (kind: Mandatory) (Options: first-action=start then-action=start)
>   [root@virt-521 ~]# pcs constraint colocation add blue2 with blue1 score=INFINITY

Create group with two resources:
>   [root@virt-521 ~]# pcs resource create green1 ocf:pacemaker:Dummy --group green-group
>   [root@virt-521 ~]# pcs resource create green2 ocf:pacemaker:Dummy --group green-group

The colocated and ordered resources "blue1" and "blue2" run on node "virt-521", and the group "green-group" runs on node "virt-522":
>   [root@virt-521 ~]# pcs status
>   Cluster name: STSRHTS29909
>   Cluster Summary:
>     * Stack: corosync (Pacemaker is running)
>     * Current DC: virt-521 (version 2.1.6-4.el8-6fdc9deea29) - partition with quorum
>     * Last updated: Mon Jul 24 16:49:15 2023 on virt-521
>     * Last change:  Mon Jul 24 16:49:05 2023 by root via cibadmin on virt-521
>     * 2 nodes configured
>     * 6 resource instances configured

>   Node List:
>     * Online: [ virt-521 virt-522 ]

>   Full List of Resources:
>     * fence-virt-521	(stonith:fence_xvm):	 Started virt-521
>     * fence-virt-522	(stonith:fence_xvm):	 Started virt-522
>     * blue1	(ocf::pacemaker:Dummy):	 Started virt-521
>     * blue2	(ocf::pacemaker:Dummy):	 Started virt-521
>     * Resource Group: green-group:
>       * green1	(ocf::pacemaker:Dummy):	 Started virt-522
>       * green2	(ocf::pacemaker:Dummy):	 Started virt-522

>   Daemon Status:
>     corosync: active/enabled
>     pacemaker: active/enabled
>     pcsd: active/enabled

>   [root@virt-521 ~]# pcs constraint --full
>   Location Constraints:
>   Ordering Constraints:
>     start blue1 then start blue2 (kind:Mandatory) (id:order-blue1-blue2-mandatory)
>   Colocation Constraints:
>     blue2 with blue1 (score:INFINITY) (id:colocation-blue2-blue1-INFINITY)
>   Ticket Constraints:

Add resource "blue2" to the "green-group":
>   [root@virt-521 ~]# pcs resource group add green-group blue2
>   [root@virt-521 ~]# pcs status
>   Cluster name: STSRHTS29909
>   Cluster Summary:
>     * Stack: corosync (Pacemaker is running)
>     * Current DC: virt-521 (version 2.1.6-4.el8-6fdc9deea29) - partition with quorum
>     * Last updated: Mon Jul 24 16:50:46 2023 on virt-521
>     * Last change:  Mon Jul 24 16:50:20 2023 by root via cibadmin on virt-521
>     * 2 nodes configured
>     * 6 resource instances configured

>   Node List:
>     * Online: [ virt-521 virt-522 ]

>   Full List of Resources:
>     * fence-virt-521	(stonith:fence_xvm):	 Started virt-521
>     * fence-virt-522	(stonith:fence_xvm):	 Started virt-522
>     * blue1	(ocf::pacemaker:Dummy):	 Started virt-521
>     * Resource Group: green-group:
>       * green1	(ocf::pacemaker:Dummy):	 Started virt-521
>       * green2	(ocf::pacemaker:Dummy):	 Started virt-521
>       * blue2	(ocf::pacemaker:Dummy):	 Started virt-521

>   Daemon Status:
>     corosync: active/enabled
>     pacemaker: active/enabled
>     pcsd: active/enabled

RESULT: Resource group "green-group" moved to the node "virt-521" where the resources "blue1" and "blue2" originally started.
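To repeat the same check against saved `pcs status` output, a small awk-based helper can report the node on which each resource is Started. This sketch relies on the resource-line layout shown in the output above (`* <id> (<agent>): Started <node>`), which may differ in other pcs versions; the functions `started_on` and `all_on_one_node` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify from saved "pcs status" text that a set of resources all
# report "Started" on the same node. Assumes resource lines of the form
#   * blue1  (ocf::pacemaker:Dummy):  Started virt-521

# Print the node on which resource $1 is Started, reading status text on stdin.
started_on() {
    awk -v rsc="$1" '
        $1 == "*" && $2 == rsc && $(NF-1) == "Started" { print $NF; exit }
    '
}

# Print the shared node and succeed if every listed resource is Started there.
# Usage: all_on_one_node STATUS_FILE RSC...
all_on_one_node() {
    local file=$1 first="" node r
    shift
    for r in "$@"; do
        node=$(started_on "$r" < "$file")
        if [[ -z $node ]]; then
            return 1    # resource not found or not Started
        fi
        if [[ -z $first ]]; then
            first=$node
        fi
        if [[ $node != "$first" ]]; then
            return 1    # resource Started on a different node
        fi
    done
    echo "$first"
}
```

For example, after `pcs status > status.txt`, running `all_on_one_node status.txt blue1 green1 green2 blue2` against the post-fix output in this comment would print `virt-521`.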

marking VERIFIED in pacemaker-2.1.6-4.el8

Comment 7 errata-xmlrpc 2023-11-14 15:32:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:6970