Bug 1943742 - Redundant anti-colocation chains can cause resources to flip-flop
Summary: Redundant anti-colocation chains can cause resources to flip-flop
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.3
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Ken Gaillot
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-27 01:02 UTC by Reid Wahl
Modified: 2023-08-10 15:40 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
Red Hat Knowledge Base (Solution) 5910051 (last updated 2021-03-27 01:16:55 UTC)

Description Reid Wahl 2021-03-27 01:02:38 UTC
Description of problem:

When there are multiple redundant anti-colocation chains, the scheduler may move resources back and forth on each transition. The demonstration below uses a three-node cluster with three anti-colocated resources.

I don't know of any legitimate reason to intentionally configure redundant constraints, but they can arise unintentionally in the course of configuration. For example, a customer encountered the flip-flopping behavior when they had the following constraint sets (in addition to some dummy colocator sets):

    set tstafs10a_colocator tstafs10b_colocator setoptions score=-INFINITY
    set tstafs10a_colocator tstafs10b_colocator tstafs10c_colocator setoptions score=-INFINITY
    set tstafs10a_colocator tstafs10b_colocator tstafs10c_colocator tstafs10d_colocator setoptions score=-INFINITY

These seemingly redundant constraints arose because the customer had an Ansible playbook that created a new group of resources (named by letter, e.g., group D) and then created a set constraint to negatively colocate the new group with the existing groups. They did this three times.
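
For illustration, the playbook's effect was presumably equivalent to running one new pcs set constraint per added group, roughly like the following sketch (group names are taken from the sets above; the actual playbook tasks are not known):

    # pcs constraint colocation set tstafs10a_colocator tstafs10b_colocator setoptions score=-INFINITY
    # pcs constraint colocation set tstafs10a_colocator tstafs10b_colocator tstafs10c_colocator setoptions score=-INFINITY
    # pcs constraint colocation set tstafs10a_colocator tstafs10b_colocator tstafs10c_colocator tstafs10d_colocator setoptions score=-INFINITY

Each new set repeats all of the earlier colocators, so every pair of resources ends up anti-colocated by more than one constraint.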

This is low priority, since the workaround is simply to delete the redundant constraints. I want to record it, however, since one would expect multiple redundant constraints to behave the same as a single constraint.
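
For reference, a sketch of the cleanup: list all constraints with their IDs and delete the extra ones by ID (the placeholder below stands for whatever ID the duplicate actually has on a given cluster):

    # pcs constraint --full
    # pcs constraint delete <duplicate-constraint-id>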

This might get addressed alongside BZ1943476.

There's a minimal demonstration in "Steps to Reproduce".

-----

Version-Release number of selected component (if applicable):

pacemaker-2.0.4-6.el8 / master

-----

How reproducible:

Always

-----

Steps to Reproduce:

1. On a 3-node cluster, create 3 dummy resources.

    # for i in {1..3}; do pcs resource create dummy$i ocf:heartbeat:Dummy; done

2. Anti-colocate them twice. (A sketch of the resulting duplicate CIB entries follows these steps.)

    # pcs constraint colocation set dummy1 dummy2 dummy3 setoptions score=-INFINITY
    # pcs constraint colocation set dummy1 dummy2 dummy3 setoptions score=-INFINITY --force
    Duplicate constraints:
      set dummy1 dummy2 dummy3 (id:colocation_set_d1d2d3_set) setoptions score=-INFINITY (id:colocation_set_d1d2d3)
    Warning: duplicate constraint already exists

3. Run a simulation based on the live CIB and save the resulting CIB.

    # crm_simulate -LS -O /tmp/next.xml
    ...
    Transition Summary:
     * Move       dummy1     ( node1 -> node2 )  
     * Move       dummy3     ( node2 -> node1 ) 

4. Run a simulation based on the resulting CIB.

    # crm_simulate -R -x /tmp/next.xml
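
For reference, after step 2 the constraints section of the CIB should contain two rsc_colocation set constraints over the same resources, roughly as sketched below (the second constraint's IDs are illustrative; the actual IDs can be checked with pcs constraint --full):

    # cibadmin --query --scope constraints
    <constraints>
      <rsc_colocation id="colocation_set_d1d2d3" score="-INFINITY">
        <resource_set id="colocation_set_d1d2d3_set">
          <resource_ref id="dummy1"/>
          <resource_ref id="dummy2"/>
          <resource_ref id="dummy3"/>
        </resource_set>
      </rsc_colocation>
      <rsc_colocation id="colocation_set_d1d2d3-1" score="-INFINITY">
        <resource_set id="colocation_set_d1d2d3-1_set">
          <resource_ref id="dummy1"/>
          <resource_ref id="dummy2"/>
          <resource_ref id="dummy3"/>
        </resource_set>
      </rsc_colocation>
    </constraints>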

-----

Actual results:

dummy1 and dummy3 get moved back to their initial locations.

    # crm_simulate -R -x /tmp/next.xml
    ...
    Transition Summary:
     * Move       dummy1     ( node2 -> node1 )  
     * Move       dummy3     ( node1 -> node2 )  

-----

Expected results:

dummy1 and dummy3 stay put.
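
As a sanity check (a sketch, reusing the commands from the steps above): after deleting the duplicate constraint, repeating the simulation pair should show no further moves in the second run's Transition Summary.

    # crm_simulate -LS -O /tmp/next2.xml
    # crm_simulate -R -x /tmp/next2.xml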

