Bug 1943742
| Summary: | Redundant anti-colocation chains can cause resources to flip-flop | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Reid Wahl <nwahl> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | NEW | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 8.3 | CC: | cluster-maint |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Type: | Bug | | |
Description of problem:

When there are multiple redundant anti-colocation chains, the scheduler may
move resources back and forth on each transition. The demonstration below
uses a three-node cluster with three anti-colocated resources.

I don't know of any legitimate reason to intentionally configure redundant
constraints, but they can arise unintentionally in the course of
configuration. For example, a customer encountered the flip-flopping
behavior when they had the following constraint sets (in addition to some
dummy colocator sets):

  set tstafs10a_colocator tstafs10b_colocator setoptions score=-INFINITY
  set tstafs10a_colocator tstafs10b_colocator tstafs10c_colocator setoptions score=-INFINITY
  set tstafs10a_colocator tstafs10b_colocator tstafs10c_colocator tstafs10d_colocator setoptions score=-INFINITY

These seemingly redundant constraints arose because the customer had an
Ansible playbook that created a new group of resources (named by letter,
e.g., group D) and then created a set constraint to negatively colocate the
new group with the existing groups. They did this three times.

This is a low priority, since the solution is to delete the redundant
constraints. I want to record it, however, since one would expect multiple
redundant constraints to behave the same as a single constraint. This might
get addressed alongside BZ1943476.

There's a minimal demonstration in "Steps to Reproduce".

-----

Version-Release number of selected component (if applicable):

pacemaker-2.0.4-6.el8 / master

-----

How reproducible:

Always

-----

Steps to Reproduce:

1. On a 3-node cluster, create 3 dummy resources.

  # for i in {1..3}; do pcs resource create dummy$i ocf:heartbeat:Dummy; done

2. Anti-colocate them twice.

  # pcs constraint colocation set dummy1 dummy2 dummy3 setoptions score=-INFINITY
  # pcs constraint colocation set dummy1 dummy2 dummy3 setoptions score=-INFINITY --force
  Duplicate constraints:
    set dummy1 dummy2 dummy3 (id:colocation_set_d1d2d3_set) setoptions score=-INFINITY (id:colocation_set_d1d2d3)
  Warning: duplicate constraint already exists

3. Run a simulation based on the live CIB and save the resulting CIB.

  # crm_simulate -LS -O /tmp/next.xml
  ...
  Transition Summary:
   * Move    dummy1     ( node1 -> node2 )
   * Move    dummy3     ( node2 -> node1 )

4. Run a simulation based on the resulting CIB.

  # crm_simulate -R -x /tmp/next.xml

-----

Actual results:

dummy1 and dummy3 get moved back to their initial locations.

  # crm_simulate -R -x /tmp/next.xml
  ...
  Transition Summary:
   * Move    dummy1     ( node2 -> node1 )
   * Move    dummy3     ( node1 -> node2 )

-----

Expected results:

dummy1 and dummy3 stay put.
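-----

Additional info:

For reference, here is roughly what the two constraints from step 2 look
like in the CIB's constraints section. This is an illustrative sketch: the
first constraint's IDs are taken from the pcs warning above, while the IDs
of the duplicate created with --force are assumptions (pcs generates them
automatically).

  <constraints>
    <rsc_colocation id="colocation_set_d1d2d3" score="-INFINITY">
      <resource_set id="colocation_set_d1d2d3_set">
        <resource_ref id="dummy1"/>
        <resource_ref id="dummy2"/>
        <resource_ref id="dummy3"/>
      </resource_set>
    </rsc_colocation>
    <!-- The duplicate added with "--force". pcs generates fresh IDs for
         it; the exact suffix shown here is an assumption. -->
    <rsc_colocation id="colocation_set_d1d2d3-1" score="-INFINITY">
      <resource_set id="colocation_set_d1d2d3_set-1">
        <resource_ref id="dummy1"/>
        <resource_ref id="dummy2"/>
        <resource_ref id="dummy3"/>
      </resource_set>
    </rsc_colocation>
  </constraints>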
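As noted in the description, the workaround is to delete the redundant
constraint by ID. A minimal sketch, assuming the duplicate received the ID
shown in the snippet above:

  # pcs constraint --full
  ...
  # pcs constraint remove colocation_set_d1d2d3-1

"pcs constraint --full" lists every constraint along with its ID, which can
then be passed to "pcs constraint remove".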