Bug 1975388

Summary: pacemaker seems to end up in an unfence loop
Product: Red Hat Enterprise Linux 9
Reporter: Ken Gaillot <kgaillot>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED CURRENTRELEASE
QA Contact: cluster-qe <cluster-qe>
Severity: high
Docs Contact: Steven J. Levine <slevine>
Priority: high
Version: 9.0
CC: cluster-maint, dabarzil, msmazova, presharm
Target Milestone: beta
Keywords: Triaged
Target Release: 9.0 Beta
Hardware: All
OS: All
Fixed In Version: pacemaker-2.1.0-5.el9
Doc Type: Bug Fix
Doc Text:
.Pacemaker attribute manager correctly determines remote node attributes, preventing unfencing loops
Previously, Pacemaker's controller on a node might be elected the Designated Controller (DC) before its attribute manager learned that an already-active remote node is remote. When this occurred, the node's scheduler would not see any of the remote node's node attributes. If the cluster used unfencing, this could result in an unfencing loop. With the fix, the attribute manager can now learn that a remote node is remote by means of additional events, including the initial attribute sync at start-up. As a result, no unfencing loop occurs, regardless of which node is elected DC.
Last Closed: 2021-12-07 21:57:54 UTC
Type: Bug

Description Ken Gaillot 2021-06-23 14:53:07 UTC
This bug was initially created as a copy of Bug #1961857 for RHEL 9

--- Description by Michele Baldessari: ---

While testing some new constraints around a resource called ovn-dbs-bundle in this Instance HA environment, I noticed by pure chance that the cluster seems to be stuck in a constant loop of unfencing the two compute nodes.

Namely I keep seeing the following:
May 18 20:13:14 messaging-0.redhat.local pacemaker-controld[141595]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
May 18 20:13:14 messaging-0.redhat.local pacemaker-schedulerd[141594]:  error: Stopping compute-unfence-trigger:0 until compute-0 can be unfenced
May 18 20:13:14 messaging-0.redhat.local pacemaker-schedulerd[141594]:  error: Stopping compute-unfence-trigger:1 until compute-1 can be unfenced
May 18 20:13:14 messaging-0.redhat.local pacemaker-schedulerd[141594]:  notice:  * Fence (on) compute-0 'required by compute-unfence-trigger:0 start'
May 18 20:13:14 messaging-0.redhat.local pacemaker-schedulerd[141594]:  notice:  * Fence (on) compute-1 'required by compute-unfence-trigger:1 start'
May 18 20:13:14 messaging-0.redhat.local pacemaker-schedulerd[141594]:  notice:  * Restart    compute-unfence-trigger:0              (                          compute-0 )   due to required stonith
May 18 20:13:14 messaging-0.redhat.local pacemaker-schedulerd[141594]:  notice:  * Restart    compute-unfence-trigger:1              (                          compute-1 )   due to required stonith
May 18 20:13:14 messaging-0.redhat.local pacemaker-schedulerd[141594]:  error: Calculated transition 71 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-42.bz2
May 18 20:13:14 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Initiating stop operation compute-unfence-trigger_stop_0 on compute-0
May 18 20:13:14 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Initiating stop operation compute-unfence-trigger_stop_0 on compute-1
May 18 20:13:14 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Requesting fencing (on) of node compute-0
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Client pacemaker-controld.141595.362e0079 wants to fence (on) 'compute-0' with device '(any)'
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Requesting peer fencing (on) targeting compute-0
May 18 20:13:14 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Requesting fencing (on) of node compute-1
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Client pacemaker-controld.141595.362e0079 wants to fence (on) 'compute-1' with device '(any)'
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Requesting peer fencing (on) targeting compute-1
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254006197b0 is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540061e462 is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254002f3ee2 is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540099d9d4 is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254004abb25 is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254007f35e5 is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540091b7d5 is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254005fe973 is eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-525400e04d3b is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540033a68d is not eligible to fence (on) compute-0: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254006197b0 is eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540061e462 is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254002f3ee2 is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540099d9d4 is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254004abb25 is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254007f35e5 is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540091b7d5 is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-5254005fe973 is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-525400e04d3b is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: stonith-fence_ipmilan-52540033a68d is not eligible to fence (on) compute-1: static-list
May 18 20:13:14 messaging-0.redhat.local runuser[291621]: pam_unix(runuser:session): session closed for user rabbitmq
May 18 20:13:19 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Requesting that database-1 perform 'on' action targeting compute-1 using 'stonith-fence_ipmilan-5254006197b0'
May 18 20:13:19 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Action 'on' targeting compute-1 using stonith-fence_ipmilan-5254006197b0 on behalf of pacemaker-controld.141595@messaging-0: OK
May 18 20:13:19 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Requesting that database-2 perform 'on' action targeting compute-1 using 'stonith-fence_compute-fence-nova'
May 18 20:13:19 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Requesting that messaging-0 perform 'on' action targeting compute-0 using 'stonith-fence_ipmilan-5254005fe973'
May 18 20:13:19 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Operation 'on' [291676] (call 114 from pacemaker-controld.141595) for host 'compute-0' with device 'stonith-fence_ipmilan-5254005fe973' returned: 0 (OK)
May 18 20:13:19 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Action 'on' targeting compute-0 using stonith-fence_ipmilan-5254005fe973 on behalf of pacemaker-controld.141595@messaging-0: OK
May 18 20:13:19 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Requesting that database-2 perform 'on' action targeting compute-0 using 'stonith-fence_compute-fence-nova'
May 18 20:13:21 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Action 'on' targeting compute-1 using stonith-fence_compute-fence-nova on behalf of pacemaker-controld.141595@messaging-0: OK
May 18 20:13:21 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Operation 'on' targeting compute-1 on database-2 for pacemaker-controld.141595: OK
May 18 20:13:21 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Stonith operation 115/85:71:0:e660d42a-7190-450b-a5f7-d5491a657fd1: OK (0)
May 18 20:13:21 messaging-0.redhat.local pacemaker-controld[141595]:  notice: compute-1 was successfully unfenced by database-2 (at the request of messaging-0)
May 18 20:13:21 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Initiating start operation compute-unfence-trigger_start_0 on compute-1
May 18 20:13:21 messaging-0.redhat.local pacemaker-attrd[141593]:  notice: Setting #node-unfenced[compute-1]: 1621368733 -> 1621368801
May 18 20:13:21 messaging-0.redhat.local pacemaker-attrd[141593]:  notice: Cannot update #node-unfenced[compute-0]=1621368731 because peer UUID not known (will retry if learned)
May 18 20:13:21 messaging-0.redhat.local pacemaker-attrd[141593]:  notice: Cannot update #node-unfenced[compute-1]=1621368801 because peer UUID not known (will retry if learned)
May 18 20:13:21 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Initiating monitor operation compute-unfence-trigger_monitor_10000 on compute-1
May 18 20:13:23 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Action 'on' targeting compute-0 using stonith-fence_compute-fence-nova on behalf of pacemaker-controld.141595@messaging-0: OK
May 18 20:13:23 messaging-0.redhat.local pacemaker-fenced[141591]:  notice: Operation 'on' targeting compute-0 on database-2 for pacemaker-controld.141595: OK
May 18 20:13:23 messaging-0.redhat.local pacemaker-controld[141595]:  notice: Stonith operation 114/86:71:0:e660d42a-7190-450b-a5f7-d5491a657fd1: OK (0)
May 18 20:13:23 messaging-0.redhat.local pacemaker-controld[141595]:  notice: compute-0 was successfully unfenced by database-2 (at the request of messaging-0)


I suspect the root cause is related to these two messages:
May 18 20:13:21 messaging-0.redhat.local pacemaker-attrd[141593]:  notice: Cannot update #node-unfenced[compute-0]=1621368731 because peer UUID not known (will retry if learned)
May 18 20:13:21 messaging-0.redhat.local pacemaker-attrd[141593]:  notice: Cannot update #node-unfenced[compute-1]=1621368801 because peer UUID not known (will retry if learned)
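
For reference, whether the attribute manager has these values for the remote nodes, and whether they ever made it into the CIB, can be checked with something like the following (a sketch; option spellings are from memory and may vary by Pacemaker version):

    # Ask the attribute manager directly for the unfencing attribute on compute-0
    attrd_updater --query --name "#node-unfenced" --node compute-0
    # Check whether the value is present in the CIB's status section
    cibadmin --query --scope status | grep node-unfenced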


Version-Release number of selected component (if applicable):
pacemaker-2.0.5-9.el8.x86_64

How reproducible:
Not sure yet. Still investigating. Full sosreports + /var/{lib,log}/pacemaker from the DC messaging-0 at http://file.rdu.redhat.com/~mbaldess/pcmk-unfence-loop/

I'll also try to reproduce it in the next few days.

--- Comment by Ken Gaillot: ---

The combination of restarting all the cluster nodes one by one, and running cibadmin immediately after they are restarted, triggers a CIB conflict.

Pacemaker has an algorithm to keep the CIB synchronized across all cluster nodes. For example, if a cluster is partitioned and the CIB is modified differently in each partition, then when the partition heals, Pacemaker has to pick one of the new CIBs to keep and discard the other. The algorithm gives preference to changes from cibadmin, since they are made by the user.
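
The bookkeeping that drives this decision is carried in the attributes of the <cib> element itself (admin_epoch/epoch/num_updates, plus update-origin and update-client recording where the last change came from). As a rough sketch, the CIB header each node is holding can be compared with:

    # Dump the CIB header on each node to compare versions and the origin of the last change
    cibadmin --query | head -n 1
    # Or pull out just the interesting fields
    cibadmin --query | grep -o -E '(admin_epoch|epoch|num_updates|update-origin|update-client)="[^"]*"'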

You're using cibadmin to remove the tripleo lock immediately after the node starts the cluster, which creates a timing issue. The cibadmin change can be made effective locally before the node has rejoined the cluster's pacemaker-controld layer. When the node does rejoin, its CIB now looks "better" (because it came from cibadmin) than the cluster's current CIB and overwrites it, losing some information in the process, including transient node attributes.

Normally that's not too much of a problem, because the current attribute writer (a role similar to the DC, and usually held by the same node) will write all transient attributes back to the CIB. However, because every cluster node has been restarted at this point while the remote nodes have not, the attribute writer doesn't know that the remote nodes are Pacemaker Remote nodes, so it can't write out their attributes. That includes the unfencing information, so the DC's scheduler now thinks unfencing needs to be redone; and since the attributes can never be written, the cluster keeps looping.
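
You can see this in the CIB itself: remote nodes get a node_state entry in the status section (marked with remote_node="true", if I remember the schema right), and their transient attributes, including #node-unfenced, hang off that entry. Something like the following (a sketch) shows whether those entries and attributes are actually there:

    # Remote node entries and any surviving unfencing attributes in the status section
    cibadmin --query --scope status | grep -E 'node_state|node-unfenced'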

I have a fix ready for testing, but because this is a timing issue, a reliable reproducer might be difficult. I'm thinking it should be possible to bring the last cluster node back up with its corosync ports blocked, run cibadmin manually on it, then unblock corosync, but I haven't tried that yet.
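
Roughly (a sketch, assuming the default corosync port 5405/udp; adjust to the actual totem configuration and to whatever CIB element the tripleo lock actually is):

    # On the last cluster node, block corosync before starting the cluster
    iptables -A INPUT -p udp --dport 5405 -j DROP
    iptables -A OUTPUT -p udp --dport 5405 -j DROP
    pcs cluster start                      # node starts but can't rejoin its peers
    cibadmin --delete --xml-text '<...>'   # placeholder for the tripleo lock removal
    # Unblock corosync and let the node rejoin with its locally modified CIB
    iptables -D INPUT -p udp --dport 5405 -j DROP
    iptables -D OUTPUT -p udp --dport 5405 -j DROP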

A workaround (and better solution, really) would be to wait until the node has fully rejoined the cluster before running cibadmin.
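
For example (an illustrative sketch, not the exact tooling the deployment uses), the step that removes the lock could first poll for membership:

    # Wait until this node shows as online in the cluster before touching the CIB
    while ! crm_mon -1 | grep -qE "Online:.*$(crm_node -n)"; do
        sleep 5
    done
    # ...then run the cibadmin command that removes the tripleo lock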

Comment 6 Ken Gaillot 2021-09-10 20:09:40 UTC
Since this was originally a RHOSP-related bz, RHOSP has verified the corresponding 8.4.z Bug 1972273, and this can get sanity-only testing.