We have a cluster with 9 full pcmk nodes (no remotes), hosting the usual OSP resources (a bunch of bundles and a few VIPs). The nodes are:

controller-0, controller-1, controller-2
messaging-0, messaging-1, messaging-2
database-0, database-1, database-2

When we trigger a minor update we basically perform the following pseudo-operations (a rough shell sketch of this loop is included at the end of this comment):

update() {
    pcs cluster stop
    <bunch of container updates>
    pcs cluster start
}

for i in 0 1 2; do
    A) update() on controller-$i
    B) update() on messaging-$i
    C) update() on database-$i
done

Note that A, B and C happen in parallel. We do this because we never lose quorum and because it speeds things up considerably (about 2 hours less than updating each node serially). NB: when we do this serially, no problem is observed.

What we observe when we run update() in parallel is that the cluster ends up in an unexpected state. Here is one timeline we reconstructed on one environment [1] (we obtain the same result when fencing is configured, see [2]):

1. database-1, which had just been started and joined the cluster correctly, observed node messaging-1 coming back up:

Jan 15 04:22:20 database-1 pacemaker-based[249477]: notice: Node messaging-1 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node messaging-1 state is now member
Jan 15 04:22:20 database-1 pacemaker-attrd[249480]: notice: Node messaging-1 state is now member
Jan 15 04:22:20 database-1 corosync[249448]: [QUORUM] Members[9]: 1 2 3 4 5 6 7 8 9
Jan 15 04:22:20 database-1 corosync[249448]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 15 04:22:20 database-1 pacemakerd[249473]: notice: Node messaging-1 state is now member
Jan 15 04:22:20 database-1 pacemaker-controld[249482]: notice: Node messaging-1 state is now member

2. The DC, which is messaging-2, tells database-1 to shut itself down.
At this point database-1 stops itself and needs manual intervention:

Jan 15 04:22:20 database-1 pacemaker-attrd[249480]: notice: Detected another attribute writer (messaging-2), starting new election
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node rabbitmq-bundle-1 state is now lost
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node rabbitmq-bundle-2 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node galera-bundle-0 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node redis-bundle-1 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node redis-bundle-2 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node galera-bundle-2 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node galera-bundle-1 state is now lost
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node ovn-dbs-bundle-1 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node redis-bundle-0 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node ovn-dbs-bundle-2 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node ovn-dbs-bundle-0 state is now member
Jan 15 04:22:20 database-1 pacemaker-fenced[249478]: notice: Node rabbitmq-bundle-0 state is now member
Jan 15 04:22:20 database-1 pacemaker-attrd[249480]: notice: Recorded new attribute writer: messaging-1 (was database-1)
Jan 15 04:22:20 database-1 pacemaker-controld[249482]: notice: State transition S_PENDING -> S_NOT_DC
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: Result of probe operation for galera-bundle-0 on database-1: 7 (not running)
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: Result of probe operation for galera-bundle-2 on database-1: 7 (not running)
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: Result of probe operation for rabbitmq-bundle-0 on database-1: 7 (not running)
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: Result of probe operation for rabbitmq-bundle-2 on database-1: 7 (not running)
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: Result of probe operation for redis-bundle-0 on database-1: 7 (not running)
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: error: We didn't ask to be shut down, yet our DC is telling us to.
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: State transition S_NOT_DC -> S_STOPPING
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: Stopped 0 recurring operations at shutdown... waiting (3 remaining)
Jan 15 04:22:22 database-1 pacemaker-controld[249482]: notice: Disconnected from the executor
3. On the DC (messaging-2) we see the following:

Jan 15 04:22:20 messaging-2 pacemaker-controld[14317]: notice: Node messaging-1 state is now member
Jan 15 04:22:20 messaging-2 pacemakerd[14311]: notice: Node messaging-1 state is now member
Jan 15 04:22:20 messaging-2 pacemaker-attrd[14315]: notice: Detected another attribute writer (database-1), starting new election
Jan 15 04:22:20 messaging-2 pacemaker-attrd[14315]: notice: Recorded new attribute writer: messaging-1 (was messaging-2)
Jan 15 04:22:20 messaging-2 pacemaker-attrd[14315]: notice: Recorded local node as attribute writer (was unset)
Jan 15 04:22:22 messaging-2 pacemaker-schedulerd[14316]: notice: Scheduling shutdown of node database-1
Jan 15 04:22:22 messaging-2 pacemaker-schedulerd[14316]: notice: * Shutdown database-1
Jan 15 04:22:22 messaging-2 pacemaker-schedulerd[14316]: notice: * Start galera-bundle-1 ( controller-0 ) due to unrunnable galera-bundle-podman-1 start (blocked)
Jan 15 04:22:22 messaging-2 pacemaker-schedulerd[14316]: notice: * Start galera:1 ( galera-bundle-1 ) due to unrunnable galera-bundle-podman-1 start (blocked)

It is not entirely obvious to us why messaging-2 (the DC) decides to simply shut down database-1 right after messaging-1 has joined the cluster, so this seems unexpected.

Versions:
pacemaker-2.0.2-3.el8_1.2.x86_64
corosync-3.0.2-3.el8_1.1.x86_64

Additional info: we can reproduce this situation fairly reliably, although it takes ~14-15 hours for the job to reproduce, so not exactly super fast ;)
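For reference, here is a rough, hypothetical shell sketch of the parallel loop described above. The ssh wrapper and the update_containers.sh placeholder are only illustrative; the real update() is driven by the OSP update workflow:

  #!/bin/bash
  # Simplified sketch of the per-role parallel update described in this report.
  update() {
      local node=$1
      ssh "$node" sudo pcs cluster stop            # take the node out of the cluster
      ssh "$node" sudo ./update_containers.sh      # placeholder for the container updates
      ssh "$node" sudo pcs cluster start           # rejoin the cluster
  }

  for i in 0 1 2; do
      # One node per role at a time, so quorum (5 of 9) is never lost.
      update "controller-$i" &
      update "messaging-$i"  &
      update "database-$i"   &
      wait    # all three roles finish before moving on to the next index
  done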
You always find the fun cases. :)

Pacemaker coordinates node shutdown via a special transient node attribute named (appropriately enough) "shutdown". If it's nonzero, the DC schedules the node for shutdown. When the node leaves the cluster, it gets cleared. Usually. The DC is the one that clears a shutting-down node's transient node attributes from the CIB.

What happened here is that database-1 left the cluster immediately after the DC (messaging-1 at the time) did, so the rest of the cluster was still in the process of electing a new DC when database-1 left -- meaning there was no one to clear its attributes from the CIB. When it later rejoined, its shutdown attribute was still nonzero, so it got scheduled for shutdown again.

My first idea for a fix would be for every node to trigger the clearing if there is no DC when a node leaves. It's inefficient, but we already do the same thing when the DC itself leaves.

Workarounds in the meantime would be to either serialize updates, or query which node is DC beforehand (via crmadmin -D) and save its updates for last.
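A minimal sketch of that workaround, and of how one might confirm the stale attribute, assuming shell access to a cluster node. The node name database-1 is just an example from this report, and the exact output parsing is an assumption:

  # Find the current DC so that its host's role can be updated last.
  # crmadmin -D prints the DC's node name (e.g. "Designated Controller is: messaging-2");
  # taking the last field is an assumption about the exact output format.
  dc_node=$(sudo crmadmin -D | awk '{print $NF}')
  echo "Current DC: ${dc_node}"

  # Check whether a rejoining node still carries a nonzero transient
  # "shutdown" attribute (the condition described above). Transient
  # attributes are the "reboot" lifetime / status section.
  sudo crm_attribute --node database-1 --name shutdown --lifetime reboot --query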
Fixed upstream by commit 01b463bd
Hi, so I've run a parallel (//) role update of OSP16 using the same version that was usually failing with the previous pacemaker version [1]. The run was successful with respect to the current issue [2][3]. Note that I cannot move this bz to VERIFIED, but it is. I'll expand a bit more and show that the right version of pacemaker was used:

# After the update the cluster is fine
[heat-admin@controller-1 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: messaging-0 (version 2.0.3-5.el8-4b1f869f0f) - partition with quorum
  * Last updated: Thu Feb 27 11:48:35 2020
  * Last change: Thu Feb 27 00:58:41 2020 by redis-bundle-2 via crm_attribute on controller-2
  * 21 nodes configured
  * 57 resource instances configured

Node List:
  * Online: [ controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
  * GuestOnline: [ galera-bundle-0@database-0 galera-bundle-1@database-1 galera-bundle-2@database-2 ovn-dbs-bundle-0@controller-1 ovn-dbs-bundle-1@controller-0 ovn-dbs-bundle-2@controller-2 rabbitmq-bundle-0@messaging-0 rabbitmq-bundle-1@messaging-1 rabbitmq-bundle-2@messaging-2 redis-bundle-0@controller-1 redis-bundle-1@controller-0 redis-bundle-2@controller-2 ]

Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0 (ocf::heartbeat:galera): Master database-0
    * galera-bundle-1 (ocf::heartbeat:galera): Master database-1
    * galera-bundle-2 (ocf::heartbeat:galera): Master database-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started messaging-0
    * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started messaging-1
    * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started messaging-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0 (ocf::heartbeat:redis): Slave controller-1
    * redis-bundle-1 (ocf::heartbeat:redis): Slave controller-0
    * redis-bundle-2 (ocf::heartbeat:redis): Master controller-2
  * ip-192.168.24.33 (ocf::heartbeat:IPaddr2): Started controller-2
  * ip-2620.52.0.13b8.5054.ff.fe3e.1 (ocf::heartbeat:IPaddr2): Started controller-2
  * ip-fd00.fd00.fd00.2000..33c (ocf::heartbeat:IPaddr2): Started controller-2
  * ip-fd00.fd00.fd00.2000..10d (ocf::heartbeat:IPaddr2): Started controller-2
  * ip-fd00.fd00.fd00.3000..da (ocf::heartbeat:IPaddr2): Started controller-2
  * ip-fd00.fd00.fd00.4000..1f9 (ocf::heartbeat:IPaddr2): Started controller-2
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started controller-1
    * haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-0
    * haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-1
    * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-0
    * ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2
  * ip-fd00.fd00.fd00.2000..d4 (ocf::heartbeat:IPaddr2): Started controller-1
  * stonith-fence_ipmilan-5254000e7624 (stonith:fence_ipmilan): Started database-0
  * stonith-fence_ipmilan-5254008ddd24 (stonith:fence_ipmilan): Started database-2
  * stonith-fence_ipmilan-5254008d88d6 (stonith:fence_ipmilan): Started messaging-0
  * stonith-fence_ipmilan-5254006c8596 (stonith:fence_ipmilan): Started messaging-0
  * stonith-fence_ipmilan-525400d32f45 (stonith:fence_ipmilan): Started messaging-1
  * stonith-fence_ipmilan-525400979c13 (stonith:fence_ipmilan): Started database-2
  * stonith-fence_ipmilan-52540017c322 (stonith:fence_ipmilan): Started messaging-2
  * stonith-fence_ipmilan-525400a7bb1c (stonith:fence_ipmilan): Started messaging-1
  * stonith-fence_ipmilan-525400619c69 (stonith:fence_ipmilan): Started database-2
  * Container bundle: openstack-cinder-backup [cluster.common.tag/rhosp16-openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0 (ocf::heartbeat:podman): Started controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started controller-2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

# pacemaker is updated in the container:
[heat-admin@controller-1 ~]$ sudo podman ps | grep hotfix
1b8a25822eb1 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-redis:20200130.1-hotfix /bin/bash /usr/lo... 11 hours ago Up 11 hours ago redis-bundle-podman-0
ba22ff49e131 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-northd:20200130.1-hotfix /bin/bash /usr/lo... 11 hours ago Up 11 hours ago ovn-dbs-bundle-podman-0
aa8f631b5910 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200130.1-hotfix /bin/bash /usr/lo... 11 hours ago Up 11 hours ago openstack-cinder-backup-podman-0
a7665faeb2f8 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-haproxy:20200130.1-hotfix /bin/bash /usr/lo... 11 hours ago Up 11 hours ago haproxy-bundle-podman-0

[heat-admin@controller-1 ~]$ echo and has the right version
and has the right version
[heat-admin@controller-1 ~]$ sudo podman exec -ti haproxy-bundle-podman-0 rpm -qa | grep ^pacemaker
pacemaker-cluster-libs-2.0.3-5.el8.x86_64
pacemaker-schemas-2.0.3-5.el8.noarch
pacemaker-cli-2.0.3-5.el8.x86_64
pacemaker-2.0.3-5.el8.x86_64
pacemaker-libs-2.0.3-5.el8.x86_64
pacemaker-remote-2.0.3-5.el8.x86_64

# on the host too
[heat-admin@controller-1 ~]$ rpm -qa | grep ^pacemaker
pacemaker-2.0.3-5.el8.x86_64
pacemaker-cli-2.0.3-5.el8.x86_64
pacemaker-libs-2.0.3-5.el8.x86_64
pacemaker-remote-2.0.3-5.el8.x86_64
pacemaker-schemas-2.0.3-5.el8.noarch
pacemaker-cluster-libs-2.0.3-5.el8.x86_64

Thanks.

[1] The current version of OSP16 is failing on some other things.
[2] Got a transient error during the last step of VM migration, but it's completely unrelated, in http://staging-jenkins2-qe-playground.usersys.redhat.com/view/update/job/DFG-upgrades-updates-16-from-passed_phase1-composable-ipv6/ job 5.
[3] On the same CI, job 4 passed completely but was using pacemaker-2.0.3-4.el8.1.x86_64 in the container.
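As a side note, a quick hypothetical loop along the same lines can confirm the pacemaker version inside every bundle container on a host, not just haproxy (container naming assumed from the podman output above):

  # Query the pacemaker package inside each pacemaker-managed bundle container.
  for c in $(sudo podman ps --format '{{.Names}}' | grep bundle-podman); do
      echo "== $c"
      sudo podman exec "$c" rpm -q pacemaker
  done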
marking verified in pacemaker-2.0.3-5.el8 as per comment#15
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1609