Bug 2003548 - 8.2 -> 8.4 update has the first updated node stuck on "current num_updates is greater than the replacement" messages
Summary: 8.2 -> 8.4 update has the first updated node stuck on "current num_updates is...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: ansible-pacemaker
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z1
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: mathieu bultel
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-13 08:22 UTC by Michele Baldessari
Modified: 2021-12-09 20:41 UTC (History)

Fixed In Version: ansible-pacemaker-1.0.4-2.20210527194421.el8ost.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 20:41:22 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Gerrithub.io            519071          0        None      None    None     2021-09-15 13:24:54 UTC
Red Hat Issue Tracker   OSP-9557        0        None      None    None     2021-11-17 09:54:20 UTC
Red Hat Product Errata  RHBA-2021:5067  0        None      None    None     2021-12-09 20:41:46 UTC

Comment 1 Ken Gaillot 2021-09-13 14:41:36 UTC
What appears to be happening is that some configuration change is made on controller-0 after it is upgraded, so it thinks its configuration is the most current one.

I could swear this has come up before, but I can't find an existing bz for it.

A fix isn't really possible on the pacemaker side -- if the configuration diverges on different nodes, pacemaker has to pick one as the "correct" one, and the only information it has about it is the count of changes that have been made.
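
To illustrate the "count of changes" mentioned above: the CIB carries three version counters (admin_epoch, epoch, num_updates), and they are compared in that order. The following is only a sketch of that ordering, not Pacemaker's actual code; the class name, the helper and the counter values are hypothetical and were picked to mirror the "current num_updates is greater than the replacement" situation.

```python
from typing import NamedTuple


class CibVersion(NamedTuple):
    """The three version counters stored on the <cib> element."""
    admin_epoch: int
    epoch: int
    num_updates: int


def newer(a: CibVersion, b: CibVersion) -> CibVersion:
    """Return the version that wins; fields are compared left to right."""
    return a if a > b else b


# Hypothetical values, chosen only to illustrate the failure mode.
rest_of_cluster = CibVersion(admin_epoch=0, epoch=120, num_updates=3)
updated_node = CibVersion(admin_epoch=0, epoch=120, num_updates=7)

# The updated node changed its configuration while it was out of the
# cluster, so its counters are higher and its copy is treated as current.
winner = newer(rest_of_cluster, updated_node)
print("winning version:", winner)
```

With these made-up numbers the updated node's copy wins, which is why it refuses the replacement offered by the rest of the cluster.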

Workarounds would be either (1) avoid changing the configuration on the updated node until after it rejoins the cluster, or (2) bump the CIB admin epoch on the rest of the cluster after a node is upgraded, so that the rest of the cluster's configuration always wins (no pcs interface currently):

    cibadmin --modify --xml-text '<cib admin_epoch="admin_epoch++"/>'
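
A rough sketch of why workaround (2) helps, assuming the version counters are read from the attributes of the top-level cib element. It does not call cibadmin; bump_admin_epoch is a hypothetical local stand-in for the effect of admin_epoch++, and both XML headers are made-up samples rather than output from this environment.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

FIELDS = ("admin_epoch", "epoch", "num_updates")


def cib_version(cib_xml: str) -> Tuple[int, ...]:
    """Read (admin_epoch, epoch, num_updates) from a <cib> element."""
    root = ET.fromstring(cib_xml)
    return tuple(int(root.get(name, "0")) for name in FIELDS)


def bump_admin_epoch(cib_xml: str) -> str:
    """Hypothetical stand-in for admin_epoch++; edits the XML text only."""
    root = ET.fromstring(cib_xml)
    root.set("admin_epoch", str(int(root.get("admin_epoch", "0")) + 1))
    return ET.tostring(root, encoding="unicode")


# Made-up headers: the updated node has the higher num_updates.
cluster = '<cib admin_epoch="0" epoch="120" num_updates="3"/>'
updated = '<cib admin_epoch="0" epoch="120" num_updates="7"/>'

before = cib_version(cluster) > cib_version(updated)
after = cib_version(bump_admin_epoch(cluster)) > cib_version(updated)
print("cluster wins before bump:", before)
print("cluster wins after bump:", after)
```

Because admin_epoch is compared first, a single increment on the rest of the cluster outranks any number of epoch or num_updates changes made on the detached node.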

Comment 2 Michele Baldessari 2021-09-13 14:43:54 UTC
Thanks Ken, perfect. 

Let me check a couple of things and I'll get back to you. I think I remember our previous discussion on this; I just need to find the right bits.

Leaving needinfo on me

Comment 11 dabarzil 2021-10-27 07:45:44 UTC
Verified.
Before update:
[heat-admin@controller-0 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64
[heat-admin@controller-1 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64
[heat-admin@controller-2 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64

After update:
[heat-admin@controller-0 ~]$ rpm -qa |grep ansible-pacemaker
ansible-pacemaker-1.0.4-2.20210527194421.el8ost.2.noarch
[heat-admin@controller-0 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0

[heat-admin@controller-1 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle: openstack-cinder-backup [cluster.common.tag/rhosp16-openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0	(ocf::heartbeat:podman):	 Started controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0


[heat-admin@controller-2 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle: openstack-cinder-backup [cluster.common.tag/rhosp16-openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0	(ocf::heartbeat:podman):	 Started controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0
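
A check like the one above could also be scripted. The sketch below parses resource lines in the format shown in the listings and flags any resource whose state is not in an expected set. The regular expression, the helper names and the set of "healthy" states are assumptions made for illustration; they are not part of the verification that was actually run.

```python
import re

# Matches lines such as:
#   "    * galera-bundle-0<TAB>(ocf::heartbeat:galera):<TAB> Master controller-0"
RESOURCE_LINE = re.compile(
    r"^\s*\*\s+(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+"
    r"(?P<state>\S+)\s+(?P<node>\S+)\s*$"
)

# Assumption: these are the states treated as healthy in this sketch.
HEALTHY_STATES = {"Started", "Master", "Slave"}


def parse_resources(status_text):
    """Return (name, state, node) for every resource line in the text."""
    rows = []
    for line in status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:
            rows.append((match["name"], match["state"], match["node"]))
    return rows


def unhealthy(rows):
    """Return the rows whose state is not in HEALTHY_STATES."""
    return [row for row in rows if row[1] not in HEALTHY_STATES]


sample = (
    "  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:\n"
    "    * redis-bundle-0\t(ocf::heartbeat:redis):\t Master controller-0\n"
    "    * redis-bundle-1\t(ocf::heartbeat:redis):\t Slave controller-1\n"
    "  * ip-10.0.0.101\t(ocf::heartbeat:IPaddr2):\t Started controller-0\n"
)

rows = parse_resources(sample)
print(rows)
print("unhealthy:", unhealthy(rows))
```

Bundle header lines are skipped because they do not carry an agent in parentheses followed by a state and a node.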

Comment 20 errata-xmlrpc 2021-12-09 20:41:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5067

