Bug 2003548

Summary: 8.2 -> 8.4 update has the first updated node stuck on "current num_updates is greater than the replacement" messages
Product: Red Hat OpenStack
Reporter: Michele Baldessari <michele>
Component: ansible-pacemaker
Assignee: mathieu bultel <mbultel>
Status: CLOSED ERRATA
QA Contact: Arik Chernetsky <achernet>
Severity: high
Priority: high
Version: 16.2 (Train)
CC: cluster-maint, dabarzil, lmiccini, mkrcmari, sathlang, spower, tvignaud
Target Milestone: z1
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: x86_64
OS: Linux
Fixed In Version: ansible-pacemaker-1.0.4-2.20210527194421.el8ost.2
Last Closed: 2021-12-09 20:41:22 UTC
Type: Bug

Comment 1 Ken Gaillot 2021-09-13 14:41:36 UTC
What appears to be happening is that some configuration change is made on controller-0 after being upgraded, so it thinks its configuration is the most current one.

I could swear this has come up before, but I can't find an existing bz for it.

A fix isn't really possible on the pacemaker side -- if the configuration diverges on different nodes, pacemaker has to pick one as the "correct" one, and the only information it has about it is the count of changes that have been made.

Workarounds would be either (1) avoid changing the configuration on the updated node until after it rejoins the cluster, or (2) bump the CIB admin epoch on the rest of the cluster after a node is upgraded, so it always wins (no pcs interface currently):

    cibadmin --modify --xml-text '<cib admin_epoch="admin_epoch++"/>'
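
To illustrate why workaround (2) helps, here is a minimal sketch of how two CIB copies can be ranked. It assumes the usual ordering of the version attributes on the <cib> element: admin_epoch first, then epoch, then num_updates. The helper names (CibVersion, parse_cib_version, winner) and the sample numbers are hypothetical and are not taken from pacemaker's code or from this cluster.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class CibVersion(NamedTuple):
    """Version triple read from the <cib> element (illustrative only)."""
    admin_epoch: int
    epoch: int
    num_updates: int


def parse_cib_version(cib_xml: str) -> CibVersion:
    """Extract the version attributes from a CIB XML string.

    Missing attributes are treated as 0; that default is an assumption
    of this sketch.
    """
    root = ET.fromstring(cib_xml)
    return CibVersion(
        int(root.get("admin_epoch", "0")),
        int(root.get("epoch", "0")),
        int(root.get("num_updates", "0")),
    )


def winner(local: CibVersion, peer: CibVersion) -> str:
    """Say which copy ranks higher; tuples compare field by field."""
    if local > peer:
        return "local"
    if peer > local:
        return "peer"
    return "equal"


# Hypothetical values: the upgraded node made extra changes, so it
# outranks the rest of the cluster on num_updates alone.
upgraded_node = parse_cib_version('<cib admin_epoch="0" epoch="120" num_updates="5"/>')
rest_of_cluster = parse_cib_version('<cib admin_epoch="0" epoch="120" num_updates="2"/>')

# After bumping admin_epoch on the rest of the cluster, that copy ranks
# higher regardless of the lower-order counters.
rest_after_bump = rest_of_cluster._replace(admin_epoch=rest_of_cluster.admin_epoch + 1)
```

Because admin_epoch is compared first in this sketch, winner(upgraded_node, rest_after_bump) picks the peer even though the upgraded node has the larger num_updates.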

Comment 2 Michele Baldessari 2021-09-13 14:43:54 UTC
Thanks Ken, perfect. 

Let me check a couple of things and I'll get back to you. I think I remember our previous discussion on this; I just need to find the relevant bits.

Leaving needinfo on me

Comment 11 dabarzil 2021-10-27 07:45:44 UTC
Verified.
Before update:
[heat-admin@controller-0 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64
[heat-admin@controller-1 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64
[heat-admin@controller-2 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64

After update:
[heat-admin@controller-0 ~]$ rpm -qa |grep ansible-pacemaker
ansible-pacemaker-1.0.4-2.20210527194421.el8ost.2.noarch
[heat-admin@controller-0 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0

[heat-admin@controller-1 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle: openstack-cinder-backup [cluster.common.tag/rhosp16-openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0	(ocf::heartbeat:podman):	 Started controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0


[heat-admin@controller-2 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle: openstack-cinder-backup [cluster.common.tag/rhosp16-openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0	(ocf::heartbeat:podman):	 Started controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0

Comment 20 errata-xmlrpc 2021-12-09 20:41:22 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5067