Bug 2003548
| Summary: | 8.2 -> 8.4 update has the first updated node stuck on "current num_updates is greater than the replacement" messages | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Michele Baldessari <michele> | 
| Component: | ansible-pacemaker | Assignee: | mathieu bultel <mbultel> | 
| Status: | CLOSED ERRATA | QA Contact: | Arik Chernetsky <achernet> | 
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 16.2 (Train) | CC: | cluster-maint, dabarzil, lmiccini, mkrcmari, sathlang, spower, tvignaud | 
| Target Milestone: | z1 | Keywords: | Triaged | 
| Target Release: | 16.2 (Train on RHEL 8.4) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | ansible-pacemaker-1.0.4-2.20210527194421.el8ost.2 | Doc Type: | If docs needed, set a value | 
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-12-09 20:41:22 UTC | Type: | Bug | 
Thanks Ken, perfect. Let me check a couple of things and I'll get back to you; I think I remember our previous discussion on this. I just need to find the right bits. Leaving needinfo on me.

Verified.
Before update:
[heat-admin@controller-0 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64
[heat-admin@controller-1 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64
[heat-admin@controller-2 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.3-5.el8_2.4.x86_64
After update:
[heat-admin@controller-0 ~]$ rpm -qa |grep ansible-pacemaker
ansible-pacemaker-1.0.4-2.20210527194421.el8ost.2.noarch
[heat-admin@controller-0 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0
[heat-admin@controller-1 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle: openstack-cinder-backup [cluster.common.tag/rhosp16-openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0	(ocf::heartbeat:podman):	 Started controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0
[heat-admin@controller-2 ~]$ rpm -qa |grep pacemaker-2
pacemaker-2.0.5-9.el8_4.3.x86_64
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Master controller-0
    * galera-bundle-1	(ocf::heartbeat:galera):	 Master controller-1
    * galera-bundle-2	(ocf::heartbeat:galera):	 Master controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Started controller-2
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Master controller-0
    * redis-bundle-1	(ocf::heartbeat:redis):	 Slave controller-1
    * redis-bundle-2	(ocf::heartbeat:redis):	 Slave controller-2
  * ip-192.168.24.19	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.1.109	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.1.114	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * ip-172.17.3.114	(ocf::heartbeat:IPaddr2):	 Started controller-1
  * ip-172.17.4.120	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Started controller-1
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Started controller-2
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
  * ip-172.17.1.132	(ocf::heartbeat:IPaddr2):	 Started controller-0
  * Container bundle: openstack-cinder-backup [cluster.common.tag/rhosp16-openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0	(ocf::heartbeat:podman):	 Started controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:5067
What appears to be happening is that some configuration change is made on controller-0 after it is upgraded, so it thinks its configuration is the most current one. I could swear this has come up before, but I can't find an existing bz for it.

A fix isn't really possible on the pacemaker side: if the configuration diverges on different nodes, pacemaker has to pick one as the "correct" one, and the only information it has to go on is the count of changes that have been made.

Workarounds would be either:
(1) avoid changing the configuration on the updated node until after it rejoins the cluster, or
(2) bump the CIB admin epoch on the rest of the cluster after a node is upgraded, so the cluster's configuration always wins (no pcs interface currently):

cibadmin --modify --xml-text '<cib admin_epoch="admin_epoch++"/>'
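
To illustrate why bumping the admin epoch makes the rest of the cluster win, here is a rough Python sketch of the version comparison. The CIB carries three counters on its top-level element (admin_epoch, epoch, num_updates), compared in that order. The XML snippets and counter values below are made up for illustration and are not taken from this environment, and the helper names (cib_version, is_newer) are hypothetical, not part of pacemaker.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class CibVersion(NamedTuple):
    """CIB version counters, in comparison order (most significant first)."""
    admin_epoch: int
    epoch: int
    num_updates: int


def cib_version(cib_xml: str) -> CibVersion:
    """Read the three version counters from a <cib> element (hypothetical helper)."""
    root = ET.fromstring(cib_xml)
    return CibVersion(
        int(root.get("admin_epoch", "0")),
        int(root.get("epoch", "0")),
        int(root.get("num_updates", "0")),
    )


def is_newer(local_xml: str, replacement_xml: str) -> bool:
    """True if the local CIB would be considered newer than the replacement."""
    # NamedTuple instances compare field by field, like plain tuples.
    return cib_version(local_xml) > cib_version(replacement_xml)


# Made-up values: the updated node changed its configuration while isolated,
# so its num_updates is ahead of the rest of the cluster.
UPDATED_NODE = '<cib admin_epoch="0" epoch="120" num_updates="7"/>'
CLUSTER = '<cib admin_epoch="0" epoch="120" num_updates="3"/>'
# Same cluster CIB after workaround (2), i.e. admin_epoch incremented by one.
CLUSTER_BUMPED = '<cib admin_epoch="1" epoch="120" num_updates="3"/>'

if __name__ == "__main__":
    # The updated node thinks it is newer and refuses the cluster's copy.
    print(is_newer(UPDATED_NODE, CLUSTER))         # True
    # With admin_epoch bumped, the cluster's copy outranks the node's.
    print(is_newer(UPDATED_NODE, CLUSTER_BUMPED))  # False
```

Because admin_epoch is the most significant counter, a single increment on the running cluster outranks any number of epoch or num_updates changes made on the freshly updated node.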