Description of problem:

Since [1,2], HA containers are configured with a new image naming scheme that acts as an intermediate tag, which allows the container image name to change during a minor update without service disruption. When deploying an HA overcloud with podman, the special image name/tag is created with the following high-level command:

# podman tag undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:20191213.1 cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest

Unfortunately, unlike docker, podman prepends 'localhost/' to the new tag:

# podman images | grep cluster
localhost/cluster-common-tag/rhosp16-openstack-rabbitmq-volume   pcmklatest   10bb0d557540   3 weeks ago   596 MB

The problem is that the podman resource agent in pacemaker uses a regular expression to check whether the image tag 'cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest' exists in container storage. Because of the 'localhost/' prefix it cannot find the image, so it refuses to start the HA containers and the entire stack deployment fails.

[1] Id369154d147cd5cf0a6f997bf806084fc7580e01
[2] I7a63e8e2d9457c5025f3d70aeed6922e24958049

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy a default 3-node HA overcloud.

Actual results:
Overcloud deployment fails because the HA containers are not started.

Expected results:
Overcloud deployment succeeds.
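The prefixing behavior can be reproduced outside of TripleO: podman prepends 'localhost/' to any image name whose first path component does not look like a registry host, i.e. contains no dot and no port. A minimal sketch, where 'alpine' and the 'demo' names are purely illustrative:

# podman pull docker.io/library/alpine:latest
# podman tag alpine:latest cluster-common-tag/demo:pcmklatest
# podman tag alpine:latest cluster.common.tag/demo:pcmklatest
# podman images | grep demo
localhost/cluster-common-tag/demo   pcmklatest   ...   <- no dot in first component: 'localhost/' prepended
cluster.common.tag/demo             pcmklatest   ...   <- dots make it parse as a registry host: stored verbatim

The failing check on the pacemaker side is, in simplified form, an anchored match against the names podman actually stores. This is a sketch of the kind of test the podman resource agent performs, not its literal code:

image="cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest"
podman images --format '{{.Repository}}:{{.Tag}}' | grep -q "^${image}$" \
  || echo "image ${image} not found in container storage; refusing to start"

With the 'localhost/' prefix in place, the anchored match never succeeds, hence the failed deployment.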
*** Bug 1789063 has been marked as a duplicate of this bug. ***
Fix verified.

# Before the fix:

[stack@undercloud-0 ~]$ cat core_puddle_version
RHOS_TRUNK-16.0-RHEL-8-20200107.n.5
[stack@undercloud-0 ~]$ ./rpm_compare openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch
package tested: openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch
package installed : openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch
PASS, package_git tested version is equal or older than the one installed

# Overcloud deployment fails with:

________________________________________stderr________________________________________
fatal: [controller-2]: FAILED! => {"ansible_job_id": "924947306306.30792", "attempts": 58, "changed": true, "cmd": "python3 /var/lib/container-puppet/container-puppet.py", "delta": "0:03:30.804595", "end": "2020-01-08 19:28:39.528661", "finished": 1, "msg": "non-zero return code", "rc": 1, "start": "2020-01-08 19:25:08.724066", "stderr": "", "stderr_lines": [], "stdout": "2020-01-08 19:25:09,143 INFO: 30798 -- Running container-puppet [..]

[root@controller-0 ~]# podman images|grep localhost
localhost/cluster-common-tag/rhosp16-openstack-cinder-volume   pcmklatest   b559c504d389   31 hours ago   1.25 GB
localhost/cluster-common-tag/rhosp16-openstack-ovn-northd      pcmklatest   34c7d5d0ded5   31 hours ago   748 MB
localhost/cluster-common-tag/rhosp16-openstack-redis           pcmklatest   b055169ab06a   31 hours ago   576 MB
localhost/cluster-common-tag/rhosp16-openstack-haproxy         pcmklatest   dd712903a122   32 hours ago   574 MB
localhost/cluster-common-tag/rhosp16-openstack-rabbitmq        pcmklatest   d462d3466fc2   32 hours ago   618 MB
localhost/cluster-common-tag/rhosp16-openstack-mariadb         pcmklatest   93f4b763229e   32 hours ago   789 MB

# Apply fix:

[stack@undercloud-0 ~]$ cd /usr/share/openstack-tripleo-heat-templates
[stack@undercloud-0 openstack-tripleo-heat-templates]$ find /home/stack -name "*f86d99e.patch*"|xargs sudo git apply -v --reject --ignore-space-change --ignore-whitespace
Checking patch deployment/cinder/cinder-backup-pacemaker-puppet.yaml...
[..]
Applied patch releasenotes/notes/pacemaker-cluster-common-tag-podman-f9a71344af5c73d6.yaml cleanly.

# Patch check:

[stack@undercloud-0 openstack-tripleo-heat-templates]$ grep 'expression: concat("cluster' deployment/haproxy/haproxy-pacemaker-puppet.yaml
expression: concat("cluster.common.tag/", $.data.rightSplit(separator => "/", maxSplits => 1)[1])

# Retry deployment, and yay :)

Ansible passed.
Overcloud configuration completed.
[..]
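For reference, the patched expression keeps only the part of the fully qualified image name after the last '/' and prepends the dotted pseudo-registry 'cluster.common.tag/', whose dots stop podman from adding a 'localhost/' prefix. The same transformation expressed in plain shell, purely as an illustration of what the yaql rightSplit does (the image value is an example):

$ img="undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:pcmklatest"
$ echo "cluster.common.tag/${img##*/}"
cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest

${img##*/} strips everything up to and including the last '/', matching rightSplit(separator => "/", maxSplits => 1)[1].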
Overcloud Deployed

# Some checks on controller-0:

[root@controller-0 ~]# podman images|grep localhost||echo 'no localhost string in containers'
no localhost string in containers
[root@controller-0 ~]# podman images|grep cluster
cluster.common.tag/rhosp16-openstack-cinder-volume   pcmklatest   b559c504d389   32 hours ago   1.25 GB
cluster.common.tag/rhosp16-openstack-ovn-northd      pcmklatest   34c7d5d0ded5   32 hours ago   748 MB
cluster.common.tag/rhosp16-openstack-redis           pcmklatest   b055169ab06a   33 hours ago   576 MB
cluster.common.tag/rhosp16-openstack-haproxy         pcmklatest   dd712903a122   33 hours ago   574 MB
cluster.common.tag/rhosp16-openstack-rabbitmq        pcmklatest   d462d3466fc2   33 hours ago   618 MB
cluster.common.tag/rhosp16-openstack-mariadb

[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Wed Jan 8 23:06:33 2020
Last change: Wed Jan 8 22:54:42 2020 by root via cibadmin on controller-0

15 nodes configured
50 resources configured

Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 ovn-dbs-bundle-0@controller-0 ovn-dbs-bundle-1@controller-1 ovn-dbs-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]

Full list of resources:

 Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]
   galera-bundle-0     (ocf::heartbeat:galera):    Master controller-0
   galera-bundle-1     (ocf::heartbeat:galera):    Master controller-1
   galera-bundle-2     (ocf::heartbeat:galera):    Master controller-2
 Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0   (ocf::heartbeat:rabbitmq-cluster):    Started controller-0
   rabbitmq-bundle-1   (ocf::heartbeat:rabbitmq-cluster):    Started controller-1
   rabbitmq-bundle-2   (ocf::heartbeat:rabbitmq-cluster):    Started controller-2
 Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]
   redis-bundle-0      (ocf::heartbeat:redis):     Master controller-0
   redis-bundle-1      (ocf::heartbeat:redis):     Slave controller-1
   redis-bundle-2      (ocf::heartbeat:redis):     Slave controller-2
 ip-192.168.24.101     (ocf::heartbeat:IPaddr2):   Started controller-0
 ip-10.0.0.101         (ocf::heartbeat:IPaddr2):   Started controller-1
 ip-172.17.1.102       (ocf::heartbeat:IPaddr2):   Started controller-2
 ip-172.17.1.101       (ocf::heartbeat:IPaddr2):   Started controller-0
 ip-172.17.3.101       (ocf::heartbeat:IPaddr2):   Started controller-1
 ip-172.17.4.101       (ocf::heartbeat:IPaddr2):   Started controller-2
 Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]
   haproxy-bundle-podman-0   (ocf::heartbeat:podman):    Started controller-0
   haproxy-bundle-podman-1   (ocf::heartbeat:podman):    Started controller-1
   haproxy-bundle-podman-2   (ocf::heartbeat:podman):    Started controller-2
 Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0    (ocf::ovn:ovndb-servers):   Master controller-0
   ovn-dbs-bundle-1    (ocf::ovn:ovndb-servers):   Slave controller-1
   ovn-dbs-bundle-2    (ocf::ovn:ovndb-servers):   Slave controller-2
 ip-172.17.1.98        (ocf::heartbeat:IPaddr2):   Started controller-0
 stonith-fence_ipmilan-525400d1e8ad   (stonith:fence_ipmilan):    Started controller-1
 stonith-fence_ipmilan-525400544a70   (stonith:fence_ipmilan):    Started controller-2
 stonith-fence_ipmilan-5254003e688c   (stonith:fence_ipmilan):    Started controller-1
 Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-podman-0   (ocf::heartbeat:podman):    Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
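As an extra cross-check (not part of the original verification; a sketch using the standard 'podman image exists' subcommand, which returns 0 when the reference is present in local storage), one can confirm that the exact image reference used by the bundle definitions resolves:

[root@controller-0 ~]# podman image exists cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest && echo OK
OK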
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283