Bug 1788633 - [OSP16] HA: new container naming scheme "cluster-common-tag" makes deployment fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Damien Ciabrini
QA Contact: pkomarov
URL:
Whiteboard:
Duplicates: 1789063
Depends On:
Blocks:
 
Reported: 2020-01-07 16:38 UTC by Damien Ciabrini
Modified: 2020-02-06 14:44 UTC (History)
9 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200108020432.9a9d0ed.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-06 14:44:12 UTC
Target Upstream Version:
Embargoed:




Links:
  Launchpad 1858648 (status: None) - last updated 2020-01-07 16:39:34 UTC
  OpenStack gerrit 701477 (status: MERGED) - HA: Fix the cluster common tag behaviour with podman - last updated 2021-02-08 16:55:02 UTC
  Red Hat Product Errata RHEA-2020:0283 - last updated 2020-02-06 14:44:31 UTC

Description Damien Ciabrini 2020-01-07 16:38:39 UTC
Description of problem:
Since [1,2], HA containers are configured with a new image naming scheme: the deployed image is retagged under a fixed intermediate name, which makes it possible to change the container image during a minor update without service disruption.
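
The point of the intermediate name is that the pacemaker bundle definition keeps referencing one stable alias, and a minor update only re-points that alias to the newly pulled image. Roughly (a sketch of the mechanism; registry name and tags are placeholders):

  (initial deployment: the alias points at the deployed image)
  # podman tag <registry>/rh-osbs/rhosp16-openstack-rabbitmq:<old-tag> cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest
  (minor update: only the alias moves; the bundle definition in pacemaker keeps referencing the same name)
  # podman tag <registry>/rh-osbs/rhosp16-openstack-rabbitmq:<new-tag> cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest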

When deploying an HA overcloud with podman, special image name/tags are created with the following high-level command:

# podman tag undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:20191213.1 cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest

Unfortunately, unlike docker, podman prepends 'localhost/' in front of the new tag:

# podman images | grep cluster
localhost/cluster-common-tag/rhosp16-openstack-rabbitmq-volume pcmklatest 10bb0d557540 3 weeks ago 596 MB
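
For contrast, the same tag under docker is stored exactly as typed, which is the form the resource agent expects (illustrative, not captured from this environment):

# docker tag undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:20191213.1 cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest
# docker images | grep cluster
cluster-common-tag/rhosp16-openstack-rabbitmq-volume   pcmklatest   10bb0d557540   3 weeks ago   596 MB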

The problem is that pacemaker's podman resource agent uses regular expressions to check whether the image tag 'cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest' exists in the container storage. Because podman stores the image under 'localhost/cluster-common-tag/...', the check never matches, so the agent refuses to start the HA containers and the entire stack deployment fails.

[1] Id369154d147cd5cf0a6f997bf806084fc7580e01
[2] I7a63e8e2d9457c5025f3d70aeed6922e24958049
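
For illustration, the existence check performed by the ocf:heartbeat:podman resource agent behaves roughly like the following (an approximation of its behaviour, not the agent's actual code):

[root@controller-0 ~]# podman images --format '{{.Repository}}:{{.Tag}}' | grep '^cluster-common-tag/rhosp16-openstack-rabbitmq-volume:pcmklatest$'
[root@controller-0 ~]# echo $?
1

Since the image is actually stored as localhost/cluster-common-tag/..., the anchored match never succeeds, the agent reports the image as missing, and the bundle never starts.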


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. deploy a default 3-node HA overcloud

Actual results:
overcloud deployment fails because the HA containers cannot be started

Expected results:
overcloud deployment succeeds

Comment 4 Michele Baldessari 2020-01-08 16:28:22 UTC
*** Bug 1789063 has been marked as a duplicate of this bug. ***

Comment 5 pkomarov 2020-01-08 23:10:28 UTC
Fix verified.

#before the fix:
 [stack@undercloud-0 ~]$ cat core_puddle_version 
RHOS_TRUNK-16.0-RHEL-8-20200107.n.5

[stack@undercloud-0 ~]$ ./rpm_compare openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch
package tested: openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch
package installed : openstack-tripleo-heat-templates-11.3.2-0.20200106152225.bdc5508.el8ost.noarch

PASS, package_git tested version is equal or older than the one installed

#overcloud deployment fails with:

________________________________________stderr________________________________________

fatal: [controller-2]: FAILED! => {"ansible_job_id": "924947306306.30792", "attempts": 58, "changed": true, "cmd": "python3 /var/lib/container-puppet/container-puppet.py", "delta": "0:03:30.804595", "end": "2020-01-08 19:28:39.528661", "finished": 1, "msg": "non-zero return code", "rc": 1, "start": "2020-01-08 19:25:08.724066", "stderr": "", "stderr_lines": [], "stdout": "2020-01-08 19:25:09,143 INFO: 30798 -- Running container-puppet
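
When a bundle fails to start for this reason, the resource agent's complaint is usually visible in the pacemaker logs on the controllers, for example (a sketch of where to look; exact message wording varies):

[root@controller-2 ~]# grep -i 'pcmklatest' /var/log/pacemaker/pacemaker.log | tail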


[root@controller-0 ~]# podman images|grep localhost
localhost/cluster-common-tag/rhosp16-openstack-cinder-volume                           pcmklatest   b559c504d389   31 hours ago   1.25 GB
localhost/cluster-common-tag/rhosp16-openstack-ovn-northd                              pcmklatest   34c7d5d0ded5   31 hours ago   748 MB
localhost/cluster-common-tag/rhosp16-openstack-redis                                   pcmklatest   b055169ab06a   31 hours ago   576 MB
localhost/cluster-common-tag/rhosp16-openstack-haproxy                                 pcmklatest   dd712903a122   32 hours ago   574 MB
localhost/cluster-common-tag/rhosp16-openstack-rabbitmq                                pcmklatest   d462d3466fc2   32 hours ago   618 MB
localhost/cluster-common-tag/rhosp16-openstack-mariadb                                 pcmklatest   93f4b763229e   32 hours ago   789 MB


#apply fix : 
[stack@undercloud-0 ~]$ cd /usr/share/openstack-tripleo-heat-templates
[stack@undercloud-0 openstack-tripleo-heat-templates]$ find /home/stack -name "*f86d99e.patch*"|xargs sudo git apply -v --reject --ignore-space-change --ignore-whitespace
Checking patch deployment/cinder/cinder-backup-pacemaker-puppet.yaml...
[..]
Applied patch releasenotes/notes/pacemaker-cluster-common-tag-podman-f9a71344af5c73d6.yaml cleanly.

#patch check
[stack@undercloud-0 openstack-tripleo-heat-templates]$ grep 'expression: concat("cluster'  deployment/haproxy/haproxy-pacemaker-puppet.yaml
                        expression: concat("cluster.common.tag/", $.data.rightSplit(separator => "/", maxSplits => 1)[1])
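
A rough trace of what this expression produces, assuming $.data holds the fully qualified image reference (illustration only, not actual yaql output):

  $.data                                 -> undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:20191213.1
  $.data.rightSplit(separator => "/", maxSplits => 1)[1]
                                         -> rhosp16-openstack-rabbitmq:20191213.1
  concat("cluster.common.tag/", ...)     -> cluster.common.tag/rhosp16-openstack-rabbitmq:20191213.1

Because the first path component now contains dots, podman treats "cluster.common.tag" as a registry host and stores the name verbatim instead of prepending "localhost/", so the resource agent's existence check can match again.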


#retry deployment and yay :)

Ansible passed.
Overcloud configuration completed.
[..]
Overcloud Deployed

#some checks on controller-0
[root@controller-0 ~]# podman images|grep localhost||echo 'no localhost string in containers'
no localhost string in containers

[root@controller-0 ~]# podman images|grep cluster
cluster.common.tag/rhosp16-openstack-cinder-volume                                     pcmklatest   b559c504d389   32 hours ago   1.25 GB
cluster.common.tag/rhosp16-openstack-ovn-northd                                        pcmklatest   34c7d5d0ded5   32 hours ago   748 MB
cluster.common.tag/rhosp16-openstack-redis                                             pcmklatest   b055169ab06a   33 hours ago   576 MB
cluster.common.tag/rhosp16-openstack-haproxy                                           pcmklatest   dd712903a122   33 hours ago   574 MB
cluster.common.tag/rhosp16-openstack-rabbitmq                                          pcmklatest   d462d3466fc2   33 hours ago   618 MB
cluster.common.tag/rhosp16-openstack-mariadb  


[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Wed Jan  8 23:06:33 2020
Last change: Wed Jan  8 22:54:42 2020 by root via cibadmin on controller-0

15 nodes configured
50 resources configured

Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 ovn-dbs-bundle-0@controller-0 ovn-dbs-bundle-1@controller-1 ovn-dbs-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]

Full list of resources:

 Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]
   galera-bundle-0	(ocf::heartbeat:galera):	Master controller-0
   galera-bundle-1	(ocf::heartbeat:galera):	Master controller-1
   galera-bundle-2	(ocf::heartbeat:galera):	Master controller-2
 Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	Started controller-0
   rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	Started controller-1
   rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	Started controller-2
 Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]
   redis-bundle-0	(ocf::heartbeat:redis):	Master controller-0
   redis-bundle-1	(ocf::heartbeat:redis):	Slave controller-1
   redis-bundle-2	(ocf::heartbeat:redis):	Slave controller-2
 ip-192.168.24.101	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.102	(ocf::heartbeat:IPaddr2):	Started controller-2
 ip-172.17.1.101	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.101	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.4.101	(ocf::heartbeat:IPaddr2):	Started controller-2
 Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]
   haproxy-bundle-podman-0	(ocf::heartbeat:podman):	Started controller-0
   haproxy-bundle-podman-1	(ocf::heartbeat:podman):	Started controller-1
   haproxy-bundle-podman-2	(ocf::heartbeat:podman):	Started controller-2
 Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	Master controller-0
   ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	Slave controller-1
   ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	Slave controller-2
 ip-172.17.1.98	(ocf::heartbeat:IPaddr2):	Started controller-0
 stonith-fence_ipmilan-525400d1e8ad	(stonith:fence_ipmilan):	Started controller-1
 stonith-fence_ipmilan-525400544a70	(stonith:fence_ipmilan):	Started controller-2
 stonith-fence_ipmilan-5254003e688c	(stonith:fence_ipmilan):	Started controller-1
 Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
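
One more way to confirm the bundles reference the new naming is to grep the image attributes out of the CIB (a sketch; output trimmed):

[root@controller-0 ~]# pcs cluster cib | grep -o 'image="[^"]*"' | sort -u
image="cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest"
image="cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest"
image="cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest"
image="cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest"
image="cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest"
image="cluster.common.tag/rhosp16-openstack-redis:pcmklatest"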

Comment 13 errata-xmlrpc 2020-02-06 14:44:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

