Description of problem:

Updating from GA to RHOS_TRUNK-15.0-RHEL-8-20191017.n.0, where the default namespace and prefix change, does not work. In particular, none of the HA services are updated, and each breaks in a different way depending on how the pcmklatest mechanism is implemented for that service (ovn, for instance, behaves differently from mariadb).

Note that the update does not "fail" during the controller update even though the control plane is broken; it fails during the compute update:

Error running ['podman', 'run', '--name', 'nova_wait_for_compute_service', '--label', 'config_id=tripleo_step4', '--label', 'container_name=nova_wait_for_compute_service', '--label', 'managed_by=paunch', '--label', 'config_data={"command": "/container-config-scripts/pyshim.sh /container-config-scripts/nova_wait_for_compute_service.py" ... [truncated]

2019-10-23 18:20:44 |   File "/container-config-scripts/nova_wait_for_compute_service.py", line 102, in <module>
2019-10-23 18:20:44 |     service_list = nova.services.list(binary='nova-compute')
2019-10-23 18:20:44 |   File "/usr/lib/python3.6/site-packages/novaclient/v2/services.py", line 52, in list
2019-10-23 18:20:44 |     return self._list(url, "services")
2019-10-23 18:20:44 |   File "/usr/lib/python3.6/site-packages/novaclient/base.py", line 254, in _list
2019-10-23 18:20:44 |     resp, body = self.api.client.get(url)
2019-10-23 18:20:44 |   File "/usr/lib/python3.6/site-packages/novaclient/client.py", line 72, in request
2019-10-23 18:20:44 |   File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 208, in get_auth_ref
2019-10-23 18:20:44 |     return self._plugin.get_auth_ref(session, **kwargs)
2019-10-23 18:20:44 | keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://172.17.1.108:5000/v3/auth/tokens: HTTPConnectionPool(host='172.17.1.108', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc61d06fc88>: Failed to establish a new connection: [Errno 113] No route to host',))
2019-10-23 18:20:44 | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fc61d09ac18>: Failed to establish a new connection: [Errno 113] No route to host
2019-10-23 18:20:44 | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='172.17.1.108', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc61d09ac18>: Failed to establish a new connection: [Errno 113] No route to host',))

This is because the OVN DBs are down.
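For context, a quick way to confirm the control-plane side of this failure (a sketch only; the VIP and port come from the traceback above, the prompts/hostnames are illustrative):

[heat-admin@compute-0 ~]$ # the keystone endpoint the compute step tries to reach
[heat-admin@compute-0 ~]$ curl -m 5 http://172.17.1.108:5000/v3 || echo "keystone endpoint unreachable"
[heat-admin@controller-0 ~]$ # on a controller, check the state of the VIP itself in pacemaker
[heat-admin@controller-0 ~]$ sudo pcs status | grep 172.17.1.108

While ip-172.17.1.108 is Stopped (see the pcs status below), the curl should fail with the same "no route to host" that novaclient reports.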
On the controller this is the status of the cluster:

Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 2.0.1-4.el8_0.4-0eb7991564) - partition with quorum
Last updated: Thu Oct 24 06:49:35 2019
Last change: Wed Oct 23 21:49:07 2019 by root via crm_resource on controller-2

15 nodes configured
46 resources configured

Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-1 galera-bundle-1@controller-2 galera-bundle-2@controller-0 rabbitmq-bundle-0@controller-1 rabbitmq-bundle-1@controller-0 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-1 redis-bundle-1@controller-0 redis-bundle-2@controller-2 ]

Full list of resources:

 podman container set: galera-bundle [192.168.24.1:8787/rhosp15/openstack-mariadb:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Master controller-1
   galera-bundle-1 (ocf::heartbeat:galera): Master controller-2
   galera-bundle-2 (ocf::heartbeat:galera): Master controller-0
 podman container set: rabbitmq-bundle [192.168.24.1:8787/rhosp15/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
 podman container set: redis-bundle [192.168.24.1:8787/rhosp15/openstack-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Slave controller-1
   redis-bundle-1 (ocf::heartbeat:redis): Master controller-0
   redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2
 ip-192.168.24.15 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-10.0.0.110 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-172.17.1.72 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-172.17.1.108 (ocf::heartbeat:IPaddr2): Stopped
 ip-172.17.3.110 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-172.17.4.102 (ocf::heartbeat:IPaddr2): Started controller-0
 podman container set: haproxy-bundle [192.168.24.1:8787/rhosp15/openstack-haproxy:pcmklatest]
   haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started controller-2
   haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-0
   haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started controller-1
 podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Stopped
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Stopped
 podman container: openstack-cinder-volume [192.168.24.1:8787/rhosp15/openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started controller-2

Failed Resource Actions:
* ovn-dbs-bundle-podman-0_start_0 on controller-2 'unknown error' (1): call=100, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:31:29 2019', queued=0ms, exec=616ms
* ovn-dbs-bundle-podman-1_start_0 on controller-2 'unknown error' (1): call=112, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:31:32 2019', queued=0ms, exec=290ms
* ovn-dbs-bundle-podman-2_start_0 on controller-2 'unknown error' (1): call=115, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:31:35 2019', queued=0ms, exec=263ms
* openstack-cinder-volume-podman-0_start_0 on controller-0 'unknown error' (1): call=156, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-cinder-volume:pcmklatest', last-rc-change='Wed Oct 23 21:28:32 2019', queued=0ms, exec=292ms
* ovn-dbs-bundle-podman-1_start_0 on controller-0 'unknown error' (1): call=142, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 20:54:11 2019', queued=0ms, exec=303ms
* ovn-dbs-bundle-podman-2_start_0 on controller-0 'unknown error' (1): call=158, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:28:37 2019', queued=0ms, exec=283ms
* ovn-dbs-bundle-podman-0_start_0 on controller-0 'unknown error' (1): call=152, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:06:29 2019', queued=0ms, exec=347ms
* openstack-cinder-volume-podman-0_start_0 on controller-1 'unknown error' (1): call=132, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-cinder-volume:pcmklatest', last-rc-change='Wed Oct 23 21:28:30 2019', queued=0ms, exec=294ms
* ovn-dbs-bundle-podman-2_start_0 on controller-1 'unknown error' (1): call=134, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:28:34 2019', queued=0ms, exec=276ms
* ovn-dbs-bundle-podman-1_start_0 on controller-1 'unknown error' (1): call=111, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:09:25 2019', queued=0ms, exec=287ms
* ovn-dbs-bundle-podman-0_start_0 on controller-1 'unknown error' (1): call=101, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest', last-rc-change='Wed Oct 23 21:09:22 2019', queued=0ms, exec=354ms
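The failed actions all point at the old-style image names. A quick way to cross-check which image each bundle still references (a sketch; "pcs resource config" is the pcs 0.10 spelling shipped on RHEL 8, older releases use "pcs resource show"):

[heat-admin@controller-0 ~]$ # both bundles should still reference the old rhosp15/openstack-*:pcmklatest names
[heat-admin@controller-0 ~]$ sudo pcs resource config ovn-dbs-bundle | grep image=
[heat-admin@controller-0 ~]$ sudo pcs resource config openstack-cinder-volume | grep image=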
And here are the images related to the cluster:

[heat-admin@controller-0 ~]$ sudo podman images | grep pcmklatest
192.168.24.1:8787/rh-osbs/rhosp15-openstack-cinder-volume   pcmklatest   74e6fe302909   39 hours ago   1.22 GB
192.168.24.1:8787/rh-osbs/rhosp15-openstack-ovn-northd      pcmklatest   d0e090f75aa9   39 hours ago   720 MB
192.168.24.1:8787/rh-osbs/rhosp15-openstack-redis           pcmklatest   2d20cc6fa3aa   39 hours ago   550 MB
192.168.24.1:8787/rh-osbs/rhosp15-openstack-rabbitmq        pcmklatest   66e9ddfe41bf   39 hours ago   590 MB
192.168.24.1:8787/rh-osbs/rhosp15-openstack-haproxy         pcmklatest   ac940eaa469d   39 hours ago   548 MB
192.168.24.1:8787/rh-osbs/rhosp15-openstack-mariadb         pcmklatest   40210064f9e0   39 hours ago   763 MB
192.168.24.1:8787/rhosp15/openstack-redis                   pcmklatest   cb55f02698e9   5 weeks ago    502 MB
192.168.24.1:8787/rhosp15/openstack-haproxy                 pcmklatest   c5826c9e9bed   5 weeks ago    500 MB
192.168.24.1:8787/rhosp15/openstack-rabbitmq                pcmklatest   df24602a69cc   5 weeks ago    543 MB
192.168.24.1:8787/rhosp15/openstack-mariadb                 pcmklatest   5a9441eaa9e4   5 weeks ago    706 MB
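Note that in this listing the old-name pcmklatest tags survive only for redis, haproxy, rabbitmq and mariadb; rhosp15/openstack-ovn-northd and rhosp15/openstack-cinder-volume are gone, which matches exactly the two bundles that fail to pull. Purely as a stop-gap sketch (not verified, hostnames illustrative): assuming the bundle definitions keep referencing the old names, one could retag the freshly pulled images under those names on every controller and then clear the failed actions:

[heat-admin@controller-0 ~]$ # recreate the old-name pcmklatest tags from the new-name images
[heat-admin@controller-0 ~]$ sudo podman tag 192.168.24.1:8787/rh-osbs/rhosp15-openstack-ovn-northd:pcmklatest 192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest
[heat-admin@controller-0 ~]$ sudo podman tag 192.168.24.1:8787/rh-osbs/rhosp15-openstack-cinder-volume:pcmklatest 192.168.24.1:8787/rhosp15/openstack-cinder-volume:pcmklatest
[heat-admin@controller-0 ~]$ # let pacemaker retry the start operations
[heat-admin@controller-0 ~]$ sudo pcs resource cleanup ovn-dbs-bundle
[heat-admin@controller-0 ~]$ sudo pcs resource cleanup openstack-cinder-volume

This only helps if pacemaker is satisfied with the locally tagged image instead of trying to pull it again from 192.168.24.1:8787, and it obviously does not fix the underlying name-change problem.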
Quick analysis:

When updating from GA to RHOS_TRUNK-15.0-RHEL-8-20191017.n.0, both the image prefix and the path part of the namespace change.

For 1017:
---
container-image-prepare:
  namespace: registry-proxy.engineering.redhat.com/rh-osbs
  prefix: rhosp15-openstack-
  tag: 20191014.2
puddle:
  rhosp: 15.0
  id: RHOS_TRUNK-15.0-RHEL-8-20191011.n.0

For GA:
container-image-prepare:
  namespace: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15
  prefix: openstack-
  tag: 20190926.1
puddle:
  rhosp: 15.0
  id: RHOS_TRUNK-15.0-RHEL-8-20190926.n.0

This leads to different image names:
- 1017: rh-osbs/rhosp15-openstack-mariadb
- GA: rhosp15/openstack-mariadb

How reproducible:
Always, as soon as you change the path part of the namespace and/or the prefix of the images.

Steps to Reproduce:
1. Install OSP 15 with a namespace/prefix leading to image names of the form rhosp15/openstack-<image-name>.
2. Update container-prepare.yaml with a namespace/prefix leading to image names of the form rh-osbs/rhosp15-openstack-<image-name> (see the file sketch after this comment).
3. Run the update.

Actual results:
It breaks.

Additional information:
I think this kind of namespace/prefix change would break any update back to at least OSP 13 (for as long as the pcmklatest mechanism has existed). But we never had a bug for it, so my guess is that it is not something that has happened in a real deployment. I set medium severity because we first need to assess whether this affects only test environments or whether customers can hit it as well.
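As an illustration of step 2 above, the change amounts to switching the image-prepare parameters, roughly like this (a sketch only, assuming the OSP 15 ContainerImagePrepare format; the file name follows the "container-prepare.yaml" mentioned above and the values are the 1017 ones quoted earlier):

[stack@undercloud ~]$ cat > container-prepare.yaml <<'EOF'
# Hypothetical post-GA prepare file: the namespace path and the image prefix both
# change, so the resulting image names no longer match the existing *:pcmklatest tags.
parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      namespace: registry-proxy.engineering.redhat.com/rh-osbs
      name_prefix: rhosp15-openstack-
      name_suffix: ''
      tag: 20191014.2
EOF

The GA deployment would instead have used a namespace ending in /rhosp15 with name_prefix openstack-, which is what produced the rhosp15/openstack-<image-name> names that the pcmklatest tags were created against.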
Hi,

First, a needinfo for rhos-delivery: we would like to know whether the namespace/prefix change (related to quay.io) will also happen for customers. If not, then this is purely a testing issue.

But since this problem won't have an easy solution, what would be our best option for testing updates without a namespace change? I can see two solutions:
- either we deploy GA using quay.io-style image names;
- or we make the phase1 image names compatible with GA.

Something else?

Thanks,
This will happen in the field if a customer deploys from the latest content and later shifts to Satellite so that they can better control what is exposed to their production environment, and when.
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3148
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days