Description of problem:
During the upgrade from OSP 15 to OSP 16, the overcloud upgrade fails when trying to pull images from the undercloud. We found that /etc/hosts on the overcloud nodes does not contain any entries for the undercloud.

Version-Release number of selected component (if applicable):

How reproducible:
Every run of openstack overcloud upgrade run --limit "controller-0"

Steps to Reproduce:
1. Upgrade undercloud to OSP 16
2. Prepare overcloud upgrade
3. Execute openstack overcloud upgrade run --limit "controller-0"

Actual results:
TASK [Pull latest cinder_volume images] ****************************************
Thursday 23 January 2020 13:05:32 -0500 (0:00:00.097) 0:03:18.890 ******
fatal: [controller-0]: FAILED! => {"changed": true, "cmd": ["podman", "pull", "undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo"], "delta": "0:00:10.083227", "end": "2020-01-23 18:05:42.726092", "msg": "non-zero return code", "rc": 125, "start": "2020-01-23 18:05:32.642865", "stderr": "Trying to pull undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo...\n Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host\nError: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo\": unable to pull undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host", "stderr_lines": ["Trying to pull undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo...", " Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host", "Error: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo\": unable to pull undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/tripleomaster/centos-binary-cinder-volume:current-tripleo: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host"], "stdout": "", "stdout_lines":

Expected results:
Overcloud controller upgrades complete successfully.

Additional info:
The cause of this was the switch from an IP address to a hostname for the undercloud registry when push_destination: true is set. Two things are not occurring on an upgrade:

1) The hostname entry is not being properly populated on the overcloud.
2) The insecure registry address does not appear to be correctly updated to reflect the hostname. /etc/containers/registries.conf still references 192.168.24.1:8787 instead of undercloud-0.redhat.local:8787.

There are two things that can be done to work around this:

1) Add an undercloud hostname entry to /etc/hosts on the overcloud nodes. Example:

   192.168.24.1 undercloud-0.redhat.local undercloud-0
   192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
   192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane

2) Include the undercloud hostname entry in the DockerInsecureRegistryAddress variable:

   parameter_defaults:
     DockerInsecureRegistryAddress:
       - undercloud-0.redhat.local
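For anyone scripting the first workaround, here is a minimal sketch of adding the undercloud entry to /etc/hosts content without duplicating it on repeated runs. The function name ensure_hosts_entries is hypothetical and not part of TripleO; the sketch only transforms text, so writing the result back to /etc/hosts on each overcloud node is left to whatever tooling you already use.

```python
def ensure_hosts_entries(hosts_text: str, entries: list[tuple[str, list[str]]]) -> str:
    """Return hosts_text with any missing (ip, names) entries appended.

    Hypothetical helper for illustration only; it is not TripleO code.
    """
    lines = hosts_text.splitlines()

    # Collect every hostname or alias already present, ignoring comments.
    present = set()
    for line in lines:
        fields = line.split("#", 1)[0].split()
        present.update(fields[1:])  # fields[0] is the address

    for ip, names in entries:
        # Treat an entry as present when its first (canonical) name is known.
        if names[0] not in present:
            lines.append(f"{ip} {' '.join(names)}")
            present.update(names)

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost localhost.localdomain\n"
    undercloud = ("192.168.24.1",
                  ["undercloud-0.ctlplane.redhat.local", "undercloud-0.ctlplane"])
    print(ensure_hosts_entries(sample, [undercloud]), end="")
```

Running the function a second time on its own output leaves the text unchanged, which matters if the workaround is applied on every upgrade attempt.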
Additionally, we issue manual podman pulls in the upgrade tasks themselves, and these likely do not get the updated insecure registry configuration before they are executed.
Because the upgrade tasks run before the host prep steps that configure the insecure registries, you'd also have to edit /etc/containers/registries.conf and add an entry in the registries.insecure section. Example:

[registries.insecure]
registries = ['192.168.24.1:8787', 'undercloud-0.ctlplane.redhat.local:8787']

The DockerInsecureRegistryAddress change suggested earlier might not be necessary, as I'm not sure we actually reach that code before this error occurs.
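A minimal sketch of that manual edit, assuming the file uses the older sectioned layout shown in the example and that the registries list sits on a single line. The helper name add_insecure_registry is hypothetical; it only rewrites text and does not touch the file itself.

```python
import re


def add_insecure_registry(conf_text: str, registry: str) -> str:
    """Add registry to the [registries.insecure] list in registries.conf text.

    Hypothetical helper; assumes a single-line "registries = [...]" assignment.
    """
    lines = conf_text.splitlines()
    in_section = False
    section_at = None

    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # A new table header starts; remember where the insecure one is.
            in_section = stripped == "[registries.insecure]"
            if in_section:
                section_at = i
            continue
        if in_section and re.match(r"registries\s*=", stripped):
            current = re.findall(r"['\"]([^'\"]+)['\"]", stripped)
            if registry not in current:
                current.append(registry)
            quoted = ", ".join(f"'{name}'" for name in current)
            lines[i] = f"registries = [{quoted}]"
            return "\n".join(lines) + "\n"

    new_line = f"registries = ['{registry}']"
    if section_at is not None:
        # Section exists but has no registries key yet.
        lines.insert(section_at + 1, new_line)
    else:
        lines += ["", "[registries.insecure]", new_line]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[registries.insecure]\nregistries = ['192.168.24.1:8787']\n"
    print(add_insecure_registry(sample, "undercloud-0.ctlplane.redhat.local:8787"), end="")
```

The existing IP-based entry is kept alongside the hostname, matching the example above, so pulls by either address keep working.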
Could we backport https://review.opendev.org/687349 and https://review.opendev.org/687347, together with part of https://review.opendev.org/687388 (just the bits setting [registries.insecure] in /etc/containers/registries.conf), so that the host entries are added on the overcloud nodes when updating OSP 15 to the latest z-release, prior to doing the upgrade?
So, as expected, this bug is legit. We're also hitting it in the OSP15 to OSP16 upgrades CI job:

2020-02-14 10:19:30 | fatal: [controller-1]: FAILED! => {"changed": true, "cmd": ["podman", "pull", "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1"], "delta": "0:00:10.121361", "end": "2020-02-14 10:19:29.522401", "msg": "non-zero return code", "rc": 125, "start": "2020-02-14 10:19:19.401040", "stderr": "Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1...\n Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host\nError: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1\": unable to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host", "stderr_lines": ["Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1...", " Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host", "Error: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1\": unable to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host"], "stdout": "", "stdout_lines": []}
2020-02-14 10:19:30 | fatal: [controller-2]: FAILED! => {"changed": true, "cmd": ["podman", "pull", "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1"], "delta": "0:00:10.124989", "end": "2020-02-14 10:19:29.622836", "msg": "non-zero return code", "rc": 125, "start": "2020-02-14 10:19:19.497847", "stderr": "Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1...\n Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host\nError: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1\": unable to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host", "stderr_lines": ["Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1...", " Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host", "Error: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1\": unable to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host"], "stdout": "", "stdout_lines": []}

I'll give the suggestions from Alex and Harald a try.
So, once I managed to solve the /etc/hosts issue, I also ran into what @Alex suggested: registries.conf on the overcloud nodes does not include the undercloud's hostname, so the pull fails with:

2020-02-17 12:21:55 | TASK [Pull latest cinder_backup images] ****************************************
2020-02-17 12:21:55 | Monday 17 February 2020 12:21:54 +0000 (0:00:00.373) 0:06:24.690 *******
2020-02-17 12:21:55 | fatal: [controller-0]: FAILED! => {"changed": true, "cmd": ["podman", "pull", "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1"], "delta": "0:00:00.131346", "end": "2020-02-17 12:21:54.826895", "msg": "non-zero return code", "rc": 125, "start": "2020-02-17 12:21:54.695549", "stderr": "Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1...\n Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: http: server gave HTTP response to HTTPS client\nError: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1\": unable to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1...", " Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: http: server gave HTTP response to HTTPS client", "Error: error pulling image \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1\": unable to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: unable to pull image: Error initializing source docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:20200210.1: error pinging docker registry undercloud-0.ctlplane.redhat.local:8787: Get https://undercloud-0.ctlplane.redhat.local:8787/v2/: http: server gave HTTP response to HTTPS client"], "stdout": "", "stdout_lines": []}
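The two failures seen in this bug are easy to mix up because both surface as rc 125 from podman pull. A rough sketch of telling them apart from the stderr text follows; the function name classify_pull_failure and the returned labels are made up for illustration, and the matching relies only on the two error strings quoted in the logs above.

```python
def classify_pull_failure(stderr: str) -> str:
    """Map podman pull stderr to one of the two causes discussed in this bug.

    Hypothetical helper; the labels are illustrative, not podman output.
    """
    if "no such host" in stderr:
        # The registry hostname does not resolve: check /etc/hosts.
        return "name-resolution"
    if "server gave HTTP response to HTTPS client" in stderr:
        # The registry answers over plain HTTP: it needs to be listed
        # under [registries.insecure] in /etc/containers/registries.conf.
        return "registry-not-insecure"
    return "unknown"


if __name__ == "__main__":
    first = "dial tcp: lookup undercloud-0.ctlplane.redhat.local: no such host"
    second = "http: server gave HTTP response to HTTPS client"
    print(classify_pull_failure(first))
    print(classify_pull_failure(second))
```

In this thread the name-resolution failure appeared first, and the HTTP/HTTPS mismatch only became visible once /etc/hosts had been fixed.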
Reconfiguring podman in the upgrade tasks, passing the value for the container insecure registries, fixed the problem (https://review.opendev.org/#/c/707865/), and the upgrade_tasks finished successfully:

2020-02-17 13:36:00 | TASK [Make sure the Undercloud hostname is included in /etc/hosts] *************
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:35 +0000 (0:00:00.359) 0:02:31.292 *******
2020-02-17 13:36:00 | ok: [controller-2]
2020-02-17 13:36:00 | ok: [controller-0]
2020-02-17 13:36:00 | ok: [controller-1]
....
2020-02-17 13:36:00 | TASK [Set container_registry_insecure_registries fact.] ************************
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:44 +0000 (0:00:00.372) 0:02:40.180 *******
2020-02-17 13:36:00 | ok: [controller-0]
2020-02-17 13:36:00 | ok: [controller-1]
2020-02-17 13:36:00 | ok: [controller-2]
2020-02-17 13:36:00 |
2020-02-17 13:36:00 | TASK [include_role : tripleo-podman] *******************************************
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:45 +0000 (0:00:00.371) 0:02:40.551 *******
2020-02-17 13:36:00 |
2020-02-17 13:36:00 | TASK [tripleo-podman : ensure podman and deps are installed] *******************
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:45 +0000 (0:00:00.707) 0:02:41.259 *******
2020-02-17 13:36:00 | ok: [controller-0]
2020-02-17 13:36:00 | ok: [controller-1]
2020-02-17 13:36:00 | ok: [controller-2]
2020-02-17 13:36:00 |
2020-02-17 13:36:00 | TASK [tripleo-podman : Check for cni0 interface] *******************************
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:48 +0000 (0:00:02.077) 0:02:43.336 *******
2020-02-17 13:36:00 | ok: [controller-0]
2020-02-17 13:36:00 | ok: [controller-1]
2020-02-17 13:36:00 | ok: [controller-2]
2020-02-17 13:36:00 |
2020-02-17 13:36:00 | TASK [tripleo-podman : Delete cni0 interface] **********************************
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:48 +0000 (0:00:00.720) 0:02:44.056 *******
2020-02-17 13:36:00 | skipping: [controller-0]
2020-02-17 13:36:00 | skipping: [controller-1]
2020-02-17 13:36:00 | skipping: [controller-2]
2020-02-17 13:36:00 |
2020-02-17 13:36:00 | TASK [tripleo-podman : Remove default cni config for cni0 if exists] ***********
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:49 +0000 (0:00:00.360) 0:02:44.417 *******
2020-02-17 13:36:00 | ok: [controller-0]
2020-02-17 13:36:00 | ok: [controller-1]
2020-02-17 13:36:00 | ok: [controller-2]
2020-02-17 13:36:00 |
2020-02-17 13:36:00 | TASK [tripleo-podman : configure insecure registries /etc/containers/registries.conf] ***
2020-02-17 13:36:00 | Monday 17 February 2020 13:35:50 +0000 (0:00:01.253) 0:02:45.670 *******
2020-02-17 13:36:00 | [WARNING]: The value ['undercloud-0.ctlplane.redhat.local:8787'] (type list) in
2020-02-17 13:36:00 | a string field was converted to "['undercloud-0.ctlplane.redhat.local:8787']"
2020-02-17 13:36:00 | (type string). If this does not look like what you expect, quote the entire
2020-02-17 13:36:00 | value to ensure it does not change.
2020-02-17 13:36:00 | changed: [controller-2]
2020-02-17 13:36:00 | changed: [controller-1]
2020-02-17 13:36:00 | changed: [controller-0]
....
2020-02-17 13:45:43 | PLAY RECAP *********************************************************************
2020-02-17 13:45:43 | controller-0 : ok=147 changed=73 unreachable=0 failed=0 skipped=315 rescued=0 ignored=3
2020-02-17 13:45:43 | controller-1 : ok=145 changed=71 unreachable=0 failed=0 skipped=315 rescued=0 ignored=3
2020-02-17 13:45:43 | controller-2 : ok=145 changed=71 unreachable=0 failed=0 skipped=315 rescued=0 ignored=3
2020-02-17 13:45:43 |
2020-02-17 13:45:43 | Monday 17 February 2020 13:45:42 +0000 (0:00:00.279) 0:12:37.382 *******
2020-02-17 13:45:43 | ===============================================================================
2020-02-17 13:45:43 |
2020-02-17 13:45:43 | Updated nodes - Controller
2020-02-17 13:45:43 | Success
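For CI jobs that want to assert on a run like the one above rather than read it by eye, here is a small sketch that extracts the PLAY RECAP counters from a log with the same "timestamp | message" prefix. The names parse_play_recap and failed_hosts are hypothetical, and the line format is assumed to match this log; other Ansible output callbacks may print the recap differently.

```python
import re

# One recap line looks like: "controller-0 : ok=147 changed=73 ... ignored=3"
RECAP_LINE = re.compile(r"(\S+)\s+:\s+((?:\w+=\d+\s*)+)")


def parse_play_recap(log_text: str) -> dict[str, dict[str, int]]:
    """Return {host: {counter: value}} for every recap line in log_text."""
    recap = {}
    for line in log_text.splitlines():
        # Drop the "timestamp |" prefix when it is present.
        body = line.split("|", 1)[-1].strip()
        match = RECAP_LINE.fullmatch(body)
        if match:
            pairs = (pair.split("=") for pair in match.group(2).split())
            recap[match.group(1)] = {key: int(value) for key, value in pairs}
    return recap


def failed_hosts(recap: dict[str, dict[str, int]]) -> list[str]:
    """List hosts with a non-zero failed or unreachable counter."""
    return sorted(host for host, counters in recap.items()
                  if counters.get("failed", 0) or counters.get("unreachable", 0))


if __name__ == "__main__":
    sample = ("2020-02-17 13:45:43 | controller-0 : ok=147 changed=73 "
              "unreachable=0 failed=0 skipped=315 rescued=0 ignored=3\n")
    print(failed_hosts(parse_play_recap(sample)))
```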
Verified with tht package:

(undercloud) [stack@undercloud-0 ~]$ sudo rpm -qa | grep tripleo-heat-templates
openstack-tripleo-heat-templates-11.3.2-0.20200428015016.d5442cd.el8ost.noarch

2020-05-05 14:02:00 | TASK [Pull latest cinder_backup images] ****************************************
2020-05-05 14:02:00 | changed: [controller-0] => {"changed": true, "cmd": ["podman", "pull", "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:16.1_20200428.1"], "delta": "0:00:00.842476", "end": "2020-05-05 18:01:44.199831", "rc": 0, "start": "2020-05-05 18:01:43.357355", "stderr": "Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:16.1_20200428.1...\nGetting image source signatures\nCopying blob sha256:58e1deb9693dfb1704ccce2f1cf0e4d663ac77098a7a0f699708a71549cbd924\nCopying blob sha256:71d7feb766eb9ed5f1f9e8fc494135af60a8820ce98bc67ee8f46c6971a640d8\nCopying blob sha256:318bef23e5ac9db3ca50932e4eb584bb0b7aa2d18c3f38cd7d662a6799a0e00d\nCopying blob sha256:57f0457681cd6e1a956dbd81094602cf4a49295adb5045170e65d3eda289a190\nCopying blob sha256:df4e16d3bd0f27ab34ca679d5eafadd8e885e2c7a7caa1ebf0f6aacaf8b69957\nCopying blob sha256:78afc5364ad2c981e4a4919f535aaefef9ac2f990837be01c766764e025b1f31\nCopying config sha256:510b99c2b609e4c8b1b7e51f8d95f2bc9d5dcbefea82d5a83988989e0841913f\nWriting manifest to image destination\nStoring signatures", "stderr_lines": ["Trying to pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:16.1_20200428.1...", "Getting image source signatures", "Copying blob sha256:58e1deb9693dfb1704ccce2f1cf0e4d663ac77098a7a0f699708a71549cbd924", "Copying blob sha256:71d7feb766eb9ed5f1f9e8fc494135af60a8820ce98bc67ee8f46c6971a640d8", "Copying blob sha256:318bef23e5ac9db3ca50932e4eb584bb0b7aa2d18c3f38cd7d662a6799a0e00d", "Copying blob sha256:57f0457681cd6e1a956dbd81094602cf4a49295adb5045170e65d3eda289a190", "Copying blob sha256:df4e16d3bd0f27ab34ca679d5eafadd8e885e2c7a7caa1ebf0f6aacaf8b69957", "Copying blob sha256:78afc5364ad2c981e4a4919f535aaefef9ac2f990837be01c766764e025b1f31", "Copying config sha256:510b99c2b609e4c8b1b7e51f8d95f2bc9d5dcbefea82d5a83988989e0841913f", "Writing manifest to image destination", "Storing signatures"], "stdout": "510b99c2b609e4c8b1b7e51f8d95f2bc9d5dcbefea82d5a83988989e0841913f", "stdout_lines": ["510b99c2b609e4c8b1b7e51f8d95f2bc9d5dcbefea82d5a83988989e0841913f"]}

[heat-admin@controller-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
fd00:fd00:fd00:3000::10 ceph-0.redhat.local ceph-0
fd00:fd00:fd00:3000::10 ceph-0.storage.redhat.local ceph-0.storage
fd00:fd00:fd00:4000::17 ceph-0.storagemgmt.redhat.local ceph-0.storagemgmt
192.168.24.9 ceph-0.ctlplane.redhat.local ceph-0.ctlplane
....
2620:52:0:13b8:5054:ff:fe3e:d controller-2.external.redhat.local controller-2.external
192.168.24.10 controller-2.ctlplane.redhat.local controller-2.ctlplane
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
192.168.24.6 overcloud.ctlplane.localdomain
fd00:fd00:fd00:3000::12 overcloud.storage.localdomain
fd00:fd00:fd00:4000::23 overcloud.storagemgmt.localdomain
fd00:fd00:fd00:2000::11 overcloud.internalapi.localdomain
2620:52:0:13b8:5054:ff:fe3e:1 overcloud.localdomain
# END ANSIBLE MANAGED BLOCK
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3148