Bug 1593715
| Summary: | [UPGRADES] DockerInsecureRegistryAddress parameter is not propagated to OC nodes during upgrade | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yurii Prokulevych <yprokule> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Lukas Bezdicka <lbezdick> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Gurenko Alex <agurenko> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 13.0 (Queens) | CC: | augol, ccamacho, hbrock, jfrancoa, jpichon, jslagle, lbezdick, mbultel, mburns, morazi, pablo.iranzo, sasha, takito |
| Target Milestone: | zstream | Keywords: | ReleaseNotes, Triaged, ZStream |
| Target Release: | 13.0 (Queens) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Known Issue | |
| Doc Text: |
The insecure registry list is updated later than some container images are pulled during a major upgrade. As a result, container images from a newly introduced insecure registry fail to download during the `openstack overcloud upgrade run` command.
You can use one of the following workarounds:
Option A: Manually update the /etc/sysconfig/docker file on nodes that have containers managed by Pacemaker, and add any newly introduced insecure registries.
Option B: Run the `openstack overcloud deploy` command right before upgrading, and provide the desired new insecure registry list using an environment file with the DockerInsecureRegistryAddress parameter.
With either workaround in place, all container images should download successfully during the upgrade.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-08-27 15:36:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
After checking registries on all nodes, it seems that nodes in the CephStorage and Compute roles got the registry configured correctly. Nodes in ControllerOpenstack, Database, Messaging, and Networker have 'wrong' entries.

Root cause:
* During `upgrade prepare`, config management is disabled, so /etc/sysconfig/docker does not get the new values at that point.
* During `upgrade run` we first run upgrade_tasks and then the normal deploy tasks. /etc/sysconfig/docker is updated early during the deploy tasks.
** For services managed by Paunch, the refetching of images and creation of containers is handled by Paunch *after* the docker config has been updated, so there is no problem.
** For services managed by Pacemaker, the new images are fetched and tagged in upgrade_tasks, so *before* the docker config has been updated.
** The above explains why this issue only appears on nodes which have some Pacemaker-managed containers.
* The issue was triggered by switching to a different registry for the upgrade than the one used for deploy, with both used in "insecure" mode.

Impact:
* This problem only appears when the overcloud is fetching images from an "insecure registry" and the insecure registry used for the upgrade is different from the one used in the last preceding config management run (likely a preceding `overcloud deploy` command).
* In production environments it's likely that users would either point to CDN (not an insecure registry, so it won't trigger the issue) or continue using the same insecure registry (e.g. on the undercloud) for both deploy and upgrade (again, this wouldn't trigger the problem).

Workarounds:
If using a different insecure registry for upgrade than for deploy, I see these options to avoid the upgrade failure:
* Before upgrading, run an `overcloud deploy` with the new registry URLs added to the DockerInsecureRegistryAddress parameter.
* Update /etc/sysconfig/docker on the overcloud nodes manually to add the registry URLs (only necessary for nodes which run containers managed by Pacemaker).

Based on the above investigation I'll triage this as medium/medium, but feel free to adjust or post more feedback.

Taking into account that the customer base on OSP12 is really low or non-existent, could we close this BZ?

This issue doesn't happen during FFWD or the upgrade from 13 to 14, as we now have the undercloud.conf file in which the insecure registries can be added. This is fixed in OSP15+ as far as I know.
|
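The manual workaround from the comments above (adding registries to /etc/sysconfig/docker) could be scripted roughly as follows. This is only a sketch: the helper name `add_insecure_registry` is hypothetical and not part of TripleO, it assumes GNU sed and an `INSECURE_REGISTRY="..."` line in the double-quoted form shown in this report, and it does not restart the docker daemon, which would likely be needed for the change to take effect.

```shell
#!/bin/bash
# Hypothetical helper (not part of TripleO): add one registry to the
# INSECURE_REGISTRY line of a docker sysconfig file, if it is not there yet.
# Assumes the double-quoted form: INSECURE_REGISTRY="--insecure-registry host:port"
add_insecure_registry() {
    local file="$1" registry="$2"
    local current word

    # Extract the value between the double quotes (empty if the line is absent).
    current=$(sed -n 's/^INSECURE_REGISTRY="\([^"]*\)".*/\1/p' "$file")

    # Do nothing if the registry is already one of the listed words.
    for word in $current; do
        [ "$word" = "$registry" ] && return 0
    done

    if grep -q '^INSECURE_REGISTRY=' "$file"; then
        # Insert the new flag just before the closing quote of the existing value.
        sed -i "s|^\(INSECURE_REGISTRY=\"[^\"]*\)\"|\1 --insecure-registry ${registry}\"|" "$file"
    else
        # No such line yet: append one.
        echo "INSECURE_REGISTRY=\"--insecure-registry ${registry}\"" >> "$file"
    fi
}

# Example call, to be run as root on a node with Pacemaker-managed containers:
#   add_insecure_registry /etc/sysconfig/docker registry.one.example.com
```

Taking the file path as an argument keeps the helper easy to try against a copy of the file before touching a real node.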
Description of problem:
-----------------------
Registries specified within DockerInsecureRegistryAddress in the container-images.yaml inventory file are not propagated to /etc/sysconfig/docker on overcloud nodes.

Excerpt from container-images.yaml:

DockerInsecureRegistryAddress:
  - registry.one.example.com
  - registry.two.example.com

After overcloud upgrade prepare:

openstack overcloud upgrade prepare --stack qe-Cloud-1 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/composable_roles/roles/nodes.yaml \
  -e /home/stack/composable_roles/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/composable_roles/network/network-environment.yaml \
  -e /home/stack/composable_roles/enable-tls.yaml \
  -e /home/stack/composable_roles/inject-trust-anchor.yaml \
  -e /home/stack/composable_roles/public_vip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/composable_roles/hostnames.yaml \
  -e /home/stack/composable_roles/debug.yaml \
  -e /home/stack/composable_roles/config_heat.yaml \
  -e /home/stack/composable_roles/docker-images.yaml \
  -e /home/stack/composable_roles/docker-images.yaml \
  --roles-file /home/stack/composable_roles/roles/roles_data.yaml 2>&1

[root@controller-0 ~]# awk '/INSECURE_REGISTRY=/' /etc/sysconfig/docker
INSECURE_REGISTRY="--insecure-registry 192.168.24.1:8787"

registry.one.example.com and registry.two.example.com are not in /etc/sysconfig/docker.

Problem:
========
Overcloud upgrade fails when trying to fetch new docker images:

u'TASK [Pull latest Haproxy images] **********************************************',
u'fatal: [192.168.24.15]: FAILED! => {"changed": true, "cmd": ["docker", "pull", "rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy:2018-06-15.2"], "delta": "0:00:00.040523", "end": "2018-06-21 12:22:24.503019", "msg": "non-zero return code", "rc": 1, "start": "2018-06-21 12:22:24.462496", "stderr": "Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client"], "stdout": "Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... ", "stdout_lines": ["Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... "]}',
u'fatal: [192.168.24.23]: FAILED! => {"changed": true, "cmd": ["docker", "pull", "rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy:2018-06-15.2"], "delta": "0:00:00.036477", "end": "2018-06-21 12:22:24.566352", "msg": "non-zero return code", "rc": 1, "start": "2018-06-21 12:22:24.529875", "stderr": "Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client"], "stdout": "Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... ", "stdout_lines": ["Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... "]}',
u'fatal: [192.168.24.14]: FAILED! => {"changed": true, "cmd": ["docker", "pull", "rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy:2018-06-15.2"], "delta": "0:00:00.042652", "end": "2018-06-21 12:22:24.575743", "msg": "non-zero return code", "rc": 1, "start": "2018-06-21 12:22:24.533091", "stderr": "Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client"], "stdout": "Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... ", "stdout_lines": ["Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... "]}',
u'',

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-8.0.2-35.el7ost.noarch
python-tripleoclient-9.2.1-12.el7ost.noarch

How reproducible:
-----------------
So far 100%

Steps to Reproduce:
-------------------
1. Deploy RHOS-12
2. Upgrade UC to RHOS-13
3. Set up latest repos on overcloud nodes
4. Prepare a container images file that points to registries that differ from the one used during deploy

Additional info:
Virtual env with composable roles.
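The `awk` check in the description can be extended to report exactly which expected registries are absent from a node's docker config before starting the upgrade. The following is a minimal sketch: the function name `missing_registries` is hypothetical, and it assumes the double-quoted `INSECURE_REGISTRY="..."` form shown in the description.

```shell
#!/bin/bash
# Hypothetical helper: print every expected registry that is NOT listed on the
# INSECURE_REGISTRY line of the given docker sysconfig file.
missing_registries() {
    local file="$1"; shift
    local configured registry

    # Keep only the value between the double quotes.
    configured=$(sed -n 's/^INSECURE_REGISTRY="\([^"]*\)".*/\1/p' "$file")

    for registry in "$@"; do
        case " $configured " in
            *" $registry "*) ;;        # listed as a whole word: nothing to report
            *) echo "$registry" ;;     # not listed: report it
        esac
    done
}

# Example call on an overcloud node:
#   missing_registries /etc/sysconfig/docker registry.one.example.com registry.two.example.com
```

An empty result suggests the listed registries are already configured on that node; any printed name is a registry whose images would probably fail to pull in the way shown in the log above.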