Bug 1593715 - [UPGRADES] DockerInsecureRegistryAddress parameter is not propagated to OC nodes during upgrade
Status: ASSIGNED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: zstream
Target Release: 13.0 (Queens)
Assigned To: Jiri Stransky
QA Contact: Gurenko Alex
Keywords: ReleaseNotes, Triaged, ZStream
Depends On:
Blocks:
Reported: 2018-06-21 08:44 EDT by Yurii Prokulevych
Modified: 2018-09-20 22:23 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
The insecure registry list is updated later than some container images are pulled during a major upgrade. As a result, container images from a newly introduced insecure registry fail to download during the `openstack overcloud upgrade run` command. You can use one of the following workarounds: Option A: Manually update the /etc/sysconfig/docker file on nodes that have containers managed by Pacemaker, and add any newly introduced insecure registries. Option B: Run the `openstack overcloud deploy` command right before upgrading, and provide the desired new insecure registry list using an environment file with the DockerInsecureRegistryAddress parameter. With either workaround, all container images should download successfully during the upgrade.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Yurii Prokulevych 2018-06-21 08:44:36 EDT
Description of problem:
-----------------------
Registries specified in the DockerInsecureRegistryAddress parameter in the container-images.yaml file are not propagated to /etc/sysconfig/docker on the overcloud nodes.

Excerpt from container-images.yaml:

DockerInsecureRegistryAddress:
  - registry.one.example.com
  - registry.two.example.com
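The excerpt shows only the parameter itself. In a Heat environment file, parameters like this normally sit under a parameter_defaults section. As a minimal sketch, assuming that layout, such a file could be generated as follows; the function name and any output path you pass it are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TripleO): write a Heat environment file
# that sets DockerInsecureRegistryAddress under parameter_defaults.
# Usage: write_insecure_registry_env OUTPUT_FILE REGISTRY...
write_insecure_registry_env() {
    local out="$1"
    shift
    {
        printf 'parameter_defaults:\n'
        printf '  DockerInsecureRegistryAddress:\n'
        local reg
        # One YAML list item per registry, in the order given.
        for reg in "$@"; do
            printf '    - %s\n' "$reg"
        done
    } > "$out"
}
```

For example, `write_insecure_registry_env /home/stack/insecure-registries.yaml registry.one.example.com registry.two.example.com` produces a file that could be passed with `-e` to the `overcloud deploy` run suggested as a workaround in Comment 2.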

After running the following `openstack overcloud upgrade prepare` command:

openstack overcloud upgrade prepare --stack qe-Cloud-1 \
    --templates /usr/share/openstack-tripleo-heat-templates \
    -e /home/stack/composable_roles/roles/nodes.yaml \
    -e /home/stack/composable_roles/internal.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/composable_roles/network/network-environment.yaml \
    -e /home/stack/composable_roles/enable-tls.yaml \
    -e /home/stack/composable_roles/inject-trust-anchor.yaml \
    -e /home/stack/composable_roles/public_vip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
    -e /home/stack/composable_roles/hostnames.yaml \
    -e /home/stack/composable_roles/debug.yaml \
    -e /home/stack/composable_roles/config_heat.yaml \
    -e /home/stack/composable_roles/docker-images.yaml \
    --roles-file /home/stack/composable_roles/roles/roles_data.yaml 2>&1

[root@controller-0 ~]# awk '/INSECURE_REGISTRY=/' /etc/sysconfig/docker
INSECURE_REGISTRY="--insecure-registry 192.168.24.1:8787"

Neither registry.one.example.com nor registry.two.example.com is present in /etc/sysconfig/docker.
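The manual `awk` check above can be turned into a per-node test that reports which expected registries are missing. This is a minimal sketch, assuming the INSECURE_REGISTRY="--insecure-registry <name> ..." format shown in the output above; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TripleO): print every expected registry
# that is not listed as "--insecure-registry <name>" on the
# INSECURE_REGISTRY= line of a docker sysconfig file.
# Returns 0 if all registries are present, 1 otherwise.
# Usage: missing_insecure_registries FILE REGISTRY...
missing_insecure_registries() {
    local file="$1"
    shift
    local line value reg missing=0
    # Use the last INSECURE_REGISTRY= line if there are several.
    line=$(grep '^INSECURE_REGISTRY=' "$file" | tail -n 1)
    # Turn double quotes into spaces so every entry is space-delimited.
    value=${line//\"/ }
    value=" $value "
    for reg in "$@"; do
        case "$value" in
            *" --insecure-registry $reg "*) ;;  # present
            *) printf '%s\n' "$reg"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

On an affected node, `missing_insecure_registries /etc/sysconfig/docker registry.one.example.com registry.two.example.com` would print both registry names. Matching the whole "--insecure-registry <name>" token avoids treating 192.168.24.1 as present just because 192.168.24.1:8787 is listed.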

Problem:
========
The overcloud upgrade fails when pulling the new container images:

u'TASK [Pull latest Haproxy images] **********************************************',
 u'fatal: [192.168.24.15]: FAILED! => {"changed": true, "cmd": ["docker", "pull", "rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy:2018-06-15.2"], "delta": "0:00:00.040523", "end": "2018-06-21 12:22:24.503019", "msg": "non-zero return code", "rc": 1, "start": "2018-06-21 12:22:24.462496", "stderr": "Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client"], "stdout": "Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... ", "stdout_lines": ["Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... "]}',
 u'fatal: [192.168.24.23]: FAILED! => {"changed": true, "cmd": ["docker", "pull", "rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy:2018-06-15.2"], "delta": "0:00:00.036477", "end": "2018-06-21 12:22:24.566352", "msg": "non-zero return code", "rc": 1, "start": "2018-06-21 12:22:24.529875", "stderr": "Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client"], "stdout": "Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... ", "stdout_lines": ["Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... "]}',
 u'fatal: [192.168.24.14]: FAILED! => {"changed": true, "cmd": ["docker", "pull", "rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy:2018-06-15.2"], "delta": "0:00:00.042652", "end": "2018-06-21 12:22:24.575743", "msg": "non-zero return code", "rc": 1, "start": "2018-06-21 12:22:24.533091", "stderr": "Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://rhos-qe-mirror-qeos.usersys.redhat.com:5000/v1/_ping: http: server gave HTTP response to HTTPS client"], "stdout": "Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... ", "stdout_lines": ["Trying to pull repository rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy ... "]}',
 u'',
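The error "server gave HTTP response to HTTPS client" means docker tried to reach the registry over HTTPS while it only serves plain HTTP; docker falls back to HTTP only for registries configured as insecure. The registry that has to be added is therefore the host:port prefix of the failing image reference. A minimal sketch of extracting it, following docker's rule that the first path component is a registry only if it contains "." or ":" or is "localhost"; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TripleO or docker): print the registry
# part of a docker image reference. Returns 1 if the reference has no
# explicit registry (i.e. it would use the default registry).
# Usage: image_registry IMAGE
image_registry() {
    local image="$1"
    # Everything before the first "/".
    local first="${image%%/*}"
    # No "/" at all: a bare name such as "openstack-haproxy:latest".
    if [ "$first" = "$image" ]; then
        return 1
    fi
    case "$first" in
        *.*|*:*|localhost) printf '%s\n' "$first" ;;
        *) return 1 ;;  # e.g. "rhosp13/openstack-haproxy"
    esac
}
```

Applied to the image from the failed task, `image_registry rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-haproxy:2018-06-15.2` prints `rhos-qe-mirror-qeos.usersys.redhat.com:5000`.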

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-8.0.2-35.el7ost.noarch
python-tripleoclient-9.2.1-12.el7ost.noarch

How reproducible:
-----------------
So far 100%

Steps to Reproduce:
-------------------
1. Deploy RHOS-12.
2. Upgrade the undercloud (UC) to RHOS-13.
3. Set up the latest repositories on the overcloud nodes.
4. Prepare a container images file that points to registries different from the one used during the deployment.

Additional info: virtual env with composable roles.
Comment 1 Yurii Prokulevych 2018-06-21 09:09:09 EDT
After checking the registry configuration on all nodes, it seems that nodes in the CephStorage and Compute roles have the registries configured correctly.
Nodes in the ControllerOpenstack, Database, Messaging, and Networker roles have the 'wrong' entries.
Comment 2 Jiri Stransky 2018-06-21 11:41:19 EDT
Root cause:

* During `upgrade prepare`, config management is disabled, so /etc/sysconfig/docker does not receive the new values at that point.

* During `upgrade run`, the upgrade_tasks run first, followed by the normal deploy tasks. /etc/sysconfig/docker is updated early in the deploy tasks.

** For services managed by Paunch, re-fetching images and creating containers is handled by Paunch *after* the docker config has been updated, so there is no problem.

** For services managed by Pacemaker, the new images are fetched and tagged in upgrade_tasks, that is, *before* the docker config has been updated.

** This explains why the issue appears only on nodes that run some Pacemaker-managed containers.

* The issue was triggered by switching to a different registry for the upgrade than the one used for the deploy, with both registries used in "insecure" mode.

Impact:

* This problem appears only when the overcloud fetches images from an "insecure registry" and the insecure registry used for the upgrade differs from the one used in the most recent preceding config management run (most likely a preceding `overcloud deploy` command).

* In production environments, users are likely to either point to the CDN (not an insecure registry, so the issue is not triggered) or keep using the same insecure registry (e.g. on the undercloud) for both deploy and upgrade (which again does not trigger the problem).

Workarounds:

If you use a different insecure registry for the upgrade than for the deploy, I see these options to avoid the upgrade failure:

* Before upgrading, run `overcloud deploy` with the new registry URLs added to the DockerInsecureRegistryAddress parameter.

* Manually update /etc/sysconfig/docker on the overcloud nodes to add the registry URLs (only necessary on nodes that run containers managed by Pacemaker).
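For the manual workaround, a minimal sketch of the file edit, assuming the INSECURE_REGISTRY="--insecure-registry <name> ..." format seen on controller-0. The function name and the FILE.bak backup are hypothetical choices, not part of TripleO. It would need to run as root on each affected node (per Comment 1, those in the ControllerOpenstack, Database, Messaging, and Networker roles). The docker daemon reads this file only when the service starts, so the new entries presumably take effect only after docker is restarted; doing that safely on nodes with Pacemaker-managed containers is outside this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "--insecure-registry <name>" for each registry
# not already on the INSECURE_REGISTRY= line of a docker sysconfig file.
# The file is edited in place after a backup copy is written to FILE.bak.
# Usage: add_insecure_registries FILE REGISTRY...
add_insecure_registries() {
    local file="$1"
    shift
    cp -p "$file" "$file.bak" || return 1
    local line current reg tmp
    line=$(grep '^INSECURE_REGISTRY=' "$file" | tail -n 1)
    # Strip the variable name and the surrounding double quotes.
    current=${line#INSECURE_REGISTRY=}
    current=${current#\"}
    current=${current%\"}
    for reg in "$@"; do
        case " $current " in
            *" --insecure-registry $reg "*) ;;  # already present
            *) current="${current:+$current }--insecure-registry $reg" ;;
        esac
    done
    tmp=$(mktemp) || return 1
    if grep -q '^INSECURE_REGISTRY=' "$file"; then
        # Replace the existing assignment line(s) with the merged value.
        awk -v new="INSECURE_REGISTRY=\"$current\"" \
            '/^INSECURE_REGISTRY=/ { print new; next } { print }' "$file" > "$tmp"
    else
        # No assignment yet: append one at the end.
        { cat "$file"; printf 'INSECURE_REGISTRY="%s"\n' "$current"; } > "$tmp"
    fi
    # Overwrite in place so the file keeps its ownership and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Running it twice with the same registries leaves the file unchanged the second time, which makes it safe to repeat across nodes.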
Comment 3 Jiri Stransky 2018-06-21 11:51:07 EDT
Based on the above investigation, I'll triage this as medium/medium, but feel free to adjust it or post more feedback.
