Description of problem:

When updating the overcloud, the undercloud registry fails to serve the correct images if both of the following conditions are met:

1. containers-prepare-parameter.yaml with a 16.1.x tag (with x < 6, as of this writing) was used for the initial deployment of RHOSP.
2. containers-prepare-parameter.yaml with the 16.1 tag was used when preparing the update.

This causes the MultiViews handler in Apache to always return the older image, even if the 16.1 tag is specified:

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1     d6480804c4d2   8 months ago   743 MB
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1.4   d6480804c4d2   8 months ago   743 MB

How reproducible:

Always reproducible when pulling images from the CDN to prepare an overcloud upgrade, if tag 16.1.4 is used first and then 16.1.

Steps to Reproduce:

1. Deploy the undercloud and overcloud using tag 16.1.4 in the file containers-prepare-parameter.yaml:

$ egrep -B1 tag_from_label containers-prepare-parameter.yaml
tag: '16.1.4'
tag_from_label: '{version}-{release}'

2. Change the tag from 16.1.4 to 16.1 in preparation for the update, as documented in [1]:

$ sed -i 's/16.1.4/16.1/' containers-prepare-parameter.yaml
$ egrep -B1 tag_from_label containers-prepare-parameter.yaml
tag: '16.1'
tag_from_label: '{version}-{release}'

3. Update the undercloud image registry:

$ sudo openstack tripleo container image prepare -e containers-prepare-parameter.yaml

4. For a given image, check that the version with tag 16.1 has been populated on the undercloud registry (along with the older one):

[stack@director ~]$ cat /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/tags/list | jq .
{
  "name": "rhosp-rhel8/openstack-aodh-listener",
  "tags": [
    "16.1.4",
    "16.1"
  ]
}
[stack@director ~]$ ll /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/*.type-map
-rw-r--r--. 1 root root 169 Nov 25 11:01 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map
-rw-r--r--. 1 root root 167 Nov 25 11:40 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map

5. On one of the controllers, check the current version of the same image:

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1.4   d6480804c4d2   8 months ago   743 MB

6. Pull the image using the tag 16.1:

[root@controller1-161 ~]# podman pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1
Trying to pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1...
Getting image source signatures
Copying blob ea462113cc1f skipped: already exists
Copying blob fe9c8cc943e3 skipped: already exists
Copying blob 4422ea938fce skipped: already exists
Copying blob 00f18994d663 skipped: already exists
Copying blob d8ec3e673e1e skipped: already exists
Copying blob 505209649de3 skipped: already exists
Copying config d6480804c4 done
Writing manifest to image destination
Storing signatures
d6480804c4d2cb600479b46f92f728db804410d68c264566dd5d3859a21bd750

Actual results:

The controller pulled the same image it already had and added the tag 16.1 to it:

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1     d6480804c4d2   8 months ago   743 MB
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1.4   d6480804c4d2   8 months ago   743 MB

When using curl to check the image manifests, both tags return the same config digest:

[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"

However, the Apache type-map files correctly show different digests:

[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map | jq .config.digest
"sha256:ea548ddc17674629c97f7472cb5fe62f5495208bcf2f27d5918bd9a8e7c9833f"
[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"

Therefore, the type-map files have the correct content, but Apache is always using the older one.

Expected results:

Apache should map a GET request for */16.1 to the type-map file */16.1.type-map instead of matching 16.1.4.type-map, so that podman can pull the right image.

Additional info:

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/keeping_red_hat_openstack_platform_updated/index#updating-your-container-image-preparation-file_keeping-updated
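The manual curl comparison above can be scripted so that it is easy to repeat for several images. This is only a minimal sketch and not part of the original report: the names fetch_digest, tag_is_consistent and REGISTRY are made up for illustration, and it assumes curl and jq are installed, as in the commands above.

```shell
# Hypothetical helpers; the registry URL is the one used in this report.
REGISTRY="${REGISTRY:-http://director.ctlplane.localdomain:8787}"

# Print the config digest of whatever the registry serves for
# /v2/<repository>/manifests/<reference>.
fetch_digest() {
    curl -s "${REGISTRY}/v2/$1/manifests/$2" | jq -r .config.digest
}

# Exit status 0 when the manifest served for <tag> has the same config
# digest as the one served for <tag>.type-map, 1 when they differ,
# 2 when a digest could not be read at all.
tag_is_consistent() (
    served="$(fetch_digest "$1" "$2")"
    mapped="$(fetch_digest "$1" "$2.type-map")"
    if [ -z "$served" ] || [ -z "$mapped" ]; then
        echo "ERROR    $1:$2 could not read a digest" >&2
        exit 2
    fi
    if [ "$served" = "$mapped" ]; then
        echo "OK       $1:$2 -> $served"
    else
        echo "MISMATCH $1:$2 served=$served type-map=$mapped"
        exit 1
    fi
)
```

With the state described in this report, running `tag_is_consistent rhosp-rhel8/openstack-aodh-listener 16.1` would be expected to report a mismatch, while the same call with 16.1.4 would not.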
Workaround:

Before pulling the new images, remove the old ones from the undercloud registry, so that only the images with tag 16.1 are available when the overcloud nodes pull them.

$ openstack tripleo container image list -c "Image Name" -f value | awk '/16\.1\.4$/' | while read -r image; do echo "Deleting ${image}..."; sudo openstack tripleo container image delete -y "$image"; done

After that, pulling with the 16.1 tag on a controller fetches the new image:

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1.4   d6480804c4d2   8 months ago   743 MB

[root@controller1-161 ~]# podman pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1
Trying to pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1...
Getting image source signatures
Copying blob 17cb0a75ad3b skipped: already exists
Copying blob a2070ce838f3 skipped: already exists
Copying blob 2cae32376095 skipped: already exists
Copying blob c30189fd3718 skipped: already exists
Copying blob 328bf4e7fd60 done
Copying blob 20ac4e1e3488 done
Copying config cef243fa5a done
Writing manifest to image destination
Storing signatures
cef243fa5a06744cc03c27c7f74ea7238540dd328586b327546ea6c722c9529c

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1     cef243fa5a06   6 weeks ago    743 MB
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator   16.1.4   d6480804c4d2   8 months ago   743 MB
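For anyone scripting the cleanup, the tag filter can also be written so that the tag is compared as a literal string rather than as a regular expression. This is a sketch only: select_images_with_tag is a made-up helper name, and it assumes that each line printed by `openstack tripleo container image list -c "Image Name" -f value` ends in `:<tag>`.

```shell
# Hypothetical helper: read image references on stdin and print only the
# lines that end in ":<tag>", where <tag> is the first argument.
# The comparison is a plain string comparison, so dots in the tag are
# not treated as wildcards.
select_images_with_tag() {
    awk -v suffix=":$1" '
        length($0) >= length(suffix) &&
        substr($0, length($0) - length(suffix) + 1) == suffix
    '
}

# Possible use, with the delete command from the workaround above:
#   openstack tripleo container image list -c "Image Name" -f value \
#     | select_images_with_tag 16.1.4 \
#     | while read -r image; do
#         sudo openstack tripleo container image delete -y "$image"
#       done
```

Running the pipeline once without the delete step shows which images would be removed before anything is actually deleted.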
It is also worth mentioning that the overcloud upgrade fails while the undercloud registry is providing the wrong image. After cleaning the registry with the workaround provided above, the overcloud update completes as expected.
Please provide the container image prepare logs (run with --debug). It should be noted that podman images does not list the contents of the image-serve registry. You would need to use `openstack tripleo container image list` to view the contents.
Attached the requested logs
This is a bug in z4 that is fixed in z6. Please upgrade. See Bug 1941412 *** This bug has been marked as a duplicate of bug 1941412 ***
(In reply to Alex Schultz from comment #9)
> This is a bug in z4 that is fixed in z6. Please upgrade. See Bug 1941412
>
> *** This bug has been marked as a duplicate of bug 1941412 ***

I don't see how the problem described here is related to the bug of which this BZ was marked as a duplicate. To start with, my test undercloud is already at 16.1.6, and I'm using the latest packages available to deploy the overcloud:

[root@director ~]# cat /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.6 GA (Train)

[root@director ~]# yum check-update
Updating Subscription Management repositories.
/usr/lib/python3.6/site-packages/dateutil/parser/_parser.py:70: UnicodeWarning: decode() called on unicode string, see https://bugzilla.redhat.com/show_bug.cgi?id=1693751
  instream = instream.decode()
Advanced Virtualization for RHEL 8 x86_64 (RPMs)   23 kB/s | 2.8 kB   00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS - Extended Update Support (RPMs)   20 kB/s | 2.4 kB   00:00
Fast Datapath for RHEL 8 x86_64 (RPMs)   21 kB/s | 2.4 kB   00:00
Red Hat Enterprise Linux 8 for x86_64 - AppStream - Extended Update Support (RPMs)   25 kB/s | 2.8 kB   00:00
Red Hat Ansible Engine 2.9 for RHEL 8 x86_64 (RPMs)   22 kB/s | 2.3 kB   00:00
Red Hat Enterprise Linux 8 for x86_64 - High Availability - Extended Update Support (RPMs)   22 kB/s | 2.4 kB   00:00
Red Hat OpenStack Platform 16.1 for RHEL 8 x86_64 (RPMs)   19 kB/s | 2.4 kB   00:00

On top of that, I don't even need to deploy the overcloud to see that there is going to be a problem when doing so:

[root@director ~]# ll /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/*.type-map
-rw-r--r--. 1 root root 169 Nov 25 21:35 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map
-rw-r--r--. 1 root root 167 Nov 25 21:58 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map

[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"

[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map | jq .config.digest
"sha256:ea548ddc17674629c97f7472cb5fe62f5495208bcf2f27d5918bd9a8e7c9833f"
[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"

This shows that if I were to deploy an overcloud now with tag 16.1 (not even an update, but a fresh install), it would still pull a 16.1.4 image instead of a 16.1.6 one.
Ok I'll reopen and look when I get a chance. There was an issue with the tags not being properly managed because the metadata wasn't correctly fetched every time. I'll try and duplicate this. That being said, it's not really recommended to use 16.1 unless you are following the latest of 16.1 always. The referenced bz is missing references to changes around image id comparisons which were handled via https://review.opendev.org/q/topic:%22bug%252F1895974-stable%252Ftrain%22+(status:open%20OR%20status:merged)
(In reply to Alex Schultz from comment #11)
> it's not really recommended to use 16.1 unless you are following the latest of 16.1 always.

Yes, that is why my customer is changing their container image prepare file to use tag 16.1 instead of 16.1.x. They ran into this issue as part of that change, but from now on they will keep the 16.1 tag and just pull the latest.
I am unable to reproduce this. I just set up a 16.1.4 undercloud. Then I ran openstack tripleo container image prepare after switching from tag: '16.1.4' to tag: '16.1'. I then looked at the type-map for openstack-cron, which had been updated to point to a different container.

[cloud-user@undercloud manifests]$ cat 16.1.type-map
URI: 16.1
Content-Type: application/vnd.docker.distribution.manifest.v2+json
URI: sha256:de32ea21c4637013c63b95e7289dcf531e0c315f20ace5e0802fcfcb00017470/index.json
[cloud-user@undercloud manifests]$ cat 16.1.4.type-map
URI: 16.1.4
Content-Type: application/vnd.docker.distribution.manifest.v2+json

I even tried copying 16.1.4.type-map over 16.1.type-map and rerunning, to see whether the file fails to get updated. It was updated with the same content.
Actually I think I've reproduced it. I'll continue to dig deeper.
So the content is being updated, and technically the files on disk are correct. What appears to be happening is that the way Apache interprets the tag URLs causes the 16.1.4 metadata to be served for the 16.1 tag. I'm looking into how we can address this mismatch. For now, the workaround is to make sure that you don't have 16.1.x tags in the registry when using 16.1.
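To see which images are in the state described here, the registry directory can be scanned for manifests directories that hold both a 16.1 type-map and a 16.1.x type-map. This is a rough sketch: find_coexisting_tags is a made-up name, the directory layout is the one shown in the `ll` output earlier in this bug, and the scan only detects that the two tags coexist. It does not model how Apache actually picks the file.

```shell
# Hypothetical helper: list every manifests directory under <root> that
# contains <tag>.type-map together with at least one
# <tag>.<something>.type-map (for example 16.1 next to 16.1.4).
find_coexisting_tags() (
    root="$1"
    tag="$2"
    find "$root" -type f -name "${tag}.type-map" | sort | while read -r f; do
        dir="$(dirname "$f")"
        for other in "$dir/${tag}."*.type-map; do
            # An unmatched glob stays literal, so skip it.
            [ -e "$other" ] || continue
            echo "${dir}: $(basename "$other" .type-map) coexists with ${tag}"
        done
    done
)

# Possible use on the undercloud:
#   find_coexisting_tags /var/lib/image-serve/v2 16.1
```

An empty result would suggest that no 16.1.x tags remain next to 16.1, which is the condition the workaround aims for.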
(In reply to Alex Schultz from comment #15)
> So the content is being updated and technically the files on disk are
> correct. What appears to be happening is that the way the tag urls are
> being interpreted by apache is causing the 16.1.4 metadata to be provided
> for the 16.1 tag. I'm looking into how we can address this mismatch. For
> now the workaround would be to make sure that you don't have 16.1.x tags
> when using 16.1.

I agree. That's why I said the issue is with the Apache MultiViews handler. Applying the workaround in comment #1 _before_ starting the overcloud update/deploy is enough for the job to complete successfully afterwards.
Is it too early to ask which z-stream this fix is going to be included in?
Procedure used was:

1. Deploy the undercloud with a puddle that has the fix.
2. Run openstack tripleo container image prepare with a tag of 16.1.7.
3. podman pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator
4. Run openstack tripleo container image prepare with a tag of 16.1.
5. podman pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator

Saw that the 16.1 and 16.1.7 IDs were different:

(undercloud) [stack@undercloud-0 ~]$ sudo podman images | grep aodh
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator   16.1     969a82fce921   5 hours ago    743 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator   16.1.7   b21f7139c745   2 months ago   743 MB
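The final check in this verification (two tags of the same repository must resolve to different image IDs) can be automated along these lines. This is a sketch under assumptions: image_id_for and tags_have_distinct_ids are made-up names, and the parsing assumes the default `podman images` column order shown above (repository, tag, image ID).

```shell
# Hypothetical helper: read `podman images` output on stdin and print the
# image ID (3rd column) for the given repository and tag.
# Appending "" forces a string comparison, so tag 16.1 is not treated as
# the number 16.1 (which would also match a tag such as 16.10).
image_id_for() {
    awk -v repo="$1" -v tag="$2" \
        '($1 "") == (repo "") && ($2 "") == (tag "") { print $3 }'
}

# Exit status 0 only when both tags are present in the listing on stdin
# and their image IDs differ.
tags_have_distinct_ids() (
    listing="$(cat)"
    a="$(printf '%s\n' "$listing" | image_id_for "$1" "$2")"
    b="$(printf '%s\n' "$listing" | image_id_for "$1" "$3")"
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
)

# Possible use:
#   sudo podman images | tags_have_distinct_ids \
#     undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator \
#     16.1 16.1.7 && echo "tags point to different images"
```

Fed the listing from the original description of this bug, where 16.1 and 16.1.4 share one image ID, the same check would fail.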
Is this bug also affecting 16.2? I don't have a 16.2 environment available to verify this myself right now.
Yes it impacts 16.2. Bug 2028962 is for 16.2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0986