Bug 2026654 - Undercloud fails to provide correct images to overcloud when updating tags from 16.1.x to 16.1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks: 2028962
Reported: 2021-11-25 12:25 UTC by Eric Nothen
Modified: 2022-03-24 11:02 UTC (History)

Fixed In Version: tripleo-ansible-0.5.1-1.20211220033343.902c3c8.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2028962 (view as bug list)
Environment:
Last Closed: 2022-03-24 11:02:18 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1953198 0 None None None 2021-12-03 20:13:50 UTC
OpenStack gerrit 820594 0 None NEW Disable image-serve MultiViews 2021-12-06 21:35:51 UTC
Red Hat Issue Tracker OSP-11049 0 None None None 2021-11-25 12:26:59 UTC
Red Hat Product Errata RHBA-2022:0986 0 None None None 2022-03-24 11:02:39 UTC

Description Eric Nothen 2021-11-25 12:25:27 UTC
Description of problem:
When updating the overcloud, the undercloud registry fails to provide the correct images, if the following conditions are met:

1. Used containers-prepare-parameter.yaml with 16.1.x tag (with x < 6, as of this writing) to perform the initial deployment of RHOSP
2. Used containers-prepare-parameter.yaml with 16.1 tag when preparing the update

This causes the MultiViews handler in Apache to always return the older image, even if the 16.1 tag is specified:

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1         d6480804c4d2   8 months ago   743 MB
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1.4       d6480804c4d2   8 months ago   743 MB


How reproducible:
Always reproducible by pulling images from CDN to prepare overcloud upgrade, if tag 16.1.4 is used first, and then 16.1.

Steps to Reproduce:
1. Deploy undercloud and overcloud using tag 16.1.4 on the file containers-prepare-parameter.yaml

$ egrep -B1 tag_from_label containers-prepare-parameter.yaml
      tag: '16.1.4'
    tag_from_label: '{version}-{release}'

2. Change tag from 16.1.4 to 16.1, in preparation for update, and as documented on [1]

$ sed -i 's/16.1.4/16.1/' containers-prepare-parameter.yaml
$ egrep -B1 tag_from_label containers-prepare-parameter.yaml
      tag: '16.1'
    tag_from_label: '{version}-{release}'

3. Update undercloud image registry

$ sudo openstack tripleo container image prepare -e containers-prepare-parameter.yaml 

4. Verify, for a given image, that the version with tag 16.1 has been populated in the undercloud registry (along with the older one):

[stack@director ~]$ cat /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/tags/list  | jq .
{
  "name": "rhosp-rhel8/openstack-aodh-listener",
  "tags": [
    "16.1.4",
    "16.1"
  ]
}
[stack@director ~]$ ll /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/*.type-map
-rw-r--r--. 1 root root 169 Nov 25 11:01 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map
-rw-r--r--. 1 root root 167 Nov 25 11:40 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map
[stack@director ~]$ 

5. On one of the controllers, check for the current version of the same image:

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1.4       d6480804c4d2   8 months ago   743 MB
[root@controller1-161 ~]# 


6. Pull the image using the tag 16.1:

[root@controller1-161 ~]# podman pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1
Trying to pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1...
Getting image source signatures
Copying blob ea462113cc1f skipped: already exists
Copying blob fe9c8cc943e3 skipped: already exists
Copying blob 4422ea938fce skipped: already exists
Copying blob 00f18994d663 skipped: already exists
Copying blob d8ec3e673e1e skipped: already exists
Copying blob 505209649de3 skipped: already exists
Copying config d6480804c4 done
Writing manifest to image destination
Storing signatures
d6480804c4d2cb600479b46f92f728db804410d68c264566dd5d3859a21bd750
[root@controller1-161 ~]# 


Actual results:

As a result, the controller has pulled the same image it already had, and added the tag 16.1 to it:

[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1         d6480804c4d2   8 months ago   743 MB
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1.4       d6480804c4d2   8 months ago   743 MB
[root@controller1-161 ~]# 


When using curl to check the image manifests, both tags show the same config digest:

[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@controller1-161 ~]# 


However, the Apache type-map files correctly show different digests:

[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map | jq .config.digest
"sha256:ea548ddc17674629c97f7472cb5fe62f5495208bcf2f27d5918bd9a8e7c9833f"
[root@controller1-161 ~]# curl -s http://director.ctlplane.localdomain:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@controller1-161 ~]# 

Therefore, the type-map files have the correct content, but Apache is always using the older one.
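
The comparison above can be scripted. The following is a minimal sketch, assuming curl and jq are available as in the commands above. The names fetch_digest, check_tag and the REGISTRY variable are hypothetical helpers, not part of any TripleO tooling. It relies on the behaviour observed in this report, namely that requesting <tag>.type-map returns the manifest the tag is supposed to resolve to.

```shell
# Registry base URL; the default is the host used in this report (adjust as needed).
REGISTRY="${REGISTRY:-http://director.ctlplane.localdomain:8787}"

# Print the config digest of the manifest served for a reference.
# $1 = image path (e.g. rhosp-rhel8/openstack-aodh-listener), $2 = reference
fetch_digest() {
    curl -s "${REGISTRY}/v2/$1/manifests/$2" | jq -r .config.digest
}

# Compare what is served for a tag with what its type-map file points to.
# Prints "OK <tag>" and returns 0 on a match; prints "MISMATCH ..." and returns 1 otherwise.
check_tag() {
    local image="$1" tag="$2" served mapped
    served="$(fetch_digest "$image" "$tag")"
    mapped="$(fetch_digest "$image" "${tag}.type-map")"
    if [ -n "$served" ] && [ "$served" = "$mapped" ]; then
        echo "OK ${tag}"
    else
        echo "MISMATCH ${tag}: served=${served} type-map=${mapped}"
        return 1
    fi
}
```

For example, "check_tag rhosp-rhel8/openstack-aodh-listener 16.1" would report a mismatch with the digests shown above, since the served digest and the type-map digest differ for that tag.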


Expected results:

Apache type-map should correctly map the GET request of */16.1, to the type-map file */16.1.type-map, instead of matching 16.1.4.type-map, so that podman can pull the right image.


Additional info:


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/keeping_red_hat_openstack_platform_updated/index#updating-your-container-image-preparation-file_keeping-updated

Comment 1 Eric Nothen 2021-11-25 14:36:10 UTC
Workaround: 
Before pulling the new images, remove the old ones from the undercloud registry so that when the overcloud nodes pull images, only the ones with tag 16.1 are available.

$ openstack tripleo container image list -c "Image Name" -f value | awk '/16\.1\.4$/' | while read -r image; do echo "Deleting ${image}..."; sudo openstack tripleo container image delete -y "$image"; done


[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1.4       d6480804c4d2   8 months ago   743 MB
[root@controller1-161 ~]# 
[root@controller1-161 ~]# podman pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1
Trying to pull director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator:16.1...
Getting image source signatures
Copying blob 17cb0a75ad3b skipped: already exists
Copying blob a2070ce838f3 skipped: already exists
Copying blob 2cae32376095 skipped: already exists
Copying blob c30189fd3718 skipped: already exists
Copying blob 328bf4e7fd60 done
Copying blob 20ac4e1e3488 done
Copying config cef243fa5a done
Writing manifest to image destination
Storing signatures
cef243fa5a06744cc03c27c7f74ea7238540dd328586b327546ea6c722c9529c
[root@controller1-161 ~]# 
[root@controller1-161 ~]# podman images |grep aodh
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1         cef243fa5a06   6 weeks ago    743 MB
director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-aodh-evaluator       16.1.4       d6480804c4d2   8 months ago   743 MB
[root@controller1-161 ~]#
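
Before deleting anything, it may help to preview which entries the workaround would remove. A minimal sketch follows. select_tag is a hypothetical helper that reads image names on standard input and prints only those ending in ":<tag>". It uses a literal suffix comparison, so the dots in the tag are not treated as regex wildcards. It assumes each input line ends with ":<tag>", as image references generally do.

```shell
# Print only the input lines whose tag (the text after the last ':') equals $1.
select_tag() {
    local tag="$1" line
    while IFS= read -r line; do
        case "$line" in
            # The quoted part of the pattern is matched literally.
            *":$tag") printf '%s\n' "$line" ;;
        esac
    done
}
```

Usage, as a dry run: openstack tripleo container image list -c "Image Name" -f value | select_tag 16.1.4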

Comment 2 Eric Nothen 2021-11-25 14:58:25 UTC
Also worth mentioning that while the undercloud registry is providing the wrong image, the overcloud upgrade fails. After cleaning the registry with the workaround provided above, the overcloud update completes as expected.

Comment 3 Alex Schultz 2021-11-25 18:04:50 UTC
Please provide the container image prepare logs (run with --debug).  It should be noted that podman images does not list the contents of the image-serve registry. You would need to use `openstack tripleo container image list` to view the contents.

Comment 8 Eric Nothen 2021-11-26 07:08:52 UTC
Attached the requested logs

Comment 9 Alex Schultz 2021-11-29 16:04:05 UTC
This is a bug in z4 that is fixed in z6. Please upgrade. See Bug 1941412

*** This bug has been marked as a duplicate of bug 1941412 ***

Comment 10 Eric Nothen 2021-12-01 21:12:23 UTC
(In reply to Alex Schultz from comment #9)
> This is a bug in z4 that is fixed in z6. Please upgrade. See Bug 1941412
> 
> *** This bug has been marked as a duplicate of bug 1941412 ***

I don't see how the problem described here is related to the bug for which this BZ was marked as duplicate. To start with, my test undercloud is already at 16.1.6, and I'm using latest packages available to deploy the overcloud:

[root@director ~]# cat /etc/rhosp-release 
Red Hat OpenStack Platform release 16.1.6 GA (Train)
[root@director ~]# 
[root@director ~]# yum check-update
Updating Subscription Management repositories.
/usr/lib/python3.6/site-packages/dateutil/parser/_parser.py:70: UnicodeWarning: decode() called on unicode string, see https://bugzilla.redhat.com/show_bug.cgi?id=1693751
  instream = instream.decode()

Advanced Virtualization for RHEL 8 x86_64 (RPMs)                                                                                                                                                                              23 kB/s | 2.8 kB     00:00    
Red Hat Enterprise Linux 8 for x86_64 - BaseOS - Extended Update Support (RPMs)                                                                                                                                               20 kB/s | 2.4 kB     00:00    
Fast Datapath for RHEL 8 x86_64 (RPMs)                                                                                                                                                                                        21 kB/s | 2.4 kB     00:00    
Red Hat Enterprise Linux 8 for x86_64 - AppStream - Extended Update Support (RPMs)                                                                                                                                            25 kB/s | 2.8 kB     00:00    
Red Hat Ansible Engine 2.9 for RHEL 8 x86_64 (RPMs)                                                                                                                                                                           22 kB/s | 2.3 kB     00:00    
Red Hat Enterprise Linux 8 for x86_64 - High Availability - Extended Update Support (RPMs)                                                                                                                                    22 kB/s | 2.4 kB     00:00    
Red Hat OpenStack Platform 16.1 for RHEL 8 x86_64 (RPMs)                                                                                                                                                                      19 kB/s | 2.4 kB     00:00    
[root@director ~]# 


On top of that, I don't even need to deploy the overcloud to notice that there's going to be a problem when doing that:


[root@director ~]# ll /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/*.type-map
-rw-r--r--. 1 root root 169 Nov 25 21:35 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map
-rw-r--r--. 1 root root 167 Nov 25 21:58 /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map
[root@director ~]# 
[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4 | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@director ~]# 
[root@director ~]# 
[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.type-map | jq .config.digest
"sha256:ea548ddc17674629c97f7472cb5fe62f5495208bcf2f27d5918bd9a8e7c9833f"
[root@director ~]# curl -s http://192.168.24.1:8787/v2/rhosp-rhel8/openstack-aodh-listener/manifests/16.1.4.type-map | jq .config.digest
"sha256:6d6130ebb9f4c6b0ad2b1d9a29c9d2fa84df24afc9b4744e5232f75346cd4273"
[root@director ~]# 


That shows that if I were to deploy an overcloud now with tag 16.1 (not even saying an update, but a fresh install), it would still pull an image of 16.1.4 instead of 16.1.6.

Comment 11 Alex Schultz 2021-12-01 22:45:00 UTC
Ok I'll reopen and look when I get a chance. There was an issue with the tags not being properly managed because the metadata wasn't correctly fetched every time. I'll try and duplicate this. That being said, it's not really recommended to use 16.1 unless you are following the latest of 16.1 always.


The referenced bz is missing references to changes around image id comparisons which were handled via https://review.opendev.org/q/topic:%22bug%252F1895974-stable%252Ftrain%22+(status:open%20OR%20status:merged)

Comment 12 Eric Nothen 2021-12-02 11:53:27 UTC
(In reply to Alex Schultz from comment #11)
> it's not really recommended to use 16.1 unless you are following the latest of 16.1 always.
> 

Yes, that is why my customer is changing their container image prepare file to use tag 16.1 instead of 16.1.x. They ran into this issue now as part of the change, but from now on they will keep the 16.1 tag and just pull the latest.

Comment 13 Alex Schultz 2021-12-02 17:00:08 UTC
I am unable to reproduce this. I just set up a 16.1.4 undercloud. Then I proceeded to run openstack tripleo container image prepare with a switch to tag: '16.1' from tag: '16.1.4'.  I then looked at the type-map for openstack-cron which was updated to a different container.

[cloud-user@undercloud manifests]$ cat 16.1.type-map 
URI: 16.1

Content-Type: application/vnd.docker.distribution.manifest.v2+json
URI: sha256:de32ea21c4637013c63b95e7289dcf531e0c315f20ace5e0802fcfcb00017470/index.json

[cloud-user@undercloud manifests]$ cat 16.1.4.type-map 
URI: 16.1.4

Content-Type: application/vnd.docker.distribution.manifest.v2+json


I even tried copying the 16.1.4.type-map over to 16.1.type-map and rerunning to see if the file doesn't get updated. It was updated with the same content.

Comment 14 Alex Schultz 2021-12-02 17:11:18 UTC
Actually I think I've reproduced it. I'll continue to dig deeper.

Comment 15 Alex Schultz 2021-12-03 15:19:03 UTC
So the content is being updated and technically the files on disk are correct.  What appears to be happening is that the way the tag URLs are being interpreted by Apache is causing the 16.1.4 metadata to be provided for the 16.1 tag.  I'm looking into how we can address this mismatch.  For now the workaround would be to make sure that you don't have 16.1.x tags when using 16.1.
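
One way to see why the tag names can collide: as described in the Apache content negotiation documentation, when MultiViews is enabled and the requested file does not exist, the server considers files named "<request>.*" as candidates. The sketch below lists the files that such a pattern would match in a manifests directory. It only illustrates the naming overlap and is not Apache's actual selection logic. The function name negotiation_candidates is hypothetical.

```shell
# List the files in a directory whose names match "<tag>.*".
# $1 = manifests directory, $2 = tag
negotiation_candidates() {
    local dir="$1" tag="$2" f
    for f in "$dir/$tag".*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        basename "$f"
    done
}
```

Usage: negotiation_candidates /var/lib/image-serve/v2/rhosp-rhel8/openstack-aodh-listener/manifests 16.1

With the two type-map files listed in this report, the pattern for 16.1 matches both 16.1.4.type-map and 16.1.type-map, while the pattern for 16.1.4 matches only its own file.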

Comment 16 Eric Nothen 2021-12-03 15:29:26 UTC
(In reply to Alex Schultz from comment #15)
> So the content is being updated and technically the files on disk are
> correct.  What appears to be happening is that the way the tag urls are
> being interpreted by apache is causing the 16.1.4 metadata to be provided
> for the 16.1 tag.  I'm looking into how we can address this mismatch.  For
> now the workaround would be to make sure that you don't have 16.1.x tags
> when using 16.1.

I agree. That's why I said the issue is with the Apache MultiViews handler. Applying the workaround in comment #1 _before_ starting the overcloud update/deploy is enough for the job to complete successfully afterwards.

Comment 17 Eric Nothen 2022-01-12 08:40:53 UTC
Is it too early to ask in which z-stream this fix is going to be included?

Comment 20 David Rosenfeld 2022-02-01 18:23:40 UTC
Procedure used was:

deploy undercloud with puddle that has fix
openstack tripleo container image prepare with a tag of 16.1.7
podman pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator
openstack tripleo container image prepare with a tag of 16.1
podman pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator

Saw that 16.1 and 16.1.7 ids were different:

(undercloud) [stack@undercloud-0 ~]$ sudo podman images | grep aodh
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator              16.1                     969a82fce921   5 hours ago    743 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator              16.1.7                   b21f7139c745   2 months ago   743 MB
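
The check in this procedure can also be scripted. The sketch below reads podman images output on standard input and reports whether two tags of a repository resolve to different image IDs. distinct_ids is a hypothetical helper. It assumes the default column order shown above (repository, tag, image ID).

```shell
# $1 = repository, $2 and $3 = tags to compare.
# Returns 0 if both tags are present with different IDs, 1 otherwise.
distinct_ids() {
    awk -v repo="$1" -v a="$2" -v b="$3" '
        # Appending "" forces string comparison, so 16.1 is not compared as a number.
        $1 == repo && ($2 "") == (a "") { ida = $3 }
        $1 == repo && ($2 "") == (b "") { idb = $3 }
        END {
            if (ida == "" || idb == "") { print "missing tag"; exit 1 }
            if (ida == idb) { print "same image: " ida; exit 1 }
            print "different images: " ida " " idb
        }'
}
```

Usage: sudo podman images | distinct_ids undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-aodh-evaluator 16.1 16.1.7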

Comment 21 Eric Nothen 2022-02-01 19:40:53 UTC
Is this bug also affecting 16.2? I don't have one to verify myself right now.

Comment 22 Alex Schultz 2022-02-01 19:56:40 UTC
Yes it impacts 16.2. Bug 2028962 is for 16.2

Comment 29 errata-xmlrpc 2022-03-24 11:02:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986

