Bug 2213672 - Pulling openstack images from undercloud registry fails with 404 when AdditionalArchitectures is set and tag is 16.2
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.2 (Train)
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: James Slagle
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-09 00:45 UTC by yatanaka
Modified: 2024-03-07 12:21 UTC (History)

Fixed In Version: openstack-tripleo-common-11.7.1-2.20230809225404.e189622.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-07 12:21:59 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 886179 0 None NEW Only modify the container manifest if the mediaType has changed. 2023-06-28 14:12:46 UTC
OpenStack gerrit 888026 0 None MERGED Fix unassigned new_manifest_type variable 2023-07-11 13:20:21 UTC
Red Hat Issue Tracker OSP-25703 0 None None None 2023-06-09 00:47:08 UTC

Description yatanaka 2023-06-09 00:45:49 UTC
Description of problem:

I configured `AdditionalArchitectures: [ppc64le]` and `tag: '16.2'` in my containers-prepare-parameter.yaml.

~~~
(undercloud) [stack@undercloud ~]$ cat  containers-prepare-parameter.yaml
parameter_defaults:
  AdditionalArchitectures: [ppc64le]
  ContainerImagePrepare:
  - push_destination: true
    excludes:
      - ceph
      - prometheus
    set:
      name_prefix: openstack-
      name_suffix: ''
      namespace: registry.redhat.io/rhosp-rhel8
      neutron_driver: ovn
      rhel_containers: false
      tag: '16.2'
    tag_from_label: '{version}-{release}'
  ContainerImageRegistryCredentials:
    registry.redhat.io:
       :
~~~

Then I ran the following command to pull images from registry.redhat.io and push them to the undercloud registry.
This command succeeded without errors.

~~~
(undercloud) [stack@undercloud ~]$ sudo openstack tripleo container image prepare -e ~/containers-prepare-parameter.yaml
~~~

`openstack tripleo container image list` shows all containers correctly.

~~~
(undercloud) [stack@undercloud ~]$ openstack tripleo container image list |grep nova
| docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-conductor:16.2             |
| docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-libvirt:16.2               |
| docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-novncproxy:16.2            |
| docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-api:16.2                   |
| docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-compute-ironic:16.2        |
| docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-compute:16.2               |
| docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-scheduler:16.2             |
~~~

However, podman pull fails with 404 error.

~~~
(undercloud) [stack@undercloud ~]$ podman  pull undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-api:16.2
Trying to pull undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-api:16.2...
  StatusCode: 404, <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">...
Error: Error initializing source docker://undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-api:16.2: Error reading manifest 16.2 in undercloud.ctlplane.yatanaka.example.com:8787/rhosp-rhel8/openstack-nova-api: StatusCode: 404, <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">...
~~~

Curl command fails with 404 as well.

~~~
(undercloud) [stack@undercloud ~]$ curl http://undercloud.ctlplane.yatanaka.example.com:8787/v2/rhosp-rhel8/openstack-nova-api/manifests/16.2 -i
HTTP/1.1 404 Not Found
Date: Fri, 09 Jun 2023 00:35:06 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux)
Vary: accept
Content-Length: 196
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>
~~~

The reason why it returns 404 is that one of the manifest files doesn't exist under /var/lib/image-serve/v2/rhosp-rhel8 for some reason.

~~~
(undercloud) [stack@undercloud ~]$ cat /var/lib/image-serve/v2/rhosp-rhel8/openstack-nova-api/manifests/16.2.type-map 
URI: 16.2

Content-Type: application/vnd.docker.distribution.manifest.list.v2+json
URI: sha256:5fae92dbe76af9ce17d3c9909440e62a940128b1609a1f5a52221cc9880f7aa6/index.json

Content-Type: application/vnd.docker.distribution.manifest.v2+json
URI: sha256:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2/index.json


(undercloud) [stack@undercloud ~]$ cat /var/lib/image-serve/v2/rhosp-rhel8/openstack-nova-api/manifests/sha256\:5fae92dbe76af9ce17d3c9909440e62a940128b1609a1f5a52221cc9880f7aa6/index.json
{
   "manifests": [
      {
         "digest": "sha256:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2",
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         },
         "size": 1091
      },
      {
         "digest": "sha256:def71eb21fca7fab86c0a815940fe6a5424f5bff3ea10a259ba7b872795cb832",
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         },
         "size": 1091
      }
   ],
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "schemaVersion": 2
}


(undercloud) [stack@undercloud ~]$ cat /var/lib/image-serve/v2/rhosp-rhel8/openstack-nova-api/manifests/sha256\:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2/index.json
cat: '/var/lib/image-serve/v2/rhosp-rhel8/openstack-nova-api/manifests/sha256:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2/index.json': No such file or directory

  ====> This file doesn't exist for some reason.
~~~
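
The check above can be repeated for every image with a small script. This is only a sketch: the helper name `dangling_variants` is hypothetical, the digests in the demo are shortened placeholders, and the parsing assumes the simple type-map layout shown above (records separated by blank lines, variant records carrying both `Content-Type` and `URI`).

```python
import re
import tempfile
from pathlib import Path


def dangling_variants(type_map_path):
    """Return variant URIs listed in a type-map whose target files are missing."""
    type_map_path = Path(type_map_path)
    missing = []
    # Records in the type-map are separated by blank lines.
    for record in re.split(r"\n\s*\n", type_map_path.read_text()):
        headers = {}
        for line in record.splitlines():
            if ":" in line:
                # Split on the first colon only, because digests contain colons too.
                key, value = line.split(":", 1)
                headers[key.strip().lower()] = value.strip()
        # The first record only names the resource; variants also have a Content-Type.
        if "content-type" in headers and "uri" in headers:
            if not (type_map_path.parent / headers["uri"]).is_file():
                missing.append(headers["uri"])
    return missing


# Demo with placeholder digests: only the "aaaa" manifest exists on disk.
with tempfile.TemporaryDirectory() as tmp:
    manifests = Path(tmp)
    (manifests / "sha256:aaaa").mkdir()
    (manifests / "sha256:aaaa" / "index.json").write_text("{}")
    type_map = manifests / "16.2.type-map"
    type_map.write_text(
        "URI: 16.2\n\n"
        "Content-Type: application/vnd.docker.distribution.manifest.list.v2+json\n"
        "URI: sha256:aaaa/index.json\n\n"
        "Content-Type: application/vnd.docker.distribution.manifest.v2+json\n"
        "URI: sha256:bbbb/index.json\n\n"
    )
    missing = dangling_variants(type_map)

print(missing)
```

On a real undercloud the function would be pointed at a file such as `/var/lib/image-serve/v2/rhosp-rhel8/openstack-nova-api/manifests/16.2.type-map`; any URI it returns is a manifest the registry will answer with 404.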

I modified containers-prepare-parameter.yaml several times and found that this issue occurs under the following conditions.
Interestingly, the issue doesn't occur when we set `tag: '16.2.2'`, but it does occur when we set `tag: '16.2.5'`.

- `AdditionalArchitectures: [ppc64le]` and `tag: '16.2'`   are set : This issue occurs.
- `AdditionalArchitectures: [ppc64le]` and `tag: '16.2.5'` are set : This issue occurs.
- `AdditionalArchitectures: [ppc64le]` is set and `tag` is not set : This issue occurs.
- `AdditionalArchitectures: [ppc64le]` and `tag: '16.2.2'` are set : This issue does NOT occur.
- `AdditionalArchitectures` is not set and `tag: '16.2'`   is  set : This issue does NOT occur.

This issue causes failure of `openstack undercloud upgrade`.
My customer is using both x86_64 and ppc64le architecture and testing update from RHOSP 16.2.2 to RHOSP 16.2.5.
During this update, they modified `tag` from `16.2.2` to `16.2` according to our document[1], then the customer hit this issue.
This issue is preventing the customer's update.

I'm looking for workarounds and resolutions.


Version-Release number of selected component (if applicable):
RHOSP 16.2.5

How reproducible:

Steps to Reproduce:
1. Create containers-prepare-parameter.yaml file with  `AdditionalArchitectures: [ppc64le]` and `tag: '16.2'`
2. Run `sudo openstack tripleo container image prepare -e ~/containers-prepare-parameter.yaml`
3. Run `podman pull`


Actual results:
`podman pull` and `openstack undercloud upgrade` fail with 404 error.


Expected results:
`podman pull` and `openstack undercloud upgrade` succeed.


Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/keeping_red_hat_openstack_platform_updated/index#proc_updating-the-container-image-preparation-file_preparing-minor-update

Comment 1 James Slagle 2023-06-09 21:10:51 UTC
This looks like it might be an issue with newlines (or their absence) in the manifests.

For the nova-api image:

16.2.2 (working):

(Epdb) pp source_manifests[1]
('{\n'
 '   "schemaVersion": 2,\n'
 '   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",\n'
 '   "config": {\n'
 '      "mediaType": "application/vnd.docker.container.image.v1+json",\n'
 '      "size": 3950,\n'
 '      "digest": '
 '"sha256:d918bf36f953cf0f7b7e839c6a82727a7f27192518e40ad8bf01b48aac02a86c"\n'
 '   },\n'
 '   "layers": [\n'
 '      {\n'
<SNIP>

16.2 and 16.2.5 (same tag) (NOT working):

(Epdb) pp source_manifests[1]
'{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","size":29012,"digest":"sha256:884b5d198e00c080612fc240d7f4df98e5eb684836b95135f7966614397f4b81"},"layers":[{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":82259072,"digest":"sha256:7d8746ab4ad8e51637d5a71d17e876a7432c1bb34ba3db89985a0cb9500cd64b"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":65628380,"digest":"sha256:dae5741d9411be6640fff62ffaefcf3077e5eb8a60998286bd728acc322c8760"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":41926212,"digest":"sha256:77afff4e43dbf2d3795338554f881e34d3dc3d80b009747d4c7e14374ce3020b"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":131758913,"digest":"sha256:df7055066620be8600a171dd89a2fc86cf540d6755c5a588afb7c53cd9427709"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":57755329,"digest":"sha256:c19267f5f72085dce6690dc156ffaf9913fe3211647441ac4bc34c20b94e19d7"}]}'


This has the effect of changing the manifest digest when we json.loads, then json.dumps with indent=3. We could possibly fix this in the code, but I think it's also worth looking into why the manifest formats have changed. Tagging DFG:RelDel as well.
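
To illustrate the effect, here is a minimal Python sketch (not the tripleo-common code itself) that hashes two tiny made-up manifests before and after a `json.loads` / `json.dumps(..., indent=3)` round trip. The helper names `digest` and `reformat` and the shortened manifests are assumptions for illustration only.

```python
import hashlib
import json


def digest(text):
    """Registry-style digest of the exact manifest bytes."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def reformat(text):
    """Parse the manifest and write it back out with indent=3."""
    return json.dumps(json.loads(text), indent=3)


# Already formatted with a 3-space indent, like the 16.2.2 manifest.
pretty = (
    '{\n'
    '   "schemaVersion": 2,\n'
    '   "mediaType": "application/vnd.docker.distribution.manifest.v2+json"\n'
    '}'
)
# Same content on a single line, like the 16.2.5 manifest.
compact = '{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json"}'

pretty_stable = digest(pretty) == digest(reformat(pretty))
compact_stable = digest(compact) == digest(reformat(compact))

print("pretty manifest keeps its digest:", pretty_stable)
print("compact manifest keeps its digest:", compact_stable)
```

The digest is computed over the raw bytes, so the compact manifest gets a new digest after reformatting even though the parsed JSON is identical.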

Comment 2 James Slagle 2023-06-12 18:24:31 UTC
Ideally, I think we need to get the manifests to have a consistent JSON format. I think that would solve this issue.

Otherwise, we could avoid having to modify the manifest as long as we detect that the mediaType does *not* need to change from an OCI type to a docker type. That code is here:
https://github.com/openstack/tripleo-common/blob/stable/train/tripleo_common/image/image_uploader.py#L2129
That function is where we use json.loads/dumps(...,indent=3) to read and parse the manifest and convert it back to JSON. If we do not need to modify anything, then we could use the original value instead of writing it back out in a new format.

Another fix might be to recalculate the digests and use those new values in the type-map file we write out, although that would be more involved.
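
A rough sketch of the two ideas above, not the actual patch: keep the original manifest string when the mediaType is already a docker type, and only when a rewrite is needed compute the digest from the rewritten text. The function name `prepare_manifest` is hypothetical, and converting only the top-level mediaType is a simplifying assumption.

```python
import hashlib
import json

DOCKER_V2 = "application/vnd.docker.distribution.manifest.v2+json"
OCI_V1 = "application/vnd.oci.image.manifest.v1+json"


def prepare_manifest(manifest_str):
    """Return (manifest_str, digest), rewriting only if the mediaType must change."""
    manifest = json.loads(manifest_str)
    if manifest.get("mediaType") == OCI_V1:
        # A rewrite is unavoidable here, so the text (and its digest) changes.
        manifest["mediaType"] = DOCKER_V2
        manifest_str = json.dumps(manifest, indent=3)
    # Otherwise manifest_str is untouched and the upstream digest stays valid.
    # Either way, the digest is derived from the exact text that will be written.
    digest = "sha256:" + hashlib.sha256(manifest_str.encode("utf-8")).hexdigest()
    return manifest_str, digest


compact = '{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json"}'
text, new_digest = prepare_manifest(compact)
print(text == compact, new_digest)
```

Because the digest always comes from the text that is actually stored, a type-map entry built from the returned digest would point at a directory that exists.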

I've attached a script which illustrates the issue. Running it shows:

[stack@osp16-uc ~]$ TAG=16.2.2 PASSWORD=<pass> ./compare.sh
Fetched SHA256: ba5d1d36bd6f284e33fa02f08438274d2121e79c7a7ac321ffe2eda2242a3b6d                                                       
New SHA256 after JSON reformat: ba5d1d36bd6f284e33fa02f08438274d2121e79c7a7ac321ffe2eda2242a3b6d                                       
SHA256 equal
[stack@osp16-uc ~]$ TAG=16.2.5 PASSWORD=<pass> ./compare.sh                                                                          
Fetched SHA256: 4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2                                                       
New SHA256 after JSON reformat: d8d0976b7b1c0625e981586ffc40e0f0565cef8776cf548dff1041a5490c3eac                                       
SHA256 not equal

Comment 4 Jon Schlueter 2023-06-13 11:27:23 UTC
With the attached compare.sh, it looks like this also happens for x86_64 images, so it is not limited to ppc64le.

Comment 5 Jon Schlueter 2023-06-13 19:10:03 UTC
After a bit of digging, it does appear that there are several @sha256 digests for the same image.


After several podman pulls and then a podman inspect, here is a snippet of the details:


$ podman pull registry.redhat.io/rhosp-rhel8/openstack-nova-api:16.2.5-5
$ podman pull registry.redhat.io/rhosp-rhel8/openstack-nova-api:16.2.5
$ podman pull registry.redhat.io/rhosp-rhel8/openstack-nova-api:16.2
$ podman pull registry.redhat.io/rhosp-rhel8/openstack-nova-api@sha256:4f2590d533d9edbf64165577745d5f6f6d1ead7a48c454be3df10f0d434de144
$ podman pull registry.redhat.io/rhosp-rhel8/openstack-nova-api@sha256:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2

$ podman inspect registry.redhat.io/rhosp-rhel8/openstack-nova-api:16.2.5-5

          "NamesHistory": [
               "registry.redhat.io/rhosp-rhel8/openstack-nova-api:16.2.5",
               "registry.redhat.io/rhosp-rhel8/openstack-nova-api:16.2.5-5",
               "registry.redhat.io/rhosp-rhel8/openstack-nova-api@sha256:4f2590d533d9edbf64165577745d5f6f6d1ead7a48c454be3df10f0d434de144",
               "registry.redhat.io/rhosp-rhel8/openstack-nova-api:16.2",
               "registry.redhat.io/rhosp-rhel8/openstack-nova-api@sha256:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2"
          ]

From this it would appear that all of those tags currently point to the same image, which is expected behavior.

I think the issue comes from how the tripleo code inspects the image and tries to fetch/pull/infer the details.

I was able to confirm the difference in behavior between the 16.2.2 vs 16.2.5 tags 

$ TAG=16.2.2 bash compare.sh
Fetched SHA256: ba5d1d36bd6f284e33fa02f08438274d2121e79c7a7ac321ffe2eda2242a3b6d                                                                                                                                                              
New SHA256 after JSON reformat: ba5d1d36bd6f284e33fa02f08438274d2121e79c7a7ac321ffe2eda2242a3b6d                                                                                                                                              
SHA256 equal 

$ TAG=16.2.5 bash compare.sh
Fetched SHA256: 4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2
New SHA256 after JSON reformat: d8d0976b7b1c0625e981586ffc40e0f0565cef8776cf548dff1041a5490c3eac
SHA256 not equal


I wonder if this happened to work before because the manifest was already formatted that way. Now the recalculated digest, while a valid sha256, is not anything that the server could ever generate and provide.

FROM compare.sh script:
# Recalculate sha256 after jq format with indent=3
NEW_SHA256=$(curl -s -k -L https://registry.redhat.io/v2/rhosp-rhel8/openstack-nova-api/manifests/sha256:${SHA256} --user jslagle --oauth2-bearer "${TOKEN}" -H "Accept: application/vnd.docker.distribution.manifest.v2+json" | tee manifest-${TAG}-second | jq -j --indent 3 .| sha256sum | awk '{print $1}')


For 16.2.5, that call returns the following single-line JSON blob:

{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","size":29012,"digest":"sha256:884b5d198e00c080612fc240d7f4df98e5eb684836b95135f7966614397f4b81"},"layers":[{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":82259072,"digest":"sha256:7d8746ab4ad8e51637d5a71d17e876a7432c1bb34ba3db89985a0cb9500cd64b"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":65628380,"digest":"sha256:dae5741d9411be6640fff62ffaefcf3077e5eb8a60998286bd728acc322c8760"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":41926212,"digest":"sha256:77afff4e43dbf2d3795338554f881e34d3dc3d80b009747d4c7e14374ce3020b"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":131758913,"digest":"sha256:df7055066620be8600a171dd89a2fc86cf540d6755c5a588afb7c53cd9427709"},{"mediaType":"application/vnd.docker.image.rootfs.diff.tar.gzip","size":57755329,"digest":"sha256:c19267f5f72085dce6690dc156ffaf9913fe3211647441ac4bc34c20b94e19d7"}]}

Whereas for 16.2.2, that call returns the following formatted JSON blob:
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 3950,
      "digest": "sha256:d918bf36f953cf0f7b7e839c6a82727a7f27192518e40ad8bf01b48aac02a86c"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 80425020,
         "digest": "sha256:f57d6c5e8b75cb7b21b85c774989029ada6655418fcb450df9715555de269958"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1441,
         "digest": "sha256:f58fc8bc088a7e2275a59a08ee52886df8a4c9bece4fec43ac5f65c4a8d21769"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 62838423,
         "digest": "sha256:2a7c1bc2c9dab15b6d27c56fc16e68ba2f7d6508ddf04360d29985c6f8dc7600"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 40220481,
         "digest": "sha256:2d261712f56fea0917e7d2ff49ca1b1b579e5f10432fdf37a7136a90be57bbfd"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 126350566,
         "digest": "sha256:50d4e8d5a81241b8dbcb07e2833c623becf6d4cd1cc9fbf21a3ba4f8639761e3"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 55358257,
         "digest": "sha256:1f58d94f9e0933a8ade63073ba586767445e06ba4e66a98d47c92ff81fddcb3e"
      }
   ]
}

Comment 6 James Slagle 2023-06-15 13:47:38 UTC
I have a partial fix in https://review.opendev.org/c/openstack/tripleo-common/+/886179
That should work if we detect that the mediaType does *not* need to change. It will need to be backported to train/16.

Comment 7 James Slagle 2023-06-15 14:54:27 UTC
With the patch from https://review.opendev.org/c/openstack/tripleo-common/+/886179, 

I get a type-map file that references dirs that exist:

[root@osp1625 manifests]# pwd
/var/lib/image-serve/v2/rhosp-rhel8/openstack-nova-api/manifests
[root@osp1625 manifests]# cat 16.2.5.type-map 
URI: 16.2.5

Content-Type: application/vnd.docker.distribution.manifest.list.v2+json
URI: sha256:4f2590d533d9edbf64165577745d5f6f6d1ead7a48c454be3df10f0d434de144/index.json

Content-Type: application/vnd.docker.distribution.manifest.v2+json
URI: sha256:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2/index.json

[root@osp1625 manifests]# ls
16.2.5.type-map
sha256:4ca9544805c94b64b0c5c7f3a66f2bc7e2af24d9336edd32f3ae1b180e4ccbd2
sha256:4f2590d533d9edbf64165577745d5f6f6d1ead7a48c454be3df10f0d434de144
sha256:def71eb21fca7fab86c0a815940fe6a5424f5bff3ea10a259ba7b872795cb832


My ContainerImagePrepare looks like:

[stack@osp1625 ~]$ cat cip-nova-api.yaml 
parameter_defaults:
  AdditionalArchitectures: [ppc64le]
  ContainerImagePrepare:
  - push_destination: true
    includes:
      - nova-api
    set:
      name_prefix: openstack-
      name_suffix: ''
      namespace: registry.redhat.io/rhosp-rhel8
      neutron_driver: ovn
      rhel_containers: false
      tag: '16.2.5'
    tag_from_label: '{version}-{release}'
  ContainerImageRegistryCredentials:
    registry.redhat.io:
      jslagle: <password>

Comment 11 yatanaka 2024-02-07 01:32:44 UTC
Hello team, thank you for working on this BZ.

I checked the latest python3-tripleo-common-15.4.1-17.1.20230927010819.el9ost.noarch.rpm package and it seems that the fix for this bug is included in the package.
But the status of this BZ ticket is still "POST" and I'm a bit confused.

Has this bug been fixed in the latest release? When will the fix for this bug be released?

Comment 12 yatanaka 2024-02-07 01:49:38 UTC
I also checked the latest RHOSP 16.2.6 package "python3-tripleo-common-11.7.1-2.20230809225404.e189622.el8ost" and it seems that the fix for this bug is included.
Has this bug been fixed in the latest RHOSP 16.2 release?

