Created attachment 1863246 [details]
Command and output from invalid manifest error

Description of problem:

For a disconnected installation, we attempt to mirror the OCP 4.9.23-s390x content, but it consistently fails on image 4.9.23-s390x-machine-os-content with "manifest invalid".

This is the command we issue to push the release images directly to our local registry:

# oc adm -a ${LOCAL_SECRET_JSON} release mirror \
    --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE} \
    --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} \
    --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE} \
    --apply-release-image-signature

The majority of the images are pushed to our local registry, until the following error occurs:

...
sha256:14e66cc7f40e3efba3fd20105ad1f40913d4fd2085ccf04e33cbfeee7127c1b5 bastion:5000/ocp4/openshift4:4.9.23-s390x-kuryr-controller
sha256:14e66cc7f40e3efba3fd20105ad1f40913d4fd2085ccf04e33cbfeee7127c1b5 bastion:5000/ocp4/openshift4:4.9.23-s390x-pod
sha256:14e66cc7f40e3efba3fd20105ad1f40913d4fd2085ccf04e33cbfeee7127c1b5 bastion:5000/ocp4/openshift4:4.9.23-s390x-vsphere-csi-driver
sha256:14e66cc7f40e3efba3fd20105ad1f40913d4fd2085ccf04e33cbfeee7127c1b5 bastion:5000/ocp4/openshift4:4.9.23-s390x-vsphere-csi-driver-operator
sha256:14e66cc7f40e3efba3fd20105ad1f40913d4fd2085ccf04e33cbfeee7127c1b5 bastion:5000/ocp4/openshift4:4.9.23-s390x-vsphere-csi-driver-syncer
sha256:14e66cc7f40e3efba3fd20105ad1f40913d4fd2085ccf04e33cbfeee7127c1b5 bastion:5000/ocp4/openshift4:4.9.23-s390x-vsphere-problem-detector
error: unable to push manifest to bastion:5000/ocp4/openshift4:4.9.23-s390x-machine-os-content: manifest invalid: manifest invalid
info: Mirroring completed in 930ms (0B/s)
error: one or more errors occurred while uploading images

Version-Release number of selected component (if applicable):
OCP 4.9.23

How reproducible:
Consistently reproducible.

Steps to Reproduce:
1. Have a mirror-registry container started and running:

# podman ps
CONTAINER ID  IMAGE                                    COMMAND              CREATED      STATUS          PORTS                   NAMES
f875644decab  docker.io/ibmcom/registry-s390x:2.6.2.5  registry serve /e...  4 hours ago  Up 2 hours ago  0.0.0.0:5000->5000/tcp  mirror-registry

2. Issue the command to push content to this local registry:

# oc adm -a ${LOCAL_SECRET_JSON} release mirror \
    --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE} \
    --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} \
    --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE} \
    --apply-release-image-signature

3. It fails with:

error: unable to push manifest to bastion:5000/ocp4/openshift4:4.9.23-s390x-machine-os-content: manifest invalid: manifest invalid

Actual results:
A "manifest invalid" error occurs for image 4.9.23-s390x-machine-os-content.

Expected results:
All 4.9.23 images should be pushed to our local registry successfully.

Additional info:
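The "manifest invalid" error is consistent with an older registry rejecting a manifest media type it does not recognize. As a hypothetical illustration (the allowlist below is an assumption for the sketch, not the actual registry-s390x 2.6.x code), a registry that only accepts Docker schema 2 manifests would reject an OCI manifest like this:

```python
import json

# Hypothetical allowlist of manifest media types an older Docker
# distribution registry (such as the 2.6.x image used here) accepts.
ACCEPTED_MANIFEST_TYPES = {
    "application/vnd.docker.distribution.manifest.v1+prettyjws",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}

def validate_manifest(raw: bytes) -> str:
    """Return 'ok' if the manifest's mediaType is accepted, else 'manifest invalid'."""
    manifest = json.loads(raw)
    # OCI image manifests may omit the top-level mediaType field entirely;
    # treat a missing field as the OCI image manifest type.
    media_type = manifest.get("mediaType", "application/vnd.oci.image.manifest.v1+json")
    if media_type not in ACCEPTED_MANIFEST_TYPES:
        return "manifest invalid"
    return "ok"

docker_manifest = b'{"schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json"}'
oci_manifest = b'{"schemaVersion": 2, "mediaType": "application/vnd.oci.image.manifest.v1+json"}'
print(validate_manifest(docker_manifest))  # ok
print(validate_manifest(oci_manifest))     # manifest invalid
```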
We are now seeing the same failure with OCP 4.10.0-rc.5 when performing the mirror. It is the same image, but for rc.5: s390x-machine-os-content. I will attach the output.
Created attachment 1863319 [details] 4.10.0-rc.5 mirror image failure for machine-os-content
We've also successfully re-mirrored the content from previous OCP releases such as OCP 4.8.33, 4.9.22, and 4.10.0-rc.4. They all worked fine on both KVM and zVM platforms.
After chatting with the team, we are re-assigning this bug to the ART/Release team to investigate the mirroring problem further and determine whether it is cross-arch. Please feel free to re-assign it back to us if this problem is multi-arch only. Also note that this issue is observed on both 4.9.23 and 4.10.0-rc.5.
While I didn't take this as far on x86, I was able to confirm that all of the new machine-os-content images are manifested differently from their previous release counterparts.

This is the output for the x86 machine-os-content for 4.9.23:

> [jpoulin@rock-kvmlp-3 ~]$ podman manifest inspect quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ae92a919cb6da4d1a5d832f8bc486ae92e55bf3814ebab94bf4baa4c4bcde85d
> Error: error parsing manifest blob "{\"schemaVersion\":2,\"config\":{\"mediaType\":\"application/vnd.oci.image.config.v1+json\",\"digest\":\"sha256:88c5613cfae9f21dc2db7fc1d00dcf50f522bd3f85a787a1e71d67d53c445d34\",\"size\":3240},\"layers\":[{\"mediaType\":\"application/vnd.oci.image.layer.v1.tar+gzip\",\"digest\":\"sha256:0672ccd2448e808003cfd58868d18c6fbcba3cd02b6868808fefa6e76b61498f\",\"size\":85670741},{\"mediaType\":\"application/vnd.oci.image.layer.v1.tar+gzip\",\"digest\":\"sha256:0c9ea41036996ec83a5f118963d45a6f1f53e56dc8f0c9b8da744138f590b0d2\",\"size\":1879},{\"mediaType\":\"application/vnd.oci.image.layer.v1.tar+gzip\",\"digest\":\"sha256:7cc8c27e4d3ba855252f073c0712443491e7bb460a2e7a0d9536e899fd200b9b\",\"size\":1104943259}],\"annotations\":{\"org.opencontainers.image.base.digest\":\"sha256:cbc1e8cea8c78cfa1490c4f01b2be59d43ddbbad6987d938def1960f64bcd02c\",\"org.opencontainers.image.base.name\":\"registry.access.redhat.com/ubi8/ubi:latest\"}}" as a "application/vnd.oci.image.manifest.v1+json": Treating single images as manifest lists is not implemented

This is the output for the x86 machine-os-content for 4.9.22:

> [jpoulin@rock-kvmlp-3 ~]$ podman manifest inspect quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:07666daf9bb9249e666e66b117f2b7ad7ed0cd68c1f9265124c244e80e685482
> WARN[0000] Warning! The manifest type application/vnd.docker.distribution.manifest.v2+json is not a manifest list but a single image.
> {
>   "schemaVersion": 2,
>   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
>   "config": {
>     "mediaType": "application/vnd.docker.container.image.v1+json",
>     "size": 3236,
>     "digest": "sha256:db548dfe0de420165b67a2b2174c2d94a5542e096beeb5219505759aa847d406"
>   },
>   "layers": [
>     {
>       "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
>       "size": 85670741,
>       "digest": "sha256:0672ccd2448e808003cfd58868d18c6fbcba3cd02b6868808fefa6e76b61498f"
>     },
>     {
>       "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
>       "size": 1879,
>       "digest": "sha256:0c9ea41036996ec83a5f118963d45a6f1f53e56dc8f0c9b8da744138f590b0d2"
>     },
>     {
>       "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
>       "size": 1104753786,
>       "digest": "sha256:6b1b21084b0cf1c22b954983610a7cd598b46798d0d3d14243b291da59e3f10a"
>     }
>   ]
> }

All arches I've tested are like this for the latest 4.9 (23) and 4.10 (rc5). I believe this will break mirroring as documented.
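The two inspect outputs above differ in their top-level mediaType: the 4.9.23 image is an OCI image manifest (application/vnd.oci.image.manifest.v1+json), while the 4.9.22 image is Docker schema 2. A minimal sketch of classifying a raw manifest blob by that field (a hypothetical helper for illustration, not podman's actual code):

```python
import json

# Media types that denote a manifest list / index rather than a single image.
LIST_TYPES = {
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
}

def manifest_kind(raw: bytes) -> str:
    """Describe a raw manifest blob as a manifest list or a single image."""
    manifest = json.loads(raw)
    # Per the OCI spec the top-level mediaType is optional; assume an
    # OCI image manifest when it is absent.
    media_type = manifest.get("mediaType", "application/vnd.oci.image.manifest.v1+json")
    if media_type in LIST_TYPES:
        return "manifest list"
    return "single image: " + media_type

v2s2 = b'{"schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json"}'
oci = b'{"schemaVersion": 2, "mediaType": "application/vnd.oci.image.manifest.v1+json"}'
print(manifest_kind(v2s2))
print(manifest_kind(oci))
```

This is why `podman manifest inspect` warns or errors on both digests: neither blob is a manifest list, and the OCI one additionally cannot be coerced into one.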
OK, it's possible that something changed in coreos-assembler here - we are using the current Fedora 35 buildah when generating this image. But there's also a big quay.io update that happened around this same time.

I just used coreos-assembler to upload a build to quay.io/cgwalters/ostest:35.20220225.dev.0 and I see:

```
$ oc image info quay.io/cgwalters/ostest:35.20220225.dev.0
Name:        quay.io/cgwalters/ostest:35.20220225.dev.0
Digest:      sha256:2c984693ca74168bd9f6e1405b4302322b6808c21a39d1041e16262dfd687de1
Media Type:  application/vnd.oci.image.manifest.v1+json
Created:     1m ago
```

So...actually, I think what happened here is that quay.io now supports OCI natively, and we now push images that way. I think we may need to change coreos-assembler to explicitly use --format=v2s2 for now, until we're ready to switch to OCI by default.
https://github.com/coreos/coreos-assembler/pull/2726
We tried a disconnected installation using the latest 4.10.0-rc.6 build (under both KVM and zVM) that was just released, and the same error occurred during mirroring:

sha256:2a1588ed0f99e238e284d4dff10f60d180c892941c8e4d95dce4684e8452ed43 bastion:5000/ocp4/openshift4:4.10.0-rc.6-s390x-vsphere-csi-driver-syncer
sha256:2a1588ed0f99e238e284d4dff10f60d180c892941c8e4d95dce4684e8452ed43 bastion:5000/ocp4/openshift4:4.10.0-rc.6-s390x-vsphere-problem-detector
error: unable to push manifest to bastion:5000/ocp4/openshift4:4.10.0-rc.6-s390x-machine-os-content: manifest invalid: manifest invalid
info: Mirroring completed in 5m45.72s (23MB/s)
error: one or more errors occurred while uploading images
Raising bug to urgent as this is currently blocking a number of our tests across both 4.9.x and 4.10.x releases.
Hi Colin, is there a backport PR? We need this fix for 4.9 and 4.10.
Overriding the blocker status on this bug as per: https://coreos.slack.com/archives/CB95J6R4N/p1646172701323879?thread_ts=1646153089.903709&cid=CB95J6R4N
See https://github.com/coreos/coreos-assembler/issues?q=label%3Abranch%2Frhcos+is%3Aclosed for the backport PRs, but we're still in the process of ensuring that build is deployed.
OK so, while it's a bit ugly, we can manually convert these images to v2s2 via e.g.:

`skopeo copy --format=v2s2 docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e1dcc7ebecab4598c4c2a6a5a3d8768ed4546cd4b42b6f1822e2babd8cc864f7 docker://quay.io/someothernamespace/someimage:4.11.X`

Then pass `quay.io/someothernamespace/someimage:4.11.X` when generating `machine-os-content` in `oc adm release new`.

Or perhaps even simpler: disable the machine-os-content promotion job today, manually re-push the existing image over itself, and let release-controller use that. It may also work to change the promotion job to do this conversion.
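For reference, the conversion that `skopeo copy --format=v2s2` performs is essentially a rewrite of the manifest's media types from their OCI values to the Docker schema 2 equivalents. The mapping below is a simplified sketch of that idea, not skopeo's actual implementation (real conversion also recomputes the manifest digest, so the image gets a new sha256):

```python
# Simplified OCI -> Docker schema 2 media-type mapping (sketch only).
OCI_TO_V2S2 = {
    "application/vnd.oci.image.manifest.v1+json":
        "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.config.v1+json":
        "application/vnd.docker.container.image.v1+json",
    "application/vnd.oci.image.layer.v1.tar+gzip":
        "application/vnd.docker.image.rootfs.diff.tar.gzip",
}

def to_v2s2(manifest: dict) -> dict:
    """Sketch of converting an OCI image manifest dict to Docker schema 2."""
    out = dict(manifest)
    out["mediaType"] = OCI_TO_V2S2["application/vnd.oci.image.manifest.v1+json"]
    out["config"] = {**manifest["config"],
                     "mediaType": OCI_TO_V2S2[manifest["config"]["mediaType"]]}
    out["layers"] = [{**layer, "mediaType": OCI_TO_V2S2[layer["mediaType"]]}
                     for layer in manifest["layers"]]
    out.pop("annotations", None)  # schema 2 has no top-level annotations field
    return out

oci_example = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {"mediaType": "application/vnd.oci.image.config.v1+json",
               "size": 3240, "digest": "sha256:example-config"},
    "layers": [{"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
                "size": 85670741, "digest": "sha256:example-layer"}],
    "annotations": {"org.opencontainers.image.base.name": "example"},
}
converted = to_v2s2(oci_example)
print(converted["mediaType"])
```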
SUMMARY: I'm going to rephrase and build on https://bugzilla.redhat.com/show_bug.cgi?id=2058421#c7

When Docker was invented, it initially used some older container image schemas that aren't relevant anymore. Until recently, we mostly used what's called "v2s2" for short, or "docker schema 2": https://docs.docker.com/registry/spec/manifest-v2-2/

But for years now there has been the standardized OCI format, and it is actually used *by default* by tools such as podman/buildah. It's just that when those tools go to push to a registry, a negotiation happens, and if the registry doesn't support OCI, the image is converted to v2s2.

Everything in the OCP payload (until this bug) was v2s2 - I don't think this was an explicit decision, but really a consequence of the lack of OCI registry support.

What happened recently is: quay.io deployed OCI support. And this is great; it's what we want to happen! OCI support unlocks a bunch of things, including OCI Artifacts, which I am sure we will make use of.

Our tools *should* support OCI. It seems that something in the `oc image mirror` path doesn't, though - and that should be fixed. But still, until we have a handle on the blast radius of OCI, we should go back to v2s2.
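The push-time negotiation described above can be sketched as follows (a hypothetical helper for illustration, not the actual podman/buildah API): the client pushes OCI when the registry advertises support for it, and falls back to converting to v2s2 otherwise.

```python
OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
V2S2_MANIFEST = "application/vnd.docker.distribution.manifest.v2+json"

def negotiate_push_format(registry_accepts: set) -> str:
    """Pick the manifest format to push, preferring OCI when the registry supports it."""
    if OCI_MANIFEST in registry_accepts:
        return OCI_MANIFEST
    # Registry doesn't understand OCI: convert to Docker schema 2 before pushing.
    return V2S2_MANIFEST

# quay.io after its update advertises OCI support, so images land as OCI:
print(negotiate_push_format({OCI_MANIFEST, V2S2_MANIFEST}))
# An older registry (like the 2.6.x mirror registry here) does not:
print(negotiate_push_format({V2S2_MANIFEST}))
```

This is why the breakage only surfaced once quay.io started accepting OCI: the build tools stopped converting at push time, and the OCI manifests then flowed unmodified into registries that reject them.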
Folks,

1. We tested with the RC.7 build and unfortunately we encounter the same "manifest invalid" issue with the disconnected install mirror registry:

sha256:2a1588ed0f99e238e284d4dff10f60d180c892941c8e4d95dce4684e8452ed43 bastion:5000/ocp4/openshift4:4.10.0-rc.7-s390x-vsphere-csi-driver-operator
sha256:2a1588ed0f99e238e284d4dff10f60d180c892941c8e4d95dce4684e8452ed43 bastion:5000/ocp4/openshift4:4.10.0-rc.7-s390x-vsphere-csi-driver-syncer
sha256:2a1588ed0f99e238e284d4dff10f60d180c892941c8e4d95dce4684e8452ed43 bastion:5000/ocp4/openshift4:4.10.0-rc.7-s390x-vsphere-problem-detector
error: unable to push manifest to bastion:5000/ocp4/openshift4:4.10.0-rc.7-s390x-machine-os-content: manifest invalid: manifest invalid
info: Mirroring completed in 1m26.84s (91.55MB/s)
error: one or more errors occurred while uploading images

2. FYI, the RHCOS build is the same for RC.6 and RC.7: 410.84.202202251632-0

Thank you,
Kyle
Thank you for the updates - we have successfully mirrored the RC.8 images for disconnected installation to our local registries. These are the images residing on the CI servers. We no longer encounter the "manifest invalid" error.

We will continue running through our installation and upgrade tests using this build, and will also verify 4.10.0-rc.8 when it becomes available on quay.io.

Thank you,
Phil
While trying to execute the test for upgrading 4.9.23 to 4.10.0-rc.8 in a disconnected/restricted environment on Power (ppc64le arch), the below issue was encountered. This happened while installing 4.9.23:

sha256:ff709d98d118eb014a0b6f057bc735ff4d041b1ec104c4b68ac267373dfa5299 -> 4.9.23-ppc64le-cluster-authentication-operator
  stats: shared=0 unique=4 size=1006MiB ratio=1.00

phase 0:
  registry.rdr-mani-dis.ibm.com:5000 ocp4/openshift4 blobs=4 mounts=0 manifests=141 shared=0

info: Planning completed in 25.2s
error: unable to push manifest to registry.rdr-mani-dis.ibm.com:5000/ocp4/openshift4:4.9.23-ppc64le-machine-os-content: manifest invalid: manifest invalid
info: Mirroring completed in 1.68s (0B/s)
error: one or more errors occurred while uploading images

The detailed error log is attached.
Created attachment 1863967 [details] 4.9.23 disconnected failure on Power
We have now successfully completed all our disconnected install and upgrade tests using OCP 4.10.0-rc.8 and RHCOS 410.84.202202251632-0. We used RC.8 from both CI and quay.io. This was covered on both KVM and zVM platforms. Note that for the disconnected upgrade tests from OCP 4.9, we used OCP 4.9.22, since the latest 4.9.23 currently has the manifest invalid issue.
The fix to the coreos-assembler tooling landed in https://github.com/coreos/coreos-assembler/pull/2726

We are waiting for new, successful builds of 4.11 across all arches before we can move this to MODIFIED.
Rebuilds of RHCOS 4.11 across all arches have completed and are being pushed to Quay using the v2s2 format.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069