Description of problem:

When we use oc-mirror to mirror OLM and OCP releases, the mirroring completes successfully and the auto-generated ICSP entries are split across 2 namespaces (release and release-images):

```
---
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: release-0
spec:
  repositoryDigestMirrors:
  - mirrors:
    - kubeframe-registry-kubeframe-registry.apps.test-ci.alklabs.com/olm/openshift/release
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - mirrors:
    - kubeframe-registry-kubeframe-registry.apps.test-ci.alklabs.com/olm/openshift/release-images
    source: quay.io/openshift-release-dev/ocp-release
```

During the Agent installation, the agent logs show this error:

```
Failed to fetch container images needed for installation from ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b,ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:eae57963e5746607da7ce8070275abd301ab420ce767706d5b36972f64a0e450'
```

From the agent node:

The image the validation did manage to pull:

```
[root@localhost core]# podman pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64
Trying to pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64...
Getting image source signatures
Copying blob d2d8e7d185cc skipped: already exists
Copying blob 47aa3ed2034c skipped: already exists
Copying blob eac1b95df832 skipped: already exists
Copying blob 24e8d2e218b0 [--------------------------------------] 0.0b / 0.0b
Copying blob f43eacb86576 [--------------------------------------] 0.0b / 0.0b
Copying blob c67baba1e9cb [--------------------------------------] 0.0b / 0.0b
Copying config ac96e0ee77 done
Writing manifest to image destination
Storing signatures
ac96e0ee77a3534df5707c4e42453c3ded8ce83bb9ccb39e6c31705aed73739a
```

The image that failed the validation:

```
[root@localhost core]# podman pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
Trying to pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b...
Error: Error initializing source docker://ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b: Error reading manifest sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b in ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images: manifest unknown: manifest unknown
```

This is what we have on the node:

```
[root@localhost core]# cat /etc/containers/registries.conf
unqualified-search-registries = ["registry.access.redhat.com", "docker.io", "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com"]

[[registry]]
  prefix = ""
  location = "quay.io/openshift-release-dev/ocp-release"
  mirror-by-digest-only = true

  [[registry.mirror]]
    location = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images"

[[registry]]
  prefix = ""
  location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
  mirror-by-digest-only = true

  [[registry.mirror]]
    location = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release"

[[registry]]
  prefix = ""
  location = "quay.io/jparrill/registry"
  mirror-by-digest-only = false

  [[registry.mirror]]
    location = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/jparrill/registry"
```

This is the only tag in release-images:

```
[root@localhost core]# curl -uYYY:XXXXX -sk https://ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/v2/olm/openshift/release-images/tags/list?n=1000 | jq
{
  "name": "olm/openshift/release-images",
  "tags": [
    "4.9.13-x86_64"
  ]
}
```

It seems that all the release images are actually here:

```
curl -uYYY:XXX -sk https://ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/v2/olm/openshift/release/tags/list?n=1000 | jq .tags | wc -l
143
```

So this works:

```
[root@localhost core]# podman pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
Trying to pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b...
Getting image source signatures
Copying blob 47aa3ed2034c skipped: already exists
Copying blob 24e8d2e218b0 skipped: already exists
Copying blob f43eacb86576 skipped: already exists
Copying blob eac1b95df832 skipped: already exists
Copying blob 55006bb1219d done
Copying config 28ea52b98c done
Writing manifest to image destination
Storing signatures
28ea52b98c63aa5dd899d67bf267a3b7dd623f5a694b97a56793bb12597e2de9
```

I guess the issue is with how the assisted-installer decides which images to use. I don't understand why assisted tells the agent to pull the image from the mirror instead of passing along what it gets from oc:

```
[root@localhost core]# oc adm release info --image-for=machine-config-operator ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
[root@localhost core]# podman pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b...
Getting image source signatures
Copying blob 55006bb1219d skipped: already exists
Copying blob 47aa3ed2034c skipped: already exists
Copying blob 24e8d2e218b0 skipped: already exists
Copying blob f43eacb86576 skipped: already exists
Copying blob eac1b95df832 [--------------------------------------] 0.0b / 0.0b
Copying config 28ea52b98c done
Writing manifest to image destination
Storing signatures
28ea52b98c63aa5dd899d67bf267a3b7dd623f5a694b97a56793bb12597e2de9
```

The oc version in the assisted-service is returning the wrong image:

```
bash-4.4$ oc adm release info --registry-config=/data/ps --image-for=machine-config-operator ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64
ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
```

Version-Release number of selected component (if applicable):

How reproducible:

Follow the same flow, using the commands shown in the description of the problem.

Steps to Reproduce:
1. Mirror an OCP release (whichever you want) with the oc-mirror command, which generates the ICSP automatically split across 2 namespaces (shown above)
2. Deploy ACM + AI
3. Deploy spoke nodes using the BMH workflow
4. The agent should show the same error when trying to fetch the release image (if the ICSP is split in 2, as oc-mirror does)

Actual results:

```
Failed to fetch container images needed for installation from ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b,ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:eae57963e5746607da7ce8070275abd301ab420ce767706d5b36972f64a0e450'
```

Expected results:

The agent should fetch the OCP release image so that it passes the validations required to start the spoke installation.

Additional info:
In essence, the issue is this:
1. The assisted-service is using an unsupported version of oc due to https://bugzilla.redhat.com/show_bug.cgi?id=1823143 (oc extract isn't aware of ICSP)
2. The oc mirror command recently changed, and the release image is now mirrored to a different namespace than the release payload images (e.g. `olm/openshift/release-images` vs `olm/openshift/release`)
3. The oc binary in the assisted-service isn't aware of that change, so the oc image-for command returns the wrong namespace.

So, once the actual fix for https://bugzilla.redhat.com/show_bug.cgi?id=1823143 gets merged, we can start using the official oc and this new issue should be resolved as well.
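To make point 3 concrete, here is a minimal sketch of the source-to-mirror mapping that an ICSP-aware client would be expected to apply. The function name `resolve_mirror` and the hard-coded table are hypothetical, copied from the registries.conf in the description; this is an illustration only, not how oc implements it.

```shell
#!/bin/sh
# Hypothetical sketch: map a by-digest pull spec to its mirror, using the
# two source->mirror pairs from the registries.conf in the description.
MIRROR_HOST="ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com"

resolve_mirror() {
  pullspec="$1"
  repo="${pullspec%%@*}"    # everything before the digest
  digest="${pullspec#*@}"   # everything after the '@'
  case "$repo" in
    quay.io/openshift-release-dev/ocp-release)
      # the release image itself
      echo "${MIRROR_HOST}/olm/openshift/release-images@${digest}" ;;
    quay.io/openshift-release-dev/ocp-v4.0-art-dev)
      # the release payload images (MCO and friends)
      echo "${MIRROR_HOST}/olm/openshift/release@${digest}" ;;
    *)
      # no mirror entry, or a by-tag pull spec: leave it unchanged
      echo "$pullspec" ;;
  esac
}

resolve_mirror "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b"
```

For the MCO digest this prints the `olm/openshift/release@sha256:051c...` path, which is the one that podman could pull, whereas the oc in the assisted-service returned the `release-images` path.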
Hi @mfilanov, the OCP bug looks like it's targeting 4.11, so would we not fix this in an ACM 2.5.z stream instead of 2.6? Is there a workaround that should be documented for ACM 2.5?
@njean Eran already opened a PR to fix it so it will be merged into 2.5
While we wait for the actual fix in upstream oc (https://github.com/openshift/oc/pull/829) to merge, I posted a PR with a temp fix: https://github.com/openshift/assisted-service/pull/3598
It gets the job done but is a pain to configure downstream. If we can't get the upstream fix merged in time for 2.5.1, we will deliver the temp fix.
About what to document: we should say that disconnected AI deployments require creating the mirror registry using `oc` and not `oc-mirror` (until this issue is resolved).
Yes, let's deliver the temp fix to 2.5
UPDATE: the actual fix is merged to assisted-service master - https://github.com/openshift/assisted-service/pull/3657
Since there is no official oc release with this code, we are getting it from a nightly build - https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.11.0-0.nightly-2022-04-08-205307/openshift-client-linux.tar.gz
Since this is a change to the Dockerfile, and there is no oc RPM, getting this into the downstream assisted-service image requires updating the assisted-service image to use the upstream oc client in ocm-2.5 - https://github.com/stolostron/backlog/issues/21629
This issue shouldn't be on QA yet since it's not really backported (the change to the Dockerfile doesn't affect the downstream build).
It seems that upstream oc doesn't have the fix, and the temp fix isn't working either. This issue will not be resolved in 2.5. We need to add documentation on how to mitigate it: the user should create the mirror registry using `oc adm release mirror`. I'm not sure whether we should support mirrors created with oc-mirror, or how.
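For the documentation, the mitigation could look roughly like the sketch below. The registry host, repository path and pull-secret path are placeholders (assumptions, not values from this environment), and the helper `mirror_cmd` is hypothetical; it only prints the command so it can be reviewed before running. `-a`, `--from`, `--to` and `--to-release-image` are the usual `oc adm release mirror` options.

```shell
#!/bin/sh
# Sketch: print an `oc adm release mirror` command that pushes the release
# image and its payload images into ONE repository, instead of the
# release / release-images split that oc-mirror produces.
# All three defaults below are placeholders; override them via the environment.
LOCAL_REGISTRY="${LOCAL_REGISTRY:-registry.example.com:5000}"
LOCAL_REPO="${LOCAL_REPO:-ocp4/openshift4}"
PULL_SECRET="${PULL_SECRET:-./pull-secret.json}"

mirror_cmd() {
  release="$1"   # e.g. 4.9.13-x86_64
  # echo joins its arguments with single spaces, giving one command line
  echo "oc adm release mirror -a ${PULL_SECRET}" \
    "--from=quay.io/openshift-release-dev/ocp-release:${release}" \
    "--to=${LOCAL_REGISTRY}/${LOCAL_REPO}" \
    "--to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPO}:${release}"
}

mirror_cmd "4.9.13-x86_64"
```

Pointing `--to` and `--to-release-image` at the same repository keeps everything in one place, which is presumably the layout the custom oc in the assisted-service expects.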
Hi @ercohen , the latest comment in https://bugzilla.redhat.com/show_bug.cgi?id=1823143 suggests that bug is now fixed. In your comment back on 3/30, Bug 1823143 was essentially a pre-req to resolve this particular issue but do you now have some other dependency on the oc upstream (ref to your comment above on 4/26) that prevents this issue from being resolved in the RHACM 2.5 release? Is this a regression from earlier releases or a new problem? Thx.
njean, here is some background:

Installing a cluster using the assisted service (AKA infrastructure operator) in a disconnected environment requires creating a mirror registry with the release payload. For the past year the assisted service has been using a custom build of the oc client that enables installation in a disconnected environment.

Why does the assisted service need oc?
1. The assisted-service uses oc to extract the installer binary, which is used for generating manifests and ignition configs (oc adm release extract --command=openshift-install).
2. The assisted-service also uses oc to get the MCO image for the release (used for the image-availability validation and for applying the ignition config on the live ISO).

The assisted-service uses a custom oc because upstream oc doesn't respect ICSP (Bug 1823143), so in a disconnected environment the assisted-service pod fails to get the installer binary and the installation can't start. Note that this custom oc doesn't really enable ICSP; it makes assumptions about where to look for the mirrored image (and so far that has worked for the assisted-service use case).

The bug we see in this BZ (Bug 2069976) is that the assisted-service oc returns the wrong image for the MCO. The issue relates to the fact that the mirror was created using the new oc-mirror tool, which splits the mirrored images into 2 repos: release (for the release payload) and release-images (for the release image itself) - https://github.com/openshift/oc-mirror/pull/343

About a week ago Bug 1823143 moved to QE and I updated the assisted-service to use upstream oc (which had the PR that enables ICSP). I tested it on amorgant's environment and it seemed that the issue described in this BZ was fixed. But not long after, it turned out that the oc update actually caused a regression in a real disconnected environment, so the oc update was reverted and the assisted-service is using the same old custom oc build.

Bug 1823143 also failed QE validation and requires more work. Once it's verified I'll update the assisted-service to use it (and test it better).

So in short, this bug isn't a regression; it's a new problem that creates a limitation on how the user should create the mirror registry for the installation.
@ercohen , thank-you for the excellent background and summary. RHACM 2.5 will code freeze May 12th, so that's the remaining window to fix anything. Do you think you'd be able to update your instance of oc to adapt to the new way images are mirrored, or is there some workaround that needs to be doc'd?
I'll try to update the custom oc that the assisted-service uses to adapt to the new way images are mirrored. I'll post an update once I have something working (that doesn't cause a regression).
I created a patch that should solve this issue, and it passes the disconnected regression tests. The code is merged to master and I created a merge request to update the downstream Dockerfile. I'll move the BZ to QE once it's part of downstream ACM.
Thanks Eran!
Validated fix using MCE ds snapshot 2.0.0-DOWNANDBACK-2022-05-04-15-49-55

Validated both `oc adm release mirror` and `oc-mirror` method of mirroring ocp releases is working (Deployed spoke node from each successfully)

############################
# oc-mirror validation steps
############################

## Mirrored latest stable 4.10 release using latest oc-mirror (built from master source)

(1) Create imageset for oc-mirror

    cat /root/oc-mirror_working/oc-mirror/imageset.yml
    apiVersion: mirror.openshift.io/v1alpha2
    kind: ImageSetConfiguration
    mirror:
      platform:
        channels:
          - name: stable-4.10

(2) Mirror release to local registry with oc-mirror

    ./bin/oc-mirror --dir=/var/tmp/build --config imageset.yml --dest-skip-tls docker://registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest

(3) Apply oc-mirror generated icsp (example is below)

    cat imageContentSourcePolicy.yaml
    ---
    apiVersion: operator.openshift.io/v1alpha1
    kind: ImageContentSourcePolicy
    metadata:
      name: release-0
    spec:
      repositoryDigestMirrors:
      - mirrors:
        - registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images
        source: quay.io/openshift-release-dev/ocp-release
      - mirrors:
        - registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release
        source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

(4) Update 4.10 clusterimageset to use mirrored release (release url is displayed in oc-mirror output)

    registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images:4.10.11-x86_64

(5) Remove image-policy-0 icsp if it exists as it will conflict

(6) Edit assisted service mirror-registry configmap
    - Remove entries that match image-policy-0
    - Add entries that match oc-mirror generated icsp

(7) Delete assisted-service pod to recreate it (Possibly not necessary)

(8) Deployed spoke cluster successfully using 4.10 clusterimageset (or whichever clusterimageset contains the oc-mirror mirrored release)

(9) Also confirmed -

    # Assisted internal oc adm release info can return a correct release payload image
    [root@sealusa11 ~]# oc get clusterimageset
    NAME   RELEASE
    4.10   registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images:4.10.11-
    [root@sealusa11 ~]# oc rsh assisted-service-66bfdf6785-9h5pv
    Defaulted container "assisted-service" out of: assisted-service, postgres
    sh-4.4$ oc adm release info --image-for=machine-config-operator registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images:4.10.11-x86_64
    registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release@sha256:8e98944cb0a7fe849f310964aadac4c06afb577dd87ce62d0b609102a32ff05b

    # Release payload image can be pulled with podman
    podman pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release@sha256:8e98944cb0a7fe849f310964aadac4c06afb577dd87ce62d0b609102a32ff05b
    Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release@sha256:8e98944cb0a7fe849f310964aadac4c06afb577dd87ce62d0b609102a32ff05b...
    Getting image source signatures
    Copying blob 39382676eb30 done
    Copying blob 9ef334919d82 [==================>-------------------] 8.4MiB / 16.5MiB
    Copying blob 237bfbffb5f2 [===>----------------------------------] 8.3MiB / 79.5MiB